mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@aerifal.cx>
To: musl@lists.openwall.com
Subject: Re: strcasestr.c
Date: Sun, 17 Feb 2013 14:04:52 -0500	[thread overview]
Message-ID: <20130217190452.GH20323@brightrain.aerifal.cx> (raw)
In-Reply-To: <20130214152349.GA20323@brightrain.aerifal.cx>

On Thu, Feb 14, 2013 at 10:23:49AM -0500, Rich Felker wrote:
> On Thu, Feb 14, 2013 at 09:59:56AM -0500, Todd C. Miller wrote:
> > When investigating using the musl strstr.c ofr OpenBSD I noticed
> > that musl only has a stub for strcasestr() that calls strstr().  I
> > was curious whether the twoway algorithm could be adapted to do a
> > case-insensitive search.  It turned out to be pretty trivial to
> > just add calls to tolower() in the right places, making sure to
> > avoid sign extension.
> > 
> > The changes are mostly mechanical.  You might wish to inline
> > _strcasechr() though the compiler will probably do that for you.
> 
> Unfortunately, as far as I can tell making this correct is nontrivial;
> your version only works for ascii, not the rest of ucs. I don't see
> any easy way to adapt 2way to the case where matching classes contain
> characters of different lengths...

To elaborate, since strcasestr is not a standard function with a
rigorous specification, I find it difficult to determine what the
"correct" behavior should be. It's only documented to "ignore the
case". Does this mean it should operate like a literal regex seach
with REG_ICASE? Or should it operate as if by using single-byte
tolower/toupper functions, in which case your code would be correct?

Actually we already have this problem for strcasecmp, even though it's
specified by POSIX. POSIX just says:

    When the LC_CTYPE category of the current locale is from the POSIX
    locale, strcasecmp() and strncasecmp() shall behave as if the
    strings had been converted to lowercase and then a byte comparison
    performed. Otherwise, the results are unspecified.

It seems POSIX clearly allows implementations to do whatever they like
in non-POSIX locale, but since musl is providing just a POSIX locale
for LC_CTYLE, we're under certain obligations. Note that XBD 7.2 POSIX
Locale allows us, as we're doing, to have character properties and
case mappings outside of those specified in POSIX as long as they're
for characters outside the portable character set:

    The tables in Locale Definition describe the characteristics and
    behavior of the POSIX locale for data consisting entirely of
    characters from the portable character set and the control
    character set. For other characters, the behavior is unspecified.
    For C-language programs, the POSIX locale shall be the default
    locale when the setlocale() function is not called.

Thus, it seems like since we have case mappings for all of Unicode,
strcasecmp needs to honor those, and thus the current implementation
is probably non-conforming.

Since strcasestr is nonstandard and not clearly specified, I don't see
any _obligation_ for it to do the same, but perhaps it should. Thus I
think my previous reply to your patch was premature; your code isn't
necessarily wrong, but it's a matter that needs further consideration
to decide what course of action is appropriate.

Comments from anybody else?

Rich


  reply	other threads:[~2013-02-17 19:04 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-14 14:59 strcasestr.c Todd C. Miller
2013-02-14 15:23 ` strcasestr.c Rich Felker
2013-02-17 19:04   ` Rich Felker [this message]
2013-02-20 22:28     ` strcasestr.c John Spencer
2013-02-20 23:56       ` strcasestr.c Szabolcs Nagy
2013-02-21  1:03       ` strcasestr.c Rich Felker
2013-02-21  1:30         ` strcasestr.c Kurt H Maier
2013-02-21  1:34           ` strcasestr.c Rich Felker
2013-02-21  6:18         ` strcasestr.c Isaac Dunham
2013-02-21 20:00           ` strcasestr.c John Spencer
2013-02-21 20:13             ` strcasestr.c Szabolcs Nagy
2013-02-22  5:20         ` strcasestr.c Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130217190452.GH20323@brightrain.aerifal.cx \
    --to=dalias@aerifal.cx \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).