mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] strcasestr("", "") returns NULL
@ 2025-05-16  7:32 Nuno Cruces
       [not found] ` <20250516125024.GI1827@brightrain.aerifal.cx>
  0 siblings, 1 reply; 4+ messages in thread
From: Nuno Cruces @ 2025-05-16  7:32 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 360 bytes --]

Hi,

Currently strcasestr("", "") returns NULL which is inconsistent
with strstr("", "").

For strstr, the C standard specifies "If s2 points to a string with zero
length, the function returns s1."

strcasestr is a nonstandard extension, but to the best of my knowledge,
both glibc and the BSDs decide to be consistent with strstr in this case.

Regards,
Nuno

[-- Attachment #2: Type: text/html, Size: 540 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [musl] strcasestr("", "") returns NULL
       [not found] ` <20250516125024.GI1827@brightrain.aerifal.cx>
@ 2025-05-16 16:02   ` Nuno Cruces
  2025-05-16 16:09     ` Rich Felker
  0 siblings, 1 reply; 4+ messages in thread
From: Nuno Cruces @ 2025-05-16 16:02 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 1629 bytes --]

I don't know about it not allowing optimization.

The C/POSIX local does, as you wrote. And regular expression engines kinda
prove it's at least possible not to be quadratic on the length of the
haystack (although they spend a lot more effort preprocessing the needle).

But yeah, at a minimum any locale where case differences interfere with
random access invalidates all the classical algorithms.

Nuno

On Fri 16 May 2025, 13:50 Rich Felker, <dalias@libc.org> wrote:

> On Fri, May 16, 2025 at 08:32:00AM +0100, Nuno Cruces wrote:
> > Hi,
> >
> > Currently strcasestr("", "") returns NULL which is inconsistent
> > with strstr("", "").
> >
> > For strstr, the C standard specifies "If s2 points to a string with zero
> > length, the function returns s1."
> >
> > strcasestr is a nonstandard extension, but to the best of my knowledge,
> > both glibc and the BSDs decide to be consistent with strstr in this case.
>
> Indeed, this should be fixed. Thanks for the report.
>
> As an aside, strcasestr is an awful function we should probably never
> have provided and that no one should use, that doesn't admit any
> decent optimization. I believe it's vaguely possible to do the twoway
> algorithm for the current ascii-only case equivalence strcasecmp does,
> but I declined to investigate further or implement because if we ever
> want to have strcasecmp do more, the effort would have been wasted (or
> worse, would incentivize setting LC_ALL=C for performance purposes if
> we left the code conditionally in place).
>
> I'll write a fix and push it along with a big queue of stuff I didn't
> realize had piled up.
>
> Rich
>

[-- Attachment #2: Type: text/html, Size: 2302 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [musl] strcasestr("", "") returns NULL
  2025-05-16 16:02   ` Nuno Cruces
@ 2025-05-16 16:09     ` Rich Felker
  2025-05-16 16:59       ` Nuno Cruces
  0 siblings, 1 reply; 4+ messages in thread
From: Rich Felker @ 2025-05-16 16:09 UTC (permalink / raw)
  To: Nuno Cruces; +Cc: musl

On Fri, May 16, 2025 at 05:02:25PM +0100, Nuno Cruces wrote:
> I don't know about it not allowing optimization.
> 
> The C/POSIX local does, as you wrote. And regular expression engines kinda
> prove it's at least possible not to be quadratic on the length of the
> haystack (although they spend a lot more effort preprocessing the needle).

Regular expression isn't O(1) in space so not a possible
implementation for strcasestr. And even if you do have a space budget,
the optimizations needed to make it not quadratic-in-time have very
large space tradeoffs. It's not the magic solution fans make it sound
like.

Rich

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [musl] strcasestr("", "") returns NULL
  2025-05-16 16:09     ` Rich Felker
@ 2025-05-16 16:59       ` Nuno Cruces
  0 siblings, 0 replies; 4+ messages in thread
From: Nuno Cruces @ 2025-05-16 16:59 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]

I wasn't suggesting they'd be used, just that they kinda demonstrate the
problem is not absolutely intractable.
In any case, if you're interested, glibc seems to use two-way for
strcasestr, though I'm not sure if it supports every locale.
I haven't looked into the details.

Thanks a lot for musl!

Regards,
Nuno


On Fri, 16 May 2025 at 17:09, Rich Felker <dalias@libc.org> wrote:

> On Fri, May 16, 2025 at 05:02:25PM +0100, Nuno Cruces wrote:
> > I don't know about it not allowing optimization.
> >
> > The C/POSIX local does, as you wrote. And regular expression engines
> kinda
> > prove it's at least possible not to be quadratic on the length of the
> > haystack (although they spend a lot more effort preprocessing the
> needle).
>
> Regular expression isn't O(1) in space so not a possible
> implementation for strcasestr. And even if you do have a space budget,
> the optimizations needed to make it not quadratic-in-time have very
> large space tradeoffs. It's not the magic solution fans make it sound
> like.
>
> Rich
>

[-- Attachment #2: Type: text/html, Size: 1529 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-05-16 17:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-16  7:32 [musl] strcasestr("", "") returns NULL Nuno Cruces
     [not found] ` <20250516125024.GI1827@brightrain.aerifal.cx>
2025-05-16 16:02   ` Nuno Cruces
2025-05-16 16:09     ` Rich Felker
2025-05-16 16:59       ` Nuno Cruces

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).