mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: Stefan Kanthak <stefan.kanthak@nexgo.de>
Cc: musl@lists.openwall.com
Subject: Re: [musl] Changes for strcspn(), strspn(), strtok() and strtok_r()
Date: Tue, 13 Jul 2021 11:45:47 -0400	[thread overview]
Message-ID: <20210713154547.GA13220@brightrain.aerifal.cx> (raw)
In-Reply-To: <0F84CB3415BB437FBD48B8B81795064E@H270>

On Tue, Jul 13, 2021 at 02:02:06PM +0200, Stefan Kanthak wrote:
> <https://git.musl-libc.org/cgit/musl/plain/src/string/strcspn.c>
> 
>  #include <string.h>
>  
>  #define BITOP(a,b,op) \
>   ((a)[(size_t)(b)/(8*sizeof *(a))] op (size_t)1<<((size_t)(b)%(8*sizeof *(a))))
>  
> -size_t strcspn(const char *s, const char *c)
> +size_t strcspn(const char *restrict s, const char *c)
>  {
> -        const char *a = s;
> +        const char *a;
> -        size_t byteset[32/sizeof(size_t)];
> +        size_t byteset[32/sizeof(size_t)] = { 0 };
>  
> -        if (!c[0] || !c[1]) return __strchrnul(s, *c)-a;
> +        if (!c[0] || !c[1]) return __strchrnul(a=s, *c)-a;
>  
> -        memset(byteset, 0, sizeof byteset);
>          for (; *c && BITOP(byteset, *(unsigned char *)c, |=); c++);
> -        for (; *s && !BITOP(byteset, *(unsigned char *)s, &); s++);
> -        return s-a;
> +        for (a=s; *a && !BITOP(byteset, *(unsigned char *)a, &); a++);
> +        return a-s;
>  }
> 
> After this change, musl's favorite compiler (GCC) will generate code
> like the following (here for x86-64), just like the code it already
> generates for strspn(), where the initialization of byteset[] is NOT
> done via memset():

That's probably a good change, but it's a 2-line change. The above
diff has at least 4 other gratuitous changes that have nothing to do
with getting rid of the memset call. Even if they were desirable, they
can't be done as part of the same change; that destroys the ability to
review/audit the history.

Note that I'd like to restore the ability of the compiler to transform
calls to memset into a builtin, but it's nontrivial because we have to
have -fno-builtin (via -ffreestanding) as the baseline to prevent
circular definitions.

>  #include <string.h>
>  
>  char *strtok(char *restrict s, const char *restrict sep)
>  {
>          static char *p;
> +        return strtok_r(s, sep, &p);
> -        if (!s && !(s = p)) return NULL;
> -        s += strspn(s, sep);
> -        if (!*s) return p = 0;
> -        p = s + strcspn(s, sep);
> -        if (*p) *p++ = 0;
> -        else p = 0;
> -        return s;
>  }
> 
> <https://git.musl-libc.org/cgit/musl/plain/src/string/strtok_r.c>

This cannot be done due to namespace. It may be desirable to make a
namespace-safe alias __strtok_r that could be called to implement
strtok, though.

>  #include <string.h>
>  
>  char *strtok_r(char *restrict s, const char *restrict sep, char **restrict p)
>  {
>          if (!s && !(s = *p)) return NULL;
>          s += strspn(s, sep);
> -        if (!*s) return *p = 0;
> +        if (!*s) return *p = NULL;
>          *p = s + strcspn(s, sep);
>          if (**p) *(*p)++ = 0;
> -        else *p = 0;
> +        else *p = NULL;
>          return s;
>  }

This is a gratuitous style change in the opposite direction of what's
preferred in musl.

> If you want to go a step further, avoid to build the same byteset twice:
> 
> <https://git.musl-libc.org/cgit/musl/plain/src/string/strtok_r.c>
> 
>  #include <string.h>
>  
>  char *strtok_r(char *restrict s, const char *restrict sep, char **restrict p)
>  {
> +        size_t byteset[32/sizeof(size_t)] = { 0 };
> +
>          if (!s && !(s = *p)) return NULL;
> +        if (!*s) return *p = NULL;
> +        if (!*sep) return *p = NULL, *s;
> -        s += strspn(s, sep);
> +        for (; *c && BITOP(byteset, *(unsigned char *)c, |=); c++);
> +        for (; *s && BITOP(byteset, *(unsigned char *)s, &); s++);
> -        if (!*s) return *p = 0;
> +        if (!*s) return *p = NULL;
> -        *p = s + strcspn(s, sep);
> -        if (**p) *(*p)++ = 0;
> -        else *p = 0;
> -        return s;
> +        sep = s;
> +        for (; *s && !BITOP(byteset, *(unsigned char *)s, &); s++);
> +        if (*s) *s++ = 0;
> +        else *s = NULL;
> +        *p = s;
> +        return sep;
>  }

Rewriting code that works and does not have any problems (note: UB or
platform assumptions that could bite someone later would count as a
problem) is not improvement. It consumes maintainer time *and* time of
anyone consuming musl who wants to review all changes for possible
security bugs/backdoors. If there are any worthwhile changes to this
type of legacy function, they should be made as isolated and simple as
possible to ensure that they can be reviewed easily by anyone who
wants to.

Rich

  parent reply	other threads:[~2021-07-13 15:46 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-13 12:02 Stefan Kanthak
2021-07-13 14:22 ` Érico Nogueira
2021-07-13 14:49 ` Érico Nogueira
2021-07-13 15:45 ` Rich Felker [this message]
2021-07-13 16:31   ` Stefan Kanthak
2021-07-13 17:40     ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210713154547.GA13220@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    --cc=stefan.kanthak@nexgo.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).