From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.3 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 13556 invoked from network); 13 Jul 2021 15:46:03 -0000 Received: from mother.openwall.net (195.42.179.200) by inbox.vuxu.org with ESMTPUTF8; 13 Jul 2021 15:46:03 -0000 Received: (qmail 21791 invoked by uid 550); 13 Jul 2021 15:46:01 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 21773 invoked from network); 13 Jul 2021 15:46:00 -0000 Date: Tue, 13 Jul 2021 11:45:47 -0400 From: Rich Felker To: Stefan Kanthak Cc: musl@lists.openwall.com Message-ID: <20210713154547.GA13220@brightrain.aerifal.cx> References: <0F84CB3415BB437FBD48B8B81795064E@H270> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <0F84CB3415BB437FBD48B8B81795064E@H270> User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [musl] Changes for strcspn(), strspn(), strtok() and strtok_r() On Tue, Jul 13, 2021 at 02:02:06PM +0200, Stefan Kanthak wrote: > > > #include > > #define BITOP(a,b,op) \ > ((a)[(size_t)(b)/(8*sizeof *(a))] op (size_t)1<<((size_t)(b)%(8*sizeof *(a)))) > > -size_t strcspn(const char *s, const char *c) > +size_t strcspn(const char *restrict s, const char *c) > { > - const char *a = s; > + const char *a; > - size_t byteset[32/sizeof(size_t)]; > + size_t byteset[32/sizeof(size_t)] = { 0 }; > > - if (!c[0] || !c[1]) return __strchrnul(s, *c)-a; > + if (!c[0] || !c[1]) return __strchrnul(a=s, *c)-a; > > - memset(byteset, 0, sizeof byteset); > for (; *c && BITOP(byteset, *(unsigned char *)c, |=); c++); > - for (; *s && !BITOP(byteset, *(unsigned char *)s, &); s++); > - return s-a; > + for (a=s; *a && !BITOP(byteset, *(unsigned char *)a, &); a++); > + return a-s; > } > > After this change, musl's favorite compiler (GCC) will generate code > like the following (here for x86-64), just like the code it already > generates for strspn(), where the initialization of byteset[] is NOT > done via memset(): That's probably a good change, but it's a 2-line change. The above diff has at least 4 other gratuitous changes that have nothing to do with getting rid of the memset call. Even if they were desirable, they can't be done as part of the same change; that destroys the ability to review/audit the history. Note that I'd like to restore the ability of the compiler to transform calls to memset into a builtin, but it's nontrivial because we have to have -fno-builtin (via -ffreestanding) as the baseline to prevent circular definitions. > #include > > char *strtok(char *restrict s, const char *restrict sep) > { > static char *p; > + return strtok_r(s, sep, &p); > - if (!s && !(s = p)) return NULL; > - s += strspn(s, sep); > - if (!*s) return p = 0; > - p = s + strcspn(s, sep); > - if (*p) *p++ = 0; > - else p = 0; > - return s; > } > > This cannot be done due to namespace. It may be desirable to make a namespace-safe alias __strtok_r that could be called to implement strtok, though. > #include > > char *strtok_r(char *restrict s, const char *restrict sep, char **restrict p) > { > if (!s && !(s = *p)) return NULL; > s += strspn(s, sep); > - if (!*s) return *p = 0; > + if (!*s) return *p = NULL; > *p = s + strcspn(s, sep); > if (**p) *(*p)++ = 0; > - else *p = 0; > + else *p = NULL; > return s; > } This is a gratuitous style change in the opposite direction of what's preferred in musl. > If you want to go a step further, avoid to build the same byteset twice: > > > > #include > > char *strtok_r(char *restrict s, const char *restrict sep, char **restrict p) > { > + size_t byteset[32/sizeof(size_t)] = { 0 }; > + > if (!s && !(s = *p)) return NULL; > + if (!*s) return *p = NULL; > + if (!*sep) return *p = NULL, *s; > - s += strspn(s, sep); > + for (; *c && BITOP(byteset, *(unsigned char *)c, |=); c++); > + for (; *s && BITOP(byteset, *(unsigned char *)s, &); s++); > - if (!*s) return *p = 0; > + if (!*s) return *p = NULL; > - *p = s + strcspn(s, sep); > - if (**p) *(*p)++ = 0; > - else *p = 0; > - return s; > + sep = s; > + for (; *s && !BITOP(byteset, *(unsigned char *)s, &); s++); > + if (*s) *s++ = 0; > + else *s = NULL; > + *p = s; > + return sep; > } Rewriting code that works and does not have any problems (note: UB or platform assumptions that could bite someone later would count as a problem) is not improvement. It consumes maintainer time *and* time of anyone consuming musl who wants to review all changes for possible security bugs/backdoors. If there are any worthwhile changes to this type of legacy function, they should be made as isolated and simple as possible to ensure that they can be reviewed easily by anyone who wants to. Rich