From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_DNSWL_LOW,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 21251 invoked from network); 2 Jun 2023 14:38:15 -0000 Received: from second.openwall.net (193.110.157.125) by inbox.vuxu.org with ESMTPUTF8; 2 Jun 2023 14:38:15 -0000 Received: (qmail 17810 invoked by uid 550); 2 Jun 2023 14:38:13 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 17775 invoked from network); 2 Jun 2023 14:38:12 -0000 Date: Fri, 2 Jun 2023 10:38:00 -0400 From: Rich Felker To: Jens Gustedt Cc: musl@lists.openwall.com, jens.gustedt@posteo.eu Message-ID: <20230602143759.GQ4163@brightrain.aerifal.cx> References: <5a2c0731967330e9a3832d531ea4212e223aaead.1685429467.git.Jens.Gustedt@inria.fr> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5a2c0731967330e9a3832d531ea4212e223aaead.1685429467.git.Jens.Gustedt@inria.fr> User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [musl] [C23 printf 2/3] C23: implement the wN length specifiers for printf On Tue, May 30, 2023 at 08:55:27AM +0200, Jens Gustedt wrote: > These are mandatory for C23 and concern all types for which the > platform has `int_leastN_t` and `uint_leastN_t`. For musl these types > always coincide with `intN_t` and `uintN_t` and are always present for > N equal 8, 16, 32 and 64. > > They can be added for general use since all lowercase letters were > previously reserved. > > Nevertheless, users that use these modifiers will see a lot of > warnings from compilers in the beginning. This is because the > compilers have not yet integrated this form of a specifier into their > correponding extensions (gcc attributes). So unfortunately also > testing this feature may be a bit noisy for the moment. > > The only architecture dependend choice is the type for N == 64, which > may be `long` or `long long`. We just mimick the test that is done in > other places to compare `UINTPTR_MAX` and `UINT64_MAX` to determine > that. > --- > src/stdio/vfprintf.c | 28 +++++++++++++++++++++++++--- > src/stdio/vfwprintf.c | 28 +++++++++++++++++++++++++--- > 2 files changed, 50 insertions(+), 6 deletions(-) > > diff --git a/src/stdio/vfprintf.c b/src/stdio/vfprintf.c > index cbc79783..265fb7ad 100644 > --- a/src/stdio/vfprintf.c > +++ b/src/stdio/vfprintf.c > @@ -33,7 +33,7 @@ > > enum { > BARE, LPRE, LLPRE, HPRE, HHPRE, BIGLPRE, > - ZTPRE, JPRE, > + ZTPRE, JPRE, WPRE, > STOP, > PTR, INT, UINT, ULLONG, > LONG, ULONG, > @@ -57,7 +57,7 @@ static const unsigned char states[]['z'-'A'+1] = { > S('s') = PTR, S('S') = PTR, S('p') = UIPTR, S('n') = PTR, > S('m') = NOARG, > S('l') = LPRE, S('h') = HPRE, S('L') = BIGLPRE, > - S('z') = ZTPRE, S('j') = JPRE, S('t') = ZTPRE, > + S('z') = ZTPRE, S('j') = JPRE, S('t') = ZTPRE, S('w') = WPRE, > }, { /* 1: l-prefixed */ > S('b') = ULONG, S('B') = ULONG, > S('d') = LONG, S('i') = LONG, > @@ -101,6 +101,12 @@ static const unsigned char states[]['z'-'A'+1] = { > S('o') = UMAX, S('u') = UMAX, > S('x') = UMAX, S('X') = UMAX, > S('n') = PTR, > + }, { /* 8: w-prefixed */ > + S('b') = UINT, S('B') = UINT, > + S('d') = INT, S('i') = INT, > + S('o') = UINT, S('u') = UINT, > + S('x') = UINT, S('X') = UINT, > + S('n') = PTR, > } > }; > > @@ -447,7 +453,7 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg, > int w, p, xp; > union arg arg; > int argpos; > - unsigned st, ps; > + unsigned st, ps, width=0; > int cnt=0, l=0; > size_t i; > char buf[sizeof(uintmax_t)*CHAR_BIT+3+LDBL_MANT_DIG/4]; > @@ -527,9 +533,25 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg, > if (OOB(*s)) goto inval; > ps=st; > st=states[st]S(*s++); > + if (st == WPRE) { > + if (*s == '0') goto inval; > + width = getint(&s); > + } > } while (st-1 if (!st) goto inval; > > + if (ps == WPRE) switch (width) { > + case 8: ps = HHPRE; st = (st == UINT) ? UCHAR : ((st == INT) ? CHAR : PTR); break; > + case 16: ps = HPRE; st = (st == UINT) ? USHORT : ((st == INT) ? SHORT : PTR); break; > + case 32: ps = BARE; break; > +#if UINTPTR_MAX >= UINT64_MAX > + case 64: ps = LPRE; st = (st == UINT) ? ULONG : ((st == INT) ? LONG : PTR); break; > +#else > + case 64: ps = LLPRE; st = (st == UINT) ? ULLONG : ((st == INT) ? LLONG : PTR); break; > +#endif The logic here for whether w64 is LPRE or LLPRE is wrong. The condition for whether [u]int64_t is long or long long is not whether uintptr_t or long long would be too wide (they never are, in our ABIs) but whether long is long enough. All our [u]intN_t are the lowest-rank type of the desired size, and this seems to be consistent with what other implementations like glibc do. Short of a compelling reason not to, though, I plan to just go with the fix-up of my v2 patch for this functionality. Its ordering avoids duplication of logic (the ?: branches in the cases above) and the need for maintaining extra side state (the width variable) thru the state machine. Rich