mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: "Jₑₙₛ Gustedt" <jens.gustedt@inria.fr>
Cc: musl@lists.openwall.com
Subject: Re: [musl] [C23 printf 2/3] C23: implement the wN length specifiers for printf
Date: Fri, 2 Jun 2023 10:29:37 -0400	[thread overview]
Message-ID: <20230602142936.GP4163@brightrain.aerifal.cx> (raw)
In-Reply-To: <20230530014822.GW4163@brightrain.aerifal.cx>

On Mon, May 29, 2023 at 09:48:22PM -0400, Rich Felker wrote:
> On Mon, May 29, 2023 at 09:21:55PM +0200, Jₑₙₛ Gustedt wrote:
> > Rich,
> > 
> > on Mon, 29 May 2023 11:46:40 -0400 you (Rich Felker <dalias@libc.org>)
> > wrote:
> > 
> > > On Mon, May 29, 2023 at 09:14:13AM +0200, Jₑₙₛ Gustedt wrote:
> > > > Rich,
> > > > 
> > > > on Fri, 26 May 2023 17:03:58 -0400 you (Rich Felker
> > > > <dalias@libc.org>) wrote:
> > > >   
> > > > > I think you need an extra state that's "plain but not bare" that
> > > > > duplicates only the integer transitions out of it, like the l, ll,
> > > > > etc. prefix states do.  
> > > > 
> > > > Hm, the problem is that for the other prefixes the table entries
> > > > then encode the concrete type that is to be expected. We could not
> > > > do this here because the type depends on the requested width. So we
> > > > would then need to "repair" that type after the loop. A `switch` to
> > > > do that would look substantially similar to what is there, now. Do
> > > > you think that would be better?  
> > > 
> > > OK I think I can communicate better with code than natural language
> > > text, so here's a diff, completely untested, of what I had in mind.
> > 
> > that's ... ugh ... not so prety, I think
> > 
> > In my current version I track the desired width, if there is w
> > specifier, and then do the adjustments after the loop. That takes
> > indeed care of undefined character sequences.
> > 
> > I find that much better readable, and also easier to extend (later
> > there comes the `wf` case and the `128`, and perhaps some day `256`)
> 
> It sounds like the core issue is that you don't like the state machine
> approach to how musl's printf processes format specifiers. Personally,
> I like it because there's an obvious structured way to validate that
> it's accepting exactly the right things and nothing else, vs an
> approach like what you tried where you ended up accepting a lot of
> bogus specifiers.
> 
> One alternative I would consider is doing something like what you did,
> but moving it outside of/before the state machine loop, so it's not
> mixing the w* processing with the state machine. This avoids accepting
> bogus repeated w32 prefixes and similar (because there is no loop) and
> lets you get by with just adding one PLAIN state to have it start in
> (rather than BARE) after w32. I expect the overall size would be
> similar. Concept attached.
> 
> Rich

> diff --git a/src/stdio/vfprintf.c b/src/stdio/vfprintf.c
> index a712d80f..a9e4d638 100644
> --- a/src/stdio/vfprintf.c
> +++ b/src/stdio/vfprintf.c
> @@ -33,7 +33,7 @@
>  
>  enum {
>  	BARE, LPRE, LLPRE, HPRE, HHPRE, BIGLPRE,
> -	ZTPRE, JPRE,
> +	ZTPRE, JPRE, PLAIN,
>  	STOP,
>  	PTR, INT, UINT, ULLONG,
>  	LONG, ULONG,
> @@ -44,9 +44,10 @@ enum {
>  	MAXSTATE
>  };
>  
> -#define S(x) [(x)-'A']
> +#define ST_BASE 'A'
> +#define S(x) [(x)-ST_BASE]
>  
> -static const unsigned char states[]['z'-'A'+1] = {
> +static const unsigned char states[]['z'-ST_BASE+1] = {
>  	{ /* 0: bare types */
>  		S('d') = INT, S('i') = INT,
>  		S('o') = UINT, S('u') = UINT, S('x') = UINT, S('X') = UINT,
> @@ -94,10 +95,15 @@ static const unsigned char states[]['z'-'A'+1] = {
>  		S('o') = UMAX, S('u') = UMAX,
>  		S('x') = UMAX, S('X') = UMAX,
>  		S('n') = PTR,
> +	}, { /* 8: explicit-width-prefixed bare equivalent */
> +		S('d') = INT, S('i') = INT,
> +		S('o') = UINT, S('u') = UINT,
> +		S('x') = UINT, S('X') = UINT,
> +		S('n') = PTR,
>  	}
>  };
>  
> -#define OOB(x) ((unsigned)(x)-'A' > 'z'-'A')
> +#define OOB(x) ((unsigned)(x)-ST_BASE > 'z'-ST_BASE)
>  
>  union arg
>  {
> @@ -510,6 +516,13 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg,
>  
>  		/* Format specifier state machine */
>  		st=0;
> +		if (*s=='w' && s[1]-'1'<9u) switch (getint(&s)) {
> +		case 8:  st = HHPRE; break;
> +		case 16: st = HPRE; break;
> +		case 32: st = PLAIN; break;
> +		case 64: st = LLONG_MAX > LONG_MAX ? LLPRE : LPRE; break;
> +		default: goto inval;
> +		}
>  		do {
>  			if (OOB(*s)) goto inval;
>  			ps=st;
> @@ -547,6 +560,7 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg,
>  		switch(t) {
>  		case 'n':
>  			switch(ps) {
> +			case PLAIN:
>  			case BARE: *(int *)arg.p = cnt; break;
>  			case LPRE: *(long *)arg.p = cnt; break;
>  			case LLPRE: *(long long *)arg.p = cnt; break;

OK, some numbers. This alt version (with a couple bug fixes and
support for wf added), vs the full state machine version. On i386,
using this instead of the full state machine (which is still missing
'wf') adds 246 bytes of .text but saves 440 bytes of .rodata. With wf
added, it will be even more of a win. So I think it makes more sense
to go with this version.

As an aside, since think the state table is the same for wide printf
as for normal, and since wide printf already depends on normal printf,
we could make the states table external (but hidden, of course) and
let wide printf reuse it, for some decent size savings and elimination
of source-level redundancy (DRY).

Rich

  parent reply	other threads:[~2023-06-02 14:29 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-26 19:41 [musl] [C23 printf 0/3] Jens Gustedt
2023-05-26 19:41 ` [musl] [C23 printf 1/3] C23: implement the b and B printf specifiers Jens Gustedt
2023-05-26 19:41 ` [musl] [C23 printf 2/3] C23: implement the wN length specifiers for printf Jens Gustedt
2023-05-26 20:31   ` Rich Felker
2023-05-26 20:51     ` Jₑₙₛ Gustedt
2023-05-26 21:03       ` Rich Felker
2023-05-29  7:14         ` Jₑₙₛ Gustedt
2023-05-29 15:46           ` Rich Felker
2023-05-29 19:21             ` Jₑₙₛ Gustedt
2023-05-30  1:48               ` Rich Felker
2023-05-30  6:46                 ` Jₑₙₛ Gustedt
2023-05-30 17:28                   ` Rich Felker
2023-05-30 18:00                     ` Rich Felker
2023-05-31  7:59                     ` Jₑₙₛ Gustedt
2023-06-02 14:29                 ` Rich Felker [this message]
2023-06-02 14:55                   ` Jₑₙₛ Gustedt
2023-06-02 15:07                     ` Rich Felker
2023-05-26 19:41 ` [musl] [C23 printf 3/3] C23: implement the wfN length modifiers " Jens Gustedt
2023-05-26 20:33   ` Rich Felker
2023-05-26 21:00     ` Jₑₙₛ Gustedt
     [not found] <cover.1685429467.git.Jens.Gustedt@inria.fr>
2023-05-30  6:55 ` [musl] [C23 printf 2/3] C23: implement the wN length specifiers " Jens Gustedt
2023-06-02 14:38   ` Rich Felker
2023-06-02 15:09     ` Jₑₙₛ Gustedt
2023-06-02 15:16       ` Rich Felker
2023-06-02 15:37         ` Jₑₙₛ Gustedt
2023-05-31 14:04 [musl] [C23 printf 0/3] to be replaced Jens Gustedt
2023-05-31 14:04 ` [musl] [C23 printf 2/3] C23: implement the wN length specifiers for printf Jens Gustedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230602142936.GP4163@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=jens.gustedt@inria.fr \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).