Date: Tue, 20 Sep 2022 15:29:29 +0200
From: Jₑₙₛ Gustedt <jens.gustedt@inria.fr>
To: Rich Felker <dalias@libc.org>
Cc: musl@lists.openwall.com
Message-ID: <20220920152929.66a33c9b@inria.fr>
In-Reply-To: <20220920122829.GM9709@brightrain.aerifal.cx>
References: <20220908163649.634728-1-gabravier@gmail.com>
	<20220912135904.GI9709@brightrain.aerifal.cx>
	<20220912164251.53a32cac@inria.fr>
	<20220919150916.GP9709@brightrain.aerifal.cx>
	<20220919175952.GB2158779@port70.net>
	<20220919181039.GS9709@brightrain.aerifal.cx>
	<20220920111934.4dcdc985@inria.fr>
	<20220920122829.GM9709@brightrain.aerifal.cx>
Organization: inria.fr
X-Mailer: Claws Mail 4.0.0 (GTK+ 3.24.33; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="Sig_/h1oeunmdAArg6JEP=TDN2x_";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Subject: Re: [musl] [PATCH] vfprintf: support C2x %b and %B conversion
 specifiers

--Sig_/h1oeunmdAArg6JEP=TDN2x_
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Rich,

on Tue, 20 Sep 2022 08:28:29 -0400 you (Rich Felker <dalias@libc.org>)
wrote:

> On Tue, Sep 20, 2022 at 11:19:34AM +0200, Jₑₙₛ Gustedt wrote:
> > Rich,
> >
> > on Mon, 19 Sep 2022 14:10:39 -0400 you (Rich Felker
> > <dalias@libc.org>) wrote:
> >
> > > On Mon, Sep 19, 2022 at 07:59:52PM +0200, Szabolcs Nagy wrote:
>  [...]
>  [...]
> > >  [...]
>  [...]
>  [...]
> > >
> > > Yeah, I don't see that as being a usable approach. It's closely
> > > tied to the glibc printf model that's not usable in bounded
> > > memory with arbitrary width and precision, and not compatible
> > > with linking semantics as you mention. The amount of code needed
> > > for decimal float printing in decimal is minuscule anyway and
> > > something we can easily do with no actual decimal floating point
> > > code. I thought the hard case was hex, but looking at the spec
> > > again, %a doesn't actually do hex for decimal floats, so it
> > > should be easy too.
> >
> > Yes exactly. There is nothing conceptually difficult here and
> > nothing that should not be in some form or another already in every
> > C library.
> >
> > So yes, sorry, for the separate library part I forgot formatted IO
> > and string functions. But the huge number of functions that are
> > added for these types are math functions (I guess something like
> > 600 or so) stepping on users' identifier space all over.
>
> Yes, I think it's fine for now to have a separate math library for the
> math functions. Otherwise the work of adding these interfaces becomes
> rather prohibitive. I would assume they're all pure functions where
> correct implementations are basically interchangeable, so I don't see a
> lot of value in insisting these "go with" libc.

Depends on your instantiation of "pure", but yes, these should be
mostly interchangeable. The only thing to worry about here is that
there are two possible encodings for these types: one where the
significand is basically represented as a binary integer (BID), and
one where the decimal digits are packed three at a time into 10-bit
groups (DPD).
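To make the integer-based variant (BID, binary integer decimal) concrete, here is a minimal sketch that decodes a finite `_Decimal32` bit pattern in that encoding into sign, coefficient and exponent, using only integer operations. The layout used (8-bit exponent with bias 101 and a 23-bit coefficient, or an implicit leading binary `100` in the coefficient when the two bits after the sign are `11`) is my reading of IEEE 754-2008; the names `dec32_parts`, `bid32_decode` and `bid32_print` are made up for illustration, and infinities and NaNs are simply rejected.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoded form of a finite decimal32:
   value = (-1)^sign * coeff * 10^exp */
struct dec32_parts {
    int sign;
    uint32_t coeff;
    int exp;
};

/* Decode a decimal32 bit pattern in the BID encoding.
   Returns 0 on success, -1 for infinities and NaNs. */
static int bid32_decode(uint32_t x, struct dec32_parts *out)
{
    out->sign = (int)(x >> 31);
    if (((x >> 29) & 3) != 3) {
        /* Common case: 8-bit biased exponent, 23-bit coefficient. */
        out->exp = (int)((x >> 23) & 0xFF) - 101;
        out->coeff = x & 0x7FFFFF;
    } else {
        /* Sign followed by 1111: infinity or NaN. */
        if (((x >> 27) & 3) == 3)
            return -1;
        /* Sign followed by 11: the exponent sits two bits lower and the
           coefficient gets an implicit leading binary 100. */
        out->exp = (int)((x >> 21) & 0xFF) - 101;
        out->coeff = (UINT32_C(1) << 23) | (x & 0x1FFFFF);
    }
    /* Coefficients above 10^7 - 1 are non-canonical and read as zero. */
    if (out->coeff > 9999999)
        out->coeff = 0;
    return 0;
}

/* Print as "[-]coeffEexp" without any decimal floating point code. */
static int bid32_print(char *buf, size_t n, uint32_t x)
{
    struct dec32_parts d;
    if (bid32_decode(x, &d) != 0)
        return snprintf(buf, n, "inf/nan");
    return snprintf(buf, n, "%s%" PRIu32 "E%d",
                    d.sign ? "-" : "", d.coeff, d.exp);
}
```

For example, `bid32_print(buf, sizeof buf, 0x3200000F)` yields `15E-1`, i.e. 1.5. A DPD decoder would presumably differ only in how the coefficient bits are turned into digits; the rest of the formatting can work on the integer coefficient and exponent.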

> …

> > But for implementing the parts that are outside of math, things
> > should indeed not be so difficult. gcc has supported these types
> > for a long time, I think, and should also provide predefined macros
> > that could be used to check for language support. Then, the types
> > themselves have a clear definition and prescribed representation,
> > the ABI is de facto sorted out, so there would not be much other
> > implementation dependency to worry about.
>
> The thing is we don't have the option to "check for language support".
> Doing that would mean you get a deficient musl build if your compiler
> doesn't have the language features, so essentially we'd be requiring
> bleeding-edge gcc or clang (dropping all other-compiler support at the
> same time) to get a properly featured libc.so that's capable of
> supporting arbitrary musl-linked binaries.

I don't think that this needs to be the case. If you add, e.g., support
for decimal floating point to `printf`, the compiler support for that
only has to be there on the platform where you compile musl. If a user
platform that uses such a library does not support it, that part will
simply never be called, because users can't define variables of that
type. This increases the size of `printf` a bit, but my guess is that
this would be marginal compared to the size that `printf` already has.
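A minimal sketch of that idea, assuming the library build detects decimal float support through the C23 feature macro `__STDC_IEC_60559_DFP__` (whether a particular gcc version defines it would have to be checked). `HAVE_DFP`, `fetch_dec64_bits` and `dec64_arg_bits` are hypothetical names. The decimal branch only copies the raw bits out of the argument, so no decimal arithmetic is involved; in a build without compiler support the branch is compiled out and the conversion is reported as unsupported.

```c
#include <stdarg.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical configuration switch, decided when the C library itself
   is compiled, not when user code is compiled. */
#if defined(__STDC_IEC_60559_DFP__)
#define HAVE_DFP 1
#else
#define HAVE_DFP 0
#endif

/* Fetch the next variadic argument as a _Decimal64 and store its raw
   encoding in *bits. Returns 0 on success, -1 if this build of the
   library has no decimal floating point support. */
static int fetch_dec64_bits(va_list *ap, uint64_t *bits)
{
#if HAVE_DFP
    _Decimal64 d = va_arg(*ap, _Decimal64);
    memcpy(bits, &d, sizeof *bits);   /* bits only, no decimal arithmetic */
    return 0;
#else
    (void)ap;
    (void)bits;
    return -1;
#endif
}

/* Small variadic front end standing in for the printf conversion path. */
static int dec64_arg_bits(uint64_t *bits, ...)
{
    va_list ap;
    int r;

    va_start(ap, bits);
    r = fetch_dec64_bits(&ap, bits);
    va_end(ap);
    return r;
}
```

A user whose compiler lacks the types cannot pass such an argument in the first place, so for them the decimal branch is just dead code in `printf`.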

> This is why we're going to need asm thunks for performing va_arg with
> the new types and (programmatically generated, I assume) asm entry
> thunks for accepting arguments to any non-variadic functions, which
> can convert (ideally as a no-op) the decimal float type arguments to
> integer-type or struct arguments the underlying implementation files
> would then receive.

Other than in math.h, I think there are no C library functions that
accept decimal floating types as prototyped arguments. So if we leave
out math.h, only `printf` and similar functions remain, and these
receive such arguments through `va_arg`.

The only functions that have decimal floating return types are the
`strtodN` functions in 7.24.1.6, AFAICS.
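On the `strtodN` side the same split seems possible: the textual part can produce an exact integer coefficient and a decimal exponent, and only a final step has to build the 32-, 64- or 128-bit encoding. A minimal sketch of that first step, assuming at most 16 significant digits (the precision of `_Decimal64`) and leaving out rounding, exponent overflow, hex input, infinities and NaNs; `parse_dec` is a hypothetical helper. Trailing zeros stay in the coefficient, so `1.50` gives 150 × 10^-2, which as far as I know matches how the decimal types preserve the quantum.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Parse [+-]digits[.digits][(e|E)[+-]digits] so that the value is exactly
   (-1)^*neg * *coeff * 10^*exp. Returns the number of characters consumed,
   or 0 if there is no number or more than 16 significant digits. */
static size_t parse_dec(const char *s, int *neg, uint64_t *coeff, int *exp)
{
    const char *p = s;
    uint64_t c = 0;
    int ndig = 0, any = 0, e = 0;

    *neg = 0;
    if (*p == '+' || *p == '-')
        *neg = (*p++ == '-');
    for (; isdigit((unsigned char)*p); p++) {
        any = 1;
        if (c == 0 && *p == '0')
            continue;                 /* leading zeros are not significant */
        if (++ndig > 16)
            return 0;
        c = c * 10 + (uint64_t)(*p - '0');
    }
    if (*p == '.') {
        for (p++; isdigit((unsigned char)*p); p++) {
            any = 1;
            e--;                      /* each fraction digit scales by 10^-1 */
            if (c == 0 && *p == '0')
                continue;
            if (++ndig > 16)
                return 0;
            c = c * 10 + (uint64_t)(*p - '0');
        }
    }
    if (!any)
        return 0;
    if (*p == 'e' || *p == 'E') {
        const char *q = p + 1;
        int eneg = 0, x = 0;
        if (*q == '+' || *q == '-')
            eneg = (*q++ == '-');
        if (isdigit((unsigned char)*q)) {   /* otherwise 'e' is not consumed */
            for (; isdigit((unsigned char)*q); q++)
                if (x < 100000)
                    x = x * 10 + (*q - '0');
            e += eneg ? -x : x;
            p = q;
        }
    }
    *coeff = c;
    *exp = e;
    return (size_t)(p - s);
}
```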

So, yes, we'd have to extend `va_arg` with the necessary knowledge to
obtain a decimal floating point value, but hopefully that is just the
same as obtaining other 32-, 64- or 128-bit types.
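On the `printf` side, the draft (as I read it) selects the decimal types with the length modifiers `H`, `D` and `DD` in front of `a`, `A`, `e`, `E`, `f`, `F`, `g` or `G`, for `_Decimal32`, `_Decimal64` and `_Decimal128` respectively, i.e. argument objects of 4, 8 and 16 bytes. A minimal sketch of recognizing them; `parse_dfp_modifier`, `dfp_size` and the enum are hypothetical names, and the actual `va_arg` fetch is left out.

```c
#include <stddef.h>
#include <string.h>

/* Which decimal type a conversion refers to (hypothetical names). */
enum dfp_kind { DFP_NONE, DFP_32, DFP_64, DFP_128 };

/* Recognize a decimal length modifier ("H", "D" or "DD") at *s that is
   followed by a floating conversion specifier. On success *s is advanced
   to the conversion specifier; otherwise *s is left alone and DFP_NONE
   is returned. */
static enum dfp_kind parse_dfp_modifier(const char **s)
{
    const char *p = *s;
    enum dfp_kind k;

    if (p[0] == 'H') {
        k = DFP_32;
        p += 1;
    } else if (p[0] == 'D' && p[1] == 'D') {
        k = DFP_128;
        p += 2;
    } else if (p[0] == 'D') {
        k = DFP_64;
        p += 1;
    } else {
        return DFP_NONE;
    }
    /* Only the floating conversions accept these modifiers. */
    if (*p == '\0' || !strchr("aAeEfFgG", *p))
        return DFP_NONE;
    *s = p;
    return k;
}

/* Size in bytes of the argument object for each kind. */
static size_t dfp_size(enum dfp_kind k)
{
    return k == DFP_32 ? 4 : k == DFP_64 ? 8 : k == DFP_128 ? 16 : 0;
}
```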

> > Other types that come with C23, and these are mandatory, are
> > bit-precise integers. There the support by compilers is probably not
> > yet completely established. I know of an integration into llvm, but
> > I am not sure about the state of affairs for gcc, nor if there is a
> > de-facto agreement on ABI issues. In any case, these types need
> > support in formatted IO, too.
>
> As far as I can tell, the draft standard makes printf support for all
> but the ones defined as [u]intNN_t a choice for the implementation, so
> the obvious choice is not to support any additional ones.

(There are also the "fast" versions, which have a different length
modifier but hopefully are handled basically the same as the
exact-width ones.)
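In the C23 draft these are, as far as I can see, the `wN` and `wfN` length modifiers (e.g. `%w32d` for `int32_t`, `%wf16u` for `uint_fast16_t`), where N is a positive decimal number without leading zeros. A minimal parsing sketch; `parse_w_modifier` is a hypothetical name, and deciding which N are actually supported (and mapping `wfN` to the width the library uses for the fast type) is left to the caller.

```c
/* Parse a "wN" or "wfN" length modifier at *s. On success, *s is advanced
   past it, *fast tells whether the "f" (fastest minimum-width) form was
   used, and N is returned. On failure 0 is returned and *s is unchanged. */
static int parse_w_modifier(const char **s, int *fast)
{
    const char *p = *s;
    int n = 0, f = 0;

    if (*p++ != 'w')
        return 0;
    if (*p == 'f') {
        f = 1;
        p++;
    }
    if (*p < '1' || *p > '9')          /* positive, no leading zeros */
        return 0;
    for (; *p >= '0' && *p <= '9'; p++) {
        if (n > 100000)                /* absurdly wide, reject */
            return 0;
        n = n * 10 + (*p - '0');
    }
    *fast = f;
    *s = p;
    return n;
}
```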

I think for QoI it would be really good to support the bit-precise
types. They are quite a good design that avoids many of the
complications of the classical integer types. In particular, we will
see them pop up for bit-fields and the like, where they have clearer
semantics than the traditional types and extend the possibilities
beyond the width of `int` to at least 64 bits.
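Printing a wide bit-precise value needs nothing exotic either. A minimal sketch, assuming (hypothetically, since the ABI is not settled) that the formatter receives an unsigned value as an array of 32-bit limbs, least significant limb first; signed values would be negated first. It divides the whole array by 10 once per digit, which is simple but quadratic; a real implementation would rather peel off nine digits at a time by dividing by 10^9. `limbs_divmod`, `limbs_zero` and `limbs_to_dec` are made-up names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Divide the little-endian limb array in place by d (d > 0) and return
   the remainder. */
static uint32_t limbs_divmod(uint32_t *limb, size_t n, uint32_t d)
{
    uint64_t rem = 0;
    for (size_t i = n; i-- > 0;) {
        uint64_t cur = (rem << 32) | limb[i];
        limb[i] = (uint32_t)(cur / d);
        rem = cur % d;
    }
    return (uint32_t)rem;
}

static int limbs_zero(const uint32_t *limb, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (limb[i])
            return 0;
    return 1;
}

/* Write the decimal digits of the value into buf (size bytes including
   the terminating NUL). Returns buf, or NULL if buf is too small.
   The limb array is destroyed. */
static char *limbs_to_dec(char *buf, size_t size, uint32_t *limb, size_t n)
{
    char *p;

    if (size == 0)
        return NULL;
    p = buf + size;
    *--p = '\0';
    do {
        if (p == buf)
            return NULL;               /* buffer too small */
        *--p = (char)('0' + limbs_divmod(limb, n, 10));
    } while (!limbs_zero(limb, n));
    /* Move the digits and the NUL to the front of buf. */
    memmove(buf, p, (size_t)(buf + size - p));
    return buf;
}
```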

> > Also, C23 provides the possibility for extended integer types that
> > are wider than `[u]intmax_t` under some conditions. This is intended
> > in particular to allow implementations such as gcc on x86_64 to
> > interface the existing 128-bit integer types properly as
> > `[u]int128_t`. From a C library POV, these would then also need
> > integration into formatted IO, but here again support in the
> > compiler with usable feature test macros has been there for ages
> > and the ABI should already be sorted out.
>
> Yes. I haven't followed the latest on this but my leaning was to leave
> them as "compiler extensions" that don't count as "extended integer
> types". However presumably they could be handled the same way as
> decimal floats if needed.

For once, this would make it possible to define extended integer types
in the sense of the standard and to provide full support for them. But
you are right, the approach could be the same as for decimal floating
point: compile the support in if the platform on which the C library is
compiled supports these types.
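As an example of compiling such support in only where the build compiler has it, a minimal sketch for the 128-bit case, assuming the gcc/clang predefined macro `__SIZEOF_INT128__` as the detection mechanism; `uwide_t` and `fmt_uwide` are hypothetical names. Without the extension the same code falls back to `unsigned long long`.

```c
#include <stddef.h>
#include <string.h>

#ifdef __SIZEOF_INT128__
typedef unsigned __int128 uwide_t;   /* compiler extension, 128 bits */
#else
typedef unsigned long long uwide_t;  /* fallback when it is not available */
#endif

/* Write v in decimal into buf and return the number of digits. 40 bytes
   are enough for any 128-bit value (39 digits plus the NUL). */
static size_t fmt_uwide(char buf[static 40], uwide_t v)
{
    char tmp[40];
    char *p = tmp + sizeof tmp;
    size_t len;

    *--p = '\0';
    do {
        *--p = (char)('0' + (int)(v % 10));
        v /= 10;
    } while (v);
    len = (size_t)(tmp + sizeof tmp - 1 - p);
    memcpy(buf, p, len + 1);          /* digits plus the terminating NUL */
    return len;
}
```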

> > So in summary that means that there is some work to do to make
> > formatted IO of C libraries become compliant with C23. Let me know
> > if and where I could help to make that happen for musl.
>
> The big issue is probably collating the list of what's actually needed
> to meet requirements,

that I could do

> and what the ABIs for them are.

that's where I am not an expert :-((

> If there's cross-arch agreement on a general pattern ABIs follow for
> them, that would be wonderful, and even if not entirely so, a
> general pattern would advise how we structure the underlying
> functions (to make thunks as minimal as possible on the largest
> number of archs).

My guess is that the decimal floating point types are simply handled
according to their respective widths, and that bit-precise integer
types of width N will be rounded up to the next power of two M and use
the representation and calling convention of `uintM_t`. But that would
of course have to be verified. I can ask Aaron (who wrote this stuff
and has provided the implementation in llvm) how that is actually done
there.
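Spelled out, the guess for the bit-precise types amounts to the following, purely as a hypothesis to be checked against what llvm (and the psABIs) actually do. `guessed_storage_bits` is a made-up name, and the lower bound of 8 is my own addition, since `uintM_t` does not exist for M < 8.

```c
#include <limits.h>

/* Storage width M for a _BitInt(N) under the guessed rule: the next power
   of two >= N, but at least 8. Returns 0 for N == 0 (not a valid width)
   or if no such power of two fits. Hypothesis only, not a documented ABI. */
static unsigned long guessed_storage_bits(unsigned long n)
{
    unsigned long m = 8;

    if (n == 0)
        return 0;
    while (m < n) {
        if (m > ULONG_MAX / 2)
            return 0;
        m *= 2;
    }
    return m;
}
```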

Jₑₙₛ

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

--Sig_/h1oeunmdAArg6JEP=TDN2x_
Content-Type: application/pgp-signature
Content-Description: OpenPGP digital signature

-----BEGIN PGP SIGNATURE-----

iF0EARECAB0WIQSN9stI2OFN1pLljN0P0+hp2tU34gUCYynAOQAKCRAP0+hp2tU3
4hCVAJ9NSHhZCImODkN22tAL9GIHFdDVxwCaAxPeSZToQK0XM107RH0WUbaeoT0=
=4xng
-----END PGP SIGNATURE-----

--Sig_/h1oeunmdAArg6JEP=TDN2x_--