From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <musl-return-15379-ml=inbox.vuxu.org@lists.openwall.com>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-3.0 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3,
	RCVD_IN_MSPIKE_WL autolearn=ham autolearn_force=no version=3.4.2
Received: from mother.openwall.net (mother.openwall.net [195.42.179.200])
	by inbox.vuxu.org (OpenSMTPD) with SMTP id b89c99b5
	for <ml@inbox.vuxu.org>;
	Mon, 3 Feb 2020 23:05:49 +0000 (UTC)
Received: (qmail 19831 invoked by uid 550); 3 Feb 2020 23:05:47 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Received: (qmail 19813 invoked from network); 3 Feb 2020 23:05:47 -0000
Date: Tue, 4 Feb 2020 00:05:35 +0100
From: Szabolcs Nagy <nsz@port70.net>
To: musl@lists.openwall.com
Cc: Simon <simonhf@gmail.com>
Message-ID: <20200203230534.GA23985@port70.net>
Mail-Followup-To: musl@lists.openwall.com, Simon <simonhf@gmail.com>
References: <CABkUXbdOP8d=BzFTpYetmEEKyEwWqwaW7NmmB9vdJacu-wXABQ@mail.gmail.com>
 <CABkUXbee4S_SfUFScAs7xau6Q43U8Upr_L9wM9=p74fkX+32pg@mail.gmail.com>
 <20200203215713.GS1663@brightrain.aerifal.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <20200203215713.GS1663@brightrain.aerifal.cx>
User-Agent: Mutt/1.10.1 (2018-07-13)
Subject: Re: [musl] Why does musl printf() use so much more stack than other
 implementations when printf()ing floating point numbers?

* Rich Felker <dalias@libc.org> [2020-02-03 16:57:13 -0500]:

> On Mon, Feb 03, 2020 at 01:14:21PM -0800, Simon wrote:
> > I recently noticed that musl printf() implementation uses surprisingly more
> > stack space than other implementations, but only if printing floating point
> > numbers, and made some notes here [1]. Any ideas why this happens, and any
> > chance of fixing it?
> >
> > [1] https://gist.github.com/simonhf/2a7b7eb98d2a10c549e8cc858bbefd53
>
> It's fundamental; ability to exactly print arbitrary floating point
> numbers takes considerable working space unless you want to spend
> O(n³) time or so (n=exponent value) to keep recomputing things. The
> minimum needed is probably only around 2/3 of what we use, so it would
> be possible to reduce slightly, but I doubt a savings of <3k is worth
> the complexity of ensuring it would still be safe and correct.
>
> Note that on archs without extended long double type, which covers
> everything used in extreme low-memory embedded environments, the
> memory usage is far lower. This is because it's proportional to the
> max possible exponent value, which is 1k instead of 16k if nothing
> larger than IEEE double is supported.

the musl stack usage is fixed, independent of input, whenever
decimal formatting is done, so it can be easily tested. (and yes,
the size is mainly determined by the long double exponent range
and is close to optimal if performance matters.)
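
a rough sketch of how the working space scales with the exponent
range. the sizing expression is an assumption modelled on how the
uint32_t working array in musl's float formatting code appears to be
declared (check src/stdio/vfprintf.c for the real thing), and
work_bytes / ldbl_work_bytes are hypothetical helpers, not musl
functions:

```c
#include <float.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical helper: bytes of uint32_t working space for a binary
 * float format with mant_dig mantissa bits and max_exp as the maximum
 * binary exponent. assumed layout: 29 bits per word for the expanded
 * mantissa, 9 decimal digits per word for everything else. */
static size_t work_bytes(int mant_dig, int max_exp)
{
	size_t words = (mant_dig + 28) / 29 + 1
	             + (max_exp + mant_dig + 28 + 8) / 9;
	return words * sizeof(uint32_t);
}

/* same computation for the long double of the compilation target */
static size_t ldbl_work_bytes(void)
{
	return work_bytes(LDBL_MANT_DIG, LDBL_MAX_EXP);
}
```

under these assumptions work_bytes(64, 16384) (x86 extended long
double) gives 7340 bytes and work_bytes(53, 1024) (IEEE double) gives
504 bytes, which lines up with the 16k vs 1k exponent range mentioned
above.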

i think stack usage is < 9K not just for printf but for any libc
call; currently the exceptions are execl, nftw and regcomp (of which
execl is not a bug; the other two could be fixed).

> I don't know exactly what glibc does, but it's likely they're just
> using malloc, which is going to be incorrect because it can fail
> dynamically with OOM.

glibc uses a variable amount of stack and it can be big, so
there is a size check: small requests use alloca, larger ones
fall back to malloc. (so yes, it can probably fail with oom
and is not as-safe.)

the alloca threshold is 64k, i don't know if printf can
actually hit that (there are multiple allocas in printf,
some have smaller bounds).
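
the check-then-alloca-else-malloc pattern looks roughly like the
sketch below. this is a generic illustration, not glibc's code:
with_scratch and SCRATCH_STACK_MAX are made-up names, and the 64k
limit is just the figure quoted above. alloca is a non-standard
extension (declared in <alloca.h> on glibc and musl).

```c
#include <alloca.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_STACK_MAX (64 * 1024) /* assumed threshold */

/* hypothetical example: get n bytes of scratch space, use it, release it.
 * returns 0 on success, -1 if the malloc fallback fails (oom). */
static int with_scratch(size_t n)
{
	unsigned char *buf;
	int on_heap = n > SCRATCH_STACK_MAX;

	if (on_heap) {
		buf = malloc(n);        /* can fail at runtime */
		if (!buf) return -1;
	} else {
		buf = alloca(n);        /* stack use depends on the input */
	}

	memset(buf, 0, n);              /* stand-in for the real work */

	if (on_heap) free(buf);         /* alloca memory goes away on return */
	return 0;
}
```

both the stack depth and the possibility of failure depend on n,
i.e. on the input, which is what makes such code hard to validate.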

i don't think the actual worst case memory usage is known,
but i can easily imagine it to be above 64k on all targets
(glibc supports _Float128).

as a consequence, code that uses printf cannot be validated
on glibc by naive tests: production will see different inputs,
so different stack usage or oom failures may happen.

>
> In principle we could also make the working array a VLA and compute
> smaller bounds on the size needed when precision is limited (the
> common case). This might really be a practical "fix" for cases people
> care about, and it would also solve the problem where LLVM makes
> printf *always* use ~9k stack because it hoists the lifetime of the
> floating point working array all the way to the top when inlining
> (this is arguably a serious optimization bug since it can transform
> all sorts of code that's possible to execute into code that's
> impossible to execute due to huge stack requirements). By having it be
> a VLA whose size isn't determined except in the floating point path,
> LLVM wouldn't be able to hoist it like that.
>
> Making this change would still be significant work though, mainly in
> verification that the bounds are correct and that there are no cases
> where the smaller array can be made to overflow.
>
> Rich