Date: Tue, 4 Feb 2020 00:05:35 +0100
From: Szabolcs Nagy
To: musl@lists.openwall.com
Cc: Simon
Message-ID: <20200203230534.GA23985@port70.net>
In-Reply-To: <20200203215713.GS1663@brightrain.aerifal.cx>
Subject: Re: [musl] Why does musl printf() use so much more stack than other implementations when printf()ing floating point numbers?

* Rich Felker [2020-02-03 16:57:13 -0500]:
> On Mon, Feb 03, 2020 at 01:14:21PM -0800, Simon wrote:
> > I recently noticed that musl printf() implementation uses surprisingly more
> > stack space than other implementations, but only if printing floating point
> > numbers, and made some notes here [1]. Any ideas why this happens, and any
> > chance of fixing it?
> >
> > [1] https://gist.github.com/simonhf/2a7b7eb98d2a10c549e8cc858bbefd53
>
> It's fundamental; ability to exactly print arbitrary floating point
> numbers takes considerable working space unless you want to spend
> O(n³) time or so (n=exponent value) to keep recomputing things. The
> minimum needed is probably only around 2/3 of what we use, so it would
> be possible to reduce slightly, but I doubt a savings of <3k is worth
> the complexity of ensuring it would still be safe and correct.
>
> Note that on archs without extended long double type, which covers
> everything used in extreme low-memory embedded environments, the
> memory usage is far lower. This is because it's proportional to the
> max possible exponent value, which is 1k instead of 16k if nothing
> larger than IEEE double is supported.

the musl stack usage is fixed, independent of input when decimal
formatting is done, so it can be easily tested. (and yes the size is
mainly determined by the long double exponent range and close to
optimal if performance matters.)

i think stack usage is < 9K not just for printf but any libc call,
currently the exceptions are execl, nftw and regcomp (of which execl
is not a bug; the other two could be fixed).

> I don't know exactly what glibc does, but it's likely they're just
> using malloc, which is going to be incorrect because it can fail
> dynamically with OOM.

glibc uses a variable amount of stack and it can be big, so there is
a check and then an alloca that falls back to malloc. (so yes it can
probably fail with oom and is not as-safe).

the alloca threshold is 64k, i don't know if printf can actually hit
that (there are multiple allocas in printf, some have smaller bounds).
i don't think the actual worst case memory usage is known, but i can
easily imagine it to be above 64k on all targets (glibc supports
_Float128).
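
a minimal sketch of the check / alloca / malloc-fallback pattern
described above, to show why stack use and failure modes become
input dependent. this is not glibc's code: with_scratch, the
SCRATCH_ALLOCA_MAX name and the byte-summing "work" are all made up
for illustration, only the 64k figure comes from the text.

```c
#include <alloca.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold name; 64k is the figure quoted in the thread. */
#define SCRATCH_ALLOCA_MAX (64 * 1024)

/*
 * Get `need` bytes of scratch space: from the stack when the request is
 * small, from the heap otherwise. Returns 0 on success and -1 when the
 * heap fallback fails, which is the dynamic OOM failure discussed above.
 * Filling and summing the buffer stands in for real formatting work.
 */
int with_scratch(size_t need, unsigned char fill, unsigned long *sum_out)
{
    unsigned char *buf;
    int on_heap = need > SCRATCH_ALLOCA_MAX;

    if (on_heap) {
        buf = malloc(need);
        if (!buf)
            return -1;              /* input-dependent failure path */
    } else {
        /* Stack use grows with the request, up to the threshold. */
        buf = alloca(need ? need : 1);
    }

    memset(buf, fill, need);
    unsigned long sum = 0;
    for (size_t i = 0; i < need; i++)
        sum += buf[i];
    *sum_out = sum;

    if (on_heap)
        free(buf);                  /* alloca memory is released on return */
    return 0;
}
```

a test that only ever passes small `need` values exercises neither
the large stack frames nor the malloc path, which is the testing
problem in a nutshell.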
as a consequence validating printf using code on glibc cannot be done
by naive tests: in production different inputs will be used so
different stack usage or oom failure may happen.

>
> In principle we could also make the working array a VLA and compute
> smaller bounds on the size needed when precision is limited (the
> common case). This might really be a practical "fix" for cases people
> care about, and it would also solve the problem where LLVM makes
> printf *always* use ~9k stack because it hoists the lifetime of the
> floating point working array all the way to the top when inlining
> (this is arguably a serious optimization bug since it can transform
> all sorts of code that's possible to execute into code that's
> impossible to execute due to huge stack requirements). By having it be
> a VLA whose size isn't determined except in the floating point path,
> LLVM wouldn't be able to hoist it like that.
>
> Making this change would still be significant work though, mainly in
> verification that the bounds are correct and that there are no cases
> where the smaller array can be made to overflow.
>
> Rich
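
To make the "proportional to the max possible exponent" point concrete, here is a small sketch of the size arithmetic. The formula is the one I recall from the working array declaration in musl's fmt_fp (src/stdio/vfprintf.c); treat it as an assumption and check it against the source before relying on it. The names FP_WORK_WORDS and fp_work_bytes are made up for this example.

```c
#include <float.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of 32-bit words in the working array: one term sized by the
 * mantissa width and one sized by the exponent range. Formula as recalled
 * from musl's fmt_fp; an assumption, not a quotation.
 */
#define FP_WORK_WORDS(mant_dig, max_exp) \
    (((mant_dig) + 28) / 29 + 1 + ((max_exp) + (mant_dig) + 28 + 8) / 9)

/* Working-array size in bytes for a format with the given parameters. */
size_t fp_work_bytes(int mant_dig, int max_exp)
{
    return sizeof(uint32_t) * (size_t)FP_WORK_WORDS(mant_dig, max_exp);
}
```

Calling fp_work_bytes(DBL_MANT_DIG, DBL_MAX_EXP) and fp_work_bytes(LDBL_MANT_DIG, LDBL_MAX_EXP) shows the gap on a given target: with IEEE double parameters (53, 1024) the result is 504 bytes, while x87 extended parameters (64, 16384) give 7340 bytes. The exponent term dominates in both cases, so a bound computed from the requested precision would have to shrink that term to matter.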