Date: Mon, 3 Feb 2020 16:57:13 -0500
From: Rich Felker
To: musl@lists.openwall.com
Cc: Simon
Message-ID: <20200203215713.GS1663@brightrain.aerifal.cx>
Subject: Re: [musl] Why does musl printf() use so much more stack than other implementations when printf()ing floating point numbers?

On Mon, Feb 03, 2020 at 01:14:21PM -0800, Simon wrote:
> I recently noticed that musl printf() implementation uses surprisingly more
> stack space than other implementations, but only if printing floating point
> numbers, and made some notes here [1]. Any ideas why this happens, and any
> chance of fixing it?
>
> [1] https://gist.github.com/simonhf/2a7b7eb98d2a10c549e8cc858bbefd53

It's fundamental; the ability to print arbitrary floating point numbers
exactly takes considerable working space unless you want to spend O(n³)
time or so (n = exponent value) to keep recomputing things. The minimum
needed is probably only around 2/3 of what we use, so it would be
possible to reduce it slightly, but I doubt a savings of <3k is worth
the complexity of ensuring it would still be safe and correct.

Note that on archs without an extended long double type, which covers
everything used in extreme low-memory embedded environments, the memory
usage is far lower. This is because it's proportional to the maximum
possible exponent value, which is 1k instead of 16k if nothing larger
than IEEE double is supported.

I don't know exactly what glibc does, but it's likely they're just
using malloc, which is going to be incorrect because it can fail
dynamically with OOM.

In principle we could also make the working array a VLA and compute
smaller bounds on the size needed when precision is limited (the common
case). This might really be a practical "fix" for the cases people care
about, and it would also solve the problem where LLVM makes printf
*always* use ~9k of stack because it hoists the lifetime of the
floating point working array all the way to the top when inlining.
(This is arguably a serious optimization bug, since it can transform
all sorts of code that's possible to execute into code that's
impossible to execute due to huge stack requirements.) If the array
were a VLA whose size isn't determined except on the floating point
path, LLVM wouldn't be able to hoist it like that.

Making this change would still be significant work though, mainly in
verifying that the bounds are correct and that there are no cases where
the smaller array can be made to overflow.

Rich
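
For anyone following the VLA suggestion above, here is a minimal sketch
of the idea. It assumes a hypothetical fmt_fp_sketch() helper and a
purely illustrative bound formula; it is not musl's actual fmt_fp code.
The point is only that when the big-digit buffer is sized at run time
from the value's exponent and the requested precision, the large
allocation exists only on the floating point path, and a runtime-sized
VLA can't be hoisted into the enclosing frame the way a fixed-size
array can.

#include <math.h>
#include <stdint.h>
#include <stdio.h>

static void fmt_fp_sketch(long double y, int prec)
{
    int e2 = 0;
    frexpl(y, &e2);              /* binary exponent of the value passed in */

    if (prec < 0) prec = 6;      /* default precision, as for "%g" */

    /*
     * Illustrative bound only: each 32-bit slot in this sketch holds 9
     * decimal digits (base 1e9, just under 30 bits), so the integer part
     * needs roughly |e2|/29 slots and the fractional digits roughly
     * prec/9 more. The worst case is proportional to the format's maximum
     * exponent (16384 for IEEE extended long double, 1024 for plain IEEE
     * double), which is why the fixed buffer is so much smaller on archs
     * where long double is just double.
     */
    size_t need = (size_t)(e2 > 0 ? e2 : -e2)/29 + (size_t)prec/9 + 4;

    uint32_t big[need];          /* VLA: size not known until run time */
    for (size_t i = 0; i < need; i++) big[i] = 0;

    /* ... the actual decimal conversion would work in big[] here ... */
    printf("%d big digits reserved for %%.%dLg of %Lg\n",
           (int)need, prec, y);
}

int main(void)
{
    /* The large buffer exists only while this call is on the FP path. */
    fmt_fp_sketch(123456.789L, 17);
    return 0;
}

A real bound would of course need the verification work described
above, i.e. proof that no exponent/precision combination can overflow
the smaller array.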