* [musl] Proposed printf stack usage improvement
@ 2024-08-26 20:09 Rich Felker
2024-08-27 9:23 ` Pedro Falcato
0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2024-08-26 20:09 UTC (permalink / raw)
To: musl
[-- Attachment #1: Type: text/plain, Size: 75 bytes --]
See attached, as discussed on IRC. Motivation described in commit
message.
[-- Attachment #2: 0001-printf-drastically-reduce-stack-usage-without-long-d.patch --]
[-- Type: text/plain, Size: 2388 bytes --]
From 572a2e2eb91f00f2f25d301cfb50f435e7ae16b3 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Mon, 26 Aug 2024 16:01:11 -0400
Subject: [PATCH] printf: drastically reduce stack usage without [long] double
args
internally, printf always works with the maximal-size supported
integer and floating point formats. however, the space needed to
format a floating point number is proportional to the mantissa and
exponent ranges. on archs where long double is larger than double,
knowing that the actual value fit in double allows us to use a much
smaller buffer, roughly 1/16 the size.
as a bonus, making the working buffer a VLA whose dimension depends on
the format specifier prevents the compiler from lifting the stack
adjustment to the top of printf_core. this makes it so printf calls
without floating point arguments do not waste even the smaller amount
of stack space needed for double, making it much more practical to use
printf in tightly stack-constrained environments.
---
src/stdio/vfprintf.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/src/stdio/vfprintf.c b/src/stdio/vfprintf.c
index 3c450c3a..76733997 100644
--- a/src/stdio/vfprintf.c
+++ b/src/stdio/vfprintf.c
@@ -178,10 +178,14 @@ static char *fmt_u(uintmax_t x, char *s)
typedef char compiler_defines_long_double_incorrectly[9-(int)sizeof(long double)];
#endif
-static int fmt_fp(FILE *f, long double y, int w, int p, int fl, int t)
+static int fmt_fp(FILE *f, long double y, int w, int p, int fl, int t, int ps)
{
- uint32_t big[(LDBL_MANT_DIG+28)/29 + 1 // mantissa expansion
- + (LDBL_MAX_EXP+LDBL_MANT_DIG+28+8)/9]; // exponent expansion
+ int bufsize = (ps==BIGLPRE)
+ ? (LDBL_MANT_DIG+28)/29 + 1 + // mantissa expansion
+ (LDBL_MAX_EXP+LDBL_MANT_DIG+28+8)/9 // exponent expansion
+ : (DBL_MANT_DIG+28)/29 + 1 +
+ (DBL_MAX_EXP+DBL_MANT_DIG+28+8)/9;
+ uint32_t big[bufsize];
uint32_t *a, *d, *r, *z;
int e2=0, e, i, j, l;
char buf[9+LDBL_MANT_DIG/4], *s;
@@ -618,7 +622,7 @@ static int printf_core(FILE *f, const char *fmt, va_list *ap, union arg *nl_arg,
case 'e': case 'f': case 'g': case 'a':
case 'E': case 'F': case 'G': case 'A':
if (xp && p<0) goto overflow;
- l = fmt_fp(f, arg.f, w, p, fl, t);
+ l = fmt_fp(f, arg.f, w, p, fl, t, ps);
if (l<0) goto overflow;
continue;
}
--
2.21.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] Proposed printf stack usage improvement
2024-08-26 20:09 [musl] Proposed printf stack usage improvement Rich Felker
@ 2024-08-27 9:23 ` Pedro Falcato
2024-08-27 15:21 ` Rich Felker
0 siblings, 1 reply; 5+ messages in thread
From: Pedro Falcato @ 2024-08-27 9:23 UTC (permalink / raw)
To: musl
LGTM.
But maybe you should also include my __attribute__((noinline)) sugestion, to make sure
the integer printf and floating point paths get mixed by the compiler. Even if current gcc/clang don't seem
to want to do that, it's better to be safe than sorry (and I assume any LTO/PGO might change that atm).
--
Pedro
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] Proposed printf stack usage improvement
2024-08-27 9:23 ` Pedro Falcato
@ 2024-08-27 15:21 ` Rich Felker
2024-08-27 15:42 ` Pedro Falcato
0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2024-08-27 15:21 UTC (permalink / raw)
To: Pedro Falcato; +Cc: musl
On Tue, Aug 27, 2024 at 10:23:57AM +0100, Pedro Falcato wrote:
> LGTM.
>
> But maybe you should also include my __attribute__((noinline))
> sugestion, to make sure the integer printf and floating point paths
> get mixed by the compiler. Even if current gcc/clang don't seem to
> want to do that, it's better to be safe than sorry (and I assume any
> LTO/PGO might change that atm).
I'm not clear what ill effect you're trying to mitigate here.
Rich
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] Proposed printf stack usage improvement
2024-08-27 15:21 ` Rich Felker
@ 2024-08-27 15:42 ` Pedro Falcato
2024-08-27 21:32 ` Rich Felker
0 siblings, 1 reply; 5+ messages in thread
From: Pedro Falcato @ 2024-08-27 15:42 UTC (permalink / raw)
To: Rich Felker; +Cc: musl
On Tue, Aug 27, 2024 at 11:21:33AM GMT, Rich Felker wrote:
> On Tue, Aug 27, 2024 at 10:23:57AM +0100, Pedro Falcato wrote:
> > LGTM.
> >
> > But maybe you should also include my __attribute__((noinline))
> > sugestion, to make sure the integer printf and floating point paths
> > get mixed by the compiler. Even if current gcc/clang don't seem to
> > want to do that, it's better to be safe than sorry (and I assume any
> > LTO/PGO might change that atm).
>
> I'm not clear what ill effect you're trying to mitigate here.
(fwiw, if it wasn't clear: I meant "make sure the <...> *don't* get mixed)
fmt_fp with the patch applied still has a significant stack impact (520 bytes according to my
measurement) which can be avoided on the vast majority of (integer) printfs.
printf_core OTOH uses up 472 bytes of stack, so the simple possibility of inlining it can
(worst case) more than double the stack space used by all printfs.
Granted, the patch seems to convince clang not to inline fmt_fp at all, but AFAIK this is by no means
a guarantee.
One could consider this somewhat of a microoptimization, but musl thread stacks are by no
means big, so...
--
Pedro
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] Proposed printf stack usage improvement
2024-08-27 15:42 ` Pedro Falcato
@ 2024-08-27 21:32 ` Rich Felker
0 siblings, 0 replies; 5+ messages in thread
From: Rich Felker @ 2024-08-27 21:32 UTC (permalink / raw)
To: Pedro Falcato; +Cc: musl
[-- Attachment #1: Type: text/plain, Size: 2849 bytes --]
On Tue, Aug 27, 2024 at 04:42:35PM +0100, Pedro Falcato wrote:
> On Tue, Aug 27, 2024 at 11:21:33AM GMT, Rich Felker wrote:
> > On Tue, Aug 27, 2024 at 10:23:57AM +0100, Pedro Falcato wrote:
> > > LGTM.
> > >
> > > But maybe you should also include my __attribute__((noinline))
> > > sugestion, to make sure the integer printf and floating point paths
> > > get mixed by the compiler. Even if current gcc/clang don't seem to
> > > want to do that, it's better to be safe than sorry (and I assume any
> > > LTO/PGO might change that atm).
> >
> > I'm not clear what ill effect you're trying to mitigate here.
>
> (fwiw, if it wasn't clear: I meant "make sure the <...> *don't* get mixed)
>
> fmt_fp with the patch applied still has a significant stack impact (520 bytes according to my
> measurement) which can be avoided on the vast majority of (integer) printfs.
How did you measure? There should be essentially no static stack usage
in fmt_fp with this patch, only dynamic (VLA). On archs with
ld==double, it's possible that the compiler could decide to "optimize"
a VLA whose size can only have one possible value to a non-VLA, then
lift if, but this would be a highly malicious transformation that
could lead to much more catastrophic stack overflows in real-world
usage I think, so I would not expect compilers to do it.
Indeed a quick check of the attached, which I wrote to be as naively
easy to mis-optimize as possible, shows neither gcc nor clang lifting
the VLA.
> printf_core OTOH uses up 472 bytes of stack, so the simple possibility of inlining it can
> (worst case) more than double the stack space used by all printfs.
>
> Granted, the patch seems to convince clang not to inline fmt_fp at all, but AFAIK this is by no means
> a guarantee.
GCC inlines it fine, which is a good thing. This is a function which
is called only one place, and just outlined in the source for the sake
of readability, having its own locals, etc. There's no good reason to
*want* the call boundary overhead.
At some point it might make sense to move fmt_fp to its own TU if we
want to have a way to suppress it from getting linked at all, and this
would also force non-inlining. But it doesn't seem to be desirable to
suppress inlining for its own sake.
> One could consider this somewhat of a microoptimization, but musl thread stacks are by no
> means big, so...
I think generally we don't care about 500 bytes anyway -- I'm not
going to deem a function that overflows the last 500 bytes of a stack
that's too small a bug. Even printf using 8k wasn't a "bug"; the main
motivation for changing this is not to let people YOLO calling printf
with a stack that's barely big enough, but to avoid dirtying extra
pages for no good reason. The 8k pretty much unconditionally dirtied 2
extra otherwise-unused pages for any program using printf.
Rich
[-- Attachment #2: vla_lift.c --]
[-- Type: text/plain, Size: 94 bytes --]
void bah(int *);
void foo(int n)
{
int m = 10000;
if (n) {
char bar[m];
bah(bar);
}
}
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-08-27 21:32 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-26 20:09 [musl] Proposed printf stack usage improvement Rich Felker
2024-08-27 9:23 ` Pedro Falcato
2024-08-27 15:21 ` Rich Felker
2024-08-27 15:42 ` Pedro Falcato
2024-08-27 21:32 ` Rich Felker
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).