On Tue, Aug 27, 2024 at 04:42:35PM +0100, Pedro Falcato wrote:
> On Tue, Aug 27, 2024 at 11:21:33AM GMT, Rich Felker wrote:
> > On Tue, Aug 27, 2024 at 10:23:57AM +0100, Pedro Falcato wrote:
> > > LGTM.
> > >
> > > But maybe you should also include my __attribute__((noinline))
> > > suggestion, to make sure the integer printf and floating point paths
> > > get mixed by the compiler. Even if current gcc/clang don't seem to
> > > want to do that, it's better to be safe than sorry (and I assume any
> > > LTO/PGO might change that atm).
> >
> > I'm not clear what ill effect you're trying to mitigate here.
>
> (fwiw, if it wasn't clear: I meant "make sure the <...> *don't* get mixed")
>
> fmt_fp with the patch applied still has a significant stack impact (520 bytes according to my
> measurement) which can be avoided on the vast majority of (integer) printfs.

How did you measure? There should be essentially no static stack usage
in fmt_fp with this patch, only dynamic (VLA).

On archs with ld==double, it's possible that the compiler could decide
to "optimize" a VLA whose size can only have one possible value to a
non-VLA, then lift it, but this would be a highly malicious
transformation that could lead to much more catastrophic stack
overflows in real-world usage I think, so I would not expect compilers
to do it. Indeed a quick check of the attached, which I wrote to be as
naively easy to mis-optimize as possible, shows neither gcc nor clang
lifting the VLA.

> printf_core OTOH uses up 472 bytes of stack, so the simple possibility of inlining it can
> (worst case) more than double the stack space used by all printfs.
>
> Granted, the patch seems to convince clang not to inline fmt_fp at all, but AFAIK this is by no means
> a guarantee.

GCC inlines it fine, which is a good thing. This is a function which
is called from only one place, and is just outlined in the source for
the sake of readability, having its own locals, etc. There's no good
reason to *want* the call boundary overhead.

At some point it might make sense to move fmt_fp to its own TU if we
want to have a way to suppress it from getting linked at all, and this
would also force non-inlining. But it doesn't seem to be desirable to
suppress inlining for its own sake.

> One could consider this somewhat of a microoptimization, but musl thread stacks are by no
> means big, so...

I think generally we don't care about 500 bytes anyway -- I'm not
going to deem a function that overflows the last 500 bytes of a stack
that's too small a bug. Even printf using 8k wasn't a "bug"; the main
motivation for changing this is not to let people YOLO calling printf
with a stack that's barely big enough, but to avoid dirtying extra
pages for no good reason. The 8k pretty much unconditionally dirtied 2
extra otherwise-unused pages for any program using printf.

Rich
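
For readers following the thread, here is a minimal sketch of the static vs. dynamic (VLA) stack distinction discussed above. It is not the attachment mentioned in the message and not musl's code. The names `limbs`, `fp_path_like`, `core_like` and `FP_NOINLINE` are hypothetical, and the limb-count formula is only assumed to resemble the buffer sizing in fmt_fp. With IEEE double (53-bit mantissa, maximum exponent 1024) it works out to 126 32-bit limbs, i.e. 504 bytes, and with the x87 80-bit long double to 1835 limbs, i.e. 7340 bytes.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdint.h>

/* Define as __attribute__((noinline)) on GCC/Clang to try the
 * suggestion from the thread; left empty here (assumption: default
 * inlining behaviour is what we want to observe). */
#ifndef FP_NOINLINE
#define FP_NOINLINE
#endif

/* Hypothetical helper: number of 32-bit limbs needed for a type with
 * the given mantissa width and maximum exponent (assumed formula). */
static size_t limbs(int mant_dig, int max_exp)
{
	return (size_t)((mant_dig + 28) / 29 + 1
	              + (max_exp + mant_dig + 28 + 8) / 9);
}

/* Stand-in for the floating point path. The buffer is a VLA, so its
 * stack space is allocated only when this code actually executes,
 * even if the compiler inlines the function into its caller. */
FP_NOINLINE static size_t fp_path_like(int is_long_double)
{
	size_t n = is_long_double
		? limbs(LDBL_MANT_DIG, LDBL_MAX_EXP)
		: limbs(DBL_MANT_DIG, DBL_MAX_EXP);
	assert(n > 0);
	uint32_t big[n];          /* dynamic allocation, size known at run time */
	big[0] = 0;
	big[n - 1] = 0;           /* touch both ends so the buffer is really used */
	return sizeof big;        /* sizeof a VLA is evaluated at run time */
}

/* Stand-in for the caller: the integer-only path never reaches the VLA
 * and reports 0 bytes of buffer. */
size_t core_like(int has_fp_arg, int is_long_double)
{
	if (!has_fp_arg)
		return 0;
	return fp_path_like(is_long_double);
}
```

Building this with gcc's `-fstack-usage` option and reading the generated `.su` file is one way to see whether a function's frame is reported as static or dynamic.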