mailing list of musl libc
* Stdio resource usage
@ 2019-02-19 23:34 Nick Bray
  2019-02-20  2:43 ` Rich Felker
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Bray @ 2019-02-19 23:34 UTC (permalink / raw)
  To: musl


Other than compiler warnings, the main pain point I ran into porting a
subset of Musl into a resource constrained environment was the resource
usage of stdio.  I don't expect any of these modifications to make it
upstream.  Talking out loud as a FYI / user feedback.  Also curious to see
if there's any wisdom out there.

Stack usage of stdio was an issue.  On arm64, printf takes 8k of stack,
which is rough when you only have 4-12k of stack.  This is because fmt_fp
allocates stack space proportional to log(MAX_LONG_DOUBLE).  It also gets
inlined into printf so you always take the hit.  (noinline fmt_fp is a
Faustian bargain that makes stack usage worse in the worst case... hmmm.)
On arm64, long double is defined as 128 bits, which not only increases
stack size because of the larger mantissa, but also pulls in software
emulation for fp128.  In terms of spec compliance, Musl is doing the right
thing.  But as a practical matter, none of the programs I care about will
ever use long double.  So my rough first pass was to reduce the max float
size from long double to double.  In a later pass, I'll also add a knob to
remove floating point formatting entirely.
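
To make the long double versus double difference concrete, here is a
rough sketch of how a scratch array of this kind scales with the
floating-point format. The helper name big_words and the rounding
constants are assumptions for illustration (words for the significand
plus words for the exponent range); they are not quoted from the musl
source.

```c
#include <stddef.h>

/* Hypothetical helper: number of 32-bit words of scratch space needed to
 * expand a binary float exactly into base-1e9 "digits".
 *   mant_dig: significand bits (e.g. DBL_MANT_DIG)
 *   max_exp:  maximum binary exponent (e.g. DBL_MAX_EXP)
 * The constants below are assumptions chosen for illustration. */
size_t big_words(size_t mant_dig, size_t max_exp)
{
    size_t mantissa_words = (mant_dig + 28) / 29 + 1;
    size_t exponent_words = (max_exp + mant_dig + 28 + 8) / 9;
    return mantissa_words + exponent_words;
}
```

Under these assumptions, binary64 (53, 1024) needs 126 words (504
bytes), while x87 extended (64, 16384) and binary128 (113, 16384) need
1835 and 1842 words, a bit over 7k bytes. The exponent range, not the
significand width, dominates.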

%m calls strerror which pulls in a string table, so removing support for %m
lets static linking and DCE work its magic.  I also eliminated %n for
security hardening reasons.

The "states" structure is sparse and takes a little more memory than I'd
like: 464 bytes of rodata.  I don't see any workarounds without deeper
changes, so for now I am living with it.



* Re: Stdio resource usage
  2019-02-19 23:34 Stdio resource usage Nick Bray
@ 2019-02-20  2:43 ` Rich Felker
  2019-02-20 10:49   ` Szabolcs Nagy
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2019-02-20  2:43 UTC (permalink / raw)
  To: musl

On Tue, Feb 19, 2019 at 03:34:52PM -0800, Nick Bray wrote:
> Other than compiler warnings, the main pain point I ran into porting a
> subset of Musl into a resource constrained environment was the resource
> usage of stdio.

For what it's worth, I think this is better described as "printf" than
"stdio". The rest of stdio is utterly tiny.

> I don't expect any of these modifications to make it
> upstream.  Talking out loud as a FYI / user feedback.  Also curious to see
> if there's any wisdom out there.
> 
> Stack usage of stdio was an issue.  On arm64, printf takes 8k of stack,
> which is rough when you only have 4-12k of stack.  This is because fmt_fp
> allocates stack space proportional to log(MAX_LONG_DOUBLE).  It also gets
> inlined into printf so you always take the hit.  (noinline fmt_fp is a

This is a known compiler flaw, hoisting large stack allocations, and
one I've complained a lot about but with little luck. It might be
possible to work around it by making the array a VLA, whose size is 1
or the proper size depending on some condition the compiler can't
easily see, but that's rather awful. It might be worth doing though,
given the lack of progress fixing the bug.
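
A minimal sketch of the VLA trick described above, assuming a compiler
with VLA support. The names opaque_zero, BIG_LEN and scratch_bytes are
hypothetical, and the volatile read is just one way to make the
condition hard for the optimizer to see through; nothing guarantees a
given compiler will not hoist the allocation anyway.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BIG_LEN = 1800 };  /* hypothetical worst-case length in words */

/* Read through volatile so the compiler cannot assume the value. */
static volatile int opaque_zero = 0;

/* Returns the number of bytes of scratch space actually reserved. */
size_t scratch_bytes(int is_fp_conversion)
{
    /* Length is 1 on the common path and BIG_LEN only when needed. */
    size_t len = (is_fp_conversion | opaque_zero) ? BIG_LEN : 1;
    uint32_t big[len];          /* VLA: sized at run time */

    memset(big, 0, sizeof big); /* stand-in for the real formatting work */
    return sizeof big;          /* sizeof a VLA is evaluated at run time */
}
```

Calling scratch_bytes(0) reserves a single word, while scratch_bytes(1)
reserves the full BIG_LEN words.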

> Faustian bargain that makes stack usage worse in the worst case... hmmm.)
> On arm64, long double is defined as 128 bits, which not only increases
> stack size because of the larger mantissa, but also pulls in software
> emulation for fp128.  In terms of spec compliance, Musl is doing the right
> thing.  But as a practical matter, none of the programs I care about will
> ever use long double.  So my rough first pass was to reduce the max float
> size from long double to double.  In a later pass, I'll also add a knob to
> remove floating point formatting entirely.

It's kinda unfortunate that aarch64 defined long double as IEEE quad
without hardware implementation of it, but it's probably the right
future-facing choice. I was under the impression that aarch64 was
intended mostly for "large" systems, and that you'd use 32-bit arm
(with much smaller code due to thumb) for tiny space-constrained
systems, though.

> %m calls strerror which pulls in a string table, so removing support for %m
> lets static linking and DCE work its magic.

Yes. Note that %m is needed for a conforming syslog(), which was the
motivation for supporting it in printf.

> I also eliminated %n for
> security hardening reasons.

This actually introduces security bugs by breaking the contract. At
some point I believe there may even have been some parts of musl you
would have broken in dangerous ways, though I'm not sure if that's the
case now. If you have a situation where the format string is
non-constant, that, not %n, is the problem.
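
To illustrate the point about non-constant format strings, here is a
small sketch. The function name log_untrusted is hypothetical. The
untrusted text is passed as an argument to a literal "%s" format, so any
conversion specifiers it contains, %n included, are copied as plain
characters and never interpreted.

```c
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into out.
 * The format string is a literal; the untrusted text is only data. */
int log_untrusted(char *out, size_t outlen, const char *untrusted)
{
    return snprintf(out, outlen, "%s", untrusted);
}
```

The unsafe variant would be snprintf(out, outlen, untrusted), where the
caller-controlled text becomes the format string itself.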

> The "states" structure is sparse and takes a little more memory than I'd
> like: 464 bytes of rodata.  I don't see any workarounds without deeper
> changes, so for now I am living with it.

I think you'd have a hard time fitting the code to use a more
space-efficient data structure (e.g. binary search of a sorted
non-sparse table with pairs rather than just outputs) in less than the
size difference.

Rich



* Re: Stdio resource usage
  2019-02-20  2:43 ` Rich Felker
@ 2019-02-20 10:49   ` Szabolcs Nagy
  2019-02-20 15:47     ` Markus Wichmann
  0 siblings, 1 reply; 12+ messages in thread
From: Szabolcs Nagy @ 2019-02-20 10:49 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2019-02-19 21:43:13 -0500]:
> On Tue, Feb 19, 2019 at 03:34:52PM -0800, Nick Bray wrote:
> > I don't expect any of these modifications to make it
> > upstream.  Talking out loud as a FYI / user feedback.  Also curious to see
> > if there's any wisdom out there.
> > 
> > Stack usage of stdio was an issue.  On arm64, printf takes 8k of stack,
> > which is rough when you only have 4-12k of stack.  This is because fmt_fp
> > allocates stack space proportional to log(MAX_LONG_DOUBLE).  It also gets
> > inlined into printf so you always take the hit.  (noinline fmt_fp is a
> 
> This is a known compiler flaw, hoisting large stack allocations, and
> one I've complained a lot about but with little luck. It might be
> possible to work around it by making the array a VLA, whose size is 1
> or the proper size depending on some condition the compiler can't
> easily see, but that's rather awful. It might be worth doing though,
> given the lack of progress fixing the bug.

i think it's just an llvm issue, or does this happen with gcc too now?

> > Faustian bargain that makes stack usage worse in the worst case... hmmm.)
> > On arm64, long double is defined as 128 bits, which not only increases
> > stack size because of the larger mantissa, but also pulls in software

note: the mantissa is not the real issue, the exponent range is.
(e.g. to printf 0x1p-16494L you need to compute 5^16494/10^16494
which is floor(log10(5)*16494) + 1 = 11529 digits)
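
The digit count can be checked without floating point by computing 5^n
exactly in base 10^9 and counting decimal digits. This is only a sketch
to verify the arithmetic; the names decimal_digits_of_pow5, MAX_EXP and
LIMBS are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXP 16494u  /* largest exponent this sketch supports */
#define LIMBS   1300    /* 9 decimal digits per limb, enough for 5^16494 */

/* Returns the number of decimal digits of 5^n, for n <= MAX_EXP. */
unsigned decimal_digits_of_pow5(unsigned n)
{
    uint32_t limb[LIMBS] = {1}; /* little-endian base-1e9 number, starts at 1 */
    size_t used = 1;

    assert(n <= MAX_EXP);
    for (unsigned i = 0; i < n; i++) {
        uint32_t carry = 0;
        for (size_t j = 0; j < used; j++) {
            /* 64-bit product: 999999999 * 5 does not fit in 32 bits */
            uint64_t v = (uint64_t)limb[j] * 5 + carry;
            limb[j] = (uint32_t)(v % 1000000000u);
            carry = (uint32_t)(v / 1000000000u);
        }
        if (carry)
            limb[used++] = carry;
    }

    /* Full limbs contribute 9 digits each; count the top limb's digits. */
    unsigned digits = 9 * (unsigned)(used - 1);
    for (uint32_t top = limb[used - 1]; top; top /= 10)
        digits++;
    return digits;
}
```

decimal_digits_of_pow5(16494) returns 11529, matching the figure above.
For comparison, decimal_digits_of_pow5(1074), the smallest binary64
subnormal, returns 751.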

> > emulation for fp128.  In terms of spec compliance, Musl is doing the right
> > thing.  But as a practical matter, none of the programs I care about will
> > ever use long double.  So my rough first pass was to reduce the max float
> > size from long double to double.  In a later pass, I'll also add a knob to
> > remove floating point formatting entirely.
> 
> It's kinda unfortunate that aarch64 defined long double as IEEE quad
> without hardware implementation of it, but it's probably the right
> future-facing choice. I was under the impression that aarch64 was
> intended mostly for "large" systems, and that you'd use 32-bit arm
> (with much smaller code due to thumb) for tiny space-constrained
> systems, though.

aarch64 has 128 bit fp regs, so in principle future arch extension
may add 128bit instructions without breaking abi. (which may happen
if aarch64 gets adoption in supercomputers, e.g. powerpc64 did that)

> > %m calls strerror which pulls in a string table, so removing support for %m
> > lets static linking and DCE work its magic.
> 
> Yes. Note that %m is needed for a conforming syslog(), which was the
> motivation for supporting it in printf.
> 
> > I also eliminated %n for
> > security hardening reasons.
> 
> This actually introduces security bugs by breaking the contract. At
> some point I believe there may even have been some parts of musl you
> would have broken in dangerous ways, though I'm not sure if that's the
> case now. If you have a situation where the format string is
> non-constant, that, not %n, is the problem.

i think %n is not a huge loss, but it does sound like
repeating the bionic mistakes.  (providing posix symbols
with slightly non-posix-conforming semantics because of
speculative reasons which turned out to be a lot more
expensive to fix up than just following the standard)



* Re: Stdio resource usage
  2019-02-20 10:49   ` Szabolcs Nagy
@ 2019-02-20 15:47     ` Markus Wichmann
  2019-02-20 16:37       ` Szabolcs Nagy
  2019-02-20 18:34       ` A. Wilcox
  0 siblings, 2 replies; 12+ messages in thread
From: Markus Wichmann @ 2019-02-20 15:47 UTC (permalink / raw)
  To: musl

On Wed, Feb 20, 2019 at 11:49:01AM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2019-02-19 21:43:13 -0500]:
> > This is a known compiler flaw, hoisting large stack allocations, and
> > one I've complained a lot about but with little luck. It might be
> > possible to work around it by making the array a VLA, whose size is 1
> > or the proper size depending on some condition the compiler can't
> > easily see, but that's rather awful. It might be worth doing though,
> > given the lack of progress fixing the bug.
> 
> i think it's just an llvm issue, or does this happen with gcc too now?
> 

Take this as a data point: On x86_64, with gcc 8.2.0, and -Os, fmt_fp() is
not inlined into printf_core(). And it alone takes a whopping 7496 bytes
of stack (printf_core() only takes 168).

Compiling with -O2 also does not inline fmt_fp(), and it takes 7480
bytes. So somehow -O2 manages to save sixteen bytes of stack.

If I play for all the marbles and use -O3, it still doesn't inline
fmt_fp(), and now it needs 7752 bytes of stack. So now it needs 256
bytes more than originally.

It appears as though at least gcc 8 is no longer as inline happy as it
once was.

> 
> aarch64 has 128 bit fp regs, so in principle future arch extension
> may add 128bit instructions without breaking abi. (which may happen
> if aarch64 gets adoption in supercomputers, e.g. powerpc64 did that)
> 

Say, if IEEE quad is causing problems, wouldn't it be possible to
compile a tool chain with long double == double for the time being?

Ciao,
Markus



* Re: Stdio resource usage
  2019-02-20 15:47     ` Markus Wichmann
@ 2019-02-20 16:37       ` Szabolcs Nagy
  2019-02-20 17:13         ` Rich Felker
  2019-02-20 18:34       ` A. Wilcox
  1 sibling, 1 reply; 12+ messages in thread
From: Szabolcs Nagy @ 2019-02-20 16:37 UTC (permalink / raw)
  To: musl

* Markus Wichmann <nullplan@gmx.net> [2019-02-20 16:47:40 +0100]:
> On Wed, Feb 20, 2019 at 11:49:01AM +0100, Szabolcs Nagy wrote:
> > aarch64 has 128 bit fp regs, so in principle future arch extension
> > may add 128bit instructions without breaking abi. (which may happen
> > if aarch64 gets adoption in supercomputers, e.g. powerpc64 did that)
> > 
> 
> Say, if IEEE quad is causing problems, wouldn't it be possible to
> compile a tool chain with long double == double for the time being?

of course if you are writing your own os you can do all sorts of things,
but toolchains will follow the pcs abi in general which says that
long double == ieee binary128

e.g. gcc does not have a config option to turn long double type into
binary64 on aarch64. so you would have to maintain your own gcc fork.



* Re: Stdio resource usage
  2019-02-20 16:37       ` Szabolcs Nagy
@ 2019-02-20 17:13         ` Rich Felker
  0 siblings, 0 replies; 12+ messages in thread
From: Rich Felker @ 2019-02-20 17:13 UTC (permalink / raw)
  To: musl

On Wed, Feb 20, 2019 at 05:37:59PM +0100, Szabolcs Nagy wrote:
> * Markus Wichmann <nullplan@gmx.net> [2019-02-20 16:47:40 +0100]:
> > On Wed, Feb 20, 2019 at 11:49:01AM +0100, Szabolcs Nagy wrote:
> > > aarch64 has 128 bit fp regs, so in principle future arch extension
> > > may add 128bit instructions without breaking abi. (which may happen
> > > if aarch64 gets adoption in supercomputers, e.g. powerpc64 did that)
> > > 
> > 
> > Say, if IEEE quad is causing problems, wouldn't it be possible to
> > compile a tool chain with long double == double for the time being?
> 
> of course if you are writing your own os you can do all sorts of things,
> but toolchains will follow the pcs abi in general which says that
> long double == ieee binary128
> 
> e.g. gcc does not have a config option to turn long double type into
> binary64 on aarch64. so you would have to maintain your own gcc fork.

One thing I think we could consider doing in musl is extract the long
double significand via the representation rather than math, and using
float_t rather than long double for the rounding probes. However I'm
not clear that most of the soft float machinery wouldn't already be
pulled in via the promotion of double args to long double for storage
in the arg structure... Also hex float would need new code I think.
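
As an illustration of extracting the significand "via the
representation rather than math", here is a sketch for double. It
assumes double is IEEE binary64 with the usual bit layout; a long double
version would be format-specific and is not shown. The struct and
function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t),
               "sketch assumes a 64-bit double");

/* Hypothetical container for the raw fields of a binary64 value. */
struct fp_parts {
    int sign;        /* 1 if the sign bit is set */
    int biased_exp;  /* 11-bit exponent field, bias 1023 */
    uint64_t frac;   /* 52-bit fraction field, implicit bit not included */
};

struct fp_parts split_double(double x)
{
    uint64_t bits;
    struct fp_parts p;

    memcpy(&bits, &x, sizeof bits); /* reinterpret without aliasing issues */
    p.sign = (int)(bits >> 63);
    p.biased_exp = (int)((bits >> 52) & 0x7ff);
    p.frac = bits & ((UINT64_C(1) << 52) - 1);
    return p;
}
```

No floating-point arithmetic is involved, so nothing here would call
into soft-float routines.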

Rich



* Re: Stdio resource usage
  2019-02-20 15:47     ` Markus Wichmann
  2019-02-20 16:37       ` Szabolcs Nagy
@ 2019-02-20 18:34       ` A. Wilcox
  2019-02-20 19:11         ` Markus Wichmann
  1 sibling, 1 reply; 12+ messages in thread
From: A. Wilcox @ 2019-02-20 18:34 UTC (permalink / raw)
  To: musl



On 02/20/19 09:47, Markus Wichmann wrote:
> It appears as though at least gcc 8 is no longer as inline happy as it
> once was.


I have 0 experience with gcc8, but have you tried explicitly asking?
These CFLAGS look useful:

-finline-functions
-finline-functions-called-once
-finline-small-functions

Best to you and yours,
--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org




* Re: Stdio resource usage
  2019-02-20 18:34       ` A. Wilcox
@ 2019-02-20 19:11         ` Markus Wichmann
  2019-02-20 19:24           ` Rich Felker
  0 siblings, 1 reply; 12+ messages in thread
From: Markus Wichmann @ 2019-02-20 19:11 UTC (permalink / raw)
  To: musl

On Wed, Feb 20, 2019 at 12:34:49PM -0600, A. Wilcox wrote:
> On 02/20/19 09:47, Markus Wichmann wrote:
> > It appears as though at least gcc 8 is no longer as inline happy as it
> > once was.
> 
> 
> I have 0 experience with gcc8, but have you tried explicitly asking?
> These CFLAGS look useful:
> 
> -finline-functions
> -finline-functions-called-once
> -finline-small-functions
> 
> Best to you and yours,
> --arw
> 
> -- 
> A. Wilcox (awilfox)
> Project Lead, Adélie Linux
> https://www.adelielinux.org
> 


For one, that doesn't count, since the whole purpose was to try to
trigger the problem inadvertently. For two, according to the manpage:

| -finline-small-functions
|    [...]
|    Enabled at level -O2.
|
|-finline-functions
|    [...]
|    Enabled at level -O3.
|
|-finline-functions-called-once
|    [...]
|    Enabled at levels -O1, -O2, -O3, and -Os.

I have no idea what the purpose of the enumeration in the last one is,
since the levels are supposed to be cumulative, with -Os being on top of
level 1. Anyway, it appears I inadvertently *did* try all those
switches.

Though I did get curious, and decided to check if my method even works.
I'm running objdump on vfprintf.o and checking for the first stack
allocation in the functions. And let the following be my validation:
clang will inline fmt_fp into printf_core at levels -Os and -O3. And
printf_core will allocate 8k of stack.

Ciao,
Markus



* Re: Stdio resource usage
  2019-02-20 19:11         ` Markus Wichmann
@ 2019-02-20 19:24           ` Rich Felker
  2019-02-21 16:09             ` Markus Wichmann
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2019-02-20 19:24 UTC (permalink / raw)
  To: musl

On Wed, Feb 20, 2019 at 08:11:51PM +0100, Markus Wichmann wrote:
> On Wed, Feb 20, 2019 at 12:34:49PM -0600, A. Wilcox wrote:
> > On 02/20/19 09:47, Markus Wichmann wrote:
> > > It appears as though at least gcc 8 is no longer as inline happy as it
> > > once was.
> > 
> > 
> > I have 0 experience with gcc8, but have you tried explicitly asking?
> > These CFLAGS look useful:
> > 
> > -finline-functions
> > -finline-functions-called-once
> > -finline-small-functions
> > 
> > Best to you and yours,
> > --arw
> > 
> > -- 
> > A. Wilcox (awilfox)
> > Project Lead, Adélie Linux
> > https://www.adelielinux.org
> > 
> 
> 
> For one, that doesn't count, since the whole purpose was to try to
> trigger the problem inadvertently. For two, according to the manpage:
> 
> | -finline-small-functions
> |    [...]
> |    Enabled at level -O2.
> |
> |-finline-functions
> |    [...]
> |    Enabled at level -O3.
> |
> |-finline-functions-called-once
> |    [...]
> |    Enabled at levels -O1, -O2, -O3, and -Os.
> 
> I have no idea what the purpose of the enumeration in the last one is,
> since the levels are supposed to be cumulative, with -Os being on top of
> level 1. Anyway, it appears I inadvertently *did* try all those
> switches.
> 
> Though I did get curious, and decided to check if my method even works.
> I'm running objdump on vfprintf.o and checking for the first stack
> allocation in the functions. And let the following be my validation:
> clang will inline fmt_fp into printf_core at levels -Os and -O3. And
> printf_core will allocate 8k of stack.

For what it's worth, gcc has a -fconserve-stack that in principle
should avoid this problem, but I could never get it to do anything. If
it works now we should probably detect and add it to default CFLAGS.

Rich



* Re: Stdio resource usage
  2019-02-20 19:24           ` Rich Felker
@ 2019-02-21 16:09             ` Markus Wichmann
  2019-02-21 16:27               ` Jens Gustedt
  2019-02-21 17:02               ` Rich Felker
  0 siblings, 2 replies; 12+ messages in thread
From: Markus Wichmann @ 2019-02-21 16:09 UTC (permalink / raw)
  To: musl

On Wed, Feb 20, 2019 at 02:24:23PM -0500, Rich Felker wrote:
> For what it's worth, gcc has a -fconserve-stack that in principle
> should avoid this problem, but I could never get it to do anything. If
> it works now we should probably detect and add it to default CFLAGS.
> 
> Rich

Well, that also doesn't help since gcc is the compiler that *doesn't*
exhibit the problem. clang does. And clang doesn't have an option to
conserve stack (that I've seen).

I am wondering what other possibilities exist to prevent the issue. If
we won't change the algorithm, that only leaves exploring other
possibilities for the memory allocation.

So, what are our choices?

- Heap allocation: But that can fail. Now, printf() is actually allowed
  to fail, but no-one expects it to. I would expect such behavior to be
  problematic at best.
- Static allocation: Without synchronization this won't be thread-safe,
  with synchronization it won't be re-entrant. Now, as far as I could
  see, the printf() family is actually not required to be re-entrant
  (e.g. signal-safety(7) fails to list any of them), but I have seen
  sprintf() in signal handlers in the wild (well, exception handlers,
  really).
- Thread-local static allocation: Which is always a hassle in libc, and
  does not take care of re-entrancy. It would only solve the
  thread-safety issue.
- As-needed stack allocation (e.g. alloca()): This fails to prevent the
  worst case allocation, though it would make the average allocation
  more bearable. But I don't know if especially clever compilers like
  clang wouldn't optimize this stuff away, and we'd be back to square
  one.

Any ideas left?

Ciao,
Markus



* Re: Stdio resource usage
  2019-02-21 16:09             ` Markus Wichmann
@ 2019-02-21 16:27               ` Jens Gustedt
  2019-02-21 17:02               ` Rich Felker
  1 sibling, 0 replies; 12+ messages in thread
From: Jens Gustedt @ 2019-02-21 16:27 UTC (permalink / raw)
  Cc: musl


Hello,

On Thu, 21 Feb 2019 17:09:37 +0100 Markus Wichmann <nullplan@gmx.net>
wrote:

> On Wed, Feb 20, 2019 at 02:24:23PM -0500, Rich Felker wrote:
> So, what are our choices?
> 
> - Heap allocation: But that can fail. Now, printf() is actually
> allowed to fail, but no-one expects it to. I would expect such
> behavior to be problematic at best.
> - Static allocation: Without synchronization this won't be
> thread-safe, with synchronization it won't be re-entrant. Now, as far
> as I could see, the printf() family is actually not required to be
> re-entrant (e.g. signal-safety(7) fails to list any of them), but I
> have seen sprintf() in signal handlers in the wild (well, exception
> handlers, really).
> - Thread-local static allocation: Which is always a hassle in libc,
> and does not take care of re-entrancy. It would only solve the
>   thread-safety issue.
> - As-needed stack allocation (e.g. alloca()): This fails to prevent
> the worst case allocation, though it would make the average allocation
>   more bearable. But I don't know if especially clever compilers like
>   clang wouldn't optimize this stuff away, and we'd be back to square
>   one.

Perhaps the latter, but maybe with VLA? Unfortunately these techniques
have no reliable error detection mechanism.

For the static allocation strategy one could try to implement
something like a "bounded" stack, that is two or three versions of the
data in an array, protected by a lock and a counter, such that at least
one level of signal handler could still use it. But this is probably a
bit tedious to implement.
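
One way such a bounded pool might look, as a rough sketch only. It
swaps the lock and counter for one C11 atomic flag per slot, and it
assumes atomic_bool is lock-free on the target so the exchange is usable
from a signal handler. All names and sizes here are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { POOL_SLOTS = 3, SLOT_BYTES = 8192 };  /* hypothetical sizes */

static unsigned char pool[POOL_SLOTS][SLOT_BYTES];
static atomic_bool in_use[POOL_SLOTS];  /* static storage: starts out false */

/* Returns a free slot, or NULL if every slot is taken. */
void *scratch_acquire(void)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        /* atomic_exchange returns the old value; false means we claimed it */
        if (!atomic_exchange(&in_use[i], true))
            return pool[i];
    }
    return NULL;
}

/* Marks a slot obtained from scratch_acquire() as free again. */
void scratch_release(void *p)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (p == (void *)pool[i]) {
            atomic_store(&in_use[i], false);
            return;
        }
    }
}
```

With three slots, a main-line call plus two nested handlers can each get
a buffer. A fourth concurrent caller gets NULL and has to cope with
that, so the pool bounds memory but does not remove the failure case.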

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::



* Re: Stdio resource usage
  2019-02-21 16:09             ` Markus Wichmann
  2019-02-21 16:27               ` Jens Gustedt
@ 2019-02-21 17:02               ` Rich Felker
  1 sibling, 0 replies; 12+ messages in thread
From: Rich Felker @ 2019-02-21 17:02 UTC (permalink / raw)
  To: musl

On Thu, Feb 21, 2019 at 05:09:37PM +0100, Markus Wichmann wrote:
> On Wed, Feb 20, 2019 at 02:24:23PM -0500, Rich Felker wrote:
> > For what it's worth, gcc has a -fconserve-stack that in principle
> > should avoid this problem, but I could never get it to do anything. If
> > it works now we should probably detect and add it to default CFLAGS.
> > 
> > Rich
> 
> Well, that also doesn't help since gcc is the compiler that *doesn't*
> exhibit the problem. clang does. And clang doesn't have an option to
> conserve stack (that I've seen).
> 
> I am wondering what other possibilities exist to prevent the issue. If
> we won't change the algorithm, that only leaves exploring other
> possibilities for the memory allocation.

There is no algorithm that takes less space, at least not without some
kind of cubic-in-exponent-value or worse time. The amount of space we use
is optimal up to some small factor. It might be possible to shrink
this factor with a sharper bound on number of digits needed, with no
change in the algorithm, but I think the reduction would be at most
something like 20%.

> So, what are our choices?
> 
> - Heap allocation: But that can fail. Now, printf() is actually allowed
>   to fail, but no-one expects it to. I would expect such behavior to be
>   problematic at best.

printf can fail for valid reasons, but snprintf cannot. Technically
POSIX allows any interface that can fail to be able to fail for
additional implementation-defined reasons, but this is unacceptably
bad QoI and completely contrary to the principles of musl, that
nothing fails unless there's an underlying reason it has to be able to
fail.

> - Static allocation: Without synchronization this won't be thread-safe,
>   with synchronization it won't be re-entrant. Now, as far as I could
>   see, the printf() family is actually not required to be re-entrant
>   (e.g. signal-safety(7) fails to list any of them), but I have seen
>   sprintf() in signal handlers in the wild (well, exception handlers,
>   really).

If you can afford to increase .data size by ~8k, why can't you just
increase stack size by ~8k instead? Of course the latter would scale
in number of threads, but presumably if you're this
resource-constrained you're not using threads, or can avoid using
printf from most of them.

> - Thread-local static allocation: Which is always a hassle in libc, and
>   does not take care of re-entrancy. It would only solve the
>   thread-safety issue.

This is strictly worse than just using the stack. Implementation-wise,
the TLS is equivalent to a stack object on the top-level call frame of
the thread. There's no reason to put it there rather than in the
bottom-level call frame.

> - As-needed stack allocation (e.g. alloca()): This fails to prevent the
>   worst case allocation, though it would make the average allocation
>   more bearable. But I don't know if especially clever compilers like
>   clang wouldn't optimize this stuff away, and we'd be back to square
>   one.

This is what I already suggested (via VLA, not alloca, as the latter
is not C and worse in most ways) as a workaround for the clang
hoisting of allocations. But in principle the compiler could still see
that if the declaration is reachable the size is constant (or even
close enough to constant that it could just optimize to a fixed-size
array of the upper bound), and optimize out its being variable, then
hoist it. So this really is a hack that's "tricking the optimizer",
not any fundamental fix.

> Any ideas left?

Getting clang to fix their hoisting of (large) stack objects beyond
their scope/lifetime?

Rich


