possible getopt stderr output changes

mailing list of musl libc

* possible getopt stderr output changes
@ 2014-12-11  0:10 Rich Felker
  2014-12-11  3:53 ` Laurent Bercot
  2014-12-11 22:07 ` possible getopt stderr output changes Rich Felker
  0 siblings, 2 replies; 12+ messages in thread
From: Rich Felker @ 2014-12-11  0:10 UTC (permalink / raw)
  To: musl

The current getopt code uses some ugly write() sequences to generate
its output to stderr, and fails to support message translation. The
latter was an oversight when locale/translation support was added and
should absolutely be fixed. I'm not sure whether we should leave the
code using write() though or switch to fprintf.

The original motivation for write() was to avoid pulling in the printf
core and stdio in programs that use getopt but otherwise don't need
printf/stdio. However, the use of multiple write() calls splits the
messages up into multiple syscalls unnecessarily (increasing the
likelihood of getting output interleaved with other processes running
in parallel on the same stderr) and failure to use the stderr FILE
makes it so the output is not even atomic within the same process. I
don't think there's any formal requirement of atomicity here, but it
could be seen as a QoI issue.

Note that even converted to use fprintf, the code would still be
mildly ugly, since it would have to use multiple %s formats and locale
lookups to construct the message. This is because musl security policy
forbids use of translatable format strings in libc; instead,
translatable literals have to be used and processed by a fixed,
non-translated format string.
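
A minimal sketch of the constraint described above, for illustration only. The lookup function translate() and the helper format_optmsg() are hypothetical names, not musl internals. The point is that the format string stays a fixed literal and only the message text passes through translation; the same fixed format could be handed to fprintf(stderr, ...) instead of snprintf.

```c
#include <stdio.h>

/* Hypothetical stand-in for a message-catalog lookup. A real
 * implementation would consult the active locale; this one just
 * returns its argument unchanged. */
const char *translate(const char *msg)
{
    return msg;
}

/* Format a getopt-style diagnostic into buf. The format string is a
 * fixed, untranslated literal; only the text substituted for the
 * second %s comes from translation. Returns what snprintf returns. */
int format_optmsg(char *buf, size_t size,
                  const char *prog, const char *msg, int opt)
{
    return snprintf(buf, size, "%s: %s: %c\n", prog, translate(msg), opt);
}
```

Because a translator can only supply the string substituted for %s, a malicious or broken catalog cannot introduce conversion specifiers of its own.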

Thoughts on what color the bikeshed should be?

Rich

* Re: possible getopt stderr output changes
  2014-12-11  0:10 possible getopt stderr output changes Rich Felker
@ 2014-12-11  3:53 ` Laurent Bercot
  2014-12-11  6:44   ` Rich Felker
  2014-12-11 22:07 ` possible getopt stderr output changes Rich Felker
  1 sibling, 1 reply; 12+ messages in thread
From: Laurent Bercot @ 2014-12-11  3:53 UTC (permalink / raw)
  To: musl

On 11/12/2014 01:10, Rich Felker wrote:
> The current getopt code uses some ugly write() sequences to generate
> its output to stderr, and fails to support message translation. The
> latter was an oversight when locale/translation support was added and
> should absolutely be fixed. I'm not sure whether we should leave the
> code using write() though or switch to fprintf.

  For what it's worth, I may use getopt() sometime, but I will never, ever
use stdio, which should burn in the deepest pits of Hell, and I'm being
nuanced here.
  Please don't tie a reasonable interface to the flying kitchen sink
monster just because it's guilty of having to write stuff to stderr in
one particular case. It doesn't deserve that much punishment.

> printf/stdio. However, the use of multiple write() calls splits the
> messages up into multiple syscalls unnecessarily (increasing the
> likelihood of getting output interleaved with other processes running
> in parallel on the same stderr)

  It is rare for getopt to return a parsing error when the program is
used without an interactive terminal: scripts are usually debugged
before they're daemonized. Most use cases of getopt writing to stderr
are interactive, so the likelihood of interleaving output is low.

  That said, I'm all for buffering, but is there anything more to do
than print localized versions of "illegal option" and "option requires
an argument", with some locale-independent data prepended and appended ?
Isn't it possible to compute the size of the final string in advance,
and build it in a temporary buffer on the stack, before writing ?
It's simple buffering: neither stdio's formatting engine, nor its
FILE plate of noodles, are needed.
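
The manual buffering suggested above could look roughly like the following sketch. It assumes short options only, so every component has a known length. OPTMSG_MAX, build_optmsg() and write_optmsg() are hypothetical names chosen for illustration and are not taken from musl.

```c
#include <string.h>
#include <unistd.h>

/* Assumed cap on the message size, for illustration only. */
#define OPTMSG_MAX 256

/* Assemble "prog: msg: c\n" into buf, which holds cap bytes.
 * Returns the message length, or 0 if it would not fit. */
size_t build_optmsg(char *buf, size_t cap,
                    const char *prog, const char *msg, char opt)
{
    size_t plen = strlen(prog), mlen = strlen(msg);
    size_t total = plen + 2 + mlen + 2 + 1 + 1; /* ": " twice, opt, '\n' */
    size_t pos = 0;

    if (total > cap) return 0;
    memcpy(buf + pos, prog, plen); pos += plen;
    memcpy(buf + pos, ": ", 2);    pos += 2;
    memcpy(buf + pos, msg, mlen);  pos += mlen;
    memcpy(buf + pos, ": ", 2);    pos += 2;
    buf[pos++] = opt;
    buf[pos++] = '\n';
    return pos;
}

/* Emit the diagnostic with a single write() call. Returns 0 on
 * success, -1 if the message did not fit or the write failed or
 * was short. */
int write_optmsg(int fd, const char *prog, const char *msg, char opt)
{
    char buf[OPTMSG_MAX];
    size_t len = build_optmsg(buf, sizeof buf, prog, msg, opt);

    if (!len) return -1;
    return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}
```

A caller would use something like write_optmsg(2, argv[0], "illegal option", c). The sketch has no answer for components of unbounded length; it simply refuses messages that do not fit.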

> Thoughts on what color the bikeshed should be?

  I don't mind the color, but let's keep it SUV-free.

-- 
  Laurent

* Re: possible getopt stderr output changes
  2014-12-11  3:53 ` Laurent Bercot
@ 2014-12-11  6:44   ` Rich Felker
  2014-12-11 15:40     ` Laurent Bercot
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2014-12-11  6:44 UTC (permalink / raw)
  To: musl

On Thu, Dec 11, 2014 at 04:53:52AM +0100, Laurent Bercot wrote:
> On 11/12/2014 01:10, Rich Felker wrote:
> >The current getopt code uses some ugly write() sequences to generate
> >its output to stderr, and fails to support message translation. The
> >latter was an oversight when locale/translation support was added and
> >should absolutely be fixed. I'm not sure whether we should leave the
> >code using write() though or switch to fprintf.
> 
>  For what it's worth, I may use getopt() sometime, but I will never, ever
> use stdio, which should burn in the deepest pits of Hell, and I'm being
> nuanced here.

Is there a reason behind this? On my build, the printf core is ~6.5k
and the other parts of stdio you might be likely to pull in are under
2k. I'm happy to take your opinion into consideration but it would be
nice to have some rationale.

>  Please don't tie a reasonable interface to the flying kitchen sink
> monster just because it's guilty of having to write stuff to stderr in
> one particular case. It doesn't deserve that much punishment.

Personally I find stdio a lot more reasonable than getopt. The latter
has ugly global state, including possibly hidden internal state with
no standard way to reset it. It works well enough for most things
(because you can pretend the global state is a sort of main-local
state), but it's a problem if you want to handle multiple virtual
command lines in the same process (things like busybox-type shell with
builtins, or a program handling input from network, GUI, etc. as
command lines to be parsed like options, etc.).
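
To illustrate the global-state problem, here is a hedged sketch of parsing more than one "virtual command line" in a single process. count_known_opts() is a hypothetical helper. Assigning 1 to optind is only the conventional way to restart a scan; it is assumed here to be sufficient because each scan runs to completion, and it is not guaranteed to clear whatever hidden state an implementation keeps.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Count the options in argv that optstring accepts, scanning to the
 * end of the argument list. Hypothetical helper, not part of any libc. */
int count_known_opts(int argc, char *const argv[], const char *optstring)
{
    int c, n = 0;

    opterr = 0;   /* keep getopt from printing its own diagnostics */
    optind = 1;   /* conventional restart; hidden state may survive this */
    while ((c = getopt(argc, argv, optstring)) != -1)
        if (c != '?') n++;
    return n;
}
```

Implementations differ on how to force a full reset (glibc documents assigning 0 to optind, the BSDs provide optreset), which is exactly the portability gap being described. Abandoning a scan partway through a clustered argument such as "-ab" is the case where the simple optind = 1 restart is most likely to misbehave.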

> >printf/stdio. However, the use of multiple write() calls splits the
> >messages up into multiple syscalls unnecessarily (increasing the
> >likelihood of getting output interleaved with other processes running
> >in parallel on the same stderr)
> 
>  It is rare for getopt to return a parsing error when the program is
> used without an interactive terminal: scripts are usually debugged
> before they're daemonized. Most use cases of getopt writing to stderr
> are interactive, so the likelihood of interleaving output is low.

This is certainly true.

>  That said, I'm all for buffering, but is there anything more to do
> than print localized versions of "illegal option" and "option requires
> an argument", with some locale-independent data prepended and appended ?
> Isn't it possible to compute the size of the final string in advance,
> and build it in a temporary buffer on the stack, before writing ?
> It's simple buffering: neither stdio's formatting engine, nor its
> FILE plate of noodles, are needed.

For proper reporting of errors with long options (note: currently this
is not done right), at least one component of the message, the option
name, has unbounded size, so there's no simple way to generate the
whole message in a buffer. And even if we just did as much as we
could, the code for buffering would be ugly and increase code size by
at least a few hundred bytes I think. So this doesn't sound like much
of a win over just doing the current multiple-write() approach.

And yes you're right about the nature of the translatable portion and
locale-independent portion of the messages.

Rich

* Re: possible getopt stderr output changes
  2014-12-11  6:44   ` Rich Felker
@ 2014-12-11 15:40     ` Laurent Bercot
  2014-12-11 17:51       ` stdio [de]merits discussion [Re: [musl] possible getopt stderr output changes] Rich Felker
  0 siblings, 1 reply; 12+ messages in thread
From: Laurent Bercot @ 2014-12-11 15:40 UTC (permalink / raw)
  To: musl

On 11/12/2014 07:44, Rich Felker wrote:
> Is there a reason behind this? On my build, the printf core is ~6.5k
> and the other parts of stdio you might be likely to pull in are under
> 2k. I'm happy to take your opinion into consideration but it would be
> nice to have some rationale.

  6.5k, or even 8.5k, is not much in the grand scale of things, but it's
about the ratio of useful pulled in code / total pulled in code, which
I like to be as close as possible to 1. And stdio tanks that ratio,
see below. The modest size of the printf code is a testimony to the
efficiency of the musl implementation, not to the sanity of the
interface.


> Personally I find stdio a lot more reasonable than getopt.

  I dislike stdio for several reasons:

  - The formatting engine is certainly convenient, but it is basically
a runtime interpreter, which has to be entirely pulled in as soon as
there's a format string, no matter how simple the formatting is.
(Unless compilers perform specific static analysis on format strings
to know which part of the interpreter they have to pull, but I doubt
this is the case; gcc magically replaces printf(x) with puts(x) when
x is devoid of format operations, and it is ugly enough as is.)
That means I have to pull in the formatting code for floating point
numbers, even if I only handle integers and strings; I have to pull in
the code for edge cases of the specification, including the infamous
"%n$" format, even if I never need it; I have to pull in varargs even
if I only do very regular things with a fixed number of arguments.
Most of the time I just want to print a string, a character, or an
integer: being able to do this shouldn't add more than 2k to my
executable, at most.

  - The FILE interface is not by any measure suited to reliable I/O.
  When printf fails, there's no way to know how many bytes have been
written to the descriptor. Same with fclose: if it fails, and the
buffer was not empty, there's no way to know if everything was written.
Having the same structure for buffered (stdout) and unbuffered (stderr)
output is unnecessarily confusing; and don't get me started on buffered
input, the details of which users have exactly zero control over. FILE
is totally unusable for asynchronous I/O, which is 99% of what I do;
it's just good enough to write error messages to stderr, where you don't
need accurate reporting - in which case you can even do without stdio
because stderr is unbuffered anyway.

  stdio, like a lot of today's standards, is only there because it's
historical, and interface designers didn't know better at the time.
It being a widely used and established standard doesn't mean that
it's a good standard, by far.


> [getopt]
> has ugly global state, including possibly hidden internal state with
> no standard way to reset it. It works well enough for most things
> (because you can pretend the global state is a sort of main-local
> state), but it's a problem if you want to handle multiple virtual
> command lines in the same process

  I agree, it's ugly; but global state is a known problem and it's
easy to fix. It's already been fixed for pwd/grp/netdb, for localtime,
and a lot of other interfaces; it's only a matter of time before some
kind of getopt_r() is standardized.


> For proper reporting of errors with long options (note: currently this
> is not done right), at least one component of the message, the option
> name, has unbounded size, so there's no simple way to generate the
> whole message in a buffer.

  Ah, long options. I have no idea how feasible it is to keep getopt and
getopt_long as separated as possible, but I wouldn't mind at all if
getopt_long (but not getopt) relied on stdio. Because programs using
getopt_long are likely to already be using stdio anyway, and this is
probably GNU so no one cares about code size. :)


> So this doesn't sound like much
> of a win over just doing the current multiple-write() approach.

  Since it mostly happens in the interactive case, avoiding multiple
writes is essentially an artistic consideration. I was just interested
in learning why you hadn't suggested manual buffering.

-- 
  Laurent


* stdio [de]merits discussion [Re: [musl] possible getopt stderr output changes]
  2014-12-11 15:40     ` Laurent Bercot
@ 2014-12-11 17:51       ` Rich Felker
  2014-12-11 23:05         ` Laurent Bercot
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2014-12-11 17:51 UTC (permalink / raw)
  To: musl

On Thu, Dec 11, 2014 at 04:40:15PM +0100, Laurent Bercot wrote:
> On 11/12/2014 07:44, Rich Felker wrote:
> >Is there a reason behind this? On my build, the printf core is ~6.5k
> >and the other parts of stdio you might be likely to pull in are under
> >2k. I'm happy to take your opinion into consideration but it would be
> >nice to have some rationale.
> 
>  6.5k, or even 8.5k, is not much in the grand scale of things, but it's
> about the ratio of useful pulled in code / total pulled in code, which
> I like to be as close as possible to 1. And stdio tanks that ratio,
> see below. The modest size of the printf code is a testimony to the
> efficiency of the musl implementation, not to the sanity of the
> interface.
> 
> 
> >Personally I find stdio a lot more reasonable than getopt.
> 
>  I dislike stdio for several reasons:
> 
>  - The formatting engine is certainly convenient, but it is basically

I like it because in all but the tiniest programs, you end up needing
this kind of functionality, and whenever somebody rolls their own,
it's inevitably 10x to 100x larger and uglier than musl's printf core.

> a runtime interpreter, which has to be entirely pulled in as soon as
> there's a format string, no matter how simple the formatting is.
> (Unless compilers perform specific static analysis on format strings
> to know which part of the interpreter they have to pull, but I doubt
> this is the case; gcc magically replaces printf(x) with puts(x) when
> x is devoid of format operations, and it is ugly enough as is.)
> That means I have to pull in the formatting code for floating point
> numbers, even if I only handle integers and strings; I have to pull in
> the code for edge cases of the specification, including the infamous
> "%n$" format, even if I never need it; I have to pull in varargs even
> if I only do very regular things with a fixed number of arguments.
> Most of the time I just want to print a string, a character, or an
> integer: being able to do this shouldn't add more than 2k to my
> executable, at most.

Of all that, the only thing contributing non-trivial size is floating
point support.

>  - The FILE interface is not by any measure suited to reliable I/O.

This is certainly true.

>  When printf fails, there's no way to know how many bytes have been
> written to the descriptor.

For seekable files, ftello can tell you. Generally I agree with this
reasoning, that stdio is not the right tool for working in-place on
valuable files. But it's perfectly usable for producing new output in
cases where all write errors will simply result in failing the whole
"make a file" operation.

> Same with fclose: if it fails, and the
> buffer was not empty, there's no way to know if everything was written.

This is solved by fflush before fclose.
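
As a small sketch of that pattern (flush_and_close() is a hypothetical helper name): flushing explicitly first means a failure to write out buffered data is reported by fflush, separately from any failure of the close itself.

```c
#include <stdio.h>

/* Flush, then close. Returns 0 only if the buffered data was written
 * out successfully and the close succeeded. The stream is closed in
 * every case and must not be used afterwards. */
int flush_and_close(FILE *f)
{
    int err = 0;

    if (fflush(f) == EOF) err = -1;   /* buffered data could not be written */
    if (fclose(f) == EOF) err = -1;   /* the close itself failed */
    return err;
}
```

This assumes no other thread still has access to the stream when it is closed.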

> Having the same structure for buffered (stdout) and unbuffered (stderr)
> output is unnecessarily confusing; and don't get me started on buffered
> input, the details of which users have exactly zero control over. FILE

Stdio read operations should not block unless more data is needed to
satisfy the actual request the application is making. If they do it's
an implementation bug. Of course it's not usable with select/poll
loops because you can't see if there's data already in the buffer. GNU
software (gnulib in particular) likes to ignore this problem by poking
at internals; we gave them an alternate solution with musl a couple
years back just to avoid this. :(

> is totally unusable for asynchronous I/O, which is 99% of what I do;

For event-driven models, yes. For threaded models, it's quite usable
and IMO it simplifies code by a larger factor than the size it adds,
in cases where it's sufficient.

> it's just good enough to write error messages to stderr, where you don't
> need accurate reporting - in which case you can even do without stdio
> because stderr is unbuffered anyway.

The big thing it provides here is a standard point of synchronization
for error messages in multithreaded programs. Otherwise there would be
no lock for different library components to agree on to prevent
interleaved error output.

>  stdio, like a lot of today's standards, is only there because it's
> historical, and interface designers didn't know better at the time.
> It being a widely used and established standard doesn't mean that
> it's a good standard, by far.

Yes and no. There are some things that could have been done better,
and some backwards-compatible additions that could be made to make it
a lot more useful, but I think stdio still largely succeeds in freeing
the programmer from having to spend lots of effort on IO code, for a
large class of useful programs (certainly not all, though!).

> >So this doesn't sound like much
> >of a win over just doing the current multiple-write() approach.
> 
>  Since it mostly happens in the interactive case, avoiding multiple
> writes is essentially an artistic consideration. I was just interested
> in learning why you hadn't suggested manual buffering.

I agree that it's essentially an artistic consideration.

Rich

* Re: stdio [de]merits discussion [Re: [musl] possible getopt stderr output changes]
  2014-12-11 17:51       ` stdio [de]merits discussion [Re: [musl] possible getopt stderr output changes] Rich Felker
@ 2014-12-11 23:05         ` Laurent Bercot
  2014-12-11 23:35           ` Rich Felker
  2014-12-12  2:33           ` Morten Welinder
  0 siblings, 2 replies; 12+ messages in thread
From: Laurent Bercot @ 2014-12-11 23:05 UTC (permalink / raw)
  To: musl

On 11/12/2014 18:51, Rich Felker wrote:
> I like it because in all but the tiniest programs, you end up needing
> this kind of functionality, and whenever somebody rolls their own,
> it's inevitably 10x to 100x larger and uglier than musl's printf core.

  You haven't tried skalibs. ;)
  (Sure, calling format functions individually is far from being as
convenient, but the resulting code path is much shorter and there's
no bloat.)
  I agree we need standards. I just wish the existing standards were
better, and I don't want to be forced to use them.

> Of all that, the only thing contributing non-trivial size is floating
> point support.

  Yes, that's the main thing, but it's an important one: in system
programming, floating point operations are uncommon - someone who
cares about code size is probably not using floating points.

> For seekable files, ftello can tell you.

  Same thing: system programming is more about pipes and sockets than
seekable files. In applications that write files, the interesting
logic is probably not in the I/O, and they don't care.

> But it's perfectly usable for producing new output in
> cases where all write errors will simply result in failing the whole
> "make a file" operation.

  I agree.

> This is solved by fflush before fclose.

  I'm surprised that you of all people say this. What if another thread
writes to the FILE between the fflush and the fclose ? Granted, if the
situation arises, it's probably a programming error, but still, since
atomicity is a big thing for FILE, needing 2 operations instead of 1
doesn't scream good design.

> GNU software (gnulib in particular) likes to ignore this problem by poking
> at internals; we gave them an alternate solution with musl a couple
> years back just to avoid this. :(

  Jesus. And you still argue that it's a usable interface, if people have
to hack internal implementation details to get a simple readability
notification working ?

> For event-driven models, yes. For threaded models, it's quite usable
> and IMO it simplifies code by a larger factor than the size it adds,
> in cases where it's sufficient.

  "If you can't write asynchronous code, use threads and write synchronous
code." :-Þ
  I agree that threads are a good paradigm to have, but the choice of
which model to use should not be dictated by the indigence of available
interfaces.

> The big thing it provides here is a standard point of synchronization
> for error messages in multithreaded programs. Otherwise there would be
> no lock for different library components to agree on to prevent
> interleaved error output.

  write() guarantees atomicity up to PIPE_BUF bytes. I have never seen
an stderr error message that was bigger than that.

> Yes and no. There are some things that could have been done better,
> and some backwards-compatible additions that could be made to make it
> a lot more useful, but I think stdio still largely succeeds in freeing
> the programmer from having to spend lots of effort on IO code, for a
> large class of useful programs (certainly not all, though!).

  I agree it's good enough for Hello World and applications that just
need very basic I/O. What irks me is that stdio sets a potential barrier
to designing better I/O interfaces, and people who need reliable I/O
management often still contort themselves to use stdio, and the results
are ugly. See the aforementioned gnulib case.

-- 
  Laurent

* Re: stdio [de]merits discussion [Re: [musl] possible getopt stderr output changes]
  2014-12-11 23:05         ` Laurent Bercot
@ 2014-12-11 23:35           ` Rich Felker
  2014-12-12  2:33           ` Morten Welinder
  1 sibling, 0 replies; 12+ messages in thread
From: Rich Felker @ 2014-12-11 23:35 UTC (permalink / raw)
  To: musl

On Fri, Dec 12, 2014 at 12:05:28AM +0100, Laurent Bercot wrote:
> On 11/12/2014 18:51, Rich Felker wrote:
> >I like it because in all but the tiniest programs, you end up needing
> >this kind of functionality, and whenever somebody rolls their own,
> >it's inevitably 10x to 100x larger and uglier than musl's printf core.
> 
>  You haven't tried skalibs. ;)

I'm not thinking of things I've tried using myself but rather things
I've seen in mainstream, deployed software. Hideous stuff like APR.
Let me know when you convince folks using APR, NSPR, etc. to switch to
skalibs. :-)

I suppose my use of the word "inevitably" was wrong, but I was trying
to talk about stuff I've seen happen in the wild rather than what's in
the realm of possibility.

> >This is solved by fflush before fclose.
> 
>  I'm surprised that you of all people say this. What if another thread
> writes to the FILE between the fflush and the fclose ? Granted, if the
> situation arises, it's probably a programming error, but still, since
> atomicity is a big thing for FILE, needing 2 operations instead of 1
> doesn't scream good design.

In that case you have UB, since the write from another thread could
equally happen after the fclose (it's unsynchronized) resulting in use
of an invalid FILE*. So it's not an interesting case. The point being:
you can't close a FILE without ensuring that no other threads could
still be accessing it. In practice, aside from the standard streams,
it's really unusual to have simultaneous accesses to the same stream
from multiple threads anyway.

> >GNU software (gnulib in particular) likes to ignore this problem by poking
> >at internals; we gave them an alternate solution with musl a couple
> >years back just to avoid this. :(
> 
>  Jesus. And you still argue that it's a usable interface, if people have
> to hack internal implementation details to get a simple readability
> notification working ?

No, I think this is a misuse of it for something it's not good for. I
don't advocate this kind of hackery at all. The only reason I went to
the effort to support it and negotiate a solution with the gnulib
people was that otherwise we risked them writing hacks to poke at the
internals, which are intentionally allowed to change between versions
of libc.so, thereby making binaries that crash and burn when libc.so
is upgraded. Given the rate of musl adoption at the time, this would have
just made musl look bad, even if it was 100% their fault. It would
also have been a big mess for distros to fix.

> >For event-driven models, yes. For threaded models, it's quite usable
> >and IMO it simplifies code by a larger factor than the size it adds,
> >in cases where it's sufficient.
> 
>  "If you can't write asynchronous code, use threads and write synchronous
> code." :-Þ
>  I agree that threads are a good paradigm to have, but the choice of
> which model to use should not be dictated by the indigence of available
> interfaces.

Agreed. I think stdio is a good choice for many things if you are
using threads though.

> >The big thing it provides here is a standard point of synchronization
> >for error messages in multithreaded programs. Otherwise there would be
> >no lock for different library components to agree on to prevent
> >interleaved error output.
> 
>  write() guarantees atomicity up to PIPE_BUF bytes. I have never seen
> an stderr error message that was bigger than that.

Only for pipes. Ordinary files are also required by POSIX to have
atomic write(), but Linux fails to deliver on this requirement, and
the only correct way for a kernel to deliver is by returning a short
write (which undermines the atomicity) when the full write would
sleep. Terminals have no atomicity at all.

Stdio cannot make the fd atomic, but it does make _arbitrarily large_
output atomic within a single process (i.e. assuming other processes
aren't writing to the file at the same time) simply by doing the
locking in userspace where it's not a DoS issue.

> >Yes and no. There are some things that could have been done better,
> >and some backwards-compatible additions that could be made to make it
> >a lot more useful, but I think stdio still largely succeeds in freeing
> >the programmer from having to spend lots of effort on IO code, for a
> >large class of useful programs (certainly not all, though!).
> 
>  I agree it's good enough for Hello World and applications that just
> need very basic I/O.

I think it also works fine for basically all programs that fit the
unix model of running as a "do one thing and do it well" utility that
reads input from stdin and/or one or more files and writes output to
one file (possibly stdout).

> What irks me is that stdio sets a potential barrier
> to designing better I/O interfaces, and people who need reliable I/O
> management often still contort themselves to use stdio, and the results
> are ugly. See the aforementioned gnulib case.

Perhaps I should look at how skalibs does it.

Rich

* Re: stdio [de]merits discussion [Re: [musl] possible getopt stderr output changes]
  2014-12-11 23:05         ` Laurent Bercot
  2014-12-11 23:35           ` Rich Felker
@ 2014-12-12  2:33           ` Morten Welinder
  1 sibling, 0 replies; 12+ messages in thread
From: Morten Welinder @ 2014-12-12  2:33 UTC (permalink / raw)
  To: musl

It sounds a bit like the main beef you have with printf is that
it pulls in floating point stuff.  Absent that, it really isn't a lot
of code and it really doesn't eat much stack space.

So maybe what you are looking for is a way to compile
away all floating point support.

M.


On Thu, Dec 11, 2014 at 6:05 PM, Laurent Bercot
<ska-dietlibc@skarnet.org> wrote:
> On 11/12/2014 18:51, Rich Felker wrote:
>>
>> I like it because in all but the tiniest programs, you end up needing
>> this kind of functionality, and whenever somebody rolls their own,
>> it's inevitably 10x to 100x larger and uglier than musl's printf core.
>
>
>  You haven't tried skalibs. ;)
>  (Sure, calling format functions individually is far from being as
> convenient, but the resulting code path is much shorter and there's
> no bloat.)
>  I agree we need standards. I just wish the existing standards were
> better, and I don't want to be forced to use them.
>
>
>> Of all that, the only thing contributing non-trivial size is floating
>> point support.
>
>
>  Yes, that's the main thing, but it's an important one: in system
> programming, floating point operations are uncommon - someone who
> cares about code size is probably not using floating points.
>
>
>> For seekable files, ftello can tell you.
>
>
>  Same thing: system programming is more about pipes and sockets than
> seekable files. In applications that write files, the interesting
> logic is probably not in the I/O, and they don't care.
>
>
>> But it's perfectly usable for producing new output in
>> cases where all write errors will simply result in failing the whole
>> "make a file" operation.
>
>
>  I agree.
>
>
>> This is solved by fflush before fclose.
>
>
>  I'm surprised that you of all people say this. What if another thread
> writes to the FILE between the fflush and the fclose ? Granted, if the
> situation arises, it's probably a programming error, but still, since
> atomicity is a big thing for FILE, needing 2 operations instead of 1
> doesn't scream good design.
>
>
>> GNU software (gnulib in particular) likes to ignore this problem by poking
>> at internals; we gave them an alternate solution with musl a couple
>> years back just to avoid this. :(
>
>
>  Jesus. And you still argue that it's a usable interface, if people have
> to hack internal implementation details to get a simple readability
> notification working ?
>
>
>> For event-driven models, yes. For threaded models, it's quite usable
>> and IMO it simplifies code by a larger factor than the size it adds,
>> in cases where it's sufficient.
>
>
>  "If you can't write asynchronous code, use threads and write synchronous
> code." :-Þ
>  I agree that threads are a good paradigm to have, but the choice of
> which model to use should not be dictated by the indigence of available
> interfaces.
>
>
>> The big thing it provides here is a standard point of synchronization
>> for error messages in multithreaded programs. Otherwise there would be
>> no lock for different library components to agree on to prevent
>> interleaved error output.
>
>
>  write() guarantees atomicity up to PIPE_BUF bytes. I have never seen
> an stderr error message that was bigger than that.
>
>
>> Yes and no. There are some things that could have been done better,
>> and some backwards-compatible additions that could be made to make it
>> a lot more useful, but I think stdio still largely succeeds in freeing
>> the programmer from having to spend lots of effort on IO code, for a
>> large class of useful programs (certainly not all, though!).
>
>
>  I agree it's good enough for Hello World and applications that just
> need very basic I/O. What irks me is that stdio sets a potential barrier
> to designing better I/O interfaces, and people who need reliable I/O
> management often still contort themselves to use stdio, and the results
> are ugly. See the aforementioned gnulib case.
>
> --
>  Laurent
>


* Re: possible getopt stderr output changes
  2014-12-11  0:10 possible getopt stderr output changes Rich Felker
  2014-12-11  3:53 ` Laurent Bercot
@ 2014-12-11 22:07 ` Rich Felker
  2014-12-13  0:02   ` Isaac Dunham
  2014-12-19 21:49   ` Rich Felker
  1 sibling, 2 replies; 12+ messages in thread
From: Rich Felker @ 2014-12-11 22:07 UTC (permalink / raw)
  To: musl

On Wed, Dec 10, 2014 at 07:10:32PM -0500, Rich Felker wrote:
> The current getopt code uses some ugly write() sequences to generate
> its output to stderr, and fails to support message translation. The
> latter was an oversight when locale/translation support was added and
> should absolutely be fixed. I'm not sure whether we should leave the
> code using write() though or switch to fprintf.

It's been pointed out on irc that POSIX requires ferror(stderr) to be
set if writing the message fails. However fwrite could still be used
instead of fprintf. If we need to use stdio at all, however, I'd lean
towards wanting to make the whole write atomic (i.e. hold the lock for
the whole time) which is more of a pain without fprintf. So basically
we're looking at:

fprintf:
PROS: smaller and simpler code in getopt.c, only one syscall
CONS: ~6.5k of additional code pulled in for static linking

fwrite:
PROS: minimal static linking deps
CONS: need to use flockfile (or implementation internals) for
atomicity if desired, and multiple writes (so no atomicity on the fd)

Rich
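
For readers weighing the fwrite option, here is a minimal sketch of what
it might look like using only the public stdio API. It is not code from
this thread or from musl; the helper name emit_locked and its parameters
are made up for illustration. It holds the stream lock with flockfile so
the pieces stay together within the process, but on an unbuffered stream
each fwrite can still become its own write(), so nothing is guaranteed
on the fd.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (an assumption, not musl code): writes
 * "<prog><msg><opt>\n" to f while holding the stream lock.
 * Returns 0 on success, -1 if any fwrite comes up short. */
static int emit_locked(FILE *f, const char *prog, const char *msg,
                       const char *opt, size_t optlen)
{
	size_t pl = strlen(prog), ml = strlen(msg);
	int ok;

	flockfile(f);                   /* keep other threads off f */
	ok = fwrite(prog, 1, pl, f) == pl
	  && fwrite(msg, 1, ml, f) == ml
	  && fwrite(opt, 1, optlen, f) == optlen
	  && fwrite("\n", 1, 1, f) == 1;
	funlockfile(f);
	return ok ? 0 : -1;
}
```

A caller in getopt's position would presumably pass stderr, argv[0], the
message literal, and the option character with its byte length. On a
short write stdio sets the stream's error indicator, which is what the
ferror(stderr) requirement mentioned above is about.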



* Re: possible getopt stderr output changes
  2014-12-11 22:07 ` possible getopt stderr output changes Rich Felker
@ 2014-12-13  0:02   ` Isaac Dunham
  2014-12-13  3:11     ` Rich Felker
  2014-12-19 21:49   ` Rich Felker
  1 sibling, 1 reply; 12+ messages in thread
From: Isaac Dunham @ 2014-12-13  0:02 UTC
  To: musl

On Thu, Dec 11, 2014 at 05:07:56PM -0500, Rich Felker wrote:
> On Wed, Dec 10, 2014 at 07:10:32PM -0500, Rich Felker wrote:
> > The current getopt code uses some ugly write() sequences to generate
> > its output to stderr, and fails to support message translation. The
> > latter was an oversight when locale/translation support was added and
> > should absolutely be fixed. I'm not sure whether we should leave the
> > code using write() though or switch to fprintf.
> 
> It's been pointed out on irc that POSIX requires ferror(stderr) to be
> set if writing the message fails. However fwrite could still be used
> instead of fprintf. If we need to use stdio at all, however, I'd lean
> towards wanting to make the whole write atomic (i.e. hold the lock for
> the whole time) which is more of a pain without fprintf. So basically
> we're looking at:
> 
> fprintf:
> PROS: smaller and simpler code in getopt.c, only one syscall
> CONS: ~6.5k of additional code pulled in for static linking
> 
> fwrite:
> PROS: minimal static linking deps
> CONS: need to use flockfile (or implementation internals) for
> atomicity if desired, and multiple writes (so no atomicity on the fd)

I realize there's quality of implementation to be concerned about and 
similar issues, but I'm really wondering:

How brain-damaged does code have to be to call getopt() from a thread,
*after* starting a second thread and beginning writes to stderr?
Is there any real-world use of this? 

Thanks,
Isaac Dunham



* Re: possible getopt stderr output changes
  2014-12-13  0:02   ` Isaac Dunham
@ 2014-12-13  3:11     ` Rich Felker
  0 siblings, 0 replies; 12+ messages in thread
From: Rich Felker @ 2014-12-13  3:11 UTC
  To: musl

On Fri, Dec 12, 2014 at 04:02:24PM -0800, Isaac Dunham wrote:
> I realize there's quality of implementation to be concerned about and 
> similar issues, but I'm really wondering:
> 
> How brain-damaged does code have to be to call getopt() from a thread,
> *after* starting a second thread and beginning writes to stderr?
> Is there any real-world use of this? 

The way it's most likely to happen is when something that runs early
in main, or from a global ctor, starts threads, and they encounter
errors to print to stderr. Generally I think any 'library level' use
of stderr (or other standard streams) is bad design and so in my
opinion, this shouldn't happen, but plenty of people disagree.

Rich


* Re: possible getopt stderr output changes
  2014-12-11 22:07 ` possible getopt stderr output changes Rich Felker
  2014-12-13  0:02   ` Isaac Dunham
@ 2014-12-19 21:49   ` Rich Felker
  1 sibling, 0 replies; 12+ messages in thread
From: Rich Felker @ 2014-12-19 21:49 UTC
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1946 bytes --]

On Thu, Dec 11, 2014 at 05:07:56PM -0500, Rich Felker wrote:
> On Wed, Dec 10, 2014 at 07:10:32PM -0500, Rich Felker wrote:
> > The current getopt code uses some ugly write() sequences to generate
> > its output to stderr, and fails to support message translation. The
> > latter was an oversight when locale/translation support was added and
> > should absolutely be fixed. I'm not sure whether we should leave the
> > code using write() though or switch to fprintf.
> 
> It's been pointed out on irc that POSIX requires ferror(stderr) to be
> set if writing the message fails. However fwrite could still be used
> instead of fprintf. If we need to use stdio at all, however, I'd lean
> towards wanting to make the whole write atomic (i.e. hold the lock for
> the whole time) which is more of a pain without fprintf. So basically
> we're looking at:
> 
> fprintf:
> PROS: smaller and simpler code in getopt.c, only one syscall
> > CONS: ~6.5k of additional code pulled in for static linking
> 
> fwrite:
> PROS: minimal static linking deps
> CONS: need to use flockfile (or implementation internals) for
> atomicity if desired, and multiple writes (so no atomicity on the fd)

I'm attaching a patch which allows either solution, and the approach
using fwrite is considerably uglier. Even using some stdio internals
directly rather than the public API, the resulting getopt.o is 176
bytes larger than the fprintf version, but I think the ugliness to
get the required semantics is the worst part. So I'm strongly leaning
towards just using fprintf. The other viable alternative would be
having an internal-use function for printing a variadic list of
strings atomically through stdio, but I still think there's probably
more value in keeping getopt.c independent of musl internals as much
as possible.

Adding support for translation of the messages is a separate step yet
to be done, but the code is set up to support it either way.

Rich
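
To make the "variadic list of strings" alternative concrete, here is a
rough sketch built on the public API only. It is an illustration, not
the internal function under consideration: the name fputs_all_locked is
hypothetical, and it assumes every argument is a NUL-terminated string
with a NULL pointer ending the list (getopt's optchar would need its
length handled separately).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (an assumption, not a musl internal): writes each
 * string of a NULL-terminated argument list to f while holding the
 * stream lock. Returns 0 on success, -1 if any write fails. */
static int fputs_all_locked(FILE *f, ...)
{
	va_list ap;
	const char *s;
	int ret = 0;

	flockfile(f);
	va_start(ap, f);
	while ((s = va_arg(ap, const char *)) != NULL) {
		if (fputs(s, f) == EOF) {
			ret = -1;       /* stop at the first failed write */
			break;
		}
	}
	va_end(ap);
	funlockfile(f);
	return ret;
}
```

A call would look like fputs_all_locked(stderr, argv[0], ": illegal
option: ", "x", "\n", (char *)NULL). The cast on the terminator matters
because a bare NULL is not guaranteed to be passed as a pointer through
a variadic argument list.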

[-- Attachment #2: getopt_stdio.diff --]
[-- Type: text/plain, Size: 1837 bytes --]

diff --git a/src/misc/getopt.c b/src/misc/getopt.c
index e77e460..e2a309a 100644
--- a/src/misc/getopt.c
+++ b/src/misc/getopt.c
@@ -4,6 +4,7 @@
 #include <limits.h>
 #include <stdlib.h>
 #include "libc.h"
+#include "stdio_impl.h"
 
 char *optarg;
 int optind=1, opterr=1, optopt, __optpos, __optreset=0;
@@ -11,6 +12,21 @@ int optind=1, opterr=1, optopt, __optpos, __optreset=0;
 #define optpos __optpos
 weak_alias(__optreset, optreset);
 
+#if 0
+static void errmsg(const char *a, const char *b, int l, const char *c)
+{
+	FILE *f = stderr;
+	FLOCK(f);
+	size_t al = strlen(a), bl = strlen(b);
+	__fwritex((void *)a, al, f) == al &&
+	__fwritex((void *)b, bl, f) == bl &&
+	__fwritex((void *)c, l, f) == l &&
+	__fwritex((void *)"\n", 1, f);
+	FUNLOCK(f);
+}
+#define fprintf(fi, fo, a, b, l, c) errmsg(a, b, l, c)
+#endif
+
 int getopt(int argc, char * const argv[], const char *optstring)
 {
 	int i;
@@ -66,24 +82,18 @@ int getopt(int argc, char * const argv[], const char *optstring)
 	} while (l && d != c);
 
 	if (d != c) {
-		if (optstring[0] != ':' && opterr) {
-			write(2, argv[0], strlen(argv[0]));
-			write(2, ": illegal option: ", 18);
-			write(2, optchar, k);
-			write(2, "\n", 1);
-		}
+		if (optstring[0] != ':' && opterr)
+			fprintf(stderr, "%s%s%.*s\n", argv[0],
+				": illegal option: ", k, optchar);
 		return '?';
 	}
 	if (optstring[i] == ':') {
 		if (optstring[i+1] == ':') optarg = 0;
 		else if (optind >= argc) {
 			if (optstring[0] == ':') return ':';
-			if (opterr) {
-				write(2, argv[0], strlen(argv[0]));
-				write(2, ": option requires an argument: ", 31);
-				write(2, optchar, k);
-				write(2, "\n", 1);
-			}
+			if (opterr)
+				fprintf(stderr, "%s%s%.*s\n", argv[0],
+					": option requires an argument: ", k, optchar);
 			return '?';
 		}
 		if (optstring[i+1] != ':' || optpos) {
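
The fprintf calls in the patch use one fixed format string,
"%s%s%.*s\n", where the precision argument k limits output to the first
k bytes of optchar, matching the write(2, optchar, k) calls they
replace. The message literal is passed as an argument rather than being
part of the format. A small sketch of that behaviour follows, using
snprintf so the result can be inspected; format_opt_error is a
hypothetical name and is not part of the patch.

```c
#include <stdio.h>

/* Hypothetical helper for illustration only: formats
 * "<prog><msg><first k bytes of optchar>\n" into buf using the same
 * fixed format string as the patch. Returns what snprintf returns. */
static int format_opt_error(char *buf, size_t n, const char *prog,
                            const char *msg, const char *optchar, int k)
{
	/* %.*s takes an int precision (k) followed by the string;
	 * at most k bytes of optchar are emitted. */
	return snprintf(buf, n, "%s%s%.*s\n", prog, msg, k, optchar);
}
```

With prog "prog", msg ": illegal option: ", optchar "xyz" and k = 1,
the buffer ends up holding "prog: illegal option: x\n".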

Thread overview: 12+ messages
2014-12-11  0:10 possible getopt stderr output changes Rich Felker
2014-12-11  3:53 ` Laurent Bercot
2014-12-11  6:44   ` Rich Felker
2014-12-11 15:40     ` Laurent Bercot
2014-12-11 17:51       ` stdio [de]merits discussion [Re: [musl] possible getopt stderr output changes] Rich Felker
2014-12-11 23:05         ` Laurent Bercot
2014-12-11 23:35           ` Rich Felker
2014-12-12  2:33           ` Morten Welinder
2014-12-11 22:07 ` possible getopt stderr output changes Rich Felker
2014-12-13  0:02   ` Isaac Dunham
2014-12-13  3:11     ` Rich Felker
2014-12-19 21:49   ` Rich Felker
