mailing list of musl libc
 help / color / mirror / code / Atom feed
* getopt_long permutation algorithm questions
@ 2014-12-03 19:29 Rich Felker
  2014-12-03 19:43 ` Jens Gustedt
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2014-12-03 19:29 UTC (permalink / raw)
  To: musl

As part of resolving the rest of the dist-local changes Alpine is
applying to musl, I'm trying to figure out how to add GNU-style
argument permutation to getopt_long. The basic concept is simple: when
a non-option argument is encountered, skip forward until the next
option (argument beginning with '-') and move it (and possibly its
argument) before the non-option arguments. However, there are some
ugly corner cases like:

arg1 -ab foo arg2

where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it
seems like, in order to perform the correct permutation, lookahead is
required to see that foo also needs to be moved. Is this correct?

For long options, it's immediately decidable from the option being
processed whether it has no argument, or an argument that's part of
the same argv[] string, or a separate option in the next argv[] slot.
For short options, it seems necessary to scan each character of the
argv[] string to be moved, looking for the first option that takes an
argument. If none is found, or if such a character is found in a
non-final position, only this string needs to be moved. If an option
needing an argument is found in the final position, two argv[] strings
need to be moved. Is this correct?

Rich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: getopt_long permutation algorithm questions
  2014-12-03 19:29 getopt_long permutation algorithm questions Rich Felker
@ 2014-12-03 19:43 ` Jens Gustedt
  2014-12-03 19:56   ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Gustedt @ 2014-12-03 19:43 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2091 bytes --]

Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker:
> As part of resolving the rest of the dist-local changes Alpine is
> applying to musl, I'm trying to figure out how to add GNU-style
> argument permutation to getopt_long. The basic concept is simple: when
> a non-option argument is encountered, skip forward until the next
> option (argument beginning with '-') and move it (and possibly its
> argument) before the non-option arguments. However, there are some
> ugly corner cases like:
> 
> arg1 -ab foo arg2
> 
> where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it
> seems like, in order to perform the correct permutation, lookahead is
> required to see that foo also needs to be moved. Is this correct?
> 
> For long options, it's immediately decidable from the option being
> processed whether it has no argument, or an argument that's part of
> the same argv[] string, or a separate option in the next argv[] slot.
> For short options, it seems necessary to scan each character of the
> argv[] string to be moved, looking for the first option that takes an
> argument. If none is found, or if such a character is found in a
> non-final position, only this string needs to be moved. If an option
> needing an argument is found in the final position, two argv[] strings
> need to be moved. Is this correct?

sounds all a bit complicated and fragile to me.  getopt should neither
be performance nor memory critical, so there is not much need for
optimization here, no?

Why don't you just keep track of the cases in an array on the stack,
and then do all the moves after the processing in a second scan? You
have argc, so you know the size of a VLA (`char[argc]` should suffice)
that you would have to define.

Jens
-- 
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: getopt_long permutation algorithm questions
  2014-12-03 19:43 ` Jens Gustedt
@ 2014-12-03 19:56   ` Rich Felker
  2014-12-03 20:47     ` Jens Gustedt
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2014-12-03 19:56 UTC (permalink / raw)
  To: musl

On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote:
> Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker:
> > As part of resolving the rest of the dist-local changes Alpine is
> > applying to musl, I'm trying to figure out how to add GNU-style
> > argument permutation to getopt_long. The basic concept is simple: when
> > a non-option argument is encountered, skip forward until the next
> > option (argument beginning with '-') and move it (and possibly its
> > argument) before the non-option arguments. However, there are some
> > ugly corner cases like:
> > 
> > arg1 -ab foo arg2
> > 
> > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it
> > seems like, in order to perform the correct permutation, lookahead is
> > required to see that foo also needs to be moved. Is this correct?
> > 
> > For long options, it's immediately decidable from the option being
> > processed whether it has no argument, or an argument that's part of
> > the same argv[] string, or a separate option in the next argv[] slot.
> > For short options, it seems necessary to scan each character of the
> > argv[] string to be moved, looking for the first option that takes an
> > argument. If none is found, or if such a character is found in a
> > non-final position, only this string needs to be moved. If an option
> > needing an argument is found in the final position, two argv[] strings
> > need to be moved. Is this correct?
> 
> sounds all a bit complicated and fragile to me.  getopt should neither
> be performance nor memory critical, so there is not much need for
> optimization here, no?
> 
> Why don't you just keep track of the cases in an array on the stack,
> and then do all the moves after the processing in a second scan? You
> have argc, so you know the size of a VLA (`char[argc]` should suffice)
> that you would have to define.

I'm not trying to optimize anything for performance here, just get it
correct and simple. I don't quite understand what you're proposing,
but doing a multi-pass, out-of-place operation does not sound simple,
and it's not robust since the kernel allows huge argument lists
whereby char[argc] would overflow the stack (and even if it didn't,
assuming that it doesn't allow them would be an unwarranted assumption
unless it actually bought us some simplicity, which I don't see).

Rich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: getopt_long permutation algorithm questions
  2014-12-03 19:56   ` Rich Felker
@ 2014-12-03 20:47     ` Jens Gustedt
  2014-12-04  0:04       ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Gustedt @ 2014-12-03 20:47 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 3387 bytes --]

Am Mittwoch, den 03.12.2014, 14:56 -0500 schrieb Rich Felker:
> On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote:
> > sounds all a bit complicated and fragile to me.  getopt should neither
> > be performance nor memory critical, so there is not much need for
> > optimization here, no?
> > 
> > Why don't you just keep track of the cases in an array on the stack,
> > and then do all the moves after the processing in a second scan? You
> > have argc, so you know the size of a VLA (`char[argc]` should suffice)
> > that you would have to define.
> 
> I'm not trying to optimize anything for performance here, just get it
> correct and simple. I don't quite understand what you're proposing,

so I have to explain better

> but doing a multi-pass,

actually two-pass

> out-of-place operation does not sound simple,

but in place

> and it's not robust since the kernel allows huge argument lists
> whereby char[argc] would overflow the stack (and even if it didn't,
> assuming that it doesn't allow them would be an unwarranted assumption
> unless it actually bought us some simplicity, which I don't see).

My idea was just to do a first pass for the "real" argument processing
and to note during that pass if an argument wasn't used. Then in a
second pass reorder argv with that information.

For the storage of that information, this is one bit per argc, and if
we waste one byte for it to have life simple, then `char[argc]` should
suffice. argc can be greater than several K ?

But thinking of it:

> > Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker:
> > > As part of resolving the rest of the dist-local changes Alpine is
> > > applying to musl, I'm trying to figure out how to add GNU-style
> > > argument permutation to getopt_long. The basic concept is simple: when
> > > a non-option argument is encountered, skip forward until the next
> > > option (argument beginning with '-') and move it (and possibly its
> > > argument) before the non-option arguments. However, there are some
> > > ugly corner cases like:
> > > 
> > > arg1 -ab foo arg2
> > > 
> > > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it
> > > seems like, in order to perform the correct permutation, lookahead is
> > > required to see that foo also needs to be moved. Is this correct?

I wouldn't call it "lookahead".

Why don't you

 - have k as the position where the next option argument should be
   placed and p as the next position that has not been processed yet
 - move p such that argv[p] is the next option argument
 - process argv[p] in place, including an eventual argument
   argv[p+1] that a long option or the last option character takes
 - shuffle the found argv[p] (plus eventually argv[p+1]) to the
   correct position argv[k] (and argv[k+1])
 - increment k and p by 1 (or 2)
 - iterate

The disadvantage of that general approach is that it can be
quadratic. For very long argument lists, if you have argc more than
several K, this could be prohibitive.

Jens

-- 
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::




[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: getopt_long permutation algorithm questions
  2014-12-03 20:47     ` Jens Gustedt
@ 2014-12-04  0:04       ` Rich Felker
  2014-12-04  6:25         ` Timo Teras
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2014-12-04  0:04 UTC (permalink / raw)
  To: musl

On Wed, Dec 03, 2014 at 09:47:53PM +0100, Jens Gustedt wrote:
> Am Mittwoch, den 03.12.2014, 14:56 -0500 schrieb Rich Felker:
> > On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote:
> > > sounds all a bit complicated and fragile to me.  getopt should neither
> > > be performance nor memory critical, so there is not much need for
> > > optimization here, no?
> > > 
> > > Why don't you just keep track of the cases in an array on the stack,
> > > and then do all the moves after the processing in a second scan? You
> > > have argc, so you know the size of a VLA (`char[argc]` should suffice)
> > > that you would have to define.
> > 
> > I'm not trying to optimize anything for performance here, just get it
> > correct and simple. I don't quite understand what you're proposing,
> 
> so I have to explain better
> 
> > but doing a multi-pass,
> 
> actually two-pass
> 
> > out-of-place operation does not sound simple,
> 
> but in place
> 
> > and it's not robust since the kernel allows huge argument lists
> > whereby char[argc] would overflow the stack (and even if it didn't,
> > assuming that it doesn't allow them would be an unwarranted assumption
> > unless it actually bought us some simplicity, which I don't see).
> 
> My idea was just to do a first pass for the "real" argument processing
> and to note during that pass if an argument wasn't used. Then in a
> second pass reorder argv with that information.

I don't think you want to pre-process the whole argument list, for
various reasons. The GNU version, as I understand, does argument
permutation as it goes based on the state of the opt* vars rather than
permuting everything at once, and presumably some callers could rely
on this, for example as a way of conditionally suppressing the
permutation by incrementing optind rather than calling getopt when
optind points to a non-option argument.

It's also just a more complicated design and not compatible with the
existing design/implementation of getopt[_long] to preprocess
everything.

> For the storage of that information, this is one bit per argc, and if
> we waste one byte for it to have life simple, then `char[argc]` should
> suffice. argc can be greater than several K ?

My understanding is that the kernel imposes no hard limit, aside from
resource limits, on modern kernels... Yes this is a nasty and
completely unnecessary DoS vector.

> But thinking of it:
> 
> > > Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker:
> > > > As part of resolving the rest of the dist-local changes Alpine is
> > > > applying to musl, I'm trying to figure out how to add GNU-style
> > > > argument permutation to getopt_long. The basic concept is simple: when
> > > > a non-option argument is encountered, skip forward until the next
> > > > option (argument beginning with '-') and move it (and possibly its
> > > > argument) before the non-option arguments. However, there are some
> > > > ugly corner cases like:
> > > > 
> > > > arg1 -ab foo arg2
> > > > 
> > > > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it
> > > > seems like, in order to perform the correct permutation, lookahead is
> > > > required to see that foo also needs to be moved. Is this correct?
> 
> I wouldn't call it "lookahead".
> 
> Why don't you
> 
>  - have k as the position where the next option argument should be
>    placed and p as the next position that has not been processed yet
>  - move p such that argv[p] is the next option argument
>  - process argv[p] in place, including an eventual argument
>    argv[p+1] that a long option or the last option character takes
>  - shuffle the found argv[p] (plus eventually argv[p+1]) to the
>    correct position argv[k] (and argv[k+1])
>  - increment k and p by 1 (or 2)
>  - iterate
> 
> The disadvantage of that general approach is that it can be
> quadratic. For very long argument lists, if you have argc more than
> several K, this could be prohibitive.

This is pretty close to what I described, except I avoided the
"in-place" processing which doesn't work because of examples like the
one I gave, where you would have to process more than one option at a
time, then rewind to return just the first one. The process I
described avoids the need for that rewinding. Note that part of the
complexity here in "processing ahead and then rewinding" is that the
internal state of getopt[_long] is not encapsulated but exists in
global variables that the caller can access.

By quadratic time, I assume you meant for processing the whole
argument list. The algorithm I described is linear-time in the number
of consecutive misordered non-option arguments for a single call to
getopt_long, and thus quadratic when you call it repeatedly. This
could be avoided by saving additional hidden state, but I suspect the
practical gains are small and the potential for incorrect behavior
exists if the argument list is modified between calls.

Rich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: getopt_long permutation algorithm questions
  2014-12-04  0:04       ` Rich Felker
@ 2014-12-04  6:25         ` Timo Teras
  2014-12-06 17:47           ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Timo Teras @ 2014-12-04  6:25 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Wed, 3 Dec 2014 19:04:34 -0500
Rich Felker <dalias@libc.org> wrote:

> On Wed, Dec 03, 2014 at 09:47:53PM +0100, Jens Gustedt wrote:
> > Am Mittwoch, den 03.12.2014, 14:56 -0500 schrieb Rich Felker:
> > > On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote:
> > > and it's not robust since the kernel allows huge argument lists
> > > whereby char[argc] would overflow the stack (and even if it
> > > didn't, assuming that it doesn't allow them would be an
> > > unwarranted assumption unless it actually bought us some
> > > simplicity, which I don't see).
> > 
> > My idea was just to do a first pass for the "real" argument
> > processing and to note during that pass if an argument wasn't used.
> > Then in a second pass reorder argv with that information.
> 
> I don't think you want to pre-process the whole argument list, for
> various reasons. The GNU version, as I understand, does argument
> permutation as it goes based on the state of the opt* vars rather than
> permuting everything at once, and presumably some callers could rely
> on this, for example as a way of conditionally suppressing the
> permutation by incrementing optind rather than calling getopt when
> optind points to a non-option argument.
> 
> It's also just a more complicated design and not compatible with the
> existing design/implementation of getopt[_long] to preprocess
> everything.

As reference the relevant part from man-pages is:

       By default, getopt() permutes the contents of argv as it scans,
       so that eventually all the nonoptions are at the end.  Two other
       modes are also implemented.   If  the first character of
       optstring is '+' or the envi- ronment variable POSIXLY_CORRECT
       is set, then option  processing  stops as soon as a nonoption
       argument is encountered.  If the first character of optstring is
       '-', then each nonoption argv-element is handled as  if it were
       the argument of an option with character code 1.  (This is used
       by programs that were written to expect options and other
       argv-elements in any order and that care about the ordering of
       the two.)  The special argument "--" forces an end of
       option-scanning regardless of the  scan- ning mode.

So it seems permutation should be done incrementally, not as one big
reorder everything operation. Though the wording allows some
interpretation freedom too.

But what Rich suggested as first, would have been my approach too. I'm
slightly confused by the complexity of permute_args() in the BSD version
we currently use for Alpine (just grabbed it as it was the first
working and BSD licensed version of the function I could find).

Not sure if there's some nasty corner case surprises, but I'd start
with that as it's the simple approach.

Thanks,
Timo


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: getopt_long permutation algorithm questions
  2014-12-04  6:25         ` Timo Teras
@ 2014-12-06 17:47           ` Rich Felker
  2014-12-06 19:22             ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2014-12-06 17:47 UTC (permalink / raw)
  To: musl

On Thu, Dec 04, 2014 at 08:25:06AM +0200, Timo Teras wrote:
> Not sure if there's some nasty corner case surprises, but I'd start
> with that as it's the simple approach.

There is one nasty corner case I don't know how to deal with. Consider
an option string of "ab:" with the command line:

./a.out foo -ab

The "desired" result is that -a is accepted successfully and -b yields
an error due to missing argument. But permuting "-ab" before "foo"
results in "foo" being treated as the argument to -b.

Fortunately this issue only matters for erroneous (syntactically
invalid) command lines, so we can probably ignore it by just refusing
to permute them (thus simply treating "foo" and "-ab" as non-option
arguments).

Empirically (I haven't RTFS'd and don't intend to) glibc has an
alternate approach where permutation happens after the argv[]
component has been consumed rather than before. When it sees a
non-option argument, it saves the location, jumps forward and
processes the next option-containing argument, and only moves it back
to the saved location when advancing optind. This exposes nonsensical
values of optind to the application; after processing the above and
getting the error for -b, optind points to the null pointer slot at
the end of argv[], and only jumps back to point to the permuted foo
when getopt[_long] is called _again_ after the error.

I haven't yet tested what the BSD version does.

Rich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: getopt_long permutation algorithm questions
  2014-12-06 17:47           ` Rich Felker
@ 2014-12-06 19:22             ` Rich Felker
  0 siblings, 0 replies; 8+ messages in thread
From: Rich Felker @ 2014-12-06 19:22 UTC (permalink / raw)
  To: musl; +Cc: alpine-devel

On Sat, Dec 06, 2014 at 12:47:10PM -0500, Rich Felker wrote:
> On Thu, Dec 04, 2014 at 08:25:06AM +0200, Timo Teras wrote:
> > Not sure if there's some nasty corner case surprises, but I'd start
> > with that as it's the simple approach.
> 
> There is one nasty corner case I don't know how to deal with. Consider
> an option string of "ab:" with the command line:
> 
> ../a.out foo -ab
> 
> The "desired" result is that -a is accepted successfully and -b yields
> an error due to missing argument. But permuting "-ab" before "foo"
> results in "foo" being treated as the argument to -b.
> 
> Fortunately this issue only matters for erroneous (syntactically
> invalid) command lines, so we can probably ignore it by just refusing
> to permute them (thus simply treating "foo" and "-ab" as non-option
> arguments).
> 
> Empirically (I haven't RTFS'd and don't intend to) glibc has an
> alternate approach where permutation happens after the argv[]
> component has been consumed rather than before. When it sees a
> non-option argument, it saves the location, jumps forward and
> processes the next option-containing argument, and only moves it back
> to the saved location when advancing optind. This exposes nonsensical
> values of optind to the application; after processing the above and
> getting the error for -b, optind points to the null pointer slot at
> the end of argv[], and only jumps back to point to the permuted foo
> when getopt[_long] is called _again_ after the error.
> 
> I haven't yet tested what the BSD version does.

The BSD code Alpine is using right now seems to do the same thing as
glibc does empirically, and the implementation matches what I expected
-- saving state and applying the permutation later. My leaning is
still to go with the method I proposed that avoids internal state and
inconsistent values of optind during parsing, at the expense of not
permuting syntactically invalid command lines. Any opinions?

BTW just a heads-up for Alpine: the BSD code that's being patched in
right now contains namespace violations and is causing code linked to
their modified libc to contain copy-reloc references to optreset
rather than __optreset, if it uses optreset. I think this will be safe
to fix (old binaries will continue to work once it's fixed), but it's
not forwards compatible. Binaries built with the correct symbol name
(again, only programs which use optreset) will not work on Alpine's
old libc.so, so upgrading libc will be mandatory to use new binaries.
This also applies to binaries build against musl on non-Alpine systems
and carried over to Alpine.

Rich


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-12-06 19:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-03 19:29 getopt_long permutation algorithm questions Rich Felker
2014-12-03 19:43 ` Jens Gustedt
2014-12-03 19:56   ` Rich Felker
2014-12-03 20:47     ` Jens Gustedt
2014-12-04  0:04       ` Rich Felker
2014-12-04  6:25         ` Timo Teras
2014-12-06 17:47           ` Rich Felker
2014-12-06 19:22             ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).