* getopt_long permutation algorithm questions @ 2014-12-03 19:29 Rich Felker 2014-12-03 19:43 ` Jens Gustedt 0 siblings, 1 reply; 8+ messages in thread From: Rich Felker @ 2014-12-03 19:29 UTC (permalink / raw) To: musl As part of resolving the rest of the dist-local changes Alpine is applying to musl, I'm trying to figure out how to add GNU-style argument permutation to getopt_long. The basic concept is simple: when a non-option argument is encountered, skip forward until the next option (argument beginning with '-') and move it (and possibly its argument) before the non-option arguments. However, there are some ugly corner cases like: arg1 -ab foo arg2 where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it seems like, in order to perform the correct permutation, lookahead is required to see that foo also needs to be moved. Is this correct? For long options, it's immediately decidable from the option being processed whether it has no argument, or an argument that's part of the same argv[] string, or a separate option in the next argv[] slot. For short options, it seems necessary to scan each character of the argv[] string to be moved, looking for the first option that takes an argument. If none is found, or if such a character is found in a non-final position, only this string needs to be moved. If an option needing an argument is found in the final position, two argv[] strings need to be moved. Is this correct? Rich ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: getopt_long permutation algorithm questions 2014-12-03 19:29 getopt_long permutation algorithm questions Rich Felker @ 2014-12-03 19:43 ` Jens Gustedt 2014-12-03 19:56 ` Rich Felker 0 siblings, 1 reply; 8+ messages in thread From: Jens Gustedt @ 2014-12-03 19:43 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 2091 bytes --] Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker: > As part of resolving the rest of the dist-local changes Alpine is > applying to musl, I'm trying to figure out how to add GNU-style > argument permutation to getopt_long. The basic concept is simple: when > a non-option argument is encountered, skip forward until the next > option (argument beginning with '-') and move it (and possibly its > argument) before the non-option arguments. However, there are some > ugly corner cases like: > > arg1 -ab foo arg2 > > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it > seems like, in order to perform the correct permutation, lookahead is > required to see that foo also needs to be moved. Is this correct? > > For long options, it's immediately decidable from the option being > processed whether it has no argument, or an argument that's part of > the same argv[] string, or a separate option in the next argv[] slot. > For short options, it seems necessary to scan each character of the > argv[] string to be moved, looking for the first option that takes an > argument. If none is found, or if such a character is found in a > non-final position, only this string needs to be moved. If an option > needing an argument is found in the final position, two argv[] strings > need to be moved. Is this correct? sounds all a bit complicated and fragile to me. getopt should neither be performance nor memory critical, so there is not much need for optimization here, no? Why don't you just keep track of the cases in an array on the stack, and then do all the moves after the processing in a second scan? You have argc, so you know the size of a VLA (`char[argc]` should suffice) that you would have to define. Jens -- :: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt :: [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: getopt_long permutation algorithm questions 2014-12-03 19:43 ` Jens Gustedt @ 2014-12-03 19:56 ` Rich Felker 2014-12-03 20:47 ` Jens Gustedt 0 siblings, 1 reply; 8+ messages in thread From: Rich Felker @ 2014-12-03 19:56 UTC (permalink / raw) To: musl On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote: > Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker: > > As part of resolving the rest of the dist-local changes Alpine is > > applying to musl, I'm trying to figure out how to add GNU-style > > argument permutation to getopt_long. The basic concept is simple: when > > a non-option argument is encountered, skip forward until the next > > option (argument beginning with '-') and move it (and possibly its > > argument) before the non-option arguments. However, there are some > > ugly corner cases like: > > > > arg1 -ab foo arg2 > > > > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it > > seems like, in order to perform the correct permutation, lookahead is > > required to see that foo also needs to be moved. Is this correct? > > > > For long options, it's immediately decidable from the option being > > processed whether it has no argument, or an argument that's part of > > the same argv[] string, or a separate option in the next argv[] slot. > > For short options, it seems necessary to scan each character of the > > argv[] string to be moved, looking for the first option that takes an > > argument. If none is found, or if such a character is found in a > > non-final position, only this string needs to be moved. If an option > > needing an argument is found in the final position, two argv[] strings > > need to be moved. Is this correct? > > sounds all a bit complicated and fragile to me. getopt should neither > be performance nor memory critical, so there is not much need for > optimization here, no? > > Why don't you just keep track of the cases in an array on the stack, > and then do all the moves after the processing in a second scan? You > have argc, so you know the size of a VLA (`char[argc]` should suffice) > that you would have to define. I'm not trying to optimize anything for performance here, just get it correct and simple. I don't quite understand what you're proposing, but doing a multi-pass, out-of-place operation does not sound simple, and it's not robust since the kernel allows huge argument lists whereby char[argc] would overflow the stack (and even if it didn't, assuming that it doesn't allow them would be an unwarranted assumption unless it actually bought us some simplicity, which I don't see). Rich ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: getopt_long permutation algorithm questions 2014-12-03 19:56 ` Rich Felker @ 2014-12-03 20:47 ` Jens Gustedt 2014-12-04 0:04 ` Rich Felker 0 siblings, 1 reply; 8+ messages in thread From: Jens Gustedt @ 2014-12-03 20:47 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 3387 bytes --] Am Mittwoch, den 03.12.2014, 14:56 -0500 schrieb Rich Felker: > On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote: > > sounds all a bit complicated and fragile to me. getopt should neither > > be performance nor memory critical, so there is not much need for > > optimization here, no? > > > > Why don't you just keep track of the cases in an array on the stack, > > and then do all the moves after the processing in a second scan? You > > have argc, so you know the size of a VLA (`char[argc]` should suffice) > > that you would have to define. > > I'm not trying to optimize anything for performance here, just get it > correct and simple. I don't quite understand what you're proposing, so I have to explain better > but doing a multi-pass, actually two-pass > out-of-place operation does not sound simple, but in place > and it's not robust since the kernel allows huge argument lists > whereby char[argc] would overflow the stack (and even if it didn't, > assuming that it doesn't allow them would be an unwarranted assumption > unless it actually bought us some simplicity, which I don't see). My idea was just to do a first pass for the "real" argument processing and to note during that pass if an argument wasn't used. Then in a second pass reorder argv with that information. For the storage of that information, this is one bit per argc, and if we waste one byte for it to have life simple, then `char[argc]` should suffice. argc can be greater than several K ? But thinking of it: > > Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker: > > > As part of resolving the rest of the dist-local changes Alpine is > > > applying to musl, I'm trying to figure out how to add GNU-style > > > argument permutation to getopt_long. The basic concept is simple: when > > > a non-option argument is encountered, skip forward until the next > > > option (argument beginning with '-') and move it (and possibly its > > > argument) before the non-option arguments. However, there are some > > > ugly corner cases like: > > > > > > arg1 -ab foo arg2 > > > > > > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it > > > seems like, in order to perform the correct permutation, lookahead is > > > required to see that foo also needs to be moved. Is this correct? I wouldn't call it "lookahead". Why don't you - have k as the position where the next option argument should be placed and p as the next position that has not been processed yet - move p such that argv[p] is the next option argument - process argv[p] in place, including an eventual argument argv[p+1] that a long option or the last option character takes - shuffle the found argv[p] (plus eventually argv[p+1]) to the correct position argv[k] (and argv[k+1]) - increment k and p by 1 (or 2) - iterate The disadvantage of that general approach is that it can be quadratic. For very long argument lists, if you have argc more than several K, this could be prohibitive. Jens -- :: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt :: [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: getopt_long permutation algorithm questions 2014-12-03 20:47 ` Jens Gustedt @ 2014-12-04 0:04 ` Rich Felker 2014-12-04 6:25 ` Timo Teras 0 siblings, 1 reply; 8+ messages in thread From: Rich Felker @ 2014-12-04 0:04 UTC (permalink / raw) To: musl On Wed, Dec 03, 2014 at 09:47:53PM +0100, Jens Gustedt wrote: > Am Mittwoch, den 03.12.2014, 14:56 -0500 schrieb Rich Felker: > > On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote: > > > sounds all a bit complicated and fragile to me. getopt should neither > > > be performance nor memory critical, so there is not much need for > > > optimization here, no? > > > > > > Why don't you just keep track of the cases in an array on the stack, > > > and then do all the moves after the processing in a second scan? You > > > have argc, so you know the size of a VLA (`char[argc]` should suffice) > > > that you would have to define. > > > > I'm not trying to optimize anything for performance here, just get it > > correct and simple. I don't quite understand what you're proposing, > > so I have to explain better > > > but doing a multi-pass, > > actually two-pass > > > out-of-place operation does not sound simple, > > but in place > > > and it's not robust since the kernel allows huge argument lists > > whereby char[argc] would overflow the stack (and even if it didn't, > > assuming that it doesn't allow them would be an unwarranted assumption > > unless it actually bought us some simplicity, which I don't see). > > My idea was just to do a first pass for the "real" argument processing > and to note during that pass if an argument wasn't used. Then in a > second pass reorder argv with that information. I don't think you want to pre-process the whole argument list, for various reasons. The GNU version, as I understand, does argument permutation as it goes based on the state of the opt* vars rather than permuting everything at once, and presumably some callers could rely on this, for example as a way of conditionally suppressing the permutation by incrementing optind rather than calling getopt when optind points to a non-option argument. It's also just a more complicated design and not compatible with the existing design/implementation of getopt[_long] to preprocess everything. > For the storage of that information, this is one bit per argc, and if > we waste one byte for it to have life simple, then `char[argc]` should > suffice. argc can be greater than several K ? My understanding is that the kernel imposes no hard limit, aside from resource limits, on modern kernels... Yes this is a nasty and completely unnecessary DoS vector. > But thinking of it: > > > > Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker: > > > > As part of resolving the rest of the dist-local changes Alpine is > > > > applying to musl, I'm trying to figure out how to add GNU-style > > > > argument permutation to getopt_long. The basic concept is simple: when > > > > a non-option argument is encountered, skip forward until the next > > > > option (argument beginning with '-') and move it (and possibly its > > > > argument) before the non-option arguments. However, there are some > > > > ugly corner cases like: > > > > > > > > arg1 -ab foo arg2 > > > > > > > > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it > > > > seems like, in order to perform the correct permutation, lookahead is > > > > required to see that foo also needs to be moved. Is this correct? > > I wouldn't call it "lookahead". > > Why don't you > > - have k as the position where the next option argument should be > placed and p as the next position that has not been processed yet > - move p such that argv[p] is the next option argument > - process argv[p] in place, including an eventual argument > argv[p+1] that a long option or the last option character takes > - shuffle the found argv[p] (plus eventually argv[p+1]) to the > correct position argv[k] (and argv[k+1]) > - increment k and p by 1 (or 2) > - iterate > > The disadvantage of that general approach is that it can be > quadratic. For very long argument lists, if you have argc more than > several K, this could be prohibitive. This is pretty close to what I described, except I avoided the "in-place" processing which doesn't work because of examples like the one I gave, where you would have to process more than one option at a time, then rewind to return just the first one. The process I described avoids the need for that rewinding. Note that part of the complexity here in "processing ahead and then rewinding" is that the internal state of getopt[_long] is not encapsulated but exists in global variables that the caller can access. By quadratic time, I assume you meant for processing the whole argument list. The algorithm I described is linear-time in the number of consecutive misordered non-option arguments for a single call to getopt_long, and thus quadratic when you call it repeatedly. This could be avoided by saving additional hidden state, but I suspect the practical gains are small and the potential for incorrect behavior exists if the argument list is modified between calls. Rich ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: getopt_long permutation algorithm questions 2014-12-04 0:04 ` Rich Felker @ 2014-12-04 6:25 ` Timo Teras 2014-12-06 17:47 ` Rich Felker 0 siblings, 1 reply; 8+ messages in thread From: Timo Teras @ 2014-12-04 6:25 UTC (permalink / raw) To: Rich Felker; +Cc: musl On Wed, 3 Dec 2014 19:04:34 -0500 Rich Felker <dalias@libc.org> wrote: > On Wed, Dec 03, 2014 at 09:47:53PM +0100, Jens Gustedt wrote: > > Am Mittwoch, den 03.12.2014, 14:56 -0500 schrieb Rich Felker: > > > On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote: > > > and it's not robust since the kernel allows huge argument lists > > > whereby char[argc] would overflow the stack (and even if it > > > didn't, assuming that it doesn't allow them would be an > > > unwarranted assumption unless it actually bought us some > > > simplicity, which I don't see). > > > > My idea was just to do a first pass for the "real" argument > > processing and to note during that pass if an argument wasn't used. > > Then in a second pass reorder argv with that information. > > I don't think you want to pre-process the whole argument list, for > various reasons. The GNU version, as I understand, does argument > permutation as it goes based on the state of the opt* vars rather than > permuting everything at once, and presumably some callers could rely > on this, for example as a way of conditionally suppressing the > permutation by incrementing optind rather than calling getopt when > optind points to a non-option argument. > > It's also just a more complicated design and not compatible with the > existing design/implementation of getopt[_long] to preprocess > everything. As reference the relevant part from man-pages is: By default, getopt() permutes the contents of argv as it scans, so that eventually all the nonoptions are at the end. Two other modes are also implemented. If the first character of optstring is '+' or the envi- ronment variable POSIXLY_CORRECT is set, then option processing stops as soon as a nonoption argument is encountered. If the first character of optstring is '-', then each nonoption argv-element is handled as if it were the argument of an option with character code 1. (This is used by programs that were written to expect options and other argv-elements in any order and that care about the ordering of the two.) The special argument "--" forces an end of option-scanning regardless of the scan- ning mode. So it seems permutation should be done incrementally, not as one big reorder everything operation. Though the wording allows some interpretation freedom too. But what Rich suggested as first, would have been my approach too. I'm slightly confused by the complexity of permute_args() in the BSD version we currently use for Alpine (just grabbed it as it was the first working and BSD licensed version of the function I could find). Not sure if there's some nasty corner case surprises, but I'd start with that as it's the simple approach. Thanks, Timo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: getopt_long permutation algorithm questions 2014-12-04 6:25 ` Timo Teras @ 2014-12-06 17:47 ` Rich Felker 2014-12-06 19:22 ` Rich Felker 0 siblings, 1 reply; 8+ messages in thread From: Rich Felker @ 2014-12-06 17:47 UTC (permalink / raw) To: musl On Thu, Dec 04, 2014 at 08:25:06AM +0200, Timo Teras wrote: > Not sure if there's some nasty corner case surprises, but I'd start > with that as it's the simple approach. There is one nasty corner case I don't know how to deal with. Consider an option string of "ab:" with the command line: ./a.out foo -ab The "desired" result is that -a is accepted successfully and -b yields an error due to missing argument. But permuting "-ab" before "foo" results in "foo" being treated as the argument to -b. Fortunately this issue only matters for erroneous (syntactically invalid) command lines, so we can probably ignore it by just refusing to permute them (thus simply treating "foo" and "-ab" as non-option arguments). Empirically (I haven't RTFS'd and don't intend to) glibc has an alternate approach where permutation happens after the argv[] component has been consumed rather than before. When it sees a non-option argument, it saves the location, jumps forward and processes the next option-containing argument, and only moves it back to the saved location when advancing optind. This exposes nonsensical values of optind to the application; after processing the above and getting the error for -b, optind points to the null pointer slot at the end of argv[], and only jumps back to point to the permuted foo when getopt[_long] is called _again_ after the error. I haven't yet tested what the BSD version does. Rich ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: getopt_long permutation algorithm questions 2014-12-06 17:47 ` Rich Felker @ 2014-12-06 19:22 ` Rich Felker 0 siblings, 0 replies; 8+ messages in thread From: Rich Felker @ 2014-12-06 19:22 UTC (permalink / raw) To: musl; +Cc: alpine-devel On Sat, Dec 06, 2014 at 12:47:10PM -0500, Rich Felker wrote: > On Thu, Dec 04, 2014 at 08:25:06AM +0200, Timo Teras wrote: > > Not sure if there's some nasty corner case surprises, but I'd start > > with that as it's the simple approach. > > There is one nasty corner case I don't know how to deal with. Consider > an option string of "ab:" with the command line: > > ../a.out foo -ab > > The "desired" result is that -a is accepted successfully and -b yields > an error due to missing argument. But permuting "-ab" before "foo" > results in "foo" being treated as the argument to -b. > > Fortunately this issue only matters for erroneous (syntactically > invalid) command lines, so we can probably ignore it by just refusing > to permute them (thus simply treating "foo" and "-ab" as non-option > arguments). > > Empirically (I haven't RTFS'd and don't intend to) glibc has an > alternate approach where permutation happens after the argv[] > component has been consumed rather than before. When it sees a > non-option argument, it saves the location, jumps forward and > processes the next option-containing argument, and only moves it back > to the saved location when advancing optind. This exposes nonsensical > values of optind to the application; after processing the above and > getting the error for -b, optind points to the null pointer slot at > the end of argv[], and only jumps back to point to the permuted foo > when getopt[_long] is called _again_ after the error. > > I haven't yet tested what the BSD version does. The BSD code Alpine is using right now seems to do the same thing as glibc does empirically, and the implementation matches what I expected -- saving state and applying the permutation later. My leaning is still to go with the method I proposed that avoids internal state and inconsistent values of optind during parsing, at the expense of not permuting syntactically invalid command lines. Any opinions? BTW just a heads-up for Alpine: the BSD code that's being patched in right now contains namespace violations and is causing code linked to their modified libc to contain copy-reloc references to optreset rather than __optreset, if it uses optreset. I think this will be safe to fix (old binaries will continue to work once it's fixed), but it's not forwards compatible. Binaries built with the correct symbol name (again, only programs which use optreset) will not work on Alpine's old libc.so, so upgrading libc will be mandatory to use new binaries. This also applies to binaries build against musl on non-Alpine systems and carried over to Alpine. Rich ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-12-06 19:22 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-12-03 19:29 getopt_long permutation algorithm questions Rich Felker 2014-12-03 19:43 ` Jens Gustedt 2014-12-03 19:56 ` Rich Felker 2014-12-03 20:47 ` Jens Gustedt 2014-12-04 0:04 ` Rich Felker 2014-12-04 6:25 ` Timo Teras 2014-12-06 17:47 ` Rich Felker 2014-12-06 19:22 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).