* [musl] getopt_long() can corrupt argv when an argument for a short option is missing
From: Alexey Izbyshev @ 2023-05-25 7:53 UTC
To: musl

POSIX requires getopt() to set optind to argc + 1 in case of a
missing argument[1], and musl follows it. This bites getopt_long()
(which reuses getopt()) in two ways:

* getopt_long() moves argv[optind - 1] (NULL) when permuting argv to
  make all options precede other arguments, essentially corrupting
  argv.

* even when permuting is not required, getopt_long() is both
  incompatible with glibc (which doesn't increment optind past NULL)
  and inconsistent with itself (for a long option with a missing
  argument, musl doesn't increment optind past NULL either).

Example of the wrong NULL shifting:

#include <getopt.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    for (int i = 0; i < 2; i++) {
        int r = getopt_long(argc, argv, "o:", NULL, NULL);
        printf("r: %d\n", r);
        printf("optind: %d\n", optind);
        for (int i = 0; i <= argc; i++)
            printf("%d: '%s'\n", i, argv[i]);
    }
}

With glibc:
$ ./a.out arg -o
./a.out: option requires an argument -- 'o'
r: 63
optind: 3
0: './a.out'
1: 'arg'
2: '-o'
3: '(null)'
r: -1
optind: 2
0: './a.out'
1: '-o'
2: 'arg'
3: '(null)'

(Note that glibc permutes argv *before* parsing the next option,
and even before comparing optind and argc, so argv is still permuted
on the second invocation.)

With musl:
$ ./a.out arg -o
./a.out: option requires an argument: o
r: 63
optind: 3
0: './a.out'
1: '-o'
2: '(null)'
3: 'arg'
r: -1
optind: 3
0: './a.out'
1: '-o'
2: '(null)'
3: 'arg'

Maybe we could just skip permuting and adjust optind if we detected
a missing argument?

     resumed = optind;
     ret = __getopt_long_core(argc, argv, optstring, longopts,
             idx, longonly);
+    if (optind > argc)
+        return optind--, ret;
     if (resumed > skipped) {

On a subsequent invocation we won't permute, unlike glibc, but maybe
this is a good thing, given that such permutation makes it look like
there is no missing argument, essentially changing the command
semantics.

Alexey

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html
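The NULL shift reported in the first bullet can be reproduced without any libc involvement. The following is a toy model only, not musl's source: `rotate_to_front` and `permute_consumed` are hypothetical names, and the indices passed in are the ones from the example run (first non-option at index 1, option found at index 2, optind left at argc + 1 = 4 by getopt()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: move argv[src] down to argv[dest], shifting the
 * elements in between up by one slot. */
static void rotate_to_front(char **argv, int dest, int src)
{
	assert(dest <= src);
	char *tmp = argv[src];
	for (int i = src; i > dest; i--)
		argv[i] = argv[i - 1];
	argv[dest] = tmp;
}

/* Hypothetical model of the post-parse step: every slot in
 * [resumed, optind_after) is assumed to hold option material and is
 * moved in front of the non-options that start at `skipped`.
 * Returns the adjusted optind. */
static int permute_consumed(char **argv, int skipped, int resumed,
                            int optind_after)
{
	int cnt = optind_after - resumed;
	for (int i = 0; i < cnt; i++)
		rotate_to_front(argv, skipped, optind_after - 1);
	return skipped + cnt;
}
```

Calling `permute_consumed(argv, 1, 2, 4)` on `{"./a.out", "arg", "-o", NULL}` treats the terminating NULL as if it were the option's argument, so it is rotated forward together with `-o`, which gives the ordering shown in the musl output.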
* Re: [musl] getopt_long() can corrupt argv when an argument for a short option is missing
From: Rich Felker @ 2023-05-25 13:25 UTC
To: musl

On Thu, May 25, 2023 at 10:53:09AM +0300, Alexey Izbyshev wrote:
> POSIX requires getopt() to set optind to argc + 1 in case of a
> missing argument[1], and musl follows it. This bites getopt_long()
> (which reuses getopt()) in two ways:
>
> * getopt_long() moves argv[optind - 1] (NULL) when permuting argv to
>   make all options precede other arguments, essentially corrupting
>   argv.
>
> * even when permuting is not required, getopt_long() is both
>   incompatible with glibc (which doesn't increment optind past NULL)
>   and inconsistent with itself (for a long option with a missing
>   argument, musl doesn't increment optind past NULL either).
>
> Example of the wrong NULL shifting:
>
> #include <getopt.h>
> #include <stdio.h>
>
> int main(int argc, char *argv[]) {
>     for (int i = 0; i < 2; i++) {
>         int r = getopt_long(argc, argv, "o:", NULL, NULL);
>         printf("r: %d\n", r);
>         printf("optind: %d\n", optind);
>         for (int i = 0; i <= argc; i++)
>             printf("%d: '%s'\n", i, argv[i]);
>     }
> }
>
> With glibc:
> $ ./a.out arg -o
> ./a.out: option requires an argument -- 'o'
> r: 63
> optind: 3
> 0: './a.out'
> 1: 'arg'
> 2: '-o'
> 3: '(null)'
> r: -1
> optind: 2
> 0: './a.out'
> 1: '-o'
> 2: 'arg'
> 3: '(null)'
>
> (Note that glibc permutes argv *before* parsing the next option,
> and even before comparing optind and argc, so argv is still permuted
> on the second invocation.)
>
> With musl:
> $ ./a.out arg -o
> ./a.out: option requires an argument: o
> r: 63
> optind: 3
> 0: './a.out'
> 1: '-o'
> 2: '(null)'
> 3: 'arg'
> r: -1
> optind: 3
> 0: './a.out'
> 1: '-o'
> 2: '(null)'
> 3: 'arg'
>
> Maybe we could just skip permuting and adjust optind if we detected
> a missing argument?
>
>      resumed = optind;
>      ret = __getopt_long_core(argc, argv, optstring, longopts,
>              idx, longonly);
> +    if (optind > argc)
> +        return optind--, ret;
>      if (resumed > skipped) {
>
> On a subsequent invocation we won't permute, unlike glibc, but maybe
> this is a good thing, given that such permutation makes it look like
> there is no missing argument, essentially changing the command
> semantics.
>
> Alexey
>
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html

OK, this is indeed a mess. I think there's some inherent inconsistency
here, and in general the application should not be calling getopt*
again after a missing argument error, but argv[] should not be
clobbered and the application might semi-legitimately want to do
something with remaining non-option arguments.

Just leaving optind indexing the end of the argv array is probably not
nice. It loses all information about where non-option arguments
started.

I think there are two "kinda reasonable" options aside from what you
proposed:

1. We could leave optind where it was on invocation (so that it points
   to the first non-option arg) and not do any permutation. This will
   make subsequent calls to getopt_long repeat the same error over and
   over, but if the caller does not attempt further calls, would tell
   the caller the start of the non-option args. However, the final
   option with missing argument would also appear in this list.

2. We could permute the option with missing argument before the
   remaining non-option args. I think this gives a final ordering
   matching glibc, and lets the application see all of the non-option
   args, without gratuitously including the option with missing arg.
   However, it does produce a result that re-running getopt_long from
   the start would misinterpret that option as having had an argument
   (repurposing the first non-option arg as its arg). Since glibc does
   this, though, apparently it's expected.

My leaning is to do option 2. I think it's as easy as getting rid of
the return part of your patch:

+    if (optind > argc)
+        optind--;

Rich
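Option 2 can be checked against the example with a small stand-alone model. This is a sketch under stated assumptions rather than the musl implementation: `rotate_to_front` and `permute_clamped` are hypothetical names, and the only thing taken from the thread is the `if (optind > argc) optind--;` clamp applied before the permutation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: move argv[src] down to argv[dest], shifting the
 * elements in between up by one slot. */
static void rotate_to_front(char **argv, int dest, int src)
{
	assert(dest <= src);
	char *tmp = argv[src];
	for (int i = src; i > dest; i--)
		argv[i] = argv[i - 1];
	argv[dest] = tmp;
}

/* Hypothetical model of option 2: pull optind back inside argv before
 * permuting, so the terminating NULL is never treated as consumed.
 * Returns the adjusted optind. */
static int permute_clamped(char **argv, int argc, int skipped, int resumed,
                           int optind_after)
{
	if (optind_after > argc)   /* missing argument: getopt() overshot */
		optind_after--;
	int cnt = optind_after - resumed;
	for (int i = 0; i < cnt; i++)
		rotate_to_front(argv, skipped, optind_after - 1);
	return skipped + cnt;
}
```

For `{"./a.out", "arg", "-o", NULL}` with argc = 3, `permute_clamped(argv, 3, 1, 2, 4)` moves only `-o` in front of `arg`, leaves NULL in the last slot, and returns 2. That is the same final ordering and optind that the glibc run shows after its second call, including the downside mentioned above: a fresh scan of the permuted argv would take `arg` as the argument of `-o`.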
* Re: [musl] getopt_long() can corrupt argv when an argument for a short option is missing
From: Alexey Izbyshev @ 2023-05-25 14:42 UTC
To: musl

On 2023-05-25 16:25, Rich Felker wrote:
> On Thu, May 25, 2023 at 10:53:09AM +0300, Alexey Izbyshev wrote:
>> POSIX requires getopt() to set optind to argc + 1 in case of a
>> missing argument[1], and musl follows it. This bites getopt_long()
>> (which reuses getopt()) in two ways:
>>
>> * getopt_long() moves argv[optind - 1] (NULL) when permuting argv to
>>   make all options precede other arguments, essentially corrupting
>>   argv.
>>
>> * even when permuting is not required, getopt_long() is both
>>   incompatible with glibc (which doesn't increment optind past NULL)
>>   and inconsistent with itself (for a long option with a missing
>>   argument, musl doesn't increment optind past NULL either).
>>
>> Example of the wrong NULL shifting:
>>
>> #include <getopt.h>
>> #include <stdio.h>
>>
>> int main(int argc, char *argv[]) {
>>     for (int i = 0; i < 2; i++) {
>>         int r = getopt_long(argc, argv, "o:", NULL, NULL);
>>         printf("r: %d\n", r);
>>         printf("optind: %d\n", optind);
>>         for (int i = 0; i <= argc; i++)
>>             printf("%d: '%s'\n", i, argv[i]);
>>     }
>> }
>>
>> With glibc:
>> $ ./a.out arg -o
>> ./a.out: option requires an argument -- 'o'
>> r: 63
>> optind: 3
>> 0: './a.out'
>> 1: 'arg'
>> 2: '-o'
>> 3: '(null)'
>> r: -1
>> optind: 2
>> 0: './a.out'
>> 1: '-o'
>> 2: 'arg'
>> 3: '(null)'
>>
>> (Note that glibc permutes argv *before* parsing the next option,
>> and even before comparing optind and argc, so argv is still permuted
>> on the second invocation.)
>>
>> With musl:
>> $ ./a.out arg -o
>> ./a.out: option requires an argument: o
>> r: 63
>> optind: 3
>> 0: './a.out'
>> 1: '-o'
>> 2: '(null)'
>> 3: 'arg'
>> r: -1
>> optind: 3
>> 0: './a.out'
>> 1: '-o'
>> 2: '(null)'
>> 3: 'arg'
>>
>> Maybe we could just skip permuting and adjust optind if we detected
>> a missing argument?
>>
>>      resumed = optind;
>>      ret = __getopt_long_core(argc, argv, optstring, longopts,
>>              idx, longonly);
>> +    if (optind > argc)
>> +        return optind--, ret;
>>      if (resumed > skipped) {
>>
>> On a subsequent invocation we won't permute, unlike glibc, but maybe
>> this is a good thing, given that such permutation makes it look like
>> there is no missing argument, essentially changing the command
>> semantics.
>>
>> Alexey
>>
>> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html
>
> OK, this is indeed a mess. I think there's some inherent inconsistency
> here, and in general the application should not be calling getopt*
> again after a missing argument error, but argv[] should not be
> clobbered and the application might semi-legitimately want to do
> something with remaining non-option arguments.
>
> Just leaving optind indexing the end of the argv array is probably not
> nice. It loses all information about where non-option arguments
> started.
>
> I think there are two "kinda reasonable" options aside from what you
> proposed:
>
> 1. We could leave optind where it was on invocation (so that it points
>    to the first non-option arg) and not do any permutation. This will
>    make subsequent calls to getopt_long repeat the same error over and
>    over, but if the caller does not attempt further calls, would tell
>    the caller the start of the non-option args. However, the final
>    option with missing argument would also appear in this list.
>
IMO, while not unreasonable, this option would leave us incompatible
with glibc (which I assume to be the source of truth for
getopt_long()). Also, either handling of long and short options would
remain inconsistent, or we'd have to change the former too, creating
even more incompatibility with glibc.

> 2. We could permute the option with missing argument before the
>    remaining non-option args. I think this gives a final ordering
>    matching glibc, and lets the application see all of the non-option
>    args, without gratuitously including the option with missing arg.
>    However, it does produce a result that re-running getopt_long from
>    the start would misinterpret that option as having had an argument
>    (repurposing the first non-option arg as its arg). Since glibc does
>    this, though, apparently it's expected.
>
> My leaning is to do option 2. I think it's as easy as getting rid of
> the return part of your patch:
>
> +    if (optind > argc)
> +        optind--;
>
This is what I considered before changing to what I proposed. The
reason for the change is that I thought it's more important to match
glibc on the getopt_long() invocation that reports a missing argument
(and does no reordering) than to mimic its subsequent reordering
behavior, because the application is unlikely to call getopt_long()
again after the first error.

However, in my patch I missed one thing: reordering would still be
performed in the same situation for long options (because
"optind > argc" is never true), so getopt_long() would remain
inconsistent.

So, unless we want to stop doing reordering for both short and long
options to match glibc on the first getopt_long() call, I agree that
your proposal is better.

Thanks,
Alexey
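Both messages assume that an application normally stops calling getopt_long() once an error is reported. A minimal sketch of that calling pattern follows. The helper name `parse_output_option`, the single `-o` option, and the empty long-option table are illustrative choices and do not come from the thread. Resetting optind to 0 to restart a scan is relied on here as an assumption about the target libc (it is the reset convention used by glibc and musl; other systems may need a different reset).

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical helper: parse a single "-o <value>" option.
 * Returns the index of the first non-option argument on success, or -1
 * on the first error, after which getopt_long() is not called again. */
static int parse_output_option(int argc, char **argv, const char **output)
{
	/* Empty long-option table (terminator only). */
	static const struct option no_long_opts[] = { { 0, 0, 0, 0 } };
	int c;

	*output = NULL;
	optind = 0;  /* assumption: 0 restarts scanning on glibc and musl */
	opterr = 0;  /* keep the library from printing its own diagnostics */

	while ((c = getopt_long(argc, argv, "o:", no_long_opts, NULL)) != -1) {
		if (c == 'o')
			*output = optarg;
		else
			return -1;  /* '?': unknown option or missing argument */
	}
	return optind;
}
```

A caller written this way never looks at optind or at the argv ordering after a failure, so it is unaffected by the differences between the proposals. Only code that inspects argv after the error, or calls getopt_long() again, would observe them.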