Date: Thu, 25 May 2023 09:25:57 -0400
From: Rich Felker
To: musl@lists.openwall.com
Message-ID: <20230525132557.GI4163@brightrain.aerifal.cx>
References: <4c82138762e69f64a1f95639090edbd8@ispras.ru>
In-Reply-To: <4c82138762e69f64a1f95639090edbd8@ispras.ru>
Subject: Re: [musl] getopt_long() can corrupt argv when an argument for a short option is missing

On Thu, May 25, 2023 at 10:53:09AM +0300, Alexey Izbyshev wrote:
> POSIX requires getopt() to set optind to argc + 1 in case of a
> missing argument[1], and musl follows it. This bites getopt_long()
> (which reuses getopt()) in two ways:
>
> * getopt_long() moves argv[optind - 1] (NULL) when permuting argv to
> make all options precede other arguments, essentially corrupting
> argv.
>
> * even when permuting is not required, getopt_long() is both
> incompatible with glibc (which doesn't increment optind past NULL)
> and inconsistent with itself (for a long option with a missing
> argument, musl doesn't increment optind past NULL too).
>
> Example of the wrong NULL shifting:
>
> #include <getopt.h>
> #include <stdio.h>
>
> int main(int argc, char *argv[]) {
>     for (int i = 0; i < 2; i++) {
>         int r = getopt_long(argc, argv, "o:", NULL, NULL);
>         printf("r: %d\n", r);
>         printf("optind: %d\n", optind);
>         for (int i = 0; i <= argc; i++)
>             printf("%d: '%s'\n", i, argv[i]);
>     }
> }
>
> With glibc:
> $ ./a.out arg -o
> ./a.out: option requires an argument -- 'o'
> r: 63
> optind: 3
> 0: './a.out'
> 1: 'arg'
> 2: '-o'
> 3: '(null)'
> r: -1
> optind: 2
> 0: './a.out'
> 1: '-o'
> 2: 'arg'
> 3: '(null)'
>
> (Note that glibc permutes argv *before* parsing the next option,
> and even before comparing optind and argc, so argv is still permuted
> on the second invocation.)
>
> With musl:
> $ ./a.out arg -o
> ./a.out: option requires an argument: o
> r: 63
> optind: 3
> 0: './a.out'
> 1: '-o'
> 2: '(null)'
> 3: 'arg'
> r: -1
> optind: 3
> 0: './a.out'
> 1: '-o'
> 2: '(null)'
> 3: 'arg'
>
> Maybe we could just skip permuting and adjust optind if we detected
> a missing argument?
>
>   resumed = optind;
>   ret = __getopt_long_core(argc, argv, optstring, longopts,
> idx, longonly);
> + if (optind > argc)
> +     return optind--, ret;
>   if (resumed > skipped) {
>
> On a subsequent invocation we won't permute, unlike glibc, but maybe
> this is a good thing, given that such permutation makes it look like
> there is no missing argument, essentially changing the command
> semantics.
>
> Alexey
>
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html

OK, this is indeed a mess. I think there's some inherent inconsistency
here, and in general the application should not be calling getopt*
again after a missing argument error, but argv[] should not be
clobbered and the application might semi-legitimately want to do
something with remaining non-option arguments.

Just leaving optind indexing the end of the argv array is probably not
nice. It loses all information about where non-option arguments
started.

I think there are two "kinda reasonable" options aside from what you
proposed:

1. We could leave optind where it was on invocation (so that it points
   to the first non-option arg) and not do any permutation. This will
   make subsequent calls to getopt_long repeat the same error over and
   over, but if the caller does not attempt further calls, it would
   tell the caller the start of the non-option args. However, the
   final option with missing argument would also appear in this list.

2. We could permute the option with missing argument before the
   remaining non-option args. I think this gives a final ordering
   matching glibc, and lets the application see all of the non-option
   args, without gratuitously including the option with missing arg.
   However, it does produce a result in which re-running getopt_long
   from the start would misinterpret that option as having had an
   argument (repurposing the first non-option arg as its arg). Since
   glibc does this, though, apparently it's expected.

My leaning is to do option 2. I think it's as easy as getting rid of
the return part of your patch:

+ if (optind > argc)
+     optind--;

Rich