mailing list of musl libc
From: Alexey Izbyshev <izbyshev@ispras.ru>
To: musl@lists.openwall.com
Subject: Re: [musl] getopt_long() can corrupt argv when an argument for a short option is missing
Date: Thu, 25 May 2023 17:42:57 +0300
Message-ID: <3beea6f283fcbe9ea08e6579347e0af6@ispras.ru>
In-Reply-To: <20230525132557.GI4163@brightrain.aerifal.cx>

On 2023-05-25 16:25, Rich Felker wrote:
> On Thu, May 25, 2023 at 10:53:09AM +0300, Alexey Izbyshev wrote:
>> POSIX requires getopt() to set optind to argc + 1 in case of a
>> missing argument[1], and musl follows it. This bites getopt_long()
>> (which reuses getopt()) in two ways:
>> 
>> * getopt_long() moves argv[optind - 1] (NULL) when permuting argv to
>> make all options precede other arguments, essentially corrupting
>> argv.
>> 
>> * even when permuting is not required, getopt_long() is both
>> incompatible with glibc (which doesn't increment optind past NULL)
>> and inconsistent with itself (for a long option with a missing
>> argument, musl doesn't increment optind past NULL either).
>> 
>> Example of the wrong NULL shifting:
>> 
>> #include <getopt.h>
>> #include <stdio.h>
>> 
>> int main(int argc, char *argv[]) {
>>     for (int i = 0; i < 2; i++) {
>>         int r = getopt_long(argc, argv, "o:", NULL, NULL);
>>         printf("r: %d\n", r);
>>         printf("optind: %d\n", optind);
>>         for (int i = 0; i <= argc; i++)
>>             printf("%d: '%s'\n", i, argv[i]);
>>     }
>> }
>> 
>> With glibc:
>> $ ./a.out arg -o
>> ./a.out: option requires an argument -- 'o'
>> r: 63
>> optind: 3
>> 0: './a.out'
>> 1: 'arg'
>> 2: '-o'
>> 3: '(null)'
>> r: -1
>> optind: 2
>> 0: './a.out'
>> 1: '-o'
>> 2: 'arg'
>> 3: '(null)'
>> 
>> (Note that glibc permutes argv *before* parsing the next option,
>> and even before comparing optind and argc, so argv is still permuted
>> on the second invocation.)
>> 
>> With musl:
>> $ ./a.out arg -o
>> ./a.out: option requires an argument: o
>> r: 63
>> optind: 3
>> 0: './a.out'
>> 1: '-o'
>> 2: '(null)'
>> 3: 'arg'
>> r: -1
>> optind: 3
>> 0: './a.out'
>> 1: '-o'
>> 2: '(null)'
>> 3: 'arg'
>> 
>> Maybe we could just skip permuting and adjust optind if we detected
>> a missing argument?
>> 
>>         resumed = optind;
>>         ret = __getopt_long_core(argc, argv, optstring, longopts, idx, longonly);
>> +       if (optind > argc)
>> +               return optind--, ret;
>>         if (resumed > skipped) {
>> 
>> On a subsequent invocation we won't permute, unlike glibc, but maybe
>> this is a good thing, given that such permutation makes it look like
>> there is no missing argument, essentially changing the command
>> semantics.
>> 
>> Alexey
>> 
>> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html
> 
> OK, this is indeed a mess. I think there's some inherent inconsistency
> here, and in general the application should not be calling getopt*
> again after a missing argument error, but argv[] should not be
> clobbered and the application might semi-legitimately want to do
> something with remaining non-option arguments.
> 
> Just leaving optind indexing the end of the argv array is probably not
> nice. It loses all information about where non-option arguments
> started.
> 
> I think there are two "kinda reasonable" options aside from what you
> proposed:
> 
> 1. We could leave optind where it was on invocation (so that it points
>    to the first non-option arg) and not do any permutation. This will
>    make subsequent calls to getopt_long repeat the same error over and
>    over, but if the caller does not attempt further calls, would tell
>    the caller the start of the non-option args. However, the final
>    option with missing argument would also appear in this list.
> 
IMO, while not unreasonable, this option would leave us incompatible 
with glibc (which I assume to be the source of truth for getopt_long()).

Also, either the handling of long and short options would remain 
inconsistent, or we'd have to change the former too, creating even more 
incompatibility with glibc.

> 2. We could permute the option with missing argument before the
>    remaining non-option args. I think this gives a final ordering
>    matching glibc, and lets the application see all of the non-option
>    args, without gratuitously including the option with missing arg.
>    However, it does produce a result that re-running getopt_long from
>    the start would misinterpret that option as having had an argument
>    (repurposing the first non-option arg as its arg). Since glibc does
>    this, though, apparently it's expected.
> 
> My leaning is to do option 2. I think it's as easy as getting rid of
> the return part of your patch:
> 
> +       if (optind > argc)
> +               optind--;
> 
This is what I considered before switching to what I proposed. The reason 
for the change is that I thought it was more important to match glibc on 
the getopt_long() invocation that reports a missing argument (and does 
no reordering) than to mimic its subsequent reordering behavior, because 
the application is unlikely to call getopt_long() again after the first 
error.
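
To illustrate the calling pattern assumed here, a minimal sketch of a caller that stops at the first error instead of invoking getopt_long() again. The helper name parse_stop_on_error and the "--opt" long option are hypothetical, and setting optind to 0 to request a fresh scan is an assumption that should hold for both glibc and musl:

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical helper, not part of any libc: parse options and stop at the
 * first error. Returns the index of the first non-option argument, or -1
 * if getopt_long() reported an error. */
int parse_stop_on_error(int argc, char **argv)
{
    /* Hypothetical long option "--opt", equivalent to "-o ARG". */
    static const struct option longopts[] = {
        { "opt", required_argument, NULL, 'o' },
        { NULL, 0, NULL, 0 }
    };
    int r;

    optind = 0; /* assumption: 0 requests a full rescan on glibc and musl */
    opterr = 0; /* suppress the library's own diagnostics */

    while ((r = getopt_long(argc, argv, "o:", longopts, NULL)) != -1) {
        if (r == '?')
            return -1; /* do not call getopt_long() again after an error */
        /* r == 'o': optarg points at the argument; a real caller stores it */
    }
    return optind;
}
```

A caller written this way never observes the reordering done by a second call, only the state left by the call that reported the error.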

However, in my patch I missed one thing: reordering would still be 
performed in the same situation for long options (because "optind > 
argc" is never true), so getopt_long() would remain inconsistent.
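
A small probe along these lines could be used to compare the short and long cases side by side. This is only a sketch: probe_first_call, struct probe_result and the "--opt" option are hypothetical names, and the optind = 0 reset is assumed to work on both glibc and musl:

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical probe, not a libc API: what a single getopt_long() call
 * reports for a given command line. */
struct probe_result {
    int ret;          /* return value of getopt_long() */
    int optind_after; /* optind after the call */
    int optopt_after; /* optopt after the call */
};

struct probe_result probe_first_call(int argc, char **argv)
{
    /* Hypothetical long option "--opt" that requires an argument. */
    static const struct option longopts[] = {
        { "opt", required_argument, NULL, 'o' },
        { NULL, 0, NULL, 0 }
    };
    struct probe_result res;

    optind = 0; /* assumption: 0 requests a full rescan on glibc and musl */
    opterr = 0; /* suppress the library's own diagnostics */

    res.ret = getopt_long(argc, argv, "o:", longopts, NULL);
    res.optind_after = optind;
    res.optopt_after = optopt;
    return res;
}
```

Running it once on "prog arg -o" and once on "prog arg --opt" and comparing optind_after and the resulting argv order shows whether the two paths agree on a given libc. The exact values are implementation-dependent, so none are listed here.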

So, unless we want to stop doing reordering for both short and long 
options to match glibc on the first getopt_long() call, I agree that 
your proposal is better.
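
For illustration, the effect of the "optind--" adjustment can be modeled outside of libc. The sketch below is a simplified model, not the musl source: it assumes the reordering step rotates each consumed element down to the "skipped" position, and the names model_permute and model_reorder are hypothetical. The parameters mirror the "skipped", "resumed" and optind values discussed above.

```c
#include <stddef.h>

/* Model: move argv[src] down to argv[dest], shifting the elements in
 * between up by one slot. */
void model_permute(char **argv, int dest, int src)
{
    char *tmp = argv[src];
    for (int i = src; i > dest; i--)
        argv[i] = argv[i - 1];
    argv[dest] = tmp;
}

/* Model of the reordering done after the core parser returns.
 *   skipped:      index of the first skipped non-option argument
 *   resumed:      index at which option parsing resumed
 *   optind_after: optind as left by the core parser
 *   clamp:        nonzero applies "if (optind > argc) optind--;"
 * Returns the resulting optind. */
int model_reorder(int argc, char **argv, int skipped, int resumed,
                  int optind_after, int clamp)
{
    if (clamp && optind_after > argc)
        optind_after--;
    if (resumed > skipped) {
        int cnt = optind_after - resumed;
        for (int i = 0; i < cnt; i++)
            model_permute(argv, skipped, optind_after - 1);
        return skipped + cnt;
    }
    return optind_after;
}
```

For "./a.out arg -o" (argc = 3, skipped = 1, resumed = 2, optind left at 4), the model without the clamp moves NULL into argv[2] and returns 3, as in the musl output above. With the clamp it yields './a.out', '-o', 'arg', NULL and returns 2, which is the final ordering glibc arrives at.
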

Thanks,
Alexey


Thread overview: 3+ messages
2023-05-25  7:53 Alexey Izbyshev
2023-05-25 13:25 ` Rich Felker
2023-05-25 14:42   ` Alexey Izbyshev [this message]
