mailing list of musl libc
From: Denys Vlasenko <vda.linux@googlemail.com>
To: musl <musl@lists.openwall.com>, Rich Felker <dalias@libc.org>,
	 Waldemar Brodkorb <wbx@openadk.org>
Subject: getopt() not exposing __optpos - shell needs it
Date: Mon, 28 Aug 2017 12:18:57 +0200
Message-ID: <CAK1hOcNk95zZeEaWCX0irwZkAnixYjTsU=GMk8Z9153ZwE9rng@mail.gmail.com>

I am using getopt() in busybox hush shell.
"unset" builtin, for example: it takes -v and -f options.
This works fine.

However, POSIX requires that shells have a "getopts" builtin:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html

It is basically an API binding to access getopt() in the shell code:
it uses OPTIND and (in bash) OPTERR on entry, returns a single-char
variable on return and updates OPTIND and OPTARG. Sounds familiar, right?

When I try to do that (use getopt() to implement "getopts"), it hits a snag.
Unlike normal getopt() usage in C programs, where it is called in a loop
with the same argv[] array until parsing is finished,
when it is used from "getopts", each successive call will (usually) have
the same argv[] CONTENTS, but not the ADDRESSES.
(The reason is in how shell works: it re-creates command arguments just before
running a command, since there can be variable substitution, globbing, etc).

Worse yet, it's possible that between invocations of "getopts", there will be
calls to shell builtins which use getopt() internally. Example:

while getopts "abc" RES -a -bc -abc de; do
    unset -vf func
done

This would not work correctly: the getopt() call inside "unset" would modify
the internal libc state which tracks the position in multi-option strings a la
"-abc". At best, it can skip options. If libc does not check that the position
is not beyond strlen(argv[i]), it can return garbage. (With the glibc
implementation of getopt(), it would use outright invalid pointers and return
garbage even _without_ "unset" mangling the internal state).

The reason for this is that the internal state is not fully exposed. Only the
current argv[i] index is visible (and modifiable), not the position
inside it:

int getopt(int argc, char **argv, const char *optstring);
extern int optind;   // <== internal state

My current "getopts" code preserves optind value across "getopts" calls,
keeping it in $OPTIND as POSIX intends all along:

        cp = get_local_var_value("OPTIND");
        optind = cp ? atoi(cp) : 0;
        optarg = NULL;

        c = getopt(string_array_len(argv), argv, optstring);

        /* Set OPTARG */
        cp = optarg;
        if (cp)
                set_local_var_from_halves("OPTARG", cp);
        else
                unset_local_var("OPTARG");
        /* Convert -1 to "?" */
        exitcode = EXIT_SUCCESS;
        if (c < 0) { /* -1: end of options */
                exitcode = EXIT_FAILURE;
                c = '?';
        }
        /* Set VAR and OPTIND */
        cbuf[0] = c;
        set_local_var_from_halves(var, cbuf);
        set_local_var_from_halves("OPTIND", utoa(optind));

However, the position inside argv[OPTIND] cannot be preserved across calls:
it is not exposed by libc.

Musl implementation is pretty simple: it has the "int __optpos" variable,
which holds this information.

I propose to export it along with optind et al:

-extern int optind, opterr, optopt;
+extern int optind, __optpos, opterr, optopt;

I know that the general pushback to such ideas is "this is not standard".

Well, the standard has a defect here: it was designed with only
single-option strings in mind (-a -b, not -ab). Now multi-options
are a must for any meaningful compatibility. Thus, libc must have additional
internal state. Without extending the API a bit, it's impossible to use getopt()
for "getopts" (which seems to be what the _shell_ standard intends); instead,
shell C code is forced to reimplement getopt().
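
For illustration only, here is a minimal sketch of what such a
reimplementation might look like when the position inside the current
argument is part of caller-owned state. The names "struct opt_state" and
"opt_next" are hypothetical (they are not from busybox or any libc), and the
sketch assumes single-byte options, no argv[] permutation, no diagnostics,
and no special handling of a leading ':' in optstring:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state struct: what getopt() keeps hidden, made explicit. */
struct opt_state {
    int ind;         /* index of the argv[] element being parsed (cf. optind) */
    int pos;         /* byte offset inside argv[ind]; 0 = start a new element */
    const char *arg; /* option argument, or NULL (cf. optarg) */
    int opt;         /* offending option character on '?' (cf. optopt) */
};

/* Returns the option character, '?' for an unknown option or a missing
 * argument, and -1 at the end of options. */
int opt_next(struct opt_state *st, int argc, char *const argv[],
             const char *optstring)
{
    const char *a, *spec;
    int c, at_end;

    assert(st != NULL && argv != NULL && optstring != NULL);
    st->arg = NULL;
    if (st->ind < 1) {              /* 0 (or a negative value) restarts the scan */
        st->ind = 1;
        st->pos = 0;
    }
    /* A stale offset (the argument got shorter, or pos is negative and
     * wraps when cast to size_t) moves on to the next element instead of
     * reading past the end of the string. */
    if (st->ind < argc && st->pos != 0
        && (size_t)st->pos >= strlen(argv[st->ind])) {
        st->ind++;
        st->pos = 0;
    }
    if (st->ind >= argc)
        return -1;

    a = argv[st->ind];
    if (st->pos == 0) {
        if (a[0] != '-' || a[1] == '\0')
            return -1;              /* operand or lone "-": stop */
        if (a[1] == '-' && a[2] == '\0') {
            st->ind++;              /* "--" ends the options and is consumed */
            return -1;
        }
        st->pos = 1;                /* skip the leading '-' */
    }

    c = (unsigned char)a[st->pos++];
    at_end = (a[st->pos] == '\0');
    spec = (c == ':') ? NULL : strchr(optstring, c);

    if (spec == NULL) {             /* unknown option character */
        st->opt = c;
        if (at_end) { st->ind++; st->pos = 0; }
        return '?';
    }
    if (spec[1] == ':') {           /* option takes an argument */
        if (!at_end) {
            st->arg = a + st->pos;  /* "-fvalue" form */
            st->ind++;
        } else if (st->ind + 1 < argc) {
            st->arg = argv[st->ind + 1];    /* "-f value" form */
            st->ind += 2;
        } else {
            st->opt = c;            /* argument is missing */
            st->ind++;
            st->pos = 0;
            return '?';
        }
        st->pos = 0;
        return c;
    }
    if (at_end) { st->ind++; st->pos = 0; }
    return c;
}
```

Because both "ind" and "pos" live in the caller's struct, a shell could keep
them alongside $OPTIND between "getopts" calls, and a builtin such as "unset"
could parse its own options with a separate struct without disturbing them.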

This makes me think that "optpos" should be added, even if it's "not standard".
This is how standards evolve: real-world use discovers deficiencies, APIs
are fixed, then these changes trickle into the next revisions of standards.

(Optionally, musl may want to add some robustification here:

        if (!optpos) optpos++;
        if ((k = mbtowc(&c, argv[optind]+optpos, MB_LEN_MAX)) < 0) {

to prevent "argv[optind]+optpos" from pointing past the end of the string.)
(This can only happen with fairly pathological cases when user changes
argv[] strings mid-parsing, but still).

glibc/uclibc need more extensive changes, since they use a pointer to store
the position. This is a test in busybox tree which fails miserably
because of that:

https://git.busybox.net/busybox/tree/shell/hush_test/hush-getopts/getopt_test_libc_bug.tests
# This test can fail with libc with buggy getopt() implementation.
# If getopt() wants to parse multi-option args (-abc),
# it needs to remember a position within current arg.
#
# If this position is kept as a POINTER, not an offset,
# and if argv[] ADDRESSES (not contents!) change, it blows up.
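
The pointer-versus-offset distinction described in that comment can be
sketched in isolation. The types and helpers below ("ptr_cursor",
"off_cursor" and their read functions) are hypothetical and exist only for
this illustration; they are not taken from glibc, uclibc or musl:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Position kept as a pointer: tied to one particular copy of the string. */
struct ptr_cursor { const char *next; };

/* Position kept as an offset: valid for any copy with the same contents. */
struct off_cursor { size_t next; };

/* Reads through the saved pointer, i.e. from whatever buffer it was
 * originally taken from, even if argv[] has been rebuilt since. */
char ptr_cursor_read(struct ptr_cursor *c)
{
    assert(c != NULL && c->next != NULL);
    return *c->next ? *c->next++ : '\0';
}

/* Reads from the string the caller passes in now, and checks the offset
 * against its current length before dereferencing. */
char off_cursor_read(struct off_cursor *c, const char *arg)
{
    assert(c != NULL && arg != NULL);
    if (c->next >= strlen(arg))
        return '\0';
    return arg[c->next++];
}
```

If the shell rebuilds "-abc" at a new address and the old buffer is reused for
something else, the pointer cursor keeps reading the old memory, while the
offset cursor continues correctly on the new copy.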

