mailing list of musl libc
 help / color / mirror / code / Atom feed
* getopt() not exposing __optpos - shell needs it
@ 2017-08-28 10:18 Denys Vlasenko
  2017-08-28 15:17 ` Denys Vlasenko
  2017-08-28 15:28 ` Rich Felker
  0 siblings, 2 replies; 9+ messages in thread
From: Denys Vlasenko @ 2017-08-28 10:18 UTC (permalink / raw)
  To: musl, Rich Felker, Waldemar Brodkorb

I am using getopt() in busybox hush shell.
"unset" builtin, for example: it takes -v and -f options.
This works fine.

However, POSIX requires that shells has a "getopts" builtin:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html

It is basically an API binding to access getopt() in the shell code:
it uses OPTIND and (in bash) OPTERR on entry, returns a single-char
variable on return and updates OPTIND and OPTARG. Sounds familiar, right?

When I try to do that (use getopt() to implement "getopts"), it hits a snag.
Unlike normal getopt() usage in C programs, where it is called in a loop
with the same argv[] array until parsing is finished,
when it is used from "getopts", each successive call will (usually) have
the same argv[] CONTENTS, but not the ADDRESSES.
(The reason is in how shell works: it re-creates command arguments just before
running a command, since there can be variable substitution, globbing, etc).

Worse yet, it's possible that between invocations of "getopts", there will be
calls to shell builtins which use getopt() intenally. Example:

while getopts "abc" RES -a -bc -abc de; do
    unset -vf func
done

This would not work correctly: getopt() call inside "unset" would modifies
internal libc state which is tracking position in multi-option strings a-la
"-abc". At best, it can skip options. If libc does not check that position is
not beyond strlen(argv[i]), it can return garbage. (With glibc implementation
of getopt(), it would use outright invalid pointers and return garbage
even _without_ "unset" mangling internal state).

The reason for this is that internal state is not fully exposed. It only
shows (and allows modification) of the current argv[i] index, not the
position inside it:

int getopt(int argc, char **argv, const char *optstring);
extern int optind;   // <== internal state

My current "getopts" code preserves optind value across "getopts" calls,
keeping it in $OPTIND as POSIX intends all along:

        cp = get_local_var_value("OPTIND");
        optind = cp ? atoi(cp) : 0;
        optarg = NULL;

        c = getopt(string_array_len(argv), argv, optstring);

        /* Set OPTARG */
        cp = optarg;
        if (cp)
                set_local_var_from_halves("OPTARG", cp);
        else
                unset_local_var("OPTARG");
        /* Convert -1 to "?" */
        exitcode = EXIT_SUCCESS;
        if (c < 0) { /* -1: end of options */
                exitcode = EXIT_FAILURE;
                c = '?';
        }
        /* Set VAR and OPTIND */
        cbuf[0] = c;
        set_local_var_from_halves(var, cbuf);
        set_local_var_from_halves("OPTIND", utoa(optind));

However, position inside argv[OPTIND] can not be preserved across calls:
it is not exposed by libc.

Musl implementation is pretty simple: it has the "int __optpos" variable,
which holds this information.

I propose to export it along with optind et al:

-extern int optind, opterr, optopt;
+extern int optind, __optpos, opterr, optopt;

I know that the general pushback to such ideas is "this is not standard".

Well, the standard is having a defect here: it was designed with only
single-option strings in mind (-a -b, not -ab). Now multi-options
are a must for any meaningful compatibility. Thus, libc must have additional
internal state. Without extending API a bit, it's impossible to use getopt()
for "getopts" (which seems to be what _shell_ standard intends), instead,
shell C code is forced to reimplement getopt().

This makes me think that "optpos" should be added. even if it's "not standard".
This is how standards evolve: real-world use discover deficiencies, APIs
are fixed, then these changes trickle into next revisions of standards.

(Optionally, musl may want to add some robustification here:

        if (!optpos) optpos++;
        if ((k = mbtowc(&c, argv[optind]+optpos, MB_LEN_MAX)) < 0) {

to prevent "argv[optind]+optpos" to point past the end of the string.)
(This can only happen with fairly pathological cases when user changes
argv[] strings mid-parsing, but still).

glibc/uclibc need more extensive changes, since they use a pointer to store
the position. This is a test in busybox tree which fails miserably
because of that:

https://git.busybox.net/busybox/tree/shell/hush_test/hush-getopts/getopt_test_libc_bug.tests
# This test can fail with libc with buggy getopt() implementation.
# If getopt() wants to parse multi-option args (-abc),
# it needs to remember a position within current arg.
#
# If this position is kept as a POINTER, not an offset,
# and if argv[] ADDRESSES (not contents!) change, it blows up.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: getopt() not exposing __optpos - shell needs it
  2017-08-28 10:18 getopt() not exposing __optpos - shell needs it Denys Vlasenko
@ 2017-08-28 15:17 ` Denys Vlasenko
  2017-08-28 15:28 ` Rich Felker
  1 sibling, 0 replies; 9+ messages in thread
From: Denys Vlasenko @ 2017-08-28 15:17 UTC (permalink / raw)
  To: musl, Rich Felker, Waldemar Brodkorb

[-- Attachment #1: Type: text/plain, Size: 722 bytes --]

On Mon, Aug 28, 2017 at 12:18 PM, Denys Vlasenko
<vda.linux@googlemail.com> wrote:
> glibc/uclibc need more extensive changes, since they use a pointer to store
> the position. This is a test in busybox tree which fails miserably
> because of that:
>
> https://git.busybox.net/busybox/tree/shell/hush_test/hush-getopts/getopt_test_libc_bug.tests
> # This test can fail with libc with buggy getopt() implementation.
> # If getopt() wants to parse multi-option args (-abc),
> # it needs to remember a position within current arg.
> #
> # If this position is kept as a POINTER, not an offset,
> # and if argv[] ADDRESSES (not contents!) change, it blows up.


uclibc would need something like this patch (not tested at all):

[-- Attachment #2: z.diff --]
[-- Type: text/plain, Size: 12275 bytes --]

diff --git a/libc/sysdeps/linux/common/bits/getopt.h b/libc/sysdeps/linux/common/bits/getopt.h
index dababe0..c4f5ab7 100644
--- a/libc/sysdeps/linux/common/bits/getopt.h
+++ b/libc/sysdeps/linux/common/bits/getopt.h
@@ -48,6 +48,7 @@ extern char *optarg;
    how much of ARGV has been scanned so far.  */
 
 extern int optind;
+extern int __optpos; /* position within current "-abc" multiopt string */
 
 /* Callers store zero here to inhibit the error message `getopt' prints
    for unrecognized options.  */
diff --git a/libc/unistd/getopt.c b/libc/unistd/getopt.c
index 9d2df34..3c7e6c5 100644
--- a/libc/unistd/getopt.c
+++ b/libc/unistd/getopt.c
@@ -106,6 +106,8 @@ char *optarg;
 /* 1003.2 says this must be 1 before any call.  */
 int optind = 1;
 
+int __optpos = 0;
+
 /* Callers store zero here to inhibit the error message
    for unrecognized options.  */
 
@@ -261,7 +263,7 @@ _getopt_initialize (attribute_unused int argc, attribute_unused char *const *arg
 
   d->__first_nonopt = d->__last_nonopt = d->optind;
 
-  d->__nextchar = NULL;
+  d->optpos = 0;
 
   d->__posixly_correct = !!getenv ("POSIXLY_CORRECT");
 
@@ -325,7 +327,7 @@ _getopt_initialize (attribute_unused int argc, attribute_unused char *const *arg
    from each of the option elements.
 
    If `getopt' finds another option character, it returns that character,
-   updating `optind' and `nextchar' so that the next call to `getopt' can
+   updating `optind' and `optpos' so that the next call to `getopt' can
    resume the scan with the following option character or ARGV-element.
 
    If there are no more option characters, `getopt' returns -1.
@@ -376,6 +378,7 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 		    const struct option *longopts, int *longind,
 		    int long_only, struct _getopt_data *d)
 {
+  const char *nextchar;
   int print_errors = d->opterr;
   if (optstring[0] == ':')
     print_errors = 0;
@@ -405,9 +408,17 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 # define NONOPTION_P (argv[d->optind][0] != '-' || argv[d->optind][1] == '\0')
 #endif
 
-  if (d->__nextchar == NULL || *d->__nextchar == '\0')
+  nextchar = "";
+  if (d->optind < argc && d->optpos > 0)
+    /* Could be simply "nextchar = argv[d->optind] + d->optpos"
+     * but let's be paranoid about optpos pointing beyond nul byte.
+     */
+    nextchar = argv[d->optind] + strnlen(argv[d->optind], d->optpos);
+
+  if (*nextchar == '\0')
     {
       /* Advance to the next ARGV-element.  */
+      /* (d->optind already points to the next argv[i]) */
 
       /* Give FIRST_NONOPT & LAST_NONOPT rational values if OPTIND has been
 	 moved back by the user (who may also have changed the arguments).  */
@@ -480,8 +491,8 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
       /* We have found another option-ARGV-element.
 	 Skip the initial punctuation.  */
 
-      d->__nextchar = (argv[d->optind] + 1
-		  + (longopts != NULL && argv[d->optind][1] == '-'));
+      d->optpos = (1 + (longopts != NULL && argv[d->optind][1] == '-'));
+      nextchar = argv[d->optind] + d->optpos;
     }
 
   /* Decode the current option-ARGV-element.  */
@@ -512,15 +523,15 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
       int indfound = -1;
       int option_index;
 
-      for (nameend = d->__nextchar; *nameend && *nameend != '='; nameend++)
+      for (nameend = nextchar; *nameend && *nameend != '='; nameend++)
 	/* Do nothing.  */ ;
 
       /* Test all long options for either exact match
 	 or abbreviated matches.  */
       for (p = longopts, option_index = 0; p->name; p++, option_index++)
-	if (!strncmp (p->name, d->__nextchar, nameend - d->__nextchar))
+	if (!strncmp (p->name, nextchar, nameend - nextchar))
 	  {
-	    if ((unsigned int) (nameend - d->__nextchar)
+	    if ((unsigned int) (nameend - nextchar)
 		== (unsigned int) strlen (p->name))
 	      {
 		/* Exact match found.  */
@@ -570,7 +581,7 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 		       argv[0], argv[d->optind]);
 #endif
 	    }
-	  d->__nextchar += strlen (d->__nextchar);
+	  d->optpos = 0;
 	  d->optind++;
 	  d->optopt = 0;
 	  return '?';
@@ -643,7 +654,7 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 #endif
 		    }
 
-		  d->__nextchar += strlen (d->__nextchar);
+		  d->optpos = 0;
 
 		  d->optopt = pfound->val;
 		  return '?';
@@ -683,12 +694,12 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 			       argv[0], argv[d->optind - 1]);
 #endif
 		    }
-		  d->__nextchar += strlen (d->__nextchar);
+		  d->optpos = 0;
 		  d->optopt = pfound->val;
 		  return optstring[0] == ':' ? ':' : '?';
 		}
 	    }
-	  d->__nextchar += strlen (d->__nextchar);
+	  d->optpos = 0;
 	  if (longind != NULL)
 	    *longind = option_index;
 	  if (pfound->flag)
@@ -704,7 +715,7 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 	 option, then it's an error.
 	 Otherwise interpret it as a short option.  */
       if (!long_only || argv[d->optind][1] == '-'
-	  || strchr (optstring, *d->__nextchar) == NULL)
+	  || strchr (optstring, *nextchar) == NULL)
 	{
 	  if (print_errors)
 	    {
@@ -718,10 +729,10 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 		  /* --option */
 #if defined _LIBC && defined USE_IN_LIBIO
 		  n = __asprintf (&buf, "%s: unrecognized option `--%s'\n",
-				  argv[0], d->__nextchar);
+				  argv[0], nextchar);
 #else
 		  fprintf (stderr, "%s: unrecognized option `--%s'\n",
-			   argv[0], d->__nextchar);
+			   argv[0], nextchar);
 #endif
 		}
 	      else
@@ -729,10 +740,10 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 		  /* +option or -option */
 #if defined _LIBC && defined USE_IN_LIBIO
 		  n = __asprintf (&buf, "%s: unrecognized option `%c%s'\n",
-				  argv[0], argv[d->optind][0], d->__nextchar);
+				  argv[0], argv[d->optind][0], nextchar);
 #else
 		  fprintf (stderr, "%s: unrecognized option `%c%s'\n",
-			   argv[0], argv[d->optind][0], d->__nextchar);
+			   argv[0], argv[d->optind][0], nextchar);
 #endif
 		}
 
@@ -753,7 +764,7 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 		}
 #endif
 	    }
-	  d->__nextchar = (char *) "";
+	  d->optpos = 0;
 	  d->optind++;
 	  d->optopt = 0;
 	  return '?';
@@ -763,11 +774,12 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
   /* Look at and handle the next short option-character.  */
 
   {
-    char c = *d->__nextchar++;
+    char c = *nextchar++;
     char *temp = strchr (optstring, c);
+    d->optpos++;
 
     /* Increment `optind' when we start to process its last character.  */
-    if (*d->__nextchar == '\0')
+    if (*nextchar == '\0')
       ++d->optind;
 
     if (temp == NULL || c == ':')
@@ -832,9 +844,9 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 	int option_index;
 
 	/* This is an option that requires an argument.  */
-	if (*d->__nextchar != '\0')
+	if (*nextchar != '\0')
 	  {
-	    d->optarg = d->__nextchar;
+	    d->optarg = nextchar;
 	    /* If we end this ARGV-element by taking the rest as an arg,
 	       we must advance to the next element now.  */
 	    d->optind++;
@@ -879,20 +891,21 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 	  /* We already incremented `d->optind' once;
 	     increment it again when taking next ARGV-elt as argument.  */
 	  d->optarg = argv[d->optind++];
+	d->optpos = 0;
 
 	/* optarg is now the argument, see if it's in the
 	   table of longopts.  */
 
-	for (d->__nextchar = nameend = d->optarg; *nameend && *nameend != '=';
+	for (nextchar = nameend = d->optarg; *nameend && *nameend != '=';
 	     nameend++)
 	  /* Do nothing.  */ ;
 
 	/* Test all long options for either exact match
 	   or abbreviated matches.  */
 	for (p = longopts, option_index = 0; p->name; p++, option_index++)
-	  if (!strncmp (p->name, d->__nextchar, nameend - d->__nextchar))
+	  if (!strncmp (p->name, nextchar, nameend - nextchar))
 	    {
-	      if ((unsigned int) (nameend - d->__nextchar) == strlen (p->name))
+	      if ((unsigned int) (nameend - nextchar) == strlen (p->name))
 		{
 		  /* Exact match found.  */
 		  pfound = p;
@@ -937,7 +950,7 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 			 argv[0], argv[d->optind]);
 #endif
 	      }
-	    d->__nextchar += strlen (d->__nextchar);
+	    d->optpos = 0;
 	    d->optind++;
 	    return '?';
 	  }
@@ -981,7 +994,7 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 #endif
 		      }
 
-		    d->__nextchar += strlen (d->__nextchar);
+		    d->optpos = 0;
 		    return '?';
 		  }
 	      }
@@ -1019,11 +1032,11 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 				 argv[0], argv[d->optind - 1]);
 #endif
 		      }
-		    d->__nextchar += strlen (d->__nextchar);
+		    d->optpos = 0;
 		    return optstring[0] == ':' ? ':' : '?';
 		  }
 	      }
-	    d->__nextchar += strlen (d->__nextchar);
+	    d->optpos = 0;
 	    if (longind != NULL)
 	      *longind = option_index;
 	    if (pfound->flag)
@@ -1033,7 +1046,7 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 	      }
 	    return pfound->val;
 	  }
-	  d->__nextchar = NULL;
+	  d->optpos = 0;
 	  return 'W';	/* Let the application handle it.   */
       }
 #endif
@@ -1042,21 +1055,21 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 	if (temp[2] == ':')
 	  {
 	    /* This is an option that accepts an argument optionally.  */
-	    if (*d->__nextchar != '\0')
+	    if (*nextchar != '\0')
 	      {
-		d->optarg = d->__nextchar;
+		d->optarg = nextchar;
 		d->optind++;
 	      }
 	    else
 	      d->optarg = NULL;
-	    d->__nextchar = NULL;
+	    d->optpos = 0;
 	  }
 	else
 	  {
 	    /* This is an option that requires an argument.  */
-	    if (*d->__nextchar != '\0')
+	    if (*nextchar != '\0')
 	      {
-		d->optarg = d->__nextchar;
+		d->optarg = nextchar;
 		/* If we end this ARGV-element by taking the rest as an arg,
 		   we must advance to the next element now.  */
 		d->optind++;
@@ -1101,7 +1114,7 @@ _getopt_internal_r (int argc, char *const *argv, const char *optstring,
 	      /* We already incremented `optind' once;
 		 increment it again when taking next ARGV-elt as argument.  */
 	      d->optarg = argv[d->optind++];
-	    d->__nextchar = NULL;
+	    d->optpos = 0;
 	  }
       }
     return c;
@@ -1115,12 +1128,14 @@ _getopt_internal (int argc, char *const *argv, const char *optstring,
   int result;
 
   getopt_data.optind = optind;
+  getopt_data.optpos = __optpos;
   getopt_data.opterr = opterr;
 
   result = _getopt_internal_r (argc, argv, optstring, longopts,
 			       longind, long_only, &getopt_data);
 
   optind = getopt_data.optind;
+  __optpos = getopt_data.optpos;
   optarg = getopt_data.optarg;
   optopt = getopt_data.optopt;
 
diff --git a/libc/unistd/getopt_int.h b/libc/unistd/getopt_int.h
index a70fa8b..4c03fe0 100644
--- a/libc/unistd/getopt_int.h
+++ b/libc/unistd/getopt_int.h
@@ -42,6 +42,13 @@ struct _getopt_data
      variables, except that they are used for the reentrant
      versions of getopt.  */
   int optind;
+  /* The next char to be scanned in the option-element
+     in which the last option character we returned was found.
+     This allows us to pick up the scan where we left off.
+
+     If this is zero, it means resume the scan
+     by advancing to the next ARGV-element.  */
+  int optpos;
   int opterr;
   char *optarg;
   smalluint optopt; /* we store characters here, a byte is enough */
@@ -84,14 +91,6 @@ struct _getopt_data
   /* If the POSIXLY_CORRECT environment variable is set.  */
   smallint __posixly_correct;
 
-  /* The next char to be scanned in the option-element
-     in which the last option character we returned was found.
-     This allows us to pick up the scan where we left off.
-
-     If this is zero, or a null string, it means resume the scan
-     by advancing to the next ARGV-element.  */
-  char *__nextchar;
-
 
   /* Handle permutation of arguments.  */
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: getopt() not exposing __optpos - shell needs it
  2017-08-28 10:18 getopt() not exposing __optpos - shell needs it Denys Vlasenko
  2017-08-28 15:17 ` Denys Vlasenko
@ 2017-08-28 15:28 ` Rich Felker
  2017-08-29 11:32   ` Denys Vlasenko
  1 sibling, 1 reply; 9+ messages in thread
From: Rich Felker @ 2017-08-28 15:28 UTC (permalink / raw)
  To: musl

On Mon, Aug 28, 2017 at 12:18:57PM +0200, Denys Vlasenko wrote:
> I am using getopt() in busybox hush shell.
> "unset" builtin, for example: it takes -v and -f options.
> This works fine.
> 
> However, POSIX requires that shells has a "getopts" builtin:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html
> 
> It is basically an API binding to access getopt() in the shell code:
> it uses OPTIND and (in bash) OPTERR on entry, returns a single-char
> variable on return and updates OPTIND and OPTARG. Sounds familiar, right?
> 
> When I try to do that (use getopt() to implement "getopts"), it hits a snag.
> Unlike normal getopt() usage in C programs, where it is called in a loop
> with the same argv[] array until parsing is finished,
> when it is used from "getopts", each successive call will (usually) have
> the same argv[] CONTENTS, but not the ADDRESSES.
> (The reason is in how shell works: it re-creates command arguments just before
> running a command, since there can be variable substitution, globbing, etc).

First, some background out of the spec to establish what is supposed
to work and what's not:

    If the application sets OPTIND to the value 1, a new set of
    parameters can be used: either the current positional parameters
    or new arg values. Any other attempt to invoke getopts multiple
    times in a single shell execution environment with parameters
    (positional parameters or arg operands) that are not the same in
    all invocations, or with an OPTIND value modified to be a value
    other than 1, produces unspecified results.

What this means is that, when you use getopts(1), you need to either
use the exact same arguments (as you said, *string contents*, not
likely to be the same argv[] pointers) or reset it with OPTIND=1.

It seems to me that the easiest, fully-portable fix is just the
obvious quadratic-time solution: on each run of getopts(1), reset
getopt(3) to the start and call it ++N times.

A less costly portable solution would be to track the number of calls
since optind advanced, and only reset that far, so that the number of
getopt(3) calls is only quadratic in the number of 'flag-type' options
that are stuck together in a single argv[] entry.

Rich


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: getopt() not exposing __optpos - shell needs it
  2017-08-28 15:28 ` Rich Felker
@ 2017-08-29 11:32   ` Denys Vlasenko
  2017-08-29 12:20     ` Rich Felker
  0 siblings, 1 reply; 9+ messages in thread
From: Denys Vlasenko @ 2017-08-29 11:32 UTC (permalink / raw)
  To: musl, Rich Felker

On Mon, Aug 28, 2017 at 5:28 PM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Aug 28, 2017 at 12:18:57PM +0200, Denys Vlasenko wrote:
>> I am using getopt() in busybox hush shell.
>> "unset" builtin, for example: it takes -v and -f options.
>> This works fine.
>>
>> However, POSIX requires that shells has a "getopts" builtin:
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html
>>
>> It is basically an API binding to access getopt() in the shell code:
>> it uses OPTIND and (in bash) OPTERR on entry, returns a single-char
>> variable on return and updates OPTIND and OPTARG. Sounds familiar, right?
>>
>> When I try to do that (use getopt() to implement "getopts"), it hits a snag.
>> Unlike normal getopt() usage in C programs, where it is called in a loop
>> with the same argv[] array until parsing is finished,
>> when it is used from "getopts", each successive call will (usually) have
>> the same argv[] CONTENTS, but not the ADDRESSES.
>> (The reason is in how shell works: it re-creates command arguments just before
>> running a command, since there can be variable substitution, globbing, etc).
>
> First, some background out of the spec to establish what is supposed
> to work and what's not:
>
>     If the application sets OPTIND to the value 1, a new set of
>     parameters can be used: either the current positional parameters
>     or new arg values. Any other attempt to invoke getopts multiple
>     times in a single shell execution environment with parameters
>     (positional parameters or arg operands) that are not the same in
>     all invocations, or with an OPTIND value modified to be a value
>     other than 1, produces unspecified results.
>
> What this means is that, when you use getopts(1), you need to either
> use the exact same arguments (as you said, *string contents*, not
> likely to be the same argv[] pointers) or reset it with OPTIND=1.
>
> It seems to me that the easiest, fully-portable fix is just the
> obvious quadratic-time solution: on each run of getopts(1), reset
> getopt(3) to the start and call it ++N times.

This has several problems:
It prints multiple messages "invalid option -q"
when there are options which are not in optstring.
It mangles optarg if an option without argument follows
an option with an argument.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: getopt() not exposing __optpos - shell needs it
  2017-08-29 11:32   ` Denys Vlasenko
@ 2017-08-29 12:20     ` Rich Felker
  2017-08-29 12:47       ` Denys Vlasenko
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Felker @ 2017-08-29 12:20 UTC (permalink / raw)
  To: musl

On Tue, Aug 29, 2017 at 01:32:01PM +0200, Denys Vlasenko wrote:
> On Mon, Aug 28, 2017 at 5:28 PM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Aug 28, 2017 at 12:18:57PM +0200, Denys Vlasenko wrote:
> >> I am using getopt() in busybox hush shell.
> >> "unset" builtin, for example: it takes -v and -f options.
> >> This works fine.
> >>
> >> However, POSIX requires that shells has a "getopts" builtin:
> >> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html
> >>
> >> It is basically an API binding to access getopt() in the shell code:
> >> it uses OPTIND and (in bash) OPTERR on entry, returns a single-char
> >> variable on return and updates OPTIND and OPTARG. Sounds familiar, right?
> >>
> >> When I try to do that (use getopt() to implement "getopts"), it hits a snag.
> >> Unlike normal getopt() usage in C programs, where it is called in a loop
> >> with the same argv[] array until parsing is finished,
> >> when it is used from "getopts", each successive call will (usually) have
> >> the same argv[] CONTENTS, but not the ADDRESSES.
> >> (The reason is in how shell works: it re-creates command arguments just before
> >> running a command, since there can be variable substitution, globbing, etc).
> >
> > First, some background out of the spec to establish what is supposed
> > to work and what's not:
> >
> >     If the application sets OPTIND to the value 1, a new set of
> >     parameters can be used: either the current positional parameters
> >     or new arg values. Any other attempt to invoke getopts multiple
> >     times in a single shell execution environment with parameters
> >     (positional parameters or arg operands) that are not the same in
> >     all invocations, or with an OPTIND value modified to be a value
> >     other than 1, produces unspecified results.
> >
> > What this means is that, when you use getopts(1), you need to either
> > use the exact same arguments (as you said, *string contents*, not
> > likely to be the same argv[] pointers) or reset it with OPTIND=1.
> >
> > It seems to me that the easiest, fully-portable fix is just the
> > obvious quadratic-time solution: on each run of getopts(1), reset
> > getopt(3) to the start and call it ++N times.
> 
> This has several problems:
> It prints multiple messages "invalid option -q"
> when there are options which are not in optstring.

opterr=0;

Either leave it 0 and always do your own error printing, or set it
nonzero just before the last call (for the current option) so that
only that one prints an error.

> It mangles optarg if an option without argument follows
> an option with an argument.

Maybe I'm missing what you're trying to say, but all the state is
clobbered; I don't see how optarg is a problem specifically. You can
clear or set it to a sentinel value before the relevant call if you're
trying to determine if the call set it. Across other calls (not the
one for the current option) I don't see why it matters at all what
happens to it.

Rich


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: getopt() not exposing __optpos - shell needs it
  2017-08-29 12:20     ` Rich Felker
@ 2017-08-29 12:47       ` Denys Vlasenko
  2017-08-29 13:07         ` Rich Felker
  0 siblings, 1 reply; 9+ messages in thread
From: Denys Vlasenko @ 2017-08-29 12:47 UTC (permalink / raw)
  To: musl

On Tue, Aug 29, 2017 at 2:20 PM, Rich Felker <dalias@libc.org> wrote:
>> >> When I try to do that (use getopt() to implement "getopts"), it hits a snag.
>> >> Unlike normal getopt() usage in C programs, where it is called in a loop
>> >> with the same argv[] array until parsing is finished,
>> >> when it is used from "getopts", each successive call will (usually) have
>> >> the same argv[] CONTENTS, but not the ADDRESSES.
>> >> (The reason is in how shell works: it re-creates command arguments just before
>> >> running a command, since there can be variable substitution, globbing, etc).
>> >
>> > First, some background out of the spec to establish what is supposed
>> > to work and what's not:
>> >
>> >     If the application sets OPTIND to the value 1, a new set of
>> >     parameters can be used: either the current positional parameters
>> >     or new arg values. Any other attempt to invoke getopts multiple
>> >     times in a single shell execution environment with parameters
>> >     (positional parameters or arg operands) that are not the same in
>> >     all invocations, or with an OPTIND value modified to be a value
>> >     other than 1, produces unspecified results.
>> >
>> > What this means is that, when you use getopts(1), you need to either
>> > use the exact same arguments (as you said, *string contents*, not
>> > likely to be the same argv[] pointers) or reset it with OPTIND=1.
>> >
>> > It seems to me that the easiest, fully-portable fix is just the
>> > obvious quadratic-time solution: on each run of getopts(1), reset
>> > getopt(3) to the start and call it ++N times.
>>
>> This has several problems:
>> It prints multiple messages "invalid option -q"
>> when there are options which are not in optstring.
>
> opterr=0;
>
> Either leave it 0 and always do your own error printing, or set it
> nonzero just before the last call (for the current option) so that
> only that one prints an error.
>
>> It mangles optarg if an option without argument follows
>> an option with an argument.
>
> Maybe I'm missing what you're trying to say, but all the state is
> clobbered; I don't see how optarg is a problem specifically. You can
> clear or set it to a sentinel value before the relevant call if you're
> trying to determine if the call set it. Across other calls (not the
> one for the current option) I don't see why it matters at all what
> happens to it.

Yes, this can be done.

It gets increasigly ugly, though.

With these amounts of massaging around libc API design breakage,
"getopts" builtin code in hush is almost as big as simply
reimplementing getopt(): ~500 versus ~750 bytes on x86.
If I factor out ash getopts implementation and use it in both shells,
I can probably even decrease code size.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: getopt() not exposing __optpos - shell needs it
  2017-08-29 12:47       ` Denys Vlasenko
@ 2017-08-29 13:07         ` Rich Felker
  2017-08-29 16:47           ` Denys Vlasenko
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Felker @ 2017-08-29 13:07 UTC (permalink / raw)
  To: musl

On Tue, Aug 29, 2017 at 02:47:13PM +0200, Denys Vlasenko wrote:
> On Tue, Aug 29, 2017 at 2:20 PM, Rich Felker <dalias@libc.org> wrote:
> >> >> When I try to do that (use getopt() to implement "getopts"), it hits a snag.
> >> >> Unlike normal getopt() usage in C programs, where it is called in a loop
> >> >> with the same argv[] array until parsing is finished,
> >> >> when it is used from "getopts", each successive call will (usually) have
> >> >> the same argv[] CONTENTS, but not the ADDRESSES.
> >> >> (The reason is in how shell works: it re-creates command arguments just before
> >> >> running a command, since there can be variable substitution, globbing, etc).
> >> >
> >> > First, some background out of the spec to establish what is supposed
> >> > to work and what's not:
> >> >
> >> >     If the application sets OPTIND to the value 1, a new set of
> >> >     parameters can be used: either the current positional parameters
> >> >     or new arg values. Any other attempt to invoke getopts multiple
> >> >     times in a single shell execution environment with parameters
> >> >     (positional parameters or arg operands) that are not the same in
> >> >     all invocations, or with an OPTIND value modified to be a value
> >> >     other than 1, produces unspecified results.
> >> >
> >> > What this means is that, when you use getopts(1), you need to either
> >> > use the exact same arguments (as you said, *string contents*, not
> >> > likely to be the same argv[] pointers) or reset it with OPTIND=1.
> >> >
> >> > It seems to me that the easiest, fully-portable fix is just the
> >> > obvious quadratic-time solution: on each run of getopts(1), reset
> >> > getopt(3) to the start and call it ++N times.
> >>
> >> This has several problems:
> >> It prints multiple messages "invalid option -q"
> >> when there are options which are not in optstring.
> >
> > opterr=0;
> >
> > Either leave it 0 and always do your own error printing, or set it
> > nonzero just before the last call (for the current option) so that
> > only that one prints an error.
> >
> >> It mangles optarg if an option without argument follows
> >> an option with an argument.
> >
> > Maybe I'm missing what you're trying to say, but all the state is
> > clobbered; I don't see how optarg is a problem specifically. You can
> > clear or set it to a sentinel value before the relevant call if you're
> > trying to determine if the call set it. Across other calls (not the
> > one for the current option) I don't see why it matters at all what
> > happens to it.
> 
> Yes, this can be done.
> 
> It gets increasigly ugly, though.
> 
> With these amounts of massaging around libc API design breakage,

Yes the getopt API is horribly broken. It's all global state, with a
tiny portion of that state internal/inaccessible. It doesn't follow
that the solution is adding new extensions every time an application
hits an obstacle from the brokenness. The right direction for fixing
it on the libc side would be introduction (with consensus across
important implementations) of a getopt_r API or similar with no
global/internal state.

> "getopts" builtin code in hush is almost as big as simply
> reimplementing getopt(): ~500 versus ~750 bytes on x86.
> If I factor out ash getopts implementation and use it in both shells,
> I can probably even decrease code size.

I really don't think adding a single store to optarg is going to have
relevant effects on the size, so we're back to just talking about the
cost of the obvious quadratic solutions I discussed above. Yes it may
turn out that just implementing getopts(1) without getopt(3) can be
smaller -- seems very likely if you can drop getopt(3) entirely from
the link, but less so if other code in busybox uses getopt(3) anyway
or if it's dynamic-linked and thus not included in the binary anyway.
I don't know whether this makes sense for you; it's your call. One
hidden cost is that getopt does have a number of nasty corner cases
that have already been considered (or found as bugs and fixed) in
musl, but I don't know if any of them carry over to corner cases in
getopts(1); if not they're probably not relevant to you.

Rich


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: getopt() not exposing __optpos - shell needs it
  2017-08-29 13:07         ` Rich Felker
@ 2017-08-29 16:47           ` Denys Vlasenko
  2017-08-29 17:38             ` Rich Felker
  0 siblings, 1 reply; 9+ messages in thread
From: Denys Vlasenko @ 2017-08-29 16:47 UTC (permalink / raw)
  To: musl

On Tue, Aug 29, 2017 at 3:07 PM, Rich Felker <dalias@libc.org> wrote:
>> > Maybe I'm missing what you're trying to say, but all the state is
>> > clobbered; I don't see how optarg is a problem specifically. You can
>> > clear or set it to a sentinel value before the relevant call if you're
>> > trying to determine if the call set it. Across other calls (not the
>> > one for the current option) I don't see why it matters at all what
>> > happens to it.
>>
>> Yes, this can be done.
>>
>> It gets increasigly ugly, though.
>>
>> With these amounts of massaging around libc API design breakage,
>
> Yes the getopt API is horribly broken. It's all global state, with a
> tiny portion of that state internal/inaccessible. It doesn't follow
> that the solution is adding new extensions every time an application
> hits an obstacle from the brokenness. The right direction for fixing
> it on the libc side would be introduction (with consensus across
> important implementations) of a getopt_r API or similar with no
> global/internal state.

I don't understand why you are opposed to exposing __optpos.
It does not even require any coding. Not a single insn needs
to be added.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: getopt() not exposing __optpos - shell needs it
  2017-08-29 16:47           ` Denys Vlasenko
@ 2017-08-29 17:38             ` Rich Felker
  0 siblings, 0 replies; 9+ messages in thread
From: Rich Felker @ 2017-08-29 17:38 UTC (permalink / raw)
  To: musl

On Tue, Aug 29, 2017 at 06:47:24PM +0200, Denys Vlasenko wrote:
> On Tue, Aug 29, 2017 at 3:07 PM, Rich Felker <dalias@libc.org> wrote:
> >> > Maybe I'm missing what you're trying to say, but all the state is
> >> > clobbered; I don't see how optarg is a problem specifically. You can
> >> > clear or set it to a sentinel value before the relevant call if you're
> >> > trying to determine if the call set it. Across other calls (not the
> >> > one for the current option) I don't see why it matters at all what
> >> > happens to it.
> >>
> >> Yes, this can be done.
> >>
> >> It gets increasigly ugly, though.
> >>
> >> With these amounts of massaging around libc API design breakage,
> >
> > Yes the getopt API is horribly broken. It's all global state, with a
> > tiny portion of that state internal/inaccessible. It doesn't follow
> > that the solution is adding new extensions every time an application
> > hits an obstacle from the brokenness. The right direction for fixing
> > it on the libc side would be introduction (with consensus across
> > important implementations) of a getopt_r API or similar with no
> > global/internal state.
> 
> I don't understand why you are opposed to exposing __optpos.
> It does not even require any coding. Not a single insn needs
> to be added.

New public interfaces are a lot more expensive than new code. The
latter can be changed or removed; the former can't.

Over the past 6+ years, just about the only other party asking musl to
add newly-invented nonstandard interfaces without existing precedent
was gnulib, and after a lot of discussion that made sense because they
were already doing the equivalent on other libcs through hideous
poking at internals that were never meant to be exposed, and because
the new interfaces fixed compatibility with tens (or more) of packages
using gnulib that weren't even aware of its dependencies on poking at
internals. OTOH you've requested this kind of thing for busybox hush
twice over just a month or so, and in this case the (getopt) the same
can clearly be achieved portably through the existing interfaces so
there's not even a need.

Rich


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-08-29 17:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-28 10:18 getopt() not exposing __optpos - shell needs it Denys Vlasenko
2017-08-28 15:17 ` Denys Vlasenko
2017-08-28 15:28 ` Rich Felker
2017-08-29 11:32   ` Denys Vlasenko
2017-08-29 12:20     ` Rich Felker
2017-08-29 12:47       ` Denys Vlasenko
2017-08-29 13:07         ` Rich Felker
2017-08-29 16:47           ` Denys Vlasenko
2017-08-29 17:38             ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).