From: Denys Vlasenko
Newsgroups: gmane.linux.lib.musl.general
Subject: getopt() not exposing __optpos - shell needs it
Date: Mon, 28 Aug 2017 12:18:57 +0200
To: musl, Rich Felker, Waldemar Brodkorb
Reply-To: musl@lists.openwall.com

I am using getopt() in the busybox hush shell. The "unset" builtin, for example, takes -v and -f options. This works fine.

However, POSIX requires that shells have a "getopts" builtin:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/getopts.html

It is basically a shell-level binding for getopt(): on entry it reads OPTIND (and, in bash, OPTERR); on return it stores the option character in a variable and updates OPTIND and OPTARG. Sounds familiar, right?

When I try to do exactly that (use getopt() to implement "getopts"), I hit a snag.

In normal C programs, getopt() is called in a loop on the same argv[] array until parsing is finished. When it is called from "getopts", each successive call will (usually) see the same argv[] CONTENTS, but not the same ADDRESSES. (The reason is how the shell works: it re-creates command arguments just before running each command, since there can be variable substitution, globbing, etc.)

Worse yet, between invocations of "getopts" there may be calls to other shell builtins which use getopt() internally.
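For contrast, the ordinary pattern that works fine for a builtin like "unset" is a single getopt() loop over one argv[] array, from start to finish. A minimal sketch, using a hypothetical helper named parse_unset_opts() (this is not hush's actual code):

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>  /* getopt(), optind */

/* Hypothetical sketch of ordinary getopt() usage: one argv[] array,
 * parsed in a single loop until the options are exhausted.
 * Returns the index of the first non-option argument, or -1 if an
 * unknown option was seen. */
static int parse_unset_opts(int argc, char **argv, int *opt_v, int *opt_f)
{
    int c;

    /* Setting optind back to 1 is the usual way to restart parsing, but
     * it cannot clear any hidden position libc keeps inside a partially
     * parsed "-abc" word. */
    optind = 1;
    while ((c = getopt(argc, argv, "vf")) != -1) {
        switch (c) {
        case 'v': *opt_v = 1; break;
        case 'f': *opt_f = 1; break;
        default:
            /* getopt() returned '?'; with opterr nonzero (the default)
             * it has already printed a diagnostic. */
            return -1;
        }
    }
    return optind;
}
```

Here the array and the loop live together, so libc's hidden in-argument position never has to survive anything; "getopts" breaks both assumptions.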
Here is an example of the nested-builtin problem:

    while getopts "abc" RES -a -bc -abc de; do
        unset -vf func
    done

This would not work correctly: the getopt() call inside "unset" modifies the internal libc state which tracks the position within multi-option strings like "-abc". At best, options get skipped. If libc does not check that the position is not beyond strlen(argv[i]), it can return garbage. (With the glibc implementation of getopt(), it would use outright invalid pointers and return garbage even _without_ "unset" mangling the internal state.)

The reason is that the internal state is not fully exposed. Only the current argv[i] index is visible (and modifiable), not the position inside it:

    int getopt(int argc, char *const argv[], const char *optstring);
    extern int optind; // <== internal state

My current "getopts" code preserves the optind value across "getopts" calls, keeping it in $OPTIND as POSIX intended all along:

    cp = get_local_var_value("OPTIND");
    optind = cp ? atoi(cp) : 0;
    optarg = NULL;
    c = getopt(string_array_len(argv), argv, optstring);

    /* Set OPTARG */
    cp = optarg;
    if (cp)
        set_local_var_from_halves("OPTARG", cp);
    else
        unset_local_var("OPTARG");

    /* Convert -1 to "?" */
    exitcode = EXIT_SUCCESS;
    if (c < 0) { /* -1: end of options */
        exitcode = EXIT_FAILURE;
        c = '?';
    }

    /* Set VAR and OPTIND */
    cbuf[0] = c;
    set_local_var_from_halves(var, cbuf);
    set_local_var_from_halves("OPTIND", utoa(optind));

However, the position inside argv[OPTIND] cannot be preserved across calls: libc does not expose it.

The musl implementation is pretty simple: it has an "int __optpos" variable which holds exactly this information. I propose to export it along with optind et al.:

    -extern int optind, opterr, optopt;
    +extern int optind, __optpos, opterr, optopt;

I know that the general pushback to such ideas is "this is not standard". Well, the standard has a defect here: it was designed with only single-option arguments in mind (-a -b, not -ab). Yet multi-option arguments are a must for any meaningful compatibility.
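To make the "additional internal state" concrete, here is a hypothetical getopt()-like stepper in which the caller owns the whole parser state: an argument index plus an OFFSET inside that argument, both plain integers. The names struct opt_state and getopts_step() are invented for illustration; this is a sketch of the idea behind exporting __optpos, not musl's code or a proposed API. Because the position is an offset rather than a pointer, a shell could keep both values between "getopts" calls (for example in $OPTIND and a private variable), rebuild argv[] at new addresses, and let nested builtins parse with their own state.

```c
#include <string.h>

/* Hypothetical caller-owned parser state: an index and an OFFSET,
 * never a pointer, so it stays meaningful if argv[] is rebuilt. */
struct opt_state {
    int optind;  /* index of the argument being parsed; starts at 1 */
    int optpos;  /* offset of the next option char in argv[optind]; 0 = at start */
};

/* Returns the next option character, '?' for an unknown option or a
 * missing option-argument, or -1 at the end of options. Single-byte
 * option characters only. Assumes the argument strings keep the same
 * CONTENTS between calls (their addresses may change). *optarg_out
 * points into argv[], so copy it before argv[] is freed. */
static int getopts_step(struct opt_state *st, int argc, char **argv,
                        const char *optstring, char **optarg_out)
{
    *optarg_out = NULL;
    if (st->optind < 1) {                 /* treat 0 as "start over" */
        st->optind = 1;
        st->optpos = 0;
    }
    if (st->optind >= argc || !argv[st->optind])
        return -1;

    char *arg = argv[st->optind];
    if (st->optpos == 0) {
        /* At the start of a new word: is it an option cluster? */
        if (arg[0] != '-' || arg[1] == '\0')
            return -1;
        if (arg[1] == '-' && arg[2] == '\0') {  /* "--" ends the options */
            st->optind++;
            return -1;
        }
        st->optpos = 1;
    }

    int c = (unsigned char)arg[st->optpos++];
    const char *spec = (c == ':') ? NULL : strchr(optstring, c);
    int at_end = (arg[st->optpos] == '\0');

    if (spec && spec[1] == ':') {
        /* Option takes an argument: rest of this word, or the next word. */
        if (!at_end)
            *optarg_out = arg + st->optpos;
        else if (st->optind + 1 < argc)
            *optarg_out = argv[++st->optind];
        else
            c = '?';                      /* missing option-argument */
        at_end = 1;                       /* this word is used up either way */
    }
    if (at_end) {                         /* move on to the next word */
        st->optind++;
        st->optpos = 0;
    }
    return spec ? c : '?';
}
```

Exposing musl's __optpos would, in effect, let a shell save and restore the same two integers around libc's own getopt() instead of carrying a private reimplementation like this one.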
Supporting multi-option arguments therefore requires additional internal state in libc. Without extending the API a bit, it's impossible to use getopt() for "getopts" (which seems to be what the _shell_ standard intends); instead, shell C code is forced to reimplement getopt().

This makes me think that "optpos" should be added, even if it's "not standard". This is how standards evolve: real-world use discovers deficiencies, APIs get fixed, then these changes trickle into the next revisions of the standards.

(Optionally, musl may want to add some robustification around this code in getopt():

    if (!optpos) optpos++;
    if ((k = mbtowc(&c, argv[optind]+optpos, MB_LEN_MAX)) < 0) {

to prevent "argv[optind]+optpos" from pointing past the end of the string. This can only happen in fairly pathological cases, when the user changes argv[] strings mid-parsing, but still.)

glibc/uclibc need more extensive changes, since they store the position as a pointer.

This is a test in the busybox tree which fails miserably because of that:
https://git.busybox.net/busybox/tree/shell/hush_test/hush-getopts/getopt_test_libc_bug.tests

    # This test can fail with libc with buggy getopt() implementation.
    # If getopt() wants to parse multi-option args (-abc),
    # it needs to remember a position within current arg.
    #
    # If this position is kept as a POINTER, not an offset,
    # and if argv[] ADDRESSES (not contents!) change, it blows up.
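The robustification suggested for musl amounts to checking the saved offset against the current string before decoding a character there. The check itself is not spelled out in this message; a minimal sketch of one possible form follows, using a hypothetical helper name, optpos_in_bounds(), and assuming the caller advances to the next argument when it fails:

```c
#include <string.h>

/* Hypothetical helper: decide whether an option character can still be
 * read at arg + *optpos. A stale offset (for example after the argv[]
 * strings were changed between calls) is detected instead of being used
 * to read past the terminating NUL. */
static int optpos_in_bounds(const char *arg, int *optpos)
{
    if (*optpos < 1)   /* same idea as the quoted "if (!optpos) optpos++;" */
        *optpos = 1;
    return (size_t)*optpos < strlen(arg);
}
```

When it returns 0, the caller would advance optind and reset the offset to 0 instead of passing an out-of-bounds pointer to mbtowc().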