From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: getopt() not exposing __optpos - shell needs it
Date: Tue, 29 Aug 2017 09:07:57 -0400
Message-ID: <20170829130757.GA1627@brightrain.aerifal.cx>
References: <20170828152844.GY1627@brightrain.aerifal.cx> <20170829122014.GZ1627@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Tue, Aug 29, 2017 at 02:47:13PM +0200, Denys Vlasenko wrote:
> On Tue, Aug 29, 2017 at 2:20 PM, Rich Felker wrote:
> >> >> When I try to do that (use getopt() to implement "getopts"), it hits a snag.
> >> >> Unlike normal getopt() usage in C programs, where it is called in a loop
> >> >> with the same argv[] array until parsing is finished,
> >> >> when it is used from "getopts", each successive call will (usually) have
> >> >> the same argv[] CONTENTS, but not the ADDRESSES.
> >> >> (The reason is in how shell works: it re-creates command arguments just before
> >> >> running a command, since there can be variable substitution, globbing, etc).
> >> >
> >> > First, some background out of the spec to establish what is supposed
> >> > to work and what's not:
> >> >
> >> >     If the application sets OPTIND to the value 1, a new set of
> >> >     parameters can be used: either the current positional parameters
> >> >     or new arg values. Any other attempt to invoke getopts multiple
> >> >     times in a single shell execution environment with parameters
> >> >     (positional parameters or arg operands) that are not the same in
> >> >     all invocations, or with an OPTIND value modified to be a value
> >> >     other than 1, produces unspecified results.
> >> >
> >> > What this means is that, when you use getopts(1), you need to either
> >> > use the exact same arguments (as you said, *string contents*, not
> >> > likely to be the same argv[] pointers) or reset it with OPTIND=1.
> >> >
> >> > It seems to me that the easiest, fully-portable fix is just the
> >> > obvious quadratic-time solution: on each run of getopts(1), reset
> >> > getopt(3) to the start and call it ++N times.
> >>
> >> This has several problems:
> >> It prints multiple messages "invalid option -q"
> >> when there are options which are not in optstring.
> >
> > opterr=0;
> >
> > Either leave it 0 and always do your own error printing, or set it
> > nonzero just before the last call (for the current option) so that
> > only that one prints an error.
> >
> >> It mangles optarg if an option without argument follows
> >> an option with an argument.
> >
> > Maybe I'm missing what you're trying to say, but all the state is
> > clobbered; I don't see how optarg is a problem specifically. You can
> > clear or set it to a sentinel value before the relevant call if you're
> > trying to determine if the call set it. Across other calls (not the
> > one for the current option) I don't see why it matters at all what
> > happens to it.
>
> Yes, this can be done.
>
> It gets increasingly ugly, though.
>
> With these amounts of massaging around libc API design breakage,

Yes, the getopt API is horribly broken. It's all global state, with a
tiny portion of that state internal/inaccessible. It doesn't follow
that the solution is adding new extensions every time an application
hits an obstacle from the brokenness. The right direction for fixing
it on the libc side would be the introduction (with consensus across
important implementations) of a getopt_r API or similar with no
global/internal state.

> "getopts" builtin code in hush is almost as big as simply
> reimplementing getopt(): ~500 versus ~750 bytes on x86.
> If I factor out ash getopts implementation and use it in both shells,
> I can probably even decrease code size.

I really don't think adding a single store to optarg is going to have
relevant effects on the size, so we're back to just talking about the
cost of the obvious quadratic solutions I discussed above. Yes, it may
turn out that just implementing getopts(1) without getopt(3) can be
smaller -- seems very likely if you can drop getopt(3) entirely from
the link, but less so if other code in busybox uses getopt(3) anyway
or if it's dynamic-linked and thus not included in the binary anyway.
I don't know whether this makes sense for you; it's your call.

One hidden cost is that getopt does have a number of nasty corner
cases that have already been considered (or found as bugs and fixed)
in musl, but I don't know if any of them carry over to corner cases in
getopts(1); if not, they're probably not relevant to you.

Rich