From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/6681 Path: news.gmane.org!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: getopt_long permutation algorithm questions Date: Wed, 3 Dec 2014 19:04:34 -0500 Message-ID: <20141204000434.GN4574@brightrain.aerifal.cx> References: <20141203192929.GA14223@brightrain.aerifal.cx> <1417635789.4789.35.camel@eris.loria.fr> <20141203195608.GM4574@brightrain.aerifal.cx> <1417639673.4789.45.camel@eris.loria.fr> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: ger.gmane.org 1417651497 9829 80.91.229.3 (4 Dec 2014 00:04:57 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Thu, 4 Dec 2014 00:04:57 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-6694-gllmg-musl=m.gmane.org@lists.openwall.com Thu Dec 04 01:04:51 2014 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1XwJub-000522-O6 for gllmg-musl@m.gmane.org; Thu, 04 Dec 2014 01:04:49 +0100 Original-Received: (qmail 24393 invoked by uid 550); 4 Dec 2014 00:04:48 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 24382 invoked from network); 4 Dec 2014 00:04:47 -0000 Content-Disposition: inline In-Reply-To: <1417639673.4789.45.camel@eris.loria.fr> User-Agent: Mutt/1.5.21 (2010-09-15) Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:6681 Archived-At: On Wed, Dec 03, 2014 at 09:47:53PM +0100, Jens Gustedt wrote: > Am Mittwoch, den 03.12.2014, 14:56 -0500 schrieb Rich Felker: > > On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote: > > > sounds all a bit complicated and fragile to me. getopt should neither > > > be performance nor memory critical, so there is not much need for > > > optimization here, no? > > > > > > Why don't you just keep track of the cases in an array on the stack, > > > and then do all the moves after the processing in a second scan? You > > > have argc, so you know the size of a VLA (`char[argc]` should suffice) > > > that you would have to define. > > > > I'm not trying to optimize anything for performance here, just get it > > correct and simple. I don't quite understand what you're proposing, > > so I have to explain better > > > but doing a multi-pass, > > actually two-pass > > > out-of-place operation does not sound simple, > > but in place > > > and it's not robust since the kernel allows huge argument lists > > whereby char[argc] would overflow the stack (and even if it didn't, > > assuming that it doesn't allow them would be an unwarranted assumption > > unless it actually bought us some simplicity, which I don't see). > > My idea was just to do a first pass for the "real" argument processing > and to note during that pass if an argument wasn't used. Then in a > second pass reorder argv with that information. I don't think you want to pre-process the whole argument list, for various reasons. The GNU version, as I understand, does argument permutation as it goes based on the state of the opt* vars rather than permuting everything at once, and presumably some callers could rely on this, for example as a way of conditionally suppressing the permutation by incrementing optind rather than calling getopt when optind points to a non-option argument. It's also just a more complicated design and not compatible with the existing design/implementation of getopt[_long] to preprocess everything. > For the storage of that information, this is one bit per argc, and if > we waste one byte for it to have life simple, then `char[argc]` should > suffice. argc can be greater than several K ? My understanding is that the kernel imposes no hard limit, aside from resource limits, on modern kernels... Yes this is a nasty and completely unnecessary DoS vector. > But thinking of it: > > > > Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker: > > > > As part of resolving the rest of the dist-local changes Alpine is > > > > applying to musl, I'm trying to figure out how to add GNU-style > > > > argument permutation to getopt_long. The basic concept is simple: when > > > > a non-option argument is encountered, skip forward until the next > > > > option (argument beginning with '-') and move it (and possibly its > > > > argument) before the non-option arguments. However, there are some > > > > ugly corner cases like: > > > > > > > > arg1 -ab foo arg2 > > > > > > > > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it > > > > seems like, in order to perform the correct permutation, lookahead is > > > > required to see that foo also needs to be moved. Is this correct? > > I wouldn't call it "lookahead". > > Why don't you > > - have k as the position where the next option argument should be > placed and p as the next position that has not been processed yet > - move p such that argv[p] is the next option argument > - process argv[p] in place, including an eventual argument > argv[p+1] that a long option or the last option character takes > - shuffle the found argv[p] (plus eventually argv[p+1]) to the > correct position argv[k] (and argv[k+1]) > - increment k and p by 1 (or 2) > - iterate > > The disadvantage of that general approach is that it can be > quadratic. For very long argument lists, if you have argc more than > several K, this could be prohibitive. This is pretty close to what I described, except I avoided the "in-place" processing which doesn't work because of examples like the one I gave, where you would have to process more than one option at a time, then rewind to return just the first one. The process I described avoids the need for that rewinding. Note that part of the complexity here in "processing ahead and then rewinding" is that the internal state of getopt[_long] is not encapsulated but exists in global variables that the caller can access. By quadratic time, I assume you meant for processing the whole argument list. The algorithm I described is linear-time in the number of consecutive misordered non-option arguments for a single call to getopt_long, and thus quadratic when you call it repeatedly. This could be avoided by saving additional hidden state, but I suspect the practical gains are small and the potential for incorrect behavior exists if the argument list is modified between calls. Rich