The Unix Heritage Society mailing list
* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Douglas McIlroy @ 2021-08-01 18:17 UTC
  To: The Eunuchs Hysterical Society

I have considerable sympathy with the general idea of formally
specifying and parsing inputs. Langsec people make a strong case
for doing so. The white paper, "A systematic approach to modern
Unix-like command interfaces", proposes to "simplify parsing by
facilitating the creation of easy-to-use 'grammar-based' parsers".

I'm not clear on what is meant by "parser". A plain parser is a
beast that builds a parse tree according to a grammar. For most
standard Unix programs, the parse tree has two kinds of leaves:
non-options and options with k parameters. Getopt restricts
k to {0,1}.

Aside from fiddling with argc and argv, I see little difference
between working with a parse tree for arguments that could be
handled by getopt and using getopt directly.

A more general parser could handle more elaborate grammatic
constraints on options, for example, field specs in sort(1),
requirements on presence of options in tar(1), or representation
of multiple parameters in cut(1).
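To make the cut(1) case concrete, here is a minimal C sketch of the small grammar behind a field list such as -f 1,3-5,8-. It assumes only the comma-separated forms POSIX cut accepts (N, N-M, N-, -M) and ignores the blank-separated variant; parse_cut_list and struct range are hypothetical names, not taken from any cut implementation.

```c
#include <ctype.h>
#include <limits.h>

/* One element of a cut(1)-style list: columns lo..hi, inclusive.
   hi == INT_MAX marks an open-ended range such as "8-". */
struct range {
    int lo, hi;
};

/* Read a run of decimal digits at *sp and advance past it.
   Returns -1 if there are no digits (overflow is not checked). */
static int read_num(const char **sp)
{
    const char *s = *sp;
    int n = 0;

    if (!isdigit((unsigned char)*s))
        return -1;
    while (isdigit((unsigned char)*s))
        n = n * 10 + (*s++ - '0');
    *sp = s;
    return n;
}

/* Grammar assumed here:
     list  := range { ',' range }
     range := N | N '-' | '-' N | N '-' N
   Stores at most max ranges in out[]. Returns the number stored, or -1
   on a syntax error, an empty or reversed range, or too many ranges. */
int parse_cut_list(const char *s, struct range *out, int max)
{
    int count = 0;

    for (;;) {
        int lo, hi;

        if (*s == '-') {                 /* "-M": from column 1 to M */
            s++;
            lo = 1;
            if ((hi = read_num(&s)) < 0)
                return -1;
        } else {
            if ((lo = read_num(&s)) < 0)
                return -1;
            if (*s != '-')
                hi = lo;                 /* "N": a single column */
            else if (*++s == ',' || *s == '\0')
                hi = INT_MAX;            /* "N-": to end of line */
            else if ((hi = read_num(&s)) < 0)
                return -1;               /* "N-M" with a bad M */
        }
        if (lo < 1 || hi < lo || count == max)
            return -1;
        out[count].lo = lo;
        out[count].hi = hi;
        count++;
        if (*s == '\0')
            return count;
        if (*s++ != ',')
            return -1;
    }
}
```

A getopt-style parser would hand "1,3-5,8-" over as one opaque string; this is the extra layer of grammar a more general parser would have to describe.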

In realizing the white paper's desire to "have the parser
provide the results to the program", it's likely that the mechanism
will, like Yacc, go beyond parsing and invoke semantic actions
as it identifies tree nodes.
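The Yacc analogy can be sketched as an option table in which each entry carries its parameter count k and a semantic action that runs as soon as the option and its k parameters are recognized. This is a minimal illustration under stated assumptions, not the white paper's design: options are single words such as -f, each consumes the next k words, and the names run_rules and struct optrule and the demo options -r and -f FROM TO are invented.

```c
#include <stdio.h>
#include <string.h>

/* A semantic action receives its k parameters and the program's state;
   it returns 0 on success, nonzero to reject the command line. */
typedef int (*action_fn)(char **params, void *state);

struct optrule {
    char name;          /* option letter, e.g. 'f' for -f */
    int nparams;        /* k: parameters that follow as separate words */
    action_fn action;   /* run when the option is recognized */
};

/* Walk argv[1..argc-1], running each option's action as it is parsed.
   Returns the index of the first operand, or -1 on an unknown option,
   missing parameters, or a failing action. */
int run_rules(int argc, char **argv, const struct optrule *rules,
              int nrules, void *state)
{
    int i = 1;

    while (i < argc && argv[i][0] == '-' && argv[i][1] != '\0') {
        if (strcmp(argv[i], "--") == 0)
            return i + 1;
        const struct optrule *r = NULL;
        for (int j = 0; j < nrules; j++)
            if (argv[i][1] == rules[j].name && argv[i][2] == '\0')
                r = &rules[j];
        if (r == NULL || i + r->nparams >= argc)
            return -1;                  /* unknown, or too few words left */
        if (r->action(&argv[i + 1], state) != 0)
            return -1;
        i += 1 + r->nparams;
    }
    return i;
}

/* Demo state and actions for a hypothetical command. */
struct demo {
    int reverse;
    int from, to;
};

static int act_reverse(char **p, void *st)
{
    (void)p;
    ((struct demo *)st)->reverse = 1;
    return 0;
}

/* -f FROM TO: an option with k = 2, which getopt cannot express. */
static int act_field(char **p, void *st)
{
    struct demo *d = st;
    return sscanf(p[0], "%d", &d->from) == 1 &&
           sscanf(p[1], "%d", &d->to) == 1 ? 0 : -1;
}

static const struct optrule demo_rules[] = {
    { 'r', 0, act_reverse },
    { 'f', 2, act_field },
};
```

With these rules, cmd -f 2 5 -r file sets from, to and reverse as each option is reduced and returns the index of file.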

Pioneer Yaccification of some commands might be a worthy demo.

Doug

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: arnold @ 2021-08-01 19:48 UTC
  To: tuhs, douglas.mcilroy

Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:

> In realizing the white paper's desire to "have the parser
> provide the results to the program", it's likely that the mechanism
> will, like Yacc, go beyond parsing and invoke semantic actions
> as it identifies tree nodes.

I have to admit that all this feels like overkill. Parsing options
is only a very small part of the real work that a program does.

Speaking for myself, I want something simple and regular that will
get the job done and let me get on with the actual business of
my software.  A grammar just for command-line argument parsing feels
like the tail wagging the dog: not nearly enough ROI, at least
for me.

I happen to like the getopt_long interface designed by the GNU
project. It's easy to learn, set up, and use. Once it's in place
it's set and forget.
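For comparison, typical getopt_long usage looks roughly like this. A minimal sketch assuming glibc-style getopt_long; the options -v/--verbose and -o/--output and the names parse_opts and struct opts are illustrative.

```c
#include <getopt.h>
#include <stddef.h>
#include <unistd.h>

/* Illustrative settings for a hypothetical command. */
struct opts {
    int verbose;          /* -v, --verbose */
    const char *output;   /* -o FILE, --output=FILE, --output FILE */
};

/* Returns the index of the first operand, or -1 on a usage error. */
int parse_opts(int argc, char **argv, struct opts *o)
{
    /* Each long option returns the same value as its short twin,
       so one switch handles both spellings. */
    static const struct option longopts[] = {
        { "verbose", no_argument,       NULL, 'v' },
        { "output",  required_argument, NULL, 'o' },
        { NULL,      0,                 NULL, 0 }
    };
    int c;

    o->verbose = 0;
    o->output = NULL;
    opterr = 0;                 /* suppress getopt's own messages */
    while ((c = getopt_long(argc, argv, "vo:", longopts, NULL)) != -1) {
        switch (c) {
        case 'v':
            o->verbose = 1;
            break;
        case 'o':
            o->output = optarg;
            break;
        default:                /* '?': unknown option or missing argument */
            return -1;
        }
    }
    return optind;
}
```

Here prog --verbose -o out.txt in.txt sets both fields and returns the index of in.txt. Unlike plain getopt, getopt_long also accepts unambiguous abbreviations such as --verb.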

My two cents,

Arnold

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: John Cowan @ 2021-08-01 21:30 UTC
  To: arnold; +Cc: The Eunuchs Hysterical Society, M Douglas McIlroy

On Sun, Aug 1, 2021 at 3:48 PM <arnold@skeeve.com> wrote:


> I happen to like the getopt_long interface designed by the GNU
> project. It's easy to learn, set up, and use. Once it's in place
> it's set and forget.
>

I agree, and what is more, I say, it is a grammar already, if a simple
one.  You declare what you accept and what's to be done, making it a DSL
expressed as an array of structs.

The only thing it lacks is that old getopt is a bag on the side rather than
being integrated: struct option should have an additional member "char
short_option", where '\0' means "no short option".  Given that feature and
three per-program values "progname" (argv[0] by default), "version", and
"usage_string", the --version and --help options can be processed inside
getopt itself.  I especially like that you pass per-option pointers saying
where to put the value, so no case statement required, just create some
global or local variables and pass in their addresses.  Automatic support
for "--nofoo" given "--foo" would be good as well.
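Part of what is described here already exists: when the flag member of a struct option is non-NULL, getopt_long stores val through that pointer and returns 0, so no case statement is needed. The sketch below (option names invented for illustration) relies on that and fakes --nocolor with a second entry sharing the same variable. The short_option member, built-in --help/--version, and automatic --nofoo generation remain proposals; getopt_long does not provide them, and its flag member only handles int values.

```c
#include <getopt.h>
#include <stddef.h>
#include <unistd.h>

/* Variables the options write into directly. */
static int opt_color = 1;       /* on unless --nocolor is given */
static int opt_verbose = 0;

/* Returns the index of the first operand, or -1 on a usage error. */
int parse_flags(int argc, char **argv)
{
    /* With a non-NULL flag pointer, getopt_long stores val into *flag
       and returns 0 instead of returning val. */
    static const struct option longopts[] = {
        { "color",   no_argument, &opt_color,   1 },
        { "nocolor", no_argument, &opt_color,   0 },   /* hand-written --no twin */
        { "verbose", no_argument, &opt_verbose, 1 },
        { NULL,      0,           NULL,         0 }
    };
    int c;

    opterr = 0;
    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1)
        if (c != 0)
            return -1;          /* '?': unknown option */
    return optind;
}
```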

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Steffen Nurpmeso @ 2021-08-02 12:11 UTC
  To: arnold; +Cc: tuhs, douglas.mcilroy

arnold@skeeve.com wrote in
 <202108011948.171JmAcK001895@freefriends.org>:
 |Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:
 |
 |> In realizing the white paper's desire to "have the parser
 |> provide the results to the program", it's likely that the mechanism
 |> will, like Yacc, go beyond parsing and invoke semantic actions
 |> as it identifies tree nodes.
 |
 |I have to admit that all this feels like overkill. Parsing options
 |is only a very small part of the real work that a program does.
 |
 |Speaking for myself, I want something simple and regular that will
 |get the job done and let me get on with the actual business of
 |my software.  A grammar just for command-line argument parsing feels
 |like the tail wagging the dog: not nearly enough ROI, at least
 |for me.
 |
 |I happen to like the getopt_long interface designed by the GNU
 |project. It's easy to learn, set up, and use. Once it's in place
 |it's set and forget.

By coincidence, just last week I stumbled over (actually, searched for
and fixed) an issue where that terrible command-line reordering hit
me where I did not expect it.  I had changed aspects of a script
that affect the content of a variable; that string was then appended
to a string constant, and the result was evaluated and passed to
a program.  Initially the variable never contained a hyphen-minus,
but after the rewrite it did.  So the program saw a leading
hyphen-minus somewhere on the line and turned it into an option.
(The fix was easy: just turn 'X'$Y into 'X'"$Y".  Maybe it should
always have been written like that, but it seemed unnecessary at
first.)
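The reordering in question is GNU getopt's default argument permutation: with optstring "v", the words prog file -v still yield -v as an option, because glibc scans past the operand. A leading '+' in the option string (a GNU extension) or POSIXLY_CORRECT in the environment makes scanning stop at the first operand, as POSIX getopt does. A minimal sketch assuming glibc; count_v is a hypothetical helper.

```c
#include <getopt.h>
#include <unistd.h>

/* Count -v options, stopping at the first operand as POSIX getopt does.
   The leading '+' asks GNU getopt not to permute argv; without it,
   glibc would move past "file" and still treat a later "-v" as an
   option. An explicit "--" ends option processing in either mode.
   Stores the index of the first operand in *first. */
int count_v(int argc, char **argv, int *first)
{
    int c, n = 0;

    opterr = 0;
    while ((c = getopt(argc, argv, "+v")) != -1)
        if (c == 'v')
            n++;
    *first = optind;
    return n;
}
```

Given prog file -v, this returns 0 and leaves -v as the second operand; quoting in the shell, as in the fix above, avoids creating the stray word in the first place.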

 |My two cents,

For C++/C I have always used my own parser, which handles long
options, optionally relates long options to short ones, and lets the
long ones also carry a help string (all in one string, as in "debug;d;"
N_("identical to -Sdebug"), where N_() expands to a literal).

I agree with the other post that turning command lines into a tree
of nodes is great, but of course this is hard to define.  For the
mailer I maintain I already have this for some commands, though only
for the first level so far (no support yet for multiplexer commands,
i.e., commands whose first argument chooses the actual command).
It is a pain to write things like the following by hand:

  mx_CMD_ARG_DESC_SUBCLASS_DEF(call, 2, a_cmd_cad_call){
     {mx_CMD_ARG_DESC_SHEXP | mx_CMD_ARG_DESC_HONOUR_STOP,
        n_SHEXP_PARSE_TRIM_IFSSPACE}, /* macro name */
     {mx_CMD_ARG_DESC_SHEXP | mx_CMD_ARG_DESC_OPTION |
           mx_CMD_ARG_DESC_GREEDY | mx_CMD_ARG_DESC_HONOUR_STOP,
        n_SHEXP_PARSE_IFS_VAR | n_SHEXP_PARSE_TRIM_IFSSPACE} /* args */
  }mx_CMD_ARG_DESC_SUBCLASS_DEF_END;

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Jon Steinhart @ 2021-07-31 23:41 UTC
  To: TUHS main list

Warner Losh writes:
> The large number of times I've had to replace inline code like you've
> quoted with getopt to fix the irregularities in command line parsing
> suggests that we differ on this viewpoint.

Fine by me.  Never hurts to know what other people consider best practices.

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Warner Losh @ 2021-07-31 23:26 UTC
  To: Jon Steinhart; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 4:37 PM Jon Steinhart <jon@fourwinds.com> wrote:

> Warner Losh writes:
> >
> > The flip side to this is that libraries can be debugged once, while inline
> > code like the above needs to be debugged over and over....
>
> Well, no.  Inline code doesn't need to be debugged over and over.  It doesn't
> have to be written from scratch every time.  While in theory your point about
> libraries is correct, it hasn't seemed to work out in practice.  Better
> in C than in node.js, but there have been plenty of spectacular bugs found in
> old C libraries recently.
>

The large number of times I've had to replace inline code like you've
quoted with getopt to fix the irregularities in command line parsing
suggests that we differ on this viewpoint.

Warner

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Bakul Shah @ 2021-07-31 22:20 UTC
  To: Jon Steinhart; +Cc: TUHS main list


-- Bakul

> On Jul 31, 2021, at 3:16 PM, Jon Steinhart <jon@fourwinds.com> wrote:
> 
> Bakul Shah writes:
>> On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
>>> 
>>> So I never got getopt().  One of my rules is that I don't use a library
> >>> in cases where the number of lines of gunk that it takes to use a
>>> library function is >= the number of lines to just write it myself.  Yeah,
>>> I know the "but the library has more eyeballs and is debugged" argument
>>> but in reality libraries are the source of many bugs.  I've always taken
>>> the approach that I would never hire someone who had to use a library to
>>> implement a singly-linked list.
>> 
>> getopt() is perhaps the wrong solution but consider something like MH,
>> whose commands all follow a common pattern. Consider:
>> 
>>  - options (switches) all start with a single '-'
>>  - they may be abbreviated to a unique prefix.
>>  - Boolean options may be inverted by prepending -no (e.g. -nolist)
>>  - value options may also have -no format to remove a previous (or default) value
>>  - options may appear anywhere and the last instance wins
>> 
>> But different commands take different options. It would make sense to factor
>> out common parsing, help etc. for a consistent treatment. In my Go code for
>> parsing MH like options I used Go's flag package as a model.
>> 
>> Personally I vastly prefer MH style option processing to either single char
>> options or --very-long-names which can't be abbreviated. Never mind options for
>> commands like gcc, who can even remember 40+ ls options?
>> 
>> But I haven't thought about how to extend this for shell scripts, or
>> exposing these so that shells like zsh can do command completion. To specify
>> these you need a vector of tuples (name, type, default, brief-help) but that
>> is painful to do in a shell.
> 
> Ah, well, you've given away the secret of real UNIX geezers, we're on both
> this mailing list and the nmh list :-)

:-)

> Yes, I'm mostly happy with the way that nmh does options.
> 
> I guess that I would look more kindly on getopt if it had existed much earlier
> so that people writing new commands would be encouraged to use the same format.
> Not as happy with it as an afterthought.
> 
> Once again, I have to go back to the meatspace locality of reference issues.
> Sure, it would be nice to be able to factor out common parsing, for example
> if a related set of programs shared the same option set.  But unless it's
> something huge, I'd just put it in its own file and use it for multiple
> programs; I wouldn't put it in a library.  My point is that the code that
> does the actual parsing is really trivial, and not necessarily the best
> use of a library.
> 
> As far as help goes, I don't expect help built into command line programs;
> I expect to look up error messages on the manual pages.  I'm happy with a
> generic usage error as most "helpful" output that I get from programs is
> not actually helpful.

Note that -help in MH programs is far more useful, as it spells out the full option names.
Consider

% refile -he
Usage: refile [msgs] [switches] +folder ...
  switches are:
  -draft
  -[no]link
  -[no]preserve
  -[no]retainsequences
  -[no]unlink
  -src +folder
  -file file
  -rmmproc program
  -normmproc
  -version
  -help
...

vs

% ls -z
ls: invalid option -- z
usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwxy1,] [--color=when] [-D format] [file ...]



* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Jon Steinhart @ 2021-07-31 22:20 UTC
  To: TUHS main list

Warner Losh writes:
>
> The flip side to this is that libraries can be debugged once, while inline
> code like the above needs to be debugged over and over....

Well, no.  Inline code doesn't need to be debugged over and over.  It doesn't
have to be written from scratch every time.  While in theory your point about
libraries is correct, it hasn't seemed to work out in practice.  Better
in C than in node.js, but there have been plenty of spectacular bugs found in
old C libraries recently.

Jon

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Larry McVoy @ 2021-07-31 22:19 UTC
  To: Warner Losh; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 04:10:04PM -0600, Warner Losh wrote:
> On Sat, Jul 31, 2021 at 3:33 PM Jon Steinhart <jon@fourwinds.com> wrote:
> 
> > Richard Salz writes:
> > > On Sat, Jul 31, 2021 at 3:21 PM Jon Steinhart <jon@fourwinds.com> wrote:
> > >
> > > > opinion, it doesn't add value to do something that's already been done
> > > > but differently; it detracts from value because now there's yet another
> > > > competing way to do something.
> > > >
> > >
> > > You mean like not using getopt and rolling your own?  Shrug.
> > >
> > > while ((i = getopt(argc, argv, "xxxxx:xxxx")) != -1)
> > >    switch (i) {
> > >    case ....
> > >   }
> > > argc -= optind;
> > > argv += optind;
> > >
> > > > So I never got getopt().  One of my rules is that I don't use a library
> > > > in cases where the number of lines of gunk that it takes to use a
> > > > library function is >= the number of lines to just write it myself.
> > >
> > >
> > > I don't know, what lines in the above are extra beyond what you write?
> > > The last two if being generous I suppose.
> >
> > Well, in my opinion that's not really an accurate representation of using
> > getopt.
> >
> > I would of course write the #include line, and the table of options, which
> > would end up being >= the number of lines that it takes me to do this...
> >
> >         while (--argc > 0) {
> >                 if (*(++argv)[0] == '-') {
> >                         for (p = *argv + 1; *p != '\0'; p++) {
> >                                 switch (*p) {
> >
> 
> Except for all the things this gets wrong, it's ok. The problem with
> inlining getopt is that you wind up with cases like '-f foo' on the command
> line being treated differently than '-ffoo'.

BitKeeper's getopt had a different char for that: "f:" allows -ffoo or -f foo
but "f;" insists on no space.

With that, I'm bowing out of this thread, it's becoming a bike shed.
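The BitKeeper rule described above can be expressed as two small helpers: one reads the spec character after an option letter, the other decides whether the argument may come from the next word. This is only a sketch of the rule as stated in this message, not BitKeeper's code; arg_kind and take_arg are hypothetical names.

```c
#include <string.h>

/* Look up option letter c in optstring. Returns ':' if it takes an
   argument that may be attached or separate, ';' if the argument must
   be attached, 0 if it takes no argument, or -1 if c is not an option. */
int arg_kind(const char *optstring, int c)
{
    const char *p;

    if (c == '\0' || c == ':' || c == ';')
        return -1;
    if ((p = strchr(optstring, c)) == NULL)
        return -1;
    return (p[1] == ':' || p[1] == ';') ? p[1] : 0;
}

/* attached is the text after the option letter ("foo" for -ffoo, ""
   for -f); next is the following argv word, or NULL. On success stores
   the argument in *arg and returns how many extra words were consumed
   (0 or 1). Returns -1 if the argument is missing or, for ';', not
   attached. */
int take_arg(int kind, const char *attached, const char *next,
             const char **arg)
{
    if (*attached != '\0') {            /* -ffoo: fine for both kinds */
        *arg = attached;
        return 0;
    }
    if (kind == ':' && next != NULL) {  /* -f foo: only ':' allows the space */
        *arg = next;
        return 1;
    }
    return -1;
}
```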

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Bakul Shah @ 2021-07-31 22:17 UTC
  To: Larry McVoy; +Cc: TUHS main list

On Jul 31, 2021, at 3:14 PM, Bakul Shah <bakul@iitbombay.org> wrote:
> 
> On Jul 31, 2021, at 3:13 PM, Larry McVoy <lm@mcvoy.com> wrote:
>> 
>> On Sat, Jul 31, 2021 at 03:04:48PM -0700, Bakul Shah wrote:
>>> On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
>>>> 
>>>> So I never got getopt().  One of my rules is that I don't use a library
>>>> in cases where the number of lines of gunk that it takes to use a
>>>> library function is >= the number of lines to just write it myself.  Yeah,
>>>> I know the "but the library has more eyeballs and is debugged" argument
>>>> but in reality libraries are the source of many bugs.  I've always taken
>>>> the approach that I would never hire someone who had to use a library to
>>>> implement a singly-linked list.
>>> 
>>> getopt() is perhaps the wrong solution but consider something like MH,
>>> whose commands all follow a common pattern. Consider:
>>> 
>>> - options (switches) all start with a single '-'
>>> - they may be abbreviated to a unique prefix.
>> 
>> That last one is a gotcha waiting to happen:
>> 
>> program --this-is-the-long-option
>> 
>> is the same as 
>> 
>> program --this
>> 
>> but that will break scripts (and fingers) when program gets a new 
>> option like
> 
> That is easy to fix: use full options in scripts. Abbreviations for
> interactive use. Much better than --always-having-to-type-long-names.
> 
>> 
>> program --this-is-the-even-longer-option
>> 
>> We wrote our own getopt() for BitKeeper and it had long and short options
>> but no gotcha unique prefix.

Forgot to add that whoever extends "program" should know not to create a new
option whose longer name breaks the full form of an existing option.

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Jon Steinhart @ 2021-07-31 22:16 UTC
  To: TUHS main list

Bakul Shah writes:
> On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
> > 
> > So I never got getopt().  One of my rules is that I don't use a library
> > in cases where the number of lines of gunk that it takes to use a
> > library function is >= the number of lines to just write it myself.  Yeah,
> > I know the "but the library has more eyeballs and is debugged" argument
> > but in reality libraries are the source of many bugs.  I've always taken
> > the approach that I would never hire someone who had to use a library to
> > implement a singly-linked list.
>
> getopt() is perhaps the wrong solution but consider something like MH,
> whose commands all follow a common pattern. Consider:
>
>   - options (switches) all start with a single '-'
>   - they may be abbreviated to a unique prefix.
>   - Boolean options may be inverted by prepending -no (e.g. -nolist)
>   - value options may also have -no format to remove a previous (or default) value
>   - options may appear anywhere and the last instance wins
>
> But different commands take different options. It would make sense to factor
> out common parsing, help etc. for a consistent treatment. In my Go code for
> parsing MH like options I used Go's flag package as a model.
>
> Personally I vastly prefer MH style option processing to either single char
> options or --very-long-names which can't be abbreviated. Never mind options for
> commands like gcc, who can even remember 40+ ls options?
>
> But I haven't thought about how to extend this for shell scripts, or
> exposing these so that shells like zsh can do command completion. To specify
> these you need a vector of tuples (name, type, default, brief-help) but that
> is painful to do in a shell.

Ah, well, you've given away the secret of real UNIX geezers, we're on both
this mailing list and the nmh list :-)

Yes, I'm mostly happy with the way that nmh does options.

I guess that I would look more kindly on getopt if it had existed much earlier
so that people writing new commands would be encouraged to use the same format.
Not as happy with it as an afterthought.

Once again, I have to go back to the meatspace locality of reference issues.
Sure, it would be nice to be able to factor out common parsing, for example
if a related set of programs shared the same option set.  But unless it's
something huge, I'd just put it in its own file and use it for multiple
programs; I wouldn't put it in a library.  My point is that the code that
does the actual parsing is really trivial, and not necessarily the best
use of a library.

As far as help goes, I don't expect help built into command line programs;
I expect to look up error messages on the manual pages.  I'm happy with a
generic usage error as most "helpful" output that I get from programs is
not actually helpful.

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Bakul Shah @ 2021-07-31 22:14 UTC
  To: Larry McVoy; +Cc: TUHS main list

On Jul 31, 2021, at 3:13 PM, Larry McVoy <lm@mcvoy.com> wrote:
> 
> On Sat, Jul 31, 2021 at 03:04:48PM -0700, Bakul Shah wrote:
>> On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
>>> 
>>> So I never got getopt().  One of my rules is that I don't use a library
>>> in cases where the number of lines of gunk that it takes to use a
>>> library function is >= the number of lines to just write it myself.  Yeah,
>>> I know the "but the library has more eyeballs and is debugged" argument
>>> but in reality libraries are the source of many bugs.  I've always taken
>>> the approach that I would never hire someone who had to use a library to
>>> implement a singly-linked list.
>> 
>> getopt() is perhaps the wrong solution but consider something like MH,
>> whose commands all follow a common pattern. Consider:
>> 
>>  - options (switches) all start with a single '-'
>>  - they may be abbreviated to a unique prefix.
> 
> That last one is a gotcha waiting to happen:
> 
> program --this-is-the-long-option
> 
> is the same as 
> 
> program --this
> 
> but that will break scripts (and fingers) when program gets a new 
> option like

That is easy to fix: use full options in scripts. Abbreviations for
interactive use. Much better than --always-having-to-type-long-names.

> 
> program --this-is-the-even-longer-option
> 
> We wrote our own getopt() for BitKeeper and it had long and short options
> but no gotcha unique prefix.




* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Larry McVoy @ 2021-07-31 22:13 UTC
  To: Bakul Shah; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 03:04:48PM -0700, Bakul Shah wrote:
> On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
> > 
> > So I never got getopt().  One of my rules is that I don't use a library
> > in cases where the number of lines of gunk that it takes to use a
> > library function is >= the number of lines to just write it myself.  Yeah,
> > I know the "but the library has more eyeballs and is debugged" argument
> > but in reality libraries are the source of many bugs.  I've always taken
> > the approach that I would never hire someone who had to use a library to
> > implement a singly-linked list.
> 
> getopt() is perhaps the wrong solution but consider something like MH,
> whose commands all follow a common pattern. Consider:
> 
>   - options (switches) all start with a single '-'
>   - they may be abbreviated to a unique prefix.

That last one is a gotcha waiting to happen:

program --this-is-the-long-option

is the same as 

program --this

but that will break scripts (and fingers) when program gets a new 
option like

program --this-is-the-even-longer-option

We wrote our own getopt() for BitKeeper and it had long and short options
but no gotcha unique prefix.

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Warner Losh @ 2021-07-31 22:10 UTC
  To: Jon Steinhart; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 3:33 PM Jon Steinhart <jon@fourwinds.com> wrote:

> Richard Salz writes:
> > On Sat, Jul 31, 2021 at 3:21 PM Jon Steinhart <jon@fourwinds.com> wrote:
> >
> > > opinion, it doesn't add value to do something that's already been done
> > > but differently; it detracts from value because now there's yet another
> > > competing way to do something.
> > >
> >
> > You mean like not using getopt and rolling your own?  Shrug.
> >
> > while ((i = getopt(argc, argv, "xxxxx:xxxx")) != -1)
> >    switch (i) {
> >    case ....
> >   }
> > argc -= optind;
> > argv += optind;
> >
> > > So I never got getopt().  One of my rules is that I don't use a library
> > > in cases where the number of lines of gunk that it takes to use a
> > > library function is >= the number of lines to just write it myself.
> >
> >
> > I don't know, what lines in the above are extra beyond what you write?
> > The last two if being generous I suppose.
>
> Well, in my opinion that's not really an accurate representation of using
> getopt.
>
> I would of course write the #include line, and the table of options, which
> would end up being >= the number of lines that it takes me to do this...
>
>         while (--argc > 0) {
>                 if (*(++argv)[0] == '-') {
>                         for (p = *argv + 1; *p != '\0'; p++) {
>                                 switch (*p) {
>

Except for all the things this gets wrong, it's ok. The problem with
inlining getopt is that you wind up with cases like '-f foo' on the command
line being treated differently than '-ffoo'. Inlined code like this can be
quite frustrating for the user to use. Your locality of reference is cut and
paste bugs that getopt eliminates because it handles all the special cases
in a uniform way.
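The '-f foo' versus '-ffoo' point shows up as soon as the quoted fragment is finished. A minimal sketch that keeps the same loop shape and handles clusters (-vffoo), both argument forms, a lone "-" operand, and "--"; the option letters -v and -f FILE and the names parse_inline and struct flags are assumptions for illustration.

```c
#include <string.h>

/* Settings for a hypothetical command taking -v and -f FILE. */
struct flags {
    int verbose;
    const char *file;
};

/* The fragment's loop, completed. Accepts -v, -f FILE, -fFILE, clusters
   such as -vffoo, a lone "-" as an operand, and "--" as the end of
   options. Returns the number of operands; when that is > 0, *argvp
   points at the first one. Returns -1 on an unknown option or a
   missing argument. */
int parse_inline(int argc, char **argv, struct flags *fl, char ***argvp)
{
    char *p;

    while (--argc > 0) {
        if (*(++argv)[0] != '-' || (*argv)[1] == '\0')
            break;                          /* operand, or "-" for stdin */
        if (strcmp(*argv, "--") == 0) {     /* explicit end of options */
            ++argv;
            --argc;
            break;
        }
        for (p = *argv + 1; *p != '\0'; p++) {
            switch (*p) {
            case 'v':
                fl->verbose = 1;
                break;
            case 'f':
                if (p[1] != '\0') {         /* -ffoo: rest of this word */
                    fl->file = p + 1;
                } else if (argc > 1) {      /* -f foo: the next word */
                    fl->file = *++argv;
                    --argc;
                } else {
                    return -1;              /* -f at end of line */
                }
                goto next_word;             /* argument used up this word */
            default:
                return -1;                  /* unknown option letter */
            }
        }
    next_word:
        ;
    }
    *argvp = argv;
    return argc;
}
```

Each of those cases is a separate branch that every inline copy has to get right, which is the trade-off both sides of this exchange are weighing.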


> Even if it took a few more lines to do it my way, I'm a believer that good
> coding style keeps "meatspace locality of reference" in mind.  As
> programmers, we put in a lot of effort to ensure locality of reference for
> computers, but then completely toss it for people who aren't as good as it.
> So given a choice of a few lines of code versus having to look something up
> somewhere else, I choose the few lines of code.
>

And a few more bugs...

> Being a geezer, I have lots of code lying around from which I can extract
> working fragments such as the one above.  Writing those few lines of code
> provides insulation from supply-side attack vectors, bugs in libraries,
> versioning issues, having to load debug libraries, and so on.
>

getopt has been standardized since the 80s and has had universal adoption
since the 90s. Hardly a version chasing issue since it's in everybody's libc.


> I realize that this isn't a huge deal by itself; it's a philosophical
> point.  When I strace any random program that I didn't write I'm astonished
> by the amount of library loading that takes place.  So any issues are
> multiplied by n.
>

The flip side to this is that libraries can be debugged once, while inline
code like the above needs to be debugged over and over....


> Don't get me wrong; I use plenty of libraries.  But I tend to use those for
> stuff that is so common that there is a benefit from shared libraries (or at
> least there was before everything got containerized) and for libraries that
> do actual hard stuff.  But I don't use libraries for small snippets of code
> that I could easily write myself yielding better code clarity for others
> reading my code.
>

Given the number of times I've been burned by trying to roll my own getopt,
I stopped trying years ago. It's harder than it looks.

Warner


> Jon
>

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
From: Bakul Shah @ 2021-07-31 22:04 UTC
  To: Jon Steinhart; +Cc: TUHS main list

On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
> 
> So I never got getopt().  One of my rules is that I don't use a library
> in cases where the number of lines of gunk that it takes to use a
> library function is >= the number of lines to just write it myself.  Yeah,
> I know the "but the library has more eyeballs and is debugged" argument
> but in reality libraries are the source of many bugs.  I've always taken
> the approach that I would never hire someone who had to use a library to
> implement a singly-linked list.

getopt() is perhaps the wrong solution but consider something like MH,
whose commands all follow a common pattern. Consider:

  - options (switches) all start with a single '-'
  - they may be abbreviated to a unique prefix.
  - Boolean options may be inverted by prepending -no (e.g. -nolist)
  - value options may also have -no format to remove a previous (or default) value
  - options may appear anywhere and the last instance wins
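These rules translate into a short lookup routine. A sketch, not nmh's actual code: switch names are given without the leading '-', an exact match beats any abbreviation, a unique prefix is accepted, and a leading "no" inverts a switch when the word does not match directly. mh_lookup and match_name are hypothetical names. "Last instance wins" needs no code of its own if the caller simply assigns each switch as it is seen.

```c
#include <string.h>

/* Find word among names[0..n-1]: exact match first, then unique prefix.
   Returns the index, -1 if nothing matches, -2 if the prefix is
   ambiguous. */
static int match_name(const char *word, const char *const *names, int n)
{
    size_t len = strlen(word);
    int found = -1;

    if (len == 0)
        return -1;
    for (int i = 0; i < n; i++) {
        if (strcmp(word, names[i]) == 0)
            return i;                     /* exact match always wins */
        if (strncmp(word, names[i], len) == 0)
            found = (found == -1) ? i : -2;
    }
    return found;
}

/* MH-style lookup of a switch word given without its leading '-'.
   If the word itself does not match and starts with "no", the rest is
   looked up and *negated is set, so "nolink" inverts "link". */
int mh_lookup(const char *word, const char *const *names, int n,
              int *negated)
{
    int i = match_name(word, names, n);

    *negated = 0;
    if (i == -1 && strncmp(word, "no", 2) == 0) {
        i = match_name(word + 2, names, n);
        if (i >= 0)
            *negated = 1;
    }
    return i;
}
```

The objection about unique prefixes raised elsewhere in the thread is easy to reproduce with this routine: with only this-is-the-long-option in the table, "this" resolves to it; once this-is-the-even-longer-option is added, the same abbreviation returns -2, while the full name still resolves because exact matches win.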

But different commands take different options. It would make sense to factor
out common parsing, help etc. for a consistent treatment. In my Go code for
parsing MH like options I used Go's flag package as a model.

Personally I vastly prefer MH style option processing to either single char
options or --very-long-names which can't be abbreviated. Never mind options for
commands like gcc, who can even remember 40+ ls options?

But I haven't thought about how to extend this for shell scripts, or
exposing these so that shells like zsh can do command completion. To specify
these you need a vector of tuples (name, type, default, brief-help) but that
is painful to do in a shell.

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 21:37       ` Richard Salz
@ 2021-07-31 21:55         ` Jon Steinhart
  0 siblings, 0 replies; 20+ messages in thread
From: Jon Steinhart @ 2021-07-31 21:55 UTC (permalink / raw)
  To: TUHS main list

Richard Salz writes:
>
> On Sat, Jul 31, 2021 at 5:34 PM Jon Steinhart <jon@fourwinds.com> wrote:
>
> > Well, in my opinion that's not really an accurate representation of using
> > getopt.
> >
> >
> It's how all my getopt code works.
>
> getopt is in libc and stdlib.h, so you can't count that against it :)  On
> the other hand, your sample code didn't show arg/no-arg handling.

Well, at least on my system it's here:

SYNOPSIS
       #include <unistd.h>

not in either of those other places.  I could provide you with a complete working
example, but I don't think that it's the important part of the discussion.
Using getopt() is more or less a wash in terms of lines of code, so the
meatspace locality of reference argument carries the day for me.  Your mileage
may vary.

Jon

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 21:32     ` Jon Steinhart
@ 2021-07-31 21:37       ` Richard Salz
  2021-07-31 21:55         ` Jon Steinhart
  2021-07-31 22:10       ` Warner Losh
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Salz @ 2021-07-31 21:37 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 5:34 PM Jon Steinhart <jon@fourwinds.com> wrote:

> Well, in my opinion that's not really an accurate representation of using
> getopt.
>
>
It's how all my getopt code works.

getopt is in libc and stdlib.h, so you can't count that against it :)  On
the other hand, your sample code didn't show arg/no-arg handling.

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 21:06   ` Richard Salz
@ 2021-07-31 21:32     ` Jon Steinhart
  2021-07-31 21:37       ` Richard Salz
  2021-07-31 22:10       ` Warner Losh
  0 siblings, 2 replies; 20+ messages in thread
From: Jon Steinhart @ 2021-07-31 21:32 UTC (permalink / raw)
  To: TUHS main list

Richard Salz writes:
> On Sat, Jul 31, 2021 at 3:21 PM Jon Steinhart <jon@fourwinds.com> wrote:
>
> > opinion, it doesn't add value to do something that's already been done
> > but differently; it detracts from value because now there's yet another
> > competing way to do something.
> >
>
> You mean like not using getopt and rolling your own?  Shrug.
>
> while ((i = getopt(argc, argv, "xxxxx:xxxx")) != -1)
>    switch (i) {
>    case ....
>   }
> argc -= optind;
> argv += optind;
>
> > So I never got getopt().  One of my rules is that I don't use a library
> > in cases where the number of lines of gunk that it takes to use a
> > library function is >= the number of lines to just write it myself.
>
>
> I don't know; which lines in the above are extra beyond what you'd write?  The
> last two, if I'm being generous, I suppose.

Well, in my opinion that's not really an accurate representation of using getopt.

I would of course write the #include line, and the table of options, which would
end up being >= the number of lines that it takes me to do this...

	while (--argc > 0) {
		if (*(++argv)[0] == '-') {
			for (p = *argv + 1; *p != '\0'; p++) {
				switch (*p) {

Even if it took a few more lines to do it my way, I'm a believer that good coding
style keeps "meatspace locality of reference" in mind.  As programmers, we put in
a lot of effort to ensure locality of reference for computers, but then completely
toss it for people, who aren't as good at it.  So, given a choice of a few lines of
code versus having to look something up somewhere else, I choose the few lines of
code.

Being a geezer, I have lots of code lying around from which I can extract working
fragments such as the one above.  Writing those few lines of code provides insulation
from supply-side attack vectors, bugs in libraries, versioning issues, having to load
debug libraries, and so on.

I realize that this isn't a huge deal by itself; it's a philosophical point.  When
I strace any random program that I didn't write I'm astonished by the amount of
library loading that takes place.  So any issues are multiplied by n.

Don't get me wrong; I use plenty of libraries.  But I tend to use those for stuff
that is so common that there is a benefit from shared libraries (or at least there
was before everything got containerized) and for libraries that do actual hard stuff.
But I don't use libraries for small snippets of code that I could easily write
myself, yielding better code clarity for others reading my code.

Jon

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 19:20 ` [TUHS] Systematic approach to command-line interfaces [ meta issues ] Jon Steinhart
@ 2021-07-31 21:06   ` Richard Salz
  2021-07-31 21:32     ` Jon Steinhart
  2021-07-31 22:04   ` Bakul Shah
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Salz @ 2021-07-31 21:06 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 3:21 PM Jon Steinhart <jon@fourwinds.com> wrote:

> opinion, it doesn't add value to do something that's already been done
> but differently; it detracts from value because now there's yet another
> competing way to do something.
>

You mean like not using getopt and rolling your own?  Shrug.

while ((i = getopt(argc, argv, "xxxxx:xxxx")) != -1)
   switch (i) {
   case ....
  }
argc -= optind;
argv += optind;

> So I never got getopt().  One of my rules is that I don't use a library
> in cases where the number of lines of gunk that it takes to use a
> library function is >= the number of lines to just write it myself.


I don't know; which lines in the above are extra beyond what you'd write?  The
last two, if I'm being generous, I suppose.

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 12:25 [TUHS] Systematic approach to command-line interfaces Michael Siegel
@ 2021-07-31 19:20 ` Jon Steinhart
  2021-07-31 21:06   ` Richard Salz
  2021-07-31 22:04   ` Bakul Shah
  0 siblings, 2 replies; 20+ messages in thread
From: Jon Steinhart @ 2021-07-31 19:20 UTC (permalink / raw)
  To: tuhs

Michael Siegel writes:
> Hello,
>
> I've recently started to implement a set of helper functions and
> procedures for parsing Unix-like command-line interfaces (i.e., POSIX +
> GNU-style long options, in this case) in Ada.
>
> While doing that, I learned that there is a better way to approach
> this problem – beyond using getopt(s) (which never really made sense to
> me) and having to write case statements in loops every time: Define a
> grammar, let a pre-built parser do the work, and have the parser
> provide the results to the program.
>
> Now, defining such a grammar requires a thoroughly systematic approach
> to the design of command-line interfaces. One problem with that is
> whether that grammar should allow for sub-commands. And that leads to
> the question of how task-specific tool sets should be designed. These
> seem to be a relatively new phenomenon in Unix-like systems that POSIX
> doesn't say anything about, as far as I can see.
>
> So, I've prepared a bit of a write-up, pondering on the pros and cons
> of two different ways of having task-specific tool sets
> (non-hierarchical command sets vs. sub-commands) that is available at
>
>   https://www.msiism.org/files/doc/unix-like_command-line_interfaces.html
>
> I tend to think the sub-command approach is better. But I'm neither a UI
> nor a Unix expert and have no formal training in computer things. So, I
> thought this would be a good place to ask for comment (and get some
> historical perspective).
>
> This is all just my pro-hobbyist attempt to make some people's lives
> easier, especially mine. I mean, currently, the "Unix" command line is
> quite a zoo, and not in a positive sense. Also, the number of
> well-thought-out command-line interfaces doesn't seem to be a growing
> one. But I guess that could be changed by providing truly easy ways to
> make good interfaces.
>
>
> --
> Michael

Well, don't let me discourage you from doing what you want.  But, in my
opinion, it doesn't add value to do something that's already been done
but differently; it detracts from value because now there's yet another
competing way to do something.

I'm actually surprised that the format of commands was as consistent as
it was for as long as it was.  Sure, I never liked the way that things
like tar were inconsistent, but it was mostly good for a long time.  I
see two reasons for the more recent messiness.

 o  Minimal learning of history by newer practitioners, resulting in
    doing things in a way that they're familiar with instead of learning
    the behavior of the target environment and fitting in.  It's what
    I call the "Ugly American Tourist" model; I'm in your country so
    you should speak English; why would I learn your language?

 o  The endless pile of --gnu-long-option-names.

On one hand, I sort of understand the long option names because so many
commands have so many options that there's no way to have one-character
mnemonics.  But, I would make the argument that if one has that many
options it's a sign that the command is trying to do too much stuff.
For example, I don't care for the compression options on tar.  These are
actually just invoking separate programs in a pipeline, which to me is
the job of the shell.  Sure, it can be convenient, but again, one of the
advantages of the shell is that I can make scripts for anything that I'm
doing so often that the convenience would be nice.  And without restarting
an old flame war, separate programs are more stylistically in line and more
useful than many of the cat options.

So I never got getopt().  One of my rules is that I don't use a library
in cases where the number of lines of gunk that it takes to use a
library function is >= the number of lines to just write it myself.  Yeah,
I know the "but the library has more eyeballs and is debugged" argument
but in reality libraries are the source of many bugs.  I've always taken
the approach that I would never hire someone who had to use a library to
implement a singly-linked list.

Jon

end of thread, other threads:[~2021-08-02 12:12 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
2021-08-01 18:17 [TUHS] Systematic approach to command-line interfaces [ meta issues ] Douglas McIlroy
2021-08-01 19:48 ` arnold
2021-08-01 21:30   ` John Cowan
2021-08-02 12:11   ` Steffen Nurpmeso
  -- strict thread matches above, loose matches on Subject: below --
2021-07-31 12:25 [TUHS] Systematic approach to command-line interfaces Michael Siegel
2021-07-31 19:20 ` [TUHS] Systematic approach to command-line interfaces [ meta issues ] Jon Steinhart
2021-07-31 21:06   ` Richard Salz
2021-07-31 21:32     ` Jon Steinhart
2021-07-31 21:37       ` Richard Salz
2021-07-31 21:55         ` Jon Steinhart
2021-07-31 22:10       ` Warner Losh
2021-07-31 22:19         ` Larry McVoy
2021-07-31 22:20         ` Jon Steinhart
2021-07-31 23:26           ` Warner Losh
2021-07-31 23:41             ` Jon Steinhart
2021-07-31 22:04   ` Bakul Shah
2021-07-31 22:13     ` Larry McVoy
2021-07-31 22:14       ` Bakul Shah
2021-07-31 22:17         ` Bakul Shah
2021-07-31 22:16     ` Jon Steinhart
2021-07-31 22:20       ` Bakul Shah
