[TUHS] Systematic approach to command-line interfaces

The Unix Heritage Society mailing list

* [TUHS] Systematic approach to command-line interfaces
@ 2021-07-31 12:25 Michael Siegel
  2021-07-31 13:05 ` Dan Halbert
                   ` (6 more replies)
  0 siblings, 7 replies; 72+ messages in thread
From: Michael Siegel @ 2021-07-31 12:25 UTC (permalink / raw)
  To: tuhs

Hello,

I've recently started to implement a set of helper functions and
procedures for parsing Unix-like command-line interfaces (i.e., POSIX +
GNU-style long options, in this case) in Ada.

While doing that, I learned that there is a better way to approach
this problem – beyond using getopt(s) (which never really made sense to
me) and having to write case statements in loops every time: Define a
grammar, let a pre-built parser do the work, and have the parser
provide the results to the program.

Now, defining such a grammar requires a thoroughly systematic approach
to the design of command-line interfaces. One problem with that is
whether that grammar should allow for sub-commands. And that leads to
the question of how task-specific tool sets should be designed. These
seem to be a relatively new phenomenon in Unix-like systems that POSIX
doesn't say anything about, as far as I can see.

So, I've prepared a bit of a write-up, pondering on the pros and cons
of two different ways of having task-specific tool sets
(non-hierarchical command sets vs. sub-commands) that is available at

  https://www.msiism.org/files/doc/unix-like_command-line_interfaces.html

I tend to think the sub-command approach is better. But I'm neither a UI
nor a Unix expert and have no formal training in computer things. So, I
thought this would be a good place to ask for comment (and get some
historical perspective).

This is all just my pro-hobbyist attempt to make some people's lives
easier, especially mine. I mean, currently, the "Unix" command line is
quite a zoo, and not in a positive sense. Also, the number of
well-thought-out command-line interfaces doesn't seem to be a growing
one. But I guess that could be changed by providing truly easy ways to
make good interfaces.

--
Michael

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 12:25 [TUHS] Systematic approach to command-line interfaces Michael Siegel
@ 2021-07-31 13:05 ` Dan Halbert
  2021-07-31 14:21 ` Adam Thornton
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 72+ messages in thread
From: Dan Halbert @ 2021-07-31 13:05 UTC (permalink / raw)
  To: tuhs

I think the "click" CLI parser for Python would be of interest to you:
https://click.palletsprojects.com/. It supports sub-commands and
nesting. As far as I know, it's not grammar-based internally.
PowerShell also has some interesting concepts, I think, though I've not
looked at it in detail.

Dan H.


* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 12:25 [TUHS] Systematic approach to command-line interfaces Michael Siegel
  2021-07-31 13:05 ` Dan Halbert
@ 2021-07-31 14:21 ` Adam Thornton
  2021-07-31 14:25   ` Adam Thornton
  2021-07-31 15:45 ` Richard Salz
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 72+ messages in thread
From: Adam Thornton @ 2021-07-31 14:21 UTC (permalink / raw)
  To: TUHS main list

On Jul 31, 2021, at 5:25 AM, Michael Siegel <msi@malbolge.net> wrote:

> While doing that, I learned that there is a better way to approach
> this problem – beyond using getopt(s) (which never really made sense to
> me) and having to write case statements in loops every time: Define a
> grammar, let a pre-built parser do the work, and have the parser
> provide the results to the program.

I see that Dan Halbert beat me to mentioning "click."

The trick with shell is that unless you write the parser in shell, which is
going to be miserable, you’re doing it in a command in a subshell, and
therefore your return values have to be a structured stream of bytes on
stdout, which the parent shell is going to have to interpret.  An eval-able
shell fragment, where you have a convention of what the variables you get
from the option parser will be, is probably the easiest way, since from the
parent that would look like:

eval "$(parse_my_opts "$@")"
# Magic variables spring to life
if [ "$OPT_SUBCOMMAND_0" = "burninate" ]; then ...
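To make the eval-able fragment idea concrete, here is a minimal sketch of what such a parser command could print. Everything in it is an assumption for illustration: the `-v` and `-o` options, the `quote` helper, and all `OPT_*` names other than `OPT_SUBCOMMAND_0` are hypothetical, and `parse_my_opts` is written as a shell function only to keep the example self-contained.

```shell
# quote WORD: print WORD wrapped in single quotes so it survives eval intact.
quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Hypothetical parser: accepts -v and -o FILE, then treats the remaining
# words as sub-command names. It prints assignments for the caller to eval.
parse_my_opts() (
    verbose=0
    output=
    OPTIND=1
    while getopts 'vo:' opt; do
        case $opt in
            v) verbose=1 ;;
            o) output=$OPTARG ;;
            *) exit 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    printf 'OPT_VERBOSE=%s\n' "$verbose"
    printf 'OPT_OUTPUT=%s\n' "$(quote "$output")"
    i=0
    for word in "$@"; do
        printf 'OPT_SUBCOMMAND_%s=%s\n' "$i" "$(quote "$word")"
        i=$((i + 1))
    done
)

# The parent shell interprets the structured output with eval.
eval "$(parse_my_opts -v -o "it's.log" burninate peasants)"
if [ "$OPT_SUBCOMMAND_0" = "burninate" ]; then
    echo "burninating (verbose=$OPT_VERBOSE, output=$OPT_OUTPUT)"
fi
```

Quoting every value before printing it is the part that matters here: without it, an argument containing spaces or quotes would be re-split or executed by the eval in the parent.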

Adam

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 14:21 ` Adam Thornton
@ 2021-07-31 14:25   ` Adam Thornton
  0 siblings, 0 replies; 72+ messages in thread
From: Adam Thornton @ 2021-07-31 14:25 UTC (permalink / raw)
  To: TUHS main list

Digressing a bit (but only a bit) talking about IPC: PowerShell and CMS
PIPELINES both take the approach of more structured pipelines, where pipe
contents are not just streams of bytes but can be structured records.  This
offers a lot of power, but it also inhibits the ability to arbitrarily
compose pipe stages, because you've effectively introduced a type system.

On the other hand you can certainly argue that stream-of-bytes pipes ALSO
introduce a type system, it's just a completely ad-hoc, undocumented, and
fragile one that relies on the cooperation of both ends of the pipe to work
at all, and you'd be right.

In practice...well, I'd rather use stream-of-bytes, but I am more
comfortable in Unix-like environments than PowerShell, and my CMS PIPELINES
skills are quite rusty now.

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 12:25 [TUHS] Systematic approach to command-line interfaces Michael Siegel
  2021-07-31 13:05 ` Dan Halbert
  2021-07-31 14:21 ` Adam Thornton
@ 2021-07-31 15:45 ` Richard Salz
  2021-07-31 16:03   ` Clem Cole
  2021-07-31 15:56 ` Paul Winalski
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 72+ messages in thread
From: Richard Salz @ 2021-07-31 15:45 UTC (permalink / raw)
  To: Michael Siegel; +Cc: TUHS main list

Look for "comnd jsys", with that exact spelling. Source code is around.


>

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 12:25 [TUHS] Systematic approach to command-line interfaces Michael Siegel
                   ` (2 preceding siblings ...)
  2021-07-31 15:45 ` Richard Salz
@ 2021-07-31 15:56 ` Paul Winalski
  2021-07-31 16:19   ` Dan Cross
  2021-08-01 16:51   ` Michael Siegel
  2021-07-31 16:41 ` Clem Cole
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 72+ messages in thread
From: Paul Winalski @ 2021-07-31 15:56 UTC (permalink / raw)
  To: Michael Siegel; +Cc: tuhs

On 7/31/21, Michael Siegel <msi@malbolge.net> wrote:
>
> While doing that, I learned that there is a better way to approach
> this problem – beyond using getopt(s) (which never really made sense to
> me) and having to write case statements in loops every time: Define a
> grammar, let a pre-built parser do the work, and have the parser
> provide the results to the program.

This method for handling command lines dates back at least to the
1970s.  The COMND JSYS (system call) in TOPS-20 operated this way, as
does the DCL command line interface in OpenVMS.  As you pointed out it
can greatly simplify the code in the application.  It also permits
command completion.  If the command has a long-winded option, such as
-supercalifragilisticexpialidocious, I can type -super then hit the
TAB key and as long as there is only one option that starts with
-super the parser will fill in the rest of the long keyword.  It also
means that you can provide interactive help.  At any point the user
can type a question mark and the command interpreter will say what
syntactic element is expected next.  The TOPS-20 COMND JSYS
implemented both of these features, and I think that command
completion was eventually added to the VMS command interpreter, too.
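
The completion rule described above (expand a prefix only when exactly one keyword starts with it) can be sketched in a few lines of shell. This is only an illustration of the rule, not how COMND or DCL implement it; the function name `complete_option` and the candidate lists are made up.

```shell
# complete_option PREFIX CANDIDATE...
# Prints the single candidate that starts with PREFIX and returns 0.
# Returns 1 if no candidate, or more than one candidate, matches.
complete_option() {
    co_prefix=$1
    shift
    co_match=
    co_count=0
    for co_cand in "$@"; do
        case $co_cand in
            "$co_prefix"*)
                co_match=$co_cand
                co_count=$((co_count + 1))
                ;;
        esac
    done
    [ "$co_count" -eq 1 ] || return 1
    printf '%s\n' "$co_match"
}

# Unique prefix: prints the full long keyword.
complete_option -super -supercalifragilisticexpialidocious -verbose -version

# Ambiguous prefix: nothing is filled in.
complete_option -ver -verbose -version || echo "ambiguous or unknown prefix"
```

A parser that knows the full keyword table can offer the same lookup for "?" help by listing every candidate that matches instead of requiring exactly one.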

This method of command line parsing also enforces a degree of
uniformity of syntax between the command lines of the various
utilities and applications.

-Paul W.

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 15:45 ` Richard Salz
@ 2021-07-31 16:03   ` Clem Cole
  2021-07-31 16:06     ` Richard Salz
  2021-07-31 16:17     ` Clem Cole
  0 siblings, 2 replies; 72+ messages in thread
From: Clem Cole @ 2021-07-31 16:03 UTC (permalink / raw)
  To: Richard Salz; +Cc: TUHS main list

https://github.com/PDP-10/sri-nic/blob/master/files/fs/c/ccmd/ccmdmd.unx

On Sat, Jul 31, 2021 at 11:46 AM Richard Salz <rich.salz@gmail.com> wrote:

> Look for "comnd jsys" that exact spelling. Source code is around.
>
>
>>

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 16:03   ` Clem Cole
@ 2021-07-31 16:06     ` Richard Salz
  2021-07-31 16:21       ` Clem Cole
  2021-07-31 16:17     ` Clem Cole
  1 sibling, 1 reply; 72+ messages in thread
From: Richard Salz @ 2021-07-31 16:06 UTC (permalink / raw)
  To: Clem Cole; +Cc: TUHS main list

You gave it away :)  Half the fun is getting the search right.

The old kermit program from Columbia has an implementation in portable (for
its time) C.

On Sat, Jul 31, 2021 at 12:03 PM Clem Cole <clemc@ccc.com> wrote:

> https://github.com/PDP-10/sri-nic/blob/master/files/fs/c/ccmd/ccmdmd.unx
>
> On Sat, Jul 31, 2021 at 11:46 AM Richard Salz <rich.salz@gmail.com> wrote:
>
>> Look for "comnd jsys" that exact spelling. Source code is around.
>>
>>
>>>

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 16:03   ` Clem Cole
  2021-07-31 16:06     ` Richard Salz
@ 2021-07-31 16:17     ` Clem Cole
  2021-07-31 16:30       ` Dan Cross
  1 sibling, 1 reply; 72+ messages in thread
From: Clem Cole @ 2021-07-31 16:17 UTC (permalink / raw)
  To: Richard Salz; +Cc: TUHS main list

Sorry, hit return too soon.   I remember an old AAUGN newsletter describing
it.   If I recall, it was originally done for kermit.  The same idea is in
tcsh also.  Which came first, I don't remember.  Cut/pasted from AAUGN Vol8
# 2
-----------------
CCMD: A Version of COMND in C

*Andrew Lowry*
*Howard Kaye *

Columbia University

CCMD is a general parsing mechanism for developing User Interfaces to
programs. It is based on the functionality of TOPS-20's COMND JSYS. CCMD
allows a program to parse for various field types (file names, user names,
dates and times, keywords, numbers, arbitrary text, tokens, *etc*.). It is
meant to supply a homogeneous user interface across a variety of machines
and operating systems for C programs. It currently runs under System V
UNIX, 4.2/4.3 BSD, Ultrix 1.2/2.0, and MSDOS. The library defines various
default actions (user settable), and allows field completion, help, file
indirection, comments, *etc*. on a per field basis. Future plans include
command line editing, command history, and ports to other operating systems
(such as VMS).

CCMD is available for anonymous FTP from
[CU20B.COLUMBIA.EDU]WS:<SOURCE.CCMD>*.*

For further information, send mail to:

info-ccmd-request@cu20b.columbia.edu
seismo!columbia!cunixc!info-ccmd-request

On Sat, Jul 31, 2021 at 12:03 PM Clem Cole <clemc@ccc.com> wrote:

> https://github.com/PDP-10/sri-nic/blob/master/files/fs/c/ccmd/ccmdmd.unx
>
> On Sat, Jul 31, 2021 at 11:46 AM Richard Salz <rich.salz@gmail.com> wrote:
>
>> Look for "comnd jsys" that exact spelling. Source code is around.
>>
>>
>>>

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 15:56 ` Paul Winalski
@ 2021-07-31 16:19   ` Dan Cross
  2021-08-01 17:44     ` Chet Ramey
  2021-08-01 16:51   ` Michael Siegel
  1 sibling, 1 reply; 72+ messages in thread
From: Dan Cross @ 2021-07-31 16:19 UTC (permalink / raw)
  To: Paul Winalski; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 11:57 AM Paul Winalski <paul.winalski@gmail.com>
wrote:

> On 7/31/21, Michael Siegel <msi@malbolge.net> wrote:
> >
> > While doing that, I learned that there is a better way to approach
> > this problem – beyond using getopt(s) (which never really made sense to
> > me) and having to write case statements in loops every time: Define a
> > grammar, let a pre-built parser do the work, and have the parser
> > provide the results to the program.
>
> This method for handling command lines dates back at least to the
> 1970s.  The COMND JSYS (system call) in TOPS-20 operated this way, as
> does the DCL command line interface in OpenVMS.  As you pointed out it
> can greatly simplify the code in the application.  It also permits
> command completion.  If the command has a long-winded option, such as
> -supercalifragilisticexpialidocious, I can type -super then hit the
> TAB key and as long as there is only one option that starts with
> -super the parser will fill in the rest of the long keyword.  It also
> means that you can provide interactive help.  At any point the user
> can type a question mark and the command interpreter will say what
> syntactic element is expected next.  The TOPS-20 COMND JSYS
> implemented both of these features, and I think that command
> completion was eventually added to the VMS command interpreter, too.
>
> This method of command line parsing also enforces a degree of
> uniformity of syntax between the command lines of the various
> utilities and applications.
>

There was someone posting here on TUHS a while back about leveraging a
special context-sensitive `--shell-help` or similar command line program
and synthesizing a protocol between the shell and a program to provide
TOPS-20 like command completion. It was nowhere near what you get from the
COMND JSYS, but seemed like a reasonable approximation.

This is verging on COFF territory, but one of the reasons such a mechanism
is unlike what you get from TOPS-20 is that, in that system, as soon as you
type the name of a command, you're effectively running that command; the
process model is quite different from that of Unix.

With respect to command line handling in general, I think there are some
attempts at making things more rational available in modern languages.
Command line parsing packages for Go and the `clap` package for Rust come
to mind (
https://rust-lang-nursery.github.io/rust-cookbook/cli/arguments.html). I've
used clap recently in a few places and it's very convenient.

        - Dan C.

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 16:06     ` Richard Salz
@ 2021-07-31 16:21       ` Clem Cole
  0 siblings, 0 replies; 72+ messages in thread
From: Clem Cole @ 2021-07-31 16:21 UTC (permalink / raw)
  To: Richard Salz; +Cc: TUHS main list

Sorry, I remembered it from AUUGN and checked those first, then searched
for "CCMD Unix Columbia.edu" and got the hit on Lar's PDP-10 sources.

and note the dyslexic spelling in my earlier email of AUUGN -- sigh.

On Sat, Jul 31, 2021 at 12:06 PM Richard Salz <rich.salz@gmail.com> wrote:

> You gave it away :)  Half the fun is getting the search right.
>
> The old kermit program from Columbia has an implementation in portable
> (for its time) C.
>
> On Sat, Jul 31, 2021 at 12:03 PM Clem Cole <clemc@ccc.com> wrote:
>
>> https://github.com/PDP-10/sri-nic/blob/master/files/fs/c/ccmd/ccmdmd.unx
>>
>> On Sat, Jul 31, 2021 at 11:46 AM Richard Salz <rich.salz@gmail.com>
>> wrote:
>>
>>> Look for "comnd jsys" that exact spelling. Source code is around.
>>>
>>>
>>>>

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 16:17     ` Clem Cole
@ 2021-07-31 16:30       ` Dan Cross
  0 siblings, 0 replies; 72+ messages in thread
From: Dan Cross @ 2021-07-31 16:30 UTC (permalink / raw)
  To: Clem Cole; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 12:18 PM Clem Cole <clemc@ccc.com> wrote:

> Sorry, hit return too soon.   I remember an old AAUGN newsletter
> describing it.   If I recall it was original done for kermit.  The same
> idea is in tcsh also.  Which came first, I don't remember.  Cut/pasted from
> AAUGN Vol8 # 2
>

Frank da Cruz wrote a very nice reminiscence of the DECSYSTEM-20s at
Columbia that discusses the creation of CCMD as they decommissioned the
PDP-10s and switched to Unix on VAXen (and then Suns).
http://www.columbia.edu/kermit/dec20.html

When I was a student, we were still given accounts on the CUNIX cluster,
64-bit SPARC machines running Solaris at the time. The actress Julia
Stiles was a student then. One day, I was walking out of Mudd (the
engineering building) with a friend of mine who suddenly grabbed my arm and
said, "oh my god oh my god oh my god that's Julia Stiles!" Being
perpetually ignorant of popular culture, I had no idea who she was
referring to and confusedly thought she meant Julia Child, the late host of
a cooking show. "...But I thought she was dead?" "No, Dan, that's Julia
Child!" We decided to look up Ms Stiles in the student directory, but being
a celebrity she wasn't listed. However, one could still discover her "UNI"
(login name) by grepping for her in the NIS password database. We did that
and sent her an email: "Christy was too embarrassed to say hi to you and
Dan thought you were Julia Child." Predictably, she did not respond. In
retrospect, I idly wonder how many such emails she got, most presumably of
the creepy variety, but we just thought ours was funny.

It appears that CUNIX still exists: https://cuit.columbia.edu/unix

        - Dan C.

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 12:25 [TUHS] Systematic approach to command-line interfaces Michael Siegel
                   ` (3 preceding siblings ...)
  2021-07-31 15:56 ` Paul Winalski
@ 2021-07-31 16:41 ` Clem Cole
  2021-07-31 17:41   ` John Cowan
  2021-07-31 17:30 ` Anthony Martin
  2021-07-31 19:20 ` [TUHS] Systematic approach to command-line interfaces [ meta issues ] Jon Steinhart
  6 siblings, 1 reply; 72+ messages in thread
From: Clem Cole @ 2021-07-31 16:41 UTC (permalink / raw)
  To: Michael Siegel; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 8:36 AM Michael Siegel <msi@malbolge.net> wrote:

> Hello,
>
> I've recently started to implement a set of helper functions and
> procedures for parsing Unix-like command-line interfaces (i.e., POSIX +
> GNU-style long options, in this case)

As an old guy, I am amused to read these words, because UNIX did not have
a command-line parsing standard, and I remember the wars.  If you came
from a system where the program was exec'ed with the command-line
parameters pre-parsed (like the DEC world, ITS, and some others), UNIX
seemed foreign and was often considered 'bad' by folks.   The biggest
argument (which was reasonable) was that Unix commands sometimes used
'keys' (like tp/tar and the like) and others used switches (cp, ed).
Folks new to UNIX often b*tched about it being 'inconsistent' (read things
like the 'UNIX Haters Book').  I admit I was 'surprised' when I came there
in the Fifth Edition in the mid-70s from the PDP-10 world, but as a
programmer, I ended up really liking the fact that the command line was not
pre-parsed, other than white space removal, and I did not have to figure
out some strange syntax for findnext() and other UUO/JSYS from my previous
life.

So by the late 70s and early 80s, a number of different UNIX parsing
schemes popped up, like the stuff from Columbia that Richard pointed out.
Tcl was in some ways the end result, which had a life that was useful, but
fell away too eventually.   The whole getopt(3) thing appeared originally
inside of BTL.  The first version I saw was from USG (Summit), but I'm not
sure they were the original authors.   One problem was that it was tied up
with later AT&T licenses [i.e. PWB or later] and was not in Research, so
the USENIX community lacked it.  Thus when AT&T brought it to us to
consider for POSIX.2, there was balking.  The ISVs seemed to like it, but
there was not a lot of support elsewhere.  At some point, somebody in the
USENIX community wrote a version and posted it to comp.sources.unix and
some people began to use it.  Of course, GNU had to take it and pee on it,
so we got the long option name stuff.

All in all, it's what you are used to, I suspect.

The whole AT&T getopt(3) thing works (I can deal with keys too BTW).  I
guess I just don't get excited about it these days.

Clem

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 12:25 [TUHS] Systematic approach to command-line interfaces Michael Siegel
                   ` (4 preceding siblings ...)
  2021-07-31 16:41 ` Clem Cole
@ 2021-07-31 17:30 ` Anthony Martin
  2021-07-31 17:46   ` John Cowan
  2021-07-31 18:56   ` Michael Siegel
  2021-07-31 19:20 ` [TUHS] Systematic approach to command-line interfaces [ meta issues ] Jon Steinhart
  6 siblings, 2 replies; 72+ messages in thread
From: Anthony Martin @ 2021-07-31 17:30 UTC (permalink / raw)
  To: Michael Siegel; +Cc: tuhs

Michael Siegel <msi@malbolge.net> once said:
> So, I've prepared a bit of a write-up, pondering on the pros and cons
> of two different ways of having task-specific tool sets
> (non-hierarchical command sets vs. sub-commands) that is available at
>
>   https://www.msiism.org/files/doc/unix-like_command-line_interfaces.html
>
> I tend to think the sub-command approach is better. But I'm neither a UI
> nor a Unix expert and have no formal training in computer things. So, I
> thought this would be a good place to ask for comment (and get some
> historical perspective).

You're missing the approach taken in Plan 9 (and
10th edition Unix): put related commands in a
directory and use a shell that doesn't restrict
the first argument of a command to a single path
element.

This lets you execute commands like:

	auth/as
	disk/prep
	git/rebase
	ip/ping
	ndb/dns
	upas/send

without having a prefix on every command name or
single large binaries with every command linked
in as subcommands.
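
For comparison, POSIX shells do not search PATH when a command name contains a slash, so on Linux or BSD the same effect needs a small helper. The sketch below is an assumption-laden emulation and not how the Plan 9 shell is implemented: `run_rel` and the `greet/hello` demo command are hypothetical names.

```shell
# run_rel DIR/CMD [ARG...]
# Searches each PATH entry for DIR/CMD and runs the first executable found.
# The body runs in a subshell, so the IFS and globbing changes stay local.
run_rel() (
    name=$1
    shift
    IFS=:
    set -f                      # do not glob-expand PATH entries
    for dir in $PATH; do
        candidate=${dir:-.}/$name
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            exec "$candidate" "$@"
        fi
    done
    echo "run_rel: $name: not found" >&2
    exit 127
)

# Demo: a throwaway directory on PATH holding a "greet" command group.
demo=$(mktemp -d)
mkdir "$demo/greet"
printf '#!/bin/sh\necho "hello, $1"\n' > "$demo/greet/hello"
chmod +x "$demo/greet/hello"
(PATH=$demo:$PATH; run_rel greet/hello world)
rm -rf "$demo"
```

Each command group stays a plain directory of independent executables, so there is no shared prefix on the file names and no single binary that links everything in.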

Cheers,
  Anthony

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 16:41 ` Clem Cole
@ 2021-07-31 17:41   ` John Cowan
  0 siblings, 0 replies; 72+ messages in thread
From: John Cowan @ 2021-07-31 17:41 UTC (permalink / raw)
  To: Clem Cole; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 12:41 PM Clem Cole <clemc@ccc.com> wrote:

> The biggest argument (which was reasonable) was that Unix commands sometimes
> used 'keys' (like tp/tar and the like) and others used switches (cp, ed).
>

That's modest (especially now that tar accepts "-" before the key) compared
to remembering when to use -F, when to use -d, and when to use -t to
specify the field separator, and when you are stuck without a field
separator option; also what the default is with no option (I'd prefer
"arbitrary amount of whitespace" in all cases, but that's often not available
at all).  These inconsistencies still piss me off no end.

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 17:30 ` Anthony Martin
@ 2021-07-31 17:46   ` John Cowan
  2021-07-31 18:56   ` Michael Siegel
  1 sibling, 0 replies; 72+ messages in thread
From: John Cowan @ 2021-07-31 17:46 UTC (permalink / raw)
  To: Anthony Martin; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 1:39 PM Anthony Martin <ality@pbrane.org> wrote:

> You're missing the approach taken in Plan 9 (and
> 10th edition Unix): put related commands in a
> directory and use a shell that doesn't restrict
> the first argument of a command to a single path
> element.
>

What that doesn't give you is the ability to say "git <git-options> diff
<git-diff-options>", which is very nice and makes the inconsistencies I
just posted on less likely.  Fortunately, any getopt-variant can deal with
these; you just have to pass the tail of argv and a suitably reduced value
for argc to another call of the options parser.
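
A rough shell analogue of that two-pass scheme, using the getopts builtin: where C code would hand the tail of argv to a second parser call, the sketch shifts the consumed words away and resets OPTIND. The `tool` command, its `-v` option, and the `repeat` sub-command with `-n` are all hypothetical.

```shell
# Usage: tool [-v] SUBCOMMAND [-n COUNT] [ARG...]
tool() (
    verbose=0
    OPTIND=1
    while getopts 'v' opt; do           # first pass: global options
        case $opt in
            v) verbose=1 ;;
            *) exit 2 ;;
        esac
    done
    shift $((OPTIND - 1))               # drop the global options

    [ $# -gt 0 ] || { echo "tool: missing sub-command" >&2; exit 2; }
    subcommand=$1
    shift                               # "$@" is now the tail for pass two

    case $subcommand in
        repeat)
            count=1
            OPTIND=1                    # restart getopts on the tail
            while getopts 'n:' opt; do  # second pass: sub-command options
                case $opt in
                    n) count=$OPTARG ;;
                    *) exit 2 ;;
                esac
            done
            shift $((OPTIND - 1))
            [ "$verbose" -eq 1 ] && echo "repeat: count=$count" >&2
            i=0
            while [ "$i" -lt "$count" ]; do
                echo "$*"
                i=$((i + 1))
            done
            ;;
        *)
            echo "tool: unknown sub-command: $subcommand" >&2
            exit 2
            ;;
    esac
)

tool -v repeat -n 2 hello
```

This relies on the first pass stopping at the first non-option word, which the getopts builtin does; a parser that permutes arguments would have to be told to stop there.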

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 17:30 ` Anthony Martin
  2021-07-31 17:46   ` John Cowan
@ 2021-07-31 18:56   ` Michael Siegel
  2021-07-31 19:41     ` Clem Cole
  2021-08-01 17:48     ` Chet Ramey
  1 sibling, 2 replies; 72+ messages in thread
From: Michael Siegel @ 2021-07-31 18:56 UTC (permalink / raw)
  To: Anthony Martin; +Cc: tuhs

On Sat, 31 Jul 2021 10:30:18 -0700,
Anthony Martin <ality@pbrane.org> wrote:

> Michael Siegel <msi@malbolge.net> once said:
> > So, I've prepared a bit of a write-up, pondering on the pros and
> > cons of two different ways of having task-specific tool sets
> > (non-hierarchical command sets vs. sub-commands) that is available
> > at
> >
> >   https://www.msiism.org/files/doc/unix-like_command-line_interfaces.html
> >
> > I tend to think the sub-command approach is better. But I'm neither
> > a UI nor a Unix expert and have no formal training in computer
> > things. So, I thought this would be a good place to ask for comment
> > (and get some historical perspective).  
> 
> You're missing the approach taken in Plan 9 (and
> 10th edition Unix): put related commands in a
> directory and use a shell that doesn't restrict
> the first argument of a command to a single path
> element.
> 
> This lets you execute commands like:
> 
> 	auth/as
> 	disk/prep
> 	git/rebase
> 	ip/ping
> 	ndb/dns
> 	upas/send
> 
> without having a prefix on every command name or
> single large binaries with every command linked
> in as subcommands.

Thanks for pointing this out. I had no idea.

Unfortunately(?), I'm looking to make my life easier on more "Unix-like
Unix-like systems" (for want of a better term), for the time being
(Linux, BSD, maybe illumos). (I mean, which shell would I use to
accomplish this on Unix?) And, as has already been pointed out, this
approach doesn't allow for global command options before sub-commands,
which pretty much defeats the sub-command approach altogether UI-wise,
I'd say.

Unrelated: I'm still having some technical difficulties with this list,
namely that I don't receive any mail sent to it. (I'm using the Web
archive to keep track of what's happening.) So, for me to be able to
reply to a particular message, it would also have to be sent directly
to me. Sorry for the inconvenience. The problem is already being
worked on.

--
Michael

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 12:25 [TUHS] Systematic approach to command-line interfaces Michael Siegel
                   ` (5 preceding siblings ...)
  2021-07-31 17:30 ` Anthony Martin
@ 2021-07-31 19:20 ` Jon Steinhart
  2021-07-31 21:06   ` Richard Salz
  2021-07-31 22:04   ` Bakul Shah
  6 siblings, 2 replies; 72+ messages in thread
From: Jon Steinhart @ 2021-07-31 19:20 UTC (permalink / raw)
  To: tuhs

Michael Siegel writes:
> Hello,
>
> I've recently started to implement a set of helper functions and
> procedures for parsing Unix-like command-line interfaces (i.e., POSIX +
> GNU-style long options, in this case) in Ada.
>
> While doing that, I learned that there is a better way to approach
> this problem – beyond using getopt(s) (which never really made sense to
> me) and having to write case statements in loops every time: Define a
> grammar, let a pre-built parser do the work, and have the parser
> provide the results to the program.
>
> Now, defining such a grammar requires a thoroughly systematic approach
> to the design of command-line interfaces. One problem with that is
> whether that grammar should allow for sub-commands. And that leads to
> the question of how task-specific tool sets should be designed. These
> seem to be a relatively new phenomenon in Unix-like systems that POSIX
> doesn't say anything about, as far as I can see.
>
> So, I've prepared a bit of a write-up, pondering on the pros and cons
> of two different ways of having task-specific tool sets
> (non-hierarchical command sets vs. sub-commands) that is available at
>
>   https://www.msiism.org/files/doc/unix-like_command-line_interfaces.html
>
> I tend to think the sub-command approach is better. But I'm neither a UI
> nor a Unix expert and have no formal training in computer things. So, I
> thought this would be a good place to ask for comment (and get some
> historical perspective).
>
> This is all just my pro-hobbyist attempt to make some people's lives
> easier, especially mine. I mean, currently, the "Unix" command line is
> quite a zoo, and not in a positive sense. Also, the number of
> well-thought-out command-line interfaces doesn't seem to be a growing
> one. But I guess that could be changed by providing truly easy ways to
> make good interfaces.
>
>
> --
> Michael

Well, don't let me discourage you from doing what you want.  But, in my
opinion, it doesn't add value to do something that's already been done
but differently; it detracts from value because now there's yet another
competing way to do something.

I'm actually surprised that the format of commands was as consistent as
it was for as long as it was.  Sure, I never liked the way that things
like tar were inconsistent, but it was mostly good for a long time.  I
see two reasons for the more recent messiness.

 o  Minimal learning of history by newer practitioners, resulting in
    doing things in a way that they're familiar with instead of learning
    the behavior of the target environment and fitting in.  It's what
    I call the "Ugly American Tourist" model; I'm in your country so
    you should speak English; why would I learn your language?

 o  The endless pile of --gnu-long-option-names.

On one hand, I sort of understand the long option names because so many
commands have so many options that there's no way to have one-character
mnemonics.  But, I would make the argument that if one has that many
options it's a sign that the command is trying to do too much stuff.
For example, I don't care for the compression options on tar.  These are
actually just invoking separate programs in a pipeline, which to me is
the job of the shell.  Sure, it can be convenient, but again, one of the
advantages of the shell is that I can make scripts for anything that I'm
doing so often that the convenience would be nice.  And without restarting
an old flame war, separate programs are more stylistically in line and more
useful than many of the cat options.

So I never got getopt().  One of my rules is that I don't use a library
in cases where the number of lines of gunk that it takes to use a
library function is >= the number of lines to just write it myself.  Yeah,
I know the "but the library has more eyeballs and is debugged" argument
but in reality libraries are the source of many bugs.  I've always taken
the approach that I would never hire someone who had to use a library to
implement a singly-linked list.

Jon

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 18:56   ` Michael Siegel
@ 2021-07-31 19:41     ` Clem Cole
  2021-07-31 21:30       ` Michael Siegel
  2021-08-01 17:48     ` Chet Ramey
  1 sibling, 1 reply; 72+ messages in thread
From: Clem Cole @ 2021-07-31 19:41 UTC (permalink / raw)
  To: Michael Siegel; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 2:58 PM Michael Siegel <msi@malbolge.net> wrote:

> I mean, which shell would I use to accomplish this on Unix?

In the old days, when the first Unix shell wars started, there was a Unix
adage:  *"Bourne to Program, Type with Joy"*
FWIW: tcsh supports TOPS-20 autocomplete -- a little work with your search
engine, you can figure out how to use its many options.  That said, the GNU
bash is said to do it also, but  I can not say I have tried it personally
since the ROMS in my fingers were long ago burned to 'Type with Joy.'

Also, after 50 years, it's not so much that UNIX is perfect; it has lots of
flaws and quirks.  Thinking about them and considering 'better' solutions is
often wise, particularly when capabilities (like Moore's law) give you new
tools to solve them.  But a level of wisdom here is that not all of those
quirks are worth repairing.  In the case of command-line parsing, getopt(3) has
proven to be 'good enough' for most things.  If it was really as bad as you
seem to think, I suspect one of the previous N attempts over the last 50
years might have taken root.

My point in my previous message was that getopt(3) was created to solve the
original UNIX problem.  It did actually take root (I'll not get into if the
Gnu long stuff was an improvement).  But there were other attempts,
including the TOPS-20 scheme (which, as has been pointed out, is quite
similar to yours), that have been around for at least 35 years in the UNIX
community and did not catch on.  I ask you to consider whether you may
value that feature more than others have.

As an analogy, when I first came to UNIX and C from other systems, there
were ideas, like the open curly brace/close curly brace instead of
BEGIN/END in C, and plenty of things in Ken's original shell that I found
annoying, particularly coming from the regularity of TOPS-20 and the like.
Hey, I used EMACS, TECO and DDT and none of them were in my new kit.  But I
forced myself to learn the new tools and new way of doing things.  Since I
was programming on UNIX in C, I made sure my code looked like everyone
else's [K&R did not yet exist, but we would later call this 'White Book
C'].  Why? So someone else could read it.  I learned that style too and
frankly have a hard time with any C code that does not follow it today.  But
if I am writing in a BEGIN/END style language, I adopt that style.  When in
Rome and all that.

In time, the wonderful things I could do in the UNIX world way outpaced
what I could do in the old world.  In fact, by the time either TECO or
EMACS became available for my use on a Vax, I never switched off
the earlier UNIX tools I had learned.  Like I said, I 'Type with Joy';
frankly, even if I'm on a Mac, Linux or Windows, I switch the shell to be
tcsh.  Could I learn a new shell?  Sure.  If I were to switch today, it
would probably be zsh, but my suggestion is to learn the tools that the
system has really well.  Then keep using them.  Adapt to the style of the
system you are using.

Anyway, those are my thoughts, from an old guy.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 19:20 ` [TUHS] Systematic approach to command-line interfaces [ meta issues ] Jon Steinhart
@ 2021-07-31 21:06   ` Richard Salz
  2021-07-31 21:32     ` Jon Steinhart
  2021-07-31 22:04   ` Bakul Shah
  1 sibling, 1 reply; 72+ messages in thread
From: Richard Salz @ 2021-07-31 21:06 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 3:21 PM Jon Steinhart <jon@fourwinds.com> wrote:

> opinion, it doesn't add value to do something that's already been done
> but differently; it detracts from value because now there's yet another
> competing way to do something.
>

You mean like not using getopt and rolling your own?  Shrug.

while ((i = getopt(argc, argv, "xxxxx:xxxx")) != -1)
   switch (i) {
   case ....
  }
argc -= optind;
argv += optind;

> So I never got getopt().  One of my rules is that I don't use a library
> in cases where the number of lines of gunk that it takes to use a
> library function is >= the number of lines to just write it myself.


I don't know; which lines in the above are extra beyond what you would
write?  The last two, if I'm being generous, I suppose.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 19:41     ` Clem Cole
@ 2021-07-31 21:30       ` Michael Siegel
  0 siblings, 0 replies; 72+ messages in thread
From: Michael Siegel @ 2021-07-31 21:30 UTC (permalink / raw)
  To: Clem Cole; +Cc: TUHS main list

On Sat, 31 Jul 2021 15:41:17 -0400,
Clem Cole <clemc@ccc.com> wrote:

> On Sat, Jul 31, 2021 at 2:58 PM Michael Siegel <msi@malbolge.net>
> wrote:
> 
> > I mean, which shell would I use to accomplish this on Unix?  
> 
> In the old days, when the first Unix shell wars started, there was a
> Unix adage:  *"Bourne to Program, Type with Joy"*
> FWIW: tcsh supports TOPS-20 autocomplete -- a little work with your
> search engine, you can figure out how to use its many options.  That
> said, the GNU bash is said to do it also, but  I can not say I have
> tried it personally since the ROMS in my fingers were long ago burned
> to 'Type with Joy.'

I see. I currently use Bash as my shell most of the time, and I have
my doubts about that being a good idea. But I also doubt I would like
tcsh any more. I've had a bit of experience with it on FreeBSD once.
All I can say is: We didn't get along when we first met, and we haven't
met since. The one and only shell I know that is (arguably) both a
traditional Unix shell and a huge improvement on the traditional Unix
shell is rc, which I have recently begun to use on and off. I can see
myself switching to that eventually, even though it lacks some
features I've come to depend on. It's definitely non-standard. But I
don't care about that very much because I believe it's objectively
better, and considerably so.

> Also, after 50 years, it's not so much that UNIX is perfect; it has lots
> of flaws and quirks.  Thinking about them and considering 'better'
> solutions is often wise, particularly when capabilities (like Moore's
> law) give you new tools to solve them.  But a level of wisdom here is
> that not all of those quirks are worth repairing.  In the case of
> command-line parsing, getopt(3) has proven to be 'good enough' for
> most things.  If it was really as bad as you seem to think, I suspect
> one of the previous N attempts over the last 50 years might have
> taken root.
> 
> My point in my previous message was that getopt(3) was created to
> solve the original UNIX problem.  It did actually take root (I'll not
> get into if the Gnu long stuff was an improvement).  But there were
> other attempts, including the TOPS-20 scheme (which, as has been
> pointed out, is quite similar to yours), that have been around for at
> least 35 years in the UNIX community and did not catch on.  I ask you
> to consider whether you may value that feature more than others
> have.

To me, using getopt/getopts has always felt more like a way to
complicate parsing rather than solving any actual problem. My aim
is to get around writing an actual parsing routine based on a
half-baked set of rules each time I put together a command-line utility
because that is time-consuming (for no good reason) and error-prone.

I really find the TOPS-20 way of going about this inspiring, though I'd
aim for something way more primitive that should indeed be good
enough. And I'd want it to stay as close to the POSIX Utility Syntax
Guidelines as reasonably possible because even though these are
lacking, I find them a reasonable base to build upon.

Also, experience tells me that merely adapting to what has taken root is
quite often not a good idea at all. In fact, the reasons for something
good and valuable not taking root might actually turn out to be pretty
nasty.

> As an analogy, when I first came to UNIX and C from other systems,
> there were ideas, like the open curly brace/close curly brace instead
> of BEGIN/END in C, and plenty of things in Ken's original shell that
> I found annoying, particularly coming from the regularity of TOPS-20
> and the like.  Hey, I used EMACS, TECO and DDT and none of them were
> in my new kit.  But I forced myself to learn the new tools and new
> way of doing things.  Since I was programming on UNIX in C, I made
> sure my code looked like everyone else's [K&R did not yet exist, but
> we would later call this 'White Book C'].  Why? So someone else could
> read it.  I learned that style too and frankly have a hard time with
> any C code that does not follow it today.  But if I am writing in a
> BEGIN/END style language, I adopt that style.  When in Rome and all
> that.
> 
> In time, the wonderful things I could do in the UNIX world way
> outpaced what I could do in the old world.  In fact, by the time
> either TECO or EMACS became available for my use on a Vax, I never
> switched off the earlier UNIX tools I had learned.  Like I said, I
> 'Type with Joy'; frankly, even if I'm on a Mac, Linux or Windows, I
> switch the shell to be tcsh.  Could I learn a new shell?  Sure.  If I
> were to switch today, it would probably be zsh, but my suggestion is
> to learn the tools that the system has really well.  Then keep using
> them.  Adapt to the style of the system you are using.

As you'll be able to guess by now, I beg to differ.

For example, I have forced myself to learn POSIX shell and Bash, even
enjoying some of it along the way. Today, I believe that they are both
rather terrible things I don't want to spend too much time with. (That
said, for my use case, Bash is almost always preferable over the
available POSIX sh implementation.) Then, I have always had a strong
dislike for the interface of the Unix `find` command. So, I tried to
replace it with what I thought was a better solution (relatively). That
required me to understand `find` on a whole different level. And after
gaining a much better understanding of `find` (and losing some of my
dislike for it), I still believe it should be replaced and have a few
ideas on how to do that. (Sadly, I mainly just have ideas.)

So, in a nutshell: I think that adapting to something that you believe
to be more than slightly deficient after giving it a try and trying to
understand its logic is not a reasonable thing to do.

> Anyway, those are my thoughts, from an old guy.

They're much appreciated.

--
Michael

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 21:06   ` Richard Salz
@ 2021-07-31 21:32     ` Jon Steinhart
  2021-07-31 21:37       ` Richard Salz
  2021-07-31 22:10       ` Warner Losh
  0 siblings, 2 replies; 72+ messages in thread
From: Jon Steinhart @ 2021-07-31 21:32 UTC (permalink / raw)
  To: TUHS main list

Richard Salz writes:
> On Sat, Jul 31, 2021 at 3:21 PM Jon Steinhart <jon@fourwinds.com> wrote:
>
> > opinion, it doesn't add value to do something that's already been done
> > but differently; it detracts from value because now there's yet another
> > competing way to do something.
> >
>
> You mean like not using getopt and rolling your own?  Shrug.
>
> while ((i = getopt(argc, argv, "xxxxx:xxxx")) != -1)
>    switch (i) {
>    case ....
>   }
> argc -= optind;
> argv += optind;
>
> So I never got getopt().  One of my rules is that I don't use a library
> > in cases where the number of lines of gunk that it takes to use a
> > library function is >= the number of lines to just write it myself.
>
>
> I don't know, what lines in the above are extra beyond what you write?  The
> last two if being generous I suppose.

Well, in my opinion that's not really an accurate representation of using getopt.

I would of course write the #include line, and the table of options, which would
end up being >= the number of lines that it takes me to do this...

	while (--argc > 0) {
		if (*(++argv)[0] == '-') {
			for (p = *argv + 1; *p != '\0'; p++) {
				switch (*p) {

Even if it took a few more lines to do it my way, I'm a believer that good coding
style keeps "meatspace locality of reference" in mind.  As programmers, we put in
a lot of effort to ensure locality of reference for computers, but then completely
toss it for people who aren't as good at it.  So given a choice of a few lines of
code versus having to look something up somewhere else, I choose the few lines of
code.
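
For comparison, a completed version of the fragment above might look
like the following sketch.  It is an illustration only, not Jon's
actual code: the struct opts record, the parse_by_hand() name, and the
two option letters (-a as a flag, -f taking an argument) are
hypothetical.  It shows the extra branches needed once an option takes
an argument that may be attached ("-ffoo") or separate ("-f foo"), and
it does not handle "--".

```c
#include <stddef.h>

/* Hypothetical option record: -a is a flag, -f takes an argument. */
struct opts {
    int aflag;
    const char *fvalue;
};

/* Hand-rolled parser.  Returns the number of operands left at the end
 * of argv, or -1 on an unknown option or a missing option argument. */
int parse_by_hand(int argc, char **argv, struct opts *o)
{
    char *p;

    o->aflag = 0;
    o->fvalue = NULL;
    while (--argc > 0) {
        if ((*++argv)[0] != '-' || (*argv)[1] == '\0')
            break;                      /* first operand, or a lone "-" */
        for (p = *argv + 1; *p != '\0'; p++) {
            switch (*p) {
            case 'a':
                o->aflag = 1;
                break;
            case 'f':
                if (p[1] != '\0')
                    o->fvalue = p + 1;  /* attached: "-ffoo" */
                else if (--argc > 0)
                    o->fvalue = *++argv; /* separate: "-f foo" */
                else
                    return -1;          /* "-f" with nothing after it */
                goto next_arg;          /* the rest of this word is used up */
            default:
                return -1;              /* unknown option letter */
            }
        }
next_arg:
        ;
    }
    return argc;
}
```

The operands, if any, are the last "return value" elements of the
original argv.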

Being a geezer, I have lots of code lying around from which I can extract working
fragments such as the one above.  Writing those few lines of code provides insulation
from supply-side attack vectors, bugs in libraries, versioning issues, having to load
debug libraries, and so on.

I realize that this isn't a huge deal by itself; it's a philosophical point.  When
I strace any random program that I didn't write I'm astonished by the amount of
library loading that takes place.  So any issues are multiplied by n.

Don't get me wrong; I use plenty of libraries.  But I tend to use those for stuff
that is so common that there is a benefit from shared libraries (or at least there
was before everything got containerized) and for libraries that do actual hard stuff.
But I don't use libraries for small snippets of code that I could easily write
myself yielding better code clarity for others reading my code.

Jon

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 21:32     ` Jon Steinhart
@ 2021-07-31 21:37       ` Richard Salz
  2021-07-31 21:55         ` Jon Steinhart
  2021-07-31 22:10       ` Warner Losh
  1 sibling, 1 reply; 72+ messages in thread
From: Richard Salz @ 2021-07-31 21:37 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 5:34 PM Jon Steinhart <jon@fourwinds.com> wrote:

> Well, in my opinion that's not really an accurate representation of using
> getopt.
>
>
It's how all my getopt code works.

getopt is in libc and a stdlib.h so you can't count that against it :)  on
the other hand, your sample code didn't show arg/no-arg handling.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 21:37       ` Richard Salz
@ 2021-07-31 21:55         ` Jon Steinhart
  0 siblings, 0 replies; 72+ messages in thread
From: Jon Steinhart @ 2021-07-31 21:55 UTC (permalink / raw)
  To: TUHS main list

Richard Salz writes:
>
> On Sat, Jul 31, 2021 at 5:34 PM Jon Steinhart <jon@fourwinds.com> wrote:
>
> > Well, in my opinion that's not really an accurate representation of using
> > getopt.
> >
> >
> It's how all my getopt code works.
>
> getopt is in libc and a stdlib.h so you can't count that against it :)  on
> the other hand, your sample code didn't show arg/no-arg handling.

Well, at least on my system it's here:

SYNOPSIS
       #include <unistd.h>

not either of those other places.  I could provide you with a complete working
example, but I don't think that it's the important part of the discussion.
Using getopt() is more or less a wash in terms of lines of code so the
meatspace locality of reference argument carries the day for me.  Your mileage
may vary.

Jon

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 19:20 ` [TUHS] Systematic approach to command-line interfaces [ meta issues ] Jon Steinhart
  2021-07-31 21:06   ` Richard Salz
@ 2021-07-31 22:04   ` Bakul Shah
  2021-07-31 22:13     ` Larry McVoy
  2021-07-31 22:16     ` Jon Steinhart
  1 sibling, 2 replies; 72+ messages in thread
From: Bakul Shah @ 2021-07-31 22:04 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: TUHS main list

On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
> 
> So I never got getopt().  One of my rules is that I don't use a library
> in cases where the number of lines of gunk that it takes to use a
> library function is >= the number of lines to just write it myself.  Yeah,
> I know the "but the library has more eyeballs and is debugged" argument
> but in reality libraries are the source of many bugs.  I've always taken
> the approach that I would never hire someone who had to use a library to
> implement a singly-linked list.

getopt() is perhaps the wrong solution but consider something like MH,
whose commands all follow a common pattern. Consider:

  - options (switches) all start with a single '-'
  - they may be abbreviated to a unique prefix.
  - Boolean options may be inverted by prepending -no (e.g. -nolist)
  - value options may also have -no format to remove a previous (or default) value
  - options may appear anywhere and the last instance wins

But different commands take different options. It would make sense to factor
out common parsing, help etc. for a consistent treatment. In my Go code for
parsing MH like options I used Go's flag package as a model.

Personally I vastly prefer MH style option processing to either single char
options or --very-long-names which can't be abbreviated. Never mind options for
commands like gcc, who can even remember 40+ ls options?

But I haven't thought about how to extend this for shell scripts, or
exposing these so that shells like zsh can do command completion. To specify
these you need a vector of tuples (name, type, default, brief-help) but that
is painful to do in a shell.
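
In C, the vector of (name, type, default, brief-help) tuples and the
-no inversion rule could be sketched like this.  This is not how MH or
nmh actually implement their switches: struct sw, find_sw() and
apply_switches() are hypothetical names, and unique-prefix abbreviation
is left out to keep the sketch short.

```c
#include <stddef.h>
#include <string.h>

enum sw_type { SW_BOOL, SW_VALUE };

/* One (name, type, default, brief-help) tuple. */
struct sw {
    const char *name;
    enum sw_type type;
    const char *value;   /* starts as the default; NULL means off/unset */
    const char *help;
};

/* Exact-match lookup in the switch table. */
static struct sw *find_sw(struct sw *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Apply every switch found anywhere in argv; the last instance wins.
 * Operands are moved down to argv[1..]; their count is returned,
 * or -1 on an unknown switch or a missing value. */
int apply_switches(int argc, char **argv, struct sw *tab, size_t n)
{
    int nops = 0;

    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];
        struct sw *s;

        if (arg[0] != '-') {                 /* operand: keep it, in order */
            argv[1 + nops++] = argv[i];
            continue;
        }
        if ((s = find_sw(tab, n, arg + 1)) != NULL) {
            if (s->type == SW_BOOL)
                s->value = "on";
            else if (++i < argc)
                s->value = argv[i];          /* value switch takes next word */
            else
                return -1;
        } else if (strncmp(arg, "-no", 3) == 0 &&
                   (s = find_sw(tab, n, arg + 3)) != NULL) {
            s->value = NULL;                 /* -noNAME: turn off or unset */
        } else {
            return -1;
        }
    }
    return nops;
}
```

Looking up the full name before trying the -no form is a design choice
in this sketch: it keeps a switch whose own name starts with "no" from
being misread as an inversion.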

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 21:32     ` Jon Steinhart
  2021-07-31 21:37       ` Richard Salz
@ 2021-07-31 22:10       ` Warner Losh
  2021-07-31 22:19         ` Larry McVoy
  2021-07-31 22:20         ` Jon Steinhart
  1 sibling, 2 replies; 72+ messages in thread
From: Warner Losh @ 2021-07-31 22:10 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 3:33 PM Jon Steinhart <jon@fourwinds.com> wrote:

> Richard Salz writes:
> > On Sat, Jul 31, 2021 at 3:21 PM Jon Steinhart <jon@fourwinds.com> wrote:
> >
> > > opinion, it doesn't add value to do something that's already been done
> > > but differently; it detracts from value because now there's yet another
> > > competing way to do something.
> > >
> >
> > You mean like not using getopt and rolling your own?  Shrug.
> >
> > while ((i = getopt(argc, argv, "xxxxx:xxxx")) != -1)
> >    switch (i) {
> >    case ....
> >   }
> > argc -= optind;
> > argv += optind;
> >
> > So I never got getopt().  One of my rules is that I don't use a library
> > > in cases where the number of lines of gunk that it takes to use a
> > > library function is >= the number of lines to just write it myself.
> >
> >
> > I don't know, what lines in the above are extra beyond what you write?  The
> > last two if being generous I suppose.
>
> Well, in my opinion that's not really an accurate representation of using
> getopt.
>
> I would of course write the #include line, and the table of options, which
> would end up being >= the number of lines that it takes me to do this...
>
>         while (--argc > 0) {
>                 if (*(++argv)[0] == '-') {
>                         for (p = *argv + 1; *p != '\0'; p++) {
>                                 switch (*p) {
>

Except for all the things this gets wrong, it's ok. The problem with
inlining getopt is that you wind up with cases like '-f foo' on the
command line being treated differently than '-ffoo'. Inlined code like
this can be quite frustrating for the user to use. Your locality of
reference is cut and paste bugs that getopt eliminates because it handles
all the special cases in a uniform way.


> Even if it took a few more lines to do it my way, I'm a believer that good
> coding style keeps "meatspace locality of reference" in mind.  As
> programmers, we put in a lot of effort to ensure locality of reference for
> computers, but then completely toss it for people who aren't as good at
> it.  So given a choice of a few lines of code versus having to look
> something up somewhere else, I choose the few lines of code.
>

And a few more bugs...

> Being a geezer, I have lots of code lying around from which I can extract
> working fragments such as the one above.  Writing those few lines of code
> provides insulation from supply-side attack vectors, bugs in libraries,
> versioning issues, having to load debug libraries, and so on.
>

getopt has been standardized since the 80s and has had universal adoption
since the 90s. Hardly a version chasing issue since it's in everybody's libc.


> I realize that this isn't a huge deal by itself; it's a philosophical
> point.  When I strace any random program that I didn't write I'm
> astonished by the amount of library loading that takes place.  So any
> issues are multiplied by n.
>

The flip side to this is that libraries can be debugged once, while inline
code like the above needs to be debugged over and over....


> Don't get me wrong; I use plenty of libraries.  But I tend to use those
> for stuff that is so common that there is a benefit from shared libraries
> (or at least there was before everything got containerized) and for
> libraries that do actual hard stuff.  But I don't use libraries for small
> snippets of code that I could easily write myself yielding better code
> clarity for others reading my code.
>

Given the number of times I've been burned by trying to roll my own getopt,
I stopped trying years ago. It's harder than it looks.

Warner


> Jon
>

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 22:04   ` Bakul Shah
@ 2021-07-31 22:13     ` Larry McVoy
  2021-07-31 22:14       ` Bakul Shah
  2021-07-31 22:16     ` Jon Steinhart
  1 sibling, 1 reply; 72+ messages in thread
From: Larry McVoy @ 2021-07-31 22:13 UTC (permalink / raw)
  To: Bakul Shah; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 03:04:48PM -0700, Bakul Shah wrote:
> On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
> > 
> > So I never got getopt().  One of my rules is that I don't use a library
> > in cases where the number of lines of gunk that it takes to use a
> > library function is >= the number of lines to just write it myself.  Yeah,
> > I know the "but the library has more eyeballs and is debugged" argument
> > but in reality libraries are the source of many bugs.  I've always taken
> > the approach that I would never hire someone who had to use a library to
> > implement a singly-linked list.
> 
> getopt() is perhaps the wrong solution but consider something like MH,
> whose commands all follow a common pattern. Consider:
> 
>   - options (switches) all start with a single '-'
>   - they may be abbreviated to a unique prefix.

That last one is a gotcha waiting to happen:

program --this-is-the-long-option

is the same as 

program --this

but that will break scripts (and fingers) when program gets a new 
option like

program --this-is-the-even-longer-option

We wrote our own getopt() for BitKeeper and it had long and short options
but no gotcha unique prefix.
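
The gotcha can be made concrete with a small sketch of unique-prefix
matching.  This is not BitKeeper's or MH's code: match_prefix() and the
two error codes are hypothetical names.  An exact match always wins; any
other word must be a prefix of exactly one known option name.

```c
#include <stddef.h>
#include <string.h>

enum { OPT_UNKNOWN = -1, OPT_AMBIGUOUS = -2 };

/* Return the index in names[] of the option that `word` selects,
 * OPT_UNKNOWN if nothing matches, or OPT_AMBIGUOUS if `word` is a
 * proper prefix of more than one name. */
int match_prefix(const char *const *names, int n, const char *word)
{
    size_t len = strlen(word);
    int found = OPT_UNKNOWN;

    for (int i = 0; i < n; i++) {
        if (strncmp(names[i], word, len) != 0)
            continue;                    /* not even a prefix */
        if (names[i][len] == '\0')
            return i;                    /* exact match always wins */
        found = (found == OPT_UNKNOWN) ? i : OPT_AMBIGUOUS;
    }
    return found;
}
```

With only "this-is-the-long-option" in the table, the word "this"
selects it.  Once "this-is-the-even-longer-option" is added, the same
word becomes ambiguous, which is the breakage described above; a script
that spells out the full name keeps working.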

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 22:13     ` Larry McVoy
@ 2021-07-31 22:14       ` Bakul Shah
  2021-07-31 22:17         ` Bakul Shah
  0 siblings, 1 reply; 72+ messages in thread
From: Bakul Shah @ 2021-07-31 22:14 UTC (permalink / raw)
  To: Larry McVoy; +Cc: TUHS main list

On Jul 31, 2021, at 3:13 PM, Larry McVoy <lm@mcvoy.com> wrote:
> 
> On Sat, Jul 31, 2021 at 03:04:48PM -0700, Bakul Shah wrote:
>> On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
>>> 
>>> So I never got getopt().  One of my rules is that I don't use a library
>>> in cases where the number of lines of gunk that it takes to use a
>>> library function is >= the number of lines to just write it myself.  Yeah,
>>> I know the "but the library has more eyeballs and is debugged" argument
>>> but in reality libraries are the source of many bugs.  I've always taken
>>> the approach that I would never hire someone who had to use a library to
>>> implement a singly-linked list.
>> 
>> getopt() is perhaps the wrong solution but consider something like MH,
>> whose commands all follow a common pattern. Consider:
>> 
>>  - options (switches) all start with a single '-'
>>  - they may be abbreviated to a unique prefix.
> 
> That last one is a gotcha waiting to happen:
> 
> program --this-is-the-long-option
> 
> is the same as 
> 
> program --this
> 
> but that will break scripts (and fingers) when program gets a new 
> option like

That is easy to fix: use full options in scripts. Abbreviations for
interactive use. Much better than --always-having-to-type-long-names.

> 
> program --this-is-the-even-longer-option
> 
> We wrote our own getopt() for BitKeeper and it had long and short options
> but no gotcha unique prefix.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 22:04   ` Bakul Shah
  2021-07-31 22:13     ` Larry McVoy
@ 2021-07-31 22:16     ` Jon Steinhart
  2021-07-31 22:20       ` Bakul Shah
  1 sibling, 1 reply; 72+ messages in thread
From: Jon Steinhart @ 2021-07-31 22:16 UTC (permalink / raw)
  To: TUHS main list

Bakul Shah writes:
> On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
> > 
> > So I never got getopt().  One of my rules is that I don't use a library
> > in cases where the number of lines of gunk that it takes to use a
> > library function is >= the number of lines to just write it myself.  Yeah,
> > I know the "but the library has more eyeballs and is debugged" argument
> > but in reality libraries are the source of many bugs.  I've always taken
> > the approach that I would never hire someone who had to use a library to
> > implement a singly-linked list.
>
> getopt() is perhaps the wrong solution but consider something like MH,
> whose commands all follow a common pattern. Consider:
>
>   - options (switches) all start with a single '-'
>   - they may be abbreviated to a unique prefix.
>   - Boolean options may be inverted by prepending -no (e.g. -nolist)
>   - value options may also have -no format to remove a previous (or default) value
>   - options may appear anywhere and the last instance wins
>
> But different commands take different options. It would make sense to factor
> out common parsing, help etc. for a consistent treatment. In my Go code for
> parsing MH like options I used Go's flag package as a model.
>
> Personally I vastly prefer MH style option processing to either single char
> options or --very-long-names which can't be abbreviated. Never mind options for
> commands like gcc, who can even remember 40+ ls options?
>
> But I haven't thought about how to extend this for shell scripts, or
> exposing these so that shells like zsh can do command completion. To specify
> these you need a vector of tuples (name, type, default, brief-help) but that
> is painful to do in a shell.

Ah, well, you've given away the secret of real UNIX geezers, we're on both
this mailing list and the nmh list :-)

Yes, I'm mostly happy with the way that nmh does options.

I guess that I would look more kindly on getopt if it had existed much earlier
so that people writing new commands would be encouraged to use the same format.
Not as happy with it as an afterthought.

Once again, I have to go back to the meatspace locality of reference issues.
Sure, it would be nice to be able to factor out common parsing, for example
if a related set of programs shared the same option set.  But unless it's
something huge, I'd just put it in its own file and use it for multiple
programs; I wouldn't put it in a library.  My point is that the code that
does the actual parsing is really trivial, and not necessarily the best
use of a library.

As far as help goes, I don't expect help built into command line programs;
I expect to look up error messages on the manual pages.  I'm happy with a
generic usage error as most "helpful" output that I get from programs is
not actually helpful.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 22:14       ` Bakul Shah
@ 2021-07-31 22:17         ` Bakul Shah
  0 siblings, 0 replies; 72+ messages in thread
From: Bakul Shah @ 2021-07-31 22:17 UTC (permalink / raw)
  To: Larry McVoy; +Cc: TUHS main list

On Jul 31, 2021, at 3:14 PM, Bakul Shah <bakul@iitbombay.org> wrote:
> 
> On Jul 31, 2021, at 3:13 PM, Larry McVoy <lm@mcvoy.com> wrote:
>> 
>> On Sat, Jul 31, 2021 at 03:04:48PM -0700, Bakul Shah wrote:
>>> On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
>>>> 
>>>> So I never got getopt().  One of my rules is that I don't use a library
>>>> in cases where the number of lines of gunk that it takes to use a
>>>> library function is >= the number of lines to just write it myself.  Yeah,
>>>> I know the "but the library has more eyeballs and is debugged" argument
>>>> but in reality libraries are the source of many bugs.  I've always taken
>>>> the approach that I would never hire someone who had to use a library to
>>>> implement a singly-linked list.
>>> 
>>> getopt() is perhaps the wrong solution but consider something like MH,
>>> whose commands all follow a common pattern. Consider:
>>> 
>>> - options (switches) all start with a single '-'
>>> - they may be abbreviated to a unique prefix.
>> 
>> That last one is a gotcha waiting to happen:
>> 
>> program --this-is-the-long-option
>> 
>> is the same as 
>> 
>> program --this
>> 
>> but that will break scripts (and fingers) when program gets a new 
>> option like
> 
> That is easy to fix: use full options in scripts. Abbreviations for
> interactive use. Much better than --always-having-to-type-long-names.
> 
>> 
>> program --this-is-the-even-longer-option
>> 
>> We wrote our own getopt() for BitKeeper and it had long and short options
>> but no gotcha unique prefix.

Forgot to add that whoever extends "program" should know not to create a new
option with a longer name that breaks the full form of an old option.
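
The unique-prefix rule argued over above can be sketched in a few lines of C.
This is only an illustration, not code from MH, nmh, or BitKeeper; the function
name and the result codes are made up for the example. It assumes that an exact
match always wins, so full option names in scripts keep working even after a
longer option with the same prefix is added, while an abbreviation that matches
more than one name is reported as ambiguous.

```c
#include <stddef.h>
#include <string.h>

/* Result codes for resolve_option() (illustrative names). */
enum { OPT_UNKNOWN = -1, OPT_AMBIGUOUS = -2 };

/*
 * Resolve a possibly abbreviated option name against a table of full names.
 * An exact match always wins; otherwise the word must be a prefix of
 * exactly one name.  Returns the index of the match, or a negative code.
 */
int resolve_option(const char *word, const char *const names[], size_t n)
{
    size_t len = strlen(word);
    int found = OPT_UNKNOWN;

    for (size_t i = 0; i < n; i++) {
        if (strncmp(word, names[i], len) != 0)
            continue;               /* word is not a prefix of this name */
        if (names[i][len] == '\0')
            return (int)i;          /* exact match: the full form always works */
        if (found != OPT_UNKNOWN)
            found = OPT_AMBIGUOUS;  /* a second name shares this prefix */
        else
            found = (int)i;
    }
    return found;
}
```

With only "this-is-the-long-option" in the table, "this" resolves to it; once
"this-is-the-even-longer-option" is added, "this" comes back as ambiguous
instead of silently picking one, which is the breakage described above.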

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 22:10       ` Warner Losh
@ 2021-07-31 22:19         ` Larry McVoy
  2021-07-31 22:20         ` Jon Steinhart
  1 sibling, 0 replies; 72+ messages in thread
From: Larry McVoy @ 2021-07-31 22:19 UTC (permalink / raw)
  To: Warner Losh; +Cc: TUHS main list

On Sat, Jul 31, 2021 at 04:10:04PM -0600, Warner Losh wrote:
> On Sat, Jul 31, 2021 at 3:33 PM Jon Steinhart <jon@fourwinds.com> wrote:
> 
> > Richard Salz writes:
> > > On Sat, Jul 31, 2021 at 3:21 PM Jon Steinhart <jon@fourwinds.com> wrote:
> > >
> > > > opinion, it doesn't add value to do something that's already been done
> > > > but differently; it detracts from value because now there's yet another
> > > > competing way to do something.
> > > >
> > >
> > > You mean like not using getopt and rolling your own?  Shrug.
> > >
> > > while ((i = getopt(argc, argv, "xxxxx:xxxx")) != -1)
> > >    switch (i) {
> > >    case ....
> > >   }
> > > argc -= optind;
> > > argv += optind;
> > >
> > > > So I never got getopt().  One of my rules is that I don't use a library
> > > > in cases where the number of lines of gunk that it takes to use a
> > > > library function is >= the number of lines to just write it myself.
> > >
> > >
> > > I don't know, what lines in the above are extra beyond what you write?
> > > The last two if being generous I suppose.
> >
> > Well, in my opinion that's not really an accurate representation of using
> > getopt.
> >
> > I would of course write the #include line, and the table of options, which
> > would end up being >= the number of lines that it takes me to do this...
> >
> >         while (--argc > 0) {
> >                 if (*(++argv)[0] == '-') {
> >                         for (p = *argv + 1; *p != '\0'; p++) {
> >                                 switch (*p) {
> >
> 
> Except for all the things this gets wrong, it's ok. The problem with
> inlining getopt is that you wind up with cases like '-f foo' on the
> command line being treated differently than '-ffoo'.

BitKeeper's getopt had a different char for that: "f:" allows -ffoo or -f foo
but "f;" insists on no space.

With that, I'm bowing out of this thread, it's becoming a bike shed.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 22:10       ` Warner Losh
  2021-07-31 22:19         ` Larry McVoy
@ 2021-07-31 22:20         ` Jon Steinhart
  2021-07-31 23:26           ` Warner Losh
  1 sibling, 1 reply; 72+ messages in thread
From: Jon Steinhart @ 2021-07-31 22:20 UTC (permalink / raw)
  To: TUHS main list

Warner Losh writes:
>
> The flip side to this is that libraries can be debugged once, while inline
> code like the above needs to be debugged over and over....

Well, no.  Inline code doesn't need to be debugged over and over.  It doesn't
have to be written from scratch every time.  While in theory your point about
libraries is correct, it doesn't seem to have worked out in practice.  Better
in C than in node.js, but there have been plenty of spectacular bugs found in
old C libraries recently.

Jon

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 22:16     ` Jon Steinhart
@ 2021-07-31 22:20       ` Bakul Shah
  0 siblings, 0 replies; 72+ messages in thread
From: Bakul Shah @ 2021-07-31 22:20 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: TUHS main list


-- Bakul

> On Jul 31, 2021, at 3:16 PM, Jon Steinhart <jon@fourwinds.com> wrote:
> 
> Bakul Shah writes:
>> On Jul 31, 2021, at 12:20 PM, Jon Steinhart <jon@fourwinds.com> wrote:
>>> 
>>> So I never got getopt().  One of my rules is that I don't use a library
>>> in cases where the number of lines of gunk that it takes to use a
>>> library function is >= the number of lines to just write it myself.  Yeah,
>>> I know the "but the library has more eyeballs and is debugged" argument
>>> but in reality libraries are the source of many bugs.  I've always taken
>>> the approach that I would never hire someone who had to use a library to
>>> implement a singly-linked list.
>> 
>> getopt() is perhaps the wrong solution but consider something like MH,
>> whose commands all follow a common pattern. Consider:
>> 
>>  - options (switches) all start with a single '-'
>>  - they may be abbreviated to a unique prefix.
>>  - Boolean options may be inverted by prepending -no (e.g. -nolist)
>>  - value options may also have -no format to remove a previous (or default) value
>>  - options may appear anywhere and the last instance wins
>> 
>> But different commands take different options. It would make sense to factor
>> out common parsing, help etc. for a consistent treatment. In my Go code for
>> parsing MH like options I used Go's flag package as a model.
>> 
>> Personally I vastly prefer MH style option processing to either single char
>> options or --very-long-names which can't be abbreviated. Never mind options for
>> commands like gcc, who can even remember 40+ ls options?
>> 
>> But I haven't thought about how to extend this for shell scripts, or
>> exposing these so that shells like zsh can do command completion. To specify
>> these you need a vector of tuples (name, type, default, brief-help) but that
>> is painful to do in a shell.
> 
> Ah, well, you've given away the secret of real UNIX geezers, we're on both
> this mailing list and the nmh list :-)

:-)

> Yes, I'm mostly happy with the way that nmh does options.
> 
> I guess that I would look more kindly on getopt if it had existed much earlier
> so that people writing new commands would be encouraged to use the same format.
> Not as happy with it as an afterthought.
> 
> Once again, I have to go back to the meatspace locality of reference issues.
> Sure, it would be nice to be able to factor out common parsing, for example
> if a related set of programs shared the same option set.  But unless it's
> something huge, I'd just put it in its own file and use it for multiple
> programs; I wouldn't put it in a library.  My point is that the code that
> does the actual parsing is really trivial, and not necessarily the best
> use of a library.
> 
> As far as help goes, I don't expect help built into command line programs;
> I expect to look up error messages on the manual pages.  I'm happy with a
> generic usage error as most "helpful" output that I get from programs is
> not actually helpful.

Note that -help in an MH program is far more useful as it spells out the full option name.
Consider

% refile -he
Usage: refile [msgs] [switches] +folder ...
  switches are:
  -draft
  -[no]link
  -[no]preserve
  -[no]retainsequences
  -[no]unlink
  -src +folder
  -file file
  -rmmproc program
  -normmproc
  -version
  -help
...

vs

% ls -z
ls: invalid option -- z
usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwxy1,] [--color=when] [-D format] [file ...]
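
A table of switch descriptions, along the lines of the (name, type, default,
brief-help) tuples mentioned earlier in the thread, is enough to produce help
output in the refile style and to handle the -no inversion. The sketch below is
an assumption-laden illustration, not nmh's actual implementation: the struct
layout and the function names are invented for the example, and value switches
with a -no form (such as -normmproc) are left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Kinds of switch in this sketch (illustrative, not nmh's own types). */
enum sw_type { SW_BOOL, SW_VALUE, SW_PLAIN };

/* One switch description: name, kind, and an argument hint. */
struct sw {
    const char  *name;      /* full name without the leading '-' */
    enum sw_type type;
    const char  *arg;       /* shown after value switches, else NULL */
};

/* Format one help line in the style of the refile output above. */
int format_switch(const struct sw *s, char *buf, size_t size)
{
    switch (s->type) {
    case SW_BOOL:
        return snprintf(buf, size, "  -[no]%s", s->name);
    case SW_VALUE:
        return snprintf(buf, size, "  -%s %s", s->name, s->arg);
    default:
        return snprintf(buf, size, "  -%s", s->name);
    }
}

/*
 * Apply a boolean switch word such as "-link" or "-nolink" to *value.
 * Returns true if the word named this switch.  Calling it for every
 * argument in order gives "last instance wins" without extra work.
 */
bool apply_bool(const struct sw *s, const char *word, bool *value)
{
    if (s->type != SW_BOOL || word[0] != '-')
        return false;
    if (strcmp(word + 1, s->name) == 0) {
        *value = true;
        return true;
    }
    if (strncmp(word + 1, "no", 2) == 0 && strcmp(word + 3, s->name) == 0) {
        *value = false;
        return true;
    }
    return false;
}
```

Because the help text and the parser read the same table, the two cannot drift
apart, which is the "consistent treatment" argument for factoring this out.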



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 22:20         ` Jon Steinhart
@ 2021-07-31 23:26           ` Warner Losh
  2021-07-31 23:41             ` Jon Steinhart
  0 siblings, 1 reply; 72+ messages in thread
From: Warner Losh @ 2021-07-31 23:26 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 792 bytes --]

On Sat, Jul 31, 2021 at 4:37 PM Jon Steinhart <jon@fourwinds.com> wrote:

> Warner Losh writes:
> >
> > The flip side to this is that libraries can be debugged once, while inline
> > code like the above needs to be debugged over and over....
>
> Well, no.  Inline code doesn't need to be debugged over and over.  It doesn't
> have to be written from scratch every time.  While in theory your point about
> libraries is correct, it doesn't seem to have worked out in practice.  Better
> in C than in node.js, but there have been plenty of spectacular bugs found in
> old C libraries recently.
>

The large number of times I've had to replace inline code like you've
quoted with getopt to fix the irregularities in command line parsing
suggests that we differ on this viewpoint.

Warner


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-07-31 23:26           ` Warner Losh
@ 2021-07-31 23:41             ` Jon Steinhart
  0 siblings, 0 replies; 72+ messages in thread
From: Jon Steinhart @ 2021-07-31 23:41 UTC (permalink / raw)
  To: TUHS main list

Warner Losh writes:
> The large number of times I've had to replace inline code like you've
> quoted with getopt to fix the irregularities in command line parsing
> suggests that we differ on this viewpoint.

Fine by me.  Never hurts to know what other people consider best practices.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 15:56 ` Paul Winalski
  2021-07-31 16:19   ` Dan Cross
@ 2021-08-01 16:51   ` Michael Siegel
  2021-08-01 17:31     ` Jon Steinhart
  1 sibling, 1 reply; 72+ messages in thread
From: Michael Siegel @ 2021-08-01 16:51 UTC (permalink / raw)
  To: Paul Winalski; +Cc: tuhs

On Sat, 31 Jul 2021 11:56:51 -0400,
Paul Winalski <paul.winalski@gmail.com> wrote:

> On 7/31/21, Michael Siegel <msi@malbolge.net> wrote:
> >
> > While doing that, I learned that there is a better way to approach
> > this problem – beyond using getopt(s) (which never really made
> > sense to me) and having to write case statements in loops every
> > time: Define a grammar, let a pre-built parser do the work, and
> > have the parser provide the results to the program.  
> 
> This method for handling command lines dates back at least to the
> 1970s.  The COMND JSYS (system call) in TOPS-20 operated this way, as
> does the DCL command line interface in OpenVMS.  As you pointed out it
> can greatly simplify the code in the application.  It also permits
> command completion.  If the command has a long-winded option, such as
> -supercalifragilisticexpialidocious, I can type -super then hit the
> TAB key and as long as there is only one option that starts with
> -super the parser will fill in the rest of the long keyword.  It also
> means that you can provide interactive help.  At any point the user
> can type a question mark and the command interpreter will say what
> syntactic element is expected next.

Being able to provide interactive help is exactly what the person who
suggested grammar-based parsing to me was working on. I hadn't even
thought about that at first. But given my recent investigation into
built-in command documentation on Unix-like systems, I tend to think
this would be a great enhancement – if it was implemented with a
strict focus on not getting in the way, i.e., the user should be able
to switch it off completely, and search-as-you-type would be opt-in, if
provided at all.


--
Michael

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 16:51   ` Michael Siegel
@ 2021-08-01 17:31     ` Jon Steinhart
  0 siblings, 0 replies; 72+ messages in thread
From: Jon Steinhart @ 2021-08-01 17:31 UTC (permalink / raw)
  To: tuhs

Michael Siegel writes:
> Being able to provide interactive help is exactly what the person who
> suggested grammar-based parsing to me was working on. I hadn't even
> thought about that at first. But given my recent investigation into
> built-in command documentation on Unix-like systems, I tend to think
> this would be a great enhancement – if it was implemented with a
> strict focus on not getting in the way, i.e., the user should be able
> to switch it off completely, and search-as-you-type would be opt-in, if
> provided at all.
>
>
> --
> Michael

While I agree with you in theory, I'm dubious about how it would work
in practice.  Sorry if it sounds like I've lost faith with my profession
but I'm trying to accept reality.

Where would such documentation come from?  Not wanting to reopen old
flame wars, but the fragmentation of the documentation system where some
things are man pages, some are info pages, some are random HTML files,
some are only online, some things having no documentation at all except
maybe a help message, and much of that having no actual content, is reality.
Who's going to write more documentation, and how is it going to be kept
consistent with other documentation?  Is your help system going to be
yet another fragment?

Also, we seem to be well into a "Fahrenheit 451" world now where a vast
number of people communicate only by photos and video.  Writing seems to
be deprecated.  I don't have confidence that any documentation would be
useful even if people wrote it.  As a current example, I'm having trouble
with a btrfs filesystem and I can't say that the btrfs-check manual page
contains any useful content.

Maybe since Microsoft "AI" is now going to write code for, not sure what
to call them, programmers doesn't seem right any more, maybe it'll write
their documentation too?

I guess what I'm saying is that it sounds like you are having some good
thoughts on a technical solution that I think will fail without also
having a social solution.  If you could somehow extract your help info
from man pages without creating a whole new documentation system it
might work in the few cases where there are good manual pages.

Wow I sound grumpy this morning.

Jon

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 16:19   ` Dan Cross
@ 2021-08-01 17:44     ` Chet Ramey
  2021-08-01 21:53       ` Dan Cross
  0 siblings, 1 reply; 72+ messages in thread
From: Chet Ramey @ 2021-08-01 17:44 UTC (permalink / raw)
  To: Dan Cross, Paul Winalski, Michael Siegel; +Cc: TUHS main list

On 7/31/21 12:19 PM, Dan Cross wrote:

> There was someone posting here on TUHS a while back about leveraging a 
> special context-sensitive `--shell-help` or similar command line program 
> and synthesizing a protocol between the shell and a program to provide 
> TOPS-20 like command completion. It was nowhere near what you get from the 
> COMND JSYS, but seemed like a reasonable approximation.

This is essentially how the existing shells do it (bash, zsh, tcsh, etc.),
but in an ad-hoc fashion. There is no standard way to obtain possible
completions or list possible arguments, so the shells push that to external
generators.

Since you have to perform the completion in the shell, there has to be some
way to tell the shell the possible completions for each command of
interest, whether that's options or arguments. The different shells have
solved that in essentially the same way, with a few syntactic variations.

Bash provides a framework (complete/compgen/compopt) and pushes a lot of
the command-specific work to external completers. It provides access to the
shell internals (lists of builtins, functions, aliases, variables, and so
on) and built-in ways to perform common completions (filenames, directory
names, command names, etc.), and leaves the rest to external commands or
shell functions.

The real power and flexibility comes from being able to invoke these
external commands or shell functions to generate lists of possible
completions, and defining an API between the shell and those generators to
specify enough of the command line to make it easy to find the word to be
completed, the command for which completion is being attempted, and
clarifying context around that word. In the same way, the shell provides an
API for those generators to return possible completions.

The knowledge about each command's options and arguments is embedded in
these generators.

A standard way to handle command line options and arguments would make
generators easier to write, but doesn't address the other issues of what,
exactly, the user wants to complete, so the existing mechanisms would
likely not change very much. Something like `--shell-help', as long as it
were context-sensitive, would help more.
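
To make the `--shell-help' idea concrete, here is a rough sketch of the
program side: given the partial word the user has typed, list the options that
could complete it, one per line. Everything about it is hypothetical. No such
protocol is standardized, and the function name, the newline-separated output
format, and the fixed buffer are assumptions chosen for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Collect every option that starts with `partial` into buf, one per line,
 * and return how many were collected.  A program could print buf when it
 * sees a (hypothetical) --shell-help request, and the shell's generator
 * would read the candidates from the program's standard output.
 */
size_t list_candidates(const char *partial, const char *const opts[],
                       size_t n, char *buf, size_t size)
{
    size_t len = strlen(partial);
    size_t used = 0;
    size_t count = 0;

    if (size > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (strncmp(partial, opts[i], len) != 0)
            continue;               /* not a candidate for this word */
        int w = snprintf(buf + used, size - used, "%s\n", opts[i]);
        if (w < 0 || (size_t)w >= size - used) {
            if (used < size)
                buf[used] = '\0';   /* drop the partially written entry */
            break;                  /* out of room: stop here */
        }
        used += (size_t)w;
        count++;
    }
    return count;
}
```

The context sensitivity asked for above would come from what the program does
before calling this: it would have to parse the words typed so far and decide
whether an option, an option argument, or an operand is expected next.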

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-07-31 18:56   ` Michael Siegel
  2021-07-31 19:41     ` Clem Cole
@ 2021-08-01 17:48     ` Chet Ramey
  2021-08-01 19:23       ` Richard Salz
  1 sibling, 1 reply; 72+ messages in thread
From: Chet Ramey @ 2021-08-01 17:48 UTC (permalink / raw)
  To: Michael Siegel, Anthony Martin; +Cc: tuhs

On 7/31/21 2:56 PM, Michael Siegel wrote:
> On Sat, 31 Jul 2021 10:30:18 -0700,
> Anthony Martin <ality@pbrane.org> wrote:
> 
>> Michael Siegel <msi@malbolge.net> once said:
>>> So, I've prepared a bit of a write-up, pondering on the pros and
>>> cons of two different ways of having task-specific tool sets
>>> (non-hierarchical command sets vs. sub-commands) that is available
>>> at
>>>
>>>    https://www.msiism.org/files/doc/unix-like_command-line_interfaces.html
>>>
>>> I tend to think the sub-command approach is better. But I'm neither
>>> a UI nor a Unix expert and have no formal training in computer
>>> things. So, I thought this would be a good place to ask for comment
>>> (and get some historical perspective).
>>
>> You're missing the approach taken in Plan 9 (and
>> 10th edition Unix): put related commands in a
>> directory and use a shell that doesn't restrict
>> the first argument of a command to a single path
>> element.
>>
>> This lets you execute commands like:
>>
>> 	auth/as
>> 	disk/prep
>> 	git/rebase
>> 	ip/ping
>> 	ndb/dns
>> 	upas/send
>>
>> without having a prefix on every command name or
>> single large binaries with every command linked
>> in as subcommands.
> 
> Thanks for pointing this out. I had no idea.
> 
> Unfortunately(?), I'm looking to make my life easier on more "Unix-like
> Unix-like systems" (for want of a better term), for the time being
> (Linux, BSD, maybe illumos). (I mean, which shell would I use to
> accomplish this on Unix?)

POSIX forbids this behavior, FWIW, so you'll probably have a hard time
finding a shell -- at least one with POSIX aspirations -- that implements it.



-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 17:48     ` Chet Ramey
@ 2021-08-01 19:23       ` Richard Salz
  2021-08-01 23:26         ` Chet Ramey
  0 siblings, 1 reply; 72+ messages in thread
From: Richard Salz @ 2021-08-01 19:23 UTC (permalink / raw)
  To: Chester Ramey; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 365 bytes --]

> >> This lets you execute commands like:
> >>
> >>      auth/as
> ...

> POSIX forbids this behavior, FWIW, so you'll probably have a hard time
> finding a shell -- at least one with POSIX aspirations -- that implements
> it.
>

So you write a function or alias that prepends the full path to "as" and
exec's the command.  So you have to type "auth as ..." but BFD.
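
A shell function or alias is the natural tool for this, but the same
dispatcher can be sketched as a tiny C wrapper, which is how a standalone
"auth" command could be built. The base directory, the function names, and the
decision to reject subcommand names containing a slash are all assumptions made
for the example.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Build "<base>/<group>/<sub>" in buf.  Returns 0 on success, -1 if the
 * subcommand name is empty, contains a slash, or the result does not fit.
 */
int subcommand_path(const char *base, const char *group, const char *sub,
                    char *buf, size_t size)
{
    if (sub[0] == '\0' || strchr(sub, '/') != NULL)
        return -1;
    int w = snprintf(buf, size, "%s/%s/%s", base, group, sub);
    return (w < 0 || (size_t)w >= size) ? -1 : 0;
}

/*
 * Replace the current process with the subcommand named by argv[1],
 * passing the remaining arguments along.  Only returns on failure.
 */
int exec_subcommand(const char *base, const char *group, char *argv[])
{
    char path[4096];

    if (argv[1] == NULL ||
        subcommand_path(base, group, argv[1], path, sizeof path) != 0)
        return -1;
    execv(path, argv + 1);          /* argv[1] becomes the new argv[0] */
    return -1;                      /* execv failed */
}
```

A main() for "auth" would then be little more than a call to
exec_subcommand() with its own argv, followed by a usage message if that
returns.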


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 17:44     ` Chet Ramey
@ 2021-08-01 21:53       ` Dan Cross
  2021-08-01 23:21         ` Chet Ramey
  2021-08-01 23:36         ` John Cowan
  0 siblings, 2 replies; 72+ messages in thread
From: Dan Cross @ 2021-08-01 21:53 UTC (permalink / raw)
  To: Chester Ramey; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 4273 bytes --]

On Sun, Aug 1, 2021 at 1:44 PM Chet Ramey <chet.ramey@case.edu> wrote:

> On 7/31/21 12:19 PM, Dan Cross wrote:
> > There was someone posting here on TUHS a while back about leveraging a
> > special context-sensitive `--shell-help` or similar command line program
> > and synthesizing a protocol between the shell and a program to provide
> > TOPS-20 like command completion. It was nowhere near what you get from
> > the COMND JSYS, but seemed like a reasonable approximation.
>
> This is essentially how the existing shells do it (bash, zsh, tcsh, etc.),
> but in an ad-hoc fashion. There is no standard way to obtain possible
> completions or list possible arguments, so the shells push that to external
> generators.
>
> Since you have to perform the completion in the shell, there has to be some
> way to tell the shell the possible completions for each command of
> interest, whether that's options or arguments. The different shells have
> solved that in essentially the same way, with a few syntactic variations.
>
> Bash provides a framework (complete/compgen/compopt) and pushes a lot of
> the command-specific work to external completers. It provides access to the
> shell internals (lists of builtins, functions, aliases, variables, and so
> on) and built-in ways to perform common completions (filenames, directory
> names, command names, etc.), and leaves the rest to external commands or
> shell functions.
>
> The real power and flexibility comes from being able to invoke these
> external commands or shell functions to generate lists of possible
> completions, and defining an API between the shell and those generators to
> specify enough of the command line to make it easy to find the word to be
> completed, the command for which completion is being attempted, and
> clarifying context around that word. In the same way, the shell provides an
> API for those generators to return possible completions.
>
> The knowledge about each command's options and arguments is embedded in
> these generators.
>
> A standard way to handle command line options and arguments would make
> generators easier to write, but doesn't address the other issues of what,
> exactly, the user wants to complete, so the existing mechanisms would
> likely not change very much. Something like `--shell-help', as long as it
> were context-sensitive, would help more.

Thanks for the useful background information on existing solutions.

If I understood the proposal correctly, it was that the program in question
would, itself, be the generator as described above. Perhaps it was coupled
with a standard structured format for consumption by the shell, which seems
like it would be useful for this sort of expansion.

Of course, the process model in TOPS-20 was very different than in Unix,
and in that system, as soon as you typed the _name_ of a command its image
was "run up" in your process. So the interactive help system was provided
by a running instance of the program itself. What I gathered from the
proposed model was that it involved multiple invocations of the program,
but with a special option that would trigger behavior informally described
as, "here's the context I've built so far; let me know what options are
available here." I don't know that it's terribly "Unixy", but I can see how
it would be useful for interactive use.

As an aside, I maintain some older "machines" at home (even modest hardware
can emulate a PDP-10 or Honeywell DPS8), and find that doing so provides me
with perspective that can be very useful. Looking at other systems that
were available roughly around the time of Unix (TENEX, Multics), it strikes
me that Unix was a bit of an odd duck with the way it handled exec in
terms of destructively overlaying the memory of the user portion of a
process with a new image; am I wrong here? I wonder why the "one program
per process and exec destroys what was running before" mechanism was
implemented? I can imagine it had a lot to do with the constraints that
early Unix machines must have imposed on design, not to mention
implementation simplicity, but I wonder what the designers thought of other
systems' process models and whether they were considered at all? Perhaps
Doug and Ken might have thoughts here?

        - Dan C.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 21:53       ` Dan Cross
@ 2021-08-01 23:21         ` Chet Ramey
  2021-08-01 23:36         ` John Cowan
  1 sibling, 0 replies; 72+ messages in thread
From: Chet Ramey @ 2021-08-01 23:21 UTC (permalink / raw)
  To: Dan Cross; +Cc: TUHS main list

On 8/1/21 5:53 PM, Dan Cross wrote:

> Thanks for the useful background information on existing solutions.
> 
> If I understood the proposal correctly, it was that the program in question 
> would, itself, be the generator as described above. Perhaps it was coupled 
> with a standard structured format for consumption by the shell, which seems 
> like it would be useful for this sort of expansion.

Yes, it would make writing generators easier. The rest of the process
would change very little: determining the word to complete, determining
the command name, breaking the edit line into words for the generator,
invoking the generator through the appropriate mechanism, parsing the
results, and processing the matches. From the shell's perspective, it's a
minor change.

> Of course, the process model in TOPS-20 was very different than in Unix, 
> and in that system, as soon as you typed the _name_ of a command its image 
> was "run up" in your process. So the interactive help system was provided 
> by a running instance of the program itself. What I gathered from the 
> proposed model was that it involved multiple invocations of the program, 
> but with a special option that would trigger behavior informally described 
> as, "here's the context I've built so far; let me know what options are 
> available here." I don't know that it's terribly "Unixy", but I can see how 
> it would be useful for interactive use.

Yes. None of this is very "Unixy", but people have gotten used to being
able to use capabilities like completion.

When you're running interactively, running additional processes when
you're performing word completion isn't particularly expensive. Again
from the shell's perspective, invoking one generator that executes a
program with `--shell-help' isn't that much different or more expensive --
and simpler in some ways because you don't have to save any incremental
parsing state -- than executing a shell function that runs several
processes, mostly command substitutions.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 19:23       ` Richard Salz
@ 2021-08-01 23:26         ` Chet Ramey
  0 siblings, 0 replies; 72+ messages in thread
From: Chet Ramey @ 2021-08-01 23:26 UTC (permalink / raw)
  To: Richard Salz; +Cc: TUHS main list

On 8/1/21 3:23 PM, Richard Salz wrote:
> 
>      >> This lets you execute commands like:
>      >>
>      >>      auth/as
>     ... 
> 
>     POSIX forbids this behavior, FWIW, so you'll probably have a hard time
>     finding a shell -- at least one with POSIX aspirations -- that
>     implements it.
> 
> 
> So you write a function or alias that prepends the full path to "as" and 
> exec's the command.  So you have to type "auth as ..." but BFD.

Sure. If you invest effort in building a solution, you can do just about
anything. If you want, you can write a function that generates the set of
aliases you use to do this.

The thing is, you're going to have to build it -- you can't expect to find
a shell that does a $PATH search for a pathname containing a slash.
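
As a rough illustration of what "building it" could look like, here is
a sketch in Python of the lookup the shell will not do for you: walking
$PATH for a pathname that contains a slash. The function name and the
"auth as ..." calling shape are assumptions taken from the example
above, not anything a shell provides.

```python
import os


def resolve_in_toolset(toolset, args, search_path=None):
    """Turn ("auth", ["as", ...]) into an argv whose first element is the
    first executable named <dir>/auth/as found along the search path."""
    if search_path is None:
        search_path = os.environ.get("PATH", os.defpath)
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, toolset, args[0])
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return [candidate] + list(args[1:])
    raise FileNotFoundError(f"{toolset}/{args[0]} not found on search path")
```

A wrapper would then hand the result to something like
os.execv(argv[0], argv).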

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 21:53       ` Dan Cross
  2021-08-01 23:21         ` Chet Ramey
@ 2021-08-01 23:36         ` John Cowan
  2021-08-01 23:49           ` Larry McVoy
                             ` (3 more replies)
  1 sibling, 4 replies; 72+ messages in thread
From: John Cowan @ 2021-08-01 23:36 UTC (permalink / raw)
  To: Dan Cross; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 3403 bytes --]

On Sun, Aug 1, 2021 at 5:55 PM Dan Cross <crossd@gmail.com> wrote:

> Looking at other systems that were available roughly around the time of
> Unix (TENEX, Multics), it strikes me that Unix was a bit of an odd duck
> with the way it handled exec in terms of destructively overlaying the
> memory of the user portion of a process with a new image; am I wrong here?
>

See dmr's paper at <https://www.bell-labs.com/usr/dmr/www/hist.html> for
details, but in short exec and its equivalents elsewhere have always
overlaid the running program with another program.  Early versions of PDP-7
Unix used the same process model as Tenex: one process per terminal which
alternated between running the shell and a user program.  So exec() loaded
the user program on top of the shell.  Indeed, this wasn't even a syscall;
the shell itself wrote a tiny program loader into the top of memory that
read the new program, which was open for reading, and jumped to it.
Likewise, exit() was a specialized exec() that reloaded the shell.  The
Tenex and Multics shells had more memory to play with and didn't have to
use these self-overlaying tricks[*]: they loaded your program into
available memory and called it as a subroutine, which accounts for the name
"shell".

So it was the introduction of fork(), which came from the Berkeley Genie
OS, that made the current process control regime possible.  In those days,
fork() wrote the current process out to the swapping disk and set up the
process table with a new entry.  For efficiency, the in-memory version
became the child and the swapped-out version became the parent.  Instantly
the shell was able to run background processes by just not waiting for
them, and pipelines (once the syntax was invented) could be handled with N
- 1 processes in an N-stage pipeline.  Huge new powers landed on the user's
head.
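
A minimal sketch of that regime as seen from a modern shell's side,
written in Python with the os module's wrappers around fork(), exec()
and wait(). The function names are made up for illustration, and this
is not how the early shell was written.

```python
import os


def start(argv):
    """fork() a child, exec argv in it, and return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: overlay this copy of the parent with the new program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if the exec itself failed
    return pid  # parent: carry on without waiting


def wait_for(pid):
    """Block until the child finishes; return its exit status
    (or the negated signal number if a signal killed it)."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

A foreground command is wait_for(start(argv)); a background command is
start(argv) with the wait deferred or left out.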

Nowadays it's a question whether fork() makes sense any more.   "A fork()
in the road" [Baumann et al. 2019] <
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf>
is an interesting argument against fork():

* It doesn't compose.
* It is insecure by default.
* It is slow (there are about 25 properties a process has in addition to
its memory and hardware state, and each of these needs to be copied or not)
even using COW (which is itself a Good Thing and can and should be provided
separately)
* It is incompatible with a single-address-space design.

In short, spawn() beats fork() like a drum, and fork() should be
deprecated. To be sure, the paper comes out of Microsoft Research, but I
find it pretty compelling anyway.
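
For contrast, a small sketch of the spawn style using Python's
os.posix_spawnp(), which wraps POSIX posix_spawnp(). The function name
and the choice of redirecting stdout are only illustrative. Changes to
the child's state are passed as an explicit list of file actions
instead of being made by code that runs in the child between fork()
and exec().

```python
import os


def spawn_redirected(argv, stdout_path):
    """Run argv with stdout sent to stdout_path; return the exit status
    (or the negated signal number if a signal killed the child)."""
    # One file action: open stdout_path as descriptor 1 in the child.
    actions = [(os.POSIX_SPAWN_OPEN, 1, stdout_path,
                os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)]
    pid = os.posix_spawnp(argv[0], argv, os.environ, file_actions=actions)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```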

[*] My very favorite self-overlaying program was the PDP-8 bootstrap for
the DF32 disk drive.  You toggled in two instructions at locations 30 and
31 meaning "load disk registers and go" and "jump to self" respectively,
hit the Clear key on the front panel, which cleared all registers, and
started up at 30.

The first instruction told the disk to start reading sector 0 of the disk
into location 0 in memory (because all the registers were 0, including the
disk instruction register where 0 = READ) and the second instruction kept
the CPU busy waiting.  As the sector loaded,  the two instructions were
overwritten by "skip if disk ready" and "jump to previous address", which
would wait until the whole sector was loaded.  Then the OS could be loaded
using the primitive disk driver in block 0.

[-- Attachment #2: Type: text/html, Size: 5905 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 23:36         ` John Cowan
@ 2021-08-01 23:49           ` Larry McVoy
  2021-08-02  0:28             ` Larry McVoy
  2021-08-01 23:58           ` Dan Cross
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 72+ messages in thread
From: Larry McVoy @ 2021-08-01 23:49 UTC (permalink / raw)
  To: John Cowan; +Cc: TUHS main list

On Sun, Aug 01, 2021 at 07:36:53PM -0400, John Cowan wrote:
> Nowadays it's a question whether fork() makes sense any more.   "A fork()
> in the road" [Baumann et al. 2019] <
> https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf>
> is an interesting argument against fork():
> 
> * It doesn't compose.
> * It is insecure by default.
> * It is slow (there are about 25 properties a process has in addition to
> its memory and hardware state, and each of these needs to be copied or not)
> even using COW (which is itself a Good Thing and can and should be provided
> separately)
> * It is incompatible with a single-address-space design.
> 
> In short, spawn() beats fork() like a drum, and fork() should be
> deprecated. To be sure, the paper comes out of Microsoft Research, but I
> find it pretty compelling anyway.

When we were working on supporting BitKeeper on Windows, MacOS, all the
various Unix versions, and Linux, we implemented all the needed libc
stuff on Windows (so we could pretend we were not running on Windows).
Everything except fork(), we made a spawnvp() interface.  That's the
one thing that made more sense than the Unix way.  I have called fork()
directly in decades.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 23:36         ` John Cowan
  2021-08-01 23:49           ` Larry McVoy
@ 2021-08-01 23:58           ` Dan Cross
  2021-08-02  0:29             ` Steve Nickolas
  2021-08-02  0:13           ` Andrew Warkentin
  2021-08-02 17:37           ` Lars Brinkhoff
  3 siblings, 1 reply; 72+ messages in thread
From: Dan Cross @ 2021-08-01 23:58 UTC (permalink / raw)
  To: John Cowan; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 4091 bytes --]

On Sun, Aug 1, 2021 at 7:37 PM John Cowan <cowan@ccil.org> wrote:

> On Sun, Aug 1, 2021 at 5:55 PM Dan Cross <crossd@gmail.com> wrote:
>
>> Looking at other systems that were available roughly around the time of
>> Unix (TENEX, Multics), it strikes me that Unix was a bit of an odd duck
>> with the way it handled exec in terms of destructively overlaying the
>> memory of the user portion of a process with a new image; am I wrong here?
>>
>
> See dmr's paper at <https://www.bell-labs.com/usr/dmr/www/hist.html> for
> details, but in short exec and its equivalents elsewhere have always
> overlaid the running program with another program.
>

That's a great paper and I've really enjoyed revisiting it over the years,
but while it does a great job of explaining how the Unix mechanism worked,
and touches on the "why", it doesn't contrast with other schemes. I suppose
my question could be rephrased as, if the early Unix implementers had had
more resources to work with, would they have chosen a model more along the
lines used by Multics and Twenex, or would they have elected to do
basically what they did? That's probably impossible to answer, but gets at
what they thought about how other systems operated.

> Early versions of PDP-7 Unix used the same process model as Tenex: one
> process per terminal which alternated between running the shell and a user
> program.  So exec() loaded the user program on top of the shell.  Indeed,
> this wasn't even a syscall; the shell itself wrote a tiny program loader
> into the top of memory that read the new program, which was open for
> reading, and jumped to it. Likewise, exit() was a specialized exec() that
> reloaded the shell.  The Tenex and Multics shells had more memory to play
> with and didn't have to use these self-overlaying tricks[*]: they loaded
> your program into available memory and called it as a subroutine, which
> accounts for the name "shell".
>

Presumably the virtual memory hardware could also be used to protect the
shell from a malicious or errant program trashing the image of the shell in
memory.

[snip]
> Nowadays it's a question whether fork() makes sense any more.   "A fork()
> in the road" [Baumann et al. 2019] <
> https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf>
> is an interesting argument against fork():
>
> * It doesn't compose.
> * It is insecure by default.
> * It is slow (there are about 25 properties a process has in addition to
> its memory and hardware state, and each of these needs to be copied or not)
> even using COW (which is itself a Good Thing and can and should be provided
> separately)
> * It is incompatible with a single-address-space design.
>
> In short, spawn() beats fork() like a drum, and fork() should be
> deprecated. To be sure, the paper comes out of Microsoft Research, but I
> find it pretty compelling anyway.
>

Spawn vs fork/exec is a false dichotomy, though. We talked about the fork
paper when it came out, and here's what I wrote about it at the time:
https://minnie.tuhs.org/pipermail/tuhs/2019-April/017700.html

> [*] My very favorite self-overlaying program was the PDP-8 bootstrap for
> the DF32 disk drive.  You toggled in two instructions at locations 30 and
> 31 meaning "load disk registers and go" and "jump to self" respectively,
> hit the Clear key on the front panel, which cleared all registers, and
> started up at 30.
>

> The first instruction told the disk to start reading sector 0 of the disk
> into location 0 in memory (because all the registers were 0, including the
> disk instruction register where 0 = READ) and the second instruction kept
> the CPU busy waiting.  As the sector loaded,  the two instructions were
> overwritten by "skip if disk ready" and "jump to previous address", which
> would wait until the whole sector was loaded.  Then the OS could be loaded
> using the primitive disk driver in block 0.
>

Very nice; that's highly reminiscent of a Sergeant-style forth:
https://pygmy.utoh.org/3ins4th.html

One wonders if the PDP-8 was one of Sergeant's inspirations?

        - Dan C.

[-- Attachment #2: Type: text/html, Size: 7523 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 23:36         ` John Cowan
  2021-08-01 23:49           ` Larry McVoy
  2021-08-01 23:58           ` Dan Cross
@ 2021-08-02  0:13           ` Andrew Warkentin
  2021-08-02  0:18             ` John Cowan
                               ` (3 more replies)
  2021-08-02 17:37           ` Lars Brinkhoff
  3 siblings, 4 replies; 72+ messages in thread
From: Andrew Warkentin @ 2021-08-02  0:13 UTC (permalink / raw)
  To: TUHS main list

On 8/1/21, John Cowan <cowan@ccil.org> wrote:
>
> Nowadays it's a question whether fork() makes sense any more.   "A fork()
> in the road" [Baumann et al. 2019] <
> https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf>
> is an interesting argument against fork():
>
> * It doesn't compose.
> * It is insecure by default.
> * It is slow (there are about 25 properties a process has in addition to
> its memory and hardware state, and each of these needs to be copied or not)
> even using COW (which is itself a Good Thing and can and should be provided
> separately)
> * It is incompatible with a single-address-space design.
>
> In short, spawn() beats fork() like a drum, and fork() should be
> deprecated. To be sure, the paper comes out of Microsoft Research, but I
> find it pretty compelling anyway.
>
There's a third kind of primitive that is superior to either spawn()
or fork() IMO, specifically one that creates a completely empty child
process and returns a context that lets the parent set up the child's
state using normal APIs. To start the child the parent would either
call exec() to start the child running a different program, or call a
new function that starts the child with a parent-provided entry point
and whatever memory mappings the parent set up. Both fork() and
spawn() could be implemented on top of this easily enough with
basically no additional overhead compared to implementing both as
primitives. This is what I plan to do on the OS I'm writing
(manipulating the child's state won't require any additional
primitives beyond regular file I/O since literally all process state
will have a file-based interface).
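
To make the call shape concrete, here is a toy model in Python. It is
only a sketch: every name in it (create_empty_process, ChildContext,
setenv, dup2, exec_program) is hypothetical, and the child is faked on
top of os.posix_spawn(), which is the reverse of the layering described
above. It shows only how spawn() could be phrased as "create an empty
child, set up its state, start it".

```python
import os


class ChildContext:
    """Hypothetical handle to a child that has not been started yet.
    Nothing runs until exec_program() is called; the methods before
    that only record the state the child should start with."""

    def __init__(self):
        self.env = {}
        self.file_actions = []

    def setenv(self, name, value):
        self.env[name] = value

    def dup2(self, fd, new_fd):
        # Make the parent's descriptor fd appear as new_fd in the child.
        self.file_actions.append((os.POSIX_SPAWN_DUP2, fd, new_fd))

    def exec_program(self, path, argv):
        """Start the child running the program at path; return its pid."""
        return os.posix_spawn(path, argv, self.env,
                              file_actions=self.file_actions)


def create_empty_process():
    return ChildContext()


def spawn(path, argv, env):
    """spawn() expressed in terms of the empty-process primitive."""
    child = create_empty_process()
    for name, value in env.items():
        child.setenv(name, value)
    return child.exec_program(path, argv)
```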

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02  0:13           ` Andrew Warkentin
@ 2021-08-02  0:18             ` John Cowan
  2021-08-02  0:54               ` Andrew Warkentin
  2021-08-02  1:04               ` Dan Cross
  2021-08-02  1:05             ` Theodore Ts'o
                               ` (2 subsequent siblings)
  3 siblings, 2 replies; 72+ messages in thread
From: John Cowan @ 2021-08-02  0:18 UTC (permalink / raw)
  To: Andrew Warkentin; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 722 bytes --]

On Sun, Aug 1, 2021 at 8:13 PM Andrew Warkentin <andreww591@gmail.com> wrote:

> To start the child the parent would either
> call exec() to start the child running a different program, or call a
> new function that starts the child with a parent-provided entry point
> and whatever memory mappings the parent set up.



> This is what I plan to do on the OS I'm writing
> (manipulating the child's state won't require any additional
> primitives beyond regular file I/O since literally all process state
> will have a file-based interface).
>

In that case you don't need *any* primitive except create_empty_process():
you can do exec() by opening the file, writing to /proc/<child>/mem and
then to /proc/<child>/pc-and-go.

[-- Attachment #2: Type: text/html, Size: 1471 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 23:49           ` Larry McVoy
@ 2021-08-02  0:28             ` Larry McVoy
  0 siblings, 0 replies; 72+ messages in thread
From: Larry McVoy @ 2021-08-02  0:28 UTC (permalink / raw)
  To: John Cowan; +Cc: TUHS main list

On Sun, Aug 01, 2021 at 04:49:50PM -0700, Larry McVoy wrote:
> On Sun, Aug 01, 2021 at 07:36:53PM -0400, John Cowan wrote:
> > Nowadays it's a question whether fork() makes sense any more.   "A fork()
> > in the road" [Baumann et al. 2019] <
> > https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf>
> > is an interesting argument against fork():
> > 
> > * It doesn't compose.
> > * It is insecure by default.
> > * It is slow (there are about 25 properties a process has in addition to
> > its memory and hardware state, and each of these needs to be copied or not)
> > even using COW (which is itself a Good Thing and can and should be provided
> > separately)
> > * It is incompatible with a single-address-space design.
> > 
> > In short, spawn() beats fork() like a drum, and fork() should be
> > deprecated. To be sure, the paper comes out of Microsoft Research, but I
> > find it pretty compelling anyway.
> 
> When we were working on supporting BitKeeper on Windows, MacOS, all the
> various Unix versions, and Linux, we implemented all the needed libc
> stuff on Windows (so we could pretend we were not running on Windows).
> Everything except fork(), we made a spawnvp() interface.  That's the
> one thing that made more sense than the Unix way.  I have called fork()
> directly in decades.

s/have/have not/ called fork()....

Sigh.

-- 
---
Larry McVoy            	     lm at mcvoy.com             http://www.mcvoy.com/lm 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 23:58           ` Dan Cross
@ 2021-08-02  0:29             ` Steve Nickolas
  0 siblings, 0 replies; 72+ messages in thread
From: Steve Nickolas @ 2021-08-02  0:29 UTC (permalink / raw)
  To: Dan Cross; +Cc: TUHS main list

On Sun, 1 Aug 2021, Dan Cross wrote:

> Spawn vs fork/exec is a false dichotomy, though. We talked about the fork
> paper when it came out, and here's what I wrote about it at the time:
> https://minnie.tuhs.org/pipermail/tuhs/2019-April/017700.html

I've often wished I could run a free/open Bourne-type shell on 16-bit 
MS-DOS and OS/2.  Porting to the former is next to impossible because of 
the lack of *any* concept of multitasking.  Porting to the latter is 
difficult because multitasking isn't done anything like the Unix way.

I actually like the spawn* functions better, though I think on Unix 
fork/exec is the most natural way to implement them.

-uso.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02  0:18             ` John Cowan
@ 2021-08-02  0:54               ` Andrew Warkentin
  2021-08-02  1:04               ` Dan Cross
  1 sibling, 0 replies; 72+ messages in thread
From: Andrew Warkentin @ 2021-08-02  0:54 UTC (permalink / raw)
  To: John Cowan; +Cc: TUHS main list

On 8/1/21, John Cowan <cowan@ccil.org> wrote:
>
> In that case you don't need *any* primitive except create_empty_process():
> you can do exec() by opening the file, writing to /proc/<child>/mem and
> then to /proc/<child>/pc-and-go.
>
Yes, although that would break if the permissions for the program are
execute-only (which admittedly is of limited security value in most
cases).

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02  0:18             ` John Cowan
  2021-08-02  0:54               ` Andrew Warkentin
@ 2021-08-02  1:04               ` Dan Cross
  1 sibling, 0 replies; 72+ messages in thread
From: Dan Cross @ 2021-08-02  1:04 UTC (permalink / raw)
  To: John Cowan; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 2261 bytes --]

On Sun, Aug 1, 2021 at 8:19 PM John Cowan <cowan@ccil.org> wrote:

> On Sun, Aug 1, 2021 at 8:13 PM Andrew Warkentin <andreww591@gmail.com>
> wrote
>
>> To start the child the parent would either
>> call exec() to start the child running a different program, or call a
>> new function that starts the child with a parent-provided entry point
>> and whatever memory mappings the parent set up.
>
>
>
>> This is what I plan to do on the OS I'm writing
>> (manipulating the child's state won't require any additional
>> primitives beyond regular file I/O since literally all process state
>> will have a file-based interface).
>>
>
> In that case you don't need *any* primitive except create_empty_process():
> you can do exec() by opening the file, writing to /proc/<child>/mem and
> then to /proc/<child>/pc-and-go.
>

Sadly, that's not _quite_ true. You still need some primordial way to get
the system going.

Once you have a proc_create and a make_proc_runnable system call, it seems
like it opens the door to doing all kinds of cool things like moving binary
parsers out of the kernel and into user space, but consider how `init` gets
bootstrapped: often, there's a handful of instructions that basically
invoke `execl("/bin/init", "init", 0);` that you compile into the kernel;
on creation of process 1, the kernel copies those instructions into a page
somewhere in the user portion of the address space and "returns" to it; the
process then invokes /bin/init which carries on with bringing up the rest
of the system.

Now you're confronted with two choices: you either put a much more
elaborate bootstrap into the kernel (in this day and age, probably not that
hard), or you have a minimal bootstrap that's smart enough to load a
smarter bootstrap that in turn can load something like init. I suppose a
third option is to compile `init` in some dead simple way that you can load
in the kernel as a special case, and invoke that. This problem isn't
insurmountable, but it's a wide design space, and it's not quite as
straightforward as it first appears.

As I mentioned in the email I linked to earlier, Akaros implemented the
proc_create/proc_run model. It really was superior to fork()/exec() and I
would argue superior to spawn() as well.

        - Dan C.

[-- Attachment #2: Type: text/html, Size: 3539 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02  0:13           ` Andrew Warkentin
  2021-08-02  0:18             ` John Cowan
@ 2021-08-02  1:05             ` Theodore Ts'o
  2021-08-02  2:10               ` Andrew Warkentin
  2021-08-02  2:32               ` Bakul Shah
  2021-08-02 17:33             ` Lars Brinkhoff
  2021-09-28 17:46             ` Greg A. Woods
  3 siblings, 2 replies; 72+ messages in thread
From: Theodore Ts'o @ 2021-08-02  1:05 UTC (permalink / raw)
  To: Andrew Warkentin; +Cc: TUHS main list

On Sun, Aug 01, 2021 at 06:13:18PM -0600, Andrew Warkentin wrote:
> There's a third kind of primitive that is superior to either spawn()
> or fork() IMO, specifically one that creates a completely empty child
> process and returns a context that lets the parent set up the child's
> state using normal APIs.

I've seen this argument a number of times, but what's never been clear
to me is what *would* the "normal APIs" be which would allow a parent
to set up the child's state?  How would that be accomplished?  Lots of
new system calls?  Magic files in /proc/<pid>/XXX which get
manipulated somehow?  (How, exactly, does one affect the child's
memory map via magic read/write calls to /proc/<pid>/XXX....  How
about environment variables, etc.)

And what are the access rights by which a process gets to reach out
and touch another process's environment?  Is it allowed only for
child processes?  And is it only allowed before the child starts
running?  What if the child process is going to be running a setuid or
setgid executable?

The phrase "all process state will have a file-based interface" sounds
good on paper, but I think it remains to be seen how well a "echo XXX
> /proc/<pid>/magic-file" API would actually work.  The devil is
really in the details....

					- Ted

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02  1:05             ` Theodore Ts'o
@ 2021-08-02  2:10               ` Andrew Warkentin
  2021-08-02  2:32               ` Bakul Shah
  1 sibling, 0 replies; 72+ messages in thread
From: Andrew Warkentin @ 2021-08-02  2:10 UTC (permalink / raw)
  To: TUHS main list

On 8/1/21, Theodore Ts'o <tytso@mit.edu> wrote:

>
> I've seen this argument a number of times, but what's never been clear
> to me is what *would* the "normal APIs" be which would allow a parent
> to set up the child's state?  How would that be accomplished?  Lots of
> new system calls?  Magic files in /proc/<pid>/XXX which get
> manipulated somehow?  (How, exactly, does one affect the child's
> memory map via magic read/write calls to /proc/<pid>/XXX....  How
> about environment variables, etc.)
>

My OS will be microkernel-based and even the RPC channel to the VFS
itself will be a file (with some special semantics). read(), write()
and seek() will bypass the VFS entirely and call the kernel to
directly communicate with the destination process. The call to create
an empty process will return a new RPC channel and there will be an
API to temporarily switch to an alternate channel so that VFS calls
occur in the child context instead of the parent.

All process memory, even the heap and stack, will be implemented as
memory-mapped files in a per-process filesystem under /proc/<pid>.
This will be a special "shadowfs" that allows creating files that
shadow ranges of other files (either on disk or in memory).

Environment variables will also be exposed in /proc of course.
>
> And what are the access rights by which a process gets to reach out
> and touch another process's environment?  Is it allowed only for
> child processes?  And is it only allowed before the child starts
> running?  What if the child process is going to be running a setuid or
> setgid executable?
>

Any process that has permissions to access the RPC channel file and
memory mapping shadow files in /proc/<pid> will be able to manipulate
the state. The RPC channel will cease to function after the child has
been started. setuid and setgid executables will not be supported at
all (there will instead be a role-based access control system layered
on top of a per-process file permission list, which will allow
privilege escalation on exec in certain situations defined by
configuration).

>
> The phrase "all process state will have a file-based interface" sounds
> good on paper, but I think it remains to be seen how well a "echo XXX
>> /proc/<pid>/magic-file" API would actually work.  The devil is
> really in the details....
>

Even though everything will use a file-based implementation
underneath, there will be a utility library layered on top of it so
that user code doesn't have to contain lots of
open()-read()/write()-close() boilerplate.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02  1:05             ` Theodore Ts'o
  2021-08-02  2:10               ` Andrew Warkentin
@ 2021-08-02  2:32               ` Bakul Shah
  1 sibling, 0 replies; 72+ messages in thread
From: Bakul Shah @ 2021-08-02  2:32 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: TUHS main list

On Aug 1, 2021, at 6:05 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> 
> On Sun, Aug 01, 2021 at 06:13:18PM -0600, Andrew Warkentin wrote:
>> There's a third kind of primitive that is superior to either spawn()
>> or fork() IMO, specifically one that creates a completely empty child
>> process and returns a context that lets the parent set up the child's
>> state using normal APIs.
> 
> I've seen this argument a number of times, but what's never been clear
> to me is what *would* the "normal APIs" be which would allow a parent
> to set up the child's state?  How would that be accomplished?  Lots of
> new system calls?  Magic files in /proc/<pid>/XXX which get
> manipulated somehow?  (How, exactly, does one affect the child's
> memory map via magic read/write calls to /proc/<pid>/XXX....  How
> about environment variables, etc.)
> 
> And what are the access rights by which a process gets to reach out
> and touch another process's environment?  Is it allowed only for
> child processes?  And is it only allowed before the child starts
> running?  What if the child process is going to be running a setuid or
> setgid executable?

From the "KeyKOS Nanokernel Architecture" (1992) paper:
----
KeyKOS processes are created by building a segment that will
become the program address space, obtaining a fresh domain,
and inserting the segment key in the domain's address slot.
The domain is created in the waiting state, which means that
it is waiting for a message. A threads paradigm can be
supported by having two or more domains share a common
address space segment.

Because domain initialization is such a common operation,
KeyKOS provides a mechanism to generate "prepackaged"
domains. A factory is an entity that constructs other
domains. Every factory creates a particular type of domain.
For example, the queue factory creates domains that provide
queuing services.  An important aspect of factories is the
ability of the client to determine their trustworthiness. It
is possible for a client to determine whether an object
created by a factory is secure.
----
This paper also talks about their attempt to emulate Unix on
top.

http://css.csail.mit.edu/6.858/2009/readings/keykos.pdf

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02  0:13           ` Andrew Warkentin
  2021-08-02  0:18             ` John Cowan
  2021-08-02  1:05             ` Theodore Ts'o
@ 2021-08-02 17:33             ` Lars Brinkhoff
  2021-09-28 17:46             ` Greg A. Woods
  3 siblings, 0 replies; 72+ messages in thread
From: Lars Brinkhoff @ 2021-08-02 17:33 UTC (permalink / raw)
  To: Andrew Warkentin; +Cc: TUHS main list

John Cowan wrote:
> Andrew Warkentin wrote:
> > There's a third kind of primitive that is superior to either spawn()
> > or fork() IMO, specifically one that creates a completely empty
> > child process and returns a context that lets the parent set up the
> > child's state using normal APIs.
> In that case you don't need *any* primitive except create_empty_process():
> you can do exec() by opening the file, writing to /proc/<child>/mem

That's almost exactly what ITS does.  You open the USR: device and
get a file descriptor (not really, but close enough) into the child
process (inferior job).

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-01 23:36         ` John Cowan
                             ` (2 preceding siblings ...)
  2021-08-02  0:13           ` Andrew Warkentin
@ 2021-08-02 17:37           ` Lars Brinkhoff
  2021-08-02 18:52             ` Clem Cole
  3 siblings, 1 reply; 72+ messages in thread
From: Lars Brinkhoff @ 2021-08-02 17:37 UTC (permalink / raw)
  To: John Cowan; +Cc: TUHS main list

John Cowan wrote:
> Early versions of PDP-7 [Unix] used the same process model as Tenex

I understand both Tenex and Unix got the concept of "fork" from Project
Genie.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02 17:37           ` Lars Brinkhoff
@ 2021-08-02 18:52             ` Clem Cole
  2021-08-02 20:59               ` John Cowan
  2021-08-02 21:13               ` Clem Cole
  0 siblings, 2 replies; 72+ messages in thread
From: Clem Cole @ 2021-08-02 18:52 UTC (permalink / raw)
  To: Lars Brinkhoff; +Cc: TUHS main list

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

On Mon, Aug 2, 2021 at 1:37 PM Lars Brinkhoff <lars@nocrew.org> wrote:

> John Cowan wrote:
> > Early versions of PDP-7 [Unix] used the same process model as Tenex
>
> I understand both Tenex and Unix got the concept of "fork" from Project
> Genie.


Should be required reading of all intro to OS students:

Programming Semantics for Multiprogrammed Computations

Jack B. Dennis and Earl C. Van Horn

Massachusetts Institute of Technology, Cambridge, Massachusetts

Communications of the ACM, Volume 9 / Number 3 / March, 1966

[If your Internet search fails you and/or you find it behind an ACM paywall
or the like, drop me a line, I'll forward a PDF of a scan].



[-- Attachment #2: Type: text/html, Size: 3266 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02 18:52             ` Clem Cole
@ 2021-08-02 20:59               ` John Cowan
  2021-08-02 21:06                 ` Al Kossow
  2021-08-02 21:14                 ` Clem Cole
  2021-08-02 21:13               ` Clem Cole
  1 sibling, 2 replies; 72+ messages in thread
From: John Cowan @ 2021-08-02 20:59 UTC (permalink / raw)
  To: Clem Cole; +Cc: TUHS main list

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.6859&rep=rep1&type=pdf
works fine, no paywall.

On Mon, Aug 2, 2021 at 2:53 PM Clem Cole <clemc@ccc.com> wrote:

>
> On Mon, Aug 2, 2021 at 1:37 PM Lars Brinkhoff <lars@nocrew.org> wrote:
>
>> John Cowan wrote:
>> > Early versions of PDP-7 [Unix] used the same process model as Tenex
>>
>> I understand both Tenex and Unix got the concept of "fork" from Project
>> Genie.
>
>
> Should be required reading of all intro to OS students:
>
> Programming Semantics for Multiprogrammed Computations
>
> Jack B. Dennis and Earl C. Van Horn
>
> Massachusetts Institute of Technology, Cambridge, Massachusetts
>
> Volume 9 / Number 3 / March, 1966
>
> [If your Internet search fails you and/or you find it behind an ACM
> paywall or the like, drop me a line, I'll forward a PDF of a scan].

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02 20:59               ` John Cowan
@ 2021-08-02 21:06                 ` Al Kossow
  2021-08-02 21:14                 ` Clem Cole
  1 sibling, 0 replies; 72+ messages in thread
From: Al Kossow @ 2021-08-02 21:06 UTC (permalink / raw)
  To: tuhs

On 8/2/21 1:59 PM, John Cowan wrote:

>         I understand both Tenex and Unix got the concept of "fork" from Project
>         Genie.

Original UCB source material on Genie can be found at bitsavers.




* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02 18:52             ` Clem Cole
  2021-08-02 20:59               ` John Cowan
@ 2021-08-02 21:13               ` Clem Cole
  1 sibling, 0 replies; 72+ messages in thread
From: Clem Cole @ 2021-08-02 21:13 UTC (permalink / raw)
  To: Lars Brinkhoff; +Cc: TUHS main list

Sorry, bad cut/paste -- that was from CACM, if it was not obvious.

On Mon, Aug 2, 2021 at 2:52 PM Clem Cole <clemc@ccc.com> wrote:

>
> On Mon, Aug 2, 2021 at 1:37 PM Lars Brinkhoff <lars@nocrew.org> wrote:
>
>> John Cowan wrote:
>> > Early versions of PDP-7 [Unix] used the same process model as Tenex
>>
>> I understand both Tenex and Unix got the concept of "fork" from Project
>> Genie.
>
>
> Should be required reading of all intro to OS students:
>
> Programming Semantics for Multiprogrammed Computations
>
> Jack B. Dennis and Earl C. Van Horn
>
> Massachusetts Institute of Technology, Cambridge, Massachusetts
>
> Volume 9 / Number 3 / March, 1966
>
> [If your Internet search fails you and/or you find it behind an ACM
> paywall or the like, drop me a line, I'll forward a PDF of a scan].

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02 20:59               ` John Cowan
  2021-08-02 21:06                 ` Al Kossow
@ 2021-08-02 21:14                 ` Clem Cole
  1 sibling, 0 replies; 72+ messages in thread
From: Clem Cole @ 2021-08-02 21:14 UTC (permalink / raw)
  To: John Cowan; +Cc: TUHS main list

excellent

On Mon, Aug 2, 2021 at 4:59 PM John Cowan <cowan@ccil.org> wrote:

>
> https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.6859&rep=rep1&type=pdf
> works fine, no paywall.
>
> On Mon, Aug 2, 2021 at 2:53 PM Clem Cole <clemc@ccc.com> wrote:
>
>>
>> On Mon, Aug 2, 2021 at 1:37 PM Lars Brinkhoff <lars@nocrew.org> wrote:
>>
>>> John Cowan wrote:
>>> > Early versions of PDP-7 [Unix] used the same process model as Tenex
>>>
>>> I understand both Tenex and Unix got the concept of "fork" from Project
>>> Genie.
>>
>>
>> Should be required reading of all intro to OS students:
>>
>> Programming Semantics for Multiprogrammed Computations
>>
>> Jack B. Dennis and Earl C. Van Horn
>>
>> Massachusetts Institute of Technology, Cambridge, Massachusetts
>>
>> Volume 9 / Number 3 / March, 1966
>>
>> [If your Internet search fails you and/or you find it behind an ACM
>> paywall or the like, drop me a line, I'll forward a PDF of a scan].

* Re: [TUHS] Systematic approach to command-line interfaces
  2021-08-02  0:13           ` Andrew Warkentin
                               ` (2 preceding siblings ...)
  2021-08-02 17:33             ` Lars Brinkhoff
@ 2021-09-28 17:46             ` Greg A. Woods
  2021-09-28 18:10               ` Larry McVoy
  2021-09-29 23:10               ` Phil Budne
  3 siblings, 2 replies; 72+ messages in thread
From: Greg A. Woods @ 2021-09-28 17:46 UTC (permalink / raw)
  To: The Unix Heritage Society mailing list

[[ I'm digging through old mail -- my summer has been preoccupied by
things that kept me from most everything else, including computing. ]]

At Sun, 1 Aug 2021 18:13:18 -0600, Andrew Warkentin <andreww591@gmail.com> wrote:
Subject: Re: [TUHS] Systematic approach to command-line interfaces
>
> There's a third kind of primitive that is superior to either spawn()
> or fork() IMO, specifically one that creates a completely empty child
> process and returns a context that lets the parent set up the child's
> state using normal APIs.

That's actually what fork(2) is, effectively -- it sets up a new process
that then effectively has control over its own destiny, but only by
using code supplied by the parent process, and thus it is also working
within the limits of the Unix security model.

The fact that fork() happens to also do some of the general setup useful
in a unix-like system is really just a convenience -- you
almost always want all those things to be done anyway.
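
A minimal sketch of that pattern, assuming a POSIX system: the child
redirects its own stdout with ordinary calls before exec, and the
parent's descriptors are never touched.  run_redirected() is a
hypothetical helper name, not an existing API.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with its stdout redirected to out_path.
 * Returns the child's exit status, or -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
int run_redirected(char *const argv[], const char *out_path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: still running code supplied by the parent, but every
         * ordinary system call now acts on the child's own state. */
        int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(126);
        if (fd != STDOUT_FILENO)
            close(fd);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: its own descriptors were never modified. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with { "echo", "hello", NULL } and an output path, this should
leave "hello" in that file and return 0, provided echo is on the PATH.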

I agree there is some messiness introduced in more modern environments,
especially w.r.t. threads, but there is now general consensus on how to
handle such things.

I'll also note here instead of in a separate message that Ted's followup
questions about the API design and security issues with having the
parent process have to do all the setup from its own context are exactly
the problems that fork() solves -- the elegance of fork() is incredible!
You just have to look at it the right way around, and with the Unix
security model firmly in mind.

I personally find spawn() to be the spawn of the devil, worse by a
million times than any alternative, including the Multics process model
(which could have made very good use of threads to increase concurrency
in handling data pipelines, for example -- it was even proposed at the
time).  Spawn() is narrow-minded, inelegant, and an antique by design.

I struggled for a very long time as an undergrad to understand the
Multics process model, but now that I know more about hypervisors
(i.e. the likes of Xen) it makes perfect sense to me.

I now struggle with liking the Unix concept of "everything is a
file" -- especially with respect to actual data files.  Multics also got
it right to use single-level storage -- that's the right abstraction for
almost everything, i.e. except some forms of communications (for which
Multics I/O was a very clever and elegant design).  The "unix" nod to
single level storage by way of mmap() suffers from horribly bad design
and neglect.

--
					Greg A. Woods <gwoods@acm.org>

Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>


* Re: [TUHS] Systematic approach to command-line interfaces
  2021-09-28 17:46             ` Greg A. Woods
@ 2021-09-28 18:10               ` Larry McVoy
  2021-09-29 16:40                 ` Greg A. Woods
  2021-09-29 23:10               ` Phil Budne
  1 sibling, 1 reply; 72+ messages in thread
From: Larry McVoy @ 2021-09-28 18:10 UTC (permalink / raw)
  To: The Unix Heritage Society mailing list

On Tue, Sep 28, 2021 at 10:46:25AM -0700, Greg A. Woods wrote:
> The "unix" nod to
> single level storage by way of mmap() suffers from horribly bad design
> and neglect.

I supported Xerox PARC when they were redoing their OS as a user space
application on SunOS 4.x.  They used mmap() and protections to take
user level page faults.  Yeah, there were bugs but that was ~30 years
ago.

In more recent times, BitKeeper used mmap() and protections to take the
same page faults (we implemented a compressed, XORed file storage that
filled in "pages" on demand, it was a crazy performance improvement)
and that worked on pretty much every Unix we tried it on.  Certainly
worked on Linux first try.
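
A rough sketch of the technique, not the PARC or BitKeeper code: map
a region with no access rights, catch the fault, and fill the page in
on first touch.  It assumes Linux, where touching a PROT_NONE page
raises SIGSEGV (some systems raise SIGBUS instead), and it calls
mprotect() from a signal handler, which POSIX does not list as
async-signal-safe.  lazy_region_init(), on_fault() and the fill
pattern are made up for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *region;            /* start of the lazily filled mapping */
static size_t region_len;
static size_t page_size;
static volatile sig_atomic_t faults_taken;

/* SIGSEGV handler: make the faulting page accessible and fill it in. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;

    if (addr < base || addr >= base + region_len)
        _exit(1);               /* a genuine crash, not one of ours */

    char *page = (char *)(addr & ~(uintptr_t)(page_size - 1));
    size_t index = ((uintptr_t)page - base) / page_size;

    if (mprotect(page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    /* Stand-in for "decompress the real data into this page". */
    memset(page, 'A' + (int)(index % 26), page_size);
    faults_taken++;
}

/* Map npages pages with no access rights and install the handler. */
int lazy_region_init(size_t npages)
{
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region_len = npages * page_size;
    region = mmap(NULL, region_len, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

After lazy_region_init(4), the first read of region[0] takes one
fault and then sees the filled page; later reads of the same page
take none.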

So what is it about mmap you don't like?  


* Re: [TUHS] Systematic approach to command-line interfaces
  2021-09-28 18:10               ` Larry McVoy
@ 2021-09-29 16:40                 ` Greg A. Woods
  2021-09-29 16:57                   ` Larry McVoy
  0 siblings, 1 reply; 72+ messages in thread
From: Greg A. Woods @ 2021-09-29 16:40 UTC (permalink / raw)
  To: The Unix Heritage Society mailing list

At Tue, 28 Sep 2021 11:10:16 -0700, Larry McVoy <lm@mcvoy.com> wrote:
Subject: Re: [TUHS] Systematic approach to command-line interfaces
>
> On Tue, Sep 28, 2021 at 10:46:25AM -0700, Greg A. Woods wrote:
> > The "unix" nod to
> > single level storage by way of mmap() suffers from horribly bad design
> > and neglect.
>
> So what is it about mmap you don't like?

Mmap() as we have it today almost completely ignores the bigger picture
and the lessons that came before it.

It was an add-on hack that basically said only "Oh, Yeah, we can do that
too!  Look at this." -- and nobody bothered to look for decades.

For one it has no easy direct language support (though it is possible in
C to pretend to use it directly, though the syntax often gets cumbersome).
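
To illustrate what "pretending" looks like in C, here is a small
sketch, assuming POSIX mmap(): a struct is overlaid on a mapped file
and then used like ordinary memory.  struct counter, counter_open()
and counter_close() are hypothetical names.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* A record that lives in a file but is used like an ordinary struct. */
struct counter {
    unsigned long hits;
    char label[24];
};

/* Map path (creating and sizing it if needed); NULL on failure. */
struct counter *counter_open(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct counter)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct counter), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : (struct counter *)p;
}

/* Unmap the record; returns 0 on success. */
int counter_close(struct counter *c)
{
    return munmap(c, sizeof *c);
}
```

Once mapped, c->hits++ is all it takes to update the file, but the
open/ftruncate/mmap/cast preamble is the cumbersome part that a
language with built-in support would hide.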

Single-level-storage was obviously designed into Multics from the
beginning and from the ground up, and it was easily used in the main
languages supported on Multics -- but it was just an add-on hack in Unix
(that, if memory serves me correctly, was initially only poorly used in
another extremely badly designed add-on hack that didn't pay any
attention whatsoever to past lessons, i.e. dynamic linking, which to
this day is a horror show of inefficiencies and bad hacks).

I think perhaps the problem was that mmap() came too soon in a narrow
sub-set of the Unix implementations that were around at the time, when
many couldn't support it well (especially on 32-bit systems -- it really
only becomes universally useful with either segments or 64-bit and
larger address spaces).  The fracturing of "unix" standards at the time
didn't help either.

Perhaps these "add-on hack" problems are the reason so many people think
fondly of the good old Unix versions where everything was still coming
from a few good minds that could work together to build a cohesive
design.  The add-ons were poorly done, not widely implemented, and
usually incompatible with each other when they were adopted by
additional implementations.

--
					Greg A. Woods <gwoods@acm.org>

Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>


* Re: [TUHS] Systematic approach to command-line interfaces
  2021-09-29 16:40                 ` Greg A. Woods
@ 2021-09-29 16:57                   ` Larry McVoy
  2021-09-30 17:31                     ` Greg A. Woods
  0 siblings, 1 reply; 72+ messages in thread
From: Larry McVoy @ 2021-09-29 16:57 UTC (permalink / raw)
  To: The Unix Heritage Society mailing list

On Wed, Sep 29, 2021 at 09:40:23AM -0700, Greg A. Woods wrote:
> At Tue, 28 Sep 2021 11:10:16 -0700, Larry McVoy <lm@mcvoy.com> wrote:
> Subject: Re: [TUHS] Systematic approach to command-line interfaces
> >
> > On Tue, Sep 28, 2021 at 10:46:25AM -0700, Greg A. Woods wrote:
> > > The "unix" nod to
> > > single level storage by way of mmap() suffers from horribly bad design
> > > and neglect.
> >
> > So what is it about mmap you don't like?
> 
> Mmap() as we have it today almost completely ignores the bigger picture
> and the lessons that came before it.
> 
> It was an add-on hack that basically said only "Oh, Yeah, we can do that
> too!  Look at this." -- and nobody bothered to look for decades.
> 
> For one it has no easy direct language support (though it is possible in
> C to pretend to use it directly, though the syntax often gets cumbersome).
> 
> Single-level-storage was obviously designed into Multics from the
> beginning and from the ground up, and it was easily used in the main
> languages supported on Multics -- but it was just an add-on hack in Unix
> (that, if memory serves me correctly, was initially only poorly used in
> another extremely badly designed add-on hack that didn't pay any
> attention whatsoever to past lessons, i.e. dynamic linking, which to
> this day is a horror show of inefficiencies and bad hacks).
> 
> I think perhaps the problem was that mmap() came too soon in a narrow
> sub-set of the Unix implementations that were around at the time, when
> many couldn't support it well (especially on 32-bit systems -- it really
> only becomes universally useful with either segments or 64-bit and
> larger address spaces).  The fracturing of "unix" standards at the time
> didn't help either.

I think you didn't use SunOS 4.x.  mmap() was implemented correctly 
there, the 4.x VM system mostly got rid of the buffer cache (the
buffer cache was used only for reading directories and inodes, there
was no regular file data there).  If you read(2) a page and mmap()ed
it and then did a write(2) to the page, the mapped page is the same
physical memory as the write()ed page.  Zero coherency issues.
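
That property can be checked from user space with something like the
following sketch, assuming POSIX pread()/pwrite() and a kernel with a
unified page cache.  mmap_is_coherent() is a made-up name.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a pwrite(2) is visible through an existing MAP_SHARED
 * mapping and a store through the mapping is visible to pread(2);
 * 0 if not; -1 on setup errors.  The scratch file is removed again.
 */
int mmap_is_coherent(const char *path)
{
    char buf[4];
    char *map = MAP_FAILED;
    int ok = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, "aaaa", 4) != 4)
        goto out;
    map = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    if (pwrite(fd, "bb", 2, 0) != 2)    /* change via the file API */
        goto out;
    map[3] = 'c';                       /* change via the mapping */
    if (pread(fd, buf, 4, 0) != 4)
        goto out;

    /* Both views should now show the same four bytes. */
    ok = memcmp(map, "bbac", 4) == 0 && memcmp(buf, "bbac", 4) == 0;
out:
    if (map != MAP_FAILED)
        munmap(map, 4);
    close(fd);
    unlink(path);
    return ok;
}
```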

This was not true in other systems, they copied the page from the
buffer cache and had all sorts of coherency problems.  It took
about a decade for other Unix implementations to catch up and I
think that's what you are hung up on.

SunOS 4.x got it right.  You can read about it, I have all the papers
cached at http://mcvoy.com/lm/papers

ZFS screwed it all up again, ZFS has its own cache because they weren't
smart enough to know how to make compressed file systems use the page
cache (we did it in BitKeeper so I have an existence proof that it is
possible).  I was deeply disappointed to hear that ZFS screwed up that
badly, the Sun I was part of would have NEVER even entertained such an
idea, they worked so hard to get a unified page cache.  It's just sad.


* Re: [TUHS] Systematic approach to command-line interfaces
  2021-09-28 17:46             ` Greg A. Woods
  2021-09-28 18:10               ` Larry McVoy
@ 2021-09-29 23:10               ` Phil Budne
  1 sibling, 0 replies; 72+ messages in thread
From: Phil Budne @ 2021-09-29 23:10 UTC (permalink / raw)
  To: tuhs

Greg A. Woods wrote:
> [[ I'm digging through old mail -- my summer has been preoccupied by
> things that kept me from most everything else, including computing. ]]
>
> At Sun, 1 Aug 2021 18:13:18 -0600, Andrew Warkentin <andreww591@gmail.com> wrote:
> Subject: Re: [TUHS] Systematic approach to command-line interfaces
> >
> > There's a third kind of primitive that is superior to either spawn()
> > or fork() IMO, specifically one that creates a completely empty child
> > process and returns a context that lets the parent set up the child's
> > state using normal APIs.
>
> That's actually what fork(2) is, effectively -- it sets up a new process
> that then effectively has control over its own destiny, but only by
> using code supplied by the parent process, and thus it is also working
> within the limits of the Unix security model.

The original post above made me think of the TENEX (later TOPS-20)
primitives for fork (a noun, aka process) control:

	CFORK -- create an empty fork/process (halted)
	GET -- map executable
	SFORK -- start fork
	HFORK -- halt a running fork
	KFORK -- kill a fork
	SPJFN -- set primary job file numbers (stdin/stdout)
	SPLFK -- splice a fork into tree

TENEX, like UNIX, was created with knowledge of the Berkeley
Timesharing System (SDS 940) and MULTICS.  Like MULTICS, TENEX was
designed from square one as a VM system, and I believe the 4.2BSD
specified mmap call was inspired by the TENEX PMAP call (which can map
file pages into a process AND map process pages into a file, and map
process pages from another process).

The "halted" process state was also used when a user typed CTRL/C.  A
halted process could be debugged (either in-process, entering a newly
mapped debugger, or one already linked in, or out-of-process by
splicing a debugger into the process tree).  Threads were easily
implemented by mapping (selected pages of) the parent process (leaving
others copy-on-write, or zero-fill for thread-local storage).

Starting on small machines (an 8KW PDP-7, and a (28KW?) PDP-11) UNIX
placed a premium on maximum usefulness in the minimum space.

The PDP-7 source we have implements fork (implemented, as on the
PDP-11 by swapping out the forking process) but not exec!

The Plan9 rfork unbundles traditional Unix descriptor and memory
inheritance behaviors.

For all the VM generality, a sore place (for me) in TENEX/TOPS-20 was
that a single set of file descriptors (job file numbers) was shared by
all processes in a login session ("job").  "Primary" input and output
streams were however per-process, but, ISTR there was nothing to stop
another process from closing a stream another process was using.

And like MULTICS, TENEX had byte-stream I/O, implemented day-one for
disk files (I'd have to look, but system code may well have
implemented it by issuing PMAP calls (monitor call code could invoke
monitor calls)), and most simple user programs used it, since it was
simpler to program than file mapping.

refs:
https://opost.com/tenex/tenex72.txt
https://www.opennet.ru/docs/BSD/design-44bsd-eng/x312.html
http://www.bitsavers.org/pdf/dec/pdp10/TOPS20/AA-4166E-TM_TOPS-20_Monitor_Calls_Reference_Ver_5_Dec82.pdf

P.S.
And on the ORIGINAL topic, TOPS-20 started with code from the TENEX
EXEC (shell) that implemented command completion and incremental, and
made it the COMND system call (tho it could well have been a shared library,
since almost all of the COMND code called other system calls to do the work).


* Re: [TUHS] Systematic approach to command-line interfaces
  2021-09-29 16:57                   ` Larry McVoy
@ 2021-09-30 17:31                     ` Greg A. Woods
  0 siblings, 0 replies; 72+ messages in thread
From: Greg A. Woods @ 2021-09-30 17:31 UTC (permalink / raw)
  To: The Unix Heritage Society mailing list

At Wed, 29 Sep 2021 09:57:15 -0700, Larry McVoy <lm@mcvoy.com> wrote:
Subject: Re: [TUHS] Systematic approach to command-line interfaces
>
> On Wed, Sep 29, 2021 at 09:40:23AM -0700, Greg A. Woods wrote:
> > I think perhaps the problem was that mmap() came too soon in a narrow
> > sub-set of the Unix implementations that were around at the time, when
> > many couldn't support it well (especially on 32-bit systems -- it really
> > only becomes universally useful with either segments or 64-bit and
> > larger address spaces).  The fracturing of "unix" standards at the time
> > didn't help either.
>
> I think you didn't use SunOS 4.x.  mmap() was implemented correctly
> there, the 4.x VM system mostly got rid of the buffer cache (the
> buffer cache was used only for reading directories and inodes, there
> was no regular file data there).  If you read(2) a page and mmap()ed
> it and then did a write(2) to the page, the mapped page is the same
> physical memory as the write()ed page.  Zero coherency issues.

Implementation isn't really what I meant to talk directly about -- I
meant "integration", and especially integration outside the kernel.

> This was not true in other systems, they copied the page from the
> buffer cache and had all sorts of coherency problems.  It took
> about a decade for other Unix implementations to catch up and I
> think that's what you are hung up on.

Such implementation issues are just a smaller part of the problem,
though obviously they delayed the wider use of mmap() in such broken
implementations.

The fact that it wasn't even available at all on many kernel
implementations at the time (the way open(2), read(2), write(2), et al
were/are) is equally important too of course -- 3rd party software
developers effectively wouldn't use it because of this.

So, the main part of the problem to me is that mmap() wasn't designed
into any unix-derived or unix-like system coherently (i.e. including at
user level) (that I'm aware of).  It wasn't integrated into languages or
system libraries (stdio f*() functions could probably even have used it,
since I think that's how stdio was (or could have been) emulated on
Multics for the C compiler and libc).

It all reminds me of how horrible the socket(2)/send(2)/sendmsg(2) hack
is -- i.e. socket descriptors should have just been file descriptors,
opened with open(2).  I guess pipe(2) kind of started this mess, and
even Plan 9 didn't seem to do anything remarkable to address pipe
creation as being subtly different from just using open(2).  Maybe I'm
going too far with thinking pipe() could/should have just been a library
call that used open(2) internally, perhaps connecting the descriptors by
opening some kind of "cloning" device in the filesystem.
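
As a thought experiment only, here is roughly what such a library
call could look like on a POSIX system today, using a FIFO in a
temporary directory as a stand-in for the hypothetical "cloning"
device.  pipe_via_open() is an invented name, and the /tmp location
is an assumption.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes mkdtemp() on glibc */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * On success fds[0] is the read end and fds[1] the write end, as with
 * pipe(2).  Returns 0, or -1 on error.
 */
int pipe_via_open(int fds[2])
{
    char dir[] = "/tmp/pvoXXXXXX";
    char path[sizeof dir + 8];
    int rd = -1, wr = -1, flags;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) != 0)
        goto fail;

    /* A blocking open of a FIFO waits for the peer, so open the read
     * end non-blocking first, then the write end, then switch the
     * read end back to blocking mode. */
    rd = open(path, O_RDONLY | O_NONBLOCK);
    if (rd < 0)
        goto fail;
    wr = open(path, O_WRONLY);
    if (wr < 0)
        goto fail;
    flags = fcntl(rd, F_GETFL);
    if (flags < 0 || fcntl(rd, F_SETFL, flags & ~O_NONBLOCK) != 0)
        goto fail;

    unlink(path);               /* the name is no longer needed */
    rmdir(dir);
    fds[0] = rd;
    fds[1] = wr;
    return 0;

fail:
    if (rd >= 0)
        close(rd);
    if (wr >= 0)
        close(wr);
    unlink(path);
    rmdir(dir);
    return -1;
}
```

The extra dance with O_NONBLOCK and the temporary name hints at why
pipe(2) ended up as its own system call.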

--
					Greg A. Woods <gwoods@acm.org>

Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>


* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-08-01 19:48 ` arnold
  2021-08-01 21:30   ` John Cowan
@ 2021-08-02 12:11   ` Steffen Nurpmeso
  1 sibling, 0 replies; 72+ messages in thread
From: Steffen Nurpmeso @ 2021-08-02 12:11 UTC (permalink / raw)
  To: arnold; +Cc: tuhs, douglas.mcilroy

arnold@skeeve.com wrote in
 <202108011948.171JmAcK001895@freefriends.org>:
 |Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:
 |
 |> In realizing the white paper's desire to "have the parser
 |> provide the results to the program", it's likely that the mechanism
 |> will, like Yacc, go beyond parsing and invoke semantic actions
 |> as it identifies tree nodes.
 |
 |I have to admit that all this feels like overkill. Parsing options
 |is only a very small part of the real work that a program does.
 |
 |Speaking for myself, I want something simple and regular that will
 |get the job done and let me get on with the actual business of
 |my software.  A grammar just for command-line argument parsing feels
 |like the tail wagging the dog: not nearly enough ROI, at least
 |for me.
 |
 |I happen to like the getopt_long interface designed by the GNU
 |project. It's easy to learn, setup and use. Once it's in place
 |it's set and forget.

By coincidence, just last week i stumbled over (actually searched
for and fixed) an issue where that terrible command line re-sorting
hit me where i did not expect it.  I.e., i had changed aspects of
a script that affect the content of a variable; that string is
appended to a string constant, and the result is then evaluated and
passed to a program.  Initially the variable content never
contained a hyphen-minus, but after the rewrite it did.  So they
saw a leading hyphen-minus somewhere on the line and turned it into
an option.  (The fix was easy, just turn 'X'$Y into 'X'"$Y", it
maybe should have always been written like that, but it seemed
unnecessary at first.)
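
A small C sketch of the re-sorting itself, assuming glibc's getopt()
with POSIXLY_CORRECT unset: if an unquoted expansion splits into
several words and a later word starts with a hyphen-minus, the
default scan still finds it.  A leading '+' in the option string, or
a "--" argument, stops that.  count_n_options() is a hypothetical
helper.

```c
#include <getopt.h>
#include <unistd.h>

/*
 * Count how many arguments getopt() treats as the option -n.
 * Pass optstring "n" for the default behaviour, or "+n" to ask GNU
 * getopt to stop at the first non-option argument.
 */
int count_n_options(int argc, char *argv[], const char *optstring)
{
    int c, seen = 0;

    optind = 0;                 /* glibc: 0 forces a full re-initialization */
    opterr = 0;                 /* keep getopt quiet about unknown options */
    while ((c = getopt(argc, argv, optstring)) != -1)
        if (c == 'n')
            seen++;
    return seen;
}
```

With the arguments "Xfoo" "-n", the default scan counts one -n option
even though it follows an operand; with "+n", or with a "--" in front,
it counts none.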

 |My two cents,

For C++/C i have always had my own one which can do long options,
optionally relates long to short options, where the long ones also
can include a help string (all in one string, as in "debug;d;"
N_("identical to -Sdebug") and N_() expands to literal).

I agree with the other post that turning command lines into a tree
of nodes is great, but of course this is hard to define.  For
first level only yet (without support for multiplexer commands,
ie, commands where the first command chooses an actual command)
i have this for the mailer i maintain, for some commands already.
It is a pain to write things like the following by hand

  mx_CMD_ARG_DESC_SUBCLASS_DEF(call, 2, a_cmd_cad_call){
     {mx_CMD_ARG_DESC_SHEXP | mx_CMD_ARG_DESC_HONOUR_STOP,
        n_SHEXP_PARSE_TRIM_IFSSPACE}, /* macro name */
     {mx_CMD_ARG_DESC_SHEXP | mx_CMD_ARG_DESC_OPTION |
           mx_CMD_ARG_DESC_GREEDY | mx_CMD_ARG_DESC_HONOUR_STOP,
        n_SHEXP_PARSE_IFS_VAR | n_SHEXP_PARSE_TRIM_IFSSPACE} /* args */
  }mx_CMD_ARG_DESC_SUBCLASS_DEF_END;

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-08-01 19:48 ` arnold
@ 2021-08-01 21:30   ` John Cowan
  2021-08-02 12:11   ` Steffen Nurpmeso
  1 sibling, 0 replies; 72+ messages in thread
From: John Cowan @ 2021-08-01 21:30 UTC (permalink / raw)
  To: arnold; +Cc: The Eunuchs Hysterical Society, M Douglas McIlroy

On Sun, Aug 1, 2021 at 3:48 PM <arnold@skeeve.com> wrote:

> I happen to like the getopt_long interface designed by the GNU
> project. It's easy to learn, setup and use. Once it's in place
> it's set and forget.
>

I agree, and what is more, I say, it is a grammar already, if a simple
one.  You declare what you accept and what's to be done, making it a DSL
expressed as an array of structs.

The only thing it lacks is that old getopt is a bag on the side rather than
being integrated: struct option should have an additional member "char
short_option", where '\0' means "no short option".  Given that feature and
three per-program values "progname" (argv[0] by default), "version", and
"usage_string", the --version and --help options can be processed inside
getopt itself.  I especially like that you pass per-option pointers saying
where to put the value, so no case statement required, just create some
global or local variables and pass in their addresses.  Automatic support
for "--nofoo" given "--foo" would be good as well.
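
A minimal sketch of that table-driven style, assuming GNU
getopt_long() as provided by glibc.  The option names and
parse_flags() are invented for the example.  Note that the flag/val
members only store a fixed integer; options that take an argument
still have to be picked up through optarg.

```c
#include <getopt.h>
#include <stddef.h>

/* Results land here directly; parse_flags() needs no case statement. */
static int verbose;
static int foo = 1;

/*
 * Each table entry names a variable and the value to store in it.
 * Returns 0, or -1 on an unknown or malformed option.
 */
int parse_flags(int argc, char *argv[])
{
    static const struct option table[] = {
        { "verbose", no_argument, &verbose, 1 },
        { "foo",     no_argument, &foo,     1 },
        { "nofoo",   no_argument, &foo,     0 },
        { NULL, 0, NULL, 0 }
    };
    int c;

    optind = 0;                 /* glibc: restart scanning from scratch */
    opterr = 0;                 /* no diagnostics on stderr */
    while ((c = getopt_long(argc, argv, "", table, NULL)) != -1)
        if (c == '?')
            return -1;
    return 0;
}
```

Here "--nofoo" is spelled out by hand as a second table entry, which
is exactly the duplication that automatic support would remove.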


* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
  2021-08-01 18:17 Douglas McIlroy
@ 2021-08-01 19:48 ` arnold
  2021-08-01 21:30   ` John Cowan
  2021-08-02 12:11   ` Steffen Nurpmeso
  0 siblings, 2 replies; 72+ messages in thread
From: arnold @ 2021-08-01 19:48 UTC (permalink / raw)
  To: tuhs, douglas.mcilroy

Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:

> In realizing the white paper's desire to "have the parser
> provide the results to the program", it's likely that the mechanism
> will, like Yacc, go beyond parsing and invoke semantic actions
> as it identifies tree nodes.

I have to admit that all this feels like overkill. Parsing options
is only a very small part of the real work that a program does.

Speaking for myself, I want something simple and regular that will
get the job done and let me get on with the actual business of
my software.  A grammar just for command-line argument parsing feels
like the tail wagging the dog: not nearly enough ROI, at least
for me.

I happen to like the getopt_long interface designed by the GNU
project. It's easy to learn, setup and use. Once it's in place
it's set and forget.

My two cents,

Arnold


* Re: [TUHS] Systematic approach to command-line interfaces [ meta issues ]
@ 2021-08-01 18:17 Douglas McIlroy
  2021-08-01 19:48 ` arnold
  0 siblings, 1 reply; 72+ messages in thread
From: Douglas McIlroy @ 2021-08-01 18:17 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

I have considerable sympathy with the general idea of formally
specifying and parsing inputs. Langsec people make a strong case
for doing so. The white paper, "A systematic approach to modern
Unix-like command interfaces", proposes to "simplify parsing by
facilitating the creation of easy-to-use 'grammar-based' parsers".

I'm not clear on what is meant by "parser". A plain parser is a
beast that builds a parse tree according to a grammar. For most
standard Unix programs, the parse tree has two kinds of leaves:
non-options and options with k parameters. Getopt restricts
k to {0,1}.
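
One way to picture those leaves in C, as a sketch only: a flat list
of operands and options, where each option carries k parameters
taken from a small rule table.  The types, parse_leaves() and the
rule format are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum leaf_kind { LEAF_OPERAND, LEAF_OPTION };

/* One leaf of the (flat) parse tree. */
struct leaf {
    enum leaf_kind kind;
    const char *name;           /* option text or operand text */
    char **params;              /* first of nparams parameters (in argv) */
    int nparams;                /* k; getopt would only allow 0 or 1 */
};

/* Grammar: an option and the number of parameters it takes. */
struct rule { const char *name; int k; };

/*
 * Fill out[] with at most max leaves; returns the leaf count, or -1
 * on an unknown option or a missing parameter.
 */
int parse_leaves(int argc, char *argv[], const struct rule *rules,
                 size_t nrules, struct leaf *out, int max)
{
    int n = 0;

    for (int i = 1; i < argc && n < max; i++) {
        struct leaf *l = &out[n++];

        if (argv[i][0] != '-' || argv[i][1] == '\0') {
            *l = (struct leaf){ LEAF_OPERAND, argv[i], NULL, 0 };
            continue;
        }
        size_t r;
        for (r = 0; r < nrules; r++)
            if (strcmp(argv[i], rules[r].name) == 0)
                break;
        if (r == nrules || i + rules[r].k >= argc)
            return -1;
        *l = (struct leaf){ LEAF_OPTION, argv[i], &argv[i + 1], rules[r].k };
        i += rules[r].k;        /* skip over the consumed parameters */
    }
    return n;
}
```

With rules { "-v", 0 } and { "-r", 2 }, the arguments
"-v -r 1 5 file" yield three leaves: two options and one operand.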

Aside from fiddling with argc and argv, I see little difference
in working with a parse tree for arguments that could be
handled by getopt and working with getopt directly.

A more general parser could handle more elaborate grammatical
constraints on options, for example, field specs in sort(1),
requirements on presence of options in tar(1), or representation
of multiple parameters in cut(1).

In realizing the white paper's desire to "have the parser
provide the results to the program", it's likely that the mechanism
will, like Yacc, go beyond parsing and invoke semantic actions
as it identifies tree nodes.

Pioneer Yaccification of some commands might be a worthy demo.

Doug

