Hello,

I've recently started to implement a set of helper functions and
procedures for parsing Unix-like command-line interfaces (i.e., POSIX +
GNU-style long options, in this case) in Ada.

While doing that, I learned that there is a better way to approach
this problem – beyond using getopt(s) (which never really made sense to
me) and having to write case statements in loops every time: Define a
grammar, let a pre-built parser do the work, and have the parser
provide the results to the program.

Now, defining such a grammar requires a thoroughly systematic approach
to the design of command-line interfaces. One problem with that is
whether that grammar should allow for sub-commands. And that leads to
the question of how task-specific tool sets should be designed. These
seem to be a relatively new phenomenon in Unix-like systems that POSIX
doesn't say anything about, as far as I can see.

So, I've prepared a bit of a write-up, pondering on the pros and cons
of two different ways of having task-specific tool sets
(non-hierarchical command sets vs. sub-commands) that is available at

https://www.msiism.org/files/doc/unix-like_command-line_interfaces.html

I tend to think the sub-command approach is better. But I'm neither a UI
nor a Unix expert and have no formal training in computer things. So, I
thought this would be a good place to ask for comment (and get some
historical perspective).

This is all just my pro-hobbyist attempt to make some people's lives
easier, especially mine. I mean, currently, the "Unix" command line is
quite a zoo, and not in a positive sense. Also, the number of
well-thought-out command-line interfaces doesn't seem to be a growing
one. But I guess that could be changed by providing truly easy ways to
make good interfaces.

--
Michael
The "click" CLI parser for Python I think would be of interest to you:
https://click.palletsprojects.com/. It has support for sub-commands and
nesting. It's not grammar-based internally, as far as I know.

Also I think PowerShell has some interesting concepts, though I've not
looked at it in detail.

Dan H.

On 7/31/21 8:25 AM, Michael Siegel wrote:
> [...]
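For readers who want the shape of what click's command groups provide without the dependency, the same structure can be sketched with Python's standard-library argparse sub-parsers. This is purely an illustrative sketch; the program name "tool", the sub-command "diff", and the options are invented, not taken from the thread:

```python
# Sketch: a global option before the sub-command, plus per-sub-command
# options, using only the standard library. All names here are invented.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="tool")
    parser.add_argument("--verbose", action="store_true",
                        help="global option; must appear before the sub-command")
    sub = parser.add_subparsers(dest="command", required=True)

    diff = sub.add_parser("diff", help="a sub-command with its own options")
    diff.add_argument("--stat", action="store_true",
                      help="sub-command-local option")
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())
```

Invoked as `tool --verbose diff --stat`, the global flag is parsed before the sub-command name and `--stat` after it, which is the "git <global options> diff <diff options>" shape.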
On Jul 31, 2021, at 5:25 AM, Michael Siegel <msi@malbolge.net> wrote:
> While doing that, I learned that there is a better way to approach
> this problem – beyond using getopt(s) [...]

I see that Dan Halbert beat me to mentioning "click."

The trick with shell is that unless you write the parser in shell, which
is going to be miserable, you're doing it in a command in a subshell, and
therefore your return values have to be a structured stream of bytes on
stdout, which the parent shell is going to have to interpret. An eval-able
shell fragment, where you have a convention of what the variables you get
from the option parser will be, is probably the easiest way, since from the
parent that would look like:

    eval "$(parse_my_opts "$@")"
    # Magic variables spring to life
    if [ "$OPT_SUBCOMMAND_0" = "burninate" ]; then ...

Adam
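The back end of that convention can live in any language that can print to stdout. Here is a hedged Python sketch of what a `parse_my_opts` helper might look like: the `OPT_*` variable naming follows the message above, but the option grammar and everything else are invented for illustration.

```python
# Hypothetical back end for the parse_my_opts convention: parse the
# arguments in Python and print eval-able shell assignments on stdout.
import shlex
import sys

def parse_my_opts(argv):
    """Emit one shell assignment per option or positional sub-command word."""
    lines = []
    sub_index = 0
    for arg in argv:
        if arg.startswith("--"):
            name, _, value = arg[2:].partition("=")
            var = "OPT_" + name.upper().replace("-", "_")
            # Bare flags get the value 1; shlex.quote keeps eval safe.
            lines.append(f"{var}={shlex.quote(value or '1')}")
        else:
            lines.append(f"OPT_SUBCOMMAND_{sub_index}={shlex.quote(arg)}")
            sub_index += 1
    return "\n".join(lines)

if __name__ == "__main__":
    print(parse_my_opts(sys.argv[1:]))
```

From the parent shell, `eval "$(parse_my_opts "$@")"` then makes the magic variables spring to life; quoting the output through `shlex.quote` is what keeps that eval from being an injection hazard.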
Digressing a bit (but only a bit) talking about IPC: PowerShell and CMS
PIPELINES both take the approach of more structured pipelines, where pipe
contents are not just streams of bytes but can be structured records.
This offers a lot of power, but it also inhibits the ability to
arbitrarily compose pipe stages, because you've effectively introduced a
type system.

On the other hand, you can certainly argue that stream-of-bytes pipes
ALSO introduce a type system, it's just a completely ad-hoc,
undocumented, and fragile one that relies on the cooperation of both
ends of the pipe to work at all, and you'd be right.

In practice... well, I'd rather use stream-of-bytes, but I am more
comfortable in Unix-like environments than PowerShell, and my CMS
PIPELINES skills are quite rusty now.

On Sat, Jul 31, 2021 at 7:21 AM Adam Thornton <athornton@gmail.com> wrote:
> [...]
> Adam
Look for "comnd jsys", that exact spelling. Source code is around.
On 7/31/21, Michael Siegel <msi@malbolge.net> wrote:
>
> While doing that, I learned that there is a better way to approach
> this problem – beyond using getopt(s) (which never really made sense to
> me) and having to write case statements in loops every time: Define a
> grammar, let a pre-built parser do the work, and have the parser
> provide the results to the program.
This method for handling command lines dates back at least to the
1970s. The COMND JSYS (system call) in TOPS-20 operated this way, as
does the DCL command line interface in OpenVMS. As you pointed out it
can greatly simplify the code in the application. It also permits
command completion. If the command has a long-winded option, such as
-supercalifragilisticexpialidocious, I can type -super then hit the
TAB key and as long as there is only one option that starts with
-super the parser will fill in the rest of the long keyword. It also
means that you can provide interactive help. At any point the user
can type a question mark and the command interpreter will say what
syntactic element is expected next. The TOPS-20 COMND JSYS
implemented both of these features, and I think that command
completion was eventually added to the VMS command interpreter, too.
This method of command line parsing also enforces a degree of
uniformity of syntax between the command lines of the various
utilities and applications.
-Paul W.
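The unique-prefix completion and "?" help that Paul describes can be sketched in a few lines. This toy Python version is purely illustrative (the keyword table is invented); a real COMND-style parser would of course also handle field types, noise words, and so on:

```python
# Toy illustration of COMND-style keyword recognition: a unique prefix is
# completed (as TAB would do it), and an ambiguous or empty prefix yields
# the list of candidates that "?" help would print.
KEYWORDS = ["delete", "directory", "help",
            "supercalifragilisticexpialidocious"]

def complete(prefix, keywords=KEYWORDS):
    """Return the full keyword for an unambiguous prefix, else the candidates."""
    matches = [k for k in keywords if k.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]      # unique: fill in the rest of the long keyword
    return matches             # ambiguous or unknown: the "?" help list

if __name__ == "__main__":
    print(complete("super"))   # completes the long-winded keyword
    print(complete("d"))       # ambiguous: both candidates are shown
```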
https://github.com/PDP-10/sri-nic/blob/master/files/fs/c/ccmd/ccmdmd.unx

On Sat, Jul 31, 2021 at 11:46 AM Richard Salz <rich.salz@gmail.com> wrote:
> Look for "comnd jsys" that exact spelling. Source code is around.
You gave it away :) Half the fun is getting the search right.

The old kermit program from Columbia has an implementation in portable
(for its time) C.

On Sat, Jul 31, 2021 at 12:03 PM Clem Cole <clemc@ccc.com> wrote:
> https://github.com/PDP-10/sri-nic/blob/master/files/fs/c/ccmd/ccmdmd.unx
Sorry, hit return too soon. I remember an old AAUGN newsletter
describing it. If I recall, it was originally done for kermit. The same
idea is in tcsh also. Which came first, I don't remember. Cut/pasted
from AAUGN Vol 8 #2:

-----------------
CCMD: A Version of COMND in C

Andrew Lowry
Howard Kaye
Columbia University

CCMD is a general parsing mechanism for developing user interfaces to
programs. It is based on the functionality of TOPS-20's COMND JSYS.
CCMD allows a program to parse for various field types (file names,
user names, dates and times, keywords, numbers, arbitrary text, tokens,
etc.). It is meant to supply a homogeneous user interface across a
variety of machines and operating systems for C programs. It currently
runs under System V UNIX, 4.2/4.3 BSD, Ultrix 1.2/2.0, and MSDOS. The
library defines various default actions (user settable), and allows
field completion, help, file indirection, comments, etc. on a per-field
basis. Future plans include command line editing, command history, and
ports to other operating systems (such as VMS).

CCMD is available for anonymous FTP from
[CU20B.COLUMBIA.EDU]WS:<SOURCE.CCMD>*.*

For further information, send mail to:
info-ccmd-request@cu20b.columbia.edu
seismo!columbia!cunixc!info-ccmd-request
-----------------

On Sat, Jul 31, 2021 at 12:03 PM Clem Cole <clemc@ccc.com> wrote:
> [...]
On Sat, Jul 31, 2021 at 11:57 AM Paul Winalski <paul.winalski@gmail.com> wrote:
> This method for handling command lines dates back at least to the
> 1970s. The COMND JSYS (system call) in TOPS-20 operated this way, as
> does the DCL command line interface in OpenVMS. [...]

There was someone posting here on TUHS a while back about leveraging a
special context-sensitive `--shell-help` or similar command line program
and synthesizing a protocol between the shell and a program to provide
TOPS-20 like command completion. It was nowhere near what you get from
the COMND JSYS, but seemed like a reasonable approximation.
This is verging on COFF territory, but one of the reasons such a
mechanism is unlike what you get from TOPS-20 is that, in that system,
as soon as you type the name of a command, you're effectively running
that command; the process model is quite different from that of Unix.

With respect to command line handling in general, I think there are
some attempts at making things more rational available in modern
languages. Command line parsing packages for Go and the `clap` package
for Rust come to mind
(https://rust-lang-nursery.github.io/rust-cookbook/cli/arguments.html).
I've used clap recently in a few places and it's very convenient.

- Dan C.
Sorry, I remembered it from AUUGN and checked those first, then
searched for "CCMD Unix Columbia.edu" and got the hit on Lars's PDP-10
sources.

And note the dyslexic spelling in my earlier email of AUUGN -- sigh.

On Sat, Jul 31, 2021 at 12:06 PM Richard Salz <rich.salz@gmail.com> wrote:
> You gave it away :) Half the fun is getting the search right.
> [...]
Besides C-Kermit on Unix systems, the TOPS-20 command interface is used
inside the mm mail client, which I've been using for decades on
TOPS-20, VMS, and several flavors of Unix:

http://www.math.utah.edu/pub/mm

mm doesn't handle attachments, or do fancy display of HTML, and thus
cannot do anything nasty in response to incoming mail messages. I
rarely need to extract an attachment; when I do, I save the message in
a temporary file and run munpack on it.

Here are some small snippets of its inline help:

MM] read (messages) ?
  message number or range of message numbers, n:m
  or range of message numbers, n-m
  or range of message numbers, n+m (m messages beginning with n)
  or "." to specify the current message
  or "*" to specify the last message
  or message sequence, one of the following:
    after        all          answered     before       current
    deleted      flagged      from         inverse      keyword
    last         longer       new          on           previous-sequence
    recent       seen         shorter      since        subject
    text         to           unanswered   undeleted    unflagged
    unkeyword    unseen
  or "," and another message sequence

R] read (messages) flagged since yesterday
  [message(s) appear here]

MM] headers (messages) since monday longer (than) 100000
  [list of long messages here]

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
On Sat, Jul 31, 2021 at 12:18 PM Clem Cole <clemc@ccc.com> wrote:
> Sorry, hit return too soon. I remember an old AAUGN newsletter
> describing it. If I recall it was original done for kermit. [...]

Frank da Cruz wrote a very nice reminiscence of the DECSYSTEM-20s at
Columbia that discusses the creation of CCMD as they decommissioned the
PDP-10s and switched to Unix on VAXen (and then Suns).

http://www.columbia.edu/kermit/dec20.html

When I was a student, we were still given accounts on the CUNIX
cluster; 64-bit SPARC machines running Solaris at the time. At the
time, the actress Julia Stiles was a student. One day, I was walking
out of Mudd (the engineering building) with a friend of mine who
suddenly grabbed my arm and said, "oh my god oh my god oh my god that's
Julia Stiles!" Being perpetually ignorant of popular culture, I had no
idea who she was referring to and confusedly thought she meant Julia
Child, the late host of a cooking show. "...But I thought she was
dead?" "No, Dan, that's Julia Stiles!"

We decided to look up Ms Stiles in the student directory, but being a
celebrity she wasn't listed. However, one could still discover her
"UNI" (login name) by grepping for her in the NIS password database. We
did that and sent her an email: "Christy was too embarrassed to say hi
to you and Dan thought you were Julia Child." Predictably, she did not
respond. In retrospect, I idly wonder how many such emails she got,
most presumably of the creepy variety, but we just thought ours was
funny.

It appears that CUNIX still exists: https://cuit.columbia.edu/unix

- Dan C.
On Sat, Jul 31, 2021 at 8:36 AM Michael Siegel <msi@malbolge.net> wrote:
> I've recently started to implement a set of helper functions and
> procedures for parsing Unix-like command-line interfaces (i.e., POSIX +
> GNU-style long options, in this case)

As an old guy, I am amused to read these words, because UNIX did not
have a command-line parsing standard, and I remember the wars. If you
came from a system where the program was exec'd with the command-line
parameters pre-parsed (like the DEC world, ITS, and some others), UNIX
seemed foreign and was often considered 'bad' by folks. The biggest
argument (which was reasonable) was that Unix commands sometimes used
'keys' (like tp/tar and the like) and others used switches (cp, ed).
Folks new to UNIX often b*tched about it being 'inconsistent' (read
things like the 'UNIX Haters Book'). I admit I was 'surprised' when I
came there with the Fifth Edition in the mid-70s from the PDP-10 world,
but as a programmer, I ended up really liking the fact that the command
line was not pre-parsed, other than white-space removal, and I did not
have to figure out some strange syntax for findnext() and other
UUOs/JSYSes from my previous life.

So by the late 70s/early 80s, a number of different UNIX parsing
schemes popped up, like the stuff from Columbia that Richard pointed
out. TCL was in some ways an end result; it had a life that was useful,
but eventually fell away too. The whole getopt(3) thing appeared
originally inside of BTL. The first version I saw was from USG
(Summit), but I'm not sure they were the original authors. One problem
was that it was tied up with later AT&T licenses [i.e., PWB or later]
and was not in Research, so the USENIX community lacked it. Thus when
AT&T brought it to us to consider for POSIX.2, there was balking. The
ISVs seemed to like it, but there was not a lot of support elsewhere.

At some point, somebody in the USENIX community wrote a version and
posted it to comp.unix.sources, and some people began to use it. Of
course, GNU had to take it and pee on it, so we got the long-option
stuff.

All in all, it's what you're used to, I suspect. The whole AT&T
getopt(3) thing works (I can deal with keys too, BTW). I guess I just
don't get excited about it these days.

Clem
> The TOPS-20 COMND JSYS implemented both of these features, and I think
> that command completion was eventually added to the VMS command
> interpreter, too.

FYI, there is also a Unix version of the COMND JSYS capability. It was
developed at Columbia University as part of their "mm" mail manager. It
is located in the ccmd subdirectory in the mm.tar.gz file.

url: https://www.kermitproject.org/mm/
ftp://ftp.kermitproject.org/kermit/mm/mm.tar.gz

-ron
Michael Siegel <msi@malbolge.net> once said:
> So, I've prepared a bit of a write-up, pondering on the pros and cons
> of two different ways of having task-specific tool sets
> (non-hierarchical command sets vs. sub-commands) that is available at
>
> https://www.msiism.org/files/doc/unix-like_command-line_interfaces.html
>
> I tend to think the sub-command approach is better. But I'm neither a UI
> nor a Unix expert and have no formal training in computer things. So, I
> thought this would be a good place to ask for comment (and get some
> historical perspective).
You're missing the approach taken in Plan 9 (and
10th edition Unix): put related commands in a
directory and use a shell that doesn't restrict
the first argument of a command to a single path
element.
This lets you execute commands like:
auth/as
disk/prep
git/rebase
ip/ping
ndb/dns
upas/send
without having a prefix on every command name or
single large binaries with every command linked
in as subcommands.
Cheers,
Anthony
On Sat, Jul 31, 2021 at 12:41 PM Clem Cole <clemc@ccc.com> wrote:
> The biggest argument (which was reasonable) was Unix command, sometimes
> used 'keys' (like tp/tar and the like) and others used switches (cp, ed).

That's modest (especially now that tar accepts "-" before the key)
compared to remembering when to use -F, when to use -d, and when to use
-t to specify the field separator, and when you are stuck without a
field separator option; also what the default is with no option (I'd
prefer "arbitrary amount of whitespace" in all cases, but that's often
not available at all). These inconsistencies still piss me off no end.
On Sat, Jul 31, 2021 at 1:39 PM Anthony Martin <ality@pbrane.org> wrote:
> You're missing the approach taken in Plan 9 (and
> 10th edition Unix): put related commands in a
> directory and use a shell that doesn't restrict
> the first argument of a command to a single path
> element.

What that doesn't give you is the ability to say "git <git-options>
diff <git-diff-options>", which is very nice and makes the
inconsistencies I just posted on less likely. Fortunately, any getopt
variant can deal with these; you just have to pass the tail of argv and
a suitably reduced value for argc to another call of the options
parser.
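The "pass the tail of argv to another call of the options parser" idea can be sketched with Python's stdlib getopt module, a close cousin of C getopt(3). The sub-command and option names below are invented for illustration:

```python
# Two-level option parsing: global options first, then hand the tail of
# argv to a second getopt call with the sub-command's own option table.
import getopt

def parse(argv):
    # First pass: getopt stops at the first non-option word, which we
    # treat as the sub-command name.
    global_opts, rest = getopt.getopt(argv, "v", ["verbose"])
    if not rest:
        return global_opts, None, [], []
    subcommand, sub_argv = rest[0], rest[1:]
    # Second pass: the sub-command gets its own, smaller grammar.
    if subcommand == "diff":
        sub_opts, operands = getopt.getopt(sub_argv, "s", ["stat"])
    else:
        sub_opts, operands = [], sub_argv
    return global_opts, subcommand, sub_opts, operands
```

This mirrors the C idiom of calling getopt(3) again with `argv + optind` and `argc - optind`.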
Am Sat, 31 Jul 2021 10:30:18 -0700
schrieb Anthony Martin <ality@pbrane.org>:
> Michael Siegel <msi@malbolge.net> once said:
> > So, I've prepared a bit of a write-up, pondering on the pros and
> > cons of two different ways of having task-specific tool sets
> > (non-hierarchical command sets vs. sub-commands) that is available
> > at
> >
> > https://www.msiism.org/files/doc/unix-like_command-line_interfaces.html
> >
> > I tend to think the sub-command approach is better. But I'm neither
> > a UI nor a Unix expert and have no formal training in computer
> > things. So, I thought this would be a good place to ask for comment
> > (and get some historical perspective).
>
> You're missing the approach taken in Plan 9 (and
> 10th edition Unix): put related commands in a
> directory and use a shell that doesn't restrict
> the first argument of a command to a single path
> element.
>
> This lets you execute commands like:
>
> auth/as
> disk/prep
> git/rebase
> ip/ping
> ndb/dns
> upas/send
>
> without having a prefix on every command name or
> single large binaries with every command linked
> in as subcommands.
Thanks for pointing this out. I had no idea.
Unfortunately(?), I'm looking to make my life easier on more "Unix-like
Unix-like systems" (for want of a better term), for the time being
(Linux, BSD, maybe illumos). (I mean, which shell would I use to
accomplish this on Unix?) And, as has already been pointed out, this
approach doesn't allow for global command options before sub-commands,
which pretty much defeats the sub-command approach altogether UI-wise,
I'd say.
Unrelated: I'm still having some technical difficulties with this list,
namely that I don't receive any mail sent to it. (I'm using the Web
archive to keep track of what's happening.) So, for me to be able to
reply to a particular message, it would also have to be sent directly
to me. Sorry for the inconvenience. The problem is already being
worked on.
--
Michael
On Sat, Jul 31, 2021 at 2:58 PM Michael Siegel <msi@malbolge.net> wrote:
> I mean, which shell would I use to accomplish this on Unix?

In the old days, when the first Unix shell wars started, there was a
Unix adage: "Bourne to Program, Type with Joy"

FWIW: tcsh supports TOPS-20 autocomplete -- with a little work with
your search engine, you can figure out how to use its many options.
That said, GNU bash is said to do it also, but I can not say I have
tried it personally, since the ROMs in my fingers were long ago burned
to 'Type with Joy.'

Also, in 50 years, it's not that UNIX is perfect; it has lots of flaws
and quirks. Thinking about them and considering 'better' solutions is
often wise, particularly when capabilities (like Moore's law) give you
new tools to solve them. But a level of wisdom here is that not all of
those quirks are worth repairing. In the case of command-line parsing,
getopt(3) has proven to be 'good enough' for most things. If it was
really as bad as you seem to think, I suspect one of the previous N
attempts over the last 50 years might have taken root.

My point in my previous message was that getopt(3) was created to solve
the original UNIX problem. It did actually take root (I'll not get into
whether the GNU long stuff was an improvement). But there were other
attempts, including the TOPS-20 scheme (which, as has been pointed out,
is quite similar to yours), that have been around for at least 35 years
in the UNIX community and did not catch on. I ask you to think about
whether maybe your value of that feature might be more than others have
set it to be.

As an analog, when I first came to UNIX and C from other systems, there
were ideas like the open/close curly braces instead of BEGIN/END in C,
and there were plenty of things in Ken's original shell that I found
annoying, particularly coming from the regularity of TOPS-20 and the
like. Hey, I used EMACS, TECO and DDT, and none of them were in my new
kit. But I forced myself to learn the new tools and new ways of doing
things. Since I was programming on UNIX in C, I made sure my code
looked like everyone else's [K&R did not yet exist -- but we would
later call this 'White Book C']. Why? So someone else could read it. I
learned that style too, and frankly have a hard time with any C code
that does not follow it today. But if I am writing in a BEGIN/END style
language, I adopt that style. When in Rome and all that.

In time, the wonderful things I could do in the UNIX world way outpaced
what I could do in the old world. In fact, by the time either TECO or
EMACS became available for my use, by then on a Vax, I never switched
off the earlier UNIX tools I had learned. Like I said, I 'Type with
Joy', frankly even if I'm on a Mac, Linux or Windows -- I switch the
shell to be tcsh. Could I learn a new shell? Sure. If I were to switch
today, it would probably be zsh, but my suggestion is to learn the
tools the system has really well. Then keep using them. Adapt to the
style of the system you are using.

Anyway, those are my thoughts from an old guy.
Am Sat, 31 Jul 2021 15:41:17 -0400 schrieb Clem Cole <clemc@ccc.com>: > On Sat, Jul 31, 2021 at 2:58 PM Michael Siegel <msi@malbolge.net> > wrote: > > > I mean, which shell would I use to accomplish this on Unix? > > In the old days, when the first Unix shell wars started, there was a > Unix adage: *"Bourne to Program, Type with Joy"* > FWIW: tcsh supports TOPS-20 autocomplete -- a little work with your > search engine, you can figure out how to use its many options. That > said, the GNU bash is said to do it also, but I can not say I have > tried it personally since the ROMS in my fingers were long ago burned > to 'Type with Joy.' I see. I currently use Bash as my shell most of the time, and I have my doubts about that being a good idea. But I also doubt I would like tcsh any more. I've had a bit of experience with it on FreeBSD once. All I can say is: We didn't get along when we first met, and we haven't met since. The one and only shell I know that is (arguably) both a traditional Unix shell and a huge improvement on the traditional Unix shell is rc, which I have recently begun to use on and off. I can see myself switching to that eventually, even though it lacks some features I've come to depend on. It's definitely non-standard. But I don't care about that very much because I believe it's objectively better, and considerably so. > Also in 50 years, it's so much that UNIX is perfect, it has lots of > flaws and quirks. Thinking about them and considering 'better' > solutions is often wise, particularly when capabilities (like Moore's > law) give you new tools to solve them. But a level of wisdom here is > not all of those quirks are worth repairing. In the case of > command-line parsing, getopt(3) has proven to be 'good enough' for > most things. If it was really as bad as you seem to think, I suspect > one of the previous N attempts over the last 50 years might have > taken root. 
> My point in my previous message was that getopt(3) was created to
> solve the original UNIX problem. It did actually take root (I'll not
> get into whether the GNU long stuff was an improvement). But there
> were other attempts, including the TOPS-20 scheme (which, as has been
> pointed out, is quite similar to yours), that have been around for at
> least 35 years in the UNIX community, and it did not catch on. I ask
> you to consider whether the value you place on that feature might be
> more than others have set it to be.

To me, using getopt/getopts has always felt more like a way to complicate parsing than a way to solve any actual problem. My aim is to get around writing an actual parsing routine based on a half-baked set of rules each time I put together a command-line utility, because that is time-consuming (for no good reason) and error-prone.

I really find the TOPS-20 way of going about this inspiring, though I'd aim for something way more primitive that should indeed be good enough. And I'd want it to stay as close to the POSIX Utility Syntax Guidelines as reasonably possible, because even though these are lacking, I find them a reasonable base to build upon.

Also, experience tells me that merely adapting to what has taken root is quite often not a good idea at all. In fact, the reasons for something good and valuable not taking root might actually turn out to be pretty nasty.

> As an analog, when I first came to UNIX and C from other systems,
> ideas like the open curly brace/close curly brace instead of
> BEGIN/END in C, and plenty of things in Ken's original shell, I found
> annoying, particularly coming from the regularity of TOPS-20 and the
> like. Hey, I used EMACS, TECO and DDT and none of them were in my new
> kit. But I forced myself to learn the new tools and new way of doing
> things. Since I was programming on UNIX in C, I made sure my code
> looked like everyone else's [K&R did not yet exist -- but we would
> later call this 'White Book C'].
> Why? So someone else could read it. I learned that style too and
> frankly have a hard time with any C code that does not follow it
> today. But if I am writing in a BEGIN/END style language, I adopt
> that style. When in Rome and all that.
>
> In time, the wonderful things I could do in the UNIX world way
> outpaced what I could do in the old world. In fact, by the time
> either TECO or EMACS became available for my use (by then on a Vax),
> I never switched off the earlier UNIX tools I had learned. Like I
> said, I 'Type with Joy'; frankly, even if I'm on a Mac, Linux or
> Windows, I switch the shell to be tcsh. Could I learn a new shell?
> Sure. If I were to switch today, it would probably be zsh, but my
> suggestion is to learn the tools that system has really well. Then
> keep using them. Adapt to the style of the system you are using.

As you'll be able to guess by now, I beg to differ. For example, I have forced myself to learn POSIX shell and Bash, even enjoying some of it along the way. Today, I believe that they are both rather terrible things I don't want to spend too much time with. (That said, for my use case, Bash is almost always preferable to the available POSIX sh implementations.)

Then, I have always had a strong dislike for the interface of the Unix `find` command. So, I tried to replace it with what I thought was a (relatively) better solution. That required me to understand `find` on a whole different level. And after gaining a much better understanding of `find` (and losing some of my dislike for it), I still believe it should be replaced, and I have a few ideas on how to do that. (Sadly, I mainly just have ideas.)

So, in a nutshell: I think that adapting to something you still believe to be more than slightly deficient, after giving it a try and trying to understand its logic, is not a reasonable thing to do.

> Anyway, those are my thoughts from an old guy.

They're much appreciated.

--
Michael
On Sat, 31 Jul 2021 11:56:51 -0400, Paul Winalski <paul.winalski@gmail.com> wrote:
> On 7/31/21, Michael Siegel <msi@malbolge.net> wrote:
> >
> > While doing that, I learned that there is a better way to approach
> > this problem – beyond using getopt(s) (which never really made
> > sense to me) and having to write case statements in loops every
> > time: Define a grammar, let a pre-built parser do the work, and
> > have the parser provide the results to the program.
>
> This method for handling command lines dates back at least to the
> 1970s. The COMND JSYS (system call) in TOPS-20 operated this way, as
> does the DCL command line interface in OpenVMS. As you pointed out it
> can greatly simplify the code in the application. It also permits
> command completion. If the command has a long-winded option, such as
> -supercalifragilisticexpialidocious, I can type -super then hit the
> TAB key and as long as there is only one option that starts with
> -super the parser will fill in the rest of the long keyword. It also
> means that you can provide interactive help. At any point the user
> can type a question mark and the command interpreter will say what
> syntactic element is expected next.
Being able to provide interactive help is exactly what the person who
suggested grammar-based parsing to me was working on. I hadn't even
thought about that at first. But given my recent investigation into
built-in command documentation on Unix-like systems, I tend to think
this would be a great enhancement – if it was implemented with a
strict focus on not getting in the way, i.e., the user should be able
to switch it off completely, and search-as-you-type would be opt-in, if
provided at all.
--
Michael
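To make the COMND-style idea above concrete, here is a minimal, hypothetical sketch (not Michael's Ada library, and nowhere near the real COMND JSYS; every name in it is invented for illustration): one declarative option table drives parsing, TAB-style prefix completion, and "?"-style help.

```python
# Toy sketch of grammar/table-driven option handling in the spirit of
# COMND: a single declarative spec drives parsing, prefix completion,
# and "?"-style help. All option names here are invented examples.

SPEC = {  # hypothetical grammar: option -> (takes_argument, help text)
    "--verbose": (False, "print extra diagnostics"),
    "--output": (True, "file to write results to"),
    "--supercalifragilisticexpialidocious": (True, "a long-winded option"),
}

def complete(prefix):
    """Return the unique option starting with `prefix`, or all candidates."""
    hits = [name for name in SPEC if name.startswith(prefix)]
    return hits[0] if len(hits) == 1 else hits

def parse(argv):
    """Parse argv against SPEC; a lone "?" asks what is expected next."""
    result, args = {}, list(argv)
    while args:
        arg = args.pop(0)
        if arg == "?":  # interactive help, COMND-style
            return {name: text for name, (_, text) in SPEC.items()}
        if arg not in SPEC:
            raise SystemExit("unknown option: " + arg)
        takes_arg, _ = SPEC[arg]
        result[arg] = args.pop(0) if takes_arg else True
    return result
```

With this, `complete("--super")` resolves to the full long-winded option, just as Paul describes TAB doing under TOPS-20.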
Michael Siegel writes:
> Being able to provide interactive help is exactly what the person who
> suggested grammar-based parsing to me was working on. I hadn't even
> thought about that at first. But given my recent investigation into
> built-in command documentation on Unix-like systems, I tend to think
> this would be a great enhancement – if it was implemented with a
> strict focus on not getting in the way, i.e., the user should be able
> to switch it off completely, and search-as-you-type would be opt-in, if
> provided at all.
>
>
> --
> Michael
While I agree with you in theory, I'm dubious about how it would work
in practice. Sorry if it sounds like I've lost faith in my profession,
but I'm trying to accept reality.
Where would such documentation come from? Not wanting to reopen old
flame wars, but the fragmentation of the documentation system is
reality: some things are man pages, some are info pages, some are
random HTML files, some are only online, some things have no
documentation at all except maybe a help message, and much of that has
no actual content.
Who's going to write more documentation, and how is it going to be kept
consistent with other documentation? Is your help system going to be
yet another fragment?
Also, we seem to be well into a "Fahrenheit 451" world now where a vast
number of people communicate only by photos and video. Writing seems to
be deprecated. I don't have confidence that any documentation would be
useful even if people wrote it. As a current example, I'm having trouble
with a btrfs filesystem and I can't say that the btrfs-check manual page
contains any useful content.
Maybe, since Microsoft "AI" is now going to write code for -- not sure
what to call them; "programmers" doesn't seem right any more -- maybe
it'll write their documentation too?
I guess what I'm saying is that it sounds like you are having some good
thoughts on a technical solution that I think will fail without also
having a social solution. If you could somehow extract your help info
from man pages without creating a whole new documentation system it
might work in the few cases where there are good manual pages.
Wow I sound grumpy this morning.
Jon
On 7/31/21 12:19 PM, Dan Cross wrote:
> There was someone posting here on TUHS a while back about leveraging a
> special context-sensitive `--shell-help` or similar command line program
> and synthesizing a protocol between the shell and a program to provide
> TOPS-20 like command completion. It was nowhere near what you get from the
> COMND JSYS, but seemed like a reasonable approximation.

This is essentially how the existing shells do it (bash, zsh, tcsh, etc.), but in an ad-hoc fashion. There is no standard way to obtain possible completions or list possible arguments, so the shells push that to external generators.

Since you have to perform the completion in the shell, there has to be some way to tell the shell the possible completions for each command of interest, whether that's options or arguments. The different shells have solved that in essentially the same way, with a few syntactic variations.

Bash provides a framework (complete/compgen/compctl) and pushes a lot of the command-specific work to external completers. It provides access to the shell internals (lists of builtins, functions, aliases, variables, and so on) and built-in ways to perform common completions (filenames, directory names, command names, etc.), and leaves the rest to external commands or shell functions.

The real power and flexibility comes from being able to invoke these external commands or shell functions to generate lists of possible completions, and defining an API between the shell and those generators to specify enough of the command line to make it easy to find the word to be completed, the command for which completion is being attempted, and clarifying context around that word. In the same way, the shell provides an API for those generators to return possible completions.

The knowledge about each command's options and arguments is embedded in these generators.
A standard way to handle command line options and arguments would make generators easier to write, but doesn't address the other issues of what, exactly, the user wants to complete, so the existing mechanisms would likely not change very much. Something like `--shell-help', as long as it were context-sensitive, would help more. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRU chet@case.edu http://tiswww.cwru.edu/~chet/
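The generator protocol Chet describes can be sketched in miniature. This is an illustration only: `--shell-help` is the hypothetical flag from the thread, and the one-candidate-per-line reply format is an assumption, not any shell's actual API.

```python
# Miniature sketch of the hypothetical generator protocol: the tool
# itself answers "what could come next?", and the shell consumes a
# simple one-candidate-per-line reply. Flag and format are invented
# for illustration, not an existing standard.

def shell_help(options, words_so_far):
    """Tool side: candidates for the last (possibly partial) word."""
    prefix = words_so_far[-1] if words_so_far else ""
    return [name for name in sorted(options) if name.startswith(prefix)]

def generator_reply(options, words_so_far):
    """What the tool would print when invoked with --shell-help."""
    return "\n".join(shell_help(options, words_so_far)) + "\n"

def parse_generator_output(text):
    """Shell side: split the reply back into candidates (cf. COMPREPLY)."""
    return [line for line in text.splitlines() if line]
```

A standard format like this is exactly the piece that would make generators easier to write; the shell-side mechanics (word splitting, match insertion) would stay as they are.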
On 7/31/21 2:56 PM, Michael Siegel wrote:
> On Sat, 31 Jul 2021 10:30:18 -0700, Anthony Martin <ality@pbrane.org> wrote:
>
>> Michael Siegel <msi@malbolge.net> once said:
>>> So, I've prepared a bit of a write-up, pondering on the pros and
>>> cons of two different ways of having task-specific tool sets
>>> (non-hierarchical command sets vs. sub-commands) that is available
>>> at
>>>
>>> https://www.msiism.org/files/doc/unix-like_command-line_interfaces.html
>>>
>>> I tend to think the sub-command approach is better. But I'm neither
>>> a UI nor a Unix expert and have no formal training in computer
>>> things. So, I thought this would be a good place to ask for comment
>>> (and get some historical perspective).
>>
>> You're missing the approach taken in Plan 9 (and
>> 10th edition Unix): put related commands in a
>> directory and use a shell that doesn't restrict
>> the first argument of a command to a single path
>> element.
>>
>> This lets you execute commands like:
>>
>> auth/as
>> disk/prep
>> git/rebase
>> ip/ping
>> ndb/dns
>> upas/send
>>
>> without having a prefix on every command name or
>> single large binaries with every command linked
>> in as subcommands.
>
> Thanks for pointing this out. I had no idea.
>
> Unfortunately(?), I'm looking to make my life easier on more "Unix-like
> Unix-like systems" (for want of a better term), for the time being
> (Linux, BSD, maybe illumos). (I mean, which shell would I use to
> accomplish this on Unix?)

POSIX forbids this behavior, FWIW, so you'll probably have a hard time finding a shell -- at least one with POSIX aspirations -- that implements it.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU chet@case.edu http://tiswww.cwru.edu/~chet/
> >> This lets you execute commands like:
> >>
> >> auth/as
> ...
> POSIX forbids this behavior, FWIW, so you'll probably have a hard time
> finding a shell -- at least one with POSIX aspirations -- that implements
> it.

So you write a function or alias that prepends the full path to "as" and exec's the command. So you have to type "auth as ..." but BFD.
On Sun, Aug 1, 2021 at 1:44 PM Chet Ramey <chet.ramey@case.edu> wrote:

> On 7/31/21 12:19 PM, Dan Cross wrote:
> > There was someone posting here on TUHS a while back about leveraging a
> > special context-sensitive `--shell-help` or similar command line program
> > and synthesizing a protocol between the shell and a program to provide
> > TOPS-20 like command completion. It was nowhere near what you get from the
> > COMND JSYS, but seemed like a reasonable approximation.
>
> This is essentially how the existing shells do it (bash, zsh, tcsh, etc.),
> but in an ad-hoc fashion. There is no standard way to obtain possible
> completions or list possible arguments, so the shells push that to external
> generators.
>
> Since you have to perform the completion in the shell, there has to be some
> way to tell the shell the possible completions for each command of
> interest, whether that's options or arguments. The different shells have
> solved that in essentially the same way, with a few syntactic variations.
>
> Bash provides a framework (complete/compgen/compctl) and pushes a lot of
> the command-specific work to external completers. It provides access to the
> shell internals (lists of builtins, functions, aliases, variables, and so
> on) and built-in ways to perform common completions (filenames, directory
> names, command names, etc.), and leaves the rest to external commands or
> shell functions.
>
> The real power and flexibility comes from being able to invoke these
> external commands or shell functions to generate lists of possible
> completions, and defining an API between the shell and those generators to
> specify enough of the command line to make it easy to find the word to be
> completed, the command for which completion is being attempted, and
> clarifying context around that word. In the same way, the shell provides an
> API for those generators to return possible completions.
> The knowledge about each command's options and arguments is embedded in
> these generators.
>
> A standard way to handle command line options and arguments would make
> generators easier to write, but doesn't address the other issues of what,
> exactly, the user wants to complete, so the existing mechanisms would
> likely not change very much. Something like `--shell-help', as long as it
> were context-sensitive, would help more.

Thanks for the useful background information on existing solutions.

If I understood the proposal correctly, it was that the program in question would, itself, be the generator as described above. Perhaps it was coupled with a standard structured format for consumption by the shell, which seems like it would be useful for this sort of expansion.

Of course, the process model in TOPS-20 was very different from Unix's, and in that system, as soon as you typed the _name_ of a command, its image was "run up" in your process. So the interactive help system was provided by a running instance of the program itself. What I gathered from the proposed model was that it involved multiple invocations of the program, but with a special option that would trigger behavior informally described as, "here's the context I've built so far; let me know what options are available here." I don't know that it's terribly "Unixy", but I can see how it would be useful for interactive use.

As an aside, I maintain some older "machines" at home (even modest hardware can emulate a PDP-10 or Honeywell DPS8), and find that doing so provides me with perspective that can be very useful. Looking at other systems that were available roughly around the time of Unix (TENEX, Multics), it strikes me that Unix was a bit of an odd duck in the way it handled exec, destructively overlaying the memory of the user portion of a process with a new image; am I wrong here?
I wonder why the "one program per process and exec destroys what was running before" mechanism was implemented? I can imagine it had a lot to do with the constraints that early Unix machines must have imposed on design, not to mention implementation simplicity, but I wonder what the designers thought of other systems' process models and whether they were considered at all? Perhaps Doug and Ken might have thoughts here?

- Dan C.
On 8/1/21 5:53 PM, Dan Cross wrote:
> Thanks for the useful background information on existing solutions.
>
> If I understood the proposal correctly, it was that the program in question
> would, itself, be the generator as described above. Perhaps it was coupled
> with a standard structured format for consumption by the shell, which seems
> like it would be useful for this sort of expansion.

Yes, it would make writing generators easier. The rest of the process would change very little: determining the word to complete, determining the command name, breaking the edit line into words for the generator, invoking the generator through the appropriate mechanism, parsing the results, and processing the matches. From the shell's perspective, it's a minor change.

> Of course, the process model in TOPS-20 was very different than in Unix,
> and in that system, as soon as you typed the _name_ of a command, its image
> was "run up" in your process. So the interactive help system was provided
> by a running instance of the program itself. What I gathered from the
> proposed model was that it involved multiple invocations of the program,
> but with a special option that would trigger behavior informally described
> as, "here's the context I've built so far; let me know what options are
> available here." I don't know that it's terribly "Unixy", but I can see how
> it would be useful for interactive use.

Yes. None of this is very "Unixy", but people have gotten used to being able to use capabilities like completion. When you're running interactively, running additional processes when you're performing word completion isn't particularly expensive.

Again from the shell's perspective, invoking one generator that executes a program with `--shell-help' isn't that much different or more expensive -- and simpler in some ways, because you don't have to save any incremental parsing state -- than executing a shell function that runs several processes, mostly command substitutions.
Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRU chet@case.edu http://tiswww.cwru.edu/~chet/
On 8/1/21 3:23 PM, Richard Salz wrote:
> >> This lets you execute commands like:
> >>
> >> auth/as
> > ...
> > POSIX forbids this behavior, FWIW, so you'll probably have a hard time
> > finding a shell -- at least one with POSIX aspirations -- that
> > implements it.
>
> So you write a function or alias that prepends the full path to "as" and
> exec's the command. So you have to type "auth as ..." but BFD.

Sure. If you invest effort in building a solution, you can do just about anything. If you want, you can write a function that generates the set of aliases you use to do this. The thing is, you're going to have to build it -- you can't expect to find a shell that does a $PATH search for a pathname containing a slash.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU chet@case.edu http://tiswww.cwru.edu/~chet/
On Sun, Aug 1, 2021 at 5:55 PM Dan Cross <crossd@gmail.com> wrote:

> Looking at other systems that were available roughly around the time of
> Unix (TENEX, Multics), it strikes me that the Unix was a bit of an odd-duck
> with the way it handled exec in terms of destructively overlaying the
> memory of the user portion of a process with a new image; am I wrong here?

See dmr's paper at <https://www.bell-labs.com/usr/dmr/www/hist.html> for details, but in short, exec and its equivalents elsewhere have always overlaid the running program with another program.

Early versions of PDP-7 Unix used the same process model as Tenex: one process per terminal, which alternated between running the shell and a user program. So exec() loaded the user program on top of the shell. Indeed, this wasn't even a syscall; the shell itself wrote a tiny program loader into the top of memory that read the new program, which was open for reading, and jumped to it. Likewise, exit() was a specialized exec() that reloaded the shell. The Tenex and Multics shells had more memory to play with and didn't have to use these self-overlaying tricks[*]: they loaded your program into available memory and called it as a subroutine, which accounts for the name "shell".

So it was the introduction of fork(), which came from the Berkeley Genie OS, that made the current process control regime possible. In those days, fork() wrote the current process out to the swapping disk and set up the process table with a new entry. For efficiency, the in-memory version became the child and the swapped-out version became the parent. Instantly the shell was able to run background processes by just not waiting for them, and pipelines (once the syntax was invented) could be handled with N - 1 processes in an N-stage pipeline. Huge new powers landed on the user's head.

Nowadays it's a question whether fork() makes sense any more. "A fork() in the road" [Baumann et al.
2019] <https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf> is an interesting argument against fork():

* It doesn't compose.
* It is insecure by default.
* It is slow (there are about 25 properties a process has in addition to its memory and hardware state, and each of these needs to be copied or not), even using COW (which is itself a Good Thing and can and should be provided separately).
* It is incompatible with a single-address-space design.

In short, spawn() beats fork() like a drum, and fork() should be deprecated. To be sure, the paper comes out of Microsoft Research, but I find it pretty compelling anyway.

[*] My very favorite self-overlaying program was the PDP-8 bootstrap for the DF32 disk drive. You toggled in two instructions at locations 30 and 31, meaning "load disk registers and go" and "jump to self" respectively, hit the Clear key on the front panel, which cleared all registers, and started up at 30.

The first instruction told the disk to start reading sector 0 of the disk into location 0 in memory (because all the registers were 0, including the disk instruction register, where 0 = READ), and the second instruction kept the CPU busy waiting. As the sector loaded, the two instructions were overwritten by "skip if disk ready" and "jump to previous address", which would wait until the whole sector was loaded. Then the OS could be loaded using the primitive disk driver in block 0.
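The two process-creation styles under debate can be put side by side in a minimal sketch (my own illustration, not from the paper or the thread; assumes a POSIX system and Python 3.9+, which exposes posix_spawnp and waitstatus_to_exitcode directly):

```python
# Classic fork()+exec() versus a one-shot spawn(), side by side.
# POSIX only; requires Python >= 3.9 for os.waitstatus_to_exitcode.

import os

def run_fork_exec(argv):
    """Classic Unix: duplicate this process, then overlay the child."""
    pid = os.fork()
    if pid == 0:  # child: replace our image entirely
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec failed
    _, status = os.waitpid(pid, 0)  # parent: wait for the child
    return os.waitstatus_to_exitcode(status)

def run_spawn(argv):
    """spawn(): create-and-start in one call, no intermediate copy."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Both callers see the same result; the difference the paper argues about is everything that happens (or needn't happen) between the two lines of each function.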
On Sun, Aug 01, 2021 at 07:36:53PM -0400, John Cowan wrote:
> Nowadays it's a question whether fork() makes sense any more. "A fork()
> in the road" [Baumann et al. 2019] <
> https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf>
> is an interesting argument against fork():
>
> * It doesn't compose.
> * It is insecure by default.
> * It is slow (there are about 25 properties a process has in addition to
> its memory and hardware state, and each of these needs to be copied or not)
> even using COW (which is itself a Good Thing and can and should be provided
> separately)
> * It is incompatible with a single-address-space design.
>
> In short, spawn() beats fork() like a drum, and fork() should be
> deprecated. To be sure, the paper comes out of Microsoft Research, but I
> find it pretty compelling anyway.
When we were working on supporting BitKeeper on Windows, MacOS, all the
various Unix versions, and Linux, we implemented all the needed libc
stuff on Windows (so we could pretend we were not running on Windows).
Everything except fork(), we made a spawnvp() interface. That's the
one thing that made more sense than the Unix way. I have called fork()
directly in decades.
On Sun, Aug 1, 2021 at 7:37 PM John Cowan <cowan@ccil.org> wrote:

> On Sun, Aug 1, 2021 at 5:55 PM Dan Cross <crossd@gmail.com> wrote:
>
>> Looking at other systems that were available roughly around the time of
>> Unix (TENEX, Multics), it strikes me that the Unix was a bit of an odd-duck
>> with the way it handled exec in terms of destructively overlaying the
>> memory of the user portion of a process with a new image; am I wrong here?
>
> See dmr's paper at <https://www.bell-labs.com/usr/dmr/www/hist.html> for
> details, but in short exec and its equivalents elsewhere have always
> overlaid the running program with another program.

That's a great paper and I've really enjoyed revisiting it over the years, but while it does a great job of explaining how the Unix mechanism worked, and touches on the "why", it doesn't contrast with other schemes. I suppose my question could be rephrased as: if the early Unix implementers had had more resources to work with, would they have chosen a model more along the lines used by Multics and Twenex, or would they have elected to do basically what they did? That's probably impossible to answer, but it gets at what they thought about how other systems operated.

> Early versions of PDP-7 Unix used the same process model as Tenex: one
> process per terminal which alternated between running the shell and a user
> program. So exec() loaded the user program on top of the shell. Indeed,
> this wasn't even a syscall; the shell itself wrote a tiny program loader
> into the top of memory that read the new program, which was open for
> reading, and jumped to it. Likewise, exit() was a specialized exec() that
> reloaded the shell. The Tenex and Multics shells had more memory to play
> with and didn't have to use these self-overlaying tricks[*]: they loaded
> your program into available memory and called it as a subroutine, which
> accounts for the name "shell".
Presumably the virtual memory hardware could also be used to protect the shell from a malicious or errant program trashing the image of the shell in memory.

[snip]

> Nowadays it's a question whether fork() makes sense any more. "A fork()
> in the road" [Baumann et al. 2019]
> <https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf>
> is an interesting argument against fork():
>
> * It doesn't compose.
> * It is insecure by default.
> * It is slow (there are about 25 properties a process has in addition to
> its memory and hardware state, and each of these needs to be copied or not)
> even using COW (which is itself a Good Thing and can and should be provided
> separately)
> * It is incompatible with a single-address-space design.
>
> In short, spawn() beats fork() like a drum, and fork() should be
> deprecated. To be sure, the paper comes out of Microsoft Research, but I
> find it pretty compelling anyway.

Spawn vs fork/exec is a false dichotomy, though. We talked about the fork paper when it came out, and here's what I wrote about it at the time: https://minnie.tuhs.org/pipermail/tuhs/2019-April/017700.html

> [*] My very favorite self-overlaying program was the PDP-8 bootstrap for
> the DF32 disk drive. You toggled in two instructions at locations 30 and
> 31 meaning "load disk registers and go" and "jump to self" respectively,
> hit the Clear key on the front panel, which cleared all registers, and
> started up at 30.
>
> The first instruction told the disk to start reading sector 0 of the disk
> into location 0 in memory (because all the registers were 0, including the
> disk instruction register where 0 = READ) and the second instruction kept
> the CPU busy waiting. As the sector loaded, the two instructions were
> overwritten by "skip if disk ready" and "jump to previous address", which
> would wait until the whole sector was loaded. Then the OS could be loaded
> using the primitive disk driver in block 0.
Very nice; that's highly reminiscent of a Sergeant-style Forth: https://pygmy.utoh.org/3ins4th.html

One wonders if the PDP-8 was one of Sergeant's inspirations?

- Dan C.
On 8/1/21, John Cowan <cowan@ccil.org> wrote:
>
> Nowadays it's a question whether fork() makes sense any more. "A fork()
> in the road" [Baumann et al. 2019] <
> https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf>
> is an interesting argument against fork():
>
> * It doesn't compose.
> * It is insecure by default.
> * It is slow (there are about 25 properties a process has in addition to
> its memory and hardware state, and each of these needs to be copied or not)
> even using COW (which is itself a Good Thing and can and should be provided
> separately)
> * It is incompatible with a single-address-space design.
>
> In short, spawn() beats fork() like a drum, and fork() should be
> deprecated. To be sure, the paper comes out of Microsoft Research, but I
> find it pretty compelling anyway.
>
There's a third kind of primitive that is superior to either spawn()
or fork() IMO, specifically one that creates a completely empty child
process and returns a context that lets the parent set up the child's
state using normal APIs. To start the child the parent would either
call exec() to start the child running a different program, or call a
new function that starts the child with a parent-provided entry point
and whatever memory mappings the parent set up. Both fork() and
spawn() could be implemented on top of this easily enough with
basically no additional overhead compared to implementing both as
primitives. This is what I plan to do on the OS I'm writing
(manipulating the child's state won't require any additional
primitives beyond regular file I/O since literally all process state
will have a file-based interface).
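The layering Andrew describes can be shown with a toy model (plain user-space Python, emphatically not OS code; all names are invented for illustration): a single empty-child primitive, a start primitive, and both spawn() and fork() built on top of them.

```python
# Toy model of the "empty child" primitive: the parent assembles the
# child's state with ordinary operations, then starts it. fork() and
# spawn() are thin layers over the same two primitives. This is an
# illustration of the idea, not an implementation of any real OS.

import copy

class Process:
    def __init__(self):
        self.memory = {}    # stand-in for an address space
        self.files = {}     # stand-in for a descriptor table
        self.entry = None   # program entry point (a callable here)
        self.running = False

def create_empty_process():
    """The one creation primitive: a child with no inherited state."""
    return Process()

def start(child, entry):
    """The one start primitive: set the entry point and let it run."""
    child.entry = entry
    child.running = True
    return child

def spawn(entry, files=None):
    """spawn(): populate only what the caller asks for, then start."""
    child = create_empty_process()
    child.files = dict(files or {})
    return start(child, entry)

def fork(parent, entry):
    """fork(): copy the parent's whole state into the child, then start."""
    child = create_empty_process()
    child.memory = copy.deepcopy(parent.memory)
    child.files = dict(parent.files)
    return start(child, entry)
```

The point of the layering is visible in the two wrappers: fork() pays for a full copy of the parent, spawn() builds only what it needs, and neither requires anything from the kernel beyond the two primitives.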
On Sun, Aug 1, 2021 at 8:13 PM Andrew Warkentin <andreww591@gmail.com> wrote:

> To start the child the parent would either
> call exec() to start the child running a different program, or call a
> new function that starts the child with a parent-provided entry point
> and whatever memory mappings the parent set up.
>
> This is what I plan to do on the OS I'm writing
> (manipulating the child's state won't require any additional
> primitives beyond regular file I/O since literally all process state
> will have a file-based interface).

In that case you don't need *any* primitive except create_empty_process(): you can do exec() by opening the file, writing to /proc/<child>/mem and then to /proc/<child>/pc-and-go.
On Sun, Aug 01, 2021 at 04:49:50PM -0700, Larry McVoy wrote:
> On Sun, Aug 01, 2021 at 07:36:53PM -0400, John Cowan wrote:
> > Nowadays it's a question whether fork() makes sense any more. "A fork()
> > in the road" [Baumann et al. 2019] <
> > https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf>
> > is an interesting argument against fork():
> >
> > * It doesn't compose.
> > * It is insecure by default.
> > * It is slow (there are about 25 properties a process has in addition to
> > its memory and hardware state, and each of these needs to be copied or not)
> > even using COW (which is itself a Good Thing and can and should be provided
> > separately)
> > * It is incompatible with a single-address-space design.
> >
> > In short, spawn() beats fork() like a drum, and fork() should be
> > deprecated. To be sure, the paper comes out of Microsoft Research, but I
> > find it pretty compelling anyway.
>
> When we were working on supporting BitKeeper on Windows, MacOS, all the
> various Unix versions, and Linux, we implemented all the needed libc
> stuff on Windows (so we could pretend we were not running on Windows).
> Everything except fork(), we made a spawnvp() interface. That's the
> one thing that made more sense than the Unix way. I have called fork()
> directly in decades.

s/have/have not/ called fork().... Sigh.

--
--- Larry McVoy lm at mcvoy.com http://www.mcvoy.com/lm
On Sun, 1 Aug 2021, Dan Cross wrote:
> Spawn vs fork/exec is a false dichotomy, though. We talked about the fork
> paper when it came out, and here's what I wrote about it at the time:
> https://minnie.tuhs.org/pipermail/tuhs/2019-April/017700.html
I've often wished I could run a free/open Bourne-type shell on 16-bit
MS-DOS and OS/2. Porting to the former is next to impossible because of
the lack of *any* concept of multitasking. Porting to the latter is
difficult because multitasking isn't done anything like the Unix way.
I actually like the spawn* functions better, though I think on Unix
fork/exec is the most natural way to implement them.
-uso.
On 8/1/21, John Cowan <cowan@ccil.org> wrote:
>
> In that case you don't need *any* primitive except create_empty_process():
> you can do exec() by opening the file, writing to /proc/<child>/mem and
> then to /proc/<child>/pc-and-go.
>
Yes, although that would break if the permissions for the program are
execute-only (which admittedly is of limited security value in most
cases).
On Sun, Aug 1, 2021 at 8:19 PM John Cowan <cowan@ccil.org> wrote:
> On Sun, Aug 1, 2021 at 8:13 PM Andrew Warkentin <andreww591@gmail.com>
> wrote:
>> To start the child the parent would either call exec() to start the
>> child running a different program, or call a new function that starts
>> the child with a parent-provided entry point and whatever memory
>> mappings the parent set up.
>>
>> This is what I plan to do on the OS I'm writing (manipulating the
>> child's state won't require any additional primitives beyond regular
>> file I/O since literally all process state will have a file-based
>> interface).
>
> In that case you don't need *any* primitive except create_empty_process():
> you can do exec() by opening the file, writing to /proc/<child>/mem and
> then to /proc/<child>/pc-and-go.

Sadly, that's not _quite_ true. You still need some primordial way to get
the system going. Once you have a proc_create and a make_proc_runnable
system call, it seems like it opens the door to doing all kinds of cool
things like moving binary parsers out of the kernel and into user space,
but consider how `init` gets bootstrapped: often, there's a handful of
instructions that basically invoke `execl("/bin/init", "init", 0);` that
you compile into the kernel; on creation of process 1, the kernel copies
those instructions into a page somewhere in the user portion of the
address space and "returns" to it; the process then invokes /bin/init,
which carries on with bringing up the rest of the system.

Now you're confronted with two choices: you either put a much more
elaborate bootstrap into the kernel (in this day and age, probably not
that hard), or you have a minimal bootstrap that's smart enough to load a
smarter bootstrap that in turn can load something like init. I suppose a
third option is to compile `init` in some dead simple way that you can
load in the kernel as a special case, and invoke that.
This problem isn't insurmountable, but it's a wide design space, and it's
not quite as straightforward as it first appears. As I mentioned in the
email I linked to earlier, Akaros implemented the proc_create/proc_run
model. It really was superior to fork()/exec(), and I would argue superior
to spawn() as well.

- Dan C.
On Sun, Aug 01, 2021 at 06:13:18PM -0600, Andrew Warkentin wrote:
> There's a third kind of primitive that is superior to either spawn()
> or fork() IMO, specifically one that creates a completely empty child
> process and returns a context that lets the parent set up the child's
> state using normal APIs.

I've seen this argument a number of times, but what's never been clear
to me is what *would* the "normal APIs" be which would allow a parent
to set up the child's state? How would that be accomplished? Lots of
new system calls? Magic files in /proc/<pid>/XXX which get manipulated
somehow? (How, exactly, does one affect the child's memory map via
magic read/write calls to /proc/<pid>/XXX? How about environment
variables, etc.?)

And what are the access rights by which a process gets to reach out and
touch another process's environment? Is it allowed only for child
processes? And is it allowed only before the child starts running?
What if the child process is going to be running a setuid or setgid
executable?

The phrase "all process state will have a file-based interface" sounds
good on paper, but I think it remains to be seen how well an "echo XXX >
/proc/<pid>/magic-file" API would actually work. The devil is really in
the details....

- Ted
On 8/1/21, Theodore Ts'o <tytso@mit.edu> wrote:
> I've seen this argument a number of times, but what's never been clear
> to me is what *would* the "normal APIs" be which would allow a parent
> to set up the child's state? How would that be accomplished? Lots of
> new system calls? Magic files in /proc/<pid>/XXX which get
> manipulated somehow? (How, exactly, does one affect the child's
> memory map via magic read/write calls to /proc/<pid>/XXX.... How
> about environment variables, etc.)

My OS will be microkernel-based and even the RPC channel to the VFS
itself will be a file (with some special semantics). read(), write()
and seek() will bypass the VFS entirely and call the kernel to directly
communicate with the destination process. The call to create an empty
process will return a new RPC channel, and there will be an API to
temporarily switch to an alternate channel so that VFS calls occur in
the child context instead of the parent.

All process memory, even the heap and stack, will be implemented as
memory-mapped files in a per-process filesystem under /proc/<pid>. This
will be a special "shadowfs" that allows creating files that shadow
ranges of other files (either on disk or in memory). Environment
variables will also be exposed in /proc, of course.

> And what are the access rights by which a process gets to reach out
> and touch another process's environment? Is it only allowed for
> child processes? And is it only allowed before the child starts
> running? What if the child process is going to be running a setuid or
> setgid executable?

Any process that has permissions to access the RPC channel file and
memory mapping shadow files in /proc/<pid> will be able to manipulate
the state. The RPC channel will cease to function after the child has
been started. setuid and setgid executables will not be supported at
all (there will instead be a role-based access control system layered
on top of a per-process file permission list, which will allow
privilege escalation on exec in certain situations defined by
configuration).

> The phrase "all process state will have a file-based interface" sounds
> good on paper, but I think it remains to be seen how well a "echo XXX
> > /proc/<pid>/magic-file" API would actually work. The devil is
> really in the details....

Even though everything will use a file-based implementation underneath,
there will be a utility library layered on top of it so that user code
doesn't have to contain lots of open()-read()/write()-close()
boilerplate.
On Aug 1, 2021, at 6:05 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> And what are the access rights by which a process gets to reach out
> and touch another process's environment? Is it only allowed for
> child processes? And is it only allowed before the child starts
> running? What if the child process is going to be running a setuid or
> setgid executable?

From the "KeyKOS Nanokernel Architecture" (1992) paper:

----
KeyKOS processes are created by building a segment that will become the
program address space, obtaining a fresh domain, and inserting the
segment key in the domain's address slot. The domain is created in the
waiting state, which means that it is waiting for a message. A threads
paradigm can be supported by having two or more domains share a common
address space segment.

Because domain initialization is such a common operation, KeyKOS
provides a mechanism to generate "prepackaged" domains. A factory is an
entity that constructs other domains. Every factory creates a
particular type of domain. For example, the queue factory creates
domains that provide queuing services. An important aspect of factories
is the ability of the client to determine their trustworthiness. It is
possible for a client to determine whether an object created by a
factory is secure.
----

This paper also talks about their attempt to emulate Unix on top.
http://css.csail.mit.edu/6.858/2009/readings/keykos.pdf
On Sat, Jul 31, 2021 at 1:08 PM Ron Young <rly1@embarqmail.com> wrote:
> FYI, There is also a unix version of the COMND JSYS capability. It was
> developed at Columbia University as part of their "mm" mail manager. It
> is located in to the ccmd subdirectory in the mm.tar.gz file.
>
> url: https://www.kermitproject.org/mm/
>
> ftp://ftp.kermitproject.org/kermit/mm/mm.tar.gz
>
I hope there are copies of these files. Looks like that FTP server has
some major corruption going on. The MM files are inaccessible, as are
others.
Jim
> spawn() beats fork()[;] fork() should be deprecated
Spawn is a further complication of exec, which tells what signals and
file descriptors to inherit in addition to what arguments and
environment variables to pass.
Fork has a place. For example, Program 1 in
www.cs.dartmouth.edu/~doug/sieve/sieve.pdf forks like crazy and never
execs. To use spawn, the program would have to be split in three (or
be passed a switch setting).
While you may dismiss Program 1 as merely a neat demo, the same idea
applies in parallelizing code for use in a multiprocessor world.
Doug
On Sun, Aug 01, 2021 at 10:42:53PM -0400, Douglas McIlroy wrote:
> > spawn() beats fork()[;] fork() should be deprecated
>
> Spawn is a further complication of exec, which tells what signals and
> file descriptors to inherit in addition to what arguments and
> environment variables to pass.
>
> Fork has a place. For example, Program 1 in
> www.cs.dartmouth.edu/~doug/sieve/sieve.pdf forks like crazy and never
> execs. To use spawn, the program would have to be split in three (or
> be passed a switch setting).
>
> While you may dismiss Program 1 as merely a neat demo, the same idea
> applies in parallelizing code for use in a multiprocessor world.
It's certainly clear that some kind of primitive is needed to create
new threads. An open question is whether, given some kind of "new
thread" primitive plus either spawn(2) or some kind of "create a
child process and then frob like crazy using 'echo XXX >
/proc/<pid>/<magic files>'" facility, there is still a need for a
fork(2) system call.
Obviously, as soon as we start going down this path, we've deviated
quite strongly from the "radical simplicity" of Unix Version 7 that
people have accused modern systems (whether they be Linux or FreeBSD)
of lacking. It's rather interesting that we haven't heard the complaints
we've seen on other threads, about how people who dare to try to come up
with new APIs being somehow traitors to "The Unix Philosophy". :-)
- Ted
John Cowan wrote:
> Andrew Warkentin wrote:
> > There's a third kind of primitive that is superior to either spawn()
> > or fork() IMO, specifically one that creates a completely empty
> > child process and returns a context that lets the parent set up the
> > child's state using normal APIs.
> In that case you don't need *any* primitive except create_empty_process():
> you can do exec() by opening the file, writing to /proc/<child>/mem
That's almost exactly what ITS does. You open the USR: device and
get a file descriptor (not really, but close enough) into the child
process (inferior job).
John Cowan wrote:
> Early versions of PDP-7 [Unix] used the same process model as Tenex
I understand both Tenex and Unix got the concept of "fork" from Project
Genie.
It's a measure of Unix having been wounded by its own success.

fork() is a great model for a single-threaded text processing pipeline
to do automated typesetting. (More generally, anything that is a
straightforward composition of filter/transform stages.) Which is,
y'know, what Unix is *for*.

It's not so great for a responsive GUI in front of a multi-function
interactive program.

These days, the vast majority of Unix applications are "stuff people
play with on their phones."

Adam
On Mon, Aug 2, 2021 at 12:16 PM Adam Thornton <athornton@gmail.com> wrote:
> It's a measure of Unix having been wounded by its own success.
>
> fork() is a great model for a single-threaded text processing pipeline
> to do automated typesetting. (More generally, anything that is a
> straightforward composition of filter/transform stages.) Which is,
> y'know, what Unix is *for*.

fork() dates from a time when demand paging wasn't a thing. Processes
were as cheap as it got. There were no threads. All the different
variations on a theme on fork() since then have aimed either to make
threads super cheap to create, to optimize the exec case (which has
already been discussed a bit :), and/or to control what the new process
inherits.

> It's not so great for a responsive GUI in front of a multi-function
> interactive program.
>
> These days, the vast majority of Unix applications are "stuff people
> play with on their phones."

Ah, a thread-heavy environment that's not all that exec intensive (but
that's complicated enough you can no longer safely do a naive fork/exec
when you need to)... But mostly, it's threads.

Warner
On Mon, Aug 2, 2021 at 1:37 PM Lars Brinkhoff <lars@nocrew.org> wrote:
> John Cowan wrote:
> > Early versions of PDP-7 [Unix] used the same process model as Tenex
>
> I understand both Tenex and Unix got the concept of "fork" from Project
> Genie.

Should be required reading of all intro-to-OS students:

Programming Semantics for Multiprogrammed Computations
Jack B. Dennis and Earl C. Van Horn
Massachusetts Institute of Technology, Cambridge, Massachusetts
Volume 9 / Number 3 / March, 1966

[If your Internet search fails you and/or you find it behind an ACM
paywall or the like, drop me a line, I'll forward a PDF of a scan.]
On Mon, Aug 2, 2021 at 2:16 PM Adam Thornton <athornton@gmail.com> wrote:
> fork() is a great model for a single-threaded text processing pipeline
> to do automated typesetting. (More generally, anything that is a
> straightforward composition of filter/transform stages.) Which is,
> y'know, what Unix is *for*.

Indeed. But it's also a very good model for "baking" web pages in the
background so that you can serve them up with a plain dumb web server,
maybe with a bit of JS to provide some auto-updating, especially if the
source data is stored not in a database but in the file system. The
result is a page that displays (modulo network latency) as fast as you
can hit the Enter key in the address bar.

(The weak point is the lack of dependency management when the system is
too big to rebake all the pages each time. Perhaps make(1), which Alex
Shinn described as "a beautiful little Prolog for the file system", is
the Right Thing.)
On Mon, Aug 2, 2021 at 2:53 PM Clem Cole <clemc@ccc.com> wrote:
> Should be required reading of all intro-to-OS students:
>
> Programming Semantics for Multiprogrammed Computations
> Jack B. Dennis and Earl C. Van Horn

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.6859&rep=rep1&type=pdf
works fine, no paywall.
John Cowan writes:
>
> > fork() is a great model for a single-threaded text processing pipeline to
> > do automated typesetting. (More generally, anything that is a
> > straightforward composition of filter/transform stages.) Which is, y'know,
> > what Unix is *for*.
> >
>
> Indeed. But it's also a very good model for "baking" web pages in the
> background so that you can serve them up with a plain dumb web server,
> maybe with a bit of JS to provide some auto-updating, especially if the
> source data is stored not in a database but in the file system. The result
> is a page that displays (modulo network latency) as fast as you can hit the
> Enter key in the address bar.
>
> (The weak point is the lack of dependency management when the system is too
> big to rebake all the pages each time. Perhaps make(1), which Alex Shinn
> described as "a beautiful little Prolog for the file system", is the Right
> Thing.)
We have, of course, had similar discussions many times on this list.
I think that the root issue is the false equivalence of "I don't
understand this well enough to be able to use it effectively to solve
my problem" with "it's broken/obsolete/dated".
On 8/2/21 1:59 PM, John Cowan wrote:
> I understand both Tenex and Unix got the concept of "fork" from Project
> Genie.
Original UCB source material on Genie can be found at bitsavers.
Sorry, bad cut/paste -- that was from CACM, if it was not obvious.
On Mon, Aug 2, 2021 at 4:59 PM John Cowan <cowan@ccil.org> wrote:
> https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.6859&rep=rep1&type=pdf
> works fine, no paywall.

excellent
On Mon, Aug 2, 2021 at 5:19 PM Jon Steinhart <jon@fourwinds.com> wrote:
> We have, of course, had similar discussions many times on this list.
> I think that the root issue is the false equivalence of "I don't
> understand this well enough to be able to use it effectively to solve
> my problem" with "it's broken/obsolete/dated".

That's a bit unfair. One can understand something and see value in it
and still appreciate its limitations.

Fork has served us well for more than five decades; I've got no
argument with that. However, should we never question whether it
continues to be the right, or best, abstraction as the environment
around it continues to evolve?

- Dan C.
Dan Cross writes:
> That's a bit unfair. One can understand something and see value in it and
> still appreciate its limitations.
>
> Fork has served us well for more than five decades; I've got no argument
> with that. However, should we never question whether it continues to be the
> right, or best, abstraction as the environment around it continues to
> evolve?
Oh, sorry, wasn't meaning to be categorical there. Main reason that it came
to mind was John's web example; many have said in the past that the UNIX
model couldn't do that until they figured out that it actually could.
On Mon, Aug 2, 2021 at 6:00 PM Jon Steinhart <jon@fourwinds.com> wrote:
> Oh, sorry, wasn't meaning to be categorical there. Main reason that it
> came to mind was John's web example; many have said in the past that
> the UNIX model couldn't do that until they figured out that it
> actually could.

I sorta lost track of what I was saying there: spawn*() would work fine
in pipelines, since they involve fork-quickly-followed-by-exec. Doug's
nifty sieve example, on the other hand, would not: the Right Thing
there is Go, or else goroutines in C (either libmill or its successor
libdill, as you prefer), since the sieve doesn't actually involve any
sort of global state for which processes are relevant. Granted, the C
libraries have a little bit of x86_64 asm in them (but you are not
expected to understand this).
On Aug 2, 2021, at 2:25 PM, Dan Cross <crossd@gmail.com> wrote:
>
> Fork has served us well for more than five decades; I've got no argument with that. However, should we never question whether it continues to be the right, or best, abstraction as the environment around it continues to evolve?
An os with no fork() can be used to reimplement fork() for backward
compatibility. You will anyway need all the things fork() used to do
in this brave new world. For example, you will need to specify which
signals and file descriptors to inherit. Something like:
    jmp_buf state;
    pid = 0;
    if (!setjmp(state) && !pid) {
        fd = openAt(hostfd, "/n/address-space/new", ...);
        /* << copy /proc/self/mem to fd >> */
        pid = breathe_life(fd, state, signals, fds);
        longjmp(state, 1);      /* resume at setjmp, now with pid set */
    }
    return pid;
The /n/address-space filesystem yields a new address space on a
given host (as represented by the hostfd). You can then "copy"
to it to fill it up from the current proc's memory. Copy should
be done using COW if the new proc is on the same host (or even
remote but that opens up another area of discussion...).
breathe_life() is handed a process where everything is set up; it just
has to create a new thread associated with the address space, wire up
the signals and file descriptors that are to be inherited, and arrange
things so that, on return from the syscall in the child, process state
will be as per "state".
If signals and fds can be wired up to a remote host, this can be
used to "migrate" a process. There is likely much more process
state in the kernel mode which would have to be packaged up somehow.
There may be other breakage if the child & parent are on different
hosts.
Or perhaps the issue is not having graphics/GUI designers with the
creativity and sensibilities of the early Bell Labs crowd of researchers?
I keep thinking there ought to be something simpler/more elegant than
the current graphics subsystems....
> On Aug 2, 2021, at 11:16 AM, Adam Thornton <athornton@gmail.com> wrote:
>
>
> It's a measure of Unix having been wounded by its own success.
>
> fork() is a great model for a single-threaded text processing pipeline to do automated typesetting. (More generally, anything that is a straightforward composition of filter/transform stages.) Which is, y'know, what Unix is *for*.
>
> It's not so great for a responsive GUI in front of a multi-function interactive program.
>
> These days, the vast majority of Unix applications are "stuff people play with on their phones."
>
> Adam
>
> On Mon, Aug 2, 2021 at 7:59 AM Theodore Ts'o <tytso@mit.edu> wrote:
> On Sun, Aug 01, 2021 at 10:42:53PM -0400, Douglas McIlroy wrote:
> > > spawn() beats fork()[;] fork() should be deprecated
> >
> > Spawn is a further complication of exec, which tells what signals and
> > file descriptors to inherit in addition to what arguments and
> > environment variables to pass.
> >
> > Fork has a place. For example, Program 1 in
> > www.cs.dartmouth.edu/~doug/sieve/sieve.pdf forks like crazy and never
> > execs. To use spawn, the program would have to be split in three (or
> > be passed a switch setting).
> >
> > While you may dismiss Program 1 as merely a neat demo, the same idea
> > applies in parallelizing code for use in a multiprocessor world.
>
> It's certainly clear that some kind of primitive is needed to create
> new threads. An open question is whether if there exists some kind of
> "new thread" primitve plus either spawn(2) or some kind of "create a
> child process and then then frob like crazy using 'echo XXX >
> /proc/<pid>/<magic files>'" whether there still is a need for a
> fork(2) system call.
>
> Obviously, as soon as we start going down this path, we're deviated
> quite strongly from the "radical simplicity" of Unix Version 7 that
> people have accused modern systems (whether they be Linux or FreeBSD)
> of lacking. It's rather interesting that we haven't heard complaints
> about how people who dare to try come up with new API's are somehow
> traitors to "The Unix Philosphy" that we've seen on other threads. :-)
>
> - Ted
> On Aug 2, 2021, at 5:20 PM, Bakul Shah <bakul@iitbombay.org> wrote:
>
> On Aug 2, 2021, at 2:25 PM, Dan Cross <crossd@gmail.com> wrote:
>>
>> Fork has served us well for more than five decades; I've got no argument with that. However, should we never question whether it continues to be the right, or best, abstraction as the environment around it continues to evolve?
>
> An os with no fork() can be used to reimplement fork() for backward
> compatibility. You will anyway need all the things fork() used to do
> in this brave new world.
Oh man, now I’m having flashbacks to early Cygwin and what was that, DJGPP for OS/2?
Well, I picked a lousy week to give up sniffing glue anyway.
Adam
On Mon, 2 Aug 2021, Adam Thornton wrote:
> > On Aug 2, 2021, at 5:20 PM, Bakul Shah <bakul@iitbombay.org> wrote:
> >
> > An os with no fork() can be used to reimplement fork() for backward
> > compatibility. You will anyway need all the things fork() used to do
> > in this brave new world.
>
> Oh man, now I'm having flashbacks to early Cygwin and what was that,
> DJGPP for OS/2?
>
> Well, I picked a lousy week to give up sniffing glue anyway.
>
> Adam

EMX.

-uso.
Bakul Shah writes:
> Or perhaps the issue is not having graphics/GUI designers with the
> creativity and sensibilities of the early Bell Labs crowd of researchers?
> I keep thinking there ought to be something simpler/more elegant than
> the current graphics subsystems....
>
> > On Aug 2, 2021, at 11:16 AM, Adam Thornton <athornton@gmail.com> wrote:
> >
> > It's a measure of Unix having been wounded by its own success.
> >
> > fork() is a great model for a single-threaded text processing pipeline to
> > do automated typesetting. (More generally, anything that is a straightforward
> > composition of filter/transform stages.) Which is, y'know, what Unix is *for*.
> >
> > It's not so great for a responsive GUI in front of a multi-function interactive program.
> >
> > These days, the vast majority of Unix applications are "stuff people play with on their phones."
> >
> > Adam
I thought that I posted something about this recently when someone
was arguing for threads being unnecessary and bad.
My two cents is that GUIs were the big driver for threads. I would
personally like to get rid of them as they're "the same but different"
with regards to processes. My preference would be to solve the
heaviness of processes problem. I'm not in the thick of that these
days, but I don't see it being solved in software alone; it's going
to take some serious hardware/software architecture work. Might be
easier to accomplish now that the world has pretty much settled on
the process model.
Jon
> On Aug 2, 2021, at 6:49 PM, Jon Steinhart <jon@fourwinds.com> wrote:
>
> My two cents is that GUIs were the big driver for threads. I would
> personally like to get rid of them as they're "the same but different"
> with regards to processes. My preference would be to solve the
> heaviness of processes problem. I'm not in the thick of that these
> days, but I don't see it being solved in software alone; it's going
> to take some serious hardware/software architecture work. Might be
> easier to accomplish now that the world has pretty much settled on
> the process model.
Has it, though? Most of the stuff I’m working with right now is either asyncio and Python, or Javascript, both of which are very much about threading.
I think there’s a lot of stuff that does work better with multiple program counters for different things executing at the same time with shared access to memory, although of course reasoning about concurrency is always hard and having to manage locks on things that shouldn’t be shared _right now_ is a lot of work and easy to get wrong.
I like Go and goroutines and channels as an intra-process communication mechanism.
My problem with processes, in the traditional sense, is that there ARE often pieces of state you want to share between the things concurrently acting on that state. A channel-like mechanism could work as IPC, but, equally well, why _not_ have multiple things—we can call them threads if you want—that can just see that shared state?
Adam
On Aug 2, 2021, at 6:49 PM, Jon Steinhart <jon@fourwinds.com> wrote:
>
> Bakul Shah writes:
>> Or perhaps the issue is not having graphics/GUI designers with the
>> creativity and sensibilities of the early Bell Labs crowd of researchers?
>> I keep thinking there ought to be something simpler/more elegant than
>> the current graphics subsystems....
>>
>>> On Aug 2, 2021, at 11:16 AM, Adam Thornton <athornton@gmail.com> wrote:
>>>
>>> It's a measure of Unix having been wounded by its own success.
>>>
>>> fork() is a great model for a single-threaded text processing pipeline to
>>> do automated typesetting. (More generally, anything that is a straightforward
>>> composition of filter/transform stages.) Which is, y'know, what Unix is *for*.
>>>
>>> It's not so great for a responsive GUI in front of a multi-function interactive program.
>>>
>>> These days, the vast majority of Unix applications are "stuff people play with on their phones."
>>>
>>> Adam
>
> I thought that I posted something about this recently when someone
> was arguing for threads being unnecessary and bad.
>
> My two cents is that GUIs were the big driver for threads. I would
> personally like to get rid of them as they're "the same but different"
> with regards to processes. My preference would be to solve the
> heaviness of processes problem. I'm not in the thick of that these
> days, but I don't see it being solved in software alone; it's going
> to take some serious hardware/software architecture work. Might be
> easier to accomplish now that the world has pretty much settled on
> the process model.
>
> Jon
AFAIK pretty much all GUI frameworks are (or were) single threaded.
At least I don't see the GUI as the main motivation for threads.
I was mainly complaining about the complexity of the graphics subsystem,
including user interaction. Don't see what fork() has to do with it.
Bakul Shah writes:
> On Aug 2, 2021, at 6:49 PM, Jon Steinhart <jon@fourwinds.com> wrote:
> >
> > Bakul Shah writes:
> >> Or perhaps the issue is not having graphics/GUI designers with the
> >> creativity and sensibilities of the early Bell Labs crowd of researchers?
> >> I keep thinking there ought to be something simpler/more elegant than
> >> the current graphics subsystems....
> >>
> >>> On Aug 2, 2021, at 11:16 AM, Adam Thornton <athornton@gmail.com> wrote:
> >>>
> >>> It's a measure of Unix having been wounded by its own success.
> >>>
> >>> fork() is a great model for a single-threaded text processing pipeline to
> >>> do automated typesetting. (More generally, anything that is a straightforward
> >>> composition of filter/transform stages.) Which is, y'know, what Unix is *for*.
> >>>
> >>> It's not so great for a responsive GUI in front of a multi-function interactive program.
> >>>
> >>> These days, the vast majority of Unix applications are "stuff people play with on their phones."
> >>>
> >>> Adam
> >
> > I thought that I posted something about this recently when someone
> > was arguing for threads being unnecessary and bad.
> >
> > My two cents is that GUIs were the big driver for threads. I would
> > personally like to get rid of them as they're "the same but different"
> > with regards to processes. My preference would be to solve the
> > heaviness of processes problem. I'm not in the thick of that these
> > days, but I don't see it being solved in software alone; it's going
> > to take some serious hardware/software architecture work. Might be
> > easier to accomplish now that the world has pretty much settled on
> > the process model.
> >
> > Jon
>
> AFAIK pretty much all GUI frameworks are (or were) single threaded.
> At least I don't see the GUI as the main motivation for threads.
>
> I was mainly complaining about the complexity of graphics subsystem,
> including user interaction. Don't see what fork() has to do with it.
Not my experience. At least from what I was working on in the 1980s,
there was a need to be able to keep complicated state in graphical
programs which is when assembly language versions of threads came
into being. Made it possible for whatever was generating output
to be interrupted so that input could be handled in a timely manner.
I seem to recall getting some code from Sun for this that I ported
to a different system.
Jon
"Theodore Ts'o" <tytso@mit.edu> wrote:
> It's certainly clear that some kind of primitive is needed to create
> new threads. An open question is whether, if there exists some kind of
> "new thread" primitive plus either spawn(2) or some kind of "create a
> child process and then frob like crazy using 'echo XXX >
> /proc/<pid>/<magic files>'", there still is a need for a
> fork(2) system call.
I haven't caught up yet in this thread. Apologies if this has been
discussed already.
The Plan 9 folks blazed this trail over 30 years ago with rfork, where
you specify what bits you wish to duplicate. I don't remember details
anymore, but I think it was pretty elegant. IIRC, around that time Rob Pike
said "Threads are the lack of an idea", meaning, if you think you need
threads, you haven't thought about the problem hard enough. (Apologies
to Rob if I am misremembering and/or misrepresenting.)
Arnold
> fork() is a great model for a single-threaded text processing pipeline to do
> automated typesetting. (More generally, anything that is a straightforward
> composition of filter/transform stages.) Which is, y'know, what Unix is *for*.

> It's not so great for a responsive GUI in front of a multi-function interactive program.

"Single-threaded" is not a term I would apply to multiple processes in a
pipeline. If you mean a single track of data flow, fine, but the fact that
that's a prevalent configuration of cooperating processes in Unix is an
artifact of shell syntax, not an inherent property of pipe-style IPC. The
cooperating processes in Rob Pike's 20th century window systems and screen
editors, for example, worked smoothly without interrupts or events - only
stream connections. I see no abstract distinction between these programs and
"stuff people play with on their phones."

It bears repeating, too, that stream connections are much easier to reason
about than asynchronous communication. Thus code built on streams is far
less vulnerable to timing bugs.

At last a prince has come to awaken the sleeping beauty of stream
connections. In Go (Pike again) we have a widely accepted programming
language that can fully exploit them, "[w]hich is, y'know, what Unix is
'for'."

(If you wish, you may read "process" above to include threads, but I'll
stay out of that.)

Doug
I always believed that fork() was the very essence of the beauty of UNIX. So
simple, yet so powerful. And it is far simpler than its predecessor on
Genie, which looked a lot more like spawn(). With fork, the child inherits
properties which may not have existed when the code was written, so it's
much easier to reason about the behavior of sub-processes. Fork made writing
the shell and pipelines much more obvious.

Today we know that threads and shared mutable memory are a really bad idea,
it's just that the hardware gave us no alternatives. I claim UNIX is
directly responsible for the existence of MMUs in microprocessors. What if
CPU designers would add facilities to directly implement inter-process or
inter-processor messaging? Sadly, there has to be a dominant software
paradigm for the hardware guys to target, so there's a nasty chicken and egg
problem. Imagine if the Erlang model of distributed systems had taken off.
Go gets us part of the way there, but cross-machine messaging is still a
mess.

On Tue, Aug 3, 2021 at 8:02 AM Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:
> > fork() is a great model for a single-threaded text processing pipeline to do
> > automated typesetting. (More generally, anything that is a straightforward
> > composition of filter/transform stages.) Which is, y'know, what Unix is *for*.
>
> > It's not so great for a responsive GUI in front of a multi-function interactive program.
>
> "Single-threaded" is not a term I would apply to multiple processes in
> a pipeline. If you mean a single track of data flow, fine, but the
> fact that that's a prevalent configuration of cooperating processes in
> Unix is an artifact of shell syntax, not an inherent property of
> pipe-style IPC. The cooperating processes in Rob Pike's 20th century
> window systems and screen editors, for example, worked smoothly
> without interrupts or events - only stream connections. I see no
> abstract distinction between these programs and "stuff people play
> with on their phones."
>
> It bears repeating, too, that stream connections are much easier to
> reason about than asynchronous communication. Thus code built on
> streams is far less vulnerable to timing bugs.
>
> At last a prince has come to awaken the sleeping beauty of stream
> connections. In Go (Pike again) we have a widely accepted programming
> language that can fully exploit them, "[w]hich is, y'know, what Unix
> is 'for'."
>
> (If you wish, you may read "process" above to include threads, but
> I'll stay out of that.)
>
> Doug

- Tom
> Go gets us part of the way there, but cross-machine messaging is still a mess.
Shed a tear for Plan 9 (Pike yet again). While many of its secondary
innovations have been stuffed into Linux; its animating
principle--transparently distributable computing--could not overcome
the enormous inertia of installed BSD-model systems.
Doug
On 8/3/21, arnold@skeeve.com <arnold@skeeve.com> wrote:
>
> I haven't caught up yet in this thread. Apologies if this has been
> discussed already.
>
> The Plan 9 folks blazed this trail over 30 years ago with rfork, where
> you specify what bits you wish to duplicate. I don't remember details
> anymore, but I think it was pretty elegant. IIRC Around that time Rob Pike
> said "Threads are the lack of an idea", meaning, if you think you need
> threads, you haven't thought about the problem hard enough. (Apologies
> to Rob if I am misremembering and/or misrepresenting.)
>
I've never really been a fan of the rfork()/clone() model, or at least
the Linux implementation of it that requires ugly library-level hacks
to share state between threads that the kernel doesn't support
sharing. Also, I don't really care for the large number of flags
required.
Up until now I was just planning on following the traditional
threading model of associating most state with processes with only
execution state being per-thread in the OS I'm working on, but now I'm
thinking I should reduce the state associated with a process to just
the PID, PPID, PGID, containing cgroup, command line, and list of
threads. All other state would be contained in various types of
context objects that are not tied to a particular process or thread
(other than being destroyed when no more threads are associated with
them). This would include:
Filesystem namespace
File descriptors
Address space
Security context (file permission list, UID, GID)
Signal handlers
Scheduling context
Each of these object types would be completely separate from the
others, allowing full control over which state is shared and which is
private. I'm using seL4 as a microkernel, and it already works like
this (it has no real concept of processes, only threads that are each
associated with an address space, a capability space, and a scheduling
context) so it's a good match for it.
exec() would still replace all threads within a process as on
traditional Unix, unless the exec is performed within a child process
that hasn't yet been started. Sending a signal to an entire process
would send it to every signal group within the process (similarly, it
would be possible to send a signal to an entire cgroup; basically,
processes will really just be a special kind of cgroup in this model).
On 8/3/21, Andrew Warkentin <andreww591@gmail.com> wrote:
>
> Up until now I was just planning on following the traditional
> threading model of associating most state with processes with only
> execution state being per-thread in the OS I'm working on, but now I'm
> thinking I should reduce the state associated with a process to just
> the PID, PPID, PGID, containing cgroup, command line, and list of
> threads. All other state would be contained in various types of
> context objects that are not tied to a particular process or thread
> (other than being destroyed when no more threads are associated with
> them).
For what it's worth, this is the process model that Windows NT has.
This thread has discussed two very different ways to do command line
processing: the TOPS-20 model, where the program image for the
command is activated immediately and then calls back to the OS to
process the command line elements, and the UNIX model, where the first
element of the command line is a path (full or partial) to the program
image for the command, and it's entirely up to that image to process
the command line elements as it sees fit.
VMS takes a third approach. The VAX had four, hierarchical execution
modes: user, supervisor, exec, and kernel, in increasing order of
privilege. The VAX had "change mode" instructions (CHMU, CHMS, CHME,
CHMK) that ran the current process's change-mode interrupt handler to
process the request. Applications and programs to process commands
ran in user mode. The DCL command interpreter (VMS's equivalent of
the UNIX shell) ran in supervisor mode. Exec mode was reserved for
record management services (RMS), VMS's equivalent of the file system
code in UNIX. Kernel mode (the only mode where privileged
instructions can be executed) was, of course, for the kernel. One
oddity of VMS is that each process had the supervisor, exec, and
kernel code and data mapped to its address space.
Commands in VMS are described using a meta-language that is compiled
into a set of data tables that the command line interpreter (CLI,
running in supervisor mode) uses to determine which program image
needs to be activated to execute the command. The CLI also does a
full parse of the command line. It then calls the kernel's image
activator to map the appropriate program image and switches to user
mode to start it running. There is a runtime library of routines to
retrieve the various options and parameters from the command line (the
routines in this library do CHMS calls to achieve this).
So on VMS we have a situation where the command interpreter (shell) is
part of every process. Processes are created by the "create process"
OS library routine, sys$creprc, which is a wrapper around the
appropriate CHMK instruction. Unlike fork() on UNIX, the new process
inherits very little state from its parent. Most significantly it
inherits neither address space nor open file context. VMS process
creation is also notoriously expensive compared to fork() on UNIX.
-Paul W.
Tom Lyon via TUHS <tuhs@minnie.tuhs.org> writes:
> What if CPU designers would add facilities to directly implement
> inter-process or inter-processor messaging?
You mean like INMOS's Transputer architecture from back in the late
eighties and early nineties? Each processor had a thread scheduling and
message passing microkernel implemented in its microcode, and had four
bi-directional links to other processors, so you could build a grid.
They designed the language Occam along with it, to be its lowest level
language; it and the instruction set were designed to match. Occam has
threads and message passing as built-in concepts, of course.
-tih (playing with the Helios distributed OS on Transputer hardware)
--
Most people who graduate with CS degrees don't understand the significance
of Lisp. Lisp is the most important idea in computer science. --Alan Kay
I never had the pleasure of playing with Transputers, but it sounds nice.
Of course, I'd want the networking to be Ethernet/IP centric these days.

On Wed, Aug 11, 2021 at 11:11 AM Tom Ivar Helbekkmo <tih@hamartun.priv.no> wrote:
> Tom Lyon via TUHS <tuhs@minnie.tuhs.org> writes:
>
> > What if CPU designers would add facilities to directly implement
> > inter-process or inter-processor messaging?
>
> You mean like INMOS's Transputer architecture from back in the late
> eighties and early nineties? Each processor had a thread scheduling and
> message passing microkernel implemented in its microcode, and had four
> bi-directional links to other processors, so you could build a grid.
> They designed the language Occam along with it, to be its lowest level
> language; it and the instruction set were designed to match. Occam has
> threads and message passing as built-in concepts, of course.
>
> -tih (playing with the Helios distributed OS on Transputer hardware)
> --
> Most people who graduate with CS degrees don't understand the significance
> of Lisp. Lisp is the most important idea in computer science. --Alan Kay

- Tom
[[ I'm digging through old mail -- my summer has been preoccupied by things
that kept me from most everything else, including computing. ]]

At Sun, 1 Aug 2021 18:13:18 -0600, Andrew Warkentin <andreww591@gmail.com> wrote:
Subject: Re: [TUHS] Systematic approach to command-line interfaces
>
> There's a third kind of primitive that is superior to either spawn()
> or fork() IMO, specifically one that creates a completely empty child
> process and returns a context that lets the parent set up the child's
> state using normal APIs.

That's actually what fork(2) is, effectively -- it sets up a new process
that then effectively has control over its own destiny, but only by using
code supplied by the parent process, and thus it is also working within the
limits of the Unix security model. The fact that fork() happens to also do
some of the general setup useful in a unix-like system is really just a
convenience -- you almost always want all those things to be done anyway.

I agree there is some messiness introduced in more modern environments,
especially w.r.t. threads, but there is now general consensus on how to
handle such things.

I'll also note here instead of in a separate message that Ted's followup
questions about the API design and security issues with having the parent
process do all the setup from its own context are exactly the problems that
fork() solves -- the elegance of fork() is incredible! You just have to
look at it the right way around, and with the Unix security model firmly in
mind.

I personally find spawn() to be the spawn of the devil, worse by a million
times than any alternative, including the Multics process model (which
could have made very good use of threads to increase concurrency in
handling data pipelines, for example -- it was even proposed at the time).
Spawn() is narrow-minded, inelegant, and an antique by design.

I struggled for a very long time as an undergrad to understand the Multics
process model, but now that I know more about hypervisors (i.e. the likes
of Xen) it makes perfect sense to me. I now struggle with liking the Unix
concept of "everything is a file" -- especially with respect to actual data
files. Multics also got it right to use single-level storage -- that's the
right abstraction for almost everything, i.e. except some forms of
communications (for which Multics I/O was a very clever and elegant
design). The "unix" nod to single-level storage by way of mmap() suffers
from horribly bad design and neglect.

--
Greg A. Woods <gwoods@acm.org>

Kelowna, BC     +1 250 762-7675     RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>
On Tue, Sep 28, 2021 at 10:46:25AM -0700, Greg A. Woods wrote:
> The "unix" nod to
> single level storage by way of mmap() suffers from horribly bad design
> and neglect.
I supported Xerox PARC when they were redoing their OS as a user space
application on SunOS 4.x. They used mmap() and protections to take
user level page faults. Yeah, there were bugs but that was ~30 years
ago.
In more recent times, BitKeeper used mmap() and protections to take the
same page faults (we implemented a compressed, XORed file storage that
filled in "pages" on demand, it was a crazy performance improvement)
and that worked on pretty much every Unix we tried it on. Certainly
worked on Linux first try.
So what is it about mmap you don't like?
> From: "Greg A. Woods"

> the elegance of fork() is incredible!

That's because in PDP-11 Unix, they didn't have the _room_ to create a huge
mess. Try reading the exec() code in V6 or so.

(I'm in a bit of a foul mood today; my laptop sorta locked up when a
_single_ Edge browser window process grew to almost _2GB_ in size. Are you
effing kidding me? If I had any idea what today would look like, back when
I was 20 - especially the massive excrement pile that the Internet has
turned into - I never would have gone into computers - cabinetwork, or
something, would have been an infinitely superior career choice.)

> I now struggle with liking the Unix concept of "everything is a
> file" -- especially with respect to actual data files. Multics also got
> it right to use single-level storage -- that's the right abstraction

Well, files a la Unix, instead of the SLS, are OK for a _lot_ of data
storage - pretty much everything except less-common cases like concurrent
access to a shared database, etc. Where the SLS really shines is _code_ -
being able to just do a subroutine call to interact with something else has
an incredible bang/buck ratio - although I concede doing it all securely is
hard (although they did make a lot of progress there).

Noel
At Tue, 28 Sep 2021 11:10:16 -0700, Larry McVoy <lm@mcvoy.com> wrote:
Subject: Re: [TUHS] Systematic approach to command-line interfaces
>
> On Tue, Sep 28, 2021 at 10:46:25AM -0700, Greg A. Woods wrote:
> > The "unix" nod to
> > single level storage by way of mmap() suffers from horribly bad design
> > and neglect.
>
> So what is it about mmap you don't like?

Mmap() as we have it today almost completely ignores the bigger picture
and the lessons that came before it. It was an add-on hack that basically
said only "Oh, yeah, we can do that too! Look at this." -- and nobody
bothered to look for decades.

For one it has no easy direct language support (though it is possible in C
to pretend to use it directly, though the syntax often gets cumbersome).

Single-level storage was obviously designed into Multics from the
beginning and from the ground up, and it was easily used in the main
languages supported on Multics -- but it was just an add-on hack in Unix
(that, if memory serves me correctly, was initially only poorly used in
another extremely badly designed add-on hack that didn't pay any attention
whatsoever to past lessons, i.e. dynamic linking, which to this day is a
horror show of inefficiencies and bad hacks).

I think perhaps the problem was that mmap() came too soon, in a narrow
sub-set of the Unix implementations that were around at the time, when
many couldn't support it well (especially on 32-bit systems -- it really
only becomes universally useful with either segments or 64-bit and larger
address spaces). The fracturing of "unix" standards at the time didn't
help either.

Perhaps these "add-on hack" problems are the reason so many people think
fondly of the good old Unix versions where everything was still coming
from a few good minds that could work together to build a cohesive design.
The add-ons were poorly done, not widely implemented, and usually
incompatible with each other when they were adopted by additional
implementations.

--
Greg A. Woods <gwoods@acm.org>

Kelowna, BC     +1 250 762-7675     RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>
On Wed, Sep 29, 2021 at 09:40:23AM -0700, Greg A. Woods wrote:
> At Tue, 28 Sep 2021 11:10:16 -0700, Larry McVoy <lm@mcvoy.com> wrote:
> Subject: Re: [TUHS] Systematic approach to command-line interfaces
> >
> > On Tue, Sep 28, 2021 at 10:46:25AM -0700, Greg A. Woods wrote:
> > > The "unix" nod to
> > > single level storage by way of mmap() suffers from horribly bad design
> > > and neglect.
> >
> > So what is it about mmap you don't like?
>
> Mmap() as we have it today almost completely ignores the bigger picture
> and the lessons that came before it.
>
> It was an add-on hack that basically said only "Oh, Yeah, we can do that
> too! Look at this." -- and nobody bothered to look for decades.
>
> For one it has no easy direct language support (though it is possible in
> C to pretend to use it directly, though the syntax often gets cumbersome).
>
> Single-level-storage was obviously designed into Multics from the
> beginning and from the ground up, and it was easily used in the main
> languages supported on Multics -- but it was just an add-on hack in Unix
> (that, if memory serves me correctly, was initially only poorly used in
> another extremely badly designed add-on hack that didn't pay any
> attention whatsoever to past lessons, i.e. dynamic linking, which to
> this day is a horror show of inefficiencies and bad hacks).
>
> I think perhaps the problem was that mmap() came too soon in a narrow
> sub-set of the Unix implementations that were around at the time, when
> many couldn't support it well (especially on 32-bit systems -- it really
> only becomes universally useful with either segments or 64-bit and
> larger address spaces). The fracturing of "unix" standards at the time
> didn't help either.

I think you didn't use SunOS 4.x. mmap() was implemented correctly there;
the 4.x VM system mostly got rid of the buffer cache (the buffer cache was
used only for reading directories and inodes, there was no regular file
data there).

If you read(2) a page and mmap()ed it and then did a write(2) to the page,
the mapped page is the same physical memory as the write()ed page. Zero
coherency issues. This was not true in other systems; they copied the page
from the buffer cache and had all sorts of coherency problems. It took
about a decade for other Unix implementations to catch up, and I think
that's what you are hung up on. SunOS 4.x got it right. You can read about
it; I have all the papers cached at http://mcvoy.com/lm/papers

ZFS screwed it all up again. ZFS has its own cache because they weren't
smart enough to know how to make compressed file systems use the page
cache (we did it in BitKeeper, so I have an existence proof that it is
possible). I was deeply disappointed to hear that ZFS screwed up that
badly; the Sun I was part of would never have entertained such an idea --
they worked so hard to get a unified page cache. It's just sad.
> From: Larry McVoy

> If you read(2) a page and mmap()ed it and then did a write(2) to the
> page, the mapped page is the same physical memory as the write()ed
> page. Zero coherency issues.

Now I'm confused; read() and write() semantically include a copy operation
(so there are then two copies of that data chunk, and possible consistency
issues between them), and the copied item is not necessarily page-sized (so
you can't ensure consistency between the original+copy by mapping it in).
So when one does a read(file, &buffer, 1), one gets a _copy of just that
byte_ in the process' address space (and similar for write()).

Yes, there's no coherency issue between the contents of an mmap()'d page
and the system's idea of what's in that page of the file, but that's a
_different_ coherency issue.

Or am I confused?

PS:

> From: "Greg A. Woods"

> I now struggle with liking the Unix concept of "everything is a
> file" -- especially with respect to actual data files. Multics also got
> it right to use single-level storage -- that's the right abstraction

Oh, one other thing that SLS breaks, for data files, is the whole Unix
'pipe' abstraction, which is at the heart of the whole Unix tools paradigm.
So no more 'cmd | wc' et al. And since SLS doesn't have the 'make a copy'
semantics of pipe output, it would be hard to trivially work around it.

Yes, one could build up a similar framework, but each command would have to
specify an input file and an output file (no more 'standard in' and 'out'),
and then the command interpreter would have to i) take command A's output
file and feed it to command B, and ii) delete A's output file when the
whole works was done. Yes, the user could do it manually, but compare:

  cmd aaa | wc

and

  cmd aaa bbb
  wc bbb
  rm bbb

If bbb is huge, one might run out of room, but with today's 'light my cigar
with disk blocks' life, not a problem - but it would involve more disk
traffic, as bbb would have to be written out in its entirety, not just have
a small piece kept in the disk cache as with a pipe.

Noel
On Wed, Sep 29, 2021 at 2:08 PM Noel Chiappa <jnc@mercury.lcs.mit.edu> wrote:

> > From: Larry McVoy
>
> > If you read(2) a page and mmap()ed it and then did a write(2) to the
> > page, the mapped page is the same physical memory as the write()ed
> > page. Zero coherency issues.
>
> Now I'm confused; read() and write() semantically include a copy operation
> (so there are then two copies of that data chunk, and possible consistency
> issues between them), and the copied item is not necessarily page-sized
> (so you can't ensure consistency between the original+copy by mapping it
> in). So when one does a read(file, &buffer, 1), one gets a _copy of just
> that byte_ in the process' address space (and similar for write()).
>
> Yes, there's no coherency issue between the contents of an mmap()'d page,
> and the system's idea of what's in that page of the file, but that's a
> _different_ coherency issue.
>
> Or am I confused?

I think that mention of `read` here is a bit of a red herring; presumably
Larry only mentioned it so that the reader would assume that the read
blocks are in the buffer cache when discussing the write vs mmap coherency
issue with respect to unmerged page and buffer caches (e.g., `*p = 1;`
modifies some page in the VM cache, but not the relevant block in the
buffer cache, so a subsequent `read` on the mmap'd file descriptor won't
necessarily reflect the store). I don't think that is strictly necessary,
though; presumably the same problem exists even if the file data isn't in
the block cache yet.

PS:

> > From: "Greg A. Woods"
>
> > I now struggle with liking the Unix concept of "everything is a
> > file" -- especially with respect to actual data files.
> > Multics also got
> > it right to use single-level storage -- that's the right abstraction
>
> Oh, one other thing that SLS breaks, for data files, is the whole Unix
> 'pipe' abstraction, which is at the heart of the whole Unix tools
> paradigm. So no more 'cmd | wc' et al. And since SLS doesn't have the
> 'make a copy' semantics of pipe output, it would be hard to trivially
> work around it.

I don't know about that. One could still model a pipe as an IO device;
Multics supported tapes, printers and terminals, after all. It even had
pipes!

https://web.mit.edu/multics-history/source/Multics/doc/info_segments/pipes.gi.info

> Yes, one could build up a similar framework, but each command would have
> to specify an input file and an output file (no more 'standard in' and
> 'out'),

Why? To continue with the Multics example, it supported `sysin` and
`sysprint` from PL/1, referring to the terminal by default.

> and then the command interpreter would have to i) take command A's output
> file and feed it to command B, and ii) delete A's output file when the
> whole works was done. Yes, the user could do it manually, but compare:
>
>   cmd aaa | wc
>
> and
>
>   cmd aaa bbb
>   wc bbb
>   rm bbb
>
> If bbb is huge, one might run out of room, but with today's 'light my
> cigar with disk blocks' life, not a problem - but it would involve more
> disk traffic, as bbb would have to be written out in its entirety, not
> just have a small piece kept in the disk cache as with a pipe.

It feels like all you need is a stream abstraction and programs written to
use it, which doesn't seem incompatible with SLS. Perhaps the argument is
that in an SLS system one wouldn't be motivated to write programs that way,
whereas in a program with a Unix-style IO mechanism it's more natural?

        - Dan C.
On Wed, Sep 29, 2021 at 02:07:42PM -0400, Noel Chiappa wrote:
> > From: Larry McVoy
>
> > If you read(2) a page and mmap()ed it and then did a write(2) to the
> > page, the mapped page is the same physical memory as the write()ed
> > page. Zero coherency issues.
>
> Now I'm confused; read() and write() semantically include a copy operation
> (so there are then two copies of that data chunk, and possible consistency
> issues between them), and the copied item is not necessarily page-sized (so
> you can't ensure consistency between the original+copy by mapping it in). So
> when one does a read(file, &buffer, 1), one gets a _copy of just that byte_
> in the process' address space (and similar for write()).
>
> Yes, there's no coherency issue between the contents of an mmap()'d page, and
> the system's idea of what's in that page of the file, but that's a
> _different_ coherency issue.
That "different" coherency issue is the one I was talking about. SunOS
got rid of it, when HP-UX etc grudgingly implemented mmap() they did not
have a unified page cache, they had pages that were mmapped but the data
was copied from the buffer cache. It was a mess for years and years.
Greg A. Woods wrote:

> [[ I'm digging through old mail -- my summer has been preoccupied by
> things that kept me from most everything else, including computing. ]]
>
> At Sun, 1 Aug 2021 18:13:18 -0600, Andrew Warkentin <andreww591@gmail.com> wrote:
> Subject: Re: [TUHS] Systematic approach to command-line interfaces
>
> > There's a third kind of primitive that is superior to either spawn()
> > or fork() IMO, specifically one that creates a completely empty child
> > process and returns a context that lets the parent set up the child's
> > state using normal APIs.
>
> That's actually what fork(2) is, effectively -- it sets up a new process
> that then effectively has control over its own destiny, but only by
> using code supplied by the parent process, and thus it is also working
> within the limits of the Unix security model.

The original post above made me think of the TENEX (later TOPS-20)
primitives for fork (a noun, aka process) control:

  CFORK -- create an empty fork/process (halted)
  GET   -- map executable
  SFORK -- start fork
  HFORK -- halt a running fork
  KFORK -- kill a fork
  SPJFN -- set primary job file numbers (stdin/stdout)
  SPLFK -- splice a fork into the tree

TENEX, like UNIX, was created with knowledge of the Berkeley Timesharing
System (SDS 940) and MULTICS. Like MULTICS, TENEX was designed from square
one as a VM system, and I believe the 4.2BSD-specified mmap call was
inspired by the TENEX PMAP call (which can map file pages into a process
AND map process pages into a file, and map process pages from another
process).

The "halted" process state was also used when a user typed CTRL/C. A
halted process could be debugged (either in-process, entering a newly
mapped debugger, or one already linked in, or out-of-process by splicing a
debugger into the process tree). Threads were easily implemented by
mapping (selected pages of) the parent process (leaving others
copy-on-write, or zero-fill for thread-local storage).
Starting on small machines (an 8KW PDP-7, and a (28KW?) PDP-11), UNIX
placed a premium on maximum usefulness in the minimum space. The PDP-7
source we have implements fork (implemented, as on the PDP-11, by swapping
out the forking process) but not exec! The Plan 9 rfork unbundles the
traditional Unix descriptor and memory inheritance behaviors.

For all the VM generality, a sore place (for me) in TENEX/TOPS-20: a
single file descriptor (job file number) space was shared by all processes
in a login session ("job"). "Primary" input and output streams were,
however, per-process, but ISTR there was nothing to stop one process from
closing a stream another process was using.

And like MULTICS, TENEX had byte-stream I/O, implemented day-one for disk
files (I'd have to look, but system code may well have implemented it by
issuing PMAP calls (monitor call code could invoke monitor calls)), and
most simple user programs used it, since it was simpler to program than
file mapping.

refs:
https://opost.com/tenex/tenex72.txt
https://www.opennet.ru/docs/BSD/design-44bsd-eng/x312.html
http://www.bitsavers.org/pdf/dec/pdp10/TOPS20/AA-4166E-TM_TOPS-20_Monitor_Calls_Reference_Ver_5_Dec82.pdf

P.S. And on the ORIGINAL topic, TOPS-20 started with code from the TENEX
EXEC (shell) that implemented command completion and incremental help, and
made it the COMND system call (though it could well have been a shared
library, since almost all of the COMND code called other system calls to
do the work).
> one other thing that SLS breaks, for data files, is the whole Unix 'pipe'
> abstraction, which is at the heart of the whole Unix tools paradigm.
Multics had an IO system with an inherent notion of redirectable data
streams. Pipes could have--and eventually did (circa 1987)--fit into
that framework. I presume a pipe DIM (device interface manager)
was not hard to build once it was proposed and accepted.
Doug
At Wed, 29 Sep 2021 09:57:15 -0700, Larry McVoy <lm@mcvoy.com> wrote:
Subject: Re: [TUHS] Systematic approach to command-line interfaces

> On Wed, Sep 29, 2021 at 09:40:23AM -0700, Greg A. Woods wrote:
> > I think perhaps the problem was that mmap() came too soon in a narrow
> > sub-set of the Unix implementations that were around at the time, when
> > many couldn't support it well (especially on 32-bit systems -- it really
> > only becomes universally useful with either segments or 64-bit and
> > larger address spaces). The fracturing of "unix" standards at the time
> > didn't help either.
>
> I think you didn't use SunOS 4.x. mmap() was implemented correctly
> there, the 4.x VM system mostly got rid of the buffer cache (the
> buffer cache was used only for reading directories and inodes, there
> was no regular file data there). If you read(2) a page and mmap()ed
> it and then did a write(2) to the page, the mapped page is the same
> physical memory as the write()ed page. Zero coherency issues.

Implementation isn't really what I meant to talk directly about -- I meant
"integration", and especially integration outside the kernel.

> This was not true in other systems, they copied the page from the
> buffer cache and had all sorts of coherency problems. It took
> about a decade for other Unix implementations to catch up and I
> think that's what you are hung up on.

Such implementation issues are just a smaller part of the problem, though
obviously they delayed the wider use of mmap() in such broken
implementations. The fact that mmap() wasn't even available at all on many
kernel implementations at the time (the way open(2), read(2), write(2), et
al were/are) is equally important too, of course -- 3rd-party software
developers effectively wouldn't use it because of this.

So, the main part of the problem to me is that mmap() wasn't designed into
any unix-derived or unix-like system coherently (i.e.
including at user level) (that I'm aware of). It wasn't integrated into
languages or system libraries (stdio f*() functions could probably even
have used it, since I think that's how stdio was (or could have been)
emulated on Multics for the C compiler and libc).

It all reminds me of how horrible the socket(2)/send(2)/sendmsg(2) hack is
-- i.e. socket descriptors should have just been file descriptors, opened
with open(2). I guess pipe(2) kind of started this mess, and even Plan 9
didn't seem to do anything remarkable to address pipe creation as being
subtly different from just using open(2). Maybe I'm going too far with
thinking pipe() could/should have just been a library call that used
open(2) internally, perhaps connecting the descriptors by opening some
kind of "cloning" device in the filesystem.

--
					Greg A. Woods <gwoods@acm.org>

Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>
Greg wrote:
> I guess pipe(2) kind of started this mess, [...] Maybe I'm
> going too far with thinking pipe() could/should have just been a library
> call that used open(2) internally, perhaps connecting the descriptors by
> opening some kind of "cloning" device in the filesystem.
At times I’ve been pondering this as well. All of creat/open/pipe could have been rolled into just open(). It is not clear to me why this synthesis did not happen around the time of the 7th edition, although it seems the creat/open merger happened in BSD around that time.
As to pipe(), the very first implementation returned just a single fd where writes echoed to reads. It was backed by a single disk buffer, so it could only hold ~500 bytes, which was probably not enough in practice. It was then reimplemented using an anonymous file as backing store and got the modern two-fd system call. The latter probably arose as a convenient hack to store the two file pointers needed.
It would have been possible to implement the anonymous-file solution still using a single fd, storing the second file pointer in the inode. Maybe this felt like a worse hack at the time (the conceptual split into vnode/inode was still a decade in the future).
With a single fd, it would also have been possible to have a cloning device for pipes as you suggest (e.g. /dev/pipe, somewhat analogous to the implementation of /dev/stdin in 10th Edition). Arguably, in total code/data size this would not have been much different from pipe().
My guess is that from a 1975 perspective, creat/open/pipe was not perceived as something that needed fixing.