On Tue, Oct 6, 2020 at 5:54 AM Tyler Adams <coppero1237@gmail.com> wrote:

> How did globbing come about in unix?

It's been present at least since the PDP-11 migration. The Thompson shell, used in the 1st through 6th Editions, used a separate program called /etc/glob to do the dirty work, presumably in order to keep /bin/sh as small as possible. Unfortunately, glob never got its own man page, so its protocol for communicating with the shell is lost, unless someone remembers it and writes it down (hint, hint).

> Related, as regexes were already well known because of qed/ed, why wasn't a subset of regular expressions used instead?

The use of * and ? along with file extensions preceded by dot (as in ".c" and ".o") is, or so it seems to me, an inheritance from the DEC operating systems, starting with Monitor (later called TOPS-10) in 1964 and going right through OpenVMS.  In the file systems used by those OSes, the "filename" (typically up to 6 characters) and the "extension" (typically up to 3 characters) were stored separately both on disk and in memory, and the separating dot was parsed by user programs before invoking the appropriate kernel routine.  (That is why it is still true in Windows that "foo" and "foo." refer to the same file.)

Because dot was not in any way magic to the Unix file system, and because file names were limited to 14 characters, extensions were kept short.  However, the path that leads from DEC OSes to CP/M to MS-DOS to Windows has kept the 3-letter extension alive, and we now see plenty of it in Unix-style OSes.  Thus using dot to mean "any character" would seriously collide with this well-established usage as the extension separator.
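To make the collision concrete, here is a small Python sketch (my own illustration, using the standard fnmatch and re modules, not anything from the historical shells).  In a glob, "foo.c" names only the literal file, while the same string read as a regex also matches "fooxc" unless the dot is escaped:

```python
import fnmatch
import re

names = ["foo.c", "fooxc", "foo.o"]

# Glob: '.' is an ordinary character; only '*', '?' and '[...]' are special.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "foo.c")]

# Regex: an unescaped '.' matches any single character.
regex_hits = [n for n in names if re.fullmatch(r"foo.c", n)]

# Escaping the dot makes the regex behave like the glob.
escaped_hits = [n for n in names if re.fullmatch(r"foo\.c", n)]

print(glob_hits)     # ['foo.c']
print(regex_hits)    # ['foo.c', 'fooxc']
print(escaped_hits)  # ['foo.c']
```

So every everyday pattern like "*.c" would have had to be written as ".*\.c" instead.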

Globbing was uninterpreted by the shell-equivalent in the DEC OSes, and was understood only by a few programs, those responsible for listing directories and copying, renaming, and deleting files.  Universal globbing in the shell was AFAIK original with Unix, though Prime Computer's PRIMOS also had it and may have been earlier by a year or two.  "It steam-engines when it comes steam-engine time."  Both were direct descendants of Multics; I have not been able to find out anything about 
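For anyone who hasn't thought about the difference: with universal globbing the shell expands the patterns and the command only ever sees ordinary file names.  A minimal Python sketch of that idea follows; expand_args is a hypothetical helper built on the standard glob module, not a reconstruction of /etc/glob, and it passes unmatched patterns through unchanged, which is the default in modern POSIX shells:

```python
import glob
import os
import tempfile

def expand_args(args):
    """Hypothetical shell-side expansion: replace each wildcard argument
    with its sorted matches, or leave it as-is if nothing matches."""
    out = []
    for arg in args:
        if any(ch in arg for ch in "*?["):
            matches = sorted(glob.glob(arg))
            out.extend(matches if matches else [arg])
        else:
            out.append(arg)
    return out

# Demo in a scratch directory containing a.c, b.c and a.o.
with tempfile.TemporaryDirectory() as d:
    for name in ["a.c", "b.c", "a.o"]:
        open(os.path.join(d, name), "w").close()
    old_cwd = os.getcwd()
    try:
        os.chdir(d)
        argv = expand_args(["cc", "*.c", "*.h"])
    finally:
        os.chdir(old_cwd)  # restore cwd so the temp dir can be removed

print(argv)  # ['cc', 'a.c', 'b.c', '*.h']
```

The "cc" in the example never needs to know that a wildcard was involved, which is exactly what the DEC utilities each had to handle for themselves.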

TIL that GNU find(1) supplements the standard -name option (which globs against the filename) with -regex (which matches the regex against the whole path).
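The distinction can be mimicked in a few lines of Python.  This is only an analogy: name_match and regex_match are hypothetical helpers, and GNU find's -regex uses Emacs-style regexes by default (see -regextype), whose syntax differs in details from Python's re module.  The point is that -name globs against the last path component, while -regex must match the entire path:

```python
import fnmatch
import os
import re

paths = ["./src/main.c", "./src/util.h", "./doc/main.c.txt"]

def name_match(path, pattern):
    # Like -name: the glob is applied to the last path component only.
    return fnmatch.fnmatchcase(os.path.basename(path), pattern)

def regex_match(path, pattern):
    # Like -regex: the regex must match the whole path, not a substring.
    return re.fullmatch(pattern, path) is not None

print([p for p in paths if name_match(p, "*.c")])             # ['./src/main.c']
print([p for p in paths if regex_match(p, r".*\.c")])         # ['./src/main.c']
print([p for p in paths if regex_match(p, r"main\.c")])       # [] (leading './src/' not covered)
print([p for p in paths if regex_match(p, r"\./src/.*\.c")])  # ['./src/main.c']
```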

John Cowan          http://vrici.lojban.org/~cowan        cowan@ccil.org
The Imperials are decadent, 300 pound free-range chickens (except they have
teeth, arms instead of wings, and dinosaurlike tails).  --Elyse Grasso