From mboxrd@z Thu Jan 1 00:00:00 1970
From: dot@dotat.at (Tony Finch)
Date: Mon, 1 Feb 2016 10:38:53 +0000
Subject: [TUHS] Short history of 'grep'
In-Reply-To: <20160131023700.GB7917@mercury.ccil.org>
References: <20160130030012.GB9762@minnie.tuhs.org>
 <56AD0AB7.40701@mhorton.net>
 <56AD1B28.4010908@mhorton.net>
 <20160131023700.GB7917@mercury.ccil.org>
Message-ID: 

John Cowan wrote:
> Dave Horsfall scripsit:
>
> > I'm still trying to get my head around how a program such as "egrep"
> > which handles complex patterns can be faster than one that doesn't...
> > It seems to defeat all logic :-)
> [...]
> Classic grep uses backtracking, which makes it much slower on problematic
> expressions like "a*b" where there is no b in the input. On the other
> hand, creating a deterministic automaton has higher setup costs.

Right. The relevant section in the article that started this thread says:

: Al Aho decided to put theory into practice, and implemented full regular
: expressions (including alternation and grouping, which were missing from
: grep) and wrote egrep over a weekend. Fgrep, specialised for the case of
: multiple (alternate) literal strings, was written in the same weekend.
: Egrep was about twice as fast as grep for simple character searches but
: was slower for complex search patterns (due to the high cost of building
: the state machine that recognised the patterns).

The "putting theory into practice" refers to compiling the regex to a DFA,
rather than interpreting an NFA. Russ Cox has a good summary of differing
regex implementation techniques at
https://swtch.com/~rsc/regexp/regexp1.html

This makes me wonder how well known the technique of compiling to a DFA
was at the time, and whether it was widely implemented before awk, egrep,
and lex.

Tony.
-- 
f.anthony.n.finch <dot@dotat.at>  http://dotat.at/
Fair Isle, Faeroes: Southeast 6 to gale 8, veering southwest gale 8 to
storm 10, becoming cyclonic later. Very rough, becoming high or very high.
Rain or squally showers. Moderate or poor.
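
[Editorial sketch] To make the backtracking cost concrete, here is a
minimal grep-style backtracking matcher. It is modelled on the well-known
matcher from Kernighan and Pike's "The Practice of Programming", not on
the actual grep source, so take it as an illustration of the technique
rather than the historical code. The main() builds a long run of a's with
no b, so searching for "a*b" rescans the run from every starting position
and the total work is quadratic in the input length.

#include <stdio.h>

static int matchhere(const char *re, const char *text);
static int matchstar(int c, const char *re, const char *text);

/* match: search for re anywhere in text */
int match(const char *re, const char *text)
{
    if (re[0] == '^')
        return matchhere(re + 1, text);
    do {    /* must look even if string is empty */
        if (matchhere(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* matchhere: search for re at beginning of text */
static int matchhere(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[1] == '*')
        return matchstar(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return matchhere(re + 1, text + 1);
    return 0;
}

/* matchstar: search for c*re at beginning of text,
   backtracking through every possible length of the c-run */
static int matchstar(int c, const char *re, const char *text)
{
    do {    /* a '*' matches zero or more instances */
        if (matchhere(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

int main(void)
{
    char text[10001];
    int i;

    /* 10,000 a's and no b: each of the 10,000 starting positions
       rescans the rest of the run, so roughly 10^8 steps in all */
    for (i = 0; i < 10000; i++)
        text[i] = 'a';
    text[i] = '\0';

    printf("%d\n", match("a*b", text));  /* prints 0, quadratically */
    return 0;
}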
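[Editorial sketch] For the other side of the trade-off, here is a
hand-compiled DFA for the anchored, full-string pattern "a*b" over the
two-letter alphabet {a, b}. This is only an illustration of the principle
behind egrep, not Aho's code: once the transition table exists, matching
is one array lookup per input character, linear time no matter how hard
the pattern looked, and all the cost has moved into building the table
up front (which is what made egrep slower on complex patterns).

#include <stdio.h>

enum { START, ACCEPT, DEAD, NSTATES };

/* delta[state][sym], where sym is 0 for 'a' and 1 for 'b' */
static const int delta[NSTATES][2] = {
    /* START  */ { START, ACCEPT },  /* a* loops, b accepts */
    /* ACCEPT */ { DEAD,  DEAD   },  /* nothing may follow the b */
    /* DEAD   */ { DEAD,  DEAD   },
};

/* dfa_match: does text match a*b exactly? One lookup per character. */
int dfa_match(const char *text)
{
    int state = START;

    for (; *text != '\0'; text++) {
        if (*text != 'a' && *text != 'b')
            return 0;                    /* outside our toy alphabet */
        state = delta[state][*text == 'b'];
    }
    return state == ACCEPT;
}

int main(void)
{
    printf("%d\n", dfa_match("aaab"));  /* 1 */
    printf("%d\n", dfa_match("b"));     /* 1: a* matches the empty string */
    printf("%d\n", dfa_match("aaaa"));  /* 0: no b, rejected in one pass */
    return 0;
}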