From mboxrd@z Thu Jan 1 00:00:00 1970
From: dot@dotat.at (Tony Finch)
Date: Mon, 1 Feb 2016 10:38:53 +0000
Subject: [TUHS] Short history of 'grep'
In-Reply-To: <20160131023700.GB7917@mercury.ccil.org>
References: <20160130030012.GB9762@minnie.tuhs.org>
 <56AD0AB7.40701@mhorton.net>
 <56AD1B28.4010908@mhorton.net>
 <20160131023700.GB7917@mercury.ccil.org>
Message-ID: 

John Cowan wrote:
> Dave Horsfall scripsit:
>
> > I'm still trying to get my head around how a program such as "egrep"
> > which handles complex patterns can be faster than one that doesn't...
> > It seems to defeat all logic :-)
> [...]
> Classic grep uses backtracking, which makes it much slower on problematic
> expressions like "a*b" where there is no b in the input. On the other
> hand, creating a deterministic automaton has higher setup costs.

Right. The relevant section in the article that started this thread says:

: Al Aho decided to put theory into practice, and implemented full regular
: expressions (including alternation and grouping, which were missing from
: grep) and wrote egrep over a weekend. Fgrep, specialised for the case of
: multiple (alternate) literal strings, was written in the same weekend.
: Egrep was about twice as fast as grep for simple character searches but
: was slower for complex search patterns (due to the high cost of building
: the state machine that recognised the patterns).

The "putting theory into practice" refers to compiling the regex to a DFA,
rather than interpreting an NFA. Russ Cox has a good summary of differing
regex implementation techniques at
https://swtch.com/~rsc/regexp/regexp1.html

This makes me wonder how well known the technique of compiling to a DFA
was at the time, and whether it was widely implemented before awk, egrep,
and lex.

Tony.
-- 
f.anthony.n.finch <dot@dotat.at>  http://dotat.at/
Fair Isle, Faeroes: Southeast 6 to gale 8, veering southwest gale 8 to
storm 10, becoming cyclonic later. Very rough, becoming high or very high.
Rain or squally showers. Moderate or poor.
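
[Editorial sketch] To make the backtracking cost concrete, here is a
minimal grep-style backtracking matcher. It is modelled on the well-known
matcher from Kernighan and Pike's "The Practice of Programming", not on
the actual grep source, so take it as an illustration of the technique
rather than the historical code. The main() builds a long run of a's with
no b, so searching for "a*b" rescans the run from every starting position
and the total work is quadratic in the input length.

#include <stdio.h>

static int matchhere(const char *re, const char *text);
static int matchstar(int c, const char *re, const char *text);

/* match: search for re anywhere in text */
int match(const char *re, const char *text)
{
    if (re[0] == '^')
        return matchhere(re + 1, text);
    do {    /* must look even if string is empty */
        if (matchhere(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* matchhere: search for re at beginning of text */
static int matchhere(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[1] == '*')
        return matchstar(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return matchhere(re + 1, text + 1);
    return 0;
}

/* matchstar: search for c*re at beginning of text,
   backtracking through every possible length of the c-run */
static int matchstar(int c, const char *re, const char *text)
{
    do {    /* a '*' matches zero or more instances */
        if (matchhere(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

int main(void)
{
    char text[10001];
    int i;

    /* 10,000 a's and no b: each of the 10,000 starting positions
       rescans the rest of the run, so roughly 10^8 steps in all */
    for (i = 0; i < 10000; i++)
        text[i] = 'a';
    text[i] = '\0';

    printf("%d\n", match("a*b", text));  /* prints 0, quadratically */
    return 0;
}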
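[Editorial sketch] For the other side of the trade-off, here is a
hand-compiled DFA for the anchored, full-string pattern "a*b" over the
two-letter alphabet {a, b}. This is only an illustration of the principle
behind egrep, not Aho's code: once the transition table exists, matching
is one array lookup per input character, linear time no matter how hard
the pattern looked, and all the cost has moved into building the table
up front (which is what made egrep slower on complex patterns).

#include <stdio.h>

enum { START, ACCEPT, DEAD, NSTATES };

/* delta[state][sym], where sym is 0 for 'a' and 1 for 'b' */
static const int delta[NSTATES][2] = {
    /* START  */ { START, ACCEPT },  /* a* loops, b accepts */
    /* ACCEPT */ { DEAD,  DEAD   },  /* nothing may follow the b */
    /* DEAD   */ { DEAD,  DEAD   },
};

/* dfa_match: does text match a*b exactly? One lookup per character. */
int dfa_match(const char *text)
{
    int state = START;

    for (; *text != '\0'; text++) {
        if (*text != 'a' && *text != 'b')
            return 0;                    /* outside our toy alphabet */
        state = delta[state][*text == 'b'];
    }
    return state == ACCEPT;
}

int main(void)
{
    printf("%d\n", dfa_match("aaab"));  /* 1 */
    printf("%d\n", dfa_match("b"));     /* 1: a* matches the empty string */
    printf("%d\n", dfa_match("aaaa"));  /* 0: no b, rejected in one pass */
    return 0;
}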