From: dot@dotat.at (Tony Finch)
Subject: [TUHS] Short history of 'grep'
Date: Mon, 1 Feb 2016 10:38:53 +0000
Message-ID: <alpine.LSU.2.00.1602011030020.21662@hermes-2.csi.cam.ac.uk>
In-Reply-To: <20160131023700.GB7917@mercury.ccil.org>
John Cowan <cowan at mercury.ccil.org> wrote:
> Dave Horsfall scripsit:
>
> > I'm still trying to get my head around how a program such as "egrep"
> > which handles complex patterns can be faster than one that doesn't... It
> > seems to defeat all logic :-)
>
[...]
> Classic grep uses backtracking, which makes it much slower on problematic
> expressions like "a*b" where there is no b in the input. On the other
> hand, creating a deterministic automaton has higher setup costs.
Right. The relevant section in the article that started this thread says:
: Al Aho decided to put theory into practice, and implemented full regular
: expressions (including alternation and grouping which were missing from
: grep) and wrote egrep over a weekend. Fgrep, specialised for the case of
: multiple (alternate) literal strings, was written in the same weekend.
: Egrep was about twice as fast as grep for simple character searches but was
: slower for complex search patterns (due to the high cost of building the
: state machine that recognised the patterns).
The "putting theory into practice" refers to compiling the regex to a DFA,
rather than interpreting an NFA.
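To make the distinction concrete, here is a rough Python sketch (mine, not
taken from egrep or from the article). It assumes a small hand-built NFA for
the textbook pattern (a|b)*abb; the names NFA, step, nfa_match, compile_dfa
and dfa_match are all made up for illustration. Interpreting the NFA
recomputes a set of live states for every input character, whereas compiling
pays once for the subset construction and then does a single table lookup
per character.

```python
from collections import deque

# Hypothetical hand-built NFA for (a|b)*abb: state 0 is the start, 3 accepts.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPT, ALPHABET = 0, 3, "ab"


def step(states, ch):
    """Advance a set of NFA states over one input character."""
    out = set()
    for s in states:
        out |= NFA.get((s, ch), set())
    return frozenset(out)


def nfa_match(text):
    """Interpret the NFA: recompute the live state set for every character."""
    states = frozenset({START})
    for ch in text:
        states = step(states, ch)
    return ACCEPT in states


def compile_dfa():
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start = frozenset({START})
    table = {}
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for ch in ALPHABET:
            nxt = step(cur, ch)
            table[(cur, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, table


def dfa_match(dfa, text):
    """Run the compiled DFA: one dictionary lookup per character."""
    start, table = dfa
    state = start
    for ch in text:
        # Characters outside ALPHABET fall into an empty (dead) state.
        state = table.get((state, ch), frozenset())
    return ACCEPT in state
```

The setup cost sits entirely in compile_dfa(), which in the worst case can
produce exponentially many DFA states; that matches the article's remark
about egrep being slower on complex patterns.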
Russ Cox has a good summary of differing regex implementation techniques
at https://swtch.com/~rsc/regexp/regexp1.html
This makes me wonder how well known the technique of compiling to a DFA
was, and whether it was widely implemented before awk, egrep, and lex.
Tony.
--
f.anthony.n.finch <dot at dotat.at> http://dotat.at/
Fair Isle, Faeroes: Southeast 6 to gale 8, veering southwest gale 8 to storm
10, becoming cyclonic later. Very rough, becoming high or very high. Rain or
squally showers. Moderate or poor.