From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
Date: Fri, 23 Feb 2007 12:33:24 -0500
From: "Russ Cox"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
In-Reply-To: <224a39ebca9aaa0370eb804cd59e6aac@plan9.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <224a39ebca9aaa0370eb804cd59e6aac@plan9.jp>
Topicbox-Message-UUID: 1376f776-ead2-11e9-9d60-3106f5b1d025

Lex has three benefits:

1) You don't have to write the lexer directly.
2) What you do have to write is fairly concise.
3) The resulting lexer is fairly efficient.

It has two main drawbacks:

4) The input model does not always match your own program's input
   model, creating a messy interface.
5) Once you need more than regular expressions, lexers written with
   state variables and such can get very opaque very fast.

Many on this list would argue that (1) and (2) do not outweigh
(4) and (5), instead suggesting that writing a lexer by hand is not
too difficult and ends up being more maintainable than a lex spec
in the long run.  And of course, for a well-written by-hand lexer,
you get to keep (3).

Creating new entry hooks in the regexp library doesn't preserve
(1), (2), or (3).  And if much of your time is spent in lexical
analysis (as Ken claimed was true for the Plan 9 compilers), losing
(3) is a big deal.  So that seems like not a very good replacement
for lex.

All that said, lex has been used to write a lot of C compilers, and
can be used in that context without running into much of (4) or (5).
Why not just use lex here?

Russ