From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
Date: Fri, 23 Feb 2007 12:33:24 -0500
From: "Russ Cox"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
In-Reply-To: <224a39ebca9aaa0370eb804cd59e6aac@plan9.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <224a39ebca9aaa0370eb804cd59e6aac@plan9.jp>
Topicbox-Message-UUID: 1376f776-ead2-11e9-9d60-3106f5b1d025

Lex has three benefits:

1) You don't have to write the lexer directly.
2) What you do have to write is fairly concise.
3) The resulting lexer is fairly efficient.

It has two main drawbacks:

4) The input model does not always match your own program's input
   model, creating a messy interface.
5) Once you need more than regular expressions, lexers written with
   state variables and such can get very opaque very fast.

Many on this list would argue that (1) and (2) do not outweigh
(4) and (5), instead suggesting that writing a lexer by hand is not
too difficult and ends up being more maintainable than a lex spec
in the long run.  And of course, for a well-written by-hand lexer,
you get to keep (3).

Creating new entry hooks in the regexp library doesn't preserve
(1), (2), or (3).  And if much of your time is spent in lexical
analysis (as Ken claimed was true for the Plan 9 compilers), losing
(3) is a big deal.  So that seems like not a very good replacement
for lex.

All that said, lex has been used to write a lot of C compilers, and
can be used in that context without running into much of (4) or (5).
Why not just use lex here?

Russ