9fans - fans of the OS Plan 9 from Bell Labs
From: "Joel C. Salomon" <joelcsalomon@gmail.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: Composition of regexps (Was re: [9fans] regular expressions in plan9 different from the ones in unix?)
Date: Fri, 23 Feb 2007 08:34:45 -0500
Message-ID: <7871fcf50702230534m4d917437ya1fda80e4579c30e@mail.gmail.com>
In-Reply-To: <20070223065409.GC22337@mero.morphisms.net>

On 2/23/07, William K. Josephson <jkw@eecs.harvard.edu> wrote:
> On Fri, Feb 23, 2007 at 01:27:56AM -0500, Joel Salomon wrote:
> > Would such a project be a worthwhile use of time?  (Might it develop
> > into the asteroid to kill the dinosaur waiting for it?)
>
> Why go to the trouble?  For C, the lexer is easy
> enough to just write by hand.

For a useful and significant subset of C, the lexer is easy enough to
just write by hand.  I was trying for full C99 (what were those ISO
guys drinking?).  I spent far too much time on it to call the task
"easy".

I have what I believe is a pretty complete C lexer
(http://www.tip9ug.jp/who/chesky/comp/lex.c).  It is still far from
being integrated into a full grammar, but it scans cpp(1) output
nicely.  I tested it against some of the odder "features" of C99 (UCNs,
hex floats, &c.) and it seems to work.
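UCNs are a good example of a C99 feature that is short to check by hand once the rules are written down.  The sketch below is hypothetical (the name scan_ucn is mine, not taken from lex.c) and assumes the lexer has already seen a backslash inside an identifier; it accepts \uXXXX or \UXXXXXXXX and applies the C99 6.4.3 restriction against code points below U+00A0 (other than $, @ and `) and against surrogates.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of characters consumed by a
   universal character name at s (6 for \uXXXX, 10 for \UXXXXXXXX), or 0
   if s does not start with a well-formed, permitted UCN.  On success the
   code point is stored in *cp. */
size_t scan_ucn(const char *s, unsigned long *cp)
{
    int ndigits, i;
    unsigned long v = 0;

    if (s[0] != '\\')
        return 0;
    if (s[1] == 'u')
        ndigits = 4;
    else if (s[1] == 'U')
        ndigits = 8;
    else
        return 0;

    /* isxdigit('\0') is false, so a short string stops here safely */
    for (i = 0; i < ndigits; i++) {
        int c = (unsigned char)s[2 + i];
        if (!isxdigit(c))
            return 0;
        v = v * 16 + (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }

    /* C99 6.4.3: nothing below U+00A0 except $, @ and `; no surrogates */
    if ((v < 0xA0 && v != 0x24 && v != 0x40 && v != 0x60) ||
        (v >= 0xD800 && v <= 0xDFFF))
        return 0;

    *cp = v;
    return (size_t)(2 + ndigits);
}
```

The range check is the part that is easy to forget: "\u0041" is well-formed lexically but names 'A', which C99 does not allow as a UCN.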

Some parts were easy, some less so, and some looked easy until they
turned out to be subtly wrong.  Deciding whether a numeric token is an
integer constant (decimal, octal, or hex) or a floating constant was
one of the hard parts, and one I gladly handed off to a regexp.  The
way I generated the regexp may not be ideal, as someone pointed out to
me off-list, but hand-written code that classifies the number would be
exactly equivalent to the regexp, and less readable.
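To make the classification concrete, here is a minimal sketch under stated assumptions: it uses POSIX regcomp/regexec from <regex.h> (not Plan 9's regexp(2), and not the regexp actually used in lex.c), it assumes the lexer has already isolated the numeric token, and the names classify and enum numkind are hypothetical.  Each pattern is anchored, so a token such as "08" or "0x1.8" (a hex float missing its mandatory p exponent) matches nothing and is reported as invalid.

```c
#include <regex.h>
#include <stddef.h>

enum numkind { NUM_INVALID, NUM_DECIMAL, NUM_OCTAL, NUM_HEX, NUM_FLOAT };

/* Integer suffix: u/U optionally followed by l, L, ll or LL, or the reverse */
#define ISUFFIX "([uU](l|L|ll|LL)?|(l|L|ll|LL)[uU]?)?"

static const struct {
    enum numkind kind;
    const char *pattern;   /* POSIX extended regular expression */
} rules[] = {
    /* decimal floating constant: fraction with optional exponent,
       or a digit sequence with a mandatory exponent */
    { NUM_FLOAT, "^(([0-9]*\\.[0-9]+|[0-9]+\\.)([eE][+-]?[0-9]+)?"
                 "|[0-9]+[eE][+-]?[0-9]+)[fFlL]?$" },
    /* hexadecimal floating constant: the binary exponent is required */
    { NUM_FLOAT, "^0[xX]([0-9a-fA-F]*\\.[0-9a-fA-F]+|[0-9a-fA-F]+\\.?)"
                 "[pP][+-]?[0-9]+[fFlL]?$" },
    { NUM_HEX,     "^0[xX][0-9a-fA-F]+" ISUFFIX "$" },
    { NUM_OCTAL,   "^0[0-7]*" ISUFFIX "$" },   /* a lone 0 is octal in C */
    { NUM_DECIMAL, "^[1-9][0-9]*" ISUFFIX "$" },
};

/* Hypothetical classifier: returns the kind of the first rule that
   matches the whole token, or NUM_INVALID if none does. */
enum numkind classify(const char *tok)
{
    size_t i;

    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        regex_t re;
        int matched;

        if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
            return NUM_INVALID;
        matched = regexec(&re, tok, 0, NULL, 0) == 0;
        regfree(&re);
        if (matched)
            return rules[i].kind;
    }
    return NUM_INVALID;
}
```

Compiling the patterns on every call keeps the sketch short; a real lexer would compile them once at startup.  Equivalent hand-written code would need a small state machine covering the same cases, which is the readability trade-off described above.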

--Joel

