caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Alain Frisch <Alain.Frisch@ens.fr>
To: skaller <skaller@users.sourceforge.net>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Need NFA/DFA conversion help
Date: Wed, 15 Sep 2004 21:15:28 +0200 (MEST)	[thread overview]
Message-ID: <Pine.SOL.4.44.0409152045090.29154-100000@clipper.ens.fr> (raw)
In-Reply-To: <20040915154137.GB24717@yquem.inria.fr>

> > I need some help understanding how to convert
> > an NFA with *labelled* arcs into an equivalent DFA.
>
> As regards ocamllex, I worked from various recent publications by
> Ville Larikari.

Lexer generators (and regexp packages) are usually pretty weak for what
concerns extraction; you can only extract very unstructured information
(a finite number of groups), and the semantics of capture groups
below Kleene stars is not always satisfactory.

The algorithm presented in this paper (sorry for citing myself):

Greedy regular expression matching. A. Frisch, and L. Cardelli. In ICALP 2004
http://www.cduce.org/papers/icalp04.pdf
(or the older and extended version: http://www.cduce.org/papers/greedy.pdf)

makes it possible to write a lexer generator (and a regexp package) with a
more structured semantics, where, for instance, a capture inside a star
returns a list of bindings, a capture below a ? returns a(n?) 'a option.
The (quite straightforward) algorithm in the paper returns a parse tree
with a specified disambiguation policy, using two linear traversals of the
input word. The first traversal is just running a very standard automaton
(without marks), keeping the intermediate states for each position;  the
second traversal is driven by the syntax of the regexp and builds the
parse tree. Let me know (off-list) if you are interested in implementing
this kind of structured capture semantics.


-- Alain

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


  reply	other threads:[~2004-09-15 19:15 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-09-15 15:05 skaller
2004-09-15 15:41 ` Luc Maranget
2004-09-15 19:15   ` Alain Frisch [this message]
2004-09-15 17:50 ` Jason Hickey
2004-09-15 20:50 ` Jason Hickey
2004-09-15 23:35 ` skaller
2004-09-16 11:50   ` Diego Olivier Fernandez Pons
2004-09-16 12:07     ` Luc Maranget
2004-09-16 12:37       ` james woodyatt
2004-09-16 13:29       ` skaller
2004-09-16 12:57     ` skaller
2004-09-16 13:04       ` Diego Olivier Fernandez Pons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.SOL.4.44.0409152045090.29154-100000@clipper.ens.fr \
    --to=alain.frisch@ens.fr \
    --cc=caml-list@inria.fr \
    --cc=skaller@users.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).