caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: skaller <skaller@users.sourceforge.net>
To: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Need NFA/DFA conversion help
Date: 16 Sep 2004 09:35:41 +1000	[thread overview]
Message-ID: <1095291341.27775.1164.camel@pelican.wigram> (raw)
In-Reply-To: <1095260758.27775.1047.camel@pelican.wigram>

On Thu, 2004-09-16 at 01:05, skaller wrote:
> I need some help understanding how to convert
> an NFA with *labelled* arcs into an equivalent DFA.

Thx to those who responded. Just for motivation:

> Ocamllex functionality is already available,

This already works:

(* test matcher *)

(* tokens *)
type codes_t = [
  | `Word
  | `Ident
  | `Num
  | `White
  | `Dus
]

let string_of_code = function 
  | `Word -> "Word"
  | `Ident -> "Ident"
  | `Num -> "Num"
  | `White -> "White"
  | `Dus -> "Dus"
  | `Reserved -> "Reserved"

let string_of_result f = function
  | `Coded xs -> String.concat ", " (List.map f xs)
  | `Error _ -> "Match failure"

(* regexps *)
let digit = regexp_of_charset "0123456789"
let lower = regexp_of_inclusive_char_range 'a' 'z'
let upper = regexp_of_inclusive_char_range 'A' 'Z'
let letter = alt upper lower
let underscore = regexp_of_char '_'
let word = many letter
let idstart = alt letter underscore
let idrest = alt idstart digit
let space = regexp_of_char ' '
let white = many space
let ident = seq idstart (aster idrest)
let num = many digit
let us2 = seq underscore underscore

(* associate regexps with tokens *)
let tk_word = seq word (code `Word)
let tk_ident = seq ident (code `Ident)
let tk_num = seq num (code `Num)
let tk_white = seq white (code `White)
let tk_us2 = seq (seq us2 (code `Dus))

(* gather the alternatives *)
let re = alts [tk_word; tk_ident; tk_num; tk_white; tk_us2]
let alphabet, states, codes, matrix = process_regexp re;;

(* print all the token associated with command line argument *)
let data = Sys.argv.(1) ;;
let matcher = make_matcher matrix codes data;;
let result = recognize matcher;;
print_endline ("Result: " ^ string_of_result string_of_code result);;

---------------------------

With some minor fiddling this is a tokeniser
like Ocamllex, except you don't need any external tool 
or library, you can define everything 'in-caml' 
and dynamically, as shown.  (The core of the matcher
engine is also purely functional :)

Then you can either process some data,
or you can save the compiled automaton (possibly
as Ocaml or C sourcecode which can be compiled directly
to executable code like (Ocaml)lex output).

What doesn't work is sub-group matching,
look ahead, etc -- things requiring actions
in the middle of a regexp (rather than at the end).

If that could be made to work, we'd have a nice
pure Ocaml only library that could replace Ocamllex,
Str, and PCRE all at once -- and also allow a lot
of other things to be built on top, such as an RTN parser.

[It turns out polymorphic variants are quite useful
in extending the regular expression ASTs, as well
as being used as tokens.]

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


  parent reply	other threads:[~2004-09-15 23:35 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-09-15 15:05 skaller
2004-09-15 15:41 ` Luc Maranget
2004-09-15 19:15   ` Alain Frisch
2004-09-15 17:50 ` Jason Hickey
2004-09-15 20:50 ` Jason Hickey
2004-09-15 23:35 ` skaller [this message]
2004-09-16 11:50   ` Diego Olivier Fernandez Pons
2004-09-16 12:07     ` Luc Maranget
2004-09-16 12:37       ` james woodyatt
2004-09-16 13:29       ` skaller
2004-09-16 12:57     ` skaller
2004-09-16 13:04       ` Diego Olivier Fernandez Pons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1095291341.27775.1164.camel@pelican.wigram \
    --to=skaller@users.sourceforge.net \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).