caml-list - the Caml user's mailing list
* mixing lexers with camlp4
From: Pietro Abate @ 2007-02-02  1:40 UTC
  To: ocaml ml

Hi all,
I want to parse a language like this one:
l := l & l | l % l | Id

where the symbols &, %, ... are almost arbitrary.
This is my first step toward the idea of extending the camlp4 language on
the fly. For the moment I'm only parsing the language; later I'll add the
actions that extend the grammar. For now I'm happy to return a list
of type stype.
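
To make the target concrete, here is a minimal sketch, without camlp4, of the
result expected for the sample grammar above. It assumes that tokens are
separated by blanks and that "|" is reserved as the rule separator; the names
tokenize, split_rules and parse_gram are illustrative only and are not part of
camlp4 or of the extension below.

```ocaml
(* Sketch only: a hand-rolled reader for "l := l & l | l % l | Id".
   Assumption: tokens are separated by blanks, "|" separates rules. *)
type stype = Lid | Symbol of string

(* Split the input on blanks, dropping empty tokens. *)
let tokenize (s : string) : string list =
  String.split_on_char ' ' s |> List.filter (fun t -> t <> "")

(* "Id" is the only reserved word; any other token is kept as a symbol. *)
let psymbol = function
  | "Id" -> Lid
  | t -> Symbol t

(* Cut a token list into alternatives at each "|" token. *)
let split_rules (toks : string list) : string list list =
  let rec go current acc = function
    | [] -> List.rev (List.rev current :: acc)
    | "|" :: rest -> go [] (List.rev current :: acc) rest
    | t :: rest -> go (t :: current) acc rest
  in
  go [] [] toks

(* Parse "name := alt | alt | ..." into (name, alternatives).
   Unlike LIST1, this sketch does not reject empty alternatives. *)
let parse_gram (s : string) : string * stype list list =
  match tokenize s with
  | name :: ":=" :: body ->
      (name, List.map (List.map psymbol) (split_rules body))
  | _ -> failwith "expected: <name> := <rules>"
```

With these definitions, parse_gram "l := l & l | l % l | Id" pairs the name
"l" with three alternatives, the last of which is [Lid].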

I've written the following camlp4 extension:

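type stype = Lid | Symbol of string ;;
(* true if the regexp re matches a prefix of s *)
let (=~) s re = Str.string_match (Str.regexp re) s 0;;
(* identifier patterns are regexps; the symbols are quoted so that
   "*", "?", "[" and "]" are matched literally *)
let tok = ["[a-z][A-Z]*[a-z]*";"[A-Z][A-Z]*[a-z]*"] @
          List.map Str.quote ["%";"&";"*";"?";"~";"[";"]";"<";">"] ;;
let symbex s = List.exists (fun e -> s =~ e) tok ;;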

let grammar = Grammar.gcreate (Plexer.gmake ());;
let symbol strm =
    match Stream.peek strm with
    | Some(_,s) when (symbex s) -> Stream.junk strm; s
    | _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser grammar "symbol" symbol ;;
let gram_list = Grammar.Entry.create grammar "gram_list";;

EXTEND
GLOBAL: gram_list;

gram_list: [[ grams = LIST1 gram; EOI -> grams ]];

gram: [[ p = LIDENT; ":="; rules = LIST1 rule SEP "|" -> (p,rules) ]];

rule: [[ psl = LIST1 psymbol -> psl ]];

psymbol: [[
     "Id" -> Lid
    | e = symbol -> Symbol(e)
]];
END
;;

Now my problem is with the production symbol, which I'd like to parse not with
the standard camlp4 lexer, but with one of my own. This is because I want to
allow almost arbitrary symbols in my language and the Plexer is too
restrictive. My solution above works but it's very clumsy. The easiest way I
can think of is to use the Genlex module, so as to have something like:

let lexer = Genlex.make_lexer [
    "+";"-";"*";"/";"=";
    "[";"]";"<";">";
    "%";"&";"*";"?";"~"
];;

let symbgrammar = Grammar.gcreate (lexer);;
let symbol strm =
    match Stream.peek strm with
    | Some (Genlex.Kwd s) -> Stream.junk strm; s
    | Some (Genlex.Ident i) -> ....
    ......... 
    | _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser symbgrammar "symbol" symbol ;;

Of course the Genlex module is not immediately compatible with the Plexer
interface (Grammar.gcreate does not accept a Genlex lexer as is), so I'm a
bit lost...

- Is this the best way of doing it?

- How can I make the Genlex module compatible with the Plexer
  interface? (Is there an example?)

- Does camlp4 allow me to mix lexers for different productions in the same
  extension?

I believe this kind of thing is going to be much easier with the new
camlp4 version...

:)
p

-- 
++ Blog: http://blog.rsise.anu.edu.au/?q=pietro
++ 
++ "All great truths begin as blasphemies." -George Bernard Shaw
++ Please avoid sending me Word or PowerPoint attachments.
   See http://www.fsf.org/philosophy/no-word-attachments.html

