caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* mixing lexers with camlp4
@ 2007-02-02  1:40 Pietro Abate
  2007-02-02  6:26 ` [Caml-list] " Pietro Abate
  0 siblings, 1 reply; 3+ messages in thread
From: Pietro Abate @ 2007-02-02  1:40 UTC (permalink / raw)
  To: ocaml ml

Hi all,
I want to parsa a language like this one:
l := l & l | l % l | Id

where the symbols & , % , ... are almost arbitrary.
This my first step toward the idea of expanding the camlp4 language on
the fly.  So for the moment I'm parsing the language, then I'll add the
actions to extend the grammar. For the moment I'm happy to return a list
of type stype.

I've written the following camlp4 extension:

type stype = Lid | Symbol of string ;;
let (=~) s re = Str.string_match (Str.regexp re) s 0;;
let tok = ["[a-z][A-Z]*[a-z]*";"[A-Z][A-Z]*[a-z]*";
           "%";"&";"*";"?";"~";"[";"]";"<";">"] ;;
let symbex s = List.exists (fun e -> s =~ e) tok ;;

let grammar = Grammar.gcreate (Plexer.gmake ());;
let symbol strm =
    match Stream.peek strm with
    | Some(_,s) when (symbex s) -> Stream.junk strm; s
    | _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser grammar "symbol" symbol ;;
let gram_list = Grammar.Entry.create grammar "gram_list";;

EXTEND
GLOBAL: gram_list;

gram_list: [[ grams = LIST1 gram; EOI -> grams ]];

gram: [[ p = LIDENT; ":="; rules = LIST1 rule SEP "|" -> (p,rules) ]];

rule: [[ psl = LIST1 psymbol -> psl ]];

psymbol: [[
     "Id" -> Lid
    | e = symbol -> Symbol(e)
]];
END
;;

now my problem is with the production symbol, that I'd like to parse not using
the standard camlp4 lexer, but one of my own. This is because I want to allow
almost arbitrary symbols in my language and the Plexer is to restrictive. My
solution above works but it's very clumsy. The easiest way I can think of is
to use the Genlex module. So to have something like:

let lexer = Genlex.make_lexer [
    "+";"-";"*";"/";"=";
    "[";"]";"<";">";
    "%";"&";"*";"?";"~"
];;

let symbgrammar = Grammar.gcreate (lexer);;
let symbol strm =
    |Kwd s -> Stream.junk strm; s
    |Ident i -> ....
    ......... 
    | _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser symbgrammar "symbol" symbol ;;

of course the Genlex module is not immediately compatible with the Plexer
interface so I'm a bit lost...

- Is this the best way of doing it ?

- How can I make the Genlex module compatible with the Plexer 
  interface (example ?) ?

- Does camlp4 allows me to mix lexers for different productions in the same
  extension ?

I believe this kind of things are going to be much easier with the new
camlp4 version...

:)
p

-- 
++ Blog: http://blog.rsise.anu.edu.au/?q=pietro
++ 
++ "All great truths begin as blasphemies." -George Bernard Shaw
++ Please avoid sending me Word or PowerPoint attachments.
   See http://www.fsf.org/philosophy/no-word-attachments.html


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Caml-list] mixing lexers with camlp4
  2007-02-02  1:40 mixing lexers with camlp4 Pietro Abate
@ 2007-02-02  6:26 ` Pietro Abate
  2007-02-04 23:41   ` Pietro Abate
  0 siblings, 1 reply; 3+ messages in thread
From: Pietro Abate @ 2007-02-02  6:26 UTC (permalink / raw)
  To: caml-list, ocaml ml

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=unknown-8bit, Size: 3740 bytes --]

In the best traditions, I partially answer to myself (below) but I've a new
question:

> - Does camlp4 allows me to mix lexers for different productions in the same
>   extension ?
well, it seems it doesn't. Now I get this error:

Error: entries "psymbol" and "symbol" do not belong to the same grammar.
Fatal error: exception Failure("Grammar.extend error")

- Is there a deep reason why I cannot mix different grammars ?
- Is there a way of forcing this behaviour ?

On Fri, Feb 02, 2007 at 12:40:11PM +1100, Pietro Abate wrote:
> Hi all,
> I want to parsa a language like this one:
> l := l & l | l % l | Id
[...]
> of course the Genlex module is not immediately compatible with the Plexer
> interface so I'm a bit lost...
> 
> - Is this the best way of doing it ?
don't know, maybe not.

> - How can I make the Genlex module compatible with the Plexer 
>   interface (example ?) ?
This should do the job (I think) even if ignore the location...

open Genlex
let lexer = Genlex.make_lexer [
    "+";"-";"*";"/";"=";
    "[";"]";"<";">";
    "%";"&";"*";"?";"~"
];;
let getkwd = function Kwd s -> s | _ -> failwith "aa" ;;
let rec glexer = parser
    [< 'Kwd ("+" | "-" | "*" | "/"
            |"=" | "[" | "]" | "<"
            |">" | "%" | "&" | "?" | "~" ) as s >] -> ("", getkwd s)
    | [< 'Ident s >] -> ("LIDENT",s)
    | [< >] -> ("EOI","")
;;
let lexer_gmake () = {
    Token.tok_func =
    Token.lexer_func_of_parser (fun s -> (glexer (lexer s), Token.dummy_loc));
    Token.tok_using = (fun _ -> ());
    Token.tok_removing = (fun _ -> ());
    Token.tok_match = Token.default_match;
    Token.tok_text = Token.lexer_text;
    Token.tok_comm = None
}
;;

The full code of my example:

to compile:
#> camlp4o pa_extend.cmo pr_o.cmo pa_test.ml >> test.ml
#> ocamlfind ocamlc -package camlp4 camlp4.cma str.cma test.ml 

------------ pa_test.ml ------------
open Genlex
type stype = Lid | Symbol of string ;;

let lexer = Genlex.make_lexer [
    "+";"-";"*";"/";"=";
    "[";"]";"<";">";
    "%";"&";"*";"?";"~"
];;
let getkwd = function Kwd s -> s | _ -> failwith "fail getkwd" ;;
let rec glexer = parser
    [< 'Kwd ("+" | "-" | "*" | "/"
            |"=" | "[" | "]" | "<"
            |">" | "%" | "&" | "?" | "~" ) as s >] -> ("", getkwd s)
    | [< 'Ident s >] -> ("LIDENT",s)
    | [< >] -> ("EOI","")
;;
let lexer_gmake () = {
    Token.tok_func =
    Token.lexer_func_of_parser (fun s -> (glexer (lexer s), Token.dummy_loc));
    Token.tok_using = (fun _ -> ());
    Token.tok_removing = (fun _ -> ());
    Token.tok_match = Token.default_match;
    Token.tok_text = Token.lexer_text;
    Token.tok_comm = None
}
;;

let symbgrammar = Grammar.gcreate (lexer_gmake ());;
let symbol strm =
    match Stream.peek strm with
    |Some("",s) -> Stream.junk strm; s
    |Some("LINDENT",s) -> Stream.junk strm; s
    | _ -> raise Stream.Failure
;;
let symbol = Grammar.Entry.of_parser symbgrammar "symbol" symbol ;;
let grammar = Grammar.gcreate (Plexer.gmake ());;
let gram_list = Grammar.Entry.create grammar "gram_list";;

EXTEND
GLOBAL: gram_list;

gram_list: [[ grams = LIST1 gram; EOI -> grams ]];

gram: [[ p = LIDENT; ":="; rules = LIST1 rule SEP "|" -> (p,rules) ]];

rule: [[ psl = LIST1 psymbol -> psl ]];

psymbol: [[
     "VAR" -> Lid
    | e = symbol -> Symbol(e)
]];

END
;;

let apply s = Grammar.Entry.parse gram_list (Stream.of_string s);;
(apply "l := VAR");;
(apply "l := VAR & VAR");;
(apply "l := VAR U VAR");;

Je vous remercie énormément pour votre aide.

:)
p


-- 
++ Blog: http://blog.rsise.anu.edu.au/?q=pietro
++ 
++ "All great truths begin as blasphemies." -George Bernard Shaw
++ Please avoid sending me Word or PowerPoint attachments.
   See http://www.fsf.org/philosophy/no-word-attachments.html


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Caml-list] mixing lexers with camlp4
  2007-02-02  6:26 ` [Caml-list] " Pietro Abate
@ 2007-02-04 23:41   ` Pietro Abate
  0 siblings, 0 replies; 3+ messages in thread
From: Pietro Abate @ 2007-02-04 23:41 UTC (permalink / raw)
  To: caml-list, ocaml ml; +Cc: Pietro Abate

On Fri, Feb 02, 2007 at 05:26:06PM +1100, Pietro Abate wrote:
> well, it seems it doesn't. Now I get this error:
> Error: entries "psymbol" and "symbol" do not belong to the same grammar.
> Fatal error: exception Failure("Grammar.extend error")
> - Is there a way of forcing this behaviour ?
I don't think this is possible as the type "grammar 'te" in gramext.ml
is not exposed in the mli and there are no functions to modify its
value. In particular I don't think gram_reinit and reinit_gram are of
any use in this case.

Ok, I stop here with this problem.

p

-- 
++ Blog: http://blog.rsise.anu.edu.au/?q=pietro
++ 
++ "All great truths begin as blasphemies." -George Bernard Shaw
++ Please avoid sending me Word or PowerPoint attachments.
   See http://www.fsf.org/philosophy/no-word-attachments.html


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-02-04 23:37 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-02  1:40 mixing lexers with camlp4 Pietro Abate
2007-02-02  6:26 ` [Caml-list] " Pietro Abate
2007-02-04 23:41   ` Pietro Abate

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).