caml-list - the Caml user's mailing list
* camlp4 stream parser syntax
@ 2009-03-07 22:38 Joel Reymont
  2009-03-07 22:52 ` Joel Reymont
  2009-03-07 23:52 ` [Caml-list] " Jon Harrop
  0 siblings, 2 replies; 36+ messages in thread
From: Joel Reymont @ 2009-03-07 22:38 UTC
  To: O'Caml Mailing List

Where can I read up on the syntax of the following in a camlp4 stream  
parser?

   | [<' INT n >] -> Int n

For example, where are [< ... >] described and why is the ' needed in  
between?

	Thanks, Joel

---
http://tinyco.de
Mac, C++, OCaml





* Re: camlp4 stream parser syntax
  2009-03-07 22:38 camlp4 stream parser syntax Joel Reymont
@ 2009-03-07 22:52 ` Joel Reymont
  2009-03-07 23:21   ` Re : [Caml-list] " Matthieu Wipliez
  2009-03-07 23:52 ` [Caml-list] " Jon Harrop
  1 sibling, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-07 22:52 UTC
  To: O'Caml Mailing List

> Where can I read up on the syntax of the following in a camlp4  
> stream parser?
>
>  | [<' INT n >] -> Int n
>
> For example, where are [< ... >] described and why is the ' needed  
> in between?


To be more precise, I'm using camlp4 to parse a language into a non- 
OCaml AST.

I'm trying to figure out the meaning of [<, >], [[ and ]]

My ocamllex lexer is wrapped to make it look like a stream lexer  
(below) and I'm returning a tuple of (tok, loc) because I don't see  
another way of making token location available to the parser.

Still, I'm not sure how to integrate error-location reporting into ?? in
something like this:

  | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'"  
 >] -> e

Would someone kindly shed light on this?
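
One possible way to get a location into the "expected ')'" message, sketched in plain OCaml rather than in the stream parser syntax (so it runs without camlp4). This is only an illustration under stated assumptions: the types loc, token, expr and stream, the exception Located_error and the helpers peek, junk, here and expect are all made-up names, not part of camlp4 or of the code above. The idea is to keep each token paired with its location, as the (tok, loc) stream does, and to blame the token that is actually found where ')' was expected.

```ocaml
(* Plain OCaml sketch; every name here is hypothetical. *)
type loc = { line : int; col : int }
type token = Kwd of char | Int of int
type expr = Num of int

(* An error message together with the place it refers to. *)
exception Located_error of loc * string

(* Tokens paired with locations, plus the location of the last token consumed. *)
type stream = { mutable toks : (token * loc) list; mutable last : loc }

let peek s = match s.toks with [] -> None | t :: _ -> Some t

let junk s =
  match s.toks with
  | [] -> ()
  | (_, l) :: rest -> s.last <- l; s.toks <- rest

(* Location to blame: the next token if any, else the last one consumed. *)
let here s = match peek s with Some (_, l) -> l | None -> s.last

(* Consume [tok] or fail with [msg] at the current location. *)
let expect s tok msg =
  match peek s with
  | Some (t, _) when t = tok -> junk s
  | _ -> raise (Located_error (here s, msg))

let rec parse_expr s =
  match peek s with
  | Some (Int n, _) -> junk s; Num n
  | Some (Kwd '(', _) ->
      junk s;
      let e = parse_expr s in
      expect s (Kwd ')') "expected ')'";
      e
  | _ -> raise (Located_error (here s, "expected an expression"))
```

A caller would catch Located_error and print the line and column next to the message. Remembering the last consumed location is a design choice so that an error at end of input still has something to point at.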

	Thanks in advance, Joel

P.S. ocamllex wrapper to return an 'a Stream.t

{
let from_lexbuf tab lb =
   let next _ =
     let tok = token tab lb in
     let loc = Loc.of_lexbuf lb in
     Some (tok, loc)
   in Stream.from next

let setup_loc lb loc =
   let start_pos = Loc.start_pos loc in
   lb.lex_abs_pos <- start_pos.pos_cnum;
   lb.lex_curr_p  <- start_pos

let from_string loc tab str =
   let lb = Lexing.from_string str in
   setup_loc lb loc;
   from_lexbuf tab lb

}

---
http://tinyco.de
Mac, C++, OCaml





* Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-07 22:52 ` Joel Reymont
@ 2009-03-07 23:21   ` Matthieu Wipliez
  2009-03-07 23:42     ` Joel Reymont
  2009-03-08  0:40     ` Joel Reymont
  0 siblings, 2 replies; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-07 23:21 UTC
  To: O'Caml Mailing List

[-- Attachment #1: Type: text/plain, Size: 3737 bytes --]

Hi Joel,

Why are you using stream parsers instead of Camlp4 grammars?
This:

> let rec parse_primary = parser
> 
>   | [< 'INT n >] -> Int n
>   | [< 'FLOAT n >] -> Float n
>   | [< 'STRING n >] -> Str n
>   | [< 'TRUE >] -> Bool true
>   | [< 'FALSE >] -> Bool false
> 
>   | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")

could be written as:
expression: [
  [ (i, _) = INT -> Int i
  | (s, _) = STRING -> Str s
  ... ]
];

Note that Camlp4 will automatically raise an exception if the input cannot be parsed with the grammar given.

Also if you have input that is syntactically correct but is not semantically correct, and you want to raise an exception with the error location during parsing, you might want to use Loc.raise as follows:
expression: [
  [ e1 = SELF; "/"; e2 = SELF ->
    if e2 = Int 0 then
      Loc.raise _loc (Failure "division by zero")
    else
      BinaryOp (e1, Div, e2) ]
];
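
The same idea can be sketched outside Camlp4, in plain OCaml, to show what a located exception amounts to. This is a rough illustration, not Camlp4's implementation: the record loc, the functions raise_at, mk_div and report, and the local exception Exc_located are assumptions made for the example (as far as I know, Camlp4's Loc.raise wraps the exception in a similar Loc.Exc_located, which the local exception only imitates).

```ocaml
(* Plain OCaml; every name here is made up for the illustration. *)
type loc = { file : string; line : int; col : int }

(* Stand-in for an exception that carries a source location. *)
exception Exc_located of loc * exn

let raise_at loc exn = raise (Exc_located (loc, exn))

type expr = Int of int | Div of expr * expr

(* Semantic action for "e1 / e2": reject a literal zero divisor. *)
let mk_div loc e1 e2 =
  if e2 = Int 0 then raise_at loc (Failure "division by zero")
  else Div (e1, e2)

(* Turn a located failure into a "file:line:col: message" string. *)
let report f =
  try Ok (f ()) with
  | Exc_located (l, Failure msg) ->
      Error (Printf.sprintf "%s:%d:%d: %s" l.file l.line l.col msg)
```

The point of wrapping is that the handler at the top of the program gets both the original exception and the place in the input that triggered it.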

By the way, do you need your own tailor-made lexer? Camlp4 provides one that might satisfy your needs.
Otherwise, you can always define your own lexer (I had to do that for the project I'm working on, see file attached).

Your parser would then look like

(* functor application *)
module Camlp4Loc = Camlp4.Struct.Loc
module Lexer = Cal_lexer.Make(Camlp4Loc)
module Gram = Camlp4.Struct.Grammar.Static.Make(Lexer)

(* exposes EOI and other stuff *)
open Lexer

(* rule definition *)
let rule = Gram.Entry.mk "rule"

(* grammar definition *)
EXTEND Gram
  rule: [ [ ... ] ];
END

(* to parse a file *)
Gram.parse rule (Loc.mk file) (Stream.of_channel ch)


This should be compiled with camlp4of.

I hope this helps you with what you'd like to do,

Cheers,

Matthieu


----- Original message ----
> From: Joel Reymont <joelr1@gmail.com>
> To: O'Caml Mailing List <caml-list@yquem.inria.fr>
> Sent: Saturday, 7 March 2009, 23:52:52
> Subject: [Caml-list] Re: camlp4 stream parser syntax
> 
> > Where can I read up on the syntax of the following in a camlp4 stream parser?
> > 
> >  | [<' INT n >] -> Int n
> > 
> > For example, where are [< ... >] described and why is the ' needed in between?
> 
> 
> To be more precise, I'm using camlp4 to parse a language into a non-OCaml AST.
> 
> I'm trying to figure out the meaning of [<, >], [[ and ]]
> 
> My ocamllex lexer is wrapped to make it look like a stream lexer (below) and I'm 
> returning a tuple of (tok, loc) because I don't see another way of making token 
> location available to the parser.
> 
> Still, I'm not sure how to integrate error-location reporting into ?? in
> something like this:
> 
> | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
> 
> Would someone kindly shed light on this?
> 
>     Thanks in advance, Joel
> 
> P.S. ocamllex wrapper to return an 'a Stream.t
> 
> {
> let from_lexbuf tab lb =
>   let next _ =
>     let tok = token tab lb in
>     let loc = Loc.of_lexbuf lb in
>     Some (tok, loc)
>   in Stream.from next
> 
> let setup_loc lb loc =
>   let start_pos = Loc.start_pos loc in
>   lb.lex_abs_pos <- start_pos.pos_cnum;
>   lb.lex_curr_p  <- start_pos
> 
> let from_string loc tab str =
>   let lb = Lexing.from_string str in
>   setup_loc lb loc;
>   from_lexbuf tab lb
> 
> }
> 
> ---
> http://tinyco.de
> Mac, C++, OCaml
> 
> 
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs



      

[-- Attachment #2: cal_lexer.mll --]
[-- Type: application/octet-stream, Size: 10564 bytes --]

(*****************************************************************************)
(* Cal2C                                                                     *)
(* Copyright (c) 2007-2008, IETR/INSA of Rennes.                             *)
(* All rights reserved.                                                      *)
(*                                                                           *)
(* This software is governed by the CeCILL-B license under French law and    *)
(* abiding by the rules of distribution of free software. You can  use,      *)
(* modify and/ or redistribute the software under the terms of the CeCILL-B  *)
(* license as circulated by CEA, CNRS and INRIA at the following URL         *)
(* "http://www.cecill.info".                                                 *)
(*                                                                           *)
(* Matthieu WIPLIEZ <Matthieu.Wipliez@insa-rennes.fr>                        *)
(*****************************************************************************)

(* File cal_lexer.mll *)
{
open Printf
open Format

module Make (Loc : Camlp4.Sig.Loc) = struct
  module Loc = Loc

  type token =
    | KEYWORD of string
    | SYMBOL  of string
    | IDENT   of string
    | INT     of int * string
    | FLOAT   of float * string
    | CHAR    of char * string
    | STRING  of string * string
    | EOI
	
	module Token = struct
		module Loc = Loc
	
    type t = token

	  let to_string =
	    function
	      KEYWORD s     -> sprintf "KEYWORD %S" s
	    | SYMBOL s      -> sprintf "SYMBOL %S" s
	    | IDENT s       -> sprintf "IDENT %S" s
	    | INT (_, s)    -> sprintf "INT %s" s
	    | FLOAT (_, s)  -> sprintf "FLOAT %s" s
	    | CHAR (_, s)   -> sprintf "CHAR '%s'" s
	    | STRING (_, s) -> sprintf "STRING \"%s\"" s
	                      (* here it's not %S since the string is already escaped *)
	    | EOI           -> sprintf "EOI"

	  let print ppf x = pp_print_string ppf (to_string x)
	
	  let match_keyword kwd = function
	      KEYWORD kwd' when kwd = kwd' -> true
	    | _ -> false
	
	  let extract_string =
	    function
	      KEYWORD s
			| IDENT s
			| INT (_, s)
			| FLOAT (_, s)
			| CHAR (_, s)
			| STRING (_, s) -> s
	    | tok ->
	        invalid_arg ("Cannot extract a string from this token: "^
	                     to_string tok)
	
	  module Error = struct
	    type t =
	        Illegal_token of string
	      | Keyword_as_label of string
	      | Illegal_token_pattern of string * string
	      | Illegal_constructor of string
	
	    exception E of t
	
	    let print ppf =
	      function
	        Illegal_token s ->
	          fprintf ppf "Illegal token (%s)" s
	      | Keyword_as_label kwd ->
	          fprintf ppf "`%s' is a keyword, it cannot be used as label name" kwd
	      | Illegal_token_pattern (p_con, p_prm) ->
	          fprintf ppf "Illegal token pattern: %s %S" p_con p_prm
	      | Illegal_constructor con ->
	          fprintf ppf "Illegal constructor %S" con
	
	    let to_string x =
	      let b = Buffer.create 50 in
	      let () = bprintf b "%a" print x in Buffer.contents b
	  end
		
	  module M = Camlp4.ErrorHandler.Register(Error)
	
	  module Filter = struct
	    type token_filter = (t, Loc.t) Camlp4.Sig.stream_filter
			
	    type t =
	      { is_kwd : string -> bool;
	        mutable filter : token_filter }
	
	    let mk is_kwd =
	      { is_kwd = is_kwd;
	        filter = fun s -> s }
	
	    let keyword_conversion tok is_kwd =
        match tok with
          SYMBOL s | IDENT s when is_kwd s -> KEYWORD s
        | _ -> tok
	
	    let filter x =
	      let f tok loc =
          let tok' = keyword_conversion tok x.is_kwd in
	        (tok', loc)
	      in
	      let rec filter =
	        parser
	        | [< '(tok, loc); s >] -> [< ' f tok loc; filter s >]
	        | [< >] -> [< >]
	      in
	      fun strm -> x.filter (filter strm)
	
	    let define_filter x f = x.filter <- f x.filter
	
	    let keyword_added _ _ _ = ()
	    let keyword_removed _ _ = ()
	  end
	end
	
  open Lexing
	
	(* Error report *)
  module Error = struct

    type t =
      | Illegal_character of char
      | Illegal_escape    of string
      | Unterminated_comment
      | Unterminated_string
      | Unterminated_quotation
      | Unterminated_antiquot
      | Unterminated_string_in_comment
      | Comment_start
      | Comment_not_end
      | Literal_overflow of string

    exception E of t

    open Format

    let print ppf =
      function
      | Illegal_character c ->
          fprintf ppf "Illegal character (%s)" (Char.escaped c)
      | Illegal_escape s ->
          fprintf ppf "Illegal backslash escape in string or character (%s)" s
      | Unterminated_comment ->
          fprintf ppf "Comment not terminated"
      | Unterminated_string ->
          fprintf ppf "String literal not terminated"
      | Unterminated_string_in_comment ->
          fprintf ppf "This comment contains an unterminated string literal"
      | Unterminated_quotation ->
          fprintf ppf "Quotation not terminated"
      | Unterminated_antiquot ->
          fprintf ppf "Antiquotation not terminated"
      | Literal_overflow ty ->
          fprintf ppf "Integer literal exceeds the range of representable integers of type %s" ty
      | Comment_start ->
          fprintf ppf "this is the start of a comment"
      | Comment_not_end ->
          fprintf ppf "this is not the end of a comment"

    let to_string x =
      let b = Buffer.create 50 in
      let () = bprintf b "%a" print x in Buffer.contents b
  end

  let module M = Camlp4.ErrorHandler.Register(Error) in ()

  open Error
	
	open Cal2c_util
  exception Eof

(* String construction *)
let str = ref ""

type context = {
  loc        : Loc.t;
  in_comment : bool;
  quotations : bool;
  antiquots  : bool;
  lexbuf     : lexbuf;
  buffer     : Buffer.t
}

(* Update the current location with file name and line number. *)
let update_loc c file line absolute chars =
  let lexbuf = c.lexbuf in
  let pos = lexbuf.lex_curr_p in
  let new_file =
		match file with
    | None -> pos.pos_fname
    | Some s -> s
  in
  lexbuf.lex_curr_p <- { pos with
    pos_fname = new_file;
    pos_lnum = if absolute then line else pos.pos_lnum + line;
    pos_bol = pos.pos_cnum - chars;
  }

(* Matches either \ or $. Why so many backslashes? Because \ has to be escaped *)
(* in OCaml strings, so we get \\. In Str regexps a literal \ is written \\, *)
(* alternation is \| and a literal $ is \$, so we have \\\\ \\| \\$. *)
let re_id = Str.regexp "\\\\\\|\\$"
}

(* Numbers *)
let nonZeroDecimalDigit = ['1'-'9']

let decimalDigit = '0' | nonZeroDecimalDigit
let decimalLiteral = nonZeroDecimalDigit (decimalDigit)*

let hexadecimalDigit = decimalDigit | ['a'-'f'] | ['A'-'F']
let hexadecimalLiteral = '0' ('x'|'X') hexadecimalDigit (hexadecimalDigit)*

let octalDigit = ['0'-'7']
let octalLiteral = '0' (octalDigit)*

let integer = decimalLiteral | hexadecimalLiteral | octalLiteral

let exponent = ('e'|'E') ('+'|'-')? decimalDigit+
let real = decimalDigit+ '.' (decimalDigit)* exponent?
| '.' decimalDigit+ exponent?
| decimalDigit+ exponent

(* Identifiers *)
let char = ['a'-'z' 'A'-'Z']
let any_identifier = (char | '_' | decimalDigit | '$')+
let other_identifier =
    (char | '_') (char | '_' | decimalDigit | '$')*
  | '$' (char | '_' | decimalDigit | '$')+
let identifier = '\\' any_identifier '\\' | other_identifier

let newline = ('\010' | '\013' | "\013\010")

(* Token rule *)
rule token c = parse
  | [' ' '\t'] {token c lexbuf}
	| newline { update_loc c None 1 false 0; token c lexbuf }

	| "^" { SYMBOL "^" }
	| "->" { SYMBOL "->" }
	| ':' { SYMBOL ":" }
	| ":=" { SYMBOL ":=" }
	| ',' { SYMBOL "," }
	| "!=" { SYMBOL "!=" }
	| '/' { SYMBOL "/" }
	| '.' { SYMBOL "." }
	| ".." { SYMBOL ".." }
	| "::" { SYMBOL "::" }
	| "-->" { SYMBOL "-->" }
	| "==>" { SYMBOL "==>" }
	| '=' { SYMBOL "=" }
	| ">=" { SYMBOL ">=" }
	| '>' { SYMBOL ">" }
	| '{' { SYMBOL "{" }
	| '[' { SYMBOL "[" }
	| "<=" { SYMBOL "<=" }
	| '<' { SYMBOL "<" }
	| '(' { SYMBOL "(" }
	| '-' { SYMBOL "-" }
	| '+' { SYMBOL "+" }
	| '}' { SYMBOL "}" }
	| ']' { SYMBOL "]" }
	| ')' { SYMBOL ")" }
	| ';' { SYMBOL ";" }
	| '#' { SYMBOL "#" }
	| '*' { SYMBOL "*" }

  | integer as lxm { INT (int_of_string lxm, lxm) }
  | real as lxm { FLOAT (float_of_string lxm, lxm) }
  | identifier as ident {
				let ident = Str.global_replace re_id "_" ident in
				IDENT ident }
  | '"' { let str = string c lexbuf in STRING (str, str) }
  | "//" { single_line_comment c lexbuf }
	| "/*" { multi_line_comment c lexbuf }
  | eof { EOI }
and string ctx = parse
	| "\\\"" { str := !str ^ "\\\""; string ctx lexbuf }
	| '"' { let s = !str in str := ""; s }
	| _ as c { str := !str ^ (String.make 1 c); string ctx lexbuf }
and single_line_comment c = parse
  | newline { update_loc c None 1 false 0; token c lexbuf }
	| _ { single_line_comment c lexbuf }
and multi_line_comment c = parse
  | "*/" { token c lexbuf }
	| newline { update_loc c None 1 false 0; multi_line_comment c lexbuf }
	| _ { multi_line_comment c lexbuf }
    
{		
  let default_context lb =
  { loc        = Loc.ghost ;
    in_comment = false     ;
    quotations = true      ;
    antiquots  = false     ;
    lexbuf     = lb        ;
    buffer     = Buffer.create 256 }
	
  let update_loc c = { (c) with loc = Loc.of_lexbuf c.lexbuf }

  let with_curr_loc f c = f (update_loc c) c.lexbuf
	
  let lexing_store s buff max =
    let rec self n s =
      if n >= max then n
      else
        match Stream.peek s with
        | Some x ->
            Stream.junk s;
            buff.[n] <- x;
            succ n
        | _ -> n
    in
    self 0 s

  let from_context c =
    let next _ =
      let tok = with_curr_loc token c in
      let loc = Loc.of_lexbuf c.lexbuf in
      Some ((tok, loc))
    in Stream.from next

  let from_lexbuf ?(quotations = true) lb =
    let c = { (default_context lb) with
              loc        = Loc.of_lexbuf lb;
              antiquots  = !Camlp4_config.antiquotations;
              quotations = quotations      }
    in from_context c

  let setup_loc lb loc =
    let start_pos = Loc.start_pos loc in
    lb.lex_abs_pos <- start_pos.pos_cnum;
    lb.lex_curr_p  <- start_pos

  let from_string ?quotations loc str =
    let lb = Lexing.from_string str in
    setup_loc lb loc;
    from_lexbuf ?quotations lb

  let from_stream ?quotations loc strm =
    let lb = Lexing.from_function (lexing_store strm) in
    setup_loc lb loc;
    from_lexbuf ?quotations lb

  let mk () loc strm =
    from_stream ~quotations:!Camlp4_config.quotations loc strm
end
}


* Re: Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-07 23:21   ` Re : [Caml-list] " Matthieu Wipliez
@ 2009-03-07 23:42     ` Joel Reymont
  2009-03-08  0:40     ` Joel Reymont
  1 sibling, 0 replies; 36+ messages in thread
From: Joel Reymont @ 2009-03-07 23:42 UTC
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List


On Mar 7, 2009, at 11:21 PM, Matthieu Wipliez wrote:

> why are you using stream parsers instead of Camlp4 grammars ?

Because I don't know any better? I'm just starting out, really.

I have a parser that I wrote using ocamlyacc and menhir. I finally  
went with dypgen and didn't touch the code for a few months. I then
tried to simplify the grammar on account of a later type checking pass  
and realized that I cannot troubleshoot it.

I think I can make do with a camlp4 parser and it will vastly simplify  
debugging.

> This:
> ...
> could be written as:
> expression: [
>  [ (i, _) = INT -> Int i
>  | (s, _) = STRING -> Str s
>  ... ]
> ];

Doesn't your example assume that I'm using the camlp4 lexer?

> expression: [
>  [ e1 = SELF; "/"; e2 = SELF ->
>    if e2 = Int 0 then
>      Loc.raise _loc (Failure "division by zero")
>    else
>      BinaryOp (e1, Div, e2) ]
> ];

Where does SELF above come from?

Can I use a token instead of "/", since I return SLASH whenever "/" is
found by the lexer?

> By the way, do you need your own tailor-made lexer? Camlp4 provides
> one that might satisfy your needs.

It has been said that it's not extensible so I wrote my own lexer  
using ocamllex and wrapped it to return (tok, loc) Stream.t.

> Otherwise, you can always define your own lexer (I had to do that  
> for the project I'm working on, see file attached).

Thanks, I'll study it.


> Your parser would then look like
>
> (* functor application *)
> module Camlp4Loc = Camlp4.Struct.Loc
> module Lexer = Cal_lexer.Make(Camlp4Loc)
> module Gram = Camlp4.Struct.Grammar.Static.Make(Lexer)

Is this extending the OCaml grammar or starting with an "empty" one?

> (* rule definition *)
> let rule = Gram.Entry.mk "rule"

Is this the "start" rule of the parser?

	Thanks, Joel

---
http://tinyco.de
Mac, C++, OCaml





* Re: [Caml-list] camlp4 stream parser syntax
  2009-03-07 22:38 camlp4 stream parser syntax Joel Reymont
  2009-03-07 22:52 ` Joel Reymont
@ 2009-03-07 23:52 ` Jon Harrop
  2009-03-07 23:53   ` Joel Reymont
  1 sibling, 1 reply; 36+ messages in thread
From: Jon Harrop @ 2009-03-07 23:52 UTC
  To: caml-list

On Saturday 07 March 2009 22:38:14 Joel Reymont wrote:
> Where can I read up on the syntax of the following in a camlp4 stream
> parser?
>
>    | [<' INT n >] -> Int n
>
> For example, where are [< ... >] described and why is the ' needed in
> between?

The grammar is described formally here:

  http://caml.inria.fr/pub/docs/manual-camlp4/manual003.html

You may find one of my free articles on parsing to be of interest because it 
covers the stream parser camlp4 extension:

  http://www.ffconsultancy.com/ocaml/benefits/parsing.html

There is also a slightly bigger parser here:

  http://www.ffconsultancy.com/ocaml/benefits/interpreter.html

The [< .. >] denote a stream when matching over one using the "parser" keyword 
and the tick ' denotes a kind of literal to identify a single token in the 
stream. So:

    | [< 'Kwd "if"; p=parse_expr; 'Kwd "then"; t=parse_expr;
         'Kwd "else"; f=parse_expr >] ->

uses ' to parse three individual keywords but also requests that parts of the 
stream are parsed using the parse_expr function and each result is named 
accordingly.
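
To make the behaviour concrete, here is roughly what such a rule does, written by hand in plain OCaml without the parser syntax. It is a sketch under assumptions, not the code camlp4 generates: the types token, expr and stream, the two exceptions and the helpers peek, junk and expect are invented for the example. It shows the one subtlety of stream parsers: if the first component does not match, the rule simply fails and another rule may be tried, but once the first token has been consumed, any later mismatch is an error (Stream.Failure versus Stream.Error in the real library).

```ocaml
(* Plain OCaml, no camlp4 needed. All names below are illustrative. *)
type token = Kwd of string | Int of int

type expr = Num of int | If of expr * expr * expr

exception Parse_failure            (* first component did not match *)
exception Parse_error of string    (* a later component did not match *)

(* A tiny mutable token stream: peek looks, junk consumes. *)
type stream = { mutable toks : token list }

let peek s = match s.toks with [] -> None | t :: _ -> Some t
let junk s = match s.toks with [] -> () | _ :: rest -> s.toks <- rest

(* Expect one specific token after the rule has already committed. *)
let expect s tok =
  if peek s = Some tok then junk s else raise (Parse_error "unexpected token")

let rec parse_expr s =
  match peek s with
  | Some (Int n) -> junk s; Num n
  | Some (Kwd "if") ->
      junk s;                              (* 'Kwd "if" *)
      let p = parse_expr_committed s in    (* p=parse_expr *)
      expect s (Kwd "then");               (* 'Kwd "then" *)
      let t = parse_expr_committed s in    (* t=parse_expr *)
      expect s (Kwd "else");               (* 'Kwd "else" *)
      let f = parse_expr_committed s in    (* f=parse_expr *)
      If (p, t, f)
  | _ -> raise Parse_failure

(* After the first component matched, a failure becomes an error. *)
and parse_expr_committed s =
  try parse_expr s with
  | Parse_failure -> raise (Parse_error "expected expression")
```

Each ' component corresponds to a peek followed by a junk, and each name=function component to a plain call on the same stream.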

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] camlp4 stream parser syntax
  2009-03-07 23:52 ` [Caml-list] " Jon Harrop
@ 2009-03-07 23:53   ` Joel Reymont
  2009-03-08  0:12     ` Jon Harrop
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-07 23:53 UTC
  To: Jon Harrop; +Cc: caml-list

Jon,

On Mar 7, 2009, at 11:52 PM, Jon Harrop wrote:

> The [< .. >] denote a stream when matching over one using the  
> "parser" keyword
> and the tick ' denotes a kind of literal to identify a single token  
> in the
> stream. So:
>
>    | [< 'Kwd "if"; p=parse_expr; 'Kwd "then"; t=parse_expr;
>         'Kwd "else"; f=parse_expr >] ->

Should I be using camlp4 grammars as Matthieu suggested?

It seems there are far more and better resources on doing this
than the stream parsing approach. This includes your OCaml Journal.

Do I lose anything when going with camlp4 grammars and NOT parsing
into an OCaml AST? Do I gain a lot with grammars over stream parsing?

	Thanks, Joel

---
http://tinyco.de
Mac, C++, OCaml



* Re: [Caml-list] camlp4 stream parser syntax
  2009-03-07 23:53   ` Joel Reymont
@ 2009-03-08  0:12     ` Jon Harrop
  2009-03-08  0:20       ` Re : " Matthieu Wipliez
  0 siblings, 1 reply; 36+ messages in thread
From: Jon Harrop @ 2009-03-08  0:12 UTC
  To: Joel Reymont, caml-list

On Saturday 07 March 2009 23:53:03 you wrote:
> Should I be using camlp4 grammars as Matthieu suggested?
>
> It seems there are far more and better resources on doing this
> than the stream parsing approach. This includes your OCaml Journal.

I would say that there is very little documentation about either approach but 
I personally found it much easier to use the stream parsers rather than 
camlp4 because they are much simpler and, therefore, do not require so much 
documentation. Having said that, I never used ??.

> Do I lose anything when going with camlp4 grammars and NOT parsing
> into an OCaml AST?

No, parsing into other ASTs is really easy with Camlp4.

> Do I gain a lot with grammars over stream parsing? 

Swings and roundabouts, IMHO. Camlp4 is higher level, more capable and the 
syntax is clearer but the documentation is so poor that I have given up every 
time I have tried to use it either because the default lexer was insufficient 
or because I could not figure out how to extract the necessary data from the 
OCaml grammar.

Matthieu's example looks fantastic though...

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re : [Caml-list] camlp4 stream parser syntax
  2009-03-08  0:12     ` Jon Harrop
@ 2009-03-08  0:20       ` Matthieu Wipliez
  2009-03-08  0:29         ` Jon Harrop
  2009-03-08  0:30         ` Re : " Joel Reymont
  0 siblings, 2 replies; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-08  0:20 UTC
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 1262 bytes --]

Joel asked me for the parser, so I gave it to him, but it may be of use to others too, so here it is.
Apart from the code specific to the application, it gives a good example of a complete Camlp4 lexer/parser for a language.

Note that for the lexer I started from a custom lexer made by Pietro Abate ( https://www.cduce.org/~abate/how-add-a-custom-lexer-camlp4 ) from the cduce lexer.

Cheers,
Matthieu



----- Original message ----
> Swings and roundabouts, IMHO. Camlp4 is higher level, more capable and the 
> syntax is clearer but the documentation is so poor that I have given up every 
> time I have tried to use it either because the default lexer was insufficient 
> or because I could not figure out how to extract the necessary data from the 
> OCaml grammar.
> 
> Matthieu's example looks fantastic though...
> 
> -- 
> Dr Jon Harrop, Flying Frog Consultancy Ltd.
> http://www.ffconsultancy.com/?e
> 



      

[-- Attachment #2: cal_parser.ml --]
[-- Type: application/octet-stream, Size: 27492 bytes --]

(*****************************************************************************)
(* Cal2C                                                                     *)
(* Copyright (c) 2007-2008, IETR/INSA of Rennes.                             *)
(* All rights reserved.                                                      *)
(*                                                                           *)
(* This software is governed by the CeCILL-B license under French law and    *)
(* abiding by the rules of distribution of free software. You can  use,      *)
(* modify and/ or redistribute the software under the terms of the CeCILL-B  *)
(* license as circulated by CEA, CNRS and INRIA at the following URL         *)
(* "http://www.cecill.info".                                                 *)
(*                                                                           *)
(* Matthieu WIPLIEZ <Matthieu.Wipliez@insa-rennes.fr>                        *)
(*****************************************************************************)

open Cal2c_util
open Printf

let time = ref 0.

(* Camlp4 stuff *)
module Camlp4Loc = Camlp4.Struct.Loc
module Lexer = Cal_lexer.Make(Camlp4Loc)
module Gram = Camlp4.Struct.Grammar.Static.Make(Lexer)

(** [convert_loc _loc] returns a [Loc.t] from a [Camlp4.Struct.Loc.t]. *)
let convert_loc _loc =
	let (file_name, start_line, start_bol, start_off,
		stop_line, stop_bol, stop_off, _) = Camlp4Loc.to_tuple _loc in
	{
		Loc.file_name = file_name;
		Loc.start = {Loc.line = start_line; Loc.bol = start_bol; Loc.off = start_off };
		Loc.stop = {Loc.line = stop_line; Loc.bol = stop_bol; Loc.off = stop_off };
	}

open Lexer

(*****************************************************************************)
(*****************************************************************************)
(*****************************************************************************)

(** [bop _loc e1 op e2] returns [Calast.ExprBOp (convert_loc _loc, e1, op, e2)] *)
let bop _loc e1 op e2 = Calast.ExprBOp (convert_loc _loc, e1, op, e2)

(** [uop _loc op e] returns [Calast.ExprUOp (convert_loc _loc, op, e)] *)
let uop _loc op e = Calast.ExprUOp (convert_loc _loc, op, e)

(*****************************************************************************)
(*****************************************************************************)
(*****************************************************************************)
(* Type definitions *)

(** Defines different kinds of type attributes. *)
type type_attr =
	| ExprAttr of Calast.expr (** A type attribute that references an expression. *)
	| TypeAttr of Calast.type_def (** A type attribute that references a type. *)

(** [find_size _loc typeAttrs] attempts to find a [type_attr] named ["size"]
 that is an [ExprAttr]. The function returns a [Calast.expr]. *)
let find_size _loc typeAttrs =
	let attr = 
		List.assoc "size" typeAttrs
	in
  match attr with
		| ExprAttr e -> e
		| _ ->
		  Asthelper.failwith (convert_loc _loc) "size must be an expression!"

(** [find_type _loc typeAttrs] attempts to find a [type_attr] named ["type"]
 that is a [TypeAttr]. The function returns a [Calast.type_def]. *)
let find_type _loc typeAttrs =
	let attr = 
		List.assoc "type" typeAttrs
	in
  match attr with
		| TypeAttr t -> t
		| _ -> Asthelper.failwith (convert_loc _loc) "type must be a type!"

(** [find_size_or_default _loc typeAttrs default] attempts to find a [type_attr]
 named ["size"] that is an [ExprAttr]. If not found, the function returns the
 default size given as an [int]. *)
let find_size_or_default _loc typeAttrs default =
	(* size in bits *)
	try
		find_size _loc typeAttrs
	with Not_found ->
  	(* no size information found, assuming "default" bits. *)
  	Calast.ExprInt (convert_loc _loc, default)

(** [type_of_typeDef _loc name typeAttrs] returns a [Calast.type_def] from a
 name and type attributes that were parsed. *)
let type_of_typeDef _loc name typeAttrs =
	match name with
		| "bool" -> Calast.TypeBool
		| "int" -> Calast.TypeInt (find_size_or_default _loc typeAttrs 32)
		| "list" ->
			Asthelper.failwith (convert_loc _loc)
				"The type \"list\" is deprecated. Please use \"List\"."
		| "List" ->
			(* get a type *)
			let t =
				try
					find_type _loc typeAttrs
			  with Not_found ->
			    Asthelper.failwith (convert_loc _loc)
						"RVC-CAL requires that all lists have a type."
			in
			
			(* and a size in number of elements *)
			let size =
				try
					find_size _loc typeAttrs
				with Not_found ->
			    Asthelper.failwith (convert_loc _loc)
						"RVC-CAL requires that all lists have a size."
			in
			Calast.TypeList (t, size)
    | "string" ->
			Asthelper.failwith (convert_loc _loc)
				"The type \"string\" is deprecated. Please use \"String\"."
		| "String" -> Calast.TypeStr
		| "uint" -> Calast.TypeUint (find_size_or_default _loc typeAttrs 32)
		| t ->
			let message = "The type \"" ^ t ^ "\" is not known.\n\
			  Did you want to declare a variable \"" ^ t ^ "\"? \
				If that is the case please specify its type." in
			Asthelper.failwith (convert_loc _loc) message

(*****************************************************************************)
(*****************************************************************************)
(*****************************************************************************)
(* Actor definitions. *)

(** Defines different kinds of actor declarations. *)
type actor_decl =
	| Action of Calast.action (** An action of type [Calast.action]. *)
	| FuncDecl of Calast.func (** A function declaration at the actor level. *)
	| Initialization of Calast.action (** An initialization action of type [Calast.action]. *) 
	| PriorityOrder of Calast.tag list list (** An actor declaration of type priority order. *)
	| ProcDecl of Calast.proc (** A procedure declaration at the actor level. *)
	| VarDecl of Calast.var_info (** A variable declaration at the actor level. *)

let get_something pred map declarations =
	let (actions, declarations) = List.partition pred declarations in
	let actions = List.map map actions in
	(actions, declarations)

(** [get_actions declarations] returns a tuple [(actions, declarations)] where
 actions is a list of actions and declarations the remaining declarations. *)
let get_actions declarations =
	get_something
		(function Action _ -> true | _ -> false)
	  (function | Action a -> a | _ -> failwith "never happens")
		declarations

(** [get_funcs declarations] returns a tuple [(funcs, declarations)] where
 funcs is a list of function declarations and [declarations] the
 remaining declarations. *)
let get_funcs declarations =
	get_something
		(function FuncDecl _ -> true | _ -> false)
	  (function | FuncDecl f -> f | _ -> failwith "never happens")
		declarations

(** [get_priorities declarations] returns a tuple [(priorities, declarations)] where
 priorities is a list of priorities and declarations the remaining declarations. *)
let get_priorities declarations =
	let (priorities, declarations) =
		get_something
			(function PriorityOrder _ -> true | _ -> false)
		  (function | PriorityOrder p -> p | _ -> failwith "never happens")
			declarations
	in
	let priorities = List.flatten priorities in
	(priorities, declarations)

(** [get_procs declarations] returns a tuple [(procs, declarations)] where
 procs is a list of procedure declarations and [declarations] the
 remaining declarations. *)
let get_procs declarations =
	get_something
		(function ProcDecl _ -> true | _ -> false)
	  (function | ProcDecl p -> p | _ -> failwith "never happens")
		declarations

(** [get_initializes declarations] returns a tuple [(initializes, declarations)]
 where initializes is a list of initialize and declarations the remaining
 declarations. *)
let get_initializes declarations =
	get_something
		(function Initialization _ -> true | _ -> false)
	  (function | Initialization i -> i | _ -> failwith "never happens")
		declarations

(** [get_vars declarations] returns a tuple [(vars, declarations)] where
 vars is a list of local variable declarations and [declarations] the
 remaining declarations. *)
let get_vars declarations =
	get_something
		(function VarDecl _ -> true | _ -> false)
	  (function | VarDecl v -> v | _ -> failwith "never happens")
		declarations

let var assignable global loc name t v =
	{ Calast.v_assignable = assignable;
		v_global = global;
		v_loc = loc;
		v_name = name;
		v_type = t;
		v_value = v }

(*****************************************************************************)
(*****************************************************************************)
(*****************************************************************************)
(* Rule declarations *)
let actor = Gram.Entry.mk "actor"
let actorActionOrInit = Gram.Entry.mk "actorActionOrInit"
let actorDeclarations = Gram.Entry.mk "actorDeclarations"
let actorImport = Gram.Entry.mk "actorImport"
let actorPars = Gram.Entry.mk "actorPars"
let actorPortDecls = Gram.Entry.mk "actorPortDecls"

let action = Gram.Entry.mk "action"
let actionChannelSelector = Gram.Entry.mk "actionChannelSelector"
let actionChannelSelectorNames = Gram.Entry.mk "actionChannelSelectorNames"
let actionDelay = Gram.Entry.mk "actionDelay"
let actionGuards = Gram.Entry.mk "actionGuards"
let actionInputs = Gram.Entry.mk "actionInputs"
let actionOutputs = Gram.Entry.mk "actionOutputs"
let actionRepeat = Gram.Entry.mk "actionRepeat"
let actionStatements = Gram.Entry.mk "actionStatements"
let actionTag = Gram.Entry.mk "actionTag"
let actionTokenNames = Gram.Entry.mk "actionTokenNames"

let expression = Gram.Entry.mk "expression"
let expressionGenerators = Gram.Entry.mk "expressionGenerators"
let expressionGeneratorsOpt = Gram.Entry.mk "expressionGeneratorsOpt"
let expressions = Gram.Entry.mk "expressions"
let ident = Gram.Entry.mk "ident"

let initializationAction = Gram.Entry.mk "initializationAction"

let qualifiedId = Gram.Entry.mk "qualifiedId"

let priorityInequality = Gram.Entry.mk "priorityInequality"
let priorityOrder = Gram.Entry.mk "priorityOrder"
let schedule = Gram.Entry.mk "schedule"
let stateTransition = Gram.Entry.mk "stateTransition"
let stateTransitions = Gram.Entry.mk "stateTransitions"

let statements = Gram.Entry.mk "statements"
let statementForEachIdents = Gram.Entry.mk "statementForEachIdents"
let statementIfElseOpt = Gram.Entry.mk "statementIfElseOpt"

let typeAttrs = Gram.Entry.mk "typeAttrs"
let typeDef = Gram.Entry.mk "typeDef"
let typePars = Gram.Entry.mk "typePars"
let typeParsOpt = Gram.Entry.mk "typeParsOpt"

let varDecl = Gram.Entry.mk "varDecl"
let varDeclFunctionParams = Gram.Entry.mk "varDeclFunctionParams"
let varDeclNoExpr = Gram.Entry.mk "varDeclNoExpr"
let varDecls = Gram.Entry.mk "varDecls"
let varDeclsAndDoOpt = Gram.Entry.mk "varDeclsAndDoOpt"
let varDeclsOpt = Gram.Entry.mk "varDeclsOpt"

(* Grammar definition *)
EXTEND Gram

  (***************************************************************************)
  (* an action. *)
  action: [
		[ inputs = actionInputs; "==>"; outputs = actionOutputs;
		  guards = actionGuards;
			OPT actionDelay;
			decls = varDeclsOpt;
		  stmts = actionStatements;
		  "end" ->
			{
				Calast.a_guards = guards;
				a_inputs = inputs;
				a_loc = convert_loc _loc;
				a_outputs = outputs;
				a_stmts = stmts;
				a_tag = []; (* the tag is filled in the actorDeclarations rule. *)
				a_vars = decls;
			}
		]
	];
	
	actionChannelSelector: [
		[ actionChannelSelectorNames ->
			Asthelper.failwith (convert_loc _loc)
				"RVC-CAL does not support channel selectors." ]
	];
	
	actionChannelSelectorNames: [ [ "at" | "at*" | "any" | "all" ] ];
	
	actionDelay: [ [ "delay"; expression ->
		Asthelper.failwith (convert_loc _loc)
			"RVC-CAL does not permit the use of delay." ] ];
	
	actionGuards: [ [ "guard"; e = expressions -> e | -> [] ] ];

	(* action inputs *)
	actionInputs: [
		[ l = LIST0 [
			"["; tokens = actionTokenNames; "]"; repeat = actionRepeat; OPT actionChannelSelector ->
				("", tokens, repeat)
		| (_, portName) = ident; ":"; "["; tokens = actionTokenNames; "]"; repeat = actionRepeat; OPT actionChannelSelector ->
				(portName, tokens, repeat)
		] SEP "," -> l ]
	];

	(* action outputs *)
	actionOutputs: [
		[ l = LIST0 [
		  "["; exprs = expressions; "]"; repeat = actionRepeat; OPT actionChannelSelector ->
				("", exprs, repeat)
		| (_, portName) = ident; ":"; "["; exprs = expressions; "]"; repeat = actionRepeat; OPT actionChannelSelector ->
				(portName, exprs, repeat)
		] SEP "," -> l ]
	];
	
	actionRepeat: [
		[ "repeat"; e = expression -> e
		| -> Calast.ExprInt (convert_loc _loc, 1) ]
	];
	
	actionStatements: [ [ "do"; s = statements -> s | -> [] ] ];
	
	actionTag: [ [ tag = LIST1 [ (_, id) = ident -> id ] SEP "." -> tag ] ];
	
	actionTokenNames: [
		[	tokens = LIST0 [ (loc, id) = ident -> (loc, id) ] SEP "," -> tokens ]
	];
	
	(***************************************************************************)
  (* a CAL actor. *)
  actor: [
		[ LIST0 actorImport; "actor"; (_, name) = ident; typeParsOpt;
		  "("; parameters = actorPars; ")";
			inputs = actorPortDecls; "==>"; outputs = actorPortDecls; ":";
			declarations1 = actorDeclarations;
			fsm = OPT schedule;
			declarations2 = actorDeclarations;
			"end"; `EOI ->
				let declarations = List.append declarations1 declarations2 in
				let (actions, declarations) = get_actions declarations in
				let (funcs, declarations) = get_funcs declarations in
				let (priorities, declarations) = get_priorities declarations in
				let (procs, declarations) = get_procs declarations in
				let (vars, declarations) = get_vars declarations in
				let (_initializes, declarations) = get_initializes declarations in
				assert (declarations = []);
				{
	        Calast.ac_actions = actions;
	        ac_fsm = fsm;
					ac_funcs = funcs;
	        ac_inputs = inputs;
	        ac_name = name;
	        ac_outputs = outputs;
	        ac_parameters = parameters;
	        ac_priorities = priorities;
	        ac_procs = procs;
	        ac_vars = vars;
	      }
		]
	];
	
	actorActionOrInit: [
		[ "action"; a = action -> Action a
		| "initialize"; i = initializationAction -> Initialization i ]
	];
	
	(* declarations in the actor body. A few rules are duplicated here because
	 the grammar is not LL(1). In contrast with CLR, functions and procedures
	 may only be declared at this level. Cal2C does not support nested function
	 declarations. *) 
	actorDeclarations: [
		[ l = LIST0 [
			"action"; a = action -> Action a
		| "function"; (_, n) = ident; "("; p = varDeclFunctionParams; ")";
		    "-->"; t = typeDef; v = varDeclsOpt; ":"; e = expression; "end" ->
			FuncDecl {
				Calast.f_decls = v;
				f_expr = e;
				f_loc = convert_loc _loc;
				f_name = n;
				f_params = p;
				f_return = t;
			}
		| "procedure"; (_, n) = ident; "("; p = varDeclFunctionParams; ")";
		  v = varDeclsOpt; "begin"; s = statements; "end" ->
			ProcDecl {
				Calast.p_decls = v;
				p_loc = convert_loc _loc;
				p_name = n;
				p_params = p;
				p_stmts = s
			}
		| "initialize"; i = initializationAction -> Initialization i
		| "priority"; p = priorityOrder -> PriorityOrder p

		| (_, tag) = ident; ":"; a = actorActionOrInit ->
			(match a with
				| Action a -> Action {a with Calast.a_tag = [tag]}
				| Initialization a -> Initialization {a with Calast.a_tag = [tag]}
				| _ -> failwith "never happens")
		| (_, tag) = ident; "."; tags = actionTag; ":"; a = actorActionOrInit ->
			(match a with
				| Action a -> Action {a with Calast.a_tag = tag :: tags}
				| Initialization a -> Initialization {a with Calast.a_tag = tag :: tags}
				| _ -> failwith "never happens")
		
		| ident; "["; typePars; "]" ->
			Asthelper.failwith (convert_loc _loc) "RVC-CAL does not support type parameters."

		| (_, name) = ident; (var_loc, var_name) = ident; ";" ->
			  let t = type_of_typeDef _loc name [] in
				VarDecl (var true true var_loc var_name t None)

		| (_, name) = ident; (var_loc, var_name) = ident; "="; e = expression; ";" ->
			  let t = type_of_typeDef _loc name [] in
				VarDecl (var false true var_loc var_name t (Some e))

		| (_, name) = ident; (var_loc, var_name) = ident; ":="; e = expression; ";" ->
			  let t = type_of_typeDef _loc name [] in
				VarDecl (var true true var_loc var_name t (Some e))

		| (_, name) = ident; "("; attrs = typeAttrs; ")"; (var_loc, var_name) = ident; ";" ->
			  let t = type_of_typeDef _loc name attrs in
				VarDecl (var true true var_loc var_name t None)

		| (_, name) = ident; "("; attrs = typeAttrs; ")";
		  (var_loc, var_name) = ident; "="; e = expression; ";" ->
			  let t = type_of_typeDef _loc name attrs in
				VarDecl (var false true var_loc var_name t (Some e))

		| (_, name) = ident; "("; attrs = typeAttrs; ")";
		  (var_loc, var_name) = ident; ":="; e = expression; ";" ->
			  let t = type_of_typeDef _loc name attrs in
				VarDecl (var true true var_loc var_name t (Some e))

    | (_, i) = ident; ";" ->
			Asthelper.failwith (convert_loc _loc)
				("Missing type for declaration of \"" ^ i ^ "\".")
		| (_, i) = ident; "="; expression; ";" ->
			Asthelper.failwith (convert_loc _loc)
				("Missing type for declaration of \"" ^ i ^ "\".")
		| (_, i) = ident; ":="; expression; ";" ->
			Asthelper.failwith (convert_loc _loc)
				("Missing type for declaration of \"" ^ i ^ "\".")
				
		] -> l ]
	];
	
	(* stuff imported by the current actor *)
	actorImport: [
		[ "import"; "all"; qualifiedId; ";" -> ()
		| "import"; qualifiedId; ";" -> () ]
	];
	
	(* actor parameters: type, name and optional expression. *)
	actorPars: [
		[ parameters = LIST0 [
			t = typeDef; (_, name) = ident; v = OPT [ "="; e = expression -> e ] -> 
			var false true (convert_loc _loc) name t v
		] SEP "," -> parameters ]
	];
	
	(* a port declaration: "multi" or not, type and identifier. *)
	actorPortDecls: [
		[ l = LIST0 [
			OPT "multi"; t = typeDef; (_, name) = ident ->
			var false true (convert_loc _loc) name t None
		] SEP "," -> l ]
	];
	
	(***************************************************************************)
  (* expressions. *)	
	expression: [
		"top"
		  [	"["; e = expressions; g = expressionGeneratorsOpt; "]" ->
				Calast.ExprList (convert_loc _loc, e, g)
			| "if"; e1 = SELF; "then"; e2 = expression; "else"; e3 = expression; "end" ->
				Calast.ExprIf (convert_loc _loc, e1, e2, e3) ]
	| "or"
		  [	e1 = SELF; "or"; e2 = SELF -> bop _loc e1 Calast.BOpOr e2 ]
	| "and"
			[ e1 = SELF; "and"; e2 = SELF -> bop _loc e1 Calast.BOpAnd e2 ]
	| "cmp"
	    [ e1 = SELF; "="; e2 = SELF -> bop _loc e1 Calast.BOpEQ e2
			| e1 = SELF; "!="; e2 = SELF -> bop _loc e1 Calast.BOpNE e2
			| e1 = SELF; "<"; e2 = SELF -> bop _loc e1 Calast.BOpLT e2
			| e1 = SELF; "<="; e2 = SELF -> bop _loc e1 Calast.BOpLE e2
			| e1 = SELF; ">"; e2 = SELF -> bop _loc e1 Calast.BOpGT e2
			| e1 = SELF; ">="; e2 = SELF -> bop _loc e1 Calast.BOpGE e2	]
	|	"add"
      [ e1 = SELF; "+"; e2 = SELF -> bop _loc e1 Calast.BOpPlus e2
      | e1 = SELF; "-"; e2 = SELF -> bop _loc e1 Calast.BOpMinus e2 ]
	| "mul"
		  [ e1 = SELF; "div"; e2 = SELF -> bop _loc e1 Calast.BOpDivInt e2
      | e1 = SELF; "mod"; e2 = SELF -> bop _loc e1 Calast.BOpMod e2
			| e1 = SELF; "*"; e2 = SELF -> bop _loc e1 Calast.BOpTimes e2
      | e1 = SELF; "/"; e2 = SELF -> bop _loc e1 Calast.BOpDiv e2 ]
	| "exp"
	    [ e1 = SELF; "^"; e2 = SELF -> bop _loc e1 Calast.BOpExp e2 ]
	| "unary"
		  [	"-"; e = SELF -> uop _loc Calast.UOpMinus e
			| "not"; e = SELF -> uop _loc Calast.UOpNot e
			| "#"; e = SELF -> uop _loc Calast.UOpNbElts e ]
	| "simple"
			[ "("; e = SELF; ")" -> e
			| "true" -> Calast.ExprBool (convert_loc _loc, true)
			| "false" -> Calast.ExprBool (convert_loc _loc, false)
			| (i, _) = INT -> Calast.ExprInt (convert_loc _loc, i)
			| (s, _) = STRING -> Calast.ExprStr (convert_loc _loc, s)
			| (_, v) = ident; "("; el = expressions; ")" ->
				Calast.ExprCall (convert_loc _loc, v, el)
			| (loc, v) = ident; el = LIST1 [ "["; e = expression; "]" -> e ] ->
				Calast.ExprIdx (convert_loc _loc, (loc, v), el)
			| (loc, v) = ident -> Calast.ExprVar (loc, v) ]
	];
	
	expressionGenerators: [
		[ l = LIST1 [
			"for"; t = typeDef; (loc, name) = ident; "in"; e = expression ->
				let var = var false false loc name t None in
				(var, e) ] SEP "," -> l ]
	];
	
	expressionGeneratorsOpt: [ [ ":"; g = expressionGenerators -> g | -> [] ] ]; 
	
	expressions: [ [ l = LIST0 [ e = expression -> e ] SEP "," -> l ] ];
	
	ident: [ [ s = IDENT -> (convert_loc _loc, s) ] ];
	
	(***************************************************************************)
	(* initialization action. *)
	initializationAction: [
		[ "==>"; outputs = actionOutputs;
		  guards = actionGuards; OPT actionDelay; decls = varDeclsOpt;
		  stmts = actionStatements;
		  "end" ->
			{
				Calast.a_guards = guards;
				a_inputs = [];
				a_loc = convert_loc _loc;
				a_outputs = outputs;
				a_stmts = stmts;
				a_tag = []; (* the tag is filled in the actorDeclarations rule. *)
				a_vars = decls;
			}
		]
	];
	
	(***************************************************************************)
	qualifiedId: [ [ qid = LIST1 [ id = ident -> id ] SEP "." -> qid ] ];
	
	(***************************************************************************)
	(* schedule and priorities. We only support FSM schedules. *)
	priorityInequality: [
		[ tag = actionTag; ">"; tags = LIST1 [a = actionTag -> a ] SEP ">" -> tag :: tags ]
	];
	
	priorityOrder: [ [ l = LIST0 [ p = priorityInequality; ";" -> p ]; "end" -> l ] ];
	
	schedule: [
		[ "schedule"; "fsm"; (_, first_state) = ident; ":";
		  transitions = stateTransitions; "end" -> (first_state, transitions)
		| "schedule"; "regexp" ->
			Asthelper.failwith (convert_loc _loc) "RVC-CAL does not support \"regexp\" schedules."
		]
	];
	
	stateTransition: [
    [ (_, from_state) = ident; "("; action = actionTag; ")"; "-->"; (_, to_state) = ident; ";" ->
			(from_state, action, to_state) ]
	];
	
	stateTransitions: [ [ l = LIST0 [ t = stateTransition -> t ] -> l ] ];
	
	(***************************************************************************)
	(* statements: while, for, if... *)
	statements: [
		[ l = LIST0 [
			"begin"; decls = varDeclsAndDoOpt; st = statements; "end" ->
			Calast.StmtBlock (convert_loc _loc, decls, st)
		| "choose" ->
			Asthelper.failwith (convert_loc _loc)
				"RVC-CAL does not support the \"choose\" statement."
		| "for" ->
			Asthelper.failwith (convert_loc _loc)
				"RVC-CAL does not support the \"for\" statement, please use \"foreach\" instead."
		| "foreach"; var = varDeclNoExpr; "in"; e = expression;
			v = varDeclsOpt; "do"; s = statements; "end" ->
			Calast.StmtForeach (convert_loc _loc, var, e, v, s)
		| "foreach"; typeDef; ident; "in"; expression; ".."; expression ->
			Asthelper.failwith (convert_loc _loc)
				"RVC-CAL does not support the \"..\" construct, please use \"Integers\" instead."
		| "if"; e = expression; "then"; s1 = statements; s2 = statementIfElseOpt; "end" ->
			Calast.StmtIf (convert_loc _loc, e, s1, s2)
		| "while"; e = expression; decls = varDeclsOpt; "do"; s = statements; "end" ->
			Calast.StmtWhile (convert_loc _loc, e, decls, s)
		| (loc, v) = ident; "["; el = expressions; "]"; ":="; e = expression; ";" ->
			Calast.StmtInstr (convert_loc _loc,
				[Calast.InstrAssignArray (convert_loc _loc, (loc, v), el, e)])
		| (_, v) = ident; "."; (_, f) = ident; ":="; e = expression; ";" ->
			Calast.StmtInstr (convert_loc _loc,
				[Calast.InstrAssignField (convert_loc _loc, v, f, e)])
		| (loc, v) = ident; ":="; e = expression; ";" ->
			Calast.StmtInstr (convert_loc _loc,
				[Calast.InstrAssignVar (convert_loc _loc, (loc, v), e)])
		| (_, v) = ident; "("; el = expressions; ")"; ";" ->
			Calast.StmtInstr (convert_loc _loc,
				[Calast.InstrCall (convert_loc _loc, v, el)])
		| (_, v) = ident; "."; (_, m) = ident; "("; el = expressions; ")";
		  LIST0 [ "."; ident; "("; expressions; ")" ]; ";" ->
			Calast.StmtInstr (convert_loc _loc,
				[Calast.InstrCallMethod (convert_loc _loc, v, m, el)])
		] -> l ]
	];
	
	statementForEachIdents: [ [ l = LIST1 [ t = typeDef; (loc, name) = ident ->
		var false false loc name t None
	] -> l ] ];
	
	statementIfElseOpt: [ [ "else"; s = statements -> s | -> [] ] ];
	
	(***************************************************************************)
	(* a type attribute, such as "type:" and "size=" *)
	typeAttrs: [
		[ l = LIST0 [
			(_, attr) = ident; ":"; t = typeDef -> (attr, TypeAttr t)
		| (_, attr) = ident; "="; e = expression -> (attr, ExprAttr e)
		] SEP "," -> l ]
	];

	(* a type definition: bool, int(size=5), list(type:int, size=10)... *)	
	typeDef: [
		[ (_, name) = ident -> type_of_typeDef _loc name []
		| ident; "["; typePars; "]" ->
			Asthelper.failwith (convert_loc _loc) "RVC-CAL does not support type parameters."
		| (_, name) = ident; "("; attrs = typeAttrs; ")" ->
			  type_of_typeDef _loc name attrs ]
	];
	
	(* type parameters, not supported at this point. *)
	typePars: [ [ LIST0 [ IDENT -> () | IDENT; "<"; typeDef -> ()	] SEP "," -> () ] ];
	
	typeParsOpt: [
		[ "["; typePars; "]" ->
			Asthelper.failwith (convert_loc _loc) "RVC-CAL does not support type parameters."
		| ]
	];
	
	(***************************************************************************)
	(* variable declarations. *)
	
	(* we do not support nested declarations of functions nor procedures. *)
	varDecl: [
		[ t = typeDef; (loc, name) = ident; "="; e = expression ->
			var false false loc name t (Some e)
		| t = typeDef; (loc, name) = ident; ":="; e = expression ->
			var true false loc name t (Some e)
		| t = typeDef; (loc, name) = ident -> var true false loc name t None ]
	];
	
	
	(* t = typeDef; (loc, name) = ident -> var false false loc name t None	*)
	varDeclFunctionParams: [
		[ l = LIST0
			[ t = typeDef; (loc, name) = ident -> var true false loc name t None
		] SEP "," -> l ]
	];

	varDeclNoExpr: [
		[ t = typeDef; (loc, name) = ident -> var false false loc name t None
		]
	];
	
	varDecls: [ [ l = LIST1 [ v = varDecl -> v] SEP "," -> l ] ];
	
	varDeclsAndDoOpt: [ [ "var"; decls = varDecls; "do" -> decls | -> [] ] ];
	
	varDeclsOpt: [ [ "var"; decls = varDecls -> decls | -> [] ] ];
END

(*****************************************************************************)
(* additional grammar for -D <type> <name> = <value> *)

let arg = Gram.Entry.mk "arg"

(* Grammar definition *)
EXTEND Gram

  arg: [
		[ (loc, name) = ident; "="; e = expression ->
			var false true loc name Calast.TypeUnknown (Some e) ]
	];

END

let parse_with_msg f rule loc stream =
	try
		f rule loc stream
	with Camlp4Loc.Exc_located (loc, exn) ->
		(match exn with
			| Stream.Error err -> fprintf stderr "%s\n%s\n" (Camlp4Loc.to_string loc) err
			| _ -> fprintf stderr "%s\n%s\n" (Camlp4Loc.to_string loc) (Printexc.to_string exn));
		exit (-1)

(** [parse_actor file] parses the file whose absolute path is given by [file]
 and returns a [Calast.actor]. If anything goes wrong, Cal2C exits. *)
let parse_actor file =
	let t1 = Sys.time () in
	let ch = open_in file in
	let actor =
		parse_with_msg Gram.parse actor (Loc.mk file) (Stream.of_channel ch)
	in
	close_in ch;
	let t2 = Sys.time () in
	time := !time +. t2 -. t1;
	actor

(** [parse_arg str] parses the string [str] as a variable declaration,
 and returns a [Calast.var_info]. If anything goes wrong, Cal2C exits. *)
let parse_arg str =
	parse_with_msg Gram.parse arg (Loc.mk str) (Stream.of_string str)

let parse_expr str =
	parse_with_msg Gram.parse expression (Loc.mk str) (Stream.of_string str)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Caml-list] camlp4 stream parser syntax
  2009-03-08  0:20       ` Re : " Matthieu Wipliez
@ 2009-03-08  0:29         ` Jon Harrop
  2009-03-08  0:30         ` Re : " Joel Reymont
  1 sibling, 0 replies; 36+ messages in thread
From: Jon Harrop @ 2009-03-08  0:29 UTC (permalink / raw)
  To: caml-list

On Sunday 08 March 2009 00:20:06 Matthieu Wipliez wrote:
> Joel asked me the parser so I gave it to him, but maybe it can be of use
> for others, so here it is. Apart from the code specific to the application,
> it gives a good example of a complete Camlp4 lexer/parser for a language.
>
> Note that for the lexer I started from a custom lexer made by Pietro Abate
> ( https://www.cduce.org/~abate/how-add-a-custom-lexer-camlp4 ) from the
> cduce lexer.

These are really wonderful examples, thank you!

I had no idea Camlp4 had been used to write such non-trivial parsers...

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Re : [Caml-list] camlp4 stream parser syntax
  2009-03-08  0:20       ` Re : " Matthieu Wipliez
  2009-03-08  0:29         ` Jon Harrop
@ 2009-03-08  0:30         ` Joel Reymont
  2009-03-08  0:37           ` Re : " Matthieu Wipliez
  1 sibling, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08  0:30 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: caml-list


On Mar 8, 2009, at 12:20 AM, Matthieu Wipliez wrote:

> Joel asked me the parser so I gave it to him, but maybe it can be of  
> use for others, so here it is.


While we are on this subject... How do you troubleshoot camlp4 rules?

With a stream parser you can invoke individual functions since each is  
a full-blown parser. Can the same be done with camlp4, e.g. individual  
rules invoked?

Can rules be traced to see which ones are being taken?

	Thanks, Joel

---
http://tinyco.de
Mac, C++, OCaml




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re : Re : [Caml-list] camlp4 stream parser syntax
  2009-03-08  0:30         ` Re : " Joel Reymont
@ 2009-03-08  0:37           ` Matthieu Wipliez
  0 siblings, 0 replies; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-08  0:37 UTC (permalink / raw)
  To: caml-list


> While we are on this subject... How do you troubleshoot camlp4 rules?

Not sure what you mean :(

> With a stream parser you can invoke individual functions since each is a 
> full-blown parser. Can the same be done with camlp4, e.g. individual rules 
> invoked?

Well when you invoke the parser with Gram.parse, you give it the entry point. So you may parse only a subset of your language if the grammar allows it.

> Can rules be traced to see which ones are being taken?

Erm, I don't really know... You can always printf when a rule is taken, but I'm not aware of a built-in construct that allows you to monitor the rules that are taken.
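
A minimal sketch of the printf approach, written in plain OCaml rather than in Camlp4 grammar syntax so that it runs on its own. The names "traced" and "trace_log" and the type "expr" are hypothetical; they are not part of Camlp4 or of the parser posted earlier.

```ocaml
(* Hypothetical helper, not a Camlp4 function: record and print the name of
   a rule, then return the value built by its semantic action unchanged. *)
let trace_log : string list ref = ref []

let traced (rule : string) (value : 'a) : 'a =
  trace_log := rule :: !trace_log;
  prerr_endline ("rule taken: " ^ rule);
  value

(* Stand-in for an AST type such as Calast.expr. *)
type expr = Int of int | Add of expr * expr

(* In a grammar, an action "-> Add (e1, e2)" would be written as
   "-> traced "add" (Add (e1, e2))". Here the actions are called by hand. *)
let e = traced "add" (Add (traced "int" (Int 1), traced "int" (Int 2)))
```

Because "traced" returns its argument unchanged, it can be wrapped around an action without changing the resulting AST.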

Cheers,
Matthieu






^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-07 23:21   ` Re : [Caml-list] " Matthieu Wipliez
  2009-03-07 23:42     ` Joel Reymont
@ 2009-03-08  0:40     ` Joel Reymont
  2009-03-08  1:08       ` Re : " Matthieu Wipliez
  1 sibling, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08  0:40 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List

Matthieu,

Is the camlp4 grammar parser case-insensitive?

Will both Delay and delay be accepted in the actionDelay rule?

	actionDelay: [ [ "delay"; expression ->
		Asthelper.failwith (convert_loc _loc)
			"RVC-CAL does not permit the use of delay." ] ];


Also, I noticed that your lexer has a really small token set, i.e.

type token =
     | KEYWORD of string
     | SYMBOL  of string
     | IDENT   of string
     | INT     of int * string
     | FLOAT   of float * string
     | CHAR    of char * string
     | STRING  of string * string
     | EOI

My custom lexer, on the other hand, has a HUGE token set, e.g.

   type token =
     | BUY_TO_COVER
     | SELL_SHORT
     | AT_ENTRY
     | RANGE
     | YELLOW
     | WHITE
     | WHILE
     | UNTIL
     ...

This is partly because I have a very large set of keywords.

Do I correctly understand that I do not need all the keywords since I  
can match them in the camlp4 grammar as strings like "BuyToCover",  
"SellShort", etc.?

	Thanks, Joel

---
http://tinyco.de
Mac, C++, OCaml




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08  0:40     ` Joel Reymont
@ 2009-03-08  1:08       ` Matthieu Wipliez
  2009-03-08  8:25         ` Joel Reymont
  2009-03-08  9:34         ` Joel Reymont
  0 siblings, 2 replies; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-08  1:08 UTC (permalink / raw)
  To: Joel Reymont; +Cc: O'Caml Mailing List


> Matthieu,
> 
> Is the camlp4 grammar parser case-insensitive?
> 
> Will both Delay and delay be accepted in the actionDelay rule?
> 
>     actionDelay: [ [ "delay"; expression ->
>         Asthelper.failwith (convert_loc _loc)
>             "RVC-CAL does not permit the use of delay." ] ];

No, only "delay" is accepted.

> Also, I noticed that your lexer has a really small token set, i.e.
> 
> type token =
>     | KEYWORD of string
>     | SYMBOL  of string
>     | IDENT   of string
>     | INT     of int * string
>     | FLOAT   of float * string
>     | CHAR    of char * string
>     | STRING  of string * string
>     | EOI
> 
> My custom lexer, on the other hand, has a HUGE token set, e.g.
> 
>   type token =
>     | BUY_TO_COVER
>     | SELL_SHORT
>     | AT_ENTRY
>     | RANGE
>     | YELLOW
>     | WHITE
>     | WHILE
>     | UNTIL
>     ...
> 
> This is partly because I have a very large set of keywords.
> 
> Do I correctly understand that I do not need all the keywords since I can match 
> them in the camlp4 grammar as strings like "BuyToCover", "SellShort", etc.?

Yes that's right.

Also, given the state of the Camlp4 documentation, a good source of
information is the Camlp4 source code, especially camlp4/Camlp4Parsers/Camlp4OCamlRevisedParser.ml and
Camlp4OCamlParser.ml

> I had no idea Camlp4 had been used to write such non-trivial parsers...

Actually the aforementioned files show the power of Camlp4 parsing and grammar extension capabilities quite well IMHO.

Cheers,
Matthieu





^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08  1:08       ` Re : " Matthieu Wipliez
@ 2009-03-08  8:25         ` Joel Reymont
  2009-03-08  9:37           ` Daniel de Rauglaudre
  2009-03-08 11:45           ` Re : Re : " Matthieu Wipliez
  2009-03-08  9:34         ` Joel Reymont
  1 sibling, 2 replies; 36+ messages in thread
From: Joel Reymont @ 2009-03-08  8:25 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List

How can I make camlp4 parsing case-insensitive?

The only approach I can think of so far is to build a really large
set of tokens and use them instead of strings in the parser.

Any flag I can flip or way to do this without a large set of tokens?

	Thanks, Joel

On Mar 8, 2009, at 1:08 AM, Matthieu Wipliez wrote:

>
>> Is the camlp4 grammar parser case-insensitive?
>>
>> Will both Delay and delay be accepted in the actionDelay rule?
>>
>>    actionDelay: [ [ "delay"; expression ->
>>        Asthelper.failwith (convert_loc _loc)
>>            "RVC-CAL does not permit the use of delay." ] ];
>
> No, only "delay" is accepted.



---
http://tinyco.de
Mac, C++, OCaml




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08  1:08       ` Re : " Matthieu Wipliez
  2009-03-08  8:25         ` Joel Reymont
@ 2009-03-08  9:34         ` Joel Reymont
  1 sibling, 0 replies; 36+ messages in thread
From: Joel Reymont @ 2009-03-08  9:34 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List


On Mar 8, 2009, at 1:08 AM, Matthieu Wipliez wrote:

>>    actionDelay: [ [ "delay"; expression ->
>>        Asthelper.failwith (convert_loc _loc)
>>            "RVC-CAL does not permit the use of delay." ] ];

Which of the following tokens does "delay" get checked against?

I'm assuming that camlp4 has to give "delay" to the lexer somehow and  
ask the lexer if the next token matches "delay".

How does this happen?

>>
>> type token =
>>    | KEYWORD of string
>>    | SYMBOL  of string
>>    | IDENT   of string
>>    | INT     of int * string
>>    | FLOAT   of float * string
>>    | CHAR    of char * string
>>    | STRING  of string * string
>>    | EOI


	Thanks, Joel

---
http://tinyco.de
Mac, C++, OCaml




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08  8:25         ` Joel Reymont
@ 2009-03-08  9:37           ` Daniel de Rauglaudre
  2009-03-08  9:51             ` Joel Reymont
  2009-03-08 11:45           ` Re : Re : " Matthieu Wipliez
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel de Rauglaudre @ 2009-03-08  9:37 UTC (permalink / raw)
  To: caml-list

Hi

On Sun, Mar 08, 2009 at 08:25:23AM +0000, Joel Reymont wrote:

> How can I make camlp4 parsing case-insensitive?

I think it should work if you do both of the following:

1/ Change your lexer to generate case-insensitive tokens.

2/ Use the field "tok_match" of the interface with the lexer. Redefining
   it allows you to match some token pattern with the corresponding token.
     See doc (camlp5) in:
       http://pauillac.inria.fr/~ddr/camlp5/doc/htmlc/grammars.html#b:The-lexer-record
   In the example "default_match", change the test "if con = p_con" into
   "if String.lowercase con = p_con".

Don't know if it still works with Camlp4, but you can often use the
Camlp5 documentation even for many Camlp4 features.

-- 
Daniel de Rauglaudre
http://pauillac.inria.fr/~ddr/


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08  9:37           ` Daniel de Rauglaudre
@ 2009-03-08  9:51             ` Joel Reymont
  2009-03-08 10:27               ` Daniel de Rauglaudre
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08  9:51 UTC (permalink / raw)
  To: Daniel de Rauglaudre; +Cc: caml-list

I would prefer to use the #2 approach but I'm using a custom lexer  
built by ocamllex.

Where would I plug in String.lowercase con = ... in Matthieu's lexer,  
for example?

	Thanks, Joel

On Mar 8, 2009, at 9:37 AM, Daniel de Rauglaudre wrote:

> 2/ Use the field "tok_match" of the interface with the lexer.  
> Redefining
>   it allows you to match some token pattern with the corresponding  
> token.
>     See doc (camlp5) in:
>       http://pauillac.inria.fr/~ddr/camlp5/doc/htmlc/ 
> grammars.html#b:The-lexer-record
>   In the example "default_match", change the test "if con = p_con"  
> into
>   "if String.lowercase con = p_con".

---
http://tinyco.de
Mac, C++, OCaml




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08  9:51             ` Joel Reymont
@ 2009-03-08 10:27               ` Daniel de Rauglaudre
  2009-03-08 10:35                 ` Joel Reymont
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel de Rauglaudre @ 2009-03-08 10:27 UTC (permalink / raw)
  To: caml-list

Hi,

On Sun, Mar 08, 2009 at 09:51:26AM +0000, Joel Reymont wrote:

> I would prefer to use the #2 approach but I'm using a custom lexer  
> built by ocamllex.

Mmm... In the end I am not sure that what I said was correct... I should
test it myself, which is what I generally do before asserting things... :-)

But I was not clear: I said that you had to program *both* items. It
was not an "or" but an "and"...

But... it was false...

Bsakjfvouveoussasj.... I said nothing... I restart...

A change in the lexer should be sufficient.

If you cannot (or if you don't want to):

Only changing the "tok_match" record field (2nd point) would not work
for keywords (defined by "just a string" in Camlp* grammars), because
the lexer *must* recognize all case combinations of the identifier as
keywords, implying a change, anyway, in the lexer.

On the other hand, if you can accept that these identifiers are not
keywords (i.e. not reserved names), and if there is a token for identifiers,
like "LIDENT" or "UIDENT" in the lexer proposed by Camlp* (module Plexer in
Camlp5), you can put them in your grammar as (for example):
     LIDENT "delay"
instead of:
     "delay"

In this case, a change of the "tok_match" record field should work.
Define the function:

let my_tok_match =
  function
    (p_con, "") ->
       begin function (con, prm) ->
         if con = p_con then prm else raise Stream.Failure
       end
  | (p_con, p_prm) ->
       begin function (con, prm) ->
         if con = p_con && String.lowercase prm = p_prm then prm
         else raise Stream.Failure
       end
;;

Then look for an identifier named "tok_match" in your code, which
should be a record field, and define that "tok_match" record field as
"my_tok_match".

If you don't find it, perhaps it is used implicitly by another Camlp*
library function. In this case, well, more work may be needed.
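
A minimal, self-contained sketch of this kind of case-insensitive matching, in plain OCaml rather than against the real Camlp5 lexer record. Tokens and token patterns are modelled as (constructor, parameter) string pairs. The exception No_match stands in for Stream.Failure, and String.lowercase_ascii is the current name of String.lowercase; both are assumptions made so that the example compiles on its own. The lower-casing is applied to the token parameter, since the constructor name (e.g. "LIDENT") still has to match exactly, and the pattern in the grammar is assumed to be written in lower case.

```ocaml
(* Stand-in for Stream.Failure, so the sketch needs no stream library. *)
exception No_match

(* A token pattern and a token are both (constructor, parameter) pairs,
   e.g. ("LIDENT", "delay"). *)
let ci_tok_match (p_con, p_prm) (con, prm) =
  if p_prm = "" then
    (* Empty pattern parameter: any token with this constructor matches. *)
    if con = p_con then prm else raise No_match
  else if con = p_con && String.lowercase_ascii prm = p_prm then prm
  else raise No_match

(* Convenience wrapper that turns the exception into a boolean. *)
let matches pat tok =
  match ci_tok_match pat tok with
  | _ -> true
  | exception No_match -> false

let () =
  assert (matches ("LIDENT", "delay") ("LIDENT", "Delay"));
  assert (matches ("LIDENT", "") ("LIDENT", "anything"));
  assert (not (matches ("LIDENT", "delay") ("UIDENT", "Delay")))
```

With this shape, a grammar entry written as LIDENT "delay" would accept Delay, DELAY and delay alike, while the value returned to the semantic action keeps the spelling found in the input.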

-- 
Daniel de Rauglaudre
http://pauillac.inria.fr/~ddr/



* Re: Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 10:27               ` Daniel de Rauglaudre
@ 2009-03-08 10:35                 ` Joel Reymont
  2009-03-08 11:07                   ` Joel Reymont
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08 10:35 UTC (permalink / raw)
  To: Daniel de Rauglaudre; +Cc: caml-list


On Mar 8, 2009, at 10:27 AM, Daniel de Rauglaudre wrote:

> Only changing the "tok_match" record field (2nd point) would not work
> for keywords (defined by "just a string" in Camlp* grammars), because
> the lexer *must* recognize all combinations of the identifier as
> keywords, implying a change, anyway, in the lexer.


This is precisely what I'm trying to figure out.

What do I have to change in my _custom_ lexer generated by ocamllex to  
recognize keywords defined by just a string in camlp4 grammars? I'm  
not using LIDENT, etc. as I have my own set of tokens.

I understand that I need to downcase the keyword (or upcase) but I  
don't understand where I need to do this.

The filter module nested in the token module seems like a good  
candidate. What functions of the lexer or filter are accessed when a  
string keyword (e.g. "delay") is found in the camlp4 grammar?

	Thanks, Joel

---
http://tinyco.de
Mac, C++, OCaml





* Re: Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 10:35                 ` Joel Reymont
@ 2009-03-08 11:07                   ` Joel Reymont
  2009-03-08 11:28                     ` Daniel de Rauglaudre
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08 11:07 UTC (permalink / raw)
  To: Daniel de Rauglaudre, caml-list List


On Mar 8, 2009, at 10:35 AM, Joel Reymont wrote:

> The filter module nested in the token module seems like a good  
> candidate. What functions of the lexer or filter are accessed when a  
> string keyword (e.g. "delay") is found in the camlp4 grammar?


The filter portion of the token module looks like this (more below) ...

   module Token = struct
     module Loc = Loc

     type t = token

     ...

     module Filter = struct
       type token_filter = (t, Loc.t) Camlp4.Sig.stream_filter

       type t =
         { is_kwd : string -> bool;
           mutable filter : token_filter }

       let mk is_kwd =
         { is_kwd = is_kwd;
           filter = fun s -> s }

       let keyword_conversion tok is_kwd =
         match tok with
           SYMBOL s | IDENT s when is_kwd s -> KEYWORD s
         | _ -> tok

       ...
     end
   end

The relevant part here is the function is_kwd : (string -> bool)  
that's passed to Filter.mk. Within the bowels of OCaml a keyword hash  
table is set up and used to manage keywords, e.g. gkeywords in gram  
below.

The functions using and removing (below) can be used to add and remove  
keywords.

module Structure =
   struct
     open Sig.Grammar

     module type S =
       sig
         module Loc : Sig.Loc

         module Token : Sig.Token with module Loc = Loc

         module Lexer : Sig.Lexer with module Loc = Loc
           and module Token = Token

         module Context : Context.S with module Token = Token

         module Action : Sig.Grammar.Action

         type gram =
           { gfilter : Token.Filter.t;
             gkeywords : (string, int ref) Hashtbl.t;
             glexer :
               Loc.t -> char Stream.t -> (Token.t * Loc.t) Stream.t;
             warning_verbose : bool ref; error_verbose : bool ref
           }

         type efun =
           Context.t -> (Token.t * Loc.t) Stream.t -> Action.t

         type token_pattern = ((Token.t -> bool) * string)

         type internal_entry = ...

         type production_rule = ((symbol list) * Action.t)

         ...

         val get_filter : gram -> Token.Filter.t

         val using : gram -> string -> unit

         val removing : gram -> string -> unit

       end

Matthieu is using this bit to parse

let parse_arg str =
	parse_with_msg Gram.parse arg (Loc.mk str) (Stream.of_string str)

Should I just invoke Gram.using ... ? I feel that the solution is  
staring me in the face here but I still can't recognize it. Help!!!

	Thanks, Joel


---
http://tinyco.de
Mac, C++, OCaml





* Re: Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 11:07                   ` Joel Reymont
@ 2009-03-08 11:28                     ` Daniel de Rauglaudre
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel de Rauglaudre @ 2009-03-08 11:28 UTC (permalink / raw)
  To: caml-list

Hi,

On Sun, Mar 08, 2009 at 11:07:02AM +0000, Joel Reymont wrote:

> Should I just invoke Gram.using ... ? I feel that the solution is  
> staring me in the face here but I still can't recognize it. Help!!!

Well, I am afraid this is probably specific to Camlp4 (not Camlp5). Nicolas
Pouillard could probably help; I don't know the details of the changes
made in Camlp4.

-- 
Daniel de Rauglaudre
http://pauillac.inria.fr/~ddr/



* Re : Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08  8:25         ` Joel Reymont
  2009-03-08  9:37           ` Daniel de Rauglaudre
@ 2009-03-08 11:45           ` Matthieu Wipliez
  2009-03-08 11:52             ` Joel Reymont
  1 sibling, 1 reply; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-08 11:45 UTC (permalink / raw)
  To: O'Caml Mailing List


Since I don't know how to use the filter either, I tried to find another way :-)

In your lexer, do you have something along the lines of the "calc" examples in ocamllex official documentation, like a hash table that associates strings to tokens?

In this case, here is a possible solution: have your hash table associate a lowercase version of the token with what you'd like to use in the grammar:
"buytocover" => "BuyToCover"
"sellshort" => "SellShort"
...

And you replace the lookup with
try
  IDENT (Hashtbl.find keyword_table (String.lowercase id))
with Not_found ->
  IDENT id

This way identifiers that when lower-cased look like "buytocover" ("BuYTOCovEr", "bUytOcOVeR", etc.) are replaced by a single token "BuyToCover", against which you match in the grammar.
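
A small stand-alone sketch of that lookup, assuming a single hypothetical IDENT token and a table keyed by the lower-case spelling. The names keyword_table and ident_token are made up for the illustration, and String.lowercase_ascii is used as the current name of String.lowercase.

```ocaml
type token = IDENT of string

(* Lower-case spelling -> spelling used in the grammar. *)
let keyword_table : (string, string) Hashtbl.t = Hashtbl.create 16

(* Fill the table from the canonical spellings, so each keyword is
   written only once. *)
let () =
  List.iter
    (fun canonical ->
       Hashtbl.replace keyword_table
         (String.lowercase_ascii canonical) canonical)
    [ "BuyToCover"; "SellShort" ]

(* What the lexer action for identifiers would return. *)
let ident_token id =
  match Hashtbl.find_opt keyword_table (String.lowercase_ascii id) with
  | Some canonical -> IDENT canonical
  | None -> IDENT id

let () =
  assert (ident_token "bUytOcOVeR" = IDENT "BuyToCover");
  assert (ident_token "myVariable" = IDENT "myVariable")
```

Identifiers that are not in the table keep the spelling they had in the input, so ordinary names stay case-sensitive.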

Could this satisfy your requirements?

Cheers,
Matthieu



----- Original Message ----
> From: Joel Reymont <joelr1@gmail.com>
> To: Matthieu Wipliez <mwipliez@yahoo.fr>
> Cc: O'Caml Mailing List <caml-list@yquem.inria.fr>
> Sent: Sunday, 8 March 2009, 9:25:23
> Subject: Re: Re : Re : [Caml-list] Re: camlp4 stream parser syntax
> 
> How can I make camlp4 parsing case-insensitive?
> 
> The only approach I can think of so far is to build a really large  
> set of tokens and use them instead of strings in the parser.
> 
> Any flag I can flip or way to do this without a large set of tokens?
> 
>     Thanks, Joel
> 
> On Mar 8, 2009, at 1:08 AM, Matthieu Wipliez wrote:
> 
> >
> >> Is the camlp4 grammar parser case-insensitive?
> >>
> >> Will both Delay and delay be accepted in the actionDelay rule?
> >>
> >>    actionDelay: [ [ "delay"; expression ->
> >>        Asthelper.failwith (convert_loc _loc)
> >>            "RVC-CAL does not permit the use of delay." ] ];
> >
> > No, only "delay" is accepted.
> 
> 
> 
> ---
> http://tinyco.de
> Mac, C++, OCaml







* Re: Re : Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 11:45           ` Re : Re : " Matthieu Wipliez
@ 2009-03-08 11:52             ` Joel Reymont
  2009-03-08 13:33               ` Re : " Matthieu Wipliez
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08 11:52 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List


On Mar 8, 2009, at 11:45 AM, Matthieu Wipliez wrote:

> In this case, here is a possible solution, you have your hash table  
> associate a lowercase version of the token with what you'd like to  
> use in the grammar:
> "buytocover" => "BuyToCover"
> "sellshort" => "SellShort"
> ...


I'm doing this already but I don't think it will do the trick with a  
camlp4 parser since it goes through is_kwd to find a match when you  
use "delay".

I think that the internal keyword hash table in the grammar needs to  
be populated with lowercase keywords (by invoking 'using'). I don't  
know how to get to the 'using' function yet, though.

	Thanks, Joel

---
http://tinyco.de
Mac, C++, OCaml





* Re : Re : Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 11:52             ` Joel Reymont
@ 2009-03-08 13:33               ` Matthieu Wipliez
  2009-03-08 13:59                 ` Joel Reymont
  0 siblings, 1 reply; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-08 13:33 UTC (permalink / raw)
  To: Joel Reymont; +Cc: O'Caml Mailing List


> > In this case, here is a possible solution, you have your hash table associate 
> a lowercase version of the token with what you'd like to use in the grammar:
> > "buytocover" => "BuyToCover"
> > "sellshort" => "SellShort"
> > ...
> 
> 
> I'm doing this already but I don't think it will do the trick with a camlp4 
> parser since it goes through is_kwd to find a match when you use "delay".

I've just tested the idea with my lexer, in the rule identifier:
  | identifier as ident {
    if String.lowercase ident = "action" then
      IDENT "ActioN"
    else
      IDENT ident

replacing entries in the grammar that match against "action" so they match against "ActioN".

In the source code, I have
reload: ActIon in8:[i]
shift: acTIon

And Camlp4 parses it correctly. I have a tentative explanation as to why it works below:

> I think that the internal keyword hash table in the grammar needs to be 
> populated with lowercase keywords (by invoking 'using'). I don't know how to get 
> to the 'using' function yet, though.

I don't think so, here is what happens:
  1) you preprocess your grammar with camlp4of. This transforms the EXTEND statements (and a lot of other stuff) to calls to Camlp4 modules/functions.
The grammar parser is in the Camlp4GrammarParser module.
In the rule "symbol", the entry | s = STRING -> matches strings (literal tokens) and produces a TXkwd s.
This is later transformed by make_expr to an expression Camlp4Grammar__.Skeyword s (quotation <:expr< $uid:gm$.Skeyword $str:kwd$ >>)
What this means is that at compile time an entry
  my_rule : [ [ "BuyOrSell"; .. ] ]
gets transformed to an AST node
  Skeyword "BuyOrSell"

You can see that by running "camlp4of" on the parser. Every rule gets transformed to a call to Gram.extend function, with Gram.Sopt, Gram.Snterm, Gram.Skeyword etc.

  2) At runtime, when you start your program, all the Gram.extend calls are executed (because they are top-level). Your parser is kind of configured.
It turns out that extend is just a synonym for Insert.extend
  (last line of Static module)

  value extend = Insert.extend

This function will insert rules and tokens into Camlp4. The insert_tokens function tells us that whenever a Skeyword is seen, "using gram kwd" is called.
I believe this is the function you're referring to?

This function calls Structure.using, which basically adds a keyword if necessary and increases its reference count. (I think this is to automatically remove unused keywords; remember that Camlp4 can also delete rules, not only insert them.)



So to sum up: when you declare a rule with a token "MyToken", the grammar is configured to recognize a "MyToken" keyword.

Now the lexer produces IDENT (or SYMBOL for that matter). SYMBOLs are KEYWORDs by default. IDENTs become KEYWORDs if they match the keyword content.

So in our case, the lexer recognizes identifiers. If this identifier equals (case-insensitively speaking) "mytoken", we declare an IDENT "MyToken", which will be later recognized as the "MyToken" keyword (because the is_kwd test is case-sensitive).
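
The mechanism summed up above can be modelled in a few lines of plain OCaml. This is only a sketch of the idea, not the Camlp4 implementation: the token type is a hypothetical three-constructor variant, the table is a global rather than a field of gram, and the behaviour of removing is an assumption based on the reference-count description.

```ocaml
type token = IDENT of string | SYMBOL of string | KEYWORD of string

(* Keyword table with reference counts, like gkeywords in the thread. *)
let keywords : (string, int ref) Hashtbl.t = Hashtbl.create 16

let is_kwd s = Hashtbl.mem keywords s

(* Called once per literal string in an inserted rule. *)
let using kwd =
  match Hashtbl.find_opt keywords kwd with
  | Some r -> incr r
  | None -> Hashtbl.add keywords kwd (ref 1)

(* Called when a rule mentioning the keyword is deleted. *)
let removing kwd =
  match Hashtbl.find_opt keywords kwd with
  | Some r -> decr r; if !r = 0 then Hashtbl.remove keywords kwd
  | None -> ()

(* The filter step: identifiers and symbols that are registered
   keywords are promoted to KEYWORD; the test is case-sensitive. *)
let keyword_conversion tok =
  match tok with
  | (SYMBOL s | IDENT s) when is_kwd s -> KEYWORD s
  | _ -> tok

let () =
  using "MyToken";
  assert (keyword_conversion (IDENT "MyToken") = KEYWORD "MyToken");
  assert (keyword_conversion (IDENT "mytoken") = IDENT "mytoken");
  removing "MyToken";
  assert (keyword_conversion (IDENT "MyToken") = IDENT "MyToken")
```

Because the promotion test is an exact string comparison, the lexer has to emit the same spelling that appears in the grammar rule for the identifier to be treated as a keyword.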

Cheers,
Matthieu

> 
>     Thanks, Joel
> 
> ---
> http://tinyco.de
> Mac, C++, OCaml







* [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 13:33               ` Re : " Matthieu Wipliez
@ 2009-03-08 13:59                 ` Joel Reymont
  2009-03-08 14:09                   ` Re : " Matthieu Wipliez
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08 13:59 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List


On Mar 8, 2009, at 1:33 PM, Matthieu Wipliez wrote:

> So to sum up: when you declare a rule with a token "MyToken", the  
> grammar is configured to recognize a "MyToken" keyword.

The issue here is that it must be lower case in the camlp4 rules, i.e.  
"mytoken".

What if I want to have "MyToken" (camel-case) in the rule and have it  
be lower-cased when the grammar is extended? I think that requires  
extending one of the Camlp4 modules or it won't work.

Also, 'using' is not directly accessible and neither is the keywords  
hash table or is_kwd. You _can_ get the filter with get_filter () but  
the resulting structure is not mutable so you can't wrap is_kwd to  
lower-case the string passed to it.

---
http://tinyco.de
Mac, C++, OCaml





* Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 13:59                 ` Joel Reymont
@ 2009-03-08 14:09                   ` Matthieu Wipliez
  2009-03-08 14:30                     ` Joel Reymont
  0 siblings, 1 reply; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-08 14:09 UTC (permalink / raw)
  To: O'Caml Mailing List


> > So to sum up: when you declare a rule with a token "MyToken", the grammar is 
> configured to recognize a "MyToken" keyword.
> 
> The issue here is that it must be lower case in the camlp4 rules, i.e. 
> "mytoken".

Why "it must"? You need it to be lower-case? Or parsing does not work if it is not lower-case?

Maybe I did not understand correctly what you want...
I thought you wanted to recognize
  BuyOrSell something
  buyORsell something

using a single rule, say
  buy : [ [ "buyOrSell"; ... ] ]

If that is the case, I think my solution works.
You might even do that:
  buy : [ [ "buy_or_sell"; ... ] ]

and at lexing time do
  if String.lowercase s = "buyorsell" then
    IDENT "buy_or_sell"
  else
    IDENT s

In this case it is more than a matter of case, but the argument is still valid: I have declared a rule with "buy_or_sell", so the rule will be taken when a "buy_or_sell" keyword is found, and the lexer produces "buy_or_sell" identifiers from anything that matches case-insensitively "BuyOrSell".

Cheers,
Matthieu







* Re: Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 14:09                   ` Re : " Matthieu Wipliez
@ 2009-03-08 14:30                     ` Joel Reymont
  2009-03-08 15:07                       ` Re : " Matthieu Wipliez
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08 14:30 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List


On Mar 8, 2009, at 2:09 PM, Matthieu Wipliez wrote:

> using a single rule, say
>  buy : [ [ "buyOrSell"; ... ] ]

Yes, I want camel-case above.

> and at lexing time do
>  if String.lowercase s = "buyorsell" then
>    IDENT "buy_or_sell"
>  else
>    IDENT s


And this is the part that I object to. I have quite a number of  
keywords and I don't want to have a bunch of if statements or have a  
hash table mapping lowercase to camel case. This would mean having to  
track the parser (camel case) version in two places: the lexer and the  
parser.

What I want is to extend Camlp4.Struct.Grammar.Static with a custom  
version of Make that applies String.lowercase before giving the string  
to 'using' to be inserted into the keywords table.

	Thanks, Joel

---
http://tinyco.de
Mac, C++, OCaml





* Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 14:30                     ` Joel Reymont
@ 2009-03-08 15:07                       ` Matthieu Wipliez
  2009-03-08 15:24                         ` Joel Reymont
  0 siblings, 1 reply; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-08 15:07 UTC (permalink / raw)
  To: O'Caml Mailing List


> And this is the part that I object to. I have quite a number of keywords and I 
> don't want to have a bunch of if statements or have a hash table mapping 
> lowercase to camel case. This would mean having to track the parser (camel case) 
> version in two places: the lexer and the parser.

Ahhh ok, I (finally) got it!
I believe there is a (partially acceptable) solution, if you are willing to accept having all your keywords in lower-case in the grammar (not in the lexer), i.e. you match against "buyorsell", "sellshort" etc.

Then you can change the functions match_keyword and keyword_conversion as follows:

let keyword_conversion tok is_kwd =
  match tok with
    SYMBOL s | IDENT s when is_kwd (String.lowercase s) -> KEYWORD s
  | _ -> tok

This will pass lower-cased identifiers to "is_kwd", so "BuyOrSell" becomes a valid keyword.

let match_keyword kwd = function
  KEYWORD kwd' when kwd = String.lowercase kwd' -> true
| _ -> false

Here kwd is the keyword from the grammar ("buyorsell") and kwd' is the content of the keyword produced by the lexer ("BuyOrSell"), and they match.
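
The two functions can be tried together outside Camlp4. The sketch below assumes a hypothetical three-constructor token type and a fixed list of lower-case grammar keywords standing in for the grammar's keyword table; String.lowercase_ascii is the current name of String.lowercase.

```ocaml
type token = IDENT of string | SYMBOL of string | KEYWORD of string

(* Keywords as they are spelled in the grammar: all lower case. *)
let grammar_keywords = [ "buyorsell"; "sellshort" ]
let is_kwd s = List.mem s grammar_keywords

(* Promote to KEYWORD when the lower-cased text is a grammar keyword;
   the token keeps the spelling from the input. *)
let keyword_conversion tok is_kwd =
  match tok with
  | (SYMBOL s | IDENT s) when is_kwd (String.lowercase_ascii s) -> KEYWORD s
  | _ -> tok

(* Compare the grammar keyword with the lower-cased token text. *)
let match_keyword kwd = function
  | KEYWORD kwd' -> kwd = String.lowercase_ascii kwd'
  | _ -> false

let () =
  let tok = keyword_conversion (IDENT "BuyOrSell") is_kwd in
  assert (tok = KEYWORD "BuyOrSell");
  assert (match_keyword "buyorsell" tok);
  assert (not (match_keyword "sellshort" tok));
  assert (keyword_conversion (IDENT "price") is_kwd = IDENT "price")
```

Both halves are needed: without the change in match_keyword, the KEYWORD "BuyOrSell" token would be produced but would never compare equal to the "buyorsell" entry in the rule.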

Cheers,
Matthieu







* Re: Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 15:07                       ` Re : " Matthieu Wipliez
@ 2009-03-08 15:24                         ` Joel Reymont
  2009-03-08 15:32                           ` Re : " Matthieu Wipliez
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08 15:24 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List


On Mar 8, 2009, at 3:07 PM, Matthieu Wipliez wrote:

> I believe there is a (partially acceptable) solution, if you are  
> willing to accept having all your keywords in lower-case in the  
> grammar (not in the lexer), ie you match against "buyorsell",  
> "sellshort" etc.

Nope, I want camel case! :D I think a functor or something like that  
is called for here. There must be a way to include Structure into a  
module to redefine 'using', without having to duplicate  
Camlp4.Struct.Grammar.Static.Make!

The problem is that Static includes Structure.

I haven't figured out a solution yet.

I already downcase the idents in the lexer, what I want is to use  
camel case in the camlp4 parser and have that be stored as lower case  
in the internal hash table.

	Thanks, Joel

---
http://tinyco.de
Mac, C++, OCaml





* Re : Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 15:24                         ` Joel Reymont
@ 2009-03-08 15:32                           ` Matthieu Wipliez
  2009-03-08 15:39                             ` Joel Reymont
  2009-03-08 15:46                             ` Joel Reymont
  0 siblings, 2 replies; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-08 15:32 UTC (permalink / raw)
  To: Joel Reymont; +Cc: O'Caml Mailing List


> > I believe there is a (partially acceptable) solution, if you are willing to 
> accept having all your keywords in lower-case in the grammar (not in the lexer), 
> ie you match against "buyorsell", "sellshort" etc.
> 
> Nope, I want camel case! :D

lol ok :-)

> I think a functor or something like that is called 
> for here. There must be a way to include Structure into a module to redefine 
> 'using', without having to duplicate Camlp4.Struct.Grammar.Static.Make!
> 
> The problem is that Static includes Structure.

I'd say duplicate Static, and redefine "using". Seems like the simplest solution to me, certainly not the cleanest though (but is there an alternative?).

Cheers,
Matthieu







* [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 15:32                           ` Re : " Matthieu Wipliez
@ 2009-03-08 15:39                             ` Joel Reymont
  2009-03-08 15:46                             ` Joel Reymont
  1 sibling, 0 replies; 36+ messages in thread
From: Joel Reymont @ 2009-03-08 15:39 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List


On Mar 8, 2009, at 3:32 PM, Matthieu Wipliez wrote:

> I'd say duplicate Static, and redefine "using". Seems like the  
> simplest solution to me, certainly not the cleanest though (but is  
> there an alternative?).


Now we are talking!

This is Static.ml:

module Make (Lexer : Sig.Lexer)
: Sig.Grammar.Static with module Loc = Lexer.Loc
                       and module Token = Lexer.Token
= struct

   module Structure = Structure.Make Lexer;
   module Delete = Delete.Make Structure;
   module Insert = Insert.Make Structure;
   module Fold = Fold.Make Structure;
   include Structure;
...
   value get_filter () = gram.gfilter;
...
   value extend = Insert.extend;
end;

I read the documentation for 'include' but couldn't quite grasp  
whether the included interface was exported from the module that's  
doing the including. Given that 'get_filter' is available but 'using'  
is not, I reckon the answer is NO.

What if Static1 included Static after making it, then included  
Structure again and defined its own using in terms of the one provided  
by Structure?
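
On the 'include' question, a small plain-OCaml experiment may help. 'include' does re-export everything from the included module; what hides a value is a signature constraint on the including module. The sketch assumes that the constraint on Make (Sig.Grammar.Static) does not list 'using', which would be consistent with the behaviour observed above. All module names here are made up for the illustration.

```ocaml
module Structure = struct
  let using kwd = "using " ^ kwd
  let get_filter () = "filter"
end

(* Plays the role of Sig.Grammar.Static: it lists get_filter
   but not using. *)
module type STATIC = sig
  val get_filter : unit -> string
end

(* Sealed with the signature: only get_filter is visible outside. *)
module Sealed : STATIC = struct
  include Structure
end

(* No signature constraint: everything included is re-exported. *)
module Open = struct
  include Structure
end

let () =
  assert (Sealed.get_filter () = "filter");
  (* Sealed.using "delay" would be rejected by the compiler
     as an unbound value. *)
  assert (Open.using "delay" = "using delay")
```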

---
http://tinyco.de
Mac, C++, OCaml





* [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 15:32                           ` Re : " Matthieu Wipliez
  2009-03-08 15:39                             ` Joel Reymont
@ 2009-03-08 15:46                             ` Joel Reymont
  2009-03-08 15:55                               ` Re : " Matthieu Wipliez
  1 sibling, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08 15:46 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List


On Mar 8, 2009, at 3:32 PM, Matthieu Wipliez wrote:

> I'd say duplicate Static, and redefine "using". Seems like the  
> simplest solution to me, certainly not the cleanest though (but is  
> there an alternative?).


I don't think this will work elegantly.

Static first makes a Structure (is make the right term?) and then  
makes a bunch of other modules using it. A custom Structure will be  
needed to downcase the keywords before inserting them into the hash  
table, so Static will need to be duplicated as well.

I'm learning modules, functors, etc. Perhaps someone more experienced  
in this and camlp4 can weigh in.

	Thanks, Joel

-- Static.ml --- 

module Make (Lexer : Sig.Lexer)
: Sig.Grammar.Static with module Loc = Lexer.Loc
                         and module Token = Lexer.Token
= struct
   module Structure = Structure.Make Lexer;
   module Delete = Delete.Make Structure;
   module Insert = Insert.Make Structure;
   module Fold = Fold.Make Structure;
   include Structure;

---
http://tinyco.de
Mac, C++, OCaml



* Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 15:46                             ` Joel Reymont
@ 2009-03-08 15:55                               ` Matthieu Wipliez
  2009-03-08 16:58                                 ` Joel Reymont
  0 siblings, 1 reply; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-08 15:55 UTC (permalink / raw)
  To: O'Caml Mailing List

[-- Attachment #1: Type: text/plain, Size: 1452 bytes --]

> I don't think this will work elegantly.
> 
> Static first makes a Structure (is make the right term?) and then makes a bunch 
> of other modules using it. A custom Structure will be needed to downcase the 
> keywords before inserting them into the hash table, so Static will need to be 
> duplicated as well.

Well I just duplicated Static to Static1 (and added Camlp4.Struct.Grammar where necessary) and replaced:
  module Structure = Camlp4.Struct.Grammar.Structure.Make Lexer;
by:
  module Structure = struct
    include Camlp4.Struct.Grammar.Structure.Make Lexer;

    value using { gkeywords = table; gfilter = filter } kwd =
      let kwd = String.lowercase kwd in
      let r = try Hashtbl.find table kwd with
              [ Not_found ->
                  let r = ref 0 in do { Hashtbl.add table kwd r; r } ]
      in do { Token.Filter.keyword_added filter kwd (r.val = 0);
              incr r };
  end;

This way, I redefine "using" to my liking, the only modification being the lower-casing on the first line.

Structure is then passed to other functors as usual.
Note that you need to compile Static1 with camlp4r because it is in revised syntax (in the ocamlbuild _tags file this is camlp4r, use_camlp4).
This seems to work (you need the lowercase in match_keyword too, btw): I have "acTIon" and "actiON" in the parser, and it parses "action" in input files.
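
The include-then-shadow pattern used for Structure can be seen in isolation in the following plain-OCaml sketch (standard syntax, no Camlp4). The signature STRUCTURE, the modules Base and Lowercased and the functor Insert are hypothetical stand-ins for Structure.S, Structure.Make Lexer, the redefined Structure and Insert.Make; the point is only that a functor applied to the shadowing module sees the new 'using'.

```ocaml
module type STRUCTURE = sig
  val table : (string, unit) Hashtbl.t
  val using : string -> unit
end

(* Stand-in for the original Structure: stores keywords as given. *)
module Base : STRUCTURE = struct
  let table = Hashtbl.create 16
  let using kwd = Hashtbl.replace table kwd ()
end

(* Include the original module, then shadow using with a version
   that lower-cases the keyword first. *)
module Lowercased : STRUCTURE = struct
  include Base
  let using kwd = Base.using (String.lowercase_ascii kwd)
end

(* Stand-in for Insert.Make: it only sees the module it is given. *)
module Insert (S : STRUCTURE) = struct
  let insert_keyword kwd = S.using kwd
end

module I = Insert (Lowercased)

let () =
  I.insert_keyword "BuyOrSell";
  assert (Hashtbl.mem Lowercased.table "buyorsell");
  assert (not (Hashtbl.mem Lowercased.table "BuyOrSell"))
```

Code inside the included module that calls its own 'using' directly would still reach the original definition; only callers that go through the new module get the lower-casing.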

Cheers,
Matthieu



      

[-- Attachment #2: Static1.ml --]
[-- Type: application/octet-stream, Size: 3481 bytes --]

(****************************************************************************)
(*                                                                          *)
(*                              Objective Caml                              *)
(*                                                                          *)
(*                            INRIA Rocquencourt                            *)
(*                                                                          *)
(*  Copyright  2006   Institut National de Recherche  en  Informatique et   *)
(*  en Automatique.  All rights reserved.  This file is distributed under   *)
(*  the terms of the GNU Library General Public License, with the special   *)
(*  exception on linking described in LICENSE at the top of the Objective   *)
(*  Caml source tree.                                                       *)
(*                                                                          *)
(****************************************************************************)

(* Authors:
 * - Daniel de Rauglaudre: initial version
 * - Nicolas Pouillard: refactoring
*)

open Camlp4;

value uncurry f (x,y) = f x y;
value flip f x y = f y x;

module Make (Lexer : Sig.Lexer)
: Sig.Grammar.Static with module Loc = Lexer.Loc
                        and module Token = Lexer.Token
= struct
  module Structure = struct
    include Camlp4.Struct.Grammar.Structure.Make Lexer;

    value using { gkeywords = table; gfilter = filter } kwd =
      let kwd = String.lowercase kwd in
      let r = try Hashtbl.find table kwd with
              [ Not_found ->
                  let r = ref 0 in do { Hashtbl.add table kwd r; r } ]
      in do { Token.Filter.keyword_added filter kwd (r.val = 0);
              incr r };
  end;
  module Delete = Camlp4.Struct.Grammar.Delete.Make Structure;
  module Insert = Camlp4.Struct.Grammar.Insert.Make Structure;
  module Fold = Camlp4.Struct.Grammar.Fold.Make Structure;
  include Structure;

  value gram =
    let gkeywords = Hashtbl.create 301 in
    {
      gkeywords = gkeywords;
      gfilter = Token.Filter.mk (Hashtbl.mem gkeywords);
      glexer = Lexer.mk ();
      warning_verbose = ref True; (* FIXME *)
      error_verbose = Camlp4_config.verbose
    };

  module Entry = struct
    module E = Camlp4.Struct.Grammar.Entry.Make Structure;
    type t 'a = E.t 'a;
    value mk = E.mk gram;
    value of_parser name strm = E.of_parser gram name strm;
    value setup_parser = E.setup_parser;
    value name = E.name;
    value print = E.print;
    value clear = E.clear;
    value dump = E.dump;
    value obj x = x;
  end;

  value get_filter () = gram.gfilter;

  value lex loc cs = gram.glexer loc cs;

  value lex_string loc str = lex loc (Stream.of_string str);

  value filter ts = Token.Filter.filter gram.gfilter ts;

  value parse_tokens_after_filter entry ts = Entry.E.parse_tokens_after_filter entry ts;

  value parse_tokens_before_filter entry ts = parse_tokens_after_filter entry (filter ts);

  value parse entry loc cs = parse_tokens_before_filter entry (lex loc cs);

  value parse_string entry loc str = parse_tokens_before_filter entry (lex_string loc str);

  value delete_rule = Delete.delete_rule;

  value srules e rl =
    Stree (List.fold_left (flip (uncurry (Insert.insert_tree e))) DeadEnd rl);
  value sfold0 = Fold.sfold0;
  value sfold1 = Fold.sfold1;
  value sfold0sep = Fold.sfold0sep;
  (* value sfold1sep = Fold.sfold1sep; *)

  value extend = Insert.extend;

end;


* Re: Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 15:55                               ` Re : " Matthieu Wipliez
@ 2009-03-08 16:58                                 ` Joel Reymont
  2009-03-08 17:04                                   ` Re : " Matthieu Wipliez
  0 siblings, 1 reply; 36+ messages in thread
From: Joel Reymont @ 2009-03-08 16:58 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List


On Mar 8, 2009, at 3:55 PM, Matthieu Wipliez wrote:

> Well I just duplicated Static to Static1 (and added  
> Camlp4.Struct.Grammar where necessary) and replaced:
>  module Structure = Camlp4.Struct.Grammar.Structure.Make Lexer;
> by:

Something like this you mean? I must be doing something wrong as I  
never see my printout from 'using'.

	Thanks, Joel

--- Static1.ml ---

open Camlp4;
open Struct;
open Grammar;

value uncurry f (x,y) = f x y;
value flip f x y = f y x;

module Make (Lexer : Sig.Lexer)
: Sig.Grammar.Static with module Loc = Lexer.Loc
                         and module Token = Lexer.Token
= struct
   module Structure = struct
     include Camlp4.Struct.Grammar.Structure.Make Lexer;

     (* Override of using: the keyword is lowercased before it is
        looked up in, or added to, the keyword table. *)
     value using { gkeywords = table; gfilter = filter } kwd =
       let _ = print_endline ("using: storing " ^ String.lowercase kwd) in
       let kwd = String.lowercase kwd in
       let r = try Hashtbl.find table kwd with
         [ Not_found ->
           let r = ref 0 in do { Hashtbl.add table kwd r; r } ]
       in do { Token.Filter.keyword_added filter kwd (r.val = 0); incr r };
   end;
   module Delete = Delete.Make Structure;
   module Insert = Insert.Make Structure;
   module Fold = Fold.Make Structure;
   include Structure;

   value gram =
     let gkeywords = Hashtbl.create 301 in
     {
       gkeywords = gkeywords;
       gfilter = Token.Filter.mk (Hashtbl.mem gkeywords);
       glexer = Lexer.mk ();
       warning_verbose = ref True; (* FIXME *)
       error_verbose = Camlp4_config.verbose
     };

   module Entry = struct
     module E = Entry.Make Structure;
     type t 'a = E.t 'a;
     value mk = E.mk gram;
     value of_parser name strm = E.of_parser gram name strm;
     value setup_parser = E.setup_parser;
     value name = E.name;
     value print = E.print;
     value clear = E.clear;
     value dump = E.dump;
     value obj x = x;
   end;

   value get_filter () = gram.gfilter;

   value lex loc cs = gram.glexer loc cs;

   value lex_string loc str = lex loc (Stream.of_string str);

   value filter ts = Token.Filter.filter gram.gfilter ts;

   value parse_tokens_after_filter entry ts = Entry.E.parse_tokens_after_filter entry ts;

   value parse_tokens_before_filter entry ts = parse_tokens_after_filter entry (filter ts);

   value parse entry loc cs = parse_tokens_before_filter entry (lex loc cs);

   value parse_string entry loc str = parse_tokens_before_filter entry (lex_string loc str);

   value delete_rule = Delete.delete_rule;

   value srules e rl =
     Stree (List.fold_left (flip (uncurry (Insert.insert_tree e))) DeadEnd rl);
   value sfold0 = Fold.sfold0;
   value sfold1 = Fold.sfold1;
   value sfold0sep = Fold.sfold0sep;
   (* value sfold1sep = Fold.sfold1sep; *)

   value extend = Insert.extend;

end;






* Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 16:58                                 ` Joel Reymont
@ 2009-03-08 17:04                                   ` Matthieu Wipliez
  2009-03-08 17:15                                     ` Joel Reymont
  0 siblings, 1 reply; 36+ messages in thread
From: Matthieu Wipliez @ 2009-03-08 17:04 UTC (permalink / raw)
  To: O'Caml Mailing List


> > Well I just duplicated Static to Static1 (and added Camlp4.Struct.Grammar
> > where necessary) and replaced:
> >  module Structure = Camlp4.Struct.Grammar.Structure.Make Lexer;
> > by:
> 
> Something like this you mean? I must be doing something wrong as I never see my 
> printout from 'using'.

In the parser, did you replace
  module Gram = Camlp4.Struct.Grammar.Static.Make(Lexer)
by
  module Gram = Static1.Make(Lexer)

I ask because it works fine for me.

Cheers,
Matthieu







* Re: Re : Re : [Caml-list] Re: camlp4 stream parser syntax
  2009-03-08 17:04                                   ` Re : " Matthieu Wipliez
@ 2009-03-08 17:15                                     ` Joel Reymont
  0 siblings, 0 replies; 36+ messages in thread
From: Joel Reymont @ 2009-03-08 17:15 UTC (permalink / raw)
  To: Matthieu Wipliez; +Cc: O'Caml Mailing List


On Mar 8, 2009, at 5:04 PM, Matthieu Wipliez wrote:

> In the parser, did you replace
>  module Gram = Camlp4.Struct.Grammar.Static.Make(Lexer)
> by
>  module Gram = Static1.Make(Lexer)


I forgot to fix match_keyword. Works otherwise, thanks!

Now, why is match_keyword supplied with the original keyword, e.g.
"Delay", when it is the lowercase version that is supposed to be
inserted into the hash table?
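
To picture the mismatch, here is a stand-alone sketch in plain OCaml (standard syntax, not Camlp4 code). The names table, using, is_keyword_raw and is_keyword are made up for illustration and are not part of Camlp4. It assumes only what is described above: keywords are stored lowercased, so a lookup has to lowercase its argument the same way or the original spelling will not be found.

```ocaml
(* Hypothetical stand-alone model of a case-insensitive keyword table.
   None of these names come from Camlp4; they only mirror the idea. *)

(* Keyword table: lowercased spelling -> number of times it was registered. *)
let table : (string, int ref) Hashtbl.t = Hashtbl.create 301

(* Register a keyword; the key is normalized to lower case first. *)
let using kwd =
  let key = String.lowercase_ascii kwd in
  match Hashtbl.find_opt table key with
  | Some r -> incr r
  | None -> Hashtbl.add table key (ref 1)

(* Lookup that does NOT normalize its argument. *)
let is_keyword_raw s = Hashtbl.mem table s

(* Lookup that normalizes the same way as [using]. *)
let is_keyword s = Hashtbl.mem table (String.lowercase_ascii s)

let () =
  using "Delay";
  using "DELAY";
  Printf.printf "raw lookup of \"Delay\": %b\n" (is_keyword_raw "Delay");
  Printf.printf "normalized lookup of \"Delay\": %b\n" (is_keyword "Delay");
  Printf.printf "use count for \"delay\": %d\n" !(Hashtbl.find table "delay")
```

In this model the raw lookup misses "Delay" because only "delay" was stored, which is the kind of failure to expect if the matching side is left unchanged while the storing side lowercases.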






Thread overview: 36+ messages
2009-03-07 22:38 camlp4 stream parser syntax Joel Reymont
2009-03-07 22:52 ` Joel Reymont
2009-03-07 23:21   ` Re : [Caml-list] " Matthieu Wipliez
2009-03-07 23:42     ` Joel Reymont
2009-03-08  0:40     ` Joel Reymont
2009-03-08  1:08       ` Re : " Matthieu Wipliez
2009-03-08  8:25         ` Joel Reymont
2009-03-08  9:37           ` Daniel de Rauglaudre
2009-03-08  9:51             ` Joel Reymont
2009-03-08 10:27               ` Daniel de Rauglaudre
2009-03-08 10:35                 ` Joel Reymont
2009-03-08 11:07                   ` Joel Reymont
2009-03-08 11:28                     ` Daniel de Rauglaudre
2009-03-08 11:45           ` Re : Re : " Matthieu Wipliez
2009-03-08 11:52             ` Joel Reymont
2009-03-08 13:33               ` Re : " Matthieu Wipliez
2009-03-08 13:59                 ` Joel Reymont
2009-03-08 14:09                   ` Re : " Matthieu Wipliez
2009-03-08 14:30                     ` Joel Reymont
2009-03-08 15:07                       ` Re : " Matthieu Wipliez
2009-03-08 15:24                         ` Joel Reymont
2009-03-08 15:32                           ` Re : " Matthieu Wipliez
2009-03-08 15:39                             ` Joel Reymont
2009-03-08 15:46                             ` Joel Reymont
2009-03-08 15:55                               ` Re : " Matthieu Wipliez
2009-03-08 16:58                                 ` Joel Reymont
2009-03-08 17:04                                   ` Re : " Matthieu Wipliez
2009-03-08 17:15                                     ` Joel Reymont
2009-03-08  9:34         ` Joel Reymont
2009-03-07 23:52 ` [Caml-list] " Jon Harrop
2009-03-07 23:53   ` Joel Reymont
2009-03-08  0:12     ` Jon Harrop
2009-03-08  0:20       ` Re : " Matthieu Wipliez
2009-03-08  0:29         ` Jon Harrop
2009-03-08  0:30         ` Re : " Joel Reymont
2009-03-08  0:37           ` Re : " Matthieu Wipliez
