caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Matthieu Wipliez <mwipliez@yahoo.fr>
To: O'Caml Mailing List <caml-list@yquem.inria.fr>
Subject: Re : [Caml-list] Re: camlp4 stream parser syntax
Date: Sat, 7 Mar 2009 23:21:42 +0000 (GMT)	[thread overview]
Message-ID: <46331.52510.qm@web27007.mail.ukl.yahoo.com> (raw)
In-Reply-To: <24D11586-4F15-4B6E-8FB7-58651317164D@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3737 bytes --]

Hi Joel,

why are you using stream parsers instead of Camlp4 grammars ?
This:

> let rec parse_primary = parser
> 
>   | [< 'INT n >] -> Int n
>   | [< 'FLOAT n >] -> Float n
>   | [< 'STRING n >] -> Str n
>   | [< 'TRUE >] -> Bool true
>   | [< 'FALSE >] -> Bool false
> 
>   | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")

could be written as:
expression: [
  [ (i, _) = INT -> Int i
  | (s, _) = STRING -> Str s
  ... ]
];

Note that Camlp4 will automatically raise an exception if the input cannot be parsed with the grammar given.

Also if you have input that is syntactically correct but is not semantically correct, and you want to raise an exception with the error location during parsing, you might want to use Loc.raise as follows:
expression: [
  [ e1 = SELF; "/"; e2 = SELF ->
    if e2 = Int 0 then
      Loc.raise _loc (Failure "division by zero")
    else
      BinaryOp (e1, Div, e2) ]
];

By the way, do you need you own tailor-made lexer? Camlp4 provides one that might satisfy your needs.
Otherwise, you can always define your own lexer (I had to do that for the project I'm working on, see file attached).

Your parser would then look like

(* functor application *)
module Camlp4Loc = Camlp4.Struct.Loc
module Lexer = Cal_lexer.Make(Camlp4Loc)
module Gram = Camlp4.Struct.Grammar.Static.Make(Lexer)

(* exposes EOI and other stuff *)
open Lexer

(* rule definition *)
let rule = Gram.Entry.mk "rule"

(* grammar definition *)
EXTEND Gram
  rule: [ [ ... ] ];
END

(* to parse a file *)
Gram.parse rule (Loc.mk file) (Stream.of_channel ch)


This should be compiled with camlp4of.

I hope this helps you with what you'd like to do,

Cheers,

Matthieu


----- Message d'origine ----
> De : Joel Reymont <joelr1@gmail.com>
> À : O'Caml Mailing List <caml-list@yquem.inria.fr>
> Envoyé le : Samedi, 7 Mars 2009, 23h52mn 52s
> Objet : [Caml-list] Re: camlp4 stream parser syntax
> 
> > Where can I read up on the syntax of the following in a camlp4 stream parser?
> > 
> >  | [<' INT n >] -> Int n
> > 
> > For example, where are [< ... >] described and why is the ' needed in between?
> 
> 
> To be more precise, I'm using camlp4 to parse a language into a non-OCaml AST.
> 
> I'm trying to figure out the meaning of [<, >], [[ and ]]
> 
> My ocamllex lexer is wrapped to make it look like a stream lexer (below) and I'm 
> returning a tuple of (tok, loc) because I don't see another way of making token 
> location available to the parser.
> 
> Still, I'm how to integrate the reporting of error location into ?? in something 
> like this
> 
> | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
> 
> Would someone kindly shed light on this?
> 
>     Thanks in advance, Joel
> 
> P.S. ocamllex wrapper to return a' Stream.t
> 
> {
> let from_lexbuf tab lb =
>   let next _ =
>     let tok = token tab lb in
>     let loc = Loc.of_lexbuf lb in
>     Some (tok, loc)
>   in Stream.from next
> 
> let setup_loc lb loc =
>   let start_pos = Loc.start_pos loc in
>   lb.lex_abs_pos <- start_pos.pos_cnum;
>   lb.lex_curr_p  <- start_pos
> 
> let from_string loc tab str =
>   let lb = Lexing.from_string str in
>   setup_loc lb loc;
>   from_lexbuf tab lb
> 
> }
> 
> ---
> http://tinyco.de
> Mac, C++, OCaml
> 
> 
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs



      

[-- Attachment #2: cal_lexer.mll --]
[-- Type: application/octet-stream, Size: 10564 bytes --]

(*****************************************************************************)
(* Cal2C                                                                     *)
(* Copyright (c) 2007-2008, IETR/INSA of Rennes.                             *)
(* All rights reserved.                                                      *)
(*                                                                           *)
(* This software is governed by the CeCILL-B license under French law and    *)
(* abiding by the rules of distribution of free software. You can  use,      *)
(* modify and/ or redistribute the software under the terms of the CeCILL-B  *)
(* license as circulated by CEA, CNRS and INRIA at the following URL         *)
(* "http://www.cecill.info".                                                 *)
(*                                                                           *)
(* Matthieu WIPLIEZ <Matthieu.Wipliez@insa-rennes.fr                         *)
(*****************************************************************************)

(* File cal_lexer.mll *)
{
open Printf
open Format

module Make (Loc : Camlp4.Sig.Loc) = struct
  module Loc = Loc

	type token =
		| KEYWORD of string
		| SYMBOL  of string
    | IDENT   of string
    | INT     of int * string
    | FLOAT   of float * string
    | CHAR    of char * string
    | STRING  of string * string
	  | EOI
	
	module Token = struct
		module Loc = Loc
	
    type t = token

	  let to_string =
	    function
	      KEYWORD s     -> sprintf "KEYWORD %S" s
	    | SYMBOL s      -> sprintf "SYMBOL %S" s
	    | IDENT s       -> sprintf "IDENT %S" s
	    | INT (_, s)    -> sprintf "INT %s" s
	    | FLOAT (_, s)  -> sprintf "FLOAT %s" s
	    | CHAR (_, s)   -> sprintf "CHAR '%s'" s
	    | STRING (_, s) -> sprintf "STRING \"%s\"" s
	                      (* here it's not %S since the string is already escaped *)
	    | EOI           -> sprintf "EOI"

	  let print ppf x = pp_print_string ppf (to_string x)
	
	  let match_keyword kwd = function
	      KEYWORD kwd' when kwd = kwd' -> true
	    | _ -> false
	
	  let extract_string =
	    function
	      KEYWORD s
			| IDENT s
			| INT (_, s)
			| FLOAT (_, s)
			| CHAR (_, s)
			| STRING (_, s) -> s
	    | tok ->
	        invalid_arg ("Cannot extract a string from this token: "^
	                     to_string tok)
	
	  module Error = struct
	    type t =
	        Illegal_token of string
	      | Keyword_as_label of string
	      | Illegal_token_pattern of string * string
	      | Illegal_constructor of string
	
	    exception E of t
	
	    let print ppf =
	      function
	        Illegal_token s ->
	          fprintf ppf "Illegal token (%s)" s
	      | Keyword_as_label kwd ->
	          fprintf ppf "`%s' is a keyword, it cannot be used as label name" kwd
	      | Illegal_token_pattern (p_con, p_prm) ->
	          fprintf ppf "Illegal token pattern: %s %S" p_con p_prm
	      | Illegal_constructor con ->
	          fprintf ppf "Illegal constructor %S" con
	
	    let to_string x =
	      let b = Buffer.create 50 in
	      let () = bprintf b "%a" print x in Buffer.contents b
	  end
		
	  module M = Camlp4.ErrorHandler.Register(Error)
	
	  module Filter = struct
	    type token_filter = (t, Loc.t) Camlp4.Sig.stream_filter
			
	    type t =
	      { is_kwd : string -> bool;
	        mutable filter : token_filter }
	
	    let mk is_kwd =
	      { is_kwd = is_kwd;
	        filter = fun s -> s }
	
	    let keyword_conversion tok is_kwd =
        match tok with
          SYMBOL s | IDENT s when is_kwd s -> KEYWORD s
        | _ -> tok
	
	    let filter x =
	      let f tok loc =
          let tok' = keyword_conversion tok x.is_kwd in
	        (tok', loc)
	      in
	      let rec filter =
	        parser
	        | [< '(tok, loc); s >] -> [< ' f tok loc; filter s >]
	        | [< >] -> [< >]
	      in
	      fun strm -> x.filter (filter strm)
	
	    let define_filter x f = x.filter <- f x.filter
	
	    let keyword_added _ _ _ = ()
	    let keyword_removed _ _ = ()
	  end
	end
	
  open Lexing
	
	(* Error report *)
  module Error = struct

    type t =
      | Illegal_character of char
      | Illegal_escape    of string
      | Unterminated_comment
      | Unterminated_string
      | Unterminated_quotation
      | Unterminated_antiquot
      | Unterminated_string_in_comment
      | Comment_start
      | Comment_not_end
      | Literal_overflow of string

    exception E of t

    open Format

    let print ppf =
      function
      | Illegal_character c ->
          fprintf ppf "Illegal character (%s)" (Char.escaped c)
      | Illegal_escape s ->
          fprintf ppf "Illegal backslash escape in string or character (%s)" s
      | Unterminated_comment ->
          fprintf ppf "Comment not terminated"
      | Unterminated_string ->
          fprintf ppf "String literal not terminated"
      | Unterminated_string_in_comment ->
          fprintf ppf "This comment contains an unterminated string literal"
      | Unterminated_quotation ->
          fprintf ppf "Quotation not terminated"
      | Unterminated_antiquot ->
          fprintf ppf "Antiquotation not terminated"
      | Literal_overflow ty ->
          fprintf ppf "Integer literal exceeds the range of representable integers of type %s" ty
      | Comment_start ->
          fprintf ppf "this is the start of a comment"
      | Comment_not_end ->
          fprintf ppf "this is not the end of a comment"

    let to_string x =
      let b = Buffer.create 50 in
      let () = bprintf b "%a" print x in Buffer.contents b
  end

  let module M = Camlp4.ErrorHandler.Register(Error) in ()

  open Error
	
	open Cal2c_util
  exception Eof

(* String construction *)
let str = ref ""

type context = {
  loc        : Loc.t;
  in_comment : bool;
  quotations : bool;
  antiquots  : bool;
  lexbuf     : lexbuf;
  buffer     : Buffer.t
}

(* Update the current location with file name and line number. *)
let update_loc c file line absolute chars =
  let lexbuf = c.lexbuf in
  let pos = lexbuf.lex_curr_p in
  let new_file =
		match file with
    | None -> pos.pos_fname
    | Some s -> s
  in
  lexbuf.lex_curr_p <- { pos with
    pos_fname = new_file;
    pos_lnum = if absolute then line else pos.pos_lnum + line;
    pos_bol = pos.pos_cnum - chars;
  }

(* Matches either \ or $. Why so many backslashes? Because \ has to be escaped*)
(* in strings, so we get \\. \, | and $ also have to be escaped in regexps, *)
(* so we have \\\\ \\| \\$. *)
let re_id = Str.regexp "\\\\\\|\\$"
}

(* Numbers *)
let nonZeroDecimalDigit = ['1'-'9']

let decimalDigit = '0' | nonZeroDecimalDigit
let decimalLiteral = nonZeroDecimalDigit (decimalDigit)*

let hexadecimalDigit = decimalDigit | ['a'-'f'] | ['A'-'F']
let hexadecimalLiteral = '0' ('x'|'X') hexadecimalDigit (hexadecimalDigit)*

let octalDigit = ['0'-'7']
let octalLiteral = '0' (octalDigit)*

let integer = decimalLiteral | hexadecimalLiteral | octalLiteral

let exponent = ('e'|'E') ('+'|'-')? decimalDigit+
let real = decimalDigit+ '.' (decimalDigit)* exponent?
| '.' decimalDigit+ exponent?
| decimalDigit+ exponent

(* Identifiers *)
let char = ['a'-'z' 'A'-'Z']
let any_identifier = (char | '_' | decimalDigit | '$')+
let other_identifier =
    (char | '_') (char | '_' | decimalDigit | '$')*
  | '$' (char | '_' | decimalDigit | '$')+
let identifier = '\\' any_identifier '\\' | other_identifier

let newline = ('\010' | '\013' | "\013\010")

(* Token rule *)
rule token c = parse
  | [' ' '\t'] {token c lexbuf}
	| newline { update_loc c None 1 false 0; token c lexbuf }

	| "^" { SYMBOL "^" }
	| "->" { SYMBOL "->" }
	| ':' { SYMBOL ":" }
	| ":=" { SYMBOL ":=" }
	| ',' { SYMBOL "," }
	| "!=" { SYMBOL "!=" }
	| '/' { SYMBOL "/" }
	| '.' { SYMBOL "." }
	| ".." { SYMBOL ".." }
	| "::" { SYMBOL "::" }
	| "-->" { SYMBOL "-->" }
	| "==>" { SYMBOL "==>" }
	| '=' { SYMBOL "=" }
	| ">=" { SYMBOL ">=" }
	| '>' { SYMBOL ">" }
	| '{' { SYMBOL "{" }
	| '[' { SYMBOL "[" }
	| "<=" { SYMBOL "<=" }
	| '<' { SYMBOL "<" }
	| '(' { SYMBOL "(" }
	| '-' { SYMBOL "-" }
	| '+' { SYMBOL "+" }
	| '}' { SYMBOL "}" }
	| ']' { SYMBOL "]" }
	| ')' { SYMBOL ")" }
	| ';' { SYMBOL ";" }
	| '#' { SYMBOL "#" }
	| '*' { SYMBOL "*" }

  | integer as lxm { INT (int_of_string lxm, lxm) }
  | real as lxm { FLOAT (float_of_string lxm, lxm) }
  | identifier as ident {
				let ident = Str.global_replace re_id "_" ident in
				IDENT ident }
  | '"' { let str = string c lexbuf in STRING (str, str) }
  | "//" { single_line_comment c lexbuf }
	| "/*" { multi_line_comment c lexbuf }
  | eof { EOI }
and string ctx = parse
	| "\\\"" { str := !str ^ "\\\""; string ctx lexbuf }
	| '"' { let s = !str in str := ""; s }
	| _ as c { str := !str ^ (String.make 1 c); string ctx lexbuf }
and single_line_comment c = parse
  | newline { update_loc c None 1 false 0; token c lexbuf }
	| _ { single_line_comment c lexbuf }
and multi_line_comment c = parse
  | "*/" { token c lexbuf }
	| newline { update_loc c None 1 false 0; multi_line_comment c lexbuf }
	| _ { multi_line_comment c lexbuf }
    
{		
  let default_context lb =
  { loc        = Loc.ghost ;
    in_comment = false     ;
    quotations = true      ;
    antiquots  = false     ;
    lexbuf     = lb        ;
    buffer     = Buffer.create 256 }
	
  let update_loc c = { (c) with loc = Loc.of_lexbuf c.lexbuf }

  let with_curr_loc f c = f (update_loc c) c.lexbuf
	
  let lexing_store s buff max =
    let rec self n s =
      if n >= max then n
      else
        match Stream.peek s with
        | Some x ->
            Stream.junk s;
            buff.[n] <- x;
            succ n
        | _ -> n
    in
    self 0 s

  let from_context c =
    let next _ =
      let tok = with_curr_loc token c in
      let loc = Loc.of_lexbuf c.lexbuf in
      Some ((tok, loc))
    in Stream.from next

  let from_lexbuf ?(quotations = true) lb =
    let c = { (default_context lb) with
              loc        = Loc.of_lexbuf lb;
              antiquots  = !Camlp4_config.antiquotations;
              quotations = quotations      }
    in from_context c

  let setup_loc lb loc =
    let start_pos = Loc.start_pos loc in
    lb.lex_abs_pos <- start_pos.pos_cnum;
    lb.lex_curr_p  <- start_pos

  let from_string ?quotations loc str =
    let lb = Lexing.from_string str in
    setup_loc lb loc;
    from_lexbuf ?quotations lb

  let from_stream ?quotations loc strm =
    let lb = Lexing.from_function (lexing_store strm) in
    setup_loc lb loc;
    from_lexbuf ?quotations lb

  let mk () loc strm =
    from_stream ~quotations:!Camlp4_config.quotations loc strm
end
}

  reply	other threads:[~2009-03-07 23:22 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-07 22:38 Joel Reymont
2009-03-07 22:52 ` Joel Reymont
2009-03-07 23:21   ` Matthieu Wipliez [this message]
2009-03-07 23:42     ` Re : [Caml-list] " Joel Reymont
2009-03-08  0:40     ` Joel Reymont
2009-03-08  1:08       ` Re : " Matthieu Wipliez
2009-03-08  8:25         ` Joel Reymont
2009-03-08  9:37           ` Daniel de Rauglaudre
2009-03-08  9:51             ` Joel Reymont
2009-03-08 10:27               ` Daniel de Rauglaudre
2009-03-08 10:35                 ` Joel Reymont
2009-03-08 11:07                   ` Joel Reymont
2009-03-08 11:28                     ` Daniel de Rauglaudre
2009-03-08 11:45           ` Re : Re : " Matthieu Wipliez
2009-03-08 11:52             ` Joel Reymont
2009-03-08 13:33               ` Re : " Matthieu Wipliez
2009-03-08 13:59                 ` Joel Reymont
2009-03-08 14:09                   ` Re : " Matthieu Wipliez
2009-03-08 14:30                     ` Joel Reymont
2009-03-08 15:07                       ` Re : " Matthieu Wipliez
2009-03-08 15:24                         ` Joel Reymont
2009-03-08 15:32                           ` Re : " Matthieu Wipliez
2009-03-08 15:39                             ` Joel Reymont
2009-03-08 15:46                             ` Joel Reymont
2009-03-08 15:55                               ` Re : " Matthieu Wipliez
2009-03-08 16:58                                 ` Joel Reymont
2009-03-08 17:04                                   ` Re : " Matthieu Wipliez
2009-03-08 17:15                                     ` Joel Reymont
2009-03-08  9:34         ` Joel Reymont
2009-03-07 23:52 ` [Caml-list] " Jon Harrop
2009-03-07 23:53   ` Joel Reymont
2009-03-08  0:12     ` Jon Harrop
2009-03-08  0:20       ` Re : " Matthieu Wipliez
2009-03-08  0:29         ` Jon Harrop
2009-03-08  0:30         ` Re : " Joel Reymont
2009-03-08  0:37           ` Re : " Matthieu Wipliez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46331.52510.qm@web27007.mail.ukl.yahoo.com \
    --to=mwipliez@yahoo.fr \
    --cc=caml-list@yquem.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).