From: Pietro Abate
To: caml-list@yquem.inria.fr
Subject: camlp4 and lexers
Date: Thu, 15 May 2008 17:00:33 +0200
Message-ID: <20080515150033.GA31934@uranium.pps.jussieu.fr>

Hi all,

This question was asked a few weeks ago, and again last week, but I still don't really understand how to proceed. I hope we can boil it down to a small example that sheds some light on the camlp4 internals.

Say I want to write a small parser for regular expressions (or an arithmetic calculator) without extending the OCaml grammar. I just want a minimal lexer and a minimal grammar to parse expressions like

  (aaa*|b?);c

The parser part is easy (below). The part I don't understand is how to create the lexer. I had a look at the ocsigen xmlcaml lexer and the camlp4 lexer, but I still haven't found a minimal example I can follow without getting confused.

In particular, I want my lexer to give me back CHAR tokens that carry a single char (unlike camlp4's CHAR of char * string), not strings. I could use the camlp4 lexer as it is, but then every regexp would have to be written as ('a''a''a' *) and so on, which does not look good.

A while ago I did something similar with the old camlp4 [1] using plexer, but this is no longer possible. Some time ago Nicolas suggested copying the Camlp4.PreCast module and the lexer module and customizing them. I think it should be possible simply to use Struct.Grammar.Static.Make with a new lexer instead, but, as I said, I haven't managed to write a minimal lexer for this example. Maybe I'm confused about this.

I think a minimal example would help more than one person here.

thanks :)
p

--------------------------
This is my parser...
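(* RegExpLexer is the custom lexer asked about above; it is not defined here.
   Alt, Seq, Opt, Star, Plus, Dot and Sym are the constructors of the regexp AST. *)
module RegExGram = Struct.Grammar.Static.Make(RegExpLexer)
let regex = RegExGram.Entry.mk "regex"

EXTEND RegExGram
  GLOBAL: regex;

  (* Alternation. The base case is concat, so that either side of "|"
     can be a sequence such as aaa*. *)
  regex: [[
      e1 = SELF; "|"; e2 = concat -> Alt(e1,e2)
    | e1 = concat -> e1
  ]];

  (* Sequencing, either with an explicit ";" or by juxtaposition. *)
  concat: [[
      e1 = SELF; ";"; e2 = seq -> Seq(e1,e2)
    | e1 = SELF; e2 = seq -> Seq(e1,e2)
    | e1 = seq -> e1
  ]];

  (* Postfix operators. *)
  seq: [[
      e1 = simple; "?" -> Opt e1
    | e1 = simple; "*" -> Star e1
    | e1 = simple; "+" -> Plus e1
    | e1 = simple -> e1
  ]];

  (* Atoms: any character, a parenthesized regexp, or a literal symbol. *)
  simple: [[
      "." -> Dot
    | "("; e1 = regex; ")" -> e1
    | `CHAR(s) -> Sym s
  ]];
END

----------------------
[1] http://groups.google.com/group/fa.caml/browse_thread/thread/e26569427cc8879d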
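
As a starting point for the lexer question above, here is a minimal sketch of the tokenization step alone, in plain OCaml with no camlp4 dependency. The names `token`, `is_operator` and `tokenize` are hypothetical and do not come from camlp4 or from the original post. It only shows how single-character CHAR tokens could be produced; to be passed to Struct.Grammar.Static.Make, this logic would still have to be wrapped in a module matching camlp4's lexer signature, which is not attempted here.

```ocaml
(* Hypothetical token type: every token covers exactly one input character. *)
type token =
  | CHAR of char        (* a literal symbol such as 'a' *)
  | KEYWORD of string   (* one of the operators ( ) | ; ? * + . *)
  | EOI                 (* end of input *)

(* Characters that the grammar treats as operators. *)
let is_operator c = String.contains "()|;?*+." c

(* Turn a string into a list of tokens, each paired with its character
   offset in the input. Whitespace is skipped. *)
let tokenize (s : string) : (token * int) list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev ((EOI, i) :: acc)
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | c when is_operator c ->
          go (i + 1) ((KEYWORD (String.make 1 c), i) :: acc)
      | c -> go (i + 1) ((CHAR c, i) :: acc)
  in
  go 0 []
```

With this sketch, `tokenize "(aaa*|b?);c"` yields a KEYWORD token for each operator and a CHAR token for each letter, followed by EOI. The offsets are kept because a real camlp4 lexer has to return a location alongside each token.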