From: Pierre Weis
Message-Id: <200102090944.KAA02025@pauillac.inria.fr>
Subject: Re: Genlex
In-Reply-To: <3A831147.E28E6DF8@titan.iwu.edu> from Hans-Joerg Tiede at "Feb 8, 101 03:36:07 pm"
To: htiede@titan.iwu.edu (Hans-Joerg Tiede)
Date: Fri, 9 Feb 2001 10:44:41 +0100 (MET)
Cc: caml-list@inria.fr

> Hi,
> I was writing a simple Scheme parser in Ocaml and I set up a lexer using
> the Genlex library. It seems that the lexer doesn't support keywords
> with the # character in them, making it hard to recognize #t and #f
> (true and false in Scheme).
[...]
> --Joerg
> -----------------------------------------------------
> Hans-Joerg Tiede
[...]
> www: http://www.iwu.edu/~htiede
> -----------------------------------------------------

Right. This is because the # is a starter for ``special idents'' made of
symbols only (here symbols == non alphanumeric chars). You must change
the rule for ident2 to add the possibility to have alpha-numeric chars
after a non-alphanumeric char.
For instance:

and ident2 = parser
  | [< ' '!'|'%'|'&'|'$'|'#'|'+'|'-'|'/'|':'|'<'|'='|'>'|'?'|'@'|'\\'|
         '~'|'^'|'|'|'*' as c; s >] -> store c; ident2 s
  | [< ' 'A'..'Z'|'a'..'z'|'\192'..'\255'|'0'..'9'|'_'|'\'' as c; s>] ->
      store c; ident2 s
  | [< >] -> Some(ident_or_keyword(get_string()))

However, to build a lexical analyzer for Scheme, you would have to
rewrite a lot of the Genlex module, since the tokens recognized by
Genlex are far too similar to those of Caml (or Pascal or C or Java) to
accommodate Scheme symbols (for instance int->real is naturally
considered as 3 tokens by Genlex, when it is a regular ident name in
Scheme).

Alternatively, you can consider using Ocamllex to write a conventional
lexer.

Hope this helps,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/
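
As a rough illustration of the "conventional lexer" alternative mentioned
above, here is a minimal hand-written sketch in plain OCaml. It is neither
Genlex nor Ocamllex output. The token type and the names is_delimiter,
is_integer, classify and tokenize are hypothetical, chosen only for this
example, and it assumes a tiny Scheme subset: parentheses, #t/#f, decimal
integers with an optional leading minus sign, and symbols. The idea is to
read a whole atom up to the next delimiter and only then classify it, so
that #t and int->real each come out as one token.

```ocaml
(* Hypothetical token type for a tiny Scheme subset; not part of Genlex. *)
type token =
  | LParen
  | RParen
  | Bool of bool
  | Int of int
  | Symbol of string

(* Characters that end an atom: parentheses and whitespace. *)
let is_delimiter c =
  c = '(' || c = ')' || c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* True for an optional '-' followed by one or more decimal digits.
   Simplifying assumption: no leading '+', no radix prefixes, no reals. *)
let is_integer s =
  let n = String.length s in
  let start = if n > 0 && s.[0] = '-' then 1 else 0 in
  let rec all_digits i =
    i >= n || (s.[i] >= '0' && s.[i] <= '9' && all_digits (i + 1))
  in
  n > start && all_digits start

(* Classify a complete atom, so "#t" and "int->real" stay single tokens. *)
let classify s =
  match s with
  | "#t" -> Bool true
  | "#f" -> Bool false
  | _ ->
    if is_integer s then
      match int_of_string_opt s with
      | Some n -> Int n
      | None -> Symbol s (* too large for a native int *)
    else Symbol s

(* Split the input into tokens, skipping whitespace. *)
let tokenize (input : string) : token list =
  let n = String.length input in
  (* Index of the first delimiter at or after i (or n if none). *)
  let rec atom_end i =
    if i < n && not (is_delimiter input.[i]) then atom_end (i + 1) else i
  in
  let rec loop i acc =
    if i >= n then List.rev acc
    else
      match input.[i] with
      | '(' -> loop (i + 1) (LParen :: acc)
      | ')' -> loop (i + 1) (RParen :: acc)
      | ' ' | '\t' | '\n' | '\r' -> loop (i + 1) acc
      | _ ->
        let j = atom_end i in
        loop j (classify (String.sub input i (j - i)) :: acc)
  in
  loop 0 []
```

With these definitions, tokenize "(if #t int->real -42)" gives
[LParen; Symbol "if"; Bool true; Symbol "int->real"; Int (-42); RParen].
Strings, characters, comments and quote marks are left out and would need
their own cases.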