From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 503A9BC69 for ; Fri, 23 Feb 2007 20:05:33 +0100 (CET) Received: from out5.smtp.messagingengine.com (out5.smtp.messagingengine.com [66.111.4.29]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l1NJ5W5l009361 for ; Fri, 23 Feb 2007 20:05:32 +0100 Received: from out1.internal (unknown [10.202.2.149]) by out1.messagingengine.com (Postfix) with ESMTP id 8A33F1D014E; Fri, 23 Feb 2007 14:05:31 -0500 (EST) Received: from heartbeat2.messagingengine.com ([10.202.2.161]) by out1.internal (MEProxy); Fri, 23 Feb 2007 14:05:31 -0500 X-Sasl-enc: cdm0Rdn6tqUCznOgUe1vO+xe6GzFfuXZLrHgl//Zbvwk 1172257531 Received: from [172.16.112.115] (burnham.ljcrf.edu [192.231.106.2]) by mail.messagingengine.com (Postfix) with ESMTP id CA1F8EDC8; Fri, 23 Feb 2007 14:05:30 -0500 (EST) Date: Fri, 23 Feb 2007 11:04:29 -0800 (PST) From: Martin Jambon X-X-Sender: martin@localhost To: Joel Reymont Cc: caml-list@inria.fr Subject: Re: [Caml-list] Case-insensitive lexing In-Reply-To: <7504EE4A-36DE-46EF-B60C-A9F4ADEEED6F@gmail.com> Message-ID: References: <23E977DC-77FF-44A2-8675-5EAA8F61505F@gmail.com> <011EB42A-05E3-4686-BED7-2DB8B2663221@cs.uni-sb.de> <7504EE4A-36DE-46EF-B60C-A9F4ADEEED6F@gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Miltered: at concorde with ID 45DF3AFC.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ens-lyon:01 lexing:01 lindig:01 uglier:01 ocamllex:01 ocamllex:01 tilde:01 regexp:01 2007,:98 23,:98 2007,:98 wrote:01 wrote:01 caml-list:01 lex:01 On Fri, 23 Feb 2007, Joel Reymont wrote: > > On Feb 23, 2007, at 3:50 PM, Christian Lindig wrote: > > > I assume you define identifier to a sequence of upper and lower > > characters. What is wrong with this solution and what would you > > have liked to see instead? > > I was hoping not having to write this (from a Lex file): > > A [Aa] > B [Bb] > .. > Y [Yy] > Z [Zz] > > {A}{B}{O}{V}{E} { return(ABOVE); } > {A}{G}{O} { return(AGO); } > {A}{L}{E}{R}{T} { return(ALERT); } > > Or even uglier > > {N}{U}{M}{E}{R}{I}{C}{A}{R}{R}{A}{Y}{R}{E}{F} ('ARRAY-NUM- > REF, yytext) > {T}{R}{U}{E}{F}{A}{L}{S}{E}{A}{R}{R}{A}{Y} ('ARRAY-NUM, > yytext) > {T}{R}{U}{E}{F}{A}{L}{S}{E}{A}{R}{R}{A}{Y}{R}{E}{F} ('ARRAY-NUM- > REF, yytext) The ocamllex version would look like: let a = ['A' 'a'] let b = ['B' 'b'] ... a b o v e { ABOVE } | a g o { AGO } | a l e r t { ALERT } Note that in micmatch--which is not ocamllex but supports the same core syntax--there is a tilde operator which makes the regexp case-insensitive according to latin1. It doesn't work with every language or encoding though. Martin -- Martin Jambon http://martin.jambon.free.fr