From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <martin.jambon@ens-lyon.org>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39])
	by yquem.inria.fr (Postfix) with ESMTP id 503A9BC69
	for <caml-list@yquem.inria.fr>; Fri, 23 Feb 2007 20:05:33 +0100 (CET)
Received: from out5.smtp.messagingengine.com (out5.smtp.messagingengine.com [66.111.4.29])
	by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l1NJ5W5l009361
	for <caml-list@inria.fr>; Fri, 23 Feb 2007 20:05:32 +0100
Received: from out1.internal (unknown [10.202.2.149])
	by out1.messagingengine.com (Postfix) with ESMTP id 8A33F1D014E;
	Fri, 23 Feb 2007 14:05:31 -0500 (EST)
Received: from heartbeat2.messagingengine.com ([10.202.2.161])
  by out1.internal (MEProxy); Fri, 23 Feb 2007 14:05:31 -0500
X-Sasl-enc: cdm0Rdn6tqUCznOgUe1vO+xe6GzFfuXZLrHgl//Zbvwk 1172257531
Received: from [172.16.112.115] (burnham.ljcrf.edu [192.231.106.2])
	by mail.messagingengine.com (Postfix) with ESMTP id CA1F8EDC8;
	Fri, 23 Feb 2007 14:05:30 -0500 (EST)
Date: Fri, 23 Feb 2007 11:04:29 -0800 (PST)
From: Martin Jambon <martin.jambon@ens-lyon.org>
X-X-Sender: martin@localhost
To: Joel Reymont <joelr1@gmail.com>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Case-insensitive lexing
In-Reply-To: <7504EE4A-36DE-46EF-B60C-A9F4ADEEED6F@gmail.com>
Message-ID: <Pine.LNX.4.58.0702231047450.30925@localhost>
References: <23E977DC-77FF-44A2-8675-5EAA8F61505F@gmail.com>
 <011EB42A-05E3-4686-BED7-2DB8B2663221@cs.uni-sb.de>
 <7504EE4A-36DE-46EF-B60C-A9F4ADEEED6F@gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Fri, 23 Feb 2007, Joel Reymont wrote:

>
> On Feb 23, 2007, at 3:50 PM, Christian Lindig wrote:
>
> > I assume you define identifier to a sequence of upper and lower
> > characters. What is wrong with this solution and what would you
> > have liked to see instead?
>
> I was hoping not having to write this (from a Lex file):
>
> A       [Aa]
> B       [Bb]
> ..
> Y       [Yy]
> Z       [Zz]
>
> {A}{B}{O}{V}{E}                 { return(ABOVE); }
> {A}{G}{O}                       { return(AGO); }
> {A}{L}{E}{R}{T}                 { return(ALERT); }
>
> Or even uglier
>
> {N}{U}{M}{E}{R}{I}{C}{A}{R}{R}{A}{Y}{R}{E}{F}         ('ARRAY-NUM-
> REF, yytext)
> {T}{R}{U}{E}{F}{A}{L}{S}{E}{A}{R}{R}{A}{Y}            ('ARRAY-NUM,
> yytext)
> {T}{R}{U}{E}{F}{A}{L}{S}{E}{A}{R}{R}{A}{Y}{R}{E}{F}   ('ARRAY-NUM-
> REF, yytext)

The ocamllex version would look like:

let a = ['A' 'a']
let b = ['B' 'b']
...

(* the entry point name "token" is arbitrary *)
rule token = parse
  a b o v e                 { ABOVE }
| a g o                     { AGO }
| a l e r t                 { ALERT }
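
A common alternative, which avoids the per-letter definitions, is to match
any identifier with a single rule and look its lowercased spelling up in a
keyword table. The following is only a minimal sketch of that idea in plain
OCaml: the token type, the keywords table and the classify function are
hypothetical names introduced for illustration, and the folding is
ASCII-only.

```ocaml
(* Hypothetical token type, for illustration only. *)
type token = ABOVE | AGO | ALERT | IDENT of string

(* Keyword table keyed by the lowercase spelling of each keyword. *)
let keywords : (string, token) Hashtbl.t = Hashtbl.create 16

let () =
  List.iter
    (fun (spelling, tok) -> Hashtbl.replace keywords spelling tok)
    [ ("above", ABOVE); ("ago", AGO); ("alert", ALERT) ]

(* Classify a word that an identifier rule has already matched.
   Case folding is ASCII-only (String.lowercase_ascii). *)
let classify (word : string) : token =
  match Hashtbl.find_opt keywords (String.lowercase_ascii word) with
  | Some tok -> tok
  | None -> IDENT word
```

In an ocamllex file this could be called from a single rule such as
['A'-'Z' 'a'-'z']+ as w { classify w }, at the cost of moving keyword
recognition out of the automaton and into a table lookup.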


Note that micmatch, which is not ocamllex but supports the same core
regexp syntax, has a tilde operator that makes a regexp case-insensitive
according to Latin-1. It does not work for every language or encoding,
though.
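
To illustrate why byte-wise Latin-1 folding is encoding-dependent, here is
a rough sketch in plain OCaml. It is not micmatch's implementation; the
function names are hypothetical, and it assumes each byte is one ISO-8859-1
character. Applied to UTF-8 input it rewrites lead bytes and corrupts the
text.

```ocaml
(* Lowercase one byte, interpreting it as ISO-8859-1 (Latin-1). *)
let lowercase_latin1_char (c : char) : char =
  let n = Char.code c in
  if (n >= 0x41 && n <= 0x5A)                   (* ASCII A-Z *)
     || (n >= 0xC0 && n <= 0xDE && n <> 0xD7)   (* accented capitals; 0xD7 is the multiplication sign *)
  then Char.chr (n + 32)
  else c

(* Fold a whole string byte by byte. *)
let lowercase_latin1 (s : string) : string =
  String.map lowercase_latin1_char s

(* Latin-1 E-acute (0xC9) folds to e-acute (0xE9) as expected. *)
let latin1_ok = lowercase_latin1 "\xC9" = "\xE9"

(* The UTF-8 encoding of E-acute is the two bytes 0xC3 0x89; folding
   changes the lead byte to 0xE3, which is not the UTF-8 encoding of
   e-acute (0xC3 0xA9). *)
let utf8_broken = lowercase_latin1 "\xC3\x89" <> "\xC3\xA9"
```

The same caveat applies to any case-insensitive matcher that folds one
byte at a time.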


Martin

--
Martin Jambon
http://martin.jambon.free.fr