From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id BAA31622; Tue, 3 Jun 2003 01:42:55 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id BAA31678 for ; Tue, 3 Jun 2003 01:42:54 +0200 (MET DST) Received: from mail3.tpgi.com.au (mail.tpgi.com.au [203.12.160.59]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id h52NgpH05886 for ; Tue, 3 Jun 2003 01:42:51 +0200 (MET DST) Received: from ozemail.com.au (203-219-225-121-syd-ts24-2600.tpgi.com.au [203.219.225.121]) (authenticated (0 bits)) by mail3.tpgi.com.au (8.11.6/8.11.6) with ESMTP id h52NggK20913; Tue, 3 Jun 2003 09:42:43 +1000 Message-ID: <3EDBE0F2.2010009@ozemail.com.au> Date: Tue, 03 Jun 2003 09:42:42 +1000 From: John Max Skaller User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901 X-Accept-Language: en-us MIME-Version: 1.0 To: Stefan Heimann CC: caml-list@inria.fr Subject: Re: [Caml-list] ocamllex, regular expression syntax References: <20030522205632.GA2130@kunz.ratzer> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam: no; 0.00; ozemail:01 caml-list:01 quoting:01 pcre:01 nsq:99 toxteth:01 glebe:01 2037,:01 9660:01 0850:01 bin:01 ocaml:01 nsw:01 readable:01 syntax:02 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Stefan Heimann wrote: > Hi, > > [sorry if this posting appears twice. I first submitted it with my > news client. It seems not to appear on the mailing list and so I > decided to post it again] > > > I new to ocaml and today I played a little bit around with > ocamllex. Now I'm wondering why ocamllex has this strange regular > expression syntax. One has to quoted every character > Regular expressions like this > > "[^"\\]*(\\.[^"\\]*)*" > > are not easy to read, but with the ocamllex syntax it is even more > difficult: > > '"'[^'"''\\']*('\\'_[^'"''\\']*)*'"' > > (and harder to write). > > Is this just for historical reason or is there a practical reason for > this syntax? The ocamllex syntax is MUCH more readable if you figure out how to use it correctly: let bindigit = ['0'-'1'] let octdigit = ['0'-'7'] let digit = ['0'-'9'] let hexdigit = digit | ['A'-'F'] | ['a'-'f'] let bin_lit = '0' ('b' | 'B') (underscore? bindigit) + let oct_lit = '0' ('o' | 'O') (underscore? octdigit) + let dec_lit = ('0' ('d' | 'D'))? digit (underscore? digit) * let hex_lit = '0' ('x' | 'X') (underscore? hexdigit) + The reason for quoting characters is now obvious: ocamllex provides regular *definitions* not just regular expressions, and they're infinitely superior; its much better to use identifers for expressions, than to embed them in strings like pcre "*" // pcre alpha * // ocamllex You'd be mad not to write your example like this: let quote = '"' let slosh = "\\" let any = _ let nsq = [^'\\''"'] (* WEAK! *) dquote nsq * ( any nsq * ) * dquote which I can actually read :-) the [] syntax is weak though, Felix does much better (and regular definitions are built into the language like patterns are in ocaml) -- John Max Skaller, mailto:skaller@ozemail.com.au snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia. voice:61-2-9660-0850 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners