From: Luc Maranget
Message-Id: <200305230853.KAA0000027204@beaune.inria.fr>
Subject: Re: [Caml-list] ocamllex, regular expression syntax
To: lists@stefanheimann.net (Stefan Heimann)
Cc: caml-list@inria.fr
Date: Fri, 23 May 2003 10:53:15 +0200 (MET DST)

> I am new to OCaml and today I played around a little with
> ocamllex. Now I'm wondering why ocamllex has this strange regular
> expression syntax. One has to quote every character, an arbitrary
> character is matched by the underscore...
>
> The manual for ocamllex says: "The regular expressions are in the
> style of lex, with a more Caml-like syntax."
>
> But the regular expression syntax in the Str module looks "normal" to
> me.
>
> Regular expressions like this
>
>   "[^"\\]*(\\.[^"\\]*)*"
>
> are not easy to read, but with the ocamllex syntax it is even more
> difficult:
>
>   '"'[^'"''\\']*('\\'_[^'"''\\']*)*'"'
>
> (and harder to write).
>
> Is this just for historical reasons or is there a practical reason for
> this syntax? I'm just curious...
>

Ah, regexp syntax! I think I can explain a few principles, as I see them.

Lex-like tools are part of, let us say, a compiler culture.
In lex-style regexps, you clearly have a two-stage definition.

  1. The tokens:
       Characters (Caml-style) 'c', with some escape mechanism (such as '\\')
       Various operators such as *, +, etc., or delimiters such as (, )
     Spacing between tokens is irrelevant.

  2. From the tokens, regexps are defined as trees.

This allows a clean, regular definition of regexp syntax. Moreover,
the lexing conventions are those of Caml. But then, as you noticed,
users have to type many quotes.

Perl-like tools follow a different idea: they intend to minimize
keystrokes. I guess the first idea was to make unescaped/unquoted
characters correspond to their ``most frequent usage''. The consequence
is that users type many backslashes.

In my opinion, the meaning of quotes (ocamllex) is clear because they
express one simple construct: I want this character.

The meaning of backslashes (Perl) is less clear: it means ``I want some
special meaning of this character'', which covers many situations. In
particular, the ordinary meaning of \ is not ``a backslash'', and this
implies that \\ means ``I want a backslash''. The same applies to *,
whose default meaning is the repetition operator. This is a bit
irregular in my opinion.

Some additional problems arise when several meanings are possible.
Consider, for instance, \1 (reference to \(..\) number one) and \001
(character whose code is one). It is no surprise that various regexp
tools disagree on such subtle points.
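
The two-stage definition described above (tokens first, then trees) can
be sketched in plain OCaml. This is only an illustration under stated
assumptions, not how ocamllex itself is implemented: the names token,
regexp, tokenize and parse are hypothetical, and the sketch covers only
quoted characters, _, *, and parentheses (no character classes, no +,
no alternatives).

```ocaml
(* Stage 1: tokens of a tiny, hypothetical ocamllex-like regexp language. *)
type token =
  | Char of char   (* a quoted character such as 'a' or '\\' *)
  | Any            (* _ : any character *)
  | Star           (* postfix repetition operator *)
  | LParen
  | RParen

(* Stage 2: regexps are trees built from the tokens. *)
type regexp =
  | Lit of char
  | Wild
  | Seq of regexp list
  | Repeat of regexp

(* Lexer: spacing is skipped; a quoted character is always a literal. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '*' -> go (i + 1) (Star :: acc)
      | '(' -> go (i + 1) (LParen :: acc)
      | ')' -> go (i + 1) (RParen :: acc)
      | '_' -> go (i + 1) (Any :: acc)
      | '\'' ->
        if i + 3 < n && s.[i + 1] = '\\' && s.[i + 3] = '\'' then
          (* escaped form: the character after the backslash is kept
             as-is, which is enough for '\\' and '\'' in this sketch *)
          go (i + 4) (Char s.[i + 2] :: acc)
        else if i + 2 < n && s.[i + 1] <> '\\' && s.[i + 2] = '\'' then
          go (i + 3) (Char s.[i + 1] :: acc)
        else failwith "malformed character literal"
      | c -> failwith (Printf.sprintf "unexpected character %c" c)
  in
  go 0 []

(* Parser: a sequence of atoms, each optionally followed by stars. *)
let parse (tokens : token list) : regexp =
  let rec seq acc = function
    | (Char _ | Any | LParen) :: _ as ts ->
      let r, rest = postfix ts in
      seq (r :: acc) rest
    | rest -> Seq (List.rev acc), rest
  and postfix ts =
    let r, rest = atom ts in
    stars r rest
  and stars r = function
    | Star :: rest -> stars (Repeat r) rest
    | rest -> r, rest
  and atom = function
    | Char c :: rest -> Lit c, rest
    | Any :: rest -> Wild, rest
    | LParen :: rest ->
      (match seq [] rest with
       | r, RParen :: rest' -> r, rest'
       | _ -> failwith "expected )")
    | _ -> failwith "expected an atom"
  in
  match seq [] tokens with
  | r, [] -> r
  | _ -> failwith "trailing tokens"

(* A quoted star is a character; an unquoted star is the operator. *)
let () =
  assert (parse (tokenize "'*'  _ *") = Seq [Lit '*'; Repeat Wild])
```

Because the quoting is resolved entirely in the lexer, the parser never
has to ask whether a given * is a literal or an operator; that question
is already answered by the token it receives.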

In conclusion, the lex way of doing things is driven by design (first
lex, then parse), whereas the Perl way of doing things is driven by
minimizing the user's keystrokes, which leads, in my opinion, to some
dark corners.

--Luc