From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id SAA22747 for caml-red; Mon, 11 Dec 2000 18:29:50 +0100 (MET) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA12585 for ; Sat, 9 Dec 2000 18:08:49 +0100 (MET) Received: from localhost.localdomain (cartman118.zip.com.au [61.8.20.246]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id eB9H8jP09642 for ; Sat, 9 Dec 2000 18:08:46 +0100 (MET) Received: from ozemail.com.au (IDENT:root@localhost [127.0.0.1]) by localhost.localdomain (8.9.3/8.8.7) with ESMTP id EAA02114; Sun, 10 Dec 2000 04:07:49 +1100 Message-ID: <3A3266E5.CBC48F54@ozemail.com.au> Date: Sun, 10 Dec 2000 04:07:49 +1100 From: John Max Skaller X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.12-20 i686) X-Accept-Language: en MIME-Version: 1.0 To: Alain Frisch CC: OCAML Subject: Re: features of PCRE-OCaml References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: weis@pauillac.inria.fr Alain Frisch wrote: > > On Sat, 9 Dec 2000, John Max Skaller wrote: > > > > Have a look at wlex: > > > This reduces the number of states and transitions in the automaton, > > > > No, it has no effect on the number of states (rows of the > > DFA matrix). It reduces the number of columns. > > Yes it does: the extra classification layer may consume more than one > character in the lexbuf. When you parse UTF-8 with ocamllex, you have > "waiting" states in the automaton corresponding to multibyte encoded code > points. You're right. I've since downloaded wlex and given it a whirl. Good stuff! -- John (Max) Skaller, mailto:skaller@maxtal.com.au 10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850 checkout Vyper http://Vyper.sourceforge.net download Interscript http://Interscript.sourceforge.net