From: Yutaka OIWA
To: caml-list@inria.fr
Subject: Re: [Caml-list] ANNOUNCE: mod_caml 1.0.6 - includes security patch
Date: Sat, 17 Jan 2004 03:52:42 +0900
In-Reply-To: <20040116093454.GA23909@redhat.com> (Richard Jones's message of "Fri, 16 Jan 2004 09:34:54 +0000")

Hello.

>> On Fri, 16 Jan 2004 09:34:54 +0000, Richard Jones said:

Richard> Being able to write:
Richard>   var ~ /ab+/
Richard> and similar certainly makes string handling and simple parsing a lot
Richard> easier.

>> On Fri, 16 Jan 2004 13:05:15 -0600 (CST), Brian Hurt said:

Brian> What I'd like to see is to be able to pattern match on regexs, like:
Brian> match str with
Brian>   | /ab+/ -> ...
Brian>   | /foo(bar)*/ -> ...
Brian> etc.

My camlp4 macro package, Regexp/OCaml, may address most of these requests.
You can get it from http://www.yl.is.s.u-tokyo.ac.jp/~oiwa/caml/ .

With Regexp/OCaml, you can write code like

  Regexp.match str with
    "^(\d+)-(\d+)$" as f : int, t : int ->
      for i = f to t do printf "%d\n" i done
  | "^(\d+)$" as s : int ->
      printf "%d\n" s

to branch on multiple regular patterns and to extract the matched
substrings automatically. Here they are bound to f, t and s respectively,
after being converted to int with int_of_string.
See http://www.yl.is.s.u-tokyo.ac.jp/~oiwa/pub/caml/regexp-pp-0.9.3/README.match-regexp
for further details.

Brian> The compiler could then combine all the matchings into a single DFA,
Brian> improving performance over code like:
Brian> if (regex_match str "ab+") then
Brian>   ...
Brian> else if (regex_match str "foo(bar)*") then
Brian>   ...
Brian> else
Brian>   ...

The code currently generated by Regexp/OCaml is similar to the above,
except that each pattern is compiled only once per execution. However, if
the backend regexp engine (Regexp/OCaml currently uses PCRE/OCaml)
supported optimized matching against multiple regular expressions,
Regexp/OCaml could easily make use of it. The patterns could also be
analyzed at compile time (during camlp4 translation), if required.

Brian> The regex matching would also let the compiler know if there were possible
Brian> unmatched strings (these would show up as transitions to the error state
Brian> in the DFA).

This feature is not currently implemented in Regexp/OCaml, but since the
macro package has its own parser for regular patterns, it would be possible
to implement once I have enough time. (It is on my personal to-do list
for Regexp/OCaml.)

-- 
Yutaka Oiwa
Yonezawa Lab., Dept. of Computer Science,
Graduate School of Information Sci. & Tech., Univ. of Tokyo.
PGP fingerprint = C9 8D 5C B8 86 ED D8 07 EA 59 34 D8 F4 65 53 61
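
P.S. To make the "if / else if" shape described above concrete, here is a rough hand-written sketch of that style of code, with each pattern compiled only once per execution. It is not actual Regexp/OCaml output. As an assumption for illustration, it uses the Str library from the standard OCaml distribution in place of PCRE/OCaml (so the pattern syntax differs and the program must be linked with str), and the function name `expand` is hypothetical. It returns the numbers instead of printing them so that the result is easy to check.

```ocaml
(* Hand-written sketch, NOT actual Regexp/OCaml output.
   Assumption: Str (link with the str library) stands in for PCRE/OCaml. *)

(* Each pattern is compiled at most once per execution, on first use. *)
let re_range = lazy (Str.regexp "^\\([0-9]+\\)-\\([0-9]+\\)$")
let re_single = lazy (Str.regexp "^\\([0-9]+\\)$")

(* Hypothetical helper: tries the patterns in order, like the
   if / else if chain, and converts captured groups with int_of_string.
   Returns None when no pattern matches. *)
let expand (str : string) : int list option =
  if Str.string_match (Lazy.force re_range) str 0 then begin
    let f = int_of_string (Str.matched_group 1 str) in
    let t = int_of_string (Str.matched_group 2 str) in
    (* Same values as "for i = f to t": empty when f > t. *)
    Some (List.init (max 0 (t - f + 1)) (fun i -> f + i))
  end else if Str.string_match (Lazy.force re_single) str 0 then
    Some [int_of_string (Str.matched_group 1 str)]
  else
    None
```

For example, `expand "3-5"` walks the first branch and `expand "7"` the second. Every additional pattern adds one more test to the chain, which is the cost that a combined DFA would avoid.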