From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id IAA05371; Mon, 20 Sep 2004 08:54:52 +0200 (MET DST) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id IAA06068 for ; Mon, 20 Sep 2004 08:54:51 +0200 (MET DST) Received: from venus.is.s.u-tokyo.ac.jp (venus.is.s.u-tokyo.ac.jp [133.11.12.9]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id i8K6sngj002615 for ; Mon, 20 Sep 2004 08:54:50 +0200 Received: from tuba.is.s.u-tokyo.ac.jp (tuba.is.s.u-tokyo.ac.jp [133.11.12.102]) by venus.is.s.u-tokyo.ac.jp (8.11.6p3/3.7W) with ESMTP id i8K6sm723744; Mon, 20 Sep 2004 15:54:48 +0900 (JST) Received: (from oiwa@localhost) by tuba.is.s.u-tokyo.ac.jp (8.11.6+Sun/3.7W) id i8K6smc05417; Mon, 20 Sep 2004 15:54:48 +0900 (JST) X-Authentication-Warning: tuba.is.s.u-tokyo.ac.jp: oiwa set sender to oiwa@yl.is.s.u-tokyo.ac.jp using -f To: skaller@users.sourceforge.net Cc: caml-list Subject: Re: [Caml-list] [ann] Regexp library supporting binding for * and +'s References: <1095640712.2580.292.camel@pelican.wigram> MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII From: Yutaka OIWA Date: Mon, 20 Sep 2004 15:54:48 +0900 In-Reply-To: <1095640712.2580.292.camel@pelican.wigram> (skaller's message of "20 Sep 2004 10:38:33 +1000") Message-ID: User-Agent: T-gnus/6.15.6 (based on Oort Gnus v0.06) (revision 01) SEMI/1.14.3 (Ushinoya) FLIM/1.14.3 (=?ISO-8859-4?Q?Unebigory=F2mae?=) APEL/10.2 Emacs/20.7 (sparc-sun-solaris2.8) MULE/4.0 (HANANOEN) X-Miltered: at concorde with ID 414E7EB9.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caml-list:01 yutaka:01 oiwa:01 oiwa:01 u-tokyo:01 2004:99 sourceforge:01 2004:99 yutaka:01 pcre:01 helper:01 runtime:01 backends:01 pcre:01 3.07:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk >> On 20 Sep 2004 10:38:33 +1000, skaller said: skaller> On Mon, 2004-09-20 at 06:41, Yutaka OIWA wrote: >> I plan to construct a neat syntax sugar over this library >> and build a next-generation version of Regexp/OCaml library. >> Any comments are welcome. skaller> Can you explain why/how Pcre is being used? The reason is simply current implemenentation convenience. It is stable, has enough features (e.g. unlimited number of captures, non-capturing groups, much of helper functions and runtime features, and is well-performing. My intension is not to implement automata engine by myself, at least in near future. However, as you can see in README in Regexp-OCaml (main version), my future plan includes supporting backends other than PCRE/OCaml. Having its own regexp parser and limiting regexp syntax to strict regular language are the provision for possible future. At the time of OCaml 3.07 released, I really considered to support the standard Str module, but unfortunately current Str lacks some of the features required by current Regexp/OCaml implementation. Anyway, backend is backend. And also, frontend is frontend. Period. It can be highly independent once it designed so, and my interests are mainly in the frontend part. I highly appreciete supports from people working on the backend part. Multilingualization is one in current high-priority to-do list. At least one of the users requested me to support EUC-JP patterns, and you might be the second person :-) I am considering how to support M17N feature: it may depends to underlying backends (e.g. Camomile?), or it may be supported solely in the frontend layer, by encoding multibyte handling into regexps. This trick is used in the Japanese port of Perl interpreter on MS-DOS, and (at least) one of Japanese handling module for Perl5. # As you can imagine, just using M17N feature of underlying library is # not sufficient: internal regexp parser must also modified to accept # multibyte-encoded regular expression. This is one of the reason that # curent Regexp/OCaml does not support UTF8 option of PCRE/OCaml. For supporting list-binding of Kleene-stars, I am very interested in richer backends which supports such features. Alain Frisch's recent posting has interested me. There is also a talk with related title in ICFP04, although I had not yet read the paper. However, I feel at the same time that backend is not a current show-stopper: it is truly better to have such backends, but it can be emulated without that, As I had shown in the combinators. I can wait for a while for theretical/practical progresses. Current problem is mainly the frontend: there are many language-design problems once we introduce nested bindings. I already had a discussion with some people in ICFP04, and I hope more. -- Yutaka Oiwa Yonezawa Lab., Dept. of Computer Science, Graduate School of Information Sci. & Tech., Univ. of Tokyo. , PGP fingerprint = C9 8D 5C B8 86 ED D8 07 EA 59 34 D8 F4 65 53 61 ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners