From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA01825; Thu, 23 Aug 2001 19:32:24 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id TAA04814 for ; Thu, 23 Aug 2001 19:32:23 +0200 (MET DST) Received: from shell5.ba.best.com (shell5.ba.best.com [206.184.139.136]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id f7NHWLH01672 for ; Thu, 23 Aug 2001 19:32:22 +0200 (MET DST) Received: from localhost (bpr@localhost) by shell5.ba.best.com (8.9.3/8.9.2/best.sh) with ESMTP id KAA25655; Thu, 23 Aug 2001 10:31:46 -0700 (PDT) Date: Thu, 23 Aug 2001 10:31:46 -0700 (PDT) From: Brian Rogoff To: Nicolas George cc: Miles Egan , Markus Mottl , neale-caml@woozle.org, caml-list@pauillac.inria.fr Subject: Re: [Caml-list] Str.string_match raising Invalid_argument "String.sub" in gc In-Reply-To: <20010823000625.B4229@aimlin> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN Content-Transfer-Encoding: QUOTED-PRINTABLE Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Thu, 23 Aug 2001, Nicolas George wrote: > Le mercredi 22 ao=FBt 2001 =E0 13:31, Miles Egan a =E9crit=A0: > >>=09=09 PCRE-library (Perl Compatible Regular Expressions): > > I've asked this several times before, but I think it's worth asking aga= in: is > > there any chance of adding pcre to the stock distribution? It's superi= or in > > every way the the str module and much friendlier to python/perl refugee= s. >=20 > I second that too. And because PCRE is under LGPL (Str is based on GNU > regex, which is under GPL), it could be in the standard library and not > only in the distribution.=20 Some other "pure OCaml" regexp engines were discussed here recently, includ= ing=20 Claude Marche's and the one from Unison. Since the Unison code is under GPL= =20 and not LGPL, and I'm a (inverse) license ayatollah, I can only use the LGPL'ed one. I've been playing with it and it's quite nice, though I think = it needs a few more bells and whistles to satisfy the Perlers. I don't know ho= w=20 it compares in performance against the Pcre C code.=20 I agree that Str is suboptimal, but I think that there are also a few other ways in which string handling could be improved, like=20 (1) Very long strings (Sys.max_string_length =3D 16777211 on most machines). Please don't tell me that slurping a 100M file into a=20 string is probably not smart, I know that, but it's a restriction that annoys some (many?) programmers.=20 (2) Wide character strings (3) Functional strings (and functional arrays while we're at it :) (4) Substrings (1) and (3) could be fixed by adding a "ropes" library, or (1) alone could be fixed by building strings over Bigarrays. (2) can also be fixed using=20 Bigarrays, either building on top of them or just stealing the C code and= =20 specializing it. I ported the SML Basis library for substrings over to OCaml, but I much prefer Hansen's subsequence reference approach (if you've read Finkel's "Advanced Programming Language Design" you know what I mean) and I've made a new module based on that which I'll release after some more tire kicking; e-mail me if you want a version. Interestingly, it= =20 depends on physical reference equality so a semantics preserving port to SML would require some uglification.=20 So, I think we could use a richer set of string datatypes, and operations= =20 over them. It's not clear to me how much of this needs to be part of OCaml= =20 proper, and how much should just be, say, part of the CDK. It is clear that= =20 if there is going to be built-in regexp matching that Str is not the way to= go.=20 =20 > Maybe we could even hope a regexp pattern matching as a syntax extension = :-) Some version of Haskell had a regexp matcher built in that worked on regexp= s over=20 other types than characters. I don't think it survived, but it's certainly a cool idea. -- Brian ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr