On Thu, 23 Aug 2001, Nicolas George wrote:

> On Wednesday 22 August 2001 at 13:31, Miles Egan wrote:
> >> PCRE-library (Perl Compatible Regular Expressions):
> > I've asked this several times before, but I think it's worth asking again: is
> > there any chance of adding pcre to the stock distribution? It's superior in
> > every way to the str module and much friendlier to python/perl refugees.
>
> I second that too. And because PCRE is under LGPL (Str is based on GNU
> regex, which is under GPL), it could be in the standard library and not
> only in the distribution.

Some other "pure OCaml" regexp engines were discussed here recently, including Claude Marche's and the one from Unison. Since the Unison code is under GPL and not LGPL, and I'm a (inverse) license ayatollah, I can only use the LGPL'ed one. I've been playing with it and it's quite nice, though I think it needs a few more bells and whistles to satisfy the Perlers. I don't know how it compares in performance against the Pcre C code.

I agree that Str is suboptimal, but I think that there are also a few other ways in which string handling could be improved, like

(1) Very long strings (Sys.max_string_length = 16777211 on most machines). Please don't tell me that slurping a 100M file into a string is probably not smart, I know that, but it's a restriction that annoys some (many?) programmers.
(2) Wide character strings
(3) Functional strings (and functional arrays while we're at it :)
(4) Substrings

(1) and (3) could be fixed by adding a "ropes" library, or (1) alone could be fixed by building strings over Bigarrays. (2) can also be fixed using Bigarrays, either building on top of them or just stealing the C code and specializing it.
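To make the "ropes" suggestion for (1) and (3) a bit more concrete, here is a minimal sketch of what such a structure might look like. It is only an illustration: the type and function names (rope, concat, sub, and so on) are made up for this example and are not taken from any existing library, and a real implementation would also need rebalancing, which is omitted here.

```ocaml
(* A minimal rope sketch: an immutable string represented as a binary tree
   of ordinary string leaves. All names are hypothetical, for illustration. *)
type rope =
  | Leaf of string
  | Node of rope * rope * int   (* left, right, cached total length *)

let length = function
  | Leaf s -> String.length s
  | Node (_, _, n) -> n

let of_string s = Leaf s

(* Concatenation allocates at most one node and copies no characters. *)
let concat l r =
  match length l, length r with
  | 0, _ -> r
  | _, 0 -> l
  | nl, nr -> Node (l, r, nl + nr)

(* Character lookup descends the tree; cost is proportional to its depth. *)
let rec get r i =
  match r with
  | Leaf s -> s.[i]
  | Node (l, r', _) ->
    let nl = length l in
    if i < nl then get l i else get r' (i - nl)

(* Substring: a request for a whole subtree returns that subtree unchanged;
   characters are copied only from leaves that are cut partway. *)
let rec sub r pos len =
  if pos < 0 || len < 0 || pos + len > length r then invalid_arg "sub";
  if pos = 0 && len = length r then r
  else
    match r with
    | Leaf s -> Leaf (String.sub s pos len)
    | Node (l, r', _) ->
      let nl = length l in
      if pos + len <= nl then sub l pos len
      else if pos >= nl then sub r' (pos - nl) len
      else concat (sub l pos (nl - pos)) (sub r' 0 (pos + len - nl))

(* Flatten to an ordinary string. This only works when the result fits
   within Sys.max_string_length. *)
let to_string r =
  let buf = Buffer.create (length r) in
  let rec go = function
    | Leaf s -> Buffer.add_string buf s
    | Node (l, r', _) -> go l; go r'
  in
  go r;
  Buffer.contents buf
```

With this representation only each leaf has to fit in an ordinary string, so the total length is bounded by max_int rather than Sys.max_string_length, and since nothing is ever mutated the same rope can be shared freely. For example, to_string (sub (concat (of_string "hello, ") (of_string "world")) 5 4) evaluates to ", wo".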
I ported the SML Basis library for substrings over to OCaml, but I much prefer Hansen's subsequence reference approach (if you've read Finkel's "Advanced Programming Language Design" you know what I mean) and I've made a new module based on that which I'll release after some more tire kicking; e-mail me if you want a version. Interestingly, it depends on physical reference equality so a semantics-preserving port to SML would require some uglification.

So, I think we could use a richer set of string datatypes, and operations over them. It's not clear to me how much of this needs to be part of OCaml proper, and how much should just be, say, part of the CDK. It is clear that if there is going to be built-in regexp matching that Str is not the way to go.

> Maybe we could even hope a regexp pattern matching as a syntax extension :-)

Some version of Haskell had a regexp matcher built in that worked on regexps over other types than characters. I don't think it survived, but it's certainly a cool idea.

-- Brian