caml-list - the Caml user's mailing list
From: Brian Rogoff <bpr@best.com>
To: Nicolas George <nicolas.george@ens.fr>
Cc: Miles Egan <miles@caddr.com>,
	Markus Mottl <markus@mail4.ai.univie.ac.at>,
	neale-caml@woozle.org, caml-list@pauillac.inria.fr
Subject: Re: [Caml-list] Str.string_match raising Invalid_argument "String.sub" in gc
Date: Thu, 23 Aug 2001 10:31:46 -0700 (PDT)
Message-ID: <Pine.BSF.4.21.0108230948480.17462-100000@shell5.ba.best.com>
In-Reply-To: <20010823000625.B4229@aimlin>

On Thu, 23 Aug 2001, Nicolas George wrote:
> On Wednesday 22 August 2001 at 13:31, Miles Egan wrote:
> >> PCRE-library (Perl Compatible Regular Expressions):
> > I've asked this several times before, but I think it's worth asking again: is
> > there any chance of adding pcre to the stock distribution?  It's superior in
> > every way to the str module and much friendlier to python/perl refugees.
> 
> I second that too. And because PCRE is under LGPL (Str is based on GNU
> regex, which is under GPL), it could be in the standard library and not
> only in the distribution. 

Some other "pure OCaml" regexp engines were discussed here recently, including
Claude Marche's and the one from Unison. Since the Unison code is under GPL
and not LGPL, and I'm an (inverse) license ayatollah, I can only use the
LGPL'ed one. I've been playing with it and it's quite nice, though I think it
needs a few more bells and whistles to satisfy the Perlers. I don't know how
it compares in performance against the PCRE C code.

I agree that Str is suboptimal, but I think that there are also a few
other ways in which string handling could be improved, like 

(1) Very long strings (Sys.max_string_length = 16777211 on most
    machines). Please don't tell me that slurping a 100M file into a 
    string is probably not smart, I know that, but it's a restriction
    that annoys some (many?) programmers. 

(2) Wide character strings

(3) Functional strings (and functional arrays while we're at it :)

(4) Substrings

(1) and (3) could be fixed by adding a "ropes" library, or (1) alone could
be fixed by building strings over Bigarrays. (2) can also be fixed using 
Bigarrays, either building on top of them or just stealing the C code and 
specializing it. I ported the SML Basis library for substrings over to
OCaml, but I much prefer Hansen's subsequence reference approach (if
you've read Finkel's "Advanced Programming Language Design" you know what
I mean) and I've made a new module based on that which I'll release after
some more tire kicking; e-mail me if you want a version. Interestingly, it 
depends on physical reference equality so a semantics preserving port to
SML would require some uglification. 
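
To make the substring idea concrete, here is a minimal sketch of a substring
as a view that shares its base string and copies nothing until asked. It is
neither the SML Basis port nor the Hansen-style module mentioned above; the
type and function names (substring, of_string, sub, get, to_string) are
hypothetical and chosen only for illustration.

```ocaml
(* Hypothetical substring view: a window onto a shared base string. *)
type substring = { base : string; off : int; len : int }

(* View covering the whole string. *)
let of_string s = { base = s; off = 0; len = String.length s }

(* Narrow a view without copying; offsets are relative to the view. *)
let sub ss off len =
  if off < 0 || len < 0 || off + len > ss.len then invalid_arg "sub";
  { base = ss.base; off = ss.off + off; len = len }

(* Character at index i of the view. *)
let get ss i =
  if i < 0 || i >= ss.len then invalid_arg "get";
  String.get ss.base (ss.off + i)

(* Copy the viewed characters out into a fresh string. *)
let to_string ss = String.sub ss.base ss.off ss.len
```

Taking a substring of a substring here costs one small record allocation,
whatever the length of the underlying text.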

So, I think we could use a richer set of string datatypes, and operations
over them. It's not clear to me how much of this needs to be part of OCaml
proper, and how much should just be, say, part of the CDK. It is clear that
if there is going to be built-in regexp matching, then Str is not the way to go.
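
As one example of such a richer string datatype, the "ropes" idea mentioned
earlier can be sketched as a binary tree of ordinary strings. This is only a
minimal, unbalanced illustration under my own assumptions, not an existing
library; the names rope, concat, get and to_string are hypothetical.

```ocaml
(* Hypothetical rope: an immutable tree whose leaves are plain strings. *)
type rope =
  | Leaf of string
  | Node of rope * rope * int   (* left, right, cached total length *)

let length = function
  | Leaf s -> String.length s
  | Node (_, _, n) -> n

(* Concatenation allocates one node and copies no characters. *)
let concat a b = Node (a, b, length a + length b)

(* Indexing walks down the tree, using the left subtree's length. *)
let rec get r i =
  match r with
  | Leaf s -> String.get s i
  | Node (l, rt, _) ->
    let n = length l in
    if i < n then get l i else get rt (i - n)

(* Flatten into a single string (subject to the usual string size limit). *)
let to_string r =
  let buf = Buffer.create (length r) in
  let rec go = function
    | Leaf s -> Buffer.add_string buf s
    | Node (l, rt, _) -> go l; go rt
  in
  go r;
  Buffer.contents buf
```

Only the leaves are bounded by Sys.max_string_length, so the rope as a whole
can be longer than any single string, and since nothing is mutated it also
behaves as a functional string. A real library would rebalance the tree.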
 
> Maybe we could even hope for regexp pattern matching as a syntax extension :-)

Some version of Haskell had a regexp matcher built in that worked on regexps over
types other than characters. I don't think it survived, but it's certainly
a cool idea.
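
To show what regexps over arbitrary element types might look like in OCaml,
here is a small sketch using Brzozowski derivatives. It is not the Haskell
feature referred to above, whose details I am not reproducing; the 're type
and the names nullable, deriv and matches are my own assumptions.

```ocaml
(* Regexps over any element type: atoms are predicates on elements. *)
type 'a re =
  | Empty                      (* matches nothing *)
  | Eps                        (* matches the empty sequence *)
  | Sym of ('a -> bool)        (* one element satisfying the predicate *)
  | Seq of 'a re * 'a re
  | Alt of 'a re * 'a re
  | Star of 'a re

(* Does the regexp accept the empty sequence? *)
let rec nullable = function
  | Empty | Sym _ -> false
  | Eps | Star _ -> true
  | Seq (a, b) -> nullable a && nullable b
  | Alt (a, b) -> nullable a || nullable b

(* Derivative: the regexp for what may follow after consuming x. *)
let rec deriv x = function
  | Empty | Eps -> Empty
  | Sym p -> if p x then Eps else Empty
  | Seq (a, b) ->
    let d = Seq (deriv x a, b) in
    if nullable a then Alt (d, deriv x b) else d
  | Alt (a, b) -> Alt (deriv x a, deriv x b)
  | Star a as r -> Seq (deriv x a, r)

(* A list matches if the regexp left after consuming it is nullable. *)
let matches re xs =
  nullable (List.fold_left (fun r x -> deriv x r) re xs)
```

For instance, with even and odd as predicates on ints, the regexp
Seq (Sym even, Seq (Star (Sym even), Sym odd)) accepts [2; 4; 7] and rejects
[2; 3; 4]. This version does not simplify derivatives, so the terms grow
with the input; it is meant to show the idea, not to be fast.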

-- Brian


-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr
