caml-list - the Caml user's mailing list
* [Caml-list] String.unescaped and some other little pitiful laments
@ 2001-07-10 18:07 Berke Durak
  2001-07-10 18:55 ` Markus Mottl
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Berke Durak @ 2001-07-10 18:07 UTC (permalink / raw)
  To: caml-list

There's a reversible String.escaped function, and I think it would be
nice to have its inverse function built in the String module.
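
A minimal sketch of what such an inverse could look like, assuming it only
has to undo the escapes that String.escaped itself emits (backslash, double
quote, \n, \t, \r, \b and three-digit decimal codes). The name "unescape"
is hypothetical and is not part of the String module. Later releases of the
standard library provide Scanf.unescaped, which plays a similar role.

```ocaml
(* Hypothetical helper, not part of the standard String module. *)
let unescape (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  (* Value of a decimal digit; anything else is rejected. *)
  let digit c =
    match c with
    | '0' .. '9' -> Char.code c - Char.code '0'
    | _ -> invalid_arg "unescape: bad decimal escape"
  in
  let rec go i =
    if i >= n then ()
    else if s.[i] <> '\\' then begin
      (* Ordinary character: copy it through unchanged. *)
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
    else if i + 1 >= n then invalid_arg "unescape: trailing backslash"
    else
      match s.[i + 1] with
      | 'n' -> Buffer.add_char buf '\n'; go (i + 2)
      | 't' -> Buffer.add_char buf '\t'; go (i + 2)
      | 'r' -> Buffer.add_char buf '\r'; go (i + 2)
      | 'b' -> Buffer.add_char buf '\b'; go (i + 2)
      | ('\\' | '"') as c -> Buffer.add_char buf c; go (i + 2)
      | _ ->
          (* Remaining case: a three-digit decimal code such as \001. *)
          if i + 3 >= n then invalid_arg "unescape: truncated escape";
          let code =
            (100 * digit s.[i + 1]) + (10 * digit s.[i + 2]) + digit s.[i + 3]
          in
          if code > 255 then invalid_arg "unescape: code out of range";
          Buffer.add_char buf (Char.chr code);
          go (i + 4)
  in
  go 0;
  Buffer.contents buf
```

The intended property is that unescape (String.escaped s) = s for every
string s; input that String.escaped could not have produced raises
Invalid_argument.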

Also I'd like to see those horrible functions returning parameters in
global variables be eradicated, such as those that can be found in the Str
(regular expression) module. Is there a complete, typeful regular
expression package entirely written in Ocaml ?

Many people on this list are talking lighthearted about functions such
as Obj.magic. These functions are pure evil. It makes me sorry to see
that my favorite language has an unsafe and ugly type casting
function. Modules using such features should be flagged as
``evil'', and the use of these functions should not be publicly
advocated.

PS. What is the purpose of the "uses unsafe features" flag in .cmo
files ?  (it can be seen in the output of the "objinfo" program in the
tools/ directory of the compiler). I've made a test program using
unsafe features such as Obj and Array.unsafe_get but the flag wasn't
set.
--
Berke Durak

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr



* Re: [Caml-list] String.unescaped and some other little pitiful laments
  2001-07-10 18:07 [Caml-list] String.unescaped and some other little pitiful laments Berke Durak
@ 2001-07-10 18:55 ` Markus Mottl
  2001-07-11  6:44   ` Jean-Christophe Filliatre
  2001-07-11  6:37 ` Jean-Christophe Filliatre
  2001-07-11 19:30 ` Xavier Leroy
  2 siblings, 1 reply; 8+ messages in thread
From: Markus Mottl @ 2001-07-10 18:55 UTC (permalink / raw)
  To: Berke Durak; +Cc: caml-list

On Tue, 10 Jul 2001, Berke Durak wrote:
> Also I'd like to see those horrible functions returning parameters in
> global variables be eradicated, such as those that can be found in the Str
> (regular expression) module. Is there a complete, typeful regular
> expression package entirely written in Ocaml ?

Unfortunately not. My Pcre-library has to interface to C to access the
matching engine, but the huge rest of the functions that build on it
are written in OCaml. In contrast to the Str-library, the Pcre-library
is fully reentrant, which is nice if you want to use it with threads or
want to interleave several matches with others.
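
To make the reentrancy point concrete, here is a toy sketch that uses
neither Str nor Pcre. It contrasts an API that keeps the last match in a
global variable with one that returns the match as a value. The matcher
(first run of decimal digits) and all the names are made up for the
illustration.

```ocaml
let is_digit c = c >= '0' && c <= '9'

(* Reentrant style: the match (offset, length) is returned as a value. *)
let find_digits (s : string) : (int * int) option =
  let n = String.length s in
  let rec skip i = if i < n && not (is_digit s.[i]) then skip (i + 1) else i in
  let rec run i = if i < n && is_digit s.[i] then run (i + 1) else i in
  let start = skip 0 in
  if start = n then None else Some (start, run start - start)

(* Global-state style: the result of the last search is kept in a
   hidden reference and read back by a second function. *)
let last_match : (int * int) option ref = ref None

let search_digits (s : string) : bool =
  last_match := find_digits s;
  !last_match <> None

let matched_string (s : string) : string =
  match !last_match with
  | Some (pos, len) -> String.sub s pos len
  | None -> raise Not_found

(* Interleaving two searches with the global-state API: the second
   search overwrites the offsets the first one recorded. *)
let stale =
  let a = "ab12cd" and b = "x9" in
  let _ = search_digits a in
  let _ = search_digits b in
  matched_string a (* uses b's offsets on a: "b" rather than "12" *)

(* The same interleaving with the reentrant API keeps results apart. *)
let fresh =
  let a = "ab12cd" and b = "x9" in
  let ma = find_digits a in
  let _mb = find_digits b in
  match ma with
  | Some (pos, len) -> String.sub a pos len
  | None -> ""
```

With threads the overwrite can happen between any two calls, which is why
a value-returning interface is the safer design.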

If somebody wants to give it a try, the SML-entry in the language shootout
implements a regexp-library with NFAs and DFAs. I haven't given it a
closer look yet, but performance looks excellent:

  http://www.bagley.org/~doug/shootout/bench/regexmatch/regexmatch.mlton

> Many people on this list are talking lighthearted about functions
> such as Obj.magic. These functions are pure evil. It makes me sorry
> to see that my favorite language has an unsafe and ugly type casting
> function. Modules using such features should be flagged as ``evil'',
> and the use of these functions should not be publicly advocated.

I don't think anybody would talk lightheartedly about "Obj.magic". One
needs it extremely seldom, e.g. when you want to initialize
the contents of a reference with a fully polymorphic value, which
you cannot necessarily create (and matches on optional values with an
"assert false"-branch look really ugly and require many more lines, too).

The latter problem could be eliminated in some cases if one could raise
exceptions with polymorphic contents (by binding the type variable in
some enclosing expression in which the exception is defined).

There is also the trick of using "Obj.magic" in resizable arrays to
release objects that lie beyond the last used index, so that the GC can
reclaim them. I wouldn't know how else to get the same behaviour.
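
For comparison, a sketch of the option-based alternative mentioned above,
which avoids Obj.magic at the price of one extra box per element and an
"assert false" branch. The type and function names are hypothetical; this
is not the code of any existing resizable-array library.

```ocaml
(* Unused slots hold None, so no dummy value of type 'a is needed. *)
type 'a t = { mutable data : 'a option array; mutable len : int }

let create () : 'a t = { data = Array.make 8 None; len = 0 }

let push (v : 'a t) (x : 'a) : unit =
  if v.len = Array.length v.data then begin
    (* Grow by doubling; the new slots start out as None. *)
    let bigger = Array.make (2 * v.len) None in
    Array.blit v.data 0 bigger 0 v.len;
    v.data <- bigger
  end;
  v.data.(v.len) <- Some x;
  v.len <- v.len + 1

let pop (v : 'a t) : 'a =
  if v.len = 0 then invalid_arg "pop: empty";
  v.len <- v.len - 1;
  let slot = v.data.(v.len) in
  (* Clear the slot so the array no longer keeps the element alive. *)
  v.data.(v.len) <- None;
  match slot with
  | Some x -> x
  | None -> assert false (* slots below len are always Some *)
```

The Obj.magic variant stores elements unboxed and overwrites freed slots
with a cast dummy instead; the sketch above trades that for type safety.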

Regards,
Markus Mottl

-- 
Markus Mottl                                             markus@oefai.at
Austrian Research Institute
for Artificial Intelligence                  http://www.oefai.at/~markus

* Re: [Caml-list] String.unescaped and some other little pitiful laments
  2001-07-10 18:07 [Caml-list] String.unescaped and some other little pitiful laments Berke Durak
  2001-07-10 18:55 ` Markus Mottl
@ 2001-07-11  6:37 ` Jean-Christophe Filliatre
  2001-07-11  7:29   ` Claude Marche
  2001-07-11 18:03   ` Jerome Vouillon
  2001-07-11 19:30 ` Xavier Leroy
  2 siblings, 2 replies; 8+ messages in thread
From: Jean-Christophe Filliatre @ 2001-07-11  6:37 UTC (permalink / raw)
  To: Berke Durak; +Cc: caml-list


Berke Durak writes:
 > 
 > Also I'd like to see those horrible functions returning parameters in
 > global variables be eradicated, such as those that can be found in the Str
 > (regular expression) module. Is there a complete, typeful regular
 > expression package entirely written in Ocaml ?

Yes, there is one by Claude Marché, available (in a very first
release) at:

	 http://www.lri.fr/~marche/tmp/regexp-0.1.tar.gz

(Documentation can be found in the .mli files)

-- 
Jean-Christophe Filliatre
  mailto:Jean-Christophe.Filliatre@lri.fr
  http://www.lri.fr/~filliatr

* Re: [Caml-list] String.unescaped and some other little pitiful laments
  2001-07-10 18:55 ` Markus Mottl
@ 2001-07-11  6:44   ` Jean-Christophe Filliatre
  0 siblings, 0 replies; 8+ messages in thread
From: Jean-Christophe Filliatre @ 2001-07-11  6:44 UTC (permalink / raw)
  To: Markus Mottl; +Cc: Berke Durak, caml-list


Markus Mottl writes:

 > If somebody wants to give it a try, the SML-entry in the language shootout
 > implements a regexp-library with NFAs and DFAs. I haven't given it a
 > closer look yet, but performance looks excellent:
 > 
 >   http://www.bagley.org/~doug/shootout/bench/regexmatch/regexmatch.mlton

I tried Claude Marché's Regexp library instead of Pcre on that
particular example and saw a speedup of approximately 5%.

Note that the Regexp library compiles regular expressions into
deterministic finite automata using a very different algorithm from
the one in the SML entry, taken from this article:

    G. Berry and R. Sethi
    From Regular Expressions to Deterministic Automata
    Theoretical Computer Science 48 (1986) 117-126

This is a very nice and concise algorithm and I encourage anybody
interested in regexp and automata to have a look at it (again, the url
for the code is http://www.lri.fr/~marche/tmp/regexp-0.1.tar.gz)
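
A compact sketch of the position-based construction usually associated
with that article (and with Glushkov): every letter occurrence in the
expression becomes a position, and the sets "first", "last" and "follow"
describe which positions may start a word, end it, or come next. This is
not the code of the Regexp library. As a simplifying assumption, the
sketch tracks the set of active positions while reading the input instead
of tabulating the deterministic automaton beforehand. All names are
hypothetical.

```ocaml
type re =
  | Eps
  | Chr of char
  | Alt of re * re
  | Seq of re * re
  | Star of re

module IS = Set.Make (struct type t = int let compare = compare end)

(* Number the letters left to right and compute
   (nullable, first, last, follow table, position -> letter). *)
let positions (r : re) =
  let letters = ref [] and count = ref 0 in
  let follow = Hashtbl.create 16 in
  let add_follow p qs =
    let old = try Hashtbl.find follow p with Not_found -> IS.empty in
    Hashtbl.replace follow p (IS.union old qs)
  in
  let rec go = function
    | Eps -> (true, IS.empty, IS.empty)
    | Chr c ->
        let p = !count in
        incr count;
        letters := c :: !letters;
        (false, IS.singleton p, IS.singleton p)
    | Alt (a, b) ->
        let (na, fa, la) = go a in
        let (nb, fb, lb) = go b in
        (na || nb, IS.union fa fb, IS.union la lb)
    | Seq (a, b) ->
        let (na, fa, la) = go a in
        let (nb, fb, lb) = go b in
        (* Whatever ends a may be followed by whatever starts b. *)
        IS.iter (fun p -> add_follow p fb) la;
        (na && nb,
         (if na then IS.union fa fb else fa),
         (if nb then IS.union la lb else lb))
    | Star a ->
        let (_, fa, la) = go a in
        (* Loop back: the end of a may be followed by its start. *)
        IS.iter (fun p -> add_follow p fa) la;
        (true, fa, la)
  in
  let (nullable, first, last) = go r in
  (nullable, first, last, follow, Array.of_list (List.rev !letters))

(* Whole-string match by tracking the set of active positions. *)
let matches (r : re) (s : string) : bool =
  let (nullable, first, last, follow, letter) = positions r in
  let keep c set = IS.filter (fun p -> letter.(p) = c) set in
  let succ set =
    IS.fold
      (fun p acc ->
        IS.union acc (try Hashtbl.find follow p with Not_found -> IS.empty))
      set IS.empty
  in
  let n = String.length s in
  if n = 0 then nullable
  else begin
    let state = ref (keep s.[0] first) in
    for i = 1 to n - 1 do
      state := keep s.[i] (succ !state)
    done;
    not (IS.is_empty (IS.inter !state last))
  end
```

Each distinct set of positions reached this way corresponds to one state
of the deterministic automaton, so caching the sets would give the
compiled form.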

-- 
Jean-Christophe Filliatre
  mailto:Jean-Christophe.Filliatre@lri.fr
  http://www.lri.fr/~filliatr

* Re: [Caml-list] String.unescaped and some other little pitiful laments
  2001-07-11  6:37 ` Jean-Christophe Filliatre
@ 2001-07-11  7:29   ` Claude Marche
  2001-07-11 18:03   ` Jerome Vouillon
  1 sibling, 0 replies; 8+ messages in thread
From: Claude Marche @ 2001-07-11  7:29 UTC (permalink / raw)
  To: Jean-Christophe Filliatre; +Cc: Berke Durak, caml-list


Hi all,

>>>>> "Jean-Christophe" == Jean-Christophe Filliatre <Jean-Christophe.Filliatre@lri.fr> writes:

    Jean-Christophe> Berke Durak writes:
    >> 
    >> Also I'd like to see those horrible functions returning parameters in
    >> global variables be eradicated, such as those that can be found in the Str
    >> (regular expression) module. Is there a complete, typeful regular
    >> expression package entirely written in Ocaml ?

    Jean-Christophe> Yes,  there  is one  by  Claude Marché,  available  (in  a very  first
    Jean-Christophe> release) at:

    Jean-Christophe> 	 http://www.lri.fr/~marche/tmp/regexp-0.1.tar.gz

    Jean-Christophe> (Documentation can be found in  the .mli files)

I made this small package quite recently, entirely in Caml because I
needed to do so. But with respect to Str and Pcre, several features
are missing. If there is enough demand I may add such features in the
future. I added to the Web page the documentation (generated with
ocamlweb, http://www.lri.fr/~filliatr/ocamlweb/) in HTML, PDF and PS
format. See http://www.lri.fr/~marche/regexp

- Claude

-- 
| Claude Marché           | mailto:Claude.Marche@lri.fr |
| LRI - Bât. 490          | http://www.lri.fr/~marche/  |
| Université de Paris-Sud | phoneto: +33 1 69 15 64 85  |
| F-91405 ORSAY Cedex     | faxto: +33 1 69 15 65 86    |

* Re: [Caml-list] String.unescaped and some other little pitiful laments
  2001-07-11  6:37 ` Jean-Christophe Filliatre
  2001-07-11  7:29   ` Claude Marche
@ 2001-07-11 18:03   ` Jerome Vouillon
  1 sibling, 0 replies; 8+ messages in thread
From: Jerome Vouillon @ 2001-07-11 18:03 UTC (permalink / raw)
  To: Jean-Christophe Filliatre; +Cc: Berke Durak, caml-list

On Wed, Jul 11, 2001 at 08:37:38AM +0200, Jean-Christophe Filliatre wrote:
> 
> Berke Durak writes:
>  > 
>  > Also I'd like to see those horrible functions returning parameters in
>  > global variables be eradicated, such as those that can be found in the Str
>  > (regular expression) module. Is there a complete, typeful regular
>  > expression package entirely written in Ocaml ?
> 
> Yes,  there  is one  by  Claude Marché,  available  (in  a very  first
> release) at:
> 
> 	 http://www.lri.fr/~marche/tmp/regexp-0.1.tar.gz
> 
> (Documentation can be found in  the .mli files)

We also implemented a regular expression module for Unison, as the
standard one (Str) was unusably slow.  It has a similar interface to
the one by Claude Marché, but it is more complete: it supports almost
all Posix extended regular expressions (only collating sequences are
missing), filename globbing, case-insensitive matching, and boolean
operations (union, intersection and difference) on regular expressions.
Both implementations have the disadvantage of not supporting submatches,
though.  It would be interesting to compare their performance.
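
As an illustration of what boolean operations on regular expressions
mean, here is a small sketch based on Brzozowski derivatives, a technique
in which intersection and complement come almost for free. It is not
claimed to be how rx.ml is implemented; the types and names are
hypothetical.

```ocaml
type re =
  | Empty                 (* matches nothing *)
  | Eps                   (* matches the empty string *)
  | Chr of char
  | Alt of re * re        (* union *)
  | And of re * re        (* intersection *)
  | Not of re             (* complement *)
  | Seq of re * re
  | Star of re

(* Difference is intersection with the complement. *)
let diff a b = And (a, Not b)

(* Does the expression accept the empty string? *)
let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Alt (a, b) -> nullable a || nullable b
  | And (a, b) | Seq (a, b) -> nullable a && nullable b
  | Not a -> not (nullable a)

(* deriv c r accepts exactly the words w such that r accepts c followed by w. *)
let rec deriv c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Alt (a, b) -> Alt (deriv c a, deriv c b)
  | And (a, b) -> And (deriv c a, deriv c b)
  | Not a -> Not (deriv c a)
  | Seq (a, b) ->
      let left = Seq (deriv c a, b) in
      if nullable a then Alt (left, deriv c b) else left
  | Star a as r -> Seq (deriv c a, r)

(* Whole-string match: derive by each character, then test nullability. *)
let matches (r : re) (s : string) : bool =
  let state = ref r in
  String.iter (fun c -> state := deriv c !state) s;
  nullable !state
```

For example, with letters = Star (Alt (Chr 'a', Chr 'b')) and
only_a = Star (Chr 'a'), the expression diff letters only_a accepts the
words over a and b that contain at least one b. The sketch does not
simplify derivatives, so expressions grow with the input length.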

The sources are a stand-alone subset (files src/rx.ml and src/rx.mli)
of the sources of Unison (available from
http://www.cis.upenn.edu/~bcpierce/unison/index.html).

-- Jerome

* Re: [Caml-list] String.unescaped and some other little pitiful laments
  2001-07-10 18:07 [Caml-list] String.unescaped and some other little pitiful laments Berke Durak
  2001-07-10 18:55 ` Markus Mottl
  2001-07-11  6:37 ` Jean-Christophe Filliatre
@ 2001-07-11 19:30 ` Xavier Leroy
  2001-07-11 20:33   ` Markus Mottl
  2 siblings, 1 reply; 8+ messages in thread
From: Xavier Leroy @ 2001-07-11 19:30 UTC (permalink / raw)
  To: Berke Durak; +Cc: caml-list

> Also I'd like to see those horrible functions returning parameters in
> global variables be eradicated, such as those that can be found in the Str
> (regular expression) module. 

Yes, the Str module is a thorn in my side: not only is the API bad
(too much reliance on global state), but the underlying implementation
(the GNU regexp library) is awful -- on moderately complex regular
expressions, it can get really slow, or just abort on an exception.
(Stallman et al. usually write better code than this!)

I'd really love to get rid of it, but as usual I'm obsessed with
backward compatibility, and couldn't find an existing regexp library
that recognizes the same regexp language as Str -- so that we could easily
keep the old Str interface as a wrapper around the new interface.

So, this is a question to the developers of alternate regexp
libraries: how hard would it be to implement an Str emulation on top
of your libraries?  If you're interested, we can pursue this
discussion by private e-mail.

> Many people on this list are talking lighthearted about functions such
> as Obj.magic. These functions are pure evil. It makes me sorry to see
> that my favorite language has an unsafe and ugly type casting
> function. Modules using such features should be flagged as
> ``evil'', and the use of these functions should not be publicly
> advocated.

But they are not!  Not by us, at least.  You'd be hard-pressed to find
any mention of the Obj module in the OCaml docs.  There are a couple
of legitimate uses of Obj.magic in the toplevel loop, and a few other
uses (e.g. in ocamlyacc-generated parsers) that could be removed with
a little more work.

But, yes, I'd advise all OCaml programmers to never, never use
Obj.magic.  In particular, this can lead to incorrect code being
generated by the ocamlopt compiler (because it fools its
type-dependent optimizations).

A few years ago, I spent a couple of hours tracking down an obscure GC bug
in a program sent by a user as part of a bug report.  It turned out
to be an incorrect use of Obj.magic in the source code...  Since then,
I first grep for Obj.magic in every bug report sent to us!

> PS. What is the purpose of the "uses unsafe features" flag in .cmo
> files ?  (it can be seen in the output of the "objinfo" program in the
> tools/ directory of the compiler). I've made a test program using
> unsafe features such as Obj and Array.unsafe_get but the flag wasn't
> set.

It's poorly named.  Actually, it tracks whether the module declares
external primitives (using the "external" syntax).  It's used for
type-safe dynamic loading of compiled bytecode: the Dynlink loader
lets you check that the bytecode was compiled against a set of known
interfaces (presumably not including unsafe operations such as
Obj.magic or Array.unsafe_get), but there is also the risk that the
bytecode simply declares these operations itself using well-chosen
"external" declarations.  So, Dynlink can also track "external"
declarations and prohibit them.
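
A small sketch of the loophole described above: a module can obtain the
same unsafe operations without ever mentioning Obj or Array.unsafe_get,
simply by declaring the underlying primitives itself. The names "cast" and
"raw_get" are hypothetical; "%identity" and "%array_unsafe_get" are the
primitive names the standard library uses for Obj.magic and
Array.unsafe_get.

```ocaml
(* Same primitive as Obj.magic, under another name. *)
external cast : 'a -> 'b = "%identity"

(* Same primitive as Array.unsafe_get: no bounds check. *)
external raw_get : 'a array -> int -> 'a = "%array_unsafe_get"

(* Used here only in well-typed, in-bounds ways. *)
let same : int = cast 42
let second : int = raw_get [| 10; 20; 30 |] 1
```

A check on imported interfaces alone would not notice these declarations,
which is the reason for tracking "external" separately.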

- Xavier Leroy

* Re: [Caml-list] String.unescaped and some other little pitiful laments
  2001-07-11 19:30 ` Xavier Leroy
@ 2001-07-11 20:33   ` Markus Mottl
  0 siblings, 0 replies; 8+ messages in thread
From: Markus Mottl @ 2001-07-11 20:33 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Berke Durak, caml-list

On Wed, 11 Jul 2001, Xavier Leroy wrote:
> So, this is a question to the developers of alternate regexp
> libraries: how hard would it be to implement an Str emulation on top
> of your libraries?  If you're interested, we can pursue this
> discussion by private e-mail.

Well, I somehow felt addressed by this request... ;)

It may seem that one would only have to rewrite Emacs-style patterns
to Perl-style ones to use my Pcre-interface. Unfortunately, there is
no way to stay backward compatible _and_ get rid of the statefulness,
because the interface of the Str-library just relies on the latter.
One could surely emulate this behaviour with not too much effort, but
the Str-library would still remain awfully stateful then (though it
would perform matching somewhat faster).

I think that in the long run there will be no way around declaring the
Str-interface obsolete. Especially for multi-threaded applications a
stateless regexp engine is really a requirement. The longer we keep this
interface, the more legacy code we will get...

If you want, just shamelessly grab the Pcre-library and adapt it to
your needs: it's LGPLed anyway. Though, I admit that I'd also like to
see a featureful and fast regexp-library purely written in OCaml rather
than one cowardly interfacing to existing C-libraries (would require
significantly more work).

If anybody wants to write a substitute for the Str-library building on the
Pcre-library and needs some hints, just tell me. Unfortunately, I won't
have time to work on this issue in the near future...

Best regards,
Markus Mottl

-- 
Markus Mottl                                             markus@oefai.at
Austrian Research Institute
for Artificial Intelligence                  http://www.oefai.at/~markus
