From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id JAA24838 for caml-red; Thu, 7 Dec 2000 09:04:52 +0100 (MET) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id BAA28192 for ; Wed, 6 Dec 2000 01:51:41 +0100 (MET) Received: from miss.wu-wien.ac.at (miss.wu-wien.ac.at [137.208.107.17]) by nez-perce.inria.fr (8.11.1/8.10.0) with ESMTP id eB60per18960 for ; Wed, 6 Dec 2000 01:51:40 +0100 (MET) Received: (from mottl@localhost) by miss.wu-wien.ac.at (8.9.0/8.9.0) id BAA09568 for caml-list@inria.fr; Wed, 6 Dec 2000 01:51:39 +0100 (MET) Date: Wed, 6 Dec 2000 01:51:39 +0100 From: Markus Mottl To: OCAML Subject: features of PCRE-OCaml Message-ID: <20001206015139.D31140@miss.wu-wien.ac.at> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: weis@pauillac.inria.fr Hello, it seems that many people hadn't yet learnt about PCRE-OCaml (the OCaml-interface to the PCRE-library) and have asked for more information on the advantages as compared to the Str-library (or to Perl). Here is a list of features as taken from the README: * The PCRE-library by Philip Hazel has been under development for quite some time now and is fairly advanced and stable. It implements just about all of the convenient functionality of regular expressions as one can find them in PERL. The higher-level functions written in OCaml (split, replace), too, are compatible to the corresponding PERL-functions (to the extent that OCaml allows). Most people find the syntax of PERL-style regular expressions more straightforward than the Emacs-style one used in the "Str"-module. * In contrast to PERL, the library creates DFAs (deterministic finite automata) instead of NFAs (nondeterministic finite automata). DFAs generally allow much faster pattern matching, because they never need to backtrack. Especially patterns with many alternations can see a great speedup. * It is reentrant - and thus thread safe. This is not the case with the "Str"-module of OCaml, which builds on the GNU "regex"-library. Using reentrant libraries also means more convenience for programmers. They do not have to reason about states in which the library might be in. * The high-level functions for replacement and substitution, they are all implemented in OCaml, are much faster than the ones of the "Str"-module. In fact, when compiled to native code, they even seem to be significantly faster than those of PERL (PERL is written in C). Somebody reported to me that he had tested OCaml with PCRE-OCaml against PERL and Python with several 100MB data that had to be matched/manipulated. Trusting his claims, the overall speed of the OCaml-version (native code) was 15 times faster than Perl and 45 times faster than Python, which is probably also due to the high quality of the OCaml-compiler. * You can rely on the data returned being unique. In other terms: if the result of a function is a string, you can safely use destructive updates on it without having to fear side effects. * The interface to the library makes use of labels and default arguments to give you a high degree of programming comfort. I hope this answers most questions! Best regards, Markus Mottl -- Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl