caml-list - the Caml user's mailing list
From: John Max Skaller <skaller@ozemail.com.au>
To: Markus Mottl <mottl@miss.wu-wien.ac.at>
Cc: OCAML <caml-list@inria.fr>
Subject: Re: features of PCRE-OCaml
Date: Fri, 08 Dec 2000 04:08:11 +1100
Message-ID: <3A2FC3FB.A0BB09DD@ozemail.com.au>
In-Reply-To: <20001207173228.B9463@miss.wu-wien.ac.at>

Markus Mottl wrote:
> 
> On Fri, 08 Dec 2000, John Max Skaller wrote:
> > Funny. Python 1.5.2 used the _same_ C library by Philip Hazel. :-)
> > Given the fact this library builds DFA's instead of NFA's, Python
> > ought to be faster than Perl. :-)
> 
> Well, the matching engine is not everything... ;)

	It is, for code that does extensive matching of long strings
against a single pattern: everything else should be dwarfed
by the match time.

> > Note also, Python 2.0 uses a modified library which does something
> > PCRE-OCaml cannot: it works with Unicode characters (supposedly).
> 
> To my knowledge, Phil Hazel is working on support for this. Unless the
> PCRE-library supports Unicode (and unless OCaml does ;), there is not
> much one can do about it...

	What? You mean it isn't generic enough to just change
'char' to 'short' and recompile?  [:-)]
 
> I am not sure whether it is really necessary to have a Str compatible
> interface: the regular expressions are already different so exchanging
> the old against the new library would break code anyway.

	What if the Str expressions were translated?

	BTW: I think some of the regex features are parochial
and should be dropped: case-insensitive matching, matching
'words', and so on. Such things might make sense in English,
but they are much too hard to build into a regexp facility
correctly for internationalised text.

	By the way, how big can the DFA tables get?
Does it eliminate duplicate columns? 
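
	To make the question concrete: in a table-driven DFA, many
input symbols often drive every state to the same next state, so
their columns in the transition table are identical and can share
one column. The sketch below is only an illustration of that idea
in OCaml; the names compress_columns and next_state are made up
here, and nothing in it describes how PCRE or ocamllex actually
store their tables.

```ocaml
(* Hypothetical sketch: merge identical columns of a DFA transition
   table.  table.(state).(symbol) is the next state.  Returns a
   symbol -> class map and a smaller state x class table. *)
let compress_columns (table : int array array) : int array * int array array =
  let n_states = Array.length table in
  let n_symbols = if n_states = 0 then 0 else Array.length table.(0) in
  (* Maps the contents of a column to the class already assigned to it. *)
  let seen : (int list, int) Hashtbl.t = Hashtbl.create 16 in
  let class_of = Array.make n_symbols 0 in
  let columns = ref [] in  (* distinct columns, most recent first *)
  for sym = 0 to n_symbols - 1 do
    let col = List.init n_states (fun st -> table.(st).(sym)) in
    match Hashtbl.find_opt seen col with
    | Some c -> class_of.(sym) <- c
    | None ->
      let c = Hashtbl.length seen in
      Hashtbl.add seen col c;
      class_of.(sym) <- c;
      columns := col :: !columns
  done;
  (* distinct.(c) is the column shared by every symbol in class c. *)
  let distinct = Array.of_list (List.rev_map Array.of_list !columns) in
  let n_classes = Array.length distinct in
  let compressed =
    Array.init n_states (fun st ->
      Array.init n_classes (fun c -> distinct.(c).(st))) in
  (class_of, compressed)

(* Lookup costs one extra indirection through the class map. *)
let next_state (class_of, compressed) state symbol =
  compressed.(state).(class_of.(symbol))
```

	With 256 byte values as symbols, the class map is a fixed
256-entry array, and the table width shrinks to the number of
distinct columns.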

	[ocamllex cannot support large enough tables for matching
ISO-10646 identifiers, when encoded using UTF-8. This is a real pain,
since all my languages specify UTF-8 encoded ISO-10646: I have to
cheat, and assume 'almost everything' is a suitable character to
put in an identifier, and then check it afterwards. This makes it
hard to use special symbols as tokens. I'm not sure why the
tables get so large, but I guess it doesn't eliminate duplicate
columns?]
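
	The "check it afterwards" step can be sketched roughly as
follows, assuming the lexer has already accepted a run of bytes as
a candidate identifier. Everything here is hypothetical: the names
decode_utf8, is_ident_char and valid_identifier are invented for
the example, the decoder is minimal (it does not reject overlong
forms or surrogates), and the character ranges are illustrative
only, not the real ISO-10646 identifier tables.

```ocaml
(* Decode UTF-8 bytes into code points; None on malformed input. *)
let decode_utf8 (s : string) : int list option =
  let n = String.length s in
  let byte i = Char.code s.[i] in
  (* A continuation byte has the form 10xxxxxx. *)
  let is_cont i = i < n && byte i land 0xC0 = 0x80 in
  let rec go i acc =
    if i >= n then Some (List.rev acc)
    else
      let b = byte i in
      if b < 0x80 then go (i + 1) (b :: acc)
      else if b land 0xE0 = 0xC0 && is_cont (i + 1) then
        go (i + 2)
          ((((b land 0x1F) lsl 6) lor (byte (i + 1) land 0x3F)) :: acc)
      else if b land 0xF0 = 0xE0 && is_cont (i + 1) && is_cont (i + 2) then
        go (i + 3)
          ((((b land 0x0F) lsl 12)
            lor ((byte (i + 1) land 0x3F) lsl 6)
            lor (byte (i + 2) land 0x3F)) :: acc)
      else if b land 0xF8 = 0xF0
              && is_cont (i + 1) && is_cont (i + 2) && is_cont (i + 3) then
        go (i + 4)
          ((((b land 0x07) lsl 18)
            lor ((byte (i + 1) land 0x3F) lsl 12)
            lor ((byte (i + 2) land 0x3F) lsl 6)
            lor (byte (i + 3) land 0x3F)) :: acc)
      else None
  in
  go 0 []

(* Illustrative ranges only: ASCII letters, digits, underscore,
   Latin-1 letters, and the Greek block. *)
let is_ident_char cp =
  (cp >= Char.code 'a' && cp <= Char.code 'z')
  || (cp >= Char.code 'A' && cp <= Char.code 'Z')
  || (cp >= Char.code '0' && cp <= Char.code '9')
  || cp = Char.code '_'
  || (cp >= 0xC0 && cp <= 0xFF && cp <> 0xD7 && cp <> 0xF7)
  || (cp >= 0x370 && cp <= 0x3FF)

(* Post-lexing check: well-formed, non-empty, all code points allowed. *)
let valid_identifier (s : string) : bool =
  match decode_utf8 s with
  | None | Some [] -> false
  | Some cps -> List.for_all is_ident_char cps
```

	The cost of the workaround is visible here: a symbol such as
U+2192 is swallowed into the candidate identifier by the lexer and
only rejected by this check, so it cannot easily be a token of its
own.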

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net