From: Alain Frisch <frisch@clipper.ens.fr>
To: John Max Skaller <skaller@ozemail.com.au>
Cc: OCAML <caml-list@inria.fr>
Subject: Re: features of PCRE-OCaml
Date: Fri, 8 Dec 2000 10:19:51 +0100 (MET)
Message-ID: <Pine.GSO.4.04.10012081013430.1583-100000@clipper.ens.fr>
In-Reply-To: <3A2FC3FB.A0BB09DD@ozemail.com.au>

On Fri, 8 Dec 2000, John Max Skaller wrote:

> 	[Ocaml lex cannot support large enough tables for matching
> ISO-10646 identifiers, when encoded using UTF-8. This is a real pain,
> since all my languages specify UTF-8 encoded ISO-10646: I have to 
> cheat, and assume 'almost everything' is a suitable character to
> put in an identifier, and then check it afterwards. This makes it
> hard to use special symbols as tokens. I'm not sure why
> this is, but I guess it doesn't eliminate duplicate columns?]

Have a look at wlex:
http://www.eleves.ens.fr:8080/home/frisch/soft
http://www.eleves.ens.fr:8080/home/frisch/info/wlex-20001006.tar.gz


<< This package consists of a lexer generator and the associated runtime
system. The new lexing model adds a "classification" layer between the
lexbuf and the lexer itself. This layer classifies characters from the
lexbuf into a small number of classes, on which the regexps in the lexer
specification are built. 

 This reduces the number of states and transitions in the automaton,
especially when working with large encodings such as UTF-8 (the primary
motivation for wlex).  >>
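
To make the classification idea concrete, here is a minimal sketch in
plain OCaml. It is not wlex's actual code or API: the class names, the
character ranges, and the functions classify, decode and scan_ident are
all hypothetical, and the ranges cover only a tiny illustrative part of
the ISO-10646 identifier tables. The point is only that the matcher for
"Letter (Letter | Digit)*" works on classes, not on UTF-8 byte sequences.

```ocaml
(* Hypothetical character classes; wlex's real classes are not shown here. *)
type cls = Letter | Digit | Space | Other

(* Classify a Unicode code point. Illustrative ranges only: ASCII letters,
   underscore, and Latin letters U+00C0..U+024F minus the two math signs. *)
let classify (cp : int) : cls =
  if (cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A) || cp = 0x5F
  then Letter
  else if cp >= 0xC0 && cp <= 0x24F && cp <> 0xD7 && cp <> 0xF7 then Letter
  else if cp >= 0x30 && cp <= 0x39 then Digit
  else if cp = 0x20 || cp = 0x09 || cp = 0x0A || cp = 0x0D then Space
  else Other

(* Decode one UTF-8 code point starting at byte [i]; return the code point
   and the index of the next one. Assumes well-formed input (no validation). *)
let decode (s : string) (i : int) : int * int =
  let b k = Char.code s.[i + k] in
  let c = b 0 in
  if c < 0x80 then (c, i + 1)
  else if c < 0xE0 then (((c land 0x1F) lsl 6) lor (b 1 land 0x3F), i + 2)
  else if c < 0xF0 then
    ( ((c land 0x0F) lsl 12) lor ((b 1 land 0x3F) lsl 6) lor (b 2 land 0x3F),
      i + 3 )
  else
    ( ((c land 0x07) lsl 18)
      lor ((b 1 land 0x3F) lsl 12)
      lor ((b 2 land 0x3F) lsl 6)
      lor (b 3 land 0x3F),
      i + 4 )

(* Match  Letter (Letter | Digit)*  starting at byte [start]. The matcher
   sees only classes, so it needs a handful of transitions in total.
   Returns the byte index just past the identifier, or None. *)
let scan_ident (s : string) (start : int) : int option =
  let n = String.length s in
  let rec loop i first =
    if i >= n then i
    else
      let cp, next = decode s i in
      match classify cp, first with
      | Letter, _ -> loop next false
      | Digit, false -> loop next false
      | _ -> i
  in
  let stop = loop start true in
  if stop > start then Some stop else None
```

For example, scan_ident "caf\xc3\xa9 x" 0 stops after the two-byte
encoding of U+00E9, at byte index 5. Supporting more scripts then means
extending classify, while the matcher itself stays the same size.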

The development release of pxp may use wlex (same lexer for different
encodings: UTF-8, Latin-1).

wlex is distributed as a patch to ocamllex.


-- 
  Alain Frisch



