caml-list - the Caml user's mailing list
From: skaller <skaller@users.sourceforge.net>
To: Pierre Weis <pierre.weis@inria.fr>
Cc: Luc Maranget <Luc.Maranget@inria.fr>,
	jgoerzen@complete.org, caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] ocamllex/yacc and camlp4
Date: 17 Jun 2004 19:36:35 +1000
Message-ID: <1087464994.16811.1476.camel@pelican.wigram>
In-Reply-To: <200406162248.AAA11976@pauillac.inria.fr>

On Thu, 2004-06-17 at 08:48, Pierre Weis wrote:

>  1) ocamllex and ocamlyacc implementation technologies are damned fast
> and it is difficult to compete with them using streams.

They're not so fast once your problem goes beyond the constraints
they are designed for. For this reason, all my lexers generate
an in-memory token list.
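A minimal sketch of what "an in-memory token list" means, assuming a hypothetical token type and a hand-written scanner rather than ocamllex: the whole input is tokenised once up front, so the parser can later look as far ahead in the list as it likes.

```ocaml
(* Hypothetical token type for a tiny expression language. *)
type token =
  | INT of int
  | IDENT of string
  | PLUS
  | LPAREN
  | RPAREN
  | EOF

(* Scan the whole string and return every token at once, ending with EOF,
   instead of handing the parser one token per call. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let is_digit c = c >= '0' && c <= '9' in
  let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_' in
  (* [scan i acc] consumes input from index [i]; tokens accumulate in reverse. *)
  let rec scan i acc =
    if i >= n then List.rev (EOF :: acc)
    else
      match s.[i] with
      | ' ' | '\t' | '\n' | '\r' -> scan (i + 1) acc
      | '+' -> scan (i + 1) (PLUS :: acc)
      | '(' -> scan (i + 1) (LPAREN :: acc)
      | ')' -> scan (i + 1) (RPAREN :: acc)
      | c when is_digit c ->
          let j = ref i in
          while !j < n && is_digit s.[!j] do incr j done;
          scan !j (INT (int_of_string (String.sub s i (!j - i))) :: acc)
      | c when is_alpha c ->
          let j = ref i in
          while !j < n && (is_alpha s.[!j] || is_digit s.[!j]) do incr j done;
          scan !j (IDENT (String.sub s i (!j - i)) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %C at %d" c i)
  in
  scan 0 []
```

For example, tokenize "f(x1 + 42)" yields [IDENT "f"; LPAREN; IDENT "x1"; PLUS; INT 42; RPAREN; EOF].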

> Last but not least, the actual ocamllex/ocamlyacc implementations work
> pretty well, so that there is no clear necessity to rewrite them.

I think there is: they haven't worked well for any of the
parsers I've had to write -- not even the Felix parser, which
is specifically designed to be unambiguous, LALR(1)
and Ocamlyaccable.

The parser can't take a state argument; it can't accept a token
type defined elsewhere; the generated interface can't be extended
by the client, which is necessary when you need to define a function
callable by both lexer and parser that depends on the token type;
and you can't use a meta-grammar notation with the obvious
interpretation (a* makes a list, a? an option).
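To show what that "obvious interpretation" looks like, here is a sketch in a swapped-in technique, hand-written parser combinators over a token list (not something ocamlyacc offers). The names satisfy, many, opt and ( ++ ) are hypothetical; many realises a* as a list and opt realises a? as an option.

```ocaml
(* A parser takes a token list and returns a result plus the remaining
   tokens, or None on failure. Generic in the token type. *)
type ('tok, 'a) parser = 'tok list -> ('a * 'tok list) option

(* Match one token, accepting it if [f] maps it to Some value. *)
let satisfy (f : 'tok -> 'a option) : ('tok, 'a) parser = function
  | t :: rest -> (match f t with Some v -> Some (v, rest) | None -> None)
  | [] -> None

(* a* : zero or more occurrences, collected into a list. Stops if [p]
   succeeds without consuming input, which would otherwise loop forever. *)
let many (p : ('tok, 'a) parser) : ('tok, 'a list) parser =
  let rec loop acc toks =
    match p toks with
    | Some (v, rest) when rest != toks -> loop (v :: acc) rest
    | _ -> Some (List.rev acc, toks)
  in
  loop []

(* a? : an optional occurrence, returned as an option. Never fails. *)
let opt (p : ('tok, 'a) parser) : ('tok, 'a option) parser = fun toks ->
  match p toks with
  | Some (v, rest) -> Some (Some v, rest)
  | None -> Some (None, toks)

(* Sequencing: run [p], then [q] on what is left, pairing the results. *)
let ( ++ ) (p : ('tok, 'a) parser) (q : ('tok, 'b) parser) : ('tok, 'a * 'b) parser =
  fun toks ->
    match p toks with
    | None -> None
    | Some (a, rest) ->
        (match q rest with
         | None -> None
         | Some (b, rest') -> Some ((a, b), rest'))

(* Usage on string tokens: an optional "-" followed by zero or more integers. *)
let digit = satisfy int_of_string_opt
let minus = satisfy (fun s -> if s = "-" then Some () else None)
let signed_digits = opt minus ++ many digit
```

The result type of signed_digits, unit option * int list, falls straight out of the grammar, which is the point being made about a* and a?.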

Considerable effort is required to work around the
faulty interface, which makes the parser depend on
a lexbuf.
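One way to cut that dependency, sketched under assumptions: ocamlyacc entry points have the shape (Lexing.lexbuf -> token) -> Lexing.lexbuf -> result, so a closure over an in-memory token list can stand in for the lexer, and a dummy lexbuf from Lexing.from_string "" fills the other argument. Here parse_sum is a hypothetical hand-written stand-in with that same shape, so the example runs on its own; lexer_of_list is likewise an illustrative name.

```ocaml
type token = INT of int | PLUS | EOF

(* Adapt a token list to the [Lexing.lexbuf -> token] function that an
   ocamlyacc entry point expects. The lexbuf argument is ignored; once
   the list is exhausted, EOF is returned forever. *)
let lexer_of_list (toks : token list) : Lexing.lexbuf -> token =
  let remaining = ref toks in
  fun _lexbuf ->
    match !remaining with
    | [] -> EOF
    | t :: rest -> remaining := rest; t

(* Hypothetical stand-in for a generated entry point such as Parser.main.
   Grammar: sum := INT (PLUS INT)* EOF, evaluated to an int. *)
let parse_sum (next : Lexing.lexbuf -> token) (lb : Lexing.lexbuf) : int =
  let rec after_int acc =
    match next lb with
    | PLUS -> (match next lb with INT n -> after_int (acc + n) | _ -> failwith "expected INT")
    | EOF -> acc
    | INT _ -> failwith "unexpected INT"
  in
  match next lb with
  | INT n -> after_int n
  | _ -> failwith "expected INT"

let () =
  (* The dummy lexbuf is never read; it only satisfies the type. *)
  let dummy = Lexing.from_string "" in
  Printf.printf "%d\n" (parse_sum (lexer_of_list [INT 1; PLUS; INT 2; PLUS; INT 39; EOF]) dummy)
```

This prints 42. The catch is that any position information read from the dummy lexbuf is meaningless, so error locations have to be carried in the tokens themselves.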

LALR(1) is very hard to work with, and often the easiest
workaround is to do some lookahead in the tokeniser, but
the coupling of the parser and lexer makes this difficult.
The Felix lexer/parser needs about 8 files: more than any
other part of the compiler.
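A sketch of lookahead done in the tokeniser, assuming a hypothetical grammar that would otherwise have a conflict deciding what an identifier followed by '(' means: once the tokens are in a list, a simple pass can re-tag such an identifier as FUNC_NAME, so the LALR(1) grammar sees two distinct token kinds. The token names are illustrative only.

```ocaml
type token = IDENT of string | FUNC_NAME of string | LPAREN | RPAREN | EOF

(* Peek one token ahead: an identifier immediately followed by '(' becomes
   FUNC_NAME; everything else passes through unchanged. *)
let rec relabel (toks : token list) : token list =
  match toks with
  | IDENT name :: (LPAREN :: _ as rest) -> FUNC_NAME name :: relabel rest
  | t :: rest -> t :: relabel rest
  | [] -> []
```

With a lexbuf-driven lexer the same trick needs a pushback buffer threaded through the lexer state, which is where the coupling mentioned above gets in the way.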

Also, ocamllex is only an 8-bit lexer, which isn't
that useful these days, when XML and the Web demand
UTF-8 encoded Unicode.
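To see why a byte-oriented lexer falls short: in UTF-8 a character such as é occupies two bytes, so an 8-bit lexer sees two unrelated chars. Below is a minimal decoder that groups bytes into code points, assuming well-formed input (it does not reject bad continuation bytes or overlong forms). OCaml 4.14 and later also provide String.get_utf_8_uchar for this job.

```ocaml
(* Decode a well-formed UTF-8 string into a list of Unicode code points. *)
let utf8_code_points (s : string) : int list =
  let n = String.length s in
  let byte i = Char.code s.[i] in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let b0 = byte i in
      if b0 < 0x80 then go (i + 1) (b0 :: acc) (* 1 byte: ASCII *)
      else if b0 land 0xE0 = 0xC0 then (* 2 bytes: 110xxxxx 10xxxxxx *)
        go (i + 2) ((((b0 land 0x1F) lsl 6) lor (byte (i + 1) land 0x3F)) :: acc)
      else if b0 land 0xF0 = 0xE0 then (* 3 bytes *)
        go (i + 3)
          ((((b0 land 0x0F) lsl 12)
            lor ((byte (i + 1) land 0x3F) lsl 6)
            lor (byte (i + 2) land 0x3F)) :: acc)
      else if b0 land 0xF8 = 0xF0 then (* 4 bytes *)
        go (i + 4)
          ((((b0 land 0x07) lsl 18)
            lor ((byte (i + 1) land 0x3F) lsl 12)
            lor ((byte (i + 2) land 0x3F) lsl 6)
            lor (byte (i + 3) land 0x3F)) :: acc)
      else invalid_arg "utf8_code_points: malformed lead byte"
  in
  go 0 []
```

For instance, "a\xC3\xA9" ("aé") has String.length 3 but decodes to the two code points 0x61 and 0xE9.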

> In conclusion: pure Camlp4 implementation of ocamllex/ocamlyacc is
> still an interesting and challenging programming task for the next few
> years, if you (or someone else) had the will and time to provide two
> ``great camlp4 examples'' to the rest of us...
> 
> Happy hacking :)

Ulex already integrates lexing and provides UTF-8;
Camomile already provides 32-bit lexers. The code exists.
The problem here isn't hacking the code, but getting
INRIA to agree to sit down and work with the community
on designing an interface specification for a facility
good enough to put directly in the standard distro.
Once that was agreed, I'm quite sure the non-INRIA
community would rapidly provide an implementation.

As an added incentive, an integrated lexer automatically
provides a superior alternative to Str and PCRE, and adding
a parser as well gives an even better system: quite a
lot of 'regexp' work should really be done by a real
parser.
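The textbook case of 'regexp' work that belongs to a parser is nested brackets: a true regular expression cannot describe arbitrarily deep nesting, but a few lines of recursive descent can. A minimal sketch; the function name balanced and the grammar seq := ( '(' seq ')' | other )* are illustrative assumptions.

```ocaml
(* Check that the parentheses in [s] are balanced by recursive descent. *)
let balanced (s : string) : bool =
  let n = String.length s in
  (* [seq i] parses a sequence starting at [i] and returns the index where
     it stopped: end of input, or a ')' for the caller to match. Raises
     Exit if a '(' is never closed. *)
  let rec seq i =
    if i >= n then i
    else
      match s.[i] with
      | '(' ->
          let j = seq (i + 1) in
          if j < n && s.[j] = ')' then seq (j + 1) else raise Exit
      | ')' -> i
      | _ -> seq (i + 1)
  in
  match seq 0 with
  | stop -> stop = n (* stopping early means a stray ')' *)
  | exception Exit -> false
```

The recursion depth tracks the nesting depth, which is exactly what a finite automaton cannot do.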

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



Thread overview: 11+ messages
2004-06-16  3:26 John Goerzen
2004-06-16  6:56 ` Luc Maranget
2004-06-16 22:48   ` Pierre Weis
2004-06-17  2:04     ` William Lovas
2004-06-17  9:42       ` skaller
2004-06-17  6:46     ` Alain Frisch
2004-06-17  9:36     ` skaller [this message]
2004-06-17  1:44   ` Shawn Wagner
2004-06-29  7:34 ` [Caml-list] Startconditions in ocamllex oliver
2004-06-29  8:00   ` Luc Maranget
2004-06-30  7:45     ` Hendrik Tews
