Re: [Caml-list] ocamllex+ocamlyacc and not parsing properly

From: skaller <skaller@users.sourceforge.net>
To: Jonathan Roewen <jonathan.roewen@gmail.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] ocamllex+ocamlyacc and not parsing properly
Date: Mon, 08 Aug 2005 18:59:23 +1000
Message-ID: <1123491563.9947.42.camel@localhost.localdomain>
In-Reply-To: <ad8cfe7e05080721237a609e@mail.gmail.com>

On Mon, 2005-08-08 at 16:23 +1200, Jonathan Roewen wrote:
> Is there any way to call another rule based on some variable in
> ocamllex? I see you can pass arguments to a rule, but what use are
> these except in the actions part?

You are trying to do too much work in the wrong
places IMHO.

Your lexer should always be context insensitive
and build small sensible pretokens NOT tokens:

identifier
integer
whitespace
newline
: # .

Then postprocess these pretokens into tokens.
This is easiest with a list, where you can use
pattern matching and functional techniques to
look ahead.

Then parse the tokens. This is easy because you get
to choose the tokens to make it easy :)

Use the grammar production arguments $1 $2 ...
to do further processing in the action as required.

The point is: the easy stuff is done by the two
automata (lexer, parser) and the hard stuff
is done in OCaml code.
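
One way to wire this up is sketched below. It assumes an ocamlyacc-style entry point, which takes a function of type Lexing.lexbuf -> token plus a lexbuf. The token type, token_source and count_tokens are hypothetical names made up for the illustration; count_tokens only stands in for a generated entry point such as Parser.main.

```ocaml
(* Hypothetical token type; with ocamlyacc it would come from the
   generated parser module. *)
type token = NAME of string | INT of int | COLON | EOF

(* Wrap an already post-processed token list in a function with the
   shape a generated entry point expects: Lexing.lexbuf -> token.
   The lexbuf argument is ignored; EOF is returned once the list
   is exhausted. *)
let token_source (tokens : token list) : Lexing.lexbuf -> token =
  let rest = ref tokens in
  fun _lexbuf ->
    match !rest with
    | [] -> EOF
    | t :: ts -> rest := ts; t

(* Stand-in for a generated entry point: it just counts the tokens
   it is handed before EOF. *)
let count_tokens (next : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : int =
  let rec loop n =
    match next lexbuf with
    | EOF -> n
    | _ -> loop (n + 1)
  in
  loop 0

(* The parser never reads from this lexbuf, so an empty one will do. *)
let dummy_lexbuf = Lexing.from_string ""

let n = count_tokens (token_source [NAME "x"; COLON; INT 1]) dummy_lexbuf
```

With a real grammar you would pass token_source to the generated entry point instead of count_tokens. Note that position information from the lexbuf is lost this way unless the tokens carry it themselves.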

For example: the Felix lexer generates these pretokens:

WHITESPACE NEWLINE COMMENT

which are NOT tokens of the grammar. These pretokens
are stripped out by the preprocessor. You can also
do things like this:

let rec pack_names tokens =
match tokens with
| NAME s1 :: WHITE s2 :: NAME s3 :: t ->
  (* merge NAME, whitespace, NAME into a single NAME *)
  NAME (String.concat "" [s1;s2;s3]) :: pack_names t
| WHITE _ :: t -> pack_names t  (* drop any other whitespace *)
| h :: t -> h :: pack_names t
| [] -> []

Yeah, this isn't ideal because it isn't tail recursive,
but it should illustrate the idea: do the hard stuff
in a language capable of handling the hard stuff easily .. :)

Tail rec version:

let rec pack output input = match input with
| NAME .... :: t -> 
  pack (NAME (Str.....) :: output) t
 ...
| [] -> List.rev output
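
Spelled out in full, the accumulator version might look like the sketch below. The pretoken type is an assumption made so the example stands alone (the INT constructor is only there to have something that is neither a name nor whitespace), and the elided cases are filled in to mirror the non-tail-recursive pack_names above.

```ocaml
(* Assumed pretoken type, for illustration only. *)
type pretoken = NAME of string | WHITE of string | INT of int

let pack_names (tokens : pretoken list) : pretoken list =
  (* output holds the result so far in reverse order *)
  let rec pack output input =
    match input with
    | NAME s1 :: WHITE s2 :: NAME s3 :: t ->
        pack (NAME (String.concat "" [s1; s2; s3]) :: output) t
    | WHITE _ :: t -> pack output t
    | h :: t -> pack (h :: output) t
    | [] -> List.rev output
  in
  pack [] tokens

let packed =
  pack_names [NAME "a"; WHITE " "; NAME "b"; WHITE " "; INT 1]
```

Like the first version, this continues after the merged name rather than re-examining it, so a run of three names separated by whitespace is not folded into one.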

This technique can often be used to condition a
nasty language into an LALR(1) one .. I managed to turn
Python into an LALR(1) language .. but it took
17 preprocessing passes to do it .. :))
Mainly, fiddling with INDENT, UNDENT since Python
is based on indentation rules, but a lot of hassles
with the expression x,y, which is extremely hard
to parse (optional trailing comma ..)
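
To give a flavour of what one such pass can look like, here is a minimal sketch of turning indentation into INDENT/UNDENT tokens with a stack of open indentation widths. This is not the actual Felix or Python-conditioning code; the token type, the layout function and the (width, tokens) line representation are all assumptions made for the illustration.

```ocaml
(* Hypothetical token type for the sketch. *)
type token = INDENT | UNDENT | NEWLINE | WORD of string

(* Each input line is (indentation width, tokens on that line).
   stack holds the widths of the currently open blocks, innermost first.
   acc is the output so far, in reverse order. *)
let layout (lines : (int * token list) list) : token list =
  (* Emit one UNDENT for every open block deeper than width. *)
  let rec close stack width acc =
    match stack with
    | top :: rest when top > width -> close rest width (UNDENT :: acc)
    | _ -> (stack, acc)
  in
  let rec go stack acc = function
    | [] ->
        (* close every block still open at end of input *)
        let _, acc = close stack 0 acc in
        List.rev acc
    | (width, toks) :: rest ->
        let stack, acc =
          match stack with
          | top :: _ when width > top -> (width :: stack, INDENT :: acc)
          | _ -> close stack width acc
        in
        go stack (NEWLINE :: List.rev_append toks acc) rest
  in
  go [0] [] lines

let laid_out =
  layout [ (0, [WORD "if"]); (4, [WORD "x"]); (0, [WORD "y"]) ]
```

The sketch does not check that a dedent lands on a width that is actually open, which a real pass would have to report as an error.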

-- 
John Skaller <skaller at users dot sourceforge dot net>
