caml-list - the Caml user's mailing list
* [Caml-list] lexer disambiguation?
@ 2001-08-31 21:09 Michael Leary
  2001-09-10 15:31 ` Xavier Leroy
From: Michael Leary @ 2001-08-31 21:09 UTC
  To: caml

since the lexer looks like an ordinary ocaml function (more or less), does
the disambiguation boil down to:

1. the longest series of bytes that matches a single rule
2. match the first rule in the function that matches #1

and mechanically, if the input looks like:

some series of bytes ¥99¢  @#%#@$  \r with newlines

does it do a pattern match on (assuming whitespace is a token):

s(* match some rule *)
so(* match some rule *)
som(* match some rule *)
some(* match some rule *)
some (* match some rule, and also the whitespace rule, so do the some rule,
then repeat matching starting with the concatenation of the leftover ' '
and the next byte *)



-- 
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr



* Re: [Caml-list] lexer disambiguation?
  2001-08-31 21:09 [Caml-list] lexer disambiguation? Michael Leary
@ 2001-09-10 15:31 ` Xavier Leroy
From: Xavier Leroy @ 2001-09-10 15:31 UTC
  To: Michael Leary; +Cc: caml

> since the lexer looks like an ordinary ocaml function (more or less), does
> the disambiguation boil down to:
> 
> 1. the longest series of bytes that matches a single rule
> 2. match the first rule in the function that matches #1

I'm not sure which lexer you're talking about.

Lexers generated by ocamllex do indeed implement the behavior you
describe: longest match + first rule if several rules match the same
maximal-length substring.  (But they sure don't look like ordinary
OCaml functions: they just call an underlying table-driven DFA engine
that does all the hard work!)
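
To make the two rules concrete, here is a minimal sketch in plain OCaml of
longest match with first-rule tie-breaking. It only mimics the observable
behavior; it is not how ocamllex is implemented (there are no DFA tables
here), and the names rule, keyword, one_or_more, next_token and tokenize are
hypothetical, chosen for illustration.

```ocaml
(* Each rule reports the length of the longest prefix it matches at [pos],
   or None if it does not match there. All names are illustrative. *)
type rule = { name : string; matches : string -> int -> int option }

(* Length of the longest run of characters satisfying [p], starting at [pos]. *)
let run_length p s pos =
  let n = String.length s in
  let rec go i = if i < n && p s.[i] then go (i + 1) else i - pos in
  go pos

let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* A keyword rule: matches exactly the literal [kw]. *)
let keyword kw =
  { name = "KW_" ^ kw;
    matches = (fun s pos ->
      let k = String.length kw in
      if pos + k <= String.length s && String.sub s pos k = kw
      then Some k else None) }

(* A character-class rule: one or more characters satisfying [p]. *)
let one_or_more name p =
  { name;
    matches = (fun s pos ->
      match run_length p s pos with 0 -> None | k -> Some k) }

(* Rule order matters only for ties, exactly as in the description above. *)
let rules = [
  keyword "let";
  one_or_more "IDENT" is_alpha;
  one_or_more "SPACE" (fun c -> c = ' ');
]

(* Longest match wins; on a tie the earlier rule wins, because the current
   best is replaced only by a strictly longer match. *)
let next_token s pos =
  List.fold_left
    (fun best r ->
      match r.matches s pos, best with
      | Some k, Some (_, kb) when k <= kb -> best
      | Some k, _ -> Some (r.name, k)
      | None, _ -> best)
    None rules

(* Repeatedly take the next token, restarting right after the matched text. *)
let tokenize s =
  let rec go pos acc =
    if pos >= String.length s then List.rev acc
    else match next_token s pos with
      | Some (name, k) -> go (pos + k) ((name, String.sub s pos k) :: acc)
      | None -> failwith (Printf.sprintf "no rule matches at offset %d" pos)
  in
  go 0 []
```

With these rules, tokenize "let letter" yields KW_let "let", SPACE " ",
IDENT "letter": on the first word both the keyword and the identifier rule
match three characters, so the earlier rule wins; on the second word the
identifier rule matches six characters and wins by length. Note that the
lexer does not try "s", "so", "som", ... as separate matches; each rule is
asked once for its longest match at the current position.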

Lexers written using stream parsers behave like all stream parsers:
they select the first pattern that matches the beginning of the
stream, then "commit" to this pattern, matching the remainder of the
pattern without backtracking.  This "commit" behavior is different
from regular pattern-matching on (say) lists, which backtracks as
necessary.
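
The difference can be sketched without the stream parser syntax extension.
The code below hand-writes what a two-case stream parser would roughly do,
next to an ordinary pattern match on lists. The cursor type and the
exceptions Parse_failure and Parse_error are hypothetical stand-ins (modelled
loosely on Stream.Failure and Stream.Error), not the real Stream API.

```ocaml
(* Ordinary pattern matching on lists: if the first case fails on the
   second element, the second case is still tried. *)
let classify_list = function
  | 'a' :: 'b' :: _ -> "ab"
  | 'a' :: 'c' :: _ -> "ac"
  | _ -> "other"

(* Hypothetical stand-ins for the two ways a stream parser can fail. *)
exception Parse_failure  (* no case matched the first element *)
exception Parse_error    (* a case was selected, then failed later *)

(* A tiny consumable stream: elements are removed as they are matched. *)
type cursor = { mutable rest : char list }

let peek c = match c.rest with [] -> None | x :: _ -> Some x
let junk c = match c.rest with [] -> () | _ :: tl -> c.rest <- tl

(* Hand-written analogue of a parser with the cases "a then b" and
   "a then c". The first element selects the first case; after that the
   'a' is consumed and there is no way back to the second case. *)
let classify_committed c =
  match peek c with
  | Some 'a' ->
    junk c;                          (* commit: 'a' is gone for good *)
    (match peek c with
     | Some 'b' -> junk c; "ab"
     | _ -> raise Parse_error)       (* "ac" ends up here *)
  | _ -> raise Parse_failure
```

On the input a, c the list version returns "ac", while the committed version
raises Parse_error: the second case is never reached. In a stream-based lexer
one therefore factors the common prefix by hand, consuming 'a' once and then
dispatching on the next character.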

The OCaml lexer (used by the compilers and the toplevel), as well as
the generic lexer in module Genlex, also implement the longest-match
rule, so that for instance abcd is one identifier, not four
identifiers a, b, c, and d.

I hope this answers your question.

- Xavier Leroy