Re: The lexer hack

caml-list - the Caml user's mailing list

* Re: The lexer hack
From: Jeff Shaw @ 2009-11-10 15:26 UTC
  To: caml-list

Dario,
You could write your lexers in Menhir and make them part of your 
grammar. I know this isn't a terribly easy solution but it would be 
elegant IMO.

* Re: [Caml-list] Re: The lexer hack
From: Dario Teixeira @ 2009-11-10 16:26 UTC
  To: caml-list, Jeff Shaw

Hi,

> You could write your lexers in Menhir and make them part of
> your grammar. I know this isn't a terribly easy solution but
> it would be elegant IMO.

I thought of scannerless parsing, but the input is UTF8 encoded,
which makes using Ulex all the more convenient and less error-prone.

Anyway, I was looking at Dypgen's early actions, and an idea
occurred to me: I can create dummy empty actions that simply
change a global "parsing context" variable:

inline:
  | (...)
  | BEGIN_VERB enter_verb RAW END_VERB exit_verb {Ast.Verbatim $3}
  | (...)

enter_verb:  /* empty */  {Global.context := Global.Verbatim}
exit_verb:   /* empty */  {Global.context := Global.General}


Still hackish, but better than creating a state machine...

Cheers,
Dario Teixeira

* Re: [Caml-list] Re: The lexer hack
From: Francois Pottier @ 2009-11-10 16:33 UTC
  To: Dario Teixeira; +Cc: caml-list, Jeff Shaw

Hello,

On Tue, Nov 10, 2009 at 08:26:23AM -0800, Dario Teixeira wrote:
> Anyway, I was looking at Dypgen's early actions, and an idea
> occurred to me: I can create dummy empty actions that simply
> change a global "parsing context" variable:
> 
> inline:
>   | (...)
>   | BEGIN_VERB enter_verb RAW END_VERB exit_verb {Ast.Verbatim $3}
>   | (...)
> 
> enter_verb:  /* empty */  {Global.context := Global.Verbatim}
> exit_verb:   /* empty */  {Global.context := Global.General}
> 
> 
> Still hackish, but better than creating a state machine...

Interesting. Have you confirmed that this works? I am slightly worried by the
fact that an LR parser reads one token ahead, i.e. one token past BEGIN_VERB
might already have been read before the enter_verb semantic action is
executed. If that is so, then this token would be read while the lexer is
still in the wrong mode.
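
A toy model makes the timing issue concrete. This is a sketch under stated assumptions, not a model of Dypgen's or Menhir's actual driver: `lex` is a hypothetical mode-dependent lexer over pre-split input chunks, and `token_after_begin` returns the token that follows BEGIN_VERB, depending on whether that token is fetched before or after the `enter_verb` action switches the mode.

```ocaml
type mode = General | Verbatim
type token = BEGIN_VERB | END_VERB | WORD of string | RAW of string | EOF

(* Toy lexer over pre-split chunks: the same chunk yields a different
   token depending on the current mode. *)
let lex mode input =
  match !input with
  | [] -> EOF
  | chunk :: rest ->
    input := rest;
    (match !mode, chunk with
     | _, "\\end{verbatim}" -> END_VERB
     | General, "\\begin{verbatim}" -> BEGIN_VERB
     | General, w -> WORD w
     | Verbatim, r -> RAW r)

(* Return the token that follows BEGIN_VERB.  With [~lookahead_first:true]
   the next token is fetched before the enter_verb action switches the
   mode, as happens when the parser must see its lookahead token before
   it can reduce the empty enter_verb production. *)
let token_after_begin ~lookahead_first chunks =
  let mode = ref General and input = ref chunks in
  (match lex mode input with
   | BEGIN_VERB -> ()
   | _ -> invalid_arg "token_after_begin: expected BEGIN_VERB");
  if lookahead_first then begin
    let next = lex mode input in  (* scanned while still in General mode *)
    mode := Verbatim;             (* enter_verb runs too late *)
    next
  end else begin
    mode := Verbatim;             (* enter_verb runs before the next read *)
    lex mode input
  end

let chunks = ["\\begin{verbatim}"; "x_1 % y"; "\\end{verbatim}"]
```

With `~lookahead_first:true` the chunk after BEGIN_VERB comes back as a `WORD`, because it was scanned in General mode; only when the action runs first does it become a `RAW` token.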

-- 
François Pottier
Francois.Pottier@inria.fr
http://gallium.inria.fr/~fpottier/

* Re: [Caml-list] Re: The lexer hack
From: Dario Teixeira @ 2009-11-10 16:48 UTC
  To: Francois.Pottier; +Cc: caml-list, Jeff Shaw

Hi,

> Interesting. Have you confirmed that this works? I am slightly
> worried by the fact that an LR parser reads one token ahead,
> i.e. one token past BEGIN_VERB might already have been read
> before the enter_verb semantic action is executed. If that is
> so, then this token would be read while the lexer is still in
> the wrong mode.

Yes, I was just thinking about that as well... :-)
I think I can pile another hack on top of the dummy action:
dummy tokens to take care of the readahead issue.  Though
this has the potential to get comically silly pretty quickly!

I'll report later...

cheers,
Dario Teixeira

* Re: [Caml-list] Re: The lexer hack
From: Martin Jambon @ 2009-11-11 11:03 UTC
  To: Dario Teixeira; +Cc: Francois.Pottier, Jeff Shaw, caml-list

Dario Teixeira wrote:
> Hi,
> 
>> Interesting. Have you confirmed that this works? I am slightly
>> worried by the fact that an LR parser reads one token ahead,
>> i.e. one token past BEGIN_VERB might already have been read
>> before the enter_verb semantic action is executed. If that is
>> so, then this token would be read while the lexer is still in
>> the wrong mode.
> 
> Yes, I was just thinking about that as well... :-)
> I think I can pile another hack on top of the dummy action:
> dummy tokens to take care of the readahead issue.  Though
> this has the potential to get comically silly pretty quickly!
> 
> I'll report later...

If the lexer to use can be determined by only one token (BEGIN_VERB), I think
you can change the state in the lexer like this:

rule token state = parse
 ""   { match !state with
             `Normal -> normal_token state lexbuf
           | `Verbatim -> verbatim_token state lexbuf
      }

and normal_token state = parse
  ...
| "\\begin{verbatim}"   { state := `Verbatim; BEGIN_VERB }

and verbatim_token state = parse
  ...                  { RAW (...) }
| "\\end{verbatim}"    { state := `Normal; END_VERB }
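
The same idea can be tried outside ocamllex. Below is a minimal plain-OCaml sketch, assuming a hand-written scanner over a string; the word and whitespace rules, the `starts_at` helper and the `WORD` token are illustrative assumptions, and the elided `...` cases of the rules above are not reconstructed. As in the ocamllex version, the lexer switches its own state when it emits BEGIN_VERB or END_VERB.

```ocaml
type state = [ `Normal | `Verbatim ]
type token = BEGIN_VERB | END_VERB | WORD of string | RAW of string | EOF

let begin_verb = "\\begin{verbatim}"
let end_verb = "\\end{verbatim}"

(* Does [lit] occur in [s] at offset [i]? *)
let starts_at s i lit =
  let n = String.length lit in
  i + n <= String.length s && String.sub s i n = lit

(* Dispatch on the current state, like the empty-pattern rule above. *)
let rec token (state : state ref) s pos =
  match !state with
  | `Normal -> normal_token state s pos
  | `Verbatim -> verbatim_token state s pos

(* General scanner; switches to verbatim mode when it emits BEGIN_VERB. *)
and normal_token state s pos =
  let len = String.length s in
  while !pos < len && s.[!pos] = ' ' do incr pos done;
  if !pos >= len then EOF
  else if starts_at s !pos begin_verb then begin
    pos := !pos + String.length begin_verb;
    state := `Verbatim;
    BEGIN_VERB
  end else begin
    (* a word runs up to the next space or backslash *)
    let start = !pos in
    incr pos;
    while !pos < len && s.[!pos] <> ' ' && s.[!pos] <> '\\' do incr pos done;
    WORD (String.sub s start (!pos - start))
  end

(* Verbatim scanner; switches back when it emits END_VERB. *)
and verbatim_token state s pos =
  let len = String.length s in
  if !pos >= len then EOF
  else if starts_at s !pos end_verb then begin
    pos := !pos + String.length end_verb;
    state := `Normal;
    END_VERB
  end else begin
    let start = !pos in
    while !pos < len && not (starts_at s !pos end_verb) do incr pos done;
    RAW (String.sub s start (!pos - start))
  end
```

Because the state changes at the moment BEGIN_VERB is produced, the following token is scanned in verbatim mode whether or not the parser has already requested it as lookahead.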



An even simpler option, if possible in your case, is to use a single token for
the whole verbatim section:

rule token = parse
  ...
| "\\begin{verbatim}"   { finish_verbatim lexbuf }

and finish_verbatim = shortest
  _* as s "\\end{verbatim}"   { RAW s }
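
A plain-OCaml sketch of the single-token variant, assuming the same kind of hand-written string scanner (the `token`, `starts_at` and `WORD` names are hypothetical, not generated by ocamllex). `finish_verbatim` stops at the first occurrence of `\end{verbatim}`, which is what the `shortest` keyword achieves in the rule above; under the default longest-match semantics, `_* "\\end{verbatim}"` would instead run to the last occurrence in the input.

```ocaml
type token = WORD of string | RAW of string | EOF

let begin_verb = "\\begin{verbatim}"
let end_verb = "\\end{verbatim}"

(* Does [lit] occur in [s] at offset [i]? *)
let starts_at s i lit =
  let n = String.length lit in
  i + n <= String.length s && String.sub s i n = lit

(* Called just after \begin{verbatim} has been consumed: everything up to
   the FIRST \end{verbatim} becomes one RAW token, and the terminator is
   skipped. *)
let finish_verbatim s pos =
  let len = String.length s in
  let stop = ref !pos in
  while !stop < len && not (starts_at s !stop end_verb) do incr stop done;
  if !stop >= len then failwith "unterminated verbatim block";
  let raw = String.sub s !pos (!stop - !pos) in
  pos := !stop + String.length end_verb;
  RAW raw

let token s pos =
  let len = String.length s in
  while !pos < len && s.[!pos] = ' ' do incr pos done;
  if !pos >= len then EOF
  else if starts_at s !pos begin_verb then begin
    pos := !pos + String.length begin_verb;
    finish_verbatim s pos
  end else begin
    (* a word runs up to the next space or backslash *)
    let start = !pos in
    incr pos;
    while !pos < len && s.[!pos] <> ' ' && s.[!pos] <> '\\' do incr pos done;
    WORD (String.sub s start (!pos - start))
  end
```

Since the whole block reaches the parser as one RAW token, there is no mode switch for a lookahead token to miss.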




Martin

-- 
http://mjambon.com/

* Re: [Caml-list] Re: The lexer hack
From: Dario Teixeira @ 2009-11-14 18:19 UTC
  To: Martin Jambon; +Cc: caml-list

Hi,

> If the lexer to use can be determined by only one token
> (BEGIN_VERB), I think you can change the state in the lexer
> like this:

Unfortunately the language features some verbatim-like environments
where choosing the right scanner entails knowing more about the
context than what is available to the lexer.  As an example, in
the command \link{url}{text}, the "url" should be interpreted
verbatim, whereas the "text" portion uses the general scanner.
The parser is fully aware of the context, of course, which is
why the "dummy action" solution worked fine.
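
When only the parser knows which scanner applies, one alternative (a swapped-in technique, not the Dypgen setup used in this thread) is a recursive-descent parser that calls the scanner it needs and fetches each token only on demand, so no buffered lookahead can be scanned in the wrong mode. A minimal sketch for `\link{url}{text}`, assuming hypothetical helpers `scan_verbatim` and `scan_word` and that a URL never contains `}`:

```ocaml
type inline = Link of string * string list

(* Consume the character [c] at [!pos] or fail. *)
let expect s pos c =
  if !pos < String.length s && s.[!pos] = c then incr pos
  else failwith (Printf.sprintf "expected '%c' at offset %d" c !pos)

(* Verbatim scanner: everything up to the next closing brace, untouched. *)
let scan_verbatim s pos =
  let start = !pos in
  while !pos < String.length s && s.[!pos] <> '}' do incr pos done;
  String.sub s start (!pos - start)

(* General scanner: the next space-separated word, or None at a brace. *)
let scan_word s pos =
  let len = String.length s in
  while !pos < len && s.[!pos] = ' ' do incr pos done;
  if !pos >= len || s.[!pos] = '}' then None
  else begin
    let start = !pos in
    while !pos < len && s.[!pos] <> ' ' && s.[!pos] <> '}' do incr pos done;
    Some (String.sub s start (!pos - start))
  end

(* \link{url}{text}: the parser picks the scanner for each argument. *)
let parse_link s pos =
  let kw = "\\link" in
  let n = String.length kw in
  if !pos + n > String.length s || String.sub s !pos n <> kw then
    failwith "expected \\link";
  pos := !pos + n;
  expect s pos '{';
  let url = scan_verbatim s pos in          (* url: verbatim scanner *)
  expect s pos '}';
  expect s pos '{';
  let rec words acc =                        (* text: general scanner *)
    match scan_word s pos with
    | Some w -> words (w :: acc)
    | None -> List.rev acc
  in
  let text = words [] in
  expect s pos '}';
  Link (url, text)
```

Characters such as `_` and `%` survive untouched in the URL, while the text argument is split into words by the general scanner.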

(And yes, I thought of having different delimiters for different
scanning contexts, but in many ways it would make the language
more cumbersome for the user).

Cheers,
Dario Teixeira