caml-list - the Caml user's mailing list
* [Caml-list] Menhir grammar with sequences delimited by same token
@ 2016-05-08  9:33 Dario Teixeira
  2016-05-08 10:27 ` Jacques-Henri Jourdan
  2016-05-08 13:35 ` Allan Wegan
  0 siblings, 2 replies; 8+ messages in thread
From: Dario Teixeira @ 2016-05-08  9:33 UTC
  To: caml-list

Hi,

(Sending this to Caml-list because Menhir-list is currently down.)

I've come across an interesting parsing problem, one for which I
wonder if there is a succinct solution in Menhir.  Suppose I want
to parse a markup language that uses the same token for delimiting *both*
the beginning and the termination of a bold sequence (and likewise
for an emph sequence).  Basically this:

   inline:
     | TEXT               {Ast.Text $1}
     | BOLD inline* BOLD  {Ast.Bold $2}
     | EMPH inline* EMPH  {Ast.Emph $2}


This grammar, of course, has a shift/reduce conflict: if the token stream is
[BOLD; TEXT; BOLD; ...], what should the parser do upon encountering
the second BOLD -- start a new nesting level, or close the current
one?  I can force the latter behaviour by rearranging the grammar
so that an inline sequence within BOLDs cannot contain BOLD itself,
and likewise for EMPH:

   inline:
     | TEXT                        {Ast.Text $1}
     | BOLD inline_sans_bold* BOLD {Ast.Bold $2}
     | EMPH inline_sans_emph* EMPH {Ast.Emph $2}

   inline_sans_bold:
     | TEXT                        {Ast.Text $1}
     | EMPH inline_sans_emph* EMPH {Ast.Emph $2}

   inline_sans_emph:
     | TEXT                        {Ast.Text $1}
     | BOLD inline_sans_bold* BOLD {Ast.Bold $2}
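
To make the conflict and its resolution concrete, here is a small brute-force enumerator in plain OCaml. It is not Menhir and not an LR parser; it simply lists every parse that the first grammar admits for one sample token stream, then keeps only the trees allowed by the inline_sans_bold / inline_sans_emph restriction. The token and AST constructors are hypothetical stand-ins for the post's TEXT/BOLD/EMPH tokens and Ast.Text/Ast.Bold/Ast.Emph.

```ocaml
(* Hypothetical token and AST types mirroring the grammar above. *)
type token = TEXT of string | BOLD | EMPH

type inline = Text of string | Bold of inline list | Emph of inline list

(* All ways to parse one [inline] at the front of [toks], paired with the
   leftover tokens. Follows the first grammar: BOLD inline* BOLD, etc. *)
let rec one toks =
  match toks with
  | TEXT s :: rest -> [ (Text s, rest) ]
  | BOLD :: rest -> delimited BOLD (fun l -> Bold l) rest
  | EMPH :: rest -> delimited EMPH (fun l -> Emph l) rest
  | [] -> []

(* Parse [inline*], then require the closing delimiter [close]. *)
and delimited close mk toks =
  List.filter_map
    (fun (body, rest) ->
      match rest with
      | t :: rest' when t = close -> Some (mk body, rest')
      | _ -> None)
    (many toks)

(* All ways to parse [inline*] at the front of [toks], including the
   empty sequence. *)
and many toks =
  ([], toks)
  :: List.concat_map
       (fun (i, rest) ->
         List.map (fun (is, rest') -> (i :: is, rest')) (many rest))
       (one toks)

(* Complete parses: those that consume the whole stream. *)
let parses toks =
  List.filter_map
    (fun (is, rest) -> if rest = [] then Some is else None)
    (many toks)

(* The restriction the inline_sans_* rules encode: no Bold directly
   inside a Bold, no Emph directly inside an Emph. *)
let rec ok = function
  | Text _ -> true
  | Bold l -> List.for_all (function Bold _ -> false | i -> ok i) l
  | Emph l -> List.for_all (function Emph _ -> false | i -> ok i) l

let () =
  let toks = [ BOLD; TEXT "a"; BOLD; TEXT "b"; BOLD; BOLD ] in
  let all = parses toks in
  (* Two parses: [Bold [a]; b; Bold []] and [Bold [a; Bold [b]]]. *)
  Printf.printf "unrestricted parses: %d\n" (List.length all);
  Printf.printf "restricted parses: %d\n"
    (List.length (List.filter (List.for_all ok) all))
```

The enumeration is exponential in the worst case, so it is only a way to see the ambiguity; the restricted grammar keeps just the "close the current one" reading.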


For this simple example, rearranging the grammar is feasible, but it
blows up into silliness for a real-world case where, besides BOLD and
EMPH, I have many other similar tokens.  Does Menhir offer a more
succinct solution to this problem?  (I reckon it involves the priority
mechanism somehow, but exactly how eludes me.)
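
For comparison, one workaround outside the grammar is to merge all delimiter tokens into a single token that carries a style, and resolve nesting with a small stack using the same rule the inline_sans_* rules encode: a delimiter closes the innermost open span if that span has the same style, and opens a new span otherwise. This is a hand-written OCaml sketch, not a Menhir feature; the `style`, `DELIM`, `Styled` and `Unclosed` names are hypothetical.

```ocaml
(* Hypothetical: one constructor per delimiter kind; add more as needed. *)
type style = Bold | Emph | Code | Strike

type token = TEXT of string | DELIM of style

type inline = Text of string | Styled of style * inline list

exception Unclosed of style

(* Stack-based parser. A DELIM closes the innermost open span iff that
   span has the same style; otherwise it opens a new span. *)
let parse (toks : token list) : inline list =
  (* [stack]: open spans, each with its style and the parent's children
     (reversed) saved when it was opened. [acc]: the current span's
     children so far, reversed. *)
  let rec go stack acc toks =
    match toks, stack with
    | [], [] -> List.rev acc
    | [], (s, _) :: _ -> raise (Unclosed s)
    | TEXT t :: rest, _ -> go stack (Text t :: acc) rest
    | DELIM d :: rest, (s, outer) :: stack' when s = d ->
        (* Close the innermost span and resume its parent. *)
        go stack' (Styled (s, List.rev acc) :: outer) rest
    | DELIM d :: rest, _ ->
        (* Open a new span, saving the current children on the stack. *)
        go ((d, acc) :: stack) [] rest
  in
  go [] [] toks

let () =
  (* Bold inside Emph inside Bold is allowed, as in the rearranged grammar. *)
  match
    parse [ DELIM Bold; DELIM Emph; DELIM Bold; TEXT "x";
            DELIM Bold; DELIM Emph; DELIM Bold ]
  with
  | [ Styled (Bold, [ Styled (Emph, [ Styled (Bold, [ Text "x" ]) ]) ]) ] ->
      print_endline "nested ok"
  | _ -> print_endline "unexpected"
```

With this design, adding a new delimiter means adding one `style` constructor rather than a new family of nonterminals.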

Thanks in advance for your time!
Best regards,
Dario Teixeira




Thread overview: 8+ messages
2016-05-08  9:33 [Caml-list] Menhir grammar with sequences delimited by same token Dario Teixeira
2016-05-08 10:27 ` Jacques-Henri Jourdan
2016-05-08 11:57   ` Sébastien Hinderer
2016-05-08 14:16     ` Dario Teixeira
2016-05-08 13:43   ` Dario Teixeira
2016-05-08 21:29     ` Jacques-Henri Jourdan
2016-05-08 13:35 ` Allan Wegan
2016-05-08 14:19   ` Dario Teixeira
