From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: weis Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA11917 for caml-redistribution; Fri, 3 Dec 1999 19:29:13 +0100 (MET) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id TAA01609 for ; Fri, 3 Dec 1999 19:24:03 +0100 (MET) Received: from ruby (itchy87.zip.com.au [61.8.12.87]) by nez-perce.inria.fr (8.8.7/8.8.7) with ESMTP id TAA28761 for ; Fri, 3 Dec 1999 19:23:58 +0100 (MET) Received: from maxtal.com.au (IDENT:root@localhost [127.0.0.1]) by ruby (8.9.3/8.9.3) with ESMTP id FAA30648; Sat, 4 Dec 1999 05:18:57 +1100 Sender: weis Message-ID: <38480990.4223BF76@maxtal.com.au> Date: Sat, 04 Dec 1999 05:18:56 +1100 From: skaller Organization: Maxtal X-Mailer: Mozilla 4.51 [en] (X11; I; Linux 2.2.12 i586) X-Accept-Language: en MIME-Version: 1.0 To: Hendrik Tews CC: "caml-list@inria.fr" Subject: Re: "nested" parsers References: <38467330.86DA3330@inrets.fr> <199912031331.OAA13008@ithif20.inf.tu-dresden.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hendrik Tews wrote: > > Georges Mariano writes: > Date: Thu, 02 Dec 1999 13:25:04 +0000 > Subject: "nested" parsers > > It appears that we can divide the language L in a few "sub"-languages. > Let's say that L3 <: L2 <: L1 = L ('<:' included in ) > > And we would like to have one entry point for each language > in our parser. > > ocamlyacc generates a parsing function for echa start symbol that > you declare in the grammar file. Therefore the easiest way (if > possible) is that you rewrite your grammar, such that for each of > the languages Ln you have one metasymbol, which generates this > language. Then you include several start directives in the header > of the grammar file. But it isn't clear how to invoke these sublanguages recursively, from with an action associated with a reduce operation for the very sublanguage which we wanted a separate parser for. For example, the Python language can conveniently be divided into two distinct languages: a statement language and an expression language. Certainly, I can have a non-terminal for statements, and one for expressions, and make both entry points --- and in my parser for Python I do just that. But that isn't what I want to do. What I actually want is a recursive transition machine corresponding to a meta-grammar (RHS of productions can be regular expressions; for example, BNF] This can be organised by a finite state automaton in which a non-terminal transition pushes down the whole automaton, and starts a new one corresponding to the RHS of the production defining the non-terminal labelling the arc being followed. Chosing the right arc is the hard bit. If the first sets of all the arcs out of a node are known, the number of trial parses can be reduced -- to one, if the first sets are disjoint. -- John Skaller, mailto:skaller@maxtal.com.au 10/1 Toxteth Rd Glebe NSW 2037 Australia homepage: http://www.maxtal.com.au/~skaller voice: 61-2-9660-0850