From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by yquem.inria.fr (Postfix) with ESMTP id 90DE4BB91 for ; Fri, 28 Jan 2005 10:04:18 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j0S94Idb024596 for ; Fri, 28 Jan 2005 10:04:18 +0100 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id KAA27600 for ; Fri, 28 Jan 2005 10:04:17 +0100 (MET) Received: from ext.lri.fr (ext.lri.fr [129.175.15.4]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j0S94HLH005021 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Fri, 28 Jan 2005 10:04:17 +0100 Received: from localhost (localhost [127.0.0.1]) by ext.lri.fr (Postfix) with ESMTP id 2D58B19E70E; Fri, 28 Jan 2005 10:04:17 +0100 (CET) Received: from ext.lri.fr ([127.0.0.1]) by localhost (ext.lri.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 26181-01; Fri, 28 Jan 2005 10:04:17 +0100 (CET) Received: from smtp.lri.fr (serveur3-5 [129.175.3.5]) by ext.lri.fr (Postfix) with ESMTP id 1A32619E620; Fri, 28 Jan 2005 10:04:17 +0100 (CET) Received: from pc8-142 (pc8-142 [129.175.8.142]) by smtp.lri.fr (Postfix) with ESMTP id CE109CED98; Fri, 28 Jan 2005 10:04:15 +0100 (CET) Received: from filliatr by pc8-142 with local (Exim 3.36 #1 (Debian)) id 1CuS35-0002Ip-00; Fri, 28 Jan 2005 10:04:15 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Message-ID: <16890.15.809674.392193@gargle.gargle.HOWL> Date: Fri, 28 Jan 2005 10:04:15 +0100 To: Erik de Castro Lopo Cc: Subject: Re: [Caml-list] yacc style In-Reply-To: <20050128132812.7001512c.ocaml-erikd@mega-nerd.com> References: <875c7e0705012712177a9e852@mail.gmail.com> <20050128083956.7dc72787.ocaml-erikd@mega-nerd.com> <1106874879.12114.145.camel@pelican.wigram> <20050128132812.7001512c.ocaml-erikd@mega-nerd.com> X-Mailer: VM 7.18 under Emacs 21.3.1 From: Jean-Christophe Filliatre X-Virus-Scanned: by amavisd-new at lri.fr X-Miltered: at nez-perce with ID 41FA0012.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 41FA0011.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 filliatre:01 filliatre:01 lri:01 sourceforge:01 wrote:01 wrote:01 parser:01 parsing:01 lexer:01 tokens:01 pointer:01 casts:01 typedef:01 parser:01 X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.0 X-Spam-Level: Erik de Castro Lopo writes: > skaller wrote: > > On Fri, 2005-01-28 at 08:39, Erik de Castro Lopo wrote: > > > > > > Yes, normally the parser generates a parse tree which is then > > > passed to the semantic analyser for semantic checking. > > > > Unfortunately this is useless in the common case > > of needing to parse C. > > Could you give a example? When parsing C, the lexer must produce different tokens for variables identifiers and types identifiers, otherwise you may misinterpret things like "a * b" (is it the declaration of a pointer b or a multiplication?) or casts. The following piece of code is illustrating the difficulty: ====================================================================== int a, b; typedef int t, u; void f1() { a * b; } void f2() { t * u; } void f3() { t * b; } void f4() { int t; t * b; } void f5(t u, unsigned t) { switch ( t ) { case 0: if ( u ) default: return; } } ====================================================================== The solution is to have the parser modifying the lexer while parsing. This is quite ugly in practice. The CIL framework includes a full C parser written in ocaml, so you can get there one possible way of handling this issue; see http://manju.cs.berkeley.edu/cil/ Hope this helps, -- Jean-Christophe Filliātre (http://www.lri.fr/~filliatr)