From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.6 required=5.0 tests=AWL,DNS_FROM_RFC_ABUSE, DNS_FROM_RFC_POST autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 21D4DBBAF for ; Sun, 8 Mar 2009 14:33:34 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AsACACdms0nZkrEHmWdsb2JhbACVKgEBAQEBCAsKBxGuEIEHjjsBBIQF X-IronPort-AV: E=Sophos;i="4.38,324,1233529200"; d="scan'208";a="24017856" Received: from web27007.mail.ukl.yahoo.com ([217.146.177.7]) by mail3-smtp-sop.national.inria.fr with SMTP; 08 Mar 2009 14:33:33 +0100 Received: (qmail 5368 invoked by uid 60001); 8 Mar 2009 13:33:32 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.fr; s=s1024; t=1236519212; bh=tbAk1MUdKvKnSGhXbrINQ32HGUHRqrzOXV0OdK0B2E0=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:References:Date:From:Subject:To:Cc:MIME-Version:Content-Type:Content-Transfer-Encoding; b=siiMZX3qqJN+hxHa6Kj4qqrvSV/fqDpzipLv+MpORFC6nFDBjcB2mZX0FoTDT3S9vr3nf2z/rw/FytPIUpjBZ/gDK9FiUTadKjaNM06UOwfEf4qLBvg3a94o/gpbuUTWWiVCFqidy8EgLSEFLuJm0mY0wP5OiNEyxm8CTXi+VpA= DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.fr; h=Message-ID:X-YMail-OSG:Received:X-Mailer:References:Date:From:Subject:To:Cc:MIME-Version:Content-Type:Content-Transfer-Encoding; b=4A6NJTS6TnWpHtOXhpKWb86oXHmGG/jq+YQslKga/pJJmzo1/3GoG/GicphrttGysfPPg+xKumRLM6LfhdPKmWQZWt+ot0bsTvBzKQ7rQaNg6TNIOVE3moG9UULPbJ6rJmfKnpEjIeh6mK4+aHwTEIcYMJtkmwgmd2sU5omHDxM=; Message-ID: <843379.3763.qm@web27007.mail.ukl.yahoo.com> X-YMail-OSG: tdLW8CoVM1mj8BEegzUGRAcnpZfdkJWRw4jlNfjwMO03G6Zaa1G_PUJpkpYQC.Xsv1..vgD6ihTloOVfBibQ5xa2i.FWXDsz2Z3Sr4SZ09nXg_m0Lt8GzaCE6_3plUkL2oqw1p3iAYhJxuqcNVKrmxON2e_PTIX1Utr9iFJzk1cgJT0Z2z8brG67TnbuIJTEsBOSu1J7sQ8B.uViaBhZxE8sog-- Received: from [82.242.132.106] by web27007.mail.ukl.yahoo.com via HTTP; Sun, 08 Mar 2009 13:33:32 GMT X-Mailer: YahooMailRC/1155.45 YahooMailWebService/0.7.289.1 References: <24D11586-4F15-4B6E-8FB7-58651317164D@gmail.com> <46331.52510.qm@web27007.mail.ukl.yahoo.com> <0B508092-FD71-4733-BC95-B6B87A6D3E6B@gmail.com> <154139.25342.qm@web27007.mail.ukl.yahoo.com> <46FCBABD-7E4A-4077-8227-3816FD6D635D@gmail.com> <279881.54090.qm@web27008.mail.ukl.yahoo.com> <58D957FE-F549-4B2F-9794-6A6651A20A29@gmail.com> Date: Sun, 8 Mar 2009 13:33:32 +0000 (GMT) From: Matthieu Wipliez Subject: Re : Re : Re : Re : [Caml-list] Re: camlp4 stream parser syntax To: Joel Reymont Cc: O'Caml Mailing List MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; matthieu:01 camlp:01 parser:01 syntax:01 hash:01 lowercase:01 camlp:01 parser:01 kwd:01 lexer:01 lowercase:01 parses:01 hash:01 preprocess:01 tokens:01 > > In this case, here is a possible solution, you have your hash table ass= ociate =0A> a lowercase version of the token with what you'd like to use in= the grammar:=0A> > "buytocover" =3D> "BuyToCover"=0A> > "sellshort" =3D> "= SellShort"=0A> > ...=0A> =0A> =0A> I'm doing this already but I don't think= it will do the trick with a camlp4 =0A> parser since it goes through is_kw= d to find a match when you use "delay".=0A=0AI've just tested the idea with= my lexer, in the rule identifier:=0A | identifier as ident {=0A if Str= ing.lowercase ident =3D "action" then=0A IDENT "ActioN"=0A else=0A = IDENT ident=0A=0Areplacing entries in the grammar that match against "= action" so they match against "ActioN".=0A=0AIn the source code, I have=0Ar= eload: ActIon in8:[i]=0Ashift: acTIon=0A=0AAnd Camlp4 parses it correctly. = I have a tentative explanation as why it works below:=0A=0A> I think that t= he internal keyword hash table in the grammar needs to be =0A> populated wi= th lowercase keywords (by invoking 'using'). I don't know how to get =0A> t= o the 'using' function yet, though.=0A=0AI don't think so, here is what hap= pens:=0A 1) you preprocess your grammar with camlp4of. This transforms the= EXTEND statements (and a lot of other stuff) to calls to Camlp4 modules/fu= nctions.=0AThe grammar parser is in the Camlp4GrammarParser module.=0AIn th= e rule "symbol", the entry | s =3D STRING -> matches strings (literal token= s) and produces a TXkwd s.=0AThis is later transformed by make_expr to an e= xpression Camlp4Grammar__.Skeyword s (quotation <:expr< $uid:gm$.Skeyword $= str:kwd$ >>)=0AWhat this means is that at compile time an entry=0A my_rule= : [ [ "BuyOrSell"; .. ] ]=0Agets transformed to an AST node=0A Skeyword "= BuyOrSell"=0A=0AYou can see that by running "camlp4of" on the parser. Every= rule gets transformed to a call to Gram.extend function, with Gram.Sopt, G= ram.Snterm, Gram.Skeyword etc.=0A=0A 2) At runtime, when you start your pr= ogram, all the Gram.extend calls are executed (because they are top-level).= Your parser is kind of configured.=0AIt turns out that extend is just a sy= nonym for Insert.extend=0A (last line of Static module)=0A=0A value exten= d =3D Insert.extend=0A=0AThis function will insert rules and tokens into Ca= mlp4. The insert_tokens function tells us that whenever a Skeyword is seen,= "using gram kwd" is called.=0AI believe this is the function you're referr= ing to?=0A=0AThis function calls Structure.using, which basically add a key= word if necessary, and increase its reference count. (I think this is to au= tomatically remove unused keywords, remember that Camlp4 can also delete ru= les, not only insert them).=0A=0A=0A=0ASo to sum up: when you declare a rul= e with a token "MyToken", the grammar is configured to recognize a "MyToken= " keyword.=0A=0ANow the lexer produces IDENT (or SYMBOL for that matters). = SYMBOLs are KEYWORDs by default. IDENTs become KEYWORDs if they match the k= eyword content.=0A=0ASo in our case, the lexer recognizes identifiers. If t= his identifier equals (case-insensitively speaking) "mytoken", we declare a= n IDENT "MyToken", which will be later recognized as the "MyToken" keyword = (because the is_kwd test is case-sensitive).=0A=0ACheers,=0AMatthieu=0A=0A>= =0A> Thanks, Joel=0A> =0A> ---=0A> http://tinyco.de=0A> Mac, C++, OCam= l=0A=0A=0A=0A