From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id A5702BC37 for ; Tue, 10 Nov 2009 21:02:56 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqwFAMNV+UpDww+Ld2dsb2JhbACEco0agQCHQYE5AQwKCQkTsU0ogTmGTohqAQYEAYEtgjhUBA X-IronPort-AV: E=Sophos;i="4.44,718,1249250400"; d="scan'208";a="50128544" Received: from web111502.mail.gq1.yahoo.com ([67.195.15.139]) by mail4-smtp-sop.national.inria.fr with SMTP; 10 Nov 2009 21:02:55 +0100 Received: (qmail 56863 invoked by uid 60001); 10 Nov 2009 20:02:54 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1257883373; bh=w+tXlemizjFZeMbgkN79M7TzLlx/pPWIAYAb90ABOvY=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=r7GINCCL7pmIxzy0SFSkcZ2CYt6F2k3UBCWvje8ldH1TcVuH6zrbQMPVoJQfXOzzcl9PafRuPFD5HXrWQ+Rl4IPUqV4TBSo7LIZ3AFApTWJGEgilPAx1jJb1elKrof1Vpm3fAkIQsmLqAkLRF2MYQdgg2WAzpQe1UCfPXJjU47o= DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=IB6eCUn2VS+gLv1VnmEj6p6a3YGWxkJkyw6+f0pcolBUXGO0RjFdXRIKJLFEtuwKVG9oQ0S8ilMNfGX4x6dA7ATPkOVPQa7og9ojlgzpUV+r+Lj0r4pwCRzgZj+4DTDVbzGi1rJhz8/hnVjrhIA4MxgnSgd8x1zd7GsCSVKOuHs=; Message-ID: <967420.15683.qm@web111502.mail.gq1.yahoo.com> X-YMail-OSG: I0Wf6QAVM1mmtYI0gc3oArsQsX_mNQX1a3hIydU3m0XOaB7FXClhwzA.BDZcGFdxTUb76W8ESe_Tqex1BmDHRrS4bwyMnoC2japyVrL0SRWXUgxDjW2lvnIJAzUPMWdIdBzkRCO.b6PPX1HVvkTPCm2kVw95QmNcBFdqSFA_aa3TRUfGjdgHObHTOz7lmYCzG8Eaz.BU036fG4UGvY6xYsOrM7ZtJ0zmr8lVXS_TIUUkVkfFSylBwVowkS4fuszFwwX6EMHu3M13BvMkYOug1zJLxC_SiQ385mPC7VbsuGnuDNoft954Z9iTUTpOTgJ6Cl0ciSYx Received: from [213.205.70.216] by web111502.mail.gq1.yahoo.com via HTTP; Tue, 10 Nov 2009 12:02:53 PST X-Mailer: YahooMailClassic/8.1.6 YahooMailWebService/0.7.361.4 Date: Tue, 10 Nov 2009 12:02:53 -0800 (PST) From: Dario Teixeira Subject: RE: [Caml-list] The lexer hack To: caml-list@yquem.inria.fr, David Allsopp In-Reply-To: <003001ca622c$41f99840$c5ecc8c0$@metastack.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; lexer:01 expander:01 syntax:01 language's:01 syntax:01 parser:01 lexer:01 parser:01 inserting:01 cheers:01 hacks:01 token:01 token:01 verb:01 verb:01 Hi,=0A=0A> Out of interest, how LaTeX-ish do you mean? I would hazard=0A> a= guess that it's impossible to parse an unrestricted TeX=0A> file using an = LR grammar (or at least no more clear than a=0A> hand-coded automaton) beca= use you have to execute the macro=0A> expander in order to parse the file *= completely* correctly.=0A> However, if you only mean LaTeX-ish in syntax (i= .e. the=0A> files aren't actually TeX files) then you don't have to=0A> wor= ry about TeX's elegant (by which I mean terrifying)=0A> \catcode mechanism = and macro language!=0A=0AI developed the language's syntax in tandem with t= he parser/lexer=0Aso I made sure it was LR-friendly and Ulex-friendly (the = verbatim=0Aenvironments are the only parsing-unfriendly features). The lan= guage=0Alooks and feels like LaTeX, but without the hairy stuff...=0A=0AInc= identally, the dummy token/action trick seems to be working=0Afine with Men= hir. Since the parser will look ahead one token,=0AI just have a tokenizer= sitting between the lexer and the parser,=0Aand inserting a DUMMY token in= to the stream after any token that=0Aprecedes a dummy action:=0A=0Ainline:= =0A | (...)=0A | BEGIN_VERBATIM enter_verb DUMMY RAW exit_verb END_VERBAT= IM {...}=0A | (...)=0A=0Aenter_verb: /*empty*/ {Global.context :=3D Global= .Verbatim}=0Aexit_verb: /*empty*/ {Global.context :=3D Global.General}=0A= =0A=0AIt's not the prettiest thing in the world (and I suspect I might=0Ast= ill find some problem with it), but as far as lexer hacks go=0Ait's not bad= and a lot better than building a state machine.=0A=0ACheers,=0ADario Teixe= ira=0A=0A=0A=0A