From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=1.0 required=5.0 tests=AWL,SPF_FAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 1609DBC37 for ; Wed, 1 Jul 2009 17:03:57 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvoBAPsXS0pQW+UCe2dsb2JhbACZCAEBFiQEtl+EEAU X-IronPort-AV: E=Sophos;i="4.42,324,1243807200"; d="scan'208";a="42759971" Received: from main.gmane.org (HELO ciao.gmane.org) ([80.91.229.2]) by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/AES256-SHA; 01 Jul 2009 17:03:40 +0200 Received: from list by ciao.gmane.org with local (Exim 4.43) id 1MM1LX-00025V-6q for caml-list@inria.fr; Wed, 01 Jul 2009 15:03:39 +0000 Received: from ks300734.kimsufi.com ([91.121.65.225]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 01 Jul 2009 15:03:39 +0000 Received: from sylvain by ks300734.kimsufi.com with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 01 Jul 2009 15:03:39 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: caml-list@inria.fr From: Sylvain Le Gall Subject: Re: ocamllex and python-style indentation Date: Wed, 1 Jul 2009 15:03:20 +0000 (UTC) Message-ID: References: <7d8707de0906110557n6a1511a2k9f4f00827f954cb6@mail.gmail.com> <4A310A5B.9010404@ens-lyon.org> <7d8707de0906120120x10cc8fe0p54adbd189003f3da@mail.gmail.com> <1750DDE9-4FC6-4657-9687-58240F520A70@mpi-sws.org> <2a1a1a0c0906301913t1dce3eebie91c7cc94f015481@mail.gmail.com> <2a1a1a0c0907010702u19ad7ba6kd831a71fa1b7f2a5@mail.gmail.com> <4A4B700D.3060701@mpi-sws.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: ks300734.kimsufi.com User-Agent: slrn/pre1.0.0-2 (Linux) Sender: news X-Spam: no; 0.00; le-gall:01 ocamllex:01 rossberg:01 rossberg:01 lexer:01 lexbuf:01 lexbuf:01 decr:01 ocaml:01 wrote:01 wrote:01 compilers:01 andreas:01 parentheses:01 strings:01 Hello, On 01-07-2009, Andreas Rossberg wrote: > Mike Lin wrote: >> OK, now I'm curious :) how does your lexer match balanced parentheses, >> or in this case comments? >> > > Easily, with a bit of side effects (I think that's roughly how all ML > compilers do it): > > ------------------------------------------------ > let error l s = (* ... *) > let commentDepth = ref 0 > let start = ref 0 > let loc length = let pos = !start in (pos, pos+length) > > rule lex = > parse eof { EOF } > (* | ... *) > | "{-" { start := pos lexbuf; > lexNestComment lexbuf } > > and lexNestComment = > parse eof { error (loc 2) "unterminated comment" } > | "(*" { incr commentDepth; > lexNestComment lexbuf } > | "*)" { decr commentDepth; > if !commentDepth > 0 > then lexNestComment lexbuf > else lex lexbuf } > | _ { lexNestComment lexbuf } > ------------------------------------------------ > > If you also want to treat strings in comments specially (like OCaml), > then you need to do a bit more work, but it's basically the same idea. > May I recommend you to write this in a more simple way: ------------------------------------------------------------------------- rule lex = parse eof { () } | "(*" { start := pos lexbuf; lexNestComment lexbuf; lex lexbuf } and lexNestComment = parse eof { error (loc 2) "unterminated comment" } | "(*" { lexNestComment lexbuf } | "*)" { () } | _ { lexNestComment lexbuf } ------------------------------------------------------------------------- I think it works the same way, except that it uses less global variables. Regards, Sylvain Le Gall