From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gclci-caml-list@m.gmane.org>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=1.0 required=5.0 tests=AWL,SPF_FAIL 
	autolearn=disabled version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105])
	by yquem.inria.fr (Postfix) with ESMTP id 1609DBC37
	for <caml-list@yquem.inria.fr>; Wed,  1 Jul 2009 17:03:57 +0200 (CEST)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AvoBAPsXS0pQW+UCe2dsb2JhbACZCAEBFiQEtl+EEAU
X-IronPort-AV: E=Sophos;i="4.42,324,1243807200"; 
   d="scan'208";a="42759971"
Received: from main.gmane.org (HELO ciao.gmane.org) ([80.91.229.2])
  by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/AES256-SHA; 01 Jul 2009 17:03:40 +0200
Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1MM1LX-00025V-6q
	for caml-list@inria.fr; Wed, 01 Jul 2009 15:03:39 +0000
Received: from ks300734.kimsufi.com ([91.121.65.225])
        by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
        id 1AlnuQ-0007hv-00
        for <caml-list@inria.fr>; Wed, 01 Jul 2009 15:03:39 +0000
Received: from sylvain by ks300734.kimsufi.com with local (Gmexim 0.1 (Debian))
        id 1AlnuQ-0007hv-00
        for <caml-list@inria.fr>; Wed, 01 Jul 2009 15:03:39 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: caml-list@inria.fr
From: Sylvain Le Gall <sylvain@le-gall.net>
Subject:  Re: ocamllex and python-style indentation
Date: Wed, 1 Jul 2009 15:03:20 +0000 (UTC)
Message-ID:  <slrnh4mulo.hil.sylvain@gallu.homelinux.org>
References:  <7d8707de0906110557n6a1511a2k9f4f00827f954cb6@mail.gmail.com>
 <4A310A5B.9010404@ens-lyon.org>
 <7d8707de0906120120x10cc8fe0p54adbd189003f3da@mail.gmail.com>
 <FBA1153F-776B-47FF-B267-22504D045671@mpi-sws.org>
 <E47AC31E-BF02-4440-A0BD-EB4B2D90182A@research.att.com>
 <1750DDE9-4FC6-4657-9687-58240F520A70@mpi-sws.org>
 <2a1a1a0c0906301913t1dce3eebie91c7cc94f015481@mail.gmail.com>
 <C9178E58-587B-47CA-A649-4019AB230FAD@mpi-sws.org>
 <2a1a1a0c0907010702u19ad7ba6kd831a71fa1b7f2a5@mail.gmail.com>
 <4A4B700D.3060701@mpi-sws.org>
Mime-Version:  1.0
Content-Type:  text/plain; charset=us-ascii
Content-Transfer-Encoding:  7bit
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: ks300734.kimsufi.com
User-Agent: slrn/pre1.0.0-2 (Linux)
Sender: news <news@ger.gmane.org>
X-Spam: no; 0.00; le-gall:01 ocamllex:01 rossberg:01 rossberg:01 lexer:01 lexbuf:01 lexbuf:01 decr:01 ocaml:01 wrote:01 wrote:01 compilers:01 andreas:01 parentheses:01 strings:01 

Hello,

On 01-07-2009, Andreas Rossberg <rossberg@mpi-sws.org> wrote:
> Mike Lin wrote:
>> OK, now I'm curious :) how does your lexer match balanced parentheses,
>> or in this case comments?
>>   
>
> Easily, with a bit of side effects (I think that's roughly how all ML 
> compilers do it):
>
> ------------------------------------------------
> let error l s = (* ... *)
> let commentDepth = ref 0
> let start = ref 0
> let loc length = let pos = !start in (pos, pos+length)
>
> rule lex =
>     parse eof            { EOF }
>     (* | ... *)
>     | "{-"            { start := pos lexbuf;
>                   lexNestComment lexbuf }
>
> and lexNestComment =
>     parse eof            { error (loc 2) "unterminated comment" }
>     | "(*"            { incr commentDepth;
>                   lexNestComment lexbuf }
>     | "*)"            { decr commentDepth;
>                   if !commentDepth > 0
>                   then lexNestComment lexbuf
>                   else lex lexbuf }
>     | _            { lexNestComment lexbuf }
> ------------------------------------------------
>
> If you also want to treat strings in comments specially (like OCaml), 
> then you need to do a bit more work, but it's basically the same idea.
>

May I recommend you to write this in a more simple way:

-------------------------------------------------------------------------
rule lex =
  parse eof    { () }
  | "(*"       { start := pos lexbuf; lexNestComment lexbuf; lex lexbuf }

and lexNestComment =
  parse eof    { error (loc 2) "unterminated comment" }
| "(*"         { lexNestComment lexbuf }
| "*)"         { () }
| _            { lexNestComment lexbuf }
-------------------------------------------------------------------------

I think it works the same way, except that it uses less global
variables.

Regards,
Sylvain Le Gall