From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 0BE37BC37 for ; Wed, 1 Jul 2009 16:17:55 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AngBAPwMS0qLEwExkWdsb2JhbACZCAEBAQEJCwoHEwWlSJBoBYQQ X-IronPort-AV: E=Sophos;i="4.42,323,1243807200"; d="scan'208";a="30506269" Received: from infao0809.mpi-sb.mpg.de (HELO hera.mpi-sb.mpg.de) ([139.19.1.49]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/AES256-SHA; 01 Jul 2009 16:17:54 +0200 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mpi-sb.mpg.de; s=mail200803; h=Message-ID:Date:From: MIME-Version:To:CC:Subject:References:In-Reply-To:Content-Type: Content-Transfer-Encoding; bh=Q71+UO5AhiZEPTiT3dMXS05rY/6rZSbW28 pvEO/vIaU=; b=DPK8U89DM6YWGEFBv+4HEvqqWdfHL2C0nSs1skgaYzTvX3KRQe Ta+cthyCZy0GkR6EBGqdwTSremuS5hkd6QMrAU/ofdOvOVuW90l8L8LoJBFLrkta MtfOz+x9U1fuqNEIO/VqbRTHAjOUeRQB/tNhJW/GoGOB/lRMR/cegyS/U= Received: from newzak.mpi-sb.mpg.de ([139.19.1.28]:48960) by hera.mpi-sb.mpg.de (envelope-from ) with esmtp (Exim 4.69) id 1MM0dC-0004h2-NV; Wed, 01 Jul 2009 16:17:53 +0200 Received: from swsao0703.ds.mpi-sws.mpg.de ([139.19.131.41]:59847) by newzak.mpi-sb.mpg.de with esmtpsa (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.63) (envelope-from ) id 1MM0dC-000852-8t; Wed, 01 Jul 2009 16:17:50 +0200 Message-ID: <4A4B700D.3060701@mpi-sws.org> Date: Wed, 01 Jul 2009 16:17:49 +0200 From: Andreas Rossberg User-Agent: Mozilla-Thunderbird 2.0.0.17 (X11/20081019) MIME-Version: 1.0 To: Mike Lin Cc: caml-list@inria.fr Subject: Re: [Caml-list] ocamllex and python-style indentation References: <7d8707de0906110557n6a1511a2k9f4f00827f954cb6@mail.gmail.com> <4A310A5B.9010404@ens-lyon.org> <7d8707de0906120120x10cc8fe0p54adbd189003f3da@mail.gmail.com> <1750DDE9-4FC6-4657-9687-58240F520A70@mpi-sws.org> <2a1a1a0c0906301913t1dce3eebie91c7cc94f015481@mail.gmail.com> <2a1a1a0c0907010702u19ad7ba6kd831a71fa1b7f2a5@mail.gmail.com> In-Reply-To: <2a1a1a0c0907010702u19ad7ba6kd831a71fa1b7f2a5@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam: no; 0.00; rossberg:01 rossberg:01 ocamllex:01 lexer:01 lexbuf:01 lexbuf:01 decr:01 ocaml:01 wrote:01 compilers:01 andreas:01 andreas:01 caml-list:01 parentheses:01 strings:01 Mike Lin wrote: > OK, now I'm curious :) how does your lexer match balanced parentheses, > or in this case comments? > Easily, with a bit of side effects (I think that's roughly how all ML compilers do it): ------------------------------------------------ let error l s = (* ... *) let commentDepth = ref 0 let start = ref 0 let loc length = let pos = !start in (pos, pos+length) rule lex = parse eof { EOF } (* | ... *) | "{-" { start := pos lexbuf; lexNestComment lexbuf } and lexNestComment = parse eof { error (loc 2) "unterminated comment" } | "(*" { incr commentDepth; lexNestComment lexbuf } | "*)" { decr commentDepth; if !commentDepth > 0 then lexNestComment lexbuf else lex lexbuf } | _ { lexNestComment lexbuf } ------------------------------------------------ If you also want to treat strings in comments specially (like OCaml), then you need to do a bit more work, but it's basically the same idea. - Andreas