From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: ** X-Spam-Status: No, score=2.8 required=5.0 tests=AWL,DNS_FROM_RFC_POST, SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id BC6E7BC37 for ; Tue, 30 Jun 2009 22:19:52 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: As0BABMQSkrRVdSunGdsb2JhbACORol/PwEBAQEBCAsICROmd4EakAMBAwIEhAsF X-IronPort-AV: E=Sophos;i="4.42,318,1243807200"; d="scan'208";a="30453159" Received: from mail-vw0-f174.google.com ([209.85.212.174]) by mail3-smtp-sop.national.inria.fr with ESMTP; 30 Jun 2009 22:19:51 +0200 Received: by vwj4 with SMTP id 4so138775vwj.1 for ; Tue, 30 Jun 2009 13:19:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=jdAAqzdM7lmAPGDwtEYw3L0qvYj7oZ2nKfGKc/TWRp8=; b=lfzMD61ZkOcCcyUAXKC9NSyTgEINH2yfZ+qNMz91NRK50lGW720dSWeq4Tffz660P4 OJaaT9ti7Nt3dh3It9wUIN7LpJSmxCbz2SgKTf5VExeRAGvmZDWRlTv9BlvgRn9JGI1G xWk1GE8CvU0q243qq+gJNBKJC4wka8dNN6EAM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=B9enthVhFEU8ZZ3xuCso3A+35ArBnYRCFJfW0wkR7JQymhsCxFgjIL5lrbvR/rWKYP sy3z8lml954gZGp8ibjgLfV/DmzojsoI056Fm8oPiqQVCPkcCEheMKrxK3oBIpwyFVue V/yE3IwPgz4CnsYq8YBMhWM5krYlnqy+J3Z/0= MIME-Version: 1.0 Received: by 10.220.98.17 with SMTP id o17mr7117077vcn.86.1246393191018; Tue, 30 Jun 2009 13:19:51 -0700 (PDT) In-Reply-To: References: <7d8707de0906110557n6a1511a2k9f4f00827f954cb6@mail.gmail.com> <4A310A5B.9010404@ens-lyon.org> <7d8707de0906120120x10cc8fe0p54adbd189003f3da@mail.gmail.com> Date: Tue, 30 Jun 2009 16:19:50 -0400 Message-ID: <2a1a1a0c0906301319q77c20932gd6b879f5af212028@mail.gmail.com> Subject: Re: [Caml-list] ocamllex and python-style indentation From: Mike Lin To: Yitzhak Mandelbaum Cc: Andreas Rossberg , caml-list@inria.fr, Andrej Bauer Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; ocamllex:01 literals:01 ocaml:01 minimally:01 yitzhak:01 mandelbaum:01 yitzhak:01 parens:01 rossberg:01 andrej:01 inserting:01 tokens:01 separators:01 separators:01 haskell:01 More generally, you've got parentheses, comments, and string literals, and you need to know to ignore whitespace within any of those -- and to ignore e.g. an open parenthesis that occurs within a comment, or a close comment that occurs within a string literal. So inevitably you've got to lex and parse at some level to make this work for a practical language. There are still some byzantine cases that ocaml+twt doesn't handle properly, and I think it probably gets pretty close to the minimally complex yet practically usable approach to this. Mike On Tue, Jun 30, 2009 at 2:58 PM, Yitzhak Mandelbaum wrote: > To restart this thread, do your solutions handle the following (legal) > variation of the original example? > if True: > =A0=A0 x =3D 3+4 > =A0=A0 y =3D (2 + > 4 + 5) > =A0=A0 z =3D 5 > else: > =A0=A0 x =3D 5 > =A0=A0 if False: > =A0=A0 =A0 =A0 x =3D 8 > =A0=A0 =A0 =A0 z =3D 2 > > Notice that the assignment of y wraps onto the next line at an *earlier* > column. This is legal b/c it is surrounded by parens. However, it seems t= hat > the preprocessing approaches will fail for this example. Do you have a > workaround? > --Yitzhak > > On Jun 12, 2009, at 11:43 AM, Andreas Rossberg wrote: > > On Jun 12, 2009, at 10.20 h, Andrej Bauer wrote: > > I think I understand the general idea of inserting "virtual" tokens, > but the details confuse me still. So starting with > > if True: > > =A0=A0=A0x =3D 3 > > =A0=A0=A0y =3D (2 + > > =A0=A0=A0=A0=A04 + 5) > > else: > > =A0=A0=A0x =3D 5 > > =A0=A0=A0if False: > > =A0=A0=A0=A0=A0=A0=A0x =3D 8 > > =A0=A0=A0=A0=A0=A0=A0z =3D 2 > > Martin suggests the following: > > { > > if True: > > ; > > =A0=A0{ > > =A0=A0x =3D 3 > > =A0=A0; > > =A0=A0y =3D (2 + > > =A0=A0; > > =A0=A0=A0=A0{ > > =A0=A0=A0=A04 + 5) > > =A0=A0=A0=A0} > > =A0=A0} > > ; > > else: > > ; > > =A0=A0{ > > =A0=A0x =3D 5 > > =A0=A0; > > =A0=A0if False: > > =A0=A0; > > =A0=A0=A0=A0=A0=A0{ > > =A0=A0=A0=A0=A0=A0x =3D 8 > > =A0=A0=A0=A0=A0=A0; > > =A0=A0=A0=A0=A0=A0z =3D 2 > > =A0=A0=A0=A0=A0=A0} > > =A0=A0} > > } > > I have two questions. Notice that the { ... } and ( ... ) need not be > correctly nested (in the top half), so how are we going to deal with > this? The second question is, why are there the separators after and > just before "else:". I would expect separators inside { .... }, but > not around "else". > > It depends on how exactly you define your layout rules. The usual approac= h > is to tie start of layout-sensitive blocks to particular keywords -- this= is > essentially what Python and Haskell do. In that case, the binding to y is > not affected.=A0Haskell's rules for optional layout would rewrite your > original program as > > if True: > > =A0=A0 {x =3D 3 > > =A0=A0 ;y =3D (2 + > > =A0=A0=A0=A0=A04 + 5) > > }else: > > =A0=A0 {x =3D 5 > > =A0=A0 ;if False: > > =A0=A0 =A0 =A0 {x =3D 8 > > =A0=A0 =A0 =A0 ;z =3D 2 > > }} > > The basic rules are fairly simple: > 1. Insert "{" (assume width 0) before the first token following a layout > keyword (usually ":" in Python). This opens a block. > 2. As long as inside a block, insert ";" before each token that is on the > _same_ column as the current (i.e. innermost) "{". > 3. A block ends as soon as you see a line whose first token is _left_ of = the > current "{". Insert "}" before that token. > Blocks can be nested, so you need to maintain a stack of starting columns= in > the parser. Note that rule 3 may end several blocks at once.=A0EOF is tre= ated > as a token at column 0. > The way I implemented this is by wrapping the ocamllex-generated lexer wi= th > a function that compares each token's column with the top of the layout > stack and inserts auxiliary tokens as necessary. > Haskell has another rule for inserting "}" if there would be a parse erro= r > without it (this is to allow inline blocks). This rule is pretty fudgy, a= nd > almost impossible to implement properly with a conventional parser > generator. IMO, the only sane way to reformulate this rule is again to ti= e > it to specific keywords, e.g. insert "}" before "else" if missing. This c= an > be implemented in the parser by making closing braces optional in the rig= ht > places. > - Andreas > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > -------------------------------------------------- > Yitzhak Mandelbaum > AT&T Labs - Research > http://www.research.att.com/~yitzhak > > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > >