From: Yitzhak Mandelbaum
To: Andreas Rossberg
Cc: Andrej Bauer, caml-list@inria.fr
Subject: Re: [Caml-list] ocamllex and python-style indentation
Date: Tue, 30 Jun 2009 14:58:17 -0400

To restart this thread, do your solutions handle the following (legal) variation of the original example?

if True:
   x = 3+4
   y = (2 +
4 + 5)
   z = 5
else:
   x = 5
   if False:
       x = 8
       z = 2


Notice that the assignment of y wraps onto the next line at an *earlier* column. This is legal because the expression is enclosed in parentheses. However, it seems that the preprocessing approaches will fail for this example. Do you have a workaround?
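
To make the problem concrete, here is a small OCaml sketch that reports, for each line, the column of its first token and the bracket depth at which the line starts. The helper names (`indent_of`, `depth_delta`, `scan`) are made up for this illustration and do not come from any solution posted in the thread. It assumes spaces-only indentation and no brackets inside strings or comments.

```ocaml
(* Column of the first non-blank character of a line (spaces only). *)
let indent_of (line : string) : int =
  let n = String.length line in
  let rec go i = if i < n && line.[i] = ' ' then go (i + 1) else i in
  go 0

(* Net change in bracket depth contributed by one line.
   Assumption: no brackets inside strings or comments. *)
let depth_delta (line : string) : int =
  let d = ref 0 in
  String.iter
    (function
      | '(' | '[' | '{' -> incr d
      | ')' | ']' | '}' -> decr d
      | _ -> ())
    line;
  !d

(* For each line: (indentation, bracket depth at the start of the line). *)
let scan (lines : string list) : (int * int) list =
  let step (depth, acc) line =
    (depth + depth_delta line, (indent_of line, depth) :: acc)
  in
  let _, acc = List.fold_left step (0, []) lines in
  List.rev acc

(* The first half of the example above. *)
let example =
  [ "if True:";
    "   x = 3+4";
    "   y = (2 +";
    "4 + 5)";
    "   z = 5" ]

let () =
  List.iter2
    (fun line (col, depth) ->
      Printf.printf "col=%d depth=%d  %s\n" col depth line)
    example (scan example)
```

The fourth line starts at column 0, left of the block's column 3, but at bracket depth 1. A rule that looks only at columns would close the block there. One possible workaround, which is how Python itself treats lines inside brackets, is to skip layout handling whenever the depth is positive.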

--Yitzhak

On Jun 12, 2009, at 11:43 AM, Andreas Rossberg wrote:

> On Jun 12, 2009, at 10.20 h, Andrej Bauer wrote:
>
>> I think I understand the general idea of inserting "virtual" tokens,
>> but the details confuse me still. So starting with
>>
>>> if True:
>>>    x = 3
>>>    y = (2 +
>>>      4 + 5)
>>> else:
>>>    x = 5
>>>    if False:
>>>        x = 8
>>>        z = 2

>> Martin suggests the following:
>>
>>> {
>>> if True:
>>> ;
>>>   {
>>>   x = 3
>>>   ;
>>>   y = (2 +
>>>   ;
>>>     {
>>>     4 + 5)
>>>     }
>>>   }
>>> ;
>>> else:
>>> ;
>>>   {
>>>   x = 5
>>>   ;
>>>   if False:
>>>   ;
>>>       {
>>>       x = 8
>>>       ;
>>>       z = 2
>>>       }
>>>   }
>>> }

>> I have two questions. Notice that the { ... } and ( ... ) need not be
>> correctly nested (in the top half), so how are we going to deal with
>> this? The second question is: why are there separators just before
>> and after "else:"? I would expect separators inside { .... }, but
>> not around "else".
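
The nesting problem raised in the first question can be checked mechanically. The following is a minimal sketch, not code from the thread: `well_nested` is a hypothetical helper that checks whether "(" / ")" and "{" / "}" are properly nested in a string and ignores every other character. The two test strings are the first block of each rewrite quoted in this message, flattened onto one line.

```ocaml
(* True if every ')' closes a '(' and every '}' closes a '{',
   in properly nested order. All other characters are ignored. *)
let well_nested (s : string) : bool =
  let opener = function ')' -> Some '(' | '}' -> Some '{' | _ -> None in
  let rec go i stack =
    if i = String.length s then stack = []
    else
      match s.[i] with
      | ('(' | '{') as c -> go (i + 1) (c :: stack)
      | c ->
        (match opener c, stack with
         | None, _ -> go (i + 1) stack
         | Some o, top :: rest when o = top -> go (i + 1) rest
         | Some _, _ -> false)
  in
  go 0 []

(* First block of the rewrite with a "{" opened inside the parentheses. *)
let martin = "{ x = 3 ; y = (2 + ; { 4 + 5) } }"

(* First block of the keyword-driven rewrite. *)
let keyword_driven = "{ x = 3 ; y = (2 + 4 + 5) }"

let () =
  Printf.printf "martin: %b, keyword_driven: %b\n"
    (well_nested martin) (well_nested keyword_driven)
```

The first string fails because the "{" opened on the continuation line is still open when ")" arrives. The second passes because no block is opened inside the parentheses.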

> It depends on how exactly you define your layout rules. The usual
> approach is to tie the start of layout-sensitive blocks to particular
> keywords -- this is essentially what Python and Haskell do. In that
> case, the binding to y is not affected. Haskell's rules for optional
> layout would rewrite your original program as
>
>>> if True:
>>>    {x = 3
>>>    ;y = (2 +
>>>      4 + 5)
>>> }else:
>>>    {x = 5
>>>    ;if False:
>>>        {x = 8
>>>        ;z = 2
>>> }}

> The basic rules are fairly simple:
>
> 1. Insert "{" (assume width 0) before the first token following a
>    layout keyword (usually ":" in Python). This opens a block.
>
> 2. As long as inside a block, insert ";" before each token that is
>    on the _same_ column as the current (i.e. innermost) "{".
>
> 3. A block ends as soon as you see a line whose first token is
>    _left_ of the current "{". Insert "}" before that token.
>
> Blocks can be nested, so you need to maintain a stack of starting
> columns in the parser. Note that rule 3 may end several blocks at
> once. EOF is treated as a token at column 0.

> The way I implemented this is by wrapping the ocamllex-generated
> lexer with a function that compares each token's column with the top
> of the layout stack and inserts auxiliary tokens as necessary.
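
For reference, the three rules and the wrapper idea can be sketched in OCaml roughly as follows. This is an illustration under assumptions, not Andreas's actual code: the `token` and `raw` types and the names `layout`, `of_list` and `drain` are invented here, ":" is taken as the only layout keyword, and the raw lexer is assumed to hand out each token together with its column (with a real ocamllex lexer the column could be derived from `Lexing.lexeme_start_p`).

```ocaml
type token = Word of string | Colon | LBrace | RBrace | Semi | Eof

(* A token as delivered by the underlying lexer, with its start column. *)
type raw = { tok : token; col : int }

(* Wrap a raw lexer so that it also emits LBrace, Semi and RBrace. *)
let layout (next_raw : unit -> raw) : unit -> token =
  let stack = ref [] in              (* block columns, innermost first *)
  let pending = Queue.create () in   (* tokens waiting to be handed out *)
  let after_keyword = ref false in
  let refill () =
    let r = next_raw () in
    if !after_keyword then begin
      (* Rule 1: the first token after the keyword opens a block. *)
      after_keyword := false;
      stack := r.col :: !stack;
      Queue.add LBrace pending
    end else begin
      (* Rule 3: close every block that starts right of this token. *)
      let rec close () =
        match !stack with
        | c :: rest when r.col < c ->
          stack := rest;
          Queue.add RBrace pending;
          close ()
        | _ -> ()
      in
      close ();
      (* Rule 2: same column as the innermost block gets a separator. *)
      (match !stack with
       | c :: _ when r.col = c && r.tok <> Eof -> Queue.add Semi pending
       | _ -> ())
    end;
    Queue.add r.tok pending;
    if r.tok = Colon then after_keyword := true
  in
  fun () ->
    if Queue.is_empty pending then refill ();
    Queue.pop pending

(* Stand-in for the generated lexer: replays a list, then Eof at column 0. *)
let of_list (l : raw list) : unit -> raw =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> { tok = Eof; col = 0 }
    | r :: tl -> rest := tl; r

(* Collect all tokens up to and including Eof. *)
let rec drain next acc =
  match next () with
  | Eof -> List.rev (Eof :: acc)
  | t -> drain next (t :: acc)

(* Tokens of:   if True:
                   x
                   y
                z                                  *)
let demo =
  [ { tok = Word "if"; col = 0 }; { tok = Word "True"; col = 3 };
    { tok = Colon; col = 7 };
    { tok = Word "x"; col = 3 };
    { tok = Word "y"; col = 3 };
    { tok = Word "z"; col = 0 } ]

let result = drain (layout (of_list demo)) []
```

Here `result` is the stream `if True : { x ; y } z` followed by `Eof`. The sketch looks only at columns, so it has no special treatment for continuation lines inside parentheses.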

> Haskell has another rule for inserting "}" if there would be a parse
> error without it (this is to allow inline blocks). This rule is
> pretty fudgy, and almost impossible to implement properly with a
> conventional parser generator. IMO, the only sane way to reformulate
> this rule is again to tie it to specific keywords, e.g. insert "}"
> before "else" if missing. This can be implemented in the parser by
> making closing braces optional in the right places.

> - Andreas


--------------------------------------------------
Yitzhak Mandelbaum
AT&T Labs - Research
http://www.research.att.com/~yitzhak