From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL,HTML_MESSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 06328BC37 for ; Fri, 12 Jun 2009 17:43:56 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AhcCACcUMkqLEwExhWdsb2JhbACCJS2VcQEBAQoLChoFqTOPNwWECw X-IronPort-AV: E=Sophos;i="4.42,210,1243807200"; d="scan'208,217";a="31121164" Received: from infao0809.mpi-sb.mpg.de (HELO hera.mpi-sb.mpg.de) ([139.19.1.49]) by mail1-smtp-roc.national.inria.fr with ESMTP/TLS/AES256-SHA; 12 Jun 2009 17:43:55 +0200 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mpi-sb.mpg.de; s=mail200803; h=Cc:Message-Id:From:To: In-Reply-To:Content-Type:Subject:Mime-Version:Date:References; bh=h2dHAeGop28NwNYlRdejdgRQejyfoUXGkajHv8SQZnU=; b=XnJOr2O3jHalA u0/hGELerlZYSHADFpZGq4h98+h4w9oJqSwBxW8ufeWid2IOUdZZEhuLPOcCD/G0 tcGYUokkYzLXxeLWlT8q0Yn0qnDUP0qTBJVyHpQDrKdBsGjiymgQA6FgyiPnwUlH KiRCyIgbiY/FtAwcoprTsXFB1tWCG4= Received: from newzak.mpi-sb.mpg.de ([139.19.1.28]:52236) by hera.mpi-sb.mpg.de (envelope-from ) with esmtp (Exim 4.69) id 1MF8v1-0006Yb-My; Fri, 12 Jun 2009 17:43:54 +0200 Received: from srbk-4db7e951.pool.einsundeins.de ([77.183.233.81]:57465 helo=[192.168.178.20]) by newzak.mpi-sb.mpg.de with esmtpsa (TLS-1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.63) (envelope-from ) id 1MF8v1-0003SG-0F; Fri, 12 Jun 2009 17:43:51 +0200 Cc: caml-list@inria.fr Message-Id: From: Andreas Rossberg To: Andrej Bauer In-Reply-To: <7d8707de0906120120x10cc8fe0p54adbd189003f3da@mail.gmail.com> Content-Type: multipart/alternative; boundary=Apple-Mail-1-226257810 Subject: Re: [Caml-list] ocamllex and python-style indentation Mime-Version: 1.0 (Apple Message framework v935.3) Date: Fri, 12 Jun 2009 17:43:50 +0200 References: <7d8707de0906110557n6a1511a2k9f4f00827f954cb6@mail.gmail.com> <4A310A5B.9010404@ens-lyon.org> <7d8707de0906120120x10cc8fe0p54adbd189003f3da@mail.gmail.com> X-Mailer: Apple Mail (2.935.3) X-Spam: no; 0.00; rossberg:01 rossberg:01 ocamllex:01 andrej:01 inserting:01 tokens:01 separators:01 separators:01 haskell:01 haskell's:01 parser:01 lexer:01 tokens:01 haskell:01 inserting:01 --Apple-Mail-1-226257810 Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit On Jun 12, 2009, at 10.20 h, Andrej Bauer wrote: > I think I understand the general idea of inserting "virtual" tokens, > but the details confuse me still. So starting with > >> if True: >> x = 3 >> y = (2 + >> 4 + 5) >> else: >> x = 5 >> if False: >> x = 8 >> z = 2 > > Martin suggests the following: > >> { >> if True: >> ; >> { >> x = 3 >> ; >> y = (2 + >> ; >> { >> 4 + 5) >> } >> } >> ; >> else: >> ; >> { >> x = 5 >> ; >> if False: >> ; >> { >> x = 8 >> ; >> z = 2 >> } >> } >> } > > I have two questions. Notice that the { ... } and ( ... ) need not be > correctly nested (in the top half), so how are we going to deal with > this? The second question is, why are there the separators after and > just before "else:". I would expect separators inside { .... }, but > not around "else". It depends on how exactly you define your layout rules. The usual approach is to tie start of layout-sensitive blocks to particular keywords -- this is essentially what Python and Haskell do. In that case, the binding to y is not affected. Haskell's rules for optional layout would rewrite your original program as >> if True: >> {x = 3 >> ;y = (2 + >> 4 + 5) >> }else: >> {x = 5 >> ;if False: >> {x = 8 >> ;z = 2 >> }} The basic rules are fairly simple: 1. Insert "{" (assume width 0) before the first token following a layout keyword (usually ":" in Python). This opens a block. 2. As long as inside a block, insert ";" before each token that is on the _same_ column as the current (i.e. innermost) "{". 3. A block ends as soon as you see a line whose first token is _left_ of the current "{". Insert "}" before that token. Blocks can be nested, so you need to maintain a stack of starting columns in the parser. Note that rule 3 may end several blocks at once. EOF is treated as a token at column 0. The way I implemented this is by wrapping the ocamllex-generated lexer with a function that compares each token's column with the top of the layout stack and inserts auxiliary tokens as necessary. Haskell has another rule for inserting "}" if there would be a parse error without it (this is to allow inline blocks). This rule is pretty fudgy, and almost impossible to implement properly with a conventional parser generator. IMO, the only sane way to reformulate this rule is again to tie it to specific keywords, e.g. insert "}" before "else" if missing. This can be implemented in the parser by making closing braces optional in the right places. - Andreas --Apple-Mail-1-226257810 Content-Type: text/html; charset=US-ASCII Content-Transfer-Encoding: quoted-printable
On Jun 12, 2009, at = 10.20 h, Andrej Bauer wrote:

I = think I understand the general idea of inserting "virtual" = tokens,
but the details confuse me still. So starting = with

if = True:
   x =3D = 3
   y =3D (2 = +
=      4 + 5)
else:
=    x =3D 5
=    if False:
=        x =3D = 8
=        z =3D = 2

Martin suggests the following:

{
if = True:
;
=   {
  x =3D= 3
=   ;
  y =3D= (2 +
=   ;
=     {
=     4 + 5)
=     }
=   }
;
else:
;
=   {
  x =3D= 5
=   ;
  if = False:
=   ;
=       {
      x =3D = 8
=       ;
      z =3D = 2
=       }
  }
}

I have two questions. Notice that = the { ... } and ( ... ) need not be
correctly nested (in the top = half), so how are we going to deal with
this? The second question is, = why are there the separators after and
just before "else:". I would = expect separators inside { .... }, but
not around = "else".

It depends on how = exactly you define your layout rules. The usual approach is to tie start = of layout-sensitive blocks to particular keywords -- this is essentially = what Python and Haskell do. In that case, the binding to y is not = affected. Haskell's rules for optional layout would rewrite your = original program as

if = True:
   {x =3D = 3
   ;y =3D (2 = +
     4 + = 5)
}else:
   {x =3D 5
   ;if False:
       {x =3D = 8
       = ;z =3D 2
}}

The basic rules are fairly = simple:

1. Insert "{" (assume width 0) before the first token = following a layout keyword (usually ":" in Python). This opens a = block.

2. As long as inside a block, insert ";" before each = token that is on the _same_ column as the current (i.e. innermost) = "{".

3. A block ends as soon as you see a line whose first = token is _left_ of the current "{". Insert "}" before that = token.

Blocks can be nested, so you need to maintain a stack = of starting columns in the parser. Note that rule 3 may end several = blocks at once. EOF is treated as a token at column = 0.

The way I implemented this is by wrapping the = ocamllex-generated lexer with a function that compares each token's = column with the top of the layout stack and inserts auxiliary tokens as = necessary.

Haskell has another rule for inserting = "}" if there would be a parse error without it (this is to allow inline = blocks). This rule is pretty fudgy, and almost impossible to implement = properly with a conventional parser generator. IMO, the only sane way to = reformulate this rule is again to tie it to specific keywords, e.g. = insert "}" before "else" if missing. This can be implemented in the = parser by making closing braces optional in the right = places.

- Andreas

= --Apple-Mail-1-226257810--