On Jun 12, 2009, at 10.20 h, Andrej Bauer wrote:

I think I understand the general idea of inserting "virtual" tokens,
but the details confuse me still. So starting with

if True:
   x = 3
   y = (2 +
     4 + 5)
else:
   x = 5
   if False:
       x = 8
       z = 2

Martin suggests the following:

{
if True:
;
  {
  x = 3
  ;
  y = (2 +
  ;
    {
    4 + 5)
    }
  }
;
else:
;
  {
  x = 5
  ;
  if False:
  ;
      {
      x = 8
      ;
      z = 2
      }
  }
}

I have two questions. Notice that the { ... } and ( ... ) need not be
correctly nested (in the top half), so how are we going to deal with
this? The second question is, why are there the separators after and
just before "else:". I would expect separators inside { .... }, but
not around "else".

It depends on how exactly you define your layout rules. The usual approach is to tie start of layout-sensitive blocks to particular keywords -- this is essentially what Python and Haskell do. In that case, the binding to y is not affected. Haskell's rules for optional layout would rewrite your original program as

if True:
   {x = 3
   ;y = (2 +
     4 + 5)
}else:
   {x = 5
   ;if False:
       {x = 8
       ;z = 2
}}

The basic rules are fairly simple:

1. Insert "{" (assume width 0) before the first token following a layout keyword (usually ":" in Python). This opens a block.

2. As long as inside a block, insert ";" before each token that is on the _same_ column as the current (i.e. innermost) "{".

3. A block ends as soon as you see a line whose first token is _left_ of the current "{". Insert "}" before that token.

Blocks can be nested, so you need to maintain a stack of starting columns in the parser. Note that rule 3 may end several blocks at once. EOF is treated as a token at column 0.

The way I implemented this is by wrapping the ocamllex-generated lexer with a function that compares each token's column with the top of the layout stack and inserts auxiliary tokens as necessary.

Haskell has another rule for inserting "}" if there would be a parse error without it (this is to allow inline blocks). This rule is pretty fudgy, and almost impossible to implement properly with a conventional parser generator. IMO, the only sane way to reformulate this rule is again to tie it to specific keywords, e.g. insert "}" before "else" if missing. This can be implemented in the parser by making closing braces optional in the right places.

- Andreas