On Jun 12, 2009, at 10.20 h, Andrej Bauer wrote: > I think I understand the general idea of inserting "virtual" tokens, > but the details confuse me still. So starting with > >> if True: >> x = 3 >> y = (2 + >> 4 + 5) >> else: >> x = 5 >> if False: >> x = 8 >> z = 2 > > Martin suggests the following: > >> { >> if True: >> ; >> { >> x = 3 >> ; >> y = (2 + >> ; >> { >> 4 + 5) >> } >> } >> ; >> else: >> ; >> { >> x = 5 >> ; >> if False: >> ; >> { >> x = 8 >> ; >> z = 2 >> } >> } >> } > > I have two questions. Notice that the { ... } and ( ... ) need not be > correctly nested (in the top half), so how are we going to deal with > this? The second question is, why are there the separators after and > just before "else:". I would expect separators inside { .... }, but > not around "else". It depends on how exactly you define your layout rules. The usual approach is to tie start of layout-sensitive blocks to particular keywords -- this is essentially what Python and Haskell do. In that case, the binding to y is not affected. Haskell's rules for optional layout would rewrite your original program as >> if True: >> {x = 3 >> ;y = (2 + >> 4 + 5) >> }else: >> {x = 5 >> ;if False: >> {x = 8 >> ;z = 2 >> }} The basic rules are fairly simple: 1. Insert "{" (assume width 0) before the first token following a layout keyword (usually ":" in Python). This opens a block. 2. As long as inside a block, insert ";" before each token that is on the _same_ column as the current (i.e. innermost) "{". 3. A block ends as soon as you see a line whose first token is _left_ of the current "{". Insert "}" before that token. Blocks can be nested, so you need to maintain a stack of starting columns in the parser. Note that rule 3 may end several blocks at once. EOF is treated as a token at column 0. The way I implemented this is by wrapping the ocamllex-generated lexer with a function that compares each token's column with the top of the layout stack and inserts auxiliary tokens as necessary. Haskell has another rule for inserting "}" if there would be a parse error without it (this is to allow inline blocks). This rule is pretty fudgy, and almost impossible to implement properly with a conventional parser generator. IMO, the only sane way to reformulate this rule is again to tie it to specific keywords, e.g. insert "}" before "else" if missing. This can be implemented in the parser by making closing braces optional in the right places. - Andreas