caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* ocamllex and python-style indentation
@ 2009-06-11 12:57 Andrej Bauer
  2009-06-11 13:12 ` [Caml-list] " yoann padioleau
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Andrej Bauer @ 2009-06-11 12:57 UTC (permalink / raw)
  To: caml-list

My parsing powers are not sufficient to easily come up with
lexer/parser for a simple language that uses python-style indentation
and newline rules. Does anyone have such a thing lying around, written
in ocamllex/yacc or menhir? I would appreciate a peek to see how
you've dealt with it.

For example, suppose we want just a very simple fragment of Python
involving True, False, conditional statements, variables, and
assignments, such as:

if True:
    x = 3
    y = (2 +
      4 + 5)
else:
    x = 5
    if False:
        x = 8
        z = 2

How would I go about writing a lexer/parser for such a thing in ocaml?

Andrej


^ permalink raw reply	[flat|nested] 22+ messages in thread

* re: [Caml-list] ocamllex and python-style indentation
  2009-06-11 12:57 ocamllex and python-style indentation Andrej Bauer
@ 2009-06-11 13:12 ` yoann padioleau
  2009-06-11 13:21 ` Andreas Rossberg
  2009-06-11 13:44 ` Martin Jambon
  2 siblings, 0 replies; 22+ messages in thread
From: yoann padioleau @ 2009-06-11 13:12 UTC (permalink / raw)
  To: Andrej Bauer, caml-list


> My parsing powers are not sufficient to easily come up with
> lexer/parser for a simple language that uses python-style indentation
> and newline rules. Does anyone have such a thing lying around, written
> in ocamllex/yacc or menhir? I would appreciate a peek to see how
> you've dealt with it.
> 
> For example, suppose we want just a very simple fragment of Python
> involving True, False, conditional statements, variables, and
> assignments, such as:
> 
> if True:
> x = 3
> y = (2 +
> 4 + 5)
> else:
> x = 5
> if False:
> x = 8
> z = 2
> 
> How would I go about writing a lexer/parser for such a thing in ocaml?

I guess you could interpose a function between lexing and parsing that works on the tokens
and position information of those tokens to insert some new extra tokens that will then
make the job of the parser easier. For instance you could add some "virtual" ';' at the end
of each line except when you are in a parenthesized stuff.


> 
> Andrej
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-06-11 12:57 ocamllex and python-style indentation Andrej Bauer
  2009-06-11 13:12 ` [Caml-list] " yoann padioleau
@ 2009-06-11 13:21 ` Andreas Rossberg
  2009-06-11 13:44 ` Martin Jambon
  2 siblings, 0 replies; 22+ messages in thread
From: Andreas Rossberg @ 2009-06-11 13:21 UTC (permalink / raw)
  To: Andrej Bauer; +Cc: caml-list

On Jun 11, 2009, at 14.57 h, Andrej Bauer wrote:

> My parsing powers are not sufficient to easily come up with
> lexer/parser for a simple language that uses python-style indentation
> and newline rules. Does anyone have such a thing lying around, written
> in ocamllex/yacc or menhir? I would appreciate a peek to see how
> you've dealt with it.

Hi Andrej,

I have some _very_ old code (a source-to-source translator that  
provides Haskell-style syntax for Ocaml itself) that does that:

   http://www.mpi-sws.org/~rossberg/hocaml-0.12.tgz

Hope that helps.

Cheers,
- Andreas


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-06-11 12:57 ocamllex and python-style indentation Andrej Bauer
  2009-06-11 13:12 ` [Caml-list] " yoann padioleau
  2009-06-11 13:21 ` Andreas Rossberg
@ 2009-06-11 13:44 ` Martin Jambon
  2009-06-12  8:20   ` Andrej Bauer
  2 siblings, 1 reply; 22+ messages in thread
From: Martin Jambon @ 2009-06-11 13:44 UTC (permalink / raw)
  To: Andrej Bauer; +Cc: caml-list

Andrej Bauer wrote:
> My parsing powers are not sufficient to easily come up with
> lexer/parser for a simple language that uses python-style indentation
> and newline rules. Does anyone have such a thing lying around, written
> in ocamllex/yacc or menhir? I would appreciate a peek to see how
> you've dealt with it.
> 
> For example, suppose we want just a very simple fragment of Python
> involving True, False, conditional statements, variables, and
> assignments, such as:
> 
> if True:
>     x = 3
>     y = (2 +
>       4 + 5)
> else:
>     x = 5
>     if False:
>         x = 8
>         z = 2
> 
> How would I go about writing a lexer/parser for such a thing in ocaml?

I would use a first pass that converts the input lines into this imaginary
structure:


{
if True:
;
    {
    x = 3
    ;
    y = (2 +
    ;
      {
      4 + 5)
      }
    }
;
else:
;
    {
    x = 5
    ;
    if False:
    ;
        {
        x = 8
        ;
        z = 2
        }
    }
}


You could create a generic tool that parses a file into this:

type t = Line of loc * string | Block of loc * t list


but as suggested by Yoann, the next step should probably be to flatten this
into a stream by introducing artificial tokens:

type gen_token =
   Open of loc          (* fake "{" *)
 | Close of loc         (* fake "}" *)
 | Separator of loc     (* fake ";" *)
 | Line of loc * string


then parse each Line into a list of tokens and flatten the result into one
single token stream:

type token =
   OPEN_BLOCK of loc  (* fake "{" *)
 | CLOSE_BLOCK of loc (* fake "}" *)
 | SEPARATOR of loc   (* fake ";" *)
 | ... (* your language-specific tokens here *)


The token stream could then be processed by ocamlyacc/menhir.


That's the approach I would follow if I had to solve this problem again.



Martin

-- 
http://mjambon.com/


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-06-11 13:44 ` Martin Jambon
@ 2009-06-12  8:20   ` Andrej Bauer
  2009-06-12 12:56     ` Martin Jambon
                       ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Andrej Bauer @ 2009-06-12  8:20 UTC (permalink / raw)
  To: caml-list

Thanks to Andreas, I'll have a look at the "old" code.

I think I understand the general idea of inserting "virtual" tokens,
but the details confuse me still. So starting with

> if True:
>     x = 3
>     y = (2 +
>       4 + 5)
> else:
>     x = 5
>     if False:
>         x = 8
>         z = 2

Martin suggests the following:

> {
> if True:
> ;
>    {
>    x = 3
>    ;
>    y = (2 +
>    ;
>      {
>      4 + 5)
>      }
>    }
> ;
> else:
> ;
>    {
>    x = 5
>    ;
>    if False:
>    ;
>        {
>        x = 8
>        ;
>        z = 2
>        }
>    }
> }

I have two questions. Notice that the { ... } and ( ... ) need not be
correctly nested (in the top half), so how are we going to deal with
this? The second question is, why are there the separators after and
just before "else:". I would expect separators inside { .... }, but
not around "else".

Presumably the intermediate stage that I would preprocess the token
stream would have to know about indentation levels. I have not tried
this, but ocaml lexer will correctly match things like

| '\n' [' ' '\t']* -> { INDENTATION (compute_indentation (lexeme buf)) }

Yes?

With kind regards,

Andrej


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-06-12  8:20   ` Andrej Bauer
@ 2009-06-12 12:56     ` Martin Jambon
  2009-06-12 13:34     ` Martin Jambon
  2009-06-12 15:43     ` Andreas Rossberg
  2 siblings, 0 replies; 22+ messages in thread
From: Martin Jambon @ 2009-06-12 12:56 UTC (permalink / raw)
  To: Andrej Bauer; +Cc: caml-list

Andrej Bauer wrote:
> Thanks to Andreas, I'll have a look at the "old" code.
> 
> I think I understand the general idea of inserting "virtual" tokens,
> but the details confuse me still. So starting with
> 
>> if True:
>>     x = 3
>>     y = (2 +
>>       4 + 5)
>> else:
>>     x = 5
>>     if False:
>>         x = 8
>>         z = 2
> 
> Martin suggests the following:
> 
>> {
>> if True:
>> ;
>>    {
>>    x = 3
>>    ;
>>    y = (2 +
>>    ;
>>      {
>>      4 + 5)
>>      }
>>    }
>> ;
>> else:
>> ;
>>    {
>>    x = 5
>>    ;
>>    if False:
>>    ;
>>        {
>>        x = 8
>>        ;
>>        z = 2
>>        }
>>    }
>> }
> 
> I have two questions. Notice that the { ... } and ( ... ) need not be
> correctly nested (in the top half), so how are we going to deal with
> this? The second question is, why are there the separators after and
> just before "else:". I would expect separators inside { .... }, but
> not around "else".

Original example:

if True:
    x = 3
    y = (2 +
      4 + 5)
else:
    x = 5
    if False:
        x = 8
        z = 2


For pure indentation concerns, it is equivalent to:

x
  x
  x
    x
x
  x
  x
    x
    x


Which is parsed into:

[
  Line;
  Block
    [
      Line;
      Line;
      Block
       [
         Line
       ]
    ];
  Line;
  Block
    [
       Line;
       Line
    ];
  Block
    [
      Line;
      Line
    ]
]


I wrote the following code, which does the job.  You might want to use
ocamllex instead in order to better manage newline characters (CRLF...), line
number directives and allow input from something else than a file or in_channel.


Note that the following must be rejected:

x
    x
  x (indentation here could be only 0, 4 or more)


But this is accepted:

x
    x
x
  x


You could also enforce that the indentation of a block must be the current
indentation + k, for example k=2 for the whole input.



(******************* indent_parser.ml **********************)

type indent_line = Lexing.position * (int * string)

type indent_tree =
    [ `Line of (Lexing.position * string)
    | `Block of (Lexing.position * indent_tree list) ]


let split s =
  let len = String.length s in
  let result = ref None in
  try
    for i = 0 to len - 1 do
      if s.[i] <> ' ' then (
	result := Some (i, String.sub s i (len - i));
	raise Exit
      )
    done;
     None
  with Exit -> !result

let parse_lines fname ic : indent_line list =
  let lines = ref [] in
  let lnum = ref 0 in
  try
    while true do
      let bol = pos_in ic in
      let s = input_line ic in
      incr lnum;
      match split s with
	  None -> ()
	| Some ((n, _) as x) ->
	    let pos = {
	      Lexing.pos_fname = fname;
	      pos_lnum = !lnum;
	      pos_bol = bol;
	      pos_cnum = bol + n;
	    } in
	    lines := (pos, x) :: !lines
    done;
    assert false
  with End_of_file -> List.rev !lines

let parse_lines_from_file fname =
  let ic = open_in fname in
  try
    let x = parse_lines fname ic in
    close_in ic;
    x
  with e ->
    close_in_noerr ic;
    raise e

let error pos msg =
  let cpos = pos.Lexing.pos_cnum - pos.Lexing.pos_bol in
  let msg =
    Printf.sprintf "File %S, line %i, characters %i-%i:\n%s"
      pos.Lexing.pos_fname pos.Lexing.pos_lnum 0 cpos msg
  in
  failwith msg

let rec block_body cur_indent sub_indent cur_block l :
    indent_tree list * indent_line list =
  match l with
      [] -> (List.rev cur_block, [])
    | (pos, (n, s)) :: tl ->
	if n = cur_indent then
	  block_body cur_indent sub_indent (`Line (pos, s) :: cur_block) tl
	else if n > cur_indent then (
	  (match sub_indent with
	       None -> ()
	     | Some n' ->
		 if n <> n' then
		   error pos "Inconsistent indentation"
	  );
	  let sub_block, remaining =
	    block_body n None [ `Line (pos, s) ] tl in

	  block_body
	    cur_indent (Some n) (`Block (pos, sub_block) :: cur_block)
	    remaining
	)
	else
	  (List.rev cur_block, l)


let parse_indentation fname =
  let l = parse_lines_from_file fname in
  let result, remaining = block_body 0 None [] l in
  assert (remaining = []);
  result


let test () =
  let fname = Filename.temp_file "test" ".ind" in
  let oc = open_out fname in
  output_string oc "
if True:
    x = 3
    y = (2 +
      4 + 5)
else:
    x = 5
    if False:
        x = 8
        z = 2
";
  close_out oc;

  try
    let result = parse_indentation fname in
    Sys.remove fname;
    result
  with Failure msg as e ->
    Printf.eprintf "%s\n%!" msg;
    Sys.remove fname;
    raise e


(*****************************************************************)






> Presumably the intermediate stage that I would preprocess the token
> stream would have to know about indentation levels. I have not tried
> this, but ocaml lexer will correctly match things like
> 
> | '\n' [' ' '\t']* -> { INDENTATION (compute_indentation (lexeme buf)) }
> 
> Yes?

Kind of.  Don't discard the rest of the line...
If you have a choice, reject tabs.
Beware of CRLF newlines (\r\n) and missing \n before the end of file.
Also ocamllex does not keep track of newlines automatically.  See the
documentation for Lexing.lexbuf.



Martin

-- 
http://mjambon.com/


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-06-12  8:20   ` Andrej Bauer
  2009-06-12 12:56     ` Martin Jambon
@ 2009-06-12 13:34     ` Martin Jambon
  2009-06-12 15:43     ` Andreas Rossberg
  2 siblings, 0 replies; 22+ messages in thread
From: Martin Jambon @ 2009-06-12 13:34 UTC (permalink / raw)
  To: Andrej Bauer; +Cc: caml-list

Andrej Bauer wrote:
> Thanks to Andreas, I'll have a look at the "old" code.
> 
> I think I understand the general idea of inserting "virtual" tokens,
> but the details confuse me still. So starting with
> 
>> if True:
>>     x = 3
>>     y = (2 +
>>       4 + 5)
>> else:
>>     x = 5
>>     if False:
>>         x = 8
>>         z = 2
> 
> Martin suggests the following:
> 
>> {
>> if True:
>> ;
>>    {
>>    x = 3
>>    ;
>>    y = (2 +
>>    ;
>>      {
>>      4 + 5)
>>      }
>>    }
>> ;
>> else:
>> ;
>>    {
>>    x = 5
>>    ;
>>    if False:
>>    ;
>>        {
>>        x = 8
>>        ;
>>        z = 2
>>        }
>>    }
>> }
> 
> I have two questions. Notice that the { ... } and ( ... ) need not be
> correctly nested (in the top half), so how are we going to deal with
> this?

It depends on the characteristics of your language.
It is generally easier to use several successive passes rather than trying to
do everything in one pass.


Martin

-- 
http://mjambon.com/


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-06-12  8:20   ` Andrej Bauer
  2009-06-12 12:56     ` Martin Jambon
  2009-06-12 13:34     ` Martin Jambon
@ 2009-06-12 15:43     ` Andreas Rossberg
  2009-06-30 18:58       ` Yitzhak Mandelbaum
  2 siblings, 1 reply; 22+ messages in thread
From: Andreas Rossberg @ 2009-06-12 15:43 UTC (permalink / raw)
  To: Andrej Bauer; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 2630 bytes --]

On Jun 12, 2009, at 10.20 h, Andrej Bauer wrote:

> I think I understand the general idea of inserting "virtual" tokens,
> but the details confuse me still. So starting with
>
>> if True:
>>    x = 3
>>    y = (2 +
>>      4 + 5)
>> else:
>>    x = 5
>>    if False:
>>        x = 8
>>        z = 2
>
> Martin suggests the following:
>
>> {
>> if True:
>> ;
>>   {
>>   x = 3
>>   ;
>>   y = (2 +
>>   ;
>>     {
>>     4 + 5)
>>     }
>>   }
>> ;
>> else:
>> ;
>>   {
>>   x = 5
>>   ;
>>   if False:
>>   ;
>>       {
>>       x = 8
>>       ;
>>       z = 2
>>       }
>>   }
>> }
>
> I have two questions. Notice that the { ... } and ( ... ) need not be
> correctly nested (in the top half), so how are we going to deal with
> this? The second question is, why are there the separators after and
> just before "else:". I would expect separators inside { .... }, but
> not around "else".

It depends on how exactly you define your layout rules. The usual  
approach is to tie start of layout-sensitive blocks to particular  
keywords -- this is essentially what Python and Haskell do. In that  
case, the binding to y is not affected. Haskell's rules for optional  
layout would rewrite your original program as

>> if True:
>>    {x = 3
>>    ;y = (2 +
>>      4 + 5)
>> }else:
>>    {x = 5
>>    ;if False:
>>        {x = 8
>>        ;z = 2
>> }}

The basic rules are fairly simple:

1. Insert "{" (assume width 0) before the first token following a  
layout keyword (usually ":" in Python). This opens a block.

2. As long as inside a block, insert ";" before each token that is on  
the _same_ column as the current (i.e. innermost) "{".

3. A block ends as soon as you see a line whose first token is _left_  
of the current "{". Insert "}" before that token.

Blocks can be nested, so you need to maintain a stack of starting  
columns in the parser. Note that rule 3 may end several blocks at  
once. EOF is treated as a token at column 0.

The way I implemented this is by wrapping the ocamllex-generated lexer  
with a function that compares each token's column with the top of the  
layout stack and inserts auxiliary tokens as necessary.

Haskell has another rule for inserting "}" if there would be a parse  
error without it (this is to allow inline blocks). This rule is pretty  
fudgy, and almost impossible to implement properly with a conventional  
parser generator. IMO, the only sane way to reformulate this rule is  
again to tie it to specific keywords, e.g. insert "}" before "else" if  
missing. This can be implemented in the parser by making closing  
braces optional in the right places.

- Andreas


[-- Attachment #2: Type: text/html, Size: 6455 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-06-12 15:43     ` Andreas Rossberg
@ 2009-06-30 18:58       ` Yitzhak Mandelbaum
  2009-06-30 20:19         ` Mike Lin
  2009-06-30 22:06         ` Andreas Rossberg
  0 siblings, 2 replies; 22+ messages in thread
From: Yitzhak Mandelbaum @ 2009-06-30 18:58 UTC (permalink / raw)
  To: Andreas Rossberg; +Cc: Andrej Bauer, caml-list

[-- Attachment #1: Type: text/plain, Size: 3721 bytes --]

To restart this thread, do your solutions handle the following (legal)  
variation of the original example?

if True:
    x = 3+4
    y = (2 +
4 + 5)
    z = 5
else:
    x = 5
    if False:
        x = 8
        z = 2


Notice that the assignment of y wraps onto the next line at an  
*earlier* column. This is legal b/c it is surrounded by parens.  
However, it seems that the preprocessing approaches will fail for this  
example. Do you have a workaround?

--Yitzhak


On Jun 12, 2009, at 11:43 AM, Andreas Rossberg wrote:

> On Jun 12, 2009, at 10.20 h, Andrej Bauer wrote:
>
>> I think I understand the general idea of inserting "virtual" tokens,
>> but the details confuse me still. So starting with
>>
>>> if True:
>>>    x = 3
>>>    y = (2 +
>>>      4 + 5)
>>> else:
>>>    x = 5
>>>    if False:
>>>        x = 8
>>>        z = 2
>>
>> Martin suggests the following:
>>
>>> {
>>> if True:
>>> ;
>>>   {
>>>   x = 3
>>>   ;
>>>   y = (2 +
>>>   ;
>>>     {
>>>     4 + 5)
>>>     }
>>>   }
>>> ;
>>> else:
>>> ;
>>>   {
>>>   x = 5
>>>   ;
>>>   if False:
>>>   ;
>>>       {
>>>       x = 8
>>>       ;
>>>       z = 2
>>>       }
>>>   }
>>> }
>>
>> I have two questions. Notice that the { ... } and ( ... ) need not be
>> correctly nested (in the top half), so how are we going to deal with
>> this? The second question is, why are there the separators after and
>> just before "else:". I would expect separators inside { .... }, but
>> not around "else".
>
> It depends on how exactly you define your layout rules. The usual  
> approach is to tie start of layout-sensitive blocks to particular  
> keywords -- this is essentially what Python and Haskell do. In that  
> case, the binding to y is not affected. Haskell's rules for optional  
> layout would rewrite your original program as
>
>>> if True:
>>>    {x = 3
>>>    ;y = (2 +
>>>      4 + 5)
>>> }else:
>>>    {x = 5
>>>    ;if False:
>>>        {x = 8
>>>        ;z = 2
>>> }}
>
> The basic rules are fairly simple:
>
> 1. Insert "{" (assume width 0) before the first token following a  
> layout keyword (usually ":" in Python). This opens a block.
>
> 2. As long as inside a block, insert ";" before each token that is  
> on the _same_ column as the current (i.e. innermost) "{".
>
> 3. A block ends as soon as you see a line whose first token is  
> _left_ of the current "{". Insert "}" before that token.
>
> Blocks can be nested, so you need to maintain a stack of starting  
> columns in the parser. Note that rule 3 may end several blocks at  
> once. EOF is treated as a token at column 0.
>
> The way I implemented this is by wrapping the ocamllex-generated  
> lexer with a function that compares each token's column with the top  
> of the layout stack and inserts auxiliary tokens as necessary.
>
> Haskell has another rule for inserting "}" if there would be a parse  
> error without it (this is to allow inline blocks). This rule is  
> pretty fudgy, and almost impossible to implement properly with a  
> conventional parser generator. IMO, the only sane way to reformulate  
> this rule is again to tie it to specific keywords, e.g. insert "}"  
> before "else" if missing. This can be implemented in the parser by  
> making closing braces optional in the right places.
>
> - Andreas
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs

--------------------------------------------------
Yitzhak Mandelbaum
AT&T Labs - Research

http://www.research.att.com/~yitzhak



[-- Attachment #2: Type: text/html, Size: 10100 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-06-30 18:58       ` Yitzhak Mandelbaum
@ 2009-06-30 20:19         ` Mike Lin
  2009-06-30 22:06         ` Andreas Rossberg
  1 sibling, 0 replies; 22+ messages in thread
From: Mike Lin @ 2009-06-30 20:19 UTC (permalink / raw)
  To: Yitzhak Mandelbaum; +Cc: Andreas Rossberg, caml-list, Andrej Bauer

More generally, you've got parentheses, comments, and string literals,
and you need to know to ignore whitespace within any of those -- and
to ignore e.g. an open parenthesis that occurs within a comment, or a
close comment that occurs within a string literal. So inevitably
you've got to lex and parse at some level to make this work for a
practical language.
There are still some byzantine cases that ocaml+twt doesn't handle
properly, and I think it probably gets pretty close to the minimally
complex yet practically usable approach to this.

Mike

On Tue, Jun 30, 2009 at 2:58 PM, Yitzhak
Mandelbaum<yitzhak@research.att.com> wrote:
> To restart this thread, do your solutions handle the following (legal)
> variation of the original example?
> if True:
>    x = 3+4
>    y = (2 +
> 4 + 5)
>    z = 5
> else:
>    x = 5
>    if False:
>        x = 8
>        z = 2
>
> Notice that the assignment of y wraps onto the next line at an *earlier*
> column. This is legal b/c it is surrounded by parens. However, it seems that
> the preprocessing approaches will fail for this example. Do you have a
> workaround?
> --Yitzhak
>
> On Jun 12, 2009, at 11:43 AM, Andreas Rossberg wrote:
>
> On Jun 12, 2009, at 10.20 h, Andrej Bauer wrote:
>
> I think I understand the general idea of inserting "virtual" tokens,
> but the details confuse me still. So starting with
>
> if True:
>
>    x = 3
>
>    y = (2 +
>
>      4 + 5)
>
> else:
>
>    x = 5
>
>    if False:
>
>        x = 8
>
>        z = 2
>
> Martin suggests the following:
>
> {
>
> if True:
>
> ;
>
>   {
>
>   x = 3
>
>   ;
>
>   y = (2 +
>
>   ;
>
>     {
>
>     4 + 5)
>
>     }
>
>   }
>
> ;
>
> else:
>
> ;
>
>   {
>
>   x = 5
>
>   ;
>
>   if False:
>
>   ;
>
>       {
>
>       x = 8
>
>       ;
>
>       z = 2
>
>       }
>
>   }
>
> }
>
> I have two questions. Notice that the { ... } and ( ... ) need not be
> correctly nested (in the top half), so how are we going to deal with
> this? The second question is, why are there the separators after and
> just before "else:". I would expect separators inside { .... }, but
> not around "else".
>
> It depends on how exactly you define your layout rules. The usual approach
> is to tie start of layout-sensitive blocks to particular keywords -- this is
> essentially what Python and Haskell do. In that case, the binding to y is
> not affected. Haskell's rules for optional layout would rewrite your
> original program as
>
> if True:
>
>    {x = 3
>
>    ;y = (2 +
>
>      4 + 5)
>
> }else:
>
>    {x = 5
>
>    ;if False:
>
>        {x = 8
>
>        ;z = 2
>
> }}
>
> The basic rules are fairly simple:
> 1. Insert "{" (assume width 0) before the first token following a layout
> keyword (usually ":" in Python). This opens a block.
> 2. As long as inside a block, insert ";" before each token that is on the
> _same_ column as the current (i.e. innermost) "{".
> 3. A block ends as soon as you see a line whose first token is _left_ of the
> current "{". Insert "}" before that token.
> Blocks can be nested, so you need to maintain a stack of starting columns in
> the parser. Note that rule 3 may end several blocks at once. EOF is treated
> as a token at column 0.
> The way I implemented this is by wrapping the ocamllex-generated lexer with
> a function that compares each token's column with the top of the layout
> stack and inserts auxiliary tokens as necessary.
> Haskell has another rule for inserting "}" if there would be a parse error
> without it (this is to allow inline blocks). This rule is pretty fudgy, and
> almost impossible to implement properly with a conventional parser
> generator. IMO, the only sane way to reformulate this rule is again to tie
> it to specific keywords, e.g. insert "}" before "else" if missing. This can
> be implemented in the parser by making closing braces optional in the right
> places.
> - Andreas
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
> --------------------------------------------------
> Yitzhak Mandelbaum
> AT&T Labs - Research
> http://www.research.att.com/~yitzhak
>
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-06-30 18:58       ` Yitzhak Mandelbaum
  2009-06-30 20:19         ` Mike Lin
@ 2009-06-30 22:06         ` Andreas Rossberg
  2009-07-01  2:13           ` Mike Lin
  1 sibling, 1 reply; 22+ messages in thread
From: Andreas Rossberg @ 2009-06-30 22:06 UTC (permalink / raw)
  To: Yitzhak Mandelbaum, Mike Lin; +Cc: Andrej Bauer, caml-list

On Jun 30, 2009, at 20.58 h, Yitzhak Mandelbaum wrote:
> To restart this thread, do your solutions handle the following  
> (legal) variation of the original example?
>
> if True:
>    x = 3+4
>    y = (2 +
> 4 + 5)
>    z = 5
> else:
>    x = 5
>    if False:
>        x = 8
>        z = 2
>
> Notice that the assignment of y wraps onto the next line at an  
> *earlier* column. This is legal b/c it is surrounded by parens.  
> However, it seems that the preprocessing approaches will fail for  
> this example. Do you have a workaround?

Handling that wouldn't be hard with the approach I described. You only  
need to have parentheses start a dummy block at column 0, so that no  
tokens will be inserted inside it .

However, I don't think it is actually a good idea to allow this. (And  
I don't think Haskell does, but I may misremember.)

On Jun 30, 2009, at 22.19 h, Mike Lin wrote:
> More generally, you've got parentheses, comments, and string literals,
> and you need to know to ignore whitespace within any of those -- and
> to ignore e.g. an open parenthesis that occurs within a comment, or a
> close comment that occurs within a string literal. So inevitably
> you've got to lex and parse at some level to make this work for a
> practical language.

Comments and string literals are no problem, since the lex wrapper  
will never see anything inside them as separate tokens anyway.

- Andreas


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-06-30 22:06         ` Andreas Rossberg
@ 2009-07-01  2:13           ` Mike Lin
  2009-07-01  7:31             ` Andreas Rossberg
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Lin @ 2009-07-01  2:13 UTC (permalink / raw)
  To: Andreas Rossberg; +Cc: Yitzhak Mandelbaum, Andrej Bauer, caml-list

On Tue, Jun 30, 2009 at 6:06 PM, Andreas Rossberg<rossberg@mpi-sws.org> wrote:
> On Jun 30, 2009, at 22.19 h, Mike Lin wrote:
>>
>> More generally, you've got parentheses, comments, and string literals,
>> and you need to know to ignore whitespace within any of those -- and
>> to ignore e.g. an open parenthesis that occurs within a comment, or a
>> close comment that occurs within a string literal. So inevitably
>> you've got to lex and parse at some level to make this work for a
>> practical language.
>
> Comments and string literals are no problem, since the lex wrapper will
> never see anything inside them as separate tokens anyway.

OCaml comments can be nested, and must be nested parenthetically. :)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-07-01  2:13           ` Mike Lin
@ 2009-07-01  7:31             ` Andreas Rossberg
  2009-07-01 14:02               ` Mike Lin
  0 siblings, 1 reply; 22+ messages in thread
From: Andreas Rossberg @ 2009-07-01  7:31 UTC (permalink / raw)
  To: Mike Lin; +Cc: Yitzhak Mandelbaum, Andrej Bauer, caml-list

[-- Attachment #1: Type: text/plain, Size: 456 bytes --]

On Jul 1, 2009, at 04.13 h, Mike Lin wrote:
>>
>> Comments and string literals are no problem, since the lex wrapper  
>> will
>> never see anything inside them as separate tokens anyway.
>
> OCaml comments can be nested, and must be nested parenthetically. :)


I know. (Haskell has nested comments, too, btw.) That does not make a  
difference, though - it is all handled by lex already. In fact, my  
code handled nested comments just fine.

- Andreas


[-- Attachment #2: Type: text/html, Size: 879 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-07-01  7:31             ` Andreas Rossberg
@ 2009-07-01 14:02               ` Mike Lin
  2009-07-01 14:17                 ` Andreas Rossberg
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Lin @ 2009-07-01 14:02 UTC (permalink / raw)
  To: Andreas Rossberg; +Cc: Yitzhak Mandelbaum, Andrej Bauer, caml-list

On Wed, Jul 1, 2009 at 3:31 AM, Andreas Rossberg<rossberg@mpi-sws.org> wrote:
>
> I know. (Haskell has nested comments, too, btw.) That does not make a
> difference, though - it is all handled by lex already. In fact, my code
> handled nested comments just fine.

OK, now I'm curious :) how does your lexer match balanced parentheses,
or in this case comments?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-07-01 14:02               ` Mike Lin
@ 2009-07-01 14:17                 ` Andreas Rossberg
  2009-07-01 14:21                   ` Andreas Rossberg
  2009-07-01 15:03                   ` Sylvain Le Gall
  0 siblings, 2 replies; 22+ messages in thread
From: Andreas Rossberg @ 2009-07-01 14:17 UTC (permalink / raw)
  To: Mike Lin; +Cc: caml-list

Mike Lin wrote:
> OK, now I'm curious :) how does your lexer match balanced parentheses,
> or in this case comments?
>   

Easily, with a bit of side effects (I think that's roughly how all ML 
compilers do it):

------------------------------------------------
let error l s = (* ... *)
let commentDepth = ref 0
let start = ref 0
let loc length = let pos = !start in (pos, pos+length)

rule lex =
    parse eof            { EOF }
    (* | ... *)
    | "{-"            { start := pos lexbuf;
                  lexNestComment lexbuf }

and lexNestComment =
    parse eof            { error (loc 2) "unterminated comment" }
    | "(*"            { incr commentDepth;
                  lexNestComment lexbuf }
    | "*)"            { decr commentDepth;
                  if !commentDepth > 0
                  then lexNestComment lexbuf
                  else lex lexbuf }
    | _            { lexNestComment lexbuf }
------------------------------------------------

If you also want to treat strings in comments specially (like OCaml), 
then you need to do a bit more work, but it's basically the same idea.

- Andreas


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-07-01 14:17                 ` Andreas Rossberg
@ 2009-07-01 14:21                   ` Andreas Rossberg
  2009-07-01 14:37                     ` Mike Lin
  2009-07-01 15:03                   ` Sylvain Le Gall
  1 sibling, 1 reply; 22+ messages in thread
From: Andreas Rossberg @ 2009-07-01 14:21 UTC (permalink / raw)
  To: Mike Lin; +Cc: caml-list

I wrote:
> ------------------------------------------------
> let error l s = (* ... *)
> let commentDepth = ref 0
> let start = ref 0
> let loc length = let pos = !start in (pos, pos+length)
>
> rule lex =
>    parse eof            { EOF }
>    (* | ... *)
>    | "{-"            { start := pos lexbuf;
>                  lexNestComment lexbuf }

Sorry, the "{-" should be "(*" for ML. And the (* | ... *) is supposed 
to stand for all other tokens in your lexer.

- Andreas


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] ocamllex and python-style indentation
  2009-07-01 14:21                   ` Andreas Rossberg
@ 2009-07-01 14:37                     ` Mike Lin
  0 siblings, 0 replies; 22+ messages in thread
From: Mike Lin @ 2009-07-01 14:37 UTC (permalink / raw)
  To: Andreas Rossberg; +Cc: caml-list

That's cheating!!!!!! It's clever :) But clearly, you have a little
parser with a stack there. My original contribution to this thread was
just that "you've got to lex and parse at some level" before doing the
whitespace conversion. Not that you were claiming otherwise, but it
seemed like others had alluded to simpler ways which would only work
for toy languages.


On Wed, Jul 1, 2009 at 10:21 AM, Andreas Rossberg<rossberg@mpi-sws.org> wrote:
> I wrote:
>>
>> ------------------------------------------------
>> let error l s = (* ... *)
>> let commentDepth = ref 0
>> let start = ref 0
>> let loc length = let pos = !start in (pos, pos+length)
>>
>> rule lex =
>>   parse eof            { EOF }
>>   (* | ... *)
>>   | "{-"            { start := pos lexbuf;
>>                 lexNestComment lexbuf }
>
> Sorry, the "{-" should be "(*" for ML. And the (* | ... *) is supposed to
> stand for all other tokens in your lexer.
>
> - Andreas
>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ocamllex and python-style indentation
  2009-07-01 14:17                 ` Andreas Rossberg
  2009-07-01 14:21                   ` Andreas Rossberg
@ 2009-07-01 15:03                   ` Sylvain Le Gall
  2009-07-01 15:16                     ` [Caml-list] " Andreas Rossberg
  2009-07-01 15:19                     ` [Caml-list] " Martin Jambon
  1 sibling, 2 replies; 22+ messages in thread
From: Sylvain Le Gall @ 2009-07-01 15:03 UTC (permalink / raw)
  To: caml-list

Hello,

On 01-07-2009, Andreas Rossberg <rossberg@mpi-sws.org> wrote:
> Mike Lin wrote:
>> OK, now I'm curious :) how does your lexer match balanced parentheses,
>> or in this case comments?
>>   
>
> Easily, with a bit of side effects (I think that's roughly how all ML 
> compilers do it):
>
> ------------------------------------------------
> let error l s = (* ... *)
> let commentDepth = ref 0
> let start = ref 0
> let loc length = let pos = !start in (pos, pos+length)
>
> rule lex =
>     parse eof            { EOF }
>     (* | ... *)
>     | "{-"            { start := pos lexbuf;
>                   lexNestComment lexbuf }
>
> and lexNestComment =
>     parse eof            { error (loc 2) "unterminated comment" }
>     | "(*"            { incr commentDepth;
>                   lexNestComment lexbuf }
>     | "*)"            { decr commentDepth;
>                   if !commentDepth > 0
>                   then lexNestComment lexbuf
>                   else lex lexbuf }
>     | _            { lexNestComment lexbuf }
> ------------------------------------------------
>
> If you also want to treat strings in comments specially (like OCaml), 
> then you need to do a bit more work, but it's basically the same idea.
>

May I recommend you to write this in a more simple way:

-------------------------------------------------------------------------
rule lex =
  parse eof    { () }
  | "(*"       { start := pos lexbuf; lexNestComment lexbuf; lex lexbuf }

and lexNestComment =
  parse eof    { error (loc 2) "unterminated comment" }
| "(*"         { lexNestComment lexbuf }
| "*)"         { () }
| _            { lexNestComment lexbuf }
-------------------------------------------------------------------------

I think it works the same way, except that it uses less global
variables.

Regards,
Sylvain Le Gall


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] Re: ocamllex and python-style indentation
  2009-07-01 15:03                   ` Sylvain Le Gall
@ 2009-07-01 15:16                     ` Andreas Rossberg
  2009-07-01 16:26                       ` Sylvain Le Gall
  2009-07-01 15:19                     ` [Caml-list] " Martin Jambon
  1 sibling, 1 reply; 22+ messages in thread
From: Andreas Rossberg @ 2009-07-01 15:16 UTC (permalink / raw)
  To: Sylvain Le Gall; +Cc: caml-list

Sylvain Le Gall wrote:
> May I recommend you to write this in a more simple way:
>
> -------------------------------------------------------------------------
> rule lex =
>   parse eof    { () }
>   | "(*"       { start := pos lexbuf; lexNestComment lexbuf; lex lexbuf }
>
> and lexNestComment =
>   parse eof    { error (loc 2) "unterminated comment" }
> | "(*"         { lexNestComment lexbuf }
> | "*)"         { () }
> | _            { lexNestComment lexbuf }
> -------------------------------------------------------------------------
>   

Mh, I think in lexNestComment it should at least say

 | "(*"         { lexNestComment lexbuf; lexNestComment lexbuf }


That might work. I am not sure how well the various lexer generators 
handle arbitrary recursive invocations and reentrance, though. Have you 
tried it? If it works, yes, that's certainly a nicer version.

- Andreas


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] Re: ocamllex and python-style indentation
  2009-07-01 15:03                   ` Sylvain Le Gall
  2009-07-01 15:16                     ` [Caml-list] " Andreas Rossberg
@ 2009-07-01 15:19                     ` Martin Jambon
  2009-07-01 15:43                       ` Andreas Rossberg
  1 sibling, 1 reply; 22+ messages in thread
From: Martin Jambon @ 2009-07-01 15:19 UTC (permalink / raw)
  To: Sylvain Le Gall; +Cc: caml-list

Sylvain Le Gall wrote:
> Hello,
> 
> On 01-07-2009, Andreas Rossberg <rossberg@mpi-sws.org> wrote:
>> Mike Lin wrote:
>>> OK, now I'm curious :) how does your lexer match balanced parentheses,
>>> or in this case comments?
>>>   
>> Easily, with a bit of side effects (I think that's roughly how all ML 
>> compilers do it):
>>
>> ------------------------------------------------
>> let error l s = (* ... *)
>> let commentDepth = ref 0
>> let start = ref 0
>> let loc length = let pos = !start in (pos, pos+length)
>>
>> rule lex =
>>     parse eof            { EOF }
>>     (* | ... *)
>>     | "{-"            { start := pos lexbuf;
>>                   lexNestComment lexbuf }
>>
>> and lexNestComment =
>>     parse eof            { error (loc 2) "unterminated comment" }
>>     | "(*"            { incr commentDepth;
>>                   lexNestComment lexbuf }
>>     | "*)"            { decr commentDepth;
>>                   if !commentDepth > 0
>>                   then lexNestComment lexbuf
>>                   else lex lexbuf }
>>     | _            { lexNestComment lexbuf }
>> ------------------------------------------------
>>
>> If you also want to treat strings in comments specially (like OCaml), 
>> then you need to do a bit more work, but it's basically the same idea.
>>
> 
> May I recommend you to write this in a more simple way:
> 
> -------------------------------------------------------------------------
> rule lex =
>   parse eof    { () }
>   | "(*"       { start := pos lexbuf; lexNestComment lexbuf; lex lexbuf }
> 
> and lexNestComment =
>   parse eof    { error (loc 2) "unterminated comment" }
> | "(*"         { lexNestComment lexbuf }
> | "*)"         { () }
> | _            { lexNestComment lexbuf }
> -------------------------------------------------------------------------
> 
> I think it works the same way, except that it uses less global
> variables.


You can even get rid of global variables completely:


rule lex x = parse
  eof    { () }
| "(*"   { x.start <- pos lexbuf; lexNestComment x lexbuf; lex x lexbuf }

and lexNestComment x = parse
  eof          { error (loc x 2) "unterminated comment" }
| "(*"         { lexNestComment x lexbuf }
| "*)"         { () }
| _            { lexNestComment x lexbuf }





Martin

-- 
http://mjambon.com/


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Caml-list] Re: ocamllex and python-style indentation
  2009-07-01 15:19                     ` [Caml-list] " Martin Jambon
@ 2009-07-01 15:43                       ` Andreas Rossberg
  0 siblings, 0 replies; 22+ messages in thread
From: Andreas Rossberg @ 2009-07-01 15:43 UTC (permalink / raw)
  To: Martin Jambon; +Cc: Sylvain Le Gall, caml-list

Martin Jambon wrote:
> You can even get rid of global variables completely:
>
>
> rule lex x = parse
>   eof    { () }
> | "(*"   { x.start <- pos lexbuf; lexNestComment x lexbuf; lex x lexbuf }
>
> and lexNestComment x = parse
>   eof          { error (loc x 2) "unterminated comment" }
> | "(*"         { lexNestComment x lexbuf }
> | "*)"         { () }
> | _            { lexNestComment x lexbuf }
>   

Nicer indeed. To my defense I have to say that I wrote the code long 
before rule arguments were introduced into ocamllex - I said it was 
really old... :-)

- Andreas


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ocamllex and python-style indentation
  2009-07-01 15:16                     ` [Caml-list] " Andreas Rossberg
@ 2009-07-01 16:26                       ` Sylvain Le Gall
  0 siblings, 0 replies; 22+ messages in thread
From: Sylvain Le Gall @ 2009-07-01 16:26 UTC (permalink / raw)
  To: caml-list

On 01-07-2009, Andreas Rossberg <rossberg@mpi-sws.org> wrote:
> Sylvain Le Gall wrote:
>> May I recommend you to write this in a more simple way:
>>
>> -------------------------------------------------------------------------
>> rule lex =
>>   parse eof    { () }
>>   | "(*"       { start := pos lexbuf; lexNestComment lexbuf; lex lexbuf }
>>
>> and lexNestComment =
>>   parse eof    { error (loc 2) "unterminated comment" }
>> | "(*"         { lexNestComment lexbuf }
>> | "*)"         { () }
>> | _            { lexNestComment lexbuf }
>> -------------------------------------------------------------------------
>>   
>
> Mh, I think in lexNestComment it should at least say
>
> | "(*"         { lexNestComment lexbuf; lexNestComment lexbuf }
>
>
> That might work. I am not sure how well the various lexer generators 
> handle arbitrary recursive invocations and reentrance, though. Have you 
> tried it? If it works, yes, that's certainly a nicer version.
>

Yes, you're right, here it is:
------------------------------------------------------------------------
{
let start = ref 0
let error l s = failwith "toto"
let loc length = let pos = !start in (pos, pos+length)
let pos _ = 0
}

rule lex =
  parse eof    { () }
  | "(*"       { start := pos lexbuf; lexNestComment lexbuf; lex lexbuf
} 
  | _          { print_string (Lexing.lexeme lexbuf); lex lexbuf }   

and lexNestComment =
  parse eof    { error (loc 2) "unterminated comment" }
| "(*"         { lexNestComment lexbuf; lexNestComment lexbuf }
| "*)"         { () }
| _            { lexNestComment lexbuf }


{
  let chn =
    open_in "nested.ml"
  in
    lex (Lexing.from_channel chn);
    close_in chn
}
------------------------------------------------------------------------

and nested.ml:
-----------------------------------
(* Comment 1 *)

"Comment1 ok";;

(* Commenbt 2 (* test *) *)

"Comment 2 ok";;

(* totot ... (* (* *) (**) *)*)

"Comment 3 ok";;
-----------------------------------

and the result:
----------------------------------------------------
ocamllex lexer.mll
9 states, 260 transitions, table size 1094 bytes
ocamlc -o lexer lexer.ml
./lexer



"Comment1 ok";;



"Comment 2 ok";;



"Comment 3 ok";;

-----------------------------------------------------

It works ;-)

Regards,
Sylvain Le Gall


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-07-01 16:27 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-11 12:57 ocamllex and python-style indentation Andrej Bauer
2009-06-11 13:12 ` [Caml-list] " yoann padioleau
2009-06-11 13:21 ` Andreas Rossberg
2009-06-11 13:44 ` Martin Jambon
2009-06-12  8:20   ` Andrej Bauer
2009-06-12 12:56     ` Martin Jambon
2009-06-12 13:34     ` Martin Jambon
2009-06-12 15:43     ` Andreas Rossberg
2009-06-30 18:58       ` Yitzhak Mandelbaum
2009-06-30 20:19         ` Mike Lin
2009-06-30 22:06         ` Andreas Rossberg
2009-07-01  2:13           ` Mike Lin
2009-07-01  7:31             ` Andreas Rossberg
2009-07-01 14:02               ` Mike Lin
2009-07-01 14:17                 ` Andreas Rossberg
2009-07-01 14:21                   ` Andreas Rossberg
2009-07-01 14:37                     ` Mike Lin
2009-07-01 15:03                   ` Sylvain Le Gall
2009-07-01 15:16                     ` [Caml-list] " Andreas Rossberg
2009-07-01 16:26                       ` Sylvain Le Gall
2009-07-01 15:19                     ` [Caml-list] " Martin Jambon
2009-07-01 15:43                       ` Andreas Rossberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).