caml-list - the Caml user's mailing list
From: Martin Jambon <martin.jambon@ens-lyon.org>
To: Andrej Bauer <andrej.bauer@andrej.com>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] ocamllex and python-style indentation
Date: Thu, 11 Jun 2009 15:44:59 +0200
Message-ID: <4A310A5B.9010404@ens-lyon.org>
In-Reply-To: <7d8707de0906110557n6a1511a2k9f4f00827f954cb6@mail.gmail.com>

Andrej Bauer wrote:
> My parsing powers are not sufficient to easily come up with
> lexer/parser for a simple language that uses python-style indentation
> and newline rules. Does anyone have such a thing lying around, written
> in ocamllex/yacc or menhir? I would appreciate a peek to see how
> you've dealt with it.
> 
> For example, suppose we want just a very simple fragment of Python
> involving True, False, conditional statements, variables, and
> assignments, such as:
> 
> if True:
>     x = 3
>     y = (2 +
>       4 + 5)
> else:
>     x = 5
>     if False:
>         x = 8
>         z = 2
> 
> How would I go about writing a lexer/parser for such a thing in ocaml?

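I would use a first pass that looks only at indentation and converts the
input lines into this imaginary structure, where "{", "}" and ";" are
inserted markers rather than characters of the source: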


{
if True:
;
    {
    x = 3
    ;
    y = (2 +
    ;
      {
      4 + 5)
      }
    }
;
else:
;
    {
    x = 5
    ;
    if False:
    ;
        {
        x = 8
        ;
        z = 2
        }
    }
}


You could create a generic tool that parses a file into this:

type t = Line of loc * string | Block of loc * t list
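
As a rough sketch of such a tool (not something I have tested on real Python
code), the function below builds the tree from indentation alone. The names
indent_of, parse_items and parse_string are made up for this example, loc is
assumed to be a plain line number, tabs are not handled, blank lines are
skipped, and a partial dedent is accepted where Python would reject it.

```ocaml
(* Assumption: a location is just a 1-based line number. *)
type loc = int

type t = Line of loc * string | Block of loc * t list

(* Number of leading spaces; tabs are not handled in this sketch. *)
let indent_of s =
  let n = String.length s in
  let rec go i = if i < n && s.[i] = ' ' then go (i + 1) else i in
  go 0

(* Parse the items of a block whose lines are indented by [ind] spaces.
   Returns the items of the block and the lines that were not consumed. *)
let rec parse_items ind lines =
  match lines with
  | [] -> ([], [])
  | (loc, s) :: rest ->
    let i = indent_of s in
    if i < ind then ([], lines)            (* dedent: this block ends *)
    else if i > ind then begin
      (* deeper indentation opens a nested block *)
      let inner, after_inner = parse_items i lines in
      let items, remaining = parse_items ind after_inner in
      (Block (loc, inner) :: items, remaining)
    end else
      let items, remaining = parse_items ind rest in
      (Line (loc, String.trim s) :: items, remaining)

(* Split a source string into numbered lines, drop blank ones,
   and wrap everything in one top-level block. *)
let parse_string src =
  let numbered =
    List.mapi (fun i s -> (i + 1, s)) (String.split_on_char '\n' src) in
  let non_blank =
    List.filter (fun (_, s) -> String.trim s <> "") numbered in
  let items, _ = parse_items 0 non_blank in
  Block (1, items)
```

Note that the continuation line "4 + 5)" ends up in its own nested Block,
exactly as in the structure shown earlier, because this pass knows nothing
about parentheses.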


but as suggested by Yoann, the next step should probably be to flatten this
into a stream by introducing artificial tokens:

type gen_token =
   Open of loc          (* fake "{" *)
 | Close of loc         (* fake "}" *)
 | Separator of loc     (* fake ";" *)
 | Line of loc * string
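
The flattening step could look roughly like this. It is only a sketch: the
tree type is put in a module Tree (my choice, to keep its Line constructor
apart from the gen_token one), loc is again assumed to be a line number, and
Close simply reuses the location where the block was opened.

```ocaml
type loc = int  (* assumption: a line number *)

module Tree = struct
  type t = Line of loc * string | Block of loc * t list
end

type gen_token =
    Open of loc          (* fake "{" *)
  | Close of loc         (* fake "}" *)
  | Separator of loc     (* fake ";" *)
  | Line of loc * string

(* Location of a tree node. *)
let loc_of = function Tree.Line (l, _) | Tree.Block (l, _) -> l

(* Flatten a tree into a token list, putting a Separator between
   consecutive items of a block and Open/Close around the block. *)
let rec flatten (node : Tree.t) : gen_token list =
  match node with
  | Tree.Line (loc, s) -> [Line (loc, s)]
  | Tree.Block (loc, items) ->
    let rec sep = function
      | [] -> []
      | [x] -> flatten x
      | x :: (y :: _ as rest) ->
        flatten x @ (Separator (loc_of y) :: sep rest)
    in
    (Open loc :: sep items) @ [Close loc]
```

The repeated use of @ makes this quadratic in the worst case, so a real tool
would probably accumulate into a reversed list or produce a stream instead.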


then parse each Line into a list of tokens and flatten the result into one
single token stream:

type token =
   OPEN_BLOCK of loc  (* fake "{" *)
 | CLOSE_BLOCK of loc (* fake "}" *)
 | SEPARATOR of loc   (* fake ";" *)
 | ... (* your language-specific tokens here *)


The token stream could then be processed by ocamlyacc/menhir.
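
To illustrate the last conversion, here is a sketch in which the
language-specific tokens are replaced by a single placeholder constructor
WORD, and each Line is "lexed" by splitting on spaces. Both are assumptions
made for the example; in practice tokenize_line would call an ocamllex
entry point on the line's contents.

```ocaml
type loc = int  (* assumption: a line number *)

type gen_token =
    Open of loc
  | Close of loc
  | Separator of loc
  | Line of loc * string

type token =
    OPEN_BLOCK of loc      (* fake "{" *)
  | CLOSE_BLOCK of loc     (* fake "}" *)
  | SEPARATOR of loc       (* fake ";" *)
  | WORD of loc * string   (* placeholder for language-specific tokens *)

(* Toy line lexer: one WORD per space-separated chunk. *)
let tokenize_line loc s =
  String.split_on_char ' ' s
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> WORD (loc, w))

(* Replace each generic token by the corresponding final token(s). *)
let expand (toks : gen_token list) : token list =
  List.concat
    (List.map
       (function
         | Open l -> [OPEN_BLOCK l]
         | Close l -> [CLOSE_BLOCK l]
         | Separator l -> [SEPARATOR l]
         | Line (l, s) -> tokenize_line l s)
       toks)
```

A parser generated by ocamlyacc takes a function of type
Lexing.lexbuf -> token, so the resulting list would have to be wrapped in a
closure that returns one token per call and ignores its lexbuf argument; a
real grammar would also need an end-of-input token.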


That's the approach I would follow if I had to solve this problem again.



Martin

-- 
http://mjambon.com/

