caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Paul Snively <psnively@mac.com>
To: Paul Snively <psnively@mac.com>
Cc: oleg@pobox.com, caml-list@inria.fr
Subject: Re: [Caml-list] Incremental,	undoable parsing in OCaml as the general parser inversion
Date: Wed, 4 Jul 2007 16:42:38 -0700	[thread overview]
Message-ID: <FD724B66-A0A4-4889-A32E-D14D2EDDC3A0@mac.com> (raw)
In-Reply-To: <D691E383-3CEA-4617-9D2B-0F5232B75516@mac.com>

[-- Attachment #1: Type: text/plain, Size: 289 bytes --]

I think I made some boneheaded mistakes in my previous code. I think  
I've rectified them, but the code still doesn't compile. I've  
attached my new code, which at least fails to compile in consistent,  
suggestive ways, (hopefully) instead of being trivially ridiculous.

Thanks,
Paul


[-- Attachment #2: incremental-parsing.ml --]
[-- Type: application/octet-stream, Size: 1796 bytes --]

open Cc;;
open Stream;;
open Printf;;

let lexer = Genlex.make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"];;

let stream_fn1 str = fun pos ->
  (* printf "stream_fn1: asked for char #%d\n" pos; *)
  if pos < String.length str then Some str.[pos] else None;;

let show_token_stream = Stream.iter (function 
  | Genlex.Kwd s    -> printf "Stream elem: Kwd %s\n" s
  | Genlex.Ident s  -> printf "Stream elem: Ident %s\n" s
  | Genlex.Int d    -> printf "Stream elem: int %d\n" d
  | Genlex.Float f  -> printf "Stream elem: float %g\n" f
  | Genlex.String s -> printf "Stream elem: \"%s\"\n" s
  | Genlex.Char c   -> printf "Stream elem: '%c'\n" c
);;

let test1 = show_token_stream (lexer (Stream.from (stream_fn1 
   "let x = 1 + 2 in (* comment *) \"xxx\" ^ x")));;

type stream_req = ReqDone | ReqChar of (char option -> stream_req) Cc.m;;
let stream_inv p = fun pos -> run (shiftP p (fun sk -> ReqChar (pos,sk)));;

let rec filler buffer buffer_pos = function ReqDone -> ReqDone
  | ReqChar (pos,k) as req ->
  let pos_rel = pos - buffer_pos in
  let _ = 
    (* we don't accumulate already emptied buffers. We could. *)
    assert (pos_rel >= 0) in
  if pos_rel < String.length buffer then
    (* desired char is already in buffer *)
    filler buffer buffer_pos ((run k) (Some buffer.[pos_rel]))
    else
    (* buffer underflow. Ask the user to fill the buffer *)
    req
;;

let finish (ReqChar (pos,k)) = filler "" pos ((run k) (return None));;

let cont str (ReqChar (pos,k) as req) = filler str pos req;;
let finish (ReqChar (pos,k)) = filler "" pos ((run k) (return None));;

let test2c0 = perform
  p <-- new_prompt ();
  return (pushP p (return
    (filler "" 0 (
    show_token_stream (lexer (Stream.from (stream_inv p))); ReqDone))));;

let test2c1 = cont "let x = 1 + 2 " test2c0;;

[-- Attachment #3: Type: text/plain, Size: 8990 bytes --]


On Jul 4, 2007, at 12:41 PM, Paul Snively wrote:

> Oleg,
>
> Thanks for this! I've attempted to rewrite this in terms of your  
> monadic version of delimited continuations, as included in the  
> pa_monad syntax extension distribution. I'm attaching my effort so  
> far, which I have run in an OCaml 3.09.3 toploop in which I've done:
>
> #use "topfind";;
> #camlp4o;;
> #require "monad";;
> #load "cc.cmo";;
>
> I'm having some difficulty, I think, due to stream_req being a  
> recursive type that I need to turn into a type that sometimes  
> represents a computation that returns the (recursive) type. Any  
> advice you (or others) might have would be most welcome.
>
> Many thanks and best regards,
> Paul Snively
>
> <incremental-parsing.ml>
> On Jul 3, 2007, at 4:38 AM, oleg@pobox.com wrote:
>
>>
>> This message is to show that the incremental parsing in OCaml is a
>> solved problem -- essentially for any parser. Moreover, the procedure
>> to convert from a regular parser to an incremental one is independent
>> of the parser implementation. The incremental parsing lets us parse
>> from a stream that is only partially known. The parser may
>> report what it can and will ask for more input. When more input is
>> supplied, the parsing resumes. No OS threads are required. The
>> solution to control inversion is to use delimited continuations --
>> which are designed for the purpose of giving fine control over
>> continuations in direct-style programs.
>>
>> skaller wrote:
>>> This problem is not restricted to parsers .. it's a general
>>> problem with Ocaml and also C, C++, and most other languages.
>>>
>>> My language Felix solves this problem for procedures with
>>> user space threading .. but it doesn't work with functions.
>>> [And the builtin GLR parser is purely functional:]
>>
>> The solution below works especially well with functions. That is, if
>> the parser maintains no visible state, the parsing is not only
>> incremental but is also undoable and restartable. If, after ingesting
>> the current chunk of input the parser reported a problem, we can `go
>> back' and supply a different.
>>
>>> Other languages like Scheme and I think Haskell and MLton
>>> have more general solutions because they're not restricted
>>> to the archaic concept of a linear stack.
>>
>> Linear or segmented stack is an implementation detail. The central
>> issue is delimited continuations. In fact, OCaml has a rare
>> distinction of being the first system with widely available native
>> delimited continuations (the native implementation of `control' was
>> announced for Scheme48 back at ICFP02, but it was not included in the
>> Scheme48 distribution). So, OCaml is well ahead of the pack.
>>
>> The solution is illustrated below. For simplicity, we'll be using
>> Genlex.make_lexer of OCaml 3.09 as our parser (it is quite
>> inconvenient for me at the moment to upgrade to OCaml 3.10).  Code
>> fragments (cut and pasted from the eshell buffer) are given below
>> within [[ ... ]] brackets.
>>
>> First, preliminaries:
>>
>> [[
>>
>> (* Open the DelimCC library
>>   http://pobox.com/~oleg/ftp/Computation/Continuations.html#caml- 
>> shift
>> *)
>> open Delimcc;;
>> let shift p f = take_subcont p (fun sk () ->
>>   push_prompt p (fun () -> (f (fun c ->
>>     push_prompt p (fun () -> push_subcont sk (fun () -> c))))))
>> ;;
>>
>> open Stream;;
>> open Printf;;
>> ]]
>>
>> First, we define the lexer and demonstrate non-incremental  
>> parsing. We
>> apply the lexer to a sample stream and print out the produced tokens.
>> We use the standard library Stream.iter function.
>>
>> [[
>> let lexer = Genlex.make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"];;
>>
>> let stream_fn1 str = fun pos ->
>>   (* printf "stream_fn1: asked for char #%d\n" pos; *)
>>   if pos < String.length str then Some str.[pos] else None;;
>>
>> let show_token_stream = Stream.iter (function
>>   | Genlex.Kwd s    -> printf "Stream elem: Kwd %s\n" s
>>   | Genlex.Ident s  -> printf "Stream elem: Ident %s\n" s
>>   | Genlex.Int d    -> printf "Stream elem: int %d\n" d
>>   | Genlex.Float f  -> printf "Stream elem: float %g\n" f
>>   | Genlex.String s -> printf "Stream elem: \"%s\"\n" s
>>   | Genlex.Char c   -> printf "Stream elem: '%c'\n" c
>> );;
>> ]]
>>
>> Evaluating
>> [[
>> let test1 = show_token_stream (lexer (Stream.from (stream_fn1
>>    "let x = 1 + 2 in (* comment *) \"xxx\" ^ x")));;
>> ]]
>>
>> produces the expected result
>>
>> 	Stream elem: Kwd let
>> 	Stream elem: Ident x
>> 	Stream elem: Kwd =
>> 	Stream elem: int 1
>> 	Stream elem: Kwd +
>> 	Stream elem: int 2
>> 	Stream elem: Ident in
>> 	Stream elem: "xxx"
>> 	Stream elem: Ident ^
>> 	Stream elem: Ident x
>> 	val test1 : unit = ()
>>
>>
>> We now invert the parser, in the following two lines:
>> [[
>> type stream_req = ReqDone | ReqChar of int * (char option ->  
>> stream_req);;
>> let stream_inv p = fun pos -> shift p (fun sk -> ReqChar (pos,sk));;
>> ]]
>>
>> That is it. If we wish to ask the user for chunks (rather than
>> characters) of data, we need a simple buffering layer.
>>
>> [[
>> let rec filler buffer buffer_pos = function ReqDone -> ReqDone
>>   | ReqChar (pos,k) as req ->
>>   let pos_rel = pos - buffer_pos in
>>   let _ =
>>     (* we don't accumulate already emptied buffers. We could. *)
>>     assert (pos_rel >= 0) in
>>   if pos_rel < String.length buffer then
>>     (* desired char is already in buffer *)
>>     filler buffer buffer_pos (k (Some buffer.[pos_rel]))
>>     else
>>     (* buffer underflow. Ask the user to fill the buffer *)
>>     req
>> ;;
>>
>>
>> let cont str (ReqChar (pos,k) as req) = filler str pos req;;
>> let finish (ReqChar (pos,k)) = filler "" pos (k None);;
>>
>> ]]
>>
>> We are all set. Please compare the iterator below with test1 above.
>> The composition "show_token_stream (lexer (Stream.from" is exactly as
>> before. We merely replaced the stream function stream_fn with
>> stream_inv. In calculus terms, we replaced "x" with "x+dx".
>>
>> [[
>> let test2c0 =
>>   let p = new_prompt () in
>>   push_prompt p (fun () ->
>>     filler "" 0 (
>>     show_token_stream (lexer (Stream.from (stream_inv p)));  
>> ReqDone));;
>> ]]
>>
>> Now, the result is
>>           val test2c0 : stream_req = ReqChar (0, <fun>)
>>
>> That is, the parser is suspended, awaiting the character number 0. We
>> now feed the parser chunks of input. The chunks do
>> not have to contain complete lexems. For example, a comment may start
>> in one chunk and end in another.
>>
>> [[
>> let test2c1 = cont "let x = 1 + 2 " test2c0;;
>> ]]
>> 	Stream elem: Kwd let
>> 	Stream elem: Ident x
>> 	Stream elem: Kwd =
>> 	Stream elem: int 1
>> 	Stream elem: Kwd +
>> 	Stream elem: int 2
>> 	val test2c1 : stream_req = ReqChar (14, <fun>)
>>
>> The parser ate the chunk, produced some tokens, and waits for more
>> input.
>>
>> [[
>> let test2c2 = cont "in (* com " test2c1;;
>> ]]
>>
>> 	Stream elem: Ident in
>> 	val test2c2 : stream_req = ReqChar (24, <fun>)
>>
>> Here, the chunk contains a non-terminating comment.
>>
>> [[
>> let test2c3 = cont "ment *) " test2c2;;
>> ]]
>> 	val test2c3 : stream_req = ReqChar (32, <fun>)
>>
>> [[
>> let test2c4 = cont "\"xx" test2c3;;
>> ]]
>> 	val test2c4 : stream_req = ReqChar (35, <fun>)
>>
>> [[
>> let test2c5 = cont "x\" " test2c4;;
>> ]]
>> 	Stream elem: "xxx"
>> 	val test2c5 : stream_req = ReqChar (38, <fun>)
>>
>> Finally the parser produced something, because it got the matching
>> double quote.
>>
>> [[
>> let test2c6 = cont " ^ x" test2c5;;
>> ]]
>>
>> 	Stream elem: Ident ^
>> 	val test2c6 : stream_req = ReqChar (42, <fun>)
>> [[
>> let test2c7 = finish test2c6;;
>> ]]
>>
>> 	Stream elem: Ident x
>> 	val test2c7 : stream_req = ReqDone
>>
>> And we are finished. As we said earlier, the parser can be
>> restartable and undoable. Suppose, we have changed our mind and
>> decided to continue parsing after the checkpoint test2c6:
>>
>> [[
>> let test2c71 = cont "yx * 10" test2c6;;
>> let test2c8  = finish test2c71;;
>> ]]
>>
>> 	Stream elem: Ident xyx
>> 	Stream elem: Kwd *
>> 	val test2c71 : stream_req = ReqChar (49, <fun>)
>> 	Stream elem: int 10
>> 	val test2c8 : stream_req = ReqDone
>>
>> and so we get an `alternative' parse, of an alternative stream. We  
>> can go
>> back to any checkpoint test2c1, test2c2, etc. and continue parsing
>> from that point.
>>
>> _______________________________________________
>> Caml-list mailing list. Subscription management:
>> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
>> Archives: http://caml.inria.fr
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs


  reply	other threads:[~2007-07-04 23:42 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-03 11:38 oleg
2007-07-03 15:28 ` [Caml-list] " skaller
2007-07-04 19:41 ` Paul Snively
2007-07-04 23:42   ` Paul Snively [this message]
2007-07-05  8:13     ` oleg
2007-07-05 12:39       ` skaller
2007-07-05 23:00         ` [Caml-list] Incremental, undoable parsing in OCaml as the oleg
2007-07-06  4:40           ` skaller
2007-07-05 13:23       ` [Caml-list] Incremental, undoable parsing in OCaml as the general parser inversion skaller
2007-07-05 13:54       ` Paul Snively

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=FD724B66-A0A4-4889-A32E-D14D2EDDC3A0@mac.com \
    --to=psnively@mac.com \
    --cc=caml-list@inria.fr \
    --cc=oleg@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).