caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Partial parsing
@ 2006-07-27 22:37 Markus Mottl
  2006-07-27 22:57 ` [Caml-list] " Fermin Reig
  2006-07-28  9:58 ` Jean-Marie Gaillourdet
  0 siblings, 2 replies; 5+ messages in thread
From: Markus Mottl @ 2006-07-27 22:37 UTC (permalink / raw)
  To: ocaml

Hi,

one feature that I'd love to see with our lexing/parsing tools for
OCaml would be the possibility to perform partial lexing / parsing.

For example, imagine that you read a message from a socket into a
string buffer, but it is not yet complete.  You may not be able to
know that in advance, because that would require parsing it.

What I'd like to be able to do is to e.g. call a lexer or parser
function on this buffer, and it will recognize as much as possible
with the option to continue parsing in another buffer.  For this we
could e.g. define a type "parse_result":

  type parse_result =
    | Done of result * int
    | Cont of parse_fun
  and parse_fun = pos : int -> len : int -> string -> parse_result

If a full parse was detected, it would return e.g. "Done (result,
next_pos)", where "result" is the result of the successful parse, and
"next_pos" is the position in the buffer that contains the character
following the successfully parsed text.

If the buffer did not contain enough data to finish the parse, the
function would return e.g. "Cont f", where "f" can be called to
continue parsing in a new buffer starting at position "pos" and
reading at most "len" characters, e.g. in the case when the rest of
the message arrives in the input buffer.

So far this kind of parsing can only be implemented by hand, which is
very tedious to do and inevitable if one wants to read self-delimiting
messages of variable size from sources that return partial data.  I
don't see any inherent problem extending the code generators for
ocamllex and ocamlyacc to support this feature.

Has anybody already implemented a generic solution to this kind of
parsing problem?  Maybe Menhir, which seems like an ambitious parser
project, could add this feature to support partial parsing (of course,
this would require cooperation from the lexer, too)?

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Partial parsing
  2006-07-27 22:37 Partial parsing Markus Mottl
@ 2006-07-27 22:57 ` Fermin Reig
  2006-07-28 13:39   ` Sebastien Ferre
  2006-07-28  9:58 ` Jean-Marie Gaillourdet
  1 sibling, 1 reply; 5+ messages in thread
From: Fermin Reig @ 2006-07-27 22:57 UTC (permalink / raw)
  To: ocaml; +Cc: Markus Mottl

Markus Mottl wrote:
> Hi,
> 
> one feature that I'd love to see with our lexing/parsing tools for
> OCaml would be the possibility to perform partial lexing / parsing.
> 
> For example, imagine that you read a message from a socket into a
> string buffer, but it is not yet complete.  You may not be able to
> know that in advance, because that would require parsing it.
> 
> What I'd like to be able to do is to e.g. call a lexer or parser
> function on this buffer, and it will recognize as much as possible
> with the option to continue parsing in another buffer.  For this we
> could e.g. define a type "parse_result":
> 
>  type parse_result =
>    | Done of result * int
>    | Cont of parse_fun
>  and parse_fun = pos : int -> len : int -> string -> parse_result
> 
> If a full parse was detected, it would return e.g. "Done (result,
> next_pos)", where "result" is the result of the successful parse, and
> "next_pos" is the position in the buffer that contains the character
> following the successfully parsed text.
 > [...]

This sounds like a parsing problem suitable for the parser combinator 
approach. There, the result of a parse is a pair where one of the two 
components is the remaining of the input string.

Many of the papers and implementations are from the Haskell folk, but I 
think there are some in Ocaml as well.

HTH,
Fermin



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Partial parsing
  2006-07-27 22:37 Partial parsing Markus Mottl
  2006-07-27 22:57 ` [Caml-list] " Fermin Reig
@ 2006-07-28  9:58 ` Jean-Marie Gaillourdet
  2006-07-28 14:01   ` Markus Mottl
  1 sibling, 1 reply; 5+ messages in thread
From: Jean-Marie Gaillourdet @ 2006-07-28  9:58 UTC (permalink / raw)
  To: Markus Mottl; +Cc: ocaml

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

On 28.07.2006, at 00:37, Markus Mottl wrote:

> Hi,
>
> one feature that I'd love to see with our lexing/parsing tools for
> OCaml would be the possibility to perform partial lexing / parsing.
>
> For example, imagine that you read a message from a socket into a
> string buffer, but it is not yet complete.  You may not be able to
> know that in advance, because that would require parsing it.
>
> What I'd like to be able to do is to e.g. call a lexer or parser
> function on this buffer, and it will recognize as much as possible
> with the option to continue parsing in another buffer.  For this we
> could e.g. define a type "parse_result":
>
>  type parse_result =
>    | Done of result * int
>    | Cont of parse_fun
>  and parse_fun = pos : int -> len : int -> string -> parse_result
>
> If a full parse was detected, it would return e.g. "Done (result,
> next_pos)", where "result" is the result of the successful parse, and
> "next_pos" is the position in the buffer that contains the character
> following the successfully parsed text.
>
> If the buffer did not contain enough data to finish the parse, the
> function would return e.g. "Cont f", where "f" can be called to
> continue parsing in a new buffer starting at position "pos" and
> reading at most "len" characters, e.g. in the case when the rest of
> the message arrives in the input buffer.
>
> So far this kind of parsing can only be implemented by hand, which is
> very tedious to do and inevitable if one wants to read self-delimiting
> messages of variable size from sources that return partial data.  I
> don't see any inherent problem extending the code generators for
> ocamllex and ocamlyacc to support this feature.
>
> Has anybody already implemented a generic solution to this kind of
> parsing problem?  Maybe Menhir, which seems like an ambitious parser
> project, could add this feature to support partial parsing (of course,
> this would require cooperation from the lexer, too)?

You could implement such a feature on top of the existing lexer and  
parser technologies by using thread. Dedicate one thread to the lexer  
and parser, make the input buffer blocking and  make every suitable  
intermediate result available to the main thread.

I think it should also be possible to construct such a feature with  
the resumable exceptions module posted on this list some weeks ago.

And I think it should be possible to implement that with the  
continuations posted here, too.

But I have not much experience with continuations. And I am not sure  
whether that really fits into the framework of ocamllex/ocamlyacc.

As Fermin Reig pointed out parser combinators provide that feature  
out of the box. AFAIK, they depend on lazy evaluation. I am not sure  
whether those libraries for OCaml are available to hide that or not.

Best regards,
   Jean-Marie
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEyd/nNIUNP/I5YOgRAmWJAJ9WijR0B0f1FT/4smsum7Ln4qWXPACeLxkB
4Hp1UPcEseMzFYzGZfLhbYc=
=IqFi
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Partial parsing
  2006-07-27 22:57 ` [Caml-list] " Fermin Reig
@ 2006-07-28 13:39   ` Sebastien Ferre
  0 siblings, 0 replies; 5+ messages in thread
From: Sebastien Ferre @ 2006-07-28 13:39 UTC (permalink / raw)
  To: ocaml

Hi,

Fermin Reig wrote:
> This sounds like a parsing problem suitable for the parser combinator 
> approach. There, the result of a parse is a pair where one of the two 
> components is the remaining of the input string.
> 
> Many of the papers and implementations are from the Haskell folk, but I 
> think there are some in Ocaml as well.

Indeed. Look at the library Ostaps in the Caml hump.

HTH,
Sebastien


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Partial parsing
  2006-07-28  9:58 ` Jean-Marie Gaillourdet
@ 2006-07-28 14:01   ` Markus Mottl
  0 siblings, 0 replies; 5+ messages in thread
From: Markus Mottl @ 2006-07-28 14:01 UTC (permalink / raw)
  To: Jean-Marie Gaillourdet; +Cc: ocaml

On 7/28/06, Jean-Marie Gaillourdet <jm@gaillourdet.net> wrote:
> You could implement such a feature on top of the existing lexer and
> parser technologies by using thread. Dedicate one thread to the lexer
> and parser, make the input buffer blocking and  make every suitable
> intermediate result available to the main thread.

I'd rather not depend on extra threads for parsing.  Besides of them
making things complicated, in my case I also need to be able to deal
with nonblocking / interruptable input functions.

> But I have not much experience with continuations. And I am not sure
> whether that really fits into the framework of ocamllex/ocamlyacc.

I think it would work well: instead of returning a token stream, the
lexer should return a stream of either matched tokens or
continuations.  The parser would consume the recognized tokens, or
return another continuation that would continue lexing/parsing.

> As Fermin Reig pointed out parser combinators provide that feature
> out of the box. AFAIK, they depend on lazy evaluation. I am not sure
> whether those libraries for OCaml are available to hide that or not.

Sure, implementing some parser combinators might be an option, but I'm
otherwise quite happy with the ease of specification and the high
efficiency of ocamllex/ocamlyacc generated parsers.  It's just that
interrupting and resuming parsing is currently not an option.

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2006-07-28 14:01 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-07-27 22:37 Partial parsing Markus Mottl
2006-07-27 22:57 ` [Caml-list] " Fermin Reig
2006-07-28 13:39   ` Sebastien Ferre
2006-07-28  9:58 ` Jean-Marie Gaillourdet
2006-07-28 14:01   ` Markus Mottl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).