caml-list - the Caml user's mailing list
* yet another silly question on PXP
@ 2005-02-22 17:07 Paul Argentoff
  2005-02-22 17:34 ` [Caml-list] " Jerome Simeon
  2005-02-22 18:25 ` Paul Argentoff
  0 siblings, 2 replies; 12+ messages in thread
From: Paul Argentoff @ 2005-02-22 17:07 UTC
  To: caml-list

Hello world!

I have recently found features in PXP called "pull parser" and "event
interface". I hope they can help me with problems such as parsing XMPP
streams or parsing huge files using OCaml lazy streams (to avoid "Out of
memory" errors). Can anybody suggest a URL or another place to read more
about these? For now I'm reading the PXP source comments and the version
notes on its site.

Thanks.
-- 
Yours truly, WBR, Paul Argentoff.
Jabber:	paul@jabber.rtelekom.ru
RIPE:	PA1291-RIPE



* Re: [Caml-list] yet another silly question on PXP
  2005-02-22 17:07 yet another silly question on PXP Paul Argentoff
@ 2005-02-22 17:34 ` Jerome Simeon
  2005-02-22 18:25 ` Paul Argentoff
  1 sibling, 0 replies; 12+ messages in thread
From: Jerome Simeon @ 2005-02-22 17:34 UTC
  To: Paul Argentoff; +Cc: caml-list

These are just the pull variant of a SAX parser.
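
To make the push/pull distinction concrete, here is a small self-contained
OCaml sketch. The event type and the sample document are hypothetical
stand-ins (not PXP's Pxp_types.event); the only point is who drives the
iteration. In the push (SAX) style the producer calls a handler for every
event; in the pull style the consumer asks for the next event and may stop
whenever it likes.

```ocaml
(* Hypothetical event type; PXP's real one is Pxp_types.event. *)
type event =
  | Start of string  (* start tag *)
  | Data of string   (* character data *)
  | End of string    (* end tag *)

(* A made-up document, already split into events. *)
let doc = [ Start "a"; Data "hello"; Start "b"; End "b"; End "a" ]

(* Push (SAX) style: the producer drives and calls [handler] for every event. *)
let sax_iter (handler : event -> unit) : unit = List.iter handler doc

(* Example push consumer: count start tags through a callback. *)
let count_starts () : int =
  let n = ref 0 in
  sax_iter (function Start _ -> incr n | _ -> ());
  !n

(* Pull style: the consumer asks for the next event; None means end of stream. *)
let make_pull (events : event list) : unit -> event option =
  let rest = ref events in
  fun () ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev

(* A pull consumer decides when to stop, e.g. after at most [n] events. *)
let take (n : int) (next : unit -> event option) : event list =
  let rec loop n acc =
    if n = 0 then List.rev acc
    else
      match next () with
      | None -> List.rev acc
      | Some ev -> loop (n - 1) (ev :: acc)
  in
  loop n []
```

For example, take 2 (make_pull doc) returns only the first two events and
never touches the rest of the document, which a callback-driven consumer
cannot do without extra machinery.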

People at BEA have done some work on this (they call it a token stream):

Daniela Florescu, Chris Hillery, Donald Kossmann, Paul Lucas, Fabio 
Riccardi, Till Westmann, Michael J. Carey, Arvind Sundararajan, Geetika 
Agrawal: The BEA/XQRL Streaming XQuery Processor. VLDB 2003: 997-1008

http://www.informatik.uni-trier.de/~ley/db/conf/vldb/vldb2003.html#FlorescuHKLRWCSA03

The XTiSP system, which was presented at PLAN-X in January, seems to have
something similar as well:
  XTiSP, presented by Keisuke Nakano (UTokyo): http://xtisp.psdlab.org/

XML pull token streams are also used extensively inside Galax's query
engine.

There are probably other projects using those.
- Jerome




* Re: [Caml-list] yet another silly question on PXP
  2005-02-22 17:07 yet another silly question on PXP Paul Argentoff
  2005-02-22 17:34 ` [Caml-list] " Jerome Simeon
@ 2005-02-22 18:25 ` Paul Argentoff
  2005-02-22 19:03   ` Gerd Stolpmann
  1 sibling, 1 reply; 12+ messages in thread
From: Paul Argentoff @ 2005-02-22 18:25 UTC
  To: caml-list

Dear Paul Argentoff,

Let PA = "Paul Argentoff" in
  written_by PA => 

 PA> Hello world!  I have recently found features in PXP called "pull
 PA> parser" and "event interface". I hope they can help me with problems
 PA> such as parsing XMPP streams or parsing huge files using OCaml lazy
 PA> streams (to avoid "Out of memory" errors). Can anybody suggest a URL or
 PA> another place to read more about these? For now I'm reading the PXP
 PA> source comments and the version notes on its site.

One more question: where can I find any documentation (besides the source
comments) on the pxp-pp library? How do I use it?

-- 
Yours truly, WBR, Paul Argentoff.
Jabber:	paul@jabber.rtelekom.ru
RIPE:	PA1291-RIPE



* Re: [Caml-list] yet another silly question on PXP
  2005-02-22 18:25 ` Paul Argentoff
@ 2005-02-22 19:03   ` Gerd Stolpmann
  2005-02-24  7:49     ` Paul Argentoff
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Gerd Stolpmann @ 2005-02-22 19:03 UTC
  To: Paul Argentoff; +Cc: caml-list

On Tuesday, 22.02.2005, at 21:25 +0300, Paul Argentoff wrote:
> One more question: where can I find any documentation (besides the source
> comments) on the pxp-pp library? How do I use it?

See the file doc/PREPROCESSOR which is part of the distribution tarball.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------




* Re: [Caml-list] yet another silly question on PXP
  2005-02-22 19:03   ` Gerd Stolpmann
@ 2005-02-24  7:49     ` Paul Argentoff
  2005-02-24 12:11     ` Paul Argentoff
  2005-02-25 16:14     ` :pxp_evpull notation (was: yet another silly question on PXP) Paul Argentoff
  2 siblings, 0 replies; 12+ messages in thread
From: Paul Argentoff @ 2005-02-24  7:49 UTC
  To: Gerd Stolpmann; +Cc: caml-list

Dear Gerd Stolpmann,

Let GS = "Gerd Stolpmann" in
  written_by GS => 

 GS> See the file doc/PREPROCESSOR

Thanks.

-- 
Yours truly, WBR, Paul Argentoff.
Jabber:	paul@jabber.rtelekom.ru
RIPE:	PA1291-RIPE



* Re: [Caml-list] yet another silly question on PXP
  2005-02-22 19:03   ` Gerd Stolpmann
  2005-02-24  7:49     ` Paul Argentoff
@ 2005-02-24 12:11     ` Paul Argentoff
  2005-02-25  7:35       ` Paul Argentoff
  2005-02-25 16:14     ` :pxp_evpull notation (was: yet another silly question on PXP) Paul Argentoff
  2 siblings, 1 reply; 12+ messages in thread
From: Paul Argentoff @ 2005-02-24 12:11 UTC
  To: Gerd Stolpmann; +Cc: caml-list

Dear Gerd Stolpmann,

Let GS = "Gerd Stolpmann" in
  written_by GS => 

 GS> See the file doc/PREPROCESSOR which is part of the distribution
 GS> tarball.

OK. But I can't get code that uses it to compile with OCamlMakefile. Is
there any way to do that?

-- 
Yours truly, WBR, Paul Argentoff.
Jabber:	paul@jabber.rtelekom.ru
RIPE:	PA1291-RIPE



* Re: [Caml-list] yet another silly question on PXP
  2005-02-24 12:11     ` Paul Argentoff
@ 2005-02-25  7:35       ` Paul Argentoff
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Argentoff @ 2005-02-25  7:35 UTC
  To: Gerd Stolpmann; +Cc: caml-list

Dear Paul Argentoff,

Let PA = "Paul Argentoff" in
  written_by PA => 

 PA> But I can't get code that uses it to compile with OCamlMakefile. Is
 PA> there any way to do that?

Here's the workaround I found:

In the first line of each file to be preprocessed I write (*pp sh pp.sh *).
That is the standard OCamlMakefile convention, except that the preprocessor
is a custom sh script which the Makefile generates as a .PHONY target.
Here's the relevant fragment of my Makefile:

PACKS= zip \
       equeue \
       netclient \
       pxp-engine \
       pxp-ulex-utf8 \
       pxp-pp \
       annexlib \
       postgresql \
       dbi

PPPACKS= netstring \
	 pcre

USE_CAMLP4 = yes
PPLIBS = unix.cma \
	 pcre.cma \
	 netstring.cma \
	 pxp_pp.cma

PRE_TARGETS = pp.sh

.PHONY: pp.sh

# pp.sh is regenerated on every build. It holds a single camlp4o command
# line; the compiler invokes it as "sh pp.sh <file>", so $1 is the source.
pp.sh:
	echo -n "camlp4o" >pp.sh
	$(foreach pack, ${PACKS}, echo -n " -I `ocamlfind query ${pack}`" >>pp.sh;) \
	$(foreach pack, ${PPPACKS}, echo -n " -I `ocamlfind query ${pack}`" >>pp.sh;) \
	echo -n " -I `ocamlc -where`" >>pp.sh
	$(foreach lib, ${PPLIBS}, echo -n " ${lib}" >>pp.sh;) \
	echo -n ' "$$1"' >>pp.sh

The latter part is not very elegant, but it's the best I could come up
with last night after reading the GNU make manuals...

-- 
Yours truly, WBR, Paul Argentoff.
Jabber:	paul@jabber.rtelekom.ru
RIPE:	PA1291-RIPE



* :pxp_evpull notation (was: yet another silly question on PXP)
  2005-02-22 19:03   ` Gerd Stolpmann
  2005-02-24  7:49     ` Paul Argentoff
  2005-02-24 12:11     ` Paul Argentoff
@ 2005-02-25 16:14     ` Paul Argentoff
  2005-02-27 19:05       ` Gerd Stolpmann
  2 siblings, 1 reply; 12+ messages in thread
From: Paul Argentoff @ 2005-02-25 16:14 UTC
  To: Gerd Stolpmann; +Cc: caml-list

Dear Gerd Stolpmann,

Let GS = "Gerd Stolpmann" in
  written_by GS => 

 GS> See the file doc/PREPROCESSOR which is part of the distribution
 GS> tarball.

Thanks again for the reference. My next question is about the :pxp_evpull
notation. Can I write a construct like this:

let pile = <:pxp_evpull<
             <foo> (: some_fun () :) >>

where some_fun generates a further "subtree" using the same pxp_evpull
notation?

What I actually need is a converter from a huge (>100 MB) text file (or a
string Stream.t) to a huge XML file. Of course, I need to do the whole job
with lazy streams to avoid out-of-memory exceptions.

-- 
Yours truly, WBR, Paul Argentoff.
Jabber:	paul@jabber.rtelekom.ru
RIPE:	PA1291-RIPE



* Re: :pxp_evpull notation (was: yet another silly question on PXP)
  2005-02-25 16:14     ` :pxp_evpull notation (was: yet another silly question on PXP) Paul Argentoff
@ 2005-02-27 19:05       ` Gerd Stolpmann
  2005-02-28 10:24         ` :pxp_evpull notation Paul Argentoff
  0 siblings, 1 reply; 12+ messages in thread
From: Gerd Stolpmann @ 2005-02-27 19:05 UTC
  To: Paul Argentoff; +Cc: caml-list

On Friday, 25.02.2005, at 19:14 +0300, Paul Argentoff wrote:
> Thanks again for the reference. My next question is about the :pxp_evpull
> notation. Can I write a construct like this:
> 
> let pile = <:pxp_evpull<
>              <foo> (: some_fun () :) >>
> 
> where some_fun generates a further "subtree" using the same pxp_evpull
> notation?

Yes, this works. some_fun is called when the events for the children of
foo are generated. The antiquoted expression must be an event generator,
i.e. you must have

some_fun () : unit -> Pxp_types.event option

and this generator is called repeatedly until it returns None.

pxp_evpull generates automata in which every state returns an event.
External functions like some_fun are represented as loops: the automaton
stays in the same state as long as the function returns Some _, and moves
on to the following state when it returns None.

For your example, <:pxp_evpull< <foo> (: some_fun () :) >>, the
automaton is:

let _ =
  let _eid = Pxp_dtd.Entity.create_entity_id () in
  let rec _generator =
    let _state = ref 0 in
    fun _arg ->
      match !_state with
        0 ->
          let ev = Pxp_types.E_start_tag ("foo", [], None, _eid) in
          _state := 1; Some ev
      | 1 ->
          begin match some_fun () _arg with
            None -> _state := 2; _generator _arg
          | Some Pxp_types.E_end_of_stream -> _generator _arg
          | Some ev -> Some ev
          end
      | 2 ->
          let ev = Pxp_types.E_end_tag ("foo", _eid) in _state := 3; Some ev
      | 3 -> None
      | _ -> assert false
  in
  _generator

(output generated with "camlp4 -I ... pa_o.cmo pa_op.cmo pcre.cma
unix.cma netstring.cma pxp_pp.cma pr_o.cmo sample.ml")

some_fun can even be another pxp_evtree automaton.
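
To see the loop state and the nesting in isolation, here is a hand-written
OCaml sketch of the same automaton shape. It is not pxp-pp output: the event
type is a hypothetical stand-in for Pxp_types.event, and element, text,
drain, make_doc and restarted are illustrative names. Judging from the
generated code above, where state 1 evaluates some_fun () _arg on every
pull, it seems safer to create a nested generator once and pass it in (as
make_doc does). If the antiquoted expression built a fresh automaton on each
evaluation, it would presumably keep returning its first event and never
reach None; restarted shows that restart effect on a small scale.

```ocaml
(* Hypothetical stand-in for Pxp_types.event, so this compiles without PXP. *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string
  | E_end_of_stream

(* Wrap a child generator in <name>...</name>, mirroring states 0..3 above. *)
let element (name : string) (child : unit -> event option) : unit -> event option =
  let state = ref 0 in
  let rec gen () =
    match !state with
    | 0 -> state := 1; Some (E_start_tag name)
    | 1 ->
        (* Loop state: stay here as long as the child yields events. *)
        (match child () with
         | None -> state := 2; gen ()
         | Some E_end_of_stream -> gen ()  (* drop the child's end marker *)
         | Some _ as ev -> ev)
    | 2 -> state := 3; Some (E_end_tag name)
    | 3 -> None
    | _ -> assert false
  in
  gen

(* A one-shot child generator producing a single text node. *)
let text (s : string) : unit -> event option =
  let used = ref false in
  fun () -> if !used then None else (used := true; Some (E_char_data s))

(* Pull all events into a list; only sensible for small examples. *)
let drain (gen : unit -> event option) : event list =
  let rec loop acc =
    match gen () with
    | None -> List.rev acc
    | Some ev -> loop (ev :: acc)
  in
  loop []

(* Nesting: the inner automaton is created once and passed in as the child. *)
let make_doc () = element "foo" (element "bar" (text "hi"))

(* Pitfall: a fresh child on every pull yields its first event again each time. *)
let restarted = List.init 3 (fun _ -> (text "hi") ())
```

For example, drain (make_doc ()) should give the events for
<foo><bar>hi</bar></foo>: start foo, start bar, the text "hi", end bar,
end foo.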

> What I actually need is a converter from a huge (>100 MB) text file (or a
> string Stream.t) to a huge XML file. Of course, I need to do the whole job
> with lazy streams to avoid out-of-memory exceptions.

Pull parsers are your friend. They were created with such applications
in mind.
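
For the text-to-XML task, a pull pipeline might look roughly like the
following sketch. Everything in it is an illustrative assumption rather than
PXP API: the event type, the one-<line>-element-per-input-line output format,
and the names events_of_lines, serialize and lines_of_channel. The point is
that the generator reads the next input line only when the consumer needs
more events, and the consumer writes each event as soon as it gets it, so
memory use stays bounded by roughly one line, whatever the input size.

```ocaml
(* Hypothetical event type and output format: each input line becomes
   <line>TEXT</line> inside a single <doc> root. *)
type event = Start of string | Text of string | End of string

type phase = Begin | Body | Done

(* Escape the characters that are special in XML character data. *)
let escape (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Pull generator: reads the next input line only when the events produced
   from the previous line have all been consumed. *)
let events_of_lines (next_line : unit -> string option) : unit -> event option =
  let pending = Queue.create () in
  let phase = ref Begin in
  fun () ->
    if not (Queue.is_empty pending) then Some (Queue.pop pending)
    else
      match !phase with
      | Begin -> phase := Body; Some (Start "doc")
      | Body ->
          (match next_line () with
           | Some line ->
               Queue.push (Text line) pending;
               Queue.push (End "line") pending;
               Some (Start "line")
           | None -> phase := Done; Some (End "doc"))
      | Done -> None

(* Consumer: write each event as soon as it is pulled. *)
let serialize (out : string -> unit) (next : unit -> event option) : unit =
  let rec loop () =
    match next () with
    | None -> ()
    | Some (Start n) -> out ("<" ^ n ^ ">"); loop ()
    | Some (Text s) -> out (escape s); loop ()
    | Some (End n) -> out ("</" ^ n ^ ">"); loop ()
  in
  loop ()

(* Line source reading from a channel, e.g. a large text file. *)
let lines_of_channel (ic : in_channel) : unit -> string option =
 fun () -> try Some (input_line ic) with End_of_file -> None
```

On a real file one could run something like
serialize print_string (events_of_lines (lines_of_channel ic)). With PXP,
the hand-written events_of_lines would presumably be replaced by a
pxp_evpull automaton along the lines described above.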

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------




* Re: :pxp_evpull notation
  2005-02-27 19:05       ` Gerd Stolpmann
@ 2005-02-28 10:24         ` Paul Argentoff
  2005-02-28 10:39           ` Gerd Stolpmann
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Argentoff @ 2005-02-28 10:24 UTC
  To: Gerd Stolpmann; +Cc: caml-list

Dear Gerd Stolpmann,

Let GS = "Gerd Stolpmann" in
  written_by GS => 

 GS> some_fun can even be another pxp_evtree automaton.

pxp_evtree? That sounds a bit new. I cannot find such a notation in PXP
1.95. Or are you speaking figuratively?

-- 
Yours truly, WBR, Paul Argentoff.
Jabber:	paul@jabber.rtelekom.ru
RIPE:	PA1291-RIPE



* Re: :pxp_evpull notation
  2005-02-28 10:24         ` :pxp_evpull notation Paul Argentoff
@ 2005-02-28 10:39           ` Gerd Stolpmann
  2005-02-28 11:00             ` Paul Argentoff
  0 siblings, 1 reply; 12+ messages in thread
From: Gerd Stolpmann @ 2005-02-28 10:39 UTC
  To: Paul Argentoff; +Cc: caml-list

On Monday, 28.02.2005, at 13:24 +0300, Paul Argentoff wrote:
>  GS> some_fun can even be another pxp_evtree automaton.
> 
> pxp_evtree? That sounds a bit new. I cannot find such a notation in PXP
> 1.95. Or are you speaking figuratively?

Sorry, I meant pxp_evpull.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------




* Re: :pxp_evpull notation
  2005-02-28 10:39           ` Gerd Stolpmann
@ 2005-02-28 11:00             ` Paul Argentoff
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Argentoff @ 2005-02-28 11:00 UTC
  To: Gerd Stolpmann; +Cc: caml-list

Dear Gerd Stolpmann,

Let GS = "Gerd Stolpmann" in
  written_by GS => 

 GS> Sorry, I meant pxp_evpull.

Never mind ;). I'm still working through your last message ;)

-- 
Yours truly, WBR, Paul Argentoff.
Jabber:	paul@jabber.rtelekom.ru
RIPE:	PA1291-RIPE



end of thread

Thread overview: 12+ messages
2005-02-22 17:07 yet another silly question on PXP Paul Argentoff
2005-02-22 17:34 ` [Caml-list] " Jerome Simeon
2005-02-22 18:25 ` Paul Argentoff
2005-02-22 19:03   ` Gerd Stolpmann
2005-02-24  7:49     ` Paul Argentoff
2005-02-24 12:11     ` Paul Argentoff
2005-02-25  7:35       ` Paul Argentoff
2005-02-25 16:14     ` :pxp_evpull notation (was: yet another silly question on PXP) Paul Argentoff
2005-02-27 19:05       ` Gerd Stolpmann
2005-02-28 10:24         ` :pxp_evpull notation Paul Argentoff
2005-02-28 10:39           ` Gerd Stolpmann
2005-02-28 11:00             ` Paul Argentoff
