caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Richard Jones <rich@annexia.org>
Cc: yoann padioleau <padator@wanadoo.fr>, caml-list@inria.fr
Subject: Re: [Caml-list] Memory usage/ garbage collection question
Date: Fri, 14 Oct 2005 12:07:06 +0200
Message-ID: <1129284426.12434.110.camel@localhost.localdomain>
In-Reply-To: <20051014101018.GA13302@furbychan.cocan.org>

On Friday, 14.10.2005, at 11:10 +0100, Richard Jones wrote:
> On Fri, Oct 14, 2005 at 11:36:57AM +0200, yoann padioleau wrote:
> > >   List.iter (
> > >     fun row ->
> > >       (* put row into database and forget about it *)
> > >   ) rows;
> > >   (* no further references to rows after this *)
> >
> > rows is still accessible after the List.iter, so it is normal
> > that it is not garbage collected.
> 
> I agree that rows is "accessible", but it's not actually used.  My
> understanding is that the GC would be prevented from considering the
> list for collection if the pointer to the head of the list (ie. rows)
> was stored on the heap or in a register somewhere.  Would this be the
> case here?
> 
> > I had the same kind of problem, and to optimize it I chose to
> > produce the elements of rows lazily (but then I had another problem
> > with the Lazy module, where elements were not garbage collected, so I
> > used my own lazy module (simple, via closures) and it works perfectly
> > well).
> 
> Unfortunately this isn't really an option here.  The rows list comes
> from a huge XML doc which is parsed by PXP and passed through some
> complex post-processing; PXP doesn't support incremental processing of
> XML docs, and the post-processing would be tricky to convert too.

PXP has a pull parser. You get the XML document as a lazy stream of XML
events. I don't know your document format, but if it is something like

<document>
  <record>...</record>
  <record>...</record>
  ...  lots of them ...
</document>

I would recommend using the pull parser and then creating XML trees for
the individual records only (you can mix both styles).
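
To illustrate the idea, here is a minimal sketch of mixing the two styles.
It deliberately does not use PXP's actual API: the event type, the tree
type, and the functions read_element, iter_records and pull_of_list are
all hypothetical stand-ins. The point is only that a tree is built for one
<record> at a time and becomes garbage as soon as the callback returns, so
the whole document never has to be in memory at once.

```ocaml
(* Hypothetical event type; a real pull parser defines its own. *)
type event =
  | Start_tag of string
  | End_tag of string
  | Data of string

(* A small tree covering a single record only. *)
type tree =
  | Element of string * tree list
  | Text of string

(* A pull function returns the next event, or None at end of input. *)
type pull = unit -> event option

(* Build the subtree of an element whose start tag was already consumed. *)
let rec read_element (next : pull) (name : string) : tree =
  let rec children acc =
    match next () with
    | Some (Start_tag n) -> children (read_element next n :: acc)
    | Some (Data s) -> children (Text s :: acc)
    | Some (End_tag n) when n = name -> List.rev acc
    | Some (End_tag n) -> failwith ("unexpected end tag: " ^ n)
    | None -> failwith "unexpected end of input"
  in
  Element (name, children [])

(* Call [f] on each <record> tree in turn. Nothing keeps a reference to
   the tree after [f] returns, so the GC is free to reclaim it. *)
let iter_records (next : pull) (f : tree -> unit) : unit =
  let rec loop () =
    match next () with
    | None -> ()
    | Some (Start_tag "record") ->
        f (read_element next "record");
        loop ()
    | Some _ -> loop () (* skip the <document> wrapper and anything else *)
  in
  loop ()

(* Stand-in event source for experiments: pulls events from a list. *)
let pull_of_list (events : event list) : pull =
  let rest = ref events in
  fun () ->
    match !rest with
    | [] -> None
    | e :: tl ->
        rest := tl;
        Some e
```

In real use, the pull function would come from the parser rather than from
pull_of_list, and the callback passed to iter_records would put the row
into the database and drop it.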

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Telefon: 06151/153855                  Telefax: 06151/997714
------------------------------------------------------------
