caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: [Caml-list] Memory usage/ garbage collection question
@ 2005-10-14  9:36 yoann padioleau
  2005-10-14 10:10 ` Richard Jones
  0 siblings, 1 reply; 7+ messages in thread
From: yoann padioleau @ 2005-10-14  9:36 UTC (permalink / raw)
  To: Richard Jones, caml-list


> I'm trying to optimise a program which is using a large amount of
> memory and consequently thrashing.
> 
> The core of the program is an iteration over a list of something like
> a million elements which consumes about 1/2 gig of RAM.  The iteration
> is:
> 
>   List.iter (
>     fun row ->
>       (* put row into database and forget about it *)
>   ) rows;
>   (* no further references to rows after this *)
> 
> This is the stdlib implementation of List.iter.  Should the garbage
> collector be able to collect the part of the list which has been
> iterated over, during the iteration?  At the moment it doesn't look
> like it's doing so.

Because rows is still accessible after the List.iter  so it is normal that it is not garbage collected.

I had the same kind of problem and to optimize it I choose to produce the elements of rows lazily
(but then I had another problem with the Lazy modudle where elements were not garbage collected so 
I use my own lazy module (simple via closure) and it works perfectly well).




> 
> Rich.
> 
> -- 
> Richard Jones, CTO Merjis Ltd.
> Merjis - web marketing and technology - http://merjis.com
> Team Notepad - intranets and extranets for business - http://team-notepad.com
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Memory usage/ garbage collection question
  2005-10-14 10:10 ` Richard Jones
@ 2005-10-14 10:07   ` Gerd Stolpmann
  0 siblings, 0 replies; 7+ messages in thread
From: Gerd Stolpmann @ 2005-10-14 10:07 UTC (permalink / raw)
  To: Richard Jones; +Cc: yoann padioleau, caml-list

Am Freitag, den 14.10.2005, 11:10 +0100 schrieb Richard Jones:
> On Fri, Oct 14, 2005 at 11:36:57AM +0200, yoann padioleau wrote:
> > >   List.iter (
> > >     fun row ->
> > >       (* put row into database and forget about it *)
> > >   ) rows;
> > >   (* no further references to rows after this *)
> >
> > Because rows is still accessible after the List.iter so it is normal
> > that it is not garbage collected.
> 
> I agree that rows is "accessible", but it's not actually used.  My
> understanding is that the GC would be prevented from considering the
> list for collection if the pointer to the head of the list (ie. rows)
> was stored on the heap or in a register somewhere.  Would this be the
> case here?
> 
> > I had the same kind of problem and to optimize it I choose to
> > produce the elements of rows lazily (but then I had another problem
> > with the Lazy modudle where elements were not garbage collected so I
> > use my own lazy module (simple via closure) and it works perfectly
> > well).
> 
> Unfortunately this isn't really an option here.  The rows list comes
> from a huge XML doc which is parsed by PXP and passed through some
> complex post-processing; PXP doesn't support incremental processing of
> XML docs, and the post-processing would be tricky to convert too.

PXP has a pull parser. You get the XML document as a lazy stream of XML
events. I don't know your document format, but if it is something like

<document>
  <record>...</record>
  <record>...</record>
  ...  lots of them ...
</document>

I would recommend using the pull parser, and then create XML trees for
the individual records only (you can mix both styles).

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Telefon: 06151/153855                  Telefax: 06151/997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Memory usage/ garbage collection question
  2005-10-14  9:36 [Caml-list] Memory usage/ garbage collection question yoann padioleau
@ 2005-10-14 10:10 ` Richard Jones
  2005-10-14 10:07   ` Gerd Stolpmann
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Jones @ 2005-10-14 10:10 UTC (permalink / raw)
  To: yoann padioleau; +Cc: caml-list

On Fri, Oct 14, 2005 at 11:36:57AM +0200, yoann padioleau wrote:
> >   List.iter (
> >     fun row ->
> >       (* put row into database and forget about it *)
> >   ) rows;
> >   (* no further references to rows after this *)
>
> Because rows is still accessible after the List.iter so it is normal
> that it is not garbage collected.

I agree that rows is "accessible", but it's not actually used.  My
understanding is that the GC would be prevented from considering the
list for collection if the pointer to the head of the list (ie. rows)
was stored on the heap or in a register somewhere.  Would this be the
case here?

> I had the same kind of problem and to optimize it I choose to
> produce the elements of rows lazily (but then I had another problem
> with the Lazy modudle where elements were not garbage collected so I
> use my own lazy module (simple via closure) and it works perfectly
> well).

Unfortunately this isn't really an option here.  The rows list comes
from a huge XML doc which is parsed by PXP and passed through some
complex post-processing; PXP doesn't support incremental processing of
XML docs, and the post-processing would be tricky to convert too.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Memory usage/ garbage collection question
  2005-10-14 10:27   ` Richard Jones
@ 2005-10-14 10:51     ` Frederic van der Plancke
  0 siblings, 0 replies; 7+ messages in thread
From: Frederic van der Plancke @ 2005-10-14 10:51 UTC (permalink / raw)
  To: caml-list



Richard Jones wrote:
> 
> On Fri, Oct 14, 2005 at 04:58:59AM -0500, Seth J. Fogarty wrote:
> > I do not see why iterating through a list that consumed a lot of
> > memory should (innately) cause you to thrash. What is thrashing?
> > access to the disk? Garbage collection?
> 
> That's where I'm not really sure, except that it is observably
> thrashing.  Since it's a simple iteration, I guess that would
> implicate the GC?
> 
> > No, because you have bound rows to a name. Now, I believe if rows is
> > returned by a function, and is NOT bound by name in that function, it
> > can be garbage collected. I.E.
> 
> Ah OK ... I'm interested though: why does binding a value to a name
> cause problems?  Surely at this level (ocamlopt generated code) there
> ought to be no difference between a named value and an unnamed one?
> 
> Rich.

Perhaps the problem is not the name, but the fact that when you write

    List.iter f rows

the compiler isn't going to go great lengths in optimizing out the reference to rows before the call to List.iter, since the optimisation probably isn't free and the compiler (and compiler writers) don't know in advance how worthwhile it would be.

If the call to List.iter was a tail call the compiler would probably be forced to optimise rows out.

Hence my 0.02 cents worth untested idea: you might try and use tail call optimisation to get rid of that reference.

say
   let do_work () = 
      let rows = ... in
      List.iter (...) rows

Frédéric


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Memory usage/ garbage collection question
       [not found] ` <c7ee61120510140258q5b7f393l8e3c2c3d45f49008@mail.gmail.com>
@ 2005-10-14 10:27   ` Richard Jones
  2005-10-14 10:51     ` Frederic van der Plancke
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Jones @ 2005-10-14 10:27 UTC (permalink / raw)
  To: Seth J. Fogarty; +Cc: caml-list

On Fri, Oct 14, 2005 at 04:58:59AM -0500, Seth J. Fogarty wrote:
> I do not see why iterating through a list that consumed a lot of
> memory should (innately) cause you to thrash. What is thrashing?
> access to the disk? Garbage collection?

That's where I'm not really sure, except that it is observably
thrashing.  Since it's a simple iteration, I guess that would
implicate the GC?

> No, because you have bound rows to a name. Now, I believe if rows is
> returned by a function, and is NOT bound by name in that function, it
> can be garbage collected. I.E.

Ah OK ... I'm interested though: why does binding a value to a name
cause problems?  Surely at this level (ocamlopt generated code) there
ought to be no difference between a named value and an unnamed one?

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Memory usage/ garbage collection question
  2005-10-14  9:49 Richard Jones
  2005-10-14 10:02 ` [Caml-list] " skaller
@ 2005-10-14 10:08 ` Olivier Andrieu
       [not found] ` <c7ee61120510140258q5b7f393l8e3c2c3d45f49008@mail.gmail.com>
  2 siblings, 0 replies; 7+ messages in thread
From: Olivier Andrieu @ 2005-10-14 10:08 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

Hi,

 Richard Jones [Friday 14 October 2005] :
 > I'm trying to optimise a program which is using a large amount of
 > memory and consequently thrashing.
 > 
 > The core of the program is an iteration over a list of something like
 > a million elements which consumes about 1/2 gig of RAM.  The iteration
 > is:
 > 
 >   List.iter (
 >     fun row ->
 >       (* put row into database and forget about it *)
 >   ) rows;
 >   (* no further references to rows after this *)
 > 
 > This is the stdlib implementation of List.iter.  Should the garbage
 > collector be able to collect the part of the list which has been
 > iterated over, during the iteration?  At the moment it doesn't look
 > like it's doing so.

« Short answer: with ocamlc, no.  With ocamlopt, yes. »

cf. these informatives messages by Xavier:
http://groups.google.com/group/fa.caml/msg/f194a3240d240e71
http://groups.google.com/group/fa.caml/msg/d36cb040d0d87ca6

-- 
   Olivier


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Caml-list] Memory usage/ garbage collection question
  2005-10-14  9:49 Richard Jones
@ 2005-10-14 10:02 ` skaller
  2005-10-14 10:08 ` Olivier Andrieu
       [not found] ` <c7ee61120510140258q5b7f393l8e3c2c3d45f49008@mail.gmail.com>
  2 siblings, 0 replies; 7+ messages in thread
From: skaller @ 2005-10-14 10:02 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

On Fri, 2005-10-14 at 10:49 +0100, Richard Jones wrote:
> I'm trying to optimise a program which is using a large amount of
> memory and consequently thrashing.
> 
> The core of the program is an iteration over a list of something like
> a million elements which consumes about 1/2 gig of RAM.  The iteration
> is:
> 
>   List.iter (
>     fun row ->
>       (* put row into database and forget about it *)
>   ) rows;
>   (* no further references to rows after this *)
> 
> This is the stdlib implementation of List.iter.  Should the garbage
> collector be able to collect the part of the list which has been
> iterated over, during the iteration?  At the moment it doesn't look
> like it's doing so.

let rows = ref [] in
(* build up the list of rows here *)

let rec doit rows = match rows with
  | [] -> ()
  | h :: t -> install_in_db h; doit t (* tail call *)
in 
  doit (let r = !rows in rows := []; r)

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2005-10-14 10:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-10-14  9:36 [Caml-list] Memory usage/ garbage collection question yoann padioleau
2005-10-14 10:10 ` Richard Jones
2005-10-14 10:07   ` Gerd Stolpmann
2005-10-14  9:49 Richard Jones
2005-10-14 10:02 ` [Caml-list] " skaller
2005-10-14 10:08 ` Olivier Andrieu
     [not found] ` <c7ee61120510140258q5b7f393l8e3c2c3d45f49008@mail.gmail.com>
2005-10-14 10:27   ` Richard Jones
2005-10-14 10:51     ` Frederic van der Plancke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).