Re: [Caml-list] Memory leaks generated by Scanf.fscanf?

From: "François Bobot" <francois.bobot@cea.fr>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Memory leaks generated by Scanf.fscanf?
Date: Mon, 23 Jun 2014 11:06:14 +0200
Message-ID: <53A7EE06.5060209@cea.fr>
In-Reply-To: <53A70E28.2030804@gmail.com>

On 22/06/2014 19:11, Benoît Vaugon wrote:
> I attach a small patch based on weak pointers that seems to solve
> the problem. The Jean-Vincent example now prints something like:

Wow! That's a funny solution, but I'm afraid that it relies on a bad behavior of the GC that the
first three commits of PR #22 (https://github.com/ocaml/ocaml/pull/22) correct.

In plain English, here is what Benoît does:
  - put the key in a weak set, to be able to test whether it has disappeared
  - create a pair that stores the key and the associated value
  - put this pair in a weak set, where it will be reclaimed at the next GC because it is reachable
only through a weak pointer
  - add a finalizer on the pair that puts the pair back in the weak set at each GC
  - stop re-adding it once the key is no longer present.

The correctness of this algorithm relies on the following (good, IMHO) behavior:
  - A value that has an attached finalizer is marked alive, so that an OCaml function (the
finalizer) can run on this value and can make it reachable again.

and on the following bad behavior:
  - The weak pointers are cleaned before the marking of finalized values.
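
To make the "good" behavior concrete, here is a minimal sketch of a finalizer making its value reachable again. It only uses Gc.finalise and Gc.full_major from the standard library; the names `resurrected` and `register` are made up for the illustration, and this is not Benoît's patch. Whether the finalizer has already run right after the forced collection depends on the runtime, so the sketch handles both outcomes.

```ocaml
(* Global root used by the finalizer to make the value reachable again.
   (Hypothetical name, for illustration only.) *)
let resurrected : int ref option ref = ref None

(* Allocate a heap value, attach a finalizer, and drop the only reference. *)
let register () =
  let v = ref 42 in
  Gc.finalise (fun x -> resurrected := Some x) v

let () =
  register ();
  (* Force a full collection. The value is unreachable here, so the GC may
     call the finalizer, which stores the value in the global root. *)
  Gc.full_major ();
  match !resurrected with
  | Some r -> Printf.printf "finalizer ran, value reachable again: %d\n" !r
  | None -> print_endline "finalizer has not run yet"
```

The point is that the finalizer receives the value itself as argument, so the GC has to keep that value (and everything it points to) alive until the finalizer has run.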

If the cleaning were done later, the pair would be marked, then the key would be marked, the key
would not be removed from the weak set, and thus the key would never disappear.

The fact that "The weak pointers are cleaned before the marking of finalized values" is bad mainly
because you lose the property that if a value disappears from a weak set then it has been
reclaimed and can't be used anymore. The use of (==) and of tags in hash-consed values is based on
this property. Moreover, the fact that weak pointers are cleaned during the marking phase can
also lead to a value disappearing from one weak set but not from another.
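
For readers not familiar with hash-consing, here is a minimal sketch of why that property matters. It uses Weak.Make and its merge function from the standard library; the `term` type, the `table` and the `hashcons` function are hypothetical names chosen for the illustration.

```ocaml
(* A hash-consed term: structurally equal terms are shared, so physical
   equality (==) and the unique tag can both be used to compare them. *)
type term = { node : string; tag : int }

module W = Weak.Make (struct
  type t = term
  let equal a b = String.equal a.node b.node
  let hash a = Hashtbl.hash a.node
end)

let table = W.create 101
let next_tag = ref 0

(* Return the shared instance for [node], creating it if needed. *)
let hashcons node =
  let candidate = { node; tag = !next_tag } in
  let shared = W.merge table candidate in
  (* [merge] returns [candidate] itself only when it was newly added. *)
  if shared == candidate then incr next_tag;
  shared

let () =
  let a = hashcons "x" in
  let b = hashcons "x" in
  let c = hashcons "y" in
  assert (a == b);
  assert (a.tag = b.tag);
  assert (a.tag <> c.tag)
```

If "x" could vanish from the weak set while `a` is still usable somewhere, a later `hashcons "x"` would build a second copy with a fresh tag, and comparisons by (==) or by tag would silently give wrong answers.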

In conclusion, ephemerons are the real solution, for the 4.03 release. For 4.02, I'm not sure
the added complexity and memory/CPU cost are worth it.
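
As a rough sketch of what an ephemeron-based memo table looks like, assuming the Ephemeron module of the standard library (available from 4.03): the `handle` and `buffer` types below are hypothetical stand-ins for a channel and its scanning buffer, not the actual Scanf internals.

```ocaml
(* Hypothetical stand-ins: [handle] plays the role of the channel (the key),
   [buffer] the role of the memoized scanning buffer (the data). *)
type handle = { id : int; mutable closed : bool }
type buffer = { source : handle; mutable consumed : int }

(* Table with ephemeron semantics: the data is kept alive only as long as
   its key is alive, even though the data itself points back to the key. *)
module Memo = Ephemeron.K1.Make (struct
  type t = handle
  let equal = ( == )
  let hash h = Hashtbl.hash h.id
end)

let memo : buffer Memo.t = Memo.create 16

(* Return the buffer associated with [h], creating it on first use. *)
let buffer_of h =
  match Memo.find_opt memo h with
  | Some b -> b
  | None ->
    let b = { source = h; consumed = 0 } in
    Memo.add memo h b;
    b

let () =
  let h = { id = 1; closed = false } in
  let b1 = buffer_of h in
  let b2 = buffer_of h in
  assert (b1 == b2);
  assert (b1.source == h)
```

With a plain weak table the back pointer from the buffer to its handle would keep the handle alive forever; with an ephemeron that reference does not count, so the binding can go away once the handle is otherwise unreachable.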

Moreover, I don't understand why Scanf.from_channel must be memoized by the library. Can't we say
that the user of the library shouldn't call it twice with the same value? Jean-Vincent Loddo's
example would work with such an API, no? We could also add a new, non-memoized function.
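
For illustration, here is a small sketch of the usage pattern I have in mind: the user creates the scanning buffer once per channel with Scanf.Scanning.from_channel and then passes it to Scanf.bscanf for every read. The function name `sum_ints` and the file contents are made up for the example, and I make no claim here about what the library does internally.

```ocaml
(* Sum the whitespace-separated integers found in the file [path].
   The scanning buffer is created exactly once and reused for every read. *)
let sum_ints path =
  let ic = open_in path in
  let ib = Scanf.Scanning.from_channel ic in
  let rec loop acc =
    (* Skip blanks, then stop if nothing is left to read. *)
    Scanf.bscanf ib " " ();
    if Scanf.Scanning.end_of_input ib then acc
    else loop (Scanf.bscanf ib "%d" (fun n -> acc + n))
  in
  let total = loop 0 in
  close_in ic;
  total

let () =
  (* Write a small temporary file, read it back, then clean up. *)
  let path = Filename.temp_file "scanf_demo" ".txt" in
  let oc = open_out path in
  output_string oc "1 2 3\n";
  close_out oc;
  let total = sum_ints path in
  Sys.remove path;
  assert (total = 6)
```

Since the buffer is owned by the caller, the library has nothing to remember per channel, and everything is reclaimed together when the caller is done.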

-- 
François

