caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: "Benoît Vaugon" <benoit.vaugon@gmail.com>
To: caml-list@inria.fr, jean-vincent.loddo@lipn.univ-paris13.fr
Subject: Re: [Caml-list] Memory leaks generated by Scanf.fscanf?
Date: Sun, 22 Jun 2014 19:11:04 +0200	[thread overview]
Message-ID: <53A70E28.2030804@gmail.com> (raw)
In-Reply-To: <CAPFanBGrfvs_Y9z2Ue6U-RQDVZ-h0KYViBrFNjCrVhSTdNQ5Jw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3371 bytes --]

I attach a samll patch based on weak-pointers that seems to solve
the problem.The Jean-Vincent example now prints something like:

Iteration #01: stat called 256 times: live blocks: 788
Iteration #02: stat called 256 times: live blocks: 4866
Iteration #03: stat called 256 times: live blocks: 9712
Iteration #04: stat called 256 times: live blocks: 9467
Iteration #05: stat called 256 times: live blocks: 13099
Iteration #06: stat called 256 times: live blocks: 16865
Iteration #07: stat called 256 times: live blocks: 9463
Iteration #08: stat called 256 times: live blocks: 6833
Iteration #09: stat called 256 times: live blocks: 1290
Iteration #10: stat called 256 times: live blocks: 3831
Iteration #11: stat called 256 times: live blocks: 3831
Iteration #12: stat called 256 times: live blocks: 3831
Iteration #13: stat called 256 times: live blocks: 3833
Iteration #14: stat called 256 times: live blocks: 3829
Iteration #15: stat called 256 times: live blocks: 3829
Iteration #16: stat called 256 times: live blocks: 3829
Iteration #17: stat called 256 times: live blocks: 3829
Iteration #18: stat called 256 times: live blocks: 3829
Iteration #19: stat called 256 times: live blocks: 3829
Iteration #20: stat called 256 times: live blocks: 3829

This patch also preserves the scanf semantics about factorisation
of scanning buffers. This property may be verified by running the
following code:

Scanf.fscanf stdin "%[0-9]" (fun s -> print_endline s);;
Gc.compact ();;
Scanf.fscanf stdin "\n%d" (fun n -> print_endline (string_of_int n));;

Regards,
Benoît.


Le 20/06/2014 17:35, Gabriel Scherer a écrit :
> It looks like ephemerons would be a perfect fit to fix this issue, but
> unfortunately they're not yet available.
>
> It should be possible instead, at each call of the memo function, to
> iterate on the table and remove any item for file-descriptor which has
> been closed (I don't think checking whether a file-descriptor is
> closed is provided by an OCaml-land function right now, but it'd be
> easy to add to the runtime). That would make Scanning.from_channel
> slower (linear in the number of opened channels, though we could
> easily amortize by checking for all N new channels), but remove the
> leak, I think.
>
>
> On Fri, Jun 20, 2014 at 3:01 PM, Jeremy Yallop <yallop@gmail.com> wrote:
>> On 20 June 2014 13:29,  <jean-vincent.loddo@lipn.univ-paris13.fr> wrote:
>>> working on Marionnet (https://launchpad.net/marionnet), I noticed a serious
>>> memory leak making the system unusable after a few tens of minutes. After
>>> investigation, the problem seems to be related to Scanf.fscanf.
>> It looks like your leak is caused by the 'memo' table in the Scanf
>> module that associates a lookahead buffer with each input channel:
>>
>>     https://github.com/ocaml/ocaml/blob/trunk/stdlib/scanf.ml#L393
>>
>> as explained by a comment in the Scanf code:
>>
>>     https://github.com/ocaml/ocaml/blob/trunk/stdlib/scanf.ml#L268-L320
>>
>> Entries are added to the table for each input channel used for
>> scanning, but there's no mechanism for removing entries.  This would
>> be worth raising on Mantis.
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa.inria.fr/sympa/arc/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs


[-- Attachment #2: fix-scanf-memory-leak.diff --]
[-- Type: text/x-patch, Size: 2698 bytes --]

diff -Naur old/stdlib/.depend new/stdlib/.depend
--- old/stdlib/.depend	2014-06-22 18:34:31.298480318 +0200
+++ new/stdlib/.depend	2014-06-22 18:34:17.522479966 +0200
@@ -147,10 +147,10 @@
     digest.cmx char.cmx array.cmx random.cmi
 scanf.cmo : string.cmi printf.cmi pervasives.cmi list.cmi \
     camlinternalFormatBasics.cmi camlinternalFormat.cmi bytes.cmi buffer.cmi \
-    scanf.cmi
+    weak.cmi scanf.cmi
 scanf.cmx : string.cmx printf.cmx pervasives.cmx list.cmx \
     camlinternalFormatBasics.cmx camlinternalFormat.cmx bytes.cmx buffer.cmx \
-    scanf.cmi
+    weak.cmx scanf.cmi
 set.cmo : list.cmi set.cmi
 set.cmx : list.cmx set.cmi
 sort.cmo : array.cmi sort.cmi
diff -Naur old/stdlib/Makefile.shared new/stdlib/Makefile.shared
--- old/stdlib/Makefile.shared	2014-06-22 18:33:54.426479374 +0200
+++ new/stdlib/Makefile.shared	2014-06-22 18:33:42.866479078 +0200
@@ -30,9 +30,9 @@
   camlinternalLazy.cmo lazy.cmo stream.cmo \
   buffer.cmo camlinternalFormat.cmo printf.cmo \
   arg.cmo printexc.cmo gc.cmo \
-  digest.cmo random.cmo hashtbl.cmo format.cmo scanf.cmo callback.cmo \
+  digest.cmo random.cmo hashtbl.cmo format.cmo weak.cmo scanf.cmo callback.cmo \
   camlinternalOO.cmo oo.cmo camlinternalMod.cmo \
-  genlex.cmo weak.cmo \
+  genlex.cmo \
   filename.cmo complex.cmo \
   arrayLabels.cmo listLabels.cmo bytesLabels.cmo \
   stringLabels.cmo moreLabels.cmo stdLabels.cmo
diff -Naur old/stdlib/scanf.ml new/stdlib/scanf.ml
--- old/stdlib/scanf.ml	2014-06-22 18:31:52.162476244 +0200
+++ new/stdlib/scanf.ml	2014-06-22 18:31:35.010475805 +0200
@@ -390,12 +390,31 @@
   let from_file_bin = open_in_bin;;
 
   let memo_from_ic =
-    let memo = ref [] in
+    let module IcMemo = Weak.Make (struct
+      type t = Pervasives.in_channel
+      let equal ic1 ic2 = ic1 = ic2
+      let hash ic = Hashtbl.hash ic
+    end) in
+    let module PairMemo = Weak.Make (struct
+      type t = Pervasives.in_channel * in_channel
+      let equal (ic1, _) (ic2, _) = ic1 = ic2
+      let hash (ic, _) = Hashtbl.hash ic
+    end) in
+    let ic_memo = IcMemo.create 16 in
+    let pair_memo = PairMemo.create 16 in
+    let rec finaliser ((ic, _) as pair) =
+      if IcMemo.mem ic_memo ic then (
+        Gc.finalise finaliser pair;
+        PairMemo.add pair_memo pair;
+      ) in
     (fun scan_close_ic ic ->
-     try List.assq ic !memo with
+     try snd (PairMemo.find pair_memo (ic, stdin)) with
      | Not_found ->
        let ib = from_ic scan_close_ic (From_channel ic) ic in
-       memo := (ic, ib) :: !memo;
+       let pair = (ic, ib) in
+       IcMemo.add ic_memo ic;
+       Gc.finalise finaliser pair;
+       PairMemo.add pair_memo pair;
        ib)
   ;;
 

  reply	other threads:[~2014-06-22 17:11 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-20 12:29 jean-vincent.loddo
2014-06-20 13:01 ` Jeremy Yallop
2014-06-20 15:35   ` Gabriel Scherer
2014-06-22 17:11     ` Benoît Vaugon [this message]
2014-06-23  9:06       ` François Bobot
2014-06-27 14:32   ` Jeremy Yallop

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53A70E28.2030804@gmail.com \
    --to=benoit.vaugon@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=jean-vincent.loddo@lipn.univ-paris13.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).