From: Benoît Vaugon
Date: Sun, 22 Jun 2014 19:11:04 +0200
To: caml-list@inria.fr, jean-vincent.loddo@lipn.univ-paris13.fr
Subject: Re: [Caml-list] Memory leaks generated by Scanf.fscanf?
Message-ID: <53A70E28.2030804@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------070805090306030709080004"

This is a multi-part message in MIME format.
--------------070805090306030709080004
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

I attach a small patch based on weak pointers that seems to solve the
problem. Jean-Vincent's example now prints something like:

Iteration #01: stat called 256 times: live blocks: 788
Iteration #02: stat called 256 times: live blocks: 4866
Iteration #03: stat called 256 times: live blocks: 9712
Iteration #04: stat called 256 times: live blocks: 9467
Iteration #05: stat called 256 times: live blocks: 13099
Iteration #06: stat called 256 times: live blocks: 16865
Iteration #07: stat called 256 times: live blocks: 9463
Iteration #08: stat called 256 times: live blocks: 6833
Iteration #09: stat called 256 times: live blocks: 1290
Iteration #10: stat called 256 times: live blocks: 3831
Iteration #11: stat called 256 times: live blocks: 3831
Iteration #12: stat called 256 times: live blocks: 3831
Iteration #13: stat called 256 times: live blocks: 3833
Iteration #14: stat called 256 times: live blocks: 3829
Iteration #15: stat called 256 times: live blocks: 3829
Iteration #16: stat called 256 times: live blocks: 3829
Iteration #17: stat called 256 times: live blocks: 3829
Iteration #18: stat called 256 times: live blocks: 3829
Iteration #19: stat called 256 times: live blocks: 3829
Iteration #20: stat called 256 times: live blocks: 3829

This patch also preserves the Scanf semantics regarding the
factorisation (sharing) of scanning buffers: the buffer attached to a
channel is still reused by later calls on that channel, even after a
garbage collection. This property can be checked by running the
following code:

Scanf.fscanf stdin "%[0-9]" (fun s -> print_endline s);;
Gc.compact ();;
Scanf.fscanf stdin "\n%d" (fun n -> print_endline (string_of_int n));;

Regards,
Benoît.

On 20/06/2014 17:35, Gabriel Scherer wrote:
> It looks like ephemerons would be a perfect fit to fix this issue, but
> unfortunately they're not yet available.
>
> It should be possible instead, at each call of the memo function, to
> iterate on the table and remove any item for file-descriptor which has
> been closed (I don't think checking whether a file-descriptor is
> closed is provided by an OCaml-land function right now, but it'd be
> easy to add to the runtime). That would make Scanning.from_channel
> slower (linear in the number of opened channels, though we could
> easily amortize by checking for all N new channels), but remove the
> leak, I think.
>
>
> On Fri, Jun 20, 2014 at 3:01 PM, Jeremy Yallop wrote:
>> On 20 June 2014 13:29, wrote:
>>> working on Marionnet (https://launchpad.net/marionnet), I noticed a serious
>>> memory leak making the system unusable after a few tens of minutes. After
>>> investigation, the problem seems to be related to Scanf.fscanf.
>> It looks like your leak is caused by the 'memo' table in the Scanf
>> module that associates a lookahead buffer with each input channel:
>>
>> https://github.com/ocaml/ocaml/blob/trunk/stdlib/scanf.ml#L393
>>
>> as explained by a comment in the Scanf code:
>>
>> https://github.com/ocaml/ocaml/blob/trunk/stdlib/scanf.ml#L268-L320
>>
>> Entries are added to the table for each input channel used for
>> scanning, but there's no mechanism for removing entries. This would
>> be worth raising on Mantis.
>>
>> --
>> Caml-list mailing list. Subscription management and archives:
>> https://sympa.inria.fr/sympa/arc/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs

--------------070805090306030709080004
Content-Type: text/x-patch; name="fix-scanf-memory-leak.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="fix-scanf-memory-leak.diff"

diff -Naur old/stdlib/.depend new/stdlib/.depend
--- old/stdlib/.depend 2014-06-22 18:34:31.298480318 +0200
+++ new/stdlib/.depend 2014-06-22 18:34:17.522479966 +0200
@@ -147,10 +147,10 @@
     digest.cmx char.cmx array.cmx random.cmi
 scanf.cmo : string.cmi printf.cmi pervasives.cmi list.cmi \
     camlinternalFormatBasics.cmi camlinternalFormat.cmi bytes.cmi buffer.cmi \
-    scanf.cmi
+    weak.cmi scanf.cmi
 scanf.cmx : string.cmx printf.cmx pervasives.cmx list.cmx \
     camlinternalFormatBasics.cmx camlinternalFormat.cmx bytes.cmx buffer.cmx \
-    scanf.cmi
+    weak.cmx scanf.cmi
 set.cmo : list.cmi set.cmi
 set.cmx : list.cmx set.cmi
 sort.cmo : array.cmi sort.cmi
diff -Naur old/stdlib/Makefile.shared new/stdlib/Makefile.shared
--- old/stdlib/Makefile.shared 2014-06-22 18:33:54.426479374 +0200
+++ new/stdlib/Makefile.shared 2014-06-22 18:33:42.866479078 +0200
@@ -30,9 +30,9 @@
   camlinternalLazy.cmo lazy.cmo stream.cmo \
   buffer.cmo camlinternalFormat.cmo printf.cmo \
   arg.cmo printexc.cmo gc.cmo \
-  digest.cmo random.cmo hashtbl.cmo format.cmo scanf.cmo callback.cmo \
+  digest.cmo random.cmo hashtbl.cmo format.cmo weak.cmo scanf.cmo callback.cmo \
   camlinternalOO.cmo oo.cmo camlinternalMod.cmo \
-  genlex.cmo weak.cmo \
+  genlex.cmo \
   filename.cmo complex.cmo \
   arrayLabels.cmo listLabels.cmo bytesLabels.cmo \
   stringLabels.cmo moreLabels.cmo stdLabels.cmo
diff -Naur old/stdlib/scanf.ml new/stdlib/scanf.ml
--- old/stdlib/scanf.ml 2014-06-22 18:31:52.162476244 +0200
+++ new/stdlib/scanf.ml 2014-06-22 18:31:35.010475805 +0200
@@ -390,12 +390,31 @@
 let from_file_bin = open_in_bin;;
 
 let memo_from_ic =
-  let memo = ref [] in
+  let module IcMemo = Weak.Make (struct
+    type t = Pervasives.in_channel
+    let equal ic1 ic2 = ic1 = ic2
+    let hash ic = Hashtbl.hash ic
+  end) in
+  let module PairMemo = Weak.Make (struct
+    type t = Pervasives.in_channel * in_channel
+    let equal (ic1, _) (ic2, _) = ic1 = ic2
+    let hash (ic, _) = Hashtbl.hash ic
+  end) in
+  let ic_memo = IcMemo.create 16 in
+  let pair_memo = PairMemo.create 16 in
+  let rec finaliser ((ic, _) as pair) =
+    if IcMemo.mem ic_memo ic then (
+      Gc.finalise finaliser pair;
+      PairMemo.add pair_memo pair;
+    ) in
   (fun scan_close_ic ic ->
-   try List.assq ic !memo with
+   try snd (PairMemo.find pair_memo (ic, stdin)) with
    | Not_found ->
      let ib = from_ic scan_close_ic (From_channel ic) ic in
-     memo := (ic, ib) :: !memo;
+     let pair = (ic, ib) in
+     IcMemo.add ic_memo ic;
+     Gc.finalise finaliser pair;
+     PairMemo.add pair_memo pair;
      ib)
 ;;
 
--------------070805090306030709080004--
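
For readers puzzled by the lookup `PairMemo.find pair_memo (ic, stdin)` in the patch: it works because equality and hashing in `PairMemo` ignore the second component of the pair, so any dummy value can be passed there. Below is a minimal sketch of that trick in isolation. It is not the stdlib code; the names `key`, `PairSet`, `stored` and `lookup` are hypothetical stand-ins (a small record plays the role of `in_channel`, a string plays the role of the scanning buffer).

```ocaml
(* Hypothetical stand-in for in_channel. *)
type key = { id : int }

(* Weak hash set of (key, buffer) pairs. Only the first component
   takes part in equality and hashing, as in the patch's PairMemo. *)
module PairSet = Weak.Make (struct
  type t = key * string
  let equal (k1, _) (k2, _) = k1.id = k2.id
  let hash (k, _) = Hashtbl.hash k.id
end)

let set = PairSet.create 16

(* The pair stays reachable through this top-level binding, so the
   weak set cannot drop it during the demonstration. *)
let k = { id = 1 }
let stored = (k, "buffer for channel 1")
let () = PairSet.add set stored

(* Look up by key with a dummy second component (the patch uses stdin
   for this purpose) and return the stored second component. *)
let lookup key =
  try Some (snd (PairSet.find set (key, "")))
  with Not_found -> None

let () =
  match lookup k with
  | Some buf -> print_endline buf
  | None -> print_endline "not found"
```

The sketch only shows the lookup side. The collection behaviour (entries disappearing once the channel is unreachable) depends on the garbage collector and on the `Gc.finalise` re-registration done in the patch, which is not reproduced here.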