From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 533427FECC for ; Fri, 17 Jun 2016 21:23:52 +0200 (CEST) IronPort-PHdr: 9a23:oP+PJB0wny9vwo11smDT+DRfVm0co7zxezQtwd8ZsekWLfad9pjvdHbS+e9qxAeQG96LurQV06GP6PyocFdDyKjCmUhKSIZLWR4BhJdetC0bK+nBN3fGKuX3ZTcxBsVIWQwt1Xi6NU9IBJS2PAWK8TWM5DIfUi/yKRBybrysXNWC3oLmj6vroMGbSj4LrQT+SIs6FA+xowTVu5teqqpZAYF19CH0pGBVcf9d32JiKAHbtR/94sCt4MwrqHwI6LoJvvRNWqTifqk+UacQTHF/azh0t/vQqALbQACTynwZW2QQ2loUUkmWpC39C7L4qDf7sKJHnmG8MND2wa2DVC7qu79sUwPAjS4dMTMktmrQj5ojorhcpUeOrhZlwoPQKLqeNPdkc7mVKdwTT3BAU8IXTCdBD5mxdaMACuMAOaBTqIyr9AhGlge3GQT5XLCn8TRPnHKjmPBj3g== Authentication-Results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=gabriel.scherer@gmail.com; spf=Pass smtp.mailfrom=gabriel.scherer@gmail.com; spf=None smtp.helo=postmaster@mail-it0-f52.google.com Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of gabriel.scherer@gmail.com) identity=pra; client-ip=209.85.214.52; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="gabriel.scherer@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of gabriel.scherer@gmail.com designates 209.85.214.52 as permitted sender) identity=mailfrom; client-ip=209.85.214.52; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="gabriel.scherer@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-it0-f52.google.com) identity=helo; client-ip=209.85.214.52; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="postmaster@mail-it0-f52.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0AQAAC/TGRXhjTWVdFDGoNcOH0GgnikdoFThXWGJoUBgXoihXUCgR4HOBQBAQEBAQEBAREBAQEICwsJIS+CMYIbAQEDAQwGER0BGw8DCwEDAQsGAwILGh0CAiIBEQEFAQoSBhMSEIdzAQMPCA4tkl+PQoExPjGLO4FqglkFh1AKGScDClKDDAEBAQEGAQEBAQEaAgYQhheETYRtglSCWgWGSgySG4FYhC2IJII3jGuHIIcWEh6BDw8PgjEeIoFRIDIBiXoBAQE X-IPAS-Result: A0AQAAC/TGRXhjTWVdFDGoNcOH0GgnikdoFThXWGJoUBgXoihXUCgR4HOBQBAQEBAQEBAREBAQEICwsJIS+CMYIbAQEDAQwGER0BGw8DCwEDAQsGAwILGh0CAiIBEQEFAQoSBhMSEIdzAQMPCA4tkl+PQoExPjGLO4FqglkFh1AKGScDClKDDAEBAQEGAQEBAQEaAgYQhheETYRtglSCWgWGSgySG4FYhC2IJII3jGuHIIcWEh6BDw8PgjEeIoFRIDIBiXoBAQE X-IronPort-AV: E=Sophos;i="5.26,484,1459807200"; d="scan'208,217";a="181667121" Received: from mail-it0-f52.google.com ([209.85.214.52]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/AES128-GCM-SHA256; 17 Jun 2016 21:23:50 +0200 Received: by mail-it0-f52.google.com with SMTP id e5so758557ith.0 for ; Fri, 17 Jun 2016 12:23:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=zvxzYzU+/K21bZ9VuKKinlreSbp8yN7EuVMjtw7vsno=; b=i4hPKIBsgIbzWYdKOn9Qd9XN+3CSuLiqhKOgMq5Hk4nx+rWqUERMZAjFuZtQHgAykn w6e9mvqNwgFk3Mbyjb3jJL1TmCI6wgMXSBDU7kZAYVTroDX1m3oSvW0f0xcV2hy1H0If ua7umXVQ4vdscN7MX6Dgfh701cXitx2IoHDZ0AviqsqRgV05bkQJaAzDnHMExnxezbIn gQmr5lSBQUXICWv1veMi8JSX6bTWfgHIH38ei6akAehn07hjMj8cgdHErtNstn2RNlbp dXAJMfOkdzus3ZvRtxLiRYoJSeUElkpFM0HYvdWpvrJs5BaahPrjGoEaUjYG2RTeuKNV KKsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=zvxzYzU+/K21bZ9VuKKinlreSbp8yN7EuVMjtw7vsno=; b=BT6I5L9Ulq9ugik0XqZ296Vx8YgG/QA3DhdxkI5GQMseGDDicrrLJi0pRaKv3bzEFo mGhLlokzQfT8GGLc5kHeqXFc7X+K/jVJGmTGsh9gtm3XyHOrUmDOAu+9VLMIt/wgr5DP pyNhpdDSFDLt/n9t0PmjO8g8p8+KQk2Kl0zXO0YkbedWxASxpkcWE5qp7nIOZUycZqaU g6WHML7ABLXlHeAaoh0rdpjUFyskkvYXWKO8/c3XE8Js6Tklc8NfucWaKw7VbG34XZD+ /gjg+GAktWEsfgMBF1Qd7+S82UZ6ukgTfY0cmV6tBCEdVP2laDOE4+mkJOnalXyAdKur IlaQ== X-Gm-Message-State: ALyK8tLvSE8qIVXW/rqG+MVeo4gyPSr+HPOa9kwMFwSwXG2HX0r+lZScA52lHEq8HvYu9TIcQaBIrNRo4OH0/w== X-Received: by 10.36.249.137 with SMTP id l131mr242126ith.21.1466191429104; Fri, 17 Jun 2016 12:23:49 -0700 (PDT) MIME-Version: 1.0 Received: by 10.79.136.70 with HTTP; Fri, 17 Jun 2016 12:23:09 -0700 (PDT) In-Reply-To: <7182303B-EBFF-4CFC-A7DF-B5C7A059509F@marneu.com> References: <7182303B-EBFF-4CFC-A7DF-B5C7A059509F@marneu.com> From: Gabriel Scherer Date: Fri, 17 Jun 2016 15:23:09 -0400 Message-ID: To: =?UTF-8?B?TWFydGluIFIuIE5ldWjDpHXDn2Vy?= Cc: OCaml List Content-Type: multipart/alternative; boundary=94eb2c0363564df94b05357e4b15 Subject: Re: [Caml-list] Presumed bug in OCaml's garbage collector or in its Weak module... --94eb2c0363564df94b05357e4b15 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable > 3. The finalizer is never called for blocks that are still alive; stated > otherwise, a block that has been finalized can never been presented as a > value to a C-stub anymore. I think that this does not hold in general: the finalizer should only be called when there is no reference to the value around, *but* the finalizer action can create a new reference (for example you may insert the value into a (live) global cache), so the value may be live after the finalizer has been called -- and be sent to a C stub, etc. This remark may not apply to your actual reproduction case (your finalizer shouldn't make the value reachable from the OCaml heap). (There were changes between the 4.03 development cycle that would explain the difference in behavior between beta1 and beta2, see in particular http://caml.inria.fr/mantis/view.php?id=3D7210 and https://github.com/ocaml/ocaml/pull/22 ; but I'll leave memory-runtime experts comment on that.) On Fri, Jun 17, 2016 at 2:09 PM, "Martin R. Neuh=C3=A4u=C3=9Fer" wrote: > Dear all, > > after an intense week of debugging some large C-bindings, I presume to > have found a bug in OCaml=E2=80=99s garbage collection code or in its Wea= k module. > To summarize, it seems as if OCaml=E2=80=99s garbage collector and its We= ak module > sometimes release a custom block (by calling its finalizer) too early, i.= e. > when it is still reachable. > > I hesitate a bit before opening an =E2=80=9Eofficial=E2=80=9C issue on Ma= ntis as I might > very well overlook some detail. Therefore I=E2=80=99d like to double-chec= k some of > the assumptions that I have made when writing my C-stubs. Any corrections > are highly welcome: > 1. Custom blocks may be moved around in memory by the GC, but they are > never duplicated. > 2. The finalizer for each custom block that is allocated by > caml_alloc_custom is called at most once. > 3. The finalizer is never called for blocks that are still alive; stated > otherwise, a block that has been finalized can never been presented as a > value to a C-stub anymore. > > There is a small example program available on github: > https://github.com/martin-neuhaeusser/ocaml_bug where the above > assumptions seem to be violated. > > The idea is as follows: > Each custom block that the example allocates contains a pointer to > structure that is malloc=E2=80=99ed outside the OCaml heap; each custom b= lock > uniquely refers to such a structure. > Upon creation of a custom block, a special flag-value is stored in its > corresponding (newly created) structure to indicate that the block is > alive. Once a custom block=E2=80=99s finalizer is called, the flag is cha= nged to > indicate finalization of the block. I assume that after the finalizer > returns, the custom block is unreachable (and perhaps its memory reclaimed > by the GC); however, for debugging, I intentionally keep the custom block= =E2=80=99s > structure allocated on the unmanaged heap. Whenever a custom block is > presented to the C-stubs, they check that the corresponding flag-value > within its corresponding non-managed heap structure indicates liveness of > the block. As it turns out, this invariant is violated, if the Weak module > and the GC are stressed. > > Of course, it might very well be that there is a bug in my C-stubs that is > responsible for that behavior. I tested with different versions of OCaml, > with different outcomes: The example program terminates correctly with > OCaml 3.12.1, 4.00.0, 4.00.1, 4.01.0, and 4.03.0+beta1 whereas it crashes > due to the above error when compiled with OCaml 4.02.0, 4.02.1, 4.02.2, > 4.02.3, 4.03.0+beta2, and the released 4.03.0. The tests were run on 64bit > Debian Linux and 64bit MacOS X. > > I'd very much appreciate any help, hints to bugs in my code, or requests > for clarification. > > Thanks and best regards, > Martin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs --94eb2c0363564df94b05357e4b15 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

3. The finalizer is never called for blocks that are still alive; stated otherwise, a block that has been finalized can never been presented as a value to a C-stub anymore.

I think that th= is does not hold in general: the finalizer should only be called when there= is no reference to the value around, *but* the finalizer action can create= a new reference (for example you may insert the value into a (live) global= cache), so the value may be live after the finalizer has been called -- an= d be sent to a C stub, etc.

This remark may not apply to = your actual reproduction case (your finalizer shouldn't make the value = reachable from the OCaml heap).

(There were ch= anges between the 4.03 development cycle that would explain the difference = in behavior between beta1 and beta2, see in particular http://caml.inria.fr/mantis/view.php= ?id=3D7210 and https= ://github.com/ocaml/ocaml/pull/22 ; but I'll leave memory-runtime e= xperts comment on that.)

On Fri, Jun 17, 2016 at 2:09 PM, "Martin R. Neu= h=C3=A4u=C3=9Fer" <post@marneu.com> wrote:
Dear all,

after an intense week of debugging some large C-bindings, I presume to have= found a bug in OCaml=E2=80=99s garbage collection code or in its Weak modu= le.
To summarize, it seems as if OCaml=E2=80=99s garbage collector and its Weak= module sometimes release a custom block (by calling its finalizer) too ear= ly, i.e. when it is still reachable.

I hesitate a bit before opening an =E2=80=9Eofficial=E2=80=9C issue on Mant= is as I might very well overlook some detail. Therefore I=E2=80=99d like to= double-check some of the assumptions that I have made when writing my C-st= ubs. Any corrections are highly welcome:
1. Custom blocks may be moved around in memory by the GC, but they are neve= r duplicated.
2. The finalizer for each custom block that is allocated by caml_alloc_cust= om is called at most once.
3. The finalizer is never called for blocks that are still alive; stated ot= herwise, a block that has been finalized can never been presented as a valu= e to a C-stub anymore.

There is a small example program available on github: https://github.com/martin-neuhaeusser/ocaml_bug where the above assump= tions seem to be violated.

The idea is as follows:
Each custom block that the example allocates contains a pointer to structur= e that is malloc=E2=80=99ed outside the OCaml heap; each custom block uniqu= ely refers to such a structure.
Upon creation of a custom block, a special flag-value is stored in its corr= esponding (newly created) structure to indicate that the block is alive. On= ce a custom block=E2=80=99s finalizer is called, the flag is changed to ind= icate finalization of the block. I assume that after the finalizer returns,= the custom block is unreachable (and perhaps its memory reclaimed by the G= C); however, for debugging, I intentionally keep the custom block=E2=80=99s= structure allocated on the unmanaged heap. Whenever a custom block is pres= ented to the C-stubs, they check that the corresponding flag-value within i= ts corresponding non-managed heap structure indicates liveness of the block= . As it turns out, this invariant is violated, if the Weak module and the G= C are stressed.

Of course, it might very well be that there is a bug in my C-stubs that is = responsible for that behavior. I tested with different versions of OCaml, w= ith different outcomes: The example program terminates correctly with OCaml= 3.12.1, 4.00.0, 4.00.1, 4.01.0, and 4.03.0+beta1 whereas it crashes due to= the above error when compiled with OCaml 4.02.0, 4.02.1, 4.02.2, 4.02.3, 4= .03.0+beta2, and the released 4.03.0. The tests were run on 64bit Debian Li= nux and 64bit MacOS X.

I'd very much appreciate any help, hints to bugs in my code, or request= s for clarification.

Thanks and best regards,
Martin

--
Caml-list mailing list.=C2=A0 Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocam= l_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs
<= /blockquote>

--94eb2c0363564df94b05357e4b15--