caml-list - the Caml user's mailing list
From: "Martin R. Neuhäußer" <post@marneu.com>
To: "Török Edwin" <edwin+ml-ocaml@etorok.net>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Presumed bug in OCaml's garbage collector or in its Weak module...
Date: Sat, 18 Jun 2016 17:11:42 +0200
Message-ID: <90E10E72-C397-47B7-A6B8-43B0E8A29D5C@marneu.com>
In-Reply-To: <3673db34-9350-a9b1-fcd5-e7593ba0fd01@etorok.net>

> On 17 Jun 2016, at 21:41, Török Edwin <edwin+ml-ocaml@etorok.net> wrote:
> 
> On 06/17/2016 09:09 PM, "Martin R. Neuhäußer" wrote:
>> Dear all,
>> 
>> after an intense week of debugging some large C bindings, I believe I have found a bug in OCaml’s garbage collection code or in its Weak module.
>> To summarize, it seems that OCaml’s garbage collector and its Weak module sometimes release a custom block (by calling its finalizer) too early, i.e. while it is still reachable.
>> 
>> I hesitate to open an "official" issue on Mantis, as I may well be overlooking some detail. Therefore I’d like to double-check some of the assumptions that I made when writing my C stubs. Any corrections are highly welcome:
>> 1. Custom blocks may be moved around in memory by the GC, but they are never duplicated.
>> 2. The finalizer for each custom block that is allocated by caml_alloc_custom is called at most once.
>> 3. The finalizer is never called for blocks that are still alive; put differently, a block that has been finalized can never be presented as a value to a C stub again.
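Assumption 2 can at least be checked for ordinary heap values with a pure-OCaml sketch. Note that it swaps in Gc.finalise for the C finalizer of a custom block, so it only illustrates the kind of check meant here and says nothing about caml_alloc_custom itself. The names counts and make_tracked are hypothetical.

```ocaml
(* Count how often the finaliser of each tracked value runs.
   Assumption: Gc.finalise stands in for a custom block's C finalizer. *)
let counts : (int, int) Hashtbl.t = Hashtbl.create 16

(* Allocate a fresh heap value and attach a counting finaliser to it. *)
let make_tracked id =
  let v = ref id in
  Gc.finalise
    (fun r ->
      let n = Option.value (Hashtbl.find_opt counts !r) ~default:0 in
      Hashtbl.replace counts !r (n + 1))
    v;
  v

let () =
  (* Create many short-lived values, then force a full collection. *)
  for i = 1 to 1000 do
    ignore (make_tracked i)
  done;
  Gc.full_major ();
  (* Every finaliser that ran must have run exactly once. *)
  Hashtbl.iter (fun _ n -> assert (n = 1)) counts
```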
> 
> Shallow copies don't seem very safe in the presence of out-of-OCaml-heap C pointers [*].
> Perhaps Weak.get_copy should raise if it encounters a Custom_tag?
Simply forbidding custom blocks in weak sets seems harsh, doesn’t it? For example, we rely on SMT solvers and use weak sets on the OCaml side for lightweight hash-consing (actually, we only want to avoid having duplicate term representations in OCaml).
Moreover, I suspect it might be difficult to detect situations that involve custom blocks, as a custom block may occur deeply nested in a structured block that is stored in a weak set...
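The hash-consing use case can be sketched as follows. This is a minimal illustration built on the standard Weak.Make functor; the term type, TermSet, table and mk are made-up names, and the string field merely stands in for whatever payload (for instance a custom block wrapping a solver term) the real bindings would carry.

```ocaml
(* Hypothetical term representation; [name] stands in for the payload. *)
type term = { id : int; name : string }

(* Weak hash set keyed on the payload only, so structurally equal
   terms collapse to a single representative. *)
module TermSet = Weak.Make (struct
  type t = term
  let equal a b = String.equal a.name b.name
  let hash a = Hashtbl.hash a.name
end)

let table = TermSet.create 17
let counter = ref 0

(* Return the existing representative if there is one, otherwise
   register the freshly built term and return it. *)
let mk name =
  incr counter;
  TermSet.merge table { id = !counter; name }

let () =
  let a = mk "x" in
  let b = mk "x" in
  (* Both calls yield the same physical value while [a] is alive. *)
  assert (a == b)
```

Because the set holds its elements weakly, a term that is no longer referenced elsewhere can be collected and simply disappears from the table.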
> 
> The custom value may contain C pointers, that may get invalidated or changed if the custom value has a finalizer (as in the testcase):
> When the original weak value has no more references, its finalizer is invoked and it is garbage collected
> (in this case it changes unique_id to c_data_invalid_id, but it might as well free it instead causing a crash).
> 
> Meanwhile the copy made through Weak.get_copy still lives, and shares the copied custom value (pointer) from the weak value that got finalized.
> Operations on the copied value (e.g. comparison/hashing) will try to access the custom value, which points to c_data_invalid_id.
I uploaded a second example, `gcbug2.ml`, which crashes as well. This time it is designed not to check the "validity" of custom blocks while a Weak.find operation is in progress. In my experiments, this version aborts inside the C-layer finalizer when it tries to finalize a custom block that has already been finalized. So something like a double free is happening here, outside the scope of Weak.find.
If the behavior is related to the shallow copies created by Weak.get_copy, is it the case that they are finalized as well, and independently of their source custom block?
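The sharing that a shallow copy implies can be seen on an ordinary record, without any C code. The following is only a sketch: handle and make_handle are hypothetical names, and the payload field stands in for the C pointer held by a custom block. It does not reproduce the crash, it just shows that whatever Weak.get_copy returns still points at the same inner data as the original.

```ocaml
(* Hypothetical wrapper; [payload] plays the role of the C-side data. *)
type handle = { payload : int ref }

let make_handle n = { payload = ref n }

(* A weak array with one slot that holds the handle weakly. *)
let w : handle Weak.t = Weak.create 1
let h = make_handle 7
let () = Weak.set w 0 (Some h)

let () =
  match Weak.get_copy w 0 with
  | Some c ->
    (* The copy is shallow: its field is the very same inner value,
       so invalidating the payload through [h] is visible through [c]. *)
    assert (c.payload == h.payload);
    h.payload := -1;
    assert (!(c.payload) = -1)
  | None -> assert false
```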
> 
> 
>> There is a small example program available on github: https://github.com/martin-neuhaeusser/ocaml_bug where the above assumptions seem to be violated.
> 
> [*] AFAICT a shallow copy is created by WeakDummySet.find -> Weak.get_copy
> 
> Best regards,
> --
> Edwin Török | Co-founder and Lead Developer
> 
> Skylable open-source object storage: reliable, fast, secure
> http://www.skylable.com