Oh shit, and also, uh, oh right!  I forgot that (for example, in
Tree((v1 = e1), (v2 = e2))) it could transpire that after evaluating e1
and assigning to v1, the evaluation of e2 could end up moving the value
pointed-to by v1, updating v1, but NOT updating the result of the
expression (v1 = e1) (b/c of course, it's already evaluated and on-stack,
waiting for the call to Tree()).  Oof.

On Sat, May 5, 2018 at 12:42 AM, Xavier Leroy wrote:

>
> On Sat, May 5, 2018 at 5:25 AM Chet Murthy wrote:
>
>> It's been a while since I did this sort of thing, but I suspect if you
>> declare CAMLlocal variables for each intermediate expression, and stick in
>> the assignments, that should solve your problem (while not making your code
>> too ugly).  E.g.
>>
>> CAMLprim value left_comb(value a, value b, value c)
>> {
>>   CAMLparam3(a, b, c);
>>   CAMLlocal5(l1, l2, l3, l4, l5);
>>   CAMLreturn(l1 = Tree((l2 = Tree((l3 = Leaf(a)), (l4 = Leaf(b)))),
>>                        (l5 = Leaf(c))));
>> }
>>
>
> That's bold C/C++ programming!  It might even work in C++, where
> assignment expressions are l-values if I remember correctly.
>
> However, I'm afraid it won't work in C because an assignment expression
> "lv = rv" is an r-value equal to the value of rv converted to the type of lv
> at the time the assignment is evaluated.  So, if lv is a local variable
> registered with the GC, the GC will update lv when needed, but the "lv =
> rv" expression will keep its initial value.
>
> There are also the rules concerning sequence points.  I think the code above
> respects the C99 rules but I'm less sure about the C11 rules.
>
>>
>> Even better, you could linearize the tree of expressions into a sequence,
>> and that should solve your problem, also.
>>
>
> Yes, that's the robust solution.
> Spelling it out:
>
> CAMLprim value left_comb(value a, value b, value c)
> {
>   CAMLparam3(a, b, c);
>   CAMLlocal5(la, lb, lc, tab, t);
>   la = Leaf(a);
>   lb = Leaf(b);
>   lc = Leaf(c);
>   tab = Tree(la, lb);
>   t = Tree(tab, lc);
>   CAMLreturn(t);
> }
>
> You can also do "CAMLreturn(Tree(tab, lc))" directly.
>
> - Xavier Leroy
>
>
>> Uh, I think.  Been a while since I wrote a lotta C/C++ code to interface
>> with Ocaml, but this oughta work.
>>
>> --chet--
>>
>>
>> On Wed, May 2, 2018 at 9:09 AM, Frederic Perriot
>> wrote:
>>
>>> Hello caml-list,
>>>
>>> I have a GC-related question. To give you some context, I'm writing a
>>> tool to parse .cmi files and generate .h and .c files, to facilitate
>>> constructing OCaml variants from C bindings.
>>>
>>> For instance, given the following source:
>>>
>>> type 'a tree = Leaf of 'a | Tree of 'a tree * 'a tree [@@h_file]
>>>
>>>
>>> the tool produces C functions:
>>>
>>> CAMLprim value Leaf(value arg1)
>>> {
>>>   CAMLparam1(arg1);
>>>   CAMLlocal1(obj);
>>>
>>>   obj = caml_alloc_small(1, 0);
>>>
>>>   Field(obj, 0) = arg1;
>>>
>>>   CAMLreturn(obj);
>>> }
>>>
>>> CAMLprim value Tree(value arg1, value arg2)
>>> {
>>>   // similar code here
>>> }
>>>
>>>
>>> From there, it's tempting to nest calls to variant constructors from C
>>> and write code such as:
>>>
>>> CAMLprim value left_comb(value a, value b, value c)
>>> {
>>>   CAMLparam3(a, b, c);
>>>   CAMLreturn(Tree(Tree(Leaf(a), Leaf(b)), Leaf(c)));
>>> }
>>>
>>>
>>> The problem with the above is the GC root loss due to the nesting of
>>> calls to allocating functions.
>>>
>>> Say Leaf(c) is constructed first, and the resulting value cached in a
>>> register, then Leaf(b) triggers a collection, thus invalidating the
>>> register contents, and leaving a dangling pointer in the top Tree.
>>>
>>> Here is an actual ocamlopt output, with Leaf(c) getting cached in rbx:
>>>
>>> 0x000000000040dbf4 <+149>: callq 0x40d8fd
>>> 0x000000000040dbf9 <+154>: mov %rax,%rbx
>>> 0x000000000040dbfc <+157>: mov -0x90(%rbp),%rax
>>> 0x000000000040dc03 <+164>: mov %rax,%rdi
>>> 0x000000000040dc06 <+167>: callq 0x40d8fd
>>> 0x000000000040dc0b <+172>: mov %rax,%r12
>>> 0x000000000040dc0e <+175>: mov -0x88(%rbp),%rax
>>> 0x000000000040dc15 <+182>: mov %rax,%rdi
>>> 0x000000000040dc18 <+185>: callq 0x40d8fd
>>> 0x000000000040dc1d <+190>: mov %r12,%rsi
>>> 0x000000000040dc20 <+193>: mov %rax,%rdi
>>> 0x000000000040dc23 <+196>: callq 0x40da19
>>> 0x000000000040dc28 <+201>: mov %rbx,%rsi
>>> 0x000000000040dc2b <+204>: mov %rax,%rdi
>>> 0x000000000040dc2e <+207>: callq 0x40da19
>>>
>>>
>>> While the C code clearly violates the spirit of the GC rules, I can't
>>> help but feel this is still a pitfall.
>>>
>>> Rule 2 of the manual states: "Local variables of type value must be
>>> declared with one of the CAMLlocal macros. [...]"
>>>
>>> But here, I'm not declaring local variables, unless you count compiler
>>> temporaries as local variables?
>>>
>>> I can see some other people making the same mistake I did. Should
>>> there be an explicit warning in the rules? maybe underlining that
>>> compiler temps count as variables, or discouraging the kind of nested
>>> calls returning values displayed above?
>>>
>>> thanks,
>>> Frédéric Perriot
>>>
>>> PS: this is also my first time posting to the list, so I take this
>>> opportunity to thank you for the great Q's and A's I've read here over
>>> the years
>>>
>>> --
>>> Caml-list mailing list.  Subscription management and archives:
>>> https://sympa.inria.fr/sympa/arc/caml-list
>>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>>> Bug reports: http://caml.inria.fr/bin/caml-bugs
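
For anyone who wants to see the "(v1 = e1) keeps its initial value" point
in isolation, here is a minimal sketch in plain C that does not use the
OCaml runtime at all. Everything in it (toy_value, toy_alloc, toy_gc, the
two "spaces", the single registered root) is a hypothetical stand-in made
up for illustration; it only mimics a collector that moves an object and
updates a registered variable. The evaluation of e2 is replaced by an
explicit call to toy_gc() so the order of events is fixed rather than left
to the compiler.

```c
#include <assert.h>

/* Toy model only: toy_value, toy_alloc, toy_gc and the two "spaces" are
   hypothetical stand-ins, not part of the OCaml runtime API. */
typedef int *toy_value;

static int from_space[1];           /* where the object is first allocated */
static int to_space[1];             /* where the toy collector moves it */
static toy_value *registered_root;  /* the one variable the collector knows */

/* Allocate the object in from_space with payload 42. */
static toy_value toy_alloc(void)
{
    from_space[0] = 42;
    return from_space;
}

/* Move the object and update the registered root, roughly what a moving
   GC does for a variable declared with CAMLlocal. */
static void toy_gc(void)
{
    to_space[0] = from_space[0];
    from_space[0] = -1;             /* poison the old copy */
    *registered_root = to_space;
}

/* Nested style: hold on to the value of the expression (v1 = e1) while a
   later allocation triggers a collection.
   Returns 1 if the held value is stale afterwards. */
int nested_style_is_stale(void)
{
    toy_value v1;
    registered_root = &v1;
    toy_value saved = (v1 = toy_alloc()); /* a copy of the pointer, not v1 */
    assert(saved == v1);                  /* identical before the move */
    toy_gc();                             /* stands in for evaluating e2 */
    return saved != v1 && *saved == -1;
}

/* Linearized style: read the registered variable again after the
   collection. Returns the payload seen through v1. */
int linearized_style_payload(void)
{
    toy_value v1;
    registered_root = &v1;
    v1 = toy_alloc();
    toy_gc();
    return *v1;   /* v1 was updated, so this reads the moved object */
}
```

Under these assumptions, nested_style_is_stale() reports that the saved
copy still points at the old location, while linearized_style_payload()
reads the object through the updated variable. That is the same reason the
linearized left_comb works where the nested assignments do not.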