caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Interfacing with C: bad practice
@ 2011-08-16  7:37 Dmitry Bely
  2011-08-16  8:04 ` Török Edwin
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Dmitry Bely @ 2011-08-16  7:37 UTC (permalink / raw)
  To: Caml List

I would like to share my experience of writing bad C bindings. The
following code is wrong, although no "living in harmony with the
garbage collector" rule seems to be violated:

value wrp_ml_cons (value v, value l)
{
  CAMLparam2(v, l);
  CAMLlocal1(cell);
  cell = caml_alloc_small(2, Tag_cons);
  Field(cell, 0) = v;
  Field(cell, 1) = l;
  CAMLreturn(cell);
}

value string_list(const char ** s)
{
    CAMLparam0();
    CAMLlocal1(list);
    list = Val_emptylist;
    while (*s != NULL) {
        list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
    }
    CAMLreturn(list);
}

In the line

        list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */

C compiler first puts "list" pointer on stack and then calls
caml_copy_string(*s), potentially invalidating "list". Of course, the
stack copy of "list" is not registered as a global root so wrp_ml_cons
gets an invalid value.

Maybe Ocaml docs should be updated to warn about pitfalls like that?

- Dmitry Bely

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16  7:37 [Caml-list] Interfacing with C: bad practice Dmitry Bely
@ 2011-08-16  8:04 ` Török Edwin
  2011-08-16  8:25   ` Dmitry Bely
       [not found] ` <20110816.105738.246515733851238101.Christophe.Troestler@umons.ac.be>
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Török Edwin @ 2011-08-16  8:04 UTC (permalink / raw)
  To: caml-list

On 08/16/2011 10:37 AM, Dmitry Bely wrote:
> I would like to share my experience of writing bad C bindings. The
> following code is wrong, although no "living in harmony with the
> garbage collector" rule seems to be violated:
> 
> value wrp_ml_cons (value v, value l)
> {
>   CAMLparam2(v, l);
>   CAMLlocal1(cell);
>   cell = caml_alloc_small(2, Tag_cons);
>   Field(cell, 0) = v;
>   Field(cell, 1) = l;
>   CAMLreturn(cell);
> }
> 
> value string_list(const char ** s)
> {
>     CAMLparam0();
>     CAMLlocal1(list);
>     list = Val_emptylist;
>     while (*s != NULL) {
>         list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
>     }
>     CAMLreturn(list);
> }
> 
> In the line
> 
>         list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
> 
> C compiler first puts "list" pointer on stack and then calls
> caml_copy_string(*s), potentially invalidating "list".

list is a local root though, shouldn't that prevent the invalidation?

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16  8:04 ` Török Edwin
@ 2011-08-16  8:25   ` Dmitry Bely
  2011-08-16  8:43     ` Török Edwin
  0 siblings, 1 reply; 31+ messages in thread
From: Dmitry Bely @ 2011-08-16  8:25 UTC (permalink / raw)
  To: Caml List

2011/8/16 Török Edwin <edwintorok@gmail.com>:
> On 08/16/2011 10:37 AM, Dmitry Bely wrote:
>> I would like to share my experience of writing bad C bindings. The
>> following code is wrong, although no "living in harmony with the
>> garbage collector" rule seems to be violated:
>>
>> value wrp_ml_cons (value v, value l)
>> {
>>   CAMLparam2(v, l);
>>   CAMLlocal1(cell);
>>   cell = caml_alloc_small(2, Tag_cons);
>>   Field(cell, 0) = v;
>>   Field(cell, 1) = l;
>>   CAMLreturn(cell);
>> }
>>
>> value string_list(const char ** s)
>> {
>>     CAMLparam0();
>>     CAMLlocal1(list);
>>     list = Val_emptylist;
>>     while (*s != NULL) {
>>         list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
>>     }
>>     CAMLreturn(list);
>> }
>>
>> In the line
>>
>>         list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
>>
>> C compiler first puts "list" pointer on stack and then calls
>> caml_copy_string(*s), potentially invalidating "list".
>
> list is a local root though, shouldn't that prevent the invalidation?

Unfortunately not because wrp_ml_cons() second parameter is not
registered. So wrp_ml_cons() gets an invalid value.

- Dmitry Bely


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16  8:25   ` Dmitry Bely
@ 2011-08-16  8:43     ` Török Edwin
  2011-08-16  9:46       ` rixed
  0 siblings, 1 reply; 31+ messages in thread
From: Török Edwin @ 2011-08-16  8:43 UTC (permalink / raw)
  To: caml-list

On 08/16/2011 11:25 AM, Dmitry Bely wrote:
> 2011/8/16 Török Edwin <edwintorok@gmail.com>:
>> On 08/16/2011 10:37 AM, Dmitry Bely wrote:
>>> I would like to share my experience of writing bad C bindings. The
>>> following code is wrong, although no "living in harmony with the
>>> garbage collector" rule seems to be violated:
>>>
>>> value wrp_ml_cons (value v, value l)
>>> {
>>>   CAMLparam2(v, l);
>>>   CAMLlocal1(cell);
>>>   cell = caml_alloc_small(2, Tag_cons);
>>>   Field(cell, 0) = v;
>>>   Field(cell, 1) = l;
>>>   CAMLreturn(cell);
>>> }
>>>
>>> value string_list(const char ** s)
>>> {
>>>     CAMLparam0();
>>>     CAMLlocal1(list);

list is registered there, is that not enough? (the macro registers address of list AFAIK,
so even if you change its value, it'll stay registered as a local root)

>>>     list = Val_emptylist;
>>>     while (*s != NULL) {
>>>         list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
>>>     }
>>>     CAMLreturn(list);
>>> }
>>>
>>> In the line
>>>
>>>         list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
>>>
>>> C compiler first puts "list" pointer on stack and then calls
>>> caml_copy_string(*s), potentially invalidating "list".
>>
>> list is a local root though, shouldn't that prevent the invalidation?
> 
> Unfortunately not because wrp_ml_cons() second parameter is not
> registered. 

> So wrp_ml_cons() gets an invalid value.

'list' should be reachable via caml_local_roots, so if it really gets an invalid value
it sounds like a bug to me.

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
       [not found] ` <20110816.105738.246515733851238101.Christophe.Troestler@umons.ac.be>
@ 2011-08-16  9:21   ` Dmitry Bely
  2011-08-16 10:39     ` Mauricio Fernandez
  0 siblings, 1 reply; 31+ messages in thread
From: Dmitry Bely @ 2011-08-16  9:21 UTC (permalink / raw)
  To: Caml List

On Tue, Aug 16, 2011 at 12:57 PM, Christophe TROESTLER
<Christophe.Troestler@umons.ac.be> wrote:
> On Tue, 16 Aug 2011 11:37:03 +0400, Dmitry Bely wrote:
>>
>> I would like to share my experience of writing bad C bindings. The
>> following code is wrong, although no "living in harmony with the
>> garbage collector" rule seems to be violated:
>>
>> value wrp_ml_cons (value v, value l)
>> {
>>   CAMLparam2(v, l);
>>   CAMLlocal1(cell);
>>   cell = caml_alloc_small(2, Tag_cons);
>>   Field(cell, 0) = v;
>>   Field(cell, 1) = l;
>>   CAMLreturn(cell);
>> }
>>
>> value string_list(const char ** s)
>> {
>>     CAMLparam0();
>>     CAMLlocal1(list);
>>     list = Val_emptylist;
>>     while (*s != NULL) {
>>         list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
>>     }
>>     CAMLreturn(list);
>> }
>>
>> In the line
>>
>>         list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
>>
>> C compiler first puts "list" pointer on stack and then calls
>> caml_copy_string(*s), potentially invalidating "list". Of course, the
>> stack copy of "list" is not registered as a global root so wrp_ml_cons
>> gets an invalid value.
>
> Are you sure it is not because, in this way, “list” is being
> registered twice?

Where?

>  IMHO, wrp_ml_cons does not need to register values.

Let me explain again. Before wrp_ml_cons() is called, C compiler

1. Pushes a _copy_ of list value onto the stack.
2. Calls caml_copy_string(*s) and pushes its result onto the stack also.
3. Calls wrp_ml_cons().

After (2) the second parameter of wrp_ml_cons() becomes invalid: it is
a _copy_ of list, not registered as a local root.

Already two persons don't see the problem at first glance; It proves
that I was right starting the topic ;-)

- Dmitry Bely


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16  8:43     ` Török Edwin
@ 2011-08-16  9:46       ` rixed
  2011-08-16  9:53         ` Dmitry Bely
  0 siblings, 1 reply; 31+ messages in thread
From: rixed @ 2011-08-16  9:46 UTC (permalink / raw)
  To: caml-list

-[ Tue, Aug 16, 2011 at 11:43:24AM +0300, Török Edwin ]----
> 'list' should be reachable via caml_local_roots, so if it really gets an invalid value
> it sounds like a bug to me.

list may not be garbage collected (since it is indeed registered as the
root), but it may be moved arround (ie promoted to major heap).
The address of list would be fixed in the local variable (the root) but
not the transiant copy on the stack.
Maybe that's the actual problem ?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16  9:46       ` rixed
@ 2011-08-16  9:53         ` Dmitry Bely
  2011-08-16 10:17           ` Török Edwin
  0 siblings, 1 reply; 31+ messages in thread
From: Dmitry Bely @ 2011-08-16  9:53 UTC (permalink / raw)
  To: rixed; +Cc: caml-list

On Tue, Aug 16, 2011 at 1:46 PM,  <rixed@happyleptic.org> wrote:
> -[ Tue, Aug 16, 2011 at 11:43:24AM +0300, Török Edwin ]----
>> 'list' should be reachable via caml_local_roots, so if it really gets an invalid value
>> it sounds like a bug to me.
>
> list may not be garbage collected (since it is indeed registered as the
> root), but it may be moved arround (ie promoted to major heap).
> The address of list would be fixed in the local variable (the root) but
> not the transiant copy on the stack.
> Maybe that's the actual problem ?

Exactly. List is moved but its stack copy (wrp_ml_cons 2nd parameter)
is not updated and holds an old address.

- Dmitry Bely


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16  9:53         ` Dmitry Bely
@ 2011-08-16 10:17           ` Török Edwin
  2011-08-16 11:04             ` rixed
  0 siblings, 1 reply; 31+ messages in thread
From: Török Edwin @ 2011-08-16 10:17 UTC (permalink / raw)
  To: caml-list

On 08/16/2011 12:53 PM, Dmitry Bely wrote:
> On Tue, Aug 16, 2011 at 1:46 PM,  <rixed@happyleptic.org> wrote:
>> -[ Tue, Aug 16, 2011 at 11:43:24AM +0300, Török Edwin ]----
>>> 'list' should be reachable via caml_local_roots, so if it really gets an invalid value
>>> it sounds like a bug to me.
>>
>> list may not be garbage collected (since it is indeed registered as the
>> root), but it may be moved arround (ie promoted to major heap).
>> The address of list would be fixed in the local variable (the root) but
>> not the transiant copy on the stack.
>> Maybe that's the actual problem ?
> 
> Exactly. List is moved but its stack copy (wrp_ml_cons 2nd parameter)
> is not updated and holds an old address.

Ah yes.
So a best practice would be to always use a temporary variable
registered with CAMLlocal* when you call a function that can potentially invoke the GC?
(in your example to store the result of caml_copy_string).

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16  9:21   ` Dmitry Bely
@ 2011-08-16 10:39     ` Mauricio Fernandez
  2011-08-16 14:27       ` John Carr
  0 siblings, 1 reply; 31+ messages in thread
From: Mauricio Fernandez @ 2011-08-16 10:39 UTC (permalink / raw)
  To: caml-list

On Tue, Aug 16, 2011 at 01:21:02PM +0400, Dmitry Bely wrote:
> On Tue, Aug 16, 2011 at 12:57 PM, Christophe TROESTLER
> <Christophe.Troestler@umons.ac.be> wrote:
> > On Tue, 16 Aug 2011 11:37:03 +0400, Dmitry Bely wrote:
> >> In the line
> >>
> >>         list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
> >>
> 
> Let me explain again. Before wrp_ml_cons() is called, C compiler
> 
> 1. Pushes a _copy_ of list value onto the stack.
> 2. Calls caml_copy_string(*s) and pushes its result onto the stack also.
> 3. Calls wrp_ml_cons().
> 
> After (2) the second parameter of wrp_ml_cons() becomes invalid: it is
> a _copy_ of list, not registered as a local root.

This has been in my mind for a while: why don't CAMLparamX declare the local
variables as volatile?

-- 
Mauricio Fernandez  -   http://eigenclass.org

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 10:17           ` Török Edwin
@ 2011-08-16 11:04             ` rixed
  0 siblings, 0 replies; 31+ messages in thread
From: rixed @ 2011-08-16 11:04 UTC (permalink / raw)
  To: caml-list

-[ Tue, Aug 16, 2011 at 01:17:28PM +0300, Török Edwin ]----
> So a best practice would be to always use a temporary variable
> registered with CAMLlocal* when you call a function that can potentially invoke the GC?

I would say the best practive would be to not call an allocating
function from a parameters list, since other values in the
parameters list are unregistered copies untill the call is taken.

I would really like to see this exemple in the manual as well.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Caml-list] Re: Interfacing with C: bad practice
  2011-08-16  7:37 [Caml-list] Interfacing with C: bad practice Dmitry Bely
  2011-08-16  8:04 ` Török Edwin
       [not found] ` <20110816.105738.246515733851238101.Christophe.Troestler@umons.ac.be>
@ 2011-08-16 12:28 ` Dmitry Bely
  2011-08-16 15:25 ` [Caml-list] " Richard W.M. Jones
  3 siblings, 0 replies; 31+ messages in thread
From: Dmitry Bely @ 2011-08-16 12:28 UTC (permalink / raw)
  To: Caml List

On Tue, Aug 16, 2011 at 11:37 AM, Dmitry Bely <dmitry.bely@gmail.com> wrote:
(...)
>    while (*s != NULL) {
>        list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
>    }

Ah, s increment is missing. The loop should be written as

    for (; *s != NULL; s++) {
        list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
    }

- Dmitry Bely


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 10:39     ` Mauricio Fernandez
@ 2011-08-16 14:27       ` John Carr
  0 siblings, 0 replies; 31+ messages in thread
From: John Carr @ 2011-08-16 14:27 UTC (permalink / raw)
  To: caml-list


Mauricio Fernandez <mfp@acm.org> wrote:

> This has been in my mind for a while: why don't CAMLparamX declare the local
> variables as volatile?

That would not help.

Passing the address of a local variable to an external function
warns the compiler that the variable may change when another
external function is called.  That is necessary and sufficient
to accomodate OCaml's garbage collector.

The problem here is, order of evaluation of function call arguments
is unspecified.

In principle the same bug is present in

  Field(camlvar, 1) = allocating_function(...);

camlvar may be evaluated before the allocating function is called,
and the allocating function may move the object it points to.
I don't know whether the new C sequencing rules change this, but I
wouldn't intentionally write code that depends on the ordering of
this statement.

    --John Carr (jfc@mit.edu)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16  7:37 [Caml-list] Interfacing with C: bad practice Dmitry Bely
                   ` (2 preceding siblings ...)
  2011-08-16 12:28 ` [Caml-list] " Dmitry Bely
@ 2011-08-16 15:25 ` Richard W.M. Jones
  2011-08-16 15:51   ` rixed
                     ` (2 more replies)
  3 siblings, 3 replies; 31+ messages in thread
From: Richard W.M. Jones @ 2011-08-16 15:25 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: Caml List

On Tue, Aug 16, 2011 at 11:37:03AM +0400, Dmitry Bely wrote:
> C compiler first puts "list" pointer on stack and then calls
> caml_copy_string(*s), potentially invalidating "list". Of course, the
> stack copy of "list" is not registered as a global root so wrp_ml_cons
> gets an invalid value.

I think this must be a bug in your C compiler.  The address of list is
stashed in the roots struct, so the C compiler should know that list
can be changed by the call to caml_copy_string.

What C compiler / version is this?

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 15:25 ` [Caml-list] " Richard W.M. Jones
@ 2011-08-16 15:51   ` rixed
  2011-08-16 16:00     ` Will M. Farr
  2011-08-16 16:10     ` Richard W.M. Jones
  2011-08-16 16:06   ` John Carr
  2011-08-16 16:13   ` Dmitry Bely
  2 siblings, 2 replies; 31+ messages in thread
From: rixed @ 2011-08-16 15:51 UTC (permalink / raw)
  To: Caml List

-[ Tue, Aug 16, 2011 at 04:25:50PM +0100, Richard W.M. Jones ]----
> I think this must be a bug in your C compiler.  The address of list is
> stashed in the roots struct, so the C compiler should know that list
> can be changed by the call to caml_copy_string.

Are you certain that the C abstract machine allow for any value stored
within the frameset of a function to be changed by a function call when
the address of the variable at hand is not passed to this function? And
mandate the C compiler to handle this scenario? In other words, mandate
the C compiler to reload from the stack all values between any function
call?

I don't think so ; or more likely I have not understood your view on
this matter?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 15:51   ` rixed
@ 2011-08-16 16:00     ` Will M. Farr
  2011-08-16 16:10     ` Richard W.M. Jones
  1 sibling, 0 replies; 31+ messages in thread
From: Will M. Farr @ 2011-08-16 16:00 UTC (permalink / raw)
  To: rixed; +Cc: Caml List

[-- Attachment #1: Type: text/plain, Size: 1767 bytes --]

Yes.  The C compiler can see that the address of list has been passed to external functions in CAMLlocal, so has to assume the worst: that the address is stored in some global variable, and caml_copy_string or the functions it calls can access it and thereby change the value of list.  If the compiler is a spectacularly optimizing one (i.e. performs a ton of inter-module procedure inlining and/or very careful escape analysis), it *may* be able to prove that this never happens, and therefore would be allowed to optimize the call in the manner proposed.  In this case, of course, "the worst" actually happens, so such a proof must fail; if the proof does not fail, the compiler has a bug.

Will

On Aug 16, 2011, at 11:51 AM, rixed@happyleptic.org wrote:

> -[ Tue, Aug 16, 2011 at 04:25:50PM +0100, Richard W.M. Jones ]----
>> I think this must be a bug in your C compiler.  The address of list is
>> stashed in the roots struct, so the C compiler should know that list
>> can be changed by the call to caml_copy_string.
> 
> Are you certain that the C abstract machine allow for any value stored
> within the frameset of a function to be changed by a function call when
> the address of the variable at hand is not passed to this function? And
> mandate the C compiler to handle this scenario? In other words, mandate
> the C compiler to reload from the stack all values between any function
> call?
> 
> I don't think so ; or more likely I have not understood your view on
> this matter?
> 
> 
> -- 
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 243 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 15:25 ` [Caml-list] " Richard W.M. Jones
  2011-08-16 15:51   ` rixed
@ 2011-08-16 16:06   ` John Carr
  2011-08-16 16:14     ` Wojciech Meyer
  2011-08-16 16:13   ` Dmitry Bely
  2 siblings, 1 reply; 31+ messages in thread
From: John Carr @ 2011-08-16 16:06 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Caml List


Richard W.M. Jones <rich@annexia.org> wrote:

> On Tue, Aug 16, 2011 at 11:37:03AM +0400, Dmitry Bely wrote:
> > C compiler first puts "list" pointer on stack and then calls
> > caml_copy_string(*s), potentially invalidating "list". Of course, the
> > stack copy of "list" is not registered as a global root so wrp_ml_cons
> > gets an invalid value.
> 
> I think this must be a bug in your C compiler.  The address of list is
> stashed in the roots struct, so the C compiler should know that list
> can be changed by the call to caml_copy_string.

The call

   f(g(), x)

can behave as either

  temp1 = g()
  temp2 = x
  f(temp1, temp2)

or

  temp1 = x
  temp2 = g()
  f(temp2, temp1)

The order does not need to be deterministic.

If the call to g() changes x, the second order results in the
function f() receiving the "wrong" value.

    --John Carr (jfc@mit.edu)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 15:51   ` rixed
  2011-08-16 16:00     ` Will M. Farr
@ 2011-08-16 16:10     ` Richard W.M. Jones
  2011-08-16 16:17       ` Richard W.M. Jones
                         ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Richard W.M. Jones @ 2011-08-16 16:10 UTC (permalink / raw)
  To: rixed; +Cc: Caml List

On Tue, Aug 16, 2011 at 05:51:38PM +0200, rixed@happyleptic.org wrote:
> -[ Tue, Aug 16, 2011 at 04:25:50PM +0100, Richard W.M. Jones ]----
> > I think this must be a bug in your C compiler.  The address of list is
> > stashed in the roots struct, so the C compiler should know that list
> > can be changed by the call to caml_copy_string.
> 
> Are you certain that the C abstract machine allow for any value stored
> within the frameset of a function to be changed by a function call when
> the address of the variable at hand is not passed to this function? And
> mandate the C compiler to handle this scenario? In other words, mandate
> the C compiler to reload from the stack all values between any function
> call?
> 
> I don't think so ; or more likely I have not understood your view on
> this matter?

Well it would certainly help if we had a piece of runnable code to
look at.  The code supplied in the original email contains an infinite
loop.

Nevertheless, the C compiler isn't allowed to just push 'list' blindly
onto the stack and assume it doesn't change across the call to
'caml_copy_string'.  Calls in C are sequence points, and the compiler
cannot possibly assume that 'list' won't change, because the address
of 'list' has been taken *and* is directly reachable in two steps from
caml__local_roots, which is a global variable.

If the compiler is doing this (and we don't really have proof of that
because there is no runnable code nor is a compiler + version
specified), then the compiler has a bug.

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 15:25 ` [Caml-list] " Richard W.M. Jones
  2011-08-16 15:51   ` rixed
  2011-08-16 16:06   ` John Carr
@ 2011-08-16 16:13   ` Dmitry Bely
  2 siblings, 0 replies; 31+ messages in thread
From: Dmitry Bely @ 2011-08-16 16:13 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Caml List

On Tue, Aug 16, 2011 at 7:25 PM, Richard W.M. Jones <rich@annexia.org> wrote:
> On Tue, Aug 16, 2011 at 11:37:03AM +0400, Dmitry Bely wrote:
>> C compiler first puts "list" pointer on stack and then calls
>> caml_copy_string(*s), potentially invalidating "list". Of course, the
>> stack copy of "list" is not registered as a global root so wrp_ml_cons
>> gets an invalid value.
>
> I think this must be a bug in your C compiler.  The address of list is
> stashed in the roots struct,so the C compiler should know that list
> can be changed by the call to caml_copy_string.

Why the compiler should worry about that? It evaluates parameters form
right to left and that behavior fully conforms to the C standard
(evaluation order is unspecified).

> What C compiler / version is this?

Microsoft Visual C++ 10.0 ;-)

- Dmitry Bely


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 16:06   ` John Carr
@ 2011-08-16 16:14     ` Wojciech Meyer
  0 siblings, 0 replies; 31+ messages in thread
From: Wojciech Meyer @ 2011-08-16 16:14 UTC (permalink / raw)
  To: John Carr; +Cc: Richard W.M. Jones, Caml List

[-- Attachment #1: Type: text/plain, Size: 1336 bytes --]

On Tue, Aug 16, 2011 at 5:06 PM, John Carr <jfc@mit.edu> wrote:

>
> Richard W.M. Jones <rich@annexia.org> wrote:
>
> > On Tue, Aug 16, 2011 at 11:37:03AM +0400, Dmitry Bely wrote:
> > > C compiler first puts "list" pointer on stack and then calls
> > > caml_copy_string(*s), potentially invalidating "list". Of course, the
> > > stack copy of "list" is not registered as a global root so wrp_ml_cons
> > > gets an invalid value.
> >
> > I think this must be a bug in your C compiler.  The address of list is
> > stashed in the roots struct, so the C compiler should know that list
> > can be changed by the call to caml_copy_string.
>

Maybe looking at the assembly output would help. If you can post the output
with gcc -S flag.

Cheers;
Wojciech

The call
>
>   f(g(), x)
>
> can behave as either
>
>  temp1 = g()
>  temp2 = x
>  f(temp1, temp2)
>
> or
>
>  temp1 = x
>  temp2 = g()
>  f(temp2, temp1)
>
> The order does not need to be deterministic.
>
> If the call to g() changes x, the second order results in the
> function f() receiving the "wrong" value.
>
>    --John Carr (jfc@mit.edu)
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>

[-- Attachment #2: Type: text/html, Size: 2386 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 16:10     ` Richard W.M. Jones
@ 2011-08-16 16:17       ` Richard W.M. Jones
  2011-08-16 16:18       ` Dmitry Bely
  2011-08-16 17:08       ` [Caml-list] " rixed
  2 siblings, 0 replies; 31+ messages in thread
From: Richard W.M. Jones @ 2011-08-16 16:17 UTC (permalink / raw)
  Cc: Caml List

On Tue, Aug 16, 2011 at 05:10:42PM +0100, Richard W.M. Jones wrote:
> Well it would certainly help if we had a piece of runnable code to
> look at.  The code supplied in the original email contains an infinite
> loop.

I missed there was another email later that contains the
fixed code (s++).

I'm trying to see what assembly gcc generates for this ...

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 16:10     ` Richard W.M. Jones
  2011-08-16 16:17       ` Richard W.M. Jones
@ 2011-08-16 16:18       ` Dmitry Bely
  2011-08-16 16:22         ` Richard W.M. Jones
  2011-08-16 17:08       ` [Caml-list] " rixed
  2 siblings, 1 reply; 31+ messages in thread
From: Dmitry Bely @ 2011-08-16 16:18 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Caml List

On Tue, Aug 16, 2011 at 8:10 PM, Richard W.M. Jones <rich@annexia.org> wrote:

> Well it would certainly help if we had a piece of runnable code to
> look at.  The code supplied in the original email contains an infinite
> loop.

Yes, that was my mistake that I corrected later. The loop should be

 for (; *s != NULL; s++) {
    list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
 }

- Dmitry Bely


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 16:18       ` Dmitry Bely
@ 2011-08-16 16:22         ` Richard W.M. Jones
  2011-08-16 16:27           ` Richard W.M. Jones
  0 siblings, 1 reply; 31+ messages in thread
From: Richard W.M. Jones @ 2011-08-16 16:22 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: Caml List

[-- Attachment #1: Type: text/plain, Size: 784 bytes --]

On Tue, Aug 16, 2011 at 08:18:48PM +0400, Dmitry Bely wrote:
> On Tue, Aug 16, 2011 at 8:10 PM, Richard W.M. Jones <rich@annexia.org> wrote:
> 
> > Well it would certainly help if we had a piece of runnable code to
> > look at.  The code supplied in the original email contains an infinite
> > loop.
> 
> Yes, that was my mistake that I corrected later. The loop should be
> 
>  for (; *s != NULL; s++) {
>     list = wrp_ml_cons(caml_copy_string(*s), list); /* bug! */
>  }

Well, contrary to what I said earlier, it really does seem
like this is undefined C behaviour.

Try the attached program, which distills the problem down to something
simple.  gcc with -O0 and -O2 produces completely different answers,
and the assembler output confirms it.

Rich.

-- 
Richard Jones
Red Hat

[-- Attachment #2: test.c --]
[-- Type: text/x-csrc, Size: 332 bytes --]

#include <stdio.h>
#include <stdlib.h>

void f (int a, int b);
int g (void);
int *global = NULL;

int
main (void)
{
  int x = 1;
  global = &x;
  f (g (), x);

  exit (0);
}

void
f (int a, int b)
{
  printf ("a = %d, b = %d\n", a, b);
}

int
g (void)
{
  if (global) {
    global++;
    return *global;
  }
  else
    return 42;
}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 16:22         ` Richard W.M. Jones
@ 2011-08-16 16:27           ` Richard W.M. Jones
  2011-08-16 16:30             ` malc
  2011-08-16 16:34             ` Török Edwin
  0 siblings, 2 replies; 31+ messages in thread
From: Richard W.M. Jones @ 2011-08-16 16:27 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: Caml List

Grrr now _my_ program has a bug.

----------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>

void f (int a, int b);
int g (void);
int *global = NULL;

int
main (void)
{
  int x = 1;
  global = &x;
  f (g (), x);

  exit (0);
}

void
f (int a, int b)
{
  printf ("a = %d, b = %d\n", a, b);
}

int
g (void)
{
  if (global) {
    (*global)++;
    return *global;
  }
  else
    return 42;
}
----------------------------------------------------------------------

This program doesn't show undefined behaviour.

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 16:27           ` Richard W.M. Jones
@ 2011-08-16 16:30             ` malc
  2011-08-16 16:34             ` Török Edwin
  1 sibling, 0 replies; 31+ messages in thread
From: malc @ 2011-08-16 16:30 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Dmitry Bely, Caml List

On Tue, 16 Aug 2011, Richard W.M. Jones wrote:

> Grrr now _my_ program has a bug.
> 

FWIW i have asked Richard Henderson to share his opinion and he found
nothing wrong with memory.h and/or the OP's code, and believes it's a
compiler bug.

[..snip..]

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 16:27           ` Richard W.M. Jones
  2011-08-16 16:30             ` malc
@ 2011-08-16 16:34             ` Török Edwin
  2011-08-16 16:47               ` Richard W.M. Jones
  2011-08-16 16:55               ` [Caml-list] " Jeffrey Scofield
  1 sibling, 2 replies; 31+ messages in thread
From: Török Edwin @ 2011-08-16 16:34 UTC (permalink / raw)
  To: caml-list, rich

On 08/16/2011 07:27 PM, Richard W.M. Jones wrote:
> Grrr now _my_ program has a bug.
> 
> ----------------------------------------------------------------------
> #include <stdio.h>
> #include <stdlib.h>
> 
> void f (int a, int b);
> int g (void);
> int *global = NULL;
> 
> int
> main (void)
> {
>   int x = 1;
>   global = &x;
>   f (g (), x);
> 
>   exit (0);
> }
> 
> void
> f (int a, int b)
> {
>   printf ("a = %d, b = %d\n", a, b);
> }
> 
> int
> g (void)
> {
>   if (global) {
>     (*global)++;
>     return *global;
>   }
>   else
>     return 42;
> }
> ----------------------------------------------------------------------
> 
> This program doesn't show undefined behaviour.

Undefined behaviour doesn't mean it must show different results with -O2,
it *might* if the compiler decides to do some optimization.

But isn't this 'f(g(), x)' issue the same as the classic example of undefined behaviour with f(++i, ++i)?

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 16:34             ` Török Edwin
@ 2011-08-16 16:47               ` Richard W.M. Jones
  2011-08-16 16:55               ` [Caml-list] " Jeffrey Scofield
  1 sibling, 0 replies; 31+ messages in thread
From: Richard W.M. Jones @ 2011-08-16 16:47 UTC (permalink / raw)
  To: Török Edwin; +Cc: caml-list

On Tue, Aug 16, 2011 at 07:34:31PM +0300, Török Edwin wrote:
> Undefined behaviour doesn't mean it must show different results with -O2,
> it *might* if the compiler decides to do some optimization.
> 
> But isn't this 'f(g(), x)' issue the same as the classic example of undefined behaviour with f(++i, ++i)?

I was a bit unclear.  I just meant the program no longer displayed
random numbers.

I agree (now) that this appears to be unspecified behaviour, like
f(++i, ++i), but rather more indirectly.

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Caml-list] Re: Interfacing with C: bad practice
  2011-08-16 16:34             ` Török Edwin
  2011-08-16 16:47               ` Richard W.M. Jones
@ 2011-08-16 16:55               ` Jeffrey Scofield
  2011-08-16 17:08                 ` Will M. Farr
  2011-08-16 19:46                 ` Gerd Stolpmann
  1 sibling, 2 replies; 31+ messages in thread
From: Jeffrey Scofield @ 2011-08-16 16:55 UTC (permalink / raw)
  To: caml-list

Török Edwin <edwintorok@gmail.com> writes:

> But isn't this 'f(g(), x)' issue the same as the classic example
> of undefined behaviour with f(++i, ++i)?

It's not quite the same, because a function call (g())
introduces a sequence point.  In the f(++i, ++i) case,
I think there's only a sequence point after the call
to f.

I personally think the effect of f(x, g()) is unspecified
if g() changes the value of x.  I don't think the compiler
is required to chase down dependencies like this, that's why
the order of parameter passing is left unspecified.

I've often worried about exactly this case when writing
OCaml/C interfaces.

I'm far from an expert, however.

(One great thing about the ML family is that there
are exceptionally clear answers to questions like this.
Especially for Standard ML.)

Jeffrey


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Interfacing with C: bad practice
  2011-08-16 16:10     ` Richard W.M. Jones
  2011-08-16 16:17       ` Richard W.M. Jones
  2011-08-16 16:18       ` Dmitry Bely
@ 2011-08-16 17:08       ` rixed
  2 siblings, 0 replies; 31+ messages in thread
From: rixed @ 2011-08-16 17:08 UTC (permalink / raw)
  To: caml-list

-[ Tue, Aug 16, 2011 at 05:10:42PM +0100, Richard W.M. Jones ]----
> Nevertheless, the C compiler isn't allowed to just push 'list' blindly
> onto the stack and assume it doesn't change across the call to
> 'caml_copy_string'.

For me,
	wrp_ml_cons(caml_copy_string(*s), list);
with caml_copy_string changing list, fall under the rule that a value
shall not be modified twice between two sequence points. I've just read
the relevant parts of the spec and if indeed there is a sequence point
between parameters evaluation and the call itself, there are not between
the evaluation of the parameters themself (which order of evaluation is,
of course, undefined)! Please see for yourself:

       [#10] The order of evaluation of  the  function  designator,
       the  actual  arguments, and subexpressions within the actual
       arguments is unspecified, but  there  is  a  sequence  point
       before the actual call.

So it seams to me that the compiler if entitled to evaluate list first,
and then evaluate caml_copy_string(*s), provided it "fully" evaluates
them before the actual jump to wrp_ml_cons. This is unrelated to wether
the address of list is available to caml_copy_string or anything; the
programmer is supposed to know that the order of evaluation is undefined
and write his code accordingly.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Re: Interfacing with C: bad practice
  2011-08-16 16:55               ` [Caml-list] " Jeffrey Scofield
@ 2011-08-16 17:08                 ` Will M. Farr
  2011-08-16 19:46                 ` Gerd Stolpmann
  1 sibling, 0 replies; 31+ messages in thread
From: Will M. Farr @ 2011-08-16 17:08 UTC (permalink / raw)
  To: Caml List

[-- Attachment #1: Type: text/plain, Size: 2304 bytes --]

An aside: In case people want to crib the language, the Racket (formerly PLT Scheme) manual contains a warning about exactly this problem in the appended text (Section 3.1.2 of the "Inside: Racket C API" at http://docs.racket-lang.org/inside/im_memoryalloc.html#(part._im~3a3m~3astack) ).  It shouldn't be hard to adapt the discussion to something appropriate for the OCaml manual:

---------------------------------
The 3m collector needs to know the address of every local or temporary pointer within a function call at any point when a collection can be triggered. Beware that nested function calls can hide temporary pointers; for example, in

  scheme_make_pair(scheme_make_pair(scheme_true, scheme_false),
                   scheme_make_pair(scheme_false, scheme_true))

the result from one scheme_make_pair call is on the stack or in a register during the other call toscheme_make_pair; this pointer must be exposed to the garbage collection and made subject to update. Simply changing the code to

  tmp = scheme_make_pair(scheme_true, scheme_false);
  scheme_make_pair(tmp,
                   scheme_make_pair(scheme_false, scheme_true))

does not expose all pointers, since tmp must be evaluated before the second call toscheme_make_pair. In general, the above code must be converted to the form

  tmp1 = scheme_make_pair(scheme_true, scheme_false);
  tmp2 = scheme_make_pair(scheme_true, scheme_false);
  scheme_make_pair(tmp1, tmp2);

and this is converted form must be instrumented to register tmp1 and tmp2. The final result might be

  {
    Scheme_Object *tmp1 = NULL, *tmp2 = NULL, *result;
    MZ_GC_DECL_REG(2);
  
    MZ_GC_VAR_IN_REG(0, tmp1);
    MZ_GC_VAR_IN_REG(1, tmp2);
    MZ_GC_REG();
  
    tmp1 = scheme_make_pair(scheme_true, scheme_false);
    tmp2 = scheme_make_pair(scheme_true, scheme_false);
    result = scheme_make_pair(tmp1, tmp2);
  
    MZ_GC_UNREG();
  
    return result;
  }

Notice that result is not registered above. The MZ_GC_UNREG macro cannot trigger a garbage collection, so the result variable is never live during a potential collection. Note also that tmp1 andtmp2 are initialized with NULL, so that they always contain a pointer whenever a collection is possible.
--------------------------------------------

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 243 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Caml-list] Re: Interfacing with C: bad practice
  2011-08-16 16:55               ` [Caml-list] " Jeffrey Scofield
  2011-08-16 17:08                 ` Will M. Farr
@ 2011-08-16 19:46                 ` Gerd Stolpmann
  2011-08-16 20:18                   ` Jeffrey Scofield
  1 sibling, 1 reply; 31+ messages in thread
From: Gerd Stolpmann @ 2011-08-16 19:46 UTC (permalink / raw)
  To: Jeffrey Scofield; +Cc: caml-list

Am Dienstag, den 16.08.2011, 11:55 -0500 schrieb Jeffrey Scofield:
> (One great thing about the ML family is that there
> are exceptionally clear answers to questions like this.
> Especially for Standard ML.)

I don't know for SML, but OCaml also leaves the order unspecified in
which function arguments are evaluated (and ocamlc and ocamlopt behave
even differently in this respect). So I think the problem would
translate to OCaml in some form.

Gerd


> 
> Jeffrey
> 
> 

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Caml-list] Re: Interfacing with C: bad practice
  2011-08-16 19:46                 ` Gerd Stolpmann
@ 2011-08-16 20:18                   ` Jeffrey Scofield
  0 siblings, 0 replies; 31+ messages in thread
From: Jeffrey Scofield @ 2011-08-16 20:18 UTC (permalink / raw)
  To: caml-list

Gerd Stolpmann <info@gerd-stolpmann.de> writes:

> I don't know for SML, but OCaml also leaves the order unspecified in
> which function arguments are evaluated (and ocamlc and ocamlopt behave
> even differently in this respect). So I think the problem would
> translate to OCaml in some form.

This is a good point.

A problem with C (and almost every other language) is that there's a
lot of room for debate about what the *standard* means (as opposed to
the meaning of particular programs).  With the ML family you have a
formal framework (lambda calculus, I guess) that makes things quite
a bit less ambiguous.  You can still have unspecified parts of the
semantics, but at least it's clearer where the unspecified parts are!

I don't know offhand whether parameter evaluation order is defined
for Standard ML, either.  But there's a very clear specification that
you can look at to find out.  (It seems like the sort of thing that
would be defined in SML.)

Jeffrey


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2011-08-16 20:18 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-16  7:37 [Caml-list] Interfacing with C: bad practice Dmitry Bely
2011-08-16  8:04 ` Török Edwin
2011-08-16  8:25   ` Dmitry Bely
2011-08-16  8:43     ` Török Edwin
2011-08-16  9:46       ` rixed
2011-08-16  9:53         ` Dmitry Bely
2011-08-16 10:17           ` Török Edwin
2011-08-16 11:04             ` rixed
     [not found] ` <20110816.105738.246515733851238101.Christophe.Troestler@umons.ac.be>
2011-08-16  9:21   ` Dmitry Bely
2011-08-16 10:39     ` Mauricio Fernandez
2011-08-16 14:27       ` John Carr
2011-08-16 12:28 ` [Caml-list] " Dmitry Bely
2011-08-16 15:25 ` [Caml-list] " Richard W.M. Jones
2011-08-16 15:51   ` rixed
2011-08-16 16:00     ` Will M. Farr
2011-08-16 16:10     ` Richard W.M. Jones
2011-08-16 16:17       ` Richard W.M. Jones
2011-08-16 16:18       ` Dmitry Bely
2011-08-16 16:22         ` Richard W.M. Jones
2011-08-16 16:27           ` Richard W.M. Jones
2011-08-16 16:30             ` malc
2011-08-16 16:34             ` Török Edwin
2011-08-16 16:47               ` Richard W.M. Jones
2011-08-16 16:55               ` [Caml-list] " Jeffrey Scofield
2011-08-16 17:08                 ` Will M. Farr
2011-08-16 19:46                 ` Gerd Stolpmann
2011-08-16 20:18                   ` Jeffrey Scofield
2011-08-16 17:08       ` [Caml-list] " rixed
2011-08-16 16:06   ` John Carr
2011-08-16 16:14     ` Wojciech Meyer
2011-08-16 16:13   ` Dmitry Bely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).