Continuing this conversation about equality...

On Aug 9, 2009, at 9:20 AM, David Allsopp wrote:

> Ivan Chollet wrote:
>
>> I would have thought physical equality implies structural equality,  
>> but it
>> doesn't seem like it.
>> Can you please explain to me what’s wrong there?
>
> "It does" - but only with the given caveat that comparison of cyclic
> structures may not terminate. Pervasives.compare has a similar  
> restriction,
> although note that [compare q q] in your example does return 0  
> (which is
> lucky or it would be a bug as the docs for Pervasives.(==) state  
> that [x ==
> y] => [compare x y = 0]).

Let me start with a few pedantic comments about the documentation of  
(==):

First, what David says isn't *quite* what the documentation says. The  
documentation only says this is [x==y] implies [compare x y = 0]  for  
non-mutable structures. And in fact, what the documentation says is  
surprisingly weak: an implementation could define (==) as *always  
false* for non-mutable structures and yet satisfy the documentation.  
I'm not sure of the right way to reword it, but this loophole in the  
specification seems undesirable.

My other issue is that the description of (==) for mutable structures  
doesn't specify that it is symmetric; reading the documentation  
literally only implies that e1 is a substructure of e2. Even just  
adding 'and vice versa' might clean this up:
e1 == e2 is true if and only if physical modification of e1 also  
affects e2 and vice versa

Okay, now for more substantive questions:

> The notes in http://caml.inria.fr/mantis/view.php?id=3322 may be of  
> interest
> too...

I hadn't paid attention to the distinction drawn between 'may not  
terminate' and 'does not terminate' until I read the discussion at  
that link, but it's relevant to my question below...

On Aug 9, 2009, at 9:55 AM, Alain Frisch wrote:

> On 8/9/2009 2:06 PM, ivan chollet wrote:
>
>> I would have thought physical equality implies structural equality,  
>> but
>> it doesn’t seem like it.
>>
>> Can you please explain to me what’s wrong there?
>
> There are two modes for the generic comparison. The total mode  
> (Pervasives.compare) creates a total ordering between values (except  
> for functional values and custom blocks with no comparison function)  
> and uses physical equality as a shortcut to cut the recursive  
> traversal of sub-values. The non-total mode (used by the operators =  
> < > <= >=) has a different behavior for NaN values ([nan <= x], [x  
> <= nan], and [nan = x] all return false, including when x is nan)  
> and does not use the physical equality shortcut (so that [let x =  
> (nan, nan) in x = x] returns false).

In terms of 'may not' versus 'does not' terminate, I just noticed that  
the current documentation for (=) says 'may not terminate' while the  
documentation for (>=) says 'does not terminate'. However, the example  
cited in the link above:
type t = { a : t };;
let rec x = { a = x } in x = x
doesn't terminate in OCaml 3.11. This seems to be about as simple a  
cyclic structure as there is, so shouldn't (=)'s documentation say  
'Equality between cyclic structures does not terminate.'?

Also, the documentation for Pervasives.compare says 'may not  
terminate'. Is this supposed to imply that it uses the shortcut Alain  
mentions (because if it did not use physical equality as a shortcut  
then it would say 'does not terminate')? Or is this shortcutting meant  
to be undocumented?

And is there any reason, aside from NaN values, that (=) does *not*  
use physical equality as a shortcut? The only other reason I can think  
of is that, in cases where things are in fact *not* physically equal,  
checking physical equality would introduce a small overhead.

In any case, if I have a data structure which I know does not contain  
any NaNs (for example, maybe it doesn't contain any floats  
whatsoever), is there ever a reason I should prefer (=) to (compare x  
y = 0)? It sounds to me like compare is preferable because of its  
shortcutting behavior.

-Elnatan