caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-07 17:27 Fred Smith
  2003-11-10 13:24 ` Diego Olivier Fernandez Pons
  0 siblings, 1 reply; 29+ messages in thread
From: Fred Smith @ 2003-11-07 17:27 UTC (permalink / raw)
  To: Samuel Lacas, caml-list


Ocaml has a structural comparison function (<=) : 'a -> 'a -> bool.  

But the bigger problem with this approach (as Jean-Baptiste Rouquier) kindly pointed out to me is that insertion takes O(n), not O(log(n)) time. Duh.

-Fred


> -----Original Message-----
> From: owner-caml-list@pauillac.inria.fr 
> [mailto:owner-caml-list@pauillac.inria.fr] On Behalf Of Samuel Lacas
> Sent: Friday, November 07, 2003 10:44 AM
> To: caml-list@inria.fr
> Subject: Re: [Caml-list] Efficient and canonical set representation?
> 
> 
> Fred Smith a écrit 2.2K le Fri, Nov 07, 2003 at 10:27:25AM -0500: # 
> # I guess what you're looking for are sorted arrays:
> #   1) O(log n) lookup and insertion via binary search
> #   2) O(n) union and intersection are simple
> #   3) Equal sets are represented by structurally equivalent objects.
> # 
> # -Fred
> 
> Hmm, except that, if I'm not wrong, it was required the 
> structure to hold any kind of object. Sorted arrays require 
> the elements to be sortable. Using the hash of the objects 
> may be an answer ?
> 
> sL
> 
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: 
http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Caml-list] Efficient and canonical set representation?
  2003-11-07 17:27 [Caml-list] Efficient and canonical set representation? Fred Smith
@ 2003-11-10 13:24 ` Diego Olivier Fernandez Pons
  2003-11-10 13:49   ` [Caml-list] Rounding mode Christophe Raffalli
  2003-11-10 19:28   ` [Caml-list] Efficient and canonical set representation? Julien Signoles
  0 siblings, 2 replies; 29+ messages in thread
From: Diego Olivier Fernandez Pons @ 2003-11-10 13:24 UTC (permalink / raw)
  To: johnh; +Cc: caml-list

    Bonjour,

> After your remarks and Brian's, I'm starting to wonder if it is
> possible at all to do what I want. Maybe I should be looking for an
> impossibility proof instead...

Patricia sets seem to be what you are looking for.
 (1). Efficient usual operations (lookup, insertion, union)
 (2). Structural equality

Their only problem is that they cannot handle polymorphic orderable
types but only integers...

Hash the data, use this key to insert it in a patricia map and solve
the collisions by chaining in an ordered list (with the polymorphic
[compare] function). (1) and (2) still hold under usual hypothesis on
the rate of collisions.

A few changes to JCF's implementation should be enough.

        Diego Olivier



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Caml-list] Rounding mode
  2003-11-10 13:24 ` Diego Olivier Fernandez Pons
@ 2003-11-10 13:49   ` Christophe Raffalli
  2003-11-10 14:10     ` Eric Dahlman
       [not found]     ` <16305.25815.3793.545198@karryall.dnsalias.org>
  2003-11-10 19:28   ` [Caml-list] Efficient and canonical set representation? Julien Signoles
  1 sibling, 2 replies; 29+ messages in thread
From: Christophe Raffalli @ 2003-11-10 13:49 UTC (permalink / raw)
  Cc: caml-list


I saw a previous discussion about rounding mode for OCaml (in 2000).

As anyone implemented the functions to change the rounding mode for 
floating point from OCaml (possibly as a patch to add primitives and 
save a C function call) ?

This is necessary to implement interval arithmetic ...

Note: I found the assembly code for 6 architectures on the web:
http://www-anp.lip6.fr/cadna/Rounding_mode_Dir/Accueil.php

 From this, it would be easy to add it in OCaml ?


-- 
Christophe Raffalli
Université de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex

tél: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution
can check this signature
---------------------------------------------

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode
  2003-11-10 13:49   ` [Caml-list] Rounding mode Christophe Raffalli
@ 2003-11-10 14:10     ` Eric Dahlman
  2003-11-10 18:03       ` Brian Hurt
  2003-11-10 21:23       ` Christophe Raffalli
       [not found]     ` <16305.25815.3793.545198@karryall.dnsalias.org>
  1 sibling, 2 replies; 29+ messages in thread
From: Eric Dahlman @ 2003-11-10 14:10 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: caml-list

Christophe Raffalli wrote:

>
> I saw a previous discussion about rounding mode for OCaml (in 2000).
>
> As anyone implemented the functions to change the rounding mode for 
> floating point from OCaml (possibly as a patch to add primitives and 
> save a C function call) ?
>
> This is necessary to implement interval arithmetic ...

Somewhat off topic but why is this necessary from a numerical math type 
of perspective.  I am honestly curious as I don't see how this would 
interact with the calculation in a meaningful way.

-Eric

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode
  2003-11-10 14:10     ` Eric Dahlman
@ 2003-11-10 18:03       ` Brian Hurt
  2003-11-10 20:35         ` Eric Dahlman
  2003-11-10 21:23       ` Christophe Raffalli
  1 sibling, 1 reply; 29+ messages in thread
From: Brian Hurt @ 2003-11-10 18:03 UTC (permalink / raw)
  To: Eric Dahlman; +Cc: Ocaml Mailing List

On Mon, 10 Nov 2003, Eric Dahlman wrote:

> Christophe Raffalli wrote:
> 
> >
> > I saw a previous discussion about rounding mode for OCaml (in 2000).
> >
> > As anyone implemented the functions to change the rounding mode for 
> > floating point from OCaml (possibly as a patch to add primitives and 
> > save a C function call) ?
> >
> > This is necessary to implement interval arithmetic ...
> 
> Somewhat off topic but why is this necessary from a numerical math type 
> of perspective.  I am honestly curious as I don't see how this would 
> interact with the calculation in a meaningful way.
>

Simplistic discussion of why changing the rounding modes is usefull for 
numeric programming.  If you know numerical analysis, stop reading now 
:-).

FP numbers are only precise to so many digits.  So every time you do an 
operation, a small amount of error is introduced- rather than being the 
exact answer, it's the closest number to the exact answer that FP numbers 
can represent.  This is why the following code does not return the answer 
you expect:

let unexpected () =
    let loop i n =
        if (i < 10) then
            loop (i + 1) (n +. 0.1)
        else
            n
    in
    loop 0 0.
;;

1/10 in binary is infinitely repeating the way 1/3 is in decimal.  So the 
expression n +. 0.1 isn't adding *exactly* 0.1, but instead 0.1 - e for 
some small e.  Where e is ~1E-16.  But this is large enough that 
unexpected() returns  0.999999999999999889 and not 1.0.

Note also that this is nothing to do with Ocaml- C, Java, Perl, Python, 
FORTRAN, and every other language out there that does floating point has 
this problem- it's a problem with floating point.

Now, for a single operation this error is negligable- one the order of
1e-15 (assuming the answer is 1.0).  But over large numbers of
calculations, this error can accumulate, and become large enough to affect
the results.

Consider, for example, multiplying two numbers- x and y.  But you don't 
have x and y exactly, you have (x + e) and (y + e) for some error amount 
e.  You multiply them, and you get (x + e)*(y + e) = x*y + x*e + y*e + 
e*e.  Now, e*e is vanishingly small, we can ignore it.  And, assuming x 
and y are both about 1.0, our answer is x*y + ~2*e- in other words, we've 
about doubled the amount of error.  Even starting at 1e-15, doubling it 
every operation and you eat up all your precision surprisingly fast.

How can we tell how much error we have?  Well, you can either do a whole
bunch of graduate-level mathematics, or you can run the same algorithm
with different rounding modes.  If you always round towards positive
infinity, you'll always be adding an error term- i.e. it'll be (x + e) and
(y + e) with a result of x*y + 2e.  If you always round to negative
infinity, you'll be subtracting the error term- (x - e) and (y - e) with a 
result x*Y - 2e.  Comparing the results gives you both a good idea of 
where the answer really is, and how big your errors are.

Note, all is not necessarily lost.  If instead we have (x + e) and (y - e)  
a little bit of quick algebra shows that the result is actually *more*
accurate than either of the original inputs.  The errors tend to cancel
instead of accumulate.  Numerical analysts call algorithms whose errors
tend to cancel "stable", while algorithsm whose errors tend to accumulate
are "unstable".  A good textbook on numerical methods will generally only 
discuss stable algorithms- use 'em.

The effect can also be sort of acheived by using a random rounding method.  
For each operation, the CPU picks one of it's rounding methods at random 
to use (well, pseudo-random anyways).  Now, while this will in general 
increase the accuracy of floating point programs, it also means that every 
time you run a numerical program, even over the same data, you will get a 
slightly different answer.  Newbies wondering why their program keeps 
changing it's answer will probably rival newbies wondering why their stack 
overflows pretty quickly.  Which is why I'm hesitant to advocate doing 
this to all programs.  I still would like to be able to select the 
rounding mode- either at compile time or at run time, which ever is 
easier.  Actually, run time is probably easier to implement and more 
usefull (I can switch rounding modes for different parts of the program).
 
-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Caml-list] Efficient and canonical set representation?
  2003-11-10 13:24 ` Diego Olivier Fernandez Pons
  2003-11-10 13:49   ` [Caml-list] Rounding mode Christophe Raffalli
@ 2003-11-10 19:28   ` Julien Signoles
  1 sibling, 0 replies; 29+ messages in thread
From: Julien Signoles @ 2003-11-10 19:28 UTC (permalink / raw)
  To: caml-list

On Mon, 10 Nov 2003, Diego Olivier Fernandez Pons wrote:

> Patricia sets seem to be what you are looking for.
>  (1). Efficient usual operations (lookup, insertion, union)
>  (2). Structural equality
>
> Their only problem is that they cannot handle polymorphic orderable
> types but only integers...
>
> Hash the data, use this key to insert it in a patricia map and solve
> the collisions by chaining in an ordered list (with the polymorphic
> [compare] function). (1) and (2) still hold under usual hypothesis on
> the rate of collisions.
>
> A few changes to JCF's implementation should be enough.

I think JCF's Hmap module is what you want.
A hmap is a map over hash-consed values implemented as Patricia Trees.
See http://www.lri.fr/~filliatr/software.en.html for more details.

Julien Signoles
-- 
mailto:Julien.Signoles@lri.fr ; http://www.lri.fr/~signoles
"In theory, practice and theory are the same,
but in practice they are different" (Larry McVoy)

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode
  2003-11-10 18:03       ` Brian Hurt
@ 2003-11-10 20:35         ` Eric Dahlman
  2003-11-10 23:09           ` Brian Hurt
  2003-11-12 17:19           ` Diego Olivier Fernandez Pons
  0 siblings, 2 replies; 29+ messages in thread
From: Eric Dahlman @ 2003-11-10 20:35 UTC (permalink / raw)
  To: Brian Hurt; +Cc: Ocaml Mailing List

Brian Hurt wrote:

>On Mon, 10 Nov 2003, Eric Dahlman wrote:
>
>  
>
>>Christophe Raffalli wrote:
>>
>>    
>>
>>>I saw a previous discussion about rounding mode for OCaml (in 2000).
>>>
>>>As anyone implemented the functions to change the rounding mode for 
>>>floating point from OCaml (possibly as a patch to add primitives and 
>>>save a C function call) ?
>>>
>>>This is necessary to implement interval arithmetic ...
>>>      
>>>
>>Somewhat off topic but why is this necessary from a numerical math type 
>>of perspective.  I am honestly curious as I don't see how this would 
>>interact with the calculation in a meaningful way.
>>
>>    
>>
>
>Simplistic discussion of why changing the rounding modes is usefull for 
>numeric programming.  If you know numerical analysis, stop reading now 
>:-).
>  
>
I read it anyway ;-)

[Snip a description of how floating point works.]

>Consider, for example, multiplying two numbers- x and y.  But you don't 
>have x and y exactly, you have (x + e) and (y + e) for some error amount 
>e.  You multiply them, and you get (x + e)*(y + e) = x*y + x*e + y*e + 
>e*e.  Now, e*e is vanishingly small, we can ignore it.  And, assuming x 
>and y are both about 1.0, our answer is x*y + ~2*e- in other words, we've 
>about doubled the amount of error.  Even starting at 1e-15, doubling it 
>every operation and you eat up all your precision surprisingly fast.
>  
>
>How can we tell how much error we have?  Well, you can either do a whole
>bunch of graduate-level mathematics, or you can run the same algorithm
>with different rounding modes.  If you always round towards positive
>infinity, you'll always be adding an error term- i.e. it'll be (x + e) and
>(y + e) with a result of x*y + 2e.  If you always round to negative
>infinity, you'll be subtracting the error term- (x - e) and (y - e) with a 
>result x*Y - 2e.  Comparing the results gives you both a good idea of 
>where the answer really is, and how big your errors are.
>  
>
Now I don't know that it is this simple.  I understand that by always 
rounding in one direction you are in effect  calculating a bound on the 
error in that direction but that will only work in the most simplistic 
of cases.  For instance if half of your values had their sign changed 
before being summed the sign of half  the associated error terms would 
also change.  The result would not give you any information about the 
effective lower/upper bound you thought you were calculating.  It seems 
to me that this approach is only applicable to problems which would be 
trivially solvable with graduate level mathematics and of absolutely no 
value in any interesting cases.

>Note, all is not necessarily lost.  If instead we have (x + e) and (y - e)  
>a little bit of quick algebra shows that the result is actually *more*
>accurate than either of the original inputs.  The errors tend to cancel
>instead of accumulate.  Numerical analysts call algorithms whose errors
>tend to cancel "stable", while algorithsm whose errors tend to accumulate
>are "unstable".  A good textbook on numerical methods will generally only 
>discuss stable algorithms- use 'em.
>  
>
This is incorrect because you are thinking about real numbers while 
working with floating point numbers, this is especially dangerous when 
you are operating near or below the epsilon for the particular floating 
point representation like we are now.  If you were to look at the set of 
numbers which are precisely represented by a given floating point 
representation you would see that it is concentrated near 0 and that it 
tends to thin out as you move away from there.   The distance  between 
any  two adjacent floating point numbers is highly variable and not 
uniform or "nice" by almost any measure.  I really am having a hard time 
figuring out what you can say about the tiny e terms in this case as we 
move between regions with differing "floating point resolutions".  

Just to illustrate this the different "resolutions" the smallest 
positive double float which can be represented is  2.225 x 10^(-308) 
while the epsilon for doubles in general is 1.11 x 10^(-16) and can be 
thought of at the coarsest resolution.  That is a big difference and is 
because floats are packed in tightly around 0.

As for a numerically stable algorithm being able to produce a result 
which is more accurate than the original inputs that is total hogwash.  
You cannot ever have more accuracy than was present in your original 
input, ever. Nihil ex nihilo.

>The effect can also be sort of acheived by using a random rounding method.  
>For each operation, the CPU picks one of it's rounding methods at random 
>to use (well, pseudo-random anyways).  Now, while this will in general 
>increase the accuracy of floating point programs, it also means that every 
>time you run a numerical program, even over the same data, you will get a 
>slightly different answer.  Newbies wondering why their program keeps 
>changing it's answer will probably rival newbies wondering why their stack 
>overflows pretty quickly.  Which is why I'm hesitant to advocate doing 
>this to all programs.  I still would like to be able to select the 
>rounding mode- either at compile time or at run time, which ever is 
>easier.  Actually, run time is probably easier to implement and more 
>usefull (I can switch rounding modes for different parts of the program). 
>  
>
As an oldby I would be disconcerted by this as well.  All you are doing 
is trading one sort of  numerical noise for another which you guess 
should be better behaved but in the absence of a neat proof to the 
contrary I don't believe is.  There are just *SO* many sources of huger 
error terms in numerical calculations that the tiny rounding errors are 
really lost in the noise.  I even tried to come up with a case where it 
would matter but that got so farcical that I gave up on it.

What I was really wondering about when I first asked this question was 
if the other person had exactly that kind of example where they were 
using some clever trick to usefully misapply floating point calculations 
in their problem domain.

-Eric

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode
  2003-11-10 14:10     ` Eric Dahlman
  2003-11-10 18:03       ` Brian Hurt
@ 2003-11-10 21:23       ` Christophe Raffalli
  1 sibling, 0 replies; 29+ messages in thread
From: Christophe Raffalli @ 2003-11-10 21:23 UTC (permalink / raw)
  To: Eric Dahlman; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1338 bytes --]

Eric Dahlman wrote:
> Christophe Raffalli wrote:
> 
>>
>> I saw a previous discussion about rounding mode for OCaml (in 2000).
>>
>> As anyone implemented the functions to change the rounding mode for 
>> floating point from OCaml (possibly as a patch to add primitives and 
>> save a C function call) ?
>>
>> This is necessary to implement interval arithmetic ...
> 
> 
> Somewhat off topic but why is this necessary from a numerical math type 
> of perspective.  I am honestly curious as I don't see how this would 
> interact with the calculation in a meaningful way.
> 
> -Eric
> 

interval arithmetic proceeds by computing not one appriximation of a 
real number x but an interval containing x. To add two intervals, you 
need to add both ends ... but one end with near to plus infinity round 
and the other end with near to minus infinity (if you care about correct 
program :-).

-- 
Christophe Raffalli
Université de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex

tél: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution
can check this signature
---------------------------------------------

[-- Attachment #2: Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode
  2003-11-10 20:35         ` Eric Dahlman
@ 2003-11-10 23:09           ` Brian Hurt
  2003-11-17 21:15             ` Xavier Leroy
  2003-11-12 17:19           ` Diego Olivier Fernandez Pons
  1 sibling, 1 reply; 29+ messages in thread
From: Brian Hurt @ 2003-11-10 23:09 UTC (permalink / raw)
  To: Eric Dahlman; +Cc: Ocaml Mailing List

On Mon, 10 Nov 2003, Eric Dahlman wrote:

> Brian Hurt wrote:
> 
> >
> >Simplistic discussion of why changing the rounding modes is usefull for 
> >numeric programming.  If you know numerical analysis, stop reading now 
> >:-).
> >  
> >
> I read it anyway ;-)

The reason I asked you to stop reading was that I was going to be 
massively simplifying a subject that is in effect it's own branch of PhD 
level mathematics.  The result will be somewhat like those books on 
relativity aimed at grade school kids.  Physicists who read them are 
regularly horrified that important subtlities are glossed over, hand waved 
past, or simply blatantly ignored.  The reason is, of course, I was 
writting an email not a book :-).

Also, I'm far from an expert in this field myself (although I know a 
couple).  I know just enough to know how much I don't know.

[Snip happens]

> Now I don't know that it is this simple.  

You're right, it's not at all that simple.  For any given two rounding 
modes, it's possible to construct algorithms for which both rounding modes 
will error in the same direction- i.e. the two results will not bracket 
the "real" answer, and will also misdirect you as to how large your error 
bounds really are.  Also, doing this trick only tells you about the data 
you are using.  Algorithms can have unexpected failure modes where they 
lose a lot more presision.  A classic example of this is subtraction.  You 
may know that x = 1.00000000000002 to 15 digits of precision, and y = 
1.00000000000001 to 15 digits of precision, but you only know x-y = 
0.00000000000001 to 1 digit of precision.  Most of your precision just got 
canceled.  Opps.

Interval arithmetic has similiar problems.  There are no mechanistic 
replacements for numerically analyzing your algorithms.  Both are usefull 
as experimental evidence that your analysis is correct.  But numerically 
analyzing your algorithms is a subtle and tricky- which is why my advice 
is to leave to the numerical analysts, and only use algorithms they have 
blessed (if possible).

Hmm.  Actually, I wonder if type systems could be extended to detect 
numerical instabilities?

> This is incorrect because you are thinking about real numbers while 
> working with floating point numbers, this is especially dangerous when 
> you are operating near or below the epsilon for the particular floating 
> point representation like we are now.  If you were to look at the set of 
> numbers which are precisely represented by a given floating point 
> representation you would see that it is concentrated near 0 and that it 
> tends to thin out as you move away from there.   The distance  between 
> any  two adjacent floating point numbers is highly variable and not 
> uniform or "nice" by almost any measure.  I really am having a hard time 
> figuring out what you can say about the tiny e terms in this case as we 
> move between regions with differing "floating point resolutions".  

Most of the numerical proofs I have seen assume error is a scale factor,
not an additive factor.  I.e., it's x * (1 + e), not x + e.  Then x*y with
error terms becomes x*(1+e_x)*y*(1+e_y) = x*y*(1 + e_x + e_y + e_x*e_y).  
This way I sidestep any assumptions about the "sizes" of x and y.  You
still have to deal with the corner cases where x and y become too small,
and you start hitting denormalized numbers.  This is more accurate, but
more complex math.  Thus the simplification to (x + e), while making sure
the conclusions are still more or less correct.

> As for a numerically stable algorithm being able to produce a result 
> which is more accurate than the original inputs that is total hogwash.  
> You cannot ever have more accuracy than was present in your original 
> input, ever. Nihil ex nihilo.

Yep.  But you can have less, even none what so ever.  Stable algorithms
are stable precisely because the errors they introduce tend to cancel each
other out instead of accumulating.  Which was the point I was trying to
make- I'll admit I was speaking inaccurately when I implied that stable
algorithms somehow magically added precision that wasn't there already.

You can make both type I and type II errors (warping statistic
terminology), and both are incorrect.  Type I errors would here be
assuming that because the computer printed out 15 digits of output, all 15
digits are to be beleived (despite the fact that the inputs are only
accurate to 3 digits).  Type II errors are assuming that errors only
accumulate.  This leads to demands for, for example, 2048-bit floating
point types.  The hope here is that if you can make the errors small
enough, you may still have some precision when you're done.

In my experience, type II errors are most often made by programmers who 
have recently "discovered" that floating point introduces errors into 
calculations.  Which is why I was addressing it early- the solution isn't 
to throw more precision at the problem, the solution is to make the errors 
introduced cancel each other.  And in so doing, unwittingly played to 
making type I errors.

[ snip: randomized rounding as the default ]

> As an oldby I would be disconcerted by this as well.  All you are doing
> is trading one sort of numerical noise for another which you guess
> should be better behaved but in the absence of a neat proof to the
> contrary I don't believe is.  There are just *SO* many sources of huger
> error terms in numerical calculations that the tiny rounding errors are
> really lost in the noise.  I even tried to come up with a case where it
> would matter but that got so farcical that I gave up on it.

All stable algorithms I know of remain stable in the face of randomized 
rounding.  There are some algorithms which are instable with any fixed 
rounding scheme, but which are stable with randomized rounding.

Consider the following case: adding a list of three numbers together-
1.0, 0.5e-15, and 0.5e-15.  Adding them in that order is unstable no 
matter what fixed rounding you use.  The correct answer is, obviously, 
1.000000000000001.  If you always round up, you get 1.000000000000002, 
while if you always round down, you get 1.000000000000000.  With 
randomized rounding, you have a 50% shot of getting the right answer- more 
likely than any other answer.  Now, replace that three-element list with 
1.0 followed by 0.5e-15 repeated a million times.

The stable algorithm here is to sort your long list of integers to add 
first, and always add from smallest to largest.  If you do this with the 
three element list, you'll add 0.5e-15 to itself, getting 1e-15, a number 
large enough not get totally rounded off when added to 1.0.  Adding 
0.5e-15 to itself a million times gets you 0.5e-9, which when added to 1.0 
always gets the right answer, with very little dependence on what rounding 
happens.  Replacing an instable algorithm with a stable algorithm is 
always a good idea.  Of course, in this case, you're replacing an O(n) 
algorithm with an O(n log n) algorithm.

There are reasons why people who know what they're doing may want to play 
with rounding modes.  And while randomized rounding may in some sense be 
"better", I agree that making it the default it a bad idea.  I'm not even 
sure if all architectures offer it.

I've got to admit to a certain fondness for the writtings of Prof. Kahan:
http://www.cs.berkeley.edu/~wkahan/

In fact, I'll go so far as saying the following should be required 
reading for the Ocaml developers:
http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf

(Well, except for operator overloading, I'll disagree with Kahan on that. 
 Although for what Kahan wants, I wonder if Ocaml's ability to define new
infix operators isn't enough, or even better.)

> 
> What I was really wondering about when I first asked this question was 
> if the other person had exactly that kind of example where they were 
> using some clever trick to usefully misapply floating point calculations 
> in their problem domain.

I agree- this is an incredibly bad idea.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode + extended
       [not found]     ` <16305.25815.3793.545198@karryall.dnsalias.org>
@ 2003-11-12 15:35       ` Christophe Raffalli
  2003-11-13 17:35         ` Xavier Leroy
  0 siblings, 1 reply; 29+ messages in thread
From: Christophe Raffalli @ 2003-11-12 15:35 UTC (permalink / raw)
  To: caml-list


At this address, you will find a very small library for changing the
rounding mode from OCaml (works on 6 processors, but only tested on i486)

ftp://www.lama.univ-savoie.fr/pub/users/RAFFALLI/OCaml-round.tgz

...

Is it possible to use "extended" instead of "double" in OCaml for float
(I am ready to recompile OCaml for this)

What was the reason to have only one type of "float" ? Three types of
float (float, double and extended) do not explode the complexity of the
optimization for float array or float in record (I think).

Ok, you should have three additions +. +.. and +... :-(

Is this the only reason ?

-- 
Christophe Raffalli
Université de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex

tél: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution
can check this signature
---------------------------------------------


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode
  2003-11-10 20:35         ` Eric Dahlman
  2003-11-10 23:09           ` Brian Hurt
@ 2003-11-12 17:19           ` Diego Olivier Fernandez Pons
  2003-11-13 15:47             ` Eric Dahlman
  2003-11-17 17:03             ` Diego Olivier Fernandez Pons
  1 sibling, 2 replies; 29+ messages in thread
From: Diego Olivier Fernandez Pons @ 2003-11-12 17:19 UTC (permalink / raw)
  To: Eric Dahlman; +Cc: caml-list

    Bonjour,

> Somewhat off topic but why is this necessary from a numerical math
> type of perspective. I am honestly curious as I don't see how this
> would interact with the calculation in a meaningful way.

You are right when you say that there are many sources of errors in
numerical computations and that rounding errors are usually
insignificant with respect to them.

The point is that stochastic arithmetic (and its deterministic variant
interval arithmetic) are useful to find where the accurancy of your
computation is falling drastically (e.g. cancellations)

I really haven't the place to explain extensively how CESTAC works but
there are a few explanations in the ANP website

   http://anp.lip6.fr/cadna/Accueil.php

(CADNA for C/C++ source codes, user's guide. Chapter 4. Survey of the
CESTAC method. Many examples also on the homepages).

The main idea is that in a first order approximation, the number of
significant digits of a result can be estimated with respects to the
dispersion of the different values it can take using several rounding
modes.

Then, you can avoid doing unstable computations like dividing by a
small number (epsilon) very noised which makes you believe it is a
good 'pivot' in a gaussian resolution, etc. The whole computation will
then give a more accurate value.

The website gives an example where usual gauss method finds

   x1 = 60 x2 = - 8.9 x3 = 0.0 and x4 = 1.0

when you estimate the errors, you find

   x1 = 1.0 x2 = 1.0 x3 = 0.1 e-07 and x4 = 1.0

exact values are

   x1 = 1 x2 = 1 x3 = 0.1 e-07 x4 = 1

The difference is only due to a 'bad' pivot succesfully detected and
therefore avoided.


        Diego Olivier


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode
  2003-11-12 17:19           ` Diego Olivier Fernandez Pons
@ 2003-11-13 15:47             ` Eric Dahlman
  2003-11-17 17:03             ` Diego Olivier Fernandez Pons
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Dahlman @ 2003-11-13 15:47 UTC (permalink / raw)
  To: Diego Olivier Fernandez Pons; +Cc: caml-list

Hi Diego,

I looked at that page thanks a bunch!  That was just what I was hoping 
for and it answered many of my questions.

-Eric


Diego Olivier Fernandez Pons wrote:

>     Bonjour,
> 
> 
>>Somewhat off topic but why is this necessary from a numerical math
>>type of perspective. I am honestly curious as I don't see how this
>>would interact with the calculation in a meaningful way.
> 
> 
> You are right when you say that there are many sources of errors in
> numerical computations and that rounding errors are usually
> insignificant with respect to them.
> 
> The point is that stochastic arithmetic (and its deterministic variant
> interval arithmetic) are useful to find where the accurancy of your
> computation is falling drastically (e.g. cancellations)
> 
> I really haven't the place to explain extensively how CESTAC works but
> there are a few explanations in the ANP website
> 
>    http://anp.lip6.fr/cadna/Accueil.php
> 
> (CADNA for C/C++ source codes, user's guide. Chapter 4. Survey of the
> CESTAC method. Many examples also on the homepages).
> 
> The main idea is that in a first order approximation, the number of
> significant digits of a result can be estimated with respects to the
> dispersion of the different values it can take using several rounding
> modes.
> 
> Then, you can avoid doing unstable computations like dividing by a
> small number (epsilon) very noised which makes you believe it is a
> good 'pivot' in a gaussian resolution, etc. The whole computation will
> then give a more accurate value.
> 
> The website gives an example where usual gauss method finds
> 
>    x1 = 60 x2 = - 8.9 x3 = 0.0 and x4 = 1.0
> 
> when you estimate the errors, you find
> 
>    x1 = 1.0 x2 = 1.0 x3 = 0.1 e-07 and x4 = 1.0
> 
> exact values are
> 
>    x1 = 1 x2 = 1 x3 = 0.1 e-07 x4 = 1
> 
> The difference is only due to a 'bad' pivot succesfully detected and
> therefore avoided.
> 
> 
>         Diego Olivier
> 
> 
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> 


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode + extended
  2003-11-12 15:35       ` [Caml-list] Rounding mode + extended Christophe Raffalli
@ 2003-11-13 17:35         ` Xavier Leroy
  0 siblings, 0 replies; 29+ messages in thread
From: Xavier Leroy @ 2003-11-13 17:35 UTC (permalink / raw)
  To: Christophe Raffalli; +Cc: caml-list

> At this address, you will find a very small library for changing the
> rounding mode from OCaml (works on 6 processors, but only tested on i486)

A general comment on the rounding mode issue and your question about
extended-precision floats.

While I agree it would be nice to have full IEEE 754 support in OCaml,
one has to keep in mind that the OCaml standard libraries are severely
constrained by the general requirement that the bytecoded part should
work on any "reasonable" platform, where "reasonable" is currently
defined as "ANSI C plus Unix 98".

Unfortunately, this combination of standards has very little to offer
in terms of advanced features of IEEE float arithmetic.  In
particular, there's no standardized API for controlling rounding
modes.

ISO C9X does provide functions (fegetround, fesetround) to control
rounding modes, and it might be a good idea to use them in your
library when available.  The problem is that very few platforms
implement these functions, let alone full ISO C9X.  For instance, glibc
is the only C library that I know of that provides fe[gs]etround.

So, between standards that few implement and processor-specific asm
statements like you did, there is very little hope of achieving portability.

> Is it possible to use "extended" instead of "double" in OCaml for float
> (I am ready to recompile OCaml for this)

You mean "long double"? :-)  (That's the ISO C9X name for
extended-precision floats.)  

The core bytecode system might keep working if you do this change,
except output_value/input_value on floats, which require 8-byte floats.
As for the native-code compiler, that would require non-trivial
changes throughout.

> What was the reason to have only one type of "float" ? Three types of
> float (float, double and extended) do not explode the complexity of the
> optimization for float array or float in record (I think).

If you want all three array types (float array, double array, extended array)
to be unboxed, it will certainly explode that complexity, turning (in
the polymorphic case) a 2-way branch into a 4-way branch.

> Ok, you should have three additions +. +.. and +... :-(

And much more, notably conversions between the various float types,
conversions to/from integers, to/from strings, printf formats, and so on.
Just look at the Int32, Int64 and Nativeint modules for an idea of the
combinatorics involved.

> Is this the only reason ?

No, there are two others.  The first is that single-precision floats
(32 bits) are nearly useless, since modern processors compute just as
fast over double-precision floats.  The only case where you'd want to
use single-precision floats is as a storage type, to halve the size of
large arrays of numbers, but that's an option supported by bigarrays.

The second is that extended-precision floats are not fully specified
(these can use 80, 96 or 128 bits if I remember correctly), and few
processors implement them efficiently in hardware: IA32, perhaps IA64,
maybe some IBM 360 series processors.  Even on IA32, the trend is to
use SSE2 for floating-point computation, which don't support 80-bit
floats.  So, again, if you program for the common denominator, you'd
use 64-bit floats and nothing else.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode
  2003-11-12 17:19           ` Diego Olivier Fernandez Pons
  2003-11-13 15:47             ` Eric Dahlman
@ 2003-11-17 17:03             ` Diego Olivier Fernandez Pons
  1 sibling, 0 replies; 29+ messages in thread
From: Diego Olivier Fernandez Pons @ 2003-11-17 17:03 UTC (permalink / raw)
  To: bhurt; +Cc: caml-list

    Bonjour,

> Actually, I wonder if type systems could be extended to detect
> numerical instabilities ?

I forgot to answer to that question...

There was a discusion on the types forum

http://www.cis.upenn.edu/~bcpierce/types/archives/current/msg01419.html

The main problem could be stated as "what properties do you want to
guarantee for your arithmetics" : algebraic properties ? topological
properties ? subtyping properties ?

Several examples violated algebraic properties were given by Tim
Sweeney in his post (Leibniz equalities, uniqueness of the zero,
forall x. x = x, etc.)

Several examples of topological properties were given by Joe Darcy :

- 'compactification' "IEEE 754 floating-point arithmetics is a
complete system; that is for all inputs, every arithmetic operation in
the standard has a defined IEEE 754 value as a result [...] There are
two ways to add infinite values to the field of real numbers : 1. a
single infinity 2. two affine infinities ..." Actually, there are much
more ways of compactifying a locally compact space and
'compactification' hasn't the same mathematical meaning than
'completion' or 'inner operations' but that is not the problem here.

- funny theorems on limits and series like 'if the general term of a
serie converges, then the whole serie converges'

The subtyping question seems to be related to commutative diagrams :
we want the surjection (e.g. 64 -> 32 bits) to commute with arithmetic
operations.

"Though one can losslessly convert from 32-bit to 64-bit and back,
performing a 32-bit arithmetic operation on such 32-bit numbers can
produce a different result than performing a 64-bit arithmetic
operation on their 64-bit extensions, and then converting back."


We surely can answer to some of these question : with respect to group
properties there is a well known theorem 'any subgroup of R either is
dense or has the form aZ'. In both cases they are infinite. Therefore
you cannot expect any floting points arithmetic to keep all group
properties.

For the typing question, we can give part of an answer :

(I use stochastic arithmetic because interval arithmetic is usually
too pessimistic. I am not a statistician but they all seem to agree
with the fact that in many cases being 'almost sure' is much more easy
than being 'absolutely sure')

Cestac allows you to compute the 'meaningful part' of a computation
(in a statistical sense). Then you can see a result like a couple
float * int where the integers gives the number of exact digits.

If you see a coercion like a surjection (f, n) -> (f, m) with m < n
(which just means that you loose accuracy) then :

(i) We can construct a 'clean' family of subtyping operators that is
such as there is at least one surjection that commutes with your
computations (which one of course depends on the stability of the
given computations).

(ii) We can construct a 'type system' which ensures dynamically that
every result is 'well typed'. Actually, this is exactly what CADNA
does but as a library.


        Diego Olivier


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Rounding mode
  2003-11-10 23:09           ` Brian Hurt
@ 2003-11-17 21:15             ` Xavier Leroy
  0 siblings, 0 replies; 29+ messages in thread
From: Xavier Leroy @ 2003-11-17 21:15 UTC (permalink / raw)
  To: Brian Hurt; +Cc: Eric Dahlman, Ocaml Mailing List

> Hmm.  Actually, I wonder if type systems could be extended to detect 
> numerical instabilities?

It's an extremely tough problem, but you might be interested in the
following papers:

"Static Analyses of the Precision of Floating-Point Operations",
Eric Goubault, Static Analysis Symposium 2001.

"Propagation of Roundoff Errors in Finite Precision Computations: a
Semantics Approach", Matthieu Martel, European Symposium on
Programming 2002.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-12 17:18 Harrison, John R
  0 siblings, 0 replies; 29+ messages in thread
From: Harrison, John R @ 2003-11-12 17:18 UTC (permalink / raw)
  To: Diego Olivier Fernandez Pons; +Cc: caml-list, Harrison, John R

| It is not a coincidence. I remember having read somewhere that it
| could be proven that if a balanced tree (based on comparisons) did not
| have enough different representations of a same set, then the
| insertion could not be done in logarithmic time.

Interesting. I was beginning to suspect that might be the case. Does
anyone have an exact reference for such a result?

John.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Caml-list] Efficient and canonical set representation?
  2003-11-12  0:20 Harrison, John R
  2003-11-12  2:04 ` Brian Hurt
@ 2003-11-12 16:16 ` Diego Olivier Fernandez Pons
  1 sibling, 0 replies; 29+ messages in thread
From: Diego Olivier Fernandez Pons @ 2003-11-12 16:16 UTC (permalink / raw)
  To: Harrison, John R; +Cc: caml-list

    Bonjour,

> Is it just a coincidence that the numerous varieties of balanced
> tree (AVL, 2-3-4, red-black, ...) all seem to be non-canonical? Or
> is it essential to their efficiency? (Perhaps this is a question for
> another forum.)

It is not a coincidence. I remember having read somewhere that it
could be proven that if a balanced tree (based on comparisons) did not
have enough different representations of a same set, then the
insertion could not be done in logarithmic time.


        Diego Olivier

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Caml-list] Efficient and canonical set representation?
  2003-11-12  3:34 Harrison, John R
@ 2003-11-12  7:50 ` Brian Hurt
  0 siblings, 0 replies; 29+ messages in thread
From: Brian Hurt @ 2003-11-12  7:50 UTC (permalink / raw)
  To: Harrison, John R; +Cc: Ocaml Mailing List

On Tue, 11 Nov 2003, Harrison, John R wrote:

> | I've been batting around ideas for ways to do balanced trees so that no 
> | matter what order you add things, you always get the same tree.  But even 
> | assuming you could do this, doing a structural compare is still O(N).  So 
> | you might as well let the trees be different.
> 
> Right, but see my second message --- I'm only interested in canonicity
> up to structural equality and I'm happy with O(N) comparison. So it's
> just the "no matter what order you add things you get the same tree"
> property that I care about. But it's not yet obvious to me whether I
> can even achieve that much.
> 

It feels like that can be done, at the cost of an occassional O(N) 
"massive rebalancing".  Well, it certainly can be done with an O(N) 
insert/delete.  I'll think about it a bit.


-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-12  3:34 Harrison, John R
  2003-11-12  7:50 ` Brian Hurt
  0 siblings, 1 reply; 29+ messages in thread
From: Harrison, John R @ 2003-11-12  3:34 UTC (permalink / raw)
  To: Brian Hurt; +Cc: Ocaml Mailing List, Harrison, John R

| I've been batting around ideas for ways to do balanced trees so that no 
| matter what order you add things, you always get the same tree.  But even 
| assuming you could do this, doing a structural compare is still O(N).  So 
| you might as well let the trees be different.

Right, but see my second message --- I'm only interested in canonicity
up to structural equality and I'm happy with O(N) comparison. So it's
just the "no matter what order you add things you get the same tree"
property that I care about. But it's not yet obvious to me whether I
can even achieve that much.

John.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Caml-list] Efficient and canonical set representation?
  2003-11-12  0:20 Harrison, John R
@ 2003-11-12  2:04 ` Brian Hurt
  2003-11-12 16:16 ` Diego Olivier Fernandez Pons
  1 sibling, 0 replies; 29+ messages in thread
From: Brian Hurt @ 2003-11-12  2:04 UTC (permalink / raw)
  To: Harrison, John R; +Cc: Diego Olivier Fernandez Pons, Ocaml Mailing List

On Tue, 11 Nov 2003, Harrison, John R wrote:

> That seems to be the best suggestion so far. I guess it would work well
> in practice. But theoretically it still doesn't give O(log n) lookup
> and insertion without the kinds of assumptions you noted about the
> distribution of elements w.r.t. the hash function. And relying on
> polymorphic hashing seems a bit of a hack.
> 
> So I still can't help wondering if there's an elegant solution with the
> desired worst-case behaviour, preferably relying only on pairwise
> comparison. Is it just a coincidence that the numerous varieties of
> balanced tree (AVL, 2-3-4, red-black, ...) all seem to be non-canonical?
> Or is it essential to their efficiency? (Perhaps this is a question for
> another forum.)

I don't think so.

I've been batting around ideas for ways to do balanced trees so that no 
matter what order you add things, you always get the same tree.  But even 
assuming you could do this, doing a structural compare is still O(N).  So 
you might as well let the trees be different.  Note that Patricia trees, 
as I understand them, don't save you here either.  Mathematically, two 
sets A and B are equal if every element in set A is in set B and vice 
versa.

Think about it for a moment.  Assume we have a tree strucutre:
type 'a node_t = Node of 'a * 'a node_t * 'a node_t | Empty

Now, assume the code magically keeps the trees balanced exactly the same 
way.  How would you do the comparison?

let rec equals a b =
    match a, b with
        Empty, Empty -> true
        | Node(a_data, a_left, a_right), Node(b_data, b_left, b_right) ->
            (a_data == b_data) && (equals a_left b_left) &&
            (equals a_right b_right)
        | _ -> false
;;

This is an O(N) algorithm still.

The only way I can think of to make pointer equivelence meaningfull is to 
keep some structure of all the structures currently in use in the 
background.  Then you have to search this structure on every insertion or 
deletion to see if the new set is equal (using the old O(N) comparison) to 
an already existing set.  This structure could be a tree as well, but this 
still makes insertion and deletion O(N log M) (where M is the number of 
structures currently in use).  Instead of just O(log N).  Much worse.

There are ways you can make comparison faster.  For example, keep the 
number of elements handy in the structure (O(1) length operation) and just 
compare the lengths before doing anything else.  If you can hash the 
objects, you can keep a hash of the entire structure, being the sum of the 
hashes of the individual elements (updating the hash is then O(1) on 
insert or delete).  If the hashs don't match, the structures are 
gaurenteed to be different.  And, obviously, if pointer comparison is 
equal, the structures are equal.  Note that you always have cases where 
you have to do an O(N) comparison.

I think you're SOL.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian




-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-12  0:20 Harrison, John R
  2003-11-12  2:04 ` Brian Hurt
  2003-11-12 16:16 ` Diego Olivier Fernandez Pons
  0 siblings, 2 replies; 29+ messages in thread
From: Harrison, John R @ 2003-11-12  0:20 UTC (permalink / raw)
  To: Diego Olivier Fernandez Pons; +Cc: caml-list, Harrison, John R

That seems to be the best suggestion so far. I guess it would work well
in practice. But theoretically it still doesn't give O(log n) lookup
and insertion without the kinds of assumptions you noted about the
distribution of elements w.r.t. the hash function. And relying on
polymorphic hashing seems a bit of a hack.

So I still can't help wondering if there's an elegant solution with the
desired worst-case behaviour, preferably relying only on pairwise
comparison. Is it just a coincidence that the numerous varieties of
balanced tree (AVL, 2-3-4, red-black, ...) all seem to be non-canonical?
Or is it essential to their efficiency? (Perhaps this is a question for
another forum.)

John.

-----Original Message-----
From: owner-caml-list@pauillac.inria.fr
[mailto:owner-caml-list@pauillac.inria.fr]On Behalf Of Diego Olivier
Fernandez Pons
Sent: Monday, November 10, 2003 5:25 AM
To: Harrison, John R
Cc: caml-list@inria.fr
Subject: RE: [Caml-list] Efficient and canonical set representation?


    Bonjour,

> After your remarks and Brian's, I'm starting to wonder if it is
> possible at all to do what I want. Maybe I should be looking for an
> impossibility proof instead...

Patricia sets seem to be what you are looking for.
 (1). Efficient usual operations (lookup, insertion, union)
 (2). Structural equality

Their only problem is that they cannot handle polymorphic orderable
types but only integers...

Hash the data, use this key to insert it in a patricia map and solve
the collisions by chaining in an ordered list (with the polymorphic
[compare] function). (1) and (2) still hold under usual hypothesis on
the rate of collisions.

A few changes to JCF's implementation should be enough.

        Diego Olivier



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Efficient and canonical set representation?
  2003-11-07 15:44 ` Samuel Lacas
@ 2003-11-08 16:50   ` Eray Ozkural
  0 siblings, 0 replies; 29+ messages in thread
From: Eray Ozkural @ 2003-11-08 16:50 UTC (permalink / raw)
  To: Samuel Lacas, caml-list

On Friday 07 November 2003 17:44, Samuel Lacas wrote:
> Hmm, except that, if I'm not wrong, it was required the structure to
> hold any kind of object. Sorted arrays require the elements to be
> sortable. Using the hash of the objects may be an answer ?

You can give a number to each member object I guess in a lot of cases. But of 
course, in general a set doesn't mean "set of sortable objects".

Regards,


-- 
Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr>
Comp. Sci. Dept., Bilkent University, Ankara  KDE Project: http://www.kde.org
www: http://www.cs.bilkent.edu.tr/~erayo  Malfunction: http://mp3.com/ariza
GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Efficient and canonical set representation?
  2003-11-07 15:27 Fred Smith
@ 2003-11-07 15:44 ` Samuel Lacas
  2003-11-08 16:50   ` Eray Ozkural
  0 siblings, 1 reply; 29+ messages in thread
From: Samuel Lacas @ 2003-11-07 15:44 UTC (permalink / raw)
  To: caml-list

Fred Smith a écrit 2.2K le Fri, Nov 07, 2003 at 10:27:25AM -0500:
# 
# I guess what you're looking for are sorted arrays:
#   1) O(log n) lookup and insertion via binary search
#   2) O(n) union and intersection are simple
#   3) Equal sets are represented by structurally equivalent objects.
# 
# -Fred

Hmm, except that, if I'm not wrong, it was required the structure to
hold any kind of object. Sorted arrays require the elements to be
sortable. Using the hash of the objects may be an answer ?

sL

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-07 15:27 Fred Smith
  2003-11-07 15:44 ` Samuel Lacas
  0 siblings, 1 reply; 29+ messages in thread
From: Fred Smith @ 2003-11-07 15:27 UTC (permalink / raw)
  To: Harrison, John R, erayo, caml-list


I guess what you're looking for are sorted arrays:
  1) O(log n) lookup and insertion via binary search
  2) O(n) union and intersection are simple
  3) Equal sets are represented by structurally equivalent objects.

-Fred

> -----Original Message-----
> From: owner-caml-list@pauillac.inria.fr 
> [mailto:owner-caml-list@pauillac.inria.fr] On Behalf Of 
> Harrison, John R
> Sent: Friday, November 07, 2003 9:16 AM
> To: erayo@cs.bilkent.edu.tr; caml-list@inria.fr
> Cc: Harrison, John R
> Subject: RE: [Caml-list] Efficient and canonical set representation?
> 
> 
> | You basically want O(1) for set equality, I suppose.
> 
> Actually, no --- perhaps I should have made clearer what I 
> *really* want. The efficiency of comparison wasn't my 
> motivation, but rather elegance and aesthetics. And I meant 
> "canonical" with respect to ordinary structural equality, not 
> necessarily pointer equality, so the problem is potentially a 
> bit easier than you might have thought.
> 
> I want to be able to treat an abstract type in a truly 
> abstract way, and not worry about special-purpose equality 
> relations on certain types. Otherwise it's an ugly mess 
> dealing with complicated nestings like sets of pairs of lists of sets.
> 
> Now, I think the right solution, conceptually speaking, is to 
> allow user-defined equality on abstract types. But as far as 
> I know this cannot be done in OCaml, and I've never met much 
> enthusiasm for the idea among the CAML or SML experts.
> 
> So a poor second best is to define abstract types in a canonical way, 
> which was the starting-point of my question.
> 
> After your remarks and Brian's, I'm starting to wonder if it 
> is possible at all to do what I want. Maybe I should be 
> looking for an impossibility proof instead...
> 
> John.
> 
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: 
http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs
FAQ: http://caml.inria.fr/FAQ/ Beginner's list:
http://groups.yahoo.com/group/ocaml_beginners

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-07 14:15 Harrison, John R
  0 siblings, 0 replies; 29+ messages in thread
From: Harrison, John R @ 2003-11-07 14:15 UTC (permalink / raw)
  To: erayo, caml-list; +Cc: Harrison, John R

| You basically want O(1) for set equality, I suppose.

Actually, no --- perhaps I should have made clearer what I *really* want.
The efficiency of comparison wasn't my motivation, but rather elegance
and aesthetics. And I meant "canonical" with respect to ordinary
structural equality, not necessarily pointer equality, so the problem is
potentially a bit easier than you might have thought.

I want to be able to treat an abstract type in a truly abstract way,
and not worry about special-purpose equality relations on certain types.
Otherwise it's an ugly mess dealing with complicated nestings like sets
of pairs of lists of sets.

Now, I think the right solution, conceptually speaking, is to allow
user-defined equality on abstract types. But as far as I know this cannot
be done in OCaml, and I've never met much enthusiasm for the idea among
the CAML or SML experts.

So a poor second best is to define abstract types in a canonical way, 
which was the starting-point of my question.

After your remarks and Brian's, I'm starting to wonder if it is possible
at all to do what I want. Maybe I should be looking for an impossibility
proof instead...

John.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Efficient and canonical set representation?
  2003-11-06 16:41 Harrison, John R
  2003-11-06 17:04 ` Brian Hurt
  2003-11-07  3:43 ` Eray Ozkural
@ 2003-11-07  3:52 ` Eray Ozkural
  2 siblings, 0 replies; 29+ messages in thread
From: Eray Ozkural @ 2003-11-07  3:52 UTC (permalink / raw)
  To: Harrison, John R, caml-list

Hello John,

I forgot to mention: this is exactly the same question that I asked aeons ago 
for representing hypergraphs on haskell list. (For those who didn't remember: 
a hypergraph is basically a collection of sets whose elements are drawn from 
a finite set X, it is the generalization of a graph in which each edge 
connects multiple vertices rather than 2)

Of course there is a way to represent hypergraphs efficiently, but it's not a 
functional way. In the optimal and general purpose iterative method, you use 
the equivalent of adjacency list representation extended to hypergraphs: 
basically an array of pins (for each net) and an array of nets (for each 
pin).

 Since you are asking this question, you probably already know this. I am 
pretty sure a good ocaml implementation doesn't exist, but the idea is the 
same as in C and you can code it ;)

If speed isn't a paramount concern over abstraction at the present you can 
hack the Hypergraph module that uses edison which I had posted to haskell 
list. If you can't find it or can't get it to work, let me know and structure 
monster will try to help :)

Regards,

On Thursday 06 November 2003 18:41, Harrison, John R wrote:
> Does anyone know a representation of finite sets over an orderable
> polymorphic type that's (1) efficient and (2) canonical? Even better would
> be a CAML or OCaml implementation. More precisely I'm looking for:
>
>   1. Log-time lookup and insertion, and linear-time union, intersection
> etc.
>
>   2. Equal sets are represented by the same object.
>
> For example, ordered lists satisfy (2) but only part of (1), while all the
> variants of balanced trees I can remember that satisfy (1) --- AVL trees
> etc. --- fail (2).
>
> Thanks,
>
> John.
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives:
> http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ:
> http://caml.inria.fr/FAQ/ Beginner's list:
> http://groups.yahoo.com/group/ocaml_beginners

-- 
Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr>
Comp. Sci. Dept., Bilkent University, Ankara  KDE Project: http://www.kde.org
www: http://www.cs.bilkent.edu.tr/~erayo  Malfunction: http://mp3.com/ariza
GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Efficient and canonical set representation?
  2003-11-06 16:41 Harrison, John R
  2003-11-06 17:04 ` Brian Hurt
@ 2003-11-07  3:43 ` Eray Ozkural
  2003-11-07  3:52 ` Eray Ozkural
  2 siblings, 0 replies; 29+ messages in thread
From: Eray Ozkural @ 2003-11-07  3:43 UTC (permalink / raw)
  To: Harrison, John R, caml-list

On Thursday 06 November 2003 18:41, Harrison, John R wrote:
> Does anyone know a representation of finite sets over an orderable
> polymorphic type that's (1) efficient and (2) canonical? Even better would
> be a CAML or OCaml implementation. More precisely I'm looking for:
>
>   1. Log-time lookup and insertion, and linear-time union, intersection
> etc.
>
>   2. Equal sets are represented by the same object.
>
> For example, ordered lists satisfy (2) but only part of (1), while all the
> variants of balanced trees I can remember that satisfy (1) --- AVL trees
> etc. --- fail (2).

It will be pretty hard to get 2. Not unless you are using disjoint sets :) You 
basically want O(1) for set equality, I suppose. Now, I think you wish to 
insert a new set in amortized O(lgn) time like in a disjoint set 
implementation. 

You can still use edison, isn't it good enough for you?

I'm saying this, because I don't think there is a straightforward functional  
way. I have on my mind 2-universal hash functions, but here I am facing bugs 
of my own. I'm not in the structure monster mode right now it seems :) 

Cheers,

-- 
Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr>
Comp. Sci. Dept., Bilkent University, Ankara  KDE Project: http://www.kde.org
www: http://www.cs.bilkent.edu.tr/~erayo  Malfunction: http://mp3.com/ariza
GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Efficient and canonical set representation?
  2003-11-06 16:41 Harrison, John R
@ 2003-11-06 17:04 ` Brian Hurt
  2003-11-07  3:43 ` Eray Ozkural
  2003-11-07  3:52 ` Eray Ozkural
  2 siblings, 0 replies; 29+ messages in thread
From: Brian Hurt @ 2003-11-06 17:04 UTC (permalink / raw)
  To: Harrison, John R; +Cc: caml-list

On Thu, 6 Nov 2003, Harrison, John R wrote:

> Does anyone know a representation of finite sets over an orderable polymorphic type
> that's (1) efficient and (2) canonical? Even better would be a CAML or OCaml
> implementation. More precisely I'm looking for:
> 
>   1. Log-time lookup and insertion, and linear-time union, intersection etc.
> 
>   2. Equal sets are represented by the same object.

Two is the tricky one to implement.  Imagine a case where I have set A 
with it's elements, and set B with all the elements less one of set A, but 
inserted in a different order.  B is a different object than A (the two 
sets are not equal).  Now you add that one last element from A, you want 
the insert routine to return A.  This means that the insert routine has to 
know that A exists, and has to compare the new B to A to determine that it 
should return A and not B.  It can be done but it's not trivial.

Games with structure definitions don't help, because Ocaml will happily
allocate different structures with the same data (this is why 1. == 1. is
false).  With a balanced tree structure you can implement the naive
equality comparison in linear time (the sequence i/2^i converges, allowing
you enumerate the elements in linear time).  If you need faster (average) 
compares, there are a number of short cuts you can do.  For example, you 
can keep the number of elements currently in the set handy, and if the 
number of elements don't match, obviously the sets won't be equal.  
Fancier, you can also keep a hash of all elements in the set- the hashs 
aren't equal, you can gaurentee the sets aren't equal.  Be carefull with 
defining your hash function so the order elements were added isn't 
important.

Brian


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Caml-list] Efficient and canonical set representation?
@ 2003-11-06 16:41 Harrison, John R
  2003-11-06 17:04 ` Brian Hurt
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Harrison, John R @ 2003-11-06 16:41 UTC (permalink / raw)
  To: caml-list; +Cc: Harrison, John R

Does anyone know a representation of finite sets over an orderable polymorphic type
that's (1) efficient and (2) canonical? Even better would be a CAML or OCaml
implementation. More precisely I'm looking for:

  1. Log-time lookup and insertion, and linear-time union, intersection etc.

  2. Equal sets are represented by the same object.

For example, ordered lists satisfy (2) but only part of (1), while all the variants
of balanced trees I can remember that satisfy (1) --- AVL trees etc. --- fail (2).

Thanks,

John.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2003-11-17 21:15 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-11-07 17:27 [Caml-list] Efficient and canonical set representation? Fred Smith
2003-11-10 13:24 ` Diego Olivier Fernandez Pons
2003-11-10 13:49   ` [Caml-list] Rounding mode Christophe Raffalli
2003-11-10 14:10     ` Eric Dahlman
2003-11-10 18:03       ` Brian Hurt
2003-11-10 20:35         ` Eric Dahlman
2003-11-10 23:09           ` Brian Hurt
2003-11-17 21:15             ` Xavier Leroy
2003-11-12 17:19           ` Diego Olivier Fernandez Pons
2003-11-13 15:47             ` Eric Dahlman
2003-11-17 17:03             ` Diego Olivier Fernandez Pons
2003-11-10 21:23       ` Christophe Raffalli
     [not found]     ` <16305.25815.3793.545198@karryall.dnsalias.org>
2003-11-12 15:35       ` [Caml-list] Rounding mode + extended Christophe Raffalli
2003-11-13 17:35         ` Xavier Leroy
2003-11-10 19:28   ` [Caml-list] Efficient and canonical set representation? Julien Signoles
  -- strict thread matches above, loose matches on Subject: below --
2003-11-12 17:18 Harrison, John R
2003-11-12  3:34 Harrison, John R
2003-11-12  7:50 ` Brian Hurt
2003-11-12  0:20 Harrison, John R
2003-11-12  2:04 ` Brian Hurt
2003-11-12 16:16 ` Diego Olivier Fernandez Pons
2003-11-07 15:27 Fred Smith
2003-11-07 15:44 ` Samuel Lacas
2003-11-08 16:50   ` Eray Ozkural
2003-11-07 14:15 Harrison, John R
2003-11-06 16:41 Harrison, John R
2003-11-06 17:04 ` Brian Hurt
2003-11-07  3:43 ` Eray Ozkural
2003-11-07  3:52 ` Eray Ozkural

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).