caml-list - the Caml user's mailing list
* Set union/inter/diff efficiency
@ 2005-07-27  9:12 Jon Harrop
  2005-07-27  9:42 ` [Caml-list] " Diego Olivier Fernandez Pons
  2005-07-27 16:04 ` james woodyatt
  0 siblings, 2 replies; 5+ messages in thread
From: Jon Harrop @ 2005-07-27  9:12 UTC
  To: caml-list


Does anyone have any ideas or references on how the union/inter/diff functions 
of the Set module could be optimised by accepting a sequence of sets rather 
than a pair at a time? For example, if A overlaps B overlaps C but A does not 
overlap C then it is probably quicker to compute the union "(A U C) U B" 
rather than "A U B U C".

Better still, does anyone have a replacement Set module which implements this 
functionality?
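
For reference, the pair-at-a-time baseline in question can be sketched with the standard Set module as follows. This is only an illustration: the name union_fold and the three example sets are assumptions, chosen so that A overlaps B, B overlaps C, and A and C are disjoint.

```ocaml
module IntSet = Set.Make (Int)

(* Baseline: union of a list of sets, one pair at a time, left to right.
   [union_fold] is a hypothetical helper, not part of the Set module. *)
let union_fold (sets : IntSet.t list) : IntSet.t =
  List.fold_left IntSet.union IntSet.empty sets

(* A overlaps B, B overlaps C, A and C are disjoint. *)
let a = IntSet.of_list [1; 2; 3; 4]
let b = IntSet.of_list [3; 4; 5; 6]
let c = IntSet.of_list [5; 6; 7; 8]

(* Both orders produce the same set; only the cost may differ. *)
let left_to_right = union_fold [a; b; c]
let disjoint_first = IntSet.union (IntSet.union a c) b
```

Whether the second order is actually faster depends on the sizes of the sets and on how the library implements union, so it would need to be measured.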

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists



* Re: [Caml-list] Set union/inter/diff efficiency
  2005-07-27  9:12 Set union/inter/diff efficiency Jon Harrop
@ 2005-07-27  9:42 ` Diego Olivier Fernandez Pons
  2005-07-27 16:04 ` james woodyatt
  1 sibling, 0 replies; 5+ messages in thread
From: Diego Olivier Fernandez Pons @ 2005-07-27  9:42 UTC
  To: Jon Harrop; +Cc: caml-list

    Hello,

> Does anyone have any ideas or references on how the union/inter/diff
> functions of the Set module could be optimised by accepting a
> sequence of sets rather than a pair at a time ?

No.

> For example, if A overlaps B overlaps C but A does not overlap C
> then it is probably quicker to compute the union "(A U C) U B"
> rather than "A U B U C".

I remember discussing Xavier Leroy's implementation of [compare] with
Jean-Christophe Filliâtre. He pointed out that it is a clever lazy
linearization of both sets.

In other words, you can picture it as putting a zipper on each set
and calling the [next] function only when needed.

A = 3 -> 5 -> 6 -> 7 -> 10
B = 3 -> 6 -> 8 -> 13

You can say that A < B at the second call of [next], since 5 < 6.
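
A minimal sketch of this idea, assuming OCaml 4.08 or later, where Set exposes a lazy ascending sequence through to_seq. It is not the standard library's actual code, and compare_lazy is a hypothetical name; it only shows how a comparison can stop at the first differing element without linearizing either set in full.

```ocaml
module IntSet = Set.Make (Int)

(* Compare two sets by walking their ascending element sequences in
   lockstep. Seq.t is lazy, so elements past the first difference are
   never produced. *)
let compare_lazy (s1 : IntSet.t) (s2 : IntSet.t) : int =
  let rec go (e1 : int Seq.t) (e2 : int Seq.t) =
    match e1 (), e2 () with
    | Seq.Nil, Seq.Nil -> 0
    | Seq.Nil, Seq.Cons _ -> -1
    | Seq.Cons _, Seq.Nil -> 1
    | Seq.Cons (x, rest1), Seq.Cons (y, rest2) ->
      let c = Int.compare x y in
      if c <> 0 then c else go rest1 rest2
  in
  go (IntSet.to_seq s1) (IntSet.to_seq s2)
```

With the sets A and B above, the first elements are equal (3 and 3) and the second pair (5 and 6) decides the result, so the remaining elements are never visited.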

I suppose you could do something similar for union and intersection
over several sets:

A = 3 -> 5 -> 6 -> 7 -> 10
B = 3 -> 6 -> 8 -> 13
C = 2 -> 4 -> 5 -> 6

You can call [next] in such a way that the pointers "jump" to an
interesting point. Here, for the intersection, it would be something
like this:

max/min = 3
C -> 4 (first integer >= 3)
max/min = 4
A -> 5 (first integer >= 4)
max/min = 5
B -> 6 (first integer >= 5)
max/min = 6
C -> 6 (...)
A -> 6 (...)
=> output 6 in the intersection
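
The jumping scheme traced above can be sketched with the standard Set module, assuming OCaml 4.08 or later (find_first_opt and min_elt_opt are available since 4.05). The name inter_all is hypothetical, and this variant jumps every set in each round rather than one set at a time, so its intermediate steps differ slightly from the trace. Returning the empty set for an empty list of sets is a choice made for the sketch.

```ocaml
module IntSet = Set.Make (Int)

(* Intersection of several sets by repeatedly jumping each set to the
   first element >= the current bound. [inter_all] is a hypothetical
   name; the empty list yields the empty set by convention here. *)
let inter_all (sets : IntSet.t list) : IntSet.t =
  match sets with
  | [] -> IntSet.empty
  | first :: _ ->
    (* Smallest element of [s] that is >= [bound], if any. *)
    let seek bound s = IntSet.find_first_opt (fun x -> x >= bound) s in
    let rec loop bound acc =
      (* Jump every set to [bound]; keep the largest landing point. *)
      let landing =
        List.fold_left
          (fun hi s ->
            match hi, seek bound s with
            | None, _ | _, None -> None
            | Some h, Some x -> Some (max h x))
          (Some bound) sets
      in
      match landing with
      | None -> acc (* some set has no element >= bound: finished *)
      | Some hi when hi = bound ->
        (* Every set landed exactly on [bound], so it is common. *)
        let acc = IntSet.add bound acc in
        (match IntSet.find_first_opt (fun x -> x > bound) first with
         | None -> acc
         | Some next -> loop next acc)
      | Some hi -> loop hi acc (* no common element below [hi] *)
    in
    (match IntSet.min_elt_opt first with
     | None -> IntSet.empty
     | Some lo -> loop lo IntSet.empty)
```

Each call to find_first_opt costs a logarithmic descent in a balanced tree, so the jumps skip over runs of elements that cannot be in the intersection.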

> Better still, does anyone have a replacement Set module which
> implements this functionality?

I am not aware of any in Caml, SML or Haskell but I may be wrong.


        Diego Olivier




* Re: [Caml-list] Set union/inter/diff efficiency
  2005-07-27  9:12 Set union/inter/diff efficiency Jon Harrop
  2005-07-27  9:42 ` [Caml-list] " Diego Olivier Fernandez Pons
@ 2005-07-27 16:04 ` james woodyatt
  2005-07-27 17:00   ` james woodyatt
  1 sibling, 1 reply; 5+ messages in thread
From: james woodyatt @ 2005-07-27 16:04 UTC
  To: Ocaml Trade

On 27 Jul 2005, at 02:12, Jon Harrop wrote:
>
> Does anyone have any ideas or references on how the union/inter/diff
> functions of the Set module could be optimised by accepting a
> sequence of sets rather than a pair at a time? For example, if A
> overlaps B overlaps C but A does not overlap C then it is probably
> quicker to compute the union "(A U C) U B" rather than "A U B U C".
>
> Better still, does anyone have a replacement Set module which
> implements this functionality?

No, but you might be able to build such an extension more easily
using my OCaml NAE core foundation library.

Here is the pseudo-code for set union that I would try:

     Make a heap of sets [Cf_heap.of_seq].
     Map into a sequence of sets [Cf_heap.to_seq].
     Map into a sequence of element sequences [Cf_seq.map, Cf_set.to_seq_incr].
     Load into a queue.
     While queue is not empty,
       Take an element sequence from the queue.
       Take an element from the head of the sequence.
       If there is no output yet, or the element is greater than current output, then
         Output the element
       If the element sequence tail is not empty, then
         Push the element sequence tail onto the queue
     End while

You could do similar things for difference and intersection.

I'm not optimistic that this will actually improve performance.
Beating the implementation in the standard library is harder than
one might think.


-- 
j h woodyatt <jhw@wetware.com>
markets are only free to the people who own them.



* Re: [Caml-list] Set union/inter/diff efficiency
  2005-07-27 16:04 ` james woodyatt
@ 2005-07-27 17:00   ` james woodyatt
  2005-07-27 17:32     ` james woodyatt
  0 siblings, 1 reply; 5+ messages in thread
From: james woodyatt @ 2005-07-27 17:00 UTC
  To: Ocaml Trade

On 27 Jul 2005, at 09:04, james woodyatt wrote:
>
>     Load into a queue.
>     While queue is not empty,

Okay, a queue is the wrong idea.  The right idea would be a somewhat
trickier loop over the sequence of element sequences, to catch the
union elements in the right order.  And I neglected to mention that
you'd need to build the result set with [Cf_set.of_incr_list].


-- 
j h woodyatt <jhw@wetware.com>
markets are only free to the people who own them.



* Re: [Caml-list] Set union/inter/diff efficiency
  2005-07-27 17:00   ` james woodyatt
@ 2005-07-27 17:32     ` james woodyatt
  0 siblings, 0 replies; 5+ messages in thread
From: james woodyatt @ 2005-07-27 17:32 UTC
  To: Ocaml Trade

On 27 Jul 2005, at 10:00, james woodyatt wrote:
> On 27 Jul 2005, at 09:04, james woodyatt wrote:
>>
>>     Load into a queue.
>>     While queue is not empty,
>
> Okay, a queue is the wrong idea.  The right idea would be somewhat  
> trickier loop over the sequence of element sequences to catch the  
> union elements in the right order.  And I neglected to mention that  
> you'd need to build the result set with [Cf_set.of_incr_list].

Dammit.  I can't do anything right this morning.

I'm pretty sure you can get what you want with a combination of
different things in my Cf library, e.g. Cf_seq, Cf_heap and Cf_set.

For union: Convert the sets into a heap of element sequences.  Loop
through the heap to unify into a single element sequence.  Build a
new set from the unified element sequence.

For difference and intersection: I'd have to think about it some more.
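
The union recipe above can be sketched without the Cf library, using only the standard library and assuming OCaml 4.08 or later. Everything here is an assumption made for illustration: the names Pq and union_merge are hypothetical, a Set of (head element, source index) pairs stands in for the heap, and IntSet.of_list stands in for [Cf_set.of_incr_list], since the standard Set does not expose a builder for already-sorted input.

```ocaml
module IntSet = Set.Make (Int)

(* Priority queue entries: (head element, source index). The index
   breaks ties, so equal heads from different sets can coexist. *)
module Pq = Set.Make (struct
  type t = int * int
  let compare (x1, i1) (x2, i2) =
    match Int.compare x1 x2 with 0 -> Int.compare i1 i2 | c -> c
end)

let union_merge (sets : IntSet.t list) : IntSet.t =
  (* One lazy ascending element sequence per input set. *)
  let tails = Array.of_list (List.map IntSet.to_seq sets) in
  (* Move the next element of source [i], if any, into the queue. *)
  let refill i pq =
    match tails.(i) () with
    | Seq.Nil -> pq
    | Seq.Cons (x, rest) -> tails.(i) <- rest; Pq.add (x, i) pq
  in
  let pq = ref Pq.empty in
  Array.iteri (fun i _ -> pq := refill i !pq) tails;
  (* Pop the smallest head each time; skip it if it repeats the
     previous output, so the merged list is strictly increasing. *)
  let rec merge pq last acc =
    match Pq.min_elt_opt pq with
    | None -> List.rev acc
    | Some ((x, i) as entry) ->
      let pq = refill i (Pq.remove entry pq) in
      if last = Some x then merge pq last acc
      else merge pq (Some x) (x :: acc)
  in
  IntSet.of_list (merge !pq None [])
```

As noted earlier in the thread, there is no guarantee that this beats folding the standard union over the sets; it would have to be benchmarked.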


-- 
j h woodyatt <jhw@wetware.com>
markets are only free to the people who own them.


