caml-list - the Caml user's mailing list
From: Brian Hurt <bhurt@spnz.org>
To: "Harrison, John R" <johnh@ichips.intel.com>
Cc: Diego Olivier Fernandez Pons <Diego.FERNANDEZ_PONS@etu.upmc.fr>,
	Ocaml Mailing List <caml-list@inria.fr>
Subject: RE: [Caml-list] Efficient and canonical set representation?
Date: Tue, 11 Nov 2003 20:04:44 -0600 (CST)
Message-ID: <Pine.LNX.4.44.0311111930280.5009-100000@localhost.localdomain>
In-Reply-To: <3C4C3612EC443546A33E57003DB4F0F914C273@orsmsx409.jf.intel.com>

On Tue, 11 Nov 2003, Harrison, John R wrote:

> That seems to be the best suggestion so far. I guess it would work well
> in practice. But theoretically it still doesn't give O(log n) lookup
> and insertion without the kinds of assumptions you noted about the
> distribution of elements w.r.t. the hash function. And relying on
> polymorphic hashing seems a bit of a hack.
> 
> So I still can't help wondering if there's an elegant solution with the
> desired worst-case behaviour, preferably relying only on pairwise
> comparison. Is it just a coincidence that the numerous varieties of
> balanced tree (AVL, 2-3-4, red-black, ...) all seem to be non-canonical?
> Or is it essential to their efficiency? (Perhaps this is a question for
> another forum.)

I don't think so.

I've been batting around ideas for ways to do balanced trees so that no 
matter what order you add things, you always get the same tree.  But even 
assuming you could do this, doing a structural compare is still O(N).  So 
you might as well let the trees be different.  Note that Patricia trees, 
as I understand them, don't save you here either.  Mathematically, two 
sets A and B are equal if every element in set A is in set B and vice 
versa.

Think about it for a moment.  Assume we have a tree structure:
type 'a node_t = Node of 'a * 'a node_t * 'a node_t | Empty

Now, assume the code magically keeps the trees balanced exactly the same 
way.  How would you do the comparison?

let rec equals a b =
    match a, b with
        Empty, Empty -> true
        | Node(a_data, a_left, a_right), Node(b_data, b_left, b_right) ->
            (* (=) is structural equality; (==) would only compare pointers *)
            (a_data = b_data) && (equals a_left b_left) &&
            (equals a_right b_right)
        | _ -> false
;;

This is an O(N) algorithm still.
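
To make the non-canonical case concrete, here is a minimal sketch. It assumes a plain unbalanced binary search tree; the names insert, of_list, to_sorted_list and set_equal are made up for illustration and are not from any library. The same elements inserted in a different order give differently shaped trees, so a structural compare says "different" even though the sets are equal. Walking both trees in order fixes that, but it is still O(N).

```ocaml
type 'a node_t = Node of 'a * 'a node_t * 'a node_t | Empty

(* Plain unbalanced BST insertion, used only to build trees whose
   shape depends on insertion order. *)
let rec insert x = function
  | Empty -> Node (x, Empty, Empty)
  | Node (y, l, r) as t ->
    if x < y then Node (y, insert x l, r)
    else if x > y then Node (y, l, insert x r)
    else t

let of_list xs = List.fold_left (fun t x -> insert x t) Empty xs

(* In-order traversal: elements come out in ascending order
   regardless of the shape of the tree. *)
let to_sorted_list t =
  let rec go t acc =
    match t with
    | Empty -> acc
    | Node (x, l, r) -> go l (x :: go r acc)
  in
  go t []

(* Set equality: same elements, whatever the shapes.  Still O(N). *)
let set_equal a b = to_sorted_list a = to_sorted_list b

let t1 = of_list [1; 2; 3]   (* right-leaning chain *)
let t2 = of_list [2; 1; 3]   (* 2 at the root, 1 and 3 as children *)
```

Here t1 and t2 are structurally different, but set_equal t1 t2 holds.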

The only way I can think of to make pointer equivalence meaningful is to 
keep some structure of all the structures currently in use in the 
background.  Then you have to search this structure on every insertion or 
deletion to see if the new set is equal (using the old O(N) comparison) to 
an already existing set.  This structure could be a tree as well, but this 
still makes insertion and deletion O(N log M) (where M is the number of 
structures currently in use), instead of just O(log N).  Much worse.
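
A rough sketch of that background structure, under some loud assumptions: sets are stood in for by sorted, duplicate-free int lists rather than trees, the registry is a Map from the standard library (itself a balanced tree), and the names Key, Registry, registry and intern are hypothetical. Entries are never removed here, so this ignores the "currently in use" part entirely.

```ocaml
(* Assumption: a set is represented by its sorted, duplicate-free list. *)
module Key = struct
  type t = int list
  let compare = Stdlib.compare   (* O(N) structural comparison *)
end

module Registry = Map.Make (Key)

(* Hypothetical global registry of every set seen so far. *)
let registry : int list Registry.t ref = ref Registry.empty

(* Return the canonical copy of s, registering it if it is new.
   Each lookup does O(log M) comparisons costing up to O(N) each. *)
let intern (s : int list) : int list =
  let key = List.sort_uniq Stdlib.compare s in
  match Registry.find_opt key !registry with
  | Some canonical -> canonical
  | None ->
    registry := Registry.add key key !registry;
    key

let a = intern [3; 1; 2]
let b = intern [2; 3; 1]
let c = intern [1; 2]
```

After interning, a == b is true (same pointer) and a == c is false, so pointer comparison answers set equality in O(1). The cost has just moved into intern, which every insertion and deletion would have to call.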

There are ways you can make comparison faster.  For example, keep the 
number of elements handy in the structure (O(1) length operation) and just 
compare the lengths before doing anything else.  If you can hash the 
objects, you can keep a hash of the entire structure, being the sum of the 
hashes of the individual elements (updating the hash is then O(1) on 
insert or delete).  If the hashes don't match, the structures are 
guaranteed to be different.  And, obviously, if pointer comparison is 
equal, the structures are equal.  Note that you will always have cases where 
you have to do an O(N) comparison.
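
Those shortcuts might look something like the sketch below. It is only an illustration: the record type summary_set and its functions are invented names, the elements are ints, and a sorted list stands in for the balanced tree (so add and remove are O(N) here, although the size and hash bookkeeping itself is O(1)). Hashtbl.hash is the standard library's polymorphic hash.

```ocaml
(* Hypothetical wrapper: the elements plus a cached size and hash sum. *)
type summary_set = {
  elems : int list;   (* sorted, no duplicates; stands in for the tree *)
  size : int;         (* number of elements, kept up to date *)
  hash_sum : int;     (* sum of Hashtbl.hash over the elements *)
}

let empty = { elems = []; size = 0; hash_sum = 0 }

let add x s =
  if List.mem x s.elems then s
  else
    { elems = List.sort compare (x :: s.elems);
      size = s.size + 1;
      hash_sum = s.hash_sum + Hashtbl.hash x }

let remove x s =
  if not (List.mem x s.elems) then s
  else
    { elems = List.filter (fun y -> y <> x) s.elems;
      size = s.size - 1;
      hash_sum = s.hash_sum - Hashtbl.hash x }

let equal a b =
  a == b                            (* same pointer: certainly equal *)
  || (a.size = b.size               (* sizes differ: certainly different *)
      && a.hash_sum = b.hash_sum    (* hash sums differ: certainly different *)
      && a.elems = b.elems)         (* otherwise fall back to the O(N) check *)

let s1 = add 3 (add 1 (add 2 empty))
let s2 = add 2 (add 3 (add 1 empty))
```

Because addition is commutative, s1 and s2 end up with the same hash_sum whatever the insertion order. Matching sizes and hash sums only mean the sets might be equal, so the final O(N) comparison is still needed in that case.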

I think you're SOL.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian





