From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id CAA26475; Wed, 12 Nov 2003 02:05:57 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id CAA27998 for ; Wed, 12 Nov 2003 02:05:56 +0100 (MET) Received: from herd.plethora.net (herd.plethora.net [205.166.146.1]) by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id hAC15t106638 for ; Wed, 12 Nov 2003 02:05:55 +0100 (MET) Received: from bhurt.plethora.net (bhurt.plethora.net [205.166.146.49]) by herd.plethora.net (8.11.6/8.10.1) with ESMTP id hAC15gC22483; Tue, 11 Nov 2003 19:05:42 -0600 (CST) Date: Tue, 11 Nov 2003 20:04:44 -0600 (CST) From: Brian Hurt X-X-Sender: bhurt@localhost.localdomain To: "Harrison, John R" cc: Diego Olivier Fernandez Pons , Ocaml Mailing List Subject: RE: [Caml-list] Efficient and canonical set representation? In-Reply-To: <3C4C3612EC443546A33E57003DB4F0F914C273@orsmsx409.jf.intel.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caml-list:01 hash:01 hashing:01 worst-case:01 red-black:01 meaningfull:01 hash:01 hashes:01 sol:99 rec:01 behaviour:01 nov:01 polymorphic:01 node:02 node:02 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Tue, 11 Nov 2003, Harrison, John R wrote: > That seems to be the best suggestion so far. I guess it would work well > in practice. But theoretically it still doesn't give O(log n) lookup > and insertion without the kinds of assumptions you noted about the > distribution of elements w.r.t. the hash function. And relying on > polymorphic hashing seems a bit of a hack. > > So I still can't help wondering if there's an elegant solution with the > desired worst-case behaviour, preferably relying only on pairwise > comparison. Is it just a coincidence that the numerous varieties of > balanced tree (AVL, 2-3-4, red-black, ...) all seem to be non-canonical? > Or is it essential to their efficiency? (Perhaps this is a question for > another forum.) I don't think so. I've been batting around ideas for ways to do balanced trees so that no matter what order you add things, you always get the same tree. But even assuming you could do this, doing a structural compare is still O(N). So you might as well let the trees be different. Note that Patricia trees, as I understand them, don't save you here either. Mathematically, two sets A and B are equal if every element in set A is in set B and vice versa. Think about it for a moment. Assume we have a tree strucutre: type 'a node_t = Node of 'a * 'a node_t * 'a node_t | Empty Now, assume the code magically keeps the trees balanced exactly the same way. How would you do the comparison? let rec equals a b = match a, b with Empty, Empty -> true | Node(a_data, a_left, a_right), Node(b_data, b_left, b_right) -> (a_data == b_data) && (equals a_left b_left) && (equals a_right b_right) | _ -> false ;; This is an O(N) algorithm still. The only way I can think of to make pointer equivelence meaningfull is to keep some structure of all the structures currently in use in the background. Then you have to search this structure on every insertion or deletion to see if the new set is equal (using the old O(N) comparison) to an already existing set. This structure could be a tree as well, but this still makes insertion and deletion O(N log M) (where M is the number of structures currently in use). Instead of just O(log N). Much worse. There are ways you can make comparison faster. For example, keep the number of elements handy in the structure (O(1) length operation) and just compare the lengths before doing anything else. If you can hash the objects, you can keep a hash of the entire structure, being the sum of the hashes of the individual elements (updating the hash is then O(1) on insert or delete). If the hashs don't match, the structures are gaurenteed to be different. And, obviously, if pointer comparison is equal, the structures are equal. Note that you always have cases where you have to do an O(N) comparison. I think you're SOL. -- "Usenet is like a herd of performing elephants with diarrhea -- massive, difficult to redirect, awe-inspiring, entertaining, and a source of mind-boggling amounts of excrement when you least expect it." - Gene Spafford Brian ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners