From: Tibor Simko
To: "Yaron M. Minsky"
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] intersecting huge integer sets
Date: Wed, 28 Aug 2002 14:29:10 +0200
Message-ID: <878z2r4161.fsf@pcdh91.cern.ch>
In-Reply-To: <20020827121243.GC32176@fichte.ai.univie.ac.at> (Markus Mottl's message of "Tue, 27 Aug 2002 14:12:43 +0200")
References: <87n0r835w9.fsf@pcdh91.cern.ch> <1030450016.24909.45.camel@dragonfly.localdomain> <20020827121243.GC32176@fichte.ai.univie.ac.at>

Hello,

Thanks for all the suggestions. Here's a little summary (the figures below were obtained by studying some special cases):

As for Ptset, I found that Patricia trees are good in sparse situations only, where they may be about 2x faster than Hashtbl. However, the case I need to optimize is dense set intersection, as in my previous example.
In the dense case, Ptset is often 3x slower than Hashtbl; even the ordinary Set module's intersection is often faster than Ptset's. Overall, having tried several sparse-dense situations, I found that Hashtbl sets perform much better than Ptset sets.

As for Bitv, the set operations are indeed very fast for dense sets, often an order of magnitude faster than Hashtbl. For sparse sets Hashtbl may be faster, but Bitv's performance is still acceptable. And since marshaling of Bitv vectors is blazingly fast too (often two orders of magnitude faster than Hashtbl), it looks like Bitv is the ideal overall data structure for my problem. :-)

Tibor
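
For illustration only, here is a minimal sketch of why a bit vector tends to win on dense integer sets. It is not the Bitv API and not the code that was benchmarked: the names create, add, mem, inter and inter_hashtbl are hypothetical, and the bit set is hand-rolled over the standard Bytes module. The point is that a bit-vector intersection is a plain AND over packed words, while a Hashtbl-based intersection needs one lookup per element.

```ocaml
(* Illustrative sketch: a hand-rolled bit set over Bytes, NOT the Bitv API. *)

(* A set of integers in [0, n), stored as one bit per element. *)
let create n = Bytes.make ((n + 7) / 8) '\000'

(* Set bit i. *)
let add s i =
  let byte = Char.code (Bytes.get s (i lsr 3)) in
  Bytes.set s (i lsr 3) (Char.chr (byte lor (1 lsl (i land 7))))

(* Test bit i. *)
let mem s i =
  Char.code (Bytes.get s (i lsr 3)) land (1 lsl (i land 7)) <> 0

(* Intersection is a bytewise AND: eight elements handled per operation. *)
let inter a b =
  let len = min (Bytes.length a) (Bytes.length b) in
  Bytes.init len (fun k ->
    Char.chr (Char.code (Bytes.get a k) land Char.code (Bytes.get b k)))

(* The same operation on sets represented as (int, unit) Hashtbl.t:
   one hash lookup in [b] for every element of [a]. *)
let inter_hashtbl a b =
  let r = Hashtbl.create (Hashtbl.length a) in
  Hashtbl.iter (fun x () -> if Hashtbl.mem b x then Hashtbl.replace r x ()) a;
  r
```

The cost of inter depends only on the size of the universe, not on how many elements are present, which is consistent with the observation above that Hashtbl may still win when the sets are sparse.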