From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 13123BBBB for ; Fri, 31 Mar 2006 14:51:43 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k2VCpg9b002399 for ; Fri, 31 Mar 2006 14:51:42 +0200 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA23474 for ; Fri, 31 Mar 2006 14:51:41 +0200 (MET DST) Received: from gw-eur4.philips.com (gw-eur4.philips.com [161.85.125.10]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k2VCpe3c002396 for ; Fri, 31 Mar 2006 14:51:41 +0200 Received: from smtpscan-eur5.philips.com (smtpscan-eur5.mail.philips.com [130.144.57.168]) by gw-eur4.philips.com (Postfix) with ESMTP id BD5E04971F; Fri, 31 Mar 2006 12:51:40 +0000 (UTC) Received: from smtpscan-eur5.philips.com (localhost [127.0.0.1]) by localhost.philips.com (Postfix) with ESMTP id 808FD47; Fri, 31 Mar 2006 12:51:40 +0000 (GMT) Received: from smtprelay-eur2.philips.com (smtprelay-eur2.philips.com [130.144.57.171]) by smtpscan-eur5.philips.com (Postfix) with ESMTP id D890D3D; Fri, 31 Mar 2006 12:51:39 +0000 (GMT) Received: from ehvrmh02.diamond.philips.com (ehvrmh02-srv.diamond.philips.com [130.139.27.125]) by smtprelay-eur2.philips.com (Postfix) with ESMTP id 5F6DD3C; Fri, 31 Mar 2006 12:51:39 +0000 (GMT) In-Reply-To: <4120ea8c70420ccd2f670bd9e4433527@cs.uni-sb.de> To: Christian Lindig Cc: Caml List Subject: Re: [Caml-list] efficient binary relations? MIME-Version: 1.0 X-Mailer: Lotus Notes Release 6.0.3 September 26, 2003 Message-ID: From: Sebastian Egner Date: Fri, 31 Mar 2006 14:50:28 +0200 X-MIMETrack: Serialize by Router on ehvrmh02/H/SERVER/PHILIPS(Release 6.5.3FP1HF291 | September 19, 2005) at 31/03/2006 14:50:31, Serialize complete at 31/03/2006 14:50:31 Content-Type: multipart/alternative; boundary="=_alternative 0046A4FDC1257142_=" X-Miltered: at concorde with ID 442D25DE.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 442D25DC.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; binary:01 bitvectors:01 arrays:01 knuth:01 knuth:01 binary:01 bitvectors:01 arrays:01 caml-list:01 algorithm:01 algorithm:01 data:02 data:02 clarify:02 clarify:02 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.1 required=5.0 tests=HTML_40_50,HTML_MESSAGE autolearn=disabled version=3.0.3 This is a multipart message in MIME format. --=_alternative 0046A4FDC1257142_= Content-Type: text/plain; charset="US-ASCII" Hello Christian, Please could you clarify the circumstances a little bit? 1. Are you looking for a data structure that you set up for a fixed R once and then query many times for different X? Or are you looking for a dynamic data structure in which you keep changing R? Or are you looking for a 'functional data structure' where the older versions of R are preserved? Or for a functional data structure where R is fixed, but the queries X are constructed incrementally? 2. Is R sparse, i.e. is |R| << |\X|*|\Y|? If not, bitvectors might not be so bad after all. If yes, you might want to look for a data structure storing the sets {x}' = {y : (x, y) in R} in such a way that they can be intersected efficiently. If R is fixed, {x}' can be represented as a sorted array of the elements. These arrays can be intersected quickly (see Knuth/TACP) but the asymptotically optimal algorithms are rather tricky (if I recall correctly, Knuth doesn't give them directly but only cites them). A straight-forward algorithm is taking the shorter array and looking up the elements in the longer one by binary search. This nicely generalizes to your case of computing X' for a given X: Get all {x}' for x in X and sort them into increasing |{x}'|; this takes O(|X| log |X|). Then for all y in {x}', x such that |{x}'| is minimal, look up y in {u}' by binary search for all u in X\{x} or until it is not found anymore; this takes O(Sum[log |{u}'| : u in X\{x}]) per candidate y. Simplifying the estimation (aehem...) you end up with O( min { |{x}'| : x in X } * |X| log max { |{x}'| : x in X } ), which is at least independent on |\Y|, if that is your main concern. Sebastian. --=_alternative 0046A4FDC1257142_= Content-Type: text/html; charset="US-ASCII"
Hello Christian,

Please could you clarify the circumstances a little bit?

1. Are you looking for a data structure that you set
up for a fixed R once and then query many times for
different X? Or are you looking for a dynamic data
structure in which you keep changing R? Or are you
looking for a 'functional data structure' where the
older versions of R are preserved? Or for a functional
data structure where R is fixed, but the queries X
are constructed incrementally?

2. Is R sparse, i.e. is |R| << |\X|*|\Y|?

If not, bitvectors might not be so bad after all.

If yes, you might want to look for a data structure
storing the sets {x}' = {y : (x, y) in R} in such a
way that they can be intersected efficiently. If R
is fixed, {x}' can be represented as a sorted array
of the elements. These arrays can be intersected
quickly (see Knuth/TACP) but the asymptotically optimal
algorithms are rather tricky (if I recall correctly,
Knuth doesn't give them directly but only cites them).

A straight-forward algorithm is taking the shorter array
and looking up the elements in the longer one by binary
search. This nicely generalizes to your case of computing
X' for a given X: Get all {x}' for x in X and sort them
into increasing |{x}'|; this takes O(|X| log |X|).
Then for all y in {x}', x such that |{x}'| is minimal,
look up y in {u}' by binary search for all u in X\{x}
or until it is not found anymore; this takes
O(Sum[log |{u}'| : u in X\{x}]) per candidate y.
Simplifying the estimation (aehem...) you end up with
O( min { |{x}'| : x in X } * |X| log max { |{x}'| : x in X } ),
which is at least independent on |\Y|, if that is your
main concern.

Sebastian. --=_alternative 0046A4FDC1257142_=--