From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id CB34DBBAF for ; Wed, 26 Aug 2009 16:35:50 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AsgBAPPklEpCbwQZkGdsb2JhbACbEAEBAQEJCQwHEwStP4c6iEYFhBo X-IronPort-AV: E=Sophos;i="4.44,279,1249250400"; d="scan'208";a="31693666" Received: from out1.smtp.messagingengine.com ([66.111.4.25]) by mail2-smtp-roc.national.inria.fr with ESMTP; 26 Aug 2009 16:35:50 +0200 Received: from compute2.internal (compute2.internal [10.202.2.42]) by gateway1.messagingengine.com (Postfix) with ESMTP id 91998584F2; Wed, 26 Aug 2009 10:35:49 -0400 (EDT) Received: from heartbeat2.messagingengine.com ([10.202.2.161]) by compute2.internal (MEProxy); Wed, 26 Aug 2009 10:35:49 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=messagingengine.com; h=cc:message-id:from:to:in-reply-to:content-type:content-transfer-encoding:mime-version:subject:date:references; s=smtpout; bh=xmyuryj3Wp3OniJ8+QkpfbnWkp0=; b=X/yCF5wX4yCAW3KlU/odqmzV9bE6FzcHGMHrG7z6v5WYEdBXGYmwRALirc/B+KlWahc8rpYB1ulZxZhjVFAxqY3YXWqER57oLygeluiYe+X9rvoulCPhyK+c0CCnZNf8dzJ6Brq0rfi5UbuP88E3Zme90DQabca/q9lXMpSCAWs= X-Sasl-enc: dzQVLwzAxWvCoF3VeYktV0kNn8ygmwECvRxp320qsf64 1251297348 Received: from [192.168.254.6] (70-100-236-43.dr04.glvv.ny.frontiernet.net [70.100.236.43]) by mail.messagingengine.com (Postfix) with ESMTPSA id D1C3D16411; Wed, 26 Aug 2009 10:35:48 -0400 (EDT) Cc: caml-list@yquem.inria.fr Message-Id: From: Benjamin Ylvisaker To: Julien Signoles In-Reply-To: <4A93F39A.7090704@cea.fr> Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Apple Message framework v936) Subject: Re: [Caml-list] ocamlgraph predecessors Date: Wed, 26 Aug 2009 10:35:47 -0400 References: <4EC4528F-4993-482F-8ECB-ED6848C3FC4D@cs.washington.edu> <4A93F39A.7090704@cea.fr> X-Mailer: Apple Mail (2.936) X-Spam: no; 0.00; signoles:01 moderately:01 work-around:01 work-around:01 amortized:01 functor:01 polluting:01 non-trivial:01 externally:01 application-:01 directional:98 cheung:98 directional:98 mate:98 garbage:01 On Aug 25, 2009, at 10:22 AM, Julien Signoles wrote: > Benjamin Ylvisaker a =E9crit : >> I have been using ocamlgraph for a while, and have been generally =20 >> happy with it. I experienced some poor performance with moderately =20= >> large graphs (10-100k vertices) recently, which led me to look =20 >> through the source code a little. It seems that doing anything =20 >> with the predecessors of a vertex, even just getting a list of =20 >> them, requires scanning through all the vertices in a graph. This =20= >> seems a little crazy to me. Am I missing something? Is there some =20= >> kind of work-around that gives reasonable performance for =20 >> predecessor operations (i.e. not O(|V|)). > > Actually, looking at the current implementation, accessing =20 > predecessors is worse that O(|V|): that is max(O(|V|,O(|E|)). > > If you use concrete (imperative directional) graphs, the simpler =20 > work-around is to use Imperative.Digraph.ConcreteBidirectional as =20 > suggested by Kevin Cheung. It uses more memory space (at worse the =20 > double) that standard concrete directional graphs. But accessing =20 > predecessors is in O(1) amortized instead of max(O(|V|,O(|E|)) and =20 > removing a vertex is in O(D*ln(D)) where D is the maximal degree of =20= > the graph instead of O(|V|*ln(|V|)). > > If you don't use this functor, other work-arounds have been =20 > suggested in other posts. > > By the way contributing to ocamlgraph by adding =20 > Imperative.Digraph.AbstractBidirectional (for instance) is still =20 > possible and welcome :o). Thanks for your suggestions. I had not noticed the =20 ConcreteBidirectional module, but it looks like it wouldn't be a drop-=20= in replacement for me, because it's unlabeled and I need labels. If anyone is curious, here is the wrapper logic that I ended up adding: When the user wants an edge from v1 to v2 with label X, two "internal" =20= edges get created: one from v1 to v2 with label EdgeForward (X) and =20 one from v2 to v1 with label EdgeBackward (X). These two edges are =20 considered equivalent by the wrapper code. Additionally, to made edge =20= removal faster, I added a table that maps an edge to its "pair". All =20= edge-related wrapper functions can take either of the pair of wrapper =20= edges, and will use the source or destination vertex of the edge, =20 depending on the Forward/Backward label. The edge removal wrapper =20 function gets the mate of the passed in edge by looking it up in the =20 table, and removes them both. To make vertex removal faster, I added a Boolean "removed" field to =20 vertex labels that is set to false on vertex creation. When the =20 wrapper vertex removal function is called, it removes all the incident =20= edges and then just sets the vertex's removed flag to true. Vertex =20 scanning functions clearly need to check the flag to determine whether =20= an "internal" vertex should actually be considered part of the graph =20 or not. If there is a lot of vertex creation and removal in an application, =20 clearly a lot of "garbage" vertices will end up polluting the graph. =20= When the amount of garbage gets too large, a new copy of a graph can =20 be constructed with only the "real" vertices copied over. This is not =20= a totally transparent operation, because if there are any tables keyed =20= on vertices or edges, external to the graph itself, they'll get =20 confused by the copying. This set of wrapper logic clearly adds a non-trivial amount of memory =20= overhead: 1) The edge and vertex labels are a little bit bigger. 2) =20= There are two edges for every externally visible edge. 3) The edge-=20 pair table adds another O(|E|) chunk of memory. 4) An application-=20 dependent number of garbage vertices will be floating around. I'm =20 pretty sure all the wrapper operations are asymptotically as fast as =20 they reasonably can be, though. If an implementation like this were =20 done in the library itself (notice the strategic use of the passive =20 voice) instead of as a wrapper, I'm pretty sure vertex removal could =20 be handled more cleanly. Ben