From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ben8@cs.washington.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by yquem.inria.fr (Postfix) with ESMTP id CB34DBBAF
	for <caml-list@yquem.inria.fr>; Wed, 26 Aug 2009 16:35:50 +0200 (CEST)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AsgBAPPklEpCbwQZkGdsb2JhbACbEAEBAQEJCQwHEwStP4c6iEYFhBo
X-IronPort-AV: E=Sophos;i="4.44,279,1249250400"; 
   d="scan'208";a="31693666"
Received: from out1.smtp.messagingengine.com ([66.111.4.25])
  by mail2-smtp-roc.national.inria.fr with ESMTP; 26 Aug 2009 16:35:50 +0200
Received: from compute2.internal (compute2.internal [10.202.2.42])
	by gateway1.messagingengine.com (Postfix) with ESMTP id 91998584F2;
	Wed, 26 Aug 2009 10:35:49 -0400 (EDT)
Received: from heartbeat2.messagingengine.com ([10.202.2.161])
  by compute2.internal (MEProxy); Wed, 26 Aug 2009 10:35:49 -0400
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=messagingengine.com; h=cc:message-id:from:to:in-reply-to:content-type:content-transfer-encoding:mime-version:subject:date:references; s=smtpout; bh=xmyuryj3Wp3OniJ8+QkpfbnWkp0=; b=X/yCF5wX4yCAW3KlU/odqmzV9bE6FzcHGMHrG7z6v5WYEdBXGYmwRALirc/B+KlWahc8rpYB1ulZxZhjVFAxqY3YXWqER57oLygeluiYe+X9rvoulCPhyK+c0CCnZNf8dzJ6Brq0rfi5UbuP88E3Zme90DQabca/q9lXMpSCAWs=
X-Sasl-enc: dzQVLwzAxWvCoF3VeYktV0kNn8ygmwECvRxp320qsf64 1251297348
Received: from [192.168.254.6] (70-100-236-43.dr04.glvv.ny.frontiernet.net [70.100.236.43])
	by mail.messagingengine.com (Postfix) with ESMTPSA id D1C3D16411;
	Wed, 26 Aug 2009 10:35:48 -0400 (EDT)
Cc: caml-list@yquem.inria.fr
Message-Id: <D42E2AE8-E252-4528-8D1C-EEC25120502A@cs.washington.edu>
From: Benjamin Ylvisaker <ben8@cs.washington.edu>
To: Julien Signoles <Julien.Signoles@cea.fr>
In-Reply-To: <4A93F39A.7090704@cea.fr>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (Apple Message framework v936)
Subject: Re: [Caml-list] ocamlgraph predecessors
Date: Wed, 26 Aug 2009 10:35:47 -0400
References: <4EC4528F-4993-482F-8ECB-ED6848C3FC4D@cs.washington.edu> <4A93F39A.7090704@cea.fr>
X-Mailer: Apple Mail (2.936)
X-Spam: no; 0.00; signoles:01 moderately:01 work-around:01 work-around:01 amortized:01 functor:01 polluting:01 non-trivial:01 externally:01 application-:01 directional:98 cheung:98 directional:98 mate:98 garbage:01 


On Aug 25, 2009, at 10:22 AM, Julien Signoles wrote:

> Benjamin Ylvisaker a =E9crit :
>> I have been using ocamlgraph for a while, and have been generally =20
>> happy with it.  I experienced some poor performance with moderately =20=

>> large graphs (10-100k vertices) recently, which led me to look =20
>> through the source code a little.  It seems that doing anything =20
>> with the predecessors of a vertex, even just getting a list of =20
>> them, requires scanning through all the vertices in a graph.  This =20=

>> seems a little crazy to me.  Am I missing something?  Is there some =20=

>> kind of work-around that gives reasonable performance for =20
>> predecessor operations (i.e. not O(|V|)).
>
> Actually, looking at the current implementation, accessing =20
> predecessors is worse that O(|V|): that is max(O(|V|,O(|E|)).
>
> If you use concrete (imperative directional) graphs, the simpler =20
> work-around  is to use Imperative.Digraph.ConcreteBidirectional as =20
> suggested by Kevin Cheung. It uses more memory space (at worse the =20
> double) that standard concrete directional graphs. But accessing =20
> predecessors is in O(1) amortized instead of max(O(|V|,O(|E|)) and =20
> removing a vertex is in O(D*ln(D)) where D is the maximal degree of =20=

> the graph instead of O(|V|*ln(|V|)).
>
> If you don't use this functor, other work-arounds have been =20
> suggested in other posts.
>
> By the way contributing to ocamlgraph by adding =20
> Imperative.Digraph.AbstractBidirectional (for instance) is still =20
> possible and welcome :o).

Thanks for your suggestions.  I had not noticed the =20
ConcreteBidirectional module, but it looks like it wouldn't be a drop-=20=

in replacement for me, because it's unlabeled and I need labels.

If anyone is curious, here is the wrapper logic that I ended up adding:

When the user wants an edge from v1 to v2 with label X, two "internal" =20=

edges get created: one from v1 to v2 with label EdgeForward (X) and =20
one from v2 to v1 with label EdgeBackward (X).  These two edges are =20
considered equivalent by the wrapper code.  Additionally, to made edge =20=

removal faster, I added a table that maps an edge to its "pair".  All =20=

edge-related wrapper functions can take either of the pair of wrapper =20=

edges, and will use the source or destination vertex of the edge, =20
depending on the Forward/Backward label.  The edge removal wrapper =20
function gets the mate of the passed in edge by looking it up in the =20
table, and removes them both.

To make vertex removal faster, I added a Boolean "removed" field to =20
vertex labels that is set to false on vertex creation.  When the =20
wrapper vertex removal function is called, it removes all the incident =20=

edges and then just sets the vertex's removed flag to true.  Vertex =20
scanning functions clearly need to check the flag to determine whether =20=

an "internal" vertex should actually be considered part of the graph =20
or not.

If there is a lot of vertex creation and removal in an application, =20
clearly a lot of "garbage" vertices will end up polluting the graph.  =20=

When the amount of garbage gets too large, a new copy of a graph can =20
be constructed with only the "real" vertices copied over.  This is not =20=

a totally transparent operation, because if there are any tables keyed =20=

on vertices or edges, external to the graph itself, they'll get =20
confused by the copying.

This set of wrapper logic clearly adds a non-trivial amount of memory =20=

overhead:  1) The edge and vertex labels are a little bit bigger.  2) =20=

There are two edges for every externally visible edge.  3) The edge-=20
pair table adds another O(|E|) chunk of memory.  4) An application-=20
dependent number of garbage vertices will be floating around.  I'm =20
pretty sure all the wrapper operations are asymptotically as fast as =20
they reasonably can be, though.  If an implementation like this were =20
done in the library itself (notice the strategic use of the passive =20
voice) instead of as a wrapper, I'm pretty sure vertex removal could =20
be handled more cleanly.

Ben