caml-list - the Caml user's mailing list
From: Benjamin Ylvisaker <ben8@cs.washington.edu>
To: Julien Signoles <Julien.Signoles@cea.fr>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] ocamlgraph predecessors
Date: Wed, 26 Aug 2009 10:35:47 -0400
Message-ID: <D42E2AE8-E252-4528-8D1C-EEC25120502A@cs.washington.edu>
In-Reply-To: <4A93F39A.7090704@cea.fr>


On Aug 25, 2009, at 10:22 AM, Julien Signoles wrote:

> Benjamin Ylvisaker wrote:
>> I have been using ocamlgraph for a while, and have been generally  
>> happy with it.  I experienced some poor performance with moderately  
>> large graphs (10-100k vertices) recently, which led me to look  
>> through the source code a little.  It seems that doing anything  
>> with the predecessors of a vertex, even just getting a list of  
>> them, requires scanning through all the vertices in a graph.  This  
>> seems a little crazy to me.  Am I missing something?  Is there some  
>> kind of work-around that gives reasonable performance for  
>> predecessor operations (i.e., not O(|V|))?
>
> Actually, looking at the current implementation, accessing
> predecessors is worse than O(|V|): it is max(O(|V|), O(|E|)).
>
> If you use concrete (imperative directional) graphs, the simplest
> work-around is to use Imperative.Digraph.ConcreteBidirectional, as
> suggested by Kevin Cheung.  It uses more memory (at worst double)
> than standard concrete directional graphs, but accessing
> predecessors is O(1) amortized instead of max(O(|V|), O(|E|)), and
> removing a vertex is O(D*ln(D)), where D is the maximal degree of
> the graph, instead of O(|V|*ln(|V|)).
>
> If you don't use this functor, other work-arounds have been  
> suggested in other posts.
>
> By the way, contributing to ocamlgraph by adding
> Imperative.Digraph.AbstractBidirectional (for instance) is still
> possible and welcome :o).

Thanks for your suggestions.  I had not noticed the  
ConcreteBidirectional module, but it looks like it wouldn't be a drop- 
in replacement for me, because it's unlabeled and I need labels.
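
For anyone unfamiliar with the trick ConcreteBidirectional uses, here is
a toy sketch in plain OCaml (this is NOT the ocamlgraph code; the module
and function names are invented for illustration): keeping a second
adjacency table for predecessors makes predecessor lookup O(1) amortized,
at the cost of roughly doubling the memory per edge.

```ocaml
(* Toy bidirectional digraph over int vertices: one hash table for
   successors and a mirror table for predecessors.  [pred] is then a
   single table lookup instead of a scan over all vertices/edges. *)
module IntDigraph = struct
  type t = {
    succ : (int, int list) Hashtbl.t;  (* v -> successors of v *)
    pred : (int, int list) Hashtbl.t;  (* v -> predecessors of v *)
  }

  let create () = { succ = Hashtbl.create 16; pred = Hashtbl.create 16 }

  let add_to tbl k v =
    let old = try Hashtbl.find tbl k with Not_found -> [] in
    Hashtbl.replace tbl k (v :: old)

  (* Every edge is recorded twice, once per direction of lookup. *)
  let add_edge g v1 v2 =
    add_to g.succ v1 v2;
    add_to g.pred v2 v1

  let pred g v = try Hashtbl.find g.pred v with Not_found -> []
end

let () =
  let g = IntDigraph.create () in
  IntDigraph.add_edge g 1 3;
  IntDigraph.add_edge g 2 3;
  assert (List.sort compare (IntDigraph.pred g 3) = [1; 2])
```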

If anyone is curious, here is the wrapper logic that I ended up adding:

When the user wants an edge from v1 to v2 with label X, two "internal"
edges get created: one from v1 to v2 with label EdgeForward (X) and
one from v2 to v1 with label EdgeBackward (X).  These two edges are
considered equivalent by the wrapper code.  Additionally, to make edge
removal faster, I added a table that maps each edge to its "pair".  All
edge-related wrapper functions can take either of the pair of wrapper
edges, and will use the source or destination vertex of the edge,
depending on the Forward/Backward label.  The edge removal wrapper
function gets the mate of the passed-in edge by looking it up in the
table, and removes them both.
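
To make the pairing scheme concrete, here is a minimal, self-contained
OCaml sketch.  The EdgeForward/EdgeBackward constructor names follow the
description above; everything else (int vertices, the list-backed stand-in
for the real graph, the function names) is invented for illustration.

```ocaml
(* Each user-level labeled edge (v1, x, v2) is stored as two internal
   edges, and a side table maps each internal edge to its mate so that
   removal can find and delete both with O(1) lookups. *)
type 'a internal_label = EdgeForward of 'a | EdgeBackward of 'a

type 'a iedge = int * 'a internal_label * int   (* src, label, dst *)

type 'a wrapper = {
  mutable edges : 'a iedge list;            (* stand-in for the real graph *)
  mate : ('a iedge, 'a iedge) Hashtbl.t;    (* internal edge -> its pair *)
}

let create () = { edges = []; mate = Hashtbl.create 16 }

let add_edge w v1 x v2 =
  let fwd = (v1, EdgeForward x, v2)
  and bwd = (v2, EdgeBackward x, v1) in
  w.edges <- fwd :: bwd :: w.edges;
  Hashtbl.replace w.mate fwd bwd;
  Hashtbl.replace w.mate bwd fwd

(* Either member of the pair may be passed in; both are removed.
   (The list filter is O(|E|) here only because the toy "graph" is a
   list; the real graph removes each edge directly.) *)
let remove_edge w e =
  let m = Hashtbl.find w.mate e in
  w.edges <- List.filter (fun e' -> e' <> e && e' <> m) w.edges;
  Hashtbl.remove w.mate e;
  Hashtbl.remove w.mate m

let () =
  let w = create () in
  add_edge w 1 "x" 2;
  remove_edge w (2, EdgeBackward "x", 1);   (* backward mate works too *)
  assert (w.edges = [])
```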

To make vertex removal faster, I added a Boolean "removed" field to  
vertex labels that is set to false on vertex creation.  When the  
wrapper vertex removal function is called, it removes all the incident  
edges and then just sets the vertex's removed flag to true.  Vertex  
scanning functions clearly need to check the flag to determine whether  
an "internal" vertex should actually be considered part of the graph  
or not.
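
A sketch of the lazy-removal idea, with invented record and function
names (the real wrapper would also delete the incident edges before
flipping the flag):

```ocaml
(* A vertex carries a mutable [removed] flag; "removal" just sets it,
   avoiding the expensive physical deletion, and every scanning
   function filters out flagged vertices. *)
type vertex = { id : int; mutable removed : bool }

let make_vertex id = { id; removed = false }

let remove_vertex v = v.removed <- true

(* Iterate only over vertices still logically in the graph. *)
let iter_live f vertices =
  List.iter (fun v -> if not v.removed then f v) vertices

let () =
  let vs = List.map make_vertex [1; 2; 3] in
  remove_vertex (List.nth vs 1);
  let live = ref [] in
  iter_live (fun v -> live := v.id :: !live) vs;
  assert (List.sort compare !live = [1; 3])
```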

If there is a lot of vertex creation and removal in an application,  
clearly a lot of "garbage" vertices will end up polluting the graph.   
When the amount of garbage gets too large, a new copy of a graph can  
be constructed with only the "real" vertices copied over.  This is not  
a totally transparent operation, because if there are any tables keyed  
on vertices or edges, external to the graph itself, they'll get  
confused by the copying.
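
The compaction step might look roughly like this (again a sketch with
invented names; the 0.5 garbage threshold is an arbitrary choice for
illustration, not from the original):

```ocaml
(* When the fraction of flagged vertices grows too large, build a fresh
   collection keeping only the live ones.  As noted above, any external
   table keyed on the old vertices is invalidated by this copy. *)
type vertex = { id : int; mutable removed : bool }

let garbage_ratio vs =
  let dead = List.length (List.filter (fun v -> v.removed) vs) in
  float_of_int dead /. float_of_int (max 1 (List.length vs))

let compact vs =
  if garbage_ratio vs > 0.5
  then List.filter (fun v -> not v.removed) vs
  else vs

let () =
  let vs = [ { id = 1; removed = false };
             { id = 2; removed = true };
             { id = 3; removed = true } ] in
  assert (List.map (fun v -> v.id) (compact vs) = [1])
```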

This set of wrapper logic clearly adds a non-trivial amount of memory
overhead:

1) The edge and vertex labels are a little bit bigger.
2) There are two internal edges for every externally visible edge.
3) The edge-pair table adds another O(|E|) chunk of memory.
4) An application-dependent number of garbage vertices will be
   floating around.

I'm pretty sure all the wrapper operations are asymptotically as fast
as they reasonably can be, though.  If an implementation like this
were done in the library itself (notice the strategic use of the
passive voice) instead of as a wrapper, I'm pretty sure vertex removal
could be handled more cleanly.

Ben



Thread overview: 13+ messages
2009-08-08  5:24 Benjamin Ylvisaker
2009-08-08 13:35 ` [Caml-list] " Edgar Friendly
2009-08-08 20:16   ` Benjamin Ylvisaker
2009-08-09 14:56     ` Edgar Friendly
2009-08-25 14:22       ` Julien Signoles
2009-08-25 14:22 ` Julien Signoles
2009-08-26 14:35   ` Benjamin Ylvisaker [this message]
2009-08-26  6:54 ` Jean-Christophe Filliâtre
2009-08-09 18:32 rixed
2009-08-10  1:59 ` Jacques Garrigue
2009-08-10 12:19   ` kcheung
2009-08-10 17:46     ` Benjamin Ylvisaker
2009-08-12 22:40   ` Francis Dupont
