On Sun, Feb 26, 2017 at 04:39:25PM -0500, Rich Felker wrote:
> > > > > Are you sure? My understanding of what it does is:
> > > > > 
> > > > > 1. Descend a->b->c, construct c, and back up to b.
> > > > 
> > > > you did not explain how you get back to b after c
> > > > without a stack of visited dsos or modified c->needed_by.
> > > 
> > > Sorry, that should have been back up to a (c->needed_by). Then:
> > > 
> > > 2. Descend a->b->d, construct d, and back up to b.
> > > 
> > > The key point is that x->needed_by is always the first dso that pulled
> > > in x, so if we back all the way back up to x->needed_by, we'll revisit
> > > all later dsos which depend on x.
> > 
> > for that a->b transition has to happen twice,
> > but a.next_dep is already past b the second
> > time a is visited, so i still don't see why
> > this works.
> 
> Indeed, that looks like a bug. Removing the ++ from p =
> p->deps[p->next_dep++]; fixes it, but breaks the logic for avoiding
> circular descent (the condition !p->deps[p->next_dep]->next_dep). I
> think we need to add a separate field to control that, and a visited
> flag does not suffice; instead it should probably be something like
> the descent depth (or just sequence number) at which the DSO was first
> encountered, so that we can avoid descending into a DSO that we
> already started descending into and that will be descended into again
> as part of the backing-up process.

Here's a v4 of the patch that saves the "init parent" we descended
from so that it can return where it left off. There are a couple
gratuitous hunks left over adding setting of "needed_by" where it made
sense to be set, but it's not actually used anymore. They could be
dropped if desired but are probably nice to keep for the sake of
consistency of data, even thoough it's data we don't use.

I believe this can be extended to allow concurrent dlopen by amending
the case in the tree-walk where a dependency isn't constructed yet but
already has an "init parent" to check whether it's
pending-construction in the calling thread (recursive dlopen from a
ctor) or another thread; in the former case (as now) treat it as
already-constructed; in the latter, wait on a condvar that gets
signaled at the end of each construction, then continue the loop
without advancing p. There are probably some subtleties I'm missing,
though.

Rich