From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: dlsym(handle) may search in unrelated libraries
Date: Tue, 26 Feb 2019 10:07:33 -0500
Message-ID: <20190226150733.GQ23599@brightrain.aerifal.cx>
References: <845057a3202b1c3e23a4059e067f5f7b@ispras.ru> <20190210010347.GA23599@brightrain.aerifal.cx>
In-Reply-To: <20190210010347.GA23599@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: Alexey Izbyshev
Cc: musl@lists.openwall.com

On Sat, Feb 09, 2019 at 08:03:47PM -0500, Rich Felker wrote:
> On Sun, Feb 10, 2019 at 01:53:02AM +0300, Alexey Izbyshev wrote:
> > (Replying to https://www.openwall.com/lists/musl/2019/02/07/7)
> >
> > >On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> > >> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> > >> > I'm not sure if it makes sense to build both of these lists at the
> > >> > same time. We currently try to avoid allocating dependency lists for
> > >> > libraries loaded at startup in order not to allocate and set up data
> > >> > that might never be used, deferring until if/when dlopen is called on
> > >> > them. I've wanted to do the ctor order walk without allocating a
> > >> > dependency list, but I don't know a good way to do so. Note that the
> > >> > code that runs ctors cannot allocate because, at the time it runs,
> > >> > dlopen has already "committed" the load and can't back it out. It has
> > >> > to have all resources it needs for the walk precommitted.
> > >>
> > >> Depth first means stack. Means recursion or explicit stack. Explicit is
> > >> going to be hard, considering there is no known limit to dependency
> > >> nesting depth. We could argue that the user had better make their stack
> > >> size ulimit large enough for the main thread, but I hazard a guess that
> > >> nobody expects dlopen() to use overly much stack in other threads.
> > >>
> > >> Alright, what's the algorithm here?
> > >>
> > >> init(lib):
> > >>     if (!lib.inited):
> > >>         foreach d in lib.deps init(d)
> > >>         start_init(lib)
> > >>         lib.inited = 1
> > >>
> > >> That about right? Because that means we need a stack as high as the
> > >> nesting depth of dependencies. We can get a pessimistic estimate from
> > >> the number of deps loaded, but that may be way too high (see below).
> > >
> > >Yes, but you can also avoid recursion just by looping to the deepest
> > >dependency with !inited, then going back to the root.
> > >For a one-time operation at dlopen-time or program-start time, the
> > >quadratic search for each !inited seems unlikely to be a problem:
> > >
> > >> > Since the beginning of a list of breadth-first dependencies is just
> > >> > the direct dependencies, if we recorded the number of direct
> > >> > dependencies for each dso, the breadth-first lists could be used to
> > >> > perform a depth-first walk to run ctors; this can be made
> > >> > non-recursive, at the cost of quadratic time but with trivially-tiny
> > >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >> > constant factor, by simply restarting at the root of the tree after
> > >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >> > each node and finding the deepest not-yet-constructed dso. This
> > >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > We'd need breadth-first lists for all dsos to implement DFS as you
> > suggest because a single breadth-first list for the root is not
> > enough to reconstruct all edges of the dependency graph even if we
> > know the number of direct dependencies for each dso (we need to
> > process each edge for DFS to work correctly). Or am I
> > misunderstanding your suggestion?
>
> I think you misunderstood. I was talking about having the full BFS
> list for each dso, but tagging each one with the count of entries
> which are direct dependencies, so that the corresponding initial
> subarray can be used for DFS.
>
> > If we're willing to add some fields to struct dso to implement the
> > correct ctor order, I'd suggest using a non-recursive DFS with an
> > explicit stack instead. We'd need just two fields in each stack slot
> > (dso and the number of processed DT_NEEDED entries), so the memory
> > overhead is low compared to sizeof(struct dso). The stack can be
> > preallocated before "committing" dlopen().
> > As Markus said, the upper bound is the number of dsos loaded with
> > this dlopen() call because all other dsos are already constructed,
> > so we don't need to visit them. This stack can be freed immediately
> > after ctors are run. We already have dso->constructed to mark
> > visited dsos, and we already use another list for finalization
> > (dso->fini_next), and that list would be built naturally by DFS.
>
> I think it might get more complicated with concurrent and recursive
> calls to dlopen (e.g. dlopen from a ctor). Right now the
> init_fini_lock prevents a dlopen from another thread from completing
> as long as ctors are still running from a previous call, except
> recursively (in the same thread) thanks to a recursive mutex. Really,
> though, we should never be holding libc locks while application code
> runs, so the lock should be released each time a ctor is run and
> re-taken after it finishes.
>
> I'd considered preallocating the explicit stack structure as part of
> the dsos, which is problematic because multiple callers might need to
> use it. However, if the explicit stack were allocated by and just tied
> to the dlopen call frame instance, I think it should work. There may
> be deps that were already loaded from another thread's dlopen, though,
> but which were not yet constructed, so the bound is probably not the
> number of newly-loaded dsos but the total number of deps or the max
> depth of deps. This is probably easy to record while building the BFS
> list.

I've gotten back to thinking about this, and here's a rough idea of the
algorithm I'm thinking of using:

Load-direct-deps operation for a single DSO:

- Count DT_NEEDED entries, store number of direct deps.
- Allocate storage for direct deps list.
- Call load_library for each direct dep, save pointer to it.

Load-deps operation for newly dlopen'd DSO:

- Start with all DSOs unmarked.
- Use the deps list as a queue. For each entry in queue:
  - If the DSO is marked, skip it.
  - If the DSO doesn't have a deps list, perform load-direct-deps.
  - Append the initial n_direct_deps entries to the queue.
  - Mark the DSO.
- When the end of the queue is reached, finish and unmark all DSOs.

This should leave us with direct-dep lists for all DSOs, usable for
dependency-order ctor execution, and a non-redundant BFS deps list for
the dlopen'd DSO (currently, the list can be highly redundant and thus
inefficient for dlsym to search). The direct-dep lists for DSOs not
explicitly opened also form the initial segment of any future BFS deps
list needed when dlopen is called on them.

The above is entirely non-recursive and uses no temp storage; the data
is generated in place. Walking the direct-deps lists for ctor execution
can either use recursion or the "restart" approach I've described
before.

Rich