From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: dlsym(handle) may search in unrelated libraries
Date: Tue, 26 Feb 2019 10:07:33 -0500
Message-ID: <20190226150733.GQ23599@brightrain.aerifal.cx>
References: <845057a3202b1c3e23a4059e067f5f7b@ispras.ru> <20190210010347.GA23599@brightrain.aerifal.cx>
In-Reply-To: <20190210010347.GA23599@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: Alexey Izbyshev
Cc: musl@lists.openwall.com

On Sat, Feb 09, 2019 at 08:03:47PM -0500, Rich Felker wrote:
> On Sun, Feb 10, 2019 at 01:53:02AM +0300, Alexey Izbyshev wrote:
> > (Replying to https://www.openwall.com/lists/musl/2019/02/07/7)
> >
> > >On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> > >> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> > >> > I'm not sure if it makes sense to build both of these lists at the
> > >> > same time. We currently try to avoid allocating dependency lists for
> > >> > libraries loaded at startup in order not to allocate and set up data
> > >> > that might never be used, deferring until if/when dlopen is called on
> > >> > them. I've wanted to do the ctor order walk without allocating a
> > >> > dependency list, but I don't know a good way to do so. Note that the
> > >> > code that runs ctors cannot allocate because, at the time it runs,
> > >> > dlopen has already "committed" the load and can't back it out. It has
> > >> > to have all resources it needs for the walk precommitted.
> > >>
> > >> Depth first means stack. Means recursion or explicit stack. Explicit is
> > >> going to be hard, considering there is no known limit to dependency
> > >> nesting depth. We could argue that the user had better make their stack
> > >> size ulimit large enough for the main thread, but I hazard a guess that
> > >> nobody expects dlopen() to use overly much stack in other threads.
> > >>
> > >> Alright, what's the algorithm here?
> > >>
> > >> init(lib):
> > >>     if (!lib.inited):
> > >>         foreach d in lib.deps init(d)
> > >>         start_init(lib)
> > >>         lib.inited = 1
> > >>
> > >> That about right? Because that means we need a stack as high as the
> > >> nesting depth of dependencies. We can get a pessimistic estimate from
> > >> the number of deps loaded, but that may be way too high (see below).
> > >
> > >Yes, but you can also avoid recursion just by looping to the deepest
> > >dependency with !inited, then going back to the root.
> > >For a one-time operation at dlopen-time or program-start time, the
> > >quadratic search for each !inited seems unlikely to be a problem:
> > >
> > >> > Since the beginning of a list of breadth-first dependencies is just
> > >> > the direct dependencies, if we recorded the number of direct
> > >> > dependencies for each dso, the breadth-first lists could be used to
> > >> > perform a depth-first walk to run ctors; this can be made
> > >> > non-recursive, at the cost of quadratic time but with trivially-tiny
> > >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >> > constant factor, by simply restarting at the root of the tree after
> > >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >> > each node and finding the deepest not-yet-constructed dso. This
> > >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > We'd need breadth-first lists for all dsos to implement DFS as you
> > suggest because a single breadth-first list for the root is not
> > enough to reconstruct all edges of the dependency graph even if we
> > know the number of direct dependencies for each dso (we need to
> > process each edge for DFS to work correctly). Or am I
> > misunderstanding your suggestion?
>
> I think you misunderstood. I was talking about having the full BFS
> list for each dso, but tagging each one with the count of entries
> which are direct dependencies, so that the corresponding initial
> subarray can be used for DFS.
>
> > If we're willing to add some fields to struct dso to implement the
> > correct ctor order, I'd suggest using a non-recursive DFS with an
> > explicit stack instead. We'd need just two fields in each stack slot
> > (dso and the number of processed DT_NEEDED entries), so the memory
> > overhead is low compared to sizeof(struct dso). The stack can be
> > preallocated before "committing" dlopen().
> > As Markus said, the upper bound is the number of dsos loaded with
> > this dlopen() call because all other dsos are already constructed,
> > so we don't need to visit them. This stack can be freed immediately
> > after ctors are run. We already have dso->constructed to mark
> > visited dsos, and we already use another list for finalization
> > (dso->fini_next), and that list would be built naturally by DFS.
>
> I think it might get more complicated with concurrent and recursive
> calls to dlopen (e.g. dlopen from a ctor). Right now the
> init_fini_lock prevents a dlopen from another thread from completing
> as long as ctors are still running from a previous call, except
> recursively (in the same thread) thanks to a recursive mutex. Really,
> though, we should never be holding libc locks while application code
> runs, so the lock should be released each time a ctor is run and
> re-taken after it finishes.
>
> I'd considered preallocating the explicit stack structure as part of
> the dsos, which is problematic because multiple callers might need to
> use it. However, if the explicit stack were allocated by and just tied
> to the dlopen call frame instance, I think it should work. There may
> be deps that were already loaded from another thread's dlopen, though,
> but which were not yet constructed, so the bound is probably not the
> number of newly-loaded dsos but the total number of deps or the max
> depth of deps. This is probably easy to record while building the BFS
> list.

I've gotten back to thinking about this, and here's a rough idea of the
algorithm I'm thinking of using:

Load-direct-deps operation for a single DSO:

- Count DT_NEEDED entries, store number of direct deps.
- Allocate storage for direct deps list.
- Call load_library for each direct dep, save pointer to it.

Load-deps operation for newly dlopen'd DSO:

- Start with all DSOs unmarked.
- Use the deps list as a queue. For each entry in queue:
  - If the DSO is marked, skip it.
  - If the DSO doesn't have a deps list, perform load-direct-deps.
  - Append the initial n_direct_deps entries to the queue.
  - Mark the DSO.
- When the end of the queue is reached, finish and unmark all DSOs.

This should leave us with direct-dep lists for all DSOs, usable for
dependency-order ctor execution, and a non-redundant BFS deps list for
the dlopen'd DSO (currently, the list can be highly redundant and thus
inefficient for dlsym to search). The direct-dep lists for DSOs not
explicitly opened also form the initial segment of any future BFS deps
list needed when dlopen is called on them.

The above is entirely non-recursive and uses no temp storage; the data
is generated in place. Walking the direct-deps lists for ctor execution
can either use recursion or the "restart" approach I've described
before.

Rich