mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: dlsym(handle) may search in unrelated libraries
Date: Thu, 7 Feb 2019 13:57:36 -0500	[thread overview]
Message-ID: <20190207185736.GR23599@brightrain.aerifal.cx> (raw)
In-Reply-To: <20190207183637.GF5469@voyager>

On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> > However, a depth-first list of dependencies is also needed to solve
> > the longstanding ctor-order issue discussed in several past threads
> > (which I can look up if search doesn't turn them up for you). This is
> > not a requirement of POSIX (which doesn't even have ctors), but it's a
> > reasonable expectation we currently get wrong and I think it might be
> > specified in ELF or some related sysv "standards".
> 
> Requirements first, nice-to-haves later, right? Once we have the dlsym()
> conformance issue squared away, we can still deal with this issue.
> 
> You mean, people expect their dependencies to be initialized in their
> own initializers? When it is well-known that there is no order to
> initializers, and e.g. the C++ standard says there is no order to
> constructors of static objects.

Yes. GCC has an extension for ctor priority within static linking
scope, but for dynamic linking scope that doesn't help. I don't like
any of this but glib depends on it to avoid just doing the right thing
with pthread_once/call_once, and refuses to fix it.

> > I'm not sure if it makes sense to build both of these lists at the
> > same time. We currently try to avoid allocating dependency lists for
> > libraries loaded at startup in order not to allocate and setup data
> > that might never be used, defering until if/when dlopen is called on
> > them. I've wanted to do the ctor order walk without allocating a
> > dependency list, but I don't know a good way to do so. Note that the
> > code that runs ctors cannot allocate because, at the time it runs,
> > dlopen has already "committed" the load and can't back it out. It has
> > to have all resources it needs for the walk precommitted.
> 
> Depth first means stack. Means recursion or explicit stack. Explicit is
> going to be hard, considering there is no known limit to dependency
> nesting depth. We could argue that the user better make their stack size
> ulimit large enough for the main thread, but I hazard a guess that
> nobody expects dlopen() to use overly much stack in other threads.
> 
> Alright, what's the algorithm here?
> 
> init(lib):
>     if (!lib.inited):
>         foreach d in lib.deps init(d)
>         start_init(lib)
>         lib.inited = 1
> 
> That about right? Because that means we need a stack as high as the
> nesting depth of dependencies. We can get a pessimistic estimation from
> the number of deps loaded, but that may be way too high (see below).

Yes, but you can also avoid recursion just by looping to the deepest
dependency with !inited, then going back to the root. For a one-time
operation at dlopen-time or program-start time, the quadratic search
for each !inited seems unlikely to be a problem:

> > Since the beginning of a list of breadth-first dependencies is just
> > the direct dependencies, if we recorded the number of direct
> > dependencies for each dso, the breadth-first lists could be used to
> > perform a depth-first walk to run ctors; this can be made
> > non-recursive, at the cost of quadratic time but with trivially-tiny
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > constant factor, by simply restarting at the root of the tree after
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > each node and finding the deepest not-yet-constructed dso. This
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> $ ldd =mpv | wc -l
> 292
> 
> And that, I will remind you, is the "suckless" alternative to mplayer.
> 
> Anyway, that is breadth, not depth. Experimentally, it is hard to find a
> chain of libraries deeper than ten. They all eventually just lead back
> to libc.
> 

> So what you could do, when it comes time to run the initializers, is to
> walk the dyn vectors again. Those are already allocated, and all the
> libs they reference are already loaded, so load_library() would only be
> a search operation.

Note that each load_library is a linear search, so you still have the
quadratic time, now with much higher coefficient. And if you want to
use my go-back-to-root approach to avoid having O(depth) working
space, it's cubic time.

Also, as-written, load_library is not "only a search operation";
perhaps this is a bug in itself. For absolute paths it will always
open the file and compare inode identity to see if it matches an
already-loaded file. This should probably be bypassed by first
checking long-name. Fortunately absolute paths in DT_NEEDED are
strongly discouraged/a-really-bad-idea, so it probably doesn't come up
much in practice.

> By the time the initializers run, that must be the
> case, both at load time and at runtime. Unless loading the library
> failed the first time, in which case at runtime we have already bailed.
> And at load time we... haven't. Are you sure continuing on load failure
> of a dependency during load time is a good idea?

I don't follow. The dlopen operation is not committed until load of
all dependencies completes successfully, and if any fail to load, the
whole operation is backed-out. But ctors don't/can't run until *after*
that, when we've already committed to success.

Rich


  reply	other threads:[~2019-02-07 18:57 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-05 21:02 Alexey Izbyshev
2019-02-06 13:40 ` Alexey Izbyshev
2019-02-06 16:02 ` Markus Wichmann
2019-02-06 17:02   ` Alexey Izbyshev
2019-02-06 20:25     ` Markus Wichmann
2019-02-06 21:23       ` Alexey Izbyshev
2019-02-07  5:33         ` Markus Wichmann
2019-02-07 13:42           ` Alexey Izbyshev
2019-02-07 17:43             ` Markus Wichmann
2019-02-07 20:37               ` Markus Wichmann
2019-02-07 21:29               ` Rich Felker
2019-02-07 16:54           ` Rich Felker
2019-02-07 18:36             ` Markus Wichmann
2019-02-07 18:57               ` Rich Felker [this message]
2019-02-07 20:31                 ` Markus Wichmann
2019-02-07 21:33                   ` Rich Felker
2019-02-07 21:37                     ` Rich Felker
2019-02-08 10:19             ` A. Wilcox
2019-02-08 12:00               ` Szabolcs Nagy
2019-02-08 16:09                 ` Rich Felker
2019-02-09 22:53 Alexey Izbyshev
2019-02-10  1:03 ` Rich Felker
2019-02-26 15:07   ` Rich Felker
2019-03-04  2:11     ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190207185736.GR23599@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).