mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Markus Wichmann <nullplan@gmx.net>
To: musl@lists.openwall.com
Subject: Re: dlsym(handle) may search in unrelated libraries
Date: Thu, 7 Feb 2019 19:36:38 +0100	[thread overview]
Message-ID: <20190207183637.GF5469@voyager> (raw)
In-Reply-To: <20190207165447.GP23599@brightrain.aerifal.cx>

On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> Yes, it needs to contain all direct and indirect dependencies, in the
> correct order. Right now I think it incorrectly contains a superset of
> that. Is that your assessment of the situation too?
> 

Yes, it is a superset, if the original argument to load_deps() is not
the tail of the chain of libraries.

> However, a depth-first list of dependencies is also needed to solve
> the longstanding ctor-order issue discussed in several past threads
> (which I can look up if search doesn't turn them up for you). This is
> not a requirement of POSIX (which doesn't even have ctors), but it's a
> reasonable expectation we currently get wrong and I think it might be
> specified in ELF or some related sysv "standards".
> 

Requirements first, nice-to-haves later, right? Once we have the dlsym()
conformance issue squared away, we can still deal with this issue.

You mean, people expect their dependencies to be initialized in their
own initializers? When it is well-known that there is no order to
initializers, and e.g. the C++ standard says there is no order to
constructors of static objects.

> I'm not sure if it makes sense to build both of these lists at the
> same time. We currently try to avoid allocating dependency lists for
> libraries loaded at startup in order not to allocate and setup data
> that might never be used, defering until if/when dlopen is called on
> them. I've wanted to do the ctor order walk without allocating a
> dependency list, but I don't know a good way to do so. Note that the
> code that runs ctors cannot allocate because, at the time it runs,
> dlopen has already "committed" the load and can't back it out. It has
> to have all resources it needs for the walk precommitted.
> 

Depth first means stack. Means recursion or explicit stack. Explicit is
going to be hard, considering there is no known limit to dependency
nesting depth. We could argue that the user better make their stack size
ulimit large enough for the main thread, but I hazard a guess that
nobody expects dlopen() to use overly much stack in other threads.

Alright, what's the algorithm here?

init(lib):
    if (!lib.inited):
        foreach d in lib.deps init(d)
        start_init(lib)
        lib.inited = 1

That about right? Because that means we need a stack as high as the
nesting depth of dependencies. We can get a pessimistic estimation from
the number of deps loaded, but that may be way too high (see below).

> Since the beginning of a list of breadth-first dependencies is just
> the direct dependencies, if we recorded the number of direct
> dependencies for each dso, the breadth-first lists could be used to
> perform a depth-first walk to run ctors; this can be made
> non-recursive, at the cost of quadratic time but with trivially-tiny
> constant factor, by simply restarting at the root of the tree after
> each node and finding the deepest not-yet-constructed dso. This
> suggests always allocating the breadth-first dependency lists might be
> a good idea. That "feels" unfortunate since in principle they can get
> large, but it's only going to be large for ridiculous bloatware with
> hundreds of libraries, in which case these dependency lists are very
> small on a relative scale.
> 

$ ldd =mpv | wc -l
292

And that, I will remind you, is the "suckless" alternative to mplayer.

Anyway, that is breadth, not depth. Experimentally, it is hard to find a
chain of libraries deeper than ten. They all eventually just lead back
to libc.

So what you could do, when it comes time to run the initializers, is to
walk the dyn vectors again. Those are already allocated, and all the
libs they reference are already loaded, so load_library() would only be
a search operation. By the time the initializers run, that must be the
case, both at load time and at runtime. Unless loading the library
failed the first time, in which case at runtime we have already bailed.
And at load time we... haven't. Are you sure continuing on load failure
of a dependency during load time is a good idea?

> I don't think recursive can be done safely since you don't have a
> small bound on levels of recursion. It could be done with an explicit
> stack in allocated storage though, since the operation has to allocate
> memory anyway and has to fail if allocation fails.
> 
> Rich

Nevermind, it was a misunderstanding of the requirements.

Ciao,
Markus


  reply	other threads:[~2019-02-07 18:36 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-05 21:02 Alexey Izbyshev
2019-02-06 13:40 ` Alexey Izbyshev
2019-02-06 16:02 ` Markus Wichmann
2019-02-06 17:02   ` Alexey Izbyshev
2019-02-06 20:25     ` Markus Wichmann
2019-02-06 21:23       ` Alexey Izbyshev
2019-02-07  5:33         ` Markus Wichmann
2019-02-07 13:42           ` Alexey Izbyshev
2019-02-07 17:43             ` Markus Wichmann
2019-02-07 20:37               ` Markus Wichmann
2019-02-07 21:29               ` Rich Felker
2019-02-07 16:54           ` Rich Felker
2019-02-07 18:36             ` Markus Wichmann [this message]
2019-02-07 18:57               ` Rich Felker
2019-02-07 20:31                 ` Markus Wichmann
2019-02-07 21:33                   ` Rich Felker
2019-02-07 21:37                     ` Rich Felker
2019-02-08 10:19             ` A. Wilcox
2019-02-08 12:00               ` Szabolcs Nagy
2019-02-08 16:09                 ` Rich Felker
2019-02-09 22:53 Alexey Izbyshev
2019-02-10  1:03 ` Rich Felker
2019-02-26 15:07   ` Rich Felker
2019-03-04  2:11     ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190207183637.GF5469@voyager \
    --to=nullplan@gmx.net \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).