* Re: dlsym(handle) may search in unrelated libraries
@ 2019-02-09 22:53 Alexey Izbyshev
2019-02-10 1:03 ` Rich Felker
0 siblings, 1 reply; 24+ messages in thread
From: Alexey Izbyshev @ 2019-02-09 22:53 UTC (permalink / raw)
To: musl
(Replying to https://www.openwall.com/lists/musl/2019/02/07/7)
> On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> > On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> > > I'm not sure if it makes sense to build both of these lists at the
> > > same time. We currently try to avoid allocating dependency lists for
> > > libraries loaded at startup in order not to allocate and setup data
> > > that might never be used, defering until if/when dlopen is called on
> > > them. I've wanted to do the ctor order walk without allocating a
> > > dependency list, but I don't know a good way to do so. Note that the
> > > code that runs ctors cannot allocate because, at the time it runs,
> > > dlopen has already "committed" the load and can't back it out. It has
> > > to have all resources it needs for the walk precommitted.
> >
> > Depth first means stack. Means recursion or explicit stack. Explicit is
> > going to be hard, considering there is no known limit to dependency
> > nesting depth. We could argue that the user better make their stack size
> > ulimit large enough for the main thread, but I hazard a guess that
> > nobody expects dlopen() to use overly much stack in other threads.
> >
> > Alright, what's the algorithm here?
> >
> > init(lib):
> > if (!lib.inited):
> > foreach d in lib.deps init(d)
> > start_init(lib)
> > lib.inited = 1
> >
> > That about right? Because that means we need a stack as high as the
> > nesting depth of dependencies. We can get a pessimistic estimation from
> > the number of deps loaded, but that may be way too high (see below).
>
> Yes, but you can also avoid recursion just by looping to the deepest
> dependency with !inited, then going back to the root. For a one-time
> operation at dlopen-time or program-start time, the quadratic search
> for each !inited seems unlikely to be a problem:
>
> > > Since the beginning of a list of breadth-first dependencies is just
> > > the direct dependencies, if we recorded the number of direct
> > > dependencies for each dso, the breadth-first lists could be used to
> > > perform a depth-first walk to run ctors; this can be made
> > > non-recursive, at the cost of quadratic time but with trivially-tiny
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > constant factor, by simply restarting at the root of the tree after
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > each node and finding the deepest not-yet-constructed dso. This
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We'd need breadth-first lists for all dsos to implement DFS as you
suggest because a single breadth-first list for the root is not enough
to reconstruct all edges of the dependency graph even if we know the
number of direct dependencies for each dso (we need to process each edge
for DFS to work correctly). Or am I misunderstanding your suggestion?
If we're willing to add some fields to struct dso to implement the
correct ctor order, I'd suggest to use a non-recursive DFS with an
explicit stack instead. We'd need just two fields in each stack slot
(dso and the number of processed DT_NEEDED entries), so the memory
overhead is low compared to sizeof(struct dso). The stack can be
preallocated before "committing" dlopen(). As Markus said, the upper
bound is the number of dsos loaded with this dlopen() call because all
other dsos are already constructed, so we don't need to visit them. This
stack can be freed immediately after ctors are run. We already have
dso->constructed to mark visited dsos, and we already use another list
for finalization (dso->fini_next), and that list would be built
naturally by DFS.
Alexey
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-09 22:53 dlsym(handle) may search in unrelated libraries Alexey Izbyshev @ 2019-02-10 1:03 ` Rich Felker 2019-02-26 15:07 ` Rich Felker 0 siblings, 1 reply; 24+ messages in thread From: Rich Felker @ 2019-02-10 1:03 UTC (permalink / raw) To: Alexey Izbyshev; +Cc: musl On Sun, Feb 10, 2019 at 01:53:02AM +0300, Alexey Izbyshev wrote: > (Replying to https://www.openwall.com/lists/musl/2019/02/07/7) > > >On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote: > >> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote: > >> > I'm not sure if it makes sense to build both of these lists at the > >> > same time. We currently try to avoid allocating dependency lists for > >> > libraries loaded at startup in order not to allocate and setup data > >> > that might never be used, defering until if/when dlopen is called on > >> > them. I've wanted to do the ctor order walk without allocating a > >> > dependency list, but I don't know a good way to do so. Note that the > >> > code that runs ctors cannot allocate because, at the time it runs, > >> > dlopen has already "committed" the load and can't back it out. It has > >> > to have all resources it needs for the walk precommitted. > >> > >> Depth first means stack. Means recursion or explicit stack. Explicit is > >> going to be hard, considering there is no known limit to dependency > >> nesting depth. We could argue that the user better make their stack size > >> ulimit large enough for the main thread, but I hazard a guess that > >> nobody expects dlopen() to use overly much stack in other threads. > >> > >> Alright, what's the algorithm here? > >> > >> init(lib): > >> if (!lib.inited): > >> foreach d in lib.deps init(d) > >> start_init(lib) > >> lib.inited = 1 > >> > >> That about right? Because that means we need a stack as high as the > >> nesting depth of dependencies. We can get a pessimistic estimation from > >> the number of deps loaded, but that may be way too high (see below). > > > >Yes, but you can also avoid recursion just by looping to the deepest > >dependency with !inited, then going back to the root. For a one-time > >operation at dlopen-time or program-start time, the quadratic search > >for each !inited seems unlikely to be a problem: > > > >> > Since the beginning of a list of breadth-first dependencies is just > >> > the direct dependencies, if we recorded the number of direct > >> > dependencies for each dso, the breadth-first lists could be used to > >> > perform a depth-first walk to run ctors; this can be made > >> > non-recursive, at the cost of quadratic time but with trivially-tiny > >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > >> > constant factor, by simply restarting at the root of the tree after > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > >> > each node and finding the deepest not-yet-constructed dso. This > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > We'd need breadth-first lists for all dsos to implement DFS as you > suggest because a single breadth-first list for the root is not > enough to reconstruct all edges of the dependency graph even if we > know the number of direct dependencies for each dso (we need to > process each edge for DFS to work correctly). Or am I > misunderstanding your suggestion? I think you misunderstood. I was talking about having the full BFS list for each dso, but tagging each one with the count of entries which are direct dependencies, so that the corresponding initial subarray can be used for DFS. > If we're willing to add some fields to struct dso to implement the > correct ctor order, I'd suggest to use a non-recursive DFS with an > explicit stack instead. We'd need just two fields in each stack slot > (dso and the number of processed DT_NEEDED entries), so the memory > overhead is low compared to sizeof(struct dso). The stack can be > preallocated before "committing" dlopen(). As Markus said, the upper > bound is the number of dsos loaded with this dlopen() call because > all other dsos are already constructed, so we don't need to visit > them. This stack can be freed immediately after ctors are run. We > already have dso->constructed to mark visited dsos, and we already > use another list for finalization (dso->fini_next), and that list > would be built naturally by DFS. I think it might get more complicated with concurrent and recursive calls to dlopen (e.g. dlopen from a ctor). Right now the init_fini_lock prevents a dlopen from another thread from completing as long as ctors are still running from a previous call, except recursively (in the same thread) thanks to a recursive mutex. Really, though, we should never be holding libc locks while application code runs, so the lock should be released each time a ctor is run and re-taken after it finishes. I'd considered preallocating the explicit stack structure as part of the dsos, which is problematic because multiple callers might need to use it. However if the explicit stack were allocated by and just tied to the dlopen call frame instance, I think it should work. There may be deps that were already loaded from another thread's dlopen, though, but which were not yet constructed, so the bound is probably not the number of newly-loaded dsos but the total number of deps or the max depth of deps. This is probably each to record from building the BFS list. Rich ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-10 1:03 ` Rich Felker @ 2019-02-26 15:07 ` Rich Felker 2019-03-04 2:11 ` Rich Felker 0 siblings, 1 reply; 24+ messages in thread From: Rich Felker @ 2019-02-26 15:07 UTC (permalink / raw) To: Alexey Izbyshev; +Cc: musl On Sat, Feb 09, 2019 at 08:03:47PM -0500, Rich Felker wrote: > On Sun, Feb 10, 2019 at 01:53:02AM +0300, Alexey Izbyshev wrote: > > (Replying to https://www.openwall.com/lists/musl/2019/02/07/7) > > > > >On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote: > > >> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote: > > >> > I'm not sure if it makes sense to build both of these lists at the > > >> > same time. We currently try to avoid allocating dependency lists for > > >> > libraries loaded at startup in order not to allocate and setup data > > >> > that might never be used, defering until if/when dlopen is called on > > >> > them. I've wanted to do the ctor order walk without allocating a > > >> > dependency list, but I don't know a good way to do so. Note that the > > >> > code that runs ctors cannot allocate because, at the time it runs, > > >> > dlopen has already "committed" the load and can't back it out. It has > > >> > to have all resources it needs for the walk precommitted. > > >> > > >> Depth first means stack. Means recursion or explicit stack. Explicit is > > >> going to be hard, considering there is no known limit to dependency > > >> nesting depth. We could argue that the user better make their stack size > > >> ulimit large enough for the main thread, but I hazard a guess that > > >> nobody expects dlopen() to use overly much stack in other threads. > > >> > > >> Alright, what's the algorithm here? > > >> > > >> init(lib): > > >> if (!lib.inited): > > >> foreach d in lib.deps init(d) > > >> start_init(lib) > > >> lib.inited = 1 > > >> > > >> That about right? Because that means we need a stack as high as the > > >> nesting depth of dependencies. We can get a pessimistic estimation from > > >> the number of deps loaded, but that may be way too high (see below). > > > > > >Yes, but you can also avoid recursion just by looping to the deepest > > >dependency with !inited, then going back to the root. For a one-time > > >operation at dlopen-time or program-start time, the quadratic search > > >for each !inited seems unlikely to be a problem: > > > > > >> > Since the beginning of a list of breadth-first dependencies is just > > >> > the direct dependencies, if we recorded the number of direct > > >> > dependencies for each dso, the breadth-first lists could be used to > > >> > perform a depth-first walk to run ctors; this can be made > > >> > non-recursive, at the cost of quadratic time but with trivially-tiny > > >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > >> > constant factor, by simply restarting at the root of the tree after > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > >> > each node and finding the deepest not-yet-constructed dso. This > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > We'd need breadth-first lists for all dsos to implement DFS as you > > suggest because a single breadth-first list for the root is not > > enough to reconstruct all edges of the dependency graph even if we > > know the number of direct dependencies for each dso (we need to > > process each edge for DFS to work correctly). Or am I > > misunderstanding your suggestion? > > I think you misunderstood. I was talking about having the full BFS > list for each dso, but tagging each one with the count of entries > which are direct dependencies, so that the corresponding initial > subarray can be used for DFS. > > > If we're willing to add some fields to struct dso to implement the > > correct ctor order, I'd suggest to use a non-recursive DFS with an > > explicit stack instead. We'd need just two fields in each stack slot > > (dso and the number of processed DT_NEEDED entries), so the memory > > overhead is low compared to sizeof(struct dso). The stack can be > > preallocated before "committing" dlopen(). As Markus said, the upper > > bound is the number of dsos loaded with this dlopen() call because > > all other dsos are already constructed, so we don't need to visit > > them. This stack can be freed immediately after ctors are run. We > > already have dso->constructed to mark visited dsos, and we already > > use another list for finalization (dso->fini_next), and that list > > would be built naturally by DFS. > > I think it might get more complicated with concurrent and recursive > calls to dlopen (e.g. dlopen from a ctor). Right now the > init_fini_lock prevents a dlopen from another thread from completing > as long as ctors are still running from a previous call, except > recursively (in the same thread) thanks to a recursive mutex. Really, > though, we should never be holding libc locks while application code > runs, so the lock should be released each time a ctor is run and > re-taken after it finishes. > > I'd considered preallocating the explicit stack structure as part of > the dsos, which is problematic because multiple callers might need to > use it. However if the explicit stack were allocated by and just tied > to the dlopen call frame instance, I think it should work. There may > be deps that were already loaded from another thread's dlopen, though, > but which were not yet constructed, so the bound is probably not the > number of newly-loaded dsos but the total number of deps or the max > depth of deps. This is probably each to record from building the BFS > list. I've gotten back to thinking about this, and here's a rough idea of the algorithm I'm thinking of using: Load-direct-deps operation for a single DSO: - Count DT_NEEDED entries, store number of direct deps. - Allocate storage for direct deps list. - Call load_library for each direct dep, save pointer to it. Load-deps operation for newly dlopen'd DSO: - Start with all DSOs unmarked - Use the deps list as a queue. For each entry in queue: - If the DSO is marked, skip it. - If the DSO doesn't have a deps list, perform load-direct-deps. - Append the initial n_direct_deps entries to the queue. - Mark the DSO. - When end of queue is reached, finish and unmark all DSOs. This should leave us with direct-dep lists for all DSOs, usable for dependency-order ctor execution, and a non-redundant (currently, the list can be highly redundant and thus inefficient to dlsym) BFS deps list for the dlopen'd DSO. The direct-dep lists for DSOs not explicitly opened also form the initial segment of any future BFS deps list needed when dlopen is called on them. The above is entirely non-recursive and uses no temp storage; the data is generated in-place. Walking the direct-deps lists for ctor execution can either use recursion or the "restart" approach I've described before. Rich ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-26 15:07 ` Rich Felker @ 2019-03-04 2:11 ` Rich Felker 0 siblings, 0 replies; 24+ messages in thread From: Rich Felker @ 2019-03-04 2:11 UTC (permalink / raw) To: Alexey Izbyshev; +Cc: musl On Tue, Feb 26, 2019 at 10:07:33AM -0500, Rich Felker wrote: > On Sat, Feb 09, 2019 at 08:03:47PM -0500, Rich Felker wrote: > > On Sun, Feb 10, 2019 at 01:53:02AM +0300, Alexey Izbyshev wrote: > > > (Replying to https://www.openwall.com/lists/musl/2019/02/07/7) > > > > > > >On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote: > > > >> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote: > > > >> > I'm not sure if it makes sense to build both of these lists at the > > > >> > same time. We currently try to avoid allocating dependency lists for > > > >> > libraries loaded at startup in order not to allocate and setup data > > > >> > that might never be used, defering until if/when dlopen is called on > > > >> > them. I've wanted to do the ctor order walk without allocating a > > > >> > dependency list, but I don't know a good way to do so. Note that the > > > >> > code that runs ctors cannot allocate because, at the time it runs, > > > >> > dlopen has already "committed" the load and can't back it out. It has > > > >> > to have all resources it needs for the walk precommitted. > > > >> > > > >> Depth first means stack. Means recursion or explicit stack. Explicit is > > > >> going to be hard, considering there is no known limit to dependency > > > >> nesting depth. We could argue that the user better make their stack size > > > >> ulimit large enough for the main thread, but I hazard a guess that > > > >> nobody expects dlopen() to use overly much stack in other threads. > > > >> > > > >> Alright, what's the algorithm here? > > > >> > > > >> init(lib): > > > >> if (!lib.inited): > > > >> foreach d in lib.deps init(d) > > > >> start_init(lib) > > > >> lib.inited = 1 > > > >> > > > >> That about right? Because that means we need a stack as high as the > > > >> nesting depth of dependencies. We can get a pessimistic estimation from > > > >> the number of deps loaded, but that may be way too high (see below). > > > > > > > >Yes, but you can also avoid recursion just by looping to the deepest > > > >dependency with !inited, then going back to the root. For a one-time > > > >operation at dlopen-time or program-start time, the quadratic search > > > >for each !inited seems unlikely to be a problem: > > > > > > > >> > Since the beginning of a list of breadth-first dependencies is just > > > >> > the direct dependencies, if we recorded the number of direct > > > >> > dependencies for each dso, the breadth-first lists could be used to > > > >> > perform a depth-first walk to run ctors; this can be made > > > >> > non-recursive, at the cost of quadratic time but with trivially-tiny > > > >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > >> > constant factor, by simply restarting at the root of the tree after > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > >> > each node and finding the deepest not-yet-constructed dso. This > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > > > We'd need breadth-first lists for all dsos to implement DFS as you > > > suggest because a single breadth-first list for the root is not > > > enough to reconstruct all edges of the dependency graph even if we > > > know the number of direct dependencies for each dso (we need to > > > process each edge for DFS to work correctly). Or am I > > > misunderstanding your suggestion? > > > > I think you misunderstood. I was talking about having the full BFS > > list for each dso, but tagging each one with the count of entries > > which are direct dependencies, so that the corresponding initial > > subarray can be used for DFS. > > > > > If we're willing to add some fields to struct dso to implement the > > > correct ctor order, I'd suggest to use a non-recursive DFS with an > > > explicit stack instead. We'd need just two fields in each stack slot > > > (dso and the number of processed DT_NEEDED entries), so the memory > > > overhead is low compared to sizeof(struct dso). The stack can be > > > preallocated before "committing" dlopen(). As Markus said, the upper > > > bound is the number of dsos loaded with this dlopen() call because > > > all other dsos are already constructed, so we don't need to visit > > > them. This stack can be freed immediately after ctors are run. We > > > already have dso->constructed to mark visited dsos, and we already > > > use another list for finalization (dso->fini_next), and that list > > > would be built naturally by DFS. > > > > I think it might get more complicated with concurrent and recursive > > calls to dlopen (e.g. dlopen from a ctor). Right now the > > init_fini_lock prevents a dlopen from another thread from completing > > as long as ctors are still running from a previous call, except > > recursively (in the same thread) thanks to a recursive mutex. Really, > > though, we should never be holding libc locks while application code > > runs, so the lock should be released each time a ctor is run and > > re-taken after it finishes. > > > > I'd considered preallocating the explicit stack structure as part of > > the dsos, which is problematic because multiple callers might need to > > use it. However if the explicit stack were allocated by and just tied > > to the dlopen call frame instance, I think it should work. There may > > be deps that were already loaded from another thread's dlopen, though, > > but which were not yet constructed, so the bound is probably not the > > number of newly-loaded dsos but the total number of deps or the max > > depth of deps. This is probably each to record from building the BFS > > list. > > I've gotten back to thinking about this, and here's a rough idea of > the algorithm I'm thinking of using: > > Load-direct-deps operation for a single DSO: > - Count DT_NEEDED entries, store number of direct deps. > - Allocate storage for direct deps list. > - Call load_library for each direct dep, save pointer to it. > > Load-deps operation for newly dlopen'd DSO: > - Start with all DSOs unmarked > - Use the deps list as a queue. For each entry in queue: > - If the DSO is marked, skip it. > - If the DSO doesn't have a deps list, perform load-direct-deps. > - Append the initial n_direct_deps entries to the queue. > - Mark the DSO. > - When end of queue is reached, finish and unmark all DSOs. > > This should leave us with direct-dep lists for all DSOs, usable for > dependency-order ctor execution, and a non-redundant (currently, the > list can be highly redundant and thus inefficient to dlsym) BFS deps > list for the dlopen'd DSO. The direct-dep lists for DSOs not > explicitly opened also form the initial segment of any future BFS deps > list needed when dlopen is called on them. > > The above is entirely non-recursive and uses no temp storage; the data > is generated in-place. Walking the direct-deps lists for ctor > execution can either use recursion or the "restart" approach I've > described before. These and related changes are now upstream in git master, so the dlsym issue should be fixed. Please let me know if it still doesn't work right or if there are new problems. Rich ^ permalink raw reply [flat|nested] 24+ messages in thread
* dlsym(handle) may search in unrelated libraries @ 2019-02-05 21:02 Alexey Izbyshev 2019-02-06 13:40 ` Alexey Izbyshev 2019-02-06 16:02 ` Markus Wichmann 0 siblings, 2 replies; 24+ messages in thread From: Alexey Izbyshev @ 2019-02-05 21:02 UTC (permalink / raw) To: musl Hello! I've discovered a bug in musl dynamic loader (tested on 1.1.21) which is demonstrated by the following simple example: $ cat bar.c int bar = 42; $ musl-gcc -fPIC -shared bar.c -o libbar.so $ cat foo.c extern int bar; int *foo = &bar; $ musl-gcc -fPIC -shared foo.c -L. -lbar -Wl,-rpath='$ORIGIN' -o libfoo.so $ cat main.c #include <dlfcn.h> #include <stdio.h> int main(void) { if (!dlopen("libfoo.so", RTLD_NOW)) return 1; void *h = dlopen("libc.so.6", RTLD_NOW); printf("%p\n", dlsym(h, "bar")); } $ musl-gcc main.c -Wl,-rpath='$ORIGIN' -ldl $ ./a.out 0x7fd7ebe96020 dlsym(handle) is supposed to search only in the library referred to by the handle and in its dependencies. "libc.so.6" doesn't have dependencies and doesn't have a definition for "bar", so dlsym(h, "bar") should return NULL, but it finds "bar" in libbar.so instead. The problem occurs because of the following: 1) Initially, "deps" in dso structure for libc.so.6 is NULL. 2) When dlopen("libc.so.6") is called, "first_load" is true, despite that it's not actually the first load (ldso/dynlink.c:1835): /* First load handling */ int first_load = !p->deps; if (first_load) { load_deps(p); 3) load_deps() then iterates over the dso list starting from "libc.so.6", treating all libraries found in DT_NEEDED of each processed dso as dependencies of "libc.so.6". However, the dso list already contains "libfoo.so" loaded earlier, so "libbar.so" (which is needed by "libfoo.so") is treated as a dependency of "libc.so.6". As a result, dlsym(h, "bar") succeeds. It's also notable that "libfoo.so" and "libbar.so" were loaded with RTLD_LOCAL, but this bug effectively makes their symbols available in such searches regardless of the scope of a library used with dlsym(). ISTM that load_deps(p) was written to work only in real "first load" situations, where "p" is initially the last dso in the list, and new dsos are only added to the list in the course of recursive loading of the dependencies of "p". Could this be fixed? Thanks! (Please CC me on replying, I'm not subscribed to the list.) Alexey ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-05 21:02 Alexey Izbyshev @ 2019-02-06 13:40 ` Alexey Izbyshev 2019-02-06 16:02 ` Markus Wichmann 1 sibling, 0 replies; 24+ messages in thread From: Alexey Izbyshev @ 2019-02-06 13:40 UTC (permalink / raw) To: musl Alexander Monakov noticed a confusing detail in my report: I used "libc.so.6" in my example despite that musl's "make install" creates "libc.so" only. To clarify, "libc.so.6" has nothing to do with glibc, and "libc.so.6" may be replaced with just "libc.so" in my example without any change in semantics. Alexey On 2/6/19 12:02 AM, Alexey Izbyshev wrote: > Hello! > > I've discovered a bug in musl dynamic loader (tested on 1.1.21) which > is demonstrated by the following simple example: > > $ cat bar.c > int bar = 42; > $ musl-gcc -fPIC -shared bar.c -o libbar.so > $ cat foo.c > extern int bar; > int *foo = &bar; > $ musl-gcc -fPIC -shared foo.c -L. -lbar -Wl,-rpath='$ORIGIN' -o > libfoo.so > $ cat main.c > #include <dlfcn.h> > #include <stdio.h> > > int main(void) { > if (!dlopen("libfoo.so", RTLD_NOW)) > return 1; > void *h = dlopen("libc.so.6", RTLD_NOW); > printf("%p\n", dlsym(h, "bar")); > } > $ musl-gcc main.c -Wl,-rpath='$ORIGIN' -ldl > $ ./a.out > 0x7fd7ebe96020 > > dlsym(handle) is supposed to search only in the library referred to by > the handle and in its dependencies. "libc.so.6" doesn't have > dependencies and doesn't have a definition for "bar", so dlsym(h, > "bar") should return NULL, but it finds "bar" in libbar.so instead. > > The problem occurs because of the following: > 1) Initially, "deps" in dso structure for libc.so.6 is NULL. > 2) When dlopen("libc.so.6") is called, "first_load" is true, despite > that it's not actually the first load (ldso/dynlink.c:1835): > > /* First load handling */ > int first_load = !p->deps; > if (first_load) { > load_deps(p); > > 3) load_deps() then iterates over the dso list starting from > "libc.so.6", treating all libraries found in DT_NEEDED of each > processed dso as dependencies of "libc.so.6". However, the dso list > already contains "libfoo.so" loaded earlier, so "libbar.so" (which is > needed by "libfoo.so") is treated as a dependency of "libc.so.6". As a > result, dlsym(h, "bar") succeeds. > > It's also notable that "libfoo.so" and "libbar.so" were loaded with > RTLD_LOCAL, but this bug effectively makes their symbols available in > such searches regardless of the scope of a library used with dlsym(). > > ISTM that load_deps(p) was written to work only in real "first load" > situations, where "p" is initially the last dso in the list, and new > dsos are only added to the list in the course of recursive loading of > the dependencies of "p". > > Could this be fixed? Thanks! > > (Please CC me on replying, I'm not subscribed to the list.) > > Alexey ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-05 21:02 Alexey Izbyshev 2019-02-06 13:40 ` Alexey Izbyshev @ 2019-02-06 16:02 ` Markus Wichmann 2019-02-06 17:02 ` Alexey Izbyshev 1 sibling, 1 reply; 24+ messages in thread From: Markus Wichmann @ 2019-02-06 16:02 UTC (permalink / raw) To: musl; +Cc: Alexey Izbyshev [-- Attachment #1: Type: text/plain, Size: 1260 bytes --] On Wed, Feb 06, 2019 at 12:02:39AM +0300, Alexey Izbyshev wrote: > Hello! > > I've discovered a bug in musl dynamic loader (tested on 1.1.21) which is > demonstrated by the following simple example: > > $ cat bar.c > int bar = 42; > $ musl-gcc -fPIC -shared bar.c -o libbar.so > $ cat foo.c > extern int bar; > int *foo = &bar; > $ musl-gcc -fPIC -shared foo.c -L. -lbar -Wl,-rpath='$ORIGIN' -o libfoo.so > $ cat main.c > #include <dlfcn.h> > #include <stdio.h> > > int main(void) { > if (!dlopen("libfoo.so", RTLD_NOW)) > return 1; > void *h = dlopen("libc.so.6", RTLD_NOW); > printf("%p\n", dlsym(h, "bar")); > } > $ musl-gcc main.c -Wl,-rpath='$ORIGIN' -ldl > $ ./a.out > 0x7fd7ebe96020 > In case you were wondering, your typo here doesn't change anything, since "libc.so.6" has the prefix "libc.", which is recogized as reserved in load_library(), and makes dlopen() return a handle to the libc. Thankfully the patch is simple: Explicitly make ldso and vdso have no deps. I was tempted to put this into kernel_mapped_dso(), but then I remembered that the app is also a kernel mapped dso, and it usually does have deps that need processing. At least, in nontrivial cases. The attached patch should tide you over. > > Alexey Ciao, Markus [-- Attachment #2: 0006-Make-libc-and-vdso-explicitly-have-no-deps.patch --] [-- Type: text/x-diff, Size: 1443 bytes --] From e823910d69ff56ffccecaa9b29fd4b67b901798a Mon Sep 17 00:00:00 2001 From: Markus Wichmann <nullplan@gmx.net> Date: Wed, 6 Feb 2019 16:51:53 +0100 Subject: [PATCH 6/6] Make libc and vdso explicitly have no deps. Alexey Izbyshev reported that without this, dlopen("libc.so") returns a handle that is capable of finding every symbol in libraries loaded as dependencies, since dso->deps == 0 usually means dependencies haven't been loaded. --- ldso/dynlink.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/ldso/dynlink.c b/ldso/dynlink.c index ec921dfd..6ffeca85 100644 --- a/ldso/dynlink.c +++ b/ldso/dynlink.c @@ -1244,6 +1244,7 @@ static void reloc_all(struct dso *p) static void kernel_mapped_dso(struct dso *p) { size_t min_addr = -1, max_addr = 0, cnt; + static const struct dso *sentinel = 0; Phdr *ph = p->phdr; for (cnt = p->phnum; cnt--; ph = (void *)((char *)ph + p->phentsize)) { if (ph->p_type == PT_DYNAMIC) { @@ -1428,6 +1429,7 @@ hidden void __dls2(unsigned char *base, size_t *sp) ldso.phdr = laddr(&ldso, ehdr->e_phoff); ldso.phentsize = ehdr->e_phentsize; kernel_mapped_dso(&ldso); + ldso.deps = (struct dso**)&nodeps_dummy; decode_dyn(&ldso); if (DL_FDPIC) makefuncdescs(&ldso); @@ -1675,6 +1677,7 @@ _Noreturn void __dls3(size_t *sp) vdso.prev = tail; tail->next = &vdso; tail = &vdso; + vdso.deps = (struct dso**)&nodeps_dummy; } for (i=0; app.dynv[i]; i+=2) { -- 2.20.1 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-06 16:02 ` Markus Wichmann @ 2019-02-06 17:02 ` Alexey Izbyshev 2019-02-06 20:25 ` Markus Wichmann 0 siblings, 1 reply; 24+ messages in thread From: Alexey Izbyshev @ 2019-02-06 17:02 UTC (permalink / raw) To: Markus Wichmann, musl On 2/6/19 7:02 PM, Markus Wichmann wrote: > > Thankfully the patch is simple: Explicitly make ldso and vdso have no > deps. I was tempted to put this into kernel_mapped_dso(), but then I > remembered that the app is also a kernel mapped dso, and it usually does > have deps that need processing. At least, in nontrivial cases. > > The attached patch should tide you over. > Thank you for the quick response and the patch, Markus! Your patch fixes the exact test case I posted. Unfortunately, my test case was a simplified example of a general problem: dso->deps is assigned only for the main app and for libraries opened with dlopen(), but not for their dependencies. Consider the following: $ cat bar.c int bar = 42; $ musl-gcc -fPIC -shared bar.c -o libbar.so $ cat foo.c extern int bar; int *foo = &bar; $ cat baz.c extern int bazdep; int *baz = &bazdep; $ cat bazdep.c int bazdep = 1; $ cat main.c #include <dlfcn.h> #include <stdio.h> int main(void) { if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL)) return 1; if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL)) return 1; void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL); printf("%p\n", dlsym(h, "bar")); } $ musl-gcc main.c -Wl,-rpath='$ORIGIN' -ldl $ ./a.out 0x7f66ed371020 Here, "libbazdep.so" assumes the role of "libc.so" from the previous test: it's a library with dso->deps == NULL that is loaded before "libfoo.so". So, when "libbazdep.so" is dlopen'd, musl considers it to be a "first load" and erroneously includes "libbar.so" to the list of dependencies of "libbazdep.so". Alexey ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-06 17:02 ` Alexey Izbyshev @ 2019-02-06 20:25 ` Markus Wichmann 2019-02-06 21:23 ` Alexey Izbyshev 0 siblings, 1 reply; 24+ messages in thread From: Markus Wichmann @ 2019-02-06 20:25 UTC (permalink / raw) To: musl; +Cc: Alexey Izbyshev [-- Attachment #1: Type: text/plain, Size: 999 bytes --] On Wed, Feb 06, 2019 at 08:02:28PM +0300, Alexey Izbyshev wrote: > Unfortunately, my test case was a simplified example of a general problem: > dso->deps is assigned only for the main app and for libraries opened with > dlopen(), but not for their dependencies. Consider the following: Right you are. It took me a while to understand what the deps array was even for (since musl's dlclose() doesn't do anything, tracking dependencies is mostly pointless), but I found it is needed for lazy relocation processing. So it is necessary for all libs opened by dlopen() directly to contain a list of all their dependencies. All the other libs can have an empty list. So I propose the attached patch in addition to the previous one. This will set dso->deps to the empty list in all libs not directly loaded from dlopen(). The previous patch is still necessary, as nothing ever calls load_deps() on the libc or the vdso, but all other modules get a load_deps() treatment. > > Alexey > > Ciao, Markus [-- Attachment #2: 0007-Initialize-deps-on-non-directly-loaded-libs.patch --] [-- Type: text/x-diff, Size: 964 bytes --] From 841dd4e075040e2aeb01adea8ef5e2f7c0fc006a Mon Sep 17 00:00:00 2001 From: Markus Wichmann <nullplan@gmx.net> Date: Wed, 6 Feb 2019 21:13:05 +0100 Subject: [PATCH 7/7] Initialize deps on non-directly loaded libs. As pointed out by Alexey Izbyshev, having the deps member be zero opens dlopen() and dlsym() up to malfunctions, if a library was previously loaded as dependency and is then dlopen()ed. Therefore, we now set the deps member of the dso descriptor to the sentry value in all libs loaded as dependencies. --- ldso/dynlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ldso/dynlink.c b/ldso/dynlink.c index 6ffeca85..f8346c54 100644 --- a/ldso/dynlink.c +++ b/ldso/dynlink.c @@ -1158,8 +1158,8 @@ static void load_deps(struct dso *p) *deps = tmp; } } + if (!p->deps) p->deps = (struct dso**)&nodeps_dummy; } - if (!*deps) *deps = (struct dso **)&nodeps_dummy; } static void load_preload(char *s) -- 2.20.1 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-06 20:25 ` Markus Wichmann @ 2019-02-06 21:23 ` Alexey Izbyshev 2019-02-07 5:33 ` Markus Wichmann 0 siblings, 1 reply; 24+ messages in thread From: Alexey Izbyshev @ 2019-02-06 21:23 UTC (permalink / raw) To: Markus Wichmann; +Cc: musl On 2019-02-06 23:25, Markus Wichmann wrote: > Right you are. It took me a while to understand what the deps array was > even for (since musl's dlclose() doesn't do anything, tracking > dependencies is mostly pointless), but I found it is needed for lazy > relocation processing. So it is necessary for all libs opened by > dlopen() directly to contain a list of all their dependencies. All the > other libs can have an empty list. Actually, dso->deps is used in dlsym(handle) because it must use the dependency order for symbol search, so it's incorrect to have deps empty for "all the other" libs. Consider the following modification of my previous example: $ cat bazdep.c int bazdep = 1; extern int bazdepdep; int *p = &bazdepdep; $ cat bazdepdep.c int bazdepdep = 2; $ cat main.c #include <dlfcn.h> #include <stdio.h> int main(void) { if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL)) return 1; if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL)) return 1; void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL); printf("%p\n", dlsym(h, "bar")); printf("%p\n", dlsym(h, "bazdepdep")); } The correct output is zero in the first line and some non-zero address in the second. Vanilla musl 1.1.21 prints two non-zero addresses. But with your patch the output is two zeros because dlsym() can't search in dependencies of "libbazdep.so" anymore. Alexey ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-06 21:23 ` Alexey Izbyshev @ 2019-02-07 5:33 ` Markus Wichmann 2019-02-07 13:42 ` Alexey Izbyshev 2019-02-07 16:54 ` Rich Felker 0 siblings, 2 replies; 24+ messages in thread From: Markus Wichmann @ 2019-02-07 5:33 UTC (permalink / raw) To: Alexey Izbyshev; +Cc: musl On Thu, Feb 07, 2019 at 12:23:06AM +0300, Alexey Izbyshev wrote: > On 2019-02-06 23:25, Markus Wichmann wrote: > > Right you are. It took me a while to understand what the deps array was > > even for (since musl's dlclose() doesn't do anything, tracking > > dependencies is mostly pointless), but I found it is needed for lazy > > relocation processing. So it is necessary for all libs opened by > > dlopen() directly to contain a list of all their dependencies. All the > > other libs can have an empty list. > > Actually, dso->deps is used in dlsym(handle) because it must use the > dependency order for symbol search, so it's incorrect to have deps empty for > "all the other" libs. Consider the following modification of my previous > example: > > $ cat bazdep.c > int bazdep = 1; > extern int bazdepdep; > int *p = &bazdepdep; > $ cat bazdepdep.c > int bazdepdep = 2; > $ cat main.c > #include <dlfcn.h> > #include <stdio.h> > > int main(void) { > if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL)) > return 1; > if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL)) > return 1; > void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL); > printf("%p\n", dlsym(h, "bar")); > printf("%p\n", dlsym(h, "bazdepdep")); > } > > The correct output is zero in the first line and some non-zero address in > the second. Vanilla musl 1.1.21 prints two non-zero addresses. But with your > patch the output is two zeros because dlsym() can't search in dependencies > of "libbazdep.so" anymore. > > Alexey OK, so life just got more interesting. I gather the deps handling was always incorrect. Let's consider the original code. liba depends on libb, which depends on libc. dlopen("liba") returns a handle with libb and libc in the deps, but libb->deps == 0. If we now call dlopen("libb"), that does the right thing, but only because libb happens to be the last lib in the chain. If we'd have loaded libx, liby, and libz before trying libb, it would add all the symbols of libs x, y, and z to the libb handle. I guess the hope was that this situation never arrises. So how do we fix this? I think the easiest is probably going to be to patch up load_deps, but avoiding recursion is going to be the fun part. My plan is to make dso->deps contain all direct and indirect dependencies (which is what the code seems to depend on, anyway). This is going to consume more memory, but we are talking a few pointers, and we are dealing with shared libs, anyway. As you said, order is important. What is the correct order, depth-first or breadth-first? I think it should be depth-first, but lack any authoritative knowledge on this. It would make the most sense, anyway (if, from the point of view of a user a library contains all the symbols of its dependencies, then those dependencies must also contain all the symbols of their dependencies). So with the following dependency tree: liba->libb->libc `>libx->liby the handle for liba would list libc before libx. Easiest implementation is probably still going to be recursive. Let's hope the dependency trees don't get too wild. I'll look into it after work. Ciao, Markus ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 5:33 ` Markus Wichmann @ 2019-02-07 13:42 ` Alexey Izbyshev 2019-02-07 17:43 ` Markus Wichmann 2019-02-07 16:54 ` Rich Felker 1 sibling, 1 reply; 24+ messages in thread From: Alexey Izbyshev @ 2019-02-07 13:42 UTC (permalink / raw) To: Markus Wichmann; +Cc: musl On 2/7/19 8:33 AM, Markus Wichmann wrote: > Let's consider the original code. liba depends on libb, which depends on > libc. dlopen("liba") returns a handle with libb and libc in the deps, > but libb->deps == 0. If we now call dlopen("libb"), that does the right > thing, but only because libb happens to be the last lib in the chain. If > we'd have loaded libx, liby, and libz before trying libb, it would add > all the symbols of libs x, y, and z to the libb handle. Your description almost captures the problem, but is imprecise in the last part: "it would add all the symbols of libs x, y, and z to the libb handle". load_deps() looks only at DT_NEEDED entries of libraries it iterates over, so, for example, if libx depends on both liby and libz, then liby and libz (but not libx) would be added to deps of libb. Moreover, consider the following dependency hierarchy (loaded on dlopen("liba")): liba libb libd libe In this case, even dlopen("libb") wouldn't do the right thing because load_deps() would find libe in DT_NEEDED of libd and add it to deps of libb. > > As you said, order is important. What is the correct order, depth-first > or breadth-first? I think it should be depth-first, but lack any > authoritative knowledge on this. dlsym(handle) uses so-called "dependency order"[1], which is breadth-first[2]. This is what musl current does in cases when load_deps() is called on a real first load of a library (so that everything that's further in the dso list are implicitly loaded dependencies of this library). So with the following dependency tree: > > liba->libb->libc > `>libx->liby > > the handle for liba would list libc before libx. > The correct order is what load_deps() does currently: liba libb libx libc liby > Easiest implementation is probably still going to be recursive. Let's hope the dependency trees don't get too wild. I think the easiest way is simply to modify load_deps() to always traverse DT_NEEDED in breadth-first order without relying on the dso list in the outer loop. load_deps() already effectively maintains a queue (deps) that can be used for BFS, so no recursion is needed. > > I'll look into it after work. > Thanks for following this up, Markus! [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html [2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html Alexey ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 13:42 ` Alexey Izbyshev @ 2019-02-07 17:43 ` Markus Wichmann 2019-02-07 20:37 ` Markus Wichmann 2019-02-07 21:29 ` Rich Felker 0 siblings, 2 replies; 24+ messages in thread From: Markus Wichmann @ 2019-02-07 17:43 UTC (permalink / raw) To: Alexey Izbyshev; +Cc: musl [-- Attachment #1: Type: text/plain, Size: 1861 bytes --] On Thu, Feb 07, 2019 at 04:42:15PM +0300, Alexey Izbyshev wrote: > I think the easiest way is simply to modify load_deps() to always traverse > DT_NEEDED in breadth-first order without relying on the dso list in the > outer loop. load_deps() already effectively maintains a queue (deps) that > can be used for BFS, so no recursion is needed. OK, since we have to implement a BFS, that does in fact work. So I implemented that. Still needs testing, though. One side effect is, patch 7 from the previous mail was reverted. Another is that now load_deps() depends on the deps array as loop structure. I was almost as far as just using the runtime code and adding an a_crash() in case of allocation failure during loadtime, but then I decided to just split loadtime and runtime apart. So load_deps_loadtime() is just a copy of load_deps(), refactored with the assumption runtime==0, and load_deps_runtime() is a copy of load_deps(), with the patch discussed here and refactored under the assumption runtime!=0. I had noticed, during the refactoring, that this means that app->deps == {0}, always. So I wondered if that might bite us. However, the only normal way to obtain a handle to the app itself is to call dlsym() with RTLD_NEXT. Which is one of the special symbols that will load symbols from the given DSO and all following ones in the symbol list. And all of the main app's dependencies are immediately added to the symbol list after the first load_deps() call (now load_deps_loadtime()). For Rich's comfort, I am attaching patch 6 again, so all relevant patches are in one mail. > [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html > [2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html > Right you are, again. Maybe I should read more POSIX before working on these things. :-) > Alexey > Ciao, Markus [-- Attachment #2: 0006-Make-libc-and-vdso-explicitly-have-no-deps.patch --] [-- Type: text/x-diff, Size: 1443 bytes --] From e823910d69ff56ffccecaa9b29fd4b67b901798a Mon Sep 17 00:00:00 2001 From: Markus Wichmann <nullplan@gmx.net> Date: Wed, 6 Feb 2019 16:51:53 +0100 Subject: [PATCH 6/9] Make libc and vdso explicitly have no deps. Alexey Izbyshev reported that without this, dlopen("libc.so") returns a handle that is capable of finding every symbol in libraries loaded as dependencies, since dso->deps == 0 usually means dependencies haven't been loaded. --- ldso/dynlink.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/ldso/dynlink.c b/ldso/dynlink.c index ec921dfd..6ffeca85 100644 --- a/ldso/dynlink.c +++ b/ldso/dynlink.c @@ -1244,6 +1244,7 @@ static void reloc_all(struct dso *p) static void kernel_mapped_dso(struct dso *p) { size_t min_addr = -1, max_addr = 0, cnt; + static const struct dso *sentinel = 0; Phdr *ph = p->phdr; for (cnt = p->phnum; cnt--; ph = (void *)((char *)ph + p->phentsize)) { if (ph->p_type == PT_DYNAMIC) { @@ -1428,6 +1429,7 @@ hidden void __dls2(unsigned char *base, size_t *sp) ldso.phdr = laddr(&ldso, ehdr->e_phoff); ldso.phentsize = ehdr->e_phentsize; kernel_mapped_dso(&ldso); + ldso.deps = (struct dso**)&nodeps_dummy; decode_dyn(&ldso); if (DL_FDPIC) makefuncdescs(&ldso); @@ -1675,6 +1677,7 @@ _Noreturn void __dls3(size_t *sp) vdso.prev = tail; tail->next = &vdso; tail = &vdso; + vdso.deps = (struct dso**)&nodeps_dummy; } for (i=0; app.dynv[i]; i+=2) { -- 2.20.1 [-- Attachment #3: 0009-Fix-runtime-dependency-accounting-in-dlopen.patch --] [-- Type: text/x-diff, Size: 3228 bytes --] From 18008eb03acd59f6cbaa82c607f1969c70707e21 Mon Sep 17 00:00:00 2001 From: Markus Wichmann <nullplan@gmx.net> Date: Thu, 7 Feb 2019 18:17:25 +0100 Subject: [PATCH 9/9] Fix runtime dependency accounting in dlopen(). As Alexey Izbyshev pointed out, the library given as argument to dlopen() does not necessarily have to be the last in the chain. Nor do any of the dependencies. Therefore it is wrong to assume that walking the chain of libraries from any of them forward will only walk over dependencies of the freshly loaded library. I had to split the runtime and loadtime paths of load_deps() apart, or otherwise I would have had to allocate the deps array for the application at loadtime. And then I would have needed a resolution for allocation failure, which would have been a crash. --- ldso/dynlink.c | 43 ++++++++++++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 15 deletions(-) diff --git a/ldso/dynlink.c b/ldso/dynlink.c index 6ffeca85..66e6f18b 100644 --- a/ldso/dynlink.c +++ b/ldso/dynlink.c @@ -1136,10 +1136,10 @@ static struct dso *load_library(const char *name, struct dso *needed_by) return p; } -static void load_deps(struct dso *p) -{ - size_t i, ndeps=0; - struct dso ***deps = &p->deps, **tmp, *dep; +static void load_deps_loadtime(struct dso *p) { + size_t i; + struct dso *dep; + p->deps = (struct dso**)&nodeps_dummy; for (; p; p=p->next) { for (i=0; p->dynv[i]; i+=2) { if (p->dynv[i] != DT_NEEDED) continue; @@ -1147,19 +1147,32 @@ static void load_deps(struct dso *p) if (!dep) { error("Error loading shared library %s: %m (needed by %s)", p->strings + p->dynv[i+1], p->name); - if (runtime) longjmp(*rtld_fail, 1); - continue; } - if (runtime) { - tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2)); - if (!tmp) longjmp(*rtld_fail, 1); - tmp[ndeps++] = dep; - tmp[ndeps] = 0; - *deps = tmp; + } + } +} + +static void load_deps_runtime(struct dso *p) +{ + size_t i, ndeps=0, j=0; + struct dso ***deps = &p->deps, **tmp, *dep; + for (; p; p=(*deps)[j++]) { + for (i=0; p->dynv[i]; i+=2) { + if (p->dynv[i] != DT_NEEDED) continue; + dep = load_library(p->strings + p->dynv[i+1], p); + if (!dep) { + error("Error loading shared library %s: %m (needed by %s)", + p->strings + p->dynv[i+1], p->name); + longjmp(*rtld_fail, 1); } + tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2)); + if (!tmp) longjmp(*rtld_fail, 1); + tmp[ndeps++] = dep; + tmp[ndeps] = 0; + *deps = tmp; } } - if (!*deps) *deps = (struct dso **)&nodeps_dummy; + if (!*deps) *deps = (struct dso**)&nodeps_dummy; } static void load_preload(char *s) @@ -1653,7 +1666,7 @@ _Noreturn void __dls3(size_t *sp) /* Load preload/needed libraries, add symbols to global namespace. */ if (env_preload) load_preload(env_preload); - load_deps(&app); + load_deps_loadtime(&app); for (struct dso *p=head; p; p=p->next) add_syms(p); @@ -1836,7 +1849,7 @@ void *dlopen(const char *file, int mode) /* First load handling */ int first_load = !p->deps; if (first_load) { - load_deps(p); + load_deps_runtime(p); if (!p->relocated && (mode & RTLD_LAZY)) { prepare_lazy(p); for (i=0; p->deps[i]; i++) -- 2.20.1 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 17:43 ` Markus Wichmann @ 2019-02-07 20:37 ` Markus Wichmann 2019-02-07 21:29 ` Rich Felker 1 sibling, 0 replies; 24+ messages in thread From: Markus Wichmann @ 2019-02-07 20:37 UTC (permalink / raw) To: Alexey Izbyshev; +Cc: musl [-- Attachment #1: Type: text/plain, Size: 316 bytes --] I accidentally added a crash. Namely exactly here: > +static void load_deps_runtime(struct dso *p) > +{ > + size_t i, ndeps=0, j=0; > + struct dso ***deps = &p->deps, **tmp, *dep; > + for (; p; p=(*deps)[j++]) { *deps might be null pointer, which this expression is dereferencing. Patch is attached. Ciao, Markus [-- Attachment #2: 0010-Fix-crash-bug-from-previous-commit.patch --] [-- Type: text/x-diff, Size: 983 bytes --] From d90e719cccfe09439324b66cd8894d3781b00048 Mon Sep 17 00:00:00 2001 From: Markus Wichmann <nullplan@gmx.net> Date: Thu, 7 Feb 2019 21:35:17 +0100 Subject: [PATCH 10/10] Fix crash bug from previous commit. The previous revision would crash if dlopen() was called for a library with no dependencies. --- ldso/dynlink.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ldso/dynlink.c b/ldso/dynlink.c index 66e6f18b..85c3db75 100644 --- a/ldso/dynlink.c +++ b/ldso/dynlink.c @@ -1136,7 +1136,8 @@ static struct dso *load_library(const char *name, struct dso *needed_by) return p; } -static void load_deps_loadtime(struct dso *p) { +static void load_deps_loadtime(struct dso *p) +{ size_t i; struct dso *dep; p->deps = (struct dso**)&nodeps_dummy; @@ -1171,6 +1172,7 @@ static void load_deps_runtime(struct dso *p) tmp[ndeps] = 0; *deps = tmp; } + if (!*deps) break; } if (!*deps) *deps = (struct dso**)&nodeps_dummy; } -- 2.20.1 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 17:43 ` Markus Wichmann 2019-02-07 20:37 ` Markus Wichmann @ 2019-02-07 21:29 ` Rich Felker 1 sibling, 0 replies; 24+ messages in thread From: Rich Felker @ 2019-02-07 21:29 UTC (permalink / raw) To: musl On Thu, Feb 07, 2019 at 06:43:12PM +0100, Markus Wichmann wrote: > On Thu, Feb 07, 2019 at 04:42:15PM +0300, Alexey Izbyshev wrote: > > I think the easiest way is simply to modify load_deps() to always traverse > > DT_NEEDED in breadth-first order without relying on the dso list in the > > outer loop. load_deps() already effectively maintains a queue (deps) that > > can be used for BFS, so no recursion is needed. > > OK, since we have to implement a BFS, that does in fact work. So I > implemented that. Still needs testing, though. Comments below: > One side effect is, patch 7 from the previous mail was reverted. > > Another is that now load_deps() depends on the deps array as loop > structure. I was almost as far as just using the runtime code and adding > an a_crash() in case of allocation failure during loadtime, but then I > decided to just split loadtime and runtime apart. So > load_deps_loadtime() is just a copy of load_deps(), refactored with the > assumption runtime==0, and load_deps_runtime() is a copy of load_deps(), > with the patch discussed here and refactored under the assumption > runtime!=0. The error() function sets ldso_fail to true, which prevents running of the program. So it should work just fine to keep them together, and to build the deps lists at program start time if we want (which is still an open question, I think). > I had noticed, during the refactoring, that this means that app->deps == > {0}, always. So I wondered if that might bite us. However, the only > normal way to obtain a handle to the app itself is to call dlsym() with > RTLD_NEXT. Which is one of the special symbols that will load symbols > from the given DSO and all following ones in the symbol list. And all of > the main app's dependencies are immediately added to the symbol list > after the first load_deps() call (now load_deps_loadtime()). This seems correct. > For Rich's comfort, I am attaching patch 6 again, so all relevant > patches are in one mail. One thing to fix in it, see below.. > From e823910d69ff56ffccecaa9b29fd4b67b901798a Mon Sep 17 00:00:00 2001 > From: Markus Wichmann <nullplan@gmx.net> > Date: Wed, 6 Feb 2019 16:51:53 +0100 > Subject: [PATCH 6/9] Make libc and vdso explicitly have no deps. > > Alexey Izbyshev reported that without this, dlopen("libc.so") returns a > handle that is capable of finding every symbol in libraries loaded as > dependencies, since dso->deps == 0 usually means dependencies haven't > been loaded. > --- > ldso/dynlink.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/ldso/dynlink.c b/ldso/dynlink.c > index ec921dfd..6ffeca85 100644 > --- a/ldso/dynlink.c > +++ b/ldso/dynlink.c > @@ -1244,6 +1244,7 @@ static void reloc_all(struct dso *p) > static void kernel_mapped_dso(struct dso *p) > { > size_t min_addr = -1, max_addr = 0, cnt; > + static const struct dso *sentinel = 0; This fragment is unused and looks like cruft leftover from an earlier idea. > Phdr *ph = p->phdr; > for (cnt = p->phnum; cnt--; ph = (void *)((char *)ph + p->phentsize)) { > if (ph->p_type == PT_DYNAMIC) { > @@ -1428,6 +1429,7 @@ hidden void __dls2(unsigned char *base, size_t *sp) > ldso.phdr = laddr(&ldso, ehdr->e_phoff); > ldso.phentsize = ehdr->e_phentsize; > kernel_mapped_dso(&ldso); > + ldso.deps = (struct dso**)&nodeps_dummy; > decode_dyn(&ldso); > > if (DL_FDPIC) makefuncdescs(&ldso); > @@ -1675,6 +1677,7 @@ _Noreturn void __dls3(size_t *sp) > vdso.prev = tail; > tail->next = &vdso; > tail = &vdso; > + vdso.deps = (struct dso**)&nodeps_dummy; > } Style nit: (struct dso **) > From 18008eb03acd59f6cbaa82c607f1969c70707e21 Mon Sep 17 00:00:00 2001 > From: Markus Wichmann <nullplan@gmx.net> > Date: Thu, 7 Feb 2019 18:17:25 +0100 > Subject: [PATCH 9/9] Fix runtime dependency accounting in dlopen(). > > As Alexey Izbyshev pointed out, the library given as argument to > dlopen() does not necessarily have to be the last in the chain. Nor do > any of the dependencies. Therefore it is wrong to assume that walking > the chain of libraries from any of them forward will only walk over > dependencies of the freshly loaded library. > > I had to split the runtime and loadtime paths of load_deps() apart, or > otherwise I would have had to allocate the deps array for the > application at loadtime. And then I would have needed a resolution for > allocation failure, which would have been a crash. > --- > ldso/dynlink.c | 43 ++++++++++++++++++++++++++++--------------- > 1 file changed, 28 insertions(+), 15 deletions(-) > > diff --git a/ldso/dynlink.c b/ldso/dynlink.c > index 6ffeca85..66e6f18b 100644 > --- a/ldso/dynlink.c > +++ b/ldso/dynlink.c > @@ -1136,10 +1136,10 @@ static struct dso *load_library(const char *name, struct dso *needed_by) > return p; > } > > -static void load_deps(struct dso *p) > -{ > - size_t i, ndeps=0; > - struct dso ***deps = &p->deps, **tmp, *dep; > +static void load_deps_loadtime(struct dso *p) { > + size_t i; > + struct dso *dep; > + p->deps = (struct dso**)&nodeps_dummy; > for (; p; p=p->next) { > for (i=0; p->dynv[i]; i+=2) { > if (p->dynv[i] != DT_NEEDED) continue; > @@ -1147,19 +1147,32 @@ static void load_deps(struct dso *p) > if (!dep) { > error("Error loading shared library %s: %m (needed by %s)", > p->strings + p->dynv[i+1], p->name); > - if (runtime) longjmp(*rtld_fail, 1); > - continue; > } > - if (runtime) { > - tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2)); > - if (!tmp) longjmp(*rtld_fail, 1); > - tmp[ndeps++] = dep; > - tmp[ndeps] = 0; > - *deps = tmp; > + } > + } > +} > + > +static void load_deps_runtime(struct dso *p) > +{ > + size_t i, ndeps=0, j=0; > + struct dso ***deps = &p->deps, **tmp, *dep; > + for (; p; p=(*deps)[j++]) { > + for (i=0; p->dynv[i]; i+=2) { > + if (p->dynv[i] != DT_NEEDED) continue; > + dep = load_library(p->strings + p->dynv[i+1], p); > + if (!dep) { > + error("Error loading shared library %s: %m (needed by %s)", > + p->strings + p->dynv[i+1], p->name); > + longjmp(*rtld_fail, 1); > } > + tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2)); > + if (!tmp) longjmp(*rtld_fail, 1); > + tmp[ndeps++] = dep; > + tmp[ndeps] = 0; > + *deps = tmp; > } Aside from above remark about not splitting the two versions, I don't think the algorithm works. In the case of circular dependencies, which are awful but do happen in the wild, the loop will run forever and keep appending to the deps array. I think this can be fixed via comparison of each new dep against prior slots (linear time for each addition, so quadratic overall) to avoid adding it more than once, or since we hold a lock, a tag could be added to struct dso to tag which ones we've already hit (constant time). Since load_library is already a linear search with a larger value of N than the number of dependencies, I don't really see any advantage to avoiding the linear search here, and would just go with it since it's simpler and less invasive. > } > - if (!*deps) *deps = (struct dso **)&nodeps_dummy; > + if (!*deps) *deps = (struct dso**)&nodeps_dummy; Spurious style regression. :) Rich ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 5:33 ` Markus Wichmann 2019-02-07 13:42 ` Alexey Izbyshev @ 2019-02-07 16:54 ` Rich Felker 2019-02-07 18:36 ` Markus Wichmann 2019-02-08 10:19 ` A. Wilcox 1 sibling, 2 replies; 24+ messages in thread From: Rich Felker @ 2019-02-07 16:54 UTC (permalink / raw) To: musl On Thu, Feb 07, 2019 at 06:33:27AM +0100, Markus Wichmann wrote: > On Thu, Feb 07, 2019 at 12:23:06AM +0300, Alexey Izbyshev wrote: > > On 2019-02-06 23:25, Markus Wichmann wrote: > > > Right you are. It took me a while to understand what the deps array was > > > even for (since musl's dlclose() doesn't do anything, tracking > > > dependencies is mostly pointless), but I found it is needed for lazy > > > relocation processing. So it is necessary for all libs opened by > > > dlopen() directly to contain a list of all their dependencies. All the > > > other libs can have an empty list. > > > > Actually, dso->deps is used in dlsym(handle) because it must use the > > dependency order for symbol search, so it's incorrect to have deps empty for > > "all the other" libs. Consider the following modification of my previous > > example: > > > > $ cat bazdep.c > > int bazdep = 1; > > extern int bazdepdep; > > int *p = &bazdepdep; > > $ cat bazdepdep.c > > int bazdepdep = 2; > > $ cat main.c > > #include <dlfcn.h> > > #include <stdio.h> > > > > int main(void) { > > if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL)) > > return 1; > > if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL)) > > return 1; > > void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL); > > printf("%p\n", dlsym(h, "bar")); > > printf("%p\n", dlsym(h, "bazdepdep")); > > } > > > > The correct output is zero in the first line and some non-zero address in > > the second. Vanilla musl 1.1.21 prints two non-zero addresses. But with your > > patch the output is two zeros because dlsym() can't search in dependencies > > of "libbazdep.so" anymore. > > > > Alexey > > OK, so life just got more interesting. I gather the deps handling was > always incorrect. > > Let's consider the original code. liba depends on libb, which depends on > libc. dlopen("liba") returns a handle with libb and libc in the deps, > but libb->deps == 0. If we now call dlopen("libb"), that does the right > thing, but only because libb happens to be the last lib in the chain. If > we'd have loaded libx, liby, and libz before trying libb, it would add > all the symbols of libs x, y, and z to the libb handle. > > I guess the hope was that this situation never arrises. So how do we fix > this? > > I think the easiest is probably going to be to patch up load_deps, but > avoiding recursion is going to be the fun part. My plan is to make > dso->deps contain all direct and indirect dependencies (which is what > the code seems to depend on, anyway). This is going to consume more > memory, but we are talking a few pointers, and we are dealing with > shared libs, anyway. Yes, it needs to contain all direct and indirect dependencies, in the correct order. Right now I think it incorrectly contains a superset of that. Is that your assessment of the situation too? > As you said, order is important. What is the correct order, depth-first > or breadth-first? I think it should be depth-first, but lack any > authoritative knowledge on this. It's not; it's breadth-first. The definition of "dependency order" used for dlsym is here: http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html "Dependency ordering uses a breadth-first order starting with a given executable object file, then all of its dependencies, then any dependents of those, iterating until all dependencies are satisfied. With the exception of the global symbol table handle obtained via a dlopen() operation with a null pointer as the file argument, dependency ordering is used by the dlsym() function." However, a depth-first list of dependencies is also needed to solve the longstanding ctor-order issue discussed in several past threads (which I can look up if search doesn't turn them up for you). This is not a requirement of POSIX (which doesn't even have ctors), but it's a reasonable expectation we currently get wrong and I think it might be specified in ELF or some related sysv "standards". I'm not sure if it makes sense to build both of these lists at the same time. We currently try to avoid allocating dependency lists for libraries loaded at startup in order not to allocate and setup data that might never be used, defering until if/when dlopen is called on them. I've wanted to do the ctor order walk without allocating a dependency list, but I don't know a good way to do so. Note that the code that runs ctors cannot allocate because, at the time it runs, dlopen has already "committed" the load and can't back it out. It has to have all resources it needs for the walk precommitted. Since the beginning of a list of breadth-first dependencies is just the direct dependencies, if we recorded the number of direct dependencies for each dso, the breadth-first lists could be used to perform a depth-first walk to run ctors; this can be made non-recursive, at the cost of quadratic time but with trivially-tiny constant factor, by simply restarting at the root of the tree after each node and finding the deepest not-yet-constructed dso. This suggests always allocating the breadth-first dependency lists might be a good idea. That "feels" unfortunate since in principle they can get large, but it's only going to be large for ridiculous bloatware with hundreds of libraries, in which case these dependency lists are very small on a relative scale. > It would make the most sense, anyway > (if, from the point of view of a user a library contains all the symbols > of its dependencies, then those dependencies must also contain all the > symbols of their dependencies). So with the following dependency tree: > > liba->libb->libc > `>libx->liby > > the handle for liba would list libc before libx. > > Easiest implementation is probably still going to be recursive. Let's > hope the dependency trees don't get too wild. I don't think recursive can be done safely since you don't have a small bound on levels of recursion. It could be done with an explicit stack in allocated storage though, since the operation has to allocate memory anyway and has to fail if allocation fails. Rich ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 16:54 ` Rich Felker @ 2019-02-07 18:36 ` Markus Wichmann 2019-02-07 18:57 ` Rich Felker 2019-02-08 10:19 ` A. Wilcox 1 sibling, 1 reply; 24+ messages in thread From: Markus Wichmann @ 2019-02-07 18:36 UTC (permalink / raw) To: musl On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote: > Yes, it needs to contain all direct and indirect dependencies, in the > correct order. Right now I think it incorrectly contains a superset of > that. Is that your assessment of the situation too? > Yes, it is a superset, if the original argument to load_deps() is not the tail of the chain of libraries. > However, a depth-first list of dependencies is also needed to solve > the longstanding ctor-order issue discussed in several past threads > (which I can look up if search doesn't turn them up for you). This is > not a requirement of POSIX (which doesn't even have ctors), but it's a > reasonable expectation we currently get wrong and I think it might be > specified in ELF or some related sysv "standards". > Requirements first, nice-to-haves later, right? Once we have the dlsym() conformance issue squared away, we can still deal with this issue. You mean, people expect their dependencies to be initialized in their own initializers? When it is well-known that there is no order to initializers, and e.g. the C++ standard says there is no order to constructors of static objects. > I'm not sure if it makes sense to build both of these lists at the > same time. We currently try to avoid allocating dependency lists for > libraries loaded at startup in order not to allocate and setup data > that might never be used, defering until if/when dlopen is called on > them. I've wanted to do the ctor order walk without allocating a > dependency list, but I don't know a good way to do so. Note that the > code that runs ctors cannot allocate because, at the time it runs, > dlopen has already "committed" the load and can't back it out. It has > to have all resources it needs for the walk precommitted. > Depth first means stack. Means recursion or explicit stack. Explicit is going to be hard, considering there is no known limit to dependency nesting depth. We could argue that the user better make their stack size ulimit large enough for the main thread, but I hazard a guess that nobody expects dlopen() to use overly much stack in other threads. Alright, what's the algorithm here? init(lib): if (!lib.inited): foreach d in lib.deps init(d) start_init(lib) lib.inited = 1 That about right? Because that means we need a stack as high as the nesting depth of dependencies. We can get a pessimistic estimation from the number of deps loaded, but that may be way too high (see below). > Since the beginning of a list of breadth-first dependencies is just > the direct dependencies, if we recorded the number of direct > dependencies for each dso, the breadth-first lists could be used to > perform a depth-first walk to run ctors; this can be made > non-recursive, at the cost of quadratic time but with trivially-tiny > constant factor, by simply restarting at the root of the tree after > each node and finding the deepest not-yet-constructed dso. This > suggests always allocating the breadth-first dependency lists might be > a good idea. That "feels" unfortunate since in principle they can get > large, but it's only going to be large for ridiculous bloatware with > hundreds of libraries, in which case these dependency lists are very > small on a relative scale. > $ ldd =mpv | wc -l 292 And that, I will remind you, is the "suckless" alternative to mplayer. Anyway, that is breadth, not depth. Experimentally, it is hard to find a chain of libraries deeper than ten. They all eventually just lead back to libc. So what you could do, when it comes time to run the initializers, is to walk the dyn vectors again. Those are already allocated, and all the libs they reference are already loaded, so load_library() would only be a search operation. By the time the initializers run, that must be the case, both at load time and at runtime. Unless loading the library failed the first time, in which case at runtime we have already bailed. And at load time we... haven't. Are you sure continuing on load failure of a dependency during load time is a good idea? > I don't think recursive can be done safely since you don't have a > small bound on levels of recursion. It could be done with an explicit > stack in allocated storage though, since the operation has to allocate > memory anyway and has to fail if allocation fails. > > Rich Nevermind, it was a misunderstanding of the requirements. Ciao, Markus ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 18:36 ` Markus Wichmann @ 2019-02-07 18:57 ` Rich Felker 2019-02-07 20:31 ` Markus Wichmann 0 siblings, 1 reply; 24+ messages in thread From: Rich Felker @ 2019-02-07 18:57 UTC (permalink / raw) To: musl On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote: > On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote: > > However, a depth-first list of dependencies is also needed to solve > > the longstanding ctor-order issue discussed in several past threads > > (which I can look up if search doesn't turn them up for you). This is > > not a requirement of POSIX (which doesn't even have ctors), but it's a > > reasonable expectation we currently get wrong and I think it might be > > specified in ELF or some related sysv "standards". > > Requirements first, nice-to-haves later, right? Once we have the dlsym() > conformance issue squared away, we can still deal with this issue. > > You mean, people expect their dependencies to be initialized in their > own initializers? When it is well-known that there is no order to > initializers, and e.g. the C++ standard says there is no order to > constructors of static objects. Yes. GCC has an extension for ctor priority within static linking scope, but for dynamic linking scope that doesn't help. I don't like any of this but glib depends on it to avoid just doing the right thing with pthread_once/call_once, and refuses to fix it. > > I'm not sure if it makes sense to build both of these lists at the > > same time. We currently try to avoid allocating dependency lists for > > libraries loaded at startup in order not to allocate and setup data > > that might never be used, defering until if/when dlopen is called on > > them. I've wanted to do the ctor order walk without allocating a > > dependency list, but I don't know a good way to do so. Note that the > > code that runs ctors cannot allocate because, at the time it runs, > > dlopen has already "committed" the load and can't back it out. It has > > to have all resources it needs for the walk precommitted. > > Depth first means stack. Means recursion or explicit stack. Explicit is > going to be hard, considering there is no known limit to dependency > nesting depth. We could argue that the user better make their stack size > ulimit large enough for the main thread, but I hazard a guess that > nobody expects dlopen() to use overly much stack in other threads. > > Alright, what's the algorithm here? > > init(lib): > if (!lib.inited): > foreach d in lib.deps init(d) > start_init(lib) > lib.inited = 1 > > That about right? Because that means we need a stack as high as the > nesting depth of dependencies. We can get a pessimistic estimation from > the number of deps loaded, but that may be way too high (see below). Yes, but you can also avoid recursion just by looping to the deepest dependency with !inited, then going back to the root. For a one-time operation at dlopen-time or program-start time, the quadratic search for each !inited seems unlikely to be a problem: > > Since the beginning of a list of breadth-first dependencies is just > > the direct dependencies, if we recorded the number of direct > > dependencies for each dso, the breadth-first lists could be used to > > perform a depth-first walk to run ctors; this can be made > > non-recursive, at the cost of quadratic time but with trivially-tiny ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > constant factor, by simply restarting at the root of the tree after ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > each node and finding the deepest not-yet-constructed dso. This ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > $ ldd =mpv | wc -l > 292 > > And that, I will remind you, is the "suckless" alternative to mplayer. > > Anyway, that is breadth, not depth. Experimentally, it is hard to find a > chain of libraries deeper than ten. They all eventually just lead back > to libc. > > So what you could do, when it comes time to run the initializers, is to > walk the dyn vectors again. Those are already allocated, and all the > libs they reference are already loaded, so load_library() would only be > a search operation. Note that each load_library is a linear search, so you still have the quadratic time, now with much higher coefficient. And if you want to use my go-back-to-root approach to avoid having O(depth) working space, it's cubic time. Also, as-written, load_library is not "only a search operation"; perhaps this is a bug in itself. For absolute paths it will always open the file and compare inode identity to see if it matches an already-loaded file. This should probably be bypassed by first checking long-name. Fortunately absolute paths in DT_NEEDED are strongly discouraged/a-really-bad-idea, so it probably doesn't come up much in practice. > By the time the initializers run, that must be the > case, both at load time and at runtime. Unless loading the library > failed the first time, in which case at runtime we have already bailed. > And at load time we... haven't. Are you sure continuing on load failure > of a dependency during load time is a good idea? I don't follow. The dlopen operation is not committed until load of all dependencies completes successfully, and if any fail to load, the whole operation is backed-out. But ctors don't/can't run until *after* that, when we've already committed to success. Rich ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 18:57 ` Rich Felker @ 2019-02-07 20:31 ` Markus Wichmann 2019-02-07 21:33 ` Rich Felker 0 siblings, 1 reply; 24+ messages in thread From: Markus Wichmann @ 2019-02-07 20:31 UTC (permalink / raw) To: musl On Thu, Feb 07, 2019 at 01:57:36PM -0500, Rich Felker wrote: > Yes. GCC has an extension for ctor priority within static linking > scope, but for dynamic linking scope that doesn't help. I don't like > any of this but glib depends on it to avoid just doing the right thing > with pthread_once/call_once, and refuses to fix it. > Well, at least we are on the same page, here. And my opinion of glib is validated once more. Unfortunately, at this point it is too big to fail, in several ways. > Yes, but you can also avoid recursion just by looping to the deepest > dependency with !inited, then going back to the root. For a one-time > operation at dlopen-time or program-start time, the quadratic search > for each !inited seems unlikely to be a problem: > Wait, I have an idea. If the only ordering is that the dependencies need to be initialized before their dependents, then couldn't we just initialize the libs in reverse BFS order? The elements further down the tree are all necessarily further down the list, aren't they? > I don't follow. The dlopen operation is not committed until load of > all dependencies completes successfully, and if any fail to load, the > whole operation is backed-out. But ctors don't/can't run until *after* > that, when we've already committed to success. > That is true for the runtime case, i.e. dlopen(). But load_deps() is also called at load time. And initializers have to run at load time, too. And in the correct order. If at load time, any dependencies fail to load, an error message is printed and then the loop continues. load_deps() has no way to signal failure to the caller, and at load time it will not exit the function in another way, i.e. longjump (which is good since that would be invalid at that time). So by the time the initializers are called, all dependencies are loaded except those which failed. > Rich Ciao, Markus ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 20:31 ` Markus Wichmann @ 2019-02-07 21:33 ` Rich Felker 2019-02-07 21:37 ` Rich Felker 0 siblings, 1 reply; 24+ messages in thread From: Rich Felker @ 2019-02-07 21:33 UTC (permalink / raw) To: musl On Thu, Feb 07, 2019 at 09:31:38PM +0100, Markus Wichmann wrote: > > Yes, but you can also avoid recursion just by looping to the deepest > > dependency with !inited, then going back to the root. For a one-time > > operation at dlopen-time or program-start time, the quadratic search > > for each !inited seems unlikely to be a problem: > > > > Wait, I have an idea. If the only ordering is that the dependencies need > to be initialized before their dependents, then couldn't we just > initialize the libs in reverse BFS order? The elements further down the > tree are all necessarily further down the list, aren't they? No. Suppose X depends on Y and Z, and Z also depends on Y. If you do reverse-BFS order, you'll construct Z before Y, despite Z depending on Y (and Z's ctors depending on Y's ctors already having run). > > I don't follow. The dlopen operation is not committed until load of > > all dependencies completes successfully, and if any fail to load, the > > whole operation is backed-out. But ctors don't/can't run until *after* > > that, when we've already committed to success. > > That is true for the runtime case, i.e. dlopen(). But load_deps() is > also called at load time. And initializers have to run at load time, > too. And in the correct order. > > If at load time, any dependencies fail to load, an error message is > printed and then the loop continues. load_deps() has no way to signal > failure to the caller, and at load time it will not exit the function in > another way, i.e. longjump (which is good since that would be invalid at > that time). So by the time the initializers are called, all dependencies > are loaded except those which failed. See the definition of error(). It sets ldso_fail so that execution never proceeds to the program. Rich ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 21:33 ` Rich Felker @ 2019-02-07 21:37 ` Rich Felker 0 siblings, 0 replies; 24+ messages in thread From: Rich Felker @ 2019-02-07 21:37 UTC (permalink / raw) To: musl On Thu, Feb 07, 2019 at 04:33:18PM -0500, Rich Felker wrote: > On Thu, Feb 07, 2019 at 09:31:38PM +0100, Markus Wichmann wrote: > > > Yes, but you can also avoid recursion just by looping to the deepest > > > dependency with !inited, then going back to the root. For a one-time > > > operation at dlopen-time or program-start time, the quadratic search > > > for each !inited seems unlikely to be a problem: > > > > > > > Wait, I have an idea. If the only ordering is that the dependencies need > > to be initialized before their dependents, then couldn't we just > > initialize the libs in reverse BFS order? The elements further down the > > tree are all necessarily further down the list, aren't they? > > No. Suppose X depends on Y and Z, and Z also depends on Y. If you do > reverse-BFS order, you'll construct Z before Y, despite Z depending on > Y (and Z's ctors depending on Y's ctors already having run). Hmm, maybe if you add redundant entries like your code was doing, this works -- the list gets "YZY" for my example -- but then you waste a good bit of space and need a way to cut off circular dependencies (which can't respect ctor dependency order) while not breaking the order for non-circular ones. Rich ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-07 16:54 ` Rich Felker 2019-02-07 18:36 ` Markus Wichmann @ 2019-02-08 10:19 ` A. Wilcox 2019-02-08 12:00 ` Szabolcs Nagy 1 sibling, 1 reply; 24+ messages in thread From: A. Wilcox @ 2019-02-08 10:19 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 883 bytes --] On Feb 7, 2019, at 10:54 AM, Rich Felker <dalias@libc.org> wrote: > > However, a depth-first list of dependencies is also needed to solve the longstanding ctor-order issue discussed in several past threads (which I can look up if search doesn't turn them up for you). This is not a requirement of POSIX (which doesn't even have ctors), but it's a reasonable expectation we currently get wrong and I think it might be specified in ELF or some related sysv "standards". It is part of the ELF 1.2 standard and is not only required by GLib, but also Nouveau, gtk-doc, some Qt apps, and others. More info, including the spec citation, example failures, and the musl ML thread, can be found on our BTS: https://bts.adelielinux.org/show_bug.cgi?id=11 Best, —arw -- A. Wilcox (Sent from my iPhone - not signed) Project Lead, Adélie Linux https://adelielinux.org [-- Attachment #2: Type: text/html, Size: 1904 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-08 10:19 ` A. Wilcox @ 2019-02-08 12:00 ` Szabolcs Nagy 2019-02-08 16:09 ` Rich Felker 0 siblings, 1 reply; 24+ messages in thread From: Szabolcs Nagy @ 2019-02-08 12:00 UTC (permalink / raw) To: musl * A. Wilcox <awilfox@adelielinux.org> [2019-02-08 04:19:32 -0600]: > On Feb 7, 2019, at 10:54 AM, Rich Felker <dalias@libc.org> wrote: > > > > However, a depth-first list of dependencies is also needed to solve the longstanding ctor-order issue discussed in several past threads (which I can look up if search doesn't turn them up for you). This is not a requirement of POSIX (which doesn't even have ctors), but it's a reasonable expectation we currently get wrong and I think it might be specified in ELF or some related sysv "standards". > > > It is part of the ELF 1.2 standard and is not only required by GLib, but also Nouveau, gtk-doc, some Qt apps, and others. note, that the ctor order in c++ is undefined across translation units so if a c++ application relies on it that's a bug: the c++ ctor concept does not necessarily map directly to the elf initialization function concept on an elf platform so the elf spec does not save them. in the elf context, relying on DT_NEEDED for ctor ordering across libraries means that static linking is broken since then there is no DT_NEEDED. so these packages should be fixed independently of musl (assuming they want to be portable or at least static linkable). that said, lot of libraries do rely on dependency based ctor ordering even beyond the elf guarantees (glibc and solaris dynamic linkers considers relocations on top of DT_NEEDED to establish dependencies between dsos) and some applications have dependency cycles which of course is impossible to get right but glibc tries to keep the ctor order deterministic in some way even in the presence of dlopen during init. originally dynamic linkers did exactly what musl did, there is a nice overview how the complexity evolved: https://blogs.oracle.com/solaris/init-and-fini-processing-who-designed-this-v2 > More info, including the spec citation, example failures, and the musl ML thread, can be found on our BTS: > > https://bts.adelielinux.org/show_bug.cgi?id=11 > > Best, > —arw > > > -- > A. Wilcox (Sent from my iPhone - not signed) > Project Lead, Adélie Linux > https://adelielinux.org ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: dlsym(handle) may search in unrelated libraries 2019-02-08 12:00 ` Szabolcs Nagy @ 2019-02-08 16:09 ` Rich Felker 0 siblings, 0 replies; 24+ messages in thread From: Rich Felker @ 2019-02-08 16:09 UTC (permalink / raw) To: musl On Fri, Feb 08, 2019 at 01:00:16PM +0100, Szabolcs Nagy wrote: > * A. Wilcox <awilfox@adelielinux.org> [2019-02-08 04:19:32 -0600]: > > On Feb 7, 2019, at 10:54 AM, Rich Felker <dalias@libc.org> wrote: > > > > > > However, a depth-first list of dependencies is also needed to solve the longstanding ctor-order issue discussed in several past threads (which I can look up if search doesn't turn them up for you). This is not a requirement of POSIX (which doesn't even have ctors), but it's a reasonable expectation we currently get wrong and I think it might be specified in ELF or some related sysv "standards". > > > > > > It is part of the ELF 1.2 standard and is not only required by GLib, but also Nouveau, gtk-doc, some Qt apps, and others. > > note, that the ctor order in c++ is undefined across translation > units so if a c++ application relies on it that's a bug: the c++ > ctor concept does not necessarily map directly to the elf > initialization function concept on an elf platform so the elf > spec does not save them. In this case they're not using C++ but the GNU C extension. > in the elf context, relying on DT_NEEDED for ctor ordering > across libraries means that static linking is broken since > then there is no DT_NEEDED. This is true, but they can make static linking also work (within libraries that share a convention on the priority values to use) by using the GNU C ctor priority attributes. This could fix the glib bug. I forget if any distros are doing it out-of-tree or if they just patched the one ctor to call the other init function explicitly. > so these packages should be fixed independently of musl > (assuming they want to be portable or at least static linkable). This is probably mostly correct. > that said, lot of libraries do rely on dependency based ctor > ordering even beyond the elf guarantees (glibc and solaris > dynamic linkers considers relocations on top of DT_NEEDED to > establish dependencies between dsos) and some applications > have dependency cycles which of course is impossible to get > right but glibc tries to keep the ctor order deterministic > in some way even in the presence of dlopen during init. > > originally dynamic linkers did exactly what musl did, there > is a nice overview how the complexity evolved: > https://blogs.oracle.com/solaris/init-and-fini-processing-who-designed-this-v2 Good to know there's precedent. Rich ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2019-03-04 2:11 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-02-09 22:53 dlsym(handle) may search in unrelated libraries Alexey Izbyshev 2019-02-10 1:03 ` Rich Felker 2019-02-26 15:07 ` Rich Felker 2019-03-04 2:11 ` Rich Felker -- strict thread matches above, loose matches on Subject: below -- 2019-02-05 21:02 Alexey Izbyshev 2019-02-06 13:40 ` Alexey Izbyshev 2019-02-06 16:02 ` Markus Wichmann 2019-02-06 17:02 ` Alexey Izbyshev 2019-02-06 20:25 ` Markus Wichmann 2019-02-06 21:23 ` Alexey Izbyshev 2019-02-07 5:33 ` Markus Wichmann 2019-02-07 13:42 ` Alexey Izbyshev 2019-02-07 17:43 ` Markus Wichmann 2019-02-07 20:37 ` Markus Wichmann 2019-02-07 21:29 ` Rich Felker 2019-02-07 16:54 ` Rich Felker 2019-02-07 18:36 ` Markus Wichmann 2019-02-07 18:57 ` Rich Felker 2019-02-07 20:31 ` Markus Wichmann 2019-02-07 21:33 ` Rich Felker 2019-02-07 21:37 ` Rich Felker 2019-02-08 10:19 ` A. Wilcox 2019-02-08 12:00 ` Szabolcs Nagy 2019-02-08 16:09 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).