mailing list of musl libc
 help / color / mirror / code / Atom feed
* dlsym(handle) may search in unrelated libraries
@ 2019-02-05 21:02 Alexey Izbyshev
  2019-02-06 13:40 ` Alexey Izbyshev
  2019-02-06 16:02 ` Markus Wichmann
  0 siblings, 2 replies; 24+ messages in thread
From: Alexey Izbyshev @ 2019-02-05 21:02 UTC (permalink / raw)
  To: musl

Hello!

I've discovered a bug in musl dynamic loader (tested on 1.1.21) which is 
demonstrated by the following simple example:

$ cat bar.c
int bar = 42;
$ musl-gcc -fPIC -shared bar.c -o libbar.so
$ cat foo.c
extern int bar;
int *foo = &bar;
$ musl-gcc -fPIC -shared foo.c -L. -lbar -Wl,-rpath='$ORIGIN' -o 
libfoo.so
$ cat main.c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
   if (!dlopen("libfoo.so", RTLD_NOW))
     return 1;
   void *h = dlopen("libc.so.6", RTLD_NOW);
   printf("%p\n", dlsym(h, "bar"));
}
$ musl-gcc main.c -Wl,-rpath='$ORIGIN' -ldl
$ ./a.out
0x7fd7ebe96020

dlsym(handle) is supposed to search only in the library referred to by 
the handle and in its dependencies. "libc.so.6" doesn't have 
dependencies and doesn't have a definition for "bar", so dlsym(h, "bar") 
should return NULL, but it finds "bar" in libbar.so instead.

The problem occurs because of the following:
1) Initially, "deps" in dso structure for libc.so.6 is NULL.
2) When dlopen("libc.so.6") is called, "first_load" is true, despite 
that it's not actually the first load (ldso/dynlink.c:1835):

     /* First load handling */
     int first_load = !p->deps;
     if (first_load) {
         load_deps(p);

3) load_deps() then iterates over the dso list starting from 
"libc.so.6", treating all libraries found in DT_NEEDED of each processed 
dso as dependencies of "libc.so.6". However, the dso list already 
contains "libfoo.so" loaded earlier, so "libbar.so" (which is needed by 
"libfoo.so") is treated as a dependency of "libc.so.6". As a result, 
dlsym(h, "bar") succeeds.

It's also notable that "libfoo.so" and "libbar.so" were loaded with 
RTLD_LOCAL, but this bug effectively makes their symbols available in 
such searches regardless of the scope of a library used with dlsym().

ISTM that load_deps(p) was written to work only in real "first load" 
situations, where "p" is initially the last dso in the list, and new 
dsos are only added to the list in the course of recursive loading of 
the dependencies of "p".

Could this be fixed? Thanks!

(Please CC me on replying, I'm not subscribed to the list.)

Alexey


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-05 21:02 dlsym(handle) may search in unrelated libraries Alexey Izbyshev
@ 2019-02-06 13:40 ` Alexey Izbyshev
  2019-02-06 16:02 ` Markus Wichmann
  1 sibling, 0 replies; 24+ messages in thread
From: Alexey Izbyshev @ 2019-02-06 13:40 UTC (permalink / raw)
  To: musl

Alexander Monakov noticed a confusing detail in my report: I used 
"libc.so.6" in my example despite that musl's "make install" creates 
"libc.so" only. To clarify, "libc.so.6" has nothing to do with glibc, 
and "libc.so.6" may be replaced with just "libc.so" in my example 
without any change in semantics.

Alexey

On 2/6/19 12:02 AM, Alexey Izbyshev wrote:
> Hello!
>
> I've discovered a bug in musl dynamic loader (tested on 1.1.21) which 
> is demonstrated by the following simple example:
>
> $ cat bar.c
> int bar = 42;
> $ musl-gcc -fPIC -shared bar.c -o libbar.so
> $ cat foo.c
> extern int bar;
> int *foo = &bar;
> $ musl-gcc -fPIC -shared foo.c -L. -lbar -Wl,-rpath='$ORIGIN' -o 
> libfoo.so
> $ cat main.c
> #include <dlfcn.h>
> #include <stdio.h>
>
> int main(void) {
>   if (!dlopen("libfoo.so", RTLD_NOW))
>     return 1;
>   void *h = dlopen("libc.so.6", RTLD_NOW);
>   printf("%p\n", dlsym(h, "bar"));
> }
> $ musl-gcc main.c -Wl,-rpath='$ORIGIN' -ldl
> $ ./a.out
> 0x7fd7ebe96020
>
> dlsym(handle) is supposed to search only in the library referred to by 
> the handle and in its dependencies. "libc.so.6" doesn't have 
> dependencies and doesn't have a definition for "bar", so dlsym(h, 
> "bar") should return NULL, but it finds "bar" in libbar.so instead.
>
> The problem occurs because of the following:
> 1) Initially, "deps" in dso structure for libc.so.6 is NULL.
> 2) When dlopen("libc.so.6") is called, "first_load" is true, despite 
> that it's not actually the first load (ldso/dynlink.c:1835):
>
>     /* First load handling */
>     int first_load = !p->deps;
>     if (first_load) {
>         load_deps(p);
>
> 3) load_deps() then iterates over the dso list starting from 
> "libc.so.6", treating all libraries found in DT_NEEDED of each 
> processed dso as dependencies of "libc.so.6". However, the dso list 
> already contains "libfoo.so" loaded earlier, so "libbar.so" (which is 
> needed by "libfoo.so") is treated as a dependency of "libc.so.6". As a 
> result, dlsym(h, "bar") succeeds.
>
> It's also notable that "libfoo.so" and "libbar.so" were loaded with 
> RTLD_LOCAL, but this bug effectively makes their symbols available in 
> such searches regardless of the scope of a library used with dlsym().
>
> ISTM that load_deps(p) was written to work only in real "first load" 
> situations, where "p" is initially the last dso in the list, and new 
> dsos are only added to the list in the course of recursive loading of 
> the dependencies of "p".
>
> Could this be fixed? Thanks!
>
> (Please CC me on replying, I'm not subscribed to the list.)
>
> Alexey




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-05 21:02 dlsym(handle) may search in unrelated libraries Alexey Izbyshev
  2019-02-06 13:40 ` Alexey Izbyshev
@ 2019-02-06 16:02 ` Markus Wichmann
  2019-02-06 17:02   ` Alexey Izbyshev
  1 sibling, 1 reply; 24+ messages in thread
From: Markus Wichmann @ 2019-02-06 16:02 UTC (permalink / raw)
  To: musl; +Cc: Alexey Izbyshev

[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]

On Wed, Feb 06, 2019 at 12:02:39AM +0300, Alexey Izbyshev wrote:
> Hello!
> 
> I've discovered a bug in musl dynamic loader (tested on 1.1.21) which is
> demonstrated by the following simple example:
> 
> $ cat bar.c
> int bar = 42;
> $ musl-gcc -fPIC -shared bar.c -o libbar.so
> $ cat foo.c
> extern int bar;
> int *foo = &bar;
> $ musl-gcc -fPIC -shared foo.c -L. -lbar -Wl,-rpath='$ORIGIN' -o libfoo.so
> $ cat main.c
> #include <dlfcn.h>
> #include <stdio.h>
> 
> int main(void) {
>   if (!dlopen("libfoo.so", RTLD_NOW))
>     return 1;
>   void *h = dlopen("libc.so.6", RTLD_NOW);
>   printf("%p\n", dlsym(h, "bar"));
> }
> $ musl-gcc main.c -Wl,-rpath='$ORIGIN' -ldl
> $ ./a.out
> 0x7fd7ebe96020
> 

In case you were wondering, your typo here doesn't change anything,
since "libc.so.6" has the prefix "libc.", which is recogized as reserved
in load_library(), and makes dlopen() return a handle to the libc.

Thankfully the patch is simple: Explicitly make ldso and vdso have no
deps. I was tempted to put this into kernel_mapped_dso(), but then I
remembered that the app is also a kernel mapped dso, and it usually does
have deps that need processing. At least, in nontrivial cases.

The attached patch should tide you over.

> 
> Alexey

Ciao,
Markus

[-- Attachment #2: 0006-Make-libc-and-vdso-explicitly-have-no-deps.patch --]
[-- Type: text/x-diff, Size: 1443 bytes --]

From e823910d69ff56ffccecaa9b29fd4b67b901798a Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Wed, 6 Feb 2019 16:51:53 +0100
Subject: [PATCH 6/6] Make libc and vdso explicitly have no deps.

Alexey Izbyshev reported that without this, dlopen("libc.so") returns a
handle that is capable of finding every symbol in libraries loaded as
dependencies, since dso->deps == 0 usually means dependencies haven't
been loaded.
---
 ldso/dynlink.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index ec921dfd..6ffeca85 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -1244,6 +1244,7 @@ static void reloc_all(struct dso *p)
 static void kernel_mapped_dso(struct dso *p)
 {
 	size_t min_addr = -1, max_addr = 0, cnt;
+	static const struct dso *sentinel = 0;
 	Phdr *ph = p->phdr;
 	for (cnt = p->phnum; cnt--; ph = (void *)((char *)ph + p->phentsize)) {
 		if (ph->p_type == PT_DYNAMIC) {
@@ -1428,6 +1429,7 @@ hidden void __dls2(unsigned char *base, size_t *sp)
 	ldso.phdr = laddr(&ldso, ehdr->e_phoff);
 	ldso.phentsize = ehdr->e_phentsize;
 	kernel_mapped_dso(&ldso);
+	ldso.deps = (struct dso**)&nodeps_dummy;
 	decode_dyn(&ldso);
 
 	if (DL_FDPIC) makefuncdescs(&ldso);
@@ -1675,6 +1677,7 @@ _Noreturn void __dls3(size_t *sp)
 		vdso.prev = tail;
 		tail->next = &vdso;
 		tail = &vdso;
+		vdso.deps = (struct dso**)&nodeps_dummy;
 	}
 
 	for (i=0; app.dynv[i]; i+=2) {
-- 
2.20.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-06 16:02 ` Markus Wichmann
@ 2019-02-06 17:02   ` Alexey Izbyshev
  2019-02-06 20:25     ` Markus Wichmann
  0 siblings, 1 reply; 24+ messages in thread
From: Alexey Izbyshev @ 2019-02-06 17:02 UTC (permalink / raw)
  To: Markus Wichmann, musl

On 2/6/19 7:02 PM, Markus Wichmann wrote:
> 
> Thankfully the patch is simple: Explicitly make ldso and vdso have no
> deps. I was tempted to put this into kernel_mapped_dso(), but then I
> remembered that the app is also a kernel mapped dso, and it usually does
> have deps that need processing. At least, in nontrivial cases.
> 
> The attached patch should tide you over.
> 
Thank you for the quick response and the patch, Markus! Your patch fixes 
the exact test case I posted.

Unfortunately, my test case was a simplified example of a general 
problem: dso->deps is assigned only for the main app and for libraries 
opened with dlopen(), but not for their dependencies. Consider the 
following:

$ cat bar.c
int bar = 42;
$ musl-gcc -fPIC -shared bar.c -o libbar.so
$ cat foo.c
extern int bar;
int *foo = &bar;
$ cat baz.c
extern int bazdep;
int *baz = &bazdep;
$ cat bazdep.c
int bazdep = 1;
$ cat main.c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
   if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL))
     return 1;
   if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL))
     return 1;
   void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL);
   printf("%p\n", dlsym(h, "bar"));
}
$ musl-gcc main.c -Wl,-rpath='$ORIGIN' -ldl
$ ./a.out
0x7f66ed371020

Here, "libbazdep.so" assumes the role of "libc.so" from the previous 
test: it's a library with dso->deps == NULL that is loaded before 
"libfoo.so". So, when "libbazdep.so" is dlopen'd, musl considers it to 
be a "first load" and erroneously includes "libbar.so" to the list of 
dependencies of "libbazdep.so".

Alexey




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-06 17:02   ` Alexey Izbyshev
@ 2019-02-06 20:25     ` Markus Wichmann
  2019-02-06 21:23       ` Alexey Izbyshev
  0 siblings, 1 reply; 24+ messages in thread
From: Markus Wichmann @ 2019-02-06 20:25 UTC (permalink / raw)
  To: musl; +Cc: Alexey Izbyshev

[-- Attachment #1: Type: text/plain, Size: 999 bytes --]

On Wed, Feb 06, 2019 at 08:02:28PM +0300, Alexey Izbyshev wrote:
> Unfortunately, my test case was a simplified example of a general problem:
> dso->deps is assigned only for the main app and for libraries opened with
> dlopen(), but not for their dependencies. Consider the following:

Right you are. It took me a while to understand what the deps array was
even for (since musl's dlclose() doesn't do anything, tracking
dependencies is mostly pointless), but I found it is needed for lazy
relocation processing. So it is necessary for all libs opened by
dlopen() directly to contain a list of all their dependencies. All the
other libs can have an empty list.

So I propose the attached patch in addition to the previous one. This
will set dso->deps to the empty list in all libs not directly loaded
from dlopen(). The previous patch is still necessary, as nothing ever
calls load_deps() on the libc or the vdso, but all other modules get a
load_deps() treatment.

> 
> Alexey
> 
> 

Ciao,
Markus

[-- Attachment #2: 0007-Initialize-deps-on-non-directly-loaded-libs.patch --]
[-- Type: text/x-diff, Size: 964 bytes --]

From 841dd4e075040e2aeb01adea8ef5e2f7c0fc006a Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Wed, 6 Feb 2019 21:13:05 +0100
Subject: [PATCH 7/7] Initialize deps on non-directly loaded libs.

As pointed out by Alexey Izbyshev, having the deps member be zero opens
dlopen() and dlsym() up to malfunctions, if a library was previously
loaded as dependency and is then dlopen()ed. Therefore, we now set the
deps member of the dso descriptor to the sentry value in all libs loaded
as dependencies.
---
 ldso/dynlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index 6ffeca85..f8346c54 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -1158,8 +1158,8 @@ static void load_deps(struct dso *p)
 				*deps = tmp;
 			}
 		}
+		if (!p->deps) p->deps = (struct dso**)&nodeps_dummy;
 	}
-	if (!*deps) *deps = (struct dso **)&nodeps_dummy;
 }
 
 static void load_preload(char *s)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-06 20:25     ` Markus Wichmann
@ 2019-02-06 21:23       ` Alexey Izbyshev
  2019-02-07  5:33         ` Markus Wichmann
  0 siblings, 1 reply; 24+ messages in thread
From: Alexey Izbyshev @ 2019-02-06 21:23 UTC (permalink / raw)
  To: Markus Wichmann; +Cc: musl

On 2019-02-06 23:25, Markus Wichmann wrote:
> Right you are. It took me a while to understand what the deps array was
> even for (since musl's dlclose() doesn't do anything, tracking
> dependencies is mostly pointless), but I found it is needed for lazy
> relocation processing. So it is necessary for all libs opened by
> dlopen() directly to contain a list of all their dependencies. All the
> other libs can have an empty list.

Actually, dso->deps is used in dlsym(handle) because it must use the 
dependency order for symbol search, so it's incorrect to have deps empty 
for "all the other" libs. Consider the following modification of my 
previous example:

$ cat bazdep.c
int bazdep = 1;
extern int bazdepdep;
int *p = &bazdepdep;
$ cat bazdepdep.c
int bazdepdep = 2;
$ cat main.c
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
   if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL))
     return 1;
   if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL))
     return 1;
   void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL);
   printf("%p\n", dlsym(h, "bar"));
   printf("%p\n", dlsym(h, "bazdepdep"));
}

The correct output is zero in the first line and some non-zero address 
in the second. Vanilla musl 1.1.21 prints two non-zero addresses. But 
with your patch the output is two zeros because dlsym() can't search in 
dependencies of "libbazdep.so" anymore.

Alexey


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-06 21:23       ` Alexey Izbyshev
@ 2019-02-07  5:33         ` Markus Wichmann
  2019-02-07 13:42           ` Alexey Izbyshev
  2019-02-07 16:54           ` Rich Felker
  0 siblings, 2 replies; 24+ messages in thread
From: Markus Wichmann @ 2019-02-07  5:33 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Thu, Feb 07, 2019 at 12:23:06AM +0300, Alexey Izbyshev wrote:
> On 2019-02-06 23:25, Markus Wichmann wrote:
> > Right you are. It took me a while to understand what the deps array was
> > even for (since musl's dlclose() doesn't do anything, tracking
> > dependencies is mostly pointless), but I found it is needed for lazy
> > relocation processing. So it is necessary for all libs opened by
> > dlopen() directly to contain a list of all their dependencies. All the
> > other libs can have an empty list.
> 
> Actually, dso->deps is used in dlsym(handle) because it must use the
> dependency order for symbol search, so it's incorrect to have deps empty for
> "all the other" libs. Consider the following modification of my previous
> example:
> 
> $ cat bazdep.c
> int bazdep = 1;
> extern int bazdepdep;
> int *p = &bazdepdep;
> $ cat bazdepdep.c
> int bazdepdep = 2;
> $ cat main.c
> #include <dlfcn.h>
> #include <stdio.h>
> 
> int main(void) {
>   if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL))
>     return 1;
>   if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL))
>     return 1;
>   void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL);
>   printf("%p\n", dlsym(h, "bar"));
>   printf("%p\n", dlsym(h, "bazdepdep"));
> }
> 
> The correct output is zero in the first line and some non-zero address in
> the second. Vanilla musl 1.1.21 prints two non-zero addresses. But with your
> patch the output is two zeros because dlsym() can't search in dependencies
> of "libbazdep.so" anymore.
> 
> Alexey

OK, so life just got more interesting. I gather the deps handling was
always incorrect.

Let's consider the original code. liba depends on libb, which depends on
libc. dlopen("liba") returns a handle with libb and libc in the deps,
but libb->deps == 0. If we now call dlopen("libb"), that does the right
thing, but only because libb happens to be the last lib in the chain. If
we'd have loaded libx, liby, and libz before trying libb, it would add
all the symbols of libs x, y, and z to the libb handle.

I guess the hope was that this situation never arrises. So how do we fix
this?

I think the easiest is probably going to be to patch up load_deps, but
avoiding recursion is going to be the fun part. My plan is to make
dso->deps contain all direct and indirect dependencies (which is what
the code seems to depend on, anyway). This is going to consume more
memory, but we are talking a few pointers, and we are dealing with
shared libs, anyway.

As you said, order is important. What is the correct order, depth-first
or breadth-first? I think it should be depth-first, but lack any
authoritative knowledge on this. It would make the most sense, anyway
(if, from the point of view of a user a library contains all the symbols
of its dependencies, then those dependencies must also contain all the
symbols of their dependencies). So with the following dependency tree:

liba->libb->libc
    `>libx->liby

the handle for liba would list libc before libx.

Easiest implementation is probably still going to be recursive. Let's
hope the dependency trees don't get too wild.

I'll look into it after work.

Ciao,
Markus


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07  5:33         ` Markus Wichmann
@ 2019-02-07 13:42           ` Alexey Izbyshev
  2019-02-07 17:43             ` Markus Wichmann
  2019-02-07 16:54           ` Rich Felker
  1 sibling, 1 reply; 24+ messages in thread
From: Alexey Izbyshev @ 2019-02-07 13:42 UTC (permalink / raw)
  To: Markus Wichmann; +Cc: musl

On 2/7/19 8:33 AM, Markus Wichmann wrote:
> Let's consider the original code. liba depends on libb, which depends on
> libc. dlopen("liba") returns a handle with libb and libc in the deps,
> but libb->deps == 0. If we now call dlopen("libb"), that does the right
> thing, but only because libb happens to be the last lib in the chain. If
> we'd have loaded libx, liby, and libz before trying libb, it would add
> all the symbols of libs x, y, and z to the libb handle.

Your description almost captures the problem, but is imprecise in the 
last part: "it would add all the symbols of libs x, y, and z to the libb 
handle". load_deps() looks only at DT_NEEDED entries of libraries it 
iterates over, so, for example, if libx depends on both liby and libz, 
then liby and libz (but not libx) would be added to deps of libb.

Moreover, consider the following dependency hierarchy (loaded on 
dlopen("liba")):
liba
   libb
   libd
     libe

In this case, even dlopen("libb") wouldn't do the right thing because 
load_deps() would find libe in DT_NEEDED of libd and add it to deps of libb.
> 
> As you said, order is important. What is the correct order, depth-first
> or breadth-first? I think it should be depth-first, but lack any
> authoritative knowledge on this.
dlsym(handle) uses so-called "dependency order"[1], which is 
breadth-first[2]. This is what musl current does in cases when 
load_deps() is called on a real first load of a library (so that 
everything that's further in the dso list are implicitly loaded 
dependencies of this library).

So with the following dependency tree:
> 
> liba->libb->libc
>      `>libx->liby
> 
> the handle for liba would list libc before libx.
> 
The correct order is what load_deps() does currently: liba libb libx 
libc liby

 > Easiest implementation is probably still going to be recursive. Let's
hope the dependency trees don't get too wild.

I think the easiest way is simply to modify load_deps() to always 
traverse DT_NEEDED in breadth-first order without relying on the dso 
list in the outer loop. load_deps() already effectively maintains a 
queue (deps) that can be used for BFS, so no recursion is needed.
> 
> I'll look into it after work.
> 
Thanks for following this up, Markus!

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html

Alexey



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07  5:33         ` Markus Wichmann
  2019-02-07 13:42           ` Alexey Izbyshev
@ 2019-02-07 16:54           ` Rich Felker
  2019-02-07 18:36             ` Markus Wichmann
  2019-02-08 10:19             ` A. Wilcox
  1 sibling, 2 replies; 24+ messages in thread
From: Rich Felker @ 2019-02-07 16:54 UTC (permalink / raw)
  To: musl

On Thu, Feb 07, 2019 at 06:33:27AM +0100, Markus Wichmann wrote:
> On Thu, Feb 07, 2019 at 12:23:06AM +0300, Alexey Izbyshev wrote:
> > On 2019-02-06 23:25, Markus Wichmann wrote:
> > > Right you are. It took me a while to understand what the deps array was
> > > even for (since musl's dlclose() doesn't do anything, tracking
> > > dependencies is mostly pointless), but I found it is needed for lazy
> > > relocation processing. So it is necessary for all libs opened by
> > > dlopen() directly to contain a list of all their dependencies. All the
> > > other libs can have an empty list.
> > 
> > Actually, dso->deps is used in dlsym(handle) because it must use the
> > dependency order for symbol search, so it's incorrect to have deps empty for
> > "all the other" libs. Consider the following modification of my previous
> > example:
> > 
> > $ cat bazdep.c
> > int bazdep = 1;
> > extern int bazdepdep;
> > int *p = &bazdepdep;
> > $ cat bazdepdep.c
> > int bazdepdep = 2;
> > $ cat main.c
> > #include <dlfcn.h>
> > #include <stdio.h>
> > 
> > int main(void) {
> >   if (!dlopen("libbaz.so", RTLD_NOW|RTLD_LOCAL))
> >     return 1;
> >   if (!dlopen("libfoo.so", RTLD_NOW|RTLD_LOCAL))
> >     return 1;
> >   void *h = dlopen("libbazdep.so", RTLD_NOW|RTLD_LOCAL);
> >   printf("%p\n", dlsym(h, "bar"));
> >   printf("%p\n", dlsym(h, "bazdepdep"));
> > }
> > 
> > The correct output is zero in the first line and some non-zero address in
> > the second. Vanilla musl 1.1.21 prints two non-zero addresses. But with your
> > patch the output is two zeros because dlsym() can't search in dependencies
> > of "libbazdep.so" anymore.
> > 
> > Alexey
> 
> OK, so life just got more interesting. I gather the deps handling was
> always incorrect.
> 
> Let's consider the original code. liba depends on libb, which depends on
> libc. dlopen("liba") returns a handle with libb and libc in the deps,
> but libb->deps == 0. If we now call dlopen("libb"), that does the right
> thing, but only because libb happens to be the last lib in the chain. If
> we'd have loaded libx, liby, and libz before trying libb, it would add
> all the symbols of libs x, y, and z to the libb handle.
> 
> I guess the hope was that this situation never arrises. So how do we fix
> this?
> 
> I think the easiest is probably going to be to patch up load_deps, but
> avoiding recursion is going to be the fun part. My plan is to make
> dso->deps contain all direct and indirect dependencies (which is what
> the code seems to depend on, anyway). This is going to consume more
> memory, but we are talking a few pointers, and we are dealing with
> shared libs, anyway.

Yes, it needs to contain all direct and indirect dependencies, in the
correct order. Right now I think it incorrectly contains a superset of
that. Is that your assessment of the situation too?

> As you said, order is important. What is the correct order, depth-first
> or breadth-first? I think it should be depth-first, but lack any
> authoritative knowledge on this.

It's not; it's breadth-first. The definition of "dependency order"
used for dlsym is here:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html

    "Dependency ordering uses a breadth-first order starting with a
    given executable object file, then all of its dependencies, then
    any dependents of those, iterating until all dependencies are
    satisfied. With the exception of the global symbol table handle
    obtained via a dlopen() operation with a null pointer as the file
    argument, dependency ordering is used by the dlsym() function."

However, a depth-first list of dependencies is also needed to solve
the longstanding ctor-order issue discussed in several past threads
(which I can look up if search doesn't turn them up for you). This is
not a requirement of POSIX (which doesn't even have ctors), but it's a
reasonable expectation we currently get wrong and I think it might be
specified in ELF or some related sysv "standards".

I'm not sure if it makes sense to build both of these lists at the
same time. We currently try to avoid allocating dependency lists for
libraries loaded at startup in order not to allocate and setup data
that might never be used, defering until if/when dlopen is called on
them. I've wanted to do the ctor order walk without allocating a
dependency list, but I don't know a good way to do so. Note that the
code that runs ctors cannot allocate because, at the time it runs,
dlopen has already "committed" the load and can't back it out. It has
to have all resources it needs for the walk precommitted.

Since the beginning of a list of breadth-first dependencies is just
the direct dependencies, if we recorded the number of direct
dependencies for each dso, the breadth-first lists could be used to
perform a depth-first walk to run ctors; this can be made
non-recursive, at the cost of quadratic time but with trivially-tiny
constant factor, by simply restarting at the root of the tree after
each node and finding the deepest not-yet-constructed dso. This
suggests always allocating the breadth-first dependency lists might be
a good idea. That "feels" unfortunate since in principle they can get
large, but it's only going to be large for ridiculous bloatware with
hundreds of libraries, in which case these dependency lists are very
small on a relative scale.

> It would make the most sense, anyway
> (if, from the point of view of a user a library contains all the symbols
> of its dependencies, then those dependencies must also contain all the
> symbols of their dependencies). So with the following dependency tree:
> 
> liba->libb->libc
>     `>libx->liby
> 
> the handle for liba would list libc before libx.
> 
> Easiest implementation is probably still going to be recursive. Let's
> hope the dependency trees don't get too wild.

I don't think recursive can be done safely since you don't have a
small bound on levels of recursion. It could be done with an explicit
stack in allocated storage though, since the operation has to allocate
memory anyway and has to fail if allocation fails.

Rich


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07 13:42           ` Alexey Izbyshev
@ 2019-02-07 17:43             ` Markus Wichmann
  2019-02-07 20:37               ` Markus Wichmann
  2019-02-07 21:29               ` Rich Felker
  0 siblings, 2 replies; 24+ messages in thread
From: Markus Wichmann @ 2019-02-07 17:43 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 1861 bytes --]

On Thu, Feb 07, 2019 at 04:42:15PM +0300, Alexey Izbyshev wrote:
> I think the easiest way is simply to modify load_deps() to always traverse
> DT_NEEDED in breadth-first order without relying on the dso list in the
> outer loop. load_deps() already effectively maintains a queue (deps) that
> can be used for BFS, so no recursion is needed.

OK, since we have to implement a BFS, that does in fact work. So I
implemented that. Still needs testing, though.

One side effect is, patch 7 from the previous mail was reverted.

Another is that now load_deps() depends on the deps array as loop
structure. I was almost as far as just using the runtime code and adding
an a_crash() in case of allocation failure during loadtime, but then I
decided to just split loadtime and runtime apart. So
load_deps_loadtime() is just a copy of load_deps(), refactored with the
assumption runtime==0, and load_deps_runtime() is a copy of load_deps(),
with the patch discussed here and refactored under the assumption
runtime!=0.

I had noticed, during the refactoring, that this means that app->deps ==
{0}, always. So I wondered if that might bite us. However, the only
normal way to obtain a handle to the app itself is to call dlsym() with
RTLD_NEXT. Which is one of the special symbols that will load symbols
from the given DSO and all following ones in the symbol list. And all of
the main app's dependencies are immediately added to the symbol list
after the first load_deps() call (now load_deps_loadtime()).

For Rich's comfort, I am attaching patch 6 again, so all relevant
patches are in one mail.

> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html
> [2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html
> 

Right you are, again. Maybe I should read more POSIX before working on
these things. :-)

> Alexey
> 

Ciao,
Markus

[-- Attachment #2: 0006-Make-libc-and-vdso-explicitly-have-no-deps.patch --]
[-- Type: text/x-diff, Size: 1443 bytes --]

From e823910d69ff56ffccecaa9b29fd4b67b901798a Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Wed, 6 Feb 2019 16:51:53 +0100
Subject: [PATCH 6/9] Make libc and vdso explicitly have no deps.

Alexey Izbyshev reported that without this, dlopen("libc.so") returns a
handle that is capable of finding every symbol in libraries loaded as
dependencies, since dso->deps == 0 usually means dependencies haven't
been loaded.
---
 ldso/dynlink.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index ec921dfd..6ffeca85 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -1244,6 +1244,7 @@ static void reloc_all(struct dso *p)
 static void kernel_mapped_dso(struct dso *p)
 {
 	size_t min_addr = -1, max_addr = 0, cnt;
+	static const struct dso *sentinel = 0;
 	Phdr *ph = p->phdr;
 	for (cnt = p->phnum; cnt--; ph = (void *)((char *)ph + p->phentsize)) {
 		if (ph->p_type == PT_DYNAMIC) {
@@ -1428,6 +1429,7 @@ hidden void __dls2(unsigned char *base, size_t *sp)
 	ldso.phdr = laddr(&ldso, ehdr->e_phoff);
 	ldso.phentsize = ehdr->e_phentsize;
 	kernel_mapped_dso(&ldso);
+	ldso.deps = (struct dso**)&nodeps_dummy;
 	decode_dyn(&ldso);
 
 	if (DL_FDPIC) makefuncdescs(&ldso);
@@ -1675,6 +1677,7 @@ _Noreturn void __dls3(size_t *sp)
 		vdso.prev = tail;
 		tail->next = &vdso;
 		tail = &vdso;
+		vdso.deps = (struct dso**)&nodeps_dummy;
 	}
 
 	for (i=0; app.dynv[i]; i+=2) {
-- 
2.20.1


[-- Attachment #3: 0009-Fix-runtime-dependency-accounting-in-dlopen.patch --]
[-- Type: text/x-diff, Size: 3228 bytes --]

From 18008eb03acd59f6cbaa82c607f1969c70707e21 Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Thu, 7 Feb 2019 18:17:25 +0100
Subject: [PATCH 9/9] Fix runtime dependency accounting in dlopen().

As Alexey Izbyshev pointed out, the library given as argument to
dlopen() does not necessarily have to be the last in the chain. Nor do
any of the dependencies. Therefore it is wrong to assume that walking
the chain of libraries from any of them forward will only walk over
dependencies of the freshly loaded library.

I had to split the runtime and loadtime paths of load_deps() apart, or
otherwise I would have had to allocate the deps array for the
application at loadtime. And then I would have needed a resolution for
allocation failure, which would have been a crash.
---
 ldso/dynlink.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index 6ffeca85..66e6f18b 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -1136,10 +1136,10 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
 	return p;
 }
 
-static void load_deps(struct dso *p)
-{
-	size_t i, ndeps=0;
-	struct dso ***deps = &p->deps, **tmp, *dep;
+static void load_deps_loadtime(struct dso *p) {
+	size_t i;
+	struct dso *dep;
+	p->deps = (struct dso**)&nodeps_dummy;
 	for (; p; p=p->next) {
 		for (i=0; p->dynv[i]; i+=2) {
 			if (p->dynv[i] != DT_NEEDED) continue;
@@ -1147,19 +1147,32 @@ static void load_deps(struct dso *p)
 			if (!dep) {
 				error("Error loading shared library %s: %m (needed by %s)",
 					p->strings + p->dynv[i+1], p->name);
-				if (runtime) longjmp(*rtld_fail, 1);
-				continue;
 			}
-			if (runtime) {
-				tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2));
-				if (!tmp) longjmp(*rtld_fail, 1);
-				tmp[ndeps++] = dep;
-				tmp[ndeps] = 0;
-				*deps = tmp;
+		}
+	}
+}
+
+static void load_deps_runtime(struct dso *p)
+{
+	size_t i, ndeps=0, j=0;
+	struct dso ***deps = &p->deps, **tmp, *dep;
+	for (; p; p=(*deps)[j++]) {
+		for (i=0; p->dynv[i]; i+=2) {
+			if (p->dynv[i] != DT_NEEDED) continue;
+			dep = load_library(p->strings + p->dynv[i+1], p);
+			if (!dep) {
+				error("Error loading shared library %s: %m (needed by %s)",
+					p->strings + p->dynv[i+1], p->name);
+				longjmp(*rtld_fail, 1);
 			}
+			tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2));
+			if (!tmp) longjmp(*rtld_fail, 1);
+			tmp[ndeps++] = dep;
+			tmp[ndeps] = 0;
+			*deps = tmp;
 		}
 	}
-	if (!*deps) *deps = (struct dso **)&nodeps_dummy;
+	if (!*deps) *deps = (struct dso**)&nodeps_dummy;
 }
 
 static void load_preload(char *s)
@@ -1653,7 +1666,7 @@ _Noreturn void __dls3(size_t *sp)
 
 	/* Load preload/needed libraries, add symbols to global namespace. */
 	if (env_preload) load_preload(env_preload);
- 	load_deps(&app);
+ 	load_deps_loadtime(&app);
 	for (struct dso *p=head; p; p=p->next)
 		add_syms(p);
 
@@ -1836,7 +1849,7 @@ void *dlopen(const char *file, int mode)
 	/* First load handling */
 	int first_load = !p->deps;
 	if (first_load) {
-		load_deps(p);
+		load_deps_runtime(p);
 		if (!p->relocated && (mode & RTLD_LAZY)) {
 			prepare_lazy(p);
 			for (i=0; p->deps[i]; i++)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07 16:54           ` Rich Felker
@ 2019-02-07 18:36             ` Markus Wichmann
  2019-02-07 18:57               ` Rich Felker
  2019-02-08 10:19             ` A. Wilcox
  1 sibling, 1 reply; 24+ messages in thread
From: Markus Wichmann @ 2019-02-07 18:36 UTC (permalink / raw)
  To: musl

On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> Yes, it needs to contain all direct and indirect dependencies, in the
> correct order. Right now I think it incorrectly contains a superset of
> that. Is that your assessment of the situation too?
> 

Yes, it is a superset, if the original argument to load_deps() is not
the tail of the chain of libraries.

> However, a depth-first list of dependencies is also needed to solve
> the longstanding ctor-order issue discussed in several past threads
> (which I can look up if search doesn't turn them up for you). This is
> not a requirement of POSIX (which doesn't even have ctors), but it's a
> reasonable expectation we currently get wrong and I think it might be
> specified in ELF or some related sysv "standards".
> 

Requirements first, nice-to-haves later, right? Once we have the dlsym()
conformance issue squared away, we can still deal with this issue.

You mean, people expect their dependencies to be initialized in their
own initializers? When it is well-known that there is no order to
initializers, and e.g. the C++ standard says there is no order to
constructors of static objects.

> I'm not sure if it makes sense to build both of these lists at the
> same time. We currently try to avoid allocating dependency lists for
> libraries loaded at startup in order not to allocate and setup data
> that might never be used, defering until if/when dlopen is called on
> them. I've wanted to do the ctor order walk without allocating a
> dependency list, but I don't know a good way to do so. Note that the
> code that runs ctors cannot allocate because, at the time it runs,
> dlopen has already "committed" the load and can't back it out. It has
> to have all resources it needs for the walk precommitted.
> 

Depth first means stack. Means recursion or explicit stack. Explicit is
going to be hard, considering there is no known limit to dependency
nesting depth. We could argue that the user better make their stack size
ulimit large enough for the main thread, but I hazard a guess that
nobody expects dlopen() to use overly much stack in other threads.

Alright, what's the algorithm here?

init(lib):
    if (!lib.inited):
        foreach d in lib.deps init(d)
        start_init(lib)
        lib.inited = 1

That about right? Because that means we need a stack as high as the
nesting depth of dependencies. We can get a pessimistic estimation from
the number of deps loaded, but that may be way too high (see below).

> Since the beginning of a list of breadth-first dependencies is just
> the direct dependencies, if we recorded the number of direct
> dependencies for each dso, the breadth-first lists could be used to
> perform a depth-first walk to run ctors; this can be made
> non-recursive, at the cost of quadratic time but with trivially-tiny
> constant factor, by simply restarting at the root of the tree after
> each node and finding the deepest not-yet-constructed dso. This
> suggests always allocating the breadth-first dependency lists might be
> a good idea. That "feels" unfortunate since in principle they can get
> large, but it's only going to be large for ridiculous bloatware with
> hundreds of libraries, in which case these dependency lists are very
> small on a relative scale.
> 

$ ldd =mpv | wc -l
292

And that, I will remind you, is the "suckless" alternative to mplayer.

Anyway, that is breadth, not depth. Experimentally, it is hard to find a
chain of libraries deeper than ten. They all eventually just lead back
to libc.

So what you could do, when it comes time to run the initializers, is to
walk the dyn vectors again. Those are already allocated, and all the
libs they reference are already loaded, so load_library() would only be
a search operation. By the time the initializers run, that must be the
case, both at load time and at runtime. Unless loading the library
failed the first time, in which case at runtime we have already bailed.
And at load time we... haven't. Are you sure continuing on load failure
of a dependency during load time is a good idea?

> I don't think recursive can be done safely since you don't have a
> small bound on levels of recursion. It could be done with an explicit
> stack in allocated storage though, since the operation has to allocate
> memory anyway and has to fail if allocation fails.
> 
> Rich

Nevermind, it was a misunderstanding of the requirements.

Ciao,
Markus


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07 18:36             ` Markus Wichmann
@ 2019-02-07 18:57               ` Rich Felker
  2019-02-07 20:31                 ` Markus Wichmann
  0 siblings, 1 reply; 24+ messages in thread
From: Rich Felker @ 2019-02-07 18:57 UTC (permalink / raw)
  To: musl

On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> > However, a depth-first list of dependencies is also needed to solve
> > the longstanding ctor-order issue discussed in several past threads
> > (which I can look up if search doesn't turn them up for you). This is
> > not a requirement of POSIX (which doesn't even have ctors), but it's a
> > reasonable expectation we currently get wrong and I think it might be
> > specified in ELF or some related sysv "standards".
> 
> Requirements first, nice-to-haves later, right? Once we have the dlsym()
> conformance issue squared away, we can still deal with this issue.
> 
> You mean, people expect their dependencies to be initialized in their
> own initializers? When it is well-known that there is no order to
> initializers, and e.g. the C++ standard says there is no order to
> constructors of static objects.

Yes. GCC has an extension for ctor priority within static linking
scope, but for dynamic linking scope that doesn't help. I don't like
any of this but glib depends on it to avoid just doing the right thing
with pthread_once/call_once, and refuses to fix it.

> > I'm not sure if it makes sense to build both of these lists at the
> > same time. We currently try to avoid allocating dependency lists for
> > libraries loaded at startup in order not to allocate and setup data
> > that might never be used, defering until if/when dlopen is called on
> > them. I've wanted to do the ctor order walk without allocating a
> > dependency list, but I don't know a good way to do so. Note that the
> > code that runs ctors cannot allocate because, at the time it runs,
> > dlopen has already "committed" the load and can't back it out. It has
> > to have all resources it needs for the walk precommitted.
> 
> Depth first means stack. Means recursion or explicit stack. Explicit is
> going to be hard, considering there is no known limit to dependency
> nesting depth. We could argue that the user better make their stack size
> ulimit large enough for the main thread, but I hazard a guess that
> nobody expects dlopen() to use overly much stack in other threads.
> 
> Alright, what's the algorithm here?
> 
> init(lib):
>     if (!lib.inited):
>         foreach d in lib.deps init(d)
>         start_init(lib)
>         lib.inited = 1
> 
> That about right? Because that means we need a stack as high as the
> nesting depth of dependencies. We can get a pessimistic estimation from
> the number of deps loaded, but that may be way too high (see below).

Yes, but you can also avoid recursion just by looping to the deepest
dependency with !inited, then going back to the root. For a one-time
operation at dlopen-time or program-start time, the quadratic search
for each !inited seems unlikely to be a problem:

> > Since the beginning of a list of breadth-first dependencies is just
> > the direct dependencies, if we recorded the number of direct
> > dependencies for each dso, the breadth-first lists could be used to
> > perform a depth-first walk to run ctors; this can be made
> > non-recursive, at the cost of quadratic time but with trivially-tiny
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > constant factor, by simply restarting at the root of the tree after
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > each node and finding the deepest not-yet-constructed dso. This
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> $ ldd =mpv | wc -l
> 292
> 
> And that, I will remind you, is the "suckless" alternative to mplayer.
> 
> Anyway, that is breadth, not depth. Experimentally, it is hard to find a
> chain of libraries deeper than ten. They all eventually just lead back
> to libc.
> 

> So what you could do, when it comes time to run the initializers, is to
> walk the dyn vectors again. Those are already allocated, and all the
> libs they reference are already loaded, so load_library() would only be
> a search operation.

Note that each load_library is a linear search, so you still have the
quadratic time, now with much higher coefficient. And if you want to
use my go-back-to-root approach to avoid having O(depth) working
space, it's cubic time.

Also, as-written, load_library is not "only a search operation";
perhaps this is a bug in itself. For absolute paths it will always
open the file and compare inode identity to see if it matches an
already-loaded file. This should probably be bypassed by first
checking long-name. Fortunately absolute paths in DT_NEEDED are
strongly discouraged/a-really-bad-idea, so it probably doesn't come up
much in practice.

> By the time the initializers run, that must be the
> case, both at load time and at runtime. Unless loading the library
> failed the first time, in which case at runtime we have already bailed.
> And at load time we... haven't. Are you sure continuing on load failure
> of a dependency during load time is a good idea?

I don't follow. The dlopen operation is not committed until load of
all dependencies completes successfully, and if any fail to load, the
whole operation is backed-out. But ctors don't/can't run until *after*
that, when we've already committed to success.

Rich


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07 18:57               ` Rich Felker
@ 2019-02-07 20:31                 ` Markus Wichmann
  2019-02-07 21:33                   ` Rich Felker
  0 siblings, 1 reply; 24+ messages in thread
From: Markus Wichmann @ 2019-02-07 20:31 UTC (permalink / raw)
  To: musl

On Thu, Feb 07, 2019 at 01:57:36PM -0500, Rich Felker wrote:
> Yes. GCC has an extension for ctor priority within static linking
> scope, but for dynamic linking scope that doesn't help. I don't like
> any of this but glib depends on it to avoid just doing the right thing
> with pthread_once/call_once, and refuses to fix it.
> 

Well, at least we are on the same page, here. And my opinion of glib is
validated once more. Unfortunately, at this point it is too big to fail,
in several ways.

> Yes, but you can also avoid recursion just by looping to the deepest
> dependency with !inited, then going back to the root. For a one-time
> operation at dlopen-time or program-start time, the quadratic search
> for each !inited seems unlikely to be a problem:
> 

Wait, I have an idea. If the only ordering is that the dependencies need
to be initialized before their dependents, then couldn't we just
initialize the libs in reverse BFS order? The elements further down the
tree are all necessarily further down the list, aren't they?

> I don't follow. The dlopen operation is not committed until load of
> all dependencies completes successfully, and if any fail to load, the
> whole operation is backed-out. But ctors don't/can't run until *after*
> that, when we've already committed to success.
> 

That is true for the runtime case, i.e. dlopen(). But load_deps() is
also called at load time. And initializers have to run at load time,
too. And in the correct order.

If at load time, any dependencies fail to load, an error message is
printed and then the loop continues. load_deps() has no way to signal
failure to the caller, and at load time it will not exit the function in
another way, i.e. longjump (which is good since that would be invalid at
that time). So by the time the initializers are called, all dependencies
are loaded except those which failed.

> Rich

Ciao,
Markus


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07 17:43             ` Markus Wichmann
@ 2019-02-07 20:37               ` Markus Wichmann
  2019-02-07 21:29               ` Rich Felker
  1 sibling, 0 replies; 24+ messages in thread
From: Markus Wichmann @ 2019-02-07 20:37 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 316 bytes --]

I accidentally added a crash. Namely exactly here:

> +static void load_deps_runtime(struct dso *p)
> +{
> +	size_t i, ndeps=0, j=0;
> +	struct dso ***deps = &p->deps, **tmp, *dep;
> +	for (; p; p=(*deps)[j++]) {

*deps might be null pointer, which this expression is dereferencing.
Patch is attached.

Ciao,
Markus

[-- Attachment #2: 0010-Fix-crash-bug-from-previous-commit.patch --]
[-- Type: text/x-diff, Size: 983 bytes --]

From d90e719cccfe09439324b66cd8894d3781b00048 Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Thu, 7 Feb 2019 21:35:17 +0100
Subject: [PATCH 10/10] Fix crash bug from previous commit.

The previous revision would crash if dlopen() was called for a library
with no dependencies.
---
 ldso/dynlink.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index 66e6f18b..85c3db75 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -1136,7 +1136,8 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
 	return p;
 }
 
-static void load_deps_loadtime(struct dso *p) {
+static void load_deps_loadtime(struct dso *p)
+{
 	size_t i;
 	struct dso *dep;
 	p->deps = (struct dso**)&nodeps_dummy;
@@ -1171,6 +1172,7 @@ static void load_deps_runtime(struct dso *p)
 			tmp[ndeps] = 0;
 			*deps = tmp;
 		}
+		if (!*deps) break;
 	}
 	if (!*deps) *deps = (struct dso**)&nodeps_dummy;
 }
-- 
2.20.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07 17:43             ` Markus Wichmann
  2019-02-07 20:37               ` Markus Wichmann
@ 2019-02-07 21:29               ` Rich Felker
  1 sibling, 0 replies; 24+ messages in thread
From: Rich Felker @ 2019-02-07 21:29 UTC (permalink / raw)
  To: musl

On Thu, Feb 07, 2019 at 06:43:12PM +0100, Markus Wichmann wrote:
> On Thu, Feb 07, 2019 at 04:42:15PM +0300, Alexey Izbyshev wrote:
> > I think the easiest way is simply to modify load_deps() to always traverse
> > DT_NEEDED in breadth-first order without relying on the dso list in the
> > outer loop. load_deps() already effectively maintains a queue (deps) that
> > can be used for BFS, so no recursion is needed.
> 
> OK, since we have to implement a BFS, that does in fact work. So I
> implemented that. Still needs testing, though.

Comments below:

> One side effect is, patch 7 from the previous mail was reverted.
> 
> Another is that now load_deps() depends on the deps array as loop
> structure. I was almost as far as just using the runtime code and adding
> an a_crash() in case of allocation failure during loadtime, but then I
> decided to just split loadtime and runtime apart. So
> load_deps_loadtime() is just a copy of load_deps(), refactored with the
> assumption runtime==0, and load_deps_runtime() is a copy of load_deps(),
> with the patch discussed here and refactored under the assumption
> runtime!=0.

The error() function sets ldso_fail to true, which prevents running of
the program. So it should work just fine to keep them together, and to
build the deps lists at program start time if we want (which is still
an open question, I think).

> I had noticed, during the refactoring, that this means that app->deps ==
> {0}, always. So I wondered if that might bite us. However, the only
> normal way to obtain a handle to the app itself is to call dlsym() with
> RTLD_NEXT. Which is one of the special symbols that will load symbols
> from the given DSO and all following ones in the symbol list. And all of
> the main app's dependencies are immediately added to the symbol list
> after the first load_deps() call (now load_deps_loadtime()).

This seems correct.

> For Rich's comfort, I am attaching patch 6 again, so all relevant
> patches are in one mail.

One thing to fix in it, see below..

> From e823910d69ff56ffccecaa9b29fd4b67b901798a Mon Sep 17 00:00:00 2001
> From: Markus Wichmann <nullplan@gmx.net>
> Date: Wed, 6 Feb 2019 16:51:53 +0100
> Subject: [PATCH 6/9] Make libc and vdso explicitly have no deps.
> 
> Alexey Izbyshev reported that without this, dlopen("libc.so") returns a
> handle that is capable of finding every symbol in libraries loaded as
> dependencies, since dso->deps == 0 usually means dependencies haven't
> been loaded.
> ---
>  ldso/dynlink.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index ec921dfd..6ffeca85 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -1244,6 +1244,7 @@ static void reloc_all(struct dso *p)
>  static void kernel_mapped_dso(struct dso *p)
>  {
>  	size_t min_addr = -1, max_addr = 0, cnt;
> +	static const struct dso *sentinel = 0;

This fragment is unused and looks like cruft leftover from an earlier
idea.

>  	Phdr *ph = p->phdr;
>  	for (cnt = p->phnum; cnt--; ph = (void *)((char *)ph + p->phentsize)) {
>  		if (ph->p_type == PT_DYNAMIC) {
> @@ -1428,6 +1429,7 @@ hidden void __dls2(unsigned char *base, size_t *sp)
>  	ldso.phdr = laddr(&ldso, ehdr->e_phoff);
>  	ldso.phentsize = ehdr->e_phentsize;
>  	kernel_mapped_dso(&ldso);
> +	ldso.deps = (struct dso**)&nodeps_dummy;
>  	decode_dyn(&ldso);
>  
>  	if (DL_FDPIC) makefuncdescs(&ldso);
> @@ -1675,6 +1677,7 @@ _Noreturn void __dls3(size_t *sp)
>  		vdso.prev = tail;
>  		tail->next = &vdso;
>  		tail = &vdso;
> +		vdso.deps = (struct dso**)&nodeps_dummy;
>  	}

Style nit: (struct dso **)

> From 18008eb03acd59f6cbaa82c607f1969c70707e21 Mon Sep 17 00:00:00 2001
> From: Markus Wichmann <nullplan@gmx.net>
> Date: Thu, 7 Feb 2019 18:17:25 +0100
> Subject: [PATCH 9/9] Fix runtime dependency accounting in dlopen().
> 
> As Alexey Izbyshev pointed out, the library given as argument to
> dlopen() does not necessarily have to be the last in the chain. Nor do
> any of the dependencies. Therefore it is wrong to assume that walking
> the chain of libraries from any of them forward will only walk over
> dependencies of the freshly loaded library.
> 
> I had to split the runtime and loadtime paths of load_deps() apart, or
> otherwise I would have had to allocate the deps array for the
> application at loadtime. And then I would have needed a resolution for
> allocation failure, which would have been a crash.
> ---
>  ldso/dynlink.c | 43 ++++++++++++++++++++++++++++---------------
>  1 file changed, 28 insertions(+), 15 deletions(-)
> 
> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index 6ffeca85..66e6f18b 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -1136,10 +1136,10 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
>  	return p;
>  }
>  
> -static void load_deps(struct dso *p)
> -{
> -	size_t i, ndeps=0;
> -	struct dso ***deps = &p->deps, **tmp, *dep;
> +static void load_deps_loadtime(struct dso *p) {
> +	size_t i;
> +	struct dso *dep;
> +	p->deps = (struct dso**)&nodeps_dummy;
>  	for (; p; p=p->next) {
>  		for (i=0; p->dynv[i]; i+=2) {
>  			if (p->dynv[i] != DT_NEEDED) continue;
> @@ -1147,19 +1147,32 @@ static void load_deps(struct dso *p)
>  			if (!dep) {
>  				error("Error loading shared library %s: %m (needed by %s)",
>  					p->strings + p->dynv[i+1], p->name);
> -				if (runtime) longjmp(*rtld_fail, 1);
> -				continue;
>  			}
> -			if (runtime) {
> -				tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2));
> -				if (!tmp) longjmp(*rtld_fail, 1);
> -				tmp[ndeps++] = dep;
> -				tmp[ndeps] = 0;
> -				*deps = tmp;
> +		}
> +	}
> +}
> +
> +static void load_deps_runtime(struct dso *p)
> +{
> +	size_t i, ndeps=0, j=0;
> +	struct dso ***deps = &p->deps, **tmp, *dep;
> +	for (; p; p=(*deps)[j++]) {
> +		for (i=0; p->dynv[i]; i+=2) {
> +			if (p->dynv[i] != DT_NEEDED) continue;
> +			dep = load_library(p->strings + p->dynv[i+1], p);
> +			if (!dep) {
> +				error("Error loading shared library %s: %m (needed by %s)",
> +					p->strings + p->dynv[i+1], p->name);
> +				longjmp(*rtld_fail, 1);
>  			}
> +			tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2));
> +			if (!tmp) longjmp(*rtld_fail, 1);
> +			tmp[ndeps++] = dep;
> +			tmp[ndeps] = 0;
> +			*deps = tmp;
>  		}

Aside from above remark about not splitting the two versions, I don't
think the algorithm works. In the case of circular dependencies, which
are awful but do happen in the wild, the loop will run forever and
keep appending to the deps array. I think this can be fixed via
comparison of each new dep against prior slots (linear time for each
addition, so quadratic overall) to avoid adding it more than once, or
since we hold a lock, a tag could be added to struct dso to tag which
ones we've already hit (constant time). Since load_library is already
a linear search with a larger value of N than the number of
dependencies, I don't really see any advantage to avoiding the linear
search here, and would just go with it since it's simpler and less
invasive.

>  	}
> -	if (!*deps) *deps = (struct dso **)&nodeps_dummy;
> +	if (!*deps) *deps = (struct dso**)&nodeps_dummy;

Spurious style regression. :)

Rich


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07 20:31                 ` Markus Wichmann
@ 2019-02-07 21:33                   ` Rich Felker
  2019-02-07 21:37                     ` Rich Felker
  0 siblings, 1 reply; 24+ messages in thread
From: Rich Felker @ 2019-02-07 21:33 UTC (permalink / raw)
  To: musl

On Thu, Feb 07, 2019 at 09:31:38PM +0100, Markus Wichmann wrote:
> > Yes, but you can also avoid recursion just by looping to the deepest
> > dependency with !inited, then going back to the root. For a one-time
> > operation at dlopen-time or program-start time, the quadratic search
> > for each !inited seems unlikely to be a problem:
> > 
> 
> Wait, I have an idea. If the only ordering is that the dependencies need
> to be initialized before their dependents, then couldn't we just
> initialize the libs in reverse BFS order? The elements further down the
> tree are all necessarily further down the list, aren't they?

No. Suppose X depends on Y and Z, and Z also depends on Y. If you do
reverse-BFS order, you'll construct Z before Y, despite Z depending on
Y (and Z's ctors depending on Y's ctors already having run).

> > I don't follow. The dlopen operation is not committed until load of
> > all dependencies completes successfully, and if any fail to load, the
> > whole operation is backed-out. But ctors don't/can't run until *after*
> > that, when we've already committed to success.
> 
> That is true for the runtime case, i.e. dlopen(). But load_deps() is
> also called at load time. And initializers have to run at load time,
> too. And in the correct order.
> 
> If at load time, any dependencies fail to load, an error message is
> printed and then the loop continues. load_deps() has no way to signal
> failure to the caller, and at load time it will not exit the function in
> another way, i.e. longjump (which is good since that would be invalid at
> that time). So by the time the initializers are called, all dependencies
> are loaded except those which failed.

See the definition of error(). It sets ldso_fail so that execution
never proceeds to the program.

Rich


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07 21:33                   ` Rich Felker
@ 2019-02-07 21:37                     ` Rich Felker
  0 siblings, 0 replies; 24+ messages in thread
From: Rich Felker @ 2019-02-07 21:37 UTC (permalink / raw)
  To: musl

On Thu, Feb 07, 2019 at 04:33:18PM -0500, Rich Felker wrote:
> On Thu, Feb 07, 2019 at 09:31:38PM +0100, Markus Wichmann wrote:
> > > Yes, but you can also avoid recursion just by looping to the deepest
> > > dependency with !inited, then going back to the root. For a one-time
> > > operation at dlopen-time or program-start time, the quadratic search
> > > for each !inited seems unlikely to be a problem:
> > > 
> > 
> > Wait, I have an idea. If the only ordering is that the dependencies need
> > to be initialized before their dependents, then couldn't we just
> > initialize the libs in reverse BFS order? The elements further down the
> > tree are all necessarily further down the list, aren't they?
> 
> No. Suppose X depends on Y and Z, and Z also depends on Y. If you do
> reverse-BFS order, you'll construct Z before Y, despite Z depending on
> Y (and Z's ctors depending on Y's ctors already having run).

Hmm, maybe if you add redundant entries like your code was doing, this
works -- the list gets "YZY" for my example -- but then you waste a
good bit of space and need a way to cut off circular dependencies
(which can't respect ctor dependency order) while not breaking the
order for non-circular ones.

Rich


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-07 16:54           ` Rich Felker
  2019-02-07 18:36             ` Markus Wichmann
@ 2019-02-08 10:19             ` A. Wilcox
  2019-02-08 12:00               ` Szabolcs Nagy
  1 sibling, 1 reply; 24+ messages in thread
From: A. Wilcox @ 2019-02-08 10:19 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 883 bytes --]

On Feb 7, 2019, at 10:54 AM, Rich Felker <dalias@libc.org> wrote:
> 
> However, a depth-first list of dependencies is also needed to solve the longstanding ctor-order issue discussed in several past threads (which I can look up if search doesn't turn them up for you). This is not a requirement of POSIX (which doesn't even have ctors), but it's a reasonable expectation we currently get wrong and I think it might be specified in ELF or some related sysv "standards".


It is part of the ELF 1.2 standard and is not only required by GLib, but also Nouveau, gtk-doc, some Qt apps, and others.

More info, including the spec citation, example failures, and the musl ML thread, can be found on our BTS:

https://bts.adelielinux.org/show_bug.cgi?id=11

Best,
—arw 


--
A. Wilcox (Sent from my iPhone - not signed)
Project Lead, Adélie Linux
https://adelielinux.org

[-- Attachment #2: Type: text/html, Size: 1904 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-08 10:19             ` A. Wilcox
@ 2019-02-08 12:00               ` Szabolcs Nagy
  2019-02-08 16:09                 ` Rich Felker
  0 siblings, 1 reply; 24+ messages in thread
From: Szabolcs Nagy @ 2019-02-08 12:00 UTC (permalink / raw)
  To: musl

* A. Wilcox <awilfox@adelielinux.org> [2019-02-08 04:19:32 -0600]:
> On Feb 7, 2019, at 10:54 AM, Rich Felker <dalias@libc.org> wrote:
> > 
> > However, a depth-first list of dependencies is also needed to solve the longstanding ctor-order issue discussed in several past threads (which I can look up if search doesn't turn them up for you). This is not a requirement of POSIX (which doesn't even have ctors), but it's a reasonable expectation we currently get wrong and I think it might be specified in ELF or some related sysv "standards".
> 
> 
> It is part of the ELF 1.2 standard and is not only required by GLib, but also Nouveau, gtk-doc, some Qt apps, and others.

note, that the ctor order in c++ is undefined across translation
units so if a c++ application relies on it that's a bug: the c++
ctor concept does not necessarily map directly to the elf
initialization function concept on an elf platform so the elf
spec does not save them.

in the elf context, relying on DT_NEEDED for ctor ordering
across libraries means that static linking is broken since
then there is no DT_NEEDED.

so these packages should be fixed independently of musl
(assuming they want to be portable or at least static linkable).

that said, lot of libraries do rely on dependency based ctor
ordering even beyond the elf guarantees (glibc and solaris
dynamic linkers considers relocations on top of DT_NEEDED to
establish dependencies between dsos) and some applications
have dependency cycles which of course is impossible to get
right but glibc tries to keep the ctor order deterministic
in some way even in the presence of dlopen during init.

originally dynamic linkers did exactly what musl did, there
is a nice overview how the complexity evolved:
https://blogs.oracle.com/solaris/init-and-fini-processing-who-designed-this-v2

> More info, including the spec citation, example failures, and the musl ML thread, can be found on our BTS:
> 
> https://bts.adelielinux.org/show_bug.cgi?id=11
> 
> Best,
> —arw 
> 
> 
> --
> A. Wilcox (Sent from my iPhone - not signed)
> Project Lead, Adélie Linux
> https://adelielinux.org


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-08 12:00               ` Szabolcs Nagy
@ 2019-02-08 16:09                 ` Rich Felker
  0 siblings, 0 replies; 24+ messages in thread
From: Rich Felker @ 2019-02-08 16:09 UTC (permalink / raw)
  To: musl

On Fri, Feb 08, 2019 at 01:00:16PM +0100, Szabolcs Nagy wrote:
> * A. Wilcox <awilfox@adelielinux.org> [2019-02-08 04:19:32 -0600]:
> > On Feb 7, 2019, at 10:54 AM, Rich Felker <dalias@libc.org> wrote:
> > > 
> > > However, a depth-first list of dependencies is also needed to solve the longstanding ctor-order issue discussed in several past threads (which I can look up if search doesn't turn them up for you). This is not a requirement of POSIX (which doesn't even have ctors), but it's a reasonable expectation we currently get wrong and I think it might be specified in ELF or some related sysv "standards".
> > 
> > 
> > It is part of the ELF 1.2 standard and is not only required by GLib, but also Nouveau, gtk-doc, some Qt apps, and others.
> 
> note, that the ctor order in c++ is undefined across translation
> units so if a c++ application relies on it that's a bug: the c++
> ctor concept does not necessarily map directly to the elf
> initialization function concept on an elf platform so the elf
> spec does not save them.

In this case they're not using C++ but the GNU C extension.

> in the elf context, relying on DT_NEEDED for ctor ordering
> across libraries means that static linking is broken since
> then there is no DT_NEEDED.

This is true, but they can make static linking also work (within
libraries that share a convention on the priority values to use) by
using the GNU C ctor priority attributes. This could fix the glib bug.
I forget if any distros are doing it out-of-tree or if they just
patched the one ctor to call the other init function explicitly.

> so these packages should be fixed independently of musl
> (assuming they want to be portable or at least static linkable).

This is probably mostly correct.

> that said, lot of libraries do rely on dependency based ctor
> ordering even beyond the elf guarantees (glibc and solaris
> dynamic linkers considers relocations on top of DT_NEEDED to
> establish dependencies between dsos) and some applications
> have dependency cycles which of course is impossible to get
> right but glibc tries to keep the ctor order deterministic
> in some way even in the presence of dlopen during init.
> 
> originally dynamic linkers did exactly what musl did, there
> is a nice overview how the complexity evolved:
> https://blogs.oracle.com/solaris/init-and-fini-processing-who-designed-this-v2

Good to know there's precedent.

Rich


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-26 15:07   ` Rich Felker
@ 2019-03-04  2:11     ` Rich Felker
  0 siblings, 0 replies; 24+ messages in thread
From: Rich Felker @ 2019-03-04  2:11 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Tue, Feb 26, 2019 at 10:07:33AM -0500, Rich Felker wrote:
> On Sat, Feb 09, 2019 at 08:03:47PM -0500, Rich Felker wrote:
> > On Sun, Feb 10, 2019 at 01:53:02AM +0300, Alexey Izbyshev wrote:
> > > (Replying to https://www.openwall.com/lists/musl/2019/02/07/7)
> > > 
> > > >On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> > > >> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> > > >> > I'm not sure if it makes sense to build both of these lists at the
> > > >> > same time. We currently try to avoid allocating dependency lists for
> > > >> > libraries loaded at startup in order not to allocate and setup data
> > > >> > that might never be used, defering until if/when dlopen is called on
> > > >> > them. I've wanted to do the ctor order walk without allocating a
> > > >> > dependency list, but I don't know a good way to do so. Note that the
> > > >> > code that runs ctors cannot allocate because, at the time it runs,
> > > >> > dlopen has already "committed" the load and can't back it out. It has
> > > >> > to have all resources it needs for the walk precommitted.
> > > >>
> > > >> Depth first means stack. Means recursion or explicit stack. Explicit is
> > > >> going to be hard, considering there is no known limit to dependency
> > > >> nesting depth. We could argue that the user better make their stack size
> > > >> ulimit large enough for the main thread, but I hazard a guess that
> > > >> nobody expects dlopen() to use overly much stack in other threads.
> > > >>
> > > >> Alright, what's the algorithm here?
> > > >>
> > > >> init(lib):
> > > >>     if (!lib.inited):
> > > >>         foreach d in lib.deps init(d)
> > > >>         start_init(lib)
> > > >>         lib.inited = 1
> > > >>
> > > >> That about right? Because that means we need a stack as high as the
> > > >> nesting depth of dependencies. We can get a pessimistic estimation from
> > > >> the number of deps loaded, but that may be way too high (see below).
> > > >
> > > >Yes, but you can also avoid recursion just by looping to the deepest
> > > >dependency with !inited, then going back to the root. For a one-time
> > > >operation at dlopen-time or program-start time, the quadratic search
> > > >for each !inited seems unlikely to be a problem:
> > > >
> > > >> > Since the beginning of a list of breadth-first dependencies is just
> > > >> > the direct dependencies, if we recorded the number of direct
> > > >> > dependencies for each dso, the breadth-first lists could be used to
> > > >> > perform a depth-first walk to run ctors; this can be made
> > > >> > non-recursive, at the cost of quadratic time but with trivially-tiny
> > > >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > >> > constant factor, by simply restarting at the root of the tree after
> > > >    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > >> > each node and finding the deepest not-yet-constructed dso. This
> > > >    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > 
> > > We'd need breadth-first lists for all dsos to implement DFS as you
> > > suggest because a single breadth-first list for the root is not
> > > enough to reconstruct all edges of the dependency graph even if we
> > > know the number of direct dependencies for each dso (we need to
> > > process each edge for DFS to work correctly). Or am I
> > > misunderstanding your suggestion?
> > 
> > I think you misunderstood. I was talking about having the full BFS
> > list for each dso, but tagging each one with the count of entries
> > which are direct dependencies, so that the corresponding initial
> > subarray can be used for DFS.
> > 
> > > If we're willing to add some fields to struct dso to implement the
> > > correct ctor order, I'd suggest to use a non-recursive DFS with an
> > > explicit stack instead. We'd need just two fields in each stack slot
> > > (dso and the number of processed DT_NEEDED entries), so the memory
> > > overhead is low compared to sizeof(struct dso). The stack can be
> > > preallocated before "committing" dlopen(). As Markus said, the upper
> > > bound is the number of dsos loaded with this dlopen() call because
> > > all other dsos are already constructed, so we don't need to visit
> > > them. This stack can be freed immediately after ctors are run. We
> > > already have dso->constructed to mark visited dsos, and we already
> > > use another list for finalization (dso->fini_next), and that list
> > > would be built naturally by DFS.
> > 
> > I think it might get more complicated with concurrent and recursive
> > calls to dlopen (e.g. dlopen from a ctor). Right now the
> > init_fini_lock prevents a dlopen from another thread from completing
> > as long as ctors are still running from a previous call, except
> > recursively (in the same thread) thanks to a recursive mutex. Really,
> > though, we should never be holding libc locks while application code
> > runs, so the lock should be released each time a ctor is run and
> > re-taken after it finishes.
> > 
> > I'd considered preallocating the explicit stack structure as part of
> > the dsos, which is problematic because multiple callers might need to
> > use it. However if the explicit stack were allocated by and just tied
> > to the dlopen call frame instance, I think it should work. There may
> > be deps that were already loaded from another thread's dlopen, though,
> > but which were not yet constructed, so the bound is probably not the
> > number of newly-loaded dsos but the total number of deps or the max
> > depth of deps. This is probably each to record from building the BFS
> > list.
> 
> I've gotten back to thinking about this, and here's a rough idea of
> the algorithm I'm thinking of using:
> 
> Load-direct-deps operation for a single DSO:
> - Count DT_NEEDED entries, store number of direct deps.
> - Allocate storage for direct deps list.
> - Call load_library for each direct dep, save pointer to it.
> 
> Load-deps operation for newly dlopen'd DSO:
> - Start with all DSOs unmarked
> - Use the deps list as a queue. For each entry in queue:
>   - If the DSO is marked, skip it.
>   - If the DSO doesn't have a deps list, perform load-direct-deps.
>   - Append the initial n_direct_deps entries to the queue.
>   - Mark the DSO.
> - When end of queue is reached, finish and unmark all DSOs.
> 
> This should leave us with direct-dep lists for all DSOs, usable for
> dependency-order ctor execution, and a non-redundant (currently, the
> list can be highly redundant and thus inefficient to dlsym) BFS deps
> list for the dlopen'd DSO. The direct-dep lists for DSOs not
> explicitly opened also form the initial segment of any future BFS deps
> list needed when dlopen is called on them.
> 
> The above is entirely non-recursive and uses no temp storage; the data
> is generated in-place. Walking the direct-deps lists for ctor
> execution can either use recursion or the "restart" approach I've
> described before.

These and related changes are now upstream in git master, so the dlsym
issue should be fixed. Please let me know if it still doesn't work
right or if there are new problems.

Rich


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-10  1:03 ` Rich Felker
@ 2019-02-26 15:07   ` Rich Felker
  2019-03-04  2:11     ` Rich Felker
  0 siblings, 1 reply; 24+ messages in thread
From: Rich Felker @ 2019-02-26 15:07 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Sat, Feb 09, 2019 at 08:03:47PM -0500, Rich Felker wrote:
> On Sun, Feb 10, 2019 at 01:53:02AM +0300, Alexey Izbyshev wrote:
> > (Replying to https://www.openwall.com/lists/musl/2019/02/07/7)
> > 
> > >On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> > >> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> > >> > I'm not sure if it makes sense to build both of these lists at the
> > >> > same time. We currently try to avoid allocating dependency lists for
> > >> > libraries loaded at startup in order not to allocate and setup data
> > >> > that might never be used, defering until if/when dlopen is called on
> > >> > them. I've wanted to do the ctor order walk without allocating a
> > >> > dependency list, but I don't know a good way to do so. Note that the
> > >> > code that runs ctors cannot allocate because, at the time it runs,
> > >> > dlopen has already "committed" the load and can't back it out. It has
> > >> > to have all resources it needs for the walk precommitted.
> > >>
> > >> Depth first means stack. Means recursion or explicit stack. Explicit is
> > >> going to be hard, considering there is no known limit to dependency
> > >> nesting depth. We could argue that the user better make their stack size
> > >> ulimit large enough for the main thread, but I hazard a guess that
> > >> nobody expects dlopen() to use overly much stack in other threads.
> > >>
> > >> Alright, what's the algorithm here?
> > >>
> > >> init(lib):
> > >>     if (!lib.inited):
> > >>         foreach d in lib.deps init(d)
> > >>         start_init(lib)
> > >>         lib.inited = 1
> > >>
> > >> That about right? Because that means we need a stack as high as the
> > >> nesting depth of dependencies. We can get a pessimistic estimation from
> > >> the number of deps loaded, but that may be way too high (see below).
> > >
> > >Yes, but you can also avoid recursion just by looping to the deepest
> > >dependency with !inited, then going back to the root. For a one-time
> > >operation at dlopen-time or program-start time, the quadratic search
> > >for each !inited seems unlikely to be a problem:
> > >
> > >> > Since the beginning of a list of breadth-first dependencies is just
> > >> > the direct dependencies, if we recorded the number of direct
> > >> > dependencies for each dso, the breadth-first lists could be used to
> > >> > perform a depth-first walk to run ctors; this can be made
> > >> > non-recursive, at the cost of quadratic time but with trivially-tiny
> > >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >> > constant factor, by simply restarting at the root of the tree after
> > >    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >> > each node and finding the deepest not-yet-constructed dso. This
> > >    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > We'd need breadth-first lists for all dsos to implement DFS as you
> > suggest because a single breadth-first list for the root is not
> > enough to reconstruct all edges of the dependency graph even if we
> > know the number of direct dependencies for each dso (we need to
> > process each edge for DFS to work correctly). Or am I
> > misunderstanding your suggestion?
> 
> I think you misunderstood. I was talking about having the full BFS
> list for each dso, but tagging each one with the count of entries
> which are direct dependencies, so that the corresponding initial
> subarray can be used for DFS.
> 
> > If we're willing to add some fields to struct dso to implement the
> > correct ctor order, I'd suggest to use a non-recursive DFS with an
> > explicit stack instead. We'd need just two fields in each stack slot
> > (dso and the number of processed DT_NEEDED entries), so the memory
> > overhead is low compared to sizeof(struct dso). The stack can be
> > preallocated before "committing" dlopen(). As Markus said, the upper
> > bound is the number of dsos loaded with this dlopen() call because
> > all other dsos are already constructed, so we don't need to visit
> > them. This stack can be freed immediately after ctors are run. We
> > already have dso->constructed to mark visited dsos, and we already
> > use another list for finalization (dso->fini_next), and that list
> > would be built naturally by DFS.
> 
> I think it might get more complicated with concurrent and recursive
> calls to dlopen (e.g. dlopen from a ctor). Right now the
> init_fini_lock prevents a dlopen from another thread from completing
> as long as ctors are still running from a previous call, except
> recursively (in the same thread) thanks to a recursive mutex. Really,
> though, we should never be holding libc locks while application code
> runs, so the lock should be released each time a ctor is run and
> re-taken after it finishes.
> 
> I'd considered preallocating the explicit stack structure as part of
> the dsos, which is problematic because multiple callers might need to
> use it. However if the explicit stack were allocated by and just tied
> to the dlopen call frame instance, I think it should work. There may
> be deps that were already loaded from another thread's dlopen, though,
> but which were not yet constructed, so the bound is probably not the
> number of newly-loaded dsos but the total number of deps or the max
> depth of deps. This is probably each to record from building the BFS
> list.

I've gotten back to thinking about this, and here's a rough idea of
the algorithm I'm thinking of using:

Load-direct-deps operation for a single DSO:
- Count DT_NEEDED entries, store number of direct deps.
- Allocate storage for direct deps list.
- Call load_library for each direct dep, save pointer to it.

Load-deps operation for newly dlopen'd DSO:
- Start with all DSOs unmarked
- Use the deps list as a queue. For each entry in queue:
  - If the DSO is marked, skip it.
  - If the DSO doesn't have a deps list, perform load-direct-deps.
  - Append the initial n_direct_deps entries to the queue.
  - Mark the DSO.
- When end of queue is reached, finish and unmark all DSOs.

This should leave us with direct-dep lists for all DSOs, usable for
dependency-order ctor execution, and a non-redundant (currently, the
list can be highly redundant and thus inefficient to dlsym) BFS deps
list for the dlopen'd DSO. The direct-dep lists for DSOs not
explicitly opened also form the initial segment of any future BFS deps
list needed when dlopen is called on them.

The above is entirely non-recursive and uses no temp storage; the data
is generated in-place. Walking the direct-deps lists for ctor
execution can either use recursion or the "restart" approach I've
described before.

Rich


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
  2019-02-09 22:53 Alexey Izbyshev
@ 2019-02-10  1:03 ` Rich Felker
  2019-02-26 15:07   ` Rich Felker
  0 siblings, 1 reply; 24+ messages in thread
From: Rich Felker @ 2019-02-10  1:03 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Sun, Feb 10, 2019 at 01:53:02AM +0300, Alexey Izbyshev wrote:
> (Replying to https://www.openwall.com/lists/musl/2019/02/07/7)
> 
> >On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> >> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> >> > I'm not sure if it makes sense to build both of these lists at the
> >> > same time. We currently try to avoid allocating dependency lists for
> >> > libraries loaded at startup in order not to allocate and setup data
> >> > that might never be used, defering until if/when dlopen is called on
> >> > them. I've wanted to do the ctor order walk without allocating a
> >> > dependency list, but I don't know a good way to do so. Note that the
> >> > code that runs ctors cannot allocate because, at the time it runs,
> >> > dlopen has already "committed" the load and can't back it out. It has
> >> > to have all resources it needs for the walk precommitted.
> >>
> >> Depth first means stack. Means recursion or explicit stack. Explicit is
> >> going to be hard, considering there is no known limit to dependency
> >> nesting depth. We could argue that the user better make their stack size
> >> ulimit large enough for the main thread, but I hazard a guess that
> >> nobody expects dlopen() to use overly much stack in other threads.
> >>
> >> Alright, what's the algorithm here?
> >>
> >> init(lib):
> >>     if (!lib.inited):
> >>         foreach d in lib.deps init(d)
> >>         start_init(lib)
> >>         lib.inited = 1
> >>
> >> That about right? Because that means we need a stack as high as the
> >> nesting depth of dependencies. We can get a pessimistic estimation from
> >> the number of deps loaded, but that may be way too high (see below).
> >
> >Yes, but you can also avoid recursion just by looping to the deepest
> >dependency with !inited, then going back to the root. For a one-time
> >operation at dlopen-time or program-start time, the quadratic search
> >for each !inited seems unlikely to be a problem:
> >
> >> > Since the beginning of a list of breadth-first dependencies is just
> >> > the direct dependencies, if we recorded the number of direct
> >> > dependencies for each dso, the breadth-first lists could be used to
> >> > perform a depth-first walk to run ctors; this can be made
> >> > non-recursive, at the cost of quadratic time but with trivially-tiny
> >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> > constant factor, by simply restarting at the root of the tree after
> >    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> > each node and finding the deepest not-yet-constructed dso. This
> >    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> We'd need breadth-first lists for all dsos to implement DFS as you
> suggest because a single breadth-first list for the root is not
> enough to reconstruct all edges of the dependency graph even if we
> know the number of direct dependencies for each dso (we need to
> process each edge for DFS to work correctly). Or am I
> misunderstanding your suggestion?

I think you misunderstood. I was talking about having the full BFS
list for each dso, but tagging each one with the count of entries
which are direct dependencies, so that the corresponding initial
subarray can be used for DFS.

> If we're willing to add some fields to struct dso to implement the
> correct ctor order, I'd suggest to use a non-recursive DFS with an
> explicit stack instead. We'd need just two fields in each stack slot
> (dso and the number of processed DT_NEEDED entries), so the memory
> overhead is low compared to sizeof(struct dso). The stack can be
> preallocated before "committing" dlopen(). As Markus said, the upper
> bound is the number of dsos loaded with this dlopen() call because
> all other dsos are already constructed, so we don't need to visit
> them. This stack can be freed immediately after ctors are run. We
> already have dso->constructed to mark visited dsos, and we already
> use another list for finalization (dso->fini_next), and that list
> would be built naturally by DFS.

I think it might get more complicated with concurrent and recursive
calls to dlopen (e.g. dlopen from a ctor). Right now the
init_fini_lock prevents a dlopen from another thread from completing
as long as ctors are still running from a previous call, except
recursively (in the same thread) thanks to a recursive mutex. Really,
though, we should never be holding libc locks while application code
runs, so the lock should be released each time a ctor is run and
re-taken after it finishes.

I'd considered preallocating the explicit stack structure as part of
the dsos, which is problematic because multiple callers might need to
use it. However if the explicit stack were allocated by and just tied
to the dlopen call frame instance, I think it should work. There may
be deps that were already loaded from another thread's dlopen, though,
but which were not yet constructed, so the bound is probably not the
number of newly-loaded dsos but the total number of deps or the max
depth of deps. This is probably each to record from building the BFS
list.

Rich


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: dlsym(handle) may search in unrelated libraries
@ 2019-02-09 22:53 Alexey Izbyshev
  2019-02-10  1:03 ` Rich Felker
  0 siblings, 1 reply; 24+ messages in thread
From: Alexey Izbyshev @ 2019-02-09 22:53 UTC (permalink / raw)
  To: musl

(Replying to https://www.openwall.com/lists/musl/2019/02/07/7)

> On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> > On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> > > I'm not sure if it makes sense to build both of these lists at the
> > > same time. We currently try to avoid allocating dependency lists for
> > > libraries loaded at startup in order not to allocate and setup data
> > > that might never be used, defering until if/when dlopen is called on
> > > them. I've wanted to do the ctor order walk without allocating a
> > > dependency list, but I don't know a good way to do so. Note that the
> > > code that runs ctors cannot allocate because, at the time it runs,
> > > dlopen has already "committed" the load and can't back it out. It has
> > > to have all resources it needs for the walk precommitted.
> >
> > Depth first means stack. Means recursion or explicit stack. Explicit is
> > going to be hard, considering there is no known limit to dependency
> > nesting depth. We could argue that the user better make their stack size
> > ulimit large enough for the main thread, but I hazard a guess that
> > nobody expects dlopen() to use overly much stack in other threads.
> >
> > Alright, what's the algorithm here?
> >
> > init(lib):
> >     if (!lib.inited):
> >         foreach d in lib.deps init(d)
> >         start_init(lib)
> >         lib.inited = 1
> >
> > That about right? Because that means we need a stack as high as the
> > nesting depth of dependencies. We can get a pessimistic estimation from
> > the number of deps loaded, but that may be way too high (see below).
> 
> Yes, but you can also avoid recursion just by looping to the deepest
> dependency with !inited, then going back to the root. For a one-time
> operation at dlopen-time or program-start time, the quadratic search
> for each !inited seems unlikely to be a problem:
> 
> > > Since the beginning of a list of breadth-first dependencies is just
> > > the direct dependencies, if we recorded the number of direct
> > > dependencies for each dso, the breadth-first lists could be used to
> > > perform a depth-first walk to run ctors; this can be made
> > > non-recursive, at the cost of quadratic time but with trivially-tiny
>                    
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > constant factor, by simply restarting at the root of the tree after
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > each node and finding the deepest not-yet-constructed dso. This
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We'd need breadth-first lists for all dsos to implement DFS as you 
suggest because a single breadth-first list for the root is not enough 
to reconstruct all edges of the dependency graph even if we know the 
number of direct dependencies for each dso (we need to process each edge 
for DFS to work correctly). Or am I misunderstanding your suggestion?

If we're willing to add some fields to struct dso to implement the 
correct ctor order, I'd suggest to use a non-recursive DFS with an 
explicit stack instead. We'd need just two fields in each stack slot 
(dso and the number of processed DT_NEEDED entries), so the memory 
overhead is low compared to sizeof(struct dso). The stack can be 
preallocated before "committing" dlopen(). As Markus said, the upper 
bound is the number of dsos loaded with this dlopen() call because all 
other dsos are already constructed, so we don't need to visit them. This 
stack can be freed immediately after ctors are run. We already have 
dso->constructed to mark visited dsos, and we already use another list 
for finalization (dso->fini_next), and that list would be built 
naturally by DFS.

Alexey


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-03-04  2:11 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-05 21:02 dlsym(handle) may search in unrelated libraries Alexey Izbyshev
2019-02-06 13:40 ` Alexey Izbyshev
2019-02-06 16:02 ` Markus Wichmann
2019-02-06 17:02   ` Alexey Izbyshev
2019-02-06 20:25     ` Markus Wichmann
2019-02-06 21:23       ` Alexey Izbyshev
2019-02-07  5:33         ` Markus Wichmann
2019-02-07 13:42           ` Alexey Izbyshev
2019-02-07 17:43             ` Markus Wichmann
2019-02-07 20:37               ` Markus Wichmann
2019-02-07 21:29               ` Rich Felker
2019-02-07 16:54           ` Rich Felker
2019-02-07 18:36             ` Markus Wichmann
2019-02-07 18:57               ` Rich Felker
2019-02-07 20:31                 ` Markus Wichmann
2019-02-07 21:33                   ` Rich Felker
2019-02-07 21:37                     ` Rich Felker
2019-02-08 10:19             ` A. Wilcox
2019-02-08 12:00               ` Szabolcs Nagy
2019-02-08 16:09                 ` Rich Felker
2019-02-09 22:53 Alexey Izbyshev
2019-02-10  1:03 ` Rich Felker
2019-02-26 15:07   ` Rich Felker
2019-03-04  2:11     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).