mailing list of musl libc
* Reviving planned ldso changes
@ 2017-01-03  5:43 Rich Felker
  2017-01-04  6:06 ` Rich Felker
  2017-01-04 10:51 ` Szabolcs Nagy
  0 siblings, 2 replies; 21+ messages in thread
From: Rich Felker @ 2017-01-03  5:43 UTC (permalink / raw)
  To: musl

One item that's been on the agenda for a long time (maybe 1.5 years
now?) is planned dynamic linker improvements. The thread "Further
dynamic linker optimizations" from summer 2015 covered parts but maybe
not all of what I had in mind.

- Instead of a global flag for each library, have a separate linked
  list for globally visible libraries. This will speed up symbol
  lookup (by skipping non-global ones, and skipping checking the
  global flag) and also makes it so that load order searching can be
  done based on the order in which the libraries became global, rather
  than their original load order when they might have been non-global.

- Dependency-order ctor execution. I think the walk order looks
  something like, starting at a given node (initially the main app or
  new library being loaded dynamically), traversing to its first
  deps[] entry that hasn't been constructed, or, if none remain,
  executing its ctors then traversing back to its needed_by. This
  process avoids the need for any call recursion and should be
  near-optimal (if not optimal) provided the position in a dso's
  deps[] list is saved in the dso struct (rough sketch below, after
  this list).

- Recursive dlopen improvements. The thread "dlopen deadlock" from
  January 2016 covers some of the issues. I don't think we reached a
  complete conclusion/solution, but for reasonable cases where it's
  clear that dlopen *should* work without deadlock, I think we have a
  proposed design that works.

- Possibly other optimizations I've left out...
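
For the ctor-execution item, the walk I have in mind is roughly the
following (sketch only, ignoring circular deps and locking for now;
run_ctors() stands in for the existing DT_INIT/DT_INIT_ARRAY handling
in do_init_fini, and next_dep names the saved deps[] position
mentioned above):

	while (p && !p->constructed) {
		/* skip deps that have already been constructed */
		while (p->deps && p->deps[p->next_dep]
		       && p->deps[p->next_dep]->constructed)
			p->next_dep++;
		/* descend into the first unconstructed dep, if any */
		if (p->deps && p->deps[p->next_dep]) {
			p = p->deps[p->next_dep];
			continue;
		}
		/* no unconstructed deps remain; run this dso's ctors */
		run_ctors(p);
		p->constructed = 1;
		/* back up to the dso that first pulled this one in */
		p = p->needed_by;
	}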

Rich



* Re: Reviving planned ldso changes
  2017-01-03  5:43 Reviving planned ldso changes Rich Felker
@ 2017-01-04  6:06 ` Rich Felker
  2017-01-04  6:22   ` Rich Felker
  2017-01-04 10:51 ` Szabolcs Nagy
  1 sibling, 1 reply; 21+ messages in thread
From: Rich Felker @ 2017-01-04  6:06 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 834 bytes --]

On Tue, Jan 03, 2017 at 12:43:51AM -0500, Rich Felker wrote:
> - Dependency-order ctor execution. I think the walk order looks
>   something like, starting at a given node (initially the main app or
>   new library being loaded dynamically), traversing to its first
>   deps[] entry that hasn't been constructed, or, if none remain,
>   executing its ctors then traversing back to its needed_by. This
>   process avoids the need for any call recursion and should be
>   near-optimal (if not optimal) provided the position in a dso's
>   deps[] list is saved in the dso struct.

Attached is a proposed, completely untested patch to implement
dependency-order ctor execution, posted for review/comments. For it to
work, I think p->deps also needs to be set up unconditionally (right
now it's only set up for dynamically loaded libs).

Rich

[-- Attachment #2: ctor_dep_order.diff --]
[-- Type: text/plain, Size: 1540 bytes --]

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index c689084..cb82b2c 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -67,6 +67,7 @@ struct dso {
 	char constructed;
 	char kernel_mapped;
 	struct dso **deps, *needed_by;
+	size_t next_dep;
 	char *rpath_orig, *rpath;
 	struct tls_module tls;
 	size_t tls_id;
@@ -1211,13 +1212,14 @@ void __libc_exit_fini()
 static void do_init_fini(struct dso *p)
 {
 	size_t dyn[DYN_CNT];
-	int need_locking = libc.threads_minus_1;
-	/* Allow recursive calls that arise when a library calls
-	 * dlopen from one of its constructors, but block any
-	 * other threads until all ctors have finished. */
-	if (need_locking) pthread_mutex_lock(&init_fini_lock);
-	for (; p; p=p->prev) {
-		if (p->constructed) continue;
+	pthread_mutex_lock(&init_fini_lock);
+	while (!p->constructed) {
+		while (p->deps[p->next_dep] && p->deps[p->next_dep]->constructed)
+			p->next_dep++;
+		if (p->deps[p->next_dep]) {
+			p = p->deps[p->next_dep++];
+			continue;
+		}
 		p->constructed = 1;
 		decode_vec(p->dynv, dyn, DYN_CNT);
 		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
@@ -1233,12 +1235,9 @@ static void do_init_fini(struct dso *p)
 			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
 			while (n--) ((void (*)(void))*fn++)();
 		}
-		if (!need_locking && libc.threads_minus_1) {
-			need_locking = 1;
-			pthread_mutex_lock(&init_fini_lock);
-		}
+		p = p->needed_by;
 	}
-	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
+	pthread_mutex_unlock(&init_fini_lock);
 }
 
 void __libc_start_init(void)


* Re: Reviving planned ldso changes
  2017-01-04  6:06 ` Rich Felker
@ 2017-01-04  6:22   ` Rich Felker
  2017-01-04 19:36     ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2017-01-04  6:22 UTC (permalink / raw)
  To: musl

On Wed, Jan 04, 2017 at 01:06:40AM -0500, Rich Felker wrote:
> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index c689084..cb82b2c 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -67,6 +67,7 @@ struct dso {
>  	char constructed;
>  	char kernel_mapped;
>  	struct dso **deps, *needed_by;
> +	size_t next_dep;
>  	char *rpath_orig, *rpath;
>  	struct tls_module tls;
>  	size_t tls_id;
> @@ -1211,13 +1212,14 @@ void __libc_exit_fini()
>  static void do_init_fini(struct dso *p)
>  {
>  	size_t dyn[DYN_CNT];
> -	int need_locking = libc.threads_minus_1;
> -	/* Allow recursive calls that arise when a library calls
> -	 * dlopen from one of its constructors, but block any
> -	 * other threads until all ctors have finished. */
> -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> -	for (; p; p=p->prev) {
> -		if (p->constructed) continue;
> +	pthread_mutex_lock(&init_fini_lock);
> +	while (!p->constructed) {
> +		while (p->deps[p->next_dep] && p->deps[p->next_dep]->constructed)
> +			p->next_dep++;
> +		if (p->deps[p->next_dep]) {
> +			p = p->deps[p->next_dep++];
> +			continue;
> +		}

I think this logic is probably broken in the case of circular
dependencies, and will end up skipping ctors for some deps due to the
increment in this line:

+			p = p->deps[p->next_dep++];

which happens before the new p's ctors have actually run. Omitting the
increment here, however, would turn it into an infinite loop. We need
some way to detect this case and let the dso with an indirect
dependency on itself run its ctors once all non-circular deps have
been satisfied. I don't know the right algorithm for this right off,
though; suggestions would be welcome.
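
A concrete example of the skip, if I'm reading the loop right: take

	a.deps = { b, 0 };  b.deps = { a, 0 };

Starting at a, we descend a->b (a.next_dep becomes 1), then follow the
cycle b->a (b.next_dep becomes 1). Back at a there are no deps entries
left to visit, so a's ctors run and we back up via a.needed_by; b's
ctors are never reached. Dropping the increment instead just bounces
between a and b forever.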

Rich



* Re: Reviving planned ldso changes
  2017-01-03  5:43 Reviving planned ldso changes Rich Felker
  2017-01-04  6:06 ` Rich Felker
@ 2017-01-04 10:51 ` Szabolcs Nagy
  2017-02-16  1:58   ` Szabolcs Nagy
  1 sibling, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2017-01-04 10:51 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2017-01-03 00:43:51 -0500]:
> One item that's been on the agenda for a long time (maybe 1.5 years
> now?) is planned dynamic linker improvements. The thread "Further
> dynamic linker optimizations" from summer 2015 covered parts but maybe
> not all of what I had in mind.
> 
> - Instead of a global flag for each library, have a separate linked
>   list for globally visible libraries. This will speed up symbol
>   lookup (by skipping non-global ones, and skipping checking the
>   global flag) and also makes it so that load order searching can be
>   done based on the order in which the libraries became global, rather
>   than their original load order when they might have been non-global.
> 
> - Dependency-order ctor execution. I think the walk order looks
>   something like, starting at a given node (initially the main app or
>   new library being loaded dynamically), traversing to its first
>   deps[] entry that hasn't been constructed, or, if none remain,
>   executing its ctors then traversing back to its needed_by. This
>   process avoids the need for any call recursion and should be
>   near-optimal (if not optimal) provided the position in a dso's
>   deps[] list is saved in the dso struct.
> 
> - Recursive dlopen improvements. The thread "dlopen deadlock" from
>   January 2016 covers some of the issues. I don't think we reached a
>   complete conclusion/solution, but for reasonable cases where it's
>   clear that dlopen *should* work without deadlock, I think we have a
>   proposed design that works.
> 
> - Possibly other optimizations I've left out...

- symbol lookup logic is duplicated between do_dlsym and find_sym,
  it would be nice to do that with a common code path if possible.



* Re: Reviving planned ldso changes
  2017-01-04  6:22   ` Rich Felker
@ 2017-01-04 19:36     ` Rich Felker
  2017-01-14 21:30       ` A. Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2017-01-04 19:36 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2062 bytes --]

On Wed, Jan 04, 2017 at 01:22:03AM -0500, Rich Felker wrote:
> On Wed, Jan 04, 2017 at 01:06:40AM -0500, Rich Felker wrote:
> > diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> > index c689084..cb82b2c 100644
> > --- a/ldso/dynlink.c
> > +++ b/ldso/dynlink.c
> > @@ -67,6 +67,7 @@ struct dso {
> >  	char constructed;
> >  	char kernel_mapped;
> >  	struct dso **deps, *needed_by;
> > +	size_t next_dep;
> >  	char *rpath_orig, *rpath;
> >  	struct tls_module tls;
> >  	size_t tls_id;
> > @@ -1211,13 +1212,14 @@ void __libc_exit_fini()
> >  static void do_init_fini(struct dso *p)
> >  {
> >  	size_t dyn[DYN_CNT];
> > -	int need_locking = libc.threads_minus_1;
> > -	/* Allow recursive calls that arise when a library calls
> > -	 * dlopen from one of its constructors, but block any
> > -	 * other threads until all ctors have finished. */
> > -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> > -	for (; p; p=p->prev) {
> > -		if (p->constructed) continue;
> > +	pthread_mutex_lock(&init_fini_lock);
> > +	while (!p->constructed) {
> > +		while (p->deps[p->next_dep] && p->deps[p->next_dep]->constructed)
> > +			p->next_dep++;
> > +		if (p->deps[p->next_dep]) {
> > +			p = p->deps[p->next_dep++];
> > +			continue;
> > +		}
> 
> I think this logic is probably broken in the case of circular
> dependencies, and will end up skipping ctors for some deps due to the
> increment in this line:
> 
> +			p = p->deps[p->next_dep++];
> 
> which happens before the new p's ctors have actually run. Omitting the
> increment here, however, would turn it into an infinite loop. We need
> some way to detect this case and let the dso with an indirect
> dependency on itself run its ctors once all non-circular deps have
> been satisfied. I don't now the right algorithm for this right off,
> though; suggestions would be welcome.

Here's a v2 of the patch with the above issues fixed, and some
comments that hopefully make it make sense. I still think there's more
logic needed to allow concurrent ctors from unrelated dlopen in
multiple threads, though.

Rich

[-- Attachment #2: ctor_dep_order_v2.diff --]
[-- Type: text/plain, Size: 2681 bytes --]

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index c689084..fd59389 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -67,6 +67,7 @@ struct dso {
 	char constructed;
 	char kernel_mapped;
 	struct dso **deps, *needed_by;
+	size_t next_dep;
 	char *rpath_orig, *rpath;
 	struct tls_module tls;
 	size_t tls_id;
@@ -1090,9 +1091,12 @@ static void load_deps(struct dso *p)
 				if (runtime) longjmp(*rtld_fail, 1);
 				continue;
 			}
-			if (runtime) {
-				tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2));
-				if (!tmp) longjmp(*rtld_fail, 1);
+			tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2));
+			if (!tmp) {
+				error("Error allocating dependency data for %s: %m",
+					p->name);
+				if (runtime) longjmp(*rtld_fail, 1);
+			} else {
 				tmp[ndeps++] = dep;
 				tmp[ndeps] = 0;
 				*deps = tmp;
@@ -1211,13 +1215,21 @@ void __libc_exit_fini()
 static void do_init_fini(struct dso *p)
 {
 	size_t dyn[DYN_CNT];
-	int need_locking = libc.threads_minus_1;
-	/* Allow recursive calls that arise when a library calls
-	 * dlopen from one of its constructors, but block any
-	 * other threads until all ctors have finished. */
-	if (need_locking) pthread_mutex_lock(&init_fini_lock);
-	for (; p; p=p->prev) {
-		if (p->constructed) continue;
+	pthread_mutex_lock(&init_fini_lock);
+	/* Construct in dependency order without any recursive state. */
+	while (p && !p->constructed) {
+		/* The following loop descends into the first dependency
+	 * that is neither already constructed nor pending
+		 * construction due to circular deps, stopping only
+		 * when it reaches a dso with no remaining dependencies
+		 * to descend into. */
+		while (p->deps && p->deps[p->next_dep]) {
+			if (!p->deps[p->next_dep]->constructed &&
+			    !p->deps[p->next_dep]->next_dep)
+				p = p->deps[p->next_dep++];
+			else
+				p->next_dep++;
+		}
 		p->constructed = 1;
 		decode_vec(p->dynv, dyn, DYN_CNT);
 		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
@@ -1233,12 +1245,14 @@ static void do_init_fini(struct dso *p)
 			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
 			while (n--) ((void (*)(void))*fn++)();
 		}
-		if (!need_locking && libc.threads_minus_1) {
-			need_locking = 1;
-			pthread_mutex_lock(&init_fini_lock);
-		}
-	}
-	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
+		/* Revisit "parent" dso which caused the just-constructed
+		 * dso to be pulled in as a dependency. On the next loop
+		 * iteration we will either descend to construct a sibling
+		 * of the just-constructed dso, or finish constructing the
+		 * parent if no unfinished deps remain. */
+		p = p->needed_by;
+	}
+	pthread_mutex_unlock(&init_fini_lock);
 }
 
 void __libc_start_init(void)


* Re: Reviving planned ldso changes
  2017-01-04 19:36     ` Rich Felker
@ 2017-01-14 21:30       ` A. Wilcox
  2017-01-15 17:44         ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: A. Wilcox @ 2017-01-14 21:30 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2370 bytes --]

On 04/01/17 13:36, Rich Felker wrote:
> Here's a v2 of the patch with the above issues fixed, and some
> comments that hopefully make it make sense. I still think there's more
> logic needed to allow concurrent ctors from unrelated dlopen in
> multiple threads, though.
> 
> Rich
> 


Applied this to Adélie's musl package in a dev overlay and rebooted a
box with the patch applied.

What a fantastic little show!

iv_tls_user_ptr: called on unregistered iv_tls_user
/etc/init.d/syslog-ng: line 34:  2560 Aborted                 syslog-ng
-s -f "${SYSLOG_NG_CONFIGFILE}"
 * ERROR: syslog-ng failed to start


When X tried to start up, further fireworks:


/usr/bin/startkde: line 384:  2638 Segmentation fault      kwrapper5
/usr/bin/ksmserver $KDEWM $KSMSERVEROPTIONS


Starting program: /usr/bin/kwrapper5 /usr/bin/ksmserver
process 3281 is executing new program: /usr/bin/ksmserver
[New LWP 3287]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff009938b in operator== (s1=..., s2=...) at tools/qstring.cpp:2686
2686    tools/qstring.cpp: No such file or directory.
(gdb) bt
#0  0x00007ffff009938b in operator== (s1=..., s2=...) at
tools/qstring.cpp:2686
#1  0x00007fffe2af2ae4 in operator!= (s2=..., s1=...) at
/usr/include/qt5/QtCore/qstring.h:632
#2  KHintsSettings::KHintsSettings (this=0x7fffe65829c0, kdeglobals=...)
at
/usr/src/kde-plasma/plasma-integration-5.7.5/work/plasma-integration-5.7.5/src/platformtheme/khintssettings.cpp:70


Where khintssettings.cpp contains:

68    const QString looknfeel = cg.readEntry("LookAndFeelPackage",
defaultLookAndFeelPackage);
70    if (looknfeel != defaultLookAndFeelPackage) {


And defaultLookAndFeelPackage is defined earlier in the source file as a
constant:

static const QString defaultLookAndFeelPackage =
QStringLiteral("org.kde.breeze.desktop");


We can see that defaultLookAndFeelPackage was not initialised correctly:

(gdb) printqs5static looknfeel
$9 = (Qt5 QString)0xffffdde0 length=22: "org.kde.breeze.desktop"
(gdb) printqs5static defaultLookAndFeelPackage
$10 = (Qt5 QString)0xe2d0be90 length=Cannot access memory at address 0x4


It therefore seems to me that this patch still needs some refining.

--arw

-- 
A. Wilcox (awilfox)
Open-source programmer (C, C++, Python)
https://code.foxkit.us/u/awilfox/


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: Reviving planned ldso changes
  2017-01-14 21:30       ` A. Wilcox
@ 2017-01-15 17:44         ` Rich Felker
  2017-02-26  1:04           ` Szabolcs Nagy
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2017-01-15 17:44 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 3560 bytes --]

On Sat, Jan 14, 2017 at 03:30:50PM -0600, A. Wilcox wrote:
> On 04/01/17 13:36, Rich Felker wrote:
> > Here's a v2 of the patch with the above issues fixed, and some
> > comments that hopefully make it make sense. I still think there's more
> > logic needed to allow concurrent ctors from unrelated dlopen in
> > multiple threads, though.
> > 
> > Rich
> > 
> 
> 
> Applied to this to Adélie's musl package in a dev overlay and rebooted a
> box with this patch applied.
> 
> What a fantastic little show!
> 
> iv_tls_user_ptr: called on unregistered iv_tls_user
> /etc/init.d/syslog-ng: line 34:  2560 Aborted                 syslog-ng
> -s -f "${SYSLOG_NG_CONFIGFILE}"
>  * ERROR: syslog-ng failed to start
> 
> 
> When X tried to start up, further fireworks:
> 
> 
> /usr/bin/startkde: line 384:  2638 Segmentation fault      kwrapper5
> /usr/bin/ksmserver $KDEWM $KSMSERVEROPTIONS
> 
> 
> Starting program: /usr/bin/kwrapper5 /usr/bin/ksmserver
> process 3281 is executing new program: /usr/bin/ksmserver
> [New LWP 3287]
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007ffff009938b in operator== (s1=..., s2=...) at tools/qstring.cpp:2686
> 2686    tools/qstring.cpp: No such file or directory.
> (gdb) bt
> #0  0x00007ffff009938b in operator== (s1=..., s2=...) at
> tools/qstring.cpp:2686
> #1  0x00007fffe2af2ae4 in operator!= (s2=..., s1=...) at
> /usr/include/qt5/QtCore/qstring.h:632
> #2  KHintsSettings::KHintsSettings (this=0x7fffe65829c0, kdeglobals=...)
> at
> /usr/src/kde-plasma/plasma-integration-5.7.5/work/plasma-integration-5.7.5/src/platformtheme/khintssettings.cpp:70
> 
> 
> Where khintssettings.cpp contains:
> 
> 68    const QString looknfeel = cg.readEntry("LookAndFeelPackage",
> defaultLookAndFeelPackage);
> 70    if (looknfeel != defaultLookAndFeelPackage) {
> 
> 
> And defaultLookAndFeelPackage is defined earlier in the source file as a
> constant:
> 
> static const QString defaultLookAndFeelPackage =
> QStringLiteral("org.kde.breeze.desktop");
> 
> 
> We can see that defaultLookAndFeelPackage was not initialised correctly:
> 
> (gdb) printqs5static looknfeel
> $9 = (Qt5 QString)0xffffdde0 length=22: "org.kde.breeze.desktop"
> (gdb) printqs5static defaultLookAndFeelPackage
> $10 = (Qt5 QString)0xe2d0be90 length=Cannot access memory at address 0x4
> 
> 
> It therefore seems to me that this patch still needs some refining.

Here's a v3 with a couple of issues fixed:

1. I failed to notice that do_init_fini needs to be called with a
pointer to the root of the (new part of the) dependency tree rather
than the tail of the dso list after the changes to its behavior. This
is now fixed.

2. The needed_by for libc.so itself was always null, causing tree
traversal to end immediately after visiting libc.so. It's now set to
the first dso that referenced it.

3. Likewise LD_PRELOAD dsos had a null needed_by. They're now treated
as being "needed by" the main app (as if they appeared in its
DT_NEEDED).

After these changes, your failing test case at
https://bpaste.net/raw/30ec06873fa2, code copied here:

------------------------------------------------------------------------
#include <iostream>

class NeedCXX
{
public:
  NeedCXX() { this->Foo = 1; }
  int GetFoo() { return this->Foo; }
private:
  int Foo;
};
int main()
{
  NeedCXX c;
  std::cout << c.GetFoo() << std::endl;
  return 0;
}
------------------------------------------------------------------------

seems to work as expected. I don't know if other bugs remain but at
least it seems plausible that it's working correctly now.

Rich

[-- Attachment #2: ctor_dep_order_v3.diff --]
[-- Type: text/plain, Size: 3427 bytes --]

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index a03f75e..3e52beb 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -67,6 +67,7 @@ struct dso {
 	char constructed;
 	char kernel_mapped;
 	struct dso **deps, *needed_by;
+	size_t next_dep;
 	char *rpath_orig, *rpath;
 	struct tls_module tls;
 	size_t tls_id;
@@ -934,6 +935,7 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
 		if (!ldso.prev) {
 			tail->next = &ldso;
 			ldso.prev = tail;
+			ldso.needed_by = needed_by;
 			tail = ldso.next ? ldso.next : &ldso;
 		}
 		return &ldso;
@@ -1090,9 +1092,12 @@ static void load_deps(struct dso *p)
 				if (runtime) longjmp(*rtld_fail, 1);
 				continue;
 			}
-			if (runtime) {
-				tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2));
-				if (!tmp) longjmp(*rtld_fail, 1);
+			tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2));
+			if (!tmp) {
+				error("Error allocating dependency data for %s: %m",
+					p->name);
+				if (runtime) longjmp(*rtld_fail, 1);
+			} else {
 				tmp[ndeps++] = dep;
 				tmp[ndeps] = 0;
 				*deps = tmp;
@@ -1110,7 +1115,7 @@ static void load_preload(char *s)
 		for (z=s; *z && !isspace(*z) && *z!=':'; z++);
 		tmp = *z;
 		*z = 0;
-		load_library(s, 0);
+		load_library(s, head);
 		*z = tmp;
 	}
 }
@@ -1211,13 +1216,21 @@ void __libc_exit_fini()
 static void do_init_fini(struct dso *p)
 {
 	size_t dyn[DYN_CNT];
-	int need_locking = libc.threads_minus_1;
-	/* Allow recursive calls that arise when a library calls
-	 * dlopen from one of its constructors, but block any
-	 * other threads until all ctors have finished. */
-	if (need_locking) pthread_mutex_lock(&init_fini_lock);
-	for (; p; p=p->prev) {
-		if (p->constructed) continue;
+	pthread_mutex_lock(&init_fini_lock);
+	/* Construct in dependency order without any recursive state. */
+	while (p && !p->constructed) {
+		/* The following loop descends into the first dependency
+	 * that is neither already constructed nor pending
+		 * construction due to circular deps, stopping only
+		 * when it reaches a dso with no remaining dependencies
+		 * to descend into. */
+		while (p->deps && p->deps[p->next_dep]) {
+			if (!p->deps[p->next_dep]->constructed &&
+			    !p->deps[p->next_dep]->next_dep)
+				p = p->deps[p->next_dep++];
+			else
+				p->next_dep++;
+		}
 		p->constructed = 1;
 		decode_vec(p->dynv, dyn, DYN_CNT);
 		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
@@ -1233,17 +1246,19 @@ static void do_init_fini(struct dso *p)
 			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
 			while (n--) ((void (*)(void))*fn++)();
 		}
-		if (!need_locking && libc.threads_minus_1) {
-			need_locking = 1;
-			pthread_mutex_lock(&init_fini_lock);
-		}
-	}
-	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
+		/* Revisit "parent" dso which caused the just-constructed
+		 * dso to be pulled in as a dependency. On the next loop
+		 * iteration we will either descend to construct a sibling
+		 * of the just-constructed dso, or finish constructing the
+		 * parent if no unfinished deps remain. */
+		p = p->needed_by;
+	}
+	pthread_mutex_unlock(&init_fini_lock);
 }
 
 void __libc_start_init(void)
 {
-	do_init_fini(tail);
+	do_init_fini(head);
 }
 
 static void dl_debug_state(void)
@@ -1731,7 +1746,7 @@ end:
 	__release_ptc();
 	if (p) gencnt++;
 	pthread_rwlock_unlock(&lock);
-	if (p) do_init_fini(orig_tail);
+	if (p) do_init_fini(p);
 	pthread_setcancelstate(cs, 0);
 	return p;
 }


* Re: Reviving planned ldso changes
  2017-01-04 10:51 ` Szabolcs Nagy
@ 2017-02-16  1:58   ` Szabolcs Nagy
  2017-02-16  2:39     ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2017-02-16  1:58 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 663 bytes --]

* Szabolcs Nagy <nsz@port70.net> [2017-01-04 11:51:09 +0100]:
> * Rich Felker <dalias@libc.org> [2017-01-03 00:43:51 -0500]:
> > One item that's been on the agenda for a long time (maybe 1.5 years
> > now?) is planned dynamic linker improvements. The thread "Further
> > dynamic linker optimizations" from summer 2015 covered parts but maybe
> > not all of what I had in mind.
...
> - symbol lookup logic is duplicated between do_dlsym and find_sym,
>   it would be nice to do that with common code path if possible.

i looked into this and found some issues

i attached my current patches for comments, but the bugs may be
fixed separately from the refactoring.

[-- Attachment #2: 0001-treat-STB_WEAK-and-STB_GNU_UNIQUE-like-STB_GLOBAL-in.patch --]
[-- Type: text/x-diff, Size: 1447 bytes --]

From 5dd896dcdf53e50fc2d7c1957d62df7b8930ff01 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Sat, 3 Dec 2016 20:52:43 +0000
Subject: [PATCH 1/2] treat STB_WEAK and STB_GNU_UNIQUE like STB_GLOBAL in
 find_sym

A weak symbol definition is not special during dynamic linking, so
don't let a strong definition in a later module override it.
(glibc dynamic linker allows overriding weak definitions if
LD_DYNAMIC_WEAK is set, musl does not.)

STB_GNU_UNIQUE means that the symbol is global, even if it is in a
module that's loaded with RTLD_LOCAL, and all references resolve to
the same definition. This semantics is only relevant for c++ plugin
systems and even there it's often not what the user wants (so it can
be turned off in g++ by -fno-gnu-unique when the c++ shared lib is
compiled). In musl just treat it like STB_GLOBAL.
---
 ldso/dynlink.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index c6890845..476cd6a8 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -286,11 +286,9 @@ static struct symdef find_sym(struct dso *dso, const char *s, int need_def)
 				continue;
 		if (!(1<<(sym->st_info&0xf) & OK_TYPES)) continue;
 		if (!(1<<(sym->st_info>>4) & OK_BINDS)) continue;
-
-		if (def.sym && sym->st_info>>4 == STB_WEAK) continue;
 		def.sym = sym;
 		def.dso = dso;
-		if (sym->st_info>>4 == STB_GLOBAL) break;
+		break;
 	}
 	return def;
 }
-- 
2.11.0


[-- Attachment #3: 0002-make-relocation-time-symbol-lookup-and-dlsym-consist.patch --]
[-- Type: text/x-diff, Size: 5400 bytes --]

From b5e20dbf30e67a12bc0931f12126553d036bbd31 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Sat, 3 Dec 2016 23:08:44 +0000
Subject: [PATCH 2/2] make relocation time symbol lookup and dlsym consistent

Using common code path for all symbol lookups fixes three dlsym issues:
- st_shndx of STT_TLS symbols were not checked and thus an undefined
tls symbol reference could be incorrectly treated as a definition
(the sysv hash lookup returns undefined symbols, gnu does not, so should
be rare in practice).
- symbol binding was not checked so a hidden symbol may be returned
(in principle STB_LOCAL symbols may appear in the dynamic symbol table
for hidden symbols, but linkers most likely don't produce it).
- mips specific behaviour was not applied (ARCH_SYM_REJECT_UND).

TODO: not benchmarked, common code should be inlined probably.
---
 ldso/dynlink.c | 83 +++++++++++++++++++---------------------------------------
 1 file changed, 27 insertions(+), 56 deletions(-)

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index 476cd6a8..e5a382b0 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -256,14 +256,15 @@ static Sym *gnu_lookup_filtered(uint32_t h1, uint32_t *hashtab, struct dso *dso,
 #define ARCH_SYM_REJECT_UND(s) 0
 #endif
 
-static struct symdef find_sym(struct dso *dso, const char *s, int need_def)
+static struct symdef find_sym(struct dso *dso, const char *s, int need_def, int use_deps)
 {
 	uint32_t h = 0, gh, gho, *ght;
 	size_t ghm = 0;
 	struct symdef def = {0};
-	for (; dso; dso=dso->next) {
+	struct dso **deps = use_deps ? dso->deps : 0;
+	for (; dso; dso=use_deps ? *deps++ : dso->next) {
 		Sym *sym;
-		if (!dso->global) continue;
+		if (!dso->global && !use_deps) continue;
 		if ((ght = dso->ghashtab)) {
 			if (!ghm) {
 				gh = gnu_hash(s);
@@ -332,7 +333,7 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 			ctx = type==REL_COPY ? head->next : head;
 			def = (sym->st_info&0xf) == STT_SECTION
 				? (struct symdef){ .dso = dso, .sym = sym }
-				: find_sym(ctx, name, type==REL_PLT);
+				: find_sym(ctx, name, type==REL_PLT, 0);
 			if (!def.sym && (sym->st_shndx != SHN_UNDEF
 			    || sym->st_info>>4 != STB_WEAK)) {
 				error("Error relocating %s: %s: symbol not found",
@@ -1377,7 +1378,7 @@ void __dls2(unsigned char *base, size_t *sp)
 	/* Call dynamic linker stage-3, __dls3, looking it up
 	 * symbolically as a barrier against moving the address
 	 * load across the above relocation processing. */
-	struct symdef dls3_def = find_sym(&ldso, "__dls3", 0);
+	struct symdef dls3_def = find_sym(&ldso, "__dls3", 0, 0);
 	if (DL_FDPIC) ((stage3_func)&ldso.funcdescs[dls3_def.sym-ldso.syms])(sp);
 	else ((stage3_func)laddr(&ldso, dls3_def.sym->st_value))(sp);
 }
@@ -1770,58 +1771,28 @@ void *__tls_get_addr(size_t *);
 
 static void *do_dlsym(struct dso *p, const char *s, void *ra)
 {
-	size_t i;
-	uint32_t h = 0, gh = 0, *ght;
-	Sym *sym;
-	if (p == head || p == RTLD_DEFAULT || p == RTLD_NEXT) {
-		if (p == RTLD_DEFAULT) {
-			p = head;
-		} else if (p == RTLD_NEXT) {
-			p = addr2dso((size_t)ra);
-			if (!p) p=head;
-			p = p->next;
-		}
-		struct symdef def = find_sym(p, s, 0);
-		if (!def.sym) goto failed;
-		if ((def.sym->st_info&0xf) == STT_TLS)
-			return __tls_get_addr((size_t []){def.dso->tls_id, def.sym->st_value});
-		if (DL_FDPIC && (def.sym->st_info&0xf) == STT_FUNC)
-			return def.dso->funcdescs + (def.sym - def.dso->syms);
-		return laddr(def.dso, def.sym->st_value);
-	}
-	if (__dl_invalid_handle(p))
+	struct symdef def;
+	int use_deps = 0;
+	if (p == head || p == RTLD_DEFAULT) {
+		p = head;
+	} else if (p == RTLD_NEXT) {
+		p = addr2dso((size_t)ra);
+		if (!p) p=head;
+		p = p->next;
+	} else if (__dl_invalid_handle(p)) {
 		return 0;
-	if ((ght = p->ghashtab)) {
-		gh = gnu_hash(s);
-		sym = gnu_lookup(gh, ght, p, s);
-	} else {
-		h = sysv_hash(s);
-		sym = sysv_lookup(s, h, p);
-	}
-	if (sym && (sym->st_info&0xf) == STT_TLS)
-		return __tls_get_addr((size_t []){p->tls_id, sym->st_value});
-	if (DL_FDPIC && sym && sym->st_shndx && (sym->st_info&0xf) == STT_FUNC)
-		return p->funcdescs + (sym - p->syms);
-	if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES))
-		return laddr(p, sym->st_value);
-	if (p->deps) for (i=0; p->deps[i]; i++) {
-		if ((ght = p->deps[i]->ghashtab)) {
-			if (!gh) gh = gnu_hash(s);
-			sym = gnu_lookup(gh, ght, p->deps[i], s);
-		} else {
-			if (!h) h = sysv_hash(s);
-			sym = sysv_lookup(s, h, p->deps[i]);
-		}
-		if (sym && (sym->st_info&0xf) == STT_TLS)
-			return __tls_get_addr((size_t []){p->deps[i]->tls_id, sym->st_value});
-		if (DL_FDPIC && sym && sym->st_shndx && (sym->st_info&0xf) == STT_FUNC)
-			return p->deps[i]->funcdescs + (sym - p->deps[i]->syms);
-		if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES))
-			return laddr(p->deps[i], sym->st_value);
-	}
-failed:
-	error("Symbol not found: %s", s);
-	return 0;
+	} else
+		use_deps = 1;
+	def = find_sym(p, s, 0, use_deps);
+	if (!def.sym) {
+		error("Symbol not found: %s", s);
+		return 0;
+	}
+	if ((def.sym->st_info&0xf) == STT_TLS)
+		return __tls_get_addr((size_t []){def.dso->tls_id, def.sym->st_value});
+	if (DL_FDPIC && (def.sym->st_info&0xf) == STT_FUNC)
+		return def.dso->funcdescs + (def.sym - def.dso->syms);
+	return laddr(def.dso, def.sym->st_value);
 }
 
 int dladdr(const void *addr, Dl_info *info)
-- 
2.11.0



* Re: Reviving planned ldso changes
  2017-02-16  1:58   ` Szabolcs Nagy
@ 2017-02-16  2:39     ` Rich Felker
  0 siblings, 0 replies; 21+ messages in thread
From: Rich Felker @ 2017-02-16  2:39 UTC (permalink / raw)
  To: musl

On Thu, Feb 16, 2017 at 02:58:24AM +0100, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@port70.net> [2017-01-04 11:51:09 +0100]:
> > * Rich Felker <dalias@libc.org> [2017-01-03 00:43:51 -0500]:
> > > One item that's been on the agenda for a long time (maybe 1.5 years
> > > now?) is planned dynamic linker improvements. The thread "Further
> > > dynamic linker optimizations" from summer 2015 covered parts but maybe
> > > not all of what I had in mind.
> ....
> > - symbol lookup logic is duplicated between do_dlsym and find_sym,
> >   it would be nice to do that with common code path if possible.
> 
> i looked into this and found some issues
> 
> i attached my current patches for comments, but the bugs may be
> fixed separately from the refactoring.

OK. I don't mind fixing them together as long as it's documented in
the commit log that we're fixing bugs. Yes it's nice to have
independent fixes, but sometimes it's not worth it if doing so
involves extra duplicate code.

> >From 5dd896dcdf53e50fc2d7c1957d62df7b8930ff01 Mon Sep 17 00:00:00 2001
> From: Szabolcs Nagy <nsz@port70.net>
> Date: Sat, 3 Dec 2016 20:52:43 +0000
> Subject: [PATCH 1/2] treat STB_WEAK and STB_GNU_UNIQUE like STB_GLOBAL in
>  find_sym
> 
> A weak symbol definition is not special during dynamic linking, so
> don't let a strong definition in a later module override it.
> (glibc dynamic linker allows overriding weak definitions if
> LD_DYNAMIC_WEAK is set, musl does not.)
> 
> STB_GNU_UNIQUE means that the symbol is global, even if it is in a
> module that's loaded with RTLD_LOCAL, and all references resolve to
> the same definition. This semantics is only relevant for c++ plugin
> systems and even there it's often not what the user wants (so it can
> be turned off in g++ by -fno-gnu-unique when the c++ shared lib is
> compiled). In musl just treat it like STB_GLOBAL.

I agree with all the policy aspects here.

> ---
>  ldso/dynlink.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index c6890845..476cd6a8 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -286,11 +286,9 @@ static struct symdef find_sym(struct dso *dso, const char *s, int need_def)
>  				continue;
>  		if (!(1<<(sym->st_info&0xf) & OK_TYPES)) continue;
>  		if (!(1<<(sym->st_info>>4) & OK_BINDS)) continue;
> -
> -		if (def.sym && sym->st_info>>4 == STB_WEAK) continue;
>  		def.sym = sym;
>  		def.dso = dso;
> -		if (sym->st_info>>4 == STB_GLOBAL) break;
> +		break;
>  	}

This looks ok, but it might be nice to reformat the loop in a separate
subsequent patch, since the break/continue usage is rather
"unconventional" now. :-)

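E.g. the tail of the loop could end up something like this (just a
sketch of the sort of reformatting I mean, based on the hunk above,
untested):

	if ((1<<(sym->st_info&0xf) & OK_TYPES)
	    && (1<<(sym->st_info>>4) & OK_BINDS)) {
		def.sym = sym;
		def.dso = dso;
		break;
	}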

> >From b5e20dbf30e67a12bc0931f12126553d036bbd31 Mon Sep 17 00:00:00 2001
> From: Szabolcs Nagy <nsz@port70.net>
> Date: Sat, 3 Dec 2016 23:08:44 +0000
> Subject: [PATCH 2/2] make relocation time symbol lookup and dlsym consistent
> 
> Using common code path for all symbol lookups fixes three dlsym issues:
> - st_shndx of STT_TLS symbols were not checked and thus an undefined
> tls symbol reference could be incorrectly treated as a definition
> (the sysv hash lookup returns undefined symbols, gnu does not, so should
> be rare in practice).
> - symbol binding was not checked so a hidden symbol may be returned
> (in principle STB_LOCAL symbols may appear in the dynamic symbol table
> for hidden symbols, but linkers most likely don't produce it).
> - mips specific behaviour was not applied (ARCH_SYM_REJECT_UND).
> 
> TODO: not benchmarked, common code should be inlined probably.

Yes, I suspect we should have:

#ifdef __GNUC__
__attribute__((__always_inline__))
#endif

on this function or similar. But maybe it doesn't matter. We should
check.

> ---
>  ldso/dynlink.c | 83 +++++++++++++++++++---------------------------------------
>  1 file changed, 27 insertions(+), 56 deletions(-)
> 
> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index 476cd6a8..e5a382b0 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -256,14 +256,15 @@ static Sym *gnu_lookup_filtered(uint32_t h1, uint32_t *hashtab, struct dso *dso,
>  #define ARCH_SYM_REJECT_UND(s) 0
>  #endif
>  
> -static struct symdef find_sym(struct dso *dso, const char *s, int need_def)
> +static struct symdef find_sym(struct dso *dso, const char *s, int need_def, int use_deps)
>  {
>  	uint32_t h = 0, gh, gho, *ght;
>  	size_t ghm = 0;
>  	struct symdef def = {0};
> -	for (; dso; dso=dso->next) {
> +	struct dso **deps = use_deps ? dso->deps : 0;
> +	for (; dso; dso=use_deps ? *deps++ : dso->next) {

Spacing should really be changed here. a=b ? c : d is really bad.

>  		Sym *sym;
> -		if (!dso->global) continue;
> +		if (!dso->global && !use_deps) continue;

This will have to change a bit with the planned change to have a
separate global dso chain.

>  		if ((ght = dso->ghashtab)) {
>  			if (!ghm) {
>  				gh = gnu_hash(s);
> @@ -332,7 +333,7 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
>  			ctx = type==REL_COPY ? head->next : head;
>  			def = (sym->st_info&0xf) == STT_SECTION
>  				? (struct symdef){ .dso = dso, .sym = sym }
> -				: find_sym(ctx, name, type==REL_PLT);
> +				: find_sym(ctx, name, type==REL_PLT, 0);
>  			if (!def.sym && (sym->st_shndx != SHN_UNDEF
>  			    || sym->st_info>>4 != STB_WEAK)) {
>  				error("Error relocating %s: %s: symbol not found",
> @@ -1377,7 +1378,7 @@ void __dls2(unsigned char *base, size_t *sp)
>  	/* Call dynamic linker stage-3, __dls3, looking it up
>  	 * symbolically as a barrier against moving the address
>  	 * load across the above relocation processing. */
> -	struct symdef dls3_def = find_sym(&ldso, "__dls3", 0);
> +	struct symdef dls3_def = find_sym(&ldso, "__dls3", 0, 0);

And this is one place where we really don't need/want the always_inline,
so maybe we need to be more clever about it.
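
One option might be to keep the common lookup body in an always_inline
helper and give the cold callers a thin out-of-line wrapper, roughly
like this (sketch only; find_sym2 is a made-up name and the body is
elided):

#ifdef __GNUC__
__attribute__((__always_inline__))
#endif
static inline struct symdef find_sym2(struct dso *dso, const char *s,
	int need_def, int use_deps)
{
	/* ...common lookup loop from the patch... */
}

/* out-of-line copy for cold callers like dlsym and __dls2;
 * do_relocs would keep calling find_sym2 directly */
static struct symdef find_sym(struct dso *dso, const char *s, int need_def)
{
	return find_sym2(dso, s, need_def, 0);
}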

Rich



* Re: Reviving planned ldso changes
  2017-01-15 17:44         ` Rich Felker
@ 2017-02-26  1:04           ` Szabolcs Nagy
  2017-02-26  1:39             ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2017-02-26  1:04 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2017-01-15 12:44:38 -0500]:
>  static void do_init_fini(struct dso *p)
>  {
>  	size_t dyn[DYN_CNT];
> -	int need_locking = libc.threads_minus_1;
> -	/* Allow recursive calls that arise when a library calls
> -	 * dlopen from one of its constructors, but block any
> -	 * other threads until all ctors have finished. */
> -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> -	for (; p; p=p->prev) {
> -		if (p->constructed) continue;
> +	pthread_mutex_lock(&init_fini_lock);
> +	/* Construct in dependency order without any recursive state. */
> +	while (p && !p->constructed) {
> +		/* The following loop descends into the first dependency
> +		 * that is neither alredy constructed nor pending
> +		 * construction due to circular deps, stopping only
> +		 * when it reaches a dso with no remaining dependencies
> +		 * to descend into. */
> +		while (p->deps && p->deps[p->next_dep]) {
> +			if (!p->deps[p->next_dep]->constructed &&
> +			    !p->deps[p->next_dep]->next_dep)
> +				p = p->deps[p->next_dep++];
> +			else
> +				p->next_dep++;
> +		}
>  		p->constructed = 1;
>  		decode_vec(p->dynv, dyn, DYN_CNT);
>  		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
> @@ -1233,17 +1246,19 @@ static void do_init_fini(struct dso *p)
>  			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
>  			while (n--) ((void (*)(void))*fn++)();
>  		}
> -		if (!need_locking && libc.threads_minus_1) {
> -			need_locking = 1;
> -			pthread_mutex_lock(&init_fini_lock);
> -		}
> -	}
> -	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
> +		/* Revisit "parent" dso which caused the just-constructed
> +		 * dso to be pulled in as a dependency. On the next loop
> +		 * iteration we will either descend to construct a sibling
> +		 * of the just-constructed dso, or finish constructing the
> +		 * parent if no unfinished deps remain. */
> +		p = p->needed_by;
> +	}

i think with

a.deps: b c
b.deps: c d
b.needed_by: a
c.needed_by: a

the visiting order starting from a is
a
b
c
a

and d never gets constructed.

i was looking for the dfs stack (how you track back
on the path you descended into the dependency tree);
the needed_by entry might not point to the parent
dso through which you arrived somewhere.

i'm not sure what the right fix is, if needed_by
is not used elsewhere then it could be set during
traversal, but there might be other ways.
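
e.g. something like this in the descend loop (rough idea only, untested,
and it repoints needed_by to the dfs parent, so it only works if nothing
else relies on the old value):

	while (p->deps && p->deps[p->next_dep]) {
		struct dso *q = p->deps[p->next_dep++];
		if (!q->constructed && !q->next_dep) {
			/* remember the edge we actually came through */
			q->needed_by = p;
			p = q;
		}
	}

then the later p = p->needed_by; backs up along the path that was
actually taken.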



* Re: Reviving planned ldso changes
  2017-02-26  1:04           ` Szabolcs Nagy
@ 2017-02-26  1:39             ` Rich Felker
  2017-02-26 10:28               ` Szabolcs Nagy
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2017-02-26  1:39 UTC (permalink / raw)
  To: musl

On Sun, Feb 26, 2017 at 02:04:30AM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2017-01-15 12:44:38 -0500]:
> >  static void do_init_fini(struct dso *p)
> >  {
> >  	size_t dyn[DYN_CNT];
> > -	int need_locking = libc.threads_minus_1;
> > -	/* Allow recursive calls that arise when a library calls
> > -	 * dlopen from one of its constructors, but block any
> > -	 * other threads until all ctors have finished. */
> > -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> > -	for (; p; p=p->prev) {
> > -		if (p->constructed) continue;
> > +	pthread_mutex_lock(&init_fini_lock);
> > +	/* Construct in dependency order without any recursive state. */
> > +	while (p && !p->constructed) {
> > +		/* The following loop descends into the first dependency
> > +		 * that is neither alredy constructed nor pending
> > +		 * construction due to circular deps, stopping only
> > +		 * when it reaches a dso with no remaining dependencies
> > +		 * to descend into. */
> > +		while (p->deps && p->deps[p->next_dep]) {
> > +			if (!p->deps[p->next_dep]->constructed &&
> > +			    !p->deps[p->next_dep]->next_dep)
> > +				p = p->deps[p->next_dep++];
> > +			else
> > +				p->next_dep++;
> > +		}
> >  		p->constructed = 1;
> >  		decode_vec(p->dynv, dyn, DYN_CNT);
> >  		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
> > @@ -1233,17 +1246,19 @@ static void do_init_fini(struct dso *p)
> >  			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
> >  			while (n--) ((void (*)(void))*fn++)();
> >  		}
> > -		if (!need_locking && libc.threads_minus_1) {
> > -			need_locking = 1;
> > -			pthread_mutex_lock(&init_fini_lock);
> > -		}
> > -	}
> > -	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
> > +		/* Revisit "parent" dso which caused the just-constructed
> > +		 * dso to be pulled in as a dependency. On the next loop
> > +		 * iteration we will either descend to construct a sibling
> > +		 * of the just-constructed dso, or finish constructing the
> > +		 * parent if no unfinished deps remain. */
> > +		p = p->needed_by;
> > +	}
> 
> i think with
> 
> a.deps: b c
> b.deps: c d
> b.needed_by: a
> c.needed_by: a
> 
> the visiting order starting from a is
> a
> b
> c
> a
> 
> and d never gets constructed.

Are you sure? My understanding of what it does is:

1. Descend a->b->c, construct c, and back up to b.
2. Descend b->d, construct d, and back up to b.
3. Find all of b's deps constructed, construct b, and back up to a.
4. Find all of a's deps constructed, construct a, and end.

I think you have a misunderstanding of "visit order". Nodes are not
visited while descending, only when reaching a point where no further
descent into a non-constructed dep is possible.

Rich



* Re: Reviving planned ldso changes
  2017-02-26  1:39             ` Rich Felker
@ 2017-02-26 10:28               ` Szabolcs Nagy
  2017-02-26 15:20                 ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2017-02-26 10:28 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2017-02-25 20:39:26 -0500]:
> On Sun, Feb 26, 2017 at 02:04:30AM +0100, Szabolcs Nagy wrote:
> > * Rich Felker <dalias@libc.org> [2017-01-15 12:44:38 -0500]:
> > >  static void do_init_fini(struct dso *p)
> > >  {
> > >  	size_t dyn[DYN_CNT];
> > > -	int need_locking = libc.threads_minus_1;
> > > -	/* Allow recursive calls that arise when a library calls
> > > -	 * dlopen from one of its constructors, but block any
> > > -	 * other threads until all ctors have finished. */
> > > -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> > > -	for (; p; p=p->prev) {
> > > -		if (p->constructed) continue;
> > > +	pthread_mutex_lock(&init_fini_lock);
> > > +	/* Construct in dependency order without any recursive state. */
> > > +	while (p && !p->constructed) {
> > > +		/* The following loop descends into the first dependency
> > > +		 * that is neither alredy constructed nor pending
> > > +		 * construction due to circular deps, stopping only
> > > +		 * when it reaches a dso with no remaining dependencies
> > > +		 * to descend into. */
> > > +		while (p->deps && p->deps[p->next_dep]) {
> > > +			if (!p->deps[p->next_dep]->constructed &&
> > > +			    !p->deps[p->next_dep]->next_dep)
> > > +				p = p->deps[p->next_dep++];
> > > +			else
> > > +				p->next_dep++;
> > > +		}
> > >  		p->constructed = 1;
> > >  		decode_vec(p->dynv, dyn, DYN_CNT);
> > >  		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
> > > @@ -1233,17 +1246,19 @@ static void do_init_fini(struct dso *p)
> > >  			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
> > >  			while (n--) ((void (*)(void))*fn++)();
> > >  		}
> > > -		if (!need_locking && libc.threads_minus_1) {
> > > -			need_locking = 1;
> > > -			pthread_mutex_lock(&init_fini_lock);
> > > -		}
> > > -	}
> > > -	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
> > > +		/* Revisit "parent" dso which caused the just-constructed
> > > +		 * dso to be pulled in as a dependency. On the next loop
> > > +		 * iteration we will either descend to construct a sibling
> > > +		 * of the just-constructed dso, or finish constructing the
> > > +		 * parent if no unfinished deps remain. */
> > > +		p = p->needed_by;
> > > +	}
> > 
> > i think with
> > 
> > a.deps: b c
> > b.deps: c d
> > b.needed_by: a
> > c.needed_by: a
> > 
> > the visiting order starting from a is
> > a
> > b
> > c
> > a
> > 
> > and d never gets constructed.
> 
> Are you sure? My understanding of what it does is:
> 
> 1. Descend a->b->c, construct c, and back up to b.

you did not explain how you get back to b after c
without a stack of visited dsos or modified c->needed_by.

> 2. Descend b->d, construct d, and back up to b.
> 3. Find all of b's deps constructed, construct b, and back up to a.
> 4. Find all of a's deps constructed, construct a, and end.
> 
> I think you have a misunderstanding of "visit order". Nodes are not
> visited while descending, only when reaching a point where no further
> descent into a non-constructed dep is possible.
> 
> Rich



* Re: Reviving planned ldso changes
  2017-02-26 10:28               ` Szabolcs Nagy
@ 2017-02-26 15:20                 ` Rich Felker
  2017-02-26 15:34                   ` Szabolcs Nagy
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2017-02-26 15:20 UTC (permalink / raw)
  To: musl

On Sun, Feb 26, 2017 at 11:28:30AM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2017-02-25 20:39:26 -0500]:
> > On Sun, Feb 26, 2017 at 02:04:30AM +0100, Szabolcs Nagy wrote:
> > > * Rich Felker <dalias@libc.org> [2017-01-15 12:44:38 -0500]:
> > > >  static void do_init_fini(struct dso *p)
> > > >  {
> > > >  	size_t dyn[DYN_CNT];
> > > > -	int need_locking = libc.threads_minus_1;
> > > > -	/* Allow recursive calls that arise when a library calls
> > > > -	 * dlopen from one of its constructors, but block any
> > > > -	 * other threads until all ctors have finished. */
> > > > -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> > > > -	for (; p; p=p->prev) {
> > > > -		if (p->constructed) continue;
> > > > +	pthread_mutex_lock(&init_fini_lock);
> > > > +	/* Construct in dependency order without any recursive state. */
> > > > +	while (p && !p->constructed) {
> > > > +		/* The following loop descends into the first dependency
> > > > +		 * that is neither alredy constructed nor pending
> > > > +		 * construction due to circular deps, stopping only
> > > > +		 * when it reaches a dso with no remaining dependencies
> > > > +		 * to descend into. */
> > > > +		while (p->deps && p->deps[p->next_dep]) {
> > > > +			if (!p->deps[p->next_dep]->constructed &&
> > > > +			    !p->deps[p->next_dep]->next_dep)
> > > > +				p = p->deps[p->next_dep++];
> > > > +			else
> > > > +				p->next_dep++;
> > > > +		}
> > > >  		p->constructed = 1;
> > > >  		decode_vec(p->dynv, dyn, DYN_CNT);
> > > >  		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
> > > > @@ -1233,17 +1246,19 @@ static void do_init_fini(struct dso *p)
> > > >  			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
> > > >  			while (n--) ((void (*)(void))*fn++)();
> > > >  		}
> > > > -		if (!need_locking && libc.threads_minus_1) {
> > > > -			need_locking = 1;
> > > > -			pthread_mutex_lock(&init_fini_lock);
> > > > -		}
> > > > -	}
> > > > -	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
> > > > +		/* Revisit "parent" dso which caused the just-constructed
> > > > +		 * dso to be pulled in as a dependency. On the next loop
> > > > +		 * iteration we will either descend to construct a sibling
> > > > +		 * of the just-constructed dso, or finish constructing the
> > > > +		 * parent if no unfinished deps remain. */
> > > > +		p = p->needed_by;
> > > > +	}
> > > 
> > > i think with
> > > 
> > > a.deps: b c
> > > b.deps: c d
> > > b.needed_by: a
> > > c.needed_by: a
> > > 
> > > the visiting order starting from a is
> > > a
> > > b
> > > c
> > > a
> > > 
> > > and d never gets constructed.
> > 
> > Are you sure? My understanding of what it does is:
> > 
> > 1. Descend a->b->c, construct c, and back up to b.
> 
> you did not explain how you get back to b after c
> without a stack of visited dsos or modified c->needed_by.

Sorry, that should have been back up to a (c->needed_by). Then:

2. Descend a->b->d, construct d, and back up to b.

The key point is that x->needed_by is always the first dso that pulled
in x, so if we go all the way back up to x->needed_by, we'll revisit
all later dsos which depend on x.

Rich



* Re: Reviving planned ldso changes
  2017-02-26 15:20                 ` Rich Felker
@ 2017-02-26 15:34                   ` Szabolcs Nagy
  2017-02-26 21:39                     ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2017-02-26 15:34 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2017-02-26 10:20:16 -0500]:
> On Sun, Feb 26, 2017 at 11:28:30AM +0100, Szabolcs Nagy wrote:
> > * Rich Felker <dalias@libc.org> [2017-02-25 20:39:26 -0500]:
> > > On Sun, Feb 26, 2017 at 02:04:30AM +0100, Szabolcs Nagy wrote:
> > > > * Rich Felker <dalias@libc.org> [2017-01-15 12:44:38 -0500]:
> > > > >  static void do_init_fini(struct dso *p)
> > > > >  {
> > > > >  	size_t dyn[DYN_CNT];
> > > > > -	int need_locking = libc.threads_minus_1;
> > > > > -	/* Allow recursive calls that arise when a library calls
> > > > > -	 * dlopen from one of its constructors, but block any
> > > > > -	 * other threads until all ctors have finished. */
> > > > > -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> > > > > -	for (; p; p=p->prev) {
> > > > > -		if (p->constructed) continue;
> > > > > +	pthread_mutex_lock(&init_fini_lock);
> > > > > +	/* Construct in dependency order without any recursive state. */
> > > > > +	while (p && !p->constructed) {
> > > > > +		/* The following loop descends into the first dependency
> > > > > +		 * that is neither alredy constructed nor pending
> > > > > +		 * construction due to circular deps, stopping only
> > > > > +		 * when it reaches a dso with no remaining dependencies
> > > > > +		 * to descend into. */
> > > > > +		while (p->deps && p->deps[p->next_dep]) {
> > > > > +			if (!p->deps[p->next_dep]->constructed &&
> > > > > +			    !p->deps[p->next_dep]->next_dep)
> > > > > +				p = p->deps[p->next_dep++];
> > > > > +			else
> > > > > +				p->next_dep++;
> > > > > +		}
> > > > >  		p->constructed = 1;
> > > > >  		decode_vec(p->dynv, dyn, DYN_CNT);
> > > > >  		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
> > > > > @@ -1233,17 +1246,19 @@ static void do_init_fini(struct dso *p)
> > > > >  			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
> > > > >  			while (n--) ((void (*)(void))*fn++)();
> > > > >  		}
> > > > > -		if (!need_locking && libc.threads_minus_1) {
> > > > > -			need_locking = 1;
> > > > > -			pthread_mutex_lock(&init_fini_lock);
> > > > > -		}
> > > > > -	}
> > > > > -	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
> > > > > +		/* Revisit "parent" dso which caused the just-constructed
> > > > > +		 * dso to be pulled in as a dependency. On the next loop
> > > > > +		 * iteration we will either descend to construct a sibling
> > > > > +		 * of the just-constructed dso, or finish constructing the
> > > > > +		 * parent if no unfinished deps remain. */
> > > > > +		p = p->needed_by;
> > > > > +	}
> > > > 
> > > > i think with
> > > > 
> > > > a.deps: b c
> > > > b.deps: c d
> > > > b.needed_by: a
> > > > c.needed_by: a
> > > > 
> > > > the visiting order starting from a is
> > > > a
> > > > b
> > > > c
> > > > a
> > > > 
> > > > and d never gets constructed.
> > > 
> > > Are you sure? My understanding of what it does is:
> > > 
> > > 1. Descend a->b->c, construct c, and back up to b.
> > 
> > you did not explain how you get back to b after c
> > without a stack of visited dsos or modified c->needed_by.
> 
> Sorry, that should have been back up to a (c->needed_by). Then:
> 
> 2. Descend a->b->d, construct d, and back up to b.
> 
> The key point is that x->needed_by is always the first dso that pulled
> in x, so if we back all the way back up to x->needed_by, we'll revisit
> all later dsos which depend on x.

for that the a->b transition has to happen twice,
but a.next_dep is already past b the second
time a is visited, so i still don't see why
this works.




* Re: Reviving planned ldso changes
  2017-02-26 15:34                   ` Szabolcs Nagy
@ 2017-02-26 21:39                     ` Rich Felker
  2017-03-03  1:30                       ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2017-02-26 21:39 UTC (permalink / raw)
  To: musl

On Sun, Feb 26, 2017 at 04:34:36PM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2017-02-26 10:20:16 -0500]:
> > On Sun, Feb 26, 2017 at 11:28:30AM +0100, Szabolcs Nagy wrote:
> > > * Rich Felker <dalias@libc.org> [2017-02-25 20:39:26 -0500]:
> > > > On Sun, Feb 26, 2017 at 02:04:30AM +0100, Szabolcs Nagy wrote:
> > > > > * Rich Felker <dalias@libc.org> [2017-01-15 12:44:38 -0500]:
> > > > > >  static void do_init_fini(struct dso *p)
> > > > > >  {
> > > > > >  	size_t dyn[DYN_CNT];
> > > > > > -	int need_locking = libc.threads_minus_1;
> > > > > > -	/* Allow recursive calls that arise when a library calls
> > > > > > -	 * dlopen from one of its constructors, but block any
> > > > > > -	 * other threads until all ctors have finished. */
> > > > > > -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> > > > > > -	for (; p; p=p->prev) {
> > > > > > -		if (p->constructed) continue;
> > > > > > +	pthread_mutex_lock(&init_fini_lock);
> > > > > > +	/* Construct in dependency order without any recursive state. */
> > > > > > +	while (p && !p->constructed) {
> > > > > > +		/* The following loop descends into the first dependency
> > > > > > +		 * that is neither alredy constructed nor pending
> > > > > > +		 * construction due to circular deps, stopping only
> > > > > > +		 * when it reaches a dso with no remaining dependencies
> > > > > > +		 * to descend into. */
> > > > > > +		while (p->deps && p->deps[p->next_dep]) {
> > > > > > +			if (!p->deps[p->next_dep]->constructed &&
> > > > > > +			    !p->deps[p->next_dep]->next_dep)
> > > > > > +				p = p->deps[p->next_dep++];
> > > > > > +			else
> > > > > > +				p->next_dep++;
> > > > > > +		}
> > > > > >  		p->constructed = 1;
> > > > > >  		decode_vec(p->dynv, dyn, DYN_CNT);
> > > > > >  		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
> > > > > > @@ -1233,17 +1246,19 @@ static void do_init_fini(struct dso *p)
> > > > > >  			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
> > > > > >  			while (n--) ((void (*)(void))*fn++)();
> > > > > >  		}
> > > > > > -		if (!need_locking && libc.threads_minus_1) {
> > > > > > -			need_locking = 1;
> > > > > > -			pthread_mutex_lock(&init_fini_lock);
> > > > > > -		}
> > > > > > -	}
> > > > > > -	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
> > > > > > +		/* Revisit "parent" dso which caused the just-constructed
> > > > > > +		 * dso to be pulled in as a dependency. On the next loop
> > > > > > +		 * iteration we will either descend to construct a sibling
> > > > > > +		 * of the just-constructed dso, or finish constructing the
> > > > > > +		 * parent if no unfinished deps remain. */
> > > > > > +		p = p->needed_by;
> > > > > > +	}
> > > > > 
> > > > > i think with
> > > > > 
> > > > > a.deps: b c
> > > > > b.deps: c d
> > > > > b.needed_by: a
> > > > > c.needed_by: a
> > > > > 
> > > > > the visiting order starting from a is
> > > > > a
> > > > > b
> > > > > c
> > > > > a
> > > > > 
> > > > > and d never gets constructed.
> > > > 
> > > > Are you sure? My understanding of what it does is:
> > > > 
> > > > 1. Descend a->b->c, construct c, and back up to b.
> > > 
> > > you did not explain how you get back to b after c
> > > without a stack of visited dsos or modified c->needed_by.
> > 
> > Sorry, that should have been back up to a (c->needed_by). Then:
> > 
> > 2. Descend a->b->d, construct d, and back up to b.
> > 
> > The key point is that x->needed_by is always the first dso that pulled
> > in x, so if we back all the way back up to x->needed_by, we'll revisit
> > all later dsos which depend on x.
> 
> for that a->b transition has to happen twice,
> but a.next_dep is already past b the second
> time a is visited, so i still don't see why
> this works.

Indeed, that looks like a bug. Removing the ++ from p =
p->deps[p->next_dep++]; fixes it, but breaks the logic for avoiding
circular descent (the condition !p->deps[p->next_dep]->next_dep). I
think we need to add a separate field to control that, and a visited
flag does not suffice; instead it should probably be something like
the descent depth (or just sequence number) at which the DSO was first
encountered, so that we can avoid descending into a DSO that we
already started descending into and that will be descended into again
as part of the backing-up process.
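
To make the failure concrete, here's a minimal standalone mock of the
walk (a throwaway test, not musl code; struct dso is cut down to just
the fields the walk touches) run on your example graph:

#include <stdio.h>

struct dso {
	const char *name;
	char constructed;
	struct dso **deps, *needed_by;
	int next_dep;
};

static void walk(struct dso *p)
{
	/* Same logic as the posted patch's do_init_fini, minus the
	 * locking and the actual ctor execution. */
	while (p && !p->constructed) {
		while (p->deps && p->deps[p->next_dep]) {
			if (!p->deps[p->next_dep]->constructed &&
			    !p->deps[p->next_dep]->next_dep)
				p = p->deps[p->next_dep++];
			else
				p->next_dep++;
		}
		p->constructed = 1;
		printf("construct %s\n", p->name);
		p = p->needed_by;
	}
}

int main(void)
{
	struct dso a, b, c, d;
	struct dso *a_deps[] = { &b, &c, 0 }, *b_deps[] = { &c, &d, 0 };
	a = (struct dso){ .name = "a", .deps = a_deps };
	b = (struct dso){ .name = "b", .deps = b_deps, .needed_by = &a };
	c = (struct dso){ .name = "c", .needed_by = &a };
	d = (struct dso){ .name = "d", .needed_by = &b };
	walk(&a);
	return 0;
}

It prints only "construct c" and "construct a"; b and d are never
constructed, exactly because a.next_dep is already past b by the time
the walk backs up to a.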

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reviving planned ldso changes
  2017-02-26 21:39                     ` Rich Felker
@ 2017-03-03  1:30                       ` Rich Felker
  2017-03-04 10:58                         ` Szabolcs Nagy
  2017-03-06 16:25                         ` Rich Felker
  0 siblings, 2 replies; 21+ messages in thread
From: Rich Felker @ 2017-03-03  1:30 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2358 bytes --]

On Sun, Feb 26, 2017 at 04:39:25PM -0500, Rich Felker wrote:
> > > > > Are you sure? My understanding of what it does is:
> > > > > 
> > > > > 1. Descend a->b->c, construct c, and back up to b.
> > > > 
> > > > you did not explain how you get back to b after c
> > > > without a stack of visited dsos or modified c->needed_by.
> > > 
> > > Sorry, that should have been back up to a (c->needed_by). Then:
> > > 
> > > 2. Descend a->b->d, construct d, and back up to b.
> > > 
> > > The key point is that x->needed_by is always the first dso that pulled
> > > in x, so if we back all the way back up to x->needed_by, we'll revisit
> > > all later dsos which depend on x.
> > 
> > for that a->b transition has to happen twice,
> > but a.next_dep is already past b the second
> > time a is visited, so i still don't see why
> > this works.
> 
> Indeed, that looks like a bug. Removing the ++ from p =
> p->deps[p->next_dep++]; fixes it, but breaks the logic for avoiding
> circular descent (the condition !p->deps[p->next_dep]->next_dep). I
> think we need to add a separate field to control that, and a visited
> flag does not suffice; instead it should probably be something like
> the descent depth (or just sequence number) at which the DSO was first
> encountered, so that we can avoid descending into a DSO that we
> already started descending into and that will be descended into again
> as part of the backing-up process.

Here's a v4 of the patch that saves the "init parent" we descended
from, so that the walk can return to where it left off. There are a
couple of gratuitous leftover hunks that set "needed_by" where it made
sense to be set, but it's not actually used anymore. They could be
dropped if desired, but are probably nice to keep for the sake of
consistency of the data, even though it's data we don't use.

I believe this can be extended to allow concurrent dlopen by amending
the case in the tree-walk where a dependency isn't constructed yet but
already has an "init parent" to check whether it's
pending-construction in the calling thread (recursive dlopen from a
ctor) or another thread; in the former case (as now) treat it as
already-constructed; in the latter, wait on a condvar that gets
signaled at the end of each construction, then continue the loop
without advancing p. There are probably some subtleties I'm missing,
though.

Rich

[-- Attachment #2: ctor_dep_order_v4.diff --]
[-- Type: text/plain, Size: 3553 bytes --]

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index a03f75e..b4754a5 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -66,7 +66,8 @@ struct dso {
 	char relocated;
 	char constructed;
 	char kernel_mapped;
-	struct dso **deps, *needed_by;
+	struct dso **deps, *needed_by, *init_parent;
+	size_t next_dep;
 	char *rpath_orig, *rpath;
 	struct tls_module tls;
 	size_t tls_id;
@@ -934,6 +935,7 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
 		if (!ldso.prev) {
 			tail->next = &ldso;
 			ldso.prev = tail;
+			ldso.needed_by = needed_by;
 			tail = ldso.next ? ldso.next : &ldso;
 		}
 		return &ldso;
@@ -1090,9 +1092,12 @@ static void load_deps(struct dso *p)
 				if (runtime) longjmp(*rtld_fail, 1);
 				continue;
 			}
-			if (runtime) {
-				tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2));
-				if (!tmp) longjmp(*rtld_fail, 1);
+			tmp = realloc(*deps, sizeof(*tmp)*(ndeps+2));
+			if (!tmp) {
+				error("Error allocating dependency data for %s: %m",
+					p->name);
+				if (runtime) longjmp(*rtld_fail, 1);
+			} else {
 				tmp[ndeps++] = dep;
 				tmp[ndeps] = 0;
 				*deps = tmp;
@@ -1110,7 +1115,7 @@ static void load_preload(char *s)
 		for (z=s; *z && !isspace(*z) && *z!=':'; z++);
 		tmp = *z;
 		*z = 0;
-		load_library(s, 0);
+		load_library(s, head);
 		*z = tmp;
 	}
 }
@@ -1211,13 +1216,23 @@ void __libc_exit_fini()
 static void do_init_fini(struct dso *p)
 {
 	size_t dyn[DYN_CNT];
-	int need_locking = libc.threads_minus_1;
-	/* Allow recursive calls that arise when a library calls
-	 * dlopen from one of its constructors, but block any
-	 * other threads until all ctors have finished. */
-	if (need_locking) pthread_mutex_lock(&init_fini_lock);
-	for (; p; p=p->prev) {
-		if (p->constructed) continue;
+	pthread_mutex_lock(&init_fini_lock);
+	/* Construct in dependency order without any recursive state. */
+	while (p && !p->constructed) {
+		/* The following loop descends into the first dependency
+		 * that is neither already constructed nor pending
+		 * construction due to circular deps, stopping only
+		 * when it reaches a dso with no remaining dependencies
+		 * to descend into. */
+		while (p->deps && p->deps[p->next_dep]) {
+			if (!p->deps[p->next_dep]->constructed &&
+			    !p->deps[p->next_dep]->init_parent) {
+				p->deps[p->next_dep]->init_parent = p;
+				p = p->deps[p->next_dep++];
+			} else {
+				p->next_dep++;
+			}
+		}
 		p->constructed = 1;
 		decode_vec(p->dynv, dyn, DYN_CNT);
 		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
@@ -1233,17 +1248,19 @@ static void do_init_fini(struct dso *p)
 			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
 			while (n--) ((void (*)(void))*fn++)();
 		}
-		if (!need_locking && libc.threads_minus_1) {
-			need_locking = 1;
-			pthread_mutex_lock(&init_fini_lock);
-		}
-	}
-	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
+		/* Revisit "parent" dso which caused the just-constructed
+		 * dso to be pulled in as a dependency. On the next loop
+		 * iteration we will either descend to construct a sibling
+		 * of the just-constructed dso, or finish constructing the
+		 * parent if no unfinished deps remain. */
+		p = p->init_parent;
+	}
+	pthread_mutex_unlock(&init_fini_lock);
 }
 
 void __libc_start_init(void)
 {
-	do_init_fini(tail);
+	do_init_fini(head);
 }
 
 static void dl_debug_state(void)
@@ -1731,7 +1748,7 @@ end:
 	__release_ptc();
 	if (p) gencnt++;
 	pthread_rwlock_unlock(&lock);
-	if (p) do_init_fini(orig_tail);
+	if (p) do_init_fini(p);
 	pthread_setcancelstate(cs, 0);
 	return p;
 }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reviving planned ldso changes
  2017-03-03  1:30                       ` Rich Felker
@ 2017-03-04 10:58                         ` Szabolcs Nagy
  2017-03-06  1:11                           ` Rich Felker
  2017-03-06 16:25                         ` Rich Felker
  1 sibling, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2017-03-04 10:58 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2017-03-02 20:30:26 -0500]:
> Here's a v4 of the patch that saves the "init parent" we descended
> from so that it can return where it left off. There are a couple
> gratuitous hunks left over adding setting of "needed_by" where it made
> sense to be set, but it's not actually used anymore. They could be
> dropped if desired but are probably nice to keep for the sake of
> consistency of data, even though it's data we don't use.
> 
> I believe this can be extended to allow concurrent dlopen by amending
> the case in the tree-walk where a dependency isn't constructed yet but
> already has an "init parent" to check whether it's
> pending-construction in the calling thread (recursive dlopen from a
> ctor) or another thread; in the former case (as now) treat it as
> already-constructed; in the latter, wait on a condvar that gets
> signaled at the end of each construction, then continue the loop
> without advancing p. There are probably some subtleties I'm missing,
> though.
...
>  static void do_init_fini(struct dso *p)
>  {
>  	size_t dyn[DYN_CNT];
> -	int need_locking = libc.threads_minus_1;
> -	/* Allow recursive calls that arise when a library calls
> -	 * dlopen from one of its constructors, but block any
> -	 * other threads until all ctors have finished. */
> -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> -	for (; p; p=p->prev) {
> -		if (p->constructed) continue;
> +	pthread_mutex_lock(&init_fini_lock);
> +	/* Construct in dependency order without any recursive state. */
> +	while (p && !p->constructed) {
> +		/* The following loop descends into the first dependency
> +		 * that is neither already constructed nor pending
> +		 * construction due to circular deps, stopping only
> +		 * when it reaches a dso with no remaining dependencies
> +		 * to descend into. */
> +		while (p->deps && p->deps[p->next_dep]) {
> +			if (!p->deps[p->next_dep]->constructed &&
> +			    !p->deps[p->next_dep]->init_parent) {
> +				p->deps[p->next_dep]->init_parent = p;
> +				p = p->deps[p->next_dep++];

i think the root may be visited twice because it
has no init_parent, which may be problematic with
the concurrent dlopen (and can cause unexpected
ctor order: the root node is not constructed last
if there is a cycle through it)

i think only checking init_parent of a dep is
enough and the root node can have a dummy parent
that is guaranteed to be not a dependency (ldso?)
and constructed so it stops the loop.

> +			} else {
> +				p->next_dep++;
> +			}
> +		}
>  		p->constructed = 1;
>  		decode_vec(p->dynv, dyn, DYN_CNT);
>  		if (dyn[0] & ((1<<DT_FINI) | (1<<DT_FINI_ARRAY))) {
> @@ -1233,17 +1248,19 @@ static void do_init_fini(struct dso *p)
>  			size_t *fn = laddr(p, dyn[DT_INIT_ARRAY]);
>  			while (n--) ((void (*)(void))*fn++)();
>  		}
> -		if (!need_locking && libc.threads_minus_1) {
> -			need_locking = 1;
> -			pthread_mutex_lock(&init_fini_lock);
> -		}
> -	}
> -	if (need_locking) pthread_mutex_unlock(&init_fini_lock);
> +		/* Revisit "parent" dso which caused the just-constructed
> +		 * dso to be pulled in as a dependency. On the next loop
> +		 * iteration we will either descend to construct a sibling
> +		 * of the just-constructed dso, or finish constructing the
> +		 * parent if no unfinished deps remain. */
> +		p = p->init_parent;
> +	}
> +	pthread_mutex_unlock(&init_fini_lock);
>  }


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reviving planned ldso changes
  2017-03-04 10:58                         ` Szabolcs Nagy
@ 2017-03-06  1:11                           ` Rich Felker
  2017-03-07 22:02                             ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2017-03-06  1:11 UTC (permalink / raw)
  To: musl

On Sat, Mar 04, 2017 at 11:58:18AM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2017-03-02 20:30:26 -0500]:
> > Here's a v4 of the patch that saves the "init parent" we descended
> > from so that it can return where it left off. There are a couple
> > gratuitous hunks left over adding setting of "needed_by" where it made
> > sense to be set, but it's not actually used anymore. They could be
> > dropped if desired but are probably nice to keep for the sake of
> > consistency of data, even though it's data we don't use.
> > 
> > I believe this can be extended to allow concurrent dlopen by amending
> > the case in the tree-walk where a dependency isn't constructed yet but
> > already has an "init parent" to check whether it's
> > pending-construction in the calling thread (recursive dlopen from a
> > ctor) or another thread; in the former case (as now) treat it as
> > already-constructed; in the latter, wait on a condvar that gets
> > signaled at the end of each construction, then continue the loop
> > without advancing p. There are probably some subtleties I'm missing,
> > though.
> ....
> >  static void do_init_fini(struct dso *p)
> >  {
> >  	size_t dyn[DYN_CNT];
> > -	int need_locking = libc.threads_minus_1;
> > -	/* Allow recursive calls that arise when a library calls
> > -	 * dlopen from one of its constructors, but block any
> > -	 * other threads until all ctors have finished. */
> > -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> > -	for (; p; p=p->prev) {
> > -		if (p->constructed) continue;
> > +	pthread_mutex_lock(&init_fini_lock);
> > +	/* Construct in dependency order without any recursive state. */
> > +	while (p && !p->constructed) {
> > +		/* The following loop descends into the first dependency
> > +		 * that is neither already constructed nor pending
> > +		 * construction due to circular deps, stopping only
> > +		 * when it reaches a dso with no remaining dependencies
> > +		 * to descend into. */
> > +		while (p->deps && p->deps[p->next_dep]) {
> > +			if (!p->deps[p->next_dep]->constructed &&
> > +			    !p->deps[p->next_dep]->init_parent) {
> > +				p->deps[p->next_dep]->init_parent = p;
> > +				p = p->deps[p->next_dep++];
> 
> i think the root may be visited twice because it
> has no init_parent, which may be problematic with
> the concurrent dlopen (and can cause unexpected
> ctor order: the root node is not constructed last
> if there is a cycle through it)

Ah, the case where the root is an indirect dependency for itself? Yes,
I think you're right in that case. We should be able to avoid it by
setting the initial p->init_parent to head (the application), I think.
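
Something like this at the call site in dlopen, i.e. an untested
sketch of what I mean (head is the main app, which is always
constructed by the time dlopen can run):

	if (p) {
		/* Seed the dlopen'd root with a dummy init parent so a
		 * cycle back through it sees it as pending and skips it,
		 * and so the backing-up loop stops at head. */
		if (!p->init_parent) p->init_parent = head;
		do_init_fini(p);
	}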

> i think only checking init_parent of a dep is
> enough and the root node can have a dummy parent
> that is guaranteed to be not a dependency (ldso?)
> and constructed so it stops the loop.

I think ldso would work too, but in principle it need not be a
dependency of anything if you have a dynamic-linked program that
doesn't use libc (-nostdlib), so it's better to use head, I think.

Also I agree we don't need to check p->constructed now, but once we
unlock during ctor execution, the !init_parent and !constructed cases
need to be treated separately. If it's constructed or pending
construction in the same thread, we can just do p->next_dep++, but if
it has an init_parent but isn't constructed or pending construction in
the same thread (recursive), we need to condvar wait and re-check instead,
right?
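
For the concurrent case, the inner loop's case analysis might end up
looking something like this -- just a rough sketch, assuming a new
condvar (ctor_cond, broadcast after each dso's ctors finish while
init_fini_lock is held) and a new per-dso field (ctor_thread)
recording which thread marked it pending; neither exists yet, and
in-tree it would presumably use __pthread_self() rather than the
public API:

		while (p->deps && p->deps[p->next_dep]) {
			struct dso *q = p->deps[p->next_dep];
			if (q->constructed || (q->init_parent &&
			    pthread_equal(q->ctor_thread, pthread_self()))) {
				/* Already done, or pending in this same
				 * thread (circular dep or recursive dlopen
				 * from a ctor): skip it. */
				p->next_dep++;
			} else if (q->init_parent) {
				/* Pending in another thread: wait for some
				 * construction to finish, then re-check
				 * without advancing p or next_dep. */
				pthread_cond_wait(&ctor_cond, &init_fini_lock);
			} else {
				q->init_parent = p;
				q->ctor_thread = pthread_self();
				p->next_dep++;
				p = q;
			}
		}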

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reviving planned ldso changes
  2017-03-03  1:30                       ` Rich Felker
  2017-03-04 10:58                         ` Szabolcs Nagy
@ 2017-03-06 16:25                         ` Rich Felker
  1 sibling, 0 replies; 21+ messages in thread
From: Rich Felker @ 2017-03-06 16:25 UTC (permalink / raw)
  To: musl

On Thu, Mar 02, 2017 at 08:30:26PM -0500, Rich Felker wrote:
> Here's a v4 of the patch that saves the "init parent" we descended
> from so that it can return where it left off. There are a couple
> gratuitous hunks left over adding setting of "needed_by" where it made
> sense to be set, but it's not actually used anymore. They could be
> dropped if desired but are probably nice to keep for the sake of
> consistency of data, even though it's data we don't use.

Upon further review I think these additions to needed_by should be
dropped, or moved to a separate change if deemed correct. They alter
rpath processing. For ldso it doesn't matter since ldso has no deps,
but for LD_PRELOAD libs, the change causes them to use the main app's
rpath for finding their deps, which may or may not be reasonable, but
is a functional change unrelated to dependency-order ctor execution.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reviving planned ldso changes
  2017-03-06  1:11                           ` Rich Felker
@ 2017-03-07 22:02                             ` Rich Felker
  2017-03-08 18:55                               ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2017-03-07 22:02 UTC (permalink / raw)
  To: musl

On Sun, Mar 05, 2017 at 08:11:59PM -0500, Rich Felker wrote:
> On Sat, Mar 04, 2017 at 11:58:18AM +0100, Szabolcs Nagy wrote:
> > * Rich Felker <dalias@libc.org> [2017-03-02 20:30:26 -0500]:
> > > Here's a v4 of the patch that saves the "init parent" we descended
> > > from so that it can return where it left off. There are a couple
> > > gratuitous hunks left over adding setting of "needed_by" where it made
> > > sense to be set, but it's not actually used anymore. They could be
> > > dropped if desired but are probably nice to keep for the sake of
> > > consistency of data, even though it's data we don't use.
> > > 
> > > I believe this can be extended to allow concurrent dlopen by amending
> > > the case in the tree-walk where a dependency isn't constructed yet but
> > > already has an "init parent" to check whether it's
> > > pending-construction in the calling thread (recursive dlopen from a
> > > ctor) or another thread; in the former case (as now) treat it as
> > > already-constructed; in the latter, wait on a condvar that gets
> > > signaled at the end of each construction, then continue the loop
> > > without advancing p. There are probably some subtleties I'm missing,
> > > though.
> > ....
> > >  static void do_init_fini(struct dso *p)
> > >  {
> > >  	size_t dyn[DYN_CNT];
> > > -	int need_locking = libc.threads_minus_1;
> > > -	/* Allow recursive calls that arise when a library calls
> > > -	 * dlopen from one of its constructors, but block any
> > > -	 * other threads until all ctors have finished. */
> > > -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> > > -	for (; p; p=p->prev) {
> > > -		if (p->constructed) continue;
> > > +	pthread_mutex_lock(&init_fini_lock);
> > > +	/* Construct in dependency order without any recursive state. */
> > > +	while (p && !p->constructed) {
> > > +		/* The following loop descends into the first dependency
> > > +		 * that is neither already constructed nor pending
> > > +		 * construction due to circular deps, stopping only
> > > +		 * when it reaches a dso with no remaining dependencies
> > > +		 * to descend into. */
> > > +		while (p->deps && p->deps[p->next_dep]) {
> > > +			if (!p->deps[p->next_dep]->constructed &&
> > > +			    !p->deps[p->next_dep]->init_parent) {
> > > +				p->deps[p->next_dep]->init_parent = p;
> > > +				p = p->deps[p->next_dep++];
> > 
> > i think the root may be visited twice because it
> > has no init_parent, which may be problematic with
> > the concurrent dlopen (and can cause unexpected
> > ctor order: the root node is not constructed last
> > if there is a cycle through it)
> 
> Ah, the case where the root is an indirect dependency for itself? Yes,
> I think you're right in that case. We should be able to avoid it by
> setting the initial p->init_parent to head (the application), I think.
> 
> > i think only checking init_parent of a dep is
> > enough and the root node can have a dummy parent
> > that is guaranteed to be not a dependency (ldso?)
> > and constructed so it stops the loop.
> 
> I think ldso would work too, but in principle it need not be a
> dependency of anything if you have a dynamic-linked program that
> doesn't use libc (-nostdlib), so it's better to use head, I think.
> 
> Also I agree we don't need to check p->constructed now, but once we
> unlock during ctor execution, the !init_parent and !constructed cases
> need to be treated separately. If it's constructed or pending
> construction in the same thread, we can just do p->next_dep++, but if
> it has an init_parent but isn't constructed or pending construction in
> same thread (recursive) we need to condvar wait and re-check instead,
> right?

Arg, deep problems I missed. Quoting from IRC:

<dalias> nsz, uhg, the dep-order draft so far has a big bug
<dalias> p->deps is not actually deps for p
<dalias> rather, it's all indirect deps, but only set for a lib that was explicitly dlopen'd
<dalias> so the new code doesn't actually do dep-order
<dalias> it just walks a flat list of all (breadth-first, not depth-first) direct and indirect dependencies of p
<dalias> and descends into each then immediately backs out
<dalias> because after descending, p->deps is null
<dalias> i think we should get rid of the old use of p->deps
<dalias> which is just undoing temp-globalization of libs during load for reloc purposes

If I first do the work of having a separate global-namespace dso list
(which is a pending change that will speed up relocations anyway),
then the old use of p->deps is no longer needed and we can simply
repurpose it to be direct-deps only.

An alternate idea I like a lot, but don't see how to make it work with
concurrency (making it so ctors still running in one thread don't
unnecessarily preclude a new dlopen in another thread), is to build a
queue for ctor execution at the time dependencies are loaded. The
logic for this seems simple: when processing the DT_NEEDED entry,
check if the needed library is already constructed or already in the
ctor queue. If so, do nothing. If not, insert it in the queue just
before the library it's a dependency for.
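
The insertion step itself looks simple enough; a rough sketch, with
hypothetical next_ctor/prev_ctor links and an in_ctor_queue flag added
to struct dso, a ctor_queue list head, and the assumption that the
dlopen'd root is put on the queue before its DT_NEEDED entries are
processed:

static struct dso *ctor_queue;

/* Called while processing a DT_NEEDED entry: dep is the library being
 * pulled in, needer is the library whose DT_NEEDED referenced it and
 * which is already on the queue. */
static void queue_ctor_before(struct dso *dep, struct dso *needer)
{
	if (dep->constructed || dep->in_ctor_queue) return;
	dep->in_ctor_queue = 1;
	/* Link dep in immediately before needer so its ctors run first. */
	dep->next_ctor = needer;
	dep->prev_ctor = needer->prev_ctor;
	if (needer->prev_ctor) needer->prev_ctor->next_ctor = dep;
	else ctor_queue = dep;
	needer->prev_ctor = dep;
}

do_init_fini would then just run ctor_queue front to back.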

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reviving planned ldso changes
  2017-03-07 22:02                             ` Rich Felker
@ 2017-03-08 18:55                               ` Rich Felker
  0 siblings, 0 replies; 21+ messages in thread
From: Rich Felker @ 2017-03-08 18:55 UTC (permalink / raw)
  To: musl

On Tue, Mar 07, 2017 at 05:02:09PM -0500, Rich Felker wrote:
> On Sun, Mar 05, 2017 at 08:11:59PM -0500, Rich Felker wrote:
> > On Sat, Mar 04, 2017 at 11:58:18AM +0100, Szabolcs Nagy wrote:
> > > * Rich Felker <dalias@libc.org> [2017-03-02 20:30:26 -0500]:
> > > > Here's a v4 of the patch that saves the "init parent" we descended
> > > > from so that it can return where it left off. There are a couple
> > > > gratuitous hunks left over adding setting of "needed_by" where it made
> > > > sense to be set, but it's not actually used anymore. They could be
> > > > dropped if desired but are probably nice to keep for the sake of
> > > > consistency of data, even though it's data we don't use.
> > > > 
> > > > I believe this can be extended to allow concurrent dlopen by amending
> > > > the case in the tree-walk where a dependency isn't constructed yet but
> > > > already has an "init parent" to check whether it's
> > > > pending-construction in the calling thread (recursive dlopen from a
> > > > ctor) or another thread; in the former case (as now) treat it as
> > > > already-constructed; in the latter, wait on a condvar that gets
> > > > signaled at the end of each construction, then continue the loop
> > > > without advancing p. There are probably some subtleties I'm missing,
> > > > though.
> > > ....
> > > >  static void do_init_fini(struct dso *p)
> > > >  {
> > > >  	size_t dyn[DYN_CNT];
> > > > -	int need_locking = libc.threads_minus_1;
> > > > -	/* Allow recursive calls that arise when a library calls
> > > > -	 * dlopen from one of its constructors, but block any
> > > > -	 * other threads until all ctors have finished. */
> > > > -	if (need_locking) pthread_mutex_lock(&init_fini_lock);
> > > > -	for (; p; p=p->prev) {
> > > > -		if (p->constructed) continue;
> > > > +	pthread_mutex_lock(&init_fini_lock);
> > > > +	/* Construct in dependency order without any recursive state. */
> > > > +	while (p && !p->constructed) {
> > > > +		/* The following loop descends into the first dependency
> > > > +		 * that is neither already constructed nor pending
> > > > +		 * construction due to circular deps, stopping only
> > > > +		 * when it reaches a dso with no remaining dependencies
> > > > +		 * to descend into. */
> > > > +		while (p->deps && p->deps[p->next_dep]) {
> > > > +			if (!p->deps[p->next_dep]->constructed &&
> > > > +			    !p->deps[p->next_dep]->init_parent) {
> > > > +				p->deps[p->next_dep]->init_parent = p;
> > > > +				p = p->deps[p->next_dep++];
> > > 
> > > i think the root may be visited twice because it
> > > has no init_parent, which may be problematic with
> > > the concurrent dlopen (and can cause unexpected
> > > ctor order: the root node is not constructed last
> > > if there is a cycle through it)
> > 
> > Ah, the case where the root is an indirect dependency for itself? Yes,
> > I think you're right in that case. We should be able to avoid it by
> > setting the initial p->init_parent to head (the application), I think.
> > 
> > > i think only checking init_parent of a dep is
> > > enough and the root node can have a dummy parent
> > > that is guaranteed to be not a dependency (ldso?)
> > > and constructed so it stops the loop.
> > 
> > I think ldso would work too, but in principle it need not be a
> > dependency of anything if you have a dynamic-linked program that
> > doesn't use libc (-nostdlib), so it's better to use head, I think.
> > 
> > Also I agree we don't need to check p->constructed now, but once we
> > unlock during ctor execution, the !init_parent and !constructed cases
> > need to be treated separately. If it's constructed or pending
> > construction in the same thread, we can just do p->next_dep++, but if
> > it has an init_parent but isn't constructed or pending construction in
> > same thread (recursive) we need to condvar wait and re-check instead,
> > right?
> 
> Arg, deep problems I missed. Quoting from IRC:
> 
> <dalias> nsz, uhg, the dep-order draft so far has a big bug
> <dalias> p->deps is not actually deps for p
> <dalias> rather, it's all indirect deps, but only set for a lib that was explicitly dlopen'd
> <dalias> so the new code doesn't actually do dep-order
> <dalias> it just walks a flat list of all (breadth-first, not depth-first) direct and indirect dependencies of p
> <dalias> and descends into each then immediately backs out
> <dalias> because after descending, p->deps is null
> <dalias> i think we should get rid of the old use of p->deps
> <dalias> which is just undoing temp-globalization of libs during load for reloc purposes
> 
> If I first do the work of having a separate global-namespace dso list
> (which is a pending change that will speed up relocations anyway),
> then the old use of p->deps is no longer needed and we can simply
> repurpose it to be direct-deps only.

This is incorrect. p->deps in its current form is used for dlsym
"dependency ordering" symbol resolution, where a breadth-first list of
all direct and indirect dependencies is exactly what you want. So I
don't think it can be eliminated.

I wonder if it suffices to walk the flat p->deps in reverse. I suspect
there are cases where this is wrong when a dependency appears more
than once.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-03-08 18:55 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-03  5:43 Reviving planned ldso changes Rich Felker
2017-01-04  6:06 ` Rich Felker
2017-01-04  6:22   ` Rich Felker
2017-01-04 19:36     ` Rich Felker
2017-01-14 21:30       ` A. Wilcox
2017-01-15 17:44         ` Rich Felker
2017-02-26  1:04           ` Szabolcs Nagy
2017-02-26  1:39             ` Rich Felker
2017-02-26 10:28               ` Szabolcs Nagy
2017-02-26 15:20                 ` Rich Felker
2017-02-26 15:34                   ` Szabolcs Nagy
2017-02-26 21:39                     ` Rich Felker
2017-03-03  1:30                       ` Rich Felker
2017-03-04 10:58                         ` Szabolcs Nagy
2017-03-06  1:11                           ` Rich Felker
2017-03-07 22:02                             ` Rich Felker
2017-03-08 18:55                               ` Rich Felker
2017-03-06 16:25                         ` Rich Felker
2017-01-04 10:51 ` Szabolcs Nagy
2017-02-16  1:58   ` Szabolcs Nagy
2017-02-16  2:39     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).