mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found
@ 2024-02-07  1:22 Markus Mayer
  2024-02-07  1:22 ` [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first Markus Mayer
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Markus Mayer @ 2024-02-07  1:22 UTC (permalink / raw)
  To: Musl Mailing List; +Cc: Markus Mayer

Hi all,

We have just discovered an interesting issue after recently
transitioning from glibc to musl-libc.

This is an Aarch64 platform with 64-bit userland that also supports
running 32-bit applications. We are using GCC 12.3 and musl 1.2.3.

Some application teams are still developing their apps as 32-bit
applications. As mentioned, the base system is 64-bit. To ensure all app
dependencies are met, all shared libraries the app needs are bundled
with the app in a tar-ball by the application developers. The launch
script of the app will set LD_LIBRARY_PATH to the directory containing
these 32-bit libraries before calling the actual application. This way,
the app can find all its shared libraries. Some of these 32-bit
libraries are "standard" libraries (libz, libcrypto) that also exist as
64-bit version in /usr/lib.

What we have discovered is that 64-bit applications launched by the
32-bit app will fail due to a shared library mismatch. The 32-bit app
needs to call system utilities, which are 64-bit. So, this needs to
work.

It took some digging, but I think I know what is going on. Let me
summarize how to reproduce the problem, why it occurs and, hopefully,
what can be done about it.

The ingredients:

* A 64-bit system
* A directory containing 32-bit versions of "standard" libraries also
  present on the system as 64-bit versions (libz, libcrypto, etc.)
* LD_LIBRARY_PATH pointing to the custom 32-bit library directory
* Attempting to launch a 64-bit application that uses one of the
  libraries that exist as 32-bit and 64-bit version

The problem:

If LD_LIBRARY_PATH is set to a directory containing 32-bit libraries and
then a 64-bit binary is invoked, the shared library loader will pick up
the 32-bit version of a library first, because it'll look at
LD_LIBRARY_PATH before anything else. Mapping the 32-bit library into
the 64-bit process will fail. This much is expected.

However, even though the correct library resides on the system, the
shared library loader never attempts to look for it. The 64-bit process
will fail to launch, even though there is no reason for the failure. The
problem only exists, because the shared library launcher doesn't look in
the remaining shared library directories.

The solution:

The shared library loader needs to keep searching the rest of the
library search path if the library it found in LD_LIBRARY_PATH could not
be mapped. If the library loader does this, everything will work fine as
long as the library resides on the system in a well known path.

How to reproduce:

The problem can be simulated easily in the shell as follows.

1) Baseline: call /sbin/lsmod normally. Everything is working.

$ /sbin/lsmod
Module                  Size  Used by
bdc                    53248  0
udc_core               49152  1 bdc

2) Set LD_LIBRARY_PATH to the 32-bit directory and try again.

$ LD_LIBRARY_PATH=/path/to/app /sbin/lsmod
Error loading shared library libz.so.1: Exec format error (needed by /sbin/lsmod)
Error loading shared library libcrypto.so.3: Exec format error (needed by /sbin/lsmod)

Suddenly the 64-bit binary fails to run, because the copies of libz and
libcrypto the shared library loader finds are the 32-bit versions
residing in the app directory. It never tries looking in /usr/lib.

Potential solution (the proposed patch):

# LD_LIBRARY_PATH=/path/to/app /path/to/custom/libc.so /sbin/lsmod
Module                  Size  Used by
bdc                    53248  0
udc_core               49152  1 bdc

With the attached patch applied, everything is working again. Here, I am
still setting LD_LIBRARY_PATH to the offending directory, but then I am
using a patched version of musl-libc to launch /sbin/lsmod. This version
of libc.so contains the patch, so it *will* search the rest of the
system directories after discovering that the 32-bit versions of libz
and libcrypto didn't work out. So, /sbin/lsmod is able to run fine.

We can confirm this using strace:

# LD_LIBRARY_PATH=. /path/to/custom/libc.so \
    /usr/bin/strace -e openat /path/to/custom/libc.so --list /sbin/lsmod
openat(AT_FDCWD, "/sbin/lsmod", O_RDONLY|O_LARGEFILE) = 3
	/lib/ld-musl-aarch64.so.1 (0x7fb72b1000)
openat(AT_FDCWD, "./liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld-musl-aarch64.path", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
	liblzma.so.5 => /usr/lib/liblzma.so.5 (0x7fb7265000)
openat(AT_FDCWD, "./libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
	libz.so.1 => /usr/lib/libz.so.1 (0x7fb7251000)
openat(AT_FDCWD, "./libcrypto.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/libcrypto.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libcrypto.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libcrypto.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
	libcrypto.so.3 => /usr/lib/libcrypto.so.3 (0x7fb6e92000)
	libc.so => /lib/ld-musl-aarch64.so.1 (0x7fb72b1000)

This call to strace is invoking the patched libc.so twice. Once, so
strace itself won't fail, and once to launch /sbin/lsmod. We can see it
finds the 32-bit versions of a number of libraries, but then keeps
searching, once it finds it is unable to map them. In the end, it finds
the proper libraries, and everything is working.

Conversely, the unpatched libc.so will not try any other location
besides the one pointed to by LD_LIBRARY_PATH, if library it is looking
for exists in LD_LIBRARY_PATH. (It'll find the proper liblzma, because
that does NOT exist as 32-bit version in LD_LIBRARY_PATH. But libz and
libcrypto do, and that's where it fails.)

openat(AT_FDCWD, "/sbin/lsmod", O_RDONLY|O_LARGEFILE) = 3
openat(AT_FDCWD, "./liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld-musl-aarch64.path", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
openat(AT_FDCWD, "./libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
openat(AT_FDCWD, "./libcrypto.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
+++ exited with 127 +++

My proposal is, of course, only one possibility of many to achieve the
goal of searching the system library locations after mapping a library
from LD_LIBRARY_PATH fails.

What is your take? Does the idea make sense in principal? Does my patch
make sense?

Continuing to search the system directories does seem to be the right
thing to do under the circumstances described here. Also, it is what
glibc does.

Regards,
-Markus

Markus Mayer (1):
  ldso: continue searching if wrong architecture is found first

 ldso/dynlink.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first
  2024-02-07  1:22 [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found Markus Mayer
@ 2024-02-07  1:22 ` Markus Mayer
  2024-02-07 17:30   ` Rich Felker
  2024-02-07  8:26 ` [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found i262jq
  2024-02-07 18:09 ` Colin Cross
  2 siblings, 1 reply; 13+ messages in thread
From: Markus Mayer @ 2024-02-07  1:22 UTC (permalink / raw)
  To: Musl Mailing List; +Cc: Markus Mayer

When LD_LIBRARY_PATH is being used, the shared library loader may end up
finding a shared library with the correct name but from a wrong
architecture. This primarily happens when a 32-bit / 64-bit mismatch
occurs.

Rather than giving up immediately and aborting, the shared library
loader should continue to look in all known locations for a matching
library that may work.

Signed-off-by: Markus Mayer <mmayer@broadcom.com>
---
 ldso/dynlink.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index fd09ca69932c..d7f085b0a212 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -1019,6 +1019,7 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
 	size_t alloc_size;
 	int n_th = 0;
 	int is_self = 0;
+	int is_using_env_path = 0;
 
 	if (!*name) {
 		errno = EINVAL;
@@ -1072,7 +1073,11 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
 		}
 		if (strlen(name) > NAME_MAX) return 0;
 		fd = -1;
-		if (env_path) fd = path_open(name, env_path, buf, sizeof buf);
+		if (env_path) {
+			fd = path_open(name, env_path, buf, sizeof buf);
+			is_using_env_path = (fd >= 0);
+		}
+retry:
 		for (p=needed_by; fd == -1 && p; p=p->needed_by) {
 			if (fixup_rpath(p, buf, sizeof buf) < 0)
 				fd = -2; /* Inhibit further search. */
@@ -1138,7 +1143,16 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
 	}
 	map = noload ? 0 : map_library(fd, &temp_dso);
 	close(fd);
-	if (!map) return 0;
+	if (!map) {
+		if (is_using_env_path) {
+			/* If LD_LIBARY_PATH resulted in a library that couldn't
+			 * be loaded, let's try the default locations. */
+			fd = -1;
+			is_using_env_path = 0;
+			goto retry;
+		}
+		return 0;
+	}
 
 	/* Avoid the danger of getting two versions of libc mapped into the
 	 * same process when an absolute pathname was used. The symbols
-- 
2.43.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found
  2024-02-07  1:22 [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found Markus Mayer
  2024-02-07  1:22 ` [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first Markus Mayer
@ 2024-02-07  8:26 ` i262jq
  2024-02-07 16:45   ` enh
  2024-02-07 18:09 ` Colin Cross
  2 siblings, 1 reply; 13+ messages in thread
From: i262jq @ 2024-02-07  8:26 UTC (permalink / raw)
  To: musl

On Tue, Feb 06, 2024 at 05:22:42PM -0800, Markus Mayer wrote:
> script of the app will set LD_LIBRARY_PATH to the directory containing
[...]
> What we have discovered is that 64-bit applications launched by the
> 32-bit app will fail due to a shared library mismatch. The 32-bit app

This is a known problem with the LD_LIBRARY_PATH semantics.

> The solution:
> 
> The shared library loader needs to keep searching the rest of the
> library search path if the library it found in LD_LIBRARY_PATH could not
> be mapped. If the library loader does this, everything will work fine as
> long as the library resides on the system in a well known path.

This would possibly solve your particular case, but not the main trouble
inherent in LD_LIBRARY_PATH (a child process mapping an unexpected
library instance).

A well working solution is to wrap each application binary in a starter
which explicitly runs the dynamic loader with the argument
 --library-path <where the libraries are>
(then the path to the application binary and the arguments, of course)

This solves the problems reliably and is well proven.

My 2c
/i262jq

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found
  2024-02-07  8:26 ` [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found i262jq
@ 2024-02-07 16:45   ` enh
  2024-02-07 17:07     ` Thorsten Glaser
  0 siblings, 1 reply; 13+ messages in thread
From: enh @ 2024-02-07 16:45 UTC (permalink / raw)
  To: musl

On Wed, Feb 7, 2024 at 12:29 AM <i262jq@0w.se> wrote:
>
> On Tue, Feb 06, 2024 at 05:22:42PM -0800, Markus Mayer wrote:
> > script of the app will set LD_LIBRARY_PATH to the directory containing
> [...]
> > What we have discovered is that 64-bit applications launched by the
> > 32-bit app will fail due to a shared library mismatch. The 32-bit app
>
> This is a known problem with the LD_LIBRARY_PATH semantics.
>
> > The solution:
> >
> > The shared library loader needs to keep searching the rest of the
> > library search path if the library it found in LD_LIBRARY_PATH could not
> > be mapped. If the library loader does this, everything will work fine as
> > long as the library resides on the system in a well known path.
>
> This would possibly solve your particular case, but not the main trouble
> inherent in LD_LIBRARY_PATH (a child process mapping an unexpected
> library instance).
>
> A well working solution is to wrap each application binary in a starter
> which explicitly runs the dynamic loader with the argument
>  --library-path <where the libraries are>
> (then the path to the application binary and the arguments, of course)
>
> This solves the problems reliably and is well proven.

(in particular: this is the way Android went... we originally did
support just ignoring invalid entries on the path, but that was
confusing and error-prone so we switched to a clear "your library path
is wrong for this binary" error. surprisingly [to me] it was also
helpful in catching cases where people had libraries for the wrong
_architecture_, not just the wrong bitness. "with enough users, all
errors are common" :-) )

> My 2c
> /i262jq

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found
  2024-02-07 16:45   ` enh
@ 2024-02-07 17:07     ` Thorsten Glaser
  2024-02-07 17:54       ` enh
  0 siblings, 1 reply; 13+ messages in thread
From: Thorsten Glaser @ 2024-02-07 17:07 UTC (permalink / raw)
  To: musl

enh dixit:

>helpful in catching cases where people had libraries for the wrong
>_architecture_, not just the wrong bitness. "with enough users, all
>errors are common" :-) )

But are these errors?

On Debian, with Multi-Arch and qemu-user and whatnot, that would not
even be unusual, if perhaps not common.

I’m mildly tending towards ignoring objects that are not valid
shared objects of the current architecture and ABI.

bye,
//mirabilos
-- 
Support mksh as /bin/sh and RoQA dash NOW!
‣ src:bash (429 (458) bugs: 0 RC, 295 (315) I&N, 134 (143) M&W, 0 F&P) + 209
‣ src:dash (90 (104) bugs: 0 RC, 51 (54) I&N, 39 (50) M&W, 0 F&P) + 62 ubu
‣ src:mksh (1 bug: 0 RC, 0 I&N, 1 M&W, 0 F&P)
dash has two RC bugs they just closed because they don’t care about quality…

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first
  2024-02-07  1:22 ` [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first Markus Mayer
@ 2024-02-07 17:30   ` Rich Felker
  2024-02-07 19:10     ` i262jq
  2024-02-07 20:59     ` Markus Mayer
  0 siblings, 2 replies; 13+ messages in thread
From: Rich Felker @ 2024-02-07 17:30 UTC (permalink / raw)
  To: Markus Mayer; +Cc: Musl Mailing List

On Tue, Feb 06, 2024 at 05:22:43PM -0800, Markus Mayer wrote:
> When LD_LIBRARY_PATH is being used, the shared library loader may end up
> finding a shared library with the correct name but from a wrong
> architecture. This primarily happens when a 32-bit / 64-bit mismatch
> occurs.
> 
> Rather than giving up immediately and aborting, the shared library
> loader should continue to look in all known locations for a matching
> library that may work.

I don't think the concept is objectionable, but the implementation
looks wrong in at least two major ways:

1. It does not seem to continue the path search where it left off, but
   just skips the rest of the LD_LIBRARY_PATH search on retry. Yes
   this solves your problem of having a wrong-arch path element
   present, but it breaks search of any correct-arch path elements
   also present later there. This might not be a big deal, but it
   strongly violates principle of least surprise, I think.

The bigger one is:

2. No consideration is made for the *cause of failure* when retrying.
   Any in particular, lots of potential errors that were supposed to
   be reportable errors now turn into vectors for loading the wrong
   library, potentially under control of malicious inputs or
   environmental factors (kernel resource exhaustion, etc.).
   
If "wrong library arch" is going to be a continue-path-search
operation, it needs to be distinguishable (as "we successfully read
the Ehdrs and determined the library is not suitable for this arch")
from inconclusive errors (like out-of-memory, too many mmaps, etc.)

This probably means some minor refactoring is called for, replacing
the open calls in path_open and the direct use of open for absolute
paths by a new function like lib_open or something that would also
place the start of the file into a caller-provided buffer and validate
that it's usable before returning success, moving that logic out of
map_library and letting map_library assume it already has the initial
buffer contents available on entry.

If you think there are less invasive ways to do this, I'd be happy to
hear other ideas too.

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found
  2024-02-07 17:07     ` Thorsten Glaser
@ 2024-02-07 17:54       ` enh
  0 siblings, 0 replies; 13+ messages in thread
From: enh @ 2024-02-07 17:54 UTC (permalink / raw)
  To: musl

On Wed, Feb 7, 2024 at 9:17 AM Thorsten Glaser <tg@mirbsd.de> wrote:
>
> enh dixit:
>
> >helpful in catching cases where people had libraries for the wrong
> >_architecture_, not just the wrong bitness. "with enough users, all
> >errors are common" :-) )
>
> But are these errors?

well, that's the philosophical question that brings us here today :-)

> On Debian, with Multi-Arch and qemu-user and whatnot, that would not
> even be unusual, if perhaps not common.
>
> I’m mildly tending towards ignoring objects that are not valid
> shared objects of the current architecture and ABI.

i think the strongest argument we heard in favor of ignoring this is
the "what about a 64-bit process that exec()s a 32-bit process, or
vice versa?". on Android that's probably also a bug. having the
_libraries_ for two bitnesses isn't unusual (or "wasn't" anyway --- in
2024 we're getting to the point where low-end is 32-only and the rest
64-only, but 32/64 was common for the last decade), but _executables_
(other than the two zygotes) was probably a bug, and encouraging apps
not to have 64-bit helpers in addition to 64-bit libraries was only
going to cause us trouble for the inevitable _switch_ to 64-only when
arm64 hardware dropped 32-bit support. moreover, getting a _correct_
value of LD_LIBRARY_PATH on Android is non-trivial anyway, given the
split between system and vendor code.

but, yeah, if you're mainly thinking about x86-64, that doesn't seem
likely to ever drop 32-bit support, so your "transition" might be
something you never come out of.

> bye,
> //mirabilos
> --
> Support mksh as /bin/sh and RoQA dash NOW!
> ‣ src:bash (429 (458) bugs: 0 RC, 295 (315) I&N, 134 (143) M&W, 0 F&P) + 209
> ‣ src:dash (90 (104) bugs: 0 RC, 51 (54) I&N, 39 (50) M&W, 0 F&P) + 62 ubu
> ‣ src:mksh (1 bug: 0 RC, 0 I&N, 1 M&W, 0 F&P)
> dash has two RC bugs they just closed because they don’t care about quality…

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found
  2024-02-07  1:22 [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found Markus Mayer
  2024-02-07  1:22 ` [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first Markus Mayer
  2024-02-07  8:26 ` [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found i262jq
@ 2024-02-07 18:09 ` Colin Cross
  2024-04-02 20:59   ` Markus Mayer
  2 siblings, 1 reply; 13+ messages in thread
From: Colin Cross @ 2024-02-07 18:09 UTC (permalink / raw)
  To: musl; +Cc: Markus Mayer

On Tue, Feb 6, 2024 at 5:23 PM Markus Mayer <mmayer@broadcom.com> wrote:
>
> Hi all,
>
> We have just discovered an interesting issue after recently
> transitioning from glibc to musl-libc.
>
> This is an Aarch64 platform with 64-bit userland that also supports
> running 32-bit applications. We are using GCC 12.3 and musl 1.2.3.
>
> Some application teams are still developing their apps as 32-bit
> applications. As mentioned, the base system is 64-bit. To ensure all app
> dependencies are met, all shared libraries the app needs are bundled
> with the app in a tar-ball by the application developers. The launch
> script of the app will set LD_LIBRARY_PATH to the directory containing
> these 32-bit libraries before calling the actual application. This way,
> the app can find all its shared libraries. Some of these 32-bit
> libraries are "standard" libraries (libz, libcrypto) that also exist as
> 64-bit version in /usr/lib.
>
> What we have discovered is that 64-bit applications launched by the
> 32-bit app will fail due to a shared library mismatch. The 32-bit app
> needs to call system utilities, which are 64-bit. So, this needs to
> work.
>
> It took some digging, but I think I know what is going on. Let me
> summarize how to reproduce the problem, why it occurs and, hopefully,
> what can be done about it.
>
> The ingredients:
>
> * A 64-bit system
> * A directory containing 32-bit versions of "standard" libraries also
>   present on the system as 64-bit versions (libz, libcrypto, etc.)
> * LD_LIBRARY_PATH pointing to the custom 32-bit library directory
> * Attempting to launch a 64-bit application that uses one of the
>   libraries that exist as 32-bit and 64-bit version
>
> The problem:
>
> If LD_LIBRARY_PATH is set to a directory containing 32-bit libraries and
> then a 64-bit binary is invoked, the shared library loader will pick up
> the 32-bit version of a library first, because it'll look at
> LD_LIBRARY_PATH before anything else. Mapping the 32-bit library into
> the 64-bit process will fail. This much is expected.
>
> However, even though the correct library resides on the system, the
> shared library loader never attempts to look for it. The 64-bit process
> will fail to launch, even though there is no reason for the failure. The
> problem only exists, because the shared library launcher doesn't look in
> the remaining shared library directories.
>
> The solution:
>
> The shared library loader needs to keep searching the rest of the
> library search path if the library it found in LD_LIBRARY_PATH could not
> be mapped. If the library loader does this, everything will work fine as
> long as the library resides on the system in a well known path.
>
> How to reproduce:
>
> The problem can be simulated easily in the shell as follows.
>
> 1) Baseline: call /sbin/lsmod normally. Everything is working.
>
> $ /sbin/lsmod
> Module                  Size  Used by
> bdc                    53248  0
> udc_core               49152  1 bdc
>
> 2) Set LD_LIBRARY_PATH to the 32-bit directory and try again.
>
> $ LD_LIBRARY_PATH=/path/to/app /sbin/lsmod
> Error loading shared library libz.so.1: Exec format error (needed by /sbin/lsmod)
> Error loading shared library libcrypto.so.3: Exec format error (needed by /sbin/lsmod)
>
> Suddenly the 64-bit binary fails to run, because the copies of libz and
> libcrypto the shared library loader finds are the 32-bit versions
> residing in the app directory. It never tries looking in /usr/lib.
>
> Potential solution (the proposed patch):
>
> # LD_LIBRARY_PATH=/path/to/app /path/to/custom/libc.so /sbin/lsmod
> Module                  Size  Used by
> bdc                    53248  0
> udc_core               49152  1 bdc
>
> With the attached patch applied, everything is working again. Here, I am
> still setting LD_LIBRARY_PATH to the offending directory, but then I am
> using a patched version of musl-libc to launch /sbin/lsmod. This version
> of libc.so contains the patch, so it *will* search the rest of the
> system directories after discovering that the 32-bit versions of libz
> and libcrypto didn't work out. So, /sbin/lsmod is able to run fine.
>
> We can confirm this using strace:
>
> # LD_LIBRARY_PATH=. /path/to/custom/libc.so \
>     /usr/bin/strace -e openat /path/to/custom/libc.so --list /sbin/lsmod
> openat(AT_FDCWD, "/sbin/lsmod", O_RDONLY|O_LARGEFILE) = 3
>         /lib/ld-musl-aarch64.so.1 (0x7fb72b1000)
> openat(AT_FDCWD, "./liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/etc/ld-musl-aarch64.path", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/local/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
>         liblzma.so.5 => /usr/lib/liblzma.so.5 (0x7fb7265000)
> openat(AT_FDCWD, "./libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
> openat(AT_FDCWD, "/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/local/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/lib/libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
>         libz.so.1 => /usr/lib/libz.so.1 (0x7fb7251000)
> openat(AT_FDCWD, "./libcrypto.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
> openat(AT_FDCWD, "/lib/libcrypto.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/local/lib/libcrypto.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/lib/libcrypto.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
>         libcrypto.so.3 => /usr/lib/libcrypto.so.3 (0x7fb6e92000)
>         libc.so => /lib/ld-musl-aarch64.so.1 (0x7fb72b1000)
>
> This call to strace is invoking the patched libc.so twice. Once, so
> strace itself won't fail, and once to launch /sbin/lsmod. We can see it
> finds the 32-bit versions of a number of libraries, but then keeps
> searching, once it finds it is unable to map them. In the end, it finds
> the proper libraries, and everything is working.
>
> Conversely, the unpatched libc.so will not try any other location
> besides the one pointed to by LD_LIBRARY_PATH, if library it is looking
> for exists in LD_LIBRARY_PATH. (It'll find the proper liblzma, because
> that does NOT exist as 32-bit version in LD_LIBRARY_PATH. But libz and
> libcrypto do, and that's where it fails.)
>
> openat(AT_FDCWD, "/sbin/lsmod", O_RDONLY|O_LARGEFILE) = 3
> openat(AT_FDCWD, "./liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/etc/ld-musl-aarch64.path", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/local/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/lib/liblzma.so.5", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
> openat(AT_FDCWD, "./libz.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
> openat(AT_FDCWD, "./libcrypto.so.3", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
> +++ exited with 127 +++
>
> My proposal is, of course, only one possibility of many to achieve the
> goal of searching the system library locations after mapping a library
> from LD_LIBRARY_PATH fails.
>
> What is your take? Does the idea make sense in principal? Does my patch
> make sense?
>
> Continuing to search the system directories does seem to be the right
> thing to do under the circumstances described here. Also, it is what
> glibc does.
>
> Regards,
> -Markus
>
> Markus Mayer (1):
>   ldso: continue searching if wrong architecture is found first
>
>  ldso/dynlink.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> --
> 2.43.0
>

There is a previous discussion of the same issue at
https://www.openwall.com/lists/musl/2023/02/07/3.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first
  2024-02-07 17:30   ` Rich Felker
@ 2024-02-07 19:10     ` i262jq
  2024-02-07 19:51       ` Thorsten Glaser
  2024-02-07 20:59     ` Markus Mayer
  1 sibling, 1 reply; 13+ messages in thread
From: i262jq @ 2024-02-07 19:10 UTC (permalink / raw)
  To: musl

Dear Rich,

On Wed, Feb 07, 2024 at 12:30:24PM -0500, Rich Felker wrote:
> On Tue, Feb 06, 2024 at 05:22:43PM -0800, Markus Mayer wrote:
> > When LD_LIBRARY_PATH is being used, the shared library loader may end up
> > finding a shared library with the correct name but from a wrong
> > architecture. This primarily happens when a 32-bit / 64-bit mismatch
> > occurs.
> > 
> > Rather than giving up immediately and aborting, the shared library
> > loader should continue to look in all known locations for a matching
> > library that may work.
> 
> I don't think the concept is objectionable, but the implementation

I wonder how the loader would distinguish between "right" and "wrong"
if there are multiple libraries differing for example only in some minor
build option - which difference still can be crucial for the actual
application or for other used libraries?

In other words:

LD_LIBRARY_PATH is inherently unsafe when the applications can execute
other binaries than prepared by the same party, on a system not managed
by the same party.

Is it really worth to make LD_LIBRARY_PATH "more usable", but still unsafe?

To the contrary, relying on the explicit loader in a wrapper is totally
safe, because the path to the libraries is not inherited at any later
exec.

Are there any practical cases where the overhead of an extra execve()
of a small wrapper would be a noticeable problem?

> If you think there are less invasive ways to do this, I'd be happy to
> hear other ideas too.

The least invasive way might be to avoid introducing such support?

My perspective reflects years of handling third party applications and the
consequences of their use of LD_LIBRARY_PATH. This works - *mostly*.

At the same time, building and freely mixing applications linked
to a plethora of simultaneously available versions and instances of
libraries, this worked without any problems ever, thanks to the loader
with --library-path. AFAICS, reliance on checks in the loader would
never provide the same level of robustness, ease and flexibility.

Thanks for musl, regards,
/i262jq

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first
  2024-02-07 19:10     ` i262jq
@ 2024-02-07 19:51       ` Thorsten Glaser
  2024-02-08 15:52         ` i262jq
  0 siblings, 1 reply; 13+ messages in thread
From: Thorsten Glaser @ 2024-02-07 19:51 UTC (permalink / raw)
  To: musl

i262jq@0w.se dixit:

>I wonder how the loader would distinguish between "right" and "wrong"
>if there are multiple libraries differing for example only in some minor

That is precisely what we wondered about for the time_t-64-on-ILP32
transition in Debian.

>build option - which difference still can be crucial for the actual
>application or for other used libraries?

tl;dr: random build options cannot be distinguished. You have the
flags in the ELF header, which have CPU architecture and bitness,
and (other than x32 (amd64ilp32) and arm64ilp32, from there on it
gets machine-dependent as you need to define per-architecture ABI
flags, e.g. EABI vs. old ABI or MIPS o32 vs n32 vs n64.

So this will always be some kind of system-global at least, if not
world-global, thing, not something for the odd application or admin
to decide.

>Is it really worth to make LD_LIBRARY_PATH "more usable", but still unsafe?

I still think so.

>To the contrary, relying on the explicit loader in a wrapper is totally
>safe, because the path to the libraries is not inherited at any later
>exec.

What if the first binary you run is not the one that needs to get the
changed library path, but the binaries it runs?

>Are there any practical cases where the overhead of an extra execve()
>of a small wrapper would be a noticeable problem?

It may be very hard to get these into the right places.

bye,
//mirabilos
-- 
„Cool, /usr/share/doc/mksh/examples/uhr.gz ist ja ein Grund,
mksh auf jedem System zu installieren.“
	-- XTaran auf der OpenRheinRuhr, ganz begeistert
(EN: “[…]uhr.gz is a reason to install mksh on every system.”)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first
  2024-02-07 17:30   ` Rich Felker
  2024-02-07 19:10     ` i262jq
@ 2024-02-07 20:59     ` Markus Mayer
  1 sibling, 0 replies; 13+ messages in thread
From: Markus Mayer @ 2024-02-07 20:59 UTC (permalink / raw)
  To: Musl Mailing List; +Cc: Rich Felker

On Wed, 7 Feb 2024 at 09:30, Rich Felker <dalias@libc.org> wrote:
>
> On Tue, Feb 06, 2024 at 05:22:43PM -0800, Markus Mayer wrote:
> > When LD_LIBRARY_PATH is being used, the shared library loader may end up
> > finding a shared library with the correct name but from a wrong
> > architecture. This primarily happens when a 32-bit / 64-bit mismatch
> > occurs.
> >
> > Rather than giving up immediately and aborting, the shared library
> > loader should continue to look in all known locations for a matching
> > library that may work.
>
> I don't think the concept is objectionable, but the implementation
> looks wrong in at least two major ways:

I had a feeling that my approach might be too simplistic. At least it
served as a proof-of-concept.

> 1. It does not seem to continue the path search where it left off, but
>    just skips the rest of the LD_LIBRARY_PATH search on retry. Yes
>    this solves your problem of having a wrong-arch path element
>    present, but it breaks search of any correct-arch path elements
>    also present later there. This might not be a big deal, but it
>    strongly violates principle of least surprise, I think.

You are correct. The implementation does not deal with the remainder
of LD_LIBRARY_PATH and just searches system directories after it finds
an issue with a library in LD_LIBRARY_PATH. I can see how that could
be problematic in other scenarios.

> The bigger one is:
>
> 2. No consideration is made for the *cause of failure* when retrying.
>    Any in particular, lots of potential errors that were supposed to
>    be reportable errors now turn into vectors for loading the wrong
>    library, potentially under control of malicious inputs or
>    environmental factors (kernel resource exhaustion, etc.).

Okay, That makes a lot of sense. We definitely wouldn't want to
introduce potential instabilities or even create attack vectors.

> If "wrong library arch" is going to be a continue-path-search
> operation, it needs to be distinguishable (as "we successfully read
> the Ehdrs and determined the library is not suitable for this arch")
> from inconclusive errors (like out-of-memory, too many mmaps, etc.)

I did some more digging and found where and how it is failing in my
32-bit / 64-bit scenario. I am getting this output when I sprinkle
some debug messages throughout the code.

map_library, 647: fd=3, phsize=324108288, l=960
map_library, 652: fd=3, phsize=324108288, l=0
map_library, 816: errno=8

Of course the line numbers are off between the output and the actual sources.

 640         ssize_t l = read(fd, buf, sizeof buf);
 641         eh = buf;
 642         if (l<0) return 0;
 643         if (l<sizeof *eh || (eh->e_type != ET_DYN && eh->e_type
!= ET_EXEC))
 644                 goto noexec;
 645         phsize = eh->e_phentsize * eh->e_phnum;
> "Line 647" output
 646         if (phsize > sizeof buf - sizeof *eh) {
 647                 allocated_buf = malloc(phsize);
 648                 if (!allocated_buf) return 0;
 649                 l = pread(fd, allocated_buf, phsize, eh->e_phoff);
> "Line 652" output
 650                 if (l < 0) goto error;
 651                 if (l != phsize) goto noexec;
 652                 ph = ph0 = allocated_buf;
 653         } else if (eh->e_phoff + phsize > l) {
 ...
 803 error:
> "Line 816" output
 804         if (map!=MAP_FAILED) unmap_library(dso);
 805         free(allocated_buf);
 806         return 0;
 807 }

So it is finding out only rather late in the process that mapping the
library didn't work. By this point, it isn't easy to go back to
searching. Conversely, it knows right away when open() fails, so it
can easily keep looking.

> This probably means some minor refactoring is called for, replacing
> the open calls in path_open and the direct use of open for absolute
> paths by a new function like lib_open or something that would also
> place the start of the file into a caller-provided buffer and validate
> that it's usable before returning success, moving that logic out of
> map_library and letting map_library assume it already has the initial
> buffer contents available on entry.

I'll mull this one over a bit. I guess we'd want a way to treat "exec
format error" in the same way as open() failing, i.e. as "we didn't
find anything useful, keep on searching". So, we need to know sooner
whether or not mapping will work or not. If I am understanding your
proposal correctly, that is exactly what you are getting at, as well.

> If you think there are less invasive ways to do this, I'd be happy to
> hear other ideas too.

We'll see. :-)

Regards,
-Markus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first
  2024-02-07 19:51       ` Thorsten Glaser
@ 2024-02-08 15:52         ` i262jq
  0 siblings, 0 replies; 13+ messages in thread
From: i262jq @ 2024-02-08 15:52 UTC (permalink / raw)
  To: musl

On Wed, Feb 07, 2024 at 07:51:54PM +0000, Thorsten Glaser wrote:
> i262jq@0w.se dixit:
> >To the contrary, relying on the explicit loader in a wrapper is totally
> >safe, because the path to the libraries is not inherited at any later
> >exec.
> 
> What if the first binary you run is not the one that needs to get the
> changed library path, but the binaries it runs?

Only starters/wrappers shall be in the PATH or otherwise run. Not any
application executables directly.

Every wrapper sets the right library path for its binary.

> >Are there any practical cases where the overhead of an extra execve()
> >of a small wrapper would be a noticeable problem?
> 
> It may be very hard to get these into the right places.

Why hard? You do not package software without knowing what are its
executables. To replace them with wrappers is a trivial task, or rather
to put the wrappers separately, in another file tree or a directory to
be used via PATH or directly.

If the software is to be used from locally varying paths, you may choose
to generate wrappers at "installation"/"file-tree-relocation" stage
or otherwise just let your pre-made wrappers examine a _package_specific_
environment variable - resulting in all the semantics of LD_LIBRARY_PATH,
but in a totally safe fashion.

In other words, I do not see this as a task for the loader (who generally
does *not* have all the relevant information, only the limited data
from elf), but as a task for the packaging, with an available solution
of low cost and complete reliability.

Regards,
/i262jq

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found
  2024-02-07 18:09 ` Colin Cross
@ 2024-04-02 20:59   ` Markus Mayer
  0 siblings, 0 replies; 13+ messages in thread
From: Markus Mayer @ 2024-04-02 20:59 UTC (permalink / raw)
  To: Musl Mailing List; +Cc: Colin Cross

On Wed, 7 Feb 2024 at 10:09, Colin Cross <ccross@google.com> wrote:
>
> On Tue, Feb 6, 2024 at 5:23 PM Markus Mayer <mmayer@broadcom.com> wrote:
[...]
> > The problem:
> >
> > If LD_LIBRARY_PATH is set to a directory containing 32-bit libraries and
> > then a 64-bit binary is invoked, the shared library loader will pick up
> > the 32-bit version of a library first, because it'll look at
> > LD_LIBRARY_PATH before anything else. Mapping the 32-bit library into
> > the 64-bit process will fail. This much is expected.
> >
> > However, even though the correct library resides on the system, the
> > shared library loader never attempts to look for it. The 64-bit process
> > will fail to launch, even though there is no reason for the failure. The
> > problem only exists, because the shared library launcher doesn't look in
> > the remaining shared library directories.
> >
[...]
> > Continuing to search the system directories does seem to be the right
> > thing to do under the circumstances described here. Also, it is what
> > glibc does.
[...]
>
> There is a previous discussion of the same issue at
> https://www.openwall.com/lists/musl/2023/02/07/3.

Thanks, Colin, for pointing this out. This was very helpful.

We have been able to determine that Colin's patch series from February
2023 does address our issue. If I apply Colin's patches, the shared
library loader will continue searching for a shared library match
after unsuccessfully trying to load a 32-bit library into a 64-bit
process. This behaviour is just what one would expect. Consequently,
launching a 64-bit binary no longer fails just because the shared
library loader happens to find the 32-bit version of a library it
needs first.

In light of this, does it make sense to revive Colin's series from
last year and reconsider it for inclusion? If there is anything I can
do to facilitate the process, please let me know. I can certainly help
with testing.

If Colin's series is somehow not acceptable, please let me know as
well, and I will see what I can come up with to resolve the issue.

Thanks,
-Markus

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-04-02 21:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-07  1:22 [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found Markus Mayer
2024-02-07  1:22 ` [musl] [PATCH 1/1] ldso: continue searching if wrong architecture is found first Markus Mayer
2024-02-07 17:30   ` Rich Felker
2024-02-07 19:10     ` i262jq
2024-02-07 19:51       ` Thorsten Glaser
2024-02-08 15:52         ` i262jq
2024-02-07 20:59     ` Markus Mayer
2024-02-07  8:26 ` [musl] [PATCH 0/1] ldso: continue searching if wrong architecture is found i262jq
2024-02-07 16:45   ` enh
2024-02-07 17:07     ` Thorsten Glaser
2024-02-07 17:54       ` enh
2024-02-07 18:09 ` Colin Cross
2024-04-02 20:59   ` Markus Mayer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).