Hello,

Like many (well, a few) people, I was surprised by the inability to
dlopen() from a static binary
(https://www.openwall.com/lists/musl/2012/12/08/4 , etc.). I started to
hack into musl's dynamic linker, just to find it a bit ... tangled.
That of course was nothing compared to taking a standalone ELF loader
and trying to deal with glibc's dynamic loader, which was a total mess
(just look at https://github.com/robgjansen/elf-loader, which tried to
do that; tried, because it doesn't work with recent glibc versions
and needs constant patching).

Oh, I forgot to say that I'm not looking for a way to load a
particular musl-dynlinked shared library into a musl-staticlinked
binary. So, arguments like "but you'll need to carry around musl's
libc.so" don't apply. What I'm looking for is a way to have a static
closed-world application, but let it, at the user's request, interface
with whatever system may be outside.

So, seeing what a mess it is to do "honest" dynamic loading in the real
world, and given my use case, which is about wanting to touch that mess
as little as possible with bare hands, I came to a cute blackbox'ish
solution to the issue. The rest of the story and proof of concept code
is at https://github.com/pfalcon/foreign-dlopen .

(Sorry for the somewhat tangled message. I made that proof of concept a
month ago and it was just sitting in my github account, so I'm posting
mostly for search engines' use, to help people who may run into similar
needs, whenever that may happen.)

-- 
Best regards,
 Paul                          mailto:pmiscml@gmail.com

On Thu, Apr 23, 2020 at 02:25:31AM +0300, Paul Sokolovsky wrote:
> Hello,
>
> Just as many (well, few) people I was surprised by the inability to
> dlopen() from a static binary
> (https://www.openwall.com/lists/musl/2012/12/08/4 , etc.). I started to
> hack into musl's dynamic linker, just to find it a bit ... tangled.
> That of course was nothing compared to taking a standalone ELF loader
> and trying to deal with glibc's dynamic loader, that was total mess
> (just look at https://github.com/robgjansen/elf-loader, which tried to
> do that; tried, because it doesn't work with recent glibc versions,
> and need constant patching).
>
> Oh, forgot to say that I'm not looking for a way to load a
> particular musl-dynlinked shared library into musl-staticlinked binary.
> So, arguments like "but you'll need to carry around musl's libc.so"
> don't apply. What I'm looking for is a way to have a static closed-world
> application, but let it, at the user's request, to interface with
> whatever system may be outside.
>
> So, seeing what a mess is doing "honest" dynamic loading for real
> world, and given my usecase, which is about wanting to touch that mess
> as little as possible with bare hands, I came to a cute blackbox'ish
> solution to an issue. The rest of the story and proof of concept code
> is at https://github.com/pfalcon/foreign-dlopen .
In your example it looks like you're foreign_dlopen'ing glibc. That
simply *can't* work, because part of the interface contract of all
glibc functions is that they're called with the thread pointer
register (%gs or %fs on i386 or x86_64 respectively) pointing to a
glibc TCB, which will not be the case when they're invoked from a
musl-linked (or other non-glibc-linked) program.
If you relax to the case where you're not doing that, and instead only
opening *pure library* code which has no tie-in to global state or TLS
contracts, then it should be able to work.
Rich
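
To make the thread-pointer contract above concrete, here is a minimal
sketch (not taken from foreign-dlopen or from either libc) that reads the
thread pointer using GCC/Clang inline assembly. The function name
read_thread_pointer is made up for illustration, and only x86_64, i386
and aarch64 are covered. On x86 it relies on the TLS ABI convention that
the word at %fs:0 (or %gs:0) holds the thread pointer itself.

```c
#include <stdint.h>

/* Hypothetical helper: returns the thread pointer installed by whatever
 * libc started the current thread, or 0 on architectures this sketch
 * does not cover. */
uintptr_t read_thread_pointer(void)
{
    uintptr_t tp = 0;
#if defined(__x86_64__)
    /* x86_64: the word at %fs:0 is the TCB's pointer to itself. */
    __asm__ volatile ("mov %%fs:0, %0" : "=r"(tp));
#elif defined(__i386__)
    /* i386: same convention, but through %gs. */
    __asm__ volatile ("mov %%gs:0, %0" : "=r"(tp));
#elif defined(__aarch64__)
    /* aarch64: the thread pointer lives in the TPIDR_EL0 register. */
    __asm__ volatile ("mrs %0, tpidr_el0" : "=r"(tp));
#endif
    return tp;
}
```

The value is only an address. What lives at that address is decided by
the libc that set it up, so code from a different libc that reads fields
at fixed offsets from it would be looking at a structure it does not
understand.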

Hello,

On Wed, 22 Apr 2020 22:39:41 -0400
Rich Felker <dalias@libc.org> wrote:

[]

> > Oh, forgot to say that I'm not looking for a way to load a
> > particular musl-dynlinked shared library into musl-staticlinked
> > binary. So, arguments like "but you'll need to carry around musl's
> > libc.so" don't apply. What I'm looking for is a way to have a
> > static closed-world application, but let it, at the user's request,
> > to interface with whatever system may be outside.

[]

> > of concept code is at https://github.com/pfalcon/foreign-dlopen .
>
> In your example it looks like you're foreign_dlopen'ing glibc. That
> simply *can't* work, because part of the interface contract of all
> glibc functions is that they're called with the thread pointer
> register (%gs or %fs on i386 or x86_64 respectively) pointing to a
> glibc TCB, which will not be the case when they're invoked from a
> musl-linked (or other non-glibc-linked) program.

Thanks for the response and for the word of warning. As I mentioned,
this is essentially a proof of concept, and so far was tested only by
calling glibc's printf() from a host app which was either linked with
glibc itself or -nostdlib and static. And that was already more than
with any other ELF loader which I tried (which worked for simple
functions like write(), but crashed in anything more complex like
printf()).

But it certainly doesn't touch a case you describe, when "foreign" vs
local libc expect different values of %gs/%fs (so apparently, "foreign
function call" facility would need to swap them around a call).

>
> If you relax to the case where you're not doing that, and instead only
> opening *pure library* code which has no tie-in to global state or TLS
> contracts, then it should be able to work.
>
> Rich

-- 
Best regards,
 Paul                          mailto:pmiscml@gmail.com

* Paul Sokolovsky <pmiscml@gmail.com> [2020-04-23 12:16:26 +0300]:
> Hello,
>
> On Wed, 22 Apr 2020 22:39:41 -0400
> Rich Felker <dalias@libc.org> wrote:
>
> []
>
> > > Oh, forgot to say that I'm not looking for a way to load a
> > > particular musl-dynlinked shared library into musl-staticlinked
> > > binary. So, arguments like "but you'll need to carry around musl's
> > > libc.so" don't apply. What I'm looking for is a way to have a
> > > static closed-world application, but let it, at the user's request,
> > > to interface with whatever system may be outside.
>
> []
>
> > > of concept code is at https://github.com/pfalcon/foreign-dlopen .
> >
> > In your example it looks like you're foreign_dlopen'ing glibc. That
> > simply *can't* work, because part of the interface contract of all
> > glibc functions is that they're called with the thread pointer
> > register (%gs or %fs on i386 or x86_64 respectively) pointing to a
> > glibc TCB, which will not be the case when they're invoked from a
> > musl-linked (or other non-glibc-linked) program.
>
> Thanks for the response and for the word of warning. As I mentioned,
> this is essentially a proof of concept, and so far was tested only by
> calling glibc's printf() from a host app which was either linked with
> glibc itself or -nostdlib and static. And that was already more than
> with any other ELF loader which I tried (which worked for simple
> functions like write(), but crashed in anything more complex like
> printf()).
>
> But it certainly doesn't touch a case you describe, when "foreign" vs
> local libc expect different values of %gs/%fs (so apparently, "foreign
> function call" facility would need to swap them around a call).

yes, libc functions should be called on libc owned
threads and your code can only run on the same thread if
you follow the same abi (which is more than just the
call convention), swapping the thread pointer means that
the foreign libc has to create the thread on which you
invoke the foreign function (or it has to be the main
thread) since the data structures at tp are set up at
thread creation (or early libc init for the main thread).

what's worse is that some process global state also
has to be under the control of libc (e.g. libc internal
signal handlers or global state controlled via prctl or
libc may want fd 0,1,2 in a particular state) so cross
calling a different libc involves system calls (e.g. the
go runtime gets this wrong for obvious reasons: calling
c from go would be really slow, this is why you normally
try to avoid using your own libc independent runtime.
go gets away with this because libc internal signals are
rarely relevant and most process state is per thread on
linux so if you let the foreign libc to create the os
threads and take over the signal handlers and signal
masks then things work)

> > If you relax to the case where you're not doing that, and instead only
> > opening *pure library* code which has no tie-in to global state or TLS
> > contracts, then it should be able to work.

it's not documented what api is implemented as pure
library code and in principle libc code may call
other libc code via plt and then lazy binding can
happen which is not pure. (glibc tries to avoid this
of course, but it does have some runtime loaded
components e.g. for locale specific char conversions
so things that may seem pure from the outside can end
up unpure).

On Thu, Apr 23, 2020 at 02:22:34PM +0200, Szabolcs Nagy wrote:
> * Paul Sokolovsky <pmiscml@gmail.com> [2020-04-23 12:16:26 +0300]:
> > Hello,
> >
> > On Wed, 22 Apr 2020 22:39:41 -0400
> > Rich Felker <dalias@libc.org> wrote:
> >
> > []
> >
> > > > Oh, forgot to say that I'm not looking for a way to load a
> > > > particular musl-dynlinked shared library into musl-staticlinked
> > > > binary. So, arguments like "but you'll need to carry around musl's
> > > > libc.so" don't apply. What I'm looking for is a way to have a
> > > > static closed-world application, but let it, at the user's request,
> > > > to interface with whatever system may be outside.
> >
> > []
> >
> > > > of concept code is at https://github.com/pfalcon/foreign-dlopen .
> > >
> > > In your example it looks like you're foreign_dlopen'ing glibc. That
> > > simply *can't* work, because part of the interface contract of all
> > > glibc functions is that they're called with the thread pointer
> > > register (%gs or %fs on i386 or x86_64 respectively) pointing to a
> > > glibc TCB, which will not be the case when they're invoked from a
> > > musl-linked (or other non-glibc-linked) program.
> >
> > Thanks for the response and for the word of warning. As I mentioned,
> > this is essentially a proof of concept, and so far was tested only by
> > calling glibc's printf() from a host app which was either linked with
> > glibc itself or -nostdlib and static. And that was already more than
> > with any other ELF loader which I tried (which worked for simple
> > functions like write(), but crashed in anything more complex like
> > printf()).
> >
> > But it certainly doesn't touch a case you describe, when "foreign" vs
> > local libc expect different values of %gs/%fs (so apparently, "foreign
> > function call" facility would need to swap them around a call).
>
> yes, libc functions should be called on libc owned
> threads and your code can only run on the same thread if
> you follow the same abi (which is more than just the
> call convention), swapping the thread pointer means that
> the foreign libc has to create the thread on which you
> invoke the foreign function (or it has to be the main
> thread) since the data structures at tp are set up at
> thread creation (or early libc init for the main thread).
>
> what's worse is that some process global state also
> has to be under the control of libc (e.g. libc internal
> signal handlers or global state controlled via prctl or
> libc may want fd 0,1,2 in a particular state) so cross
> calling a different libc involves system calls (e.g. the
> go runtime gets this wrong for obvious reasons: calling
> c from go would be really slow, this is why you normally
> try to avoid using your own libc independent runtime.
> go gets away with this because libc internal signals are
> rarely relevant and most process state is per thread on
> linux so if you let the foreign libc to create the os
> threads and take over the signal handlers and signal
> masks then things work)

Yes, I don't think the "swap the thread pointer" approach works. And
even if not for the other global state you pointed out, swapping the
thread pointer is not safe if any signal handler may run, including
even implementation-internal signals which you can't block. Moreover
libc could even implement its own signal layer where underlying kernel
signals aren't blocked just because they're blocked from the
application's perspective. Any attempt to run a foreign libc in the
same process is inherently going to be poking at implementation
internals that are not stable interfaces you can make use of.

> > > If you relax to the case where you're not doing that, and instead only
> > > opening *pure library* code which has no tie-in to global state or TLS
> > > contracts, then it should be able to work.
>
> it's not documented what api is implemented as pure
> library code and in principle libc code may call
> other libc code via plt and then lazy binding can
> happen which is not pure. (glibc tries to avoid this
> of course, but it does have some runtime loaded
> components e.g. for locale specific char conversions
> so things that may seem pure from the outside can end
> up unpure).

I'm referring to pure library code that doesn't even link libc, much
less that's part of libc. A .so file with no DT_NEEDED at all (linked
with -nostdlib).

Rich
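
For illustration, a minimal sketch of what such a pure library could
look like. The file name pure.c and the function pure_fnv1a are made up
for this example and are not part of foreign-dlopen. The code includes
only freestanding headers, calls nothing outside itself, and touches no
global or thread-local state. The build and check commands in the
comment assume GCC or Clang with GNU binutils on Linux.

```c
/* pure.c: hypothetical example of library code with no libc tie-in.
 *
 * Build as a shared object without linking any libc:
 *   cc -shared -fPIC -nostdlib -o libpure.so pure.c
 * Check that the dynamic section lists no NEEDED entries:
 *   readelf -d libpure.so
 */
#include <stddef.h>   /* size_t: header-only, no link-time dependency */
#include <stdint.h>   /* uint32_t: likewise */

/* 32-bit FNV-1a hash of a byte buffer. Pure in the sense used above:
 * the result depends only on the arguments. */
uint32_t pure_fnv1a(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV-1a 32-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;                /* FNV-1a 32-bit prime */
    }
    return h;
}
```

One caveat with this approach: even with -nostdlib, the compiler may
emit calls to functions such as memcpy or memset for some code. Those
would show up as undefined symbols rather than as DT_NEEDED entries, so
it is worth also looking at the output of
`nm -D --undefined-only libpure.so`.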