mailing list of musl libc
* [musl] foreign-dlopen: dlopen() from static binary, again (and not the way you think!)
From: Paul Sokolovsky @ 2020-04-22 23:25 UTC
  To: musl

Hello,

Like many (well, a few) people, I was surprised by the inability to
dlopen() from a static binary
(https://www.openwall.com/lists/musl/2012/12/08/4 , etc.). I started to
hack on musl's dynamic linker, only to find it a bit ... tangled.
That of course was nothing compared to taking a standalone ELF loader
and trying to deal with glibc's dynamic loader, which was a total mess
(just look at https://github.com/robgjansen/elf-loader, which tried to
do that; "tried", because it doesn't work with recent glibc versions
and needs constant patching).

Oh, I forgot to say that I'm not looking for a way to load a
particular musl-dynlinked shared library into a musl-staticlinked binary.
So, arguments like "but you'll need to carry around musl's libc.so"
don't apply. What I'm looking for is a way to have a static closed-world
application, but let it, at the user's request, interface with
whatever system may be outside.

So, seeing what a mess "honest" dynamic loading is in the real
world, and given my use case, which is about touching that mess
as little as possible with bare hands, I came up with a cute blackbox-ish
solution to the issue. The rest of the story and the proof-of-concept
code are at https://github.com/pfalcon/foreign-dlopen .


(Sorry for the somewhat tangled message. I made that proof of concept a
month ago and it was just sitting in my GitHub account, so I'm posting
mostly for search engines' use, to help people who may come across
similar needs, whenever that may happen.)

-- 
Best regards,
 Paul                          mailto:pmiscml@gmail.com


* Re: [musl] foreign-dlopen: dlopen() from static binary, again (and not the way you think!)
From: Rich Felker @ 2020-04-23  2:39 UTC
  To: musl

On Thu, Apr 23, 2020 at 02:25:31AM +0300, Paul Sokolovsky wrote:
> Hello,
> 
> Like many (well, a few) people, I was surprised by the inability to
> dlopen() from a static binary
> (https://www.openwall.com/lists/musl/2012/12/08/4 , etc.). I started to
> hack on musl's dynamic linker, only to find it a bit ... tangled.
> That of course was nothing compared to taking a standalone ELF loader
> and trying to deal with glibc's dynamic loader, which was a total mess
> (just look at https://github.com/robgjansen/elf-loader, which tried to
> do that; "tried", because it doesn't work with recent glibc versions
> and needs constant patching).
> 
> Oh, I forgot to say that I'm not looking for a way to load a
> particular musl-dynlinked shared library into a musl-staticlinked binary.
> So, arguments like "but you'll need to carry around musl's libc.so"
> don't apply. What I'm looking for is a way to have a static closed-world
> application, but let it, at the user's request, interface with
> whatever system may be outside.
> 
> So, seeing what a mess "honest" dynamic loading is in the real
> world, and given my use case, which is about touching that mess
> as little as possible with bare hands, I came up with a cute blackbox-ish
> solution to the issue. The rest of the story and the proof-of-concept
> code are at https://github.com/pfalcon/foreign-dlopen .

In your example it looks like you're foreign_dlopen'ing glibc. That
simply *can't* work, because part of the interface contract of all
glibc functions is that they're called with the thread pointer
register (%gs or %fs on i386 or x86_64 respectively) pointing to a
glibc TCB, which will not be the case when they're invoked from a
musl-linked (or other non-glibc-linked) program.
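
As an illustration of that contract (a minimal sketch, not code from
glibc, musl, or foreign-dlopen): on i386 and x86_64 Linux the first word
of the thread control block holds the thread pointer itself, so a
program can read it back through the segment register. The helper names
below are made up for this example. In a process where no libc has set
up a TCB (for example a bare -nostdlib static binary), the same load
would typically fault or return garbage, which is roughly the situation
a glibc function is in when it is entered from a foreign caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the calling thread's thread pointer on
 * i386/x86_64 by loading the self-pointer stored at offset 0 of the TCB.
 * Returns NULL on architectures this sketch does not cover. */
static void *read_thread_pointer(void)
{
#if defined(__x86_64__)
    void *tp;
    __asm__ volatile ("mov %%fs:0, %0" : "=r"(tp));
    return tp;
#elif defined(__i386__)
    void *tp;
    __asm__ volatile ("mov %%gs:0, %0" : "=r"(tp));
    return tp;
#else
    return NULL;
#endif
}

/* Hypothetical helper: 1 if the first word of the TCB points back at
 * the TCB, 0 if it does not, -1 if the architecture is not covered. */
static int tcb_self_pointer_ok(void)
{
    void *tp = read_thread_pointer();
    if (tp == NULL)
        return -1;
    return *(void **)tp == tp;
}

/* Abort if the TCB does not look like one a libc has initialised.
 * An uncovered architecture (-1) is let through. */
static void check_thread_pointer(void)
{
    assert(tcb_self_pointer_ok() != 0);
}
```

Calling check_thread_pointer() from a normally linked program should
pass silently; everything else the libc keeps behind that pointer
(errno, stack guard, locale, and so on) is private to that libc.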

If you relax to the case where you're not doing that, and instead only
opening *pure library* code which has no tie-in to global state or TLS
contracts, then it should be able to work.

Rich


* Re: [musl] foreign-dlopen: dlopen() from static binary, again (and not the way you think!)
From: Paul Sokolovsky @ 2020-04-23  9:16 UTC
  To: Rich Felker; +Cc: musl

Hello,

On Wed, 22 Apr 2020 22:39:41 -0400
Rich Felker <dalias@libc.org> wrote:

[]

> > Oh, I forgot to say that I'm not looking for a way to load a
> > particular musl-dynlinked shared library into a musl-staticlinked
> > binary. So, arguments like "but you'll need to carry around musl's
> > libc.so" don't apply. What I'm looking for is a way to have a
> > static closed-world application, but let it, at the user's request,
> > interface with whatever system may be outside.
[]
> > of concept code is at https://github.com/pfalcon/foreign-dlopen .
> 
> In your example it looks like you're foreign_dlopen'ing glibc. That
> simply *can't* work, because part of the interface contract of all
> glibc functions is that they're called with the thread pointer
> register (%gs or %fs on i386 or x86_64 respectively) pointing to a
> glibc TCB, which will not be the case when they're invoked from a
> musl-linked (or other non-glibc-linked) program.

Thanks for the response and for the word of warning. As I mentioned,
this is essentially a proof of concept, and so far it has been tested
only by calling glibc's printf() from a host app which was either linked
with glibc itself or built with -nostdlib and static. And that was
already more than I got with any other ELF loader I tried (those worked
for simple functions like write(), but crashed in anything more complex
like printf()).

But it certainly doesn't cover the case you describe, where the "foreign"
and local libc expect different values of %gs/%fs (so apparently, a
"foreign function call" facility would need to swap them around a call).

> 
> If you relax to the case where you're not doing that, and instead only
> opening *pure library* code which has no tie-in to global state or TLS
> contracts, then it should be able to work.
> 
> Rich



-- 
Best regards,
 Paul                          mailto:pmiscml@gmail.com


* Re: [musl] foreign-dlopen: dlopen() from static binary, again (and not the way you think!)
From: Szabolcs Nagy @ 2020-04-23 12:22 UTC
  To: Paul Sokolovsky; +Cc: Rich Felker, musl

* Paul Sokolovsky <pmiscml@gmail.com> [2020-04-23 12:16:26 +0300]:
> Hello,
> 
> On Wed, 22 Apr 2020 22:39:41 -0400
> Rich Felker <dalias@libc.org> wrote:
> 
> []
> 
> > > Oh, I forgot to say that I'm not looking for a way to load a
> > > particular musl-dynlinked shared library into a musl-staticlinked
> > > binary. So, arguments like "but you'll need to carry around musl's
> > > libc.so" don't apply. What I'm looking for is a way to have a
> > > static closed-world application, but let it, at the user's request,
> > > interface with whatever system may be outside.
> []
> > > of concept code is at https://github.com/pfalcon/foreign-dlopen .
> > 
> > In your example it looks like you're foreign_dlopen'ing glibc. That
> > simply *can't* work, because part of the interface contract of all
> > glibc functions is that they're called with the thread pointer
> > register (%gs or %fs on i386 or x86_64 respectively) pointing to a
> > glibc TCB, which will not be the case when they're invoked from a
> > musl-linked (or other non-glibc-linked) program.
> 
> Thanks for the response and for the word of warning. As I mentioned,
> this is essentially a proof of concept, and so far it has been tested
> only by calling glibc's printf() from a host app which was either linked
> with glibc itself or built with -nostdlib and static. And that was
> already more than I got with any other ELF loader I tried (those worked
> for simple functions like write(), but crashed in anything more complex
> like printf()).
>
> But it certainly doesn't cover the case you describe, where the "foreign"
> and local libc expect different values of %gs/%fs (so apparently, a
> "foreign function call" facility would need to swap them around a call).

yes, libc functions should be called on libc owned
threads, and your code can only run on the same thread if
you follow the same abi (which is more than just the
call convention). swapping the thread pointer means that
the foreign libc has to create the thread on which you
invoke the foreign function (or it has to be the main
thread), since the data structures at tp are set up at
thread creation (or early libc init for the main thread).

what's worse is that some process global state also
has to be under the control of libc (e.g. libc internal
signal handlers, global state controlled via prctl, or
libc may want fd 0,1,2 in a particular state), so cross
calling a different libc correctly involves system calls.
(e.g. the go runtime gets this wrong for obvious reasons:
doing it right would make calling c from go really slow.
this is why you normally try to avoid using your own
libc independent runtime. go gets away with this because
libc internal signals are rarely relevant and most
process state is per thread on linux, so if you let the
foreign libc create the os threads and take over the
signal handlers and signal masks then things work.)

> > If you relax to the case where you're not doing that, and instead only
> > opening *pure library* code which has no tie-in to global state or TLS
> > contracts, then it should be able to work.

it's not documented which apis are implemented as pure
library code, and in principle libc code may call
other libc code via plt, and then lazy binding can
happen, which is not pure. (glibc tries to avoid this
of course, but it does have some runtime loaded
components, e.g. for locale specific char conversions,
so things that may seem pure from the outside can end
up impure.)


* Re: [musl] foreign-dlopen: dlopen() from static binary, again (and not the way you think!)
From: Rich Felker @ 2020-04-23 16:24 UTC
  To: Paul Sokolovsky, musl

On Thu, Apr 23, 2020 at 02:22:34PM +0200, Szabolcs Nagy wrote:
> * Paul Sokolovsky <pmiscml@gmail.com> [2020-04-23 12:16:26 +0300]:
> > Hello,
> > 
> > On Wed, 22 Apr 2020 22:39:41 -0400
> > Rich Felker <dalias@libc.org> wrote:
> > 
> > []
> > 
> > > > Oh, I forgot to say that I'm not looking for a way to load a
> > > > particular musl-dynlinked shared library into a musl-staticlinked
> > > > binary. So, arguments like "but you'll need to carry around musl's
> > > > libc.so" don't apply. What I'm looking for is a way to have a
> > > > static closed-world application, but let it, at the user's request,
> > > > interface with whatever system may be outside.
> > []
> > > > of concept code is at https://github.com/pfalcon/foreign-dlopen .
> > > 
> > > In your example it looks like you're foreign_dlopen'ing glibc. That
> > > simply *can't* work, because part of the interface contract of all
> > > glibc functions is that they're called with the thread pointer
> > > register (%gs or %fs on i386 or x86_64 respectively) pointing to a
> > > glibc TCB, which will not be the case when they're invoked from a
> > > musl-linked (or other non-glibc-linked) program.
> > 
> > Thanks for the response and for the word of warning. As I mentioned,
> > this is essentially a proof of concept, and so far it has been tested
> > only by calling glibc's printf() from a host app which was either linked
> > with glibc itself or built with -nostdlib and static. And that was
> > already more than I got with any other ELF loader I tried (those worked
> > for simple functions like write(), but crashed in anything more complex
> > like printf()).
> >
> > But it certainly doesn't cover the case you describe, where the "foreign"
> > and local libc expect different values of %gs/%fs (so apparently, a
> > "foreign function call" facility would need to swap them around a call).
> 
> yes, libc functions should be called on libc owned
> threads, and your code can only run on the same thread if
> you follow the same abi (which is more than just the
> call convention). swapping the thread pointer means that
> the foreign libc has to create the thread on which you
> invoke the foreign function (or it has to be the main
> thread), since the data structures at tp are set up at
> thread creation (or early libc init for the main thread).
>
> what's worse is that some process global state also
> has to be under the control of libc (e.g. libc internal
> signal handlers, global state controlled via prctl, or
> libc may want fd 0,1,2 in a particular state), so cross
> calling a different libc correctly involves system calls.
> (e.g. the go runtime gets this wrong for obvious reasons:
> doing it right would make calling c from go really slow.
> this is why you normally try to avoid using your own
> libc independent runtime. go gets away with this because
> libc internal signals are rarely relevant and most
> process state is per thread on linux, so if you let the
> foreign libc create the os threads and take over the
> signal handlers and signal masks then things work.)

Yes, I don't think the "swap the thread pointer" approach works. Even
setting aside the other global state you pointed out, swapping the
thread pointer is not safe if any signal handler may run, including
even implementation-internal signals which you can't block. Moreover,
libc could even implement its own signal layer where underlying
kernel signals aren't blocked just because they're blocked from the
application's perspective. Any attempt to run a foreign libc in the
same process is inherently going to be poking at implementation
internals that are not stable interfaces you can make use of.
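
For concreteness, the "swap the thread pointer" idea being ruled out
here might look like the following on x86_64 Linux. This is a hedged
sketch under stated assumptions: get_thread_pointer(),
call_with_thread_pointer(), and add_one() are names invented for the
example, and only the arch_prctl(2) operations ARCH_GET_FS and
ARCH_SET_FS are real kernel interfaces. The marked window is where a
signal handler belonging to the host libc would run against the wrong
TCB.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>
#if defined(__x86_64__)
#include <asm/prctl.h>   /* ARCH_GET_FS, ARCH_SET_FS */
#endif

typedef long (*foreign_fn)(long);

/* Trivial stand-in for a "foreign" function; it touches no TLS. */
static long add_one(long x) { return x + 1; }

/* Store the current FS base (the x86_64 thread pointer) in *tp.
 * Returns 0 on success, -1 on failure or on other architectures. */
static int get_thread_pointer(unsigned long *tp)
{
#if defined(__x86_64__)
    return syscall(SYS_arch_prctl, ARCH_GET_FS, tp) == 0 ? 0 : -1;
#else
    (void)tp;
    return -1;
#endif
}

/* Call fn(arg) with the FS base temporarily set to foreign_tp and
 * store the result in *out. Returns 0 on success, -1 otherwise. */
static int call_with_thread_pointer(unsigned long foreign_tp,
                                    foreign_fn fn, long arg, long *out)
{
    assert(fn != NULL && out != NULL);
#if defined(__x86_64__)
    unsigned long saved_tp;
    if (get_thread_pointer(&saved_tp) != 0)
        return -1;
    if (syscall(SYS_arch_prctl, ARCH_SET_FS, foreign_tp) != 0)
        return -1;
    /* Unsafe window: any signal delivered here runs the host's
     * handlers while FS points at the foreign TCB. */
    *out = fn(arg);
    if (syscall(SYS_arch_prctl, ARCH_SET_FS, saved_tp) != 0)
        return -1;
    return 0;
#else
    (void)foreign_tp; (void)arg;
    return -1;
#endif
}
```

The only safe way to exercise this sketch is to pass the thread's own
pointer back in, e.g. call_with_thread_pointer(tp, add_one, 41, &out)
with tp obtained from get_thread_pointer(); a genuinely foreign TCB
would also have to have been created by the foreign libc in the first
place, as noted above.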

> > > If you relax to the case where you're not doing that, and instead only
> > > opening *pure library* code which has no tie-in to global state or TLS
> > > contracts, then it should be able to work.
> 
> it's not documented which apis are implemented as pure
> library code, and in principle libc code may call
> other libc code via plt, and then lazy binding can
> happen, which is not pure. (glibc tries to avoid this
> of course, but it does have some runtime loaded
> components, e.g. for locale specific char conversions,
> so things that may seem pure from the outside can end
> up impure.)

I'm referring to pure library code that doesn't even link against libc,
much less code that is part of libc: a .so file with no DT_NEEDED at
all (linked with -nostdlib).
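
One way to check that property before loading a library is to count
the DT_NEEDED entries in its dynamic section. The following is a
minimal sketch, assuming a native-endian 64-bit ELF image already read
into memory; count_dt_needed() is a name invented for this example and
is not part of musl or foreign-dlopen. Only the types and constants
from <elf.h> are real.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count DT_NEEDED entries in an ELF64 image held in memory.
 * Returns -1 if the image is not a usable ELF64 file, otherwise the
 * number of DT_NEEDED entries (0 also when there is no PT_DYNAMIC). */
static int count_dt_needed(const unsigned char *img, size_t len)
{
    assert(img != NULL);

    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (eh.e_phnum != 0 && eh.e_phentsize != sizeof(Elf64_Phdr))
        return -1;

    int needed = 0;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        /* Bounds-check each program header before copying it out. */
        uint64_t off = eh.e_phoff + (uint64_t)i * sizeof(Elf64_Phdr);
        if (off > len || len - off < sizeof(Elf64_Phdr))
            return -1;
        Elf64_Phdr ph;
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type != PT_DYNAMIC)
            continue;
        if (ph.p_offset > len || len - ph.p_offset < ph.p_filesz)
            return -1;

        /* Walk the dynamic array until DT_NULL or the segment end. */
        size_t n = ph.p_filesz / sizeof(Elf64_Dyn);
        for (size_t j = 0; j < n; j++) {
            Elf64_Dyn d;
            memcpy(&d, img + ph.p_offset + j * sizeof d, sizeof d);
            if (d.d_tag == DT_NULL)
                break;
            if (d.d_tag == DT_NEEDED)
                needed++;
        }
    }
    return needed;
}
```

A result of 0 only says the library names no dependencies; it does not
prove the code is free of TLS or global-state assumptions, so it is a
necessary check rather than a sufficient one.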

Rich

