mailing list of musl libc
* [musl] Why the entries in the dynamic section are not always relocated?
From: Pablo Galindo Salgado @ 2022-05-08  7:48 UTC
  To: musl

Hi,

I have noticed that when using dl_iterate_phdr in musl libc to inspect some
entries in the dynamic section of different shared libraries loaded into a
program, the pointers are not always relocated (they are offsets instead of
full addresses). For example, consider this program:

#include <elf.h>
#include <link.h>
#include <iostream>

static int
phdrs_callback(dl_phdr_info* info, size_t size, void* data) noexcept
{
    for (auto phdr = info->dlpi_phdr, end = phdr + info->dlpi_phnum;
         phdr != end; ++phdr) {
        if (phdr->p_type != PT_DYNAMIC) {
            continue;
        }
        // p_vaddr is a link-time address; dlpi_addr is the load bias.
        const auto* dynamic_section = reinterpret_cast<const ElfW(Dyn)*>(
            phdr->p_vaddr + info->dlpi_addr);

        for (; dynamic_section->d_tag != DT_NULL; ++dynamic_section) {
            if (dynamic_section->d_tag == DT_JMPREL) {
                // d_ptr is printed exactly as the dynamic linker left it.
                std::cerr << "Address of JMPREL for lib " << info->dlpi_name
                          << " is: " << (void*)dynamic_section->d_un.d_ptr
                          << std::endl;
            }
        }
    }
    return 0;
}

int main() {
    dl_iterate_phdr(&phdrs_callback, NULL);
}

On most glibc systems (for instance ubuntu:20.04), this prints regular
addresses that have been relocated:

Address of JMPREL for lib  is: 0x557bed97a8e8
Address of JMPREL for lib /usr/lib/libc.so.6 is: 0x7f8e0f08cc18

but on musl systems (like Alpine Linux, for example using the
python:3.10-alpine container), this prints offsets instead of fully
relocated addresses:

Address of JMPREL for lib ./a.out is: 0x7f8
Address of JMPREL for lib /usr/lib/libstdc++.so.6 is: 0xa94e0
Address of JMPREL for lib /lib/ld-musl-x86_64.so.1 is: 0x145c0
Address of JMPREL for lib /usr/lib/libgcc_s.so.1 is: 0x22e8

Why is this happening? How can one programmatically know whether the
dynamic linker is going to place offsets or fully relocated addresses
here? In which situations does this happen?

Thanks in advance for the help!

Kind regards,
Pablo Galindo Salgado


* Re: [musl] Why the entries in the dynamic section are not always relocated?
From: Markus Wichmann @ 2022-05-08 11:39 UTC
  To: musl

On Sun, May 08, 2022 at 08:48:29AM +0100, Pablo Galindo Salgado wrote:
> Why is this happening?

The easy question first: This is happening because glibc finds some
value in writing the actual addresses into the dynamic section, and musl
does not. All of the addresses given in the dynamic section must
necessarily be offsets into the library itself (or rather, into the
run-time mapping of the library), so anyone who knows the base address
of the library can interpret these values anyway.
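A minimal sketch of what "knowing the base address" means in practice, assuming the musl behaviour described above (d_ptr left as a link-time value): dl_iterate_phdr already reports the load bias in dlpi_addr, the same value the original program adds to p_vaddr, so an unrelocated entry becomes a usable address by adding it. The helper names find_dynamic and unrelocated_to_runtime are hypothetical, and the sketch makes no attempt to detect which libc it runs on.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info, ElfW()
#include <elf.h>    // PT_DYNAMIC
#include <cstdint>  // std::uintptr_t

// Hypothetical helper: returns the PT_DYNAMIC array of one loaded object, or
// nullptr if it has none. p_vaddr is a link-time address, so the load bias
// reported in dlpi_addr has to be added, as the original program does.
inline const ElfW(Dyn)* find_dynamic(const dl_phdr_info* info) {
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type == PT_DYNAMIC) {
            return reinterpret_cast<const ElfW(Dyn)*>(info->dlpi_addr + ph.p_vaddr);
        }
    }
    return nullptr;
}

// Hypothetical helper for a libc that leaves d_ptr untouched (musl): the
// stored value is a link-time address, so run-time address = bias + d_ptr.
inline std::uintptr_t unrelocated_to_runtime(ElfW(Addr) load_bias, ElfW(Addr) d_ptr) {
    return static_cast<std::uintptr_t>(load_bias + d_ptr);
}
```

For example, inside the callback from the original program, unrelocated_to_runtime(info->dlpi_addr, dynamic_section->d_un.d_ptr) would turn the 0x7f8 printed for ./a.out on musl into the address of its JMPREL table; on glibc the same addition would count the bias twice.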

See, you are accessing an implementation detail here. I am not aware of
any documentation of dl_iterate_phdr() which says whether the dynamic
section is relocated or not. Which leads directly to:

> How can one programmatically know when the linker is
> going to place here offsets or full
> relocated addresses?

In general, you cannot. You could reconstruct the length of the library
mapping from the LOAD headers, then heuristically assume that any value
below that is an offset, and any value above it probably a pointer.
That doesn't get you far, though, since you also need the base address.
Though I suppose you could assume that the start of the page the PHDRs
start on is likely the base of the library mapping.

Also, the heuristic will fail for libraries mapped to a low address. In
theory, all address space after the zero page is fair game, right? But
libraries can take more space than that.
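The heuristic described above can be sketched as follows, under the assumption that dlpi_addr (rather than the page holding the PHDRs) is used as the base. load_extent and guess_runtime_addr are hypothetical names. The second function also exhibits the failure mode from the previous paragraph: an already relocated address that happens to be smaller than the extent is misread as an offset.

```cpp
#include <link.h>  // ElfW()
#include <elf.h>   // PT_LOAD

// Hypothetical helper: the highest p_vaddr + p_memsz over all PT_LOAD
// segments, i.e. the size of the link-time address range of the object.
inline ElfW(Addr) load_extent(const ElfW(Phdr)* phdr, ElfW(Half) phnum) {
    ElfW(Addr) end = 0;
    for (ElfW(Half) i = 0; i < phnum; ++i) {
        if (phdr[i].p_type != PT_LOAD) continue;
        ElfW(Addr) seg_end = phdr[i].p_vaddr + phdr[i].p_memsz;
        if (seg_end > end) end = seg_end;
    }
    return end;
}

// Hypothetical helper: values below the extent are assumed to be unrelocated
// offsets and get the load bias added; anything else is assumed relocated.
// Misclassifies objects mapped below their own extent, and ignores FDPIC.
inline ElfW(Addr) guess_runtime_addr(ElfW(Addr) d_ptr, ElfW(Addr) load_bias,
                                     ElfW(Addr) extent) {
    return d_ptr < extent ? load_bias + d_ptr : d_ptr;
}
```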

And God help you if you ever run into an FDPIC architecture.

It appears to me that whatever you are trying to do is not possible
portably on Linux at this time. Could you fill us in?

Ciao,
Markus


* Re: [musl] Why the entries in the dynamic section are not always relocated?
From: Rich Felker @ 2022-05-08 13:54 UTC
  To: Markus Wichmann; +Cc: musl, Pablo Galindo Salgado

On Sun, May 08, 2022 at 01:39:10PM +0200, Markus Wichmann wrote:
> On Sun, May 08, 2022 at 08:48:29AM +0100, Pablo Galindo Salgado wrote:
> > Why is this happening?
> 
> The easy question first: This is happening because glibc finds some
> value in writing the actual addresses into the dynamic section, and musl
> does not. All of the addresses given in the dynamic section must
> necessarily be offsets into the library itself (rather, the run-time map
> of the library), so anyone who knows the base address of the library can
> interpret these values, anyway.

That's basically it. musl does not do this mainly because it's not
possible in general -- on some archs _DYNAMIC is in read-only memory
-- and we generally avoid arch-specific behavior in the dynamic
linker. The only part of _DYNAMIC we modify, on archs where it's
allowed, is DT_DEBUG, because that's a (nasty, should be replaced)
interface with debuggers to let them find things.
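To illustrate the DT_DEBUG interface mentioned above, here is a sketch that reads the main program's DT_DEBUG entry through dl_iterate_phdr and follows it to struct r_debug from <link.h>. It assumes that dl_iterate_phdr reports the main program first and that the dynamic linker has filled DT_DEBUG in; that is not the case on every architecture, nor in static binaries, where the function returns nullptr. find_r_debug and count_loaded_objects are hypothetical helpers.

```cpp
#include <link.h>   // dl_iterate_phdr, struct r_debug, struct link_map, ElfW()
#include <elf.h>    // PT_DYNAMIC, DT_DEBUG

// Hypothetical helper: returns the r_debug structure published through the
// main program's DT_DEBUG entry, or nullptr if none was filled in.
inline r_debug* find_r_debug() {
    r_debug* result = nullptr;
    dl_iterate_phdr([](dl_phdr_info* info, size_t, void* out) -> int {
        for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
            const ElfW(Phdr)& ph = info->dlpi_phdr[i];
            if (ph.p_type != PT_DYNAMIC) continue;
            auto* dyn = reinterpret_cast<const ElfW(Dyn)*>(info->dlpi_addr + ph.p_vaddr);
            for (; dyn->d_tag != DT_NULL; ++dyn) {
                // The loader stores a run-time pointer here, not an offset.
                if (dyn->d_tag == DT_DEBUG)
                    *static_cast<r_debug**>(out) = reinterpret_cast<r_debug*>(dyn->d_un.d_ptr);
            }
        }
        return 1;  // stop after the first object, assumed to be the main program
    }, &result);
    return result;
}

// Hypothetical helper: counts the objects on the r_map list.
inline int count_loaded_objects(const r_debug* dbg) {
    int n = 0;
    for (const link_map* m = dbg->r_map; m != nullptr; m = m->l_next) ++n;
    return n;
}
```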

> See, you are accessing an implementation detail here. I am not aware of
> any documentation of dl_iterate_phdr() which says whether the dynamic
> section is relocated or not. Which leads directly to:

It's not so much in the scope of dl_iterate_phdr, but in the runtime
contents of ELF data structures. There are specs on *some* of that,
but they are not among the list of standards musl purports to conform
to (and for example some things like handling of RPATH/RUNPATH
intentionally differ from legacy behaviors here).

> > How can one programmatically know when the linker is
> > going to place here offsets or full
> > relocated addresses?
> 
> In general, you cannot. You could reconstruct the length of the library
> mapping from the LOAD headers, then heuristically assume that any value
> below that is an offset, and any value above it probably a pointer.
> Doesn't help you far, though, since you also need the base address.
> Though I suppose you could assume that the start of the page the PHDRs
> start on is likely the base of the library mapping.
> 
> Also, the heuristic will fail for libraries mapped to a low address. In
> theory, all address space after the zero page is fair game, right? But
> libraries can take more space than that.
> 
> And God help you if you ever run into an FDPIC architecture.
> 
> It appears to me that whatever you are trying to do is not possible
> portably on Linux at this time. Could you fill us in?

Indeed, this is probably either an XY problem with a simple portable
way to achieve whatever the underlying goal is, or a glorious hack
that's making a lot more assumptions about implementation internals
and not something you'd be able to rely on continuing to work in the
future, even if you got it working.

Rich


* Re: [musl] Why the entries in the dynamic section are not always relocated?
From: Pablo Galindo Salgado @ 2022-05-08 14:23 UTC
  To: Rich Felker; +Cc: Markus Wichmann, musl

Thanks for all the answers to this! Here are some clarifications
and context.

> It appears to me that whatever you are trying to do is not possible
> portably on Linux at this time. Could you fill us in?

As part of writing profiling and debugging tools, I am trying to rewrite
the PLT table to hook into some symbols of shared libraries. This
technique is quite common and is already used in a considerable number
of debuggers, profilers and ELF inspection tools. Currently, the way
this is handled is either "not at all" or "checking against the base
address and heuristically assuming that the value is an offset if it is
less than the base", which is suboptimal. This use case may sound
"advanced" or "hacky", but it is quite a common technique for writing
profilers, debuggers, state inspection tools and other related tooling.
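A sketch of the lookup step of PLT hooking, using the "offset if below the base" heuristic quoted above. It is an illustration under stated assumptions, not a portable implementation: it only inspects the main program, assumes x86_64 (RELA relocations of type R_X86_64_JUMP_SLOT) and inherits the heuristic's blind spots. Overwriting the returned GOT slot would additionally need mprotect when the binary uses full RELRO, which is not shown. PltQuery, guess_addr and find_plt_slot are hypothetical names.

```cpp
#include <link.h>    // dl_iterate_phdr, ElfW()
#include <elf.h>     // DT_*, R_X86_64_JUMP_SLOT, ELF64_R_TYPE, ELF64_R_SYM
#include <cstring>   // std::strcmp

// Hypothetical helper: a value below the load bias is taken to be an
// unrelocated offset, anything else an already relocated address.
static ElfW(Addr) guess_addr(ElfW(Addr) value, ElfW(Addr) base) {
    return value < base ? base + value : value;
}

struct PltQuery {   // hypothetical: symbol name in, GOT slot out
    const char* name;
    void** slot;
};

static int find_plt_slot_cb(dl_phdr_info* info, size_t, void* data) {
    auto* query = static_cast<PltQuery*>(data);
    const ElfW(Addr) base = info->dlpi_addr;
    const ElfW(Dyn)* dyn = nullptr;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
        if (info->dlpi_phdr[i].p_type == PT_DYNAMIC)
            dyn = reinterpret_cast<const ElfW(Dyn)*>(base + info->dlpi_phdr[i].p_vaddr);
    }
    if (dyn == nullptr) return 1;

    ElfW(Addr) jmprel = 0, symtab = 0, strtab = 0;
    ElfW(Xword) pltrelsz = 0;
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        switch (dyn->d_tag) {
        case DT_JMPREL:   jmprel = guess_addr(dyn->d_un.d_ptr, base); break;
        case DT_PLTRELSZ: pltrelsz = dyn->d_un.d_val; break;  // a size, never relocated
        case DT_SYMTAB:   symtab = guess_addr(dyn->d_un.d_ptr, base); break;
        case DT_STRTAB:   strtab = guess_addr(dyn->d_un.d_ptr, base); break;
        default: break;
        }
    }
#if defined(__x86_64__)  // assumption: RELA entries of type R_X86_64_JUMP_SLOT
    if (jmprel == 0 || symtab == 0 || strtab == 0) return 1;
    const auto* rela = reinterpret_cast<const ElfW(Rela)*>(jmprel);
    const auto* syms = reinterpret_cast<const ElfW(Sym)*>(symtab);
    const auto* strs = reinterpret_cast<const char*>(strtab);
    for (size_t i = 0; i < pltrelsz / sizeof(ElfW(Rela)); ++i) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_JUMP_SLOT) continue;
        const char* sym_name = strs + syms[ELF64_R_SYM(rela[i].r_info)].st_name;
        if (std::strcmp(sym_name, query->name) == 0) {
            // r_offset is the link-time address of the GOT slot: add the bias.
            query->slot = reinterpret_cast<void**>(base + rela[i].r_offset);
            break;
        }
    }
#endif
    return 1;  // stop after the first object, i.e. the main program
}

// Hypothetical entry point: GOT slot of `name` in the main program, or nullptr.
inline void** find_plt_slot(const char* name) {
    PltQuery query{name, nullptr};
    dl_iterate_phdr(find_plt_slot_cb, &query);
    return query.slot;
}
```

For example, find_plt_slot("malloc") would return the address of the GOT entry through which the main program calls malloc, provided it calls malloc through the PLT.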

Notice that the lack of anything predictable here makes these tools less
reliable across libc implementations (most people assume it "works"
based on what glibc does, but even older glibc versions seem to be
inconsistent about this).

Apart from advanced profiling/debugging use cases, I think there are
several important use cases that would benefit from some way to handle
this at runtime; for instance, inspecting the string tables, symbol
tables and other entries in the dynamic section.
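As an example of the string-table case, the sketch below collects the DT_NEEDED entries of every loaded object. Only DT_STRTAB needs the relocated/unrelocated guess; each DT_NEEDED value is an offset into that table (a d_val, not a d_ptr), so it means the same thing under musl and glibc. The "below the load bias" test is the same assumed heuristic as above, and ObjectDeps and collect_deps are hypothetical names.

```cpp
#include <link.h>    // dl_iterate_phdr, ElfW()
#include <elf.h>     // PT_DYNAMIC, DT_STRTAB, DT_NEEDED
#include <string>
#include <utility>   // std::move
#include <vector>

// Hypothetical result type: an object's name and the libraries it needs.
struct ObjectDeps {
    std::string name;
    std::vector<std::string> needed;
};

static int collect_deps_cb(dl_phdr_info* info, size_t, void* data) {
    auto* out = static_cast<std::vector<ObjectDeps>*>(data);
    const ElfW(Addr) base = info->dlpi_addr;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
        if (info->dlpi_phdr[i].p_type != PT_DYNAMIC) continue;
        auto* dyn = reinterpret_cast<const ElfW(Dyn)*>(base + info->dlpi_phdr[i].p_vaddr);

        // First pass: DT_STRTAB is a d_ptr, so guess whether it was relocated.
        const char* strtab = nullptr;
        for (auto* d = dyn; d->d_tag != DT_NULL; ++d) {
            if (d->d_tag == DT_STRTAB) {
                ElfW(Addr) v = d->d_un.d_ptr;
                strtab = reinterpret_cast<const char*>(v < base ? base + v : v);
            }
        }
        if (strtab == nullptr) continue;

        // Second pass: DT_NEEDED is a d_val holding an offset into DT_STRTAB.
        ObjectDeps deps{info->dlpi_name ? info->dlpi_name : "", {}};
        for (auto* d = dyn; d->d_tag != DT_NULL; ++d) {
            if (d->d_tag == DT_NEEDED) deps.needed.emplace_back(strtab + d->d_un.d_val);
        }
        out->push_back(std::move(deps));
    }
    return 0;  // keep iterating over the remaining objects
}

// Hypothetical entry point: DT_NEEDED lists of all currently loaded objects.
inline std::vector<ObjectDeps> collect_deps() {
    std::vector<ObjectDeps> out;
    dl_iterate_phdr(collect_deps_cb, &out);
    return out;
}
```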

Here are (some) examples of software dealing with this problem in the wild:

https://github.com/ClickHouse/ClickHouse/blob/8513f20cfded839032795978a2ffb8ef1fc6d61b/src/Common/SymbolIndex.cpp#L163

https://gitlab.collabora.com/vivek/libcapsule/-/blob/master/utils/dump.c#L850

There are many, many more examples of tools that are not aware of this
incompatibility and get it wrong. Here are just a few:

https://github.com/kubo/plthook/blob/fa0267b29e989e310c2594afa095cf697ea09da0/plthook_elf.c#L548-L555

https://github.com/KDE/heaptrack/blob/d9c51f3f76d7a37348020d3aead651f5301f8ea7/src/track/heaptrack_inject.cpp#L317

https://gist.github.com/aeppert/0b1a38d4364e2863d27a8a0ce2c97dc8

https://course.ccs.neu.edu/cs7680sp17/elf-parser/util-plugin.c.txt

(and many more).

I think there is value in having some way to know, programmatically and
efficiently, how to interpret these addresses. At the very least, it
would allow these tools to work correctly on musl without piling even
more hacks on top.

Thanks for your consideration!



* Re: [musl] Why the entries in the dynamic section are not always relocated?
From: Szabolcs Nagy @ 2022-05-08 19:38 UTC
  To: Pablo Galindo Salgado; +Cc: Rich Felker, Markus Wichmann, musl

* Pablo Galindo Salgado <pablogsal@gmail.com> [2022-05-08 15:23:29 +0100]:
> Thanks for all the answers to this! Here are some clarifications
> and context.
> 
> > It appears to me that whatever you are trying to do is not possible
> > portably on Linux at this time. Could you fill us in?
> 
> As part of writing profiling and debugging tools, I am trying to rewrite
> the PLT table to hook into some symbols of shared libraries. This
> technique is quite common and is already used in a considerable number
> of debuggers, profilers and elf inspection tools. Currently, the way
> this is handled is "not at all" or "checking against the base address
> and heuristically assuming that is an offset if the address is less
> than the base", which is suboptimal. This use case may sound "advanced"
> or "hacky" but this is quite a common technique for doing profilers,
> debuggers, state inspection tools and other related tooling.
> 
> Notice that the lack of anything predictable here makes these tools be
> more unreliable across libc implementations (most people assume it
> "works" based on what glibc does but even old glibcs seem to be
> inconsistent with this).

note: in glibc the internal macro DL_RO_DYN_SECTION controls whether the
dynamic section is relocated. it is set on mips and riscv, so on those
targets the dynamic section is not relocated.
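A sketch of the compile-time approach, hard-coding the behaviour described in the note: glibc relocates d_ptr entries except on MIPS and RISC-V, and musl (or any other libc, by assumption here) does not. It deliberately encodes no glibc version checks, because the thread does not say which releases behave differently; libc_relocates_dynamic and dyn_ptr_to_addr are hypothetical names.

```cpp
#include <link.h>    // pulls in <features.h>, which defines __GLIBC__ on glibc
#include <cstdint>

// Hypothetical helper: compile-time guess of whether the libc rewrites d_ptr
// entries in place. Target list taken from the note above, not verified
// against every glibc release.
constexpr bool libc_relocates_dynamic() {
#if defined(__GLIBC__) && !defined(__mips__) && !defined(__riscv)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: converts a d_ptr read at run time into an address.
constexpr std::uintptr_t dyn_ptr_to_addr(std::uintptr_t load_bias, std::uintptr_t d_ptr) {
    return libc_relocates_dynamic() ? d_ptr : load_bias + d_ptr;
}
```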

i guess gdb decides based on the target how to find the debug info.

but clearly relocating the dynamic section is not compatible with
having it exposed as part of the public abi (the libc does not know
about future dynamic tags, unknown tags are ignored, not relocated).

if user code needs to access the dynamic section at runtime then
it can hard-code glibc-specific knowledge (decide based on glibc
version and target) or use another, supported libc interface (e.g.
glibc supports plt hooks via LD_AUDIT).
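For the LD_AUDIT route, a minimal glibc audit module could look like this. la_version, la_objopen and la_symbind64 are the hooks documented in rtld-audit(7); the value returned from la_symbind64 is the address the binding will use, so returning a different function there is how a PLT hook would be installed. This sketch only logs bindings of malloc and returns the original address. It relies on glibc's <link.h> (LAV_CURRENT, LA_FLG_*) and is not expected to build or work with musl.

```cpp
#include <link.h>    // LAV_CURRENT, LA_FLG_*, la_* prototypes (glibc)
#include <cstdint>
#include <cstdio>
#include <cstring>

extern "C" {

// Called first by ld.so; returning LAV_CURRENT accepts the audit interface.
unsigned int la_version(unsigned int /*version*/) {
    return LAV_CURRENT;
}

// Request symbol-binding callbacks both to and from every loaded object.
unsigned int la_objopen(struct link_map* /*map*/, Lmid_t /*lmid*/, uintptr_t* /*cookie*/) {
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

// Called when a PLT symbol is bound; the return value is where calls will go.
uintptr_t la_symbind64(Elf64_Sym* sym, unsigned int /*ndx*/, uintptr_t* /*refcook*/,
                       uintptr_t* /*defcook*/, unsigned int* /*flags*/,
                       const char* symname) {
    if (std::strcmp(symname, "malloc") == 0)
        std::fprintf(stderr, "audit: binding malloc\n");
    return sym->st_value;  // leave the binding unchanged in this sketch
}

}  // extern "C"
```

Built with something like g++ -shared -fPIC audit.cpp -o audit.so, it would be activated with LD_AUDIT=./audit.so ./program.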

thanks for the example links, those were interesting.
