* [musl] Why the entries in the dynamic section are not always relocated?
From: Pablo Galindo Salgado @ 2022-05-08 7:48 UTC
To: musl

Hi,

I have noticed that when using dl_iterate_phdr in musl to inspect some
entries in the dynamic section of the shared libraries loaded into a
program, the pointers are not always relocated (they are offsets instead
of full addresses).

For example, consider this program:

    #include <elf.h>
    #include <link.h>
    #include <iostream>

    static int
    phdrs_callback(dl_phdr_info* info, size_t size, void* data) noexcept
    {
        for (auto phdr = info->dlpi_phdr, end = phdr + info->dlpi_phnum; phdr != end; ++phdr) {
            if (phdr->p_type != PT_DYNAMIC) {
                continue;
            }
            // The dynamic section lives at p_vaddr relative to the load base.
            const auto* dynamic_section =
                    reinterpret_cast<const ElfW(Dyn)*>(phdr->p_vaddr + info->dlpi_addr);
            for (; dynamic_section->d_tag != DT_NULL; ++dynamic_section) {
                if (dynamic_section->d_tag == DT_JMPREL) {
                    std::cerr << "Address of JMPREL for lib " << info->dlpi_name
                              << " is: " << (void*)dynamic_section->d_un.d_ptr << std::endl;
                }
            }
        }
        return 0;
    }

    int main()
    {
        dl_iterate_phdr(&phdrs_callback, NULL);
    }

On most glibc systems (for instance ubuntu:20.04), this prints regular
addresses that have been relocated:

    Address of JMPREL for lib is: 0x557bed97a8e8
    Address of JMPREL for lib /usr/lib/libc.so.6 is: 0x7f8e0f08cc18

but on musl systems (like Alpine Linux, for example using the
python:3.10-alpine container) this prints offsets instead of full
relocated addresses:

    Address of JMPREL for lib ./a.out is: 0x7f8
    Address of JMPREL for lib /usr/lib/libstdc++.so.6 is: 0xa94e0
    Address of JMPREL for lib /lib/ld-musl-x86_64.so.1 is: 0x145c0
    Address of JMPREL for lib /usr/lib/libgcc_s.so.1 is: 0x22e8

Why is this happening? How can one programmatically know when the linker
is going to place offsets or full relocated addresses here? In which
situations does this happen?

Thanks in advance for the help!

Kind regards,
Pablo Galindo Salgado

* Re: [musl] Why the entries in the dynamic section are not always relocated?
From: Markus Wichmann @ 2022-05-08 11:39 UTC
To: musl

On Sun, May 08, 2022 at 08:48:29AM +0100, Pablo Galindo Salgado wrote:
> Why is this happening?

The easy question first: this is happening because glibc finds some
value in writing the actual addresses into the dynamic section, and musl
does not. All of the addresses given in the dynamic section must
necessarily be offsets into the library itself (rather, the run-time map
of the library), so anyone who knows the base address of the library can
interpret these values anyway.

See, you are accessing an implementation detail here. I am not aware of
any documentation of dl_iterate_phdr() which says whether the dynamic
section is relocated or not. Which leads directly to:

> How can one programmatically know when the linker is going to place
> offsets or full relocated addresses here?

In general, you cannot. You could reconstruct the length of the library
mapping from the LOAD headers, then heuristically assume that any value
below that is an offset, and any value above it is probably a pointer.
That does not get you far, though, since you also need the base address.
Though I suppose you could assume that the start of the page the PHDRs
start on is likely the base of the library mapping.

Also, the heuristic will fail for libraries mapped to a low address. In
theory, all address space after the zero page is fair game, right? But
libraries can take more space than that.

And God help you if you ever run into an FDPIC architecture.

It appears to me that whatever you are trying to do is not possible
portably on Linux at this time. Could you fill us in?

Ciao,
Markus
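
The heuristic Markus describes can be sketched roughly as follows. This
is only an illustration under stated assumptions: LoadSpan, load_span
and guess_address are hypothetical names, the span is taken as the
link-time address range covered by the PT_LOAD headers (a slight variant
of "the length of the mapping"), and the base is assumed to be the
dlpi_addr value that dl_iterate_phdr passes to its callback. It inherits
the failure mode mentioned above: an object mapped at a very low address
can have absolute pointers that fall inside the span.

```cpp
#include <elf.h>
#include <link.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: link-time address range covered by PT_LOAD.
struct LoadSpan {
    std::uintptr_t lo;  // lowest p_vaddr of any PT_LOAD header
    std::uintptr_t hi;  // one past the highest p_vaddr + p_memsz
};

// Walk the program headers and collect the range spanned by PT_LOAD.
inline LoadSpan load_span(const ElfW(Phdr)* phdr, std::size_t phnum) {
    assert(phdr != nullptr || phnum == 0);
    LoadSpan span{UINTPTR_MAX, 0};
    for (std::size_t i = 0; i < phnum; ++i) {
        if (phdr[i].p_type != PT_LOAD) continue;
        const std::uintptr_t start = phdr[i].p_vaddr;
        const std::uintptr_t end = start + phdr[i].p_memsz;
        if (start < span.lo) span.lo = start;
        if (end > span.hi) span.hi = end;
    }
    return span;
}

// Heuristic only: a value inside the link-time span is assumed to be an
// unrelocated offset and gets the base added; anything else is assumed
// to be an absolute address already.
inline std::uintptr_t guess_address(std::uintptr_t base, LoadSpan span,
                                    std::uintptr_t value) {
    if (value >= span.lo && value < span.hi) return base + value;
    return value;
}
```

Inside a dl_iterate_phdr callback this would be called as
guess_address(info->dlpi_addr, load_span(info->dlpi_phdr,
info->dlpi_phnum), dyn->d_un.d_ptr). It remains a guess, not a contract
offered by either libc.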

* Re: [musl] Why the entries in the dynamic section are not always relocated?
From: Rich Felker @ 2022-05-08 13:54 UTC
To: Markus Wichmann
Cc: musl, Pablo Galindo Salgado

On Sun, May 08, 2022 at 01:39:10PM +0200, Markus Wichmann wrote:
> On Sun, May 08, 2022 at 08:48:29AM +0100, Pablo Galindo Salgado wrote:
> > Why is this happening?
>
> The easy question first: this is happening because glibc finds some
> value in writing the actual addresses into the dynamic section, and musl
> does not. All of the addresses given in the dynamic section must
> necessarily be offsets into the library itself (rather, the run-time map
> of the library), so anyone who knows the base address of the library can
> interpret these values anyway.

That's basically it. musl does not do this mainly because it's not
possible in general -- on some archs _DYNAMIC is in read-only memory
-- and we generally avoid arch-specific behavior in the dynamic
linker. The only part of _DYNAMIC we modify, on archs where it's
allowed, is DT_DEBUG, because that's a (nasty, should be replaced)
interface with debuggers to let them find things.

> See, you are accessing an implementation detail here. I am not aware of
> any documentation of dl_iterate_phdr() which says whether the dynamic
> section is relocated or not. Which leads directly to:

It's not so much in the scope of dl_iterate_phdr, but in the runtime
contents of ELF data structures. There are specs on *some* of that,
but they are not among the list of standards musl purports to conform
to (and, for example, some things like handling of RPATH/RUNPATH
intentionally differ from legacy behaviors here).

> > How can one programmatically know when the linker is going to place
> > offsets or full relocated addresses here?
>
> In general, you cannot. You could reconstruct the length of the library
> mapping from the LOAD headers, then heuristically assume that any value
> below that is an offset, and any value above it is probably a pointer.
> That does not get you far, though, since you also need the base address.
> Though I suppose you could assume that the start of the page the PHDRs
> start on is likely the base of the library mapping.
>
> Also, the heuristic will fail for libraries mapped to a low address. In
> theory, all address space after the zero page is fair game, right? But
> libraries can take more space than that.
>
> And God help you if you ever run into an FDPIC architecture.
>
> It appears to me that whatever you are trying to do is not possible
> portably on Linux at this time. Could you fill us in?

Indeed, this is probably either an XY problem with a simple portable
way to achieve whatever the underlying goal is, or a glorious hack
that's making a lot more assumptions about implementation internals
and not something you'd be able to rely on continuing to work in the
future, even if you got it working.

Rich
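
Rich's point that _DYNAMIC may sit in read-only memory can be examined
from the program headers. The sketch below is an assumption-laden
illustration, not anything musl or glibc provides:
dynamic_in_writable_load is a hypothetical helper that reports whether
the PT_DYNAMIC segment is contained in a PT_LOAD segment carrying PF_W.
It only reflects the link-time flags; a PT_GNU_RELRO segment, where
present, may still cause the region to be made read-only after startup.

```cpp
#include <elf.h>
#include <link.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if PT_DYNAMIC lies entirely inside a
// PT_LOAD segment whose flags include PF_W. Returns false when there
// is no PT_DYNAMIC or no PT_LOAD segment contains it.
inline bool dynamic_in_writable_load(const ElfW(Phdr)* phdr, std::size_t phnum) {
    assert(phdr != nullptr || phnum == 0);

    // Locate the dynamic segment first.
    const ElfW(Phdr)* dyn = nullptr;
    for (std::size_t i = 0; i < phnum; ++i) {
        if (phdr[i].p_type == PT_DYNAMIC) {
            dyn = &phdr[i];
            break;
        }
    }
    if (dyn == nullptr) return false;

    // Find the loadable segment that contains it and inspect its flags.
    for (std::size_t i = 0; i < phnum; ++i) {
        const ElfW(Phdr)& load = phdr[i];
        if (load.p_type != PT_LOAD) continue;
        const bool contains =
            dyn->p_vaddr >= load.p_vaddr &&
            dyn->p_vaddr + dyn->p_memsz <= load.p_vaddr + load.p_memsz;
        if (contains) return (load.p_flags & PF_W) != 0;
    }
    return false;
}
```

A false result means a dynamic linker could not rewrite the entries in
place without changing page protections first, which is one reason a
libc may choose to leave them untouched on every architecture.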

* Re: [musl] Why the entries in the dynamic section are not always relocated?
From: Pablo Galindo Salgado @ 2022-05-08 14:23 UTC
To: Rich Felker
Cc: Markus Wichmann, musl

Thanks for all the answers to this! Here are some clarifications
and context.

> It appears to me that whatever you are trying to do is not possible
> portably on Linux at this time. Could you fill us in?

As part of writing profiling and debugging tools, I am trying to rewrite
the PLT to hook into some symbols of shared libraries. This technique is
quite common and is already used in a considerable number of debuggers,
profilers and ELF inspection tools. Currently, the way this is handled
is either "not at all" or "checking against the base address and
heuristically assuming that the value is an offset if it is less than
the base", which is suboptimal. This use case may sound "advanced" or
"hacky", but it is quite a common technique in profilers, debuggers,
state inspection tools and other related tooling.

Notice that the lack of anything predictable here makes these tools less
reliable across libc implementations (most people assume it "works"
based on what glibc does, but even old glibc versions seem to be
inconsistent about this).

Apart from some advanced profiling/debugger use cases, I think there are
several important use cases here that would benefit from some way to
handle this at runtime. For instance, inspecting the string tables,
symbol tables and other entries in the dynamic section. Here are (some)
examples of software dealing with this problem in the wild:

https://github.com/ClickHouse/ClickHouse/blob/8513f20cfded839032795978a2ffb8ef1fc6d61b/src/Common/SymbolIndex.cpp#L163
https://gitlab.collabora.com/vivek/libcapsule/-/blob/master/utils/dump.c#L850

There are many, many more examples of tools that are not aware of this
incompatibility and are doing it wrong. Just some examples of this:

https://github.com/kubo/plthook/blob/fa0267b29e989e310c2594afa095cf697ea09da0/plthook_elf.c#L548-L555
https://github.com/KDE/heaptrack/blob/d9c51f3f76d7a37348020d3aead651f5301f8ea7/src/track/heaptrack_inject.cpp#L317
https://gist.github.com/aeppert/0b1a38d4364e2863d27a8a0ce2c97dc8
https://course.ccs.neu.edu/cs7680sp17/elf-parser/util-plugin.c.txt

(and many more).

I think there is value in having some way to know programmatically and
efficiently how to interpret these addresses. At the very least, it
would allow these tools to work correctly on musl without even more
hacks on top.

Thanks for your consideration!
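
The "compare against the base address" heuristic Pablo mentions, applied
to the string-table use case, might look like the sketch below. All
names here (find_dyn, resolve_dyn_ptr, soname_of) are hypothetical and
exist only for illustration; the base is assumed to be the dlpi_addr of
the object, and the whole thing relies on the assumption that an
unrelocated offset is always numerically smaller than the base, which is
exactly the kind of guess the replies above warn about.

```cpp
#include <elf.h>
#include <link.h>
#include <cassert>
#include <cstdint>

// Return the first dynamic entry with the given tag, or nullptr if the
// DT_NULL terminator is reached first.
inline const ElfW(Dyn)* find_dyn(const ElfW(Dyn)* dyn, std::int64_t tag) {
    assert(dyn != nullptr);
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        if (static_cast<std::int64_t>(dyn->d_tag) == tag) return dyn;
    }
    return nullptr;
}

// Heuristic only: a value below the base is assumed to be an offset
// that still needs the base added. With base == 0 (a non-PIE
// executable) every value is returned unchanged.
inline std::uintptr_t resolve_dyn_ptr(std::uintptr_t base, std::uintptr_t value) {
    return value < base ? base + value : value;
}

// Look up DT_SONAME through DT_STRTAB, whichever form DT_STRTAB is in.
// Returns nullptr if either entry is missing.
inline const char* soname_of(std::uintptr_t base, const ElfW(Dyn)* dyn) {
    const ElfW(Dyn)* strtab = find_dyn(dyn, DT_STRTAB);
    const ElfW(Dyn)* soname = find_dyn(dyn, DT_SONAME);
    if (strtab == nullptr || soname == nullptr) return nullptr;
    const std::uintptr_t table = resolve_dyn_ptr(base, strtab->d_un.d_ptr);
    // DT_SONAME holds an index into the string table, not a pointer.
    return reinterpret_cast<const char*>(table) + soname->d_un.d_val;
}
```

The same resolve step would apply to DT_SYMTAB or DT_JMPREL. The design
keeps the guess in one function so that it can be swapped for a
libc-specific rule if one ever becomes available.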

* Re: [musl] Why the entries in the dynamic section are not always relocated?
From: Szabolcs Nagy @ 2022-05-08 19:38 UTC
To: Pablo Galindo Salgado
Cc: Rich Felker, Markus Wichmann, musl

* Pablo Galindo Salgado <pablogsal@gmail.com> [2022-05-08 15:23:29 +0100]:
> Thanks for all the answers to this! Here are some clarifications
> and context.
>
> > It appears to me that whatever you are trying to do is not possible
> > portably on Linux at this time. Could you fill us in?
>
> As part of writing profiling and debugging tools, I am trying to rewrite
> the PLT to hook into some symbols of shared libraries. This technique is
> quite common and is already used in a considerable number of debuggers,
> profilers and ELF inspection tools. Currently, the way this is handled
> is either "not at all" or "checking against the base address and
> heuristically assuming that the value is an offset if it is less than
> the base", which is suboptimal. This use case may sound "advanced" or
> "hacky", but it is quite a common technique in profilers, debuggers,
> state inspection tools and other related tooling.
>
> Notice that the lack of anything predictable here makes these tools less
> reliable across libc implementations (most people assume it "works"
> based on what glibc does, but even old glibc versions seem to be
> inconsistent about this).

note: in glibc the internal macro DL_RO_DYN_SECTION controls if the
dynamic section is relocated or not. on mips and riscv it is set, so
there the dynamic section is not relocated. i guess gdb decides based
on the target how to find the debug info.

but clearly relocating the dynamic section is not compatible with
having it exposed as part of the public abi (the libc does not know
about future dynamic tags; unknown tags are ignored, not relocated).

if user code needs to access the dynamic section at runtime then it
can hard code glibc specific knowledge (decide based on glibc version
and target) or use another, supported libc interface. (e.g. glibc
supports plt hooks via LD_AUDIT).

thanks for the example links, those were interesting.
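
For the LD_AUDIT route Szabolcs mentions, a minimal audit library could
look like the sketch below. It is an illustration of the glibc rtld-audit
interface on a 64-bit target, under the assumption that observing (not
redirecting) bindings is enough; the counter bindings_seen is a
hypothetical name added for the example. On 32-bit targets the hook is
la_symbind32 instead.

```cpp
#include <dlfcn.h>   // Lmid_t
#include <link.h>    // glibc: LAV_CURRENT, LA_FLG_*, struct link_map
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter, only here to show that the hook ran.
static std::atomic<unsigned long> bindings_seen{0};

extern "C" {

// Called first by the dynamic linker; returning LAV_CURRENT accepts
// the audit interface version this library was built against.
unsigned int la_version(unsigned int version) {
    (void)version;
    return LAV_CURRENT;
}

// Called for every loaded object; the flags request symbol-binding
// callbacks for references to and from that object.
unsigned int la_objopen(struct link_map* map, Lmid_t lmid, uintptr_t* cookie) {
    (void)map; (void)lmid; (void)cookie;
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

// Called when a symbol is bound. Returning sym->st_value leaves the
// binding unchanged; returning another address would redirect calls.
uintptr_t la_symbind64(Elf64_Sym* sym, unsigned int ndx,
                       uintptr_t* refcook, uintptr_t* defcook,
                       unsigned int* flags, const char* symname) {
    assert(sym != nullptr);
    (void)ndx; (void)refcook; (void)defcook; (void)flags; (void)symname;
    bindings_seen.fetch_add(1, std::memory_order_relaxed);
    return sym->st_value;
}

}  // extern "C"
```

Built as a shared object (for example with g++ -shared -fPIC) and named
in the LD_AUDIT environment variable, such a library is loaded by the
glibc dynamic linker before the program starts. No parsing of the
dynamic section is involved, which sidesteps the relocated-or-not
question entirely, at the price of being glibc specific.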