From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Reply-To: musl@lists.openwall.com
MIME-Version: 1.0
References: <20220508113910.GC7958@voyager> <20220508135444.GD7074@brightrain.aerifal.cx>
In-Reply-To: <20220508135444.GD7074@brightrain.aerifal.cx>
From: Pablo Galindo Salgado
Date: Sun, 8 May 2022 15:23:29 +0100
To: Rich Felker
Cc: Markus Wichmann, musl@lists.openwall.com
Content-Type: multipart/alternative; boundary="00000000000093bae405de80d639"
Subject: Re: [musl] Why the entries in the dynamic section are not always relocated?

--00000000000093bae405de80d639
Content-Type: text/plain; charset="UTF-8"

Thanks for all the answers to this! Here are some clarifications and
context.

> It appears to me that whatever you are trying to do is not possible
> portably on Linux at this time. Could you fill us in?

As part of writing profiling and debugging tools, I am trying to rewrite
the PLT to hook into some symbols of shared libraries. This technique is
quite common and is already used in a considerable number of debuggers,
profilers and ELF inspection tools. Currently, tools handle the question
of whether a dynamic section entry is an offset or a relocated address
either "not at all" or by "checking against the base address and
heuristically assuming that it is an offset if the address is less than
the base", which is suboptimal. This use case may sound "advanced" or
"hacky", but it is quite a common technique for building profilers,
debuggers, state inspection tools and other related tooling.

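For readers following along, here is a minimal sketch of how such a tool
typically reads a raw entry from the dynamic section. The helper names
(dynamic_section, raw_dyn_value) are hypothetical, and the sketch assumes a
non-FDPIC target where the base address is a plain integer, as supplied by
dlpi_addr in a dl_iterate_phdr() callback. It only fetches the stored value;
whether that value is an offset or a relocated address is exactly what
differs between libcs.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>   /* ElfW() macro */
#include <stddef.h>
#include <stdint.h>

/* Locate the in-memory dynamic section of one loaded object.
 * base/phdr/phnum are assumed to come from dlpi_addr, dlpi_phdr and
 * dlpi_phnum of a dl_iterate_phdr() callback. */
static const ElfW(Dyn) *dynamic_section(uintptr_t base,
                                        const ElfW(Phdr) *phdr, size_t phnum)
{
    for (size_t i = 0; i < phnum; i++) {
        if (phdr[i].p_type == PT_DYNAMIC)
            return (const ElfW(Dyn) *)(base + (uintptr_t)phdr[i].p_vaddr);
    }
    return NULL;
}

/* Fetch the raw d_ptr of the first entry with the given tag.
 * Returns 1 and stores the value in *out if found, 0 otherwise.
 * The value is returned exactly as stored: no relocation is applied. */
static int raw_dyn_value(uintptr_t base, const ElfW(Phdr) *phdr, size_t phnum,
                         long tag, uintptr_t *out)
{
    assert(phdr != NULL && out != NULL);
    const ElfW(Dyn) *dyn = dynamic_section(base, phdr, phnum);
    if (dyn == NULL)
        return 0;
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if ((long)dyn->d_tag == tag) {
            *out = (uintptr_t)dyn->d_un.d_ptr;
            return 1;
        }
    }
    return 0;
}
```

Inside a dl_iterate_phdr() callback this would be called as
raw_dyn_value(info->dlpi_addr, info->dlpi_phdr, info->dlpi_phnum,
DT_STRTAB, &value). Going by the discussion quoted below, the result is a
full address under glibc and an offset from the base under musl.
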
Notice that the lack of anything predictable here makes these tools less
reliable across libc implementations (most people assume it "works"
based on what glibc does, but even old glibc versions seem to be
inconsistent about this).

Apart from some advanced profiling/debugger use cases, I think there are
several important use cases here that would benefit from some way to
handle this at runtime, for instance inspecting the string tables, the
symbol tables and other entries in the dynamic section.

Here are (some) examples of software dealing with this problem in the
wild:

https://github.com/ClickHouse/ClickHouse/blob/8513f20cfded839032795978a2ffb8ef1fc6d61b/src/Common/SymbolIndex.cpp#L163
https://gitlab.collabora.com/vivek/libcapsule/-/blob/master/utils/dump.c#L850

There are many, many more examples of tools that are not aware of this
incompatibility and are doing it wrong. Just some examples of this:

https://github.com/kubo/plthook/blob/fa0267b29e989e310c2594afa095cf697ea09da0/plthook_elf.c#L548-L555
https://github.com/KDE/heaptrack/blob/d9c51f3f76d7a37348020d3aead651f5301f8ea7/src/track/heaptrack_inject.cpp#L317
https://gist.github.com/aeppert/0b1a38d4364e2863d27a8a0ce2c97dc8
https://course.ccs.neu.edu/cs7680sp17/elf-parser/util-plugin.c.txt

(and many more).

I think there is value in having some way to know, programmatically and
efficiently, how to interpret these addresses. At the very least, it
would allow these tools to work correctly on musl without even more
hacks on top.

Thanks for your consideration!

On Sun, 8 May 2022 at 14:54, Rich Felker wrote:

> On Sun, May 08, 2022 at 01:39:10PM +0200, Markus Wichmann wrote:
> > On Sun, May 08, 2022 at 08:48:29AM +0100, Pablo Galindo Salgado wrote:
> > > Why is this happening?
> >
> > The easy question first: This is happening because glibc finds some
> > value in writing the actual addresses into the dynamic section, and musl
> > does not.
> > All of the addresses given in the dynamic section must
> > necessarily be offsets into the library itself (rather, the run-time map
> > of the library), so anyone who knows the base address of the library can
> > interpret these values, anyway.
>
> That's basically it. musl does not do this mainly because it's not
> possible in general -- on some archs _DYNAMIC is in read-only memory
> -- and we generally avoid arch-specific behavior in the dynamic
> linker. The only part of _DYNAMIC we modify, on archs where it's
> allowed, is DT_DEBUG, because that's a (nasty, should be replaced)
> interface with debuggers to let them find things.
>
> > See, you are accessing an implementation detail here. I am not aware of
> > any documentation of dl_iterate_phdr() which says whether the dynamic
> > section is relocated or not. Which leads directly to:
>
> It's not so much in the scope of dl_iterate_phdr, but in the runtime
> contents of ELF data structures. There are specs on *some* of that,
> but they are not among the list of standards musl purports to conform
> to (and for example some things like handling of RPATH/RUNPATH
> intentionally differ from legacy behaviors here).
>
> > > How can one programmatically know when the linker is
> > > going to place here offsets or full
> > > relocated addresses?
> >
> > In general, you cannot. You could reconstruct the length of the library
> > mapping from the LOAD headers, then heuristically assume that any value
> > below that is an offset, and any value above it probably a pointer.
> > Doesn't help you far, though, since you also need the base address.
> > Though I suppose you could assume that the start of the page the PHDRs
> > start on is likely the base of the library mapping.
> >
> > Also, the heuristic will fail for libraries mapped to a low address. In
> > theory, all address space after the zero page is fair game, right? But
> > libraries can take more space than that.
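
The heuristic described in the quoted text above can be sketched roughly as
follows. This is an illustration only, under stated assumptions: the names
image_span and guess_runtime_address are hypothetical, the base address is
taken as given (for example from dlpi_addr in a dl_iterate_phdr() callback
on a non-FDPIC target), and the classification is a guess, not something
any libc guarantees.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>   /* ElfW() macro */
#include <stddef.h>
#include <stdint.h>

/* Length of the library mapping reconstructed from the LOAD headers:
 * the highest end (p_vaddr + p_memsz) of any PT_LOAD segment,
 * measured relative to the base address. */
static uintptr_t image_span(const ElfW(Phdr) *phdr, size_t phnum)
{
    uintptr_t end = 0;
    for (size_t i = 0; i < phnum; i++) {
        if (phdr[i].p_type != PT_LOAD)
            continue;
        uintptr_t seg_end = (uintptr_t)(phdr[i].p_vaddr + phdr[i].p_memsz);
        if (seg_end > end)
            end = seg_end;
    }
    return end;
}

/* Heuristic only: a value below the mapping length is assumed to be an
 * offset and gets the base added; anything else is assumed to be an
 * already relocated pointer and is returned unchanged. */
static uintptr_t guess_runtime_address(uintptr_t base,
                                       const ElfW(Phdr) *phdr, size_t phnum,
                                       uintptr_t value)
{
    assert(phdr != NULL);
    if (value < image_span(phdr, phnum))
        return base + value;
    return value;
}
```

As the quoted text warns, this misclassifies values when the library is
mapped at a low address: with a base of 0x1000 and a mapping length of
0x2500, an already relocated pointer 0x1800 is below the length, is
treated as an offset, and comes back as 0x2800.
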
> >
> > And God help you if you ever run into an FDPIC architecture.
> >
> > It appears to me that whatever you are trying to do is not possible
> > portably on Linux at this time. Could you fill us in?
>
> Indeed, this is probably either an XY problem with a simple portable
> way to achieve whatever the underlying goal is, or a glorious hack
> that's making a lot more assumptions about implementation internals
> and not something you'd be able to rely on continuing to work in the
> future, even if you got it working.
>
> Rich
>

--00000000000093bae405de80d639
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--00000000000093bae405de80d639--