mailing list of musl libc
* [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
@ 2024-11-23  0:20 Alex Rønne Petersen
  2024-11-23  8:30 ` Alexander Monakov
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Rønne Petersen @ 2024-11-23  0:20 UTC (permalink / raw)
  To: musl; +Cc: Alex Rønne Petersen

Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc. This fixes a
linker error when building musl libc.so with zig cc.
---
 src/thread/s390x/__tls_get_offset.s | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/thread/s390x/__tls_get_offset.s b/src/thread/s390x/__tls_get_offset.s
index 8ee92de8..2e0913cc 100644
--- a/src/thread/s390x/__tls_get_offset.s
+++ b/src/thread/s390x/__tls_get_offset.s
@@ -5,6 +5,7 @@ __tls_get_offset:
 	aghi  %r15, -160
 
 	la    %r2, 0(%r2, %r12)
+.hidden __tls_get_addr
 	brasl %r14, __tls_get_addr
 
 	ear   %r1, %a0
-- 
2.40.1



* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-11-23  0:20 [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it Alex Rønne Petersen
@ 2024-11-23  8:30 ` Alexander Monakov
  2024-11-23 12:15   ` Alex Rønne Petersen
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Monakov @ 2024-11-23  8:30 UTC (permalink / raw)
  To: musl; +Cc: Alex Rønne Petersen


On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:

> Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc. This fixes a
> linker error when building musl libc.so with zig cc.

Hm, on s390 __tls_get_addr is not used for TLS ABI, so it's fine that it ends up
hidden in libc.so. Unusual.

(linkers must take the most restrictive visibility from all mentions of a symbol)
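A minimal sketch of that merge rule, assuming the STV_* values and the least-to-most constraining order given in the ELF gABI (default, protected, hidden, internal). The names stv_rank and merge_visibility are hypothetical and are not taken from any real linker:

```c
/* ELF symbol visibility values as defined by the gABI (low bits of st_other). */
enum stv { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };

/* Rank from least to most constraining:
 * default < protected < hidden < internal. */
static int stv_rank(enum stv v)
{
	switch (v) {
	case STV_PROTECTED: return 1;
	case STV_HIDDEN:    return 2;
	case STV_INTERNAL:  return 3;
	default:            return 0;
	}
}

/* Hypothetical helper: combine the visibility seen in two mentions of the
 * same symbol and keep the more constraining one. */
static enum stv merge_visibility(enum stv a, enum stv b)
{
	return stv_rank(a) >= stv_rank(b) ? a : b;
}
```

Under this rule a single .hidden mention in __tls_get_offset.s is enough to make the symbol hidden in the output, even though the definition itself carries default visibility.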

I'm curious, what kind of error with zig cc were you seeing?

Thanks.
Alexander

> ---
>  src/thread/s390x/__tls_get_offset.s | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/thread/s390x/__tls_get_offset.s b/src/thread/s390x/__tls_get_offset.s
> index 8ee92de8..2e0913cc 100644
> --- a/src/thread/s390x/__tls_get_offset.s
> +++ b/src/thread/s390x/__tls_get_offset.s
> @@ -5,6 +5,7 @@ __tls_get_offset:
>  	aghi  %r15, -160
>  
>  	la    %r2, 0(%r2, %r12)
> +.hidden __tls_get_addr
>  	brasl %r14, __tls_get_addr
>  
>  	ear   %r1, %a0
> 


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-11-23  8:30 ` Alexander Monakov
@ 2024-11-23 12:15   ` Alex Rønne Petersen
  2024-11-23 12:36     ` Alexander Monakov
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Rønne Petersen @ 2024-11-23 12:15 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: musl

On Sat, Nov 23, 2024 at 9:30 AM Alexander Monakov <amonakov@ispras.ru> wrote:
>
> On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
>
> > Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc. This fixes a
> > linker error when building musl libc.so with zig cc.
>
> Hm, on s390 __tls_get_addr is not used for TLS ABI, so it's fine that it ends up
> hidden in libc.so. Unusual.
>
> (linkers must take the most restrictive visibility from all mentions of a symbol)
>
> I'm curious, what kind of error with zig cc were you seeing?

This:

ld.lld: error: relocation R_390_PC32DBL cannot be used against symbol
'__tls_get_addr'; recompile with -fPIC
>>> defined in obj/src/thread/__tls_get_addr.lo
>>> referenced by __tls_get_offset.s:8 (src/thread/s390x/__tls_get_offset.s:8)
>>>               obj/src/thread/s390x/__tls_get_offset.lo:(.text+0x10)

(-fPIC is actually in use.)

Presumably this could be fixed in lld, considering GNU ld seems fine
with it. But I figured that, since glibc also marks __tls_get_addr
hidden for s390x, musl should probably just do the same anyway.


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-11-23 12:15   ` Alex Rønne Petersen
@ 2024-11-23 12:36     ` Alexander Monakov
  2024-11-23 12:57       ` Alex Rønne Petersen
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Monakov @ 2024-11-23 12:36 UTC (permalink / raw)
  To: Alex Rønne Petersen; +Cc: musl


On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:

> On Sat, Nov 23, 2024 at 9:30 AM Alexander Monakov <amonakov@ispras.ru> wrote:
> >
> > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> >
> > > Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc. This fixes a
> > > linker error when building musl libc.so with zig cc.
> >
> > Hm, on s390 __tls_get_addr is not used for TLS ABI, so it's fine that it ends up
> > hidden in libc.so. Unusual.
> >
> > (linkers must take the most restrictive visibility from all mentions of a symbol)
> >
> > I'm curious, what kind of error with zig cc were you seeing?
> 
> This:
> 
> ld.lld: error: relocation R_390_PC32DBL cannot be used against symbol
> '__tls_get_addr'; recompile with -fPIC
> >>> defined in obj/src/thread/__tls_get_addr.lo
> >>> referenced by __tls_get_offset.s:8 (src/thread/s390x/__tls_get_offset.s:8)
> >>>               obj/src/thread/s390x/__tls_get_offset.lo:(.text+0x10)
> 
> (-fPIC is actually in use.)
> 
> Presumably this could be fixed in lld, considering GNU ld seems fine
> with it. But I figured that, since glibc also marks __tls_get_addr
> hidden for s390x, musl should probably just do the same anyway.

I see, thanks. Your commit message was confusing to me, because unlike
__syscall_ret and the like, __tls_get_addr is not an internal helper,
it may not have hidden visibility anywhere except s390. So it felt like
the commit message was drawing a false parallel.

I would love this to land with a clearer commit message, but that's up
to Rich and yourself to sort out.

Alexander


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-11-23 12:36     ` Alexander Monakov
@ 2024-11-23 12:57       ` Alex Rønne Petersen
  2024-11-29 13:48         ` Rich Felker
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Rønne Petersen @ 2024-11-23 12:57 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: musl

On Sat, Nov 23, 2024 at 1:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
> On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
>
> > On Sat, Nov 23, 2024 at 9:30 AM Alexander Monakov <amonakov@ispras.ru> wrote:
> > >
> > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > >
> > > > Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc. This fixes a
> > > > linker error when building musl libc.so with zig cc.
> > >
> > > Hm, on s390 __tls_get_addr is not used for TLS ABI, so it's fine that it ends up
> > > hidden in libc.so. Unusual.
> > >
> > > (linkers must take the most restrictive visibility from all mentions of a symbol)
> > >
> > > I'm curious, what kind of error with zig cc were you seeing?
> >
> > This:
> >
> > ld.lld: error: relocation R_390_PC32DBL cannot be used against symbol
> > '__tls_get_addr'; recompile with -fPIC
> > >>> defined in obj/src/thread/__tls_get_addr.lo
> > >>> referenced by __tls_get_offset.s:8 (src/thread/s390x/__tls_get_offset.s:8)
> > >>>               obj/src/thread/s390x/__tls_get_offset.lo:(.text+0x10)
> >
> > (-fPIC is actually in use.)
> >
> > Presumably this could be fixed in lld, considering GNU ld seems fine
> > with it. But I figured that, since glibc also marks __tls_get_addr
> > hidden for s390x, musl should probably just do the same anyway.
>
> I see, thanks. Your commit message was confusing to me, because unlike
> __syscall_ret and the like, __tls_get_addr is not an internal helper,
> it may not have hidden visibility anywhere except s390. So it felt like
> the commit message was drawing a false parallel.
>
> I would love this to land with a clearer commit message, but that's up
> to Rich and yourself to sort out.

Yeah, I think that's fair. I wrote the commit message before I
actually investigated in detail how __tls_get_addr is supposed to be
handled for s390x.

Should I re-send the patch with an updated commit message, or how is
this usually handled?


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-11-23 12:57       ` Alex Rønne Petersen
@ 2024-11-29 13:48         ` Rich Felker
  2024-11-29 19:49           ` Alex Rønne Petersen
  0 siblings, 1 reply; 16+ messages in thread
From: Rich Felker @ 2024-11-29 13:48 UTC (permalink / raw)
  To: Alex Rønne Petersen; +Cc: Alexander Monakov, musl

On Sat, Nov 23, 2024 at 01:57:16PM +0100, Alex Rønne Petersen wrote:
> On Sat, Nov 23, 2024 at 1:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> >
> > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> >
> > > On Sat, Nov 23, 2024 at 9:30 AM Alexander Monakov <amonakov@ispras.ru> wrote:
> > > >
> > > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > > >
> > > > > Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc. This fixes a
> > > > > linker error when building musl libc.so with zig cc.
> > > >
> > > > Hm, on s390 __tls_get_addr is not used for TLS ABI, so it's fine that it ends up
> > > > hidden in libc.so. Unusual.
> > > >
> > > > (linkers must take the most restrictive visibility from all mentions of a symbol)
> > > >
> > > > I'm curious, what kind of error with zig cc were you seeing?
> > >
> > > This:
> > >
> > > ld.lld: error: relocation R_390_PC32DBL cannot be used against symbol
> > > '__tls_get_addr'; recompile with -fPIC
> > > >>> defined in obj/src/thread/__tls_get_addr.lo
> > > >>> referenced by __tls_get_offset.s:8 (src/thread/s390x/__tls_get_offset.s:8)
> > > >>>               obj/src/thread/s390x/__tls_get_offset.lo:(.text+0x10)
> > >
> > > (-fPIC is actually in use.)
> > >
> > > Presumably this could be fixed in lld, considering GNU ld seems fine
> > > with it. But I figured that, since glibc also marks __tls_get_addr
> > > hidden for s390x, musl should probably just do the same anyway.
> >
> > I see, thanks. Your commit message was confusing to me, because unlike
> > __syscall_ret and the like, __tls_get_addr is not an internal helper,
> > it may not have hidden visibility anywhere except s390. So it felt like
> > the commit message was drawing a false parallel.
> >
> > I would love this to land with a clearer commit message, but that's up
> > to Rich and yourself to sort out.
> 
> Yeah, I think that's fair. I wrote the commit message before I
> actually investigated in detail how __tls_get_addr is supposed to be
> handled for s390x.
> 
> Should I re-send the patch with an updated commit message, or how is
> this usually handled?

While s390x doesn't need __tls_get_addr to be a public symbol, I'd
kinda prefer not to have an arch-specific hack to make it hidden.
Looking at the code, it's got to be significantly gratuitously slow
having __tls_get_offset making a second function call to
__tls_get_addr, setting up a stack frame and all.

The __tls_get_offset code dates back to 2016 when it was actually
necessary to call into C code in case new TLS needed to be installed.
Since 2019 (9d44b6460a) that's not necessary, so I think we could just
open code the asm for __tls_get_offset entirely and have it be
decently fast.

Rich


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-11-29 13:48         ` Rich Felker
@ 2024-11-29 19:49           ` Alex Rønne Petersen
  2024-11-30  3:20             ` Rich Felker
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Rønne Petersen @ 2024-11-29 19:49 UTC (permalink / raw)
  To: Rich Felker; +Cc: Alexander Monakov, musl

On Fri, Nov 29, 2024 at 2:48 PM Rich Felker <dalias@libc.org> wrote:
>
> On Sat, Nov 23, 2024 at 01:57:16PM +0100, Alex Rønne Petersen wrote:
> > On Sat, Nov 23, 2024 at 1:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> > >
> > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > >
> > > > On Sat, Nov 23, 2024 at 9:30 AM Alexander Monakov <amonakov@ispras.ru> wrote:
> > > > >
> > > > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > > > >
> > > > > > Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc. This fixes a
> > > > > > linker error when building musl libc.so with zig cc.
> > > > >
> > > > > Hm, on s390 __tls_get_addr is not used for TLS ABI, so it's fine that it ends up
> > > > > hidden in libc.so. Unusual.
> > > > >
> > > > > (linkers must take the most restrictive visibility from all mentions of a symbol)
> > > > >
> > > > > I'm curious, what kind of error with zig cc were you seeing?
> > > >
> > > > This:
> > > >
> > > > ld.lld: error: relocation R_390_PC32DBL cannot be used against symbol
> > > > '__tls_get_addr'; recompile with -fPIC
> > > > >>> defined in obj/src/thread/__tls_get_addr.lo
> > > > >>> referenced by __tls_get_offset.s:8 (src/thread/s390x/__tls_get_offset.s:8)
> > > > >>>               obj/src/thread/s390x/__tls_get_offset.lo:(.text+0x10)
> > > >
> > > > (-fPIC is actually in use.)
> > > >
> > > > Presumably this could be fixed in lld, considering GNU ld seems fine
> > > > with it. But I figured that, since glibc also marks __tls_get_addr
> > > > hidden for s390x, musl should probably just do the same anyway.
> > >
> > > I see, thanks. Your commit message was confusing to me, because unlike
> > > __syscall_ret and the like, __tls_get_addr is not an internal helper,
> > > it may not have hidden visibility anywhere except s390. So it felt like
> > > the commit message was drawing a false parallel.
> > >
> > > I would love this to land with a clearer commit message, but that's up
> > > to Rich and yourself to sort out.
> >
> > Yeah, I think that's fair. I wrote the commit message before I
> > actually investigated in detail how __tls_get_addr is supposed to be
> > handled for s390x.
> >
> > Should I re-send the patch with an updated commit message, or how is
> > this usually handled?
>
> While s390x doesn't need __tls_get_addr to be a public symbol, I'd
> kinda prefer not to have an arch-specific hack to make it hidden.
> Looking at the code, it's got to be significantly gratuitously slow
> having __tls_get_offset making a second function call to
> __tls_get_addr, setting up a stack frame and all.
>
> The __tls_get_offset code dates back to 2016 when it was actually
> necessary to call into C code in case new TLS needed to be installed.
> Since 2019 (9d44b6460a) that's not necessary, so I think we could just
> open code the asm for __tls_get_offset entirely and have it be
> decently fast.

That sounds reasonable. I don't have a ton of experience with writing
s390x assembly, though. I can do the obvious thing and extract the
compiled logic from __tls_get_addr without the calling convention
fluff. Would that be sufficient?


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-11-29 19:49           ` Alex Rønne Petersen
@ 2024-11-30  3:20             ` Rich Felker
  2024-11-30 17:51               ` Fangrui Song
  0 siblings, 1 reply; 16+ messages in thread
From: Rich Felker @ 2024-11-30  3:20 UTC (permalink / raw)
  To: Alex Rønne Petersen; +Cc: Alexander Monakov, musl

On Fri, Nov 29, 2024 at 08:49:00PM +0100, Alex Rønne Petersen wrote:
> On Fri, Nov 29, 2024 at 2:48 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Sat, Nov 23, 2024 at 01:57:16PM +0100, Alex Rønne Petersen wrote:
> > > On Sat, Nov 23, 2024 at 1:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> > > >
> > > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > > >
> > > > > On Sat, Nov 23, 2024 at 9:30 AM Alexander Monakov <amonakov@ispras.ru> wrote:
> > > > > >
> > > > > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > > > > >
> > > > > > > > Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc. This fixes a
> > > > > > > linker error when building musl libc.so with zig cc.
> > > > > >
> > > > > > Hm, on s390 __tls_get_addr is not used for TLS ABI, so it's fine that it ends up
> > > > > > hidden in libc.so. Unusual.
> > > > > >
> > > > > > (linkers must take the most restrictive visibility from all mentions of a symbol)
> > > > > >
> > > > > > I'm curious, what kind of error with zig cc were you seeing?
> > > > >
> > > > > This:
> > > > >
> > > > > ld.lld: error: relocation R_390_PC32DBL cannot be used against symbol
> > > > > '__tls_get_addr'; recompile with -fPIC
> > > > > >>> defined in obj/src/thread/__tls_get_addr.lo
> > > > > >>> referenced by __tls_get_offset.s:8 (src/thread/s390x/__tls_get_offset.s:8)
> > > > > >>>               obj/src/thread/s390x/__tls_get_offset.lo:(.text+0x10)
> > > > >
> > > > > (-fPIC is actually in use.)
> > > > >
> > > > > Presumably this could be fixed in lld, considering GNU ld seems fine
> > > > > with it. But I figured that, since glibc also marks __tls_get_addr
> > > > > hidden for s390x, musl should probably just do the same anyway.
> > > >
> > > > I see, thanks. Your commit message was confusing to me, because unlike
> > > > __syscall_ret and the like, __tls_get_addr is not an internal helper,
> > > > it may not have hidden visibility anywhere except s390. So it felt like
> > > > the commit message was drawing a false parallel.
> > > >
> > > > I would love this to land with a clearer commit message, but that's up
> > > > to Rich and yourself to sort out.
> > >
> > > Yeah, I think that's fair. I wrote the commit message before I
> > > actually investigated in detail how __tls_get_addr is supposed to be
> > > handled for s390x.
> > >
> > > Should I re-send the patch with an updated commit message, or how is
> > > this usually handled?
> >
> > While s390x doesn't need __tls_get_addr to be a public symbol, I'd
> > kinda prefer not to have an arch-specific hack to make it hidden.
> > Looking at the code, it's got to be significantly gratuitously slow
> > having __tls_get_offset making a second function call to
> > __tls_get_addr, setting up a stack frame and all.
> >
> > The __tls_get_offset code dates back to 2016 when it was actually
> > necessary to call into C code in case new TLS needed to be installed.
> > Since 2019 (9d44b6460a) that's not necessary, so I think we could just
> > open code the asm for __tls_get_offset entirely and have it be
> > decently fast.
> 
> That sounds reasonable. I don't have a ton of experience with writing
> s390x assembly, though. I can do the obvious thing and extract the
> compiled logic from __tls_get_addr without the calling convention
> fluff. Would that be sufficient?

That's what I was looking at doing. Basically just compiling a
modified version of __tls_get_addr that subtracts the thread pointer,
then prepending the code to load the index address from the GOT
pointer argument in r12.
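A rough C sketch of that computation, for illustration only. The struct fake_thread layout and both function names are assumptions standing in for musl's real thread structure and symbols; the sketch only shows the arithmetic, not the GOT/r12 handling done in asm:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the start of the thread structure; only the
 * fields this sketch needs. The layout is an assumption. */
struct fake_thread {
	struct fake_thread *self;
	uintptr_t *dtv;
};

/* Generic lookup: dtv[module] + offset yields the absolute address. */
static void *sketch_tls_get_addr(struct fake_thread *tp, const size_t v[2])
{
	return (void *)(tp->dtv[v[0]] + v[1]);
}

/* s390x-style variant: the same lookup, but the result is returned
 * relative to the thread pointer instead of as an absolute address. */
static uintptr_t sketch_tls_get_offset(struct fake_thread *tp, const size_t v[2])
{
	return tp->dtv[v[0]] + v[1] - (uintptr_t)tp;
}
```

Adding the thread pointer back to the second result gives the same address as the first, which is what the caller does on s390x.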

A further optimization later could be storing the address with tp
pre-subtracted in the dtv. This would also be optimal for archs with
TLSDESC support, at the expense of an extra addition in legacy
__tls_get_addr access. On some archs it may even save a temp register
in the TLSDESC function.
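To illustrate the trade-off, here is a hypothetical sketch in which each dtv slot already holds (module base - tp). The names dtv_rel, rel_offset and abs_addr are made up for this example and do not exist in musl:

```c
#include <stddef.h>
#include <stdint.h>

/* Offset-returning path (TLSDESC or __tls_get_offset style): with
 * pre-subtracted slots, no subtraction of tp is needed. */
static uintptr_t rel_offset(const uintptr_t *dtv_rel, const size_t v[2])
{
	return dtv_rel[v[0]] + v[1];
}

/* Legacy __tls_get_addr style: pays one extra addition of tp to get
 * back to an absolute address. */
static void *abs_addr(uintptr_t tp, const uintptr_t *dtv_rel, const size_t v[2])
{
	return (void *)(tp + dtv_rel[v[0]] + v[1]);
}
```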

Rich


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-11-30  3:20             ` Rich Felker
@ 2024-11-30 17:51               ` Fangrui Song
  2024-12-12 17:45                 ` Alex Rønne Petersen
  0 siblings, 1 reply; 16+ messages in thread
From: Fangrui Song @ 2024-11-30 17:51 UTC (permalink / raw)
  To: musl; +Cc: Alex Rønne Petersen, Alexander Monakov

On Fri, Nov 29, 2024 at 7:20 PM Rich Felker <dalias@libc.org> wrote:
>
> On Fri, Nov 29, 2024 at 08:49:00PM +0100, Alex Rønne Petersen wrote:
> > On Fri, Nov 29, 2024 at 2:48 PM Rich Felker <dalias@libc.org> wrote:
> > >
> > > On Sat, Nov 23, 2024 at 01:57:16PM +0100, Alex Rønne Petersen wrote:
> > > > On Sat, Nov 23, 2024 at 1:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> > > > >
> > > > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > > > >
> > > > > > On Sat, Nov 23, 2024 at 9:30 AM Alexander Monakov <amonakov@ispras.ru> wrote:
> > > > > > >
> > > > > > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > > > > > >
> > > > > > > > > Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc. This fixes a
> > > > > > > > linker error when building musl libc.so with zig cc.
> > > > > > >
> > > > > > > Hm, on s390 __tls_get_addr is not used for TLS ABI, so it's fine that it ends up
> > > > > > > hidden in libc.so. Unusual.
> > > > > > >
> > > > > > > (linkers must take the most restrictive visibility from all mentions of a symbol)
> > > > > > >
> > > > > > > I'm curious, what kind of error with zig cc were you seeing?
> > > > > >
> > > > > > This:
> > > > > >
> > > > > > ld.lld: error: relocation R_390_PC32DBL cannot be used against symbol
> > > > > > '__tls_get_addr'; recompile with -fPIC
> > > > > > >>> defined in obj/src/thread/__tls_get_addr.lo
> > > > > > >>> referenced by __tls_get_offset.s:8 (src/thread/s390x/__tls_get_offset.s:8)
> > > > > > >>>               obj/src/thread/s390x/__tls_get_offset.lo:(.text+0x10)
> > > > > >
> > > > > > (-fPIC is actually in use.)
> > > > > >
> > > > > > Presumably this could be fixed in lld, considering GNU ld seems fine
> > > > > > with it. But I figured that, since glibc also marks __tls_get_addr
> > > > > > hidden for s390x, musl should probably just do the same anyway.
> > > > >
> > > > > I see, thanks. Your commit message was confusing to me, because unlike
> > > > > __syscall_ret and the like, __tls_get_addr is not an internal helper,
> > > > > it may not have hidden visibility anywhere except s390. So it felt like
> > > > > the commit message was drawing a false parallel.
> > > > >
> > > > > I would love this to land with a clearer commit message, but that's up
> > > > > to Rich and yourself to sort out.
> > > >
> > > > Yeah, I think that's fair. I wrote the commit message before I
> > > > actually investigated in detail how __tls_get_addr is supposed to be
> > > > handled for s390x.
> > > >
> > > > Should I re-send the patch with an updated commit message, or how is
> > > > this usually handled?
> > >
> > > While s390x doesn't need __tls_get_addr to be a public symbol, I'd
> > > kinda prefer not to have an arch-specific hack to make it hidden.
> > > Looking at the code, it's got to be significantly gratuitously slow
> > > having __tls_get_offset making a second function call to
> > > __tls_get_addr, setting up a stack frame and all.
> > >
> > > The __tls_get_offset code dates back to 2016 when it was actually
> > > necessary to call into C code in case new TLS needed to be installed.
> > > Since 2019 (9d44b6460a) that's not necessary, so I think we could just
> > > open code the asm for __tls_get_offset entirely and have it be
> > > decently fast.
> >
> > That sounds reasonable. I don't have a ton of experience with writing
> > s390x assembly, though. I can do the obvious thing and extract the
> > compiled logic from __tls_get_addr without the calling convention
> > fluff. Would that be sufficient?
>
> That's what I was looking at doing. Basically just compiling a
> modified version of __tls_get_addr that subtracts the thread pointer,
> then prepending the code to load the index address from the GOT
> pointer argument in r12.
>
> A further optimization later could be storing the address with tp
> pre-subtracted in the dtv. This would also be optimal for archs with
> TLSDESC support, at the expense of an extra addition in legacy
> __tls_get_addr access. On some archs it may even save a temp register
> in the TLSDESC function.
>
> Rich

I am not versed in s390x assembly, but I have some notes about __tls_get_offset:

https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model

The 32-bit ABI had to use __tls_get_offset because some nice
general-instructions-extension was unavailable when the ABI was
codified.
The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate.


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-11-30 17:51               ` Fangrui Song
@ 2024-12-12 17:45                 ` Alex Rønne Petersen
  2024-12-13 11:18                   ` Rich Felker
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Rønne Petersen @ 2024-12-12 17:45 UTC (permalink / raw)
  To: Fangrui Song, dalias; +Cc: musl, Alexander Monakov

On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@maskray.me> wrote:
>
> On Fri, Nov 29, 2024 at 7:20 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Fri, Nov 29, 2024 at 08:49:00PM +0100, Alex Rønne Petersen wrote:
> > > On Fri, Nov 29, 2024 at 2:48 PM Rich Felker <dalias@libc.org> wrote:
> > > >
> > > > On Sat, Nov 23, 2024 at 01:57:16PM +0100, Alex Rønne Petersen wrote:
> > > > > On Sat, Nov 23, 2024 at 1:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> > > > > >
> > > > > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > > > > >
> > > > > > > On Sat, Nov 23, 2024 at 9:30 AM Alexander Monakov <amonakov@ispras.ru> wrote:
> > > > > > > >
> > > > > > > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > > > > > > >
> > > > > > > > > > Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc. This fixes a
> > > > > > > > > linker error when building musl libc.so with zig cc.
> > > > > > > >
> > > > > > > > Hm, on s390 __tls_get_addr is not used for TLS ABI, so it's fine that it ends up
> > > > > > > > hidden in libc.so. Unusual.
> > > > > > > >
> > > > > > > > (linkers must take the most restrictive visibility from all mentions of a symbol)
> > > > > > > >
> > > > > > > > I'm curious, what kind of error with zig cc were you seeing?
> > > > > > >
> > > > > > > This:
> > > > > > >
> > > > > > > ld.lld: error: relocation R_390_PC32DBL cannot be used against symbol
> > > > > > > '__tls_get_addr'; recompile with -fPIC
> > > > > > > >>> defined in obj/src/thread/__tls_get_addr.lo
> > > > > > > >>> referenced by __tls_get_offset.s:8 (src/thread/s390x/__tls_get_offset.s:8)
> > > > > > > >>>               obj/src/thread/s390x/__tls_get_offset.lo:(.text+0x10)
> > > > > > >
> > > > > > > (-fPIC is actually in use.)
> > > > > > >
> > > > > > > Presumably this could be fixed in lld, considering GNU ld seems fine
> > > > > > > with it. But I figured that, since glibc also marks __tls_get_addr
> > > > > > > hidden for s390x, musl should probably just do the same anyway.
> > > > > >
> > > > > > I see, thanks. Your commit message was confusing to me, because unlike
> > > > > > __syscall_ret and the like, __tls_get_addr is not an internal helper,
> > > > > > it may not have hidden visibility anywhere except s390. So it felt like
> > > > > > the commit message was drawing a false parallel.
> > > > > >
> > > > > > I would love this to land with a clearer commit message, but that's up
> > > > > > to Rich and yourself to sort out.
> > > > >
> > > > > Yeah, I think that's fair. I wrote the commit message before I
> > > > > actually investigated in detail how __tls_get_addr is supposed to be
> > > > > handled for s390x.
> > > > >
> > > > > Should I re-send the patch with an updated commit message, or how is
> > > > > this usually handled?
> > > >
> > > > While s390x doesn't need __tls_get_addr to be a public symbol, I'd
> > > > kinda prefer not to have an arch-specific hack to make it hidden.
> > > > Looking at the code, it's got to be significantly gratuitously slow
> > > > having __tls_get_offset making a second function call to
> > > > __tls_get_addr, setting up a stack frame and all.
> > > >
> > > > The __tls_get_offset code dates back to 2016 when it was actually
> > > > necessary to call into C code in case new TLS needed to be installed.
> > > > Since 2019 (9d44b6460a) that's not necessary, so I think we could just
> > > > open code the asm for __tls_get_offset entirely and have it be
> > > > decently fast.
> > >
> > > That sounds reasonable. I don't have a ton of experience with writing
> > > s390x assembly, though. I can do the obvious thing and extract the
> > > compiled logic from __tls_get_addr without the calling convention
> > > fluff. Would that be sufficient?
> >
> > That's what I was looking at doing. Basically just compiling a
> > modified version of __tls_get_addr that subtracts the thread pointer,
> > then prepending the code to load the index address from the GOT
> > pointer argument in r12.
> >
> > A further optimization later could be storing the address with tp
> > pre-subtracted in the dtv. This would also be optimal for archs with
> > TLSDESC support, at the expense of an extra addition in legacy
> > __tls_get_addr access. On some archs it may even save a temp register
> > in the TLSDESC function.
> >
> > Rich
>
> (I am not versed in s390x assembly, but I have some notes about __tls_get_offset
>
> https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
>
> The 32-bit ABI had to use __tls_get_offset because some nice
> general-instructions-extension was unavailable when the ABI was
> codified.
> The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate.

From your notes, it sounds like __tls_get_addr has to be hidden, even
if we don't actually make use of it in __tls_get_offset. Is my
understanding correct?

If yes, what would be the preferred way to achieve this in musl?


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-12-12 17:45                 ` Alex Rønne Petersen
@ 2024-12-13 11:18                   ` Rich Felker
  2024-12-13 19:04                     ` Alex Rønne Petersen
  0 siblings, 1 reply; 16+ messages in thread
From: Rich Felker @ 2024-12-13 11:18 UTC (permalink / raw)
  To: Alex Rønne Petersen; +Cc: Fangrui Song, musl, Alexander Monakov

On Thu, Dec 12, 2024 at 06:45:39PM +0100, Alex Rønne Petersen wrote:
> On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@maskray.me> wrote:
> > (I am not versed in s390x assembly, but I have some notes about __tls_get_offset
> >
> > https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
> >
> > The 32-bit ABI had to use __tls_get_offset because some nice
> > general-instructions-extension was unavailable when the ABI was
> > codified.
> > > The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate.
> 
> From your notes, it sounds like __tls_get_addr has to be hidden, even
> if we don't actually make use of it in __tls_get_offset. Is my
> understanding correct?
> 
> If yes, what would be the preferred way to achieve this in musl?

There is no requirement for a symbol to be hidden unless it violates
namespace, which is not the case here. The problem is that the code in
__tls_get_offset is performing a call to __tls_get_addr in a manner
that's not valid unless the call target is local.

My preferred fix would be getting rid of the call and inlining
__tls_get_addr into __tls_get_offset. This was not possible back when
the port was added because __tls_get_addr had a complex code path for
installing new TLS on first-access. That was changed long ago, so now
it's a fairly trivial instruction sequence.

Rich


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-12-13 11:18                   ` Rich Felker
@ 2024-12-13 19:04                     ` Alex Rønne Petersen
  2024-12-22 13:23                       ` Rich Felker
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Rønne Petersen @ 2024-12-13 19:04 UTC (permalink / raw)
  To: Rich Felker; +Cc: Fangrui Song, musl, Alexander Monakov

On Fri, Dec 13, 2024 at 12:18 PM Rich Felker <dalias@libc.org> wrote:
>
> On Thu, Dec 12, 2024 at 06:45:39PM +0100, Alex Rønne Petersen wrote:
> > On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@maskray.me> wrote:
> > > (I am not versed in s390x assembly, but I have some notes about __tls_get_offset
> > >
> > > https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
> > >
> > > The 32-bit ABI had to use __tls_get_offset because some nice
> > > general-instructions-extension was unavailable when the ABI was
> > > codified.
> > > The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate..
> >
> > From your notes, it sounds like __tls_get_addr has to be hidden, even
> > if we don't actually make use of it in __tls_get_offset. Is my
> > understanding correct?
> >
> > If yes, what would be the preferred way to achieve this in musl?
>
> There is no requirement for a symbol to be hidden unless it violates
> namespace, which is not the case here. The problem is that the code in
> __tls_get_offset is performing a call to __tls_get_addr in a manner
> that's not valid unless the call target is local.
>
> My preferred fix would be getting rid of the call and inlining
> __tls_get_addr into __tls_get_offset. This was not possible back when
> the port was added because __tls_get_addr had a complex code path for
> installing new TLS on first-access. That was changed long ago, so now
> it's a fairly trivial instruction sequence.

Before I send a patch, just to confirm, is this what you have in mind?

        .global __tls_get_offset
        .type __tls_get_offset,%function
__tls_get_offset:
        stmg  %r14, %r15, 112(%r15)
        aghi  %r15, -160

        ear   %r0, %a0
        sllg  %r0, %r0, 32
        ear   %r0, %a1

        la    %r1, 0(%r2, %r12)

        lg    %r3, 0(%r1)
        sllg  %r4, %r3, 3
        lg    %r5, 8(%r0)
        lg    %r2, 0(%r4, %r5)
        ag    %r2, 8(%r1)
        sgr   %r2, %r0

        lmg   %r14, %r15, 272(%r15)
        br    %r14
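
One point that may be worth double-checking in the sequence above,
offered as a sketch rather than a tested result: in z/Architecture
address generation, a base or index field of 0 means "no register", so
an operand written as 8(%r0) resolves to address 8 and not to the
contents of %r0 plus 8. The small Python model below illustrates that
rule; effective_address and the register values are made up for the
example.

```python
MASK64 = (1 << 64) - 1


def effective_address(regs, base, index, disp):
    """Model base + index + displacement address generation.

    A base or index field of 0 contributes zero, regardless of what
    general register 0 currently holds.
    """
    b = regs[base] if base != 0 else 0
    x = regs[index] if index != 0 else 0
    return (b + x + disp) & MASK64


regs = [0] * 16
regs[0] = 0x1000  # pretend the thread pointer was assembled in %r0
regs[3] = 0x1000  # the same value held in %r3 instead

print(hex(effective_address(regs, 0, 0, 8)))  # operand 8(%r0)
print(hex(effective_address(regs, 3, 0, 8)))  # operand 8(%r3)
```

If that reading is right, keeping the thread pointer in a register other
than %r0 would sidestep the issue for the "lg %r5, 8(%r0)" load, while
"sgr %r2, %r0" is unaffected because it names %r0 as an operand register
rather than as a base.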


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-12-13 19:04                     ` Alex Rønne Petersen
@ 2024-12-22 13:23                       ` Rich Felker
  2024-12-22 13:53                         ` Alex Rønne Petersen
  0 siblings, 1 reply; 16+ messages in thread
From: Rich Felker @ 2024-12-22 13:23 UTC (permalink / raw)
  To: Alex Rønne Petersen; +Cc: Fangrui Song, musl, Alexander Monakov

On Fri, Dec 13, 2024 at 08:04:22PM +0100, Alex Rønne Petersen wrote:
> On Fri, Dec 13, 2024 at 12:18 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Thu, Dec 12, 2024 at 06:45:39PM +0100, Alex Rønne Petersen wrote:
> > > On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@maskray.me> wrote:
> > > > (I am not versed in s390x assembly, but I have some notes about __tls_get_offset
> > > >
> > > > https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
> > > >
> > > > The 32-bit ABI had to use __tls_get_offset because some nice
> > > > general-instructions-extension was unavailable when the ABI was
> > > > codified.
> > > > The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate..
> > >
> > > From your notes, it sounds like __tls_get_addr has to be hidden, even
> > > if we don't actually make use of it in __tls_get_offset. Is my
> > > understanding correct?
> > >
> > > If yes, what would be the preferred way to achieve this in musl?
> >
> > There is no requirement for a symbol to be hidden unless it violates
> > namespace, which is not the case here. The problem is that the code in
> > __tls_get_offset is performing a call to __tls_get_addr in a manner
> > that's not valid unless the call target is local.
> >
> > My preferred fix would be getting rid of the call and inlining
> > __tls_get_addr into __tls_get_offset. This was not possible back when
> > the port was added because __tls_get_addr had a complex code path for
> > installing new TLS on first-access. That was changed long ago, so now
> > it's a fairly trivial instruction sequence.
> 
> Before I send a patch, just to confirm, is this what you have in mind?
> 
>         .global __tls_get_offset
>         .type __tls_get_offset,%function
> __tls_get_offset:
>         stmg  %r14, %r15, 112(%r15)
>         aghi  %r15, -160
> 
>         ear   %r0, %a0
>         sllg  %r0, %r0, 32
>         ear   %r0, %a1
> 
>         la    %r1, 0(%r2, %r12)
> 
>         lg    %r3, 0(%r1)
>         sllg  %r4, %r3, 3
>         lg    %r5, 8(%r0)
>         lg    %r2, 0(%r4, %r5)
>         ag    %r2, 8(%r1)
>         sgr   %r2, %r0
> 
>         lmg   %r14, %r15, 272(%r15)
>         br    %r14

I'm not clear on what you're setting up the stack frame (munging r14
and r15) for. My disasm of existing __tls_get_addr doesn't do that,
and it doesn't seem to be useful for a leaf function -- or desirable
for one that's a critical hot path.

The 3 lines before loading the actual argument from r2+r12 also look
wrong. I don't understand s390x asm very well but I don't see how they
could do anything meaningful there.

The disasm I have is:

0000000000000000 <__tls_get_addr>:
   0:	e3 10 20 00 00 04 	lg	%r1,0(%r2)
   6:	eb 11 00 03 00 0d 	sllg	%r1,%r1,3
   c:	b2 4f 00 30       	ear	%r3,%a0
  10:	eb 33 00 20 00 0d 	sllg	%r3,%r3,32
  16:	b2 4f 00 31       	ear	%r3,%a1
  1a:	e3 30 30 08 00 04 	lg	%r3,8(%r3)
  20:	e3 11 30 00 00 04 	lg	%r1,0(%r1,%r3)
  26:	e3 10 20 08 00 08 	ag	%r1,8(%r2)
  2c:	b9 04 00 21       	lgr	%r2,%r1
  30:	07 fe             	br	%r14

and AIUI the only changes that should be made are adding at the
beginning:

	la	%r2,0(%r2,%r12)

and subtracting off the thread pointer before return. (This latter
part wouldn't be needed if we switched to storing tp-relative addresses
in the dtv, but that would be a separate change.)
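
As a rough model of the two changes described above (an illustrative
Python sketch, not the final patch): memory is a dict of 8-byte words
keyed by address, and the function name, the mem dict and all addresses
are assumptions made up for the example. The offsets used (dtv pointer
at tp+8, module ID and offset stored as two consecutive 8-byte words)
are read off the disassembly.

```python
MASK64 = (1 << 64) - 1


def tls_get_offset(mem, tp, got, arg):
    """Model __tls_get_offset as the __tls_get_addr body plus two extra steps."""
    idx = got + arg                  # added first: la %r2,0(%r2,%r12)
    module = mem[idx]                # lg   %r1,0(%r2)
    dtv = mem[tp + 8]                # lg   %r3,8(%r3), with tp in %r3
    base = mem[dtv + module * 8]     # sllg by 3, then lg %r1,0(%r1,%r3)
    addr = base + mem[idx + 8]       # ag   %r1,8(%r2)
    return (addr - tp) & MASK64      # added last: subtract the thread pointer


# Hypothetical layout: tp = 0x1000, dtv at 0x2000, GOT at 0x3000.
mem = {
    0x1008: 0x2000,  # dtv pointer stored at tp + 8
    0x2008: 0x5000,  # dtv[1]: base of module 1's TLS block
    0x3010: 1,       # module ID
    0x3018: 0x20,    # offset within the module's block
}
print(hex(tls_get_offset(mem, tp=0x1000, got=0x3000, arg=0x10)))
```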

Rich


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-12-22 13:23                       ` Rich Felker
@ 2024-12-22 13:53                         ` Alex Rønne Petersen
  2024-12-22 14:14                           ` Rich Felker
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Rønne Petersen @ 2024-12-22 13:53 UTC (permalink / raw)
  To: Rich Felker; +Cc: Fangrui Song, musl, Alexander Monakov

On Sun, Dec 22, 2024 at 2:23 PM Rich Felker <dalias@libc.org> wrote:
>
> On Fri, Dec 13, 2024 at 08:04:22PM +0100, Alex Rønne Petersen wrote:
> > On Fri, Dec 13, 2024 at 12:18 PM Rich Felker <dalias@libc.org> wrote:
> > >
> > > On Thu, Dec 12, 2024 at 06:45:39PM +0100, Alex Rønne Petersen wrote:
> > > > On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@maskray.me> wrote:
> > > > > (I am not versed in s390x assembly, but I have some notes about __tls_get_offset
> > > > >
> > > > > https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
> > > > >
> > > > > The 32-bit ABI had to use __tls_get_offset because some nice
> > > > > general-instructions-extension was unavailable when the ABI was
> > > > > codified.
> > > > > The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate..
> > > >
> > > > From your notes, it sounds like __tls_get_addr has to be hidden, even
> > > > if we don't actually make use of it in __tls_get_offset. Is my
> > > > understanding correct?
> > > >
> > > > If yes, what would be the preferred way to achieve this in musl?
> > >
> > > There is no requirement for a symbol to be hidden unless it violates
> > > namespace, which is not the case here. The problem is that the code in
> > > __tls_get_offset is performing a call to __tls_get_addr in a manner
> > > that's not valid unless the call target is local.
> > >
> > > My preferred fix would be getting rid of the call and inlining
> > > __tls_get_addr into __tls_get_offset. This was not possible back when
> > > the port was added because __tls_get_addr had a complex code path for
> > > installing new TLS on first-access. That was changed long ago, so now
> > > it's a fairly trivial instruction sequence.
> >
> > Before I send a patch, just to confirm, is this what you have in mind?
> >
> >         .global __tls_get_offset
> >         .type __tls_get_offset,%function
> > __tls_get_offset:
> >         stmg  %r14, %r15, 112(%r15)
> >         aghi  %r15, -160
> >
> >         ear   %r0, %a0
> >         sllg  %r0, %r0, 32
> >         ear   %r0, %a1
> >
> >         la    %r1, 0(%r2, %r12)
> >
> >         lg    %r3, 0(%r1)
> >         sllg  %r4, %r3, 3
> >         lg    %r5, 8(%r0)
> >         lg    %r2, 0(%r4, %r5)
> >         ag    %r2, 8(%r1)
> >         sgr   %r2, %r0
> >
> >         lmg   %r14, %r15, 272(%r15)
> >         br    %r14
>
> I'm not clear on what you're setting up the stack frame (munging r14
> and r15) for. My disasm of existing __tls_get_addr doesn't do that,
> and it doesn't seem to be useful for a leaf function -- or desirable
> for one that's a critical hot path.

That was already there in the __tls_get_offset asm. I wasn't sure if
removing it is fine ABI-wise. If it is, then I'll definitely do so.

>
> The 3 lines before loading the actual argument from r2+r12 also look
> wrong. I don't understand s390x asm very well but I don't see how they
> could do anything meaningful there.

%r0 is used twice: Once in the code inlined from __tls_get_addr and
once more to subtract the thread pointer before return as you mention
below.

I guess it might be a bit confusing in a side-by-side comparison
because I changed the register number usage to be (IMO) more
understandable when just reading the new __tls_get_offset in a vacuum
(vs inlining __tls_get_addr's register number usage unchanged).

>
> The disasm I have is:
>
> 0000000000000000 <__tls_get_addr>:
>    0:   e3 10 20 00 00 04       lg      %r1,0(%r2)
>    6:   eb 11 00 03 00 0d       sllg    %r1,%r1,3
>    c:   b2 4f 00 30             ear     %r3,%a0
>   10:   eb 33 00 20 00 0d       sllg    %r3,%r3,32
>   16:   b2 4f 00 31             ear     %r3,%a1
>   1a:   e3 30 30 08 00 04       lg      %r3,8(%r3)
>   20:   e3 11 30 00 00 04       lg      %r1,0(%r1,%r3)
>   26:   e3 10 20 08 00 08       ag      %r1,8(%r2)
>   2c:   b9 04 00 21             lgr     %r2,%r1
>   30:   07 fe                   br      %r14
>
> and AIUI the only changes that should be made are adding at the
> beginning:
>
>         la      %r2,0(%r2,%r12)
>
> and subtracting off the thread pointer before return. (This latter
> part wouldn't be needed if we switched to storing tp-relative addresses
> in the dtv, but that would be a separate change.)
>
> Rich


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-12-22 13:53                         ` Alex Rønne Petersen
@ 2024-12-22 14:14                           ` Rich Felker
  2024-12-23  7:08                             ` Rich Felker
  0 siblings, 1 reply; 16+ messages in thread
From: Rich Felker @ 2024-12-22 14:14 UTC (permalink / raw)
  To: Alex Rønne Petersen; +Cc: Fangrui Song, musl, Alexander Monakov

On Sun, Dec 22, 2024 at 02:53:33PM +0100, Alex Rønne Petersen wrote:
> On Sun, Dec 22, 2024 at 2:23 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Fri, Dec 13, 2024 at 08:04:22PM +0100, Alex Rønne Petersen wrote:
> > > On Fri, Dec 13, 2024 at 12:18 PM Rich Felker <dalias@libc.org> wrote:
> > > >
> > > > On Thu, Dec 12, 2024 at 06:45:39PM +0100, Alex Rønne Petersen wrote:
> > > > > On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@maskray.me> wrote:
> > > > > > (I am not versed in s390x assembly, but I have some notes about __tls_get_offset
> > > > > >
> > > > > > https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
> > > > > >
> > > > > > The 32-bit ABI had to use __tls_get_offset because some nice
> > > > > > general-instructions-extension was unavailable when the ABI was
> > > > > > codified.
> > > > > > The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate..
> > > > >
> > > > > From your notes, it sounds like __tls_get_addr has to be hidden, even
> > > > > if we don't actually make use of it in __tls_get_offset. Is my
> > > > > understanding correct?
> > > > >
> > > > > If yes, what would be the preferred way to achieve this in musl?
> > > >
> > > > There is no requirement for a symbol to be hidden unless it violates
> > > > namespace, which is not the case here. The problem is that the code in
> > > > __tls_get_offset is performing a call to __tls_get_addr in a manner
> > > > that's not valid unless the call target is local.
> > > >
> > > > My preferred fix would be getting rid of the call and inlining
> > > > __tls_get_addr into __tls_get_offset. This was not possible back when
> > > > the port was added because __tls_get_addr had a complex code path for
> > > > installing new TLS on first-access. That was changed long ago, so now
> > > > it's a fairly trivial instruction sequence.
> > >
> > > Before I send a patch, just to confirm, is this what you have in mind?
> > >
> > >         .global __tls_get_offset
> > >         .type __tls_get_offset,%function
> > > __tls_get_offset:
> > >         stmg  %r14, %r15, 112(%r15)
> > >         aghi  %r15, -160
> > >
> > >         ear   %r0, %a0
> > >         sllg  %r0, %r0, 32
> > >         ear   %r0, %a1
> > >
> > >         la    %r1, 0(%r2, %r12)
> > >
> > >         lg    %r3, 0(%r1)
> > >         sllg  %r4, %r3, 3
> > >         lg    %r5, 8(%r0)
> > >         lg    %r2, 0(%r4, %r5)
> > >         ag    %r2, 8(%r1)
> > >         sgr   %r2, %r0
> > >
> > >         lmg   %r14, %r15, 272(%r15)
> > >         br    %r14
> >
> > I'm not clear on what you're setting up the stack frame (munging r14
> > and r15) for. My disasm of existing __tls_get_addr doesn't do that,
> > and it doesn't seem to be useful for a leaf function -- or desirable
> > for one that's a critical hot path.
> 
> That was already there in the __tls_get_offset asm. I wasn't sure if
> removing it is fine ABI-wise. If it is, then I'll definitely do so.

Must be different gcc version or cflags...

> > The 3 lines before loading the actual argument from r2+r12 also look
> > wrong. I don't understand s390x asm very well but I don't see how they
> > could do anything meaningful there.
> 
> %r0 is used twice: Once in the code inlined from __tls_get_addr and
> once more to subtract the thread pointer before return as you mention
> below.

Ahh, that's just loading the thread pointer. I forgot the weird insn
sequence for that. So I think it looks right.
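
For anyone else who forgets it, the three-instruction sequence can be
modeled as below. This is only a sketch: the function names and the
sample register values are made up, and it relies on EAR copying a
32-bit access register into the low half of a 64-bit general register
while leaving the high half untouched.

```python
MASK64 = (1 << 64) - 1


def ear(gr, ar):
    """EAR: replace the low 32 bits of gr with access register ar."""
    return (gr & 0xFFFFFFFF00000000) | (ar & 0xFFFFFFFF)


def sllg(value, shift):
    """SLLG: 64-bit logical shift left."""
    return (value << shift) & MASK64


def load_thread_pointer(a0, a1, gr=0):
    """Assemble the 64-bit thread pointer from access registers a0 (high) and a1 (low)."""
    gr = ear(gr, a0)    # ear  %r0, %a0
    gr = sllg(gr, 32)   # sllg %r0, %r0, 32
    gr = ear(gr, a1)    # ear  %r0, %a1
    return gr


# The stale contents of the general register do not leak into the result.
print(hex(load_thread_pointer(0x000003FF, 0xF7A01000, gr=0xDEADBEEFCAFEF00D)))
```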

Rich


* Re: [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.
  2024-12-22 14:14                           ` Rich Felker
@ 2024-12-23  7:08                             ` Rich Felker
  0 siblings, 0 replies; 16+ messages in thread
From: Rich Felker @ 2024-12-23  7:08 UTC (permalink / raw)
  To: Alex Rønne Petersen; +Cc: Fangrui Song, musl, Alexander Monakov

On Sun, Dec 22, 2024 at 09:14:25AM -0500, Rich Felker wrote:
> On Sun, Dec 22, 2024 at 02:53:33PM +0100, Alex Rønne Petersen wrote:
> > On Sun, Dec 22, 2024 at 2:23 PM Rich Felker <dalias@libc.org> wrote:
> > >
> > > On Fri, Dec 13, 2024 at 08:04:22PM +0100, Alex Rønne Petersen wrote:
> > > > On Fri, Dec 13, 2024 at 12:18 PM Rich Felker <dalias@libc.org> wrote:
> > > > >
> > > > > On Thu, Dec 12, 2024 at 06:45:39PM +0100, Alex Rønne Petersen wrote:
> > > > > > On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@maskray.me> wrote:
> > > > > > > (I am not versed in s390x assembly, but I have some notes about __tls_get_offset
> > > > > > >
> > > > > > > https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
> > > > > > >
> > > > > > > The 32-bit ABI had to use __tls_get_offset because some nice
> > > > > > > general-instructions-extension was unavailable when the ABI was
> > > > > > > codified.
> > > > > > > The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate..
> > > > > >
> > > > > > From your notes, it sounds like __tls_get_addr has to be hidden, even
> > > > > > if we don't actually make use of it in __tls_get_offset. Is my
> > > > > > understanding correct?
> > > > > >
> > > > > > If yes, what would be the preferred way to achieve this in musl?
> > > > >
> > > > > There is no requirement for a symbol to be hidden unless it violates
> > > > > namespace, which is not the case here. The problem is that the code in
> > > > > __tls_get_offset is performing a call to __tls_get_addr in a manner
> > > > > that's not valid unless the call target is local.
> > > > >
> > > > > My preferred fix would be getting rid of the call and inlining
> > > > > __tls_get_addr into __tls_get_offset. This was not possible back when
> > > > > the port was added because __tls_get_addr had a complex code path for
> > > > > installing new TLS on first-access. That was changed long ago, so now
> > > > > it's a fairly trivial instruction sequence.
> > > >
> > > > Before I send a patch, just to confirm, is this what you have in mind?
> > > >
> > > >         .global __tls_get_offset
> > > >         .type __tls_get_offset,%function
> > > > __tls_get_offset:
> > > >         stmg  %r14, %r15, 112(%r15)
> > > >         aghi  %r15, -160
> > > >
> > > >         ear   %r0, %a0
> > > >         sllg  %r0, %r0, 32
> > > >         ear   %r0, %a1
> > > >
> > > >         la    %r1, 0(%r2, %r12)
> > > >
> > > >         lg    %r3, 0(%r1)
> > > >         sllg  %r4, %r3, 3
> > > >         lg    %r5, 8(%r0)
> > > >         lg    %r2, 0(%r4, %r5)
> > > >         ag    %r2, 8(%r1)
> > > >         sgr   %r2, %r0
> > > >
> > > >         lmg   %r14, %r15, 272(%r15)
> > > >         br    %r14
> > >
> > > I'm not clear on what you're setting up the stack frame (munging r14
> > > and r15) for. My disasm of existing __tls_get_addr doesn't do that,
> > > and it doesn't seem to be useful for a leaf function -- or desirable
> > > for one that's a critical hot path.
> > 
> > That was already there in the __tls_get_offset asm. I wasn't sure if
> > removing it is fine ABI-wise. If it is, then I'll definitely do so.
> 
> Must be different gcc version or cflags...
> 
> > > The 3 lines before loading the actual argument from r2+r12 also look
> > > wrong. I don't understand s390x asm very well but I don't see how they
> > > could do anything meaningful there.
> > 
> > %r0 is used twice: Once in the code inlined from __tls_get_addr and
> > once more to subtract the thread pointer before return as you mention
> > below.
> 
> Ahh, that's just loading the thread pointer. I forgot the weird insn
> sequence for that. So I think it looks right.

Can you go ahead with a patch to do this, omitting the stack frame
setup? Also please let me know if you've tested it or if I should
request that someone test it before applying.

Rich


end of thread, other threads:[~2024-12-23  7:09 UTC | newest]

Thread overview: 16+ messages
2024-11-23  0:20 [musl] [PATCH] s390x: Mark __tls_get_addr hidden before invoking it Alex Rønne Petersen
2024-11-23  8:30 ` Alexander Monakov
2024-11-23 12:15   ` Alex Rønne Petersen
2024-11-23 12:36     ` Alexander Monakov
2024-11-23 12:57       ` Alex Rønne Petersen
2024-11-29 13:48         ` Rich Felker
2024-11-29 19:49           ` Alex Rønne Petersen
2024-11-30  3:20             ` Rich Felker
2024-11-30 17:51               ` Fangrui Song
2024-12-12 17:45                 ` Alex Rønne Petersen
2024-12-13 11:18                   ` Rich Felker
2024-12-13 19:04                     ` Alex Rønne Petersen
2024-12-22 13:23                       ` Rich Felker
2024-12-22 13:53                         ` Alex Rønne Petersen
2024-12-22 14:14                           ` Rich Felker
2024-12-23  7:08                             ` Rich Felker
