mailing list of musl libc
* Removing glibc from the musl .2 ABI
@ 2019-07-11 23:58 A. Wilcox
  2019-07-12  0:51 ` Khem Raj
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: A. Wilcox @ 2019-07-11 23:58 UTC (permalink / raw)
  To: musl


(Full disclosure: I am the principal author of gcompat.)

Hi,

Now that gcompat has matured, I was wondering if perhaps musl should
consider dropping the glibc ABI guarantees when the "2 ABI" lands.

This would make the LFS64 symbol mess completely moot.

It would also allow musl to "fix" a lot of dumb glibc decisions.  I'm
thinking specifically here of things like ctermid(3), which musl could
actually implement correctly if it wasn't being held back by glibc
defining L_ctermid as 9.
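
To illustrate why the constant matters: L_ctermid is a compile-time
macro, so the buffer size a caller reserves is baked into every binary
built against a given header. A minimal sketch (the helper name
tty_path_fits is made up for illustration):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>   /* ctermid, L_ctermid */
#include <string.h>

/* Hypothetical helper: returns 1 if the path ctermid() reports fits in a
 * buffer of L_ctermid bytes, the size callers are told to reserve. */
static int tty_path_fits(void)
{
	char buf[L_ctermid];          /* size fixed when the caller is compiled */
	const char *p = ctermid(buf); /* fills buf, typically with "/dev/tty" */
	assert(p == buf);             /* a non-null argument is returned as-is */
	return strlen(buf) < sizeof buf;
}
```

A caller compiled with L_ctermid == 9 has room for "/dev/tty" plus the
terminating NUL and nothing longer, so a libc that wants to stay
ABI-compatible with such callers cannot hand back a longer path.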

I'm aware this is probably controversial, and it will probably be shot
down quickly, but I thought I would at least suggest this as an option.

Thank you for your consideration.

Best,
--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org




* Re: Removing glibc from the musl .2 ABI
  2019-07-11 23:58 Removing glibc from the musl .2 ABI A. Wilcox
@ 2019-07-12  0:51 ` Khem Raj
  2019-07-12  1:45 ` Rich Felker
  2019-07-17  3:37 ` Rich Felker
  2 siblings, 0 replies; 18+ messages in thread
From: Khem Raj @ 2019-07-12  0:51 UTC (permalink / raw)
  To: musl

On Thu, Jul 11, 2019 at 4:59 PM A. Wilcox <awilfox@adelielinux.org> wrote:
>
> (Full disclosure: I am the principal author of gcompat.)
>
> Hi,
>
> Now that gcompat has matured, I was wondering if perhaps musl should
> consider dropping the glibc ABI guarantees when the "2 ABI" lands.
>
> This would make the LFS64 symbol mess completely moot.
>
> It would also allow musl to "fix" a lot of dumb glibc decisions.  I'm
> thinking specifically here of things like ctermid(3), which musl could
> actually implement correctly if it wasn't being held back by glibc
> defining L_ctermid as 9.
>
> I'm aware this is probably controversial, and it will probably be shot
> down quickly, but I thought I would at least suggest this as an option.
>

I think it's too early to drop it, but we could provide a configure option
for dropping it and keep the defaults, since there are enough pre-compiled
apps which probably are not going to change anytime soon.

> Thank you for your consideration.
>
> Best,
> --arw
>
> --
> A. Wilcox (awilfox)
> Project Lead, Adélie Linux
> https://www.adelielinux.org
>



* Re: Removing glibc from the musl .2 ABI
  2019-07-11 23:58 Removing glibc from the musl .2 ABI A. Wilcox
  2019-07-12  0:51 ` Khem Raj
@ 2019-07-12  1:45 ` Rich Felker
  2019-07-12  1:47   ` Rich Felker
  2019-07-17  3:37 ` Rich Felker
  2 siblings, 1 reply; 18+ messages in thread
From: Rich Felker @ 2019-07-12  1:45 UTC (permalink / raw)
  To: musl

On Thu, Jul 11, 2019 at 06:58:38PM -0500, A. Wilcox wrote:
> (Full disclosure: I am the principal author of gcompat.)
> 
> Hi,
> 
> Now that gcompat has matured, I was wondering if perhaps musl should
> consider dropping the glibc ABI guarantees when the "2 ABI" lands.

It's not decided that it will, or at least not in the near term. I
think the other approach proposed to 64-bit time_t is a lot more
appealing to most existing 32-bit users. I've had out-of-band feedback
from one big user that they depend on ABI stability for the existing
32-bit arch+ABIs and hope there won't be a hard Y2038 EOL for them,
and I myself would also rather prefer not to have to do an ABI switch.

From the beginning, ABI stability was one of the big promises of musl.
I realize we have "enough" time between now and 2038 for putting off
an ABI switch (except for ppl making embedded stuff with really long
lifetimes), so that users who care about ABI stability could stick
with .1 "for now", but then we just push the problem back and they're
unhappy in some moderately-distant future, and probably end up in a
mess when they realize they need time_t's representing times a decade
or two out sooner than they thought...
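
For reference, the Y2038 limit mentioned above is simply the largest
value a signed 32-bit time_t can hold. A small sketch that breaks that
value down (the helper name is hypothetical, and it assumes a POSIX
gmtime_r):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Break down the last second representable in a signed 32-bit time_t.
 * Returns 0 on success. INT32_MAX fits in time_t whether the host
 * time_t is 32 or 64 bits wide. */
static int last_time32_second(struct tm *out)
{
	time_t t = (time_t)INT32_MAX;   /* 2^31 - 1 seconds after the epoch */
	return gmtime_r(&t, out) ? 0 : -1;
}
```

The result is 2038-01-19 03:14:07 UTC. Anything that needs to store a
later time (a 20-year certificate expiry, say) already needs a wider
time_t well before 2038.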

> This would make the LFS64 symbol mess completely moot.

Yes. Actually I'd like to move all of the ABI-compat symbols out of
the ld-reachable symbol table and make them ABI-compat only. But I'd also
like to *improve* ABI-compat, e.g. making regexec from glibc libs safe
on 64-bit (where their regoff_t was wrong), 

> It would also allow musl to "fix" a lot of dumb glibc decisions.  I'm
> thinking specifically here of things like ctermid(3), which musl could
> actually implement correctly if it wasn't being held back by glibc
> defining L_ctermid as 9.

ctermid is something of a junk function anyway, but there are similar
non-junk interfaces affected. Identifying and overhauling them all is
probably a bigger project than I want to take on now, but I still
think it's a promising direction at some point in the future. Ideally
this might go hand in hand with making musl less Linux-centric, in the
form of developing a types ABI that's uniform across archs and meant
to be used natively on bare metal or non-Linux kernels.

> I'm aware this is probably controversial, and it will probably be shot
> down quickly, but I thought I would at least suggest this as an option.
> 
> Thank you for your consideration.

Thanks for the feedback.

Rich



* Re: Removing glibc from the musl .2 ABI
  2019-07-12  1:45 ` Rich Felker
@ 2019-07-12  1:47   ` Rich Felker
  0 siblings, 0 replies; 18+ messages in thread
From: Rich Felker @ 2019-07-12  1:47 UTC (permalink / raw)
  To: musl

On Thu, Jul 11, 2019 at 09:45:27PM -0400, Rich Felker wrote:
> > This would make the LFS64 symbol mess completely moot.
> 
> Yes. Actually I'd like to move all of the ABI-compat symbols out of
> the ld-reachable symbol table and make them ABI-compat only. But I'd also
> like to *improve* ABI-compat, e.g. making regexec from glibc libs safe
> on 64-bit (where their regoff_t was wrong), 

I forgot to finish this paragraph. To follow up, doing this stuff in
the dynamic linker would likely improve ABI-compat functionality,
making it possible to remap symbols just for binaries/libraries that
were detected as being glibc-linked.
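
A sketch of the kind of mismatch behind the regexec remark, under the
assumption that one side declares the match offsets as int and the
other as a pointer-sized integer. The structs below are illustrative
stand-ins, not the real regmatch_t from either libc:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins, NOT the real headers: a match record whose
 * offsets are int versus one whose offsets are long. */
struct match_int_offsets  { int  rm_so, rm_eo; };
struct match_long_offsets { long rm_so, rm_eo; };

/* Byte offset of element i in an array of each layout. */
static size_t elem_offset_int(size_t i)
{
	return i * sizeof(struct match_int_offsets);
}

static size_t elem_offset_long(size_t i)
{
	return i * sizeof(struct match_long_offsets);
}
```

Where int and long differ in size, a caller that allocated an array of
the first layout and a callee that writes the second disagree on where
every element lives, so the callee writes past the end of the caller's
array. Remapping the symbol only for glibc-linked objects would let a
shim convert between the two.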

I'm actually not sure if this will still be relevant by the time we
get around to doing it, but it's nice to have the option open.

Rich



* Re: Removing glibc from the musl .2 ABI
  2019-07-11 23:58 Removing glibc from the musl .2 ABI A. Wilcox
  2019-07-12  0:51 ` Khem Raj
  2019-07-12  1:45 ` Rich Felker
@ 2019-07-17  3:37 ` Rich Felker
  2019-07-17 13:13   ` A. Wilcox
  2 siblings, 1 reply; 18+ messages in thread
From: Rich Felker @ 2019-07-17  3:37 UTC (permalink / raw)
  To: musl

On Thu, Jul 11, 2019 at 06:58:38PM -0500, A. Wilcox wrote:
> (Full disclosure: I am the principal author of gcompat.)
> 
> Hi,
> 
> Now that gcompat has matured, I was wondering if perhaps musl should
> consider dropping the glibc ABI guarantees when the "2 ABI" lands.
> 
> This would make the LFS64 symbol mess completely moot.

This is separate from the .2 ABI topic, but what would you think about
removing glibc ABI-compat from the current .1 ABI and replacing it
with enhanced gcompat? I was thinking ldso could load libgcompat
instead of returning a reference to itself for DT_NEEDED referencing
libc.so.6, and we could move all ABI-compat symbols into gcompat.

The reason I bring it up is that ripping out the LFS64
unwantedly-linkable stuff while keeping it as ABI-only is looking like
more of a pain than I expected.

Rich



* Re: Removing glibc from the musl .2 ABI
  2019-07-17  3:37 ` Rich Felker
@ 2019-07-17 13:13   ` A. Wilcox
  2019-07-17 15:11     ` Rich Felker
  0 siblings, 1 reply; 18+ messages in thread
From: A. Wilcox @ 2019-07-17 13:13 UTC (permalink / raw)
  To: musl


On 07/16/19 22:37, Rich Felker wrote:
> On Thu, Jul 11, 2019 at 06:58:38PM -0500, A. Wilcox wrote:
>> (Full disclosure: I am the principal author of gcompat.)
>>
>> Hi,
>>
>> Now that gcompat has matured, I was wondering if perhaps musl should
>> consider dropping the glibc ABI guarantees when the "2 ABI" lands.
>>
>> This would make the LFS64 symbol mess completely moot.
> 
> This is separate from the .2 ABI topic, but what would you think about
> removing glibc ABI-compat from the current .1 ABI and replacing it
> with enhanced gcompat? I was thinking ldso could load libgcompat
> instead of returning a reference to itself for DT_NEEDED referencing
> libc.so.6, and we could move all ABI-compat symbols into gcompat.
> 
> The reason I bring it up is that ripping out the LFS64
> unwantedly-linkable stuff while keeping it as ABI-only is looking like
> more of a pain than I expected.
> 
> Rich


We would be more than happy to work with you on that.

Would gcompat then become a runtime requirement for glibc apps on musl?
What would musl do if gcompat isn't installed on a system?  What about
things like libm and libdl, which I've seen some apps force DT_NEEDED
anyway when built against musl?

Just trying to make sure the community has a clear view of what this
looks like before we jump in.

Best,
--arw


-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org




* Re: Removing glibc from the musl .2 ABI
  2019-07-17 13:13   ` A. Wilcox
@ 2019-07-17 15:11     ` Rich Felker
  2019-07-17 18:10       ` A. Wilcox
  0 siblings, 1 reply; 18+ messages in thread
From: Rich Felker @ 2019-07-17 15:11 UTC (permalink / raw)
  To: musl

On Wed, Jul 17, 2019 at 08:13:44AM -0500, A. Wilcox wrote:
> On 07/16/19 22:37, Rich Felker wrote:
> > On Thu, Jul 11, 2019 at 06:58:38PM -0500, A. Wilcox wrote:
> >> (Full disclosure: I am the principal author of gcompat.)
> >>
> >> Hi,
> >>
> >> Now that gcompat has matured, I was wondering if perhaps musl should
> >> consider dropping the glibc ABI guarantees when the "2 ABI" lands.
> >>
> >> This would make the LFS64 symbol mess completely moot.
> > 
> > This is separate from the .2 ABI topic, but what would you think about
> > removing glibc ABI-compat from the current .1 ABI and replacing it
> > with enhanced gcompat? I was thinking ldso could load libgcompat
> > instead of returning a reference to itself for DT_NEEDED referencing
> > libc.so.6, and we could move all ABI-compat symbols into gcompat.
> > 
> > The reason I bring it up is that ripping out the LFS64
> > unwantedly-linkable stuff while keeping it as ABI-only is looking like
> > more of a pain than I expected.
> 
> We would be more than happy to work with you on that.
> 
> Would gcompat then become a runtime requirement for glibc apps on musl?
> What would musl do if gcompat isn't installed on a system?

It would just be a failed DT_NEEDED.

> What about
> things like libm and libdl, which I've seen some apps force DT_NEEDED
> anyway when built against musl?

These could still be ignored (mapped to internal libc) since any
program using them would also necessarily be using libc.so.6.

> Just trying to make sure the community has a clear view of what this
> looks like before we jump in.

Yes. This isn't a request to jump in, just looking at feasibility and
whether there'd be interest from your side. Being that ABI-compat
doesn't actually work very well without gcompat right now, though, I
think it might make sense. I'll continue to look at whether there are
other options, possibly just transitional, that might be good too.

Rich



* Re: Removing glibc from the musl .2 ABI
  2019-07-17 15:11     ` Rich Felker
@ 2019-07-17 18:10       ` A. Wilcox
  2019-07-17 18:16         ` Rich Felker
  0 siblings, 1 reply; 18+ messages in thread
From: A. Wilcox @ 2019-07-17 18:10 UTC (permalink / raw)
  To: musl


On 07/17/19 10:11, Rich Felker wrote:
> On Wed, Jul 17, 2019 at 08:13:44AM -0500, A. Wilcox wrote:
>> On 07/16/19 22:37, Rich Felker wrote:
>>> On Thu, Jul 11, 2019 at 06:58:38PM -0500, A. Wilcox wrote:
>>>> (Full disclosure: I am the principal author of gcompat.)
>>>>
>>>> Hi,
>>>>
>>>> Now that gcompat has matured, I was wondering if perhaps musl should
>>>> consider dropping the glibc ABI guarantees when the "2 ABI" lands.
>>>>
>>>> This would make the LFS64 symbol mess completely moot.
>>>
>>> This is separate from the .2 ABI topic, but what would you think about
>>> removing glibc ABI-compat from the current .1 ABI and replacing it
>>> with enhanced gcompat? I was thinking ldso could load libgcompat
>>> instead of returning a reference to itself for DT_NEEDED referencing
>>> libc.so.6, and we could move all ABI-compat symbols into gcompat.
>>>
>>> The reason I bring it up is that ripping out the LFS64
>>> unwantedly-linkable stuff while keeping it as ABI-only is looking like
>>> more of a pain than I expected.
>>
>> We would be more than happy to work with you on that.
>>
>> Would gcompat then become a runtime requirement for glibc apps on musl?
>> What would musl do if gcompat isn't installed on a system?
> 
> It would just be a failed DT_NEEDED.


Okay, sounds reasonable.


>> What about
>> things like libm and libdl, which I've seen some apps force DT_NEEDED
>> anyway when built against musl?
> 
> These could still be ignored (mapped to internal libc) since any
> program using them would also necessarily be using libc.so.6.


Likewise.


>> Just trying to make sure the community has a clear view of what this
>> looks like before we jump in.
> 
> Yes. This isn't a request to jump in, just looking at feasibility and
> whether there'd be interest from your side. Being that ABI-compat
> doesn't actually work very well without gcompat right now, though, I
> think it might make sense. I'll continue to look at whether there are
> other options, possibly just transitional, that might be good too.


I meant: I want a clear view of the boundaries between musl and gcompat,
before we (Adélie / the gcompat team) jump in and start designing how we
want to handle all the new symbols we may end up with :)

We also were considering setting up a dedicated gcompat site so that the
community could share apps that are known to work / fail, symbol
presence, LSB missing symbols, etc.  Would that be of interest from your
side as well?

Best,
--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org




* Re: Removing glibc from the musl .2 ABI
  2019-07-17 18:10       ` A. Wilcox
@ 2019-07-17 18:16         ` Rich Felker
  2019-07-22 15:52           ` Rich Felker
  0 siblings, 1 reply; 18+ messages in thread
From: Rich Felker @ 2019-07-17 18:16 UTC (permalink / raw)
  To: musl

On Wed, Jul 17, 2019 at 01:10:19PM -0500, A. Wilcox wrote:
> >> Just trying to make sure the community has a clear view of what this
> >> looks like before we jump in.
> > 
> > Yes. This isn't a request to jump in, just looking at feasibility and
> > whether there'd be interest from your side. Being that ABI-compat
> > doesn't actually work very well without gcompat right now, though, I
> > think it might make sense. I'll continue to look at whether there are
> > other options, possibly just transitional, that might be good too.
> 
> I meant: I want a clear view of the boundaries between musl and gcompat,
> before we (Adélie / the gcompat team) jump in and start designing how we
> want to handle all the new symbols we may end up with :)

If we go this route, I would think that gcompat could provide all
symbols which are not either public APIs (extensions you can
legitimately use in source) or musl-header-induced ABIs (for example
things like __ctype_get_mb_cur_max, which is used to define the
MB_CUR_MAX macro). This would include LFS64 as well as the "__xstat"
stuff, the other __ctype_* stuff, etc.

> We also were considering setting up a dedicated gcompat site so that the
> community could share apps that are known to work / fail, symbol
> presence, LSB missing symbols, etc.  Would that be of interest from your
> side as well?

Definitely, regardless of whether we go ahead with the above or not.

Rich



* Re: Removing glibc from the musl .2 ABI
  2019-07-17 18:16         ` Rich Felker
@ 2019-07-22 15:52           ` Rich Felker
  2019-07-24 15:17             ` Szabolcs Nagy
  2019-07-24 16:33             ` James Y Knight
  0 siblings, 2 replies; 18+ messages in thread
From: Rich Felker @ 2019-07-22 15:52 UTC (permalink / raw)
  To: musl

On Wed, Jul 17, 2019 at 02:16:51PM -0400, Rich Felker wrote:
> On Wed, Jul 17, 2019 at 01:10:19PM -0500, A. Wilcox wrote:
> > >> Just trying to make sure the community has a clear view of what this
> > >> looks like before we jump in.
> > > 
> > > > Yes. This isn't a request to jump in, just looking at feasibility and
> > > whether there'd be interest from your side. Being that ABI-compat
> > > doesn't actually work very well without gcompat right now, though, I
> > > think it might make sense. I'll continue to look at whether there are
> > > other options, possibly just transitional, that might be good too.
> > 
> > I meant: I want a clear view of the boundaries between musl and gcompat,
> > before we (Adélie / the gcompat team) jump in and start designing how we
> > want to handle all the new symbols we may end up with :)
> 
> If we go this route, I would think that gcompat could provide all
> symbols which are not either public APIs (extensions you can
> legitimately use in source) or musl-header-induced ABIs (for example
> things like __ctype_get_mb_cur_max, which is used to define the
> MB_CUR_MAX macro). This would include LFS64 as well as the "__xstat"
> stuff, the other __ctype_* stuff, etc.

I think I'd like to go forward with this. Further work on time64 has
made it apparent to me that the current glibc ABI-compat we have
inside musl is fragile and is imposing unwanted constraints on musl,
which has long been one of the criteria for exclusion. In particular,
consider this situation:

Several structures that are part of public interfaces in musl were
created with extra space reserved for future extension. In some cases
the reserved space was added by musl; in other cases glibc had the
same. However, if we mandate glibc ABI-compat, *all* of this reserved
space is permanently unusable:

- If the reserved space is specific to musl, then reads from it may
  fault, and stores to it may clobber unrelated memory, if the
  structure was allocated by glibc-linked code.

- If the reserved space is present in both musl and glibc, we can't
  make use of it without risking that glibc makes some different use
  of it in the future, making calls from glibc-linked code dangerous.
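
A sketch of the first case and of the converting-wrapper way around
it. Everything here is illustrative: the two layouts stand in for a
structure with and without trailing reserved space, and none of the
names are real libc interfaces.

```c
#include <assert.h>
#include <string.h>

/* Illustrative layouts only, not the real struct rusage. The "native"
 * one has trailing reserved space the "foreign" caller never allocated. */
struct foreign_usage { long utime_sec, stime_sec; };
struct native_usage  { long utime_sec, stime_sec; long long reserved[4]; };

/* Stand-in for a libc function that is free to use the reserved space. */
static void native_get_usage(struct native_usage *u)
{
	u->utime_sec = 1;
	u->stime_sec = 2;
	memset(u->reserved, 0xff, sizeof u->reserved);  /* uses the tail */
}

/* Converting wrapper: let the native function fill a full-size temporary,
 * then copy out only the fields the foreign layout actually has. */
static void compat_get_usage(struct foreign_usage *out)
{
	struct native_usage tmp;
	native_get_usage(&tmp);
	out->utime_sec = tmp.utime_sec;
	out->stime_sec = tmp.stime_sec;
}
```

Passing a struct foreign_usage straight to native_get_usage would
write 32 bytes past the end of the caller's object. The wrapper keeps
that write inside its own temporary, which is what a compat layer
outside libc could do for each affected interface.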

This came up in the context of structs rusage and timex, but also
applies to stat, sched_param, sysinfo, statvfs, and perhaps others,
which might have reason for wanting extensibility in the future.

Right now, without the glibc ABI-compat constraint, getrusage, wait3,
and wait4 can avoid new time64 remappings entirely (by using the
reserved space we already have in rusage, which glibc doesn't have at
all). [clock_]adjtime[x] hit the second case -- glibc also has
reserved space in timex, but if they end up wanting to use it for
something else and we've put the 64-bit time there, we may be in
trouble.

I don't think the rusage and timex issues here are compelling by
themselves. It's not a big deal to make compat shims here, and I might
still end up doing it. But I think it's indicative that maintaining
glibc ABI-compat in musl is going to become increasingly problematic.

So, what I'd (tentatively; for discussion) like to do:

When ldso loads an application or shared library and detects that it's
glibc-linked (DT_NEEDED for libc.so.6), it both loads a gcompat
library instead *and* flags the dso as needing ABI-compat. The gcompat
library would be permanently RTLD_LOCAL, unable to be used for
resolving global symbols, since it would have to define symbols
conflicting with libc symbol names and with future directions of the
musl ABI.
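
The detection step could look roughly like the following. This is a
sketch, not ldso code: the helper name is hypothetical, it assumes a
64-bit ELF target with <elf.h> available, and it takes the dynamic
array and string table as already located by the loader.

```c
#include <assert.h>
#include <elf.h>      /* Elf64_Dyn, DT_NEEDED, DT_NULL */
#include <string.h>

/* Hypothetical helper: walk a DT_NULL-terminated dynamic array and report
 * whether any DT_NEEDED entry names libc.so.6. strtab is the object's
 * dynamic string table (what DT_STRTAB points at). */
static int needs_glibc(const Elf64_Dyn *dyn, const char *strtab)
{
	for (; dyn->d_tag != DT_NULL; dyn++) {
		if (dyn->d_tag == DT_NEEDED &&
		    strcmp(strtab + dyn->d_un.d_val, "libc.so.6") == 0)
			return 1;
	}
	return 0;
}
```

A loader would run a check like this once per object while processing
its dependencies, and record the result as the per-dso compat flag
described above.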

Symbol lookups when relocating such a flagged dso would take place by
first processing gcompat (logically, adding it to the head of the dso
search list), then the normal symbol search order. The gcompat library
could also provide a replacement dlsym function, so that dlsym calls
from the glibc-linked DSO also follow this order, and a replacement
dlopen, so that dlopen of libc from the glibc-linked DSO would get the
gcompat module.
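
As a toy model of that lookup order (the struct and function names are
invented for illustration and bear no relation to the real dynamic
linker internals):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a loaded object: a name, a NULL-terminated list of the
 * symbol names it defines, and the compat flag. */
struct toy_dso {
	const char *name;
	const char *const *syms;
	int needs_compat;            /* set when DT_NEEDED named libc.so.6 */
};

static int defines(const struct toy_dso *d, const char *sym)
{
	for (const char *const *s = d->syms; *s; s++)
		if (strcmp(*s, sym) == 0) return 1;
	return 0;
}

/* Resolve sym on behalf of `from`: flagged objects consult gcompat first,
 * everyone else sees only the normal search list. */
static const struct toy_dso *toy_lookup(const struct toy_dso *from,
	const struct toy_dso *gcompat,
	const struct toy_dso *const *list, size_t n, const char *sym)
{
	if (from->needs_compat && defines(gcompat, sym)) return gcompat;
	for (size_t i = 0; i < n; i++)
		if (defines(list[i], sym)) return list[i];
	return NULL;
}
```

Because gcompat is never part of the normal list, a native object
asking for one of its symbols gets nothing, which matches the
"permanently RTLD_LOCAL" requirement.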

I'm not sure what mechanism gcompat would then use to make its own
references to the underlying real libc functions. This is something
we'd need to think about.

Before we decide to do it, please be aware that this would be a bit of
a burden on gcompat to do more than it's doing now. But it would also
make lots of cases work that fundamentally *can't* work now -- compat
with 32-bit code using the legacy 32-bit off_t functions, compat with
64-bit code using regexec, etc. -- anywhere the musl ABI currently
conflicts with the glibc ABI. Of course much of this is optional. The
new things that would be mandatory would mainly be moving over
existing glibc compat shims (like the __ctype and __xstat stuff) and
implementing converting wrappers where musl's use of reserved space
creates unsafety/incompatibility with the existing glibc code.

Rich



* Re: Removing glibc from the musl .2 ABI
  2019-07-22 15:52           ` Rich Felker
@ 2019-07-24 15:17             ` Szabolcs Nagy
  2019-07-24 16:02               ` Rich Felker
  2019-07-24 16:33             ` James Y Knight
  1 sibling, 1 reply; 18+ messages in thread
From: Szabolcs Nagy @ 2019-07-24 15:17 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2019-07-22 11:52:59 -0400]:
> So, what I'd (tentatively; for discussion) like to do:
> 
> When ldso loads an application or shared library and detects that it's
> glibc-linked (DT_NEEDED for libc.so.6), it both loads a gcompat
> library instead *and* flags the dso as needing ABI-compat. The gcompat
> library would be permanently RTLD_LOCAL, unable to be used for
> resolving global symbols, since it would have to define symbols
> conflicting with libc symbol names and with future directions of the
> musl ABI.
> 
> Symbol lookups when relocating such a flagged dso would take place by
> first processing gcompat (logically, adding it to the head of the dso
> search list), then the normal symbol search order. The gcompat library
> could also provide a replacement dlsym function, so that dlsym calls
> from the glibc-linked DSO also follow this order, and a replacement
> dlopen, so that dlopen of libc from the glibc-linked DSO would get the
> gcompat module.
> 
> I'm not sure what mechanism gcompat would then use to make its own
> references to the underlying real libc functions. This is something
> we'd need to think about.

i'm not sure how gcompat would implement dlsym, if it's
on top of the musl dlsym, then that needs to be accessible
already (e.g. by exposing a __musl_dlsym alias) and can be
used to do lookups in libc.so.

> 
> Before we decide to do it, please be aware that this would be a bit of
> a burden on gcompat to do more than it's doing now. But it would also
> make lots of cases work that fundamentally *can't* work now -- compat
> with 32-bit code using the legacy 32-bit off_t functions, compat with
> 64-bit code using regexec, etc. -- anywhere the musl ABI currently
> conflicts with the glibc ABI. Of course much of this is optional. The
> new things that would be mandatory would mainly be moving over
> existing glibc compat shims (like the __ctype and __xstat stuff) and
> implementing converting wrappers where musl's use of reserved space
> creates unsafety/incompatibility with the existing glibc code.
> 
> Rich



* Re: Removing glibc from the musl .2 ABI
  2019-07-24 15:17             ` Szabolcs Nagy
@ 2019-07-24 16:02               ` Rich Felker
  0 siblings, 0 replies; 18+ messages in thread
From: Rich Felker @ 2019-07-24 16:02 UTC (permalink / raw)
  To: musl

On Wed, Jul 24, 2019 at 05:17:35PM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2019-07-22 11:52:59 -0400]:
> > So, what I'd (tentatively; for discussion) like to do:
> > 
> > When ldso loads an application or shared library and detects that it's
> > glibc-linked (DT_NEEDED for libc.so.6), it both loads a gcompat
> > library instead *and* flags the dso as needing ABI-compat. The gcompat
> > library would be permanently RTLD_LOCAL, unable to be used for
> > resolving global symbols, since it would have to define symbols
> > conflicting with libc symbol names and with future directions of the
> > musl ABI.
> > 
> > Symbol lookups when relocating such a flagged dso would take place by
> > first processing gcompat (logically, adding it to the head of the dso
> > search list), then the normal symbol search order. The gcompat library
> > could also provide a replacement dlsym function, so that dlsym calls
> > from the glibc-linked DSO also follow this order, and a replacement
> > dlopen, so that dlopen of libc from the glibc-linked DSO would get the
> > gcompat module.
> > 
> > I'm not sure what mechanism gcompat would then use to make its own
> > references to the underlying real libc functions. This is something
> > we'd need to think about.
> 
> i'm not sure how gcompat would implement dlsym, if it's
> on top of the musl dlsym, then that needs to be accessible
> already (e.g. by exposing a __musl_dlsym alias) and can be
> used to do lookups in libc.so.

The same applies to any interface it would have to wrap due to
mismatched ABI. I can think of a couple potential ways:

1. Simply by referencing the symbol name directly. libgcompat would
not be a glibc dso, so ldso would not use it to resolve its own
symbols, and would end up finding them in libc. The only concern here
is whether the compiler or linker might do some kind of early binding
that makes the references circular and non-symbolic. I don't think the
linker can (can in the sense of "has standing to") do this, but
perhaps the compiler could optimize a call from a function named dlsym
to a function named dlsym to be local rather than going through the
GOT/PLT, since if it were interposed it would never be called to begin
with. However this wouldn't work if there were other aliases for the
same function, so it's probably not something the compiler could do.
gcompat could explicitly preclude it via something like:

void *__gcompat_dlsym(void *dso, const char *restrict name)
{
	...
	return dlsym(...);
}
weak_alias(__gcompat_dlsym, dlsym);

This works because the alias imposes a mandate that the definitions be
the same, and __gcompat_dlsym would be exported (thus not able to be
optimized out) and would necessarily have to honor interposition of
dlsym.
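
For readers outside the project: weak_alias is not standard C. A
sketch of how such a macro can be written with GCC/Clang attributes on
an ELF target follows; the definition is an assumption modeled on the
internal macro of the same name, and the demo_* names are made up.

```c
#include <assert.h>

/* Assumed definition; requires GCC or Clang and an ELF target. */
#define weak_alias(old, new) \
	extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* Exported strong definition, plus a second, weak name for the same code. */
int demo_impl(int x) { return x + 1; }
weak_alias(demo_impl, demo_public);
```

Both names refer to one function body, so demo_public(1) and
demo_impl(1) return the same value, and the strong name stays
available even if something else defines demo_public.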

2. By having ldso remap symbol references *from* gcompat, so that
gcompat could refer to __libc_foo and have the reference get remapped
to plain foo.

I feel like option 2 is a nastier hack (especially on the musl side)
and not needed, since option 1 seems to work.

Rich



* Re: Removing glibc from the musl .2 ABI
  2019-07-22 15:52           ` Rich Felker
  2019-07-24 15:17             ` Szabolcs Nagy
@ 2019-07-24 16:33             ` James Y Knight
  2019-07-24 17:36               ` Szabolcs Nagy
  2019-07-24 21:29               ` Rich Felker
  1 sibling, 2 replies; 18+ messages in thread
From: James Y Knight @ 2019-07-24 16:33 UTC (permalink / raw)
  To: musl

One thing I've not seen mentioned yet: if this is done, then anyone
(whether intentionally or inadvertently) who links any glibc-compiled .o or
.a files into a musl binary/shared-lib will be broken.

Up until now, with musl's mostly-glibc-compatible ABI, you could link the
two object files together, and generally expect it to work. When
compatibility is instead done with magic in the dynamic loader, that
obviously can only ever work with a shared-object boundary.

I don't know if anyone actually uses musl in a context where this is likely
to be a problem, but it at least seems worth discussing (and loudly
documenting as a warning to users not to do this if implemented).


On Mon, Jul 22, 2019 at 8:53 AM Rich Felker <dalias@libc.org> wrote:

> On Wed, Jul 17, 2019 at 02:16:51PM -0400, Rich Felker wrote:
> > On Wed, Jul 17, 2019 at 01:10:19PM -0500, A. Wilcox wrote:
> > > >> Just trying to make sure the community has a clear view of what this
> > > >> looks like before we jump in.
> > > >
> > > > > Yes. This isn't a request to jump in, just looking at feasibility and
> > > > whether there'd be interest from your side. Being that ABI-compat
> > > > doesn't actually work very well without gcompat right now, though, I
> > > > think it might make sense. I'll continue to look at whether there are
> > > > other options, possibly just transitional, that might be good too.
> > >
> > > I meant: I want a clear view of the boundaries between musl and gcompat,
> > > before we (Adélie / the gcompat team) jump in and start designing how we
> > > want to handle all the new symbols we may end up with :)
> >
> > If we go this route, I would think that gcompat could provide all
> > symbols which are not either public APIs (extensions you can
> > legitimately use in source) or musl-header-induced ABIs (for example
> > things like __ctype_get_mb_cur_max, which is used to define the
> > MB_CUR_MAX macro). This would include LFS64 as well as the "__xstat"
> > stuff, the other __ctype_* stuff, etc.
>
> I think I'd like to go forward with this. Further work on time64 has
> made it apparent to me that the current glibc ABI-compat we have
> inside musl is fragile and is imposing unwanted constraints on musl,
> which has long been one of the criteria for exclusion. In particular,
> consider this situation:
>
> Several structures that are part of public interfaces in musl were
> created with extra space reserved for future extension. In some cases
> the reserved space was added by musl; in other cases glibc had the
> same. However, if we mandate glibc ABI-compat, *all* of this reserved
> space is permanently unusable:
>
> - If the reserved space is specific to musl, then reads from it may
>   fault, and stores to it may clobber unrelated memory, if the
>   structure was allocated by glibc-linked code.
>
> - If the reserved space is present in both musl and glibc, we can't
>   make use of it without risking that glibc makes some different use
>   of it in the future, making calls from glibc-linked code dangerous.
>
> This came up in the context of structs rusage and timex, but also
> applies to stat, sched_param, sysinfo, statvfs, and perhaps others,
> which might have reason for wanting extensibility in the future.
>
> Right now, without the glibc ABI-compat constraint, getrusage, wait3,
> and wait4 can avoid new time64 remappings entirely (by using the
> reserved space we already have in rusage, which glibc doesn't have at
> all). [clock_]adjtime[x] hit the second case -- glibc also has
> reserved space in timex, but if they end up wanting to use it for
> something else and we've put the 64-bit time there, we may be in
> trouble.
>
> I don't think the rusage and timex issues here are compelling by
> themselves. It's not a big deal to make compat shims here, and I might
> still end up doing it. But I think it's indicative that maintaining
> glibc ABI-compat in musl is going to become increasingly problematic.
>
> So, what I'd (tentatively; for discussion) like to do:
>
> When ldso loads an application or shared library and detects that it's
> glibc-linked (DT_NEEDED for libc.so.6), it both loads a gcompat
> library instead *and* flags the dso as needing ABI-compat. The gcompat
> library would be permanently RTLD_LOCAL, unable to be used for
> resolving global symbols, since it would have to define symbols
> conflicting with libc symbol names and with future directions of the
> musl ABI.
>
> Symbol lookups when relocating such a flagged dso would take place by
> first processing gcompat (logically, adding it to the head of the dso
> search list), then the normal symbol search order. The gcompat library
> could also provide a replacement dlsym function, so that dlsym calls
> from the glibc-linked DSO also follow this order, and a replacement
> dlopen, so that dlopen of libc from the glibc-linked DSO would get the
> gcompat module.
>
> I'm not sure what mechanism gcompat would then use to make its own
> references to the underlying real libc functions. This is something
> we'd need to think about.
>
> Before we decide to do it, please be aware that this would be a bit of
> a burden on gcompat to do more than it's doing now. But it would also
> make lots of cases work that fundamentally *can't* work now -- compat
> with 32-bit code using the legacy 32-bit off_t functions, compat with
> 64-bit code using regexec, etc. -- anywhere the musl ABI currently
> conflicts with the glibc ABI. Of course much of this is optional. The
> new things that would be mandatory would mainly be moving over
> existing glibc compat shims (like the __ctype and __xstat stuff) and
> implementing converting wrappers where musl's use of reserved space
> creates unsafety/incompatibility with the existing glibc code.
>
> Rich
>
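
The detection and lookup order described in the quoted proposal can be sketched roughly as follows. This is only a toy model under stated assumptions, not musl's ldso code: needs_glibc, struct toy_dso, and toy_resolve are hypothetical names, and the 64-bit Elf64_Dyn layout from <elf.h> is assumed.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the dynamic section lists libc.so.6
 * as a DT_NEEDED dependency, 0 otherwise. dyn must be terminated by a
 * DT_NULL entry; strtab is the object's dynamic string table. */
int needs_glibc(const Elf64_Dyn *dyn, const char *strtab)
{
    assert(dyn && strtab);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_NEEDED &&
            strcmp(strtab + dyn->d_un.d_val, "libc.so.6") == 0)
            return 1;
    }
    return 0;
}

/* Toy model of a loaded object: a name and a NULL-terminated symbol list. */
struct toy_dso {
    const char *name;
    const char *const *syms;
};

static int toy_defines(const struct toy_dso *d, const char *sym)
{
    for (const char *const *s = d->syms; *s; s++)
        if (strcmp(*s, sym) == 0)
            return 1;
    return 0;
}

/* Resolve sym on behalf of a requesting object. If the requester is
 * flagged as needing ABI-compat, the compat object is consulted first;
 * otherwise it is never searched, which models it being permanently
 * local. Returns the name of the defining object, or NULL if none. */
const char *toy_resolve(const char *sym, int compat_flag,
                        const struct toy_dso *compat,
                        const struct toy_dso *const *global, size_t n)
{
    if (compat_flag && toy_defines(compat, sym))
        return compat->name;
    for (size_t i = 0; i < n; i++)
        if (toy_defines(global[i], sym))
            return global[i]->name;
    return NULL;
}
```

In this model a flagged object asking for regexec gets the compat definition, an unflagged object asking for the same name gets the libc one, and a compat-only symbol such as __xstat is invisible to unflagged objects.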


* Re: Removing glibc from the musl .2 ABI
  2019-07-24 16:33             ` James Y Knight
@ 2019-07-24 17:36               ` Szabolcs Nagy
  2019-07-24 21:31                 ` Rich Felker
  2019-07-24 21:29               ` Rich Felker
  1 sibling, 1 reply; 18+ messages in thread
From: Szabolcs Nagy @ 2019-07-24 17:36 UTC (permalink / raw)
  To: musl

* James Y Knight <jyknight@google.com> [2019-07-24 09:33:05 -0700]:
> One thing I've not seen mentioned yet: if this is done, then anyone
> (whether intentionally or inadvertently) who links any glibc-compiled .o or
> .a files into a musl binary/shared-lib will be broken.
> 
> Up until now, with musl's mostly-glibc-compatible ABI, you could link the
> two object files together, and generally expect it to work. When
> compatibility is instead done with magic in the dynamic loader, that
> obviously can only ever work with a shared-object boundary.
> 
> I don't know if anyone actually uses musl in a context where this is likely
> to be a problem, but it at least seems worth discussing (and loudly
> documenting as a warning to users not to do this if implemented).

is it common that binary only .o or .a is distributed?

binary only shared libs with glibc dependency are fairly
common (plugins, userspace driver code etc). i think the
abi compat was mainly intended to support that.


* Re: Removing glibc from the musl .2 ABI
  2019-07-24 16:33             ` James Y Knight
  2019-07-24 17:36               ` Szabolcs Nagy
@ 2019-07-24 21:29               ` Rich Felker
  2019-07-25 16:42                 ` James Y Knight
  1 sibling, 1 reply; 18+ messages in thread
From: Rich Felker @ 2019-07-24 21:29 UTC (permalink / raw)
  To: musl

On Wed, Jul 24, 2019 at 09:33:05AM -0700, James Y Knight wrote:
> One thing I've not seen mentioned yet: if this is done, then anyone
> (whether intentionally or inadvertently) who links any glibc-compiled .o or
> .a files into a musl binary/shared-lib will be broken.

If it referenced glibc symbols that have been moved out of musl, it
would just fail to link (at ld time or ldso time, depending on program
binary/shared lib). The only way it would be silently broken is with
symbols where glibc and musl share the same symbol name but with
different ABI (like regexec on 64-bit, which is already possible now,
or the non-64bit-off_t functions on 32-bit archs, or lots of stuff on
mips and powerpc where there's minimal or no ABI-compat).
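
To make the silent-breakage case concrete, here is a minimal sketch assuming two hypothetical views of the same record, one built with a 32-bit file offset and one with a 64-bit file offset. The struct and function names are invented for illustration and do not correspond to any real libc interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record as seen by code built with a 32-bit file offset. */
struct rec_off32 {
    int32_t offset;
    int32_t length;
};

/* The same record as seen by code built with a 64-bit file offset. */
struct rec_off64 {
    int64_t offset;
    int32_t length;
};

/* Hypothetical callee built against the 64-bit layout. It looks for the
 * length field where its own layout places it. Returns 1 and stores the
 * value if that position lies inside the caller's buffer, or 0 if the
 * read would run past the end of what the caller allocated. */
int read_length_as_off64(const void *rec, size_t rec_size, int32_t *out)
{
    size_t off = offsetof(struct rec_off64, length);

    assert(rec && out);
    if (off + sizeof *out > rec_size)
        return 0;
    memcpy(out, (const unsigned char *)rec + off, sizeof *out);
    return 1;
}
```

The symbol name and prototype look identical from both sides, so nothing fails at link time. The callee either reads past the caller's smaller object or, if the memory happens to be there, picks up bytes that are not the length at all.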

For the time64 stuff, my thought is to try to use redirected-symbol
names that don't match whatever names glibc will be using, so that
there's no risk of the link accidentally succeeding. I think it makes
sense in general to try to have ABI match when we add symbols that
will also exist in glibc, on the archs that have ABI-compat.
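
One way such a redirection could look, as a rough sketch only: the GNU C asm-label extension binds a source-level name to a differently named symbol. The names my_gettime, __my_gettime_time64, and struct my_timespec64 are hypothetical and are not the names musl or glibc use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical time structure with a 64-bit seconds field. */
struct my_timespec64 {
    int64_t tv_sec;
    long tv_nsec;
};

/* "Header" part: the source-level name my_gettime is bound to the
 * decorated symbol name __my_gettime_time64 (GNU C asm label). */
int my_gettime(struct my_timespec64 *ts) __asm__("__my_gettime_time64");

/* "Library" part: because of the declaration above, this definition is
 * emitted under the decorated symbol name, not as plain my_gettime. */
int my_gettime(struct my_timespec64 *ts)
{
    assert(ts);
    ts->tv_sec = (int64_t)1 << 32; /* a value that needs more than 32 bits */
    ts->tv_nsec = 0;
    return 0;
}
```

Code compiled against this header references only the decorated symbol, so an object built elsewhere that expects a different name for the same function gets an undefined-symbol error instead of binding to a definition with a different layout.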

> Up until now, with musl's mostly-glibc-compatible ABI, you could link the
> two object files together, and generally expect it to work. When
> compatibility is instead done with magic in the dynamic loader, that
> obviously can only ever work with a shared-object boundary.
> 
> I don't know if anyone actually uses musl in a context where this is likely
> to be a problem, but it at least seems worth discussing (and loudly
> documenting as a warning to users not to do this if implemented).

My thought, for the things where it matters, is that it's an
improvement to fail. If you really want it to work (e.g. if you have a
binary-only static library you need to use), you can probably use
objcopy or similar to remap the symbols to shims.
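
A shim of that kind might look roughly like the converting wrapper below. Everything in it is an assumption made for illustration: the two struct layouts, native_get_usage as a stand-in for the real libc function, and legacy_get_usage as the shim a remapped symbol would point at.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical native layout: 64-bit time fields plus a reserved tail
 * that the native library actually writes to. */
struct native_usage {
    int64_t sec;
    int64_t usec;
    long maxrss;
    long reserved[4];
};

/* Hypothetical legacy layout that the foreign object was compiled for. */
struct legacy_usage {
    int32_t sec;
    int32_t usec;
    long maxrss;
};

/* Stand-in for the real libc function. It fills every field, including
 * the reserved tail that the legacy layout does not have. */
int native_get_usage(struct native_usage *u)
{
    assert(u);
    u->sec = 5;
    u->usec = 250000;
    u->maxrss = 1024;
    for (int i = 0; i < 4; i++)
        u->reserved[i] = -1;
    return 0;
}

/* Converting wrapper: calls the native function on a local native
 * struct, then copies only the fields the legacy layout has. Returns -1
 * if the seconds value does not fit in the legacy 32-bit field. */
int legacy_get_usage(struct legacy_usage *out)
{
    struct native_usage tmp;
    int r;

    assert(out);
    r = native_get_usage(&tmp);
    if (r)
        return r;
    if (tmp.sec > INT32_MAX || tmp.sec < INT32_MIN)
        return -1;
    out->sec = (int32_t)tmp.sec;
    out->usec = (int32_t)tmp.usec;
    out->maxrss = tmp.maxrss;
    return 0;
}
```

Because the native function only ever sees the local tmp, its writes to the reserved tail cannot land in memory beyond the caller's smaller legacy object.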

Does my above analysis sound reasonable to you?

Rich



* Re: Removing glibc from the musl .2 ABI
  2019-07-24 17:36               ` Szabolcs Nagy
@ 2019-07-24 21:31                 ` Rich Felker
  0 siblings, 0 replies; 18+ messages in thread
From: Rich Felker @ 2019-07-24 21:31 UTC (permalink / raw)
  To: musl

On Wed, Jul 24, 2019 at 07:36:00PM +0200, Szabolcs Nagy wrote:
> * James Y Knight <jyknight@google.com> [2019-07-24 09:33:05 -0700]:
> > One thing I've not seen mentioned yet: if this is done, then anyone
> > (whether intentionally or inadvertently) who links any glibc-compiled .o or
> > .a files into a musl binary/shared-lib will be broken.
> > 
> > Up until now, with musl's mostly-glibc-compatible ABI, you could link the
> > two object files together, and generally expect it to work. When
> > compatibility is instead done with magic in the dynamic loader, that
> > obviously can only ever work with a shared-object boundary.
> > 
> > I don't know if anyone actually uses musl in a context where this is likely
> > to be a problem, but it at least seems worth discussing (and loudly
> > documenting as a warning to users not to do this if implemented).
> 
> is it common that binary only .o or .a is distributed?
> 
> binary only shared libs with glibc dependency are fairly
> common (plugins, userspace driver code etc). i think the
> abi compat was mainly intended to support that.

It may be common with proprietary middleware or userspace-driver stuff
for hardware devices, where presumably the idea of shipping a static
lib rather than a shared one is that you don't ship usable copies of
the middleware vendor's library to your customers along with your
product.

Rich


* Re: Removing glibc from the musl .2 ABI
  2019-07-24 21:29               ` Rich Felker
@ 2019-07-25 16:42                 ` James Y Knight
  2019-07-25 20:03                   ` Rich Felker
  0 siblings, 1 reply; 18+ messages in thread
From: James Y Knight @ 2019-07-25 16:42 UTC (permalink / raw)
  To: musl

On Wed, Jul 24, 2019 at 2:29 PM Rich Felker <dalias@libc.org> wrote:

> On Wed, Jul 24, 2019 at 09:33:05AM -0700, James Y Knight wrote:
> > One thing I've not seen mentioned yet: if this is done, then anyone
> > (whether intentionally or inadvertently) who links any glibc-compiled .o or
> > .a files into a musl binary/shared-lib will be broken.
>
> If it referenced glibc symbols that have been moved out of musl, it
> would just fail to link (at ld time or ldso time, depending on program
> binary/shared lib). The only way it would be silently broken is with
> symbols where glibc and musl share the same symbol name but with
> different ABI (like regexec on 64-bit, which is already possible now,
> or the non-64bit-off_t functions on 32-bit archs, or lots of stuff on
> mips and powerpc where there's minimal or no ABI-compat).
>
> For the time64 stuff, my thought is to try to use redirected-symbol
> names that don't match whatever names glibc will be using, so that
> there's no risk of the link accidentally succeeding. I think it makes
> sense in general to try to have ABI match when we add symbols that
> will also exist in glibc, on the archs that have ABI-compat.
>
> > Up until now, with musl's mostly-glibc-compatible ABI, you could link the
> > two object files together, and generally expect it to work. When
> > compatibility is instead done with magic in the dynamic loader, that
> > obviously can only ever work with a shared-object boundary.
> >
> > I don't know if anyone actually uses musl in a context where this is likely
> > to be a problem, but it at least seems worth discussing (and loudly
> > documenting as a warning to users not to do this if implemented).
>
> My thought, for the things where it matters, is that it's an
> improvement to fail. If you really want it to work (e.g. if you have a
> binary-only static library you need to use), you can probably use
> objcopy or similar to remap the symbols to shims.
>
> Does my above analysis sound reasonable to you?
>

I had understood from your previous emails that musl would start dropping
glibc-abi-compatibility (potentially in general, not just for the
64-bit-time transition) of existing "undecorated" functions, and then
restore compatibility only in a shadowed version of that same function name
in libgcompat.so.

But yes -- just dropping symbols and triggering a link error seems totally
fine. My worry was mainly that there would be mysterious runtime bugs,
especially if a given function's ABI had previously been compatible, and
now becomes incompatible.

And again, I don't think it's a non-starter to make such a change, only
that if that is to happen, it should happen with deliberation and notice to
users.


* Re: Removing glibc from the musl .2 ABI
  2019-07-25 16:42                 ` James Y Knight
@ 2019-07-25 20:03                   ` Rich Felker
  0 siblings, 0 replies; 18+ messages in thread
From: Rich Felker @ 2019-07-25 20:03 UTC (permalink / raw)
  To: musl

On Thu, Jul 25, 2019 at 09:42:23AM -0700, James Y Knight wrote:
> On Wed, Jul 24, 2019 at 2:29 PM Rich Felker <dalias@libc.org> wrote:
> 
> > On Wed, Jul 24, 2019 at 09:33:05AM -0700, James Y Knight wrote:
> > > One thing I've not seen mentioned yet: if this is done, then anyone
> > > (whether intentionally or inadvertently) who links any glibc-compiled .o or
> > > .a files into a musl binary/shared-lib will be broken.
> >
> > If it referenced glibc symbols that have been moved out of musl, it
> > would just fail to link (at ld time or ldso time, depending on program
> > binary/shared lib). The only way it would be silently broken is with
> > symbols where glibc and musl share the same symbol name but with
> > different ABI (like regexec on 64-bit, which is already possible now,
> > or the non-64bit-off_t functions on 32-bit archs, or lots of stuff on
> > mips and powerpc where there's minimal or no ABI-compat).
> >
> > For the time64 stuff, my thought is to try to use redirected-symbol
> > names that don't match whatever names glibc will be using, so that
> > there's no risk of the link accidentally succeeding. I think it makes
> > sense in general to try to have ABI match when we add symbols that
> > will also exist in glibc, on the archs that have ABI-compat.
> >
> > > Up until now, with musl's mostly-glibc-compatible ABI, you could link the
> > > two object files together, and generally expect it to work. When
> > > compatibility is instead done with magic in the dynamic loader, that
> > > obviously can only ever work with a shared-object boundary.
> > >
> > > I don't know if anyone actually uses musl in a context where this is likely
> > > to be a problem, but it at least seems worth discussing (and loudly
> > > documenting as a warning to users not to do this if implemented).
> >
> > My thought, for the things where it matters, is that it's an
> > improvement to fail. If you really want it to work (e.g. if you have a
> > binary-only static library you need to use), you can probably use
> > objcopy or similar to remap the symbols to shims.
> >
> > Does my above analysis sound reasonable to you?
> 
> I had understood from your previous emails that musl would start dropping
> glibc-abi-compatibility (potentially in general, not just for the
> 64-bit-time transition) of existing "undecorated" functions, and then
> restore compatibility only in a shadowed version of that same function name
> in libgcompat.so.

Unless I misunderstand what you're saying, that's impossible without
also dropping musl-ABI compatibility. So no, it wouldn't happen.

Rich



Thread overview: 18+ messages
2019-07-11 23:58 Removing glibc from the musl .2 ABI A. Wilcox
2019-07-12  0:51 ` Khem Raj
2019-07-12  1:45 ` Rich Felker
2019-07-12  1:47   ` Rich Felker
2019-07-17  3:37 ` Rich Felker
2019-07-17 13:13   ` A. Wilcox
2019-07-17 15:11     ` Rich Felker
2019-07-17 18:10       ` A. Wilcox
2019-07-17 18:16         ` Rich Felker
2019-07-22 15:52           ` Rich Felker
2019-07-24 15:17             ` Szabolcs Nagy
2019-07-24 16:02               ` Rich Felker
2019-07-24 16:33             ` James Y Knight
2019-07-24 17:36               ` Szabolcs Nagy
2019-07-24 21:31                 ` Rich Felker
2019-07-24 21:29               ` Rich Felker
2019-07-25 16:42                 ` James Y Knight
2019-07-25 20:03                   ` Rich Felker
