From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Y Knight
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Removing glibc from the musl .2 ABI
Date: Wed, 24 Jul 2019 09:33:05 -0700
References: <20190717033735.GJ1506@brightrain.aerifal.cx>
 <2d36174f-ae85-bcd2-1c71-10f50513b1a6@adelielinux.org>
 <20190717151107.GL1506@brightrain.aerifal.cx>
 <20190717181651.GN1506@brightrain.aerifal.cx>
 <20190722155259.GA7445@brightrain.aerifal.cx>
In-Reply-To: <20190722155259.GA7445@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Xref: news.gmane.org gmane.linux.lib.musl.general:14443
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000b5e1de058e6fde93"

--000000000000b5e1de058e6fde93
Content-Type: text/plain; charset="UTF-8"

One thing I've not seen mentioned yet: if this is done, then anyone who
links glibc-compiled .o or .a files into a musl binary or shared library
(whether intentionally or inadvertently) will end up with a broken
result.

Up until now, with musl's mostly-glibc-compatible ABI, you could link
the two kinds of object files together and generally expect it to work.
When compatibility is instead done with magic in the dynamic loader,
that obviously can only ever work across a shared-object boundary.

I don't know if anyone actually uses musl in a context where this is
likely to be a problem, but it at least seems worth discussing (and, if
implemented, loudly documenting as a warning to users not to do this).


On Mon, Jul 22, 2019 at 8:53 AM Rich Felker <dalias@libc.org> wrote:
On Wed, Jul 17, 2019 at 02:16:51PM -0400, Rich Felker wrote:
> On Wed, Jul 17, 2019 at 01:10:19PM -0500, A. Wilcox wrote:
> > >> Just trying to make sure the community has a clear view of what this
> > >> looks like before we jump in.
> > >
> > > Yes. This isn't a request to jump in, just looking at feasibility and
> > > whether there'd be interest from your side. Being that ABI-compat
> > > doesn't actually work very well without gcompat right now, though, I
> > > think it might make sense. I'll continue to look at whether there are
> > > other options, possibly just transitional, that might be good too.
> >
> > I meant: I want a clear view of the boundaries between musl and gcompat,
> > before we (Adélie / the gcompat team) jump in and start designing how we
> > want to handle all the new symbols we may end up with :)
>
> If we go this route, I would think that gcompat could provide all
> symbols which are not either public APIs (extensions you can
> legitimately use in source) or musl-header-induced ABIs (for example
> things like __ctype_get_mb_cur_max, which is used to define the
> MB_CUR_MAX macro). This would include LFS64 as well as the "__xstat"
> stuff, the other __ctype_* stuff, etc.

I think I'd like to go forward with this. Further work on time64 has
made it apparent to me that the current glibc ABI-compat we have
inside musl is fragile and is imposing unwanted constraints on musl,
which has long been one of the criteria for exclusion. In particular,
consider this situation:

Several structures that are part of public interfaces in musl were
created with extra space reserved for future extension. In some cases
the reserved space was added by musl; in other cases glibc had the
same. However, if we mandate glibc ABI-compat, *all* of this reserved
space is permanently unusable:

- If the reserved space is specific to musl, then reads from it may
  fault, and stores to it may clobber unrelated memory, if the
  structure was allocated by glibc-linked code.

- If the reserved space is present in both musl and glibc, we can't
  make use of it without risking that glibc makes some different use
  of it in the future, making calls from glibc-linked code dangerous.

This came up in the context of structs rusage and timex, but also
applies to stat, sched_param, sysinfo, statvfs, and perhaps others,
which might have reason for wanting extensibility in the future.
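
To make the first case concrete, here is a minimal sketch that uses
Python's ctypes to mimic C struct layout. The two record types are
hypothetical stand-ins, not the real struct rusage or timex from either
libc. The only point is that a libc which touches its own reserved tail
runs past a buffer that was sized by a caller built against the shorter
layout.

```python
import ctypes

# Hypothetical layouts for illustration only; these are NOT the real
# definitions of any structure in musl or glibc.
class CallerRecord(ctypes.Structure):
    """Layout the glibc-linked caller was compiled against."""
    _fields_ = [("sec", ctypes.c_int32),
                ("usec", ctypes.c_int32)]

class LibcRecord(ctypes.Structure):
    """Layout the libc uses: same leading fields plus a reserved tail."""
    _fields_ = [("sec", ctypes.c_int32),
                ("usec", ctypes.c_int32),
                ("reserved", ctypes.c_int64 * 2)]

def bytes_past_caller_buffer():
    """Bytes the libc-side layout extends beyond the caller's allocation."""
    return ctypes.sizeof(LibcRecord) - ctypes.sizeof(CallerRecord)

def libc_view_fits(caller_buffer):
    """Return True if the libc's larger view fits in the caller's buffer."""
    try:
        LibcRecord.from_buffer(caller_buffer)
        return True
    except ValueError:
        # ctypes refuses to map a structure onto a buffer that is too small.
        return False
```

With these made-up layouts the libc-side view is 16 bytes longer than
the caller's allocation. ctypes refuses to build the larger view; C
code would simply read or write past the end.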

Right now, without the glibc ABI-compat constraint, getrusage, wait3,
and wait4 can avoid new time64 remappings entirely (by using the
reserved space we already have in rusage, which glibc doesn't have at
all). [clock_]adjtime[x] hit the second case -- glibc also has
reserved space in timex, but if they end up wanting to use it for
something else and we've put the 64-bit time there, we may be in
trouble.

I don't think the rusage and timex issues here are compelling by
themselves. It's not a big deal to make compat shims here, and I might
still end up doing it. But I think it's indicative that maintaining
glibc ABI-compat in musl is going to become increasingly problematic.

So, what I'd (tentatively; for discussion) like to do:

When ldso loads an application or shared library and detects that it's
glibc-linked (DT_NEEDED for libc.so.6), it both loads a gcompat
library instead *and* flags the dso as needing ABI-compat. The gcompat
library would be permanently RTLD_LOCAL, unable to be used for
resolving global symbols, since it would have to define symbols
conflicting with libc symbol names and with future directions of the
musl ABI.
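
The detection step could be modelled roughly as follows. This is a
sketch, not musl's ldso code: the dynamic section is represented as a
plain list of (tag, value) pairs and the dynamic string table as bytes,
and the function names are made up for illustration. DT_NULL and
DT_NEEDED carry their standard ELF values.

```python
DT_NULL = 0    # terminates the dynamic section
DT_NEEDED = 1  # value is an offset into the dynamic string table

def needed_libraries(dynamic, strtab):
    """Return the DT_NEEDED names, in order, from a dynamic section.

    dynamic: iterable of (tag, value) pairs (a simplified model).
    strtab:  bytes holding NUL-terminated strings.
    """
    names = []
    for tag, value in dynamic:
        if tag == DT_NULL:
            break
        if tag == DT_NEEDED:
            end = strtab.index(b"\0", value)
            names.append(strtab[value:end].decode("ascii"))
    return names

def is_glibc_linked(dynamic, strtab):
    """True if the object asks for libc.so.6, the check described above."""
    return "libc.so.6" in needed_libraries(dynamic, strtab)
```

An object for which is_glibc_linked() returns true would be the one
that gets gcompat loaded in place of libc.so.6 and is flagged as
needing ABI-compat.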

Symbol lookups when relocating such a flagged dso would take place by
first processing gcompat (logically, adding it to the head of the dso
search list), then the normal symbol search order. The gcompat library
could also provide a replacement dlsym function, so that dlsym calls
from the glibc-linked DSO also follow this order, and a replacement
dlopen, so that dlopen of libc from the glibc-linked DSO would get the
gcompat module.
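
As a rough model of that lookup order (a sketch only; the function and
parameter names are invented here, and real symbol resolution involves
much more than dictionary lookups):

```python
def resolve(symbol, flagged, gcompat_symbols, global_chain):
    """Find the provider of `symbol` for one requesting DSO.

    flagged:         True if the requesting DSO was marked glibc-linked.
    gcompat_symbols: dict of symbols defined by the gcompat library.
    global_chain:    list of (dso_name, symbol_dict) in normal search order.
    Returns (dso_name, definition), or None if nothing defines the symbol.
    """
    chain = list(global_chain)
    if flagged:
        # gcompat goes at the head of the list, but only for flagged DSOs.
        # It never joins the global chain, mirroring RTLD_LOCAL.
        chain.insert(0, ("gcompat", gcompat_symbols))
    for name, table in chain:
        if symbol in table:
            return name, table[symbol]
    return None
```

A flagged DSO asking for a symbol that gcompat defines gets the gcompat
version; an unflagged DSO asking for the same name never sees gcompat
and gets the libc definition. Symbols gcompat does not define fall
through to the normal search order in both cases.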

I'm not sure what mechanism gcompat would then use to make its own
references to the underlying real libc functions. This is something
we'd need to think about.

Before we decide to do it, please be aware that this would be a bit of
a burden on gcompat to do more than it's doing now. But it would also
make lots of cases work that fundamentally *can't* work now -- compat
with 32-bit code using the legacy 32-bit off_t functions, compat with
64-bit code using regexec, etc. -- anywhere the musl ABI currently
conflicts with the glibc ABI. Of course much of this is optional. The
new things that would be mandatory would mainly be moving over
existing glibc compat shims (like the __ctype and __xstat stuff) and
implementing converting wrappers where musl's use of reserved space
creates unsafety/incompatibility with the existing glibc code.
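
A converting wrapper of the kind mentioned above might look like this
in outline. Everything here is hypothetical: the two layouts, the
native_fill() stand-in for a libc function, and the compat_fill() name
are invented for the sketch, again using ctypes to mimic C structs.

```python
import ctypes

# Hypothetical layouts, for illustration only.
class CallerRecord(ctypes.Structure):
    """What the glibc-linked caller allocates."""
    _fields_ = [("sec", ctypes.c_int32),
                ("usec", ctypes.c_int32)]

class NativeRecord(ctypes.Structure):
    """What the native libc function expects, reserved tail included."""
    _fields_ = [("sec", ctypes.c_int32),
                ("usec", ctypes.c_int32),
                ("reserved", ctypes.c_int64 * 2)]

def native_fill(rec):
    """Stand-in for a libc function that also writes its reserved tail."""
    rec.sec = 12
    rec.usec = 345
    rec.reserved[0] = 0x1234

def compat_fill(caller_buffer):
    """Call the native function on a private full-size record, then copy
    back only the fields the caller's layout knows about."""
    native = NativeRecord()
    native_fill(native)
    out = CallerRecord.from_buffer(caller_buffer)
    out.sec = native.sec
    out.usec = native.usec
```

The caller's buffer is only ever written through the caller's own
layout, so whatever sits after it in memory is left alone.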

Rich
--000000000000b5e1de058e6fde93--