On Wed, Jul 24, 2019 at 2:29 PM Rich Felker wrote:

> On Wed, Jul 24, 2019 at 09:33:05AM -0700, James Y Knight wrote:
> > One thing I've not seen mentioned yet: if this is done, then anyone
> > (whether intentionally or inadvertently) who links any glibc-compiled .o or
> > .a files into a musl binary/shared-lib will be broken.
>
> If it referenced glibc symbols that have been moved out of musl, it
> would just fail to link (at ld time or ldso time, depending on program
> binary/shared lib). The only way it would be silently broken is with
> symbols where glibc and musl share the same symbol name but with
> different ABI (like regexec on 64-bit, which is already possible now,
> or the non-64bit-off_t functions on 32-bit archs, or lots of stuff on
> mips and powerpc where there's minimal or no ABI-compat).
>
> For the time64 stuff, my thought is to try to use redirected-symbol
> names that don't match whatever names glibc will be using, so that
> there's no risk of the link accidentally succeeding. I think it makes
> sense in general to try to have ABI match when we add symbols that
> will also exist in glibc, on the archs that have ABI-compat.
>
> > Up until now, with musl's mostly-glibc-compatible ABI, you could link the
> > two object files together, and generally expect it to work. When
> > compatibility is instead done with magic in the dynamic loader, that
> > obviously can only ever work with a shared-object boundary.
> >
> > I don't know if anyone actually uses musl in a context where this is likely
> > to be a problem, but it at least seems worth discussing (and loudly
> > documenting as a warning to users not to do this if implemented).
>
> My thought, for the things where it matters, is that it's an
> improvement to fail. If you really want it to work (e.g. if you have a
> binary-only static library you need to use), you can probably use
> objcopy or similar to remap the symbols to shims.
>
> Does my above analysis sound reasonable to you?
>

I had understood from your previous emails that musl would start dropping
glibc-abi-compatibility (potentially in general, not just for the
64-bit-time transition) of existing "undecorated" functions, and then
restore compatibility only in a shadowed version of that same function name
in libgcompat.so.

But yes -- just dropping symbols and triggering a link error seems totally
fine. My worry was mainly that there would be mysterious runtime bugs,
especially if a given function's ABI had previously been compatible, and
now becomes incompatible.

And again, I don't think it's a non-starter to make such a change, only
that if that is to happen, it should happen with deliberation and notice to
users.

> Rich
>
> > On Mon, Jul 22, 2019 at 8:53 AM Rich Felker wrote:
> >
> > > On Wed, Jul 17, 2019 at 02:16:51PM -0400, Rich Felker wrote:
> > > > On Wed, Jul 17, 2019 at 01:10:19PM -0500, A. Wilcox wrote:
> > > > > >> Just trying to make sure the community has a clear view of what this
> > > > > >> looks like before we jump in.
> > > > > >
> > > > > > Yes. This isn't a request to jump in, just looking at feasibility and
> > > > > > whether there'd be interest from your side. Being that ABI-compat
> > > > > > doesn't actually work very well without gcompat right now, though, I
> > > > > > think it might make sense. I'll continue to look at whether there are
> > > > > > other options, possibly just transitional, that might be good too.
> > > > > >
> > > > > I meant: I want a clear view of the boundaries between musl and gcompat,
> > > > > before we (Adélie / the gcompat team) jump in and start designing how we
> > > > > want to handle all the new symbols we may end up with :)
> > > >
> > > > If we go this route, I would think that gcompat could provide all
> > > > symbols which are not either public APIs (extensions you can
> > > > legitimately use in source) or musl-header-induced ABIs (for example
> > > > things like __ctype_get_mb_cur_max, which is used to define the
> > > > MB_CUR_MAX macro). This would include LFS64 as well as the "__xstat"
> > > > stuff, the other __ctype_* stuff, etc.
> > >
> > > I think I'd like to go forward with this. Further work on time64 has
> > > made it apparent to me that the current glibc ABI-compat we have
> > > inside musl is fragile and is imposing unwanted constraints on musl,
> > > which has long been one of the criteria for exclusion. In particular,
> > > consider this situation:
> > >
> > > Several structures that are part of public interfaces in musl were
> > > created with extra space reserved for future extension. In some cases
> > > the reserved space was added by musl; in other cases glibc had the
> > > same. However, if we mandate glibc ABI-compat, *all* of this reserved
> > > space is permanently unusable:
> > >
> > > - If the reserved space is specific to musl, then reads from it may
> > >   fault, and stores to it may clobber unrelated memory, if the
> > >   structure was allocated by glibc-linked code.
> > >
> > > - If the reserved space is present in both musl and glibc, we can't
> > >   make use of it without risking that glibc makes some different use
> > >   of it in the future, making calls from glibc-linked code dangerous.
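
The first hazard in the list above can be made concrete with a small C sketch. The struct names and field layouts below are hypothetical (they are not the real rusage or timex definitions from either libc); the only point is that a field living in one library's reserved tail lies beyond the end of an object allocated with the other library's smaller layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout as seen by the caller: no reserved tail. */
struct usage_small {
    long utime_sec;
    long utime_usec;
    long counters[4];
};

/* Hypothetical layout as seen by the callee: same prefix plus reserved space. */
struct usage_big {
    long utime_sec;
    long utime_usec;
    long counters[4];
    long long reserved[2];   /* candidate home for a 64-bit time value */
};

/* The shared prefix must line up, otherwise the comparison is meaningless. */
static_assert(offsetof(struct usage_small, counters) ==
              offsetof(struct usage_big, counters),
              "common prefix must have identical layout");

/* Returns 1 if touching `reserved` would run past an object that the
 * caller allocated as struct usage_small, 0 otherwise. */
int reserved_is_out_of_bounds(void)
{
    return offsetof(struct usage_big, reserved) >= sizeof(struct usage_small);
}
```

With these layouts `reserved_is_out_of_bounds()` returns 1: a callee that stores into `reserved` would write past whatever the caller actually allocated.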
> > >
> > > This came up in the context of structs rusage and timex, but also
> > > applies to stat, sched_param, sysinfo, statvfs, and perhaps others,
> > > which might have reason for wanting extensibility in the future.
> > >
> > > Right now, without the glibc ABI-compat constraint, getrusage, wait3,
> > > and wait4 can avoid new time64 remappings entirely (by using the
> > > reserved space we already have in rusage, which glibc doesn't have at
> > > all). [clock_]adjtime[x] hit the second case -- glibc also has
> > > reserved space in timex, but if they end up wanting to use it for
> > > something else and we've put the 64-bit time there, we may be in
> > > trouble.
> > >
> > > I don't think the rusage and timex issues here are compelling by
> > > themselves. It's not a big deal to make compat shims here, and I might
> > > still end up doing it. But I think it's indicative that maintaining
> > > glibc ABI-compat in musl is going to become increasingly problematic.
> > >
> > > So, what I'd (tentatively; for discussion) like to do:
> > >
> > > When ldso loads an application or shared library and detects that it's
> > > glibc-linked (DT_NEEDED for libc.so.6), it both loads a gcompat
> > > library instead *and* flags the dso as needing ABI-compat. The gcompat
> > > library would be permanently RTLD_LOCAL, unable to be used for
> > > resolving global symbols, since it would have to define symbols
> > > conflicting with libc symbol names and with future directions of the
> > > musl ABI.
> > >
> > > Symbol lookups when relocating such a flagged dso would take place by
> > > first processing gcompat (logically, adding it to the head of the dso
> > > search list), then the normal symbol search order. The gcompat library
> > > could also provide a replacement dlsym function, so that dlsym calls
> > > from the glibc-linked DSO also follow this order, and a replacement
> > > dlopen, so that dlopen of libc from the glibc-linked DSO would get the
> > > gcompat module.
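
The detection and lookup order proposed above could be modelled roughly as follows. This is a simplified sketch, not musl's actual loader code: `struct dso`, `struct sym`, `detect_compat` and `lookup` are hypothetical names, and the symbol tables are plain arrays rather than ELF hash tables. The compat module is deliberately kept off the global list, mirroring the "permanently RTLD_LOCAL" idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a loaded module. */
struct sym { const char *name; int value; };

struct dso {
    const char *soname;
    const struct sym *syms;      /* symbols this module defines */
    size_t nsyms;
    const char *const *needed;   /* names from its DT_NEEDED entries */
    size_t nneeded;
    int needs_compat;            /* set when the module is glibc-linked */
    struct dso *next;            /* normal global search order */
};

static const struct sym *find_in(const struct dso *d, const char *name)
{
    for (size_t i = 0; i < d->nsyms; i++)
        if (strcmp(d->syms[i].name, name) == 0)
            return &d->syms[i];
    return NULL;
}

/* Flag the module if any DT_NEEDED entry names glibc's libc.so.6. */
void detect_compat(struct dso *d)
{
    d->needs_compat = 0;
    for (size_t i = 0; i < d->nneeded; i++)
        if (strcmp(d->needed[i], "libc.so.6") == 0)
            d->needs_compat = 1;
}

/* Resolve `name` for a relocation in `requester`. Flagged modules consult
 * the compat module first; everyone then falls back to the normal list.
 * Unflagged modules never see the compat module's symbols. */
const struct sym *lookup(const struct dso *requester, const struct dso *compat,
                         const struct dso *head, const char *name)
{
    assert(name != NULL);
    if (requester->needs_compat && compat != NULL) {
        const struct sym *s = find_in(compat, name);
        if (s != NULL)
            return s;
    }
    for (const struct dso *d = head; d != NULL; d = d->next) {
        const struct sym *s = find_in(d, name);
        if (s != NULL)
            return s;
    }
    return NULL;
}
```

In this model a glibc-linked module asking for `regexec` gets the compat module's definition, while a natively linked module asking for the same name gets libc's, even though both share one global list.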
> > >
> > > I'm not sure what mechanism gcompat would then use to make its own
> > > references to the underlying real libc functions. This is something
> > > we'd need to think about.
> > >
> > > Before we decide to do it, please be aware that this would be a bit of
> > > a burden on gcompat to do more than it's doing now. But it would also
> > > make lots of cases work that fundamentally *can't* work now -- compat
> > > with 32-bit code using the legacy 32-bit off_t functions, compat with
> > > 64-bit code using regexec, etc. -- anywhere the musl ABI currently
> > > conflicts with the glibc ABI. Of course much of this is optional. The
> > > new things that would be mandatory would mainly be moving over
> > > existing glibc compat shims (like the __ctype and __xstat stuff) and
> > > implementing converting wrappers where musl's use of reserved space
> > > creates unsafety/incompatibility with the existing glibc code.
> > >
> > > Rich
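
For illustration, a "converting wrapper" of the kind mentioned above might look roughly like this. The two struct layouts and the function name `convert_to_legacy` are assumptions made up for this sketch, not definitions taken from musl, glibc, or gcompat. The idea is that the shim works on a native-sized object internally and copies field by field into the caller's smaller object, so nothing is ever written past what the caller allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical native layout: 64-bit seconds. */
struct native_tv { int64_t sec; int64_t usec; };

/* Hypothetical legacy layout expected by old callers: 32-bit seconds. */
struct legacy_tv { int32_t sec; int32_t usec; };

/* Copy a native value into the caller's legacy-sized object.
 * Returns 0 on success, or -1 (leaving *out untouched) if the value
 * does not fit in the legacy layout. */
int convert_to_legacy(const struct native_tv *in, struct legacy_tv *out)
{
    assert(in != NULL && out != NULL);
    if (in->sec < INT32_MIN || in->sec > INT32_MAX)
        return -1;
    if (in->usec < 0 || in->usec >= 1000000)
        return -1;
    out->sec = (int32_t)in->sec;
    out->usec = (int32_t)in->usec;
    return 0;
}
```

A real shim would call the underlying libc function with a `struct native_tv` on its own stack and then pass the result through a converter like this before returning to the glibc-linked caller.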