max_align_t mess on i386

mailing list of musl libc

* max_align_t mess on i386
@ 2019-12-14 15:19 Rich Felker
  2019-12-14 17:51 ` Florian Weimer
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Rich Felker @ 2019-12-14 15:19 UTC (permalink / raw)
  To: musl

In researching how much memory could be saved, and how practical it
would be, for the new malloc to align only to 8-byte boundaries
instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
much all 32-bit archs), I discovered that GCC quietly changed its
idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
accommodate the new _Float128 access via SSE. Presumably (I haven't
checked) the change is reflected with changes in the psABI document to
make it "official".

This is a somewhat ABI-breaking change (for example it would break ABI
struct layout for any 3rd party library struct using max_align_t to
align part of a public type), but GCC folks seem to have done the
research at the time to indicate there wasn't anything affected in
practice in known published code.

The big question now is: should we change musl's i386 max_align_t to
follow? One of the advantages of not using compiler-provided headers
is that we don't get this kind of silent ABI change happening out from
under us, or have ABI depend on whether you used GCC <=6 vs GCC >=7 to
compile (which is a rather awful property). But it also means we have
to make conscious decisions about following.

I was thinking of trying to make this decision in the next release
cycle (1.2.1) along with merging new malloc, so that we don't
potentially have a single release that drops i386 to 8-byte alignment
followed by one increasing it right back, and making further
combinatoric compat problems. But I realized just now that with time64
already being a hit to ABI-compat between pairs of libc consumers,
changing max_align_t at the same time, if we're going to do it, would
probably be better. FWIW I think this change is *far* less impactful
than time64 in terms of compat.

The disadvantage of changing max_align_t is that we shut out the
possibility of using 8-byte allocation granularity (on i386), which
looks like it could save something like 10-15% memory usage in typical
programs with small allocated objects (see also: Causes of Bloat,
Limits of Health paper[1]), but even up to 33% where the choice is
between 24 and 32 byte allocated slots for a 13-20 byte structure or
string (note: average tends to be half of max if requested sizes are
uniform, but at such small sizes they may tend to be non-uniform).
However, whatever we do with i386, the option of using 8-byte
granularity remains open for all the other 32-bit archs, most of which
tend to be used with machines far more memory-constrained than i386.

The disadvantage of leaving max_align_t alone is that we have to
(continue to) consider _Float128 an unsupported extension type whose
use would be outside the scope of any guarantees we make, and that
would need memalign to use. This is largely viable at present because
it's a fringe thing, but we don't know if that will continue to be
true far in the future.

I'm somewhat leaning towards following the ABI change, because (1) we
have a good opportunity to do it now that we won't get again, and (2)
I'm worried we'll eventually get into a mess by not doing it. But I
want to offer the opportunity for discussion before finalizing
anything, especially in case there are considerations I'm missing.

Rich

[1] https://ftp.barfooze.de/pub/sabotage/tmp/oopsla2007-bloat.pdf


* Re: max_align_t mess on i386
  2019-12-14 15:19 max_align_t mess on i386 Rich Felker
@ 2019-12-14 17:51 ` Florian Weimer
  2019-12-14 18:17   ` Rich Felker
  2019-12-15 18:04   ` Rich Felker
  2019-12-15  5:47 ` Markus Wichmann
  2019-12-15 18:06 ` Jeffrey Walton
  2 siblings, 2 replies; 21+ messages in thread
From: Florian Weimer @ 2019-12-14 17:51 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> However, whatever we do with i386, the option of using 8-byte
> granularity remains open for all the other 32-bit archs, most of which
> tend to be used with machines far more memory-constrained than i386.

Note that powerpc has a similar issue, but with long double:

  <https://sourceware.org/bugzilla/show_bug.cgi?id=6527>

But perhaps musl follows the old powerpc ABI, where double and long
double are both binary64 (I have not checked, sorry).



* Re: max_align_t mess on i386
  2019-12-14 17:51 ` Florian Weimer
@ 2019-12-14 18:17   ` Rich Felker
  2019-12-14 18:53     ` Daniel Kolesa
  2019-12-15 18:04   ` Rich Felker
  1 sibling, 1 reply; 21+ messages in thread
From: Rich Felker @ 2019-12-14 18:17 UTC (permalink / raw)
  To: musl

On Sat, Dec 14, 2019 at 06:51:50PM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > However, whatever we do with i386, the option of using 8-byte
> > granularity remains open for all the other 32-bit archs, most of which
> > tend to be used with machines far more memory-constrained than i386.
> 
> Note that powerpc has a similar issue, but with long double:
> 
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=6527>
> 
> But perhaps musl follows the old powerpc ABI, where double and long
> double are both binary64 (I have not checked, sorry).

We use the ld64 powerpc ABI. musl doesn't support non-IEEE-semantics
floating point types (stuff like IBM double-double) and quad was not
an option at the time, and if it's even supported now it's messy and
requires very recent tooling.

BTW I know someone from our community doing both musl and glibc stuff
on powerpc is actually interested in continuing to use the ld64 ABI
(with the old compat symbols) on glibc due to problems with
double-double support in applications.

Rich


* Re: max_align_t mess on i386
  2019-12-14 18:17   ` Rich Felker
@ 2019-12-14 18:53     ` Daniel Kolesa
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Kolesa @ 2019-12-14 18:53 UTC (permalink / raw)
  To: Rich Felker, musl

On 12/14/19 7:17 PM, Rich Felker wrote:
> On Sat, Dec 14, 2019 at 06:51:50PM +0100, Florian Weimer wrote:
>> * Rich Felker:
>>
>>> However, whatever we do with i386, the option of using 8-byte
>>> granularity remains open for all the other 32-bit archs, most of which
>>> tend to be used with machines far more memory-constrained than i386.
>> Note that powerpc has a similar issue, but with long double:
>>
>>    <https://sourceware.org/bugzilla/show_bug.cgi?id=6527>
>>
>> But perhaps musl follows the old powerpc ABI, where double and long
>> double are both binary64 (I have not checked, sorry).
> We use the ld64 powerpc ABI. musl doesn't support non-IEEE-semantics
> floating point types (stuff like IBM double-double) and quad was not
> an option at the time, and if it's even supported now it's messy and
> requires very recent tooling.
>
> BTW I know someone from our community doing both musl and glibc stuff
> on powerpc is actually interested in continuing to use the ld64 ABI
> (with the old compat symbols) on glibc due to problems with
> double-double support in applications.

Yes, that would be me. I've been looking into making my distribution use 
the old ld64 ABI for ppc (64le, 64, 32) but without much success. 
Technically, I did get it working for the most part, thanks to glibc doing
asm redirection, but there is still the edge case of people declaring 
math prototypes manually (which is allowed), which would result in the 
incorrect symbol being used, unless explicitly linked with 
-lnldbl_nonshared. So I put this effort on hold for the time being.

As far as I know, glibc is going to add support for IEEE754 binary128 
format (which distros like Fedora plan to use), which would require 
introduction of new symbol versions for stuff like math when built in 
that kind of configuration. However, this is only going to be available 
on platforms that support VSX (i.e. when built for POWER7 or better). 
I've been wondering if, while doing that, it would be possible to 
reintroduce support for the ld64 ABI in glibc, as in, the binary64 
symbols would have the same version as the binary128 ones.

Perhaps my thinking is wrong, but as I see it, it would mean no 
compatibility breakage then. Configurations using the IBM long double 
ABI would keep using their older versions, and configurations built to 
use the IEEE754 long double would use the newer versions, either 
binary128 (for VSX platforms) or binary64 (for the others). And that's 
what I would do as well; switch ppc64le+glibc to IEEE754 binary128, and 
have all musl variants plus ppc64 and ppc stick with binary64.

Florian, since you seem to be familiar with this, would you mind telling 
me if I'm wrong and if I am, why? And is there any chance of upstream 
glibc potentially accepting such a change?

Daniel

>
> Rich
>


* Re: max_align_t mess on i386
  2019-12-14 15:19 max_align_t mess on i386 Rich Felker
  2019-12-14 17:51 ` Florian Weimer
@ 2019-12-15  5:47 ` Markus Wichmann
  2019-12-15 18:06 ` Jeffrey Walton
  2 siblings, 0 replies; 21+ messages in thread
From: Markus Wichmann @ 2019-12-15  5:47 UTC (permalink / raw)
  To: musl

On Sat, Dec 14, 2019 at 10:19:32AM -0500, Rich Felker wrote:
> The disadvantage of leaving max_align_t alone is that we have to
> (continue to) consider _Float128 an unsupported extension type whose
> use would be outside the scope of any guarantees we make, and that
> would need memalign to use. This is largely viable at present because
> it's a fringe thing, but we don't know if that will continue to be
> true far in the future.
>

It wouldn't just be that. Any application making use of SSE vector types
would have to use *memalign(). Apparently, there are libraries out there
that expect to get a 16byte alignment out of malloc(), or at least
that's what the author of dietlibc is alleging here:

https://blog.fefe.de/?ts=bac7bb06

Yes, it's German, but Google Translate exists. More importantly though,
it is from 2006, and he says he's "hacking about with" a bignum library,
and I don't know if he means his own or a public one. In any case,
though, the mere existence of SSE was cause enough for that man to
change the allocator to return a higher alignment on x86. Maybe one more
factor leaning towards the ABI change, right?

> Rich

Ciao,
Markus


* Re: max_align_t mess on i386
  2019-12-14 17:51 ` Florian Weimer
  2019-12-14 18:17   ` Rich Felker
@ 2019-12-15 18:04   ` Rich Felker
  1 sibling, 0 replies; 21+ messages in thread
From: Rich Felker @ 2019-12-15 18:04 UTC (permalink / raw)
  To: musl

On Sat, Dec 14, 2019 at 06:51:50PM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > However, whatever we do with i386, the option of using 8-byte
> > granularity remains open for all the other 32-bit archs, most of which
> > tend to be used with machines far more memory-constrained than i386.
> 
> Note that powerpc has a similar issue, but with long double:
> 
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=6527>
> 
> But perhaps musl follows the old powerpc ABI, where double and long
> double are both binary64 (I have not checked, sorry).

One thing we should consider though: since presumably the psABI has
max_align_t as 16-byte alignment on powerpc now, if we increase i386
should we also increase powerpc? Even though there's no type actually
depending on it? This applies to powerpc64 too, I think, which is
an arch not being affected by time64 change.

Rich



* Re: max_align_t mess on i386
  2019-12-14 15:19 max_align_t mess on i386 Rich Felker
  2019-12-14 17:51 ` Florian Weimer
  2019-12-15  5:47 ` Markus Wichmann
@ 2019-12-15 18:06 ` Jeffrey Walton
  2019-12-15 18:22   ` Rich Felker
  2019-12-15 18:23   ` Joakim Sindholt
  2 siblings, 2 replies; 21+ messages in thread
From: Jeffrey Walton @ 2019-12-15 18:06 UTC (permalink / raw)
  To: musl

On Sat, Dec 14, 2019 at 10:19 AM Rich Felker <dalias@libc.org> wrote:
>
> In researching how much memory could be saved, and how practical it
> would be, for the new malloc to align only to 8-byte boundaries
> instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
> much all 32-bit archs), I discovered that GCC quietly changed its
> idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
> accommodate the new _Float128 access via SSE. Presumably (I haven't
> checked) the change is reflected with changes in the psABI document to
> make it "official".

Be careful with policy changes like this. The malloc (3) man page says:

    The malloc() and calloc() functions return a pointer to the
    allocated memory that is suitably aligned for any kind of variable.

I expect to be able to use a pointer returned by malloc (and friends)
in MMX, SSE and AVX functions.

Jeff



* Re: max_align_t mess on i386
  2019-12-15 18:06 ` Jeffrey Walton
@ 2019-12-15 18:22   ` Rich Felker
  2019-12-16 15:30     ` Jeffrey Walton
  2019-12-15 18:23   ` Joakim Sindholt
  1 sibling, 1 reply; 21+ messages in thread
From: Rich Felker @ 2019-12-15 18:22 UTC (permalink / raw)
  To: musl

On Sun, Dec 15, 2019 at 01:06:29PM -0500, Jeffrey Walton wrote:
> On Sat, Dec 14, 2019 at 10:19 AM Rich Felker <dalias@libc.org> wrote:
> >
> > In researching how much memory could be saved, and how practical it
> > would be, for the new malloc to align only to 8-byte boundaries
> > instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
> > much all 32-bit archs), I discovered that GCC quietly changed its
> > idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
> > accommodate the new _Float128 access via SSE. Presumably (I haven't
> > checked) the change is reflected with changes in the psABI document to
> > make it "official".
> 
> Be careful with policy changes like this. The malloc (3) man page says:

Generally, you should look to the C11 or POSIX (man 3p) specifications
for the functions rather than the "man 3" ones, but here it's pretty
close to the same, just imprecisely worded:

>     The malloc() and calloc() functions return a pointer to the
>     allocated memory that is suitably aligned for any kind of variable.
> 
> I expect to be able to use a pointer returned by malloc (and friends)
> in MMX, SSE and AVX functions.

"Any kind of variable" isn't "any kind of load/store instruction". For
example you most certainly will not get 32- or 64-byte alignment that
you may want for AVX-256 or AVX-512 without memalign. A max_align_t
(and corresponding malloc alignment constraint) that heavily aligned
would be awful to use, with memory waste possibly exceeding 1000% and
over 500% likely for real-world data structures. Over-alignment also
weakens hardening properties by making pointers more predictable.

Rich



* Re: max_align_t mess on i386
  2019-12-15 18:06 ` Jeffrey Walton
  2019-12-15 18:22   ` Rich Felker
@ 2019-12-15 18:23   ` Joakim Sindholt
  2019-12-15 18:51     ` Rich Felker
  1 sibling, 1 reply; 21+ messages in thread
From: Joakim Sindholt @ 2019-12-15 18:23 UTC (permalink / raw)
  To: musl

On Sun, Dec 15, 2019 at 01:06:29PM -0500, Jeffrey Walton wrote:
> On Sat, Dec 14, 2019 at 10:19 AM Rich Felker <dalias@libc.org> wrote:
> >
> > In researching how much memory could be saved, and how practical it
> > would be, for the new malloc to align only to 8-byte boundaries
> > instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
> > much all 32-bit archs), I discovered that GCC quietly changed its
> > idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
> > accommodate the new _Float128 access via SSE. Presumably (I haven't
> > checked) the change is reflected with changes in the psABI document to
> > make it "official".
> 
> Be careful with policy changes like this. The malloc (3) man page says:
> 
>     The malloc() and calloc() functions return a pointer to the
>     allocated memory that is suitably aligned for any kind of variable.

Your man pages are not the standard, but the standard does have this to
say:
> The pointer returned if the allocation succeeds shall be suitably
> aligned so that it may be assigned to a pointer to any type of object
> and then used to access such an object in the space allocated (until the
> space is explicitly freed or reallocated).

To me this sounds like my next suggestion is technically disallowed.

> I expect to be able to use a pointer returned by malloc (and friends)
> in MMX, SSE and AVX functions.

I might agree, but would it not be feasible to have the alignment of the
returned pointer be dependent on the size of the allocation? That way,
if you allocate <16 bytes you can get 8 byte alignment. You might even
be able to go all the way down to 4 byte alignment for <8 byte
allocations.
It might violate the standard technically speaking, but I don't know of
any examples of types smaller than 16 bytes that require 16 byte
alignment.



* Re: max_align_t mess on i386
  2019-12-15 18:23   ` Joakim Sindholt
@ 2019-12-15 18:51     ` Rich Felker
  2019-12-15 20:03       ` Alexander Monakov
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2019-12-15 18:51 UTC (permalink / raw)
  To: musl

On Sun, Dec 15, 2019 at 07:23:14PM +0100, Joakim Sindholt wrote:
> On Sun, Dec 15, 2019 at 01:06:29PM -0500, Jeffrey Walton wrote:
> > On Sat, Dec 14, 2019 at 10:19 AM Rich Felker <dalias@libc.org> wrote:
> > >
> > > In researching how much memory could be saved, and how practical it
> > > would be, for the new malloc to align only to 8-byte boundaries
> > > instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
> > > much all 32-bit archs), I discovered that GCC quietly changed its
> > > idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
> > > accommodate the new _Float128 access via SSE. Presumably (I haven't
> > > checked) the change is reflected with changes in the psABI document to
> > > make it "official".
> > 
> > Be careful with policy changes like this. The malloc (3) man page says:
> > 
> >     The malloc() and calloc() functions return a pointer to the
> >     allocated memory that is suitably aligned for any kind of variable.
> 
> Your man pages are not the standard, but the standard does have this to
> say:
> > The pointer returned if the allocation succeeds shall be suitably
> > aligned so that it may be assigned to a pointer to any type of object
> > and then used to access such an object in the space allocated (until the
> > space is explicitly freed or reallocated).
> 
> To me this sounds like my next suggestion is technically disallowed.
> 
> > I expect to be able to use a pointer returned by malloc (and friends)
> > in MMX, SSE and AVX functions.
> 
> I might agree, but would it not be feasible to have the alignment of the
> returned pointer be dependent on the size of the allocation? That way,
> if you allocate <16 bytes you can get 8 byte alignment. You might even
> be able to go all the way down to 4 byte alignment for <8 byte
> allocations.

This is a nice idea and the bump allocator (simple_malloc) in musl for
static-linked programs that don't use free does pretty much exactly
that. With a nontrivial allocator it gets more complicated though, and
I don't think there's any way to take advantage of this with the new
malloc.

For example, in the new allocator with 4-byte inband slot headers,
16-byte slots don't need 16-byte alignment because the largest object
they can hold is 12 bytes, and the largest alignment such an object
can need is 8-byte. However, since they're spaced 16 bytes apart,
there's no advantage to being able to misalign them mod 16; as long as
the first one in a run is aligned, all of them are.

The same would apply if we had 8-byte slots, but those are mostly
uninteresting with 4 bytes taken for headers.

Taking advantage of it with dlmalloc-type designs that don't involve
evenly-spaced slots is perhaps more practical, but can lead to messy
split/merge since the small underaligned chunks aren't starting on
valid boundaries to merge with adjacent free chunks. I think they'll
tend to eventually get tied up as unusable space at the bottom of
adjacent chunks, unnecessarily limiting the size of the allocations
just below them.

> It might violate the standard technically speaking, but I don't know of
> any examples of types smaller than 16 bytes that require 16 byte
> alignment.

It doesn't since no object can have size smaller than its alignment.
(As long as pointer types aren't lossy; if some pointer types lost low
bits, then it would be non-conforming.)

Rich


* Re: max_align_t mess on i386
  2019-12-15 18:51     ` Rich Felker
@ 2019-12-15 20:03       ` Alexander Monakov
  2019-12-15 20:50         ` Szabolcs Nagy
  2019-12-15 21:51         ` Jeffrey Walton
  0 siblings, 2 replies; 21+ messages in thread
From: Alexander Monakov @ 2019-12-15 20:03 UTC (permalink / raw)
  To: musl

On Sun, 15 Dec 2019, Rich Felker wrote:

> > It might violate the standard technically speaking, but I don't know of
> > any examples of types smaller than 16 bytes that require 16 byte
> > alignment.
> 
> It doesn't since no object can have size smaller than its alignment.
> (As long as pointer types aren't lossy; if some pointer types lost low
> bits, then it would be non-conforming.)

Yeah. I believe one usual concern is whether low bits may be expected to be
zero in case one wants to carry a couple of bits along with the pointer.

On one hand, C doesn't say what it means for an arbitrary pointer to be
suitably aligned for a particular type. On the other hand, in practice
everyone assumes that it means that its value is divisible by alignment,
and so on platforms with _Alignof(max_align_t) == 16, it means that low 4 bits
of any address returned from malloc (including those with tiny allocated
storage) will be zero.  Which makes those bit positions available for flags
associated with the pointer, if you can arrange for them to be masked off
to use the pointer itself.

(in principle a compiler could transform a program like that too, and unlike
a programmer the compiler knows exactly what it means for a pointer to be
aligned)

So if such use is accepted as valid, malloc needs to ensure alignment despite
a small allocation size.

Alexander


* Re: max_align_t mess on i386
  2019-12-15 20:03       ` Alexander Monakov
@ 2019-12-15 20:50         ` Szabolcs Nagy
  2019-12-15 21:51         ` Jeffrey Walton
  1 sibling, 0 replies; 21+ messages in thread
From: Szabolcs Nagy @ 2019-12-15 20:50 UTC (permalink / raw)
  To: musl

* Alexander Monakov <amonakov@ispras.ru> [2019-12-15 23:03:08 +0300]:
> On Sun, 15 Dec 2019, Rich Felker wrote:
> 
> > > It might violate the standard technically speaking, but I don't know of
> > > any examples of types smaller than 16 bytes that require 16 byte
> > > alignment.
> > 
> > It doesn't since no object can have size smaller than its alignment.
> > (As long as pointer types aren't lossy; if some pointer types lost low
> > bits, then it would be non-conforming.)
> 
> Yeah. I believe one usual concern is whether low bits may be expected to be
> zero in case one wants to carry a couple of bits along with the pointer.
> 
> On one hand, C doesn't say what it means for an arbitrary pointer to be
> suitably aligned for a particular type. On the other hand, in practice
> everyone assumes that it means that its value is divisible by alignment,
> and so on platforms with _Alignof(max_align_t) == 16, it means that low 4 bits
> of any address returned from malloc (including those with tiny allocated
> storage) will be zero.  Which makes those bit positions available for flags
> associated with the pointer, if you can arrange for them to be masked off
> to use the pointer itself.
> 
> (in principle a compiler could transform a program like that too, and unlike
> a programmer the compiler knows exactly what it means for a pointer to be
> aligned)
> 
> So if such use is accepted as valid, malloc needs to ensure alignment despite
> a small allocation size.

i think iso c is unclear, but that will change in c2x
which allows small alignment for small objects

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2293.htm



* Re: max_align_t mess on i386
  2019-12-15 20:03       ` Alexander Monakov
  2019-12-15 20:50         ` Szabolcs Nagy
@ 2019-12-15 21:51         ` Jeffrey Walton
  1 sibling, 0 replies; 21+ messages in thread
From: Jeffrey Walton @ 2019-12-15 21:51 UTC (permalink / raw)
  To: musl

On Sun, Dec 15, 2019 at 3:03 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
> ...
> [SNIP] Which makes those bit positions available for flags
> associated with the pointer, if you can arrange for them to be masked off
> to use the pointer itself.

Be careful of those tricks. I believe they are called Tagged Pointers.

Aarch64 was doing it for a while. It caused a lot of problems in
practice. It was breaking diagnostic tools. It was also holding up the
porting of some libraries.

See, for example,
https://releases.llvm.org/6.0.0/tools/clang/docs/HardwareAssistedAddressSanitizerDesign.html
and https://bugzilla.mozilla.org/show_bug.cgi?id=1143022 .

Jeff


* Re: max_align_t mess on i386
  2019-12-15 18:22   ` Rich Felker
@ 2019-12-16 15:30     ` Jeffrey Walton
  2019-12-16 15:56       ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Jeffrey Walton @ 2019-12-16 15:30 UTC (permalink / raw)
  To: musl

On Sun, Dec 15, 2019 at 1:22 PM Rich Felker <dalias@libc.org> wrote:
>
> On Sun, Dec 15, 2019 at 01:06:29PM -0500, Jeffrey Walton wrote:
> > On Sat, Dec 14, 2019 at 10:19 AM Rich Felker <dalias@libc.org> wrote:
> > >
> > > In researching how much memory could be saved, and how practical it
> > > would be, for the new malloc to align only to 8-byte boundaries
> > > instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
> > > much all 32-bit archs), I discovered that GCC quietly changed its
> > > idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
> > > accommodate the new _Float128 access via SSE. Presumably (I haven't
> > > checked) the change is reflected with changes in the psABI document to
> > > make it "official".
> >
> > Be careful with policy changes like this. The malloc (3) man page says:
>
> Generally, you should look to the C11 or POSIX (man 3p) specifications
> for the functions rather than the "man 3" ones, but here it's pretty
> close to the same, just imprecisely worded:
>
> >     The malloc() and calloc() functions return a pointer to the
> >     allocated memory that is suitably aligned for any kind of variable.
> >
> > I expect to be able to use a pointer returned by malloc (and friends)
> > in MMX, SSE and AVX functions.
>
> "Any kind of variable" isn't "any kind of load/store instruction". For
> example you most certainly will not get 32- or 64-byte alignment that
> you may want for AVX-256 or AVX-512 without memalign.

GCC tells us the largest alignment that we can expect:

    $ gcc -dM -E - </dev/null | grep -i align
    #define __BIGGEST_ALIGNMENT__ 16

Because __BIGGEST_ALIGNMENT__ is 16, I don't expect to get 32-byte or
64-byte aligned buffers.

> A max_align_t
> (and corresponding malloc alignment constraint) that heavily aligned
> would be awful to use, with memory waste possibly exceeding 1000% and
> over 500% likely for real-world data structures. Over-alignment also
> weakens hardening properties by making pointers more predictable.

It sounds like you are moving the fragmentation problem from the
runtime library to the application. (When fragmentation is a problem).

Jeff



* Re: max_align_t mess on i386
  2019-12-16 15:30     ` Jeffrey Walton
@ 2019-12-16 15:56       ` Rich Felker
  2019-12-16 16:36         ` Jeffrey Walton
  2019-12-16 16:40         ` Florian Weimer
  0 siblings, 2 replies; 21+ messages in thread
From: Rich Felker @ 2019-12-16 15:56 UTC (permalink / raw)
  To: musl

On Mon, Dec 16, 2019 at 10:30:30AM -0500, Jeffrey Walton wrote:
> On Sun, Dec 15, 2019 at 1:22 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Sun, Dec 15, 2019 at 01:06:29PM -0500, Jeffrey Walton wrote:
> > > On Sat, Dec 14, 2019 at 10:19 AM Rich Felker <dalias@libc.org> wrote:
> > > >
> > > > In researching how much memory could be saved, and how practical it
> > > > would be, for the new malloc to align only to 8-byte boundaries
> > > > instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
> > > > much all 32-bit archs), I discovered that GCC quietly changed its
> > > > idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
> > > > accommodate the new _Float128 access via SSE. Presumably (I haven't
> > > > checked) the change is reflected with changes in the psABI document to
> > > > make it "official".
> > >
> > > Be careful with policy changes like this. The malloc (3) man page says:
> >
> > Generally, you should look to the C11 or POSIX (man 3p) specifications
> > for the functions rather than the "man 3" ones, but here it's pretty
> > close to the same, just imprecisely worded:
> >
> > >     The malloc() and calloc() functions return a pointer to the
> > >     allocated memory that is suitably aligned for any kind of variable.
> > >
> > > I expect to be able to use a pointer returned by malloc (and friends)
> > > in MMX, SSE and AVX functions.
> >
> > "Any kind of variable" isn't "any kind of load/store instruction". For
> > example you most certainly will not get 32- or 64-byte alignment that
> > you may want for AVX-256 or AVX-512 without memalign.
> 
> GCC tells us the largest alignment that we can expect:
> 
>     $ gcc -dM -E - </dev/null | grep -i align
>     #define __BIGGEST_ALIGNMENT__ 16
> 
> Because __BIGGEST_ALIGNMENT__ is 16, I don't expect to get 32-byte or
> 64-byte aligned buffers.

I wasn't aware of this gcc feature. Do you know if it's documented and
what it's derived from? It seems to match what max_align_t is expected
to be, including on i386 (16) and powerpc (16) and indeed it's only 4
on a few 32-bit archs and even 2 on m68k.

> > A max_align_t
> > (and corresponding malloc alignment constraint) that heavily aligned
> > would be awful to use, with memory waste possibly exceeding 1000% and
> > over 500% likely for real-world data structures. Over-alignment also
> > weakens hardening properties by making pointers more predictable.
> 
> It sounds like you are moving the fragmentation problem from the
> runtime library to the application. (When fragmentation is a problem).

I don't understand what you mean.

Rich



* Re: max_align_t mess on i386
  2019-12-16 15:56       ` Rich Felker
@ 2019-12-16 16:36         ` Jeffrey Walton
  2019-12-16 17:49           ` Rich Felker
  2019-12-16 16:40         ` Florian Weimer
  1 sibling, 1 reply; 21+ messages in thread
From: Jeffrey Walton @ 2019-12-16 16:36 UTC (permalink / raw)
  To: musl

On Mon, Dec 16, 2019 at 10:56 AM Rich Felker <dalias@libc.org> wrote:
>
> On Mon, Dec 16, 2019 at 10:30:30AM -0500, Jeffrey Walton wrote:
> > On Sun, Dec 15, 2019 at 1:22 PM Rich Felker <dalias@libc.org> wrote:
> > >
> > > On Sun, Dec 15, 2019 at 01:06:29PM -0500, Jeffrey Walton wrote:
> > > > On Sat, Dec 14, 2019 at 10:19 AM Rich Felker <dalias@libc.org> wrote:
> > > > >
> > > > > > In researching how much memory could be saved, and how practical it
> > > > > > would be, for the new malloc to align only to 8-byte boundaries
> > > > > > instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
> > > > > > much all 32-bit archs), I discovered that GCC quietly changed its
> > > > > > idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
> > > > > accommodate the new _Float128 access via SSE. Presumably (I haven't
> > > > > checked) the change is reflected with changes in the psABI document to
> > > > > make it "official".
> > > >
> > > > Be careful with policy changes like this. The malloc (3) man page says:
> > >
> > > Generally, you should look to the C11 or POSIX (man 3p) specifications
> > > for the functions rather than the "man 3" ones, but here it's pretty
> > > close to the same, just imprecisely worded:
> > >
> > > >     The malloc() and calloc() functions return a pointer to the
> > > >     allocated memory that is suitably aligned for any kind of variable.
> > > >
> > > > I expect to be able to use a pointer returned by malloc (and friends)
> > > > in MMX, SSE and AVX functions.
> > >
> > > "Any kind of variable" isn't "any kind of load/store instruction". For
> > > example you most certainly will not get 32- or 64-byte alignment that
> > > you may want for AVX-256 or AVX-512 without memalign.
> >
> > GCC tells us the largest alignment that we can expect:
> >
> >     $ gcc -dM -E - </dev/null | grep -i align
> >     #define __BIGGEST_ALIGNMENT__ 16
> >
> > Because __BIGGEST_ALIGNMENT__ is 16, I don't expect to get 32-byte or
> > 64-byte aligned buffers.
>
> I wasn't aware of this gcc feature. Do you know if it's documented and
> what it's derived from? It seems to match what max_align_t is expected
> to be, including on i386 (16) and powerpc (16) and indeed it's only 4
> on a few 32-bit archs and even 2 on m68k.

I believe it is documented at
https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html.

The linker problem discussed in the same area has bitten me several
times in the past. It usually arises on 32-bit systems. But PowerPC
also got me when using AIX.

> > > A max_align_t
> > > (and corresponding malloc alignment constraint) that heavily aligned
> > > would be awful to use, with memory waste possibly exceeding 1000% and
> > > over 500% likely for real-world data structures. Over-alignment also
> > > weakens hardening properties by making pointers more predictable.
> >
> > It sounds like you are moving the fragmentation problem from the
> > runtime library to the application. (When fragmentation is a problem).
>
> I don't understand what you mean.

When we can't get properly aligned buffers in userland, then we
(userland) have to over-allocate in our allocators and play pointer
games. For example, if I can only get 8-byte aligned pointers, then I
always have to allocate n+16 bytes, move the pointer 'p' to the right
for a 16-byte alignment, and store the offset at p-1 so I can recover
the base pointer on delete/free.
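
For readers unfamiliar with the idiom, a rough sketch of it follows. The
names overalloc16 and overfree16 are hypothetical, and the code only
illustrates the description above; it is not taken from any particular
library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return n usable bytes at a 16-byte aligned address. */
static void *overalloc16(size_t n)
{
    if (n > SIZE_MAX - 16)
        return NULL;                      /* n + 16 would overflow */

    unsigned char *base = malloc(n + 16);
    if (!base)
        return NULL;

    /* Move right by 1..16 bytes so there is always room for the offset byte. */
    size_t off = 16 - ((uintptr_t)base & 15);
    unsigned char *p = base + off;
    p[-1] = (unsigned char)off;           /* remember how far we moved */
    return p;
}

/* Hypothetical counterpart: plain free() must NOT be used on these pointers. */
static void overfree16(void *ptr)
{
    if (!ptr)
        return;
    unsigned char *p = ptr;
    assert(p[-1] >= 1 && p[-1] <= 16);    /* sanity check on the stored offset */
    free(p - p[-1]);                      /* recover the base pointer */
}
```

The obvious cost is that every such pointer has to be released through
the matching helper rather than free().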

Those kinds of pointer games are usually played out in the runtime
library. I can only say "usually" and not "always" because we have to
do them on AIX and GNU Hurd (among others).

Jeff



* Re: max_align_t mess on i386
  2019-12-16 15:56       ` Rich Felker
  2019-12-16 16:36         ` Jeffrey Walton
@ 2019-12-16 16:40         ` Florian Weimer
  2019-12-16 17:45           ` Rich Felker
  1 sibling, 1 reply; 21+ messages in thread
From: Florian Weimer @ 2019-12-16 16:40 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> I wasn't aware of this gcc feature. Do you know if it's documented and
> what it's derived from? It seems to match what max_align_t is expected
> to be, including on i386 (16) and powerpc (16) and indeed it's only 4
> on a few 32-bit archs and even 2 on m68k.

@defmac BIGGEST_ALIGNMENT
Biggest alignment that any data type can require on this machine, in
bits.  Note that this is not the biggest alignment that is supported,
just the biggest alignment that, when violated, may cause a fault.
@end defmac

I don't think it does what you are after:

$ gcc -mavx512f -dM -E - </dev/null | grep -i align
#define __BIGGEST_ALIGNMENT__ 64

I suspect this is closer:

@defmac MALLOC_ABI_ALIGNMENT
Alignment, in bits, a C conformant malloc implementation has to
provide.  If not defined, the default value is @code{BITS_PER_WORD}.
@end defmac

I think this is what GCC uses internally in its optimizers.  I don't
think it's exposed directly.
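
Since the internal value is not exposed, one way to see what a given
malloc does in practice is to sample some small allocations and look at
the low address bits. The sketch below does that. The function name
observed_malloc_alignment and the MAX_SAMPLES limit are made up for the
example, and the result is only an observation about the running
allocator, not a guarantee from the compiler or the ABI.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_SAMPLES 256  /* arbitrary upper bound chosen for this sketch */

/*
 * Allocate `samples` blocks of sizes 1..samples and return the largest
 * power of two that divides every returned address (0 if all mallocs fail).
 */
static size_t observed_malloc_alignment(size_t samples)
{
    void *ptrs[MAX_SAMPLES];
    uintptr_t bits = 0;

    assert(samples > 0 && samples <= MAX_SAMPLES);

    for (size_t i = 0; i < samples; i++) {
        ptrs[i] = malloc(i + 1);
        if (ptrs[i])
            bits |= (uintptr_t)ptrs[i];   /* accumulate any set low bits */
    }
    for (size_t i = 0; i < samples; i++)
        free(ptrs[i]);

    /* The lowest set bit of the OR is the common alignment of all samples. */
    return bits ? (size_t)(bits & (~bits + 1)) : 0;
}
```

A result larger than the true guarantee is possible if the sampled
allocations all happen to be more aligned than required.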


* Re: max_align_t mess on i386
  2019-12-16 16:40         ` Florian Weimer
@ 2019-12-16 17:45           ` Rich Felker
  2019-12-16 17:49             ` Florian Weimer
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2019-12-16 17:45 UTC (permalink / raw)
  To: musl

On Mon, Dec 16, 2019 at 05:40:50PM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > I wasn't aware of this gcc feature. Do you know if it's documented and
> > what it's derived from? It seems to match what max_align_t is expected
> > to be, including on i386 (16) and powerpc (16) and indeed it's only 4
> > on a few 32-bit archs and even 2 on m68k.
> 
> @defmac BIGGEST_ALIGNMENT
> Biggest alignment that any data type can require on this machine, in
> bits.  Note that this is not the biggest alignment that is supported,
> just the biggest alignment that, when violated, may cause a fault.
> @end defmac
> 
> I don't think it does what you are after:
> 
> $ gcc -mavx512f -dM -E - </dev/null | grep -i align
> #define __BIGGEST_ALIGNMENT__ 64

Indeed. Thanks.

> I suspect this is closer:
> 
> @defmac MALLOC_ABI_ALIGNMENT
> Alignment, in bits, a C conformant malloc implementation has to
> provide.  If not defined, the default value is @code{BITS_PER_WORD}.
> @end defmac

The latter looks buggy. It's clearly supposed to be in bits, not
bytes, with some archs defining it as 64 or 128 and:

gcc/defaults.h:#ifndef MALLOC_ABI_ALIGNMENT
gcc/defaults.h:#define MALLOC_ABI_ALIGNMENT BITS_PER_WORD

However arm has:

gcc/config/arm/arm.h:#define MALLOC_ABI_ALIGNMENT  BIGGEST_ALIGNMENT

which is in bytes...

> I think this is what GCC uses internally in its optimizers.  I don't
> think it's exposed directly.

No problem; I don't think we want to derive it dynamically from
something compiler-dependent. I was just looking for a way to check
what GCC's expectations are and this seems reasonable.

Rich



* Re: max_align_t mess on i386
  2019-12-16 17:45           ` Rich Felker
@ 2019-12-16 17:49             ` Florian Weimer
  2019-12-16 17:51               ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Florian Weimer @ 2019-12-16 17:49 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> The latter looks buggy. It's clearly supposed to be in bits, not
> bytes, with some archs defining it as 64 or 128 and:
>
> gcc/defaults.h:#ifndef MALLOC_ABI_ALIGNMENT
> gcc/defaults.h:#define MALLOC_ABI_ALIGNMENT BITS_PER_WORD
>
> However arm has:
>
> gcc/config/arm/arm.h:#define MALLOC_ABI_ALIGNMENT  BIGGEST_ALIGNMENT
>
> which is in bytes...

The target hook is in bits.  The macro synthesized from that is in
bytes:

  cpp_define_formatted (pfile, "__BIGGEST_ALIGNMENT__=%d",
                        BIGGEST_ALIGNMENT / BITS_PER_UNIT);
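
The same conversion can be written as a tiny helper. This is only an
illustration: align_bits_to_bytes is a hypothetical name, and CHAR_BIT
stands in for GCC's internal BITS_PER_UNIT, which is an assumption that
holds on the usual targets where a byte is 8 bits.

```c
#include <assert.h>
#include <limits.h>

/* Convert an alignment expressed in bits (target hook) to bytes (macro). */
static unsigned align_bits_to_bytes(unsigned bits)
{
    assert(bits % CHAR_BIT == 0);   /* alignments are whole numbers of bytes */
    return bits / CHAR_BIT;
}
```

With 8-bit bytes, a hook value of 128 bits corresponds to the 16 seen in
__BIGGEST_ALIGNMENT__ above.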



* Re: max_align_t mess on i386
  2019-12-16 16:36         ` Jeffrey Walton
@ 2019-12-16 17:49           ` Rich Felker
  0 siblings, 0 replies; 21+ messages in thread
From: Rich Felker @ 2019-12-16 17:49 UTC (permalink / raw)
  To: musl

On Mon, Dec 16, 2019 at 11:36:42AM -0500, Jeffrey Walton wrote:
> On Mon, Dec 16, 2019 at 10:56 AM Rich Felker <dalias@libc.org> wrote:
> >
> > On Mon, Dec 16, 2019 at 10:30:30AM -0500, Jeffrey Walton wrote:
> > > On Sun, Dec 15, 2019 at 1:22 PM Rich Felker <dalias@libc.org> wrote:
> > > >
> > > > On Sun, Dec 15, 2019 at 01:06:29PM -0500, Jeffrey Walton wrote:
> > > > > On Sat, Dec 14, 2019 at 10:19 AM Rich Felker <dalias@libc.org> wrote:
> > > > > >
> > > > > > > In researching how much memory could be saved, and how practical it
> > > > > > > would be, for the new malloc to align only to 8-byte boundaries
> > > > > > > instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
> > > > > > > much all 32-bit archs), I discovered that GCC quietly changed its
> > > > > > > idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
> > > > > > accommodate the new _Float128 access via SSE. Presumably (I haven't
> > > > > > checked) the change is reflected with changes in the psABI document to
> > > > > > make it "official".
> > > > >
> > > > > Be careful with policy changes like this. The malloc (3) man page says:
> > > >
> > > > Generally, you should look to the C11 or POSIX (man 3p) specifications
> > > > for the functions rather than the "man 3" ones, but here it's pretty
> > > > close to the same, just imprecisely worded:
> > > >
> > > > >     The malloc() and calloc() functions return a pointer to the
> > > > >     allocated memory that is suitably aligned for any kind of variable.
> > > > >
> > > > > I expect to be able to use a pointer returned by malloc (and friends)
> > > > > in MMX, SSE and AVX functions.
> > > >
> > > > "Any kind of variable" isn't "any kind of load/store instruction". For
> > > > example you most certainly will not get 32- or 64-byte alignment that
> > > > you may want for AVX-256 or AVX-512 without memalign.
> > >
> > > GCC tells us the largest alignment that we can expect:
> > >
> > >     $ gcc -dM -E - </dev/null | grep -i align
> > >     #define __BIGGEST_ALIGNMENT__ 16
> > >
> > > Because __BIGGEST_ALIGNMENT__ is 16, I don't expect to get 32-byte or
> > > 64-byte aligned buffers.
> >
> > I wasn't aware of this gcc feature. Do you know if it's documented and
> > what it's derived from? It seems to match what max_align_t is expected
> > to be, including on i386 (16) and powerpc (16) and indeed it's only 4
> > on a few 32-bit archs and even 2 on m68k.
> 
> I believe it is documented at
> https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html.
> 
> The linker problem discussed in the same area has bitten me several
> times in the past. It usually arises on 32-bit systems. But PowerPC
> also got me when using AIX.
> 
> > > > A max_align_t
> > > > (and corresponding malloc alignment constraint) that heavily aligned
> > > > would be awful to use, with memory waste possibly exceeding 1000% and
> > > > over 500% likely for real-world data structures. Over-alignment also
> > > > weakens hardening properties by making pointers more predictable.
> > >
> > > It sounds like you are moving the fragmentation problem from the
> > > runtime library to the application. (When fragmentation is a problem).
> >
> > I don't understand what you mean.
> 
> When we can't get properly aligned buffers in userland, then we
> (userland) have to over-commit in our allocators and play the pointer
> games. For example, if I can only get 8-byte aligned pointers, then I
> always have to allocate n+16 bytes, move the pointer 'p' to the right
> for a 16 byte alignment, and store the offset at p-1 so I can delete
> the base pointer on delete/free.

You absolutely should never do this. Pretty much all historical
unix-like systems had (and still have) memalign, POSIX has
posix_memalign with an awkward and error-prone signature (but it's
easy enough to wrap), and C11+ has aligned_alloc. This "over-allocate
and adjust such that it's impossible to just call free" idiom is
something people did on Windows because Windows...
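
As a hedged sketch of the alternative, here is a thin wrapper over C11
aligned_alloc. The name alloc_aligned is hypothetical. The size is
rounded up to a multiple of the alignment because C11 as originally
published required that; later revisions relax it, so the rounding is a
conservative assumption rather than a requirement everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: allocate at least n bytes aligned to `align`
 * (a power of two). Returns NULL on failure, like malloc.
 */
static void *alloc_aligned(size_t align, size_t n)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    if (n > SIZE_MAX - (align - 1))
        return NULL;                          /* rounding would overflow */

    /* Round the size up to a multiple of the alignment. */
    size_t rounded = (n + align - 1) & ~(align - 1);
    if (rounded == 0)
        rounded = align;                      /* avoid a zero-size request */

    return aligned_alloc(align, rounded);     /* release with plain free() */
}
```

The point of the comparison is that the result can be passed straight to
free(), with no offset bookkeeping in the application.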

> Those kinds of pointer games are usually played out in the runtime
> library. I can only say "usually" and not "always" because we have to
> do them on AIX and GNU Hurd (among others).

I don't understand your use of "userland" and "in the runtime
library". The only non-userland allocation is at page granularity (4k
or larger). If you mean at the application level (outside libc), this
is not something you need to do, at all.

Rich



* Re: max_align_t mess on i386
  2019-12-16 17:49             ` Florian Weimer
@ 2019-12-16 17:51               ` Rich Felker
  0 siblings, 0 replies; 21+ messages in thread
From: Rich Felker @ 2019-12-16 17:51 UTC (permalink / raw)
  To: musl

On Mon, Dec 16, 2019 at 06:49:21PM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > The latter looks buggy. It's clearly supposed to be in bits, not
> > bytes, with some archs defining it as 64 or 128 and:
> >
> > gcc/defaults.h:#ifndef MALLOC_ABI_ALIGNMENT
> > gcc/defaults.h:#define MALLOC_ABI_ALIGNMENT BITS_PER_WORD
> >
> > However arm has:
> >
> > gcc/config/arm/arm.h:#define MALLOC_ABI_ALIGNMENT  BIGGEST_ALIGNMENT
> >
> > which is in bytes...
> 
> The target hook is in bits.  The macro synthesized from that is in
> bytes:
> 
>   cpp_define_formatted (pfile, "__BIGGEST_ALIGNMENT__=%d",
>                         BIGGEST_ALIGNMENT / BITS_PER_UNIT);

Ah, that explains it. So no bug. Thanks.

Rich



Thread overview: 21+ messages
2019-12-14 15:19 max_align_t mess on i386 Rich Felker
2019-12-14 17:51 ` Florian Weimer
2019-12-14 18:17   ` Rich Felker
2019-12-14 18:53     ` Daniel Kolesa
2019-12-15 18:04   ` Rich Felker
2019-12-15  5:47 ` Markus Wichmann
2019-12-15 18:06 ` Jeffrey Walton
2019-12-15 18:22   ` Rich Felker
2019-12-16 15:30     ` Jeffrey Walton
2019-12-16 15:56       ` Rich Felker
2019-12-16 16:36         ` Jeffrey Walton
2019-12-16 17:49           ` Rich Felker
2019-12-16 16:40         ` Florian Weimer
2019-12-16 17:45           ` Rich Felker
2019-12-16 17:49             ` Florian Weimer
2019-12-16 17:51               ` Rich Felker
2019-12-15 18:23   ` Joakim Sindholt
2019-12-15 18:51     ` Rich Felker
2019-12-15 20:03       ` Alexander Monakov
2019-12-15 20:50         ` Szabolcs Nagy
2019-12-15 21:51         ` Jeffrey Walton
