mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] Potential bug in __res_msend_rc() wrt to union initialization.
@ 2024-03-18 19:56 Mike Cui
  2024-03-18 21:34 ` Rich Felker
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Cui @ 2024-03-18 19:56 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1918 bytes --]

I recently upgraded to clang-18, and after compiling musl with it, I
noticed that all my getaddrinfo() calls are failing. I tracked this to be
an issue in __res_msend_rc(), where the 'sin6' member of union 'sa' is
initialized to garbage, rather than 0. Then later bind() fails
with EADDRNOTAVAIL.

I reported this bug on clang discourse:
https://discourse.llvm.org/t/union-initialization-and-aliasing-clang-18-seems-to-miscompile-musl/77724,
and the discussion seems to suggest that there is potentially a bug in musl
as well. TL;DR:

- According to strict interpretation of the C standard, initializing a
union with '{0}', only initializes the first member of the union to 0 (in
this case, sin4), and "default" initializes the rest. This interpretation
is still up for debate. The proper way to initialize the entire union is '{
}' not '{ 0 }'.
- There is currently a bug in clang-18 that treats '{ }' to be the same as
'{ 0 }'. The proposed fix is to just zero out the entire union for both "{
0 }" and "{ }". However we cannot rely on "{ 0 }" to always zero out the
entire union in the future.

musl should be fixed to use "{ }" for initialization. And to work around
the current buggy release of clang-18, perhaps flip the order to make sin6
the first member of the struct? I've attached a patch that works for me.
There may be other instances of the same bug in the musl code base.

--- a/src/network/res_msend.c
+++ b/src/network/res_msend.c
@@ -83,9 +83,9 @@ int __res_msend_rc(int nqueries, const unsigned char
*const *queries,
        int fd;
        int timeout, attempts, retry_interval, servfail_retry;
        union {
-               struct sockaddr_in sin;
                struct sockaddr_in6 sin6;
-       } sa = {0}, ns[MAXNS] = {{0}};
+               struct sockaddr_in sin;
+       } sa = {}, ns[MAXNS] = {{}};
        socklen_t sl = sizeof sa.sin;
        int nns = 0;
        int family = AF_INET;

[-- Attachment #2: Type: text/html, Size: 2376 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-18 19:56 [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui
@ 2024-03-18 21:34 ` Rich Felker
  2024-03-18 22:22   ` NRK
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Felker @ 2024-03-18 21:34 UTC (permalink / raw)
  To: Mike Cui; +Cc: musl

On Mon, Mar 18, 2024 at 12:56:41PM -0700, Mike Cui wrote:
> I recently upgraded to clang-18, and after compiling musl with it, I
> noticed that all my getaddrinfo() calls are failing. I tracked this to be
> an issue in __res_msend_rc(), where the 'sin6' member of union 'sa' is
> initialized to garbage, rather than 0. Then later bind() fails
> with EADDRNOTAVAIL.
> 
> I reported this bug on clang discourse:
> https://discourse.llvm.org/t/union-initialization-and-aliasing-clang-18-seems-to-miscompile-musl/77724,
> and the discussion seems to suggest that there is potentially a bug in musl
> as well. TL;DR:
> 
> - According to strict interpretation of the C standard, initializing a
> union with '{0}', only initializes the first member of the union to 0 (in
> this case, sin4), and "default" initializes the rest. This interpretation
> is still up for debate. The proper way to initialize the entire union is '{
> }' not '{ 0 }'.

No, { } is a constraint violation. It may be valid in C++ or C23, but
it's not in C99, which is the source language for musl. { 0 } has
always been the universal zero-initializer for C.

Moreover, the C language has no such thing as a "partially initialized
object". I guess it's possible for an implementor to interpret
zero-initialization of a union to produce something other than zero
bits in the storage that is not part of the first union member, but
it's rather hostile.

> - There is currently a bug in clang-18 that treats '{ }' to be the same as
> '{ 0 }'. The proposed fix is to just zero out the entire union for both "{
> 0 }" and "{ }". However we cannot rely on "{ 0 }" to always zero out the
> entire union in the future.
> 
> musl should be fixed to use "{ }" for initialization. And to work around
> the current buggy release of clang-18, perhaps flip the order to make sin6
> the first member of the struct? I've attached a patch that works for me.
> There may be other instances of the same bug in the musl code base.

If the clang interpretation is going to be this, we can just reorder
the union members so that the largest one is first. This should avoid
dependency on how the compiler decided to interpret the universal zero
initializer for unions.

Rich

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-18 21:34 ` Rich Felker
@ 2024-03-18 22:22   ` NRK
  2024-03-18 22:39     ` [musl] Potential bug in __res_msend_rc() wrt to union initialisation Thorsten Glaser
  2024-03-19  0:01     ` [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui
  0 siblings, 2 replies; 17+ messages in thread
From: NRK @ 2024-03-18 22:22 UTC (permalink / raw)
  To: musl; +Cc: Mike Cui

On Mon, Mar 18, 2024 at 05:34:42PM -0400, Rich Felker wrote:
> If the clang interpretation is going to be this, we can just reorder
> the union members so that the largest one is first.

Another option is to utilize implicit static initializer rules:

| if it is a union, the first named member is initialized (recursively)
| according to these rules, and any padding is initialized to zero bits;

So something like:

	static union u zero;
	union u u = zero;

Though, the "padding bits" part was added in C11 and wasn't present in
C99 in case you want to be pedantic about C99 conformance.

- NRK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialisation.
  2024-03-18 22:22   ` NRK
@ 2024-03-18 22:39     ` Thorsten Glaser
  2024-03-19  0:01     ` [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui
  1 sibling, 0 replies; 17+ messages in thread
From: Thorsten Glaser @ 2024-03-18 22:39 UTC (permalink / raw)
  To: musl

NRK dixit:

>So something like:
>
>	static union u zero;
>	union u u = zero;

Even that gets funny when you have:

union {
	struct {
		char *foo;
		uintptr_t bar;
	} a;
	struct {
		uintptr_t baz;
		char *bla;
	} b;
} u;

In this case, u.b.baz can very well be 0x55555555
when you initialise u.a as {} (C23+) or {0}, e.g.
on TenDRA with the option to have nil pointers be
not a binary 0-bits representation.

bye,
//mirabilos
-- 
22:20⎜<asarch> The crazy that persists in his craziness becomes a master
22:21⎜<asarch> And the distance between the craziness and geniality is
only measured by the success 18:35⎜<asarch> "Psychotics are consistently
inconsistent. The essence of sanity is to be inconsistently inconsistent

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-18 22:22   ` NRK
  2024-03-18 22:39     ` [musl] Potential bug in __res_msend_rc() wrt to union initialisation Thorsten Glaser
@ 2024-03-19  0:01     ` Mike Cui
  2024-03-19 13:18       ` Rich Felker
  1 sibling, 1 reply; 17+ messages in thread
From: Mike Cui @ 2024-03-19  0:01 UTC (permalink / raw)
  To: NRK; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 1066 bytes --]

Yeah I also just went over the C99 spec as well, section 6.7.8, and I have
to agree with clang developer's interpretation, that "{ 0 }"
only initializes the first member of the union.

"{ }" apparently is added in C23 as the "universal zero initializer". So
changing the order moving sin6 up is the only way to be C99 conformant.

On Mon, Mar 18, 2024 at 3:22 PM NRK <nrk@disroot.org> wrote:

> On Mon, Mar 18, 2024 at 05:34:42PM -0400, Rich Felker wrote:
> > If the clang interpretation is going to be this, we can just reorder
> > the union members so that the largest one is first.
>
> Another option is to utilize implicit static initializer rules:
>
> | if it is a union, the first named member is initialized (recursively)
> | according to these rules, and any padding is initialized to zero bits;
>
> So something like:
>
>         static union u zero;
>         union u u = zero;
>
> Though, the "padding bits" part was added in C11 and wasn't present in
> C99 in case you want to be pedantic about C99 conformance.
>
> - NRK
>

[-- Attachment #2: Type: text/html, Size: 1467 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19  0:01     ` [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui
@ 2024-03-19 13:18       ` Rich Felker
  2024-03-19 15:04         ` Mike Cui
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Felker @ 2024-03-19 13:18 UTC (permalink / raw)
  To: Mike Cui; +Cc: NRK, musl

On Mon, Mar 18, 2024 at 05:01:41PM -0700, Mike Cui wrote:
> Yeah I also just went over the C99 spec as well, section 6.7.8, and I have
> to agree with clang developer's interpretation, that "{ 0 }"
> only initializes the first member of the union.

There is no such thing as "only initializes [part]" in the C language.
The { 0 } *only provides a value for* the first member. The question
is about what happens to parts of the object for which the initializer
did not "provide a value". However, the C99 standard does not clearly
describe how the bits of a union that are not part of the member for
which a value is provided (usually the first, unless a designated
initializer is used) are filled on initialization.

C11 adds (in 6.7.9 ¶10):

    "if it is a union, the first named member is initialized
    (recursively) according to these rules, and any padding is
    initialized to zero bits;"

where C99 just had (6.7.8):

    "if it is a union, the first named member is initialized
    (recursively) according to these rules."

So I think C11 and later actually require the full zero
initialization of all bits, and clang is just wrong.

> "{ }" apparently is added in C23 as the "universal zero initializer". So
> changing the order moving sin6 up is the only way to be C99 conformant.

Indeed since at the source level we just depend on C99 not C11, this
should be changed. But clang needs to be fixed too.

Rich

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19 13:18       ` Rich Felker
@ 2024-03-19 15:04         ` Mike Cui
  2024-03-19 15:42           ` Rich Felker
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Cui @ 2024-03-19 15:04 UTC (permalink / raw)
  To: Rich Felker; +Cc: NRK, musl

[-- Attachment #1: Type: text/plain, Size: 2314 bytes --]

On Tue, Mar 19, 2024 at 6:18 AM Rich Felker <dalias@libc.org> wrote:

> On Mon, Mar 18, 2024 at 05:01:41PM -0700, Mike Cui wrote:
> > Yeah I also just went over the C99 spec as well, section 6.7.8, and I
> have
> > to agree with clang developer's interpretation, that "{ 0 }"
> > only initializes the first member of the union.
>
> There is no such thing as "only initializes [part]" in the C language.
> The { 0 } *only provides a value for* the first member. The question
> is about what happens to parts of the object for which the initializer
> did not "provide a value". However, the C99 standard does not clearly
> describe how the bits of a union that are not part of the member for
> which a value is provided (usually the first, unless a designated
> initializer is used) are filled on initialization.
>
> You are referring to this paragraph?

6.7.9 ¶21
If there are fewer initializers in a brace-enclosed list than there are
elements or members of an aggregate, or fewer characters in a string
literal used to initialize an array of known size than there are elements
in the array, the remainder of the aggregate shall be initialized
implicitly the same as objects that have static storage duration.

Folks on the LLVM discourse pointed out this paragraph does not apply to
unions, since unions are not "aggegates" according to the definition in
6.2.5p21:
21. Arithmetic types and pointer types are collectively called scalar
types. Array and structure types are collectively called *aggregate* types.


> C11 adds (in 6.7.9 ¶10):
>
>     "if it is a union, the first named member is initialized
>     (recursively) according to these rules, and any padding is
>     initialized to zero bits;"
>
> where C99 just had (6.7.8):
>
>     "if it is a union, the first named member is initialized
>     (recursively) according to these rules."
>
> So I think C11 and later actually require the full zero
> initialization of all bits, and clang is just wrong.
>
> > "{ }" apparently is added in C23 as the "universal zero initializer". So
> > changing the order moving sin6 up is the only way to be C99 conformant.
>
> Indeed since at the source level we just depend on C99 not C11, this
> should be changed. But clang needs to be fixed too.
>
> Rich
>

[-- Attachment #2: Type: text/html, Size: 3012 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19 15:04         ` Mike Cui
@ 2024-03-19 15:42           ` Rich Felker
  2024-03-19 15:55             ` Mike Cui
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Felker @ 2024-03-19 15:42 UTC (permalink / raw)
  To: Mike Cui; +Cc: NRK, musl

On Tue, Mar 19, 2024 at 08:04:31AM -0700, Mike Cui wrote:
> On Tue, Mar 19, 2024 at 6:18 AM Rich Felker <dalias@libc.org> wrote:
> 
> > On Mon, Mar 18, 2024 at 05:01:41PM -0700, Mike Cui wrote:
> > > Yeah I also just went over the C99 spec as well, section 6.7.8, and I
> > have
> > > to agree with clang developer's interpretation, that "{ 0 }"
> > > only initializes the first member of the union.
> >
> > There is no such thing as "only initializes [part]" in the C language.
> > The { 0 } *only provides a value for* the first member. The question
> > is about what happens to parts of the object for which the initializer
> > did not "provide a value". However, the C99 standard does not clearly
> > describe how the bits of a union that are not part of the member for
> > which a value is provided (usually the first, unless a designated
> > initializer is used) are filled on initialization.
> >
> > You are referring to this paragraph?
> 
> 6.7.9 ¶21
> If there are fewer initializers in a brace-enclosed list than there are
> elements or members of an aggregate, or fewer characters in a string
> literal used to initialize an array of known size than there are elements
> in the array, the remainder of the aggregate shall be initialized
> implicitly the same as objects that have static storage duration.
> 
> Folks on the LLVM discourse pointed out this paragraph does not apply to
> unions, since unions are not "aggegates" according to the definition in
> 6.2.5p21:
> 21. Arithmetic types and pointer types are collectively called scalar
> types. Array and structure types are collectively called *aggregate* types.

No, the part below that you didn't reply to covers unions:

> > C11 adds (in 6.7.9 ¶10):
> >
> >     "if it is a union, the first named member is initialized
> >     (recursively) according to these rules, and any padding is
> >     initialized to zero bits;"
> >
> > where C99 just had (6.7.8):
> >
> >     "if it is a union, the first named member is initialized
> >     (recursively) according to these rules."
> >
> > So I think C11 and later actually require the full zero
> > initialization of all bits, and clang is just wrong.
> >
> > > "{ }" apparently is added in C23 as the "universal zero initializer". So
> > > changing the order moving sin6 up is the only way to be C99 conformant.
> >
> > Indeed since at the source level we just depend on C99 not C11, this
> > should be changed. But clang needs to be fixed too.
> >
> > Rich
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19 15:42           ` Rich Felker
@ 2024-03-19 15:55             ` Mike Cui
  2024-03-19 16:08               ` Rich Felker
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Cui @ 2024-03-19 15:55 UTC (permalink / raw)
  To: Rich Felker; +Cc: NRK, musl

[-- Attachment #1: Type: text/plain, Size: 3812 bytes --]

On Tue, Mar 19, 2024 at 8:42 AM Rich Felker <dalias@libc.org> wrote:

> On Tue, Mar 19, 2024 at 08:04:31AM -0700, Mike Cui wrote:
> > On Tue, Mar 19, 2024 at 6:18 AM Rich Felker <dalias@libc.org> wrote:
> >
> > > On Mon, Mar 18, 2024 at 05:01:41PM -0700, Mike Cui wrote:
> > > > Yeah I also just went over the C99 spec as well, section 6.7.8, and I
> > > have
> > > > to agree with clang developer's interpretation, that "{ 0 }"
> > > > only initializes the first member of the union.
> > >
> > > There is no such thing as "only initializes [part]" in the C language.
> > > The { 0 } *only provides a value for* the first member. The question
> > > is about what happens to parts of the object for which the initializer
> > > did not "provide a value". However, the C99 standard does not clearly
> > > describe how the bits of a union that are not part of the member for
> > > which a value is provided (usually the first, unless a designated
> > > initializer is used) are filled on initialization.
> > >
> > > You are referring to this paragraph?
> >
> > 6.7.9 ¶21
> > If there are fewer initializers in a brace-enclosed list than there are
> > elements or members of an aggregate, or fewer characters in a string
> > literal used to initialize an array of known size than there are elements
> > in the array, the remainder of the aggregate shall be initialized
> > implicitly the same as objects that have static storage duration.
> >
> > Folks on the LLVM discourse pointed out this paragraph does not apply to
> > unions, since unions are not "aggegates" according to the definition in
> > 6.2.5p21:
> > 21. Arithmetic types and pointer types are collectively called scalar
> > types. Array and structure types are collectively called *aggregate*
> types.
>
> No, the part below that you didn't reply to covers unions:
>
>
The full 6.7.9 ¶10:

10 If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate. If an object that has static or
thread storage duration is not initialized explicitly, then:
- if it has pointer type, it is initialized to a null pointer;
- if it has arithmetic type, it is initialized to (positive or unsigned)
zero;
- if it is an aggregate, every member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
- if it is a union, the first named member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;

The second part that you quoted applies to "static or thread storage
duration". The first sentence specifically says that anything not
initialized is indeterminate.
The only other paragraph which invokes 6.7.9 p10 is 6.7.9p21, which also
does not apply to unions. (p21 ensures that the "ns" array of unions in the
code would be zeroed out, but not the "sa" which a single union allocated
on the stack.)



> > > C11 adds (in 6.7.9 ¶10):
> > >
> > >     "if it is a union, the first named member is initialized
> > >     (recursively) according to these rules, and any padding is
> > >     initialized to zero bits;"
> > >
> > > where C99 just had (6.7.8):
> > >
> > >     "if it is a union, the first named member is initialized
> > >     (recursively) according to these rules."
> > >
> > > So I think C11 and later actually require the full zero
> > > initialization of all bits, and clang is just wrong.
> > >
> > > > "{ }" apparently is added in C23 as the "universal zero
> initializer". So
> > > > changing the order moving sin6 up is the only way to be C99
> conformant.
> > >
> > > Indeed since at the source level we just depend on C99 not C11, this
> > > should be changed. But clang needs to be fixed too.
> > >
> > > Rich
> > >
>

[-- Attachment #2: Type: text/html, Size: 4952 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19 15:55             ` Mike Cui
@ 2024-03-19 16:08               ` Rich Felker
  2024-03-19 16:39                 ` Jₑₙₛ Gustedt
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Felker @ 2024-03-19 16:08 UTC (permalink / raw)
  To: Mike Cui; +Cc: NRK, musl

On Tue, Mar 19, 2024 at 08:55:22AM -0700, Mike Cui wrote:
> On Tue, Mar 19, 2024 at 8:42 AM Rich Felker <dalias@libc.org> wrote:
> 
> > On Tue, Mar 19, 2024 at 08:04:31AM -0700, Mike Cui wrote:
> > > On Tue, Mar 19, 2024 at 6:18 AM Rich Felker <dalias@libc.org> wrote:
> > >
> > > > On Mon, Mar 18, 2024 at 05:01:41PM -0700, Mike Cui wrote:
> > > > > Yeah I also just went over the C99 spec as well, section 6.7.8, and I
> > > > have
> > > > > to agree with clang developer's interpretation, that "{ 0 }"
> > > > > only initializes the first member of the union.
> > > >
> > > > There is no such thing as "only initializes [part]" in the C language..
> > > > The { 0 } *only provides a value for* the first member. The question
> > > > is about what happens to parts of the object for which the initializer
> > > > did not "provide a value". However, the C99 standard does not clearly
> > > > describe how the bits of a union that are not part of the member for
> > > > which a value is provided (usually the first, unless a designated
> > > > initializer is used) are filled on initialization.
> > > >
> > > > You are referring to this paragraph?
> > >
> > > 6.7.9 ¶21
> > > If there are fewer initializers in a brace-enclosed list than there are
> > > elements or members of an aggregate, or fewer characters in a string
> > > literal used to initialize an array of known size than there are elements
> > > in the array, the remainder of the aggregate shall be initialized
> > > implicitly the same as objects that have static storage duration.
> > >
> > > Folks on the LLVM discourse pointed out this paragraph does not apply to
> > > unions, since unions are not "aggegates" according to the definition in
> > > 6.2.5p21:
> > > 21. Arithmetic types and pointer types are collectively called scalar
> > > types. Array and structure types are collectively called *aggregate*
> > types.
> >
> > No, the part below that you didn't reply to covers unions:
> >
> >
> The full 6.7.9 ¶10:
> 
> 10 If an object that has automatic storage duration is not initialized
> explicitly, its value is indeterminate. If an object that has static or
> thread storage duration is not initialized explicitly, then:
> - if it has pointer type, it is initialized to a null pointer;
> - if it has arithmetic type, it is initialized to (positive or unsigned)
> zero;
> - if it is an aggregate, every member is initialized (recursively)
> according to these rules, and any padding is initialized to zero bits;
> - if it is a union, the first named member is initialized (recursively)
> according to these rules, and any padding is initialized to zero bits;
> 
> The second part that you quoted applies to "static or thread storage
> duration". The first sentence specifically says that anything not
> initialized is indeterminate.
> The only other paragraph which invokes 6.7.9 p10 is 6.7.9p21, which also
> does not apply to unions. (p21 ensures that the "ns" array of unions in the
> code would be zeroed out, but not the "sa" which a single union allocated
> on the stack.)

¶19 says:

    "all subobjects that are not initialized explicitly shall be
    initialized implicitly the same as objects that have static
    storage duration."

The term "subobject" does not seem to be defined, so there's some
ambiguity, but I would read ¶19 as applying the above text about
static unions to automatic ones.

In any case, what clang wants to do here seems like a big gratuitous
footgun. We'll make the code in musl safe against this but I suspect
it will have lots of bad effects elsewhere...

Rich

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19 16:08               ` Rich Felker
@ 2024-03-19 16:39                 ` Jₑₙₛ Gustedt
  2024-03-19 20:47                   ` Thorsten Glaser
  2024-03-19 21:04                   ` NRK
  0 siblings, 2 replies; 17+ messages in thread
From: Jₑₙₛ Gustedt @ 2024-03-19 16:39 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl, Mike Cui, NRK

Hi,
actually the introduction of `{}` versus `{0}` in C23 was not meant to
provide any difference in semantics, just to make the syntax nicer and
consistent with C++.

on Tue, 19 Mar 2024 12:08:32 -0400 you (Rich Felker <dalias@libc.org>)
wrote:

> On Tue, Mar 19, 2024 at 08:55:22AM -0700, Mike Cui wrote:
> > On Tue, Mar 19, 2024 at 8:42 AM Rich Felker <dalias@libc.org> wrote:
> >   
> > > On Tue, Mar 19, 2024 at 08:04:31AM -0700, Mike Cui wrote:  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
>  [...]  
> > > types.
> > >
> > > No, the part below that you didn't reply to covers unions:
> > >
> > >  
> > The full 6.7.9 ¶10:
> > 
> > 10 If an object that has automatic storage duration is not
> > initialized explicitly, its value is indeterminate. If an object
> > that has static or thread storage duration is not initialized
> > explicitly, then:
> > - if it has pointer type, it is initialized to a null pointer;
> > - if it has arithmetic type, it is initialized to (positive or
> > unsigned) zero;
> > - if it is an aggregate, every member is initialized (recursively)
> > according to these rules, and any padding is initialized to zero
> > bits;
> > - if it is a union, the first named member is initialized
> > (recursively) according to these rules, and any padding is
> > initialized to zero bits;
> > 
> > The second part that you quoted applies to "static or thread storage
> > duration". The first sentence specifically says that anything not
> > initialized is indeterminate.
> > The only other paragraph which invokes 6.7.9 p10 is 6.7.9p21, which
> > also does not apply to unions. (p21 ensures that the "ns" array of
> > unions in the code would be zeroed out, but not the "sa" which a
> > single union allocated on the stack.)  
> 
> ¶19 says:
> 
>     "all subobjects that are not initialized explicitly shall be
>     initialized implicitly the same as objects that have static
>     storage duration."
> 
> The term "subobject" does not seem to be defined, so there's some
> ambiguity, but I would read ¶19 as applying the above text about
> static unions to automatic ones.
> 
> In any case, what clang wants to do here seems like a big gratuitous
> footgun. We'll make the code in musl safe against this but I suspect
> it will have lots of bad effects elsewhere...

To avoid such differences in interpretation, the simplest solution
seems to be to always put the biggest union member first, or to even
add an artificial first one `char [size-of-the-union] __dummy;`, such
that this is always 0-byte initialized if there is any initialization
at all.

Jₑₙₛ

-- 
:: ICube :::::::::::::::::::::::::::::: deputy director ::
:: Université de Strasbourg :::::::::::::::::::::: ICPS ::
:: INRIA Nancy Grand Est :::::::::::::::::::::::: Camus ::
:: :::::::::::::::::::::::::::::::::::: ☎ +33 368854536 ::
:: https://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19 16:39                 ` Jₑₙₛ Gustedt
@ 2024-03-19 20:47                   ` Thorsten Glaser
  2024-03-21 10:58                     ` Jₑₙₛ Gustedt
  2024-03-19 21:04                   ` NRK
  1 sibling, 1 reply; 17+ messages in thread
From: Thorsten Glaser @ 2024-03-19 20:47 UTC (permalink / raw)
  To: musl

Jₑₙₛ Gustedt dixit:

>seems to be to always put the biggest union member first, or to even
>add an artificial first one `char [size-of-the-union] __dummy;`, such
>that this is always 0-byte initialized if there is any initialization

But then you can just memset the union and then initialise any
known pointer members to NULL/nullptr manually afterwards in the
union’s member you actually need.

(IIRC, POSIX actively specifies the possible pointer members in
these structs for that reason.)

bye,
//mirabilos
-- 
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19 16:39                 ` Jₑₙₛ Gustedt
  2024-03-19 20:47                   ` Thorsten Glaser
@ 2024-03-19 21:04                   ` NRK
  2024-03-19 21:36                     ` Rich Felker
  1 sibling, 1 reply; 17+ messages in thread
From: NRK @ 2024-03-19 21:04 UTC (permalink / raw)
  To: Jₑₙₛ Gustedt; +Cc: Rich Felker, musl, Mike Cui

> actually the introduction of `{}` versus `{0}` in C23 was not meant to
> provide any difference in semantics, just to make the syntax nicer and
> consistent with C++.

Regardless of what the intention was, the reality is that it *does* have
semantic difference. Specifically, empty initialization `{}` benefits
from the default initialization rules, which specify zero-ing out the
padding bits whereas `{0}` doesn't guarantee that.

- NRK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19 21:04                   ` NRK
@ 2024-03-19 21:36                     ` Rich Felker
  2024-03-20 17:11                       ` NRK
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Felker @ 2024-03-19 21:36 UTC (permalink / raw)
  To: NRK; +Cc: Jₑₙₛ Gustedt, musl, Mike Cui

On Tue, Mar 19, 2024 at 09:04:41PM +0000, NRK wrote:
> > actually the introduction of `{}` versus `{0}` in C23 was not meant to
> > provide any difference in semantics, just to make the syntax nicer and
> > consistent with C++.
> 
> Regardless of what the intention was, the reality is that it *does* have
> semantic difference. Specifically, empty initialization `{}` benefits
> from the default initialization rules, which specify zero-ing out the
> padding bits whereas `{0}` doesn't guarantee that.

That's simply not true. There is no difference in the rules as
specified by the standard.

Rich

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19 21:36                     ` Rich Felker
@ 2024-03-20 17:11                       ` NRK
  0 siblings, 0 replies; 17+ messages in thread
From: NRK @ 2024-03-20 17:11 UTC (permalink / raw)
  To: Rich Felker; +Cc: Jₑₙₛ Gustedt, musl, Mike Cui

> That's simply not true. There is no difference in the rules as
> specified by the standard.

I don't think your assertion is correct, at least it hasn't been
demonstrated.

> ¶19 says:
> 
>     "all subobjects that are not initialized explicitly shall be
>     initialized implicitly the same as objects that have static
>     storage duration."
> 
> The term "subobject" does not seem to be defined, so there's some
> ambiguity, but I would read ¶19 as applying the above text about
> static unions to automatic ones.

The term subobject might not be clearly defined but there's a strong
indication that it refers to objects contained within an aggregate, and
NOT members of a union. Consider the following case:

	struct { int a; int b; } object = { .a = 5 };

Assuming `a` and `b` are subobjects, `a` will be initialized to 5 and
`b` to 0. Which makes sense and is consistent with all existing
implementation I'm aware of. Now consider the same with a union:

	union { int a; int b; } object = { .a = 5 };

If `b` is a subobjects then it should be initialized to zero according
to the rule. But that can't happen since that'd overwrite the value of
`a`. This to me is a convincing case that subobjects do not refer to
union members, as there can be only 1 active at a time.

Now I don't agree with clang's decision to not zero the entire union in
case of `{0}`. It's unnecessarily hostile, brings negligible (if any)
gains and *will* be causing bugs (this is the 2nd one I'm witnessing).

But as it currently stands, it's not a wrong interpretation of the
standard either. If the intention was to have no difference between {0}
and {} it has not been written down clearly.

- NRK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-19 20:47                   ` Thorsten Glaser
@ 2024-03-21 10:58                     ` Jₑₙₛ Gustedt
  2024-03-21 16:41                       ` Thorsten Glaser
  0 siblings, 1 reply; 17+ messages in thread
From: Jₑₙₛ Gustedt @ 2024-03-21 10:58 UTC (permalink / raw)
  To: Thorsten Glaser; +Cc: musl

Hi,

on Tue, 19 Mar 2024 20:47:00 +0000 (UTC) you (Thorsten Glaser
<tg@mirbsd.de>) wrote:

> Jₑₙₛ Gustedt dixit:
> 
> >seems to be to always put the biggest union member first, or to even
> >add an artificial first one `char [size-of-the-union] __dummy;`, such
> >that this is always 0-byte initialized if there is any
> >initialization  
> 
> But then you can just memset the union and then initialise any
> known pointer members to NULL/nullptr manually afterwards in the
> union’s member you actually need.
> 
> (IIRC, POSIX actively specifies the possible pointer members in
> these structs for that reason.)

no, that is actually not a full solution, I think. As soon as you store
to any member, padding bytes may change to arbitrary values. I don't
know if compilers really do that, but for example writing a wide
register that has garbage in the upper half could be valid in some
cases.

Also, I think that up-thread there was the idea that a statically
initialized object could be used to initialize an automatic object,
and that this would guarantee that the padding is transferred. There
is no such guarantee.

So if you have known places where the pointer members are situated,
you could try to ensure that you have an artificial union member as
first, that has pointers at these places and otherwise fills the gaps
with `unsigned char[something]`. That would guarantee that
initialization does the right thing for pointer members, that all other
bytes are zero-initialized, *and* that these values would be properly
transferred on union assignment.

Thanks
Jₑₙₛ

-- 
:: ICube :::::::::::::::::::::::::::::: deputy director ::
:: Université de Strasbourg :::::::::::::::::::::: ICPS ::
:: INRIA Nancy Grand Est :::::::::::::::::::::::: Camus ::
:: :::::::::::::::::::::::::::::::::::: ☎ +33 368854536 ::
:: https://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization.
  2024-03-21 10:58                     ` Jₑₙₛ Gustedt
@ 2024-03-21 16:41                       ` Thorsten Glaser
  0 siblings, 0 replies; 17+ messages in thread
From: Thorsten Glaser @ 2024-03-21 16:41 UTC (permalink / raw)
  To: musl

Jₑₙₛ Gustedt dixit:

>As soon as you store
>to any member, padding bytes may change to arbitrary values.

Yes, but where is that a problem?

Something like:

union foo {
	struct bar {
		char *s;
		size_t z;
	} a;
	struct baz {
		size_t z;
		char *s;
	} b;
};

int
somefunc(int mode, char *buf, size_t len, …)
{
	union foo u;

	memset(u, '\0', sizeof(u));
	/* … */
	if (mode) {
		/* from here on, u is decided to be a */
		u.a.s = NULL;
	} else {
		/* from here on, u is decided to be b */
		u.b.s = NULL;
	}
	/* … some other processing … */
	if (mode) {
		u.a.s = buf;
		u.a.z = len;
	} else {
		u.b.s = buf;
		u.b.z = len;
	}
	return (someotherfunc(&u, mode, …));
}

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-03-21 16:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-18 19:56 [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui
2024-03-18 21:34 ` Rich Felker
2024-03-18 22:22   ` NRK
2024-03-18 22:39     ` [musl] Potential bug in __res_msend_rc() wrt to union initialisation Thorsten Glaser
2024-03-19  0:01     ` [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui
2024-03-19 13:18       ` Rich Felker
2024-03-19 15:04         ` Mike Cui
2024-03-19 15:42           ` Rich Felker
2024-03-19 15:55             ` Mike Cui
2024-03-19 16:08               ` Rich Felker
2024-03-19 16:39                 ` Jₑₙₛ Gustedt
2024-03-19 20:47                   ` Thorsten Glaser
2024-03-21 10:58                     ` Jₑₙₛ Gustedt
2024-03-21 16:41                       ` Thorsten Glaser
2024-03-19 21:04                   ` NRK
2024-03-19 21:36                     ` Rich Felker
2024-03-20 17:11                       ` NRK

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).