* [musl] Potential bug in __res_msend_rc() wrt to union initialization. @ 2024-03-18 19:56 Mike Cui 2024-03-18 21:34 ` Rich Felker 0 siblings, 1 reply; 17+ messages in thread From: Mike Cui @ 2024-03-18 19:56 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 1918 bytes --] I recently upgraded to clang-18, and after compiling musl with it, I noticed that all my getaddrinfo() calls are failing. I tracked this to be an issue in __res_msend_rc(), where the 'sin6' member of union 'sa' is initialized to garbage, rather than 0. Then later bind() fails with EADDRNOTAVAIL. I reported this bug on clang discourse: https://discourse.llvm.org/t/union-initialization-and-aliasing-clang-18-seems-to-miscompile-musl/77724, and the discussion seems to suggest that there is potentially a bug in musl as well. TL;DR: - According to strict interpretation of the C standard, initializing a union with '{0}', only initializes the first member of the union to 0 (in this case, sin4), and "default" initializes the rest. This interpretation is still up for debate. The proper way to initialize the entire union is '{ }' not '{ 0 }'. - There is currently a bug in clang-18 that treats '{ }' to be the same as '{ 0 }'. The proposed fix is to just zero out the entire union for both "{ 0 }" and "{ }". However we cannot rely on "{ 0 }" to always zero out the entire union in the future. musl should be fixed to use "{ }" for initialization. And to work around the current buggy release of clang-18, perhaps flip the order to make sin6 the first member of the struct? I've attached a patch that works for me. There may be other instances of the same bug in the musl code base. --- a/src/network/res_msend.c +++ b/src/network/res_msend.c @@ -83,9 +83,9 @@ int __res_msend_rc(int nqueries, const unsigned char *const *queries, int fd; int timeout, attempts, retry_interval, servfail_retry; union { - struct sockaddr_in sin; struct sockaddr_in6 sin6; - } sa = {0}, ns[MAXNS] = {{0}}; + struct sockaddr_in sin; + } sa = {}, ns[MAXNS] = {{}}; socklen_t sl = sizeof sa.sin; int nns = 0; int family = AF_INET; [-- Attachment #2: Type: text/html, Size: 2376 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-18 19:56 [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui @ 2024-03-18 21:34 ` Rich Felker 2024-03-18 22:22 ` NRK 0 siblings, 1 reply; 17+ messages in thread From: Rich Felker @ 2024-03-18 21:34 UTC (permalink / raw) To: Mike Cui; +Cc: musl On Mon, Mar 18, 2024 at 12:56:41PM -0700, Mike Cui wrote: > I recently upgraded to clang-18, and after compiling musl with it, I > noticed that all my getaddrinfo() calls are failing. I tracked this to be > an issue in __res_msend_rc(), where the 'sin6' member of union 'sa' is > initialized to garbage, rather than 0. Then later bind() fails > with EADDRNOTAVAIL. > > I reported this bug on clang discourse: > https://discourse.llvm.org/t/union-initialization-and-aliasing-clang-18-seems-to-miscompile-musl/77724, > and the discussion seems to suggest that there is potentially a bug in musl > as well. TL;DR: > > - According to strict interpretation of the C standard, initializing a > union with '{0}', only initializes the first member of the union to 0 (in > this case, sin4), and "default" initializes the rest. This interpretation > is still up for debate. The proper way to initialize the entire union is '{ > }' not '{ 0 }'. No, { } is a constraint violation. It may be valid in C++ or C23, but it's not in C99, which is the source language for musl. { 0 } has always been the universal zero-initializer for C. Moreover, the C language has no such thing as a "partially initialized object". I guess it's possible for an implementor to interpret zero-initialization of a union to produce something other than zero bits in the storage that is not part of the first union member, but it's rather hostile. > - There is currently a bug in clang-18 that treats '{ }' to be the same as > '{ 0 }'. The proposed fix is to just zero out the entire union for both "{ > 0 }" and "{ }". However we cannot rely on "{ 0 }" to always zero out the > entire union in the future. > > musl should be fixed to use "{ }" for initialization. And to work around > the current buggy release of clang-18, perhaps flip the order to make sin6 > the first member of the struct? I've attached a patch that works for me. > There may be other instances of the same bug in the musl code base. If the clang interpretation is going to be this, we can just reorder the union members so that the largest one is first. This should avoid dependency on how the compiler decided to interpret the universal zero initializer for unions. Rich ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-18 21:34 ` Rich Felker @ 2024-03-18 22:22 ` NRK 2024-03-18 22:39 ` [musl] Potential bug in __res_msend_rc() wrt to union initialisation Thorsten Glaser 2024-03-19 0:01 ` [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui 0 siblings, 2 replies; 17+ messages in thread From: NRK @ 2024-03-18 22:22 UTC (permalink / raw) To: musl; +Cc: Mike Cui On Mon, Mar 18, 2024 at 05:34:42PM -0400, Rich Felker wrote: > If the clang interpretation is going to be this, we can just reorder > the union members so that the largest one is first. Another option is to utilize implicit static initializer rules: | if it is a union, the first named member is initialized (recursively) | according to these rules, and any padding is initialized to zero bits; So something like: static union u zero; union u u = zero; Though, the "padding bits" part was added in C11 and wasn't present in C99 in case you want to be pedantic about C99 conformance. - NRK ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialisation. 2024-03-18 22:22 ` NRK @ 2024-03-18 22:39 ` Thorsten Glaser 2024-03-19 0:01 ` [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui 1 sibling, 0 replies; 17+ messages in thread From: Thorsten Glaser @ 2024-03-18 22:39 UTC (permalink / raw) To: musl NRK dixit: >So something like: > > static union u zero; > union u u = zero; Even that gets funny when you have: union { struct { char *foo; uintptr_t bar; } a; struct { uintptr_t baz; char *bla; } b; } u; In this case, u.b.baz can very well be 0x55555555 when you initialise u.a as {} (C23+) or {0}, e.g. on TenDRA with the option to have nil pointers be not a binary 0-bits representation. bye, //mirabilos -- 22:20⎜<asarch> The crazy that persists in his craziness becomes a master 22:21⎜<asarch> And the distance between the craziness and geniality is only measured by the success 18:35⎜<asarch> "Psychotics are consistently inconsistent. The essence of sanity is to be inconsistently inconsistent ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-18 22:22 ` NRK 2024-03-18 22:39 ` [musl] Potential bug in __res_msend_rc() wrt to union initialisation Thorsten Glaser @ 2024-03-19 0:01 ` Mike Cui 2024-03-19 13:18 ` Rich Felker 1 sibling, 1 reply; 17+ messages in thread From: Mike Cui @ 2024-03-19 0:01 UTC (permalink / raw) To: NRK; +Cc: musl [-- Attachment #1: Type: text/plain, Size: 1066 bytes --] Yeah I also just went over the C99 spec as well, section 6.7.8, and I have to agree with clang developer's interpretation, that "{ 0 }" only initializes the first member of the union. "{ }" apparently is added in C23 as the "universal zero initializer". So changing the order moving sin6 up is the only way to be C99 conformant. On Mon, Mar 18, 2024 at 3:22 PM NRK <nrk@disroot.org> wrote: > On Mon, Mar 18, 2024 at 05:34:42PM -0400, Rich Felker wrote: > > If the clang interpretation is going to be this, we can just reorder > > the union members so that the largest one is first. > > Another option is to utilize implicit static initializer rules: > > | if it is a union, the first named member is initialized (recursively) > | according to these rules, and any padding is initialized to zero bits; > > So something like: > > static union u zero; > union u u = zero; > > Though, the "padding bits" part was added in C11 and wasn't present in > C99 in case you want to be pedantic about C99 conformance. > > - NRK > [-- Attachment #2: Type: text/html, Size: 1467 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 0:01 ` [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui @ 2024-03-19 13:18 ` Rich Felker 2024-03-19 15:04 ` Mike Cui 0 siblings, 1 reply; 17+ messages in thread From: Rich Felker @ 2024-03-19 13:18 UTC (permalink / raw) To: Mike Cui; +Cc: NRK, musl On Mon, Mar 18, 2024 at 05:01:41PM -0700, Mike Cui wrote: > Yeah I also just went over the C99 spec as well, section 6.7.8, and I have > to agree with clang developer's interpretation, that "{ 0 }" > only initializes the first member of the union. There is no such thing as "only initializes [part]" in the C language. The { 0 } *only provides a value for* the first member. The question is about what happens to parts of the object for which the initializer did not "provide a value". However, the C99 standard does not clearly describe how the bits of a union that are not part of the member for which a value is provided (usually the first, unless a designated initializer is used) are filled on initialization. C11 adds (in 6.7.9 ¶10): "if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;" where C99 just had (6.7.8): "if it is a union, the first named member is initialized (recursively) according to these rules." So I think C11 and later actually require the full zero initialization of all bits, and clang is just wrong. > "{ }" apparently is added in C23 as the "universal zero initializer". So > changing the order moving sin6 up is the only way to be C99 conformant. Indeed since at the source level we just depend on C99 not C11, this should be changed. But clang needs to be fixed too. Rich ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 13:18 ` Rich Felker @ 2024-03-19 15:04 ` Mike Cui 2024-03-19 15:42 ` Rich Felker 0 siblings, 1 reply; 17+ messages in thread From: Mike Cui @ 2024-03-19 15:04 UTC (permalink / raw) To: Rich Felker; +Cc: NRK, musl [-- Attachment #1: Type: text/plain, Size: 2314 bytes --] On Tue, Mar 19, 2024 at 6:18 AM Rich Felker <dalias@libc.org> wrote: > On Mon, Mar 18, 2024 at 05:01:41PM -0700, Mike Cui wrote: > > Yeah I also just went over the C99 spec as well, section 6.7.8, and I > have > > to agree with clang developer's interpretation, that "{ 0 }" > > only initializes the first member of the union. > > There is no such thing as "only initializes [part]" in the C language. > The { 0 } *only provides a value for* the first member. The question > is about what happens to parts of the object for which the initializer > did not "provide a value". However, the C99 standard does not clearly > describe how the bits of a union that are not part of the member for > which a value is provided (usually the first, unless a designated > initializer is used) are filled on initialization. > > You are referring to this paragraph? 6.7.9 ¶21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration. Folks on the LLVM discourse pointed out this paragraph does not apply to unions, since unions are not "aggegates" according to the definition in 6.2.5p21: 21. Arithmetic types and pointer types are collectively called scalar types. Array and structure types are collectively called *aggregate* types. > C11 adds (in 6.7.9 ¶10): > > "if it is a union, the first named member is initialized > (recursively) according to these rules, and any padding is > initialized to zero bits;" > > where C99 just had (6.7.8): > > "if it is a union, the first named member is initialized > (recursively) according to these rules." > > So I think C11 and later actually require the full zero > initialization of all bits, and clang is just wrong. > > > "{ }" apparently is added in C23 as the "universal zero initializer". So > > changing the order moving sin6 up is the only way to be C99 conformant. > > Indeed since at the source level we just depend on C99 not C11, this > should be changed. But clang needs to be fixed too. > > Rich > [-- Attachment #2: Type: text/html, Size: 3012 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 15:04 ` Mike Cui @ 2024-03-19 15:42 ` Rich Felker 2024-03-19 15:55 ` Mike Cui 0 siblings, 1 reply; 17+ messages in thread From: Rich Felker @ 2024-03-19 15:42 UTC (permalink / raw) To: Mike Cui; +Cc: NRK, musl On Tue, Mar 19, 2024 at 08:04:31AM -0700, Mike Cui wrote: > On Tue, Mar 19, 2024 at 6:18 AM Rich Felker <dalias@libc.org> wrote: > > > On Mon, Mar 18, 2024 at 05:01:41PM -0700, Mike Cui wrote: > > > Yeah I also just went over the C99 spec as well, section 6.7.8, and I > > have > > > to agree with clang developer's interpretation, that "{ 0 }" > > > only initializes the first member of the union. > > > > There is no such thing as "only initializes [part]" in the C language. > > The { 0 } *only provides a value for* the first member. The question > > is about what happens to parts of the object for which the initializer > > did not "provide a value". However, the C99 standard does not clearly > > describe how the bits of a union that are not part of the member for > > which a value is provided (usually the first, unless a designated > > initializer is used) are filled on initialization. > > > > You are referring to this paragraph? > > 6.7.9 ¶21 > If there are fewer initializers in a brace-enclosed list than there are > elements or members of an aggregate, or fewer characters in a string > literal used to initialize an array of known size than there are elements > in the array, the remainder of the aggregate shall be initialized > implicitly the same as objects that have static storage duration. > > Folks on the LLVM discourse pointed out this paragraph does not apply to > unions, since unions are not "aggegates" according to the definition in > 6.2.5p21: > 21. Arithmetic types and pointer types are collectively called scalar > types. Array and structure types are collectively called *aggregate* types. No, the part below that you didn't reply to covers unions: > > C11 adds (in 6.7.9 ¶10): > > > > "if it is a union, the first named member is initialized > > (recursively) according to these rules, and any padding is > > initialized to zero bits;" > > > > where C99 just had (6.7.8): > > > > "if it is a union, the first named member is initialized > > (recursively) according to these rules." > > > > So I think C11 and later actually require the full zero > > initialization of all bits, and clang is just wrong. > > > > > "{ }" apparently is added in C23 as the "universal zero initializer". So > > > changing the order moving sin6 up is the only way to be C99 conformant. > > > > Indeed since at the source level we just depend on C99 not C11, this > > should be changed. But clang needs to be fixed too. > > > > Rich > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 15:42 ` Rich Felker @ 2024-03-19 15:55 ` Mike Cui 2024-03-19 16:08 ` Rich Felker 0 siblings, 1 reply; 17+ messages in thread From: Mike Cui @ 2024-03-19 15:55 UTC (permalink / raw) To: Rich Felker; +Cc: NRK, musl [-- Attachment #1: Type: text/plain, Size: 3812 bytes --] On Tue, Mar 19, 2024 at 8:42 AM Rich Felker <dalias@libc.org> wrote: > On Tue, Mar 19, 2024 at 08:04:31AM -0700, Mike Cui wrote: > > On Tue, Mar 19, 2024 at 6:18 AM Rich Felker <dalias@libc.org> wrote: > > > > > On Mon, Mar 18, 2024 at 05:01:41PM -0700, Mike Cui wrote: > > > > Yeah I also just went over the C99 spec as well, section 6.7.8, and I > > > have > > > > to agree with clang developer's interpretation, that "{ 0 }" > > > > only initializes the first member of the union. > > > > > > There is no such thing as "only initializes [part]" in the C language. > > > The { 0 } *only provides a value for* the first member. The question > > > is about what happens to parts of the object for which the initializer > > > did not "provide a value". However, the C99 standard does not clearly > > > describe how the bits of a union that are not part of the member for > > > which a value is provided (usually the first, unless a designated > > > initializer is used) are filled on initialization. > > > > > > You are referring to this paragraph? > > > > 6.7.9 ¶21 > > If there are fewer initializers in a brace-enclosed list than there are > > elements or members of an aggregate, or fewer characters in a string > > literal used to initialize an array of known size than there are elements > > in the array, the remainder of the aggregate shall be initialized > > implicitly the same as objects that have static storage duration. > > > > Folks on the LLVM discourse pointed out this paragraph does not apply to > > unions, since unions are not "aggegates" according to the definition in > > 6.2.5p21: > > 21. Arithmetic types and pointer types are collectively called scalar > > types. Array and structure types are collectively called *aggregate* > types. > > No, the part below that you didn't reply to covers unions: > > The full 6.7.9 ¶10: 10 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then: - if it has pointer type, it is initialized to a null pointer; - if it has arithmetic type, it is initialized to (positive or unsigned) zero; - if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits; - if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits; The second part that you quoted applies to "static or thread storage duration". The first sentence specifically says that anything not initialized is indeterminate. The only other paragraph which invokes 6.7.9 p10 is 6.7.9p21, which also does not apply to unions. (p21 ensures that the "ns" array of unions in the code would be zeroed out, but not the "sa" which a single union allocated on the stack.) > > > C11 adds (in 6.7.9 ¶10): > > > > > > "if it is a union, the first named member is initialized > > > (recursively) according to these rules, and any padding is > > > initialized to zero bits;" > > > > > > where C99 just had (6.7.8): > > > > > > "if it is a union, the first named member is initialized > > > (recursively) according to these rules." > > > > > > So I think C11 and later actually require the full zero > > > initialization of all bits, and clang is just wrong. > > > > > > > "{ }" apparently is added in C23 as the "universal zero > initializer". So > > > > changing the order moving sin6 up is the only way to be C99 > conformant. > > > > > > Indeed since at the source level we just depend on C99 not C11, this > > > should be changed. But clang needs to be fixed too. > > > > > > Rich > > > > [-- Attachment #2: Type: text/html, Size: 4952 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 15:55 ` Mike Cui @ 2024-03-19 16:08 ` Rich Felker 2024-03-19 16:39 ` Jₑₙₛ Gustedt 0 siblings, 1 reply; 17+ messages in thread From: Rich Felker @ 2024-03-19 16:08 UTC (permalink / raw) To: Mike Cui; +Cc: NRK, musl On Tue, Mar 19, 2024 at 08:55:22AM -0700, Mike Cui wrote: > On Tue, Mar 19, 2024 at 8:42 AM Rich Felker <dalias@libc.org> wrote: > > > On Tue, Mar 19, 2024 at 08:04:31AM -0700, Mike Cui wrote: > > > On Tue, Mar 19, 2024 at 6:18 AM Rich Felker <dalias@libc.org> wrote: > > > > > > > On Mon, Mar 18, 2024 at 05:01:41PM -0700, Mike Cui wrote: > > > > > Yeah I also just went over the C99 spec as well, section 6.7.8, and I > > > > have > > > > > to agree with clang developer's interpretation, that "{ 0 }" > > > > > only initializes the first member of the union. > > > > > > > > There is no such thing as "only initializes [part]" in the C language.. > > > > The { 0 } *only provides a value for* the first member. The question > > > > is about what happens to parts of the object for which the initializer > > > > did not "provide a value". However, the C99 standard does not clearly > > > > describe how the bits of a union that are not part of the member for > > > > which a value is provided (usually the first, unless a designated > > > > initializer is used) are filled on initialization. > > > > > > > > You are referring to this paragraph? > > > > > > 6.7.9 ¶21 > > > If there are fewer initializers in a brace-enclosed list than there are > > > elements or members of an aggregate, or fewer characters in a string > > > literal used to initialize an array of known size than there are elements > > > in the array, the remainder of the aggregate shall be initialized > > > implicitly the same as objects that have static storage duration. > > > > > > Folks on the LLVM discourse pointed out this paragraph does not apply to > > > unions, since unions are not "aggegates" according to the definition in > > > 6.2.5p21: > > > 21. Arithmetic types and pointer types are collectively called scalar > > > types. Array and structure types are collectively called *aggregate* > > types. > > > > No, the part below that you didn't reply to covers unions: > > > > > The full 6.7.9 ¶10: > > 10 If an object that has automatic storage duration is not initialized > explicitly, its value is indeterminate. If an object that has static or > thread storage duration is not initialized explicitly, then: > - if it has pointer type, it is initialized to a null pointer; > - if it has arithmetic type, it is initialized to (positive or unsigned) > zero; > - if it is an aggregate, every member is initialized (recursively) > according to these rules, and any padding is initialized to zero bits; > - if it is a union, the first named member is initialized (recursively) > according to these rules, and any padding is initialized to zero bits; > > The second part that you quoted applies to "static or thread storage > duration". The first sentence specifically says that anything not > initialized is indeterminate. > The only other paragraph which invokes 6.7.9 p10 is 6.7.9p21, which also > does not apply to unions. (p21 ensures that the "ns" array of unions in the > code would be zeroed out, but not the "sa" which a single union allocated > on the stack.) ¶19 says: "all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration." The term "subobject" does not seem to be defined, so there's some ambiguity, but I would read ¶19 as applying the above text about static unions to automatic ones. In any case, what clang wants to do here seems like a big gratuitous footgun. We'll make the code in musl safe against this but I suspect it will have lots of bad effects elsewhere... Rich ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 16:08 ` Rich Felker @ 2024-03-19 16:39 ` Jₑₙₛ Gustedt 2024-03-19 20:47 ` Thorsten Glaser 2024-03-19 21:04 ` NRK 0 siblings, 2 replies; 17+ messages in thread From: Jₑₙₛ Gustedt @ 2024-03-19 16:39 UTC (permalink / raw) To: Rich Felker; +Cc: musl, Mike Cui, NRK Hi, actually the introduction of `{}` versus `{0}` in C23 was not meant to provide any difference in semantics, just to make the syntax nicer and consistent with C++. on Tue, 19 Mar 2024 12:08:32 -0400 you (Rich Felker <dalias@libc.org>) wrote: > On Tue, Mar 19, 2024 at 08:55:22AM -0700, Mike Cui wrote: > > On Tue, Mar 19, 2024 at 8:42 AM Rich Felker <dalias@libc.org> wrote: > > > > > On Tue, Mar 19, 2024 at 08:04:31AM -0700, Mike Cui wrote: > [...] > [...] > [...] > [...] > [...] > [...] > [...] > > > types. > > > > > > No, the part below that you didn't reply to covers unions: > > > > > > > > The full 6.7.9 ¶10: > > > > 10 If an object that has automatic storage duration is not > > initialized explicitly, its value is indeterminate. If an object > > that has static or thread storage duration is not initialized > > explicitly, then: > > - if it has pointer type, it is initialized to a null pointer; > > - if it has arithmetic type, it is initialized to (positive or > > unsigned) zero; > > - if it is an aggregate, every member is initialized (recursively) > > according to these rules, and any padding is initialized to zero > > bits; > > - if it is a union, the first named member is initialized > > (recursively) according to these rules, and any padding is > > initialized to zero bits; > > > > The second part that you quoted applies to "static or thread storage > > duration". The first sentence specifically says that anything not > > initialized is indeterminate. > > The only other paragraph which invokes 6.7.9 p10 is 6.7.9p21, which > > also does not apply to unions. (p21 ensures that the "ns" array of > > unions in the code would be zeroed out, but not the "sa" which a > > single union allocated on the stack.) > > ¶19 says: > > "all subobjects that are not initialized explicitly shall be > initialized implicitly the same as objects that have static > storage duration." > > The term "subobject" does not seem to be defined, so there's some > ambiguity, but I would read ¶19 as applying the above text about > static unions to automatic ones. > > In any case, what clang wants to do here seems like a big gratuitous > footgun. We'll make the code in musl safe against this but I suspect > it will have lots of bad effects elsewhere... To avoid such differences in interpretation, the simplest solution seems to be to always put the biggest union member first, or to even add an artificial first one `char [size-of-the-union] __dummy;`, such that this is always 0-byte initialized if there is any initialization at all. Jₑₙₛ -- :: ICube :::::::::::::::::::::::::::::: deputy director :: :: Université de Strasbourg :::::::::::::::::::::: ICPS :: :: INRIA Nancy Grand Est :::::::::::::::::::::::: Camus :: :: :::::::::::::::::::::::::::::::::::: ☎ +33 368854536 :: :: https://icube-icps.unistra.fr/index.php/Jens_Gustedt :: ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 16:39 ` Jₑₙₛ Gustedt @ 2024-03-19 20:47 ` Thorsten Glaser 2024-03-21 10:58 ` Jₑₙₛ Gustedt 2024-03-19 21:04 ` NRK 1 sibling, 1 reply; 17+ messages in thread From: Thorsten Glaser @ 2024-03-19 20:47 UTC (permalink / raw) To: musl Jₑₙₛ Gustedt dixit: >seems to be to always put the biggest union member first, or to even >add an artificial first one `char [size-of-the-union] __dummy;`, such >that this is always 0-byte initialized if there is any initialization But then you can just memset the union and then initialise any known pointer members to NULL/nullptr manually afterwards in the union’s member you actually need. (IIRC, POSIX actively specifies the possible pointer members in these structs for that reason.) bye, //mirabilos -- 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 20:47 ` Thorsten Glaser @ 2024-03-21 10:58 ` Jₑₙₛ Gustedt 2024-03-21 16:41 ` Thorsten Glaser 0 siblings, 1 reply; 17+ messages in thread From: Jₑₙₛ Gustedt @ 2024-03-21 10:58 UTC (permalink / raw) To: Thorsten Glaser; +Cc: musl Hi, on Tue, 19 Mar 2024 20:47:00 +0000 (UTC) you (Thorsten Glaser <tg@mirbsd.de>) wrote: > Jₑₙₛ Gustedt dixit: > > >seems to be to always put the biggest union member first, or to even > >add an artificial first one `char [size-of-the-union] __dummy;`, such > >that this is always 0-byte initialized if there is any > >initialization > > But then you can just memset the union and then initialise any > known pointer members to NULL/nullptr manually afterwards in the > union’s member you actually need. > > (IIRC, POSIX actively specifies the possible pointer members in > these structs for that reason.) no, that is actually not a full solution, I think. As soon as you store to any member, padding bytes may change to arbitrary values. I don't know if compilers really do that, but for example writing a wide register that has garbage in the upper half could be valid in some cases. Also, I think that up-thread there was the idea that a statically initialized object could be used to initialize an automatic object, and that this would guarantee that the padding is transferred. There is no such guarantee. So if you have known places where the pointer members are situated, you could try to ensure that you have an artificial union member as first, that has pointers at these places and otherwise fills the gaps with `unsigned char[something]`. That would guarantee that initialization does the right thing for pointer members, that all other bytes are zero-initialized, *and* that these values would be properly transferred on union assignment. Thanks Jₑₙₛ -- :: ICube :::::::::::::::::::::::::::::: deputy director :: :: Université de Strasbourg :::::::::::::::::::::: ICPS :: :: INRIA Nancy Grand Est :::::::::::::::::::::::: Camus :: :: :::::::::::::::::::::::::::::::::::: ☎ +33 368854536 :: :: https://icube-icps.unistra.fr/index.php/Jens_Gustedt :: ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-21 10:58 ` Jₑₙₛ Gustedt @ 2024-03-21 16:41 ` Thorsten Glaser 0 siblings, 0 replies; 17+ messages in thread From: Thorsten Glaser @ 2024-03-21 16:41 UTC (permalink / raw) To: musl Jₑₙₛ Gustedt dixit: >As soon as you store >to any member, padding bytes may change to arbitrary values. Yes, but where is that a problem? Something like: union foo { struct bar { char *s; size_t z; } a; struct baz { size_t z; char *s; } b; }; int somefunc(int mode, char *buf, size_t len, …) { union foo u; memset(u, '\0', sizeof(u)); /* … */ if (mode) { /* from here on, u is decided to be a */ u.a.s = NULL; } else { /* from here on, u is decided to be b */ u.b.s = NULL; } /* … some other processing … */ if (mode) { u.a.s = buf; u.a.z = len; } else { u.b.s = buf; u.b.z = len; } return (someotherfunc(&u, mode, …)); } ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 16:39 ` Jₑₙₛ Gustedt 2024-03-19 20:47 ` Thorsten Glaser @ 2024-03-19 21:04 ` NRK 2024-03-19 21:36 ` Rich Felker 1 sibling, 1 reply; 17+ messages in thread From: NRK @ 2024-03-19 21:04 UTC (permalink / raw) To: Jₑₙₛ Gustedt; +Cc: Rich Felker, musl, Mike Cui > actually the introduction of `{}` versus `{0}` in C23 was not meant to > provide any difference in semantics, just to make the syntax nicer and > consistent with C++. Regardless of what the intention was, the reality is that it *does* have semantic difference. Specifically, empty initialization `{}` benefits from the default initialization rules, which specify zero-ing out the padding bits whereas `{0}` doesn't guarantee that. - NRK ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 21:04 ` NRK @ 2024-03-19 21:36 ` Rich Felker 2024-03-20 17:11 ` NRK 0 siblings, 1 reply; 17+ messages in thread From: Rich Felker @ 2024-03-19 21:36 UTC (permalink / raw) To: NRK; +Cc: Jₑₙₛ Gustedt, musl, Mike Cui On Tue, Mar 19, 2024 at 09:04:41PM +0000, NRK wrote: > > actually the introduction of `{}` versus `{0}` in C23 was not meant to > > provide any difference in semantics, just to make the syntax nicer and > > consistent with C++. > > Regardless of what the intention was, the reality is that it *does* have > semantic difference. Specifically, empty initialization `{}` benefits > from the default initialization rules, which specify zero-ing out the > padding bits whereas `{0}` doesn't guarantee that. That's simply not true. There is no difference in the rules as specified by the standard. Rich ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [musl] Potential bug in __res_msend_rc() wrt to union initialization. 2024-03-19 21:36 ` Rich Felker @ 2024-03-20 17:11 ` NRK 0 siblings, 0 replies; 17+ messages in thread From: NRK @ 2024-03-20 17:11 UTC (permalink / raw) To: Rich Felker; +Cc: Jₑₙₛ Gustedt, musl, Mike Cui > That's simply not true. There is no difference in the rules as > specified by the standard. I don't think your assertion is correct, at least it hasn't been demonstrated. > ¶19 says: > > "all subobjects that are not initialized explicitly shall be > initialized implicitly the same as objects that have static > storage duration." > > The term "subobject" does not seem to be defined, so there's some > ambiguity, but I would read ¶19 as applying the above text about > static unions to automatic ones. The term subobject might not be clearly defined but there's a strong indication that it refers to objects contained within an aggregate, and NOT members of a union. Consider the following case: struct { int a; int b; } object = { .a = 5 }; Assuming `a` and `b` are subobjects, `a` will be initialized to 5 and `b` to 0. Which makes sense and is consistent with all existing implementation I'm aware of. Now consider the same with a union: union { int a; int b; } object = { .a = 5 }; If `b` is a subobjects then it should be initialized to zero according to the rule. But that can't happen since that'd overwrite the value of `a`. This to me is a convincing case that subobjects do not refer to union members, as there can be only 1 active at a time. Now I don't agree with clang's decision to not zero the entire union in case of `{0}`. It's unnecessarily hostile, brings negligible (if any) gains and *will* be causing bugs (this is the 2nd one I'm witnessing). But as it currently stands, it's not a wrong interpretation of the standard either. If the intention was to have no difference between {0} and {} it has not been written down clearly. - NRK ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-03-21 16:53 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-03-18 19:56 [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui 2024-03-18 21:34 ` Rich Felker 2024-03-18 22:22 ` NRK 2024-03-18 22:39 ` [musl] Potential bug in __res_msend_rc() wrt to union initialisation Thorsten Glaser 2024-03-19 0:01 ` [musl] Potential bug in __res_msend_rc() wrt to union initialization Mike Cui 2024-03-19 13:18 ` Rich Felker 2024-03-19 15:04 ` Mike Cui 2024-03-19 15:42 ` Rich Felker 2024-03-19 15:55 ` Mike Cui 2024-03-19 16:08 ` Rich Felker 2024-03-19 16:39 ` Jₑₙₛ Gustedt 2024-03-19 20:47 ` Thorsten Glaser 2024-03-21 10:58 ` Jₑₙₛ Gustedt 2024-03-21 16:41 ` Thorsten Glaser 2024-03-19 21:04 ` NRK 2024-03-19 21:36 ` Rich Felker 2024-03-20 17:11 ` NRK
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).