mailing list of musl libc
* size_t and int64_t on a new platform
@ 2016-03-31 18:20 Dan Gohman
  2016-03-31 19:25 ` Rich Felker
  0 siblings, 1 reply; 35+ messages in thread
From: Dan Gohman @ 2016-03-31 18:20 UTC (permalink / raw)
  To: musl

I'm working on a new architecture (WebAssembly, aka wasm) and am hoping to
have a compatible ABI at the level of a "freestanding implementation"
between all libc ports.

The current design would translate into the following in a musl port (in
.../bits/alltypes.h.in):

#define _Addr long
#define _Int64 long long

Both the ILP32 and LP64 platform variants would use the same definitions.
This helps minimize differences between the two variants, which aligns with
an overall goal of the platform.

However, this differs from musl's convention of using "int" for _Addr on
ILP32 systems and using "long" for _Int64 on LP64 systems. But, as far as I
can tell, no musl code actually depends on this convention. Almost all code
in musl is either fully portable and can't, or is architecture-specific and
can just do the right thing for its own architecture.

Legacy code may have assumptions, though I'm aware of the issues and don't
believe it's a significant practical problem for WebAssembly.

If we decide to contribute wasm support upstream to the musl project in the
future, would the musl maintainers expect to be ok with the above
definitions?
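
As a minimal sketch of what these definitions would imply (the SKETCH_*
and sketch_* names are hypothetical stand-ins, not musl's actual header
contents), the derived types could be checked like this, assuming an ILP32
or LP64 target where long is pointer-sized:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the _Addr and _Int64 macros proposed above. */
#define SKETCH_ADDR long
#define SKETCH_INT64 long long

/* Address-sized types are all derived from the same base type. */
typedef unsigned SKETCH_ADDR sketch_size_t;
typedef SKETCH_ADDR sketch_ptrdiff_t;
typedef SKETCH_ADDR sketch_intptr_t;
typedef SKETCH_INT64 sketch_int64_t;

/* long is pointer-sized on ILP32 and LP64 (this would not hold on LLP64). */
_Static_assert(sizeof(sketch_size_t) == sizeof(void *),
               "long must be pointer-sized for this scheme");
_Static_assert(sizeof(sketch_int64_t) == 8, "long long must be 64 bits");

/* Returns 1 when the sketch types have the same widths as the host's. */
int sketch_widths_ok(void)
{
	return sizeof(sketch_size_t) == sizeof(size_t)
	    && sizeof(sketch_ptrdiff_t) == sizeof(ptrdiff_t)
	    && sizeof(sketch_int64_t) == sizeof(int64_t);
}
```

The widths come out the same on both variants; only the identity of the
underlying type (long vs. int, long long vs. long) differs from musl's
current convention.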

Thanks,

Dan


* Re: size_t and int64_t on a new platform
  2016-03-31 18:20 size_t and int64_t on a new platform Dan Gohman
@ 2016-03-31 19:25 ` Rich Felker
  2016-03-31 20:10   ` Szabolcs Nagy
  2016-04-01  0:35   ` size_t and int64_t on a new platform Dan Gohman
  0 siblings, 2 replies; 35+ messages in thread
From: Rich Felker @ 2016-03-31 19:25 UTC (permalink / raw)
  To: musl

On Thu, Mar 31, 2016 at 11:20:22AM -0700, Dan Gohman wrote:
> I'm working on a new architecture (WebAssembly, aka wasm) and am hoping to
> have a compatible ABI at the level of a "freestanding implementation"
> between all libc ports.
> 
> The current design would translate into the following in a musl port (in
> ..../bits/alltypes.h.in):
> 
> #define _Addr long
> #define _Int64 long long
> 
> Both the ILP32 and LP64 platform variants would use the same definitions.
> This helps minimize differences between the two variants, which aligns with
> an overall goal of the platform.
> 
> However, this differs from musl's convention of using "int" for _Addr on
> ILP32 systems and using "long" for _Int64 on LP64 systems. But, as far as I
> can tell, no musl code actually depends on this convention. Almost all code
> in musl is either fully portable and can't, or is architecture-specific and
> can just do the right thing for its own architecture.
> 
> Legacy code may have assumptions, though I'm aware of the issues and don't
> believe it's a significant practical problem for WebAssembly.
> 
> If we decide to contribute wasm support upstream to the musl project in the
> future, would the musl maintainers expect to be ok with the above
> definitions?

At some point we'll probably have to make this relaxation anyway. I've
heard there's at least one arch we're planning to add (maybe
powerpc64? I forget) that's using long instead of int for _Addr types.
What would be most helpful to us (to keep things simple) is just
ensuring that all the relevant types (size_t, ssize_t, ptrdiff_t,
[u]intptr_t, etc.) are defined consistently as int or as long;
otherwise we have to pop a hole in the abstraction they're modeled
with now. That wouldn't be a huge problem either but it just adds more
redundancy to arch/*/bits/alltypes.h.in files.

Anyone else have objections to use of long for these types?

Rich



* Re: size_t and int64_t on a new platform
  2016-03-31 19:25 ` Rich Felker
@ 2016-03-31 20:10   ` Szabolcs Nagy
  2016-03-31 20:23     ` Alexander Monakov
  2016-04-01  0:35   ` size_t and int64_t on a new platform Dan Gohman
  1 sibling, 1 reply; 35+ messages in thread
From: Szabolcs Nagy @ 2016-03-31 20:10 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2016-03-31 15:25:18 -0400]:
> On Thu, Mar 31, 2016 at 11:20:22AM -0700, Dan Gohman wrote:
> > I'm working on a new architecture (WebAssembly, aka wasm) and am hoping to
> > have a compatible ABI at the level of a "freestanding implementation"
> > between all libc ports.
> > 
> > The current design would translate into the following in a musl port (in
> > ..../bits/alltypes.h.in):
> > 
> > #define _Addr long
> > #define _Int64 long long
> > 
> > Both the ILP32 and LP64 platform variants would use the same definitions.
> > This helps minimize differences between the two variants, which aligns with
> > an overall goal of the platform.
> > 
> > However, this differs from musl's convention of using "int" for _Addr on
> > ILP32 systems and using "long" for _Int64 on LP64 systems. But, as far as I
> > can tell, no musl code actually depends on this convention. Almost all code
> > in musl is either fully portable and can't, or is architecture-specific and
> > can just do the right thing for its own architecture.
> > 
> > Legacy code may have assumptions, though I'm aware of the issues and don't
> > believe it's a significant practical problem for WebAssembly.
> > 
> > If we decide to contribute wasm support upstream to the musl project in the
> > future, would the musl maintainers expect to be ok with the above
> > definitions?
> 
> At some point we'll probably have to make this relaxation anyway. I've
> heard there's at least one arch we're planning to add (maybe
> powerpc64? I forget) that's using long instead of int for _Addr types.
> What would be most helpful to us (to keep things simple) is just
> ensuring that all the relevant types (size_t, ssize_t, ptrdiff_t,
> [u]intptr_t, etc.) are defined consistently as int or as long;
> otherwise we have to pop a hole in the abstraction they're modeled
> with now. That wouldn't be a huge problem either but it just adds more
> redundancy to arch/*/bits/alltypes.h.in files.
> 
> Anyone else have objections to use of long for these types?
> 

there are currently two targets in gcc that do
the same (openbsd and vms), so most likely the
alternative typedefs are not an issue.
(i don't think powerpc64 is different, the same
glibc-stdint.h is used for all *-linux* targets
in gcc)

however musl has to match the abi the compiler
assumes (a compiler does not need to know about
the typedefs normally, but printf fmt warnings
and fortran c ffi rely on the compiler's knowledge
about these typedefs) so the compiler has to
be configured accordingly.
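
a rough illustration of why the choice is visible to the compiler even
when int and long have the same size: they stay distinct types, so a C11
_Generic selection (or a printf format check) can tell them apart. the
function name below is made up for the example:

```c
#include <stddef.h>

/* Reports which standard unsigned type size_t is on this target.
 * unsigned int and unsigned long are distinct types even when they
 * have the same width, which is why %u vs. %lu warnings depend on
 * the compiler knowing the exact typedef. */
const char *size_t_underlying(void)
{
	return _Generic((size_t)0,
		unsigned int: "unsigned int",
		unsigned long: "unsigned long",
		unsigned long long: "unsigned long long",
		default: "other");
}
```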



* Re: size_t and int64_t on a new platform
  2016-03-31 20:10   ` Szabolcs Nagy
@ 2016-03-31 20:23     ` Alexander Monakov
  2016-03-31 20:30       ` Rich Felker
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Monakov @ 2016-03-31 20:23 UTC (permalink / raw)
  To: musl

How size_t and friends are typedef'd is visible in C++ mangled names, so
unless I'm misunderstanding the context here, musl most likely doesn't want
to typedef them differently to what's typical on the platform.

What about using compiler defines?  GCC and Clang will predefine __SIZE_TYPE__
and such, which is directly usable for typedef'ing size_t&co; see:

:| gcc -xc - -E -dD|grep TYPE
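
A small sketch of that idea, assuming GCC or Clang (which predefine
__SIZE_TYPE__ and __PTRDIFF_TYPE__); the cc_* names are only for
illustration:

```c
#include <stddef.h>

/* Typedefs taken directly from the compiler's predefined macros. */
typedef __SIZE_TYPE__ cc_size_t;
typedef __PTRDIFF_TYPE__ cc_ptrdiff_t;

/* Returns 1 when the libc's typedefs are the very same types the
 * compiler assumes, not merely types of the same width. */
int compiler_types_match(void)
{
	return _Generic((cc_size_t)0, size_t: 1, default: 0)
	    && _Generic((cc_ptrdiff_t)0, ptrdiff_t: 1, default: 0);
}
```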

Alexander



* Re: size_t and int64_t on a new platform
  2016-03-31 20:23     ` Alexander Monakov
@ 2016-03-31 20:30       ` Rich Felker
  2016-04-01  9:16         ` recvmsg/sendmsg broken on mips64 Sebastian Gottschall
  0 siblings, 1 reply; 35+ messages in thread
From: Rich Felker @ 2016-03-31 20:30 UTC (permalink / raw)
  To: musl

On Thu, Mar 31, 2016 at 11:23:17PM +0300, Alexander Monakov wrote:
> How size_t and friends are typedef'd is visible in C++ mangled names, so
> unless I'm misunderstanding the context here, musl most likely doesn't want
> to typedef them differently to what's typical on the platform.
> 
> What about using compiler defines?  GCC and Clang will predefine __SIZE_TYPE__
> and such, which is directly usable for typedef'ing size_t&co; see:
> 
> :| gcc -xc - -E -dD|grep TYPE

Changing them on an existing platform is not what's under discussion.
The question was just about whether a new (virtual) arch can use
[unsigned] long rather than [unsigned] int for these types without
making things difficult for musl. Of course the compiler's choice of
types has to match musl's, whichever definition is used.

Rich



* Re: size_t and int64_t on a new platform
  2016-03-31 19:25 ` Rich Felker
  2016-03-31 20:10   ` Szabolcs Nagy
@ 2016-04-01  0:35   ` Dan Gohman
  1 sibling, 0 replies; 35+ messages in thread
From: Dan Gohman @ 2016-04-01  0:35 UTC (permalink / raw)
  To: musl

On Thu, Mar 31, 2016 at 12:25 PM, Rich Felker <dalias@libc.org> wrote:

> On Thu, Mar 31, 2016 at 11:20:22AM -0700, Dan Gohman wrote:
> > I'm working on a new architecture (WebAssembly, aka wasm) and am hoping
> to
> > have a compatible ABI at the level of a "freestanding implementation"
> > between all libc ports.
> >
> > The current design would translate into the following in a musl port (in
> > ..../bits/alltypes.h.in):
> >
> > #define _Addr long
> > #define _Int64 long long
> >
> > Both the ILP32 and LP64 platform variants would use the same definitions.
> > This helps minimize differences between the two variants, which aligns
> with
> > an overall goal of the platform.
> >
> > However, this differs from musl's convention of using "int" for _Addr on
> > ILP32 systems and using "long" for _Int64 on LP64 systems. But, as far
> as I
> > can tell, no musl code actually depends on this convention. Almost all
> code
> > in musl is either fully portable and can't, or is architecture-specific
> and
> > can just do the right thing for its own architecture.
> >
> > Legacy code may have assumptions, though I'm aware of the issues and
> don't
> > believe it's a significant practical problem for WebAssembly.
> >
> > If we decide to contribute wasm support upstream to the musl project in
> the
> > future, would the musl maintainers expect to be ok with the above
> > definitions?
>
> At some point we'll probably have to make this relaxation anyway. I've
> heard there's at least one arch we're planning to add (maybe
> powerpc64? I forget) that's using long instead of int for _Addr types.
> What would be most helpful to us (to keep things simple) is just
> ensuring that all the relevant types (size_t, ssize_t, ptrdiff_t,
> [u]intptr_t, etc.) are defined consistently as int or as long;
> otherwise we have to pop a hole in the abstraction they're modeled
> with now. That wouldn't be a huge problem either but it just adds more
> redundancy to arch/*/bits/alltypes.h.in files.
>

Sounds good. And I agree; size_t, ssize_t, ptrdiff_t, [u]intptr_t, etc.
would all remain consistent with each other.

And to answer the concerns about compilers, I'm also a developer on the
first C/C++ compiler being ported to this platform, so I'll make sure that
the compiler's types agree with those defined in the library headers.

Thanks,

Dan


* recvmsg/sendmsg broken on mips64
  2016-03-31 20:30       ` Rich Felker
@ 2016-04-01  9:16         ` Sebastian Gottschall
  2016-04-01  9:49           ` Szabolcs Nagy
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-01  9:16 UTC (permalink / raw)
  To: musl

I discovered that the whole recvmsg/sendmsg code is broken on mips64,
but I also found a solution.
I threw out all the _pad1, _pad2 crap in socket.h and the
corresponding code in recvmsg.c etc.
and used size_t instead. This works in the end. I see no reason for this
padding, since using the correct datatype will handle it in the same way.
This solution may also work for other 64-bit targets. So the proposal is to
fix the datatype instead of using int with padding on 64-bit targets.

This is my working struct on mips64 (big endian):

struct msghdr {
         void *msg_name;
         socklen_t msg_namelen;
         struct iovec *msg_iov;
         size_t msg_iovlen;
         void *msg_control;
         size_t msg_controllen;
         int msg_flags;
};

struct cmsghdr {
         size_t cmsg_len;
         int cmsg_level;
         int cmsg_type;
};


Sebastian



* Re: recvmsg/sendmsg broken on mips64
  2016-04-01  9:16         ` recvmsg/sendmsg broken on mips64 Sebastian Gottschall
@ 2016-04-01  9:49           ` Szabolcs Nagy
  2016-04-01 10:29             ` Sebastian Gottschall
  0 siblings, 1 reply; 35+ messages in thread
From: Szabolcs Nagy @ 2016-04-01  9:49 UTC (permalink / raw)
  To: musl

* Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-01 11:16:20 +0200]:
> I discovered that the whole recvmsg/sendmsg code is broken in mips64
> but i found also the solution
> i throwed out all the _pad1, _pad2 crap in socket.h and the corrosponding
> code in recvmsg.c etc.
> and used size_t instead. this works at the end. i see no reason for this
> padding, since using the correct datatype will handle it in the same way.
> this solution may also work for other 64 bit targets. so proposal is fixing
> the datatype instead of using int with padding in case of 64 bit
> 

the padding is needed, i think __BIG_ENDIAN
or __LITTLE_ENDIAN might not be defined properly.

your fix is non-conforming and breaks both abi and api,
the definition must match

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html

> this here is my working struct in mips64 (big endian)
> 
> struct msghdr {
>         void *msg_name;
>         socklen_t msg_namelen;
>         struct iovec *msg_iov;
>         size_t msg_iovlen;
>         void *msg_control;
>         size_t msg_controllen;
>         int msg_flags;
> };
> 
> struct cmsghdr {
>         size_t cmsg_len;
>         int cmsg_level;
>         int cmsg_type;
> };
> 
> 
> Sebastian



* Re: recvmsg/sendmsg broken on mips64
  2016-04-01  9:49           ` Szabolcs Nagy
@ 2016-04-01 10:29             ` Sebastian Gottschall
  2016-04-01 11:31               ` Szabolcs Nagy
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-01 10:29 UTC (permalink / raw)
  To: musl

On 01.04.2016 at 11:49, Szabolcs Nagy wrote:
> * Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-01 11:16:20 +0200]:
>> I discovered that the whole recvmsg/sendmsg code is broken in mips64
>> but i found also the solution
>> i throwed out all the _pad1, _pad2 crap in socket.h and the corrosponding
>> code in recvmsg.c etc.
>> and used size_t instead. this works at the end. i see no reason for this
>> padding, since using the correct datatype will handle it in the same way.
>> this solution may also work for other 64 bit targets. so proposal is fixing
>> the datatype instead of using int with padding in case of 64 bit
>>
> the padding is needed, i think __BIG_ENDIAN
> or __LITTLE_ENDIAN might not be defined properly.
I checked this already. It was defined properly. The only solution was
using the correct datatypes as defined in the kernel, and I also checked
uclibc. It also uses just size_t and nothing else.
The padding results in the same datatype size; it just clears the upper or
lower word. But this doesn't seem to be necessary.
> your fix is non-conforming and breaks both abi and api,
> the definition must match
socklen_t would result in the same 64-bit datatype instead of int + pad
(which is 64 bit too), so it's conforming.
I matched my header variant by reading the kernel headers, which use
size_t instead of socklen_t,
so I assume socklen_t matches size_t.
>
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html
>
>> this here is my working struct in mips64 (big endian)
>>
>> struct msghdr {
>>          void *msg_name;
>>          socklen_t msg_namelen;
>>          struct iovec *msg_iov;
>>          size_t msg_iovlen;
>>          void *msg_control;
>>          size_t msg_controllen;
>>          int msg_flags;
>> };
>>
>> struct cmsghdr {
>>          size_t cmsg_len;
>>          int cmsg_level;
>>          int cmsg_type;
>> };
>>
>>
>> Sebastian





* Re: recvmsg/sendmsg broken on mips64
  2016-04-01 10:29             ` Sebastian Gottschall
@ 2016-04-01 11:31               ` Szabolcs Nagy
  2016-04-01 11:37                 ` Sebastian Gottschall
  0 siblings, 1 reply; 35+ messages in thread
From: Szabolcs Nagy @ 2016-04-01 11:31 UTC (permalink / raw)
  To: musl

* Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-01 12:29:35 +0200]:

> On 01.04.2016 at 11:49, Szabolcs Nagy wrote:
> >* Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-01 11:16:20 +0200]:
> >>I discovered that the whole recvmsg/sendmsg code is broken in mips64
> >>but i found also the solution
> >>i throwed out all the _pad1, _pad2 crap in socket.h and the corrosponding
> >>code in recvmsg.c etc.
> >>and used size_t instead. this works at the end. i see no reason for this
> >>padding, since using the correct datatype will handle it in the same way.
> >>this solution may also work for other 64 bit targets. so proposal is fixing
> >>the datatype instead of using int with padding in case of 64 bit
> >>
> >the padding is needed, i think __BIG_ENDIAN
> >or __LITTLE_ENDIAN might not be defined properly.
> i checked this already. it was defined properly. the only solution was using
> the correct datatypes as defined in the kernel and i also checked uclibc. it
> uses also just size_t and nothing else.
> the padding results in the same datatype size, just clears the upper and
> lower word. but this doesnt seem to be neccessary
> >your fix is non-conforming and breaks both abi and api,
> >the definition must match
> socklen_t would result in the same 64bit datatype instead of int + pad
> (which is 64 bit too). so its conforming.
> i mached by header variant by reading the kernel headers which uses size_t
> instead of socketlen_t
> so i assume socketlen_t maches size_t

msg_iovlen must be int
msg_controllen and cmsg_len must be socklen_t

the socklen_t typedef must match the size
the socket syscalls expect, which is int.

uclibc and linux uapi are known to be broken,
linux uapi is not fixed because of abi compat
but we can work around this in musl.

if the endian macros are defined then the padding
should work.
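
a simplified sketch of the padding idea (hypothetical struct and field
names, socklen_t replaced by unsigned, and relying on the __BYTE_ORDER__
macros that gcc and clang predefine; this is not musl's actual header):
the posix-typed struct gets an extra int on the side where the upper half
of the kernel's size_t lives, so both layouts overlap exactly as long as
the padding is zero.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Padding is only needed where long is wider than int. */
#if LONG_MAX > INT_MAX && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define PAD_BE(name) int name;
#define PAD_LE(name)
#elif LONG_MAX > INT_MAX
#define PAD_BE(name)
#define PAD_LE(name) int name;
#else
#define PAD_BE(name)
#define PAD_LE(name)
#endif

/* Layout the kernel expects: length fields are size_t. */
struct kernel_style_msghdr {
	void *msg_name;
	unsigned msg_namelen;
	void *msg_iov;
	size_t msg_iovlen;
	void *msg_control;
	size_t msg_controllen;
	int msg_flags;
};

/* Layout with posix field types plus endian-dependent padding. */
struct posix_style_msghdr {
	void *msg_name;
	unsigned msg_namelen;
	void *msg_iov;
	PAD_BE(pad1)
	int msg_iovlen;
	PAD_LE(pad1)
	void *msg_control;
	PAD_BE(pad2)
	unsigned msg_controllen;
	PAD_LE(pad2)
	int msg_flags;
};

/* Returns 1 if a zero-padded posix-style struct reads back with the
 * same values through the kernel-style layout. */
int padded_layout_matches_kernel(void)
{
	struct posix_style_msghdr p;
	struct kernel_style_msghdr k;

	if (sizeof p != sizeof k) return 0;
	memset(&p, 0, sizeof p);        /* also zeroes pad1/pad2 */
	p.msg_iovlen = 3;
	p.msg_controllen = 24;
	memcpy(&k, &p, sizeof k);       /* reinterpret the same bytes */
	return k.msg_iovlen == 3 && k.msg_controllen == 24;
}
```

if the endian test picked the wrong branch, the int would land in the
upper half of the kernel's field and the kernel would see a huge length.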

> >
> >http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html
> >
> >>this here is my working struct in mips64 (big endian)
> >>
> >>struct msghdr {
> >>         void *msg_name;
> >>         socklen_t msg_namelen;
> >>         struct iovec *msg_iov;
> >>         size_t msg_iovlen;
> >>         void *msg_control;
> >>         size_t msg_controllen;
> >>         int msg_flags;
> >>};
> >>
> >>struct cmsghdr {
> >>         size_t cmsg_len;
> >>         int cmsg_level;
> >>         int cmsg_type;
> >>};
> >>
> >>
> >>Sebastian
> 



* Re: recvmsg/sendmsg broken on mips64
  2016-04-01 11:31               ` Szabolcs Nagy
@ 2016-04-01 11:37                 ` Sebastian Gottschall
  2016-04-01 12:21                   ` Masanori Ogino
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-01 11:37 UTC (permalink / raw)
  To: musl

On 01.04.2016 at 13:31, Szabolcs Nagy wrote:
> * Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-01 12:29:35 +0200]:
>
>> On 01.04.2016 at 11:49, Szabolcs Nagy wrote:
>>> * Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-01 11:16:20 +0200]:
>>>> I discovered that the whole recvmsg/sendmsg code is broken in mips64
>>>> but i found also the solution
>>>> i throwed out all the _pad1, _pad2 crap in socket.h and the corrosponding
>>>> code in recvmsg.c etc.
>>>> and used size_t instead. this works at the end. i see no reason for this
>>>> padding, since using the correct datatype will handle it in the same way.
>>>> this solution may also work for other 64 bit targets. so proposal is fixing
>>>> the datatype instead of using int with padding in case of 64 bit
>>>>
>>> the padding is needed, i think __BIG_ENDIAN
>>> or __LITTLE_ENDIAN might not be defined properly.
>> i checked this already. it was defined properly. the only solution was using
>> the correct datatypes as defined in the kernel and i also checked uclibc. it
>> uses also just size_t and nothing else.
>> the padding results in the same datatype size, just clears the upper and
>> lower word. but this doesnt seem to be neccessary
>>> your fix is non-conforming and breaks both abi and api,
>>> the definition must match
>> socklen_t would result in the same 64bit datatype instead of int + pad
>> (which is 64 bit too). so its conforming.
>> i mached by header variant by reading the kernel headers which uses size_t
>> instead of socketlen_t
>> so i assume socketlen_t maches size_t
> msg_iovlen must be int
> msg_controllen and cmsg_len must be socklen_t
the kernel uses size_t
>
> the socklen_t typedef must match what the size
> the socket syscalls expect which is int.
>
> uclibc and linux uapi is known to be broken,
> linux uapi is not fixed because of abi compat
> but we can work this around in musl.
>
> if the endian macros are defined then the padding
> should work.
Okay, but musl is a library used with Linux only. So if Linux uses
size_t, then musl must use the same ABI.
Otherwise musl won't work with unimportant programs like "ip" on 64-bit
targets (haven't checked x64 yet).
>>> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html
>>>
>>>> this here is my working struct in mips64 (big endian)
>>>>
>>>> struct msghdr {
>>>>          void *msg_name;
>>>>          socklen_t msg_namelen;
>>>>          struct iovec *msg_iov;
>>>>          size_t msg_iovlen;
>>>>          void *msg_control;
>>>>          size_t msg_controllen;
>>>>          int msg_flags;
>>>> };
>>>>
>>>> struct cmsghdr {
>>>>          size_t cmsg_len;
>>>>          int cmsg_level;
>>>>          int cmsg_type;
>>>> };
>>>>
>>>>
>>>> Sebastian





* Re: recvmsg/sendmsg broken on mips64
  2016-04-01 11:37                 ` Sebastian Gottschall
@ 2016-04-01 12:21                   ` Masanori Ogino
  2016-04-01 12:42                     ` Sebastian Gottschall
  0 siblings, 1 reply; 35+ messages in thread
From: Masanori Ogino @ 2016-04-01 12:21 UTC (permalink / raw)
  To: musl

Hello,

2016-04-01 20:37 GMT+09:00 Sebastian Gottschall <s.gottschall@dd-wrt.com>:
> okay. but musl is a library used with linux only. so if linux uses size_t,
> then musl must use the same abi.
> otherwise musl wont work with unimportant programs like "ip" for 64 bit
> targets (havent checked x64 yet)

I have a x86_64 box with musl and iproute2 seems to work with current
(padded) definition.

You said the code is broken. How/when is it broken?
Could you give me a test code for the problem? Then, I can test that
on my x86_64 box.

(well, of course it may fail if the test code declares the structs
independently, but then the test *is* broken whether the definition is
standard-conformant or not.)

-- 
Masanori Ogino



* Re: recvmsg/sendmsg broken on mips64
  2016-04-01 12:21                   ` Masanori Ogino
@ 2016-04-01 12:42                     ` Sebastian Gottschall
  2016-04-01 13:17                       ` Szabolcs Nagy
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-01 12:42 UTC (permalink / raw)
  To: musl

On 01.04.2016 at 14:21, Masanori Ogino wrote:
> Hello,
>
> 2016-04-01 20:37 GMT+09:00 Sebastian Gottschall <s.gottschall@dd-wrt.com>:
>> okay. but musl is a library used with linux only. so if linux uses size_t,
>> then musl must use the same abi.
>> otherwise musl wont work with unimportant programs like "ip" for 64 bit
>> targets (havent checked x64 yet)
> I have a x86_64 box with musl and iproute2 seems to work with current
> (padded) definition.
yes it does. i checked it 30 minutes ago
>
> You said the code is broken. How/when is it broken?
> Could you give me a test code for the problem? Then, I can test that
> on my x86_64 box.
it only affects mips64 so far. not x64. i checked both using dd-wrt
> (well, of course it may fail if the test code declares the structs
> independently, but then the test *is* broken whether the definition is
> standard-conformant or not.)
With mips64 (Octeon) the whole netlink code in iproute2 doesn't work. It
simply fails since recvmsg returns no data. sendmsg is likely broken in
the same way since it uses the same struct.
My dirty musl hack fixed it by using the same datatypes used in
the kernel, so this might be mips specific.
Currently musl converts the non-conforming kernel structures to
POSIX-specified structures, but this doesn't seem to work for mips64.
>




* Re: recvmsg/sendmsg broken on mips64
  2016-04-01 12:42                     ` Sebastian Gottschall
@ 2016-04-01 13:17                       ` Szabolcs Nagy
  2016-04-02  9:52                         ` Sebastian Gottschall
  0 siblings, 1 reply; 35+ messages in thread
From: Szabolcs Nagy @ 2016-04-01 13:17 UTC (permalink / raw)
  To: musl

* Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-01 14:42:36 +0200]:
> it only affects mips64 so far. not x64. i checked both using dd-wrt

the types *must* be the same on the source level
on *all* targets as specified by posix, the linux
syscall abi is irrelevant, that is not visible
to userspace code which is written for the c
language level api, if you change the types
it is not possible to write portable c code.
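
a hedged sketch of how one could check what a given libc exposes at the
source level (the function name is made up; the result depends on the libc
it is compiled against, and on 32-bit targets socklen_t and size_t may be
the same type so the check is less telling there):

```c
#include <sys/socket.h>

/* Returns 1 when the libc declares the fields with the types posix
 * specifies (int msg_iovlen, socklen_t msg_controllen and cmsg_len),
 * 0 when it exposes other types such as size_t. */
int msghdr_matches_posix(void)
{
	struct msghdr m = {0};
	struct cmsghdr c = {0};

	(void)m;
	(void)c;
	return _Generic(m.msg_iovlen, int: 1, default: 0)
	    && _Generic(m.msg_controllen, socklen_t: 1, default: 0)
	    && _Generic(c.cmsg_len, socklen_t: 1, default: 0);
}
```

portable code that, say, takes the address of msg_iovlen as an int *
only compiles cleanly when this returns 1.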

> >(well, of course it may fail if the test code declares the structs
> >independently, but then the test *is* broken whether the definition is
> >standard-conformant or not.)
> with mips64 (octeon) the whole netlink code in iproute2 doesnt work. it
> simly fails since recvmsg returns no data. sendmsg is likelly broken in the

your fix does not explain that unless there is
a >4G message somewhere which i think is not
supported on the kernel side either.

please send a proper bug report about what
breaks, it sounds like the padding is at
the wrong place. changing int,int to size_t
should make no difference for iproute2.

> same way since it uses the same struct
> my dirty musl hack again fixed it by using the same datatypes used in the
> kernel. so this might be mips specific.
> currently musl does convert the non conform kernel structures to posix
> specified structures, but this doesnt seem to work for mips64
> >
we should figure out why it does not work
instead of breaking portability.



* Re: recvmsg/sendmsg broken on mips64
  2016-04-01 13:17                       ` Szabolcs Nagy
@ 2016-04-02  9:52                         ` Sebastian Gottschall
  2016-04-07  9:48                           ` Szabolcs Nagy
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-02  9:52 UTC (permalink / raw)
  To: musl

On 01.04.2016 at 15:17, Szabolcs Nagy wrote:
> * Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-01 14:42:36 +0200]:
>> it only affects mips64 so far. not x64. i checked both using dd-wrt
> the types *must* be the same on the source level
> on *all* targets as specified by posix, the linux
> syscall abi is irrelevant, that is not visible
> to userspace code which is written for the c
> language level api, if you change the types
> it is not possible to write portable c code.
I understand the reason why the datatypes are defined as they are, but on
the other hand the argument is irrelevant
if it doesn't work, portable or not (for some reason which might be mips
specific).
But anyway, I don't want to discuss this in depth. I prefer to
find and fix the bug while keeping the original structures.
I will do some debugging on this today.

>
> your fix does not explain that unless there is
> a >4G message somewhere which i think is not
> supported on the kernel side either.
>
> please send a proper bug report about what
> breaks, it sounds like the padding is at
> the wrong place. changing int,int to size_t
> should make no difference for iproute2.
It's not just padding, if you look at confusing code lines like these in
sendmsg.
To me it looks like someone creates a copy of the buffer to work with
it, but I don't see a reason for it, and it also limits the maximum
size of a message.
And code like this h = *msg; should be replaced by memcpy, since the
compiler may optimize that in a bad way. I have seen compiler-introduced
bugs in the past on lines like that. For me that code should be removed.
Clearing padding is one thing, but why do a copy?

#if LONG_MAX > INT_MAX
         struct msghdr h;
         struct cmsghdr chbuf[1024/sizeof(struct cmsghdr)+1], *c;
         if (msg) {
                 h = *msg;
                 h.__pad1 = h.__pad2 = 0;
                 msg = &h;
                 if (h.msg_controllen) {
                         if (h.msg_controllen > 1024) {
                                 errno = ENOMEM;
                                 return -1;
                         }
                         memcpy(chbuf, h.msg_control, h.msg_controllen);
                         h.msg_control = chbuf;
                         for (c=CMSG_FIRSTHDR(&h); c; c=CMSG_NXTHDR(&h,c))
                                 c->__pad1 = 0;
                 }
         }
#endif

>
>> same way since it uses the same struct
>> my dirty musl hack again fixed it by using the same datatypes used in the
>> kernel. so this might be mips specific.
>> currently musl does convert the non conform kernel structures to posix
>> specified structures, but this doesnt seem to work for mips64
> we should figure out why it does not work
> instead of breaking portability.
okay
>




* Re: recvmsg/sendmsg broken on mips64
  2016-04-02  9:52                         ` Sebastian Gottschall
@ 2016-04-07  9:48                           ` Szabolcs Nagy
  2016-04-07 11:42                             ` Sebastian Gottschall
  0 siblings, 1 reply; 35+ messages in thread
From: Szabolcs Nagy @ 2016-04-07  9:48 UTC (permalink / raw)
  To: musl

* Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-02 11:52:32 +0200]:
> i understand the reason why the datatypes are defined as is, but on the
> other hand the argument is irrelevant
> if it doesnt work portable or not. (for some reason which might be mips
> specific)
> but anyway. i dont want to discuss this to the deep and. i prefer to find
> and fix the bug by keeping the original structures.

"original structures" does not make sense.

the next glibc will fix their bug and use exactly the same struct as musl:
http://sourceware.org/ml/libc-alpha/2016-03/msg00661.html

eventually uclibc will fix this too (assuming it's still maintained)

(and hopefully at some point the kernel will introduce new syscalls
that use the correct structs.)

> i will do some debugging on this today
> 
> >
> >your fix does not explain that unless there is
> >a >4G message somewhere which i think is not
> >supported on the kernel side either.
> >
> >please send a proper bug report about what
> >breaks, it sounds like the padding is at
> >the wrong place. changing int,int to size_t
> >should make no difference for iproute2.
> it not just padding if you look at confusing codelines like this in sendmsg
> for me it look like someone creates a copy of the buffer to work with it.
> but i dont see a reason for it and is does also limit the maximum size of a
> message
> and code like this h = *msg; should be replaced by memcpy, since the
> compiler may optimize that in a bad way . i have seen compiler introduced
> bugs
> in the past on lines like that. for me that code should be removed. clearing
> padding is one thing, but why doing a copy?
> 

ok so the failure is in sendmsg and in the msg_control copy.

does the call fail with ENOMEM (because >1024 bytes of ancillary data)?
that would be easy to fix..

(libc has to make a copy, the struct is const and might be in
readonly memory. a detailed bug report of the failure would
be more useful than speculations about broken compilers..
e.g. strace log with and without the msg_control copying.)

> #if LONG_MAX > INT_MAX
>         struct msghdr h;
>         struct cmsghdr chbuf[1024/sizeof(struct cmsghdr)+1], *c;
>         if (msg) {
>                 h = *msg;
>                 h.__pad1 = h.__pad2 = 0;
>                 msg = &h;
>                 if (h.msg_controllen) {
>                         if (h.msg_controllen > 1024) {
>                                 errno = ENOMEM;
>                                 return -1;
>                         }
>                         memcpy(chbuf, h.msg_control, h.msg_controllen);
>                         h.msg_control = chbuf;
>                         for (c=CMSG_FIRSTHDR(&h); c; c=CMSG_NXTHDR(&h,c))
>                                 c->__pad1 = 0;
>                 }
>         }
> #endif



* Re: recvmsg/sendmsg broken on mips64
  2016-04-07  9:48                           ` Szabolcs Nagy
@ 2016-04-07 11:42                             ` Sebastian Gottschall
  2016-04-07 18:46                               ` Szabolcs Nagy
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-07 11:42 UTC (permalink / raw)
  To: musl


> ok so the failure is in sendmsg and in the msg_control copy.
>
> does the call fail with ENOMEM (because >1024 bytes of ancillary data)?
> that would be easy to fix..
>
> (libc has to make a copy, the struct is const and might be in
> readonly memory. a detailed bug report of the failure would
> be more useful than speculations about broken compilers..
> e.g. strace log with and without the msg_control copying.)
How can I make the report more detailed than this: all netlink operations
in iproute2 fail, so the whole ip command doesn't work.
I tracked it down to recvmsg / sendmsg, which do not return success
for some reason. A more detailed report would mean debugging
the real cause, at which point I could also submit a better patch. I have had
no time for that yet, since my hack works.

It does not fit your philosophy, though. The problem is easy to reproduce on
any Octeon device using musl and iproute2.

>
>> #if LONG_MAX > INT_MAX
>>          struct msghdr h;
>>          struct cmsghdr chbuf[1024/sizeof(struct cmsghdr)+1], *c;
>>          if (msg) {
>>                  h = *msg;
>>                  h.__pad1 = h.__pad2 = 0;
>>                  msg = &h;
>>                  if (h.msg_controllen) {
>>                          if (h.msg_controllen > 1024) {
>>                                  errno = ENOMEM;
>>                                  return -1;
>>                          }
>>                          memcpy(chbuf, h.msg_control, h.msg_controllen);
>>                          h.msg_control = chbuf;
>>                          for (c=CMSG_FIRSTHDR(&h); c; c=CMSG_NXTHDR(&h,c))
>>                                  c->__pad1 = 0;
>>                  }
>>          }
>> #endif





* Re: recvmsg/sendmsg broken on mips64
  2016-04-07 11:42                             ` Sebastian Gottschall
@ 2016-04-07 18:46                               ` Szabolcs Nagy
  2016-04-07 23:33                                 ` Sebastian Gottschall
  0 siblings, 1 reply; 35+ messages in thread
From: Szabolcs Nagy @ 2016-04-07 18:46 UTC (permalink / raw)
  To: musl

* Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-07 13:42:17 +0200]:
> >ok so the failure is in sendmsg and in the msg_control copy.
> >
> >does the call fail with ENOMEM (because >1024 bytes of ancillary data)?
> >that would be easy to fix..
> >
> >(libc has to make a copy, the struct is const and might be in
> >readonly memory. a detailed bug report of the failure would
> >be more useful than speculations about broken compilers..
> >e.g. strace log with and without the msg_control copying.)
> how to make a more detailed report than just that all netlink operations in
> iproute2 fail. so the whole ip command doesnt work.

there are only two places where msg->msg_control
is used in iproute2: bpf_scm.h and libnetlink.c,
they both use a fixed char[1024] buffer, which
should work with musl.

one thing i noticed is that iproute2 fails to
take cmsghdr alignment requirements into account,
so it only works by accident.

i think the musl struct has different alignment
(4 byte instead of 8 byte) which may cause problems
because the copy uses the musl alignment, i'm
not sure if this can cause what you observed.

so we still don't know what your problem was
and what fails exactly.
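
For illustration, one common way to get a correctly aligned control
buffer is to overlay the char array with a struct cmsghdr in a union,
so the compiler applies the stricter alignment. This is only a sketch
under that assumption: the names ctrl_buf, ctrl_buf_is_aligned and
msghdr_set_control are made up here and do not come from iproute2 or
musl.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical buffer type: the cmsghdr member is never used directly,
 * it only forces the union (and so buf) to cmsghdr alignment. */
union ctrl_buf {
    struct cmsghdr align;
    char buf[1024];
};

/* Returns 1 if p satisfies the alignment requirement of struct cmsghdr. */
static int ctrl_buf_is_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(struct cmsghdr) == 0;
}

/* Clears *m and points its control fields at the aligned buffer. */
static void msghdr_set_control(struct msghdr *m, union ctrl_buf *c)
{
    assert(ctrl_buf_is_aligned(c->buf));
    memset(m, 0, sizeof *m);
    m->msg_control = c->buf;
    m->msg_controllen = sizeof c->buf;
}
```

A caller would declare a union ctrl_buf instead of a bare char[1024]
and pass it to msghdr_set_control before calling recvmsg or sendmsg.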

> i tracked it down to recvmsg / sendmsg which do not return in success for
> some reason. if i send it more detailed, which means i debug out
> the real cause would also mean submit a better patch. just had no time yet
> to care about since my hack works.
> 
> but does not fit to your philosophy. its easy to reproduce on any octeon
> device using musl and iproute2.
> 
> >
> >>#if LONG_MAX > INT_MAX
> >>         struct msghdr h;
> >>         struct cmsghdr chbuf[1024/sizeof(struct cmsghdr)+1], *c;
> >>         if (msg) {
> >>                 h = *msg;
> >>                 h.__pad1 = h.__pad2 = 0;
> >>                 msg = &h;
> >>                 if (h.msg_controllen) {
> >>                         if (h.msg_controllen > 1024) {
> >>                                 errno = ENOMEM;
> >>                                 return -1;
> >>                         }
> >>                         memcpy(chbuf, h.msg_control, h.msg_controllen);
> >>                         h.msg_control = chbuf;
> >>                         for (c=CMSG_FIRSTHDR(&h); c; c=CMSG_NXTHDR(&h,c))
> >>                                 c->__pad1 = 0;
> >>                 }
> >>         }
> >>#endif
> 



* Re: recvmsg/sendmsg broken on mips64
  2016-04-07 18:46                               ` Szabolcs Nagy
@ 2016-04-07 23:33                                 ` Sebastian Gottschall
  2016-04-10 22:18                                   ` Rich Felker
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-07 23:33 UTC (permalink / raw)
  To: musl

Am 07.04.2016 um 20:46 schrieb Szabolcs Nagy:
> * Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-07 13:42:17 +0200]:
>>> ok so the failure is in sendmsg and in the msg_control copy.
>>>
>>> does the call fail with ENOMEM (because >1024 bytes of ancillary data)?
>>> that would be easy to fix..
>>>
>>> (libc has to make a copy, the struct is const and might be in
>>> readonly memory. a detailed bug report of the failure would
>>> be more useful than speculations about broken compilers..
>>> e.g. strace log with and without the msg_control copying.)
>> how to make a more detailed report than just that all netlink operations in
>> iproute2 fail. so the whole ip command doesnt work.
> there are only two places where msg->msg_control
> is used in iproute2: bpf_scm.h and libnetlink.c,
> they both use a fixed char[1024] buffer, which
> should work with musl.
>
> one thing i noticed is that iproute2 fails to
> take cmsghdr alignment requirements into account,
> so it only works by accident.
>
> i think the musl struct has different alignment
> (4 byte instead of 8 byte) which may cause problems
> because the copy uses the musl alignment, i'm
> not sure if this can cause what you observed.
>
> so we still don't know what your problem was
> and what fails exactly.
Easy again: iproute2 doesn't work on mips64 targets because the recvmsg /
sendmsg calls fail, caused by wrong structure alignment, a bad
structure altogether, or bad musl code.
It all results in the same failing recvmsg / sendmsg call. So yes,
libnetlink.c does not work with musl on mips64 (it does work on x64 and
everything else, just not mips64) unless the hack I offered is applied,
which fixed everything.

Before you ask again for a problem description, just read it again. Asking
again won't change the description and just makes people on this list
tired.

Since the problem is related to the field size of several length
parameters, I also don't believe it has anything to do with structure
alignments, since I did not change them in my pseudo-fix.
I just matched the field sizes to the API used by the kernel, so that the
kernel syscall receives the structure it expects, even if that
mismatches the POSIX API.
Of course, a change in field sizes may in turn change alignment
boundaries, but then iproute2 would be the cause of the problem, and it
has worked for 15 years now with all libc implementations.

If you really want to keep POSIX compliance, then provide an external
structure for applications and convert it to a second, internal structure
that matches the Linux API. This removes all the padding stuff
and looks cleaner in the end. It will also reduce code size, and you
can fix possible alignment issues at the same time.

>
>> i tracked it down to recvmsg / sendmsg which do not return in success for
>> some reason. if i send it more detailed, which means i debug out
>> the real cause would also mean submit a better patch. just had no time yet
>> to care about since my hack works.
>>
>> but does not fit to your philosophy. its easy to reproduce on any octeon
>> device using musl and iproute2.
>>
>>>> #if LONG_MAX > INT_MAX
>>>>          struct msghdr h;
>>>>          struct cmsghdr chbuf[1024/sizeof(struct cmsghdr)+1], *c;
>>>>          if (msg) {
>>>>                  h = *msg;
>>>>                  h.__pad1 = h.__pad2 = 0;
>>>>                  msg = &h;
>>>>                  if (h.msg_controllen) {
>>>>                          if (h.msg_controllen > 1024) {
>>>>                                  errno = ENOMEM;
>>>>                                  return -1;
>>>>                          }
>>>>                          memcpy(chbuf, h.msg_control, h.msg_controllen);
>>>>                          h.msg_control = chbuf;
>>>>                          for (c=CMSG_FIRSTHDR(&h); c; c=CMSG_NXTHDR(&h,c))
>>>>                                  c->__pad1 = 0;
>>>>                  }
>>>>          }
>>>> #endif





* Re: recvmsg/sendmsg broken on mips64
  2016-04-07 23:33                                 ` Sebastian Gottschall
@ 2016-04-10 22:18                                   ` Rich Felker
  2016-04-10 22:24                                     ` Sebastian Gottschall
  0 siblings, 1 reply; 35+ messages in thread
From: Rich Felker @ 2016-04-10 22:18 UTC (permalink / raw)
  To: musl

On Fri, Apr 08, 2016 at 01:33:51AM +0200, Sebastian Gottschall wrote:
> Am 07.04.2016 um 20:46 schrieb Szabolcs Nagy:
> >* Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-07 13:42:17 +0200]:
> >>>ok so the failure is in sendmsg and in the msg_control copy.
> >>>
> >>>does the call fail with ENOMEM (because >1024 bytes of ancillary data)?
> >>>that would be easy to fix..
> >>>
> >>>(libc has to make a copy, the struct is const and might be in
> >>>readonly memory. a detailed bug report of the failure would
> >>>be more useful than speculations about broken compilers..
> >>>e.g. strace log with and without the msg_control copying.)
> >>how to make a more detailed report than just that all netlink operations in
> >>iproute2 fail. so the whole ip command doesnt work.
> >there are only two places where msg->msg_control
> >is used in iproute2: bpf_scm.h and libnetlink.c,
> >they both use a fixed char[1024] buffer, which
> >should work with musl.
> >
> >one thing i noticed is that iproute2 fails to
> >take cmsghdr alignment requirements into account,
> >so it only works by accident.
> >
> >i think the musl struct has different alignment
> >(4 byte instead of 8 byte) which may cause problems
> >because the copy uses the musl alignment, i'm
> >not sure if this can cause what you observed.
> >
> >so we still don't know what your problem was
> >and what fails exactly.

> easy again. iproute2 doesnt work on mips64 targets since recvmsg /
> sendmsg call does fail, caused by a wrong structure alignment, bad
> structure at all or bad musl code at all.

I think what nsz was asking for, and what I'd like to see, is a way to
reproduce the bug. I'm going to try building iproute2 for mips64 and
running it on a prebuilt kernel from Aboriginal Linux under
qemu-system-mips64, but I don't know what specific commands are needed
to hit the affected code path.

> its all resulting in the same failing recvmsg / sendmsg call.. so
> yes libnetlink.c does not work with musl on mips64 (it does work on
> x64 and everything else, just not mips64) unless the hack i offered
> was applied which again fixed all.
> before you ask again for a problem description, just read again. it
> wont change the description if you ask again and just makes people
> tired on this list.

Both versions of the struct (musl's and your modified one that matches
the kernel) have the exact same layout, but due to having a member
with 64-bit type, yours has 8-byte alignment and musl's only has
4-byte alignment. This means, at least:

1. When musl's sendmsg.c makes its copy to zero out the padding, the
   copy may not be correctly aligned for 64-bit writes, and the kernel
   faults or manually produces an error for this case, causing the
   whole operation to fail. However, I don't see where iproute2 is
   actually passing control messages to sendmsg, so while this is a
   problem, I don't think it's the cause. Maybe I'm missing the
   affected call point; this is why I'd like steps to reproduce the
   issue so I can see it.

2. iproute2's libnetlink.c's rtnl_listen function does not properly
   declare its cmsgbuf with the alignment of cmsghdr; it has type
   char[] so the compiler is free not to align it at all. This is
   presumably a bug in iproute2, but I can't find any good
   documentation (in the standards or Linux-specific) for how you're
   supposed to allocate this space, so maybe the kernel is able to
   handle aligning the buffer itself. I don't see any way the
   alignment of musl's cmsghdr type affects recvmsg though.
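
The point about identical layout but different alignment can be
checked with a small sketch. The two struct types below are
hypothetical stand-ins, not the actual musl or kernel definitions: one
carries a single 64-bit length field, the other splits the same 8
bytes into a 32-bit pad plus a 32-bit length, in the order a
big-endian 64-bit target would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel-style header: one 64-bit length. */
struct hdr_wide {
    uint64_t len;
    int32_t level;
    int32_t type;
};

/* Hypothetical stand-in for a header that splits the same 8 bytes into
 * padding plus a 32-bit length (big-endian order shown). */
struct hdr_split {
    int32_t pad;
    uint32_t len;
    int32_t level;
    int32_t type;
};

/* Both variants occupy the same number of bytes. */
static_assert(sizeof(struct hdr_wide) == sizeof(struct hdr_split),
              "sizes differ");

/* Returns 1 if the trailing members sit at the same offsets in both. */
static int same_layout(void)
{
    return offsetof(struct hdr_wide, level) == offsetof(struct hdr_split, level)
        && offsetof(struct hdr_wide, type) == offsetof(struct hdr_split, type);
}

/* Returns 1 if the wide variant demands stricter alignment, which is
 * the case wherever uint64_t is more strictly aligned than int32_t. */
static int wide_is_stricter(void)
{
    return _Alignof(struct hdr_wide) > _Alignof(struct hdr_split);
}
```

On a typical 64-bit target same_layout() and wide_is_stricter() would
both be expected to return 1, which is exactly the situation where a
buffer declared with the split type may land on an address the wide
type could not legally occupy.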

Maybe there are other effects I'm missing? I'll follow up again once I
get a test build/run of iproute2 and let you know whether I can see
the problem.

Rich



* Re: recvmsg/sendmsg broken on mips64
  2016-04-10 22:18                                   ` Rich Felker
@ 2016-04-10 22:24                                     ` Sebastian Gottschall
  2016-04-10 22:29                                       ` Rich Felker
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-10 22:24 UTC (permalink / raw)
  To: musl

Am 11.04.2016 um 00:18 schrieb Rich Felker:
> On Fri, Apr 08, 2016 at 01:33:51AM +0200, Sebastian Gottschall wrote:
>> Am 07.04.2016 um 20:46 schrieb Szabolcs Nagy:
>>> * Sebastian Gottschall <s.gottschall@dd-wrt.com> [2016-04-07 13:42:17 +0200]:
>>>>> ok so the failure is in sendmsg and in the msg_control copy.
>>>>>
>>>>> does the call fail with ENOMEM (because >1024 bytes of ancillary data)?
>>>>> that would be easy to fix..
>>>>>
>>>>> (libc has to make a copy, the struct is const and might be in
>>>>> readonly memory. a detailed bug report of the failure would
>>>>> be more useful than speculations about broken compilers..
>>>>> e.g. strace log with and without the msg_control copying.)
>>>> how to make a more detailed report than just that all netlink operations in
>>>> iproute2 fail. so the whole ip command doesnt work.
>>> there are only two places where msg->msg_control
>>> is used in iproute2: bpf_scm.h and libnetlink.c,
>>> they both use a fixed char[1024] buffer, which
>>> should work with musl.
>>>
>>> one thing i noticed is that iproute2 fails to
>>> take cmsghdr alignment requirements into account,
>>> so it only works by accident.
>>>
>>> i think the musl struct has different alignment
>>> (4 byte instead of 8 byte) which may cause problems
>>> because the copy uses the musl alignment, i'm
>>> not sure if this can cause what you observed.
>>>
>>> so we still don't know what your problem was
>>> and what fails exactly.
>> easy again. iproute2 doesnt work on mips64 targets since recvmsg /
>> sendmsg call does fail, caused by a wrong structure alignment, bad
>> structure at all or bad musl code at all.
> I think what nsz was asking for, and what I'd like to see, is a way to
> reproduce the bug. I'm going to try building iproute2 for mips64 and
> running it on a prebuilt kernel from Aboriginal Linux under
> qemu-system-mips64, but I don't know what specific commands are needed
> to hit the affected code path.
Any command, since everything is netlink based, e.g.:
ip add add 192.168.1.1/24  dev eth0

You will see that nothing happens. ip just returns an error
message (I already quoted this message in my first post on this
mailing list):
"EOF on netlink" is the error that is shown.

>
>> its all resulting in the same failing recvmsg / sendmsg call.. so
>> yes libnetlink.c does not work with musl on mips64 (it does work on
>> x64 and everything else, just not mips64) unless the hack i offered
>> was applied which again fixed all.
>> before you ask again for a problem description, just read again. it
>> wont change the description if you ask again and just makes people
>> tired on this list.
> Both versions of the struct (musl's and your modified one that matches
> the kernel) have the exact same layout, but due to having a member
> with 64-bit type, yours has 8-byte alignment and musl's only has
> 4-byte alignment. This means, at least:
>
> 1. When musl's sendmsg.c makes its copy to zero out the padding, the
>     copy may not be correctly aligned for 64-bit writes, and the kernel
>     faults or manually produces an error for this case, causing the
>     whole operation to fail. However, I don't see where iproute2 is
>     actually passing control messages to sendmsg, so while this is a
>     problem, I don't think it's the cause. Maybe I'm missing the
>     affected call point; this is why I'd like steps to reproduce the
>     issue so I can see it.
>
> 2. iproute2's libnetlink.c's rtnl_listen function does not properly
>     declare its cmsgbuf with the alignment of cmsghdr; it has type
>     char[] so the compiler is free not to align it at all. This is
>     presumably a bug in iproute2, but I can't find any good
>     documentation (in the standards or Linux-specific) for how you're
>     supposed to allocate this space, so maybe the kernel is able to
>     handle aligning the buffer itself. I don't see any way the
>     alignment of musl's cmsghdr type affects recvmsg though.
>
> Maybe there are other effects I'm missing? I'll follow up again once I
> get a test build/run of iproute2 and let you know whether I can see
> the problem.
Okay. If you need remote access to an Octeon system running musl (my
fixed variant), just tell me.
>
> Rich
>




* Re: recvmsg/sendmsg broken on mips64
  2016-04-10 22:24                                     ` Sebastian Gottschall
@ 2016-04-10 22:29                                       ` Rich Felker
  2016-04-10 22:33                                         ` Sebastian Gottschall
  0 siblings, 1 reply; 35+ messages in thread
From: Rich Felker @ 2016-04-10 22:29 UTC (permalink / raw)
  To: musl

On Mon, Apr 11, 2016 at 12:24:49AM +0200, Sebastian Gottschall wrote:
> >I think what nsz was asking for, and what I'd like to see, is a way to
> >reproduce the bug. I'm going to try building iproute2 for mips64 and
> >running it on a prebuilt kernel from Aboriginal Linux under
> >qemu-system-mips64, but I don't know what specific commands are needed
> >to hit the affected code path.
> any command since all is netlink based
> ip add add 192.168.1.1/24  dev eth0
> 
> yo will see that nothing will happen. ip will just return a error
> message (i wrote this message already in the first entry on this
> mailinglist)
> "EOF on netlink" is the error which is shown

OK, I'll try this.

> >>its all resulting in the same failing recvmsg / sendmsg call.. so
> >>yes libnetlink.c does not work with musl on mips64 (it does work on
> >>x64 and everything else, just not mips64) unless the hack i offered
> >>was applied which again fixed all.
> >>before you ask again for a problem description, just read again. it
> >>wont change the description if you ask again and just makes people
> >>tired on this list.
> >Both versions of the struct (musl's and your modified one that matches
> >the kernel) have the exact same layout, but due to having a member
> >with 64-bit type, yours has 8-byte alignment and musl's only has
> >4-byte alignment. This means, at least:
> >
> >1. When musl's sendmsg.c makes its copy to zero out the padding, the
> >    copy may not be correctly aligned for 64-bit writes, and the kernel
> >    faults or manually produces an error for this case, causing the
> >    whole operation to fail. However, I don't see where iproute2 is
> >    actually passing control messages to sendmsg, so while this is a
> >    problem, I don't think it's the cause. Maybe I'm missing the
> >    affected call point; this is why I'd like steps to reproduce the
> >    issue so I can see it.
> >
> >2. iproute2's libnetlink.c's rtnl_listen function does not properly
> >    declare its cmsgbuf with the alignment of cmsghdr; it has type
> >    char[] so the compiler is free not to align it at all. This is
> >    presumably a bug in iproute2, but I can't find any good
> >    documentation (in the standards or Linux-specific) for how you're
> >    supposed to allocate this space, so maybe the kernel is able to
> >    handle aligning the buffer itself. I don't see any way the
> >    alignment of musl's cmsghdr type affects recvmsg though.
> >
> >Maybe there are other effects I'm missing? I'll follow up again once I
> >get a test build/run of iproute2 and let you know whether I can see
> >the problem.
> okay. if you need a remote access to a octeon system using musl (my
> fixed variant), just tell me.

That would be really helpful. Something's wrong with the userspace for
the Aboriginal mips64 binaries (SIGBUS in init) and debugging that
would be a big distraction.

BTW do you have gdb and strace available?

Rich



* Re: recvmsg/sendmsg broken on mips64
  2016-04-10 22:29                                       ` Rich Felker
@ 2016-04-10 22:33                                         ` Sebastian Gottschall
  2016-04-11  2:35                                           ` Rich Felker
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-10 22:33 UTC (permalink / raw)
  To: musl

Am 11.04.2016 um 00:29 schrieb Rich Felker:
> On Mon, Apr 11, 2016 at 12:24:49AM +0200, Sebastian Gottschall wrote:
>>> I think what nsz was asking for, and what I'd like to see, is a way to
>>> reproduce the bug. I'm going to try building iproute2 for mips64 and
>>> running it on a prebuilt kernel from Aboriginal Linux under
>>> qemu-system-mips64, but I don't know what specific commands are needed
>>> to hit the affected code path.
>> any command since all is netlink based
>> ip add add 192.168.1.1/24  dev eth0
>>
>> yo will see that nothing will happen. ip will just return a error
>> message (i wrote this message already in the first entry on this
>> mailinglist)
>> "EOF on netlink" is the error which is shown
> OK, I'll try this.
>
>>>> its all resulting in the same failing recvmsg / sendmsg call.. so
>>>> yes libnetlink.c does not work with musl on mips64 (it does work on
>>>> x64 and everything else, just not mips64) unless the hack i offered
>>>> was applied which again fixed all.
>>>> before you ask again for a problem description, just read again. it
>>>> wont change the description if you ask again and just makes people
>>>> tired on this list.
>>> Both versions of the struct (musl's and your modified one that matches
>>> the kernel) have the exact same layout, but due to having a member
>>> with 64-bit type, yours has 8-byte alignment and musl's only has
>>> 4-byte alignment. This means, at least:
>>>
>>> 1. When musl's sendmsg.c makes its copy to zero out the padding, the
>>>     copy may not be correctly aligned for 64-bit writes, and the kernel
>>>     faults or manually produces an error for this case, causing the
>>>     whole operation to fail. However, I don't see where iproute2 is
>>>     actually passing control messages to sendmsg, so while this is a
>>>     problem, I don't think it's the cause. Maybe I'm missing the
>>>     affected call point; this is why I'd like steps to reproduce the
>>>     issue so I can see it.
>>>
>>> 2. iproute2's libnetlink.c's rtnl_listen function does not properly
>>>     declare its cmsgbuf with the alignment of cmsghdr; it has type
>>>     char[] so the compiler is free not to align it at all. This is
>>>     presumably a bug in iproute2, but I can't find any good
>>>     documentation (in the standards or Linux-specific) for how you're
>>>     supposed to allocate this space, so maybe the kernel is able to
>>>     handle aligning the buffer itself. I don't see any way the
>>>     alignment of musl's cmsghdr type affects recvmsg though.
>>>
>>> Maybe there are other effects I'm missing? I'll follow up again once I
>>> get a test build/run of iproute2 and let you know whether I can see
>>> the problem.
>> okay. if you need a remote access to a octeon system using musl (my
>> fixed variant), just tell me.
> That would be really helpful. Something's wrong with the userspace for
> the Aboriginal mips64 binaries (SIGBUS in init) and debugging that
> would be a big distraction.
>
> BTW do you have gdb and strace available?
Not on the system itself. I'm not sure whether strace works on mips64; I
have never tried it.
But you're free to copy any binary to the /tmp dir. The system has 2 GB of
RAM, so there is enough space for static binaries if you want to play with
them.
I will send you the ssh data in a private email.
>
> Rich
>




* Re: recvmsg/sendmsg broken on mips64
  2016-04-10 22:33                                         ` Sebastian Gottschall
@ 2016-04-11  2:35                                           ` Rich Felker
  2016-04-11  6:35                                             ` Sebastian Gottschall
  2016-04-21  1:37                                             ` Rich Felker
  0 siblings, 2 replies; 35+ messages in thread
From: Rich Felker @ 2016-04-11  2:35 UTC (permalink / raw)
  To: musl

On Mon, Apr 11, 2016 at 12:33:07AM +0200, Sebastian Gottschall wrote:
> Am 11.04.2016 um 00:29 schrieb Rich Felker:
> >On Mon, Apr 11, 2016 at 12:24:49AM +0200, Sebastian Gottschall wrote:
> >>>I think what nsz was asking for, and what I'd like to see, is a way to
> >>>reproduce the bug. I'm going to try building iproute2 for mips64 and
> >>>running it on a prebuilt kernel from Aboriginal Linux under
> >>>qemu-system-mips64, but I don't know what specific commands are needed
> >>>to hit the affected code path.
> >>any command since all is netlink based
> >>ip add add 192.168.1.1/24  dev eth0
> >>
> >>yo will see that nothing will happen. ip will just return a error
> >>message (i wrote this message already in the first entry on this
> >>mailinglist)
> >>"EOF on netlink" is the error which is shown
> >OK, I'll try this.
> >
> >>>>its all resulting in the same failing recvmsg / sendmsg call.. so
> >>>>yes libnetlink.c does not work with musl on mips64 (it does work on
> >>>>x64 and everything else, just not mips64) unless the hack i offered
> >>>>was applied which again fixed all.
> >>>>before you ask again for a problem description, just read again. it
> >>>>wont change the description if you ask again and just makes people
> >>>>tired on this list.
> >>>Both versions of the struct (musl's and your modified one that matches
> >>>the kernel) have the exact same layout, but due to having a member
> >>>with 64-bit type, yours has 8-byte alignment and musl's only has
> >>>4-byte alignment. This means, at least:
> >>>
> >>>1. When musl's sendmsg.c makes its copy to zero out the padding, the
> >>>    copy may not be correctly aligned for 64-bit writes, and the kernel
> >>>    faults or manually produces an error for this case, causing the
> >>>    whole operation to fail. However, I don't see where iproute2 is
> >>>    actually passing control messages to sendmsg, so while this is a
> >>>    problem, I don't think it's the cause. Maybe I'm missing the
> >>>    affected call point; this is why I'd like steps to reproduce the
> >>>    issue so I can see it.
> >>>
> >>>2. iproute2's libnetlink.c's rtnl_listen function does not properly
> >>>    declare its cmsgbuf with the alignment of cmsghdr; it has type
> >>>    char[] so the compiler is free not to align it at all. This is
> >>>    presumably a bug in iproute2, but I can't find any good
> >>>    documentation (in the standards or Linux-specific) for how you're
> >>>    supposed to allocate this space, so maybe the kernel is able to
> >>>    handle aligning the buffer itself. I don't see any way the
> >>>    alignment of musl's cmsghdr type affects recvmsg though.
> >>>
> >>>Maybe there are other effects I'm missing? I'll follow up again once I
> >>>get a test build/run of iproute2 and let you know whether I can see
> >>>the problem.
> >>okay. if you need a remote access to a octeon system using musl (my
> >>fixed variant), just tell me.
> >That would be really helpful. Something's wrong with the userspace for
> >the Aboriginal mips64 binaries (SIGBUS in init) and debugging that
> >would be a big distraction.
> >
> >BTW do you have gdb and strace available?
> not on the system itself. i'm not sure if strace works on mips64.
> never tried it.
> but you're free to copy any binary to the /tmp dir. it has 2 gb ram.
> so enough space for static binaries if you want to play with.
> i will send you the ssh data in a private email

I haven't been able to reproduce the error on your system. I've tried
building my own static-linked version of the "ip" utility with a
mips64-linux-musl softfloat compiler, and uploading my libc.so and
using it to run both your version of ip and a dynamic-linked one I
just built. They all work fine for adding/removing a 127.0.0.2 address
to the "lo" interface.

Next I'm going to try to get a minimal testcase that tries to
intentionally misalign the control message buffers. I suspect I'm just
"getting lucky" and my buffer happens to be aligned the way the kernel
wants by chance.
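
A testcase of this kind first needs a buffer whose address is
deliberately off the natural alignment. The sketch below shows only
that step; skewed_ptr is a hypothetical helper, and the socket setup
and the actual sendmsg/recvmsg calls are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns a pointer p inside the buffer starting
 * at base such that (address of p) % align == skew.
 * The caller must reserve at least align - 1 + skew spare bytes in
 * addition to the payload it wants to place at p. */
static char *skewed_ptr(char *base, size_t align, size_t skew)
{
    assert(align != 0 && skew < align);
    uintptr_t addr = (uintptr_t)base;
    /* Round the address up to the next multiple of align... */
    uintptr_t up = (addr + align - 1) / align * align;
    /* ...then step past it by the requested skew. */
    return base + (up - addr) + skew;
}
```

For example, skewed_ptr(raw, 8, 4) on a char raw[1024 + 16] array
yields room for a 1024-byte control buffer that is 4-byte but not
8-byte aligned, which is the case suspected to matter here.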

Rich



* Re: recvmsg/sendmsg broken on mips64
  2016-04-11  2:35                                           ` Rich Felker
@ 2016-04-11  6:35                                             ` Sebastian Gottschall
  2016-04-11 18:32                                               ` Rich Felker
  2016-04-21  1:37                                             ` Rich Felker
  1 sibling, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-11  6:35 UTC (permalink / raw)
  To: musl

Am 11.04.2016 um 04:35 schrieb Rich Felker:
> On Mon, Apr 11, 2016 at 12:33:07AM +0200, Sebastian Gottschall wrote:
>> Am 11.04.2016 um 00:29 schrieb Rich Felker:
>>> On Mon, Apr 11, 2016 at 12:24:49AM +0200, Sebastian Gottschall wrote:
>>>>> I think what nsz was asking for, and what I'd like to see, is a way to
>>>>> reproduce the bug. I'm going to try building iproute2 for mips64 and
>>>>> running it on a prebuilt kernel from Aboriginal Linux under
>>>>> qemu-system-mips64, but I don't know what specific commands are needed
>>>>> to hit the affected code path.
>>>> any command since all is netlink based
>>>> ip add add 192.168.1.1/24  dev eth0
>>>>
>>>> yo will see that nothing will happen. ip will just return a error
>>>> message (i wrote this message already in the first entry on this
>>>> mailinglist)
>>>> "EOF on netlink" is the error which is shown
>>> OK, I'll try this.
>>>
>>>>>> its all resulting in the same failing recvmsg / sendmsg call.. so
>>>>>> yes libnetlink.c does not work with musl on mips64 (it does work on
>>>>>> x64 and everything else, just not mips64) unless the hack i offered
>>>>>> was applied which again fixed all.
>>>>>> before you ask again for a problem description, just read again. it
>>>>>> wont change the description if you ask again and just makes people
>>>>>> tired on this list.
>>>>> Both versions of the struct (musl's and your modified one that matches
>>>>> the kernel) have the exact same layout, but due to having a member
>>>>> with 64-bit type, yours has 8-byte alignment and musl's only has
>>>>> 4-byte alignment. This means, at least:
>>>>>
>>>>> 1. When musl's sendmsg.c makes its copy to zero out the padding, the
>>>>>     copy may not be correctly aligned for 64-bit writes, and the kernel
>>>>>     faults or manually produces an error for this case, causing the
>>>>>     whole operation to fail. However, I don't see where iproute2 is
>>>>>     actually passing control messages to sendmsg, so while this is a
>>>>>     problem, I don't think it's the cause. Maybe I'm missing the
>>>>>     affected call point; this is why I'd like steps to reproduce the
>>>>>     issue so I can see it.
>>>>>
>>>>> 2. iproute2's libnetlink.c's rtnl_listen function does not properly
>>>>>     declare its cmsgbuf with the alignment of cmsghdr; it has type
>>>>>     char[] so the compiler is free not to align it at all. This is
>>>>>     presumably a bug in iproute2, but I can't find any good
>>>>>     documentation (in the standards or Linux-specific) for how you're
>>>>>     supposed to allocate this space, so maybe the kernel is able to
>>>>>     handle aligning the buffer itself. I don't see any way the
>>>>>     alignment of musl's cmsghdr type affects recvmsg though.
>>>>>
>>>>> Maybe there are other effects I'm missing? I'll follow up again once I
>>>>> get a test build/run of iproute2 and let you know whether I can see
>>>>> the problem.
>>>> okay. if you need a remote access to a octeon system using musl (my
>>>> fixed variant), just tell me.
>>> That would be really helpful. Something's wrong with the userspace for
>>> the Aboriginal mips64 binaries (SIGBUS in init) and debugging that
>>> would be a big distraction.
>>>
>>> BTW do you have gdb and strace available?
>> not on the system itself. i'm not sure if strace works on mips64.
>> never tried it.
>> but you're free to copy any binary to the /tmp dir. it has 2 gb ram.
>> so enough space for static binaries if you want to play with.
>> i will send you the ssh data in a private email
> I haven't been able to reproduce the error on your system. I've tried
> building my own static-linked version of the "ip" utility with a
> mips64-linux-musl softfloat compiler, and uploading my libc.so and
> using it to run both your version of ip and a dynamic-linked one I
> just built. They all work fine for adding/removing a 127.0.0.2 address
> to the "lo" interface.
I can install a broken musl libc again if that helps. (It's the plain
OpenWrt toolchain result.)
>
> Next I'm going to try to get a minimal testcase that tries to
> intentionally misalign the control message buffers. I suspect I'm just
> "getting lucky" and my buffer happens to be aligned the way the kernel
> wants by chance.
>
> Rich
>




* Re: recvmsg/sendmsg broken on mips64
  2016-04-11  6:35                                             ` Sebastian Gottschall
@ 2016-04-11 18:32                                               ` Rich Felker
  2016-04-11 19:01                                                 ` Sebastian Gottschall
  2016-04-14 14:10                                                 ` Sebastian Gottschall
  0 siblings, 2 replies; 35+ messages in thread
From: Rich Felker @ 2016-04-11 18:32 UTC (permalink / raw)
  To: musl

On Mon, Apr 11, 2016 at 08:35:00AM +0200, Sebastian Gottschall wrote:
> >>>BTW do you have gdb and strace available?
> >>not on the system itself. i'm not sure if strace works on mips64.
> >>never tried it.
> >>but you're free to copy any binary to the /tmp dir. it has 2 gb ram.
> >>so enough space for static binaries if you want to play with.
> >>i will send you the ssh data in a private email
> >I haven't been able to reproduce the error on your system. I've tried
> >building my own static-linked version of the "ip" utility with a
> >mips64-linux-musl softfloat compiler, and uploading my libc.so and
> >using it to run both your version of ip and a dynamic-linked one I
> >just built. They all work fine for adding/removing a 127.0.0.2 address
> >to the "lo" interface.
> i can install a broken musl libc again if that helps. (its plain
> openwrt toolchain result)

Yes, that would be helpful, but would it make it impossible for you to
get the network up in order for me to login? If so perhaps you could
put the broken stuff in a chroot I could run it from.

Rich



* Re: recvmsg/sendmsg broken on mips64
  2016-04-11 18:32                                               ` Rich Felker
@ 2016-04-11 19:01                                                 ` Sebastian Gottschall
  2016-04-14 14:10                                                 ` Sebastian Gottschall
  1 sibling, 0 replies; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-11 19:01 UTC (permalink / raw)
  To: musl

Am 11.04.2016 um 20:32 schrieb Rich Felker:
> On Mon, Apr 11, 2016 at 08:35:00AM +0200, Sebastian Gottschall wrote:
>>>>> BTW do you have gdb and strace available?
>>>> not on the system itself. i'm not sure if strace works on mips64.
>>>> never tried it.
>>>> but you're free to copy any binary to the /tmp dir. it has 2 gb ram.
>>>> so enough space for static binaries if you want to play with.
>>>> i will send you the ssh data in a private email
>>> I haven't been able to reproduce the error on your system. I've tried
>>> building my own static-linked version of the "ip" utility with a
>>> mips64-linux-musl softfloat compiler, and uploading my libc.so and
>>> using it to run both your version of ip and a dynamic-linked one I
>>> just built. They all work fine for adding/removing a 127.0.0.2 address
>>> to the "lo" interface.
>> i can install a broken musl libc again if that helps. (its plain
>> openwrt toolchain result)
> Yes, that would be helpful, but would it make it impossible for you to
> get the network up in order for me to login? If so perhaps you could
> put the broken stuff in a chroot I could run it from.
I do not use ip for configuring. ifconfig works :-)
>
> Rich
>




* Re: recvmsg/sendmsg broken on mips64
  2016-04-11 18:32                                               ` Rich Felker
  2016-04-11 19:01                                                 ` Sebastian Gottschall
@ 2016-04-14 14:10                                                 ` Sebastian Gottschall
  2016-04-15 16:19                                                   ` Rich Felker
  1 sibling, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-14 14:10 UTC (permalink / raw)
  To: musl

any news?

Am 11.04.2016 um 20:32 schrieb Rich Felker:
> On Mon, Apr 11, 2016 at 08:35:00AM +0200, Sebastian Gottschall wrote:
>>>>> BTW do you have gdb and strace available?
>>>> not on the system itself. i'm not sure if strace works on mips64.
>>>> never tried it.
>>>> but you're free to copy any binary to the /tmp dir. it has 2 gb ram.
>>>> so enough space for static binaries if you want to play with.
>>>> i will send you the ssh data in a private email
>>> I haven't been able to reproduce the error on your system. I've tried
>>> building my own static-linked version of the "ip" utility with a
>>> mips64-linux-musl softfloat compiler, and uploading my libc.so and
>>> using it to run both your version of ip and a dynamic-linked one I
>>> just built. They all work fine for adding/removing a 127.0.0.2 address
>>> to the "lo" interface.
>> i can install a broken musl libc again if that helps. (its plain
>> openwrt toolchain result)
> Yes, that would be helpful, but would it make it impossible for you to
> get the network up in order for me to login? If so perhaps you could
> put the broken stuff in a chroot I could run it from.
>
> Rich
>




* Re: recvmsg/sendmsg broken on mips64
  2016-04-14 14:10                                                 ` Sebastian Gottschall
@ 2016-04-15 16:19                                                   ` Rich Felker
  0 siblings, 0 replies; 35+ messages in thread
From: Rich Felker @ 2016-04-15 16:19 UTC (permalink / raw)
  To: musl

On Thu, Apr 14, 2016 at 04:10:05PM +0200, Sebastian Gottschall wrote:
> any news?

Sorry, I've just been busy, but I will get back to it. Thanks for
setting up things for me to test on a real machine. When I am ready,
the old broken version is installed again for me to look at, right?

Rich

> Am 11.04.2016 um 20:32 schrieb Rich Felker:
> >On Mon, Apr 11, 2016 at 08:35:00AM +0200, Sebastian Gottschall wrote:
> >>>>>BTW do you have gdb and strace available?
> >>>>not on the system itself. i'm not sure if strace works on mips64.
> >>>>never tried it.
> >>>>but you're free to copy any binary to the /tmp dir. it has 2 gb ram.
> >>>>so enough space for static binaries if you want to play with.
> >>>>i will send you the ssh data in a private email
> >>>I haven't been able to reproduce the error on your system. I've tried
> >>>building my own static-linked version of the "ip" utility with a
> >>>mips64-linux-musl softfloat compiler, and uploading my libc.so and
> >>>using it to run both your version of ip and a dynamic-linked one I
> >>>just built. They all work fine for adding/removing a 127.0.0.2 address
> >>>to the "lo" interface.
> >>i can install a broken musl libc again if that helps. (its plain
> >>openwrt toolchain result)
> >Yes, that would be helpful, but would it make it impossible for you to
> >get the network up in order for me to login? If so perhaps you could
> >put the broken stuff in a chroot I could run it from.
> >
> >Rich
> >



* Re: recvmsg/sendmsg broken on mips64
  2016-04-11  2:35                                           ` Rich Felker
  2016-04-11  6:35                                             ` Sebastian Gottschall
@ 2016-04-21  1:37                                             ` Rich Felker
  2016-04-21  7:22                                               ` Sebastian Gottschall
  1 sibling, 1 reply; 35+ messages in thread
From: Rich Felker @ 2016-04-21  1:37 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 5406 bytes --]

On Sun, Apr 10, 2016 at 10:35:22PM -0400, Rich Felker wrote:
> On Mon, Apr 11, 2016 at 12:33:07AM +0200, Sebastian Gottschall wrote:
> > Am 11.04.2016 um 00:29 schrieb Rich Felker:
> > >On Mon, Apr 11, 2016 at 12:24:49AM +0200, Sebastian Gottschall wrote:
> > >>>I think what nsz was asking for, and what I'd like to see, is a way to
> > >>>reproduce the bug. I'm going to try building iproute2 for mips64 and
> > >>>running it on a prebuilt kernel from Aboriginal Linux under
> > >>>qemu-system-mips64, but I don't know what specific commands are needed
> > >>>to hit the affected code path.
> > >>any command since all is netlink based
> > >>ip add add 192.168.1.1/24  dev eth0
> > >>
> > >>yo will see that nothing will happen. ip will just return a error
> > >>message (i wrote this message already in the first entry on this
> > >>mailinglist)
> > >>"EOF on netlink" is the error which is shown
> > >OK, I'll try this.
> > >
> > >>>>its all resulting in the same failing recvmsg / sendmsg call.. so
> > >>>>yes libnetlink.c does not work with musl on mips64 (it does work on
> > >>>>x64 and everything else, just not mips64) unless the hack i offered
> > >>>>was applied which again fixed all.
> > >>>>before you ask again for a problem description, just read again. it
> > >>>>wont change the description if you ask again and just makes people
> > >>>>tired on this list.
> > >>>Both versions of the struct (musl's and your modified one that matches
> > >>>the kernel) have the exact same layout, but due to having a member
> > >>>with 64-bit type, yours has 8-byte alignment and musl's only has
> > >>>4-byte alignment. This means, at least:
> > >>>
> > >>>1. When musl's sendmsg.c makes its copy to zero out the padding, the
> > >>>    copy may not be correctly aligned for 64-bit writes, and the kernel
> > >>>    faults or manually produces an error for this case, causing the
> > >>>    whole operation to fail. However, I don't see where iproute2 is
> > >>>    actually passing control messages to sendmsg, so while this is a
> > >>>    problem, I don't think it's the cause. Maybe I'm missing the
> > >>>    affected call point; this is why I'd like steps to reproduce the
> > >>>    issue so I can see it.
> > >>>
> > >>>2. iproute2's libnetlink.c's rtnl_listen function does not properly
> > >>>    declare its cmsgbuf with the alignment of cmsghdr; it has type
> > >>>    char[] so the compiler is free not to align it at all. This is
> > >>>    presumably a bug in iproute2, but I can't find any good
> > >>>    documentation (in the standards or Linux-specific) for how you're
> > >>>    supposed to allocate this space, so maybe the kernel is able to
> > >>>    handle aligning the buffer itself. I don't see any way the
> > >>>    alignment of musl's cmsghdr type affects recvmsg though.
> > >>>
> > >>>Maybe there are other effects I'm missing? I'll follow up again once I
> > >>>get a test build/run of iproute2 and let you know whether I can see
> > >>>the problem.
> > >>okay. if you need a remote access to a octeon system using musl (my
> > >>fixed variant), just tell me.
> > >That would be really helpful. Something's wrong with the userspace for
> > >the Aboriginal mips64 binaries (SIGBUS in init) and debugging that
> > >would be a big distraction.
> > >
> > >BTW do you have gdb and strace available?
> > not on the system itself. i'm not sure if strace works on mips64.
> > never tried it.
> > but you're free to copy any binary to the /tmp dir. it has 2 gb ram.
> > so enough space for static binaries if you want to play with.
> > i will send you the ssh data in a private email
> 
> I haven't been able to reproduce the error on your system. I've tried
> building my own static-linked version of the "ip" utility with a
> mips64-linux-musl softfloat compiler, and uploading my libc.so and
> using it to run both your version of ip and a dynamic-linked one I
> just built. They all work fine for adding/removing a 127.0.0.2 address
> to the "lo" interface.
> 
> Next I'm going to try to get a minimal testcase that tries to
> intentionally misalign the control message buffers. I suspect I'm just
> "getting lucky" and my buffer happens to be aligned the way the kernel
> wants by chance.

I've managed to track down the cause of the breakage. Somehow your
iproute2 has been miscompiled. What I did was add debug logic to
libc.so to print the contents of the msghdr struct passed in before
fixups, after fixups, and after the syscall. The output I got was:

msghdr: 0xffffd58e08 12 0xffffd58df8 1 0 0 0 0 0
msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 0
msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 32

The fields (including __pad1 and __pad2) are printed in order. So as
you can see, ip passed in a structure with a 1 in __pad1 and a 0 in
msg_iovlen. The source (libnetlink.c) stores 1 to msg_iovlen, so my
guess is that somehow it ended up getting the wrong-endian version of
the structure definition. You could confirm this by adding #error to
the little-endian case in arch/mips64/bits/socket.h and recompiling. I
suspect it's going to take some additional work to track down the
cause, which is likely specific to something in your toolchain (it
didn't happen for me when I built my own iproute2).
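
To illustrate that guess (it is only a guess at this point, not a
confirmed diagnosis), here is a small sketch. The structs hdr_be and hdr_le
and the helper view_as_be are hypothetical stand-ins for the two field
orders a 64-bit bits/socket.h can select between, not the real struct
msghdr. The caller fills in the little-endian order and the same bytes are
then read back through the big-endian order:

```c
#include <assert.h>
#include <string.h>

/* Big-endian order: the padding word comes before the length. */
struct hdr_be {
	void *name;
	unsigned namelen;
	void *iov;
	int pad1, iovlen;
	void *control;
	int pad2;
	unsigned controllen;
	int flags;
};

/* Little-endian order: the padding word comes after the length. */
struct hdr_le {
	void *name;
	unsigned namelen;
	void *iov;
	int iovlen, pad1;
	void *control;
	unsigned controllen;
	int pad2;
	int flags;
};

static_assert(sizeof(struct hdr_le) == sizeof(struct hdr_be),
	"both orders must have the same size");

/* Store iovlen the way a caller compiled against the little-endian
 * order would, then reinterpret the bytes with the big-endian order,
 * as a libc built with the other definition would see them. */
static struct hdr_be view_as_be(int iovlen)
{
	struct hdr_le in;
	struct hdr_be out;

	memset(&in, 0, sizeof in);
	in.iovlen = iovlen;
	memcpy(&out, &in, sizeof out);
	return out;
}
```

With view_as_be(1), the value lands in pad1 and iovlen reads as 0, which
is the "1 0" pair in the first line of the dump above. Any caller that
writes the slot in front of msg_iovlen would produce the same dump, so
this does not by itself prove the endian test is at fault.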

In case you or anyone else would like to use the struct dumping in
testing, or just understand precisely what it's printing, I'm
attaching the patch I used.

Rich

[-- Attachment #2: recvmsg-dumper.diff --]
[-- Type: text/plain, Size: 846 bytes --]

diff --git a/src/network/recvmsg.c b/src/network/recvmsg.c
index 4f52665..8d7cace 100644
--- a/src/network/recvmsg.c
+++ b/src/network/recvmsg.c
@@ -3,19 +3,37 @@
 #include "syscall.h"
 #include "libc.h"
 
+#include <stdio.h>
+static void dump(struct msghdr *h)
+{
+	dprintf(2, "msghdr: %p %u %p %d %d %p %u %u %d\n",
+		h->msg_name,
+		h->msg_namelen,
+		h->msg_iov,
+		h->__pad1,
+		h->msg_iovlen,
+		h->msg_control,
+		h->__pad2,
+		h->msg_controllen,
+		h->msg_flags);
+}
+
 ssize_t recvmsg(int fd, struct msghdr *msg, int flags)
 {
 	ssize_t r;
 #if LONG_MAX > INT_MAX
+	dump(msg);
 	struct msghdr h, *orig = msg;
 	if (msg) {
 		h = *msg;
 		h.__pad1 = h.__pad2 = 0;
 		msg = &h;
 	}
+	dump(msg);
 #endif
 	r = socketcall_cp(recvmsg, fd, msg, flags, 0, 0, 0);
 #if LONG_MAX > INT_MAX
+	dump(msg);
 	if (orig) *orig = h;
 #endif
 	return r;


* Re: recvmsg/sendmsg broken on mips64
  2016-04-21  1:37                                             ` Rich Felker
@ 2016-04-21  7:22                                               ` Sebastian Gottschall
  2016-04-21 15:36                                                 ` Rich Felker
  0 siblings, 1 reply; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-21  7:22 UTC (permalink / raw)
  To: musl

Am 21.04.2016 um 03:37 schrieb Rich Felker:
> On Sun, Apr 10, 2016 at 10:35:22PM -0400, Rich Felker wrote:
>> On Mon, Apr 11, 2016 at 12:33:07AM +0200, Sebastian Gottschall wrote:
>>> Am 11.04.2016 um 00:29 schrieb Rich Felker:
>>>> On Mon, Apr 11, 2016 at 12:24:49AM +0200, Sebastian Gottschall wrote:
>>>>>> I think what nsz was asking for, and what I'd like to see, is a way to
>>>>>> reproduce the bug. I'm going to try building iproute2 for mips64 and
>>>>>> running it on a prebuilt kernel from Aboriginal Linux under
>>>>>> qemu-system-mips64, but I don't know what specific commands are needed
>>>>>> to hit the affected code path.
>>>>> any command since all is netlink based
>>>>> ip add add 192.168.1.1/24  dev eth0
>>>>>
>>>>> yo will see that nothing will happen. ip will just return a error
>>>>> message (i wrote this message already in the first entry on this
>>>>> mailinglist)
>>>>> "EOF on netlink" is the error which is shown
>>>> OK, I'll try this.
>>>>
>>>>>>> its all resulting in the same failing recvmsg / sendmsg call.. so
>>>>>>> yes libnetlink.c does not work with musl on mips64 (it does work on
>>>>>>> x64 and everything else, just not mips64) unless the hack i offered
>>>>>>> was applied which again fixed all.
>>>>>>> before you ask again for a problem description, just read again. it
>>>>>>> wont change the description if you ask again and just makes people
>>>>>>> tired on this list.
>>>>>> Both versions of the struct (musl's and your modified one that matches
>>>>>> the kernel) have the exact same layout, but due to having a member
>>>>>> with 64-bit type, yours has 8-byte alignment and musl's only has
>>>>>> 4-byte alignment. This means, at least:
>>>>>>
>>>>>> 1. When musl's sendmsg.c makes its copy to zero out the padding, the
>>>>>>     copy may not be correctly aligned for 64-bit writes, and the kernel
>>>>>>     faults or manually produces an error for this case, causing the
>>>>>>     whole operation to fail. However, I don't see where iproute2 is
>>>>>>     actually passing control messages to sendmsg, so while this is a
>>>>>>     problem, I don't think it's the cause. Maybe I'm missing the
>>>>>>     affected call point; this is why I'd like steps to reproduce the
>>>>>>     issue so I can see it.
>>>>>>
>>>>>> 2. iproute2's libnetlink.c's rtnl_listen function does not properly
>>>>>>     declare its cmsgbuf with the alignment of cmsghdr; it has type
>>>>>>     char[] so the compiler is free not to align it at all. This is
>>>>>>     presumably a bug in iproute2, but I can't find any good
>>>>>>     documentation (in the standards or Linux-specific) for how you're
>>>>>>     supposed to allocate this space, so maybe the kernel is able to
>>>>>>     handle aligning the buffer itself. I don't see any way the
>>>>>>     alignment of musl's cmsghdr type affects recvmsg though.
>>>>>>
>>>>>> Maybe there are other effects I'm missing? I'll follow up again once I
>>>>>> get a test build/run of iproute2 and let you know whether I can see
>>>>>> the problem.
>>>>> okay. if you need a remote access to a octeon system using musl (my
>>>>> fixed variant), just tell me.
>>>> That would be really helpful. Something's wrong with the userspace for
>>>> the Aboriginal mips64 binaries (SIGBUS in init) and debugging that
>>>> would be a big distraction.
>>>>
>>>> BTW do you have gdb and strace available?
>>> not on the system itself. i'm not sure if strace works on mips64.
>>> never tried it.
>>> but you're free to copy any binary to the /tmp dir. it has 2 gb ram.
>>> so enough space for static binaries if you want to play with.
>>> i will send you the ssh data in a private email
>> I haven't been able to reproduce the error on your system. I've tried
>> building my own static-linked version of the "ip" utility with a
>> mips64-linux-musl softfloat compiler, and uploading my libc.so and
>> using it to run both your version of ip and a dynamic-linked one I
>> just built. They all work fine for adding/removing a 127.0.0.2 address
>> to the "lo" interface.
>>
>> Next I'm going to try to get a minimal testcase that tries to
>> intentionally misalign the control message buffers. I suspect I'm just
>> "getting lucky" and my buffer happens to be aligned the way the kernel
>> wants by chance.
> I've managed to track down the cause of the breakage. Somehow your
> iproute2 has been miscompiled. What I did was add debug logic to
> libc.so to print the contents of the msghdr struct passed in before
> fixups, after fixups, and after the syscall. The output I got was:
>
> msghdr: 0xffffd58e08 12 0xffffd58df8 1 0 0 0 0 0
> msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 0
> msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 32
>
> The fields (including __pad1 and __pad2) are printed in order. So as
> you can see, ip passed in a structure with a 1 in __pad1 and a 0 in
> msg_iovlen. The source (libnetlink.c) stores 1 to msg_iovlen, so my
> guess is that somehow it ended up getting the wrong-endian version of
> the structure definition. You could confirm this by adding #error to
> the little-endian case in arch/mips64/bits/socket.h and recompiling. I
> suspect it's going to take some additional work to track down the
> cause, which is likely specific to something in your toolchain (it
> didn't happen for me when I built my own iproute2).
I tried that already before I contacted you. The #error never triggers
in the little-endian case, so your guess doesn't match reality. (I even
tried it again just now. All is fine; it only uses the big-endian case.)

Sebastian

>
> In case you or anyone else would like to use the struct dumping in
> testing, or just understand precisely what it's printing, I'm
> attaching the patch I used.
>
> Rich




* Re: recvmsg/sendmsg broken on mips64
  2016-04-21  7:22                                               ` Sebastian Gottschall
@ 2016-04-21 15:36                                                 ` Rich Felker
  2016-04-21 17:16                                                   ` Rich Felker
  2016-04-21 19:29                                                   ` Sebastian Gottschall
  0 siblings, 2 replies; 35+ messages in thread
From: Rich Felker @ 2016-04-21 15:36 UTC (permalink / raw)
  To: musl

On Thu, Apr 21, 2016 at 09:22:16AM +0200, Sebastian Gottschall wrote:
> Am 21.04.2016 um 03:37 schrieb Rich Felker:
> >On Sun, Apr 10, 2016 at 10:35:22PM -0400, Rich Felker wrote:
> >>On Mon, Apr 11, 2016 at 12:33:07AM +0200, Sebastian Gottschall wrote:
> >>>Am 11.04.2016 um 00:29 schrieb Rich Felker:
> >>>>On Mon, Apr 11, 2016 at 12:24:49AM +0200, Sebastian Gottschall wrote:
> >>>>>>I think what nsz was asking for, and what I'd like to see, is a way to
> >>>>>>reproduce the bug. I'm going to try building iproute2 for mips64 and
> >>>>>>running it on a prebuilt kernel from Aboriginal Linux under
> >>>>>>qemu-system-mips64, but I don't know what specific commands are needed
> >>>>>>to hit the affected code path.
> >>>>>any command since all is netlink based
> >>>>>ip add add 192.168.1.1/24  dev eth0
> >>>>>
> >>>>>yo will see that nothing will happen. ip will just return a error
> >>>>>message (i wrote this message already in the first entry on this
> >>>>>mailinglist)
> >>>>>"EOF on netlink" is the error which is shown
> >>>>OK, I'll try this.
> >>>>
> >>>>>>>its all resulting in the same failing recvmsg / sendmsg call.. so
> >>>>>>>yes libnetlink.c does not work with musl on mips64 (it does work on
> >>>>>>>x64 and everything else, just not mips64) unless the hack i offered
> >>>>>>>was applied which again fixed all.
> >>>>>>>before you ask again for a problem description, just read again. it
> >>>>>>>wont change the description if you ask again and just makes people
> >>>>>>>tired on this list.
> >>>>>>Both versions of the struct (musl's and your modified one that matches
> >>>>>>the kernel) have the exact same layout, but due to having a member
> >>>>>>with 64-bit type, yours has 8-byte alignment and musl's only has
> >>>>>>4-byte alignment. This means, at least:
> >>>>>>
> >>>>>>1. When musl's sendmsg.c makes its copy to zero out the padding, the
> >>>>>>    copy may not be correctly aligned for 64-bit writes, and the kernel
> >>>>>>    faults or manually produces an error for this case, causing the
> >>>>>>    whole operation to fail. However, I don't see where iproute2 is
> >>>>>>    actually passing control messages to sendmsg, so while this is a
> >>>>>>    problem, I don't think it's the cause. Maybe I'm missing the
> >>>>>>    affected call point; this is why I'd like steps to reproduce the
> >>>>>>    issue so I can see it.
> >>>>>>
> >>>>>>2. iproute2's libnetlink.c's rtnl_listen function does not properly
> >>>>>>    declare its cmsgbuf with the alignment of cmsghdr; it has type
> >>>>>>    char[] so the compiler is free not to align it at all. This is
> >>>>>>    presumably a bug in iproute2, but I can't find any good
> >>>>>>    documentation (in the standards or Linux-specific) for how you're
> >>>>>>    supposed to allocate this space, so maybe the kernel is able to
> >>>>>>    handle aligning the buffer itself. I don't see any way the
> >>>>>>    alignment of musl's cmsghdr type affects recvmsg though.
> >>>>>>
> >>>>>>Maybe there are other effects I'm missing? I'll follow up again once I
> >>>>>>get a test build/run of iproute2 and let you know whether I can see
> >>>>>>the problem.
> >>>>>okay. if you need a remote access to a octeon system using musl (my
> >>>>>fixed variant), just tell me.
> >>>>That would be really helpful. Something's wrong with the userspace for
> >>>>the Aboriginal mips64 binaries (SIGBUS in init) and debugging that
> >>>>would be a big distraction.
> >>>>
> >>>>BTW do you have gdb and strace available?
> >>>not on the system itself. i'm not sure if strace works on mips64.
> >>>never tried it.
> >>>but you're free to copy any binary to the /tmp dir. it has 2 gb ram.
> >>>so enough space for static binaries if you want to play with.
> >>>i will send you the ssh data in a private email
> >>I haven't been able to reproduce the error on your system. I've tried
> >>building my own static-linked version of the "ip" utility with a
> >>mips64-linux-musl softfloat compiler, and uploading my libc.so and
> >>using it to run both your version of ip and a dynamic-linked one I
> >>just built. They all work fine for adding/removing a 127.0.0.2 address
> >>to the "lo" interface.
> >>
> >>Next I'm going to try to get a minimal testcase that tries to
> >>intentionally misalign the control message buffers. I suspect I'm just
> >>"getting lucky" and my buffer happens to be aligned the way the kernel
> >>wants by chance.
> >I've managed to track down the cause of the breakage. Somehow your
> >iproute2 has been miscompiled. What I did was add debug logic to
> >libc.so to print the contents of the msghdr struct passed in before
> >fixups, after fixups, and after the syscall. The output I got was:
> >
> >msghdr: 0xffffd58e08 12 0xffffd58df8 1 0 0 0 0 0
> >msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 0
> >msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 32
> >
> >The fields (including __pad1 and __pad2) are printed in order. So as
> >you can see, ip passed in a structure with a 1 in __pad1 and a 0 in
> >msg_iovlen. The source (libnetlink.c) stores 1 to msg_iovlen, so my
> >guess is that somehow it ended up getting the wrong-endian version of
> >the structure definition. You could confirm this by adding #error to
> >the little-endian case in arch/mips64/bits/socket.h and recompiling. I
> >suspect it's going to take some additional work to track down the
> >cause, which is likely specific to something in your toolchain (it
> >didn't happen for me when I built my own iproute2).
> i tried that already before i contacted you. the #error case never
> raises within the little endian case

Was that when compiling musl or iproute2? The problem is in how
iproute2 was built; your libc.so seems fine.

> so your guess doesnt match reality. (i even tried it again right
> now. all is fine. it only uses the big endian case)

If it's not the endian tests, I don't know what else would have caused
this. I'll get a disassembly dump of the function to show you. Is
there any way I can reproduce your exact toolchain to see if I can get
the same miscompilation to happen?

Rich



* Re: recvmsg/sendmsg broken on mips64
  2016-04-21 15:36                                                 ` Rich Felker
@ 2016-04-21 17:16                                                   ` Rich Felker
  2016-04-21 19:30                                                     ` Sebastian Gottschall
  2016-04-21 19:29                                                   ` Sebastian Gottschall
  1 sibling, 1 reply; 35+ messages in thread
From: Rich Felker @ 2016-04-21 17:16 UTC (permalink / raw)
  To: musl

On Thu, Apr 21, 2016 at 11:36:37AM -0400, Rich Felker wrote:
> > >I've managed to track down the cause of the breakage. Somehow your
> > >iproute2 has been miscompiled. What I did was add debug logic to
> > >libc.so to print the contents of the msghdr struct passed in before
> > >fixups, after fixups, and after the syscall. The output I got was:
> > >
> > >msghdr: 0xffffd58e08 12 0xffffd58df8 1 0 0 0 0 0
> > >msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 0
> > >msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 32
> > >
> > >The fields (including __pad1 and __pad2) are printed in order. So as
> > >you can see, ip passed in a structure with a 1 in __pad1 and a 0 in
> > >msg_iovlen. The source (libnetlink.c) stores 1 to msg_iovlen, so my
> > >guess is that somehow it ended up getting the wrong-endian version of
> > >the structure definition. You could confirm this by adding #error to
> > >the little-endian case in arch/mips64/bits/socket.h and recompiling. I
> > >suspect it's going to take some additional work to track down the
> > >cause, which is likely specific to something in your toolchain (it
> > >didn't happen for me when I built my own iproute2).
> > i tried that already before i contacted you. the #error case never
> > raises within the little endian case
> 
> Was that when compiling musl or iproute2? The problem is in how
> iproute2 was built; your libc.so seems fine.
> 
> > so your guess doesnt match reality. (i even tried it again right
> > now. all is fine. it only uses the big endian case)
> 
> If it's not the endian tests, I don't know what else would have caused
> this. I'll get a disassembly dump of the function to show you. Is
> there any way I can reproduce your exact toolchain to see if I can get
> the same miscompilation to happen?

OK, I finally found the source you're building from and tracked down
the problem, which is simply that you have a buggy, 10-year-outdated
version of iproute2's libnetlink.c. The relevant code is here:

https://github.com/mirror/dd-wrt/blob/25e48ec1931daf4ef98a91ada9623638d128f34d/src/router/iproute2/lib/libnetlink.c#L156

Rather than using designated initializers as the current code does:

http://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/tree/lib/libnetlink.c?id=4bf138d6d2747b198fc0a78f5fe4e1c9287e9e90#n220

it's simply assuming an order for the members of struct msghdr. There
are several ways you could fix this:

1. Update to a modern version of iproute2. This would probably fix a
   lot of other bugs too.

2. Copy the designated-initializers approach from the modern code into
   your version.

3. Just use a zero-initializer for the structure and then assign
   values to individual members by name with ordinary assignments.
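
A minimal sketch of the difference between the three approaches, for
illustration only. struct demo_msghdr and the make_* helpers are
hypothetical names, not musl's or iproute2's actual code; the struct
merely imitates a layout in which a padding member is declared before
the iovlen field, as in the debug output quoted above. It assumes the
old code used a positional initializer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct msghdr; NOT the real definition.
 * A padding member sits before iovlen, as on a big-endian 64-bit ABI. */
struct demo_msghdr {
    void *name;
    unsigned namelen;
    void *iov;
    int pad1;            /* padding declared before iovlen */
    int iovlen;
    void *control;
    int pad2;
    unsigned controllen;
    int flags;
};

/* The padding member really does come first in this layout. */
static_assert(offsetof(struct demo_msghdr, pad1) <
              offsetof(struct demo_msghdr, iovlen),
              "pad1 is expected to precede iovlen");

/* Positional initializer: values are assigned in declaration order,
 * so the 1 meant for iovlen ends up in pad1. */
struct demo_msghdr make_positional(void *name, unsigned namelen, void *iov)
{
    struct demo_msghdr m = { name, namelen, iov, 1 };
    return m;
}

/* Option 2: designated initializers name each member, so the result
 * does not depend on member order. Unnamed members are zeroed. */
struct demo_msghdr make_designated(void *name, unsigned namelen, void *iov)
{
    struct demo_msghdr m = {
        .name = name,
        .namelen = namelen,
        .iov = iov,
        .iovlen = 1,
    };
    return m;
}

/* Option 3: zero-initialize, then assign members by name. */
struct demo_msghdr make_assigned(void *name, unsigned namelen, void *iov)
{
    struct demo_msghdr m = { 0 };
    m.name = name;
    m.namelen = namelen;
    m.iov = iov;
    m.iovlen = 1;
    return m;
}
```

With this layout, make_positional() yields pad1 == 1 and iovlen == 0,
which matches the symptom seen in the msghdr dump, while the other two
helpers set iovlen to 1 regardless of where the padding is declared.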

Let me know if you need any more info.

Rich



* Re: recvmsg/sendmsg broken on mips64
  2016-04-21 15:36                                                 ` Rich Felker
  2016-04-21 17:16                                                   ` Rich Felker
@ 2016-04-21 19:29                                                   ` Sebastian Gottschall
  1 sibling, 0 replies; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-21 19:29 UTC (permalink / raw)
  To: musl

On 21.04.2016 at 17:36, Rich Felker wrote:
> On Thu, Apr 21, 2016 at 09:22:16AM +0200, Sebastian Gottschall wrote:
>> Am 21.04.2016 um 03:37 schrieb Rich Felker:
>>> On Sun, Apr 10, 2016 at 10:35:22PM -0400, Rich Felker wrote:
>>>> On Mon, Apr 11, 2016 at 12:33:07AM +0200, Sebastian Gottschall wrote:
>>>>> Am 11.04.2016 um 00:29 schrieb Rich Felker:
>>>>>> On Mon, Apr 11, 2016 at 12:24:49AM +0200, Sebastian Gottschall wrote:
>>>>>>>> I think what nsz was asking for, and what I'd like to see, is a way to
>>>>>>>> reproduce the bug. I'm going to try building iproute2 for mips64 and
>>>>>>>> running it on a prebuilt kernel from Aboriginal Linux under
>>>>>>>> qemu-system-mips64, but I don't know what specific commands are needed
>>>>>>>> to hit the affected code path.
>>>>>>> any command since all is netlink based
>>>>>>> ip add add 192.168.1.1/24  dev eth0
>>>>>>>
>>>>>>> yo will see that nothing will happen. ip will just return a error
>>>>>>> message (i wrote this message already in the first entry on this
>>>>>>> mailinglist)
>>>>>>> "EOF on netlink" is the error which is shown
>>>>>> OK, I'll try this.
>>>>>>
>>>>>>>>> its all resulting in the same failing recvmsg / sendmsg call.. so
>>>>>>>>> yes libnetlink.c does not work with musl on mips64 (it does work on
>>>>>>>>> x64 and everything else, just not mips64) unless the hack i offered
>>>>>>>>> was applied which again fixed all.
>>>>>>>>> before you ask again for a problem description, just read again. it
>>>>>>>>> wont change the description if you ask again and just makes people
>>>>>>>>> tired on this list.
>>>>>>>> Both versions of the struct (musl's and your modified one that matches
>>>>>>>> the kernel) have the exact same layout, but due to having a member
>>>>>>>> with 64-bit type, yours has 8-byte alignment and musl's only has
>>>>>>>> 4-byte alignment. This means, at least:
>>>>>>>>
>>>>>>>> 1. When musl's sendmsg.c makes its copy to zero out the padding, the
>>>>>>>>     copy may not be correctly aligned for 64-bit writes, and the kernel
>>>>>>>>     faults or manually produces an error for this case, causing the
>>>>>>>>     whole operation to fail. However, I don't see where iproute2 is
>>>>>>>>     actually passing control messages to sendmsg, so while this is a
>>>>>>>>     problem, I don't think it's the cause. Maybe I'm missing the
>>>>>>>>     affected call point; this is why I'd like steps to reproduce the
>>>>>>>>     issue so I can see it.
>>>>>>>>
>>>>>>>> 2. iproute2's libnetlink.c's rtnl_listen function does not properly
>>>>>>>>     declare its cmsgbuf with the alignment of cmsghdr; it has type
>>>>>>>>     char[] so the compiler is free not to align it at all. This is
>>>>>>>>     presumably a bug in iproute2, but I can't find any good
>>>>>>>>     documentation (in the standards or Linux-specific) for how you're
>>>>>>>>     supposed to allocate this space, so maybe the kernel is able to
>>>>>>>>     handle aligning the buffer itself. I don't see any way the
>>>>>>>>     alignment of musl's cmsghdr type affects recvmsg though.
>>>>>>>>
>>>>>>>> Maybe there are other effects I'm missing? I'll follow up again once I
>>>>>>>> get a test build/run of iproute2 and let you know whether I can see
>>>>>>>> the problem.
>>>>>>> okay. if you need a remote access to a octeon system using musl (my
>>>>>>> fixed variant), just tell me.
>>>>>> That would be really helpful. Something's wrong with the userspace for
>>>>>> the Aboriginal mips64 binaries (SIGBUS in init) and debugging that
>>>>>> would be a big distraction.
>>>>>>
>>>>>> BTW do you have gdb and strace available?
>>>>> not on the system itself. i'm not sure if strace works on mips64.
>>>>> never tried it.
>>>>> but you're free to copy any binary to the /tmp dir. it has 2 gb ram.
>>>>> so enough space for static binaries if you want to play with.
>>>>> i will send you the ssh data in a private email
>>>> I haven't been able to reproduce the error on your system. I've tried
>>>> building my own static-linked version of the "ip" utility with a
>>>> mips64-linux-musl softfloat compiler, and uploading my libc.so and
>>>> using it to run both your version of ip and a dynamic-linked one I
>>>> just built. They all work fine for adding/removing a 127.0.0.2 address
>>>> to the "lo" interface.
>>>>
>>>> Next I'm going to try to get a minimal testcase that tries to
>>>> intentionally misalign the control message buffers. I suspect I'm just
>>>> "getting lucky" and my buffer happens to be aligned the way the kernel
>>>> wants by chance.
>>> I've managed to track down the cause of the breakage. Somehow your
>>> iproute2 has been miscompiled. What I did was add debug logic to
>>> libc.so to print the contents of the msghdr struct passed in before
>>> fixups, after fixups, and after the syscall. The output I got was:
>>>
>>> msghdr: 0xffffd58e08 12 0xffffd58df8 1 0 0 0 0 0
>>> msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 0
>>> msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 32
>>>
>>> The fields (including __pad1 and __pad2) are printed in order. So as
>>> you can see, ip passed in a structure with a 1 in __pad1 and a 0 in
>>> msg_iovlen. The source (libnetlink.c) stores 1 to msg_iovlen, so my
>>> guess is that somehow it ended up getting the wrong-endian version of
>>> the structure definition. You could confirm this by adding #error to
>>> the little-endian case in arch/mips64/bits/socket.h and recompiling. I
>>> suspect it's going to take some additional work to track down the
>>> cause, which is likely specific to something in your toolchain (it
>>> didn't happen for me when I built my own iproute2).
>> i tried that already before i contacted you. the #error case never
>> raises within the little endian case
> Was that when compiling musl or iproute2? The problem is in how
> iproute2 was built; your libc.so seems fine.
iproute2, for sure.
>
>> so your guess doesnt match reality. (i even tried it again right
>> now. all is fine. it only uses the big endian case)
> If it's not the endian tests, I don't know what else would have caused
> this. I'll get a disassembly dump of the function to show you. Is
> there any way I can reproduce your exact toolchain to see if I can get
> the same miscompilation to happen?
I can provide you with a tarball of the toolchain I used, compiled for
amd64 (it's plain OpenWrt gcc 5.3.0 using musl).
The iproute2 package in use is
http://svn.dd-wrt.com/browser/src/router/iproute2
That's the one used for all targets. It's not the newest, but it's the
one I'm using on all targets (working on x64, x32, little endian, big
endian, arm, mips, powerpc, etc.).

If any of that would help, just tell me.

> Rich
>




* Re: recvmsg/sendmsg broken on mips64
  2016-04-21 17:16                                                   ` Rich Felker
@ 2016-04-21 19:30                                                     ` Sebastian Gottschall
  0 siblings, 0 replies; 35+ messages in thread
From: Sebastian Gottschall @ 2016-04-21 19:30 UTC (permalink / raw)
  To: musl

On 21.04.2016 at 19:16, Rich Felker wrote:
> On Thu, Apr 21, 2016 at 11:36:37AM -0400, Rich Felker wrote:
>>>> I've managed to track down the cause of the breakage. Somehow your
>>>> iproute2 has been miscompiled. What I did was add debug logic to
>>>> libc.so to print the contents of the msghdr struct passed in before
>>>> fixups, after fixups, and after the syscall. The output I got was:
>>>>
>>>> msghdr: 0xffffd58e08 12 0xffffd58df8 1 0 0 0 0 0
>>>> msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 0
>>>> msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 32
>>>>
>>>> The fields (including __pad1 and __pad2) are printed in order. So as
>>>> you can see, ip passed in a structure with a 1 in __pad1 and a 0 in
>>>> msg_iovlen. The source (libnetlink.c) stores 1 to msg_iovlen, so my
>>>> guess is that somehow it ended up getting the wrong-endian version of
>>>> the structure definition. You could confirm this by adding #error to
>>>> the little-endian case in arch/mips64/bits/socket.h and recompiling. I
>>>> suspect it's going to take some additional work to track down the
>>>> cause, which is likely specific to something in your toolchain (it
>>>> didn't happen for me when I built my own iproute2).
>>> i tried that already before i contacted you. the #error case never
>>> raises within the little endian case
>> Was that when compiling musl or iproute2? The problem is in how
>> iproute2 was built; your libc.so seems fine.
>>
>>> so your guess doesnt match reality. (i even tried it again right
>>> now. all is fine. it only uses the big endian case)
>> If it's not the endian tests, I don't know what else would have caused
>> this. I'll get a disassembly dump of the function to show you. Is
>> there any way I can reproduce your exact toolchain to see if I can get
>> the same miscompilation to happen?
> OK, I finally found the source you're building from and tracked down
> the problem, which is simply that you have a buggy, 10-year-outdated
> version of iproute2's libnetlink.c. The relevant code is here:
>
> https://github.com/mirror/dd-wrt/blob/25e48ec1931daf4ef98a91ada9623638d128f34d/src/router/iproute2/lib/libnetlink.c#L156
>
> Rather than using designated initializers as the current code does:
>
> http://git.kernel.org/cgit/linux/kernel/git/shemminger/iproute2.git/tree/lib/libnetlink.c?id=4bf138d6d2747b198fc0a78f5fe4e1c9287e9e90#n220
>
> it's simply assuming an order for the members of struct msghdr. There
> are several ways you could fix this:
>
> 1. Update to a modern version of iproute2. This would probably fix a
>     lot of other bugs too.
>
> 2. Copy the designated-initializers approach from the modern code into
>     your version.
>
> 3. Just use a zero-initializer for the structure and then assign
>     values to individual members by name with ordinary assignments.
Okay, I will try that.
>
> Let me know if you need any more info.
>
> Rich
>




end of thread, other threads:[~2016-04-21 19:30 UTC | newest]

Thread overview: 35+ messages
2016-03-31 18:20 size_t and int64_t on a new platform Dan Gohman
2016-03-31 19:25 ` Rich Felker
2016-03-31 20:10   ` Szabolcs Nagy
2016-03-31 20:23     ` Alexander Monakov
2016-03-31 20:30       ` Rich Felker
2016-04-01  9:16         ` recvmsg/sendmsg broken on mips64 Sebastian Gottschall
2016-04-01  9:49           ` Szabolcs Nagy
2016-04-01 10:29             ` Sebastian Gottschall
2016-04-01 11:31               ` Szabolcs Nagy
2016-04-01 11:37                 ` Sebastian Gottschall
2016-04-01 12:21                   ` Masanori Ogino
2016-04-01 12:42                     ` Sebastian Gottschall
2016-04-01 13:17                       ` Szabolcs Nagy
2016-04-02  9:52                         ` Sebastian Gottschall
2016-04-07  9:48                           ` Szabolcs Nagy
2016-04-07 11:42                             ` Sebastian Gottschall
2016-04-07 18:46                               ` Szabolcs Nagy
2016-04-07 23:33                                 ` Sebastian Gottschall
2016-04-10 22:18                                   ` Rich Felker
2016-04-10 22:24                                     ` Sebastian Gottschall
2016-04-10 22:29                                       ` Rich Felker
2016-04-10 22:33                                         ` Sebastian Gottschall
2016-04-11  2:35                                           ` Rich Felker
2016-04-11  6:35                                             ` Sebastian Gottschall
2016-04-11 18:32                                               ` Rich Felker
2016-04-11 19:01                                                 ` Sebastian Gottschall
2016-04-14 14:10                                                 ` Sebastian Gottschall
2016-04-15 16:19                                                   ` Rich Felker
2016-04-21  1:37                                             ` Rich Felker
2016-04-21  7:22                                               ` Sebastian Gottschall
2016-04-21 15:36                                                 ` Rich Felker
2016-04-21 17:16                                                   ` Rich Felker
2016-04-21 19:30                                                     ` Sebastian Gottschall
2016-04-21 19:29                                                   ` Sebastian Gottschall
2016-04-01  0:35   ` size_t and int64_t on a new platform Dan Gohman
