From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: recvmsg/sendmsg broken on mips64
Date: Thu, 21 Apr 2016 11:36:37 -0400
Message-ID: <20160421153637.GY21636@brightrain.aerifal.cx>
References: <20160407184643.GI9862@port70.net>
	<2656e404-f225-cd95-3989-a48df486d914@dd-wrt.com>
	<20160410221812.GP21636@brightrain.aerifal.cx>
	<20160410222947.GQ21636@brightrain.aerifal.cx>
	<20160411023522.GR21636@brightrain.aerifal.cx>
	<20160421013715.GX21636@brightrain.aerifal.cx>
	<57187FA8.8010806@dd-wrt.com>
In-Reply-To: <57187FA8.8010806@dd-wrt.com>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Thu, Apr 21, 2016 at 09:22:16AM +0200, Sebastian Gottschall wrote:
> On 21.04.2016 at 03:37, Rich Felker wrote:
> >On Sun, Apr 10, 2016 at
10:35:22PM -0400, Rich Felker wrote:
> >>On Mon, Apr 11, 2016 at 12:33:07AM +0200, Sebastian Gottschall wrote:
> >>>On 11.04.2016 at 00:29, Rich Felker wrote:
> >>>>On Mon, Apr 11, 2016 at 12:24:49AM +0200, Sebastian Gottschall wrote:
> >>>>>>I think what nsz was asking for, and what I'd like to see, is a way to
> >>>>>>reproduce the bug. I'm going to try building iproute2 for mips64 and
> >>>>>>running it on a prebuilt kernel from Aboriginal Linux under
> >>>>>>qemu-system-mips64, but I don't know what specific commands are needed
> >>>>>>to hit the affected code path.
> >>>>>any command, since all is netlink based
> >>>>>ip add add 192.168.1.1/24 dev eth0
> >>>>>
> >>>>>you will see that nothing will happen. ip will just return an error
> >>>>>message (i wrote this message already in the first entry on this
> >>>>>mailing list)
> >>>>>"EOF on netlink" is the error which is shown
> >>>>OK, I'll try this.
> >>>>
> >>>>>>>it's all resulting in the same failing recvmsg / sendmsg call.. so
> >>>>>>>yes libnetlink.c does not work with musl on mips64 (it does work on
> >>>>>>>x64 and everything else, just not mips64) unless the hack i offered
> >>>>>>>was applied, which again fixed all.
> >>>>>>>before you ask again for a problem description, just read again. it
> >>>>>>>won't change the description if you ask again and just makes people
> >>>>>>>tired on this list.
> >>>>>>Both versions of the struct (musl's and your modified one that matches
> >>>>>>the kernel) have the exact same layout, but due to having a member
> >>>>>>with 64-bit type, yours has 8-byte alignment and musl's only has
> >>>>>>4-byte alignment. This means, at least:
> >>>>>>
> >>>>>>1. When musl's sendmsg.c makes its copy to zero out the padding, the
> >>>>>>   copy may not be correctly aligned for 64-bit writes, and the kernel
> >>>>>>   faults or manually produces an error for this case, causing the
> >>>>>>   whole operation to fail.
However, I don't see where iproute2 is
> >>>>>>   actually passing control messages to sendmsg, so while this is a
> >>>>>>   problem, I don't think it's the cause. Maybe I'm missing the
> >>>>>>   affected call point; this is why I'd like steps to reproduce the
> >>>>>>   issue so I can see it.
> >>>>>>
> >>>>>>2. iproute2's libnetlink.c's rtnl_listen function does not properly
> >>>>>>   declare its cmsgbuf with the alignment of cmsghdr; it has type
> >>>>>>   char[] so the compiler is free not to align it at all. This is
> >>>>>>   presumably a bug in iproute2, but I can't find any good
> >>>>>>   documentation (in the standards or Linux-specific) for how you're
> >>>>>>   supposed to allocate this space, so maybe the kernel is able to
> >>>>>>   handle aligning the buffer itself. I don't see any way the
> >>>>>>   alignment of musl's cmsghdr type affects recvmsg though.
> >>>>>>
> >>>>>>Maybe there are other effects I'm missing? I'll follow up again once I
> >>>>>>get a test build/run of iproute2 and let you know whether I can see
> >>>>>>the problem.
> >>>>>okay. if you need remote access to an Octeon system using musl (my
> >>>>>fixed variant), just tell me.
> >>>>That would be really helpful. Something's wrong with the userspace for
> >>>>the Aboriginal mips64 binaries (SIGBUS in init) and debugging that
> >>>>would be a big distraction.
> >>>>
> >>>>BTW do you have gdb and strace available?
> >>>not on the system itself. i'm not sure if strace works on mips64.
> >>>never tried it.
> >>>but you're free to copy any binary to the /tmp dir. it has 2 GB RAM,
> >>>so enough space for static binaries if you want to play with them.
> >>>i will send you the ssh data in a private email
> >>I haven't been able to reproduce the error on your system. I've tried
> >>building my own static-linked version of the "ip" utility with a
> >>mips64-linux-musl softfloat compiler, and uploading my libc.so and
> >>using it to run both your version of ip and a dynamic-linked one I
> >>just built.
They all work fine for adding/removing a 127.0.0.2 address
> >>to the "lo" interface.
> >>
> >>Next I'm going to try to get a minimal testcase that tries to
> >>intentionally misalign the control message buffers. I suspect I'm just
> >>"getting lucky" and my buffer happens to be aligned the way the kernel
> >>wants by chance.
> >I've managed to track down the cause of the breakage. Somehow your
> >iproute2 has been miscompiled. What I did was add debug logic to
> >libc.so to print the contents of the msghdr struct passed in before
> >fixups, after fixups, and after the syscall. The output I got was:
> >
> >msghdr: 0xffffd58e08 12 0xffffd58df8 1 0 0 0 0 0
> >msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 0
> >msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 32
> >
> >The fields (including __pad1 and __pad2) are printed in order. So as
> >you can see, ip passed in a structure with a 1 in __pad1 and a 0 in
> >msg_iovlen. The source (libnetlink.c) stores 1 to msg_iovlen, so my
> >guess is that somehow it ended up getting the wrong-endian version of
> >the structure definition. You could confirm this by adding #error to
> >the little-endian case in arch/mips64/bits/socket.h and recompiling. I
> >suspect it's going to take some additional work to track down the
> >cause, which is likely specific to something in your toolchain (it
> >didn't happen for me when I built my own iproute2).
> i tried that already before i contacted you. the #error never
> triggers in the little-endian case

Was that when compiling musl or iproute2? The problem is in how
iproute2 was built; your libc.so seems fine.

> so your guess doesn't match reality. (i even tried it again right
> now. all is fine. it only uses the big-endian case)

If it's not the endian tests, I don't know what else would have caused
this. I'll get a disassembly dump of the function to show you. Is
there any way I can reproduce your exact toolchain to see if I can get
the same miscompilation to happen?

Rich