From mboxrd@z Thu Jan  1 00:00:00 1970
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] Add comments to i386 assembly source
Date: Tue, 2 Jan 2018 14:49:09 -0500
Message-ID: <20180102194909.GM1627@brightrain.aerifal.cx>
References: <20171223094545.rmx6xtmucyz5xzap@voyager>
 <72c68934-4445-c83d-7bbc-004953b2f9e9@bitwagon.com>
 <20171231154926.GG1627@brightrain.aerifal.cx>
 <20180101195224.tpkl5g5w66rzwzz3@voyager>
 <5caf910a-dd98-6836-c70f-6a98cf8a9d22@bitwagon.com>
 <20180102014915.GJ1627@brightrain.aerifal.cx>
 <64245dca-3c6e-3918-701c-dcf3f8e00783@bitwagon.com>
In-Reply-To: <64245dca-3c6e-3918-701c-dcf3f8e00783@bitwagon.com>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
User-Agent: Mutt/1.5.21 (2010-09-15)

On Mon, Jan 01, 2018 at 07:15:50PM -0800, John Reiser wrote:
> On 01/01/2018 13:49 UTC, Rich Felker wrote:
> >On Mon, Jan 01, 2018 at 02:57:02PM -0800, John Reiser wrote:
> >>There's a bug.  clone() is a user-level function that can be used
> >>independently of the musl internal implementation of threads.
> >>Thus when clone() in musl/src/linux/clone.c calls
> >>    return __syscall_ret(__clone(func, stack, flags, arg, ptid, tls, ctid));
> >>then the i386 implementation of __clone has no guarantee about
> >>the value in %gs, and it is a bug to assume that (%gs >> 3)
> >>fits in 8 bits.
> >
> >The ABI is that at function call or any time a signal could be
> >received, %gs must always be a valid segment register value reflecting
> >the current thread's thread pointer. If this is violated, the program
> >has undefined behavior.
>
> More than one segment descriptor can designate the same subset
> of the linear address space.  Duplicate the segment descriptor
> to a target selector that is >= 256, and load %gs with the
> duplicate selector before calling clone().

It's not clear to me that such a substitution is valid; as far as I can
tell no explicit effort to ensure that it works is made, and it would
not happen without writing asm to do specifically that.

> >>The code in musl/src/thread/i386/clone.s wastes up to 12 bytes
> >>when aligning the new stack, by aligning before [pre-]allocating
> >>space for the one argument to the thread function.
> >
> >I suspect the initial value happens to be aligned anyway in which case
> >reserving 16 bytes and aligning to 16 is the same as reserving 4 and
> >aligning to 16. If you think it's not, I don't mind changing if you
> >can do careful testing to make sure it doesn't introduce any bugs.
>
> This is another bug!  Consider the valid code:
>     void **lo_stack = malloc(5 * sizeof(void *));
>     /* malloc() guarantees 16-byte alignment of lo_stack */
>     clone(func, &lo_stack[5], ...);

You can't run code on a 20-byte stack. This is not a surprise.
In theory it might be possible if the callee is only asm, but you
can't make C function calls since each call frame will consume at
least 16 bytes (return address and alignment).

I also disagree with considering it valid to assume clone invokes the
provided callback function directly with no intervening functions;
this is incorrect on SH right now since we use a C function to smooth
over the difference between plain and fdpic calling conventions. ARM
will probably do the same once fdpic for cortex-M is added.

> then __clone() does:
>     and $-16,%ecx   /* &lo_stack[4] */
>     sub $ 16,%ecx   /* &lo_stack[0] */
>     ...
>     mov %ecx,%esp   /* new thread: implicit action of ___NR_clone system call */
>     call *%eax      /* OUT-OF-BOUNDS: lo_stack[-1] = return address */
>
> Thus, starting the thread function has scribbled outside the allocated area,
> even though the lo_stack[] array can accommodate the call by the code I showed:
>     lea -NBPW(arg2),%ecx   /* &lo_stack[4] */
>     and $-16,%ecx          /* still &lo_stack[4] */
>     ...
>     mov %ecx,%esp   /* new thread: implicit action of __NR_clone system call */
>     call *%eax      /* lo_stack[3] = return address */
>
> The danger is not "new bugs", but rather revealing latent bugs that were
> obscured by the less-strict old code.  For instance, if the thread
> function actually has two formal parameters, or if it uses va_arg()
> to reference beyond the first actual argument, then running the optimal
> code is more likely to notice.

I agree with your analysis of what happens but I don't think it's
particularly interesting or a bug.

Rich
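
As a rough illustration of the address arithmetic in the quoted
lo_stack example (this is not code from musl), the two sequences can be
modeled in C with plain integers. The function names and NBPW = 4 are
assumptions made for this sketch, and addresses are modeled as long long
so that a slot below the allocation shows up as a negative index.

```c
#define NBPW 4  /* assumed: bytes per word on i386 */

/* Order attributed to the existing i386 __clone in the quoted text:
 * align the stack top first, then reserve 16 bytes. */
static long long sp_align_then_reserve(long long top)
{
    top &= ~15LL;   /* and $-16,%ecx */
    top -= 16;      /* sub $16,%ecx  */
    return top;
}

/* Order proposed in the quoted text: reserve one word, then align. */
static long long sp_reserve_then_align(long long top)
{
    top -= NBPW;    /* lea -NBPW(arg2),%ecx */
    top &= ~15LL;   /* and $-16,%ecx        */
    return top;
}

/* Index, in words relative to base, of the slot that `call` writes the
 * return address to (the word just below sp). A negative result means
 * the write lands below the start of the allocation. */
static long long return_slot_index(long long sp, long long base)
{
    return (sp - NBPW - base) / NBPW;
}
```

Taking a hypothetical 16-byte-aligned base of 0x1000 and top = base + 20
(that is, &lo_stack[5] with 4-byte pointers), the first function yields
0x1000 with a return-address slot index of -1, and the second yields
0x1010 with index 3, matching the comments in the quoted asm.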