From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: tough choice on thread pointer initialization issue
Date: Thu, 9 Feb 2012 21:58:25 -0500
Message-ID: <20120210025824.GA25414@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

hi all,

due to some recent changes in musl, i've detected a long-standing bug that went unnoticed until now. the way musl's pthread implementation works, the thread pointer register (and kernel-side support) for the initial thread is subject to "lazy initialization" - no syscalls are made to set it up until the first time it's used. this allows us to keep an absolutely minimal set of syscalls for super-short/trivial programs, which looks really impressive in the libc comparison charts and strace runs. however...
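for readers who want the lazy-initialization pattern spelled out, here is a minimal sketch in plain c. it is only a model, not musl's code: thread_reg stands in for the thread pointer register, setup_thread_pointer() stands in for the syscall-based setup, and every name in it is hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not musl's real types or symbols. */
struct thread_desc {
    int placeholder;
};

static struct thread_desc initial_thread;  /* descriptor for the initial thread */
static struct thread_desc *thread_reg;     /* models the thread pointer register */
static int tp_initialized;                 /* in-memory "already set up" flag */
static int setup_calls;                    /* counts simulated setup syscalls */

/* Models the one-time setup that costs syscalls in a real libc. */
static void setup_thread_pointer(void)
{
    setup_calls++;
    thread_reg = &initial_thread;
    tp_initialized = 1;
}

/* Lazy accessor: the setup cost is paid only on first use. */
static struct thread_desc *self_lazy(void)
{
    if (!tp_initialized)
        setup_thread_pointer();
    return thread_reg;
}
```

a program that never calls self_lazy() never pays for the setup, which is the whole appeal. the cost is that every caller goes through the initialization check.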
the bug i've found occurs when the thread pointer happens to get initialized in code that runs from a signal handler. this is a rare situation that will only happen if the program is avoiding async-signal-unsafe functions in the main flow of execution so that it's free to use them in a signal handler, but despite being rare, it's perfectly legal, and right now musl crashes on such programs, for example:

    #include <stdio.h>
    #include <signal.h>
    #include <unistd.h>
    #include <pthread.h>

    void evil(int sig)
    {
        /* first use of the thread pointer happens here, inside the handler */
        printf("hello, world %lx\n", (unsigned long)pthread_self());
    }

    int main(void)
    {
        signal(SIGALRM, evil);
        alarm(1);
        sleep(2);
    }

the issue is that when a signal handler returns, all registers, including the thread-pointer registers (%gs or %fs on x86 or x86_64, respectively), are reset to the values they had in the code the signal interrupted. thus, musl thinks the thread pointer is valid at this point, but it's actually null.

i see 3 possible fixes, none of which are ideal:

approach 1: hack the signal-return "restore" function to save the current thread register value into the struct sigcontext before calling SYS_sigreturn, so that it will be preserved when the interrupted code resumes.
pros: minimal costs, never adds any syscalls versus current musl.
cons: ugly hack, and gdb does not like non-canonical sigreturn functions (it refuses to work when the instruction pointer is at them).

approach 2: call pthread_self() from sigaction(). this will ensure that a signal handler never runs prior to the thread pointer being initialized.
pros: minimal code changes, and avoids adding syscalls except for programs that use signals but not threads.
cons: adds a syscall, and links unnecessary thread code when static linking, in any program that uses signal handlers.

approach 3: always initialize the thread pointer from __libc_start_main (before main runs).
(this is the glibc approach)
pros: simple, and allows all the lazy-initialization logic to be removed, moderately debloating and speeding up lots of thread-related functions that will be able to use the thread pointer without making a function call that checks whether it's initialized. would also make it easier for us to support stack-protector, vsyscall/sysenter syscalls, and thread-local storage in the future.
cons: constant additional 2-syscall overhead at startup (but it could be optimized out when static-linking programs that don't use any thread-related functions). the two syscalls' run times are ~1010ns and ~890ns on my machine, compared to ~260000ns for the exec syscall. one other possible issue is that we'd need to make sure non-threaded programs which would otherwise work on old kernels without thread support don't crash due to assuming the thread pointer is valid in places where they shouldn't need it.

before i make a decision, i'd like to hear if anyone from the community has strong opinions one way or the other. i've almost ruled out approach #1 and i'm leaning towards #3, with the idea that simplicity is worth more than a couple trivial syscalls.

rich
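the failure mode and the effect of approach 3 can be modeled without any real signals. the sketch below is a simulation under stated assumptions, not musl's implementation: struct regs stands in for the register state the kernel saves and restores, tp_initialized stands in for libc's in-memory notion that setup already happened, and all names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model; none of these names are musl's. */
struct regs {
    void *thread_reg;        /* models %gs / %fs */
};

static struct regs cpu;      /* live register state */
static int tp_initialized;   /* libc's in-memory "already set up" flag */
static int initial_thread;   /* dummy object for the thread pointer to refer to */

/* Lazy accessor: trusts the flag, not the register contents. */
static void *self_lazy(void)
{
    if (!tp_initialized) {
        cpu.thread_reg = &initial_thread;
        tp_initialized = 1;
    }
    return cpu.thread_reg;
}

/* Models signal delivery: save registers, run handler, restore on return. */
static void deliver_signal(void (*handler)(void))
{
    struct regs saved = cpu;
    handler();
    cpu = saved;             /* registers are rolled back, memory is not */
}

static void handler_uses_thread_pointer(void)
{
    (void)self_lazy();
}

/* Returns the thread pointer the interrupted code sees after the handler. */
static void *run_scenario(int eager_init)
{
    cpu.thread_reg = NULL;
    tp_initialized = 0;
    if (eager_init)
        (void)self_lazy();   /* models approach 3: set up before main runs */
    deliver_signal(handler_uses_thread_pointer);
    return self_lazy();
}
```

in this model run_scenario(0) returns a null pointer, because the flag survives the handler while the register value does not. run_scenario(1) returns a valid pointer, because the saved register state already held the initialized value.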