From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Transition path for removing lazy init of thread pointer
Date: Tue, 25 Mar 2014 03:11:24 -0400
Message-ID: <20140325071124.GZ26358@brightrain.aerifal.cx>
References: <20140324174915.GA1263@brightrain.aerifal.cx> <20140324230405.GA23163@brightrain.aerifal.cx> <5330C769.6080304@skarnet.org> <20140325015531.GB23474@brightrain.aerifal.cx> <53312396.5060307@skarnet.org>
In-Reply-To: <53312396.5060307@skarnet.org>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Tue, Mar 25, 2014 at 06:35:02AM +0000, Laurent Bercot wrote:
> On 25/03/2014 01:55, Rich Felker wrote:
> >The mandatory syscall is set_thread_area or equivalent, e.g.
> >arch_prctl on x86_64.
> >It's there because most archs need a syscall to
> >set the thread pointer used for accessing TLS. Even in single-threaded
> >programs, there are reasons one may want to have it.
> >
> >The big reason is that, on most archs, stack protector's canary value
> >is stored at a fixed offset from the thread pointer rather than in a
> >global, so stack protector can't work without the thread pointer being
> >initialized. Up to now we've tried to detect whether stack protector
> >is used based on symbol references to __stack_chk_fail, but this check
> >gives a false negative (and thus crashing programs) if gcc optimizes
> >out the check to __stack_chk_fail but not the load of the canary, e.g.
> >in the program: int main() { exit(0); }
>
> That's a good reason indeed.
> I take it you're still hell-bent against compile-time options ? Because

In general, no. I'll probably eventually accept compile-time options
for things like iconv charset selection. But gratuitous ones, yes.
Especially if supporting the compile-time option significantly
complicates the code and forces us to have multiple #ifdef/#else
cases, ala uClibc. Making thread-pointer optional would, at least in
the long term, be one of those, since it either precludes all
optimization and simplification that assumes the thread pointer is
available, or forces us to have multiple versions of the same code for
with/without it.

> a musl compile-time option "I don't want this musl to support stack
> protector, yes I know it will crash programs compiled with it, but I'm
> a big boy and know what I'm doing" would be great for OCD people like
> me who like their strace clean. :)

Yeah, this is really just a case of appealing to OCD, so thanks for
acknowledging that.
:-)

I think we could still consider making the second syscall
(set_tid_address) get optimized out in static binaries that don't need
it, but it's enough of a complexity burden that I'd like to see what
others have to say about it, and at least wait to see how hard it
would be, once other cleanups related to this change are made.

> >The other main reason is that lazy initialization is a lot more
> >expensive at runtime.
>
> That's not a good reason for single-threaded programs.

Well, there are a lot of mostly-useless micro-optimizations you could
do that, theoretically, improve single-threaded programs. Like
accessing errno directly. The problem is that these preclude doing
major systemic simplifications that have much greater debloating
effects (even on single-threaded programs!) unless we make a whole
separate single-thread-only libc.

For example, __stdio_read and __stdio_write just got simpler because
they no longer have to special-case the threaded/non-threaded cases to
avoid gratuitous thread-pointer loads and possible crashes. And
pthread_setcancelstate, which is used in various functions that need
to avoid triggering cancellation, is now simpler since it knows the
absence/presence of a thread pointer will be constant (before, it had
to be able to get/set state before the thread pointer was loaded for
consistency in case it's loaded later).

Right now that's about it for code that gets linked in NON-threaded
programs, but there will probably be more that gets simplified later,
and a lot more if you count code for programs using threads.

> >So despite always initializing the thread pointer kinda looking like
> >"bloat" from a minimal-program standpoint, it's really a major step
> >forward in debloating and simplifying lots of code.
>
> I totally understand and approve for multi-threaded programs and
> programs using stack protection. I just wish there were a special
> optimization for "int main() { return 0; }".

Yes, I miss the extreme-minimal strace too, but it's still pretty damn
minimal and not going to get any bigger anytime soon. What I don't
miss is the messy undocumented logic for lazy initialization.

Rich
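
For readers following the stack-protector argument in this message, here is a rough sketch (not part of the original post) of what "canary at a fixed offset from the thread pointer" can look like in practice. It assumes x86_64 Linux with GCC or Clang, where the thread pointer is the %fs base, %fs:0 holds a pointer to the thread control block itself, and compiler-generated stack-protector code loads the canary from %fs:0x28. The function names and the CANARY_OFFSET macro are hypothetical and chosen for illustration only; other archs use different registers and offsets, and on those this sketch simply returns 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86_64 Linux ABI, where the stack-protector canary sits
 * 0x28 bytes past the thread pointer. Illustrative name, not from musl. */
#define CANARY_OFFSET 0x28

#if defined(__x86_64__) && defined(__linux__) && defined(__LP64__)
#define SKETCH_SUPPORTED 1
#else
#define SKETCH_SUPPORTED 0
#endif

/* Return the thread pointer, or 0 on archs this sketch does not cover. */
uintptr_t thread_pointer(void)
{
#if SKETCH_SUPPORTED
    uintptr_t tp;
    /* %fs:0 holds a pointer to the thread control block itself. */
    __asm__ volatile ("movq %%fs:0, %0" : "=r"(tp));
    return tp;
#else
    return 0;
#endif
}

/* Load the canary the way compiler-generated code does: %fs-relative. */
uintptr_t canary_via_segment(void)
{
#if SKETCH_SUPPORTED
    uintptr_t c;
    __asm__ volatile ("movq %%fs:0x28, %0" : "=r"(c));
    return c;
#else
    return 0;
#endif
}

/* Load the same word through an ordinary pointer derived from the
 * thread pointer, showing it is just memory at a fixed offset. */
uintptr_t canary_via_pointer(void)
{
    uintptr_t tp = thread_pointer();
    if (tp == 0)
        return 0; /* unsupported arch in this sketch */
    assert(tp % sizeof(uintptr_t) == 0); /* control block is word-aligned */
    return *(const uintptr_t *)(tp + CANARY_OFFSET);
}
```

On a system matching those assumptions, calling canary_via_segment() and canary_via_pointer() from main should give the same value. It also suggests why a program built with stack protector would be expected to crash if nothing ever set the thread pointer: the %fs-relative load itself goes wrong, whether or not a call to __stack_chk_fail survives optimization.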