From: Andy Lutomirski
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: A running list of questions from "porting" Slackware to musl
Date: Tue, 30 Sep 2014 22:49:15 -0700
Message-ID: <542B95DB.7050209@mit.edu>
References: <542AA579.2040304@langurwallah.org> <20140930153216.GA1785@newbook> <20140930155023.GC23797@brightrain.aerifal.cx> <542B41C4.1040701@amacapital.net> <20141001000516.GI23797@brightrain.aerifal.cx>
In-Reply-To: <20141001000516.GI23797@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On 09/30/2014 05:05 PM, Rich Felker wrote:
> On Tue, Sep 30, 2014 at 04:50:28PM -0700, Andy Lutomirski wrote:
>>> When gcc generates the canary-check code, on failure it normally
>>> calls/jumps to __stack_chk_fail. But for shared libraries, that call
>>> would go to a thunk in the library's PLT, which depends on the GOT
>>> register being initialized (actually this varies by arch; x86_64
>>> doesn't need it). In order to avoid (expensive) loading of the GOT
>>> register in every function just as a contingency in case
>>> __stack_chk_fail needs to be called, for position-independent code GCC
>>> generates a call to __stack_chk_fail_local instead. This is a hidden
>>> function (and necessarily exists within the same .so) so the call
>>> doesn't have to go through the PLT; it's just a straight relative
>>> call/jump instruction. __stack_chk_fail_local is then responsible for
>>> loading the GOT register and calling __stack_chk_fail.
>>
>> [slightly off topic]
>>
>> Does GCC even know how to call through the GOT instead of the PLT?
>> Windows (at least 32-bit Windows) has done for decades, at least if
>> dllimport is set.
>>
>> On x86_64, this would be call *whatever@gotoff(%rip) instead of call
>> whatever@plt.
>
> This precludes optimizing out the indirection at link time (or at
> least it requires more complex transformation in the linker). I'm not
> sure if there are cases where GCC generates this kind of code or not.
> It's also not practical on many ISAs.

I think I filed a bug asking for this (among other things) in GCC once.
Basically, I want __attribute__((visibility("imported"))) or something
like that.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56527

>
>> (Even better: the loader could patch the PLT with a direct jump. Could
>> musl do this? At least in the case where the symbol is within 2G of the
>> PLT entry,
>
> This is really not a good idea. The old PowerPC ABI did this, and musl
> does not support it (it requires the new "secure-plt" mode). Hardened
> kernels have various restrictions on modifying executable pages, up to
> and including completely forbidding this kind of usage. And even if
> it's not forbidden, it's going to use more memory due to an additional
> page (or more) per shared library that's not going to be sharable.
> Also it requires complex per-arch code (minimal machine code
> generation, instruction cache flushing/barriers, etc.).

That extra page might not be needed if the linker could end up removing
a bunch of GOT entries for functions that don't have their addresses
taken. (Or, on x86_64, where unaligned access is cheap, the GOT could
actually overlap the PLT in memory, but only if DT_BIND_NOW or whatever
it's called is on. Hmm. I bet that the linker could do this in a way
that doesn't require loader support at all as long as textrel is
allowed.)

>
>> this should be straightforward if no threads have been
>> started yet.
>
> Threads having been started or not are not relevant. The newly loaded
> code is not visible until dlopen returns, so nothing can race with
> modifications to it.

True, at least when lazy binding is off.

>
>> If musl did this, it could advertise a nice speedup over
>> glibc...)
>
> I think the performance gain would be mostly theoretical. Do you have
> any timings that show otherwise?

No. It would reduce pressure on whatever presumably limited resources
the CPU has for predicting indirect jumps, and it would reduce the
number of cache lines needed for a call through the PLT.

Doing it cleanly would also probably require a new dynamic entry and a
new relocation type.

Also, it might be a lost cause when selinux is being used. I *hate*
execmem, execmod, etc -- it really should be possible to do this and to
write a sensible JIT without requiring special selinux permissions. I
think that what's needed is a syscall to make a writeable alias of an
executable mapping.

Anyway, probably not worth it.

--Andy
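
For anyone following the thread without the PLT/GOT mechanics in their head, here is a toy model in plain C of the indirection being discussed. It is only a sketch: every name in it (got, plt_twice, lazy_twice, bind_now, resolver_runs) is made up for illustration, and real PLT stubs and GOT slots are emitted by the linker and filled in by the dynamic loader, not written by hand like this. It shows a call that goes through a pointer slot, a lazy resolver that patches the slot on first use, and an eager "bind now" step that fills the slot up front.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not musl or glibc code. */
typedef int (*fn_t)(int);

enum { SLOT_TWICE, SLOT_COUNT };

/* The "imported" function that lives in some other shared object. */
int twice(int x) { return 2 * x; }

/* Counts how often the lazy resolver ran. */
int resolver_runs;

int lazy_twice(int x);

/* The "GOT": one pointer per imported function. With lazy binding the
   slot initially points at a resolver trampoline. */
fn_t got[SLOT_COUNT] = { lazy_twice };

/* Resolver trampoline: "look up" the symbol, patch the slot, forward. */
int lazy_twice(int x)
{
    resolver_runs++;
    got[SLOT_TWICE] = twice;
    return twice(x);
}

/* The "PLT stub": an indirect call through the GOT slot. This is the
   extra hop that a direct call (or a patched PLT) would avoid. */
int plt_twice(int x)
{
    assert(got[SLOT_TWICE] != NULL);
    return got[SLOT_TWICE](x);
}

/* Eager binding: fill every slot before any call is made. */
void bind_now(void)
{
    got[SLOT_TWICE] = twice;
}
```

In this model the first plt_twice() call runs the resolver once and later calls go straight through the patched slot. Calling bind_now() before the first call means the resolver never runs, which is roughly the situation the "lazy binding is off" remark above refers to. Either way the call stays indirect, which is the cost the PLT-patching idea was trying to remove.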