From: Rich Felker
Date: Mon, 25 May 2015 02:57:56 -0400
To: musl@lists.openwall.com
Subject: Re: ppc soft-float regression
Message-ID: <20150525065756.GR17573@brightrain.aerifal.cx>
In-Reply-To: <1432535489.2715.1.camel@inria.fr>

On Mon, May 25, 2015 at 08:31:29AM +0200, Jens Gustedt wrote:
> Am Sonntag, den 24.05.2015, 20:36 -0400 schrieb Rich Felker:
> > There's a simple alternative I just came up with though: have
> > dlstart.c compute the number of REL entries that need their addends
> > saved and allocate a VLA on its stack for stages 2 and 3 to use.
> > While the number of addends could be significant, it's many orders
> > of magnitude smaller than the smallest practical stack sizes we
> > could actually run with, so it's perfectly safe to put it on the
> > stack.
> > 
> > Here are the basic changes I'd like to make to dlstart.c to
> > implement this:
> > 
> > 1. Remove processing of the DT_JMPREL REL/RELA table. All entries
> >    in this table are necessarily JMP_SLOT type, and thus symbolic,
> >    so there's nothing stage 1 can do with them anyway. Also, being
> >    JMP_SLOT type, they have implicit addends of zero if they're
> >    REL-type, so there's no need to save addends.
> > 
> > 2. Remove the loop in dlstart.c that works like a fake function
> >    call 3 times to process DT_JMPREL, DT_REL, and DT_RELA. Instead
> >    we just need 2 iterations, and now the stride is constant in
> >    each, so they should simplify down a lot more inline.
> > 
> > 3. During the loop that processes DT_REL, count the number of
> >    non-relative relocations (ones we skip at this stage), then
> >    make a VLA this size and pass its address to __dls2 as a second
> >    argument.
> > 
> > 4. Have the do_relocs in stage 2 save addends in this provided
> >    array before overwriting them, and save its address for use by
> >    stage 3.
> > 
> > 5. Have the do_relocs in stage 3 (for ldso/libc only) pull addends
> >    from this array instead of from inline.
> > 
> > Steps 1 and 2 are purely code removal/simplification and should be
> > done regardless of whether we move forward on the above program, I
> > think. Steps 3-5 add some complexity but hardly any code, just a
> > few lines here and there.
> > 
> > Comments?
> 
> I like it.
> 
> The thing that is a bit critical here is the VLA.
> Not because it is a VLA as such, but because it is a dynamic
> allocation on the stack. We already have a similar strategy in
> pthread_create for TLS. The difference is that there we have
> 
> - a sanity check
> - an alternative strategy if the sanity check fails
> 
> Would there be a possibility to have both here, too?

I'm not sure what you're referring to in pthread_create. It has no
VLA. Maybe you mean the choice of whether to put TLS in a
caller-provided thread stack, but I don't think that's an analogous
situation. In that case, the provided stack size is known and TLS is
arbitrarily large (under the control of the app and loaded libs). Here
(dlstart), the stack size is not known and the size of the addend
table is not specifically known, but it's fixed at ld-time for libc.so
and the same for every run, and it's orders of magnitude smaller than
any usable stack (e.g. 14 entries on i386).

There are two potential failure modes here:

1. We run out of stack because RLIMIT_STACK is insanely small. In that
   case we'll quickly crash somewhere else. If there's a risk of
   getting off the stack into other memory, that risk would already
   exist in other places.

2. We run out of stack because the REL table is HUGE. This is a static
   condition for the libc.so ELF file and would not change from run to
   run. If something went wrong in the build process to cause this, it
   needs to be fixed.

So I'm skeptical of the need for a fallback. If we do want/need a
fallback, it would need to be of the following form:

1. Count addends that need to be saved.

2. If cnt