From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/1623 Path: news.gmane.org!not-for-mail From: musl Newsgroups: gmane.linux.lib.musl.general Subject: Re: ldso : dladdr support Date: Fri, 17 Aug 2012 18:35:48 +0200 Message-ID: <502E72E4.7050008@gmail.com> References: <5020DA13.6080803@gmail.com> <20120807114627.GG30810@port70.net> <50212306.6070402@gmail.com> <20120807230933.GC27715@brightrain.aerifal.cx> <502237A1.1000805@gmail.com> <20120808115202.GL30810@port70.net> <5022703B.3090105@gmail.com> <20120811230536.GQ27715@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Trace: ger.gmane.org 1345221366 4690 80.91.229.3 (17 Aug 2012 16:36:06 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Fri, 17 Aug 2012 16:36:06 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-1624-gllmg-musl=m.gmane.org@lists.openwall.com Fri Aug 17 18:36:05 2012 Return-path: Envelope-to: gllmg-musl@plane.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1T2PWm-0006sX-Vw for gllmg-musl@plane.gmane.org; Fri, 17 Aug 2012 18:36:05 +0200 Original-Received: (qmail 1959 invoked by uid 550); 17 Aug 2012 16:36:03 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 1951 invoked from network); 17 Aug 2012 16:36:03 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject:references :in-reply-to:content-type:content-transfer-encoding; bh=z5XKUPH0HZVdlxB1+5Ci+G0nBizCOH5TNHNhMrs//io=; b=T06DoaQGHyBHohQMmctXiNTDLNL1VWsbscT3xdW70c1jXhz0xd6af9vFyuPRSwHJg9 SrxPZF5AsWuoVQX4wP+90hQ/qzL0x7RTAK7peIY7o/Q6nAx3RbXhPUEEplAerqwBIviC /eEWQy2XGSCzAHM96+Ko5DNTCp4h69hyAsaM9S7e2uJgL6Ch+mBDU9xtbCYkHF3v2FKL zgRMfqcQ109xrnzEjFiyWVWD6NFn5r3p6Y0RtUhOTwuuddIR4gzaCqaCTJWdIltkwg7U z89oLl765eYI9NrtffJcCubEjyWvEcLowLEfwhnpwEYMBkOR1Tx7PyUbMGU2KbrB5drl TXVA== User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 In-Reply-To: <20120811230536.GQ27715@brightrain.aerifal.cx> Xref: news.gmane.org gmane.linux.lib.musl.general:1623 Archived-At: On 12/08/2012 01:05, Rich Felker wrote: > On Wed, Aug 08, 2012 at 03:57:15PM +0200, musl wrote: >> Same as before except I use bit mask to support multiple hash algorithms. > Sorry for taking a while to get back to you. I haven't had as much > time to work on musl the past couple weeks and some other topics (like > mips dynamic linking) had priority, but I hope to have more again for > a while now. Here's a quick review of some things that will hopefully > turn into a discussion for improving/simplifying the code. > >> +#ifdef _GNU_SOURCE >> +typedef struct { >> + const char *dli_fname; /* Pathname of shared object that >> + contains address */ >> + void *dli_fbase; /* Address at which shared object >> + is loaded */ >> + const char *dli_sname; /* Name of nearest symbol with address >> + lower than addr */ >> + void *dli_saddr; /* Exact address of symbol named >> + in dli_sname */ >> +} Dl_info; > musl policy is not to have commentary, especially copied from > third-party sources, in the public headers. This is partly to > strengthen the claim that all public headers are public domain > (contain no copyrightable content) and partly just to avoid size creep > and wasted parsing time by the compiler. > >> static void decode_dyn(struct dso *p) >> { >> - size_t dyn[DYN_CNT] = {0}; >> - decode_vec(p->dynv, dyn, DYN_CNT); >> - p->syms = (void *)(p->base + dyn[DT_SYMTAB]); >> - p->hashtab = (void *)(p->base + dyn[DT_HASH]); >> - p->strings = (void *)(p->base + dyn[DT_STRTAB]); >> + size_t *v; >> + p->hashalgs = 0; >> + for (v = p->dynv; v[0]; v+=2) { >> + switch (v[0]) { >> + case DT_SYMTAB: >> + p->syms = (void *)(p->base + v[1]); >> + break; >> + case DT_HASH: >> + p->hashtabs[SYSV_HASH] = (void *)(p->base + v[1]); >> + p->hashalgs |= (1 << SYSV_HASH); >> + break; >> + case DT_STRTAB: >> + p->strings = (void *)(p->base + v[1]); >> + break; >> + case DT_GNU_HASH: >> + p->hashtabs[GNU_HASH] = (void *)(p->base + v[1]); >> + p->hashalgs |= (1 << GNU_HASH); >> + break; >> + default: >> + break; >> + } >> + } >> } > This is rather ugly but I don't see a better way right off, since > DT_GNU_HASH has that huge value... Maybe it would be nice to improve > decode_vec to have a variant that takes a (static const) table of DT_x > values and struct offsets to store them at when found..? This is just > some rambling and I'm not sure it's better, but it might be smart if > we're going to want to continue adding support for more > non-original-sysv DT_* entries with huge values, so we don't have to > write new switches for each one. I've done another implem of the decode_vec function : static void decode_vec(size_t *v, const size_t *def, size_t *storage, size_t *found, size_t size) { size_t i; memset(storage, 0, size*sizeof(size_t)); memset(found, 0, (size+sizeof(size_t)*8-1)/8); for (; v[0]; v+=2) { if (v[0] < def[0] || v[0] > def[size-1]) continue; for (i = 0; i < size; ++i) { if (v[0] < def[i]) break; else if (v[0] == def[i]) { storage[i] = v[1]; found[i/(sizeof(size_t)*8)] |= (1 << (i%(sizeof(size_t)*8))); break; } } } } The def table is a table of ordered tag values (DT_x, AT_x, ...) values. When a tag is found in v the value is stored in the storage table and a flag is set in the found table. This is the modified decode_dyn function: static void decode_dyn(struct dso *p) { static const size_t def[] = {DT_HASH, DT_STRTAB, DT_SYMTAB, DT_GNU_HASH}; size_t dyn[4]; size_t found; decode_vec (p->dynv, def, dyn, &found, 4); p->syms = (void *)(p->base + dyn[2]); p->strings = (void *)(p->base + dyn[1]); if (found&(1<<0)) p->hashtab = (void *)(p->base + dyn[0]); if (found&(1<<3)) p->ghashtab = (void *)(p->base + dyn[3]); } I'm not sure it's the best way to support big DT_x values as it implies an iteration over the def table for each entry in the vector table. In the previous example, if the dyn vector contains multiple tags with values between DT_SYMTAB and DT_GNU_HASH, the decode_vec func iterates over the first 3 entries of the def table for each of these tags. > BTW, while I _want_ it to be safe, it's possible that early switches > (early meaning prior to the comment in __dynlink that says libc is now > fully functional) will actually fail to work/crash on some archs... So > this needs consideration too. > >> static struct dso *load_library(const char *name) >> @@ -784,8 +899,11 @@ end: >> static void *do_dlsym(struct dso *p, const char *s, void *ra) >> { >> [...] >> if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES)) >> return p->base + sym->st_value; >> if (p->deps) for (i=0; p->deps[i]; i++) { >> - sym = lookup(s, h, p->deps[i]); >> + algs = p->deps[i]->hashalgs; >> + if (!(algs & ok)) { >> + if (algs & SYSV_HASH) { >> + h[SYSV_HASH] = sysv_hash(s); >> + sym = sysv_lookup(s, h[SYSV_HASH], p->deps[i]); >> + ok |= SYSV_HASH; >> + } else { >> + h[GNU_HASH] = gnu_hash(s); >> + sym = gnu_lookup(s, h[GNU_HASH], p->deps[i]); >> + ok |= GNU_HASH; >> + } >> + } else { >> + if (algs & SYSV_HASH) >> + sym = sysv_lookup(s, h[SYSV_HASH], p->deps[i]); >> + else >> + sym = gnu_lookup(s, h[GNU_HASH], p->deps[i]); >> + } > This looks like a lot of code duplication and extra unnecessary > variables. The way I would do it is something like: > > if (p->deps[i]->hashtab && (h || !p->deps[i]->ghashtab)) { > if (!h) h = hash(s); > sym = sysv_lookup(s, h, p->deps[i]); > } > > i.e. if there's a sysv hash table and we've already computed h (sysv > hash) or if there's no gnu hash table, compute h if it wasn't already > computed, and then attempt a lookup with it. > > I'm not sure I got the logic all right (this is really a 1-minute > glance over the code right now amidst doing lots of other stuff too) > but the ideas are: > > - no need for extra vars for bitmask. Whether the hash var for the > corresponding hash type is nonzero is sufficient to tell whether > it's been computed. > - no need for extra vars/fields to store which hash types a dso has. > Just use the hashtab/ghashtab fields in the dso struct, and let them > be null if the corresponding hash table does not exist. (And don't > make them an array unless there's a real benefit in using an array; > I don't think there is any benefit unless you're aiming for > extensibility to support N hash types.) > >> +static int do_dladdr (void *addr, Dl_info *info) >> [...] >> + if (p->hashalgs & (1 << SYSV_HASH)) { >> + hashtab = p->hashtabs[SYSV_HASH]; >> + for (i = 0; i < hashtab[1]; i++) { > I'm not seeing why this function needs hash tables at all. It's not > looking up symbols, just iterating over the entire symbol table, no? > Please explain if I'm mistaken. > > Rich