From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/1656 Path: news.gmane.org!not-for-mail From: musl Newsgroups: gmane.linux.lib.musl.general Subject: Re: ldso : dladdr support Date: Mon, 20 Aug 2012 14:55:04 +0200 Message-ID: <503233A8.8000604@gmail.com> References: <20120807114627.GG30810@port70.net> <50212306.6070402@gmail.com> <20120807230933.GC27715@brightrain.aerifal.cx> <502237A1.1000805@gmail.com> <20120808115202.GL30810@port70.net> <5022703B.3090105@gmail.com> <20120811230536.GQ27715@brightrain.aerifal.cx> <20120817053934.GS27715@brightrain.aerifal.cx> <50311776.9040802@gmail.com> <20120820020626.GD27715@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------000205030204000700070502" X-Trace: ger.gmane.org 1345467323 12165 80.91.229.3 (20 Aug 2012 12:55:23 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 20 Aug 2012 12:55:23 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-1657-gllmg-musl=m.gmane.org@lists.openwall.com Mon Aug 20 14:55:24 2012 Return-path: Envelope-to: gllmg-musl@plane.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1T3RVq-0007xQ-4y for gllmg-musl@plane.gmane.org; Mon, 20 Aug 2012 14:55:22 +0200 Original-Received: (qmail 20470 invoked by uid 550); 20 Aug 2012 12:55:19 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 20458 invoked from network); 20 Aug 2012 12:55:19 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:subject:references :in-reply-to:content-type; bh=YsSTUjspMSTPe/nUdoxVx/7IGNnxKteJ2ETBdc/ZQ9Q=; b=Li7uQ5omPPA3ZpShfvow3BE24cBGMZb4p6x3wGuaUeq71XVEI8QBNb7JyRGOFlxB/I eVgM2k3FdkLvVFM4hMF8NGC7wSrjy6+HKQE37gOA3XEFzFbgMwFS9PWRWVXN9C0zzyK7 JeENmyr4KJ8al9NpKTeRJKcf9qok2phqaovqJ7R9khm00oMXm2ekYw1RwBhtl1ifgDpo CzPqqHU3Z2+y8ly2i6wIr2+SbQxbPFi52HEYWDVIrTWHUKVFDXnGap6IXxX5+63V1wBj YsuKjK3n9gpob3e0TAOn9O396w+q5X0zhIzbfbKyoNTD8ZuOtjwL4J126PetenlXQbE3 vhkA== User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 In-Reply-To: <20120820020626.GD27715@brightrain.aerifal.cx> Xref: news.gmane.org gmane.linux.lib.musl.general:1656 Archived-At: This is a multi-part message in MIME format. --------------000205030204000700070502 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Hi, This patch adds: * decode_vec2 implemented for high DT_x values. * find_closest_sym code integrated in do_dladdr function. * other fixes you asked in your previous mail. Regards, Boris. On 20/08/2012 04:06, Rich Felker wrote: > On Sun, Aug 19, 2012 at 06:42:30PM +0200, musl wrote: >> Hi, >> >> This patch fixes a bug in dladdr: sym var was not incremented across gnu hash chain iteration). >> I also reworked the dladdr implem to share more code between sysv and gnu hash. >> I still haven't found a better way to get the symbol table size. Do you? >> >> This patch uses the new decode_vec function, but as I told you in my >> previous mail, I'm not sure this the way to go. >> Could you tell me what you think? > Yeah, I'm not really happy with it either. Trying to think of > something better... > >> diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c >> index f55c6f1..bf1ec6b 100644 >> --- a/src/ldso/dynlink.c >> +++ b/src/ldso/dynlink.c >> @@ -1,3 +1,4 @@ >> +#define _GNU_SOURCE >> #include >> #include >> #include >> @@ -28,12 +29,14 @@ typedef Elf32_Phdr Phdr; >> typedef Elf32_Sym Sym; >> #define R_TYPE(x) ((x)&255) >> #define R_SYM(x) ((x)>>8) >> +#define ELF_ST_TYPE ELF32_ST_TYPE >> #else >> typedef Elf64_Ehdr Ehdr; >> typedef Elf64_Phdr Phdr; >> typedef Elf64_Sym Sym; >> #define R_TYPE(x) ((x)&0xffffffff) >> #define R_SYM(x) ((x)>>32) >> +#define ELF_ST_TYPE ELF64_ST_TYPE >> #endif > These definitions are actually the same. I would just > #define ST_TYPE(x) ((x)&15) > >> -static uint32_t hash(const char *s0) >> +static uint32_t sysv_hash(const char *s0) >> { >> const unsigned char *s = (void *)s0; >> uint_fast32_t h = 0; >> @@ -105,7 +117,16 @@ static uint32_t hash(const char *s0) >> return h & 0xfffffff; >> } >> >> -static Sym *lookup(const char *s, uint32_t h, struct dso *dso) >> +static uint32_t gnu_hash (const char *s0) >> +{ >> + const unsigned char *s = (void *)s0; >> + uint_fast32_t h = 5381; >> + for (; *s; s++) >> + h = h*33 + *s; >> + return h & 0xffffffff; >> +} > The final &0xffffffff is a no-op. Note that the one in sysv_hash is > not a no-op; sysv_hash's result is 28 bits, not 32. > > Re-reading this code also raised another issue: I'm not entirely > convinced that 0 is not a possible hash value, which may invalidate > what I said before about using h==0 to indicate "not yet computed". Of > course, it may not matter; if one in 4 billion symbol names get their > hashes repeatedly recomputed rather than being reused, it's not going > to make any difference to overall performance... > >> /* Only trust user/env if kernel says we're not suid/sgid */ >> - if ((aux[0]&0x7800)!=0x7800 || aux[AT_UID]!=aux[AT_EUID] >> - || aux[AT_GID]!=aux[AT_EGID] || aux[AT_SECURE]) { >> + if ((found&0x1e0)!=0x1e0 || aux[5]!=aux[6] >> + || aux[7]!=aux[8] || aux[9]) { >> env_path = 0; >> env_preload = 0; > Looking at this, I agree that the new decode_vec idea is not a good > direction. It's obfuscating the code badly. > > For now, how about leaving the old decode_vec alone and just adding a > new one with a different name for getting to "high" entries. I wonder > if it would be possible, rather than using a list of wanted entries, > to use a base/count rather than always working zero-based like > decode_vec does. This would allow the resulting indices to still > actually mean something so we don't wind up with magic numbers all > over the code.. The decode_vec uses the first entry in 'a' to store the tags found in the given vector. If cnt is bigger than sizeof(size_t) * 8, there is a shift overflow. It's fine in the current use as you only test tag values < 32 (or don't use decode_vec to test it : AT_SYSINFO_EHDR). Should I take care of those cases in decode_vec2 (add a separate 'found' table argument)? > > Rich --------------000205030204000700070502 Content-Type: text/x-patch; name="dladdr-gnu-hash-v4.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="dladdr-gnu-hash-v4.patch" diff --git a/include/dlfcn.h b/include/dlfcn.h index dea74c7..8524e0b 100644 --- a/include/dlfcn.h +++ b/include/dlfcn.h @@ -18,6 +18,17 @@ char *dlerror(void); void *dlopen(const char *, int); void *dlsym(void *, const char *); +#ifdef _GNU_SOURCE +typedef struct { + const char *dli_fname; + void *dli_fbase; + const char *dli_sname; + void *dli_saddr; +} Dl_info; + +int dladdr (void *addr, Dl_info *info); +#endif + #ifdef __cplusplus } #endif diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c index 9692c6b..90272c5 100644 --- a/src/ldso/dynlink.c +++ b/src/ldso/dynlink.c @@ -1,3 +1,4 @@ +#define _GNU_SOURCE #include #include #include @@ -35,6 +36,7 @@ typedef Elf64_Sym Sym; #define R_TYPE(x) ((x)&0xffffffff) #define R_SYM(x) ((x)>>32) #endif +#define ST_TYPE(x) ((x)&15) struct debug { int ver; @@ -53,6 +55,7 @@ struct dso { int refcnt; Sym *syms; uint32_t *hashtab; + uint32_t *ghashtab; char *strings; unsigned char *map; size_t map_len; @@ -85,6 +88,7 @@ struct debug *_dl_debug_addr = &debug; #define AUX_CNT 24 #define DYN_CNT 34 +#define DYN_ADDR_CNT 12 static void decode_vec(size_t *v, size_t *a, size_t cnt) { @@ -95,7 +99,16 @@ static void decode_vec(size_t *v, size_t *a, size_t cnt) } } -static uint32_t hash(const char *s0) +static void decode_vec2(size_t *v, size_t *a, size_t cnt, size_t offset) +{ + memset(a, 0, cnt*sizeof(size_t)); + for (; v[0]; v+=2) if (v[0](offset-cnt)) { + a[0] |= 1ULL<<(offset-v[0]); + a[offset-v[0]] = v[1]; + } +} + +static uint32_t sysv_hash(const char *s0) { const unsigned char *s = (void *)s0; uint_fast32_t h = 0; @@ -106,7 +119,16 @@ static uint32_t hash(const char *s0) return h & 0xfffffff; } -static Sym *lookup(const char *s, uint32_t h, struct dso *dso) +static uint32_t gnu_hash (const char *s0) +{ + const unsigned char *s = (void *)s0; + uint_fast32_t h = 5381; + for (; *s; s++) + h = h*33 + *s; + return h; +} + +static Sym *sysv_lookup(const char *s, uint32_t h, struct dso *dso) { size_t i; Sym *syms = dso->syms; @@ -119,20 +141,81 @@ static Sym *lookup(const char *s, uint32_t h, struct dso *dso) return 0; } +static Sym *gnu_lookup(const char *s, uint32_t h1, struct dso *dso) +{ + size_t i; + Sym *sym; + char *strings = dso->strings; + uint32_t *hashtab = dso->ghashtab; + uint32_t nbuckets = hashtab[0]; + size_t *maskwords = (size_t *)(hashtab + 4); + uint32_t *buckets = hashtab + 4 + (hashtab[2]*(sizeof(size_t)/sizeof(uint32_t))); + uint32_t symndx = hashtab[1]; + Sym *syms = dso->syms; + uint32_t shift2 = hashtab[3]; + uint32_t h2 = h1 >> shift2; + uint32_t *hashvals = buckets + nbuckets; + uint32_t *hashval; + size_t c = sizeof(size_t) * 8; + size_t n = (h1/c) & (hashtab[2]-1); + size_t bitmask = (1 << (h1%c)) | (1 << (h2%c)); + + if ((maskwords[n] & bitmask) != bitmask) + return 0; + + n = buckets[h1 % nbuckets]; + if (!n) + return 0; + + sym = syms + n; + hashval = hashvals + n - symndx; + + for (h1 &= (uint32_t)-2;; sym++) { + h2 = *hashval++; + if ((h1 == (h2 & ~1)) && !strcmp(s, strings + sym->st_name)) + return sym; + + if (h2 & 1) + break; + } + + return 0; +} + #define OK_TYPES (1<hashtab) { + h = sysv_hash(s); + precomptab = precomp[0]; + } else { + gh = gnu_hash(s); + precomptab = precomp[0]; + } + if (h == precomptab[0] && !strcmp(s, "dlopen")) rtld_used = 1; + if (h == precomptab[1] && !strcmp(s, "dlsym")) rtld_used = 1; + if (h == precomptab[2] && !strcmp(s, "__stack_chk_fail")) ssp_used = 1; for (; dso; dso=dso->next) { Sym *sym; if (!dso->global) continue; - sym = lookup(s, h, dso); + if (dso->hashtab && (h || !dso->ghashtab)) { + if (!h) + h = sysv_hash(s); + sym = sysv_lookup(s, h, dso); + } else { + if (!gh) + gh = gnu_hash(s); + sym = gnu_lookup(s, gh, dso); + } if (sym && (!need_def || sym->st_shndx) && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES) && (1<<(sym->st_info>>4) & OK_BINDS)) { @@ -325,8 +408,12 @@ static void decode_dyn(struct dso *p) size_t dyn[DYN_CNT] = {0}; decode_vec(p->dynv, dyn, DYN_CNT); p->syms = (void *)(p->base + dyn[DT_SYMTAB]); - p->hashtab = (void *)(p->base + dyn[DT_HASH]); p->strings = (void *)(p->base + dyn[DT_STRTAB]); + if (dyn[0]&(1<hashtab = (void *)(p->base + dyn[DT_HASH]); + decode_vec2(p->dynv, dyn, DYN_ADDR_CNT, DT_ADDRRNGHI+1); + if (dyn[0]&(1<<(DT_ADDRTAGIDX(DT_GNU_HASH)+1))) + p->ghashtab = (void *)(p->base + dyn[DT_ADDRTAGIDX(DT_GNU_HASH)+1]); } static struct dso *load_library(const char *name) @@ -788,7 +875,7 @@ end: static void *do_dlsym(struct dso *p, const char *s, void *ra) { size_t i; - uint32_t h; + uint32_t h = 0, gh = 0; Sym *sym; if (p == RTLD_NEXT) { for (p=head; p && (unsigned char *)ra-p->map>p->map_len; p=p->next); @@ -802,12 +889,25 @@ static void *do_dlsym(struct dso *p, const char *s, void *ra) if (!res) goto failed; return res; } - h = hash(s); - sym = lookup(s, h, p); + if (p->hashtab) { + h = sysv_hash(s); + sym = sysv_lookup(s, h, p); + } else { + gh = gnu_hash(s); + sym = gnu_lookup(s, gh, p); + } if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES)) return p->base + sym->st_value; if (p->deps) for (i=0; p->deps[i]; i++) { - sym = lookup(s, h, p->deps[i]); + if (p->deps[i]->hashtab && (h || !p->deps[i]->ghashtab)) { + if (!h) + h = sysv_hash(s); + sym = sysv_lookup(s, h, p->deps[i]); + } else { + if (!gh) + gh = gnu_hash(s); + sym = gnu_lookup(s, h, p->deps[i]); + } if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES)) return p->deps[i]->base + sym->st_value; } @@ -817,6 +917,69 @@ failed: return 0; } +static int do_dladdr (void *addr, Dl_info *info) +{ + struct dso *p; + memset (info, 0, sizeof (*info)); + for (p=head; p; p=p->next) { + if ((unsigned char *)addr >= p->map && (unsigned char *)addr < p->map + p->map_len) { + Sym *syms = p->syms; + uint32_t nsym = 0; + char *strings = p->strings; + size_t i; + info->dli_fname = p->name; + info->dli_fbase = p->base; + if (p->hashtab) + nsym = p->hashtab[1]; + else { + uint32_t *buckets; + buckets = p->ghashtab + 4 + (p->ghashtab[2] * (sizeof(size_t)/sizeof(uint32_t))); + syms += p->ghashtab[1]; + for (i = 0; i < p->ghashtab[0]; ++i) { + if (buckets[i] > nsym) + nsym = buckets[i]; + } + + if (nsym) { + nsym -= p->ghashtab[1]; + uint32_t *hashval = buckets + p->ghashtab[0] + nsym; + do { + nsym++; + }while (!(*hashval++ & 1)); + } + + } + + for (i = 0; i < nsym; i++) { + if ((syms[i].st_value != 0 || syms[i].st_shndx != SHN_UNDEF) && + ST_TYPE(syms[i].st_info) != STT_TLS) { + void *symaddr = p->base + syms[i].st_value; + + if (addr >= symaddr && (!info->dli_saddr || (addr-symaddr)<(addr-info->dli_saddr))) { + info->dli_saddr = symaddr; + info->dli_sname = strings + syms[i].st_name; + + if (addr == symaddr) + break; + } + } + } + + return 1; + } + } + return 0; +} + +int dladdr (void *addr, Dl_info *info) +{ + int res; + pthread_rwlock_rdlock(&lock); + res = do_dladdr (addr, info); + pthread_rwlock_unlock(&lock); + return res; +} + void *__dlsym(void *p, const char *s, void *ra) { void *res; --------------000205030204000700070502--