From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/1458 Path: news.gmane.org!not-for-mail From: musl Newsgroups: gmane.linux.lib.musl.general Subject: Re: ldso : dladdr support Date: Wed, 08 Aug 2012 11:55:45 +0200 Message-ID: <502237A1.1000805@gmail.com> References: <5020DA13.6080803@gmail.com> <20120807114627.GG30810@port70.net> <50212306.6070402@gmail.com> <20120807230933.GC27715@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------000201080105040506020208" X-Trace: dough.gmane.org 1344419764 28743 80.91.229.3 (8 Aug 2012 09:56:04 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Wed, 8 Aug 2012 09:56:04 +0000 (UTC) Cc: Rich Felker To: musl@lists.openwall.com Original-X-From: musl-return-1459-gllmg-musl=m.gmane.org@lists.openwall.com Wed Aug 08 11:56:01 2012 Return-path: Envelope-to: gllmg-musl@plane.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1Sz2zh-00007y-7E for gllmg-musl@plane.gmane.org; Wed, 08 Aug 2012 11:56:01 +0200 Original-Received: (qmail 25777 invoked by uid 550); 8 Aug 2012 09:56:00 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 25769 invoked from network); 8 Aug 2012 09:55:59 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type; bh=5tS9boIB79Ja0OhYWsPRVKW/kI8ACqPbY88lgJlt8Oo=; b=uWQrpEbpz3Ww0SM8ifZoUY02hqb5VBbpYNNyOi/t0JRbNz+75Dl31L5DLWkdrKxW9f 98ghAvwxL96j1MnJf8pLo1+EYGgDwOVvShmZqpLZQWbcLq8l6IstIp/oZ98c76tDQeRc c6iSHMtyeOeGCSzeFNAh8W0JO/06Y7OFNxFVziflNLMxCJzWCo3XQxf4k/P6BNlVhWsS qOkCaj2GEJWu72B+Fp/70bOaeE9D+jZqWrzEPmQy2wOpUx3MrL0vDMyz9BDkV2hYCyzu nVhOJEhAWwtpQov8s+3rH5TPWxdv+Bh4Cra5JhFNF9BUGpcsDbrMXDE8DRuG/g0TyFy5 gAqg== User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 In-Reply-To: <20120807230933.GC27715@brightrain.aerifal.cx> Xref: news.gmane.org gmane.linux.lib.musl.general:1458 Archived-At: This is a multi-part message in MIME format. --------------000201080105040506020208 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Here is the new patch for dladdr + gnu hash support. Please tell me if you find coding style errors. On 08/08/2012 01:09, Rich Felker wrote: > On Tue, Aug 07, 2012 at 04:15:34PM +0200, musl wrote: >>> actually i'm not sure if the function pointer approach is >>> the right one here, but that's a design question >>> >> Could you tell me why (performance)? > Some reasons I would avoid gratuitous tables of function pointer > tables for things like this: > > 1. In shared libraries, they have to be relocated at load time, which > adds to startup time cost. A single table doesn't make much > difference, but the _policy_ of avoiding them in general can make a > difference when there are many places they might have been used. > > 2. Likewise, since they're non-constant, they have to be in the data > segment, which increases the size of the library's data segment. > Same "single table vs policy" point as in #1 applies. > > 3. Since they're non-constant, they're nice targets for an attacker > looking to perform code execution exploits. I don't like > artificially restricting our use of the language just because > certain constructs could help an attacker exploit an > ALREADY-vulnerable program, but I also think it's prudent to avoid > gratuitous uses like this that serve no purpose and potentially > contribute to exploitability. > I reworked the implementation to remove all function pointers. > >>>> + uint32_t *hashtab = dso->hashtab; >>>> + uint32_t *buckets = hashtab + 4 + (hashtab[2] * (sizeof(size_t) / sizeof(uint32_t))); >>> i don't know the algorithm but the sizeof magic looks suspicious to me >> I took the algorithm described here : >> >> https://blogs.oracle.com/ali/entry/gnu_hash_elf_sections >> >> The maskwords are native word size (64 bits for elf64 and 32 bits for elf32). >> I could define these macros: >> #define MASKWORD_SIZE sizeof (Elf32_Word) >> #define MASKWORD_SIZE sizeof (Elf64_Word) > In that case, we use size_t, not the ElfXX_Foo mess which only makes > sense when writing generic tools for dealing with ELF files from > arbitrary archs, not when handling native ELF structures. > >>>> +static uint32_t gnu_hash (const char *s0) >>>> +{ >>>> + const unsigned char *s = (void *)s0; >>>> + uint_fast32_t h = 5381; >>>> + for (unsigned char c = *s; c != '\0'; c = *++s) >>>> + h = h * 33 + c; >>>> + return h & 0xffffffff; >>>> +} >>>> + >>> i think c != '\0' is not very idiomatic >>> and i'd avoid using c99 style declaration in for >>> >>> use >>> >>> for (; *s; s++) >>> h = 33*h + *s; >>> >> Corrected. > Indeed, that's a lot cleaner. > >>>> + uint32_t shift2 = hashtab[3]; >>>> + uint32_t h2 = h1 >> shift2; >>> i'm not sure if input validation makes sense in ldso >>> but shifts can be tricky (hashtab[3] must be in 0..31 here) >> By validation do you mean mask the shift value with 0x1F ? > There are millions of ways an invalid/corrupt ELF file can cause the > dynamic linker/calling program to invoke UB. I don't see much value in > checking for them. Such invalid files will only exist if created > intentionally or if your storage medium is corrupted. There is no > "safe" interface corresponding to dlopen by which you can try opening > a DSO and sanity-checking it before accessing any code or data from it > (e.g. the ctors run before dlopen returns!) so the only way to handle > evil ELF files is to validate the image on disk _before_ calling > dlopen, and opening it with an absolute pathname to a location under > your control. > >>>> + for (h1 &= ~1; 1; sym++) { >>> the 1; in the middle is unnecessary >>> >>> h1 &= ~1; is problematic if signed int representation is >>> not two's complement >>> (~1 is an implementation defined negative value which is > It's not implementation-defined. It's defined as operating on the bit > representation, and once the signed representation is chosen among the > 3 allowed, the implementation is forced to perform ~ accordingly. > >>> then converted to uint32_t according to well defined rules) > musl will never deal with non-twos-complement implementations (AFAIK, > no non-twos-complement conformant C implementation has ever been > created) so it's not a big deal, but I agree it would be cleaner to > avoid the issue. Personally I try to avoid ever using bitwise > operators on signed types. > >>> i think we can have more efficient code here if >>> only gnu and sysv hash algos are supported >>> (i don't think it's realistic to assume new hash algorithms >>> even gnu hash is a fairly useless addition to elf) >> This has to do with the function pointer approach. >> Do you prefer if/else alternative? > Yes, I prefer if/else. > > Also, we had a conversation on the list and/or IRC a while back (I > don't remember which) where I described some optimizations that should > be present if gnu hash is to be supported. Basically, the dynamic > linker should keep track of whether there's one of the two hashes > that's supported by ALL loaded libs, and in that case, only ever use > that hash, to avoid computing two different hashes. The hash for given algo is only computed once (if needed). That's the reason for the computed table. If all libs uses the same hash algo, the other will never be computed. > Another issue that needs to be fixed before gnu hash support can be > committed is that you removed the hash comparisons for "dlopen", > "dlsym", and "__stack_chk_fail". Incuring a strcmp against these 3 > strings for every single symbol lookup seems extremely costly; that's > why I did the hash comparisons in the first place. Keeping track of > them is important so we can avoid the overhead of supporting these > functions when there's no way the application can ever attempt to use > them, but we also want to minimize the startup overhead of making this > determination. I added the precomp table wich includes both gnu and sysv hashes for the dlopen, dlsym and __stack_chk_fail symbols. > > Rich --------------000201080105040506020208 Content-Type: text/x-patch; name="ldso-add-dladdr-and-gnu-hash-support.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="ldso-add-dladdr-and-gnu-hash-support.patch" >From 4a809b89c38ca3a5b69e5758a04effd6d404039c Mon Sep 17 00:00:00 2001 From: Boris BREZILLON Date: Wed, 8 Aug 2012 11:45:31 +0200 Subject: [PATCH] ldso : add dladdr and gnu hash support --- include/dlfcn.h | 15 ++++ src/ldso/dynlink.c | 231 +++++++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 235 insertions(+), 11 deletions(-) diff --git a/include/dlfcn.h b/include/dlfcn.h index dea74c7..8c45822 100644 --- a/include/dlfcn.h +++ b/include/dlfcn.h @@ -18,6 +18,21 @@ char *dlerror(void); void *dlopen(const char *, int); void *dlsym(void *, const char *); +#ifdef _GNU_SOURCE +typedef struct { + const char *dli_fname; /* Pathname of shared object that + contains address */ + void *dli_fbase; /* Address at which shared object + is loaded */ + const char *dli_sname; /* Name of nearest symbol with address + lower than addr */ + void *dli_saddr; /* Exact address of symbol named + in dli_sname */ +} Dl_info; + +int dladdr (void *addr, Dl_info *info); +#endif + #ifdef __cplusplus } #endif diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c index f55c6f1..f40c1e5 100644 --- a/src/ldso/dynlink.c +++ b/src/ldso/dynlink.c @@ -1,3 +1,4 @@ +#define _GNU_SOURCE #include #include #include @@ -28,12 +29,14 @@ typedef Elf32_Phdr Phdr; typedef Elf32_Sym Sym; #define R_TYPE(x) ((x)&255) #define R_SYM(x) ((x)>>8) +#define ELF_ST_TYPE ELF32_ST_TYPE #else typedef Elf64_Ehdr Ehdr; typedef Elf64_Phdr Phdr; typedef Elf64_Sym Sym; #define R_TYPE(x) ((x)&0xffffffff) #define R_SYM(x) ((x)>>32) +#define ELF_ST_TYPE ELF64_ST_TYPE #endif struct debug { @@ -52,6 +55,7 @@ struct dso { int refcnt; Sym *syms; + uint32_t hashalg; uint32_t *hashtab; char *strings; unsigned char *map; @@ -66,6 +70,10 @@ struct dso { char buf[]; }; +#define SYSV_HASH_ALG_IDX 0 +#define GNU_HASH_ALG_IDX 1 +#define HASH_ALG_CNT 2 + #include "reloc.h" void __init_ssp(size_t *); @@ -94,7 +102,7 @@ static void decode_vec(size_t *v, size_t *a, size_t cnt) } } -static uint32_t hash(const char *s0) +static uint32_t sysv_hash(const char *s0) { const unsigned char *s = (void *)s0; uint_fast32_t h = 0; @@ -105,7 +113,16 @@ static uint32_t hash(const char *s0) return h & 0xfffffff; } -static Sym *lookup(const char *s, uint32_t h, struct dso *dso) +static uint32_t gnu_hash (const char *s0) +{ + const unsigned char *s = (void *)s0; + uint_fast32_t h = 5381; + for (; *s; s++) + h = h*33 + *s; + return h & 0xffffffff; +} + +static Sym *sysv_lookup(const char *s, uint32_t h, struct dso *dso) { size_t i; Sym *syms = dso->syms; @@ -118,20 +135,94 @@ static Sym *lookup(const char *s, uint32_t h, struct dso *dso) return 0; } +static Sym *gnu_lookup(const char *s, uint32_t h1, struct dso *dso) +{ + size_t i; + Sym *sym; + char *strings = dso->strings; + uint32_t *hashtab = dso->hashtab; + uint32_t nbuckets = hashtab[0]; + size_t *maskwords = (size_t *)(hashtab + 4); + uint32_t *buckets = hashtab + 4 + (hashtab[2]*(sizeof(size_t)/sizeof(uint32_t))); + uint32_t symndx = hashtab[1]; + Sym *syms = dso->syms; + uint32_t shift2 = hashtab[3]; + uint32_t h2 = h1 >> shift2; + uint32_t *hashvals = buckets + nbuckets; + uint32_t *hashval; + size_t c = sizeof(size_t) * 8; + size_t n = (h1/c) & (hashtab[2]-1); + size_t bitmask = (1 << (h1%c)) | (1 << (h2%c)); + + if ((maskwords[n] & bitmask) != bitmask) + return 0; + + n = buckets[h1 % nbuckets]; + if (!n) + return 0; + + sym = syms + n; + hashval = hashvals + n - symndx; + + for (h1 &= (uint32_t)-2;; sym++) { + h2 = *hashval++; + if ((h1 == (h2 & ~1)) && !strcmp(s, strings + sym->st_name)) + return sym; + + if (h2 & 1) + break; + } + + return 0; +} + #define OK_TYPES (1<hashalg; + if (alg == SYSV_HASH_ALG_IDX) + h[alg] = sysv_hash(s); + else + h[alg] = gnu_hash(s); + + computed[alg] = 1; + + if (h[alg] == precomp[alg][0] && !strcmp(s, "dlopen")) rtld_used = 1; + if (h[alg] == precomp[alg][1] && !strcmp(s, "dlsym")) rtld_used = 1; + if (h[alg] == precomp[alg][2] && !strcmp(s, "__stack_chk_fail")) ssp_used = 1; + for (; dso; dso=dso->next) { Sym *sym; + if (!dso->global) continue; - sym = lookup(s, h, dso); + + alg = dso->hashalg; + if (!computed[alg]) { + if (alg == SYSV_HASH_ALG_IDX) { + h[alg] = sysv_hash(s); + sym = sysv_lookup(s, h[alg], dso); + } + else { + h[alg] = gnu_hash(s); + sym = gnu_lookup(s, h[alg], dso); + } + computed[alg] = 1; + } else { + if (alg == SYSV_HASH_ALG_IDX) + sym = sysv_lookup(s, h[alg], dso); + else + sym = gnu_lookup(s, h[alg], dso); + } + if (sym && (!need_def || sym->st_shndx) && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES) && (1<<(sym->st_info>>4) & OK_BINDS)) { @@ -320,11 +411,17 @@ static int path_open(const char *name, const char *search, char *buf, size_t buf static void decode_dyn(struct dso *p) { + size_t *v = p->dynv; size_t dyn[DYN_CNT] = {0}; decode_vec(p->dynv, dyn, DYN_CNT); p->syms = (void *)(p->base + dyn[DT_SYMTAB]); p->hashtab = (void *)(p->base + dyn[DT_HASH]); + p->hashalg = SYSV_HASH_ALG_IDX; p->strings = (void *)(p->base + dyn[DT_STRTAB]); + for (; v[0]; v+=2) if (v[0] == DT_GNU_HASH) { + p->hashtab = (void *)(p->base + v[1]); + p->hashalg = GNU_HASH_ALG_IDX; + } } static struct dso *load_library(const char *name) @@ -784,8 +881,11 @@ end: static void *do_dlsym(struct dso *p, const char *s, void *ra) { size_t i; - uint32_t h; Sym *sym; + uint8_t computed[HASH_ALG_CNT] = {0, 0}; + uint8_t alg = p->hashalg; + uint32_t h[HASH_ALG_CNT]; + if (p == RTLD_NEXT) { for (p=head; p && (unsigned char *)ra-p->map>p->map_len; p=p->next); if (!p) p=head; @@ -798,12 +898,36 @@ static void *do_dlsym(struct dso *p, const char *s, void *ra) if (!res) goto failed; return res; } - h = hash(s); - sym = lookup(s, h, p); + + if (alg == SYSV_HASH_ALG_IDX) { + h[alg] = sysv_hash(s); + sym = sysv_lookup(s, h[alg], p); + } else { + h[alg] = gnu_hash(s); + sym = gnu_lookup(s, h[alg], p); + } + computed[alg] = 1; + if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES)) return p->base + sym->st_value; if (p->deps) for (i=0; p->deps[i]; i++) { - sym = lookup(s, h, p->deps[i]); + alg = p->deps[i]->hashalg; + if (!computed[alg]) { + if (alg == SYSV_HASH_ALG_IDX) { + h[alg] = sysv_hash(s); + sym = sysv_lookup(s, h[alg], p->deps[i]); + } else { + h[alg] = gnu_hash(s); + sym = gnu_lookup(s, h[alg], p->deps[i]); + } + computed[alg] = 1; + } else { + if (alg == SYSV_HASH_ALG_IDX) + sym = sysv_lookup(s, h[alg], p->deps[i]); + else + sym = gnu_lookup(s, h[alg], p->deps[i]); + } + if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES)) return p->deps[i]->base + sym->st_value; } @@ -813,6 +937,91 @@ failed: return 0; } +struct sym_search { + void *addr; + Dl_info *info; +}; + +static int find_closest_sym (struct dso *dso, Sym *sym, struct sym_search *search) +{ + void *symaddr = dso->base + sym->st_value; + char *strings = dso->strings; + Dl_info *info = search->info; + void *addr = search->addr; + void *prevaddr = info->dli_saddr; + + if (sym->st_value == 0 && sym->st_shndx == SHN_UNDEF) + return 1; + + if (ELF_ST_TYPE(sym->st_info) == STT_TLS) + return 1; + + if (addr < symaddr) + return 1; + + if (prevaddr && (addr - symaddr) > (addr - prevaddr)) + return 1; + + info->dli_saddr = symaddr; + info->dli_sname = strings + sym->st_name; + + if (addr == symaddr) + return 0; + + return 1; + +} + +static int do_dladdr (void *addr, Dl_info *info) +{ + struct sym_search search; + struct dso *p; + memset (info, 0, sizeof (*info)); + search.info = info; + search.addr = addr; + for (p=head; p; p=p->next) { + if ((unsigned char *)addr >= p->map && (unsigned char *)addr < p->map + p->map_len) { + Sym *syms = p->syms; + uint32_t *hashtab = p->hashtab; + size_t i; + info->dli_fname = p->name; + info->dli_fbase = p->base; + if (p->hashalg == SYSV_HASH_ALG_IDX) { + for (i = 0; i < hashtab[1]; i++) { + if (!find_closest_sym (p, syms + i, &search)) + return 1; + } + } else { + uint32_t *buckets = hashtab + 4 + (hashtab[2] * (sizeof(size_t)/sizeof(uint32_t))); + uint32_t nbuckets = hashtab[0]; + uint32_t *hashvals = buckets + nbuckets; + uint32_t symndx = hashtab[1]; + for (i = 0; i < nbuckets; ++i) { + uint32_t n = buckets[i]; + Sym *sym = syms + n; + uint32_t *hashval = hashvals + n - symndx; + + do { + if (!find_closest_sym (p, sym, &search)) + return 1; + }while (!(*hashval++ & 1)); + } + } + return 1; + } + } + return 0; +} + +int dladdr (void *addr, Dl_info *info) +{ + int res; + pthread_rwlock_rdlock(&lock); + res = do_dladdr (addr, info); + pthread_rwlock_unlock(&lock); + return res; +} + void *__dlsym(void *p, const char *s, void *ra) { void *res; -- 1.7.9.5 --------------000201080105040506020208--