mailing list of musl libc
 help / color / mirror / code / Atom feed
From: musl <b.brezillon.musl@gmail.com>
To: musl@lists.openwall.com
Subject: Re: ldso : dladdr support
Date: Mon, 20 Aug 2012 14:55:04 +0200	[thread overview]
Message-ID: <503233A8.8000604@gmail.com> (raw)
In-Reply-To: <20120820020626.GD27715@brightrain.aerifal.cx>

[-- Attachment #1: Type: text/plain, Size: 3846 bytes --]

Hi,

This patch adds:
* decode_vec2 implemented for high DT_x values.
* find_closest_sym code integrated in do_dladdr function.
* other fixes you asked in your previous mail.

Regards,

Boris.


On 20/08/2012 04:06, Rich Felker wrote:
> On Sun, Aug 19, 2012 at 06:42:30PM +0200, musl wrote:
>> Hi,
>>
>> This patch fixes a bug in dladdr: sym var was not incremented across gnu hash chain iteration).
>> I also reworked the dladdr implem to share more code between sysv and gnu hash.
>> I still haven't found a better way to get the symbol table size. Do you?
>>
>> This patch uses the new decode_vec function, but as I told you in my
>> previous mail, I'm not sure this the way to go.
>> Could you tell me what you think?
> Yeah, I'm not really happy with it either. Trying to think of
> something better...
>
>> diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
>> index f55c6f1..bf1ec6b 100644
>> --- a/src/ldso/dynlink.c
>> +++ b/src/ldso/dynlink.c
>> @@ -1,3 +1,4 @@
>> +#define _GNU_SOURCE
>>  #include <stdio.h>
>>  #include <stdlib.h>
>>  #include <string.h>
>> @@ -28,12 +29,14 @@ typedef Elf32_Phdr Phdr;
>>  typedef Elf32_Sym Sym;
>>  #define R_TYPE(x) ((x)&255)
>>  #define R_SYM(x) ((x)>>8)
>> +#define ELF_ST_TYPE ELF32_ST_TYPE
>>  #else
>>  typedef Elf64_Ehdr Ehdr;
>>  typedef Elf64_Phdr Phdr;
>>  typedef Elf64_Sym Sym;
>>  #define R_TYPE(x) ((x)&0xffffffff)
>>  #define R_SYM(x) ((x)>>32)
>> +#define ELF_ST_TYPE ELF64_ST_TYPE
>>  #endif
> These definitions are actually the same. I would just
> #define ST_TYPE(x) ((x)&15)
>
>> -static uint32_t hash(const char *s0)
>> +static uint32_t sysv_hash(const char *s0)
>>  {
>>  	const unsigned char *s = (void *)s0;
>>  	uint_fast32_t h = 0;
>> @@ -105,7 +117,16 @@ static uint32_t hash(const char *s0)
>>  	return h & 0xfffffff;
>>  }
>>  
>> -static Sym *lookup(const char *s, uint32_t h, struct dso *dso)
>> +static uint32_t gnu_hash (const char *s0)
>> +{
>> +	const unsigned char *s = (void *)s0;
>> +	uint_fast32_t h = 5381;
>> +	for (; *s; s++)
>> +		h = h*33 + *s;
>> +	return h & 0xffffffff;
>> +}
> The final &0xffffffff is a no-op. Note that the one in sysv_hash is
> not a no-op; sysv_hash's result is 28 bits, not 32.
>
> Re-reading this code also raised another issue: I'm not entirely
> convinced that 0 is not a possible hash value, which may invalidate
> what I said before about using h==0 to indicate "not yet computed". Of
> course, it may not matter; if one in 4 billion symbol names get their
> hashes repeatedly recomputed rather than being reused, it's not going
> to make any difference to overall performance...
>
>>  	/* Only trust user/env if kernel says we're not suid/sgid */
>> -	if ((aux[0]&0x7800)!=0x7800 || aux[AT_UID]!=aux[AT_EUID]
>> -	  || aux[AT_GID]!=aux[AT_EGID] || aux[AT_SECURE]) {
>> +	if ((found&0x1e0)!=0x1e0 || aux[5]!=aux[6]
>> +	  || aux[7]!=aux[8] || aux[9]) {
>>  		env_path = 0;
>>  		env_preload = 0;
> Looking at this, I agree that the new decode_vec idea is not a good
> direction. It's obfuscating the code badly.
>
> For now, how about leaving the old decode_vec alone and just adding a
> new one with a different name for getting to "high" entries. I wonder
> if it would be possible, rather than using a list of wanted entries,
> to use a base/count rather than always working zero-based like
> decode_vec does. This would allow the resulting indices to still
> actually mean something so we don't wind up with magic numbers all
> over the code..
The decode_vec uses the first entry in 'a' to store the tags found in the given vector.
If cnt is bigger than sizeof(size_t) * 8, there is a shift overflow.
It's fine in the current use as you only test tag values < 32 (or don't use decode_vec to test it : AT_SYSINFO_EHDR).
Should I take care of those cases in decode_vec2 (add a separate 'found' table argument)?
>
> Rich


[-- Attachment #2: dladdr-gnu-hash-v4.patch --]
[-- Type: text/x-patch, Size: 7892 bytes --]

diff --git a/include/dlfcn.h b/include/dlfcn.h
index dea74c7..8524e0b 100644
--- a/include/dlfcn.h
+++ b/include/dlfcn.h
@@ -18,6 +18,17 @@ char  *dlerror(void);
 void  *dlopen(const char *, int);
 void  *dlsym(void *, const char *);
 
+#ifdef _GNU_SOURCE
+typedef struct {
+	const char *dli_fname;
+	void *dli_fbase;
+	const char *dli_sname;
+	void *dli_saddr;
+} Dl_info;
+
+int dladdr (void *addr, Dl_info *info);
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
index 9692c6b..90272c5 100644
--- a/src/ldso/dynlink.c
+++ b/src/ldso/dynlink.c
@@ -1,3 +1,4 @@
+#define _GNU_SOURCE
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -35,6 +36,7 @@ typedef Elf64_Sym Sym;
 #define R_TYPE(x) ((x)&0xffffffff)
 #define R_SYM(x) ((x)>>32)
 #endif
+#define ST_TYPE(x) ((x)&15)
 
 struct debug {
 	int ver;
@@ -53,6 +55,7 @@ struct dso {
 	int refcnt;
 	Sym *syms;
 	uint32_t *hashtab;
+	uint32_t *ghashtab;
 	char *strings;
 	unsigned char *map;
 	size_t map_len;
@@ -85,6 +88,7 @@ struct debug *_dl_debug_addr = &debug;
 
 #define AUX_CNT 24
 #define DYN_CNT 34
+#define DYN_ADDR_CNT 12
 
 static void decode_vec(size_t *v, size_t *a, size_t cnt)
 {
@@ -95,7 +99,16 @@ static void decode_vec(size_t *v, size_t *a, size_t cnt)
 	}
 }
 
-static uint32_t hash(const char *s0)
+static void decode_vec2(size_t *v, size_t *a, size_t cnt, size_t offset)
+{
+	memset(a, 0, cnt*sizeof(size_t));
+	for (; v[0]; v+=2) if (v[0]<offset && v[0]>(offset-cnt)) {
+		a[0] |= 1ULL<<(offset-v[0]);
+		a[offset-v[0]] = v[1];
+	}
+}
+
+static uint32_t sysv_hash(const char *s0)
 {
 	const unsigned char *s = (void *)s0;
 	uint_fast32_t h = 0;
@@ -106,7 +119,16 @@ static uint32_t hash(const char *s0)
 	return h & 0xfffffff;
 }
 
-static Sym *lookup(const char *s, uint32_t h, struct dso *dso)
+static uint32_t gnu_hash (const char *s0)
+{
+	const unsigned char *s = (void *)s0;
+	uint_fast32_t h = 5381;
+	for (; *s; s++)
+		h = h*33 + *s;
+	return h;
+}
+
+static Sym *sysv_lookup(const char *s, uint32_t h, struct dso *dso)
 {
 	size_t i;
 	Sym *syms = dso->syms;
@@ -119,20 +141,81 @@ static Sym *lookup(const char *s, uint32_t h, struct dso *dso)
 	return 0;
 }
 
+static Sym *gnu_lookup(const char *s, uint32_t h1, struct dso *dso)
+{
+	size_t i;
+	Sym *sym;
+	char *strings = dso->strings;
+	uint32_t *hashtab = dso->ghashtab;
+	uint32_t nbuckets = hashtab[0];
+	size_t *maskwords = (size_t *)(hashtab + 4);
+	uint32_t *buckets = hashtab + 4 + (hashtab[2]*(sizeof(size_t)/sizeof(uint32_t)));
+	uint32_t symndx = hashtab[1];
+	Sym *syms = dso->syms;
+	uint32_t shift2 = hashtab[3];
+	uint32_t h2 = h1 >> shift2;
+	uint32_t *hashvals = buckets + nbuckets;
+	uint32_t *hashval;
+	size_t c = sizeof(size_t) * 8;
+	size_t n = (h1/c) & (hashtab[2]-1);
+	size_t bitmask = (1 << (h1%c)) | (1 << (h2%c));
+
+	if ((maskwords[n] & bitmask) != bitmask)
+		return 0;
+
+	n = buckets[h1 % nbuckets];
+	if (!n)
+		return 0;
+
+	sym = syms + n;
+	hashval = hashvals + n - symndx;
+
+	for (h1 &= (uint32_t)-2;; sym++) {
+		h2 = *hashval++;
+		if ((h1 == (h2 & ~1)) && !strcmp(s, strings + sym->st_name))
+			return sym;
+
+		if (h2 & 1)
+			break;
+	}
+
+	return 0;
+}
+
 #define OK_TYPES (1<<STT_NOTYPE | 1<<STT_OBJECT | 1<<STT_FUNC | 1<<STT_COMMON)
 #define OK_BINDS (1<<STB_GLOBAL | 1<<STB_WEAK)
 
 static void *find_sym(struct dso *dso, const char *s, int need_def)
 {
-	uint32_t h = hash(s);
+	uint32_t h = 0, gh = 0;
 	void *def = 0;
-	if (h==0x6b366be && !strcmp(s, "dlopen")) rtld_used = 1;
-	if (h==0x6b3afd && !strcmp(s, "dlsym")) rtld_used = 1;
-	if (h==0x595a4cc && !strcmp(s, "__stack_chk_fail")) ssp_used = 1;
+	static uint32_t precomp[2][3] = {
+		{0x6b366be, 0x6b3afd, 0x595a4cc},
+		{0xf9040207, 0xf4dc4ae, 0x1f4039c9},
+	};
+	uint32_t *precomptab;
+	if (dso->hashtab) {
+		h = sysv_hash(s);
+		precomptab = precomp[0];
+	} else {
+		gh = gnu_hash(s);
+		precomptab = precomp[0];
+	}
+	if (h == precomptab[0] && !strcmp(s, "dlopen")) rtld_used = 1;
+	if (h == precomptab[1] && !strcmp(s, "dlsym")) rtld_used = 1;
+	if (h == precomptab[2] && !strcmp(s, "__stack_chk_fail")) ssp_used = 1;
 	for (; dso; dso=dso->next) {
 		Sym *sym;
 		if (!dso->global) continue;
-		sym = lookup(s, h, dso);
+		if (dso->hashtab && (h || !dso->ghashtab)) {
+			if (!h)
+				h = sysv_hash(s);
+			sym = sysv_lookup(s, h, dso);
+		} else {
+			if (!gh)
+				gh = gnu_hash(s);
+			sym = gnu_lookup(s, gh, dso);
+		}
 		if (sym && (!need_def || sym->st_shndx) && sym->st_value
 		 && (1<<(sym->st_info&0xf) & OK_TYPES)
 		 && (1<<(sym->st_info>>4) & OK_BINDS)) {
@@ -325,8 +408,12 @@ static void decode_dyn(struct dso *p)
 	size_t dyn[DYN_CNT] = {0};
 	decode_vec(p->dynv, dyn, DYN_CNT);
 	p->syms = (void *)(p->base + dyn[DT_SYMTAB]);
-	p->hashtab = (void *)(p->base + dyn[DT_HASH]);
 	p->strings = (void *)(p->base + dyn[DT_STRTAB]);
+	if (dyn[0]&(1<<DT_HASH))
+		p->hashtab = (void *)(p->base + dyn[DT_HASH]);
+	decode_vec2(p->dynv, dyn, DYN_ADDR_CNT, DT_ADDRRNGHI+1);
+	if (dyn[0]&(1<<(DT_ADDRTAGIDX(DT_GNU_HASH)+1)))
+		p->ghashtab = (void *)(p->base + dyn[DT_ADDRTAGIDX(DT_GNU_HASH)+1]);
 }
 
 static struct dso *load_library(const char *name)
@@ -788,7 +875,7 @@ end:
 static void *do_dlsym(struct dso *p, const char *s, void *ra)
 {
 	size_t i;
-	uint32_t h;
+	uint32_t h = 0, gh = 0;
 	Sym *sym;
 	if (p == RTLD_NEXT) {
 		for (p=head; p && (unsigned char *)ra-p->map>p->map_len; p=p->next);
@@ -802,12 +889,25 @@ static void *do_dlsym(struct dso *p, const char *s, void *ra)
 		if (!res) goto failed;
 		return res;
 	}
-	h = hash(s);
-	sym = lookup(s, h, p);
+	if (p->hashtab) {
+		h = sysv_hash(s);
+		sym = sysv_lookup(s, h, p);
+	} else {
+		gh = gnu_hash(s);
+		sym = gnu_lookup(s, gh, p);
+	}
 	if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES))
 		return p->base + sym->st_value;
 	if (p->deps) for (i=0; p->deps[i]; i++) {
-		sym = lookup(s, h, p->deps[i]);
+		if (p->deps[i]->hashtab && (h || !p->deps[i]->ghashtab)) {
+			if (!h)
+				h = sysv_hash(s);
+			sym = sysv_lookup(s, h, p->deps[i]);
+		} else {
+			if (!gh)
+				gh = gnu_hash(s);
+			sym = gnu_lookup(s, h, p->deps[i]);
+		}
 		if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES))
 			return p->deps[i]->base + sym->st_value;
 	}
@@ -817,6 +917,69 @@ failed:
 	return 0;
 }
 
+static int do_dladdr (void *addr, Dl_info *info)
+{
+	struct dso *p;
+	memset (info, 0, sizeof (*info));
+	for (p=head; p; p=p->next) {
+		if ((unsigned char *)addr >= p->map && (unsigned char *)addr < p->map + p->map_len) {
+			Sym *syms = p->syms;
+			uint32_t nsym = 0;
+			char *strings = p->strings;
+			size_t i;
+			info->dli_fname = p->name;
+			info->dli_fbase = p->base;
+			if (p->hashtab)
+				nsym = p->hashtab[1];
+			else {
+				uint32_t *buckets;
+				buckets = p->ghashtab + 4 + (p->ghashtab[2] * (sizeof(size_t)/sizeof(uint32_t)));
+				syms += p->ghashtab[1];
+				for (i = 0; i < p->ghashtab[0]; ++i) {
+					if (buckets[i] > nsym)
+						nsym = buckets[i];
+				}
+
+				if (nsym) {
+					nsym -= p->ghashtab[1];
+					uint32_t *hashval = buckets + p->ghashtab[0] + nsym;
+					do {
+						nsym++;
+					}while (!(*hashval++ & 1));
+				}
+
+			}
+
+			for (i = 0; i < nsym; i++) {
+				if ((syms[i].st_value != 0 || syms[i].st_shndx != SHN_UNDEF) &&
+					 ST_TYPE(syms[i].st_info) != STT_TLS) {
+					void *symaddr = p->base + syms[i].st_value;
+
+					if (addr >= symaddr && (!info->dli_saddr || (addr-symaddr)<(addr-info->dli_saddr))) {
+						info->dli_saddr = symaddr;
+						info->dli_sname = strings + syms[i].st_name;
+
+						if (addr == symaddr)
+							break;
+					}
+				}
+			}
+
+			return 1;
+		}
+	}
+	return 0;
+}
+
+int dladdr (void *addr, Dl_info *info)
+{
+	int res;
+	pthread_rwlock_rdlock(&lock);
+	res = do_dladdr (addr, info);
+	pthread_rwlock_unlock(&lock);
+	return res;
+}
+
 void *__dlsym(void *p, const char *s, void *ra)
 {
 	void *res;

  reply	other threads:[~2012-08-20 12:55 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-07  9:04 musl
2012-08-07 11:46 ` Szabolcs Nagy
2012-08-07 14:15   ` musl
2012-08-07 14:53     ` Szabolcs Nagy
2012-08-07 23:09     ` Rich Felker
2012-08-08  9:55       ` musl
2012-08-08 11:52         ` Szabolcs Nagy
2012-08-08 12:54           ` Rich Felker
2012-08-08 13:57           ` musl
2012-08-11 23:05             ` Rich Felker
2012-08-15 22:41               ` boris brezillon
2012-08-17  5:39                 ` Rich Felker
2012-08-19 16:42                   ` musl
2012-08-20  2:06                     ` Rich Felker
2012-08-20 12:55                       ` musl [this message]
2012-08-20 14:32                         ` musl
2012-08-23 21:39                           ` Rich Felker
2012-08-23 22:21                             ` Rich Felker
2012-08-24  7:29                               ` musl
2012-08-24 18:38                                 ` Rich Felker
2012-08-25  7:42                                   ` boris brezillon
2012-08-25 12:35                                     ` Rich Felker
2012-08-25 22:13                                   ` musl
2012-08-25 22:37                                     ` musl
2012-08-26  0:00                                   ` musl
2012-08-24  8:12                               ` Szabolcs Nagy
2012-08-24  8:56                                 ` musl
2012-08-24  9:38                                   ` Szabolcs Nagy
2012-08-25 21:34                               ` musl
2012-08-25 21:42                                 ` Rich Felker
2012-08-16 18:03               ` musl
2012-08-17 16:35               ` musl
2012-08-08 12:49         ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=503233A8.8000604@gmail.com \
    --to=b.brezillon.musl@gmail.com \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).