mailing list of musl libc
 help / color / mirror / code / Atom feed
From: musl <b.brezillon.musl@gmail.com>
To: musl@lists.openwall.com
Cc: Rich Felker <dalias@aerifal.cx>
Subject: Re: ldso : dladdr support
Date: Wed, 08 Aug 2012 11:55:45 +0200	[thread overview]
Message-ID: <502237A1.1000805@gmail.com> (raw)
In-Reply-To: <20120807230933.GC27715@brightrain.aerifal.cx>

[-- Attachment #1: Type: text/plain, Size: 5954 bytes --]

Here is the new patch for dladdr + gnu hash support.

Please tell me if you find coding style errors.

On 08/08/2012 01:09, Rich Felker wrote:
> On Tue, Aug 07, 2012 at 04:15:34PM +0200, musl wrote:
>>> actually i'm not sure if the function pointer approach is
>>> the right one here, but that's a design question
>>>
>> Could you tell me why (performance)?
> Some reasons I would avoid gratuitous tables of function pointer
> tables for things like this:
>
> 1. In shared libraries, they have to be relocated at load time, which
>    adds to startup time cost. A single table doesn't make much
>    difference, but the _policy_ of avoiding them in general can make a
>    difference when there are many places they might have been used.
>
> 2. Likewise, since they're non-constant, they have to be in the data
>    segment, which increases the size of the library's data segment.
>    Same "single table vs policy" point as in #1 applies.
>
> 3. Since they're non-constant, they're nice targets for an attacker
>    looking to perform code execution exploits. I don't like
>    artificially restricting our use of the language just because
>    certain constructs could help an attacker exploit an
>    ALREADY-vulnerable program, but I also think it's prudent to avoid
>    gratuitous uses like this that serve no purpose and potentially
>    contribute to exploitability.
>
I reworked the implementation to remove all function pointers.
>
>>>> +	uint32_t *hashtab = dso->hashtab;
>>>> +	uint32_t *buckets = hashtab + 4 + (hashtab[2] * (sizeof(size_t) / sizeof(uint32_t)));
>>> i don't know the algorithm but the sizeof magic looks suspicious to me
>> I took the algorithm described here :
>>
>> https://blogs.oracle.com/ali/entry/gnu_hash_elf_sections
>>
>> The maskwords are native word size (64 bits for elf64 and 32 bits for elf32).
>> I could define these macros:
>> #define MASKWORD_SIZE sizeof (Elf32_Word)
>> #define MASKWORD_SIZE sizeof (Elf64_Word)
> In that case, we use size_t, not the ElfXX_Foo mess which only makes
> sense when writing generic tools for dealing with ELF files from
> arbitrary archs, not when handling native ELF structures.
>
>>>> +static uint32_t gnu_hash (const char *s0)
>>>> +{
>>>> +	const unsigned char *s = (void *)s0;
>>>> +	uint_fast32_t h = 5381;
>>>> +	for (unsigned char c = *s; c != '\0'; c = *++s)
>>>> +		h = h * 33 + c;
>>>> +	return h & 0xffffffff;
>>>> +}
>>>> +
>>> i think c != '\0' is not very idiomatic
>>> and i'd avoid using c99 style declaration in for
>>>
>>> use
>>>
>>> for (; *s; s++)
>>> 	h = 33*h + *s;
>>>
>> Corrected.
> Indeed, that's a lot cleaner.
>
>>>> +	uint32_t shift2 = hashtab[3];
>>>> +	uint32_t h2 = h1 >> shift2;
>>> i'm not sure if input validation makes sense in ldso
>>> but shifts can be tricky (hashtab[3] must be in 0..31 here)
>> By validation do you mean mask the shift value with 0x1F ?
> There are millions of ways an invalid/corrupt ELF file can cause the
> dynamic linker/calling program to invoke UB. I don't see much value in
> checking for them. Such invalid files will only exist if created
> intentionally or if your storage medium is corrupted. There is no
> "safe" interface corresponding to dlopen by which you can try opening
> a DSO and sanity-checking it before accessing any code or data from it
> (e.g. the ctors run before dlopen returns!) so the only way to handle
> evil ELF files is to validate the image on disk _before_ calling
> dlopen, and opening it with an absolute pathname to a location under
> your control.
>
>>>> +	for (h1 &= ~1; 1; sym++) {
>>> the 1; in the middle is unnecessary
>>>
>>> h1 &= ~1; is problematic if signed int representation is
>>> not two's complement
>>> (~1 is an implementation defined negative value which is
> It's not implementation-defined. It's defined as operating on the bit
> representation, and once the signed representation is chosen among the
> 3 allowed, the implementation is forced to perform ~ accordingly.
>
>>> then converted to uint32_t according to well defined rules)
> musl will never deal with non-twos-complement implementations (AFAIK,
> no non-twos-complement conformant C implementation has ever been
> created) so it's not a big deal, but I agree it would be cleaner to
> avoid the issue. Personally I try to avoid ever using bitwise
> operators on signed types.
>
>>> i think we can have more efficient code here if
>>> only gnu and sysv hash algos are supported
>>> (i don't think it's realistic to assume new hash algorithms
>>> even gnu hash is a fairly useless addition to elf)
>> This has to do with the function pointer approach.
>> Do you prefer if/else alternative?
> Yes, I prefer if/else.
>
> Also, we had a conversation on the list and/or IRC a while back (I
> don't remember which) where I described some optimizations that should
> be present if gnu hash is to be supported. Basically, the dynamic
> linker should keep track of whether there's one of the two hashes
> that's supported by ALL loaded libs, and in that case, only ever use
> that hash, to avoid computing two different hashes.
The hash for given algo is only computed once (if needed).
That's the reason for the computed table.
If all libs uses the same hash algo, the other will never be computed.
> Another issue that needs to be fixed before gnu hash support can be
> committed is that you removed the hash comparisons for "dlopen",
> "dlsym", and "__stack_chk_fail". Incuring a strcmp against these 3
> strings for every single symbol lookup seems extremely costly; that's
> why I did the hash comparisons in the first place. Keeping track of
> them is important so we can avoid the overhead of supporting these
> functions when there's no way the application can ever attempt to use
> them, but we also want to minimize the startup overhead of making this
> determination.
I added the precomp table wich includes both gnu and sysv hashes for the dlopen,
dlsym and __stack_chk_fail symbols.
>
> Rich


[-- Attachment #2: ldso-add-dladdr-and-gnu-hash-support.patch --]
[-- Type: text/x-patch, Size: 9709 bytes --]

From 4a809b89c38ca3a5b69e5758a04effd6d404039c Mon Sep 17 00:00:00 2001
From: Boris BREZILLON <b.brezillon@overkiz.com>
Date: Wed, 8 Aug 2012 11:45:31 +0200
Subject: [PATCH] ldso : add dladdr and gnu hash support

---
 include/dlfcn.h    |   15 ++++
 src/ldso/dynlink.c |  231 +++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 235 insertions(+), 11 deletions(-)

diff --git a/include/dlfcn.h b/include/dlfcn.h
index dea74c7..8c45822 100644
--- a/include/dlfcn.h
+++ b/include/dlfcn.h
@@ -18,6 +18,21 @@ char  *dlerror(void);
 void  *dlopen(const char *, int);
 void  *dlsym(void *, const char *);
 
+#ifdef _GNU_SOURCE
+typedef struct {
+    const char *dli_fname;  /* Pathname of shared object that
+                               contains address */
+    void       *dli_fbase;  /* Address at which shared object
+                               is loaded */
+    const char *dli_sname;  /* Name of nearest symbol with address
+                               lower than addr */
+    void       *dli_saddr;  /* Exact address of symbol named
+                               in dli_sname */
+} Dl_info;
+
+int dladdr (void *addr, Dl_info *info);
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
index f55c6f1..f40c1e5 100644
--- a/src/ldso/dynlink.c
+++ b/src/ldso/dynlink.c
@@ -1,3 +1,4 @@
+#define _GNU_SOURCE
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -28,12 +29,14 @@ typedef Elf32_Phdr Phdr;
 typedef Elf32_Sym Sym;
 #define R_TYPE(x) ((x)&255)
 #define R_SYM(x) ((x)>>8)
+#define ELF_ST_TYPE ELF32_ST_TYPE
 #else
 typedef Elf64_Ehdr Ehdr;
 typedef Elf64_Phdr Phdr;
 typedef Elf64_Sym Sym;
 #define R_TYPE(x) ((x)&0xffffffff)
 #define R_SYM(x) ((x)>>32)
+#define ELF_ST_TYPE ELF64_ST_TYPE
 #endif
 
 struct debug {
@@ -52,6 +55,7 @@ struct dso {
 
 	int refcnt;
 	Sym *syms;
+	uint32_t hashalg;
 	uint32_t *hashtab;
 	char *strings;
 	unsigned char *map;
@@ -66,6 +70,10 @@ struct dso {
 	char buf[];
 };
 
+#define SYSV_HASH_ALG_IDX   0
+#define GNU_HASH_ALG_IDX    1
+#define HASH_ALG_CNT		2
+
 #include "reloc.h"
 
 void __init_ssp(size_t *);
@@ -94,7 +102,7 @@ static void decode_vec(size_t *v, size_t *a, size_t cnt)
 	}
 }
 
-static uint32_t hash(const char *s0)
+static uint32_t sysv_hash(const char *s0)
 {
 	const unsigned char *s = (void *)s0;
 	uint_fast32_t h = 0;
@@ -105,7 +113,16 @@ static uint32_t hash(const char *s0)
 	return h & 0xfffffff;
 }
 
-static Sym *lookup(const char *s, uint32_t h, struct dso *dso)
+static uint32_t gnu_hash (const char *s0)
+{
+	const unsigned char *s = (void *)s0;
+	uint_fast32_t h = 5381;
+	for (; *s; s++)
+		h = h*33 + *s;
+	return h & 0xffffffff;
+}
+
+static Sym *sysv_lookup(const char *s, uint32_t h, struct dso *dso)
 {
 	size_t i;
 	Sym *syms = dso->syms;
@@ -118,20 +135,94 @@ static Sym *lookup(const char *s, uint32_t h, struct dso *dso)
 	return 0;
 }
 
+static Sym *gnu_lookup(const char *s, uint32_t h1, struct dso *dso)
+{
+	size_t i;
+	Sym *sym;
+	char *strings = dso->strings;
+	uint32_t *hashtab = dso->hashtab;
+	uint32_t nbuckets = hashtab[0];
+	size_t *maskwords = (size_t *)(hashtab + 4);
+	uint32_t *buckets = hashtab + 4 + (hashtab[2]*(sizeof(size_t)/sizeof(uint32_t)));
+	uint32_t symndx = hashtab[1];
+	Sym *syms = dso->syms;
+	uint32_t shift2 = hashtab[3];
+	uint32_t h2 = h1 >> shift2;
+	uint32_t *hashvals = buckets + nbuckets;
+	uint32_t *hashval;
+	size_t c = sizeof(size_t) * 8;
+	size_t n = (h1/c) & (hashtab[2]-1);
+	size_t bitmask = (1 << (h1%c)) | (1 << (h2%c));
+
+	if ((maskwords[n] & bitmask) != bitmask)
+		return 0;
+
+	n = buckets[h1 % nbuckets];
+	if (!n)
+		return 0;
+
+	sym = syms + n;
+	hashval = hashvals + n - symndx;
+
+	for (h1 &= (uint32_t)-2;; sym++) {
+		h2 = *hashval++;
+		if ((h1 == (h2 & ~1)) && !strcmp(s, strings + sym->st_name))
+			return sym;
+
+		if (h2 & 1)
+			break;
+	}
+
+	return 0;
+}
+
 #define OK_TYPES (1<<STT_NOTYPE | 1<<STT_OBJECT | 1<<STT_FUNC | 1<<STT_COMMON)
 #define OK_BINDS (1<<STB_GLOBAL | 1<<STB_WEAK)
 
 static void *find_sym(struct dso *dso, const char *s, int need_def)
 {
-	uint32_t h = hash(s);
 	void *def = 0;
-	if (h==0x6b366be && !strcmp(s, "dlopen")) rtld_used = 1;
-	if (h==0x6b3afd && !strcmp(s, "dlsym")) rtld_used = 1;
-	if (h==0x595a4cc && !strcmp(s, "__stack_chk_fail")) ssp_used = 1;
+	static uint32_t precomp[HASH_ALG_CNT][3] = {
+	  {0x6b366be, 0x6b3afd, 0x595a4cc},
+	  {0xf9040207, 0xf4dc4ae, 0x1f4039c9},
+	};
+	uint32_t h[HASH_ALG_CNT];
+	uint8_t computed[HASH_ALG_CNT] = {0, 0};
+	uint8_t alg = dso->hashalg;
+	if (alg == SYSV_HASH_ALG_IDX)
+		h[alg] = sysv_hash(s);
+	else
+		h[alg] = gnu_hash(s);
+
+	computed[alg] = 1;
+
+	if (h[alg] == precomp[alg][0] && !strcmp(s, "dlopen")) rtld_used = 1;
+	if (h[alg] == precomp[alg][1] && !strcmp(s, "dlsym")) rtld_used = 1;
+	if (h[alg] == precomp[alg][2] && !strcmp(s, "__stack_chk_fail")) ssp_used = 1;
+
 	for (; dso; dso=dso->next) {
 		Sym *sym;
+
 		if (!dso->global) continue;
-		sym = lookup(s, h, dso);
+
+		alg = dso->hashalg;
+		if (!computed[alg]) {
+			if (alg == SYSV_HASH_ALG_IDX) {
+				h[alg] = sysv_hash(s);
+				sym = sysv_lookup(s, h[alg], dso);
+			}
+			else {
+				h[alg] = gnu_hash(s);
+				sym = gnu_lookup(s, h[alg], dso);
+			}
+			computed[alg] = 1;
+		} else {
+			if (alg == SYSV_HASH_ALG_IDX)
+				sym = sysv_lookup(s, h[alg], dso);
+			else
+				sym = gnu_lookup(s, h[alg], dso);
+		}
+
 		if (sym && (!need_def || sym->st_shndx) && sym->st_value
 		 && (1<<(sym->st_info&0xf) & OK_TYPES)
 		 && (1<<(sym->st_info>>4) & OK_BINDS)) {
@@ -320,11 +411,17 @@ static int path_open(const char *name, const char *search, char *buf, size_t buf
 
 static void decode_dyn(struct dso *p)
 {
+	size_t *v = p->dynv;
 	size_t dyn[DYN_CNT] = {0};
 	decode_vec(p->dynv, dyn, DYN_CNT);
 	p->syms = (void *)(p->base + dyn[DT_SYMTAB]);
 	p->hashtab = (void *)(p->base + dyn[DT_HASH]);
+	p->hashalg = SYSV_HASH_ALG_IDX;
 	p->strings = (void *)(p->base + dyn[DT_STRTAB]);
+	for (; v[0]; v+=2) if (v[0] == DT_GNU_HASH) {
+		p->hashtab = (void *)(p->base + v[1]);
+		p->hashalg = GNU_HASH_ALG_IDX;
+	}
 }
 
 static struct dso *load_library(const char *name)
@@ -784,8 +881,11 @@ end:
 static void *do_dlsym(struct dso *p, const char *s, void *ra)
 {
 	size_t i;
-	uint32_t h;
 	Sym *sym;
+	uint8_t computed[HASH_ALG_CNT] = {0, 0};
+	uint8_t alg = p->hashalg;
+	uint32_t h[HASH_ALG_CNT];
+
 	if (p == RTLD_NEXT) {
 		for (p=head; p && (unsigned char *)ra-p->map>p->map_len; p=p->next);
 		if (!p) p=head;
@@ -798,12 +898,36 @@ static void *do_dlsym(struct dso *p, const char *s, void *ra)
 		if (!res) goto failed;
 		return res;
 	}
-	h = hash(s);
-	sym = lookup(s, h, p);
+
+	if (alg == SYSV_HASH_ALG_IDX) {
+		h[alg] = sysv_hash(s);
+		sym = sysv_lookup(s, h[alg], p);
+	} else {
+		h[alg] = gnu_hash(s);
+		sym = gnu_lookup(s, h[alg], p);
+	}
+	computed[alg] = 1;
+
 	if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES))
 		return p->base + sym->st_value;
 	if (p->deps) for (i=0; p->deps[i]; i++) {
-		sym = lookup(s, h, p->deps[i]);
+		alg = p->deps[i]->hashalg;
+		if (!computed[alg]) {
+			if (alg == SYSV_HASH_ALG_IDX) {
+				h[alg] = sysv_hash(s);
+				sym = sysv_lookup(s, h[alg], p->deps[i]);
+			} else {
+				h[alg] = gnu_hash(s);
+				sym = gnu_lookup(s, h[alg], p->deps[i]);
+			}
+			computed[alg] = 1;
+		} else {
+			if (alg == SYSV_HASH_ALG_IDX)
+				sym = sysv_lookup(s, h[alg], p->deps[i]);
+			else
+				sym = gnu_lookup(s, h[alg], p->deps[i]);
+		}
+
 		if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES))
 			return p->deps[i]->base + sym->st_value;
 	}
@@ -813,6 +937,91 @@ failed:
 	return 0;
 }
 
+struct sym_search {
+	void *addr;
+	Dl_info *info;
+};
+
+static int find_closest_sym (struct dso *dso, Sym *sym, struct sym_search *search)
+{
+	void *symaddr = dso->base + sym->st_value;
+	char *strings = dso->strings;
+	Dl_info *info = search->info;
+	void *addr = search->addr;
+	void *prevaddr = info->dli_saddr;
+
+	if (sym->st_value == 0 && sym->st_shndx == SHN_UNDEF)
+		return 1;
+
+	if (ELF_ST_TYPE(sym->st_info) == STT_TLS)
+		return 1;
+
+	if (addr < symaddr)
+		return 1;
+
+	if (prevaddr && (addr - symaddr) > (addr - prevaddr))
+		return 1;
+
+	info->dli_saddr = symaddr;
+	info->dli_sname = strings + sym->st_name;
+
+	if (addr == symaddr)
+		return 0;
+
+	return 1;
+
+}
+
+static int do_dladdr (void *addr, Dl_info *info)
+{
+	struct sym_search search;
+	struct dso *p;
+	memset (info, 0, sizeof (*info));
+	search.info = info;
+	search.addr = addr;
+	for (p=head; p; p=p->next) {
+		if ((unsigned char *)addr >= p->map && (unsigned char *)addr < p->map + p->map_len) {
+			Sym *syms = p->syms;
+			uint32_t *hashtab = p->hashtab;
+			size_t i;
+			info->dli_fname = p->name;
+			info->dli_fbase = p->base;
+			if (p->hashalg == SYSV_HASH_ALG_IDX) {
+				for (i = 0; i < hashtab[1]; i++) {
+					if (!find_closest_sym (p, syms + i, &search))
+						return 1;
+				}
+			} else {
+				uint32_t *buckets = hashtab + 4 + (hashtab[2] * (sizeof(size_t)/sizeof(uint32_t)));
+				uint32_t nbuckets = hashtab[0];
+				uint32_t *hashvals = buckets + nbuckets;
+				uint32_t symndx = hashtab[1];
+				for (i = 0; i < nbuckets; ++i) {
+					uint32_t n = buckets[i];
+					Sym *sym = syms + n;
+					uint32_t *hashval = hashvals + n - symndx;
+
+					do {
+						if (!find_closest_sym (p, sym, &search))
+							return 1;
+					}while (!(*hashval++ & 1));
+				}
+			}
+			return 1;
+		}
+	}
+	return 0;
+}
+
+int dladdr (void *addr, Dl_info *info)
+{
+	int res;
+	pthread_rwlock_rdlock(&lock);
+	res = do_dladdr (addr, info);
+	pthread_rwlock_unlock(&lock);
+	return res;
+}
+
 void *__dlsym(void *p, const char *s, void *ra)
 {
 	void *res;
-- 
1.7.9.5


  reply	other threads:[~2012-08-08  9:55 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-07  9:04 musl
2012-08-07 11:46 ` Szabolcs Nagy
2012-08-07 14:15   ` musl
2012-08-07 14:53     ` Szabolcs Nagy
2012-08-07 23:09     ` Rich Felker
2012-08-08  9:55       ` musl [this message]
2012-08-08 11:52         ` Szabolcs Nagy
2012-08-08 12:54           ` Rich Felker
2012-08-08 13:57           ` musl
2012-08-11 23:05             ` Rich Felker
2012-08-15 22:41               ` boris brezillon
2012-08-17  5:39                 ` Rich Felker
2012-08-19 16:42                   ` musl
2012-08-20  2:06                     ` Rich Felker
2012-08-20 12:55                       ` musl
2012-08-20 14:32                         ` musl
2012-08-23 21:39                           ` Rich Felker
2012-08-23 22:21                             ` Rich Felker
2012-08-24  7:29                               ` musl
2012-08-24 18:38                                 ` Rich Felker
2012-08-25  7:42                                   ` boris brezillon
2012-08-25 12:35                                     ` Rich Felker
2012-08-25 22:13                                   ` musl
2012-08-25 22:37                                     ` musl
2012-08-26  0:00                                   ` musl
2012-08-24  8:12                               ` Szabolcs Nagy
2012-08-24  8:56                                 ` musl
2012-08-24  9:38                                   ` Szabolcs Nagy
2012-08-25 21:34                               ` musl
2012-08-25 21:42                                 ` Rich Felker
2012-08-16 18:03               ` musl
2012-08-17 16:35               ` musl
2012-08-08 12:49         ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=502237A1.1000805@gmail.com \
    --to=b.brezillon.musl@gmail.com \
    --cc=dalias@aerifal.cx \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).