* malloc(0) behaviour @ 2013-01-14 17:17 Igmar Palsenberg 2013-01-14 18:05 ` Rich Felker 2013-01-14 23:37 ` Rob Landley 0 siblings, 2 replies; 39+ messages in thread From: Igmar Palsenberg @ 2013-01-14 17:17 UTC (permalink / raw) To: musl Hi, Is there a (good) reason for Musl to follow glibc's malloc(0) behaviour ? Musl returns a valid pointer, which is fine according to the standard, but returning NULL is also fine. IMHO, returning NULL is better : It usually kills the program if actual storage is attempted. You also can't do that if a valid pointer is returned, so I really can't grasp the reason for returning a pointer at all, except to support buggy and lazy programming. I suggest we make malloc(0) return a NULL pointer. Any objections ? Regards, Igmar ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-14 17:17 malloc(0) behaviour Igmar Palsenberg @ 2013-01-14 18:05 ` Rich Felker 2013-01-14 22:22 ` Strake 2013-01-15 8:31 ` Igmar Palsenberg 2013-01-14 23:37 ` Rob Landley 1 sibling, 2 replies; 39+ messages in thread From: Rich Felker @ 2013-01-14 18:05 UTC (permalink / raw) To: musl On Mon, Jan 14, 2013 at 06:17:47PM +0100, Igmar Palsenberg wrote: > Hi, > > Is there a (good) reason for Musl to follow glibc's malloc(0) > behaviour ? Musl returns a valid pointer, which is fine according to > the standard, but returning NULL is also fine. Yes, there are many good reasons. The most obvious (but stupid) one is that a huge number of programs will "replace" malloc with one where malloc(0) returns something other than a null pointer if the system's malloc(0) returns null, and this adds both bloat and risk of bugs/breakage from the replacement. But there are other much more fundamental reasons too. Basically they all come down to interactions between the requirements of malloc and realloc, and the fact that returning a null pointer from realloc means failure (and thus that the original object was not freed). I don't want to try to remember all the details at the moment (they're enough to make your brain hurt..) but here are some links to get you started on the issue: http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm http://austingroupbugs.net/view.php?id=400 http://sourceware.org/bugzilla/show_bug.cgi?id=12547 Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-14 18:05 ` Rich Felker @ 2013-01-14 22:22 ` Strake 2013-01-14 23:05 ` Rich Felker 2013-01-15 8:32 ` Igmar Palsenberg 2013-01-15 8:31 ` Igmar Palsenberg 1 sibling, 2 replies; 39+ messages in thread From: Strake @ 2013-01-14 22:22 UTC (permalink / raw) To: musl On 14/01/2013, Rich Felker <dalias@aerifal.cx> wrote: > Yes, there are many good reasons. The most obvious (but stupid) one is > that a huge number of programs will "replace" malloc with one where > malloc(0) returns something other than a null pointer if the system's > malloc(0) returns null, and this adds both bloat and risk of > bugs/breakage from the replacement. But there are other much more > fundamental reasons too. Basically they all come down to interactions > between the requirements of malloc and realloc, and the fact that > returning a null pointer from realloc means failure (and thus that the > original object was not freed). Another: Null means allocation failure. As malloc ought to never fail to find zero bytes free, it thus makes sense to return a non-null pointer. Cheers, Strake ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-14 22:22 ` Strake @ 2013-01-14 23:05 ` Rich Felker 2013-01-15 8:32 ` Igmar Palsenberg 1 sibling, 0 replies; 39+ messages in thread From: Rich Felker @ 2013-01-14 23:05 UTC (permalink / raw) To: musl On Mon, Jan 14, 2013 at 05:22:16PM -0500, Strake wrote: > On 14/01/2013, Rich Felker <dalias@aerifal.cx> wrote: > > Yes, there are many good reasons. The most obvious (but stupid) one is > > that a huge number of programs will "replace" malloc with one where > > malloc(0) returns something other than a null pointer if the system's > > malloc(0) returns null, and this adds both bloat and risk of > > bugs/breakage from the replacement. But there are other much more > > fundamental reasons too. Basically they all come down to interactions > > between the requirements of malloc and realloc, and the fact that > > returning a null pointer from realloc means failure (and thus that the > > original object was not freed). > > Another: Null means allocation failure. As malloc ought to never fail > to find zero bytes free, it thus makes sense to return a non-null > pointer. Yes. In any case, I don't see why a correct program would care what malloc(0) returns, since passing 0 to malloc just makes it harder to write a correct program. Normally after calling malloc, you check the return value, and handle it as an error if the return value is a null pointer. But if your program might pass 0 to malloc, you have to also consider the possibility that the null pointer was returned not as an error but as the "success" result for malloc(0). This greatly complicates your error handling; now you have to either clear errno first and check whether it's set after the call to malloc to determine whether allocation failed, or check for size==0 in the failure case and treat that specially as a sort of success. Things get even worse if you use realloc(p,0), especially if you want to support non-conforming implementations like glibc... Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-14 22:22 ` Strake 2013-01-14 23:05 ` Rich Felker @ 2013-01-15 8:32 ` Igmar Palsenberg 2013-01-15 12:53 ` Rob Landley 1 sibling, 1 reply; 39+ messages in thread From: Igmar Palsenberg @ 2013-01-15 8:32 UTC (permalink / raw) To: musl > On 14/01/2013, Rich Felker <dalias@aerifal.cx> wrote: >> Yes, there are many good reasons. The most obvious (but stupid) one is >> that a huge number of programs will "replace" malloc with one where >> malloc(0) returns something other than a null pointer if the system's >> malloc(0) returns null, and this adds both bloat and risk of >> bugs/breakage from the replacement. But there are other much more >> fundamental reasons too. Basically they all come down to interactions >> between the requirements of malloc and realloc, and the fact that >> returning a null pointer from realloc means failure (and thus that the >> original object was not freed). > > Another: Null means allocation failure. As malloc ought to never fail > to find zero bytes free, it thus makes sense to return a non-null > pointer. A valid pointer also means you should be able to store something. With malloc(0), you can't. If you ask me, abort() would be a same thing to do. And no, I don't expect a decent libc to actually do this :-) Igmar ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 8:32 ` Igmar Palsenberg @ 2013-01-15 12:53 ` Rob Landley 2013-01-15 22:18 ` Igmar Palsenberg 0 siblings, 1 reply; 39+ messages in thread From: Rob Landley @ 2013-01-15 12:53 UTC (permalink / raw) To: musl; +Cc: musl On 01/15/2013 02:32:55 AM, Igmar Palsenberg wrote: > > On 14/01/2013, Rich Felker <dalias@aerifal.cx> wrote: > > Another: Null means allocation failure. As malloc ought to never > fail > > to find zero bytes free, it thus makes sense to return a non-null > > pointer. > > A valid pointer also means you should be able to store something. Says who? Fire up the mmap() man page and look at PROT_NONE. Rob ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 12:53 ` Rob Landley @ 2013-01-15 22:18 ` Igmar Palsenberg 0 siblings, 0 replies; 39+ messages in thread From: Igmar Palsenberg @ 2013-01-15 22:18 UTC (permalink / raw) To: musl >> > On 14/01/2013, Rich Felker <dalias@aerifal.cx> wrote: >> > Another: Null means allocation failure. As malloc ought to never fail >> > to find zero bytes free, it thus makes sense to return a non-null >> > pointer. >> A valid pointer also means you should be able to store something. > > Says who? Fire up the mmap() man page and look at PROT_NONE. malloc() doesn't handle that case. I also find it irrelevant in this case. You can only pass it to free() legally, everything else is UB. Igmar ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-14 18:05 ` Rich Felker 2013-01-14 22:22 ` Strake @ 2013-01-15 8:31 ` Igmar Palsenberg 2013-01-15 11:06 ` Szabolcs Nagy 2013-01-15 12:52 ` Rob Landley 1 sibling, 2 replies; 39+ messages in thread From: Igmar Palsenberg @ 2013-01-15 8:31 UTC (permalink / raw) To: musl >> Is there a (good) reason for Musl to follow glibc's malloc(0) >> behaviour ? Musl returns a valid pointer, which is fine according to >> the standard, but returning NULL is also fine. > > Yes, there are many good reasons. The most obvious (but stupid) one is > that a huge number of programs will "replace" malloc with one where > malloc(0) returns something other than a null pointer if the system's > malloc(0) returns null, and this adds both bloat and risk of > bugs/breakage from the replacement. But there are other much more > fundamental reasons too. Basically they all come down to interactions > between the requirements of malloc and realloc, and the fact that > returning a null pointer from realloc means failure (and thus that the > original object was not freed). I don't want to try to remember all > the details at the moment (they're enough to make your brain hurt..) > but here are some links to get you started on the issue: > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm > http://austingroupbugs.net/view.php?id=400 > http://sourceware.org/bugzilla/show_bug.cgi?id=12547 While the above is clear to me, returning a pointer that can't hold anything just seems wrong to me. It's also a matter of promoting bad code : Doing a malloc(0) is simply a bug. People are just to lazy to check return values, and this makes the loop 3 lines shorter. I'll wrap malloc() to include an abort in my case :) Igmar ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 8:31 ` Igmar Palsenberg @ 2013-01-15 11:06 ` Szabolcs Nagy 2013-01-15 12:33 ` Igmar Palsenberg 2013-01-15 13:46 ` Rich Felker 2013-01-15 12:52 ` Rob Landley 1 sibling, 2 replies; 39+ messages in thread From: Szabolcs Nagy @ 2013-01-15 11:06 UTC (permalink / raw) To: musl * Igmar Palsenberg <igmar@palsenberg.com> [2013-01-15 09:31:24 +0100]: > > fundamental reasons too. Basically they all come down to interactions > > between the requirements of malloc and realloc, and the fact that > > returning a null pointer from realloc means failure (and thus that the > > > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm > > http://austingroupbugs.net/view.php?id=400 > > http://sourceware.org/bugzilla/show_bug.cgi?id=12547 > > While the above is clear to me, returning a pointer that can't hold anything just seems wrong to me. i don't think we have too many options here, the standards and historical practices has various inconsistencies and musl has the least broken one but we can do a theoretical discussion about the merits of malloc(0)!=0: i'm surprised that it "seems wrong" to you, you can access the amount of bytes you requested through the returned pointer p, evaluating p+size is valid, p is suitably aligned for all objects and it can be freed. these assumptions are broken if malloc(0)==0 if the standard made malloc(0) work in ansi c then it would save some branch logic and would made the world a safer place (because in a fair amount of code that gets array length from external source no special casing would be needed for length==0) in rob pike's words "zero is a perfectly fine value" http://code.google.com/p/go/issues/detail?id=4142#c2 > It's also a matter of promoting bad code : Doing a malloc(0) is simply a bug. People are just to lazy to check return values, > and this makes the loop 3 lines shorter. malloc(0) is implementation-defined with two different conforming implementations but either one you choose in practice a lot of code will rely on the choosen behaviour incorrectly returning 0 does not save you from that returning 0 has the drawback that realloc(0,0) will be inconsistent (either with the realloc(0,n)===malloc(n) assumption or the realloc(p,0) failure reporting when p needs to be freed) > I'll wrap malloc() to include an abort in my case :) but don't do that in library code that may be used in a long running process: allocation failures should be reported to let the caller handle it ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 11:06 ` Szabolcs Nagy @ 2013-01-15 12:33 ` Igmar Palsenberg 2013-01-15 13:48 ` Rich Felker 2013-01-15 13:46 ` Rich Felker 1 sibling, 1 reply; 39+ messages in thread From: Igmar Palsenberg @ 2013-01-15 12:33 UTC (permalink / raw) To: musl >> While the above is clear to me, returning a pointer that can't hold anything just seems wrong to me. > > i don't think we have too many options here, > the standards and historical practices has > various inconsistencies and musl has the least > broken one > > but we can do a theoretical discussion about > the merits of malloc(0)!=0: > > i'm surprised that it "seems wrong" to you, > you can access the amount of bytes you requested > through the returned pointer p, evaluating > p+size is valid, p is suitably aligned for all > objects and it can be freed. > these assumptions are broken if malloc(0)==0 That's there to access if size is 0 ? Sure, you can access : struct foo { }; which is size 0. I do wonder what that gives me in practice. That is, not counting the fact that : if (size == 0) size = 1; was a common practice in malloc() implementations a while ago. > if the standard made malloc(0) work in ansi c > then it would save some branch logic and would > made the world a safer place > (because in a fair amount of code that gets > array length from external source no special > casing would be needed for length==0) > > in rob pike's words "zero is a perfectly fine value" > http://code.google.com/p/go/issues/detail?id=4142#c2 He does have a point. If I go to the gas station, hang in the fuel dispencer, pull it out again directly afterwards, and telling the guy behind the desk that I didn't actually got fuel, I'm probably stared at :) My way of thinking is just different, and both are fine. >> It's also a matter of promoting bad code : Doing a malloc(0) is simply a bug. People are just to lazy to check return values, >> and this makes the loop 3 lines shorter. > > malloc(0) is implementation-defined with > two different conforming implementations > > but either one you choose in practice a lot > of code will rely on the choosen behaviour > incorrectly > > returning 0 does not save you from that > > returning 0 has the drawback that realloc(0,0) > will be inconsistent > (either with the realloc(0,n)===malloc(n) assumption > or the realloc(p,0) failure reporting when p needs > to be freed) Agree. I always handle that case. > >> I'll wrap malloc() to include an abort in my case :) > > but don't do that in library code that may be > used in a long running process: allocation failures > should be reported to let the caller handle it No, just in user code. Libraries shouldn't abort, I agree. In my case, it only aborts in debug mode to aid testing. Igmar ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 12:33 ` Igmar Palsenberg @ 2013-01-15 13:48 ` Rich Felker 2013-01-15 22:28 ` Igmar Palsenberg 0 siblings, 1 reply; 39+ messages in thread From: Rich Felker @ 2013-01-15 13:48 UTC (permalink / raw) To: musl On Tue, Jan 15, 2013 at 01:33:07PM +0100, Igmar Palsenberg wrote: > > > >> While the above is clear to me, returning a pointer that can't hold anything just seems wrong to me. > > > > i don't think we have too many options here, > > the standards and historical practices has > > various inconsistencies and musl has the least > > broken one > > > > but we can do a theoretical discussion about > > the merits of malloc(0)!=0: > > > > i'm surprised that it "seems wrong" to you, > > you can access the amount of bytes you requested > > through the returned pointer p, evaluating > > p+size is valid, p is suitably aligned for all > > objects and it can be freed. > > these assumptions are broken if malloc(0)==0 > > That's there to access if size is 0 ? Sure, you can access : > > struct foo { > }; This is a constraint violation. C does not allow empty structs, and even if it did, they would not have size 0, since no type or object ever has size 0 in C. > > which is size 0. I do wonder what that gives me in practice. That is, not counting the fact that : > > if (size == 0) > size = 1; > > was a common practice in malloc() implementations a while ago. Of course, this is the canonical, simplest way to make malloc(0) return a unique pointer. Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 13:48 ` Rich Felker @ 2013-01-15 22:28 ` Igmar Palsenberg 2013-01-15 23:22 ` Rob 0 siblings, 1 reply; 39+ messages in thread From: Igmar Palsenberg @ 2013-01-15 22:28 UTC (permalink / raw) To: musl >> >> That's there to access if size is 0 ? Sure, you can access : >> >> struct foo { >> }; > > This is a constraint violation. C does not allow empty structs, and > even if it did, they would not have size 0, since no type or object > ever has size 0 in C. GCC thinks otherwise : #include <stdlib.h> #include <stdio.h> struct test { }; int main(int argc, char **argv) { char *x = NULL; printf("sizeof test : %d\n", sizeof(struct test)); return 0; } [igmar@devel ~]$ ./x sizeof test : 0 It gives me a warning, but doesn't error out. Olders version might behave differently, I don't have those installed. The LLVM compiler does the same. No idea what the standard says, but your remarks sounds correct to me. >> >> which is size 0. I do wonder what that gives me in practice. That is, not counting the fact that : >> >> if (size == 0) >> size = 1; >> >> was a common practice in malloc() implementations a while ago. > > Of course, this is the canonical, simplest way to make malloc(0) > return a unique pointer. Enough for this thread. I did got the answer I wanted, and the result I want is easy to realise. Not mu intention to irritate people. Igmar ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 22:28 ` Igmar Palsenberg @ 2013-01-15 23:22 ` Rob 2013-01-16 7:46 ` Igmar Palsenberg 0 siblings, 1 reply; 39+ messages in thread From: Rob @ 2013-01-15 23:22 UTC (permalink / raw) To: musl On Tue, Jan 15, 2013 at 11:28:14PM +0100, Igmar Palsenberg wrote: > >> That's there to access if size is 0 ? Sure, you can access : > >> > >> struct foo { > >> }; > > > > This is a constraint violation. C does not allow empty structs, and > > even if it did, they would not have size 0, since no type or object > > ever has size 0 in C. > > GCC thinks otherwise Therein lies the problem. It's an extension, as are a multitude of other things such as ?: expressions missing the middle expression and ({...}) and so on. Clang also implements these, as do other compilers such as tcc. Rob ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 23:22 ` Rob @ 2013-01-16 7:46 ` Igmar Palsenberg 0 siblings, 0 replies; 39+ messages in thread From: Igmar Palsenberg @ 2013-01-16 7:46 UTC (permalink / raw) To: musl >>> This is a constraint violation. C does not allow empty structs, and >>> even if it did, they would not have size 0, since no type or object >>> ever has size 0 in C. >> >> GCC thinks otherwise > > Therein lies the problem. > It's an extension, as are a multitude of other things such as ?: > expressions missing the middle expression and ({...}) and so on. Clang > also implements these, as do other compilers such as tcc. True. I just wanted to indicate that because it compiles, it ends up being used, and considered "standard code". Unportable in theory, but this kind of extensions enabled by default causes this. The same goes for zero-length arrays if I remember correctly. I still need to lookup how to disable those. Igmar ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 11:06 ` Szabolcs Nagy 2013-01-15 12:33 ` Igmar Palsenberg @ 2013-01-15 13:46 ` Rich Felker 1 sibling, 0 replies; 39+ messages in thread From: Rich Felker @ 2013-01-15 13:46 UTC (permalink / raw) To: musl On Tue, Jan 15, 2013 at 12:06:18PM +0100, Szabolcs Nagy wrote: > * Igmar Palsenberg <igmar@palsenberg.com> [2013-01-15 09:31:24 +0100]: > > > fundamental reasons too. Basically they all come down to interactions > > > between the requirements of malloc and realloc, and the fact that > > > returning a null pointer from realloc means failure (and thus that the > > > > > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm > > > http://austingroupbugs.net/view.php?id=400 > > > http://sourceware.org/bugzilla/show_bug.cgi?id=12547 > > > > While the above is clear to me, returning a pointer that can't hold anything just seems wrong to me. > > i don't think we have too many options here, > the standards and historical practices has > various inconsistencies and musl has the least > broken one > > but we can do a theoretical discussion about > the merits of malloc(0)!=0: > > i'm surprised that it "seems wrong" to you, > you can access the amount of bytes you requested > through the returned pointer p, evaluating > p+size is valid, p is suitably aligned for all > objects and it can be freed. > these assumptions are broken if malloc(0)==0 > > if the standard made malloc(0) work in ansi c > then it would save some branch logic and would > made the world a safer place > (because in a fair amount of code that gets > array length from external source no special > casing would be needed for length==0) In fairness, there's hardly any difference between the work involved in: if (size < LIMIT) and if (size-1 < LIMIT-1) The latter catches 0 and treats it as invalid. > > I'll wrap malloc() to include an abort in my case :) > > but don't do that in library code that may be > used in a long running process: allocation failures > should be reported to let the caller handle it I think he meant malloc(0) would abort to indicate that, in the rules of his project, malloc(0) is a programming error. This is not such a bad idea.. Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 8:31 ` Igmar Palsenberg 2013-01-15 11:06 ` Szabolcs Nagy @ 2013-01-15 12:52 ` Rob Landley 1 sibling, 0 replies; 39+ messages in thread From: Rob Landley @ 2013-01-15 12:52 UTC (permalink / raw) To: musl On 01/15/2013 02:31:24 AM, Igmar Palsenberg wrote: > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm > > http://austingroupbugs.net/view.php?id=400 > > http://sourceware.org/bugzilla/show_bug.cgi?id=12547 > > While the above is clear to me, returning a pointer that can't hold > anything just > seems wrong to me. Returning failure from an allocation that can't _not_ be satisfied seems wrong to me. Rob ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-14 17:17 malloc(0) behaviour Igmar Palsenberg 2013-01-14 18:05 ` Rich Felker @ 2013-01-14 23:37 ` Rob Landley 2013-01-15 0:24 ` Rich Felker 2013-01-15 9:01 ` Igmar Palsenberg 1 sibling, 2 replies; 39+ messages in thread From: Rob Landley @ 2013-01-14 23:37 UTC (permalink / raw) To: musl On 01/14/2013 11:17:47 AM, Igmar Palsenberg wrote: > Hi, > > Is there a (good) reason for Musl to follow glibc's malloc(0) > behaviour ? Because not doing so breaks some programs? (Dunno if it still does, but last time I tried switching that config option off in uClibc the linux from scratch build didn't finish.) > Musl returns a valid pointer, which is fine according > to the standard, but returning NULL is also fine. > > IMHO, returning NULL is better : It usually kills the program if > actual storage is > attempted. It also kills the program if they're checking the return code of malloc() and treating NULL as an allocation failure indicator. NULL has a defined meaning of "failed", when a zero length allocation is trivial to satisfy and not necessarily wrong. Should a zero length file read return -1 instead of 0? Is it the program's job to make sure it never makes a NOP syscall? Does adding special case checks in your code to avoid such NOP calls actually make the code smaller and simpler? > You also can't do that if a valid pointer > is returned, so I really can't grasp the reason for returning a > pointer at all, Not indicating that the allocation failed and triggering an assert() when there isn't actually a problem with a legitimately zero length array that had nothing in it? (Both times I debugged why LFS stuff was failing that's what it turned out to be, but I didn't spend too much time on it before just switching the uClibc option on to support it.) > except to support buggy and lazy programming. You're defining "lazy" here a "not adding a special case in the caller for every use of malloc()". That's certainly a point of view, but I'm not sure that's the word you want to use for it. "Not sufficiently defensive programming" maybe? Whether or not defensive programming is an improvement is one of those perpetual arguments having to do with differing mindsets. In reality the reason you'd want to do it is some libc implementations might _not_ allow malloc(0) to succeed, and thus there are portability issues. But only for people who care about portability off glibc, which is an uphill battle in the first place. That said, "I'm right because other people might think like I do and that would make me right" seems kind of a circular argument for a position being _better_. If posix required a valid pointer to be returned the positions would be reversed without any actual change in the technical merits. And "posix doesn't require this, therefore let's intentional break it to force any programs with technical posix compatability issues to use a different libc rather than change anything to humor us"... not seeing it. > I suggest we make malloc(0) return a NULL pointer. Any objections ? Only that it broke real world programs last time I tried it, but if you don't care about that then by all means force 'em to use a different libc. (Re: the "posix test" thread, having a ./configure --pedantic that builds a musl without gnu/dammit extensions and safety tape over the sharp bits sounds like a possible fun future thing. Having that be the default, or adding a lot of runtime checks rather than commenting stuff out at compile time, not so much...) Rob ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-14 23:37 ` Rob Landley @ 2013-01-15 0:24 ` Rich Felker 2013-01-15 12:17 ` Rob Landley 2013-01-15 9:01 ` Igmar Palsenberg 1 sibling, 1 reply; 39+ messages in thread From: Rich Felker @ 2013-01-15 0:24 UTC (permalink / raw) To: musl On Mon, Jan 14, 2013 at 05:37:29PM -0600, Rob Landley wrote: > On 01/14/2013 11:17:47 AM, Igmar Palsenberg wrote: > >Hi, > > > >Is there a (good) reason for Musl to follow glibc's malloc(0) > >behaviour ? > > Because not doing so breaks some programs? (Dunno if it still does, > but last time I tried switching that config option off in uClibc the > linux from scratch build didn't finish.) I don't think it breaks any programs; the ones that depend on malloc(0) returning nonzero seem to use either gnulib or their own hacks to wrap malloc if the system malloc does not have this behavior. However, the bug that originally prompted me to switch the behavior was that the detection could be broken. gnulib was compiling a tiny test program to check the behavior of malloc, and with musl, it got linked to __simple_malloc, which was giving the malloc(0)!=NULL behavior. The full program then got linked to real malloc (where malloc(0) was returning a null pointer), and horribly broke, reporting spurious allocation failures and aborting. At that time, I researched and discussed the topic and the conclusion reached was that malloc(0)!=NULL makes more sense. > >Musl returns a valid pointer, which is fine according > >to the standard, but returning NULL is also fine. > > > >IMHO, returning NULL is better : It usually kills the program if > >actual storage is > >attempted. > > It also kills the program if they're checking the return code of > malloc() and treating NULL as an allocation failure indicator. NULL > has a defined meaning of "failed", when a zero length allocation is > trivial to satisfy and not necessarily wrong. > > Should a zero length file read return -1 instead of 0? Is it the > program's job to make sure it never makes a NOP syscall? Does adding > special case checks in your code to avoid such NOP calls actually > make the code smaller and simpler? Indeed, another argument for malloc(0)!=NULL is to avoid having to write special-case code to handle it. Wrapping malloc with a function that replaces an argument of 0 with an argument of 1 is not just a hack to get "GNU behavior"; it's what you probably want to be doing anyway (even if there were no GNU to copy) if there's a chance you might pass 0 to malloc. It leads to the simplest code. Of course it would make sense to give this wrapper a proper name rather than using macro hacks to "replace" malloc like gnulib does... Or, they could use my convenient formula... #undef malloc #define malloc(n) malloc((n)|1) which seems a lot cheaper than a wrapper. It can be applied to realloc too, but might get expensive with calloc. > >You also can't do that if a valid pointer > >is returned, so I really can't grasp the reason for returning a > >pointer at all, > > Not indicating that the allocation failed and triggering an assert() > when there isn't actually a problem with a legitimately zero length > array that had nothing in it? (Both times I debugged why LFS stuff > was failing that's what it turned out to be, but I didn't spend too > much time on it before just switching the uClibc option on to > support it.) While in some cases it would be nice to get a fault, you don't usually get a fault when trying to access past the end of a length-1 array, so why should you expect one when trying to access past the end of a "length-0 array"? > >except to support buggy and lazy programming. > > You're defining "lazy" here a "not adding a special case in the > caller for every use of malloc()". That's certainly a point of view, > but I'm not sure that's the word you want to use for it. "Not > sufficiently defensive programming" maybe? Well, doing nothing to account for the fact that malloc(0) "failing" might not indicate a problematic OOM condition is "lazy" in my book. > And "posix doesn't require this, therefore let's intentional break > it to force any programs with technical posix compatability issues > to use a different libc rather than change anything to humor us"... > not seeing it. When POSIX allows implementation options but one option has clear technical merits over the others, as is the case here, I don't see any good reason to take the low-quality option. Especially if most existing implementations already take the high-quality one, as this suggests to me it's somewhat reasonable for otherwise-portable programs to assume the high-quality behavior. > (Re: the "posix test" thread, having a ./configure --pedantic that > builds a musl without gnu/dammit extensions and safety tape over the > sharp bits sounds like a possible fun future thing. Having that be > the default, or adding a lot of runtime checks rather than > commenting stuff out at compile time, not so much...) Sounds like a lot of work for something that could be better accomplished with static analysis tools and such... Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 0:24 ` Rich Felker @ 2013-01-15 12:17 ` Rob Landley 0 siblings, 0 replies; 39+ messages in thread From: Rob Landley @ 2013-01-15 12:17 UTC (permalink / raw) To: musl; +Cc: musl On 01/14/2013 06:24:42 PM, Rich Felker wrote: > > Not indicating that the allocation failed and triggering an assert() > > when there isn't actually a problem with a legitimately zero length > > array that had nothing in it? (Both times I debugged why LFS stuff > > was failing that's what it turned out to be, but I didn't spend too > > much time on it before just switching the uClibc option on to > > support it.) > > While in some cases it would be nice to get a fault, you don't usually > get a fault when trying to access past the end of a length-1 array, so > why should you expect one when trying to access past the end of a > "length-0 array"? Nobody accessed it. They were doing something handwavingly like: if (!(array=malloc(sizeof(struct blah)*len))) die(); for (i=0; i<len; i++) blah(); Which works just fine for len=0 because the for loop doesn't do anything... assuming malloc doesn't return 0 and trigger the allocation failure check. (But again, this was ~5 years ago. Haven't tried switching it off since.) > > >except to support buggy and lazy programming. > > > > You're defining "lazy" here a "not adding a special case in the > > caller for every use of malloc()". That's certainly a point of view, > > but I'm not sure that's the word you want to use for it. "Not > > sufficiently defensive programming" maybe? > > Well, doing nothing to account for the fact that malloc(0) "failing" > might not indicate a problematic OOM condition is "lazy" in my book. Maybe, but it means they're special casing 0, when otherwise it just works. (At least on eglibc, and getting people to care about portability off glibc remains a thing.) My point was that requiring them to special case this is not necessarily an improvement. Rob ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-14 23:37 ` Rob Landley 2013-01-15 0:24 ` Rich Felker @ 2013-01-15 9:01 ` Igmar Palsenberg 2013-01-15 12:58 ` Rob Landley 1 sibling, 1 reply; 39+ messages in thread From: Igmar Palsenberg @ 2013-01-15 9:01 UTC (permalink / raw) To: musl >> Is there a (good) reason for Musl to follow glibc's malloc(0) behaviour ? > > Because not doing so breaks some programs? (Dunno if it still does, but last time I tried switching that config option off in uClibc the linux from scratch build didn't finish.) It does. On the other hand : Those programs are broken anyway if you ask me. >> Musl returns a valid pointer, which is fine according >> to the standard, but returning NULL is also fine. >> IMHO, returning NULL is better : It usually kills the program if actual storage is >> attempted. > > It also kills the program if they're checking the return code of malloc() and treating NULL as an allocation failure indicator. NULL has a defined meaning of "failed", when a zero length allocation is trivial to satisfy and not necessarily wrong. Is that a bad thing in this case ? The reason I like musl is that is a conforming library : Not something that has numerous hacks / additions to make it backwards compatible, such I've seen with glibc. Usually, when I try the same code on for example FreeBSD, all hell breaks loose. In my experience, musl actually increase my programs portability. > Should a zero length file read return -1 instead of 0? Is it the program's job to make sure it never makes a NOP syscall? Does adding special case checks in your code to avoid such NOP calls actually make the code smaller and simpler? > >> You also can't do that if a valid pointer >> is returned, so I really can't grasp the reason for returning a pointer at all, > > Not indicating that the allocation failed and triggering an assert() when there isn't actually a problem with a legitimately zero length array that had nothing in it? (Both times I debugged why LFS stuff was failing that's what it turned out to be, but I didn't spend too much time on it before just switching the uClibc option on to support it.) With my own code, I consider it a bug. Yes, opinions differ on this matter. I personally add abort() to code that shouldn't happen, or I consider bad. I do catch real bugs with this from time to time. >> except to support buggy and lazy programming. > > You're defining "lazy" here a "not adding a special case in the caller for every use of malloc()". That's certainly a point of view, but I'm not sure that's the word you want to use for it. "Not sufficiently defensive programming" maybe? Definitely the latter. Error checking is underrated matter if you ask me : People write code, and then spend ages debugging issues that would have been a non-issue when proper error checking has been done. The "oh well, it never fails, so don't bother with error checking" thought. > Whether or not defensive programming is an improvement is one of those perpetual arguments having to do with differing mindsets. In reality the reason you'd want to do it is some libc implementations might _not_ allow malloc(0) to succeed, and thus there are portability issues. But only for people who care about portability off glibc, which is an uphill battle in the first place. Agree. I will stay a battle until the standard only gives you one direction, and "implementation defined" is removed from it. > That said, "I'm right because other people might think like I do and that would make me right" seems kind of a circular argument for a position being _better_. If posix required a valid pointer to be returned the positions would be reversed without any actual change in the technical merits. > > And "posix doesn't require this, therefore let's intentional break it to force any programs with technical posix compatability issues to use a different libc rather than change anything to humor us"... not seeing it. It's a matter of opinion. I just wanted to know the logic behind returning a pointer in this case, while for example the *BSD libc's return a NULL. I handle this case in my code properly, and bluntly abort() if my code calls malloc(0). That's good with my code, but might be bad with other people's code. >> I suggest we make malloc(0) return a NULL pointer. Any objections ? > > Only that it broke real world programs last time I tried it, but if you don't care about that then by all means force 'em to use a different libc. > > (Re: the "posix test" thread, having a ./configure --pedantic that builds a musl without gnu/dammit extensions and safety tape over the sharp bits sounds like a possible fun future thing. Having that be the default, or adding a lot of runtime checks rather than commenting stuff out at compile time, not so much...) I'm happy I can try a different libc easily. Downside : I now have some if defined(__GLIBC__) in my code :( I probably need some configure.ac magic. Igmar ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: malloc(0) behaviour 2013-01-15 9:01 ` Igmar Palsenberg @ 2013-01-15 12:58 ` Rob Landley 2013-01-15 14:54 ` dladdr() pierre 0 siblings, 1 reply; 39+ messages in thread From: Rob Landley @ 2013-01-15 12:58 UTC (permalink / raw) To: musl; +Cc: musl On 01/15/2013 03:01:06 AM, Igmar Palsenberg wrote: > Agree. I will stay a battle until the standard only gives you one > direction, and > "implementation defined" is removed from it. No, pragmatists will do the thing that makes more programs work and pedants will write code that is never used by anybody. If you consider you vs you a battle, have at. I'm going to stop reading this thread now. Rob ^ permalink raw reply [flat|nested] 39+ messages in thread
* dladdr() 2013-01-15 12:58 ` Rob Landley @ 2013-01-15 14:54 ` pierre 2013-01-15 18:48 ` dladdr() Rich Felker 0 siblings, 1 reply; 39+ messages in thread From: pierre @ 2013-01-15 14:54 UTC (permalink / raw) To: musl Latest musl-libc dladdr() fails here: // glibc: // gcc hello.c -o hello -ldl // // ret:1 // dli_fname:./hello:0x400000 // dli_sname:puts:0x4004c0 // musl: // musl-gcc hello.c -o hello // // ret:0 // dli_fname:(null):0 // dli_sname:(null):0 Dl_info dl = {0}; int ret = dladdr(puts + 0, &dl); printf("ret:%d\n" "dli_fname:%s:%p\n" "dli_sname:%s:%p\n\n", ret, dl.dli_fname, dl.dli_fbase, dl.dli_sname, dl.dli_saddr); What is missing to make this work? Pierre ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-15 14:54 ` dladdr() pierre @ 2013-01-15 18:48 ` Rich Felker 2013-01-16 11:00 ` dladdr() pierre 0 siblings, 1 reply; 39+ messages in thread From: Rich Felker @ 2013-01-15 18:48 UTC (permalink / raw) To: musl, pierre On Tue, Jan 15, 2013 at 03:54:44PM +0100, pierre wrote: > Latest musl-libc dladdr() fails here: > > // glibc: > // gcc hello.c -o hello -ldl > // > // ret:1 > // dli_fname:./hello:0x400000 > // dli_sname:puts:0x4004c0 > > // musl: > // musl-gcc hello.c -o hello > // > // ret:0 > // dli_fname:(null):0 > // dli_sname:(null):0 > > Dl_info dl = {0}; > int ret = dladdr(puts + 0, &dl); > > printf("ret:%d\n" > "dli_fname:%s:%p\n" > "dli_sname:%s:%p\n\n", > ret, > dl.dli_fname, dl.dli_fbase, > dl.dli_sname, dl.dli_saddr); > > What is missing to make this work? Are you using static or dynamic linking? If static, dladdr is just a stub that always fails. It could be implemented to work under some conditions, but it would be highly dependent on what options you compile the binary with, since by default static binaries do not contain the bloat that would be needed to perform introspection. Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-15 18:48 ` dladdr() Rich Felker @ 2013-01-16 11:00 ` pierre 2013-01-16 12:51 ` dladdr() Szabolcs Nagy 0 siblings, 1 reply; 39+ messages in thread From: pierre @ 2013-01-16 11:00 UTC (permalink / raw) To: musl, Rich Felker [-- Attachment #1: Type: text/plain, Size: 927 bytes --] Rich, > Are you using static or dynamic linking? Dynamic. Before posting here, I checked the musl source code, and the binary: 'objdump -z -d hello' gives: callq 4003e0 <dladdr@plt> Further, 'musl-gcc -static hello.c -o hello_s' weights 23KB while the shared binary above weights only 7KB. Finally, my example provided the compilation/linking cmd line, which, by default, is dynamic: // glibc: // gcc hello.c -o hello -ldl // musl: // musl-gcc hello.c -o hello I am attaching the source code linked with musl 0.9.8 so you can duplicate this issue without wasting time. Besides knowing how to enable dladdr() in dynamic mode, I am interested in knowing what other steps are needed beyond replacing the #ifdef SHARED by another custom define (#ifdef __FORCE_DL) for musl's dl implementation to do the job in static binaries (with exported symbols). Thank you for musl, a much-needed alternative. Pierre [-- Attachment #2: hello.c --] [-- Type: text/x-csrc, Size: 685 bytes --] // ------------------------------ // glibc: // gcc hello.c -o hello -ldl // // ret:1 // dli_fname:./hello:0x400000 // dli_sname:puts:0x4004c0 // ------------------------------ // musl: // musl-gcc hello.c -o hello // // ret:0 // dli_fname:(null):0 // dli_sname:(null):0 // ------------------------------ #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <dlfcn.h> int main(void) { puts("\nhello world!\n"); Dl_info dl = {0}; int ret = dladdr(puts + 0, &dl); printf("ret:%d\n" "dli_fname:%s:%p\n" "dli_sname:%s:%p\n\n", ret, dl.dli_fname, dl.dli_fbase, dl.dli_sname, dl.dli_saddr); return 0; } ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-16 11:00 ` dladdr() pierre @ 2013-01-16 12:51 ` Szabolcs Nagy 2013-01-16 14:24 ` dladdr() musl 0 siblings, 1 reply; 39+ messages in thread From: Szabolcs Nagy @ 2013-01-16 12:51 UTC (permalink / raw) To: musl, pierre * pierre <pierre@silentlife.com> [2013-01-16 12:00:18 +0100]: > > // glibc: > // gcc hello.c -o hello -ldl > > // musl: > // musl-gcc hello.c -o hello > > I am attaching the source code linked with musl 0.9.8 > so you can duplicate this issue without wasting time. > > Besides knowing how to enable dladdr() in dynamic mode, > I am interested in knowing what other steps are needed > beyond replacing the #ifdef SHARED by another custom > define (#ifdef __FORCE_DL) for musl's dl implementation > to do the job in static binaries (with exported symbols). > there was a thread about dl* in static binaries, it is not yet supported and non-trivial to get right i linked to the most interesting posts at the bottom: http://port70.net/~nsz/32_dynlink.html i ran hello in gdb and it seems sym->st_shndx==0 for the "puts" symbol, i dont know the semantics of st_shndx, but the following patch makes hello.c work: diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c index 935367e..ba0bd8f 100644 --- a/src/ldso/dynlink.c +++ b/src/ldso/dynlink.c @@ -1178,7 +1178,7 @@ int __dladdr(void *addr, Dl_info *info) } for (; nsym; nsym--, sym++) { - if (sym->st_shndx && sym->st_value + if (/*sym->st_shndx &&*/ sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES) && (1<<(sym->st_info>>4) & OK_BINDS)) { void *symaddr = p->base + sym->st_value; ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-16 12:51 ` dladdr() Szabolcs Nagy @ 2013-01-16 14:24 ` musl 2013-01-16 15:20 ` dladdr() pierre 2013-01-16 16:49 ` dladdr() Rich Felker 0 siblings, 2 replies; 39+ messages in thread From: musl @ 2013-01-16 14:24 UTC (permalink / raw) To: musl On 16/01/2013 13:51, Szabolcs Nagy wrote: > * pierre <pierre@silentlife.com> [2013-01-16 12:00:18 +0100]: >> >> // glibc: >> // gcc hello.c -o hello -ldl >> >> // musl: >> // musl-gcc hello.c -o hello >> >> I am attaching the source code linked with musl 0.9.8 >> so you can duplicate this issue without wasting time. >> >> Besides knowing how to enable dladdr() in dynamic mode, >> I am interested in knowing what other steps are needed >> beyond replacing the #ifdef SHARED by another custom >> define (#ifdef __FORCE_DL) for musl's dl implementation >> to do the job in static binaries (with exported symbols). >> > > there was a thread about dl* in static binaries, it is > not yet supported and non-trivial to get right > > i linked to the most interesting posts at the bottom: > http://port70.net/~nsz/32_dynlink.html > > i ran hello in gdb and it seems > sym->st_shndx==0 > for the "puts" symbol, i dont know the semantics of > st_shndx, but the following patch makes hello.c work: > > diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c > index 935367e..ba0bd8f 100644 > --- a/src/ldso/dynlink.c > +++ b/src/ldso/dynlink.c > @@ -1178,7 +1178,7 @@ int __dladdr(void *addr, Dl_info *info) > } > > for (; nsym; nsym--, sym++) { > - if (sym->st_shndx && sym->st_value > + if (/*sym->st_shndx &&*/ sym->st_value > && (1<<(sym->st_info&0xf) & OK_TYPES) > && (1<<(sym->st_info>>4) & OK_BINDS)) { > void *symaddr = p->base + sym->st_value; > You're right, sym->st_shndx only tells if the symbol is external (resolved during relocation process) or internal (defined in the current shared object). In this example "puts" is part of libc.so and not hello. We should remove this check. BTW dli_fbase is still wrong. It should be info->dli_fbase = p->map; and not info->dli_fbase = p->base; Best Regards, Boris ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-16 14:24 ` dladdr() musl @ 2013-01-16 15:20 ` pierre 2013-01-16 16:49 ` dladdr() Rich Felker 1 sibling, 0 replies; 39+ messages in thread From: pierre @ 2013-01-16 15:20 UTC (permalink / raw) To: musl Thank you for the relevant reply, Boris and Nagy. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-16 14:24 ` dladdr() musl 2013-01-16 15:20 ` dladdr() pierre @ 2013-01-16 16:49 ` Rich Felker 2013-01-16 17:42 ` dladdr() musl 1 sibling, 1 reply; 39+ messages in thread From: Rich Felker @ 2013-01-16 16:49 UTC (permalink / raw) To: musl On Wed, Jan 16, 2013 at 03:24:24PM +0100, musl wrote: > > i ran hello in gdb and it seems > > sym->st_shndx==0 > > for the "puts" symbol, i dont know the semantics of > > st_shndx, but the following patch makes hello.c work: > > > > diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c > > index 935367e..ba0bd8f 100644 > > --- a/src/ldso/dynlink.c > > +++ b/src/ldso/dynlink.c > > @@ -1178,7 +1178,7 @@ int __dladdr(void *addr, Dl_info *info) > > } > > > > for (; nsym; nsym--, sym++) { > > - if (sym->st_shndx && sym->st_value > > + if (/*sym->st_shndx &&*/ sym->st_value > > && (1<<(sym->st_info&0xf) & OK_TYPES) > > && (1<<(sym->st_info>>4) & OK_BINDS)) { > > void *symaddr = p->base + sym->st_value; > > > > You're right, sym->st_shndx only tells if the symbol is external > (resolved during relocation process) or internal (defined in the > current shared object). st_shndx==0 is used for PLT entries. Oddly, I get (for hello.c): Symbol table '.dynsym' contains 9 entries: Num: Value Size Type Bind Vis Ndx Name 0: 00000000 0 NOTYPE LOCAL DEFAULT UND 1: 00000000 0 FUNC GLOBAL DEFAULT UND printf 2: 080482c0 0 FUNC GLOBAL DEFAULT UND puts 3: 00000000 0 FUNC GLOBAL DEFAULT UND dladdr So the symbol value (the address of the PLT entry) seems to be filled in only for symbols which are also used to take the address of a function. Other symbols in the PLT (which are used only for resolving the PLT entry at load time) lack addresses. I suspect this could prevent correct dladdr behavior in some cases, but I can't think of a serious usage case that would be broken right off. Anyway, I'm applying nsz's patch (without the commented code :) which should make the situation much better (hopefully as good as glibc's). > BTW dli_fbase is still wrong. It should be > > info->dli_fbase = p->map; > > and not > > info->dli_fbase = p->base; I notice it disagrees with glibc, but I'm not sure I agree it's wrong. The man page states that dli_fbase is the "Address at which shared object is loaded", which is never clearly defined. On the other hand, dl_iterate_phdr's dlpi_addr is specified to contain the "Base address of object", which is defined below as "the difference between the virtual memory address of the shared object and the offset of that object in the file from which it was loaded". To me, this seems like the main/only "load address" that would be of interest to a program. However, there's also an interest in matching historical practice, especially since dladdr is not a standard function and the existing implementations could be seen as the "real" specification. Do you know what other systems like BSD do? Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-16 16:49 ` dladdr() Rich Felker @ 2013-01-16 17:42 ` musl 2013-01-21 2:03 ` dladdr() Rich Felker 0 siblings, 1 reply; 39+ messages in thread From: musl @ 2013-01-16 17:42 UTC (permalink / raw) To: musl On 16/01/2013 17:49, Rich Felker wrote: > On Wed, Jan 16, 2013 at 03:24:24PM +0100, musl wrote: >>> i ran hello in gdb and it seems >>> sym->st_shndx==0 >>> for the "puts" symbol, i dont know the semantics of >>> st_shndx, but the following patch makes hello.c work: >>> >>> diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c >>> index 935367e..ba0bd8f 100644 >>> --- a/src/ldso/dynlink.c >>> +++ b/src/ldso/dynlink.c >>> @@ -1178,7 +1178,7 @@ int __dladdr(void *addr, Dl_info *info) >>> } >>> >>> for (; nsym; nsym--, sym++) { >>> - if (sym->st_shndx && sym->st_value >>> + if (/*sym->st_shndx &&*/ sym->st_value >>> && (1<<(sym->st_info&0xf) & OK_TYPES) >>> && (1<<(sym->st_info>>4) & OK_BINDS)) { >>> void *symaddr = p->base + sym->st_value; >>> >> >> You're right, sym->st_shndx only tells if the symbol is external >> (resolved during relocation process) or internal (defined in the >> current shared object). > > st_shndx==0 is used for PLT entries. Oddly, I get (for hello.c): > > Symbol table '.dynsym' contains 9 entries: > Num: Value Size Type Bind Vis Ndx Name > 0: 00000000 0 NOTYPE LOCAL DEFAULT UND > 1: 00000000 0 FUNC GLOBAL DEFAULT UND printf > 2: 080482c0 0 FUNC GLOBAL DEFAULT UND puts > 3: 00000000 0 FUNC GLOBAL DEFAULT UND dladdr > > So the symbol value (the address of the PLT entry) seems to be filled > in only for symbols which are also used to take the address of a > function. Other symbols in the PLT (which are used only for resolving > the PLT entry at load time) lack addresses. I suspect this could > prevent correct dladdr behavior in some cases, but I can't think of a > serious usage case that would be broken right off. > > Anyway, I'm applying nsz's patch (without the commented code :) which > should make the situation much better (hopefully as good as glibc's). > >> BTW dli_fbase is still wrong. It should be >> >> info->dli_fbase = p->map; >> >> and not >> >> info->dli_fbase = p->base; > > I notice it disagrees with glibc, but I'm not sure I agree it's wrong. > The man page states that dli_fbase is the "Address at which shared > object is loaded", which is never clearly defined. On the other hand, > dl_iterate_phdr's dlpi_addr is specified to contain the "Base address > of object", which is defined below as "the difference between the > virtual memory address of the shared object and the offset of that > object in the file from which it was loaded". To me, this seems like > the main/only "load address" that would be of interest to a program. > However, there's also an interest in matching historical practice, > especially since dladdr is not a standard function and the existing > implementations could be seen as the "real" specification. Do you know > what other systems like BSD do? freebsd dladdr man pages defines the dli_fbase field as : "The base address at which the shared object is mapped into the address space of the calling process" see: http://www.unix.com/man-page/FreeBSD/3/dladdr/ > > Rich > ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-16 17:42 ` dladdr() musl @ 2013-01-21 2:03 ` Rich Felker 2013-01-21 6:58 ` dladdr() pierre 0 siblings, 1 reply; 39+ messages in thread From: Rich Felker @ 2013-01-21 2:03 UTC (permalink / raw) To: musl On Wed, Jan 16, 2013 at 06:42:48PM +0100, musl wrote: > > implementations could be seen as the "real" specification. Do you know > > what other systems like BSD do? > > > freebsd dladdr man pages defines the dli_fbase field as : > "The base address at which the shared object is > mapped into the address space of the calling > process" > > see: > http://www.unix.com/man-page/FreeBSD/3/dladdr/ Do you know what they _do_, though? Is the fbase address of the main program 0 or the lowest virtual address of the mapping? BTW is dladdr otherwise working correctly for you now? Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-21 2:03 ` dladdr() Rich Felker @ 2013-01-21 6:58 ` pierre 2013-01-21 18:35 ` dladdr() Rich Felker 0 siblings, 1 reply; 39+ messages in thread From: pierre @ 2013-01-21 6:58 UTC (permalink / raw) To: musl > is dladdr otherwise working correctly for you now? In the simple example I provided, it always works. I have added loaded modules with conflicting symbols and dladdr() also works as expected. But for my main application (which populates backtraces with dladdr), musl dladdr() works in DEBUG mode and crashes in RELEASE mode. I did not have time to isolate the problem in a reproducible code snippet but I will let you know when I have done that. Knowing that dladdr() crashes should help you anyway: it might be related to searching addresses that are out of the specified module scope (stack smashing?), or to the ability to be resilient to introspecting itself (hence the success of the musl DEBUG code, as well as of the bulky GLIBC). I don't use BSD so I can't tell what they _do_. I appreciate musl and intend to contribute, especially in the areas where my application is using libc in cornersome areas. I also find the idea of a musl-based Linux distro rather attractive as it will immensely reduce the surface of uncertainty. Pierre ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-21 6:58 ` dladdr() pierre @ 2013-01-21 18:35 ` Rich Felker 2013-01-22 6:27 ` dladdr() pierre 0 siblings, 1 reply; 39+ messages in thread From: Rich Felker @ 2013-01-21 18:35 UTC (permalink / raw) To: musl, pierre On Mon, Jan 21, 2013 at 07:58:20AM +0100, pierre wrote: > > > is dladdr otherwise working correctly for you now? > > In the simple example I provided, it always works. > I have added loaded modules with conflicting symbols > and dladdr() also works as expected. > > But for my main application (which populates backtraces > with dladdr), musl dladdr() works in DEBUG mode and > crashes in RELEASE mode. I did not have time to isolate > the problem in a reproducible code snippet but I will > let you know when I have done that. The only non-static memory access dladdr makes are: 1. Reading the map ranges for each loaded DSO from the DSO list. 2. Reading the hash table pointers for the matching DSO (if one was found), and: 2a. If there's a sysv hash table, just reading the number of symbols directly from it. 2b. If there's only a gnu hash table, reading the gnu hash table and counting the number of symbols. 3. Reading the locations of the symbol and string tables from the DSO record. 4. Reading each symbol table entry to find the best match for the argument. 5. Storing results to the caller-provided Dl_info buffer. I don't see how any of these could crash without memory already having been corrupted. Do you have a test case I could run, or have you tried building libc with -g and examining the crash in gdb to see where it happens? Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-21 18:35 ` dladdr() Rich Felker @ 2013-01-22 6:27 ` pierre 2013-01-22 13:07 ` dladdr() Szabolcs Nagy 0 siblings, 1 reply; 39+ messages in thread From: pierre @ 2013-01-22 6:27 UTC (permalink / raw) To: musl, Rich Felker Rich, > I don't see how any of these could crash without > memory already having been corrupted. That's what I wrote previously: my code populates backtraces which could have been damaged by stack smashing. Backtraces are used to identify the cause of a crash, which, by definition, may not be clean. The information I wanted you guys to have is that, for the same application code, both GLIBC and musl compiled in DEBUG mode do not crash (while musl in release mode crashes). > Do you have a test case I could run, In my previous email (the one you are answering) I wrote: "I did not have time to isolate the problem in a reproducible code snippet but I will let you know when I have done that." > have you tried building libc with -g and examining > the crash in gdb to see where it happens? In my previous email (the one you are answering) I wrote: "for my main application (which populates backtraces with dladdr), musl dladdr() works in DEBUG mode and crashes in RELEASE mode." The musl "DEBUG mode" tested here used -g2 and it did not crash. I am not trying to argue with you - there's no point at doing that - I was merely providing the information I had available when you asked "does it work for you?". Pierre ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-22 6:27 ` dladdr() pierre @ 2013-01-22 13:07 ` Szabolcs Nagy 2013-01-22 13:40 ` dladdr() pierre 0 siblings, 1 reply; 39+ messages in thread From: Szabolcs Nagy @ 2013-01-22 13:07 UTC (permalink / raw) To: musl, pierre * pierre <pierre@silentlife.com> [2013-01-22 07:27:46 +0100]: > The information I wanted you guys to have is that, > for the same application code, both GLIBC and musl > compiled in DEBUG mode do not crash (while musl in > release mode crashes). ... > at doing that - I was merely providing the information > I had available when you asked "does it work for you?". i'd assume you invoke ub somewhere (and i think that's what dalias was trying to say) but it is desirable to verify that this is really not a musl bug and we need better info for that: instead of 'debug mode' vs 'release mode' the config.mak with compiler flags/architecture/.. for the two cases instead of 'crash' the details of the caught signal, backtrace, memory map etc a link to the source code or a minimal example can help too i'm not sure how populating backtrace with dladdr works, sounds suspicious, how do you get your backtrace? ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-22 13:07 ` dladdr() Szabolcs Nagy @ 2013-01-22 13:40 ` pierre 2013-01-22 13:51 ` dladdr() Rich Felker 0 siblings, 1 reply; 39+ messages in thread From: pierre @ 2013-01-22 13:40 UTC (permalink / raw) To: musl, Szabolcs Nagy Nagy, > instead of 'debug mode' vs 'release mode' the > config.mak with compiler flags/architecture/.. > for the two cases > instead of 'crash' the details of the caught > signal, backtrace, memory map etc > a link to the source code or a minimal example > can help too Since I already said that I would provide this information later (I also have a job) I wonder what the purpose of these repeated calls is. PLEASE READ THE WHOLE THREAT before adding your grain of salt. > i'm not sure how populating backtrace with > dladdr works By finding the symbols that match each IP (Instruction Pointer) address. > sounds suspicious I am impressed by the value of this argument. > how do you get your backtrace? By following stack frames. I am curious to learn what better way you will find less "suspicious". Pierre ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-22 13:40 ` dladdr() pierre @ 2013-01-22 13:51 ` Rich Felker 2013-01-22 14:59 ` dladdr() pierre 0 siblings, 1 reply; 39+ messages in thread From: Rich Felker @ 2013-01-22 13:51 UTC (permalink / raw) To: musl On Tue, Jan 22, 2013 at 02:40:44PM +0100, pierre wrote: > Nagy, > > > instead of 'debug mode' vs 'release mode' the > > config.mak with compiler flags/architecture/.. > > for the two cases > > instead of 'crash' the details of the caught > > signal, backtrace, memory map etc > > a link to the source code or a minimal example > > can help too > > Since I already said that I would provide this > information later (I also have a job) I wonder > what the purpose of these repeated calls is. > > PLEASE READ THE WHOLE THREAT before adding your > grain of salt. Apparently one or more of us missed a few details in the previous emails when replying, but neither of us is out to attack you or anything like that. I just wanted to understand whether there was some detail I was missing that might be a bug in musl, or whether the crash was caused by things outside our control. > > i'm not sure how populating backtrace with > > dladdr works > > By finding the symbols that match each IP > (Instruction Pointer) address. > > > sounds suspicious > > I am impressed by the value of this argument. > > > how do you get your backtrace? > > By following stack frames. I am curious to learn > what better way you will find less "suspicious". I think his comment was that, in the event the stack has been smashed, following the stack frames is likely (almost certain) to lead you to destinations Also, depending on what you're doing, it might not even be valid. Since DWARF2 is used for debugging nowadays, musl is normally built with -fomit-frame-pointer, so there won't be frame pointers to help you follow the stack frames out of libc functions, in case the crash happened inside a libc function. There also won't be DWARF2 information except in debug builds, and even if it is there, it's normally (unless you change CFLAGS) just available to the debugger rather than mapped into the program for introspection purposes. Anyway, my purpose, and I believe nsz's purpose, in asking questions was so that we could both (1) better evaluate whether there might be a bug in musl related to this report, and (2) give you better information about musl that might help you determine why your code isn't doing what you want it to. Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-22 13:51 ` dladdr() Rich Felker @ 2013-01-22 14:59 ` pierre 2013-01-22 16:11 ` dladdr() Szabolcs Nagy 0 siblings, 1 reply; 39+ messages in thread From: pierre @ 2013-01-22 14:59 UTC (permalink / raw) To: musl Rich, > in the event the stack has been smashed, > following the stack frames is likely > (almost certain) to lead you to [bad] > destinations [...] that might not even > be valid That's precisely what I wrote in that email that nobody seems to have read. And I added, just in case it could help, that dladdr should not crash when trying to lookup invalid addresses. > there won't be frame pointers to help you > follow the stack frames out of libc functions For mere mortals, knowing that it's in libc is enough, as they then will check what junk was given to libc (and then will write a workaround if they feel that libc is buggy). For those with a stronger motivation, helping to strengthen a decent libc makes a lot more sense. I have no other motivation when I invest some of my time here. Pierre. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-22 14:59 ` dladdr() pierre @ 2013-01-22 16:11 ` Szabolcs Nagy 2013-01-22 23:43 ` dladdr() Rich Felker 0 siblings, 1 reply; 39+ messages in thread From: Szabolcs Nagy @ 2013-01-22 16:11 UTC (permalink / raw) To: musl, pierre * pierre <pierre@silentlife.com> [2013-01-22 15:59:46 +0100]: > > in the event the stack has been smashed, > > following the stack frames is likely > > (almost certain) to lead you to [bad] > > destinations [...] that might not even > > be valid > > That's precisely what I wrote in that email > that nobody seems to have read. > i can assure you that at least i read your mail we know that you think that the stack is smashed but you found it important to mention that in debug mode it works while it crashes in release mode (whatever those modes mean) what you do is suspicious because if you get the backtrace in a running program then it's most likely invalid (and i thought we could help you before you spend too much time on inherently invalid design and even give an explanation why it worked in debug mode) i dont think anyone tried to argue with you or offend you in any way > And I added, just in case it could help, that > dladdr should not crash when trying to lookup > invalid addresses. addr is never dereferenced in musl's code so invalid addr is not an issue > > there won't be frame pointers to help you > > follow the stack frames out of libc functions > > For mere mortals, knowing that it's in libc > is enough, as they then will check what junk > was given to libc (and then will write a > workaround if they feel that libc is buggy). > > For those with a stronger motivation, helping > to strengthen a decent libc makes a lot more > sense. > > I have no other motivation when I invest some > of my time here. i honestly dont understand what you are trying to say but i'm glad that you found musl useful whatever you are doing ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: dladdr() 2013-01-22 16:11 ` dladdr() Szabolcs Nagy @ 2013-01-22 23:43 ` Rich Felker 0 siblings, 0 replies; 39+ messages in thread From: Rich Felker @ 2013-01-22 23:43 UTC (permalink / raw) To: musl On Tue, Jan 22, 2013 at 05:11:42PM +0100, Szabolcs Nagy wrote: > * pierre <pierre@silentlife.com> [2013-01-22 15:59:46 +0100]: > > > in the event the stack has been smashed, > > > following the stack frames is likely > > > (almost certain) to lead you to [bad] > > > destinations [...] that might not even > > > be valid > > > > That's precisely what I wrote in that email > > that nobody seems to have read. > > > > i can assure you that at least i read your mail > > we know that you think that the stack is smashed > but you found it important to mention that in debug > mode it works while it crashes in release mode > (whatever those modes mean) If the crash is occurring in dladdr, more than just the stack must be smashed, since dladdr does not access any pointers that would be located on the stack. As far as I can tell, in order for it to crash, the linked list of loaded DSOs must be corrupted. It's possible that this is more likely with musl than with glibc because DSO records typically will get allocated in the otherwise-unused slack space at the beginning/end of DSO .data segments, where overflows of static objects could run into the DSO records. > > And I added, just in case it could help, that > > dladdr should not crash when trying to lookup > > invalid addresses. > > addr is never dereferenced in musl's code > so invalid addr is not an issue Indeed, that's the question I was trying to check on to make sure we weren't doing something wrong. Rich ^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2013-01-22 23:43 UTC | newest] Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-01-14 17:17 malloc(0) behaviour Igmar Palsenberg 2013-01-14 18:05 ` Rich Felker 2013-01-14 22:22 ` Strake 2013-01-14 23:05 ` Rich Felker 2013-01-15 8:32 ` Igmar Palsenberg 2013-01-15 12:53 ` Rob Landley 2013-01-15 22:18 ` Igmar Palsenberg 2013-01-15 8:31 ` Igmar Palsenberg 2013-01-15 11:06 ` Szabolcs Nagy 2013-01-15 12:33 ` Igmar Palsenberg 2013-01-15 13:48 ` Rich Felker 2013-01-15 22:28 ` Igmar Palsenberg 2013-01-15 23:22 ` Rob 2013-01-16 7:46 ` Igmar Palsenberg 2013-01-15 13:46 ` Rich Felker 2013-01-15 12:52 ` Rob Landley 2013-01-14 23:37 ` Rob Landley 2013-01-15 0:24 ` Rich Felker 2013-01-15 12:17 ` Rob Landley 2013-01-15 9:01 ` Igmar Palsenberg 2013-01-15 12:58 ` Rob Landley 2013-01-15 14:54 ` dladdr() pierre 2013-01-15 18:48 ` dladdr() Rich Felker 2013-01-16 11:00 ` dladdr() pierre 2013-01-16 12:51 ` dladdr() Szabolcs Nagy 2013-01-16 14:24 ` dladdr() musl 2013-01-16 15:20 ` dladdr() pierre 2013-01-16 16:49 ` dladdr() Rich Felker 2013-01-16 17:42 ` dladdr() musl 2013-01-21 2:03 ` dladdr() Rich Felker 2013-01-21 6:58 ` dladdr() pierre 2013-01-21 18:35 ` dladdr() Rich Felker 2013-01-22 6:27 ` dladdr() pierre 2013-01-22 13:07 ` dladdr() Szabolcs Nagy 2013-01-22 13:40 ` dladdr() pierre 2013-01-22 13:51 ` dladdr() Rich Felker 2013-01-22 14:59 ` dladdr() pierre 2013-01-22 16:11 ` dladdr() Szabolcs Nagy 2013-01-22 23:43 ` dladdr() Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).