malloc(0) behaviour

mailing list of musl libc
 help / color / mirror / code / Atom feed

* malloc(0) behaviour
@ 2013-01-14 17:17 Igmar Palsenberg
  2013-01-14 18:05 ` Rich Felker
  2013-01-14 23:37 ` Rob Landley
  0 siblings, 2 replies; 39+ messages in thread
From: Igmar Palsenberg @ 2013-01-14 17:17 UTC (permalink / raw)
  To: musl

Hi,

Is there a (good) reason for Musl to follow glibc's malloc(0) behaviour ? Musl returns a valid pointer, which is fine according
to the standard, but returning NULL is also fine.

IMHO, returning NULL is better : It usually kills the program if actual storage is attempted. You also can't do that if a valid pointer
is returned, so I really can't grasp the reason for returning a pointer at all, except to support buggy and lazy programming.

I suggest we make malloc(0) return a NULL pointer. Any objections ?

Regards,

	Igmar

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-14 17:17 malloc(0) behaviour Igmar Palsenberg
@ 2013-01-14 18:05 ` Rich Felker
  2013-01-14 22:22   ` Strake
  2013-01-15  8:31   ` Igmar Palsenberg
  2013-01-14 23:37 ` Rob Landley
  1 sibling, 2 replies; 39+ messages in thread
From: Rich Felker @ 2013-01-14 18:05 UTC (permalink / raw)
  To: musl

On Mon, Jan 14, 2013 at 06:17:47PM +0100, Igmar Palsenberg wrote:
> Hi,
> 
> Is there a (good) reason for Musl to follow glibc's malloc(0)
> behaviour ? Musl returns a valid pointer, which is fine according to
> the standard, but returning NULL is also fine.

Yes, there are many good reasons. The most obvious (but stupid) one is
that a huge number of programs will "replace" malloc with one where
malloc(0) returns something other than a null pointer if the system's
malloc(0) returns null, and this adds both bloat and risk of
bugs/breakage from the replacement. But there are other much more
fundamental reasons too. Basically they all come down to interactions
between the requirements of malloc and realloc, and the fact that
returning a null pointer from realloc means failure (and thus that the
original object was not freed). I don't want to try to remember all
the details at the moment (they're enough to make your brain hurt..)
but here are some links to get you started on the issue:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm
http://austingroupbugs.net/view.php?id=400
http://sourceware.org/bugzilla/show_bug.cgi?id=12547

Rich

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-14 18:05 ` Rich Felker
@ 2013-01-14 22:22   ` Strake
  2013-01-14 23:05     ` Rich Felker
  2013-01-15  8:32     ` Igmar Palsenberg
  2013-01-15  8:31   ` Igmar Palsenberg
  1 sibling, 2 replies; 39+ messages in thread
From: Strake @ 2013-01-14 22:22 UTC (permalink / raw)
  To: musl

On 14/01/2013, Rich Felker <dalias@aerifal.cx> wrote:
> Yes, there are many good reasons. The most obvious (but stupid) one is
> that a huge number of programs will "replace" malloc with one where
> malloc(0) returns something other than a null pointer if the system's
> malloc(0) returns null, and this adds both bloat and risk of
> bugs/breakage from the replacement. But there are other much more
> fundamental reasons too. Basically they all come down to interactions
> between the requirements of malloc and realloc, and the fact that
> returning a null pointer from realloc means failure (and thus that the
> original object was not freed).

Another: Null means allocation failure. As malloc ought to never fail
to find zero bytes free, it thus makes sense to return a non-null
pointer.

Cheers,
Strake


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-14 22:22   ` Strake
@ 2013-01-14 23:05     ` Rich Felker
  2013-01-15  8:32     ` Igmar Palsenberg
  1 sibling, 0 replies; 39+ messages in thread
From: Rich Felker @ 2013-01-14 23:05 UTC (permalink / raw)
  To: musl

On Mon, Jan 14, 2013 at 05:22:16PM -0500, Strake wrote:
> On 14/01/2013, Rich Felker <dalias@aerifal.cx> wrote:
> > Yes, there are many good reasons. The most obvious (but stupid) one is
> > that a huge number of programs will "replace" malloc with one where
> > malloc(0) returns something other than a null pointer if the system's
> > malloc(0) returns null, and this adds both bloat and risk of
> > bugs/breakage from the replacement. But there are other much more
> > fundamental reasons too. Basically they all come down to interactions
> > between the requirements of malloc and realloc, and the fact that
> > returning a null pointer from realloc means failure (and thus that the
> > original object was not freed).
> 
> Another: Null means allocation failure. As malloc ought to never fail
> to find zero bytes free, it thus makes sense to return a non-null
> pointer.

Yes. In any case, I don't see why a correct program would care what
malloc(0) returns, since passing 0 to malloc just makes it harder to
write a correct program. Normally after calling malloc, you check the
return value, and handle it as an error if the return value is a null
pointer. But if your program might pass 0 to malloc, you have to also
consider the possibility that the null pointer was returned not as an
error but as the "success" result for malloc(0). This greatly
complicates your error handling; now you have to either clear errno
first and check whether it's set after the call to malloc to determine
whether allocation failed, or check for size==0 in the failure case
and treat that specially as a sort of success.

Things get even worse if you use realloc(p,0), especially if you want
to support non-conforming implementations like glibc...

Rich

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-14 22:22   ` Strake
  2013-01-14 23:05     ` Rich Felker
@ 2013-01-15  8:32     ` Igmar Palsenberg
  2013-01-15 12:53       ` Rob Landley
  1 sibling, 1 reply; 39+ messages in thread
From: Igmar Palsenberg @ 2013-01-15  8:32 UTC (permalink / raw)
  To: musl



> On 14/01/2013, Rich Felker <dalias@aerifal.cx> wrote:
>> Yes, there are many good reasons. The most obvious (but stupid) one is
>> that a huge number of programs will "replace" malloc with one where
>> malloc(0) returns something other than a null pointer if the system's
>> malloc(0) returns null, and this adds both bloat and risk of
>> bugs/breakage from the replacement. But there are other much more
>> fundamental reasons too. Basically they all come down to interactions
>> between the requirements of malloc and realloc, and the fact that
>> returning a null pointer from realloc means failure (and thus that the
>> original object was not freed).
> 
> Another: Null means allocation failure. As malloc ought to never fail
> to find zero bytes free, it thus makes sense to return a non-null
> pointer.

A valid pointer also means you should be able to store something. With malloc(0), you can't.
If you ask me, abort() would be a same thing to do. And no, I don't expect a decent libc to actually do
this :-)



	Igmar

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15  8:32     ` Igmar Palsenberg
@ 2013-01-15 12:53       ` Rob Landley
  2013-01-15 22:18         ` Igmar Palsenberg
  0 siblings, 1 reply; 39+ messages in thread
From: Rob Landley @ 2013-01-15 12:53 UTC (permalink / raw)
  To: musl; +Cc: musl

On 01/15/2013 02:32:55 AM, Igmar Palsenberg wrote:

> > On 14/01/2013, Rich Felker <dalias@aerifal.cx> wrote:
> > Another: Null means allocation failure. As malloc ought to never  
> fail
> > to find zero bytes free, it thus makes sense to return a non-null
> > pointer.
> 
> A valid pointer also means you should be able to store something.

Says who? Fire up the mmap() man page and look at PROT_NONE.

Rob

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15 12:53       ` Rob Landley
@ 2013-01-15 22:18         ` Igmar Palsenberg
  0 siblings, 0 replies; 39+ messages in thread
From: Igmar Palsenberg @ 2013-01-15 22:18 UTC (permalink / raw)
  To: musl



>> > On 14/01/2013, Rich Felker <dalias@aerifal.cx> wrote:
>> > Another: Null means allocation failure. As malloc ought to never fail
>> > to find zero bytes free, it thus makes sense to return a non-null
>> > pointer.
>> A valid pointer also means you should be able to store something.
> 
> Says who? Fire up the mmap() man page and look at PROT_NONE.

malloc() doesn't handle that case. I also find it irrelevant in this case. You can only pass it to free() legally,
everything else is UB.




	Igmar

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-14 18:05 ` Rich Felker
  2013-01-14 22:22   ` Strake
@ 2013-01-15  8:31   ` Igmar Palsenberg
  2013-01-15 11:06     ` Szabolcs Nagy
  2013-01-15 12:52     ` Rob Landley
  1 sibling, 2 replies; 39+ messages in thread
From: Igmar Palsenberg @ 2013-01-15  8:31 UTC (permalink / raw)
  To: musl



>> Is there a (good) reason for Musl to follow glibc's malloc(0)
>> behaviour ? Musl returns a valid pointer, which is fine according to
>> the standard, but returning NULL is also fine.
> 
> Yes, there are many good reasons. The most obvious (but stupid) one is
> that a huge number of programs will "replace" malloc with one where
> malloc(0) returns something other than a null pointer if the system's
> malloc(0) returns null, and this adds both bloat and risk of
> bugs/breakage from the replacement. But there are other much more
> fundamental reasons too. Basically they all come down to interactions
> between the requirements of malloc and realloc, and the fact that
> returning a null pointer from realloc means failure (and thus that the
> original object was not freed). I don't want to try to remember all
> the details at the moment (they're enough to make your brain hurt..)
> but here are some links to get you started on the issue:
> 
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm
> http://austingroupbugs.net/view.php?id=400
> http://sourceware.org/bugzilla/show_bug.cgi?id=12547

While the above is clear to me, returning a pointer that can't hold anything just seems wrong to me.
It's also a matter of promoting bad code : Doing a malloc(0) is simply a bug. People are just to lazy to check return values,
and this makes the loop 3 lines shorter.

I'll wrap malloc() to include an abort in my case :)



	Igmar

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15  8:31   ` Igmar Palsenberg
@ 2013-01-15 11:06     ` Szabolcs Nagy
  2013-01-15 12:33       ` Igmar Palsenberg
  2013-01-15 13:46       ` Rich Felker
  2013-01-15 12:52     ` Rob Landley
  1 sibling, 2 replies; 39+ messages in thread
From: Szabolcs Nagy @ 2013-01-15 11:06 UTC (permalink / raw)
  To: musl

* Igmar Palsenberg <igmar@palsenberg.com> [2013-01-15 09:31:24 +0100]:
> > fundamental reasons too. Basically they all come down to interactions
> > between the requirements of malloc and realloc, and the fact that
> > returning a null pointer from realloc means failure (and thus that the
> > 
> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm
> > http://austingroupbugs.net/view.php?id=400
> > http://sourceware.org/bugzilla/show_bug.cgi?id=12547
> 
> While the above is clear to me, returning a pointer that can't hold anything just seems wrong to me.

i don't think we have too many options here,
the standards and historical practices has
various inconsistencies and musl has the least
broken one

but we can do a theoretical discussion about
the merits of malloc(0)!=0:

i'm surprised that it "seems wrong" to you,
you can access the amount of bytes you requested
through the returned pointer p, evaluating
p+size is valid, p is suitably aligned for all
objects and it can be freed.
these assumptions are broken if malloc(0)==0

if the standard made malloc(0) work in ansi c
then it would save some branch logic and would
made the world a safer place
(because in a fair amount of code that gets
array length from external source no special
casing would be needed for length==0)

in rob pike's words "zero is a perfectly fine value"
http://code.google.com/p/go/issues/detail?id=4142#c2

> It's also a matter of promoting bad code : Doing a malloc(0) is simply a bug. People are just to lazy to check return values,
> and this makes the loop 3 lines shorter.

malloc(0) is implementation-defined with
two different conforming implementations

but either one you choose in practice a lot
of code will rely on the choosen behaviour
incorrectly

returning 0 does not save you from that

returning 0 has the drawback that realloc(0,0)
will be inconsistent
(either with the realloc(0,n)===malloc(n) assumption
or the realloc(p,0) failure reporting when p needs
to be freed)

> I'll wrap malloc() to include an abort in my case :)

but don't do that in library code that may be
used in a long running process: allocation failures
should be reported to let the caller handle it


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15 11:06     ` Szabolcs Nagy
@ 2013-01-15 12:33       ` Igmar Palsenberg
  2013-01-15 13:48         ` Rich Felker
  2013-01-15 13:46       ` Rich Felker
  1 sibling, 1 reply; 39+ messages in thread
From: Igmar Palsenberg @ 2013-01-15 12:33 UTC (permalink / raw)
  To: musl



>> While the above is clear to me, returning a pointer that can't hold anything just seems wrong to me.
> 
> i don't think we have too many options here,
> the standards and historical practices has
> various inconsistencies and musl has the least
> broken one
> 
> but we can do a theoretical discussion about
> the merits of malloc(0)!=0:
> 
> i'm surprised that it "seems wrong" to you,
> you can access the amount of bytes you requested
> through the returned pointer p, evaluating
> p+size is valid, p is suitably aligned for all
> objects and it can be freed.
> these assumptions are broken if malloc(0)==0

That's there to access if size is 0 ? Sure, you can access :

struct foo {
};

which is size 0. I do wonder what that gives me in practice. That is, not counting the fact that :

if (size == 0)
	size = 1;

was a common practice in malloc() implementations a while ago.

> if the standard made malloc(0) work in ansi c
> then it would save some branch logic and would
> made the world a safer place
> (because in a fair amount of code that gets
> array length from external source no special
> casing would be needed for length==0)
> 
> in rob pike's words "zero is a perfectly fine value"
> http://code.google.com/p/go/issues/detail?id=4142#c2

He does have a point. If I go to the gas station, hang in the fuel dispencer, pull it out again directly afterwards, and telling the guy behind the desk
that I didn't actually got fuel, I'm probably stared at :)

My way of thinking is just different, and both are fine.
 
>> It's also a matter of promoting bad code : Doing a malloc(0) is simply a bug. People are just to lazy to check return values,
>> and this makes the loop 3 lines shorter.
> 
> malloc(0) is implementation-defined with
> two different conforming implementations
> 
> but either one you choose in practice a lot
> of code will rely on the choosen behaviour
> incorrectly
> 
> returning 0 does not save you from that
> 
> returning 0 has the drawback that realloc(0,0)
> will be inconsistent
> (either with the realloc(0,n)===malloc(n) assumption
> or the realloc(p,0) failure reporting when p needs
> to be freed)

Agree. I always handle that case. 

> 
>> I'll wrap malloc() to include an abort in my case :)
> 
> but don't do that in library code that may be
> used in a long running process: allocation failures
> should be reported to let the caller handle it

No, just in user code. Libraries shouldn't abort, I agree. In my case, it only aborts in debug mode to aid testing.




	Igmar

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15 12:33       ` Igmar Palsenberg
@ 2013-01-15 13:48         ` Rich Felker
  2013-01-15 22:28           ` Igmar Palsenberg
  0 siblings, 1 reply; 39+ messages in thread
From: Rich Felker @ 2013-01-15 13:48 UTC (permalink / raw)
  To: musl

On Tue, Jan 15, 2013 at 01:33:07PM +0100, Igmar Palsenberg wrote:
> 
> 
> >> While the above is clear to me, returning a pointer that can't hold anything just seems wrong to me.
> > 
> > i don't think we have too many options here,
> > the standards and historical practices has
> > various inconsistencies and musl has the least
> > broken one
> > 
> > but we can do a theoretical discussion about
> > the merits of malloc(0)!=0:
> > 
> > i'm surprised that it "seems wrong" to you,
> > you can access the amount of bytes you requested
> > through the returned pointer p, evaluating
> > p+size is valid, p is suitably aligned for all
> > objects and it can be freed.
> > these assumptions are broken if malloc(0)==0
> 
> That's there to access if size is 0 ? Sure, you can access :
> 
> struct foo {
> };

This is a constraint violation. C does not allow empty structs, and
even if it did, they would not have size 0, since no type or object
ever has size 0 in C.

> 
> which is size 0. I do wonder what that gives me in practice. That is, not counting the fact that :
> 
> if (size == 0)
> 	size = 1;
> 
> was a common practice in malloc() implementations a while ago.

Of course, this is the canonical, simplest way to make malloc(0)
return a unique pointer.

Rich


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15 13:48         ` Rich Felker
@ 2013-01-15 22:28           ` Igmar Palsenberg
  2013-01-15 23:22             ` Rob
  0 siblings, 1 reply; 39+ messages in thread
From: Igmar Palsenberg @ 2013-01-15 22:28 UTC (permalink / raw)
  To: musl

>> 
>> That's there to access if size is 0 ? Sure, you can access :
>> 
>> struct foo {
>> };
> 
> This is a constraint violation. C does not allow empty structs, and
> even if it did, they would not have size 0, since no type or object
> ever has size 0 in C.

GCC thinks otherwise : 

#include <stdlib.h>
#include <stdio.h>

struct test {
};

int main(int argc, char **argv)
{
	char *x = NULL;

	printf("sizeof test : %d\n", sizeof(struct test));

	return 0;
}

[igmar@devel ~]$ ./x
sizeof test : 0

It gives me a warning, but doesn't error out. Olders version might behave differently, I don't have those installed. The LLVM compiler does the same.
No idea what the standard says, but your remarks sounds correct to me. 

>> 
>> which is size 0. I do wonder what that gives me in practice. That is, not counting the fact that :
>> 
>> if (size == 0)
>> 	size = 1;
>> 
>> was a common practice in malloc() implementations a while ago.
> 
> Of course, this is the canonical, simplest way to make malloc(0)
> return a unique pointer.

Enough for this thread. I did got the answer I wanted, and the result I want is easy to realise. Not mu intention to irritate people.



	Igmar





^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15 22:28           ` Igmar Palsenberg
@ 2013-01-15 23:22             ` Rob
  2013-01-16  7:46               ` Igmar Palsenberg
  0 siblings, 1 reply; 39+ messages in thread
From: Rob @ 2013-01-15 23:22 UTC (permalink / raw)
  To: musl

On Tue, Jan 15, 2013 at 11:28:14PM +0100, Igmar Palsenberg wrote:
> >> That's there to access if size is 0 ? Sure, you can access :
> >>
> >> struct foo {
> >> };
> >
> > This is a constraint violation. C does not allow empty structs, and
> > even if it did, they would not have size 0, since no type or object
> > ever has size 0 in C.
>
> GCC thinks otherwise

Therein lies the problem.
It's an extension, as are a multitude of other things such as ?:
expressions missing the middle expression and ({...}) and so on. Clang
also implements these, as do other compilers such as tcc.

Rob


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15 23:22             ` Rob
@ 2013-01-16  7:46               ` Igmar Palsenberg
  0 siblings, 0 replies; 39+ messages in thread
From: Igmar Palsenberg @ 2013-01-16  7:46 UTC (permalink / raw)
  To: musl



>>> This is a constraint violation. C does not allow empty structs, and
>>> even if it did, they would not have size 0, since no type or object
>>> ever has size 0 in C.
>> 
>> GCC thinks otherwise
> 
> Therein lies the problem.
> It's an extension, as are a multitude of other things such as ?:
> expressions missing the middle expression and ({...}) and so on. Clang
> also implements these, as do other compilers such as tcc.

True. I just wanted to indicate that because it compiles, it ends up being used, and considered "standard code". Unportable in theory,
but this kind of extensions enabled by default causes this. The same goes for zero-length arrays if I remember correctly.

I still need to lookup how to disable those.



	Igmar

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15 11:06     ` Szabolcs Nagy
  2013-01-15 12:33       ` Igmar Palsenberg
@ 2013-01-15 13:46       ` Rich Felker
  1 sibling, 0 replies; 39+ messages in thread
From: Rich Felker @ 2013-01-15 13:46 UTC (permalink / raw)
  To: musl

On Tue, Jan 15, 2013 at 12:06:18PM +0100, Szabolcs Nagy wrote:
> * Igmar Palsenberg <igmar@palsenberg.com> [2013-01-15 09:31:24 +0100]:
> > > fundamental reasons too. Basically they all come down to interactions
> > > between the requirements of malloc and realloc, and the fact that
> > > returning a null pointer from realloc means failure (and thus that the
> > > 
> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm
> > > http://austingroupbugs.net/view.php?id=400
> > > http://sourceware.org/bugzilla/show_bug.cgi?id=12547
> > 
> > While the above is clear to me, returning a pointer that can't hold anything just seems wrong to me.
> 
> i don't think we have too many options here,
> the standards and historical practices has
> various inconsistencies and musl has the least
> broken one
> 
> but we can do a theoretical discussion about
> the merits of malloc(0)!=0:
> 
> i'm surprised that it "seems wrong" to you,
> you can access the amount of bytes you requested
> through the returned pointer p, evaluating
> p+size is valid, p is suitably aligned for all
> objects and it can be freed.
> these assumptions are broken if malloc(0)==0
> 
> if the standard made malloc(0) work in ansi c
> then it would save some branch logic and would
> made the world a safer place
> (because in a fair amount of code that gets
> array length from external source no special
> casing would be needed for length==0)

In fairness, there's hardly any difference between the work involved
in:

    if (size < LIMIT)

and

    if (size-1 < LIMIT-1)

The latter catches 0 and treats it as invalid.

> > I'll wrap malloc() to include an abort in my case :)
> 
> but don't do that in library code that may be
> used in a long running process: allocation failures
> should be reported to let the caller handle it

I think he meant malloc(0) would abort to indicate that, in the rules
of his project, malloc(0) is a programming error. This is not such a
bad idea..

Rich


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15  8:31   ` Igmar Palsenberg
  2013-01-15 11:06     ` Szabolcs Nagy
@ 2013-01-15 12:52     ` Rob Landley
  1 sibling, 0 replies; 39+ messages in thread
From: Rob Landley @ 2013-01-15 12:52 UTC (permalink / raw)
  To: musl

On 01/15/2013 02:31:24 AM, Igmar Palsenberg wrote:
> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm
> > http://austingroupbugs.net/view.php?id=400
> > http://sourceware.org/bugzilla/show_bug.cgi?id=12547
> 
> While the above is clear to me, returning a pointer that can't hold  
> anything just
> seems wrong to me.

Returning failure from an allocation that can't _not_ be satisfied  
seems wrong to me.

Rob

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-14 17:17 malloc(0) behaviour Igmar Palsenberg
  2013-01-14 18:05 ` Rich Felker
@ 2013-01-14 23:37 ` Rob Landley
  2013-01-15  0:24   ` Rich Felker
  2013-01-15  9:01   ` Igmar Palsenberg
  1 sibling, 2 replies; 39+ messages in thread
From: Rob Landley @ 2013-01-14 23:37 UTC (permalink / raw)
  To: musl

On 01/14/2013 11:17:47 AM, Igmar Palsenberg wrote:
> Hi,
> 
> Is there a (good) reason for Musl to follow glibc's malloc(0)  
> behaviour ?

Because not doing so breaks some programs? (Dunno if it still does, but  
last time I tried switching that config option off in uClibc the linux  
from scratch build didn't finish.)

> Musl returns a valid pointer, which is fine according
> to the standard, but returning NULL is also fine.
> 
> IMHO, returning NULL is better : It usually kills the program if  
> actual storage is
> attempted.

It also kills the program if they're checking the return code of  
malloc() and treating NULL as an allocation failure indicator. NULL has  
a defined meaning of "failed", when a zero length allocation is trivial  
to satisfy and not necessarily wrong.

Should a zero length file read return -1 instead of 0? Is it the  
program's job to make sure it never makes a NOP syscall? Does adding  
special case checks in your code to avoid such NOP calls actually make  
the code smaller and simpler?

> You also can't do that if a valid pointer
> is returned, so I really can't grasp the reason for returning a  
> pointer at all,

Not indicating that the allocation failed and triggering an assert()  
when there isn't actually a problem with a legitimately zero length  
array that had nothing in it? (Both times I debugged why LFS stuff was  
failing that's what it turned out to be, but I didn't spend too much  
time on it before just switching the uClibc option on to support it.)

> except to support buggy and lazy programming.

You're defining "lazy" here a "not adding a special case in the caller  
for every use of malloc()". That's certainly a point of view, but I'm  
not sure that's the word you want to use for it. "Not sufficiently  
defensive programming" maybe?

Whether or not defensive programming is an improvement is one of those  
perpetual arguments having to do with differing mindsets. In reality  
the reason you'd want to do it is some libc implementations might _not_  
allow malloc(0) to succeed, and thus there are portability issues. But  
only for people who care about portability off glibc, which is an  
uphill battle in the first place.

That said, "I'm right because other people might think like I do and  
that would make me right" seems kind of a circular argument for a  
position being _better_. If posix required a valid pointer to be  
returned the positions would be reversed without any actual change in  
the technical merits.

And "posix doesn't require this, therefore let's intentional break it  
to force any programs with technical posix compatability issues to use  
a different libc rather than change anything to humor us"... not seeing  
it.

> I suggest we make malloc(0) return a NULL pointer. Any objections ?

Only that it broke real world programs last time I tried it, but if you  
don't care about that then by all means force 'em to use a different  
libc.

(Re: the "posix test" thread, having a ./configure --pedantic that  
builds a musl without gnu/dammit extensions and safety tape over the  
sharp bits sounds like a possible fun future thing. Having that be the  
default, or adding a lot of runtime checks rather than commenting stuff  
out at compile time, not so much...)

Rob

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-14 23:37 ` Rob Landley
@ 2013-01-15  0:24   ` Rich Felker
  2013-01-15 12:17     ` Rob Landley
  2013-01-15  9:01   ` Igmar Palsenberg
  1 sibling, 1 reply; 39+ messages in thread
From: Rich Felker @ 2013-01-15  0:24 UTC (permalink / raw)
  To: musl

On Mon, Jan 14, 2013 at 05:37:29PM -0600, Rob Landley wrote:
> On 01/14/2013 11:17:47 AM, Igmar Palsenberg wrote:
> >Hi,
> >
> >Is there a (good) reason for Musl to follow glibc's malloc(0)
> >behaviour ?
> 
> Because not doing so breaks some programs? (Dunno if it still does,
> but last time I tried switching that config option off in uClibc the
> linux from scratch build didn't finish.)

I don't think it breaks any programs; the ones that depend on
malloc(0) returning nonzero seem to use either gnulib or their own
hacks to wrap malloc if the system malloc does not have this behavior.
However, the bug that originally prompted me to switch the behavior
was that the detection could be broken. gnulib was compiling a tiny
test program to check the behavior of malloc, and with musl, it got
linked to __simple_malloc, which was giving the malloc(0)!=NULL
behavior. The full program then got linked to real malloc (where
malloc(0) was returning a null pointer), and horribly broke, reporting
spurious allocation failures and aborting.

At that time, I researched and discussed the topic and the conclusion
reached was that malloc(0)!=NULL makes more sense.

> >Musl returns a valid pointer, which is fine according
> >to the standard, but returning NULL is also fine.
> >
> >IMHO, returning NULL is better : It usually kills the program if
> >actual storage is
> >attempted.
> 
> It also kills the program if they're checking the return code of
> malloc() and treating NULL as an allocation failure indicator. NULL
> has a defined meaning of "failed", when a zero length allocation is
> trivial to satisfy and not necessarily wrong.
> 
> Should a zero length file read return -1 instead of 0? Is it the
> program's job to make sure it never makes a NOP syscall? Does adding
> special case checks in your code to avoid such NOP calls actually
> make the code smaller and simpler?

Indeed, another argument for malloc(0)!=NULL is to avoid having to
write special-case code to handle it. Wrapping malloc with a function
that replaces an argument of 0 with an argument of 1 is not just a
hack to get "GNU behavior"; it's what you probably want to be doing
anyway (even if there were no GNU to copy) if there's a chance you
might pass 0 to malloc. It leads to the simplest code. Of course it
would make sense to give this wrapper a proper name rather than using
macro hacks to "replace" malloc like gnulib does...

Or, they could use my convenient formula...

#undef malloc
#define malloc(n) malloc((n)|1)

which seems a lot cheaper than a wrapper. It can be applied to realloc
too, but might get expensive with calloc.

> >You also can't do that if a valid pointer
> >is returned, so I really can't grasp the reason for returning a
> >pointer at all,
> 
> Not indicating that the allocation failed and triggering an assert()
> when there isn't actually a problem with a legitimately zero length
> array that had nothing in it? (Both times I debugged why LFS stuff
> was failing that's what it turned out to be, but I didn't spend too
> much time on it before just switching the uClibc option on to
> support it.)

While in some cases it would be nice to get a fault, you don't usually
get a fault when trying to access past the end of a length-1 array, so
why should you expect one when trying to access past the end of a
"length-0 array"?

> >except to support buggy and lazy programming.
> 
> You're defining "lazy" here a "not adding a special case in the
> caller for every use of malloc()". That's certainly a point of view,
> but I'm not sure that's the word you want to use for it. "Not
> sufficiently defensive programming" maybe?

Well, doing nothing to account for the fact that malloc(0) "failing"
might not indicate a problematic OOM condition is "lazy" in my book.

> And "posix doesn't require this, therefore let's intentional break
> it to force any programs with technical posix compatability issues
> to use a different libc rather than change anything to humor us"...
> not seeing it.

When POSIX allows implementation options but one option has clear
technical merits over the others, as is the case here, I don't see any
good reason to take the low-quality option. Especially if most
existing implementations already take the high-quality one, as this
suggests to me it's somewhat reasonable for otherwise-portable
programs to assume the high-quality behavior.

> (Re: the "posix test" thread, having a ./configure --pedantic that
> builds a musl without gnu/dammit extensions and safety tape over the
> sharp bits sounds like a possible fun future thing. Having that be
> the default, or adding a lot of runtime checks rather than
> commenting stuff out at compile time, not so much...)

Sounds like a lot of work for something that could be better
accomplished with static analysis tools and such...

Rich

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15  0:24   ` Rich Felker
@ 2013-01-15 12:17     ` Rob Landley
  0 siblings, 0 replies; 39+ messages in thread
From: Rob Landley @ 2013-01-15 12:17 UTC (permalink / raw)
  To: musl; +Cc: musl

On 01/14/2013 06:24:42 PM, Rich Felker wrote:
> > Not indicating that the allocation failed and triggering an assert()
> > when there isn't actually a problem with a legitimately zero length
> > array that had nothing in it? (Both times I debugged why LFS stuff
> > was failing that's what it turned out to be, but I didn't spend too
> > much time on it before just switching the uClibc option on to
> > support it.)
> 
> While in some cases it would be nice to get a fault, you don't usually
> get a fault when trying to access past the end of a length-1 array, so
> why should you expect one when trying to access past the end of a
> "length-0 array"?

Nobody accessed it. They were doing something handwavingly like:

   if (!(array=malloc(sizeof(struct blah)*len))) die();
   for (i=0; i<len; i++) blah();

Which works just fine for len=0 because the for loop doesn't do  
anything... assuming malloc doesn't return 0 and trigger the allocation  
failure check.

(But again, this was ~5 years ago. Haven't tried switching it off  
since.)

> > >except to support buggy and lazy programming.
> >
> > You're defining "lazy" here a "not adding a special case in the
> > caller for every use of malloc()". That's certainly a point of view,
> > but I'm not sure that's the word you want to use for it. "Not
> > sufficiently defensive programming" maybe?
> 
> Well, doing nothing to account for the fact that malloc(0) "failing"
> might not indicate a problematic OOM condition is "lazy" in my book.

Maybe, but it means they're special casing 0, when otherwise it just  
works. (At least on eglibc, and getting people to care about  
portability off glibc remains a thing.)

My point was that requiring them to special case this is not  
necessarily an improvement.

Rob

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-14 23:37 ` Rob Landley
  2013-01-15  0:24   ` Rich Felker
@ 2013-01-15  9:01   ` Igmar Palsenberg
  2013-01-15 12:58     ` Rob Landley
  1 sibling, 1 reply; 39+ messages in thread
From: Igmar Palsenberg @ 2013-01-15  9:01 UTC (permalink / raw)
  To: musl



>> Is there a (good) reason for Musl to follow glibc's malloc(0) behaviour ?
> 
> Because not doing so breaks some programs? (Dunno if it still does, but last time I tried switching that config option off in uClibc the linux from scratch build didn't finish.)

It does. On the other hand : Those programs are broken anyway if you ask me.

>> Musl returns a valid pointer, which is fine according
>> to the standard, but returning NULL is also fine.
>> IMHO, returning NULL is better : It usually kills the program if actual storage is
>> attempted.
> 
> It also kills the program if they're checking the return code of malloc() and treating NULL as an allocation failure indicator. NULL has a defined meaning of "failed", when a zero length allocation is trivial to satisfy and not necessarily wrong.

Is that a bad thing in this case ? The reason I like musl is that is a conforming library : Not something that has numerous hacks / additions to make it backwards compatible, such I've seen with glibc. Usually, when I
try the same code on for example FreeBSD, all hell breaks loose. In  my experience, musl actually increase my programs portability.

> Should a zero length file read return -1 instead of 0? Is it the program's job to make sure it never makes a NOP syscall? Does adding special case checks in your code to avoid such NOP calls actually make the code smaller and simpler?
> 
>> You also can't do that if a valid pointer
>> is returned, so I really can't grasp the reason for returning a pointer at all,
> 
> Not indicating that the allocation failed and triggering an assert() when there isn't actually a problem with a legitimately zero length array that had nothing in it? (Both times I debugged why LFS stuff was failing that's what it turned out to be, but I didn't spend too much time on it before just switching the uClibc option on to support it.)

With my own code, I consider it a bug. Yes, opinions differ on this matter. I personally add abort() to code that shouldn't happen, or I consider bad. I do catch real bugs with this from
time to time.

>> except to support buggy and lazy programming.
> 
> You're defining "lazy" here a "not adding a special case in the caller for every use of malloc()". That's certainly a point of view, but I'm not sure that's the word you want to use for it. "Not sufficiently defensive programming" maybe?

Definitely the latter. Error checking is underrated matter if you ask me : People write code, and then spend ages debugging issues that would have been a non-issue when proper error checking has been done.
The "oh well, it never fails, so don't bother with error checking" thought.

> Whether or not defensive programming is an improvement is one of those perpetual arguments having to do with differing mindsets. In reality the reason you'd want to do it is some libc implementations might _not_ allow malloc(0) to succeed, and thus there are portability issues. But only for people who care about portability off glibc, which is an uphill battle in the first place.

Agree. I will stay a battle until the standard only gives you one direction, and "implementation defined" is removed from it. 

> That said, "I'm right because other people might think like I do and that would make me right" seems kind of a circular argument for a position being _better_. If posix required a valid pointer to be returned the positions would be reversed without any actual change in the technical merits.
> 
> And "posix doesn't require this, therefore let's intentional break it to force any programs with technical posix compatability issues to use a different libc rather than change anything to humor us"... not seeing it.

It's a matter of opinion. I just wanted to know the logic behind returning a pointer in this case, while for example the *BSD libc's return a NULL. I handle this case in my code properly, and bluntly abort() if my code calls malloc(0). That's
good with my code, but might be bad with other people's code.

>> I suggest we make malloc(0) return a NULL pointer. Any objections ?
> 
> Only that it broke real world programs last time I tried it, but if you don't care about that then by all means force 'em to use a different libc.
> 
> (Re: the "posix test" thread, having a ./configure --pedantic that builds a musl without gnu/dammit extensions and safety tape over the sharp bits sounds like a possible fun future thing. Having that be the default, or adding a lot of runtime checks rather than commenting stuff out at compile time, not so much...)

I'm happy I can try a different libc easily. Downside : I now have some if defined(__GLIBC__) in my code :(
I probably need some configure.ac magic.



	Igmar

 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: malloc(0) behaviour
  2013-01-15  9:01   ` Igmar Palsenberg
@ 2013-01-15 12:58     ` Rob Landley
  2013-01-15 14:54       ` dladdr() pierre
  0 siblings, 1 reply; 39+ messages in thread
From: Rob Landley @ 2013-01-15 12:58 UTC (permalink / raw)
  To: musl; +Cc: musl

On 01/15/2013 03:01:06 AM, Igmar Palsenberg wrote:
> Agree. I will stay a battle until the standard only gives you one  
> direction, and
> "implementation defined" is removed from it.

No, pragmatists will do the thing that makes more programs work and  
pedants will write code that is never used by anybody. If you consider  
you vs you a battle, have at.

I'm going to stop reading this thread now.

Rob

^ permalink raw reply	[flat|nested] 39+ messages in thread

* dladdr()
  2013-01-15 12:58     ` Rob Landley
@ 2013-01-15 14:54       ` pierre
  2013-01-15 18:48         ` dladdr() Rich Felker
  0 siblings, 1 reply; 39+ messages in thread
From: pierre @ 2013-01-15 14:54 UTC (permalink / raw)
  To: musl

Latest musl-libc dladdr() fails here:

   // glibc:
   // gcc hello.c -o hello -ldl
   //
   // ret:1
   // dli_fname:./hello:0x400000
   // dli_sname:puts:0x4004c0

   // musl:
   // musl-gcc hello.c -o hello
   //
   // ret:0
   // dli_fname:(null):0
   // dli_sname:(null):0

   Dl_info dl = {0};
   int ret = dladdr(puts + 0, &dl);

   printf("ret:%d\n"
           "dli_fname:%s:%p\n"
           "dli_sname:%s:%p\n\n",
           ret,
           dl.dli_fname, dl.dli_fbase,
           dl.dli_sname, dl.dli_saddr);

What is missing to make this work?

Pierre



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-15 14:54       ` dladdr() pierre
@ 2013-01-15 18:48         ` Rich Felker
  2013-01-16 11:00           ` dladdr() pierre
  0 siblings, 1 reply; 39+ messages in thread
From: Rich Felker @ 2013-01-15 18:48 UTC (permalink / raw)
  To: musl, pierre

On Tue, Jan 15, 2013 at 03:54:44PM +0100, pierre wrote:
> Latest musl-libc dladdr() fails here:
> 
>    // glibc:
>    // gcc hello.c -o hello -ldl
>    //
>    // ret:1
>    // dli_fname:./hello:0x400000
>    // dli_sname:puts:0x4004c0
> 
>    // musl:
>    // musl-gcc hello.c -o hello
>    //
>    // ret:0
>    // dli_fname:(null):0
>    // dli_sname:(null):0
> 
>    Dl_info dl = {0};
>    int ret = dladdr(puts + 0, &dl);
> 
>    printf("ret:%d\n"
>            "dli_fname:%s:%p\n"
>            "dli_sname:%s:%p\n\n",
>            ret,
>            dl.dli_fname, dl.dli_fbase,
>            dl.dli_sname, dl.dli_saddr);
> 
> What is missing to make this work?

Are you using static or dynamic linking? If static, dladdr is just a
stub that always fails. It could be implemented to work under some
conditions, but it would be highly dependent on what options you
compile the binary with, since by default static binaries do not
contain the bloat that would be needed to perform introspection.

Rich


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-15 18:48         ` dladdr() Rich Felker
@ 2013-01-16 11:00           ` pierre
  2013-01-16 12:51             ` dladdr() Szabolcs Nagy
  0 siblings, 1 reply; 39+ messages in thread
From: pierre @ 2013-01-16 11:00 UTC (permalink / raw)
  To: musl, Rich Felker

[-- Attachment #1: Type: text/plain, Size: 927 bytes --]

Rich,

> Are you using static or dynamic linking?

Dynamic. Before posting here, I checked the musl source 
code, and the binary:

'objdump -z -d hello' gives: callq 4003e0 <dladdr@plt>

Further, 'musl-gcc -static hello.c -o hello_s' weights
23KB while the shared binary above weights only 7KB.

Finally, my example provided the compilation/linking 
cmd line, which, by default, is dynamic:

   // glibc:
   // gcc hello.c -o hello -ldl

   // musl:
   // musl-gcc hello.c -o hello

I am attaching the source code linked with musl 0.9.8
so you can duplicate this issue without wasting time.

Besides knowing how to enable dladdr() in dynamic mode,
I am interested in knowing what other steps are needed
beyond replacing the #ifdef SHARED by another custom 
define (#ifdef __FORCE_DL) for musl's dl implementation
to do the job in static binaries (with exported symbols).

Thank you for musl, a much-needed alternative.

Pierre

[-- Attachment #2: hello.c --]
[-- Type: text/x-csrc, Size: 685 bytes --]

// ------------------------------
// glibc:
// gcc hello.c -o hello -ldl
//
// ret:1
// dli_fname:./hello:0x400000
// dli_sname:puts:0x4004c0
// ------------------------------
// musl:
// musl-gcc hello.c -o hello
//
// ret:0
// dli_fname:(null):0
// dli_sname:(null):0
// ------------------------------
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(void)
{
   puts("\nhello world!\n");

   Dl_info dl = {0};
   int ret = dladdr(puts + 0, &dl);

   printf("ret:%d\n"
           "dli_fname:%s:%p\n"
           "dli_sname:%s:%p\n\n",
           ret,
           dl.dli_fname, dl.dli_fbase,
           dl.dli_sname, dl.dli_saddr);

	return 0;
}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-16 11:00           ` dladdr() pierre
@ 2013-01-16 12:51             ` Szabolcs Nagy
  2013-01-16 14:24               ` dladdr() musl
  0 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2013-01-16 12:51 UTC (permalink / raw)
  To: musl, pierre

* pierre <pierre@silentlife.com> [2013-01-16 12:00:18 +0100]:
> 
>    // glibc:
>    // gcc hello.c -o hello -ldl
> 
>    // musl:
>    // musl-gcc hello.c -o hello
> 
> I am attaching the source code linked with musl 0.9.8
> so you can duplicate this issue without wasting time.
> 
> Besides knowing how to enable dladdr() in dynamic mode,
> I am interested in knowing what other steps are needed
> beyond replacing the #ifdef SHARED by another custom 
> define (#ifdef __FORCE_DL) for musl's dl implementation
> to do the job in static binaries (with exported symbols).
> 

there was a thread about dl* in static binaries, it is
not yet supported and non-trivial to get right

i linked to the most interesting posts at the bottom:
http://port70.net/~nsz/32_dynlink.html

i ran hello in gdb and it seems
 sym->st_shndx==0
for the "puts" symbol, i dont know the semantics of
st_shndx, but the following patch makes hello.c work:

diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
index 935367e..ba0bd8f 100644
--- a/src/ldso/dynlink.c
+++ b/src/ldso/dynlink.c
@@ -1178,7 +1178,7 @@ int __dladdr(void *addr, Dl_info *info)
        }

        for (; nsym; nsym--, sym++) {
-               if (sym->st_shndx && sym->st_value
+               if (/*sym->st_shndx &&*/ sym->st_value
                 && (1<<(sym->st_info&0xf) & OK_TYPES)
                 && (1<<(sym->st_info>>4) & OK_BINDS)) {
                        void *symaddr = p->base + sym->st_value;



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-16 12:51             ` dladdr() Szabolcs Nagy
@ 2013-01-16 14:24               ` musl
  2013-01-16 15:20                 ` dladdr() pierre
  2013-01-16 16:49                 ` dladdr() Rich Felker
  0 siblings, 2 replies; 39+ messages in thread
From: musl @ 2013-01-16 14:24 UTC (permalink / raw)
  To: musl

On 16/01/2013 13:51, Szabolcs Nagy wrote:
> * pierre <pierre@silentlife.com> [2013-01-16 12:00:18 +0100]:
>>
>>    // glibc:
>>    // gcc hello.c -o hello -ldl
>>
>>    // musl:
>>    // musl-gcc hello.c -o hello
>>
>> I am attaching the source code linked with musl 0.9.8
>> so you can duplicate this issue without wasting time.
>>
>> Besides knowing how to enable dladdr() in dynamic mode,
>> I am interested in knowing what other steps are needed
>> beyond replacing the #ifdef SHARED by another custom 
>> define (#ifdef __FORCE_DL) for musl's dl implementation
>> to do the job in static binaries (with exported symbols).
>>
> 
> there was a thread about dl* in static binaries, it is
> not yet supported and non-trivial to get right
> 
> i linked to the most interesting posts at the bottom:
> http://port70.net/~nsz/32_dynlink.html
> 
> i ran hello in gdb and it seems
>  sym->st_shndx==0
> for the "puts" symbol, i dont know the semantics of
> st_shndx, but the following patch makes hello.c work:
> 
> diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
> index 935367e..ba0bd8f 100644
> --- a/src/ldso/dynlink.c
> +++ b/src/ldso/dynlink.c
> @@ -1178,7 +1178,7 @@ int __dladdr(void *addr, Dl_info *info)
>         }
> 
>         for (; nsym; nsym--, sym++) {
> -               if (sym->st_shndx && sym->st_value
> +               if (/*sym->st_shndx &&*/ sym->st_value
>                  && (1<<(sym->st_info&0xf) & OK_TYPES)
>                  && (1<<(sym->st_info>>4) & OK_BINDS)) {
>                         void *symaddr = p->base + sym->st_value;
> 

You're right, sym->st_shndx only tells if the symbol is external
(resolved during relocation process) or internal (defined in the current shared object).

In this example "puts" is part of libc.so and not hello.

We should remove this check.

BTW dli_fbase is still wrong. It should be

	info->dli_fbase = p->map;

and not

	info->dli_fbase = p->base;

Best Regards,

Boris


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-16 14:24               ` dladdr() musl
@ 2013-01-16 15:20                 ` pierre
  2013-01-16 16:49                 ` dladdr() Rich Felker
  1 sibling, 0 replies; 39+ messages in thread
From: pierre @ 2013-01-16 15:20 UTC (permalink / raw)
  To: musl

Thank you for the relevant reply, Boris and Nagy.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-16 14:24               ` dladdr() musl
  2013-01-16 15:20                 ` dladdr() pierre
@ 2013-01-16 16:49                 ` Rich Felker
  2013-01-16 17:42                   ` dladdr() musl
  1 sibling, 1 reply; 39+ messages in thread
From: Rich Felker @ 2013-01-16 16:49 UTC (permalink / raw)
  To: musl

On Wed, Jan 16, 2013 at 03:24:24PM +0100, musl wrote:
> > i ran hello in gdb and it seems
> >  sym->st_shndx==0
> > for the "puts" symbol, i dont know the semantics of
> > st_shndx, but the following patch makes hello.c work:
> > 
> > diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
> > index 935367e..ba0bd8f 100644
> > --- a/src/ldso/dynlink.c
> > +++ b/src/ldso/dynlink.c
> > @@ -1178,7 +1178,7 @@ int __dladdr(void *addr, Dl_info *info)
> >         }
> > 
> >         for (; nsym; nsym--, sym++) {
> > -               if (sym->st_shndx && sym->st_value
> > +               if (/*sym->st_shndx &&*/ sym->st_value
> >                  && (1<<(sym->st_info&0xf) & OK_TYPES)
> >                  && (1<<(sym->st_info>>4) & OK_BINDS)) {
> >                         void *symaddr = p->base + sym->st_value;
> > 
> 
> You're right, sym->st_shndx only tells if the symbol is external
> (resolved during relocation process) or internal (defined in the
> current shared object).

st_shndx==0 is used for PLT entries. Oddly, I get (for hello.c):

Symbol table '.dynsym' contains 9 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf
     2: 080482c0     0 FUNC    GLOBAL DEFAULT  UND puts
     3: 00000000     0 FUNC    GLOBAL DEFAULT  UND dladdr

So the symbol value (the address of the PLT entry) seems to be filled
in only for symbols which are also used to take the address of a
function. Other symbols in the PLT (which are used only for resolving
the PLT entry at load time) lack addresses. I suspect this could
prevent correct dladdr behavior in some cases, but I can't think of a
serious usage case that would be broken right off.

Anyway, I'm applying nsz's patch (without the commented code :) which
should make the situation much better (hopefully as good as glibc's).

> BTW dli_fbase is still wrong. It should be
> 
> 	info->dli_fbase = p->map;
> 
> and not
> 
> 	info->dli_fbase = p->base;

I notice it disagrees with glibc, but I'm not sure I agree it's wrong.
The man page states that dli_fbase is the "Address at which shared
object is loaded", which is never clearly defined. On the other hand,
dl_iterate_phdr's dlpi_addr is specified to contain the "Base address
of object", which is defined below as "the difference between the
virtual memory address of the shared object and the offset of that
object in the file from which it was loaded". To me, this seems like
the main/only "load address" that would be of interest to a program.
However, there's also an interest in matching historical practice,
especially since dladdr is not a standard function and the existing
implementations could be seen as the "real" specification. Do you know
what other systems like BSD do?

Rich

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-16 16:49                 ` dladdr() Rich Felker
@ 2013-01-16 17:42                   ` musl
  2013-01-21  2:03                     ` dladdr() Rich Felker
  0 siblings, 1 reply; 39+ messages in thread
From: musl @ 2013-01-16 17:42 UTC (permalink / raw)
  To: musl

On 16/01/2013 17:49, Rich Felker wrote:
> On Wed, Jan 16, 2013 at 03:24:24PM +0100, musl wrote:
>>> i ran hello in gdb and it seems
>>>  sym->st_shndx==0
>>> for the "puts" symbol, i dont know the semantics of
>>> st_shndx, but the following patch makes hello.c work:
>>>
>>> diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
>>> index 935367e..ba0bd8f 100644
>>> --- a/src/ldso/dynlink.c
>>> +++ b/src/ldso/dynlink.c
>>> @@ -1178,7 +1178,7 @@ int __dladdr(void *addr, Dl_info *info)
>>>         }
>>>
>>>         for (; nsym; nsym--, sym++) {
>>> -               if (sym->st_shndx && sym->st_value
>>> +               if (/*sym->st_shndx &&*/ sym->st_value
>>>                  && (1<<(sym->st_info&0xf) & OK_TYPES)
>>>                  && (1<<(sym->st_info>>4) & OK_BINDS)) {
>>>                         void *symaddr = p->base + sym->st_value;
>>>
>>
>> You're right, sym->st_shndx only tells if the symbol is external
>> (resolved during relocation process) or internal (defined in the
>> current shared object).
> 
> st_shndx==0 is used for PLT entries. Oddly, I get (for hello.c):
> 
> Symbol table '.dynsym' contains 9 entries:
>    Num:    Value  Size Type    Bind   Vis      Ndx Name
>      0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
>      1: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf
>      2: 080482c0     0 FUNC    GLOBAL DEFAULT  UND puts
>      3: 00000000     0 FUNC    GLOBAL DEFAULT  UND dladdr
> 
> So the symbol value (the address of the PLT entry) seems to be filled
> in only for symbols which are also used to take the address of a
> function. Other symbols in the PLT (which are used only for resolving
> the PLT entry at load time) lack addresses. I suspect this could
> prevent correct dladdr behavior in some cases, but I can't think of a
> serious usage case that would be broken right off.
> 
> Anyway, I'm applying nsz's patch (without the commented code :) which
> should make the situation much better (hopefully as good as glibc's).
> 
>> BTW dli_fbase is still wrong. It should be
>>
>> 	info->dli_fbase = p->map;
>>
>> and not
>>
>> 	info->dli_fbase = p->base;
> 
> I notice it disagrees with glibc, but I'm not sure I agree it's wrong.
> The man page states that dli_fbase is the "Address at which shared
> object is loaded", which is never clearly defined. On the other hand,
> dl_iterate_phdr's dlpi_addr is specified to contain the "Base address
> of object", which is defined below as "the difference between the
> virtual memory address of the shared object and the offset of that
> object in the file from which it was loaded". To me, this seems like
> the main/only "load address" that would be of interest to a program.
> However, there's also an interest in matching historical practice,
> especially since dladdr is not a standard function and the existing
> implementations could be seen as the "real" specification. Do you know
> what other systems like BSD do?


freebsd dladdr man pages defines the dli_fbase field as :
  "The base address at which the shared object is
   mapped into the address space of the calling
   process"

see:
  http://www.unix.com/man-page/FreeBSD/3/dladdr/



> 
> Rich
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-16 17:42                   ` dladdr() musl
@ 2013-01-21  2:03                     ` Rich Felker
  2013-01-21  6:58                       ` dladdr() pierre
  0 siblings, 1 reply; 39+ messages in thread
From: Rich Felker @ 2013-01-21  2:03 UTC (permalink / raw)
  To: musl

On Wed, Jan 16, 2013 at 06:42:48PM +0100, musl wrote:
> > implementations could be seen as the "real" specification. Do you know
> > what other systems like BSD do?
> 
> 
> freebsd dladdr man pages defines the dli_fbase field as :
>   "The base address at which the shared object is
>    mapped into the address space of the calling
>    process"
> 
> see:
>   http://www.unix.com/man-page/FreeBSD/3/dladdr/

Do you know what they _do_, though? Is the fbase address of the main
program 0 or the lowest virtual address of the mapping?

BTW is dladdr otherwise working correctly for you now?

Rich


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-21  2:03                     ` dladdr() Rich Felker
@ 2013-01-21  6:58                       ` pierre
  2013-01-21 18:35                         ` dladdr() Rich Felker
  0 siblings, 1 reply; 39+ messages in thread
From: pierre @ 2013-01-21  6:58 UTC (permalink / raw)
  To: musl

> is dladdr otherwise working correctly for you now?

In the simple example I provided, it always works.
I have added loaded modules with conflicting symbols
and dladdr() also works as expected.

But for my main application (which populates backtraces
with dladdr), musl dladdr() works in DEBUG mode and 
crashes in RELEASE mode. I did not have time to isolate
the problem in a reproducible code snippet but I will
let you know when I have done that.

Knowing that dladdr() crashes should help you anyway:
it might be related to searching addresses that are 
out of the specified module scope (stack smashing?), or
to the ability to be resilient to introspecting itself 
(hence the success of the musl DEBUG code, as well as 
of the bulky GLIBC).

I don't use BSD so I can't tell what they _do_.

I appreciate musl and intend to contribute, especially
in the areas where my application is using libc in 
cornersome areas.

I also find the idea of a musl-based Linux distro rather
attractive as it will immensely reduce the surface of
uncertainty.

Pierre

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-21  6:58                       ` dladdr() pierre
@ 2013-01-21 18:35                         ` Rich Felker
  2013-01-22  6:27                           ` dladdr() pierre
  0 siblings, 1 reply; 39+ messages in thread
From: Rich Felker @ 2013-01-21 18:35 UTC (permalink / raw)
  To: musl, pierre

On Mon, Jan 21, 2013 at 07:58:20AM +0100, pierre wrote:
> 
> > is dladdr otherwise working correctly for you now?
> 
> In the simple example I provided, it always works.
> I have added loaded modules with conflicting symbols
> and dladdr() also works as expected.
> 
> But for my main application (which populates backtraces
> with dladdr), musl dladdr() works in DEBUG mode and 
> crashes in RELEASE mode. I did not have time to isolate
> the problem in a reproducible code snippet but I will
> let you know when I have done that.

The only non-static memory access dladdr makes are:

1. Reading the map ranges for each loaded DSO from the DSO list.

2. Reading the hash table pointers for the matching DSO (if one was
found), and:

2a. If there's a sysv hash table, just reading the number of symbols
directly from it.

2b. If there's only a gnu hash table, reading the gnu hash table and
counting the number of symbols.

3. Reading the locations of the symbol and string tables from the DSO
record.

4. Reading each symbol table entry to find the best match for the
argument.

5. Storing results to the caller-provided Dl_info buffer.

I don't see how any of these could crash without memory already having
been corrupted. Do you have a test case I could run, or have you tried
building libc with -g and examining the crash in gdb to see where it
happens?

Rich

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-21 18:35                         ` dladdr() Rich Felker
@ 2013-01-22  6:27                           ` pierre
  2013-01-22 13:07                             ` dladdr() Szabolcs Nagy
  0 siblings, 1 reply; 39+ messages in thread
From: pierre @ 2013-01-22  6:27 UTC (permalink / raw)
  To: musl, Rich Felker

Rich,

> I don't see how any of these could crash without
> memory already having been corrupted.

That's what I wrote previously: my code populates 
backtraces which could have been damaged by stack
smashing.

Backtraces are used to identify the cause of a crash,
which, by definition, may not be clean.

The information I wanted you guys to have is that, 
for the same application code, both GLIBC and musl 
compiled in DEBUG mode do not crash (while musl in
release mode crashes).

> Do you have a test case I could run, 

In my previous email (the one you are answering) 
I wrote: 

  "I did not have time to isolate the problem in a
   reproducible code snippet but I will let you 
   know when I have done that."

> have you tried building libc with -g and examining
> the crash in gdb to see where it happens?

In my previous email (the one you are answering) 
I wrote: 

  "for my main application (which populates backtraces
   with dladdr), musl dladdr() works in DEBUG mode and
   crashes in RELEASE mode."

The musl "DEBUG mode" tested here used -g2 and it did
not crash.

I am not trying to argue with you - there's no point
at doing that - I was merely providing the information
I had available when you asked "does it work for you?".

Pierre

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-22  6:27                           ` dladdr() pierre
@ 2013-01-22 13:07                             ` Szabolcs Nagy
  2013-01-22 13:40                               ` dladdr() pierre
  0 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2013-01-22 13:07 UTC (permalink / raw)
  To: musl, pierre

* pierre <pierre@silentlife.com> [2013-01-22 07:27:46 +0100]:
> The information I wanted you guys to have is that, 
> for the same application code, both GLIBC and musl 
> compiled in DEBUG mode do not crash (while musl in
> release mode crashes).
...
> at doing that - I was merely providing the information
> I had available when you asked "does it work for you?".

i'd assume you invoke ub somewhere (and i think that's
what dalias was trying to say)

but it is desirable to verify that this is really not a
musl bug and we need better info for that:

instead of 'debug mode' vs 'release mode' the config.mak
with compiler flags/architecture/.. for the two cases

instead of 'crash' the details of the caught signal,
backtrace, memory map etc

a link to the source code or a minimal example can help too

i'm not sure how populating backtrace with dladdr works,
sounds suspicious,
how do you get your backtrace?


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-22 13:07                             ` dladdr() Szabolcs Nagy
@ 2013-01-22 13:40                               ` pierre
  2013-01-22 13:51                                 ` dladdr() Rich Felker
  0 siblings, 1 reply; 39+ messages in thread
From: pierre @ 2013-01-22 13:40 UTC (permalink / raw)
  To: musl, Szabolcs Nagy

Nagy,

> instead of 'debug mode' vs 'release mode' the 
> config.mak with compiler flags/architecture/.. 
> for the two cases
> instead of 'crash' the details of the caught 
> signal, backtrace, memory map etc
> a link to the source code or a minimal example 
> can help too

Since I already said that I would provide this 
information later (I also have a job) I wonder
what the purpose of these repeated calls is.

PLEASE READ THE WHOLE THREAT before adding your
grain of salt.

> i'm not sure how populating backtrace with 
> dladdr works

By finding the symbols that match each IP
(Instruction Pointer) address.

> sounds suspicious

I am impressed by the value of this argument.

> how do you get your backtrace?

By following stack frames. I am curious to learn 
what better way you will find less "suspicious".

Pierre

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-22 13:40                               ` dladdr() pierre
@ 2013-01-22 13:51                                 ` Rich Felker
  2013-01-22 14:59                                   ` dladdr() pierre
  0 siblings, 1 reply; 39+ messages in thread
From: Rich Felker @ 2013-01-22 13:51 UTC (permalink / raw)
  To: musl

On Tue, Jan 22, 2013 at 02:40:44PM +0100, pierre wrote:
> Nagy,
> 
> > instead of 'debug mode' vs 'release mode' the 
> > config.mak with compiler flags/architecture/.. 
> > for the two cases
> > instead of 'crash' the details of the caught 
> > signal, backtrace, memory map etc
> > a link to the source code or a minimal example 
> > can help too
> 
> Since I already said that I would provide this 
> information later (I also have a job) I wonder
> what the purpose of these repeated calls is.
> 
> PLEASE READ THE WHOLE THREAT before adding your
> grain of salt.

Apparently one or more of us missed a few details in the previous
emails when replying, but neither of us is out to attack you or
anything like that. I just wanted to understand whether there was some
detail I was missing that might be a bug in musl, or whether the crash
was caused by things outside our control.

> > i'm not sure how populating backtrace with 
> > dladdr works
> 
> By finding the symbols that match each IP
> (Instruction Pointer) address.
> 
> > sounds suspicious
> 
> I am impressed by the value of this argument.
> 
> > how do you get your backtrace?
> 
> By following stack frames. I am curious to learn 
> what better way you will find less "suspicious".

I think his comment was that, in the event the stack has been smashed,
following the stack frames is likely (almost certain) to lead you to
destinations

Also, depending on what you're doing, it might not even be valid.
Since DWARF2 is used for debugging nowadays, musl is normally built
with -fomit-frame-pointer, so there won't be frame pointers to help
you follow the stack frames out of libc functions, in case the crash
happened inside a libc function. There also won't be DWARF2
information except in debug builds, and even if it is there, it's
normally (unless you change CFLAGS) just available to the debugger
rather than mapped into the program for introspection purposes.

Anyway, my purpose, and I believe nsz's purpose, in asking questions
was so that we could both (1) better evaluate whether there might be a
bug in musl related to this report, and (2) give you better
information about musl that might help you determine why your code
isn't doing what you want it to.

Rich

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-22 13:51                                 ` dladdr() Rich Felker
@ 2013-01-22 14:59                                   ` pierre
  2013-01-22 16:11                                     ` dladdr() Szabolcs Nagy
  0 siblings, 1 reply; 39+ messages in thread
From: pierre @ 2013-01-22 14:59 UTC (permalink / raw)
  To: musl

Rich,

> in the event the stack has been smashed,
> following the stack frames is likely 
> (almost certain) to lead you to [bad] 
> destinations [...] that might not even 
> be valid

That's precisely what I wrote in that email
that nobody seems to have read.

And I added, just in case it could help, that 
dladdr should not crash when trying to lookup
invalid addresses.

> there won't be frame pointers to help you 
> follow the stack frames out of libc functions

For mere mortals, knowing that it's in libc
is enough, as they then will check what junk
was given to libc (and then will write a 
workaround if they feel that libc is buggy).

For those with a stronger motivation, helping
to strengthen a decent libc makes a lot more
sense.

I have no other motivation when I invest some 
of my time here.

Pierre.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-22 14:59                                   ` dladdr() pierre
@ 2013-01-22 16:11                                     ` Szabolcs Nagy
  2013-01-22 23:43                                       ` dladdr() Rich Felker
  0 siblings, 1 reply; 39+ messages in thread
From: Szabolcs Nagy @ 2013-01-22 16:11 UTC (permalink / raw)
  To: musl, pierre

* pierre <pierre@silentlife.com> [2013-01-22 15:59:46 +0100]:
> > in the event the stack has been smashed,
> > following the stack frames is likely 
> > (almost certain) to lead you to [bad] 
> > destinations [...] that might not even 
> > be valid
> 
> That's precisely what I wrote in that email
> that nobody seems to have read.
> 

i can assure you that at least i read your mail

we know that you think that the stack is smashed
but you found it important to mention that in debug
mode it works while it crashes in release mode
(whatever those modes mean)

what you do is suspicious because if you get the
backtrace in a running program then it's most
likely invalid
(and i thought we could help you before you spend
too much time on inherently invalid design and
even give an explanation why it worked in debug
mode)

i dont think anyone tried to argue with you
or offend you in any way

> And I added, just in case it could help, that 
> dladdr should not crash when trying to lookup
> invalid addresses.

addr is never dereferenced in musl's code
so invalid addr is not an issue

> > there won't be frame pointers to help you 
> > follow the stack frames out of libc functions
> 
> For mere mortals, knowing that it's in libc
> is enough, as they then will check what junk
> was given to libc (and then will write a 
> workaround if they feel that libc is buggy).
> 
> For those with a stronger motivation, helping
> to strengthen a decent libc makes a lot more
> sense.
> 
> I have no other motivation when I invest some 
> of my time here.

i honestly dont understand what you are trying
to say

but i'm glad that you found musl useful whatever
you are doing


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: dladdr()
  2013-01-22 16:11                                     ` dladdr() Szabolcs Nagy
@ 2013-01-22 23:43                                       ` Rich Felker
  0 siblings, 0 replies; 39+ messages in thread
From: Rich Felker @ 2013-01-22 23:43 UTC (permalink / raw)
  To: musl

On Tue, Jan 22, 2013 at 05:11:42PM +0100, Szabolcs Nagy wrote:
> * pierre <pierre@silentlife.com> [2013-01-22 15:59:46 +0100]:
> > > in the event the stack has been smashed,
> > > following the stack frames is likely 
> > > (almost certain) to lead you to [bad] 
> > > destinations [...] that might not even 
> > > be valid
> > 
> > That's precisely what I wrote in that email
> > that nobody seems to have read.
> > 
> 
> i can assure you that at least i read your mail
> 
> we know that you think that the stack is smashed
> but you found it important to mention that in debug
> mode it works while it crashes in release mode
> (whatever those modes mean)

If the crash is occurring in dladdr, more than just the stack must be
smashed, since dladdr does not access any pointers that would be
located on the stack. As far as I can tell, in order for it to crash,
the linked list of loaded DSOs must be corrupted. It's possible that
this is more likely with musl than with glibc because DSO records
typically will get allocated in the otherwise-unused slack space at
the beginning/end of DSO .data segments, where overflows of static
objects could run into the DSO records.

> > And I added, just in case it could help, that 
> > dladdr should not crash when trying to lookup
> > invalid addresses.
> 
> addr is never dereferenced in musl's code
> so invalid addr is not an issue

Indeed, that's the question I was trying to check on to make sure we
weren't doing something wrong.

Rich

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2013-01-22 23:43 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-14 17:17 malloc(0) behaviour Igmar Palsenberg
2013-01-14 18:05 ` Rich Felker
2013-01-14 22:22   ` Strake
2013-01-14 23:05     ` Rich Felker
2013-01-15  8:32     ` Igmar Palsenberg
2013-01-15 12:53       ` Rob Landley
2013-01-15 22:18         ` Igmar Palsenberg
2013-01-15  8:31   ` Igmar Palsenberg
2013-01-15 11:06     ` Szabolcs Nagy
2013-01-15 12:33       ` Igmar Palsenberg
2013-01-15 13:48         ` Rich Felker
2013-01-15 22:28           ` Igmar Palsenberg
2013-01-15 23:22             ` Rob
2013-01-16  7:46               ` Igmar Palsenberg
2013-01-15 13:46       ` Rich Felker
2013-01-15 12:52     ` Rob Landley
2013-01-14 23:37 ` Rob Landley
2013-01-15  0:24   ` Rich Felker
2013-01-15 12:17     ` Rob Landley
2013-01-15  9:01   ` Igmar Palsenberg
2013-01-15 12:58     ` Rob Landley
2013-01-15 14:54       ` dladdr() pierre
2013-01-15 18:48         ` dladdr() Rich Felker
2013-01-16 11:00           ` dladdr() pierre
2013-01-16 12:51             ` dladdr() Szabolcs Nagy
2013-01-16 14:24               ` dladdr() musl
2013-01-16 15:20                 ` dladdr() pierre
2013-01-16 16:49                 ` dladdr() Rich Felker
2013-01-16 17:42                   ` dladdr() musl
2013-01-21  2:03                     ` dladdr() Rich Felker
2013-01-21  6:58                       ` dladdr() pierre
2013-01-21 18:35                         ` dladdr() Rich Felker
2013-01-22  6:27                           ` dladdr() pierre
2013-01-22 13:07                             ` dladdr() Szabolcs Nagy
2013-01-22 13:40                               ` dladdr() pierre
2013-01-22 13:51                                 ` dladdr() Rich Felker
2013-01-22 14:59                                   ` dladdr() pierre
2013-01-22 16:11                                     ` dladdr() Szabolcs Nagy
2013-01-22 23:43                                       ` dladdr() Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).