mailing list of musl libc
 help / color / mirror / code / Atom feed
* dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
@ 2016-02-16 21:30 Hugues Bruant
  2016-02-16 21:55 ` Szabolcs Nagy
  0 siblings, 1 reply; 17+ messages in thread
From: Hugues Bruant @ 2016-02-16 21:30 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2656 bytes --]

Affects both 1.1.12 and 1.1.13

Tracked down with valgrind in Alpine Linux 3.3.

The dmg tool build from https://github.com/aerofs/libdmg-hfsplus links to a
handful shared libs. The following message is seen immediately at start:

==59== Invalid free() / delete / delete[] / realloc()
==59==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
==59==    by 0x4056F68: reclaim_gaps (dynlink.c:488)
==59==    by 0x405743D: map_library (dynlink.c:708)
==59==    by 0x4057EF3: load_library (dynlink.c:1014)
==59==    by 0x4058CA8: load_preload (dynlink.c:1112)
==59==    by 0x4058CA8: __dls3 (dynlink.c:1581)
==59==    by 0x405856A: __dls2 (dynlink.c:1383)
==59==    by 0x405655E: ??? (in /lib/ld-musl-x86_64.so.1)
==59==    by 0x3: ???
==59==    by 0xFFF000E3A: ???
==59==    by 0xFFF000E3E: ???
==59==    by 0xFFF000E44: ???
==59==    by 0xFFF000E86: ???

Afterwards, the program proceeds with no issue, until it exists, at which
point a segfault is triggered when cleaning up shared libraries:

==38== Invalid read of size 8
==38==    at 0x4057551: decode_vec.constprop.5 (dynlink.c:171)
==38==    by 0x405825C: __libc_exit_fini (dynlink.c:1197)
==38==    by 0x4016233: exit (exit.c:31)
==38==    by 0x401D635: (below main) (__libc_start_main.c:74)
==38==  Address 0x636f8f53ebff9f53 is not stack'd, malloc'd or (recently) free'd
==38==
==38==
==38== Process terminating with default action of signal 11 (SIGSEGV)
==38==  General Protection Fault
==38==    at 0x4057551: decode_vec.constprop.5 (dynlink.c:171)
==38==    by 0x405825C: __libc_exit_fini (dynlink.c:1197)
==38==    by 0x4016233: exit (exit.c:31)
==38==    by 0x401D635: (below main) (__libc_start_main.c:74)
==38==


The first 32 bytes of one of the dso struct are manifestly corrupted.

Patching reclaim_gap as follows fixes the segfault:

diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index 87f3b7f..f897dbd 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -484,9 +484,9 @@ static void reclaim_gaps(struct dso *dso)
        for (; phcnt--; ph=(void *)((char *)ph+dso->phentsize)) {
                if (ph->p_type!=PT_LOAD) continue;
                if ((ph->p_flags&(PF_R|PF_W))!=(PF_R|PF_W)) continue;
-               reclaim(dso, ph->p_vaddr & -PAGE_SIZE, ph->p_vaddr);
-               reclaim(dso, ph->p_vaddr+ph->p_memsz,
-                       ph->p_vaddr+ph->p_memsz+PAGE_SIZE-1 & -PAGE_SIZE);
+               //reclaim(dso, ph->p_vaddr & -PAGE_SIZE, ph->p_vaddr);
+               //reclaim(dso, ph->p_vaddr+ph->p_memsz,
+               //      ph->p_vaddr+ph->p_memsz+PAGE_SIZE-1 & -PAGE_SIZE);
        }
 }


For more details: https://bugs.alpinelinux.org/issues/5123

Regards,
Hugues

[-- Attachment #2: Type: text/html, Size: 3606 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-16 21:30 dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini Hugues Bruant
@ 2016-02-16 21:55 ` Szabolcs Nagy
  2016-02-16 22:02   ` Rich Felker
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Szabolcs Nagy @ 2016-02-16 21:55 UTC (permalink / raw)
  To: musl

* Hugues Bruant <hugues@aerofs.com> [2016-02-16 16:30:42 -0500]:
> Affects both 1.1.12 and 1.1.13
> 
> Tracked down with valgrind in Alpine Linux 3.3.
> 
> The dmg tool build from https://github.com/aerofs/libdmg-hfsplus links to a
> handful shared libs. The following message is seen immediately at start:
> 
> ==59== Invalid free() / delete / delete[] / realloc()
> ==59==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
> ==59==    by 0x4056F68: reclaim_gaps (dynlink.c:488)
> ==59==    by 0x405743D: map_library (dynlink.c:708)
> ==59==    by 0x4057EF3: load_library (dynlink.c:1014)
> ==59==    by 0x4058CA8: load_preload (dynlink.c:1112)
> ==59==    by 0x4058CA8: __dls3 (dynlink.c:1581)
> ==59==    by 0x405856A: __dls2 (dynlink.c:1383)
> ==59==    by 0x405655E: ??? (in /lib/ld-musl-x86_64.so.1)
> ==59==    by 0x3: ???
> ==59==    by 0xFFF000E3A: ???
> ==59==    by 0xFFF000E3E: ???
> ==59==    by 0xFFF000E44: ???
> ==59==    by 0xFFF000E86: ???
> 
> Afterwards, the program proceeds with no issue, until it exists, at which
> point a segfault is triggered when cleaning up shared libraries:
> 

this is not a bug.

valgrind is not aware of dynamic linker internals,
you have to use a musl specific suppression file
to hide this message (but i dont know if anybody
wrote such thing for valgrind).

> ==38== Invalid read of size 8
> ==38==    at 0x4057551: decode_vec.constprop.5 (dynlink.c:171)
> ==38==    by 0x405825C: __libc_exit_fini (dynlink.c:1197)
> ==38==    by 0x4016233: exit (exit.c:31)
> ==38==    by 0x401D635: (below main) (__libc_start_main.c:74)
> ==38==  Address 0x636f8f53ebff9f53 is not stack'd, malloc'd or (recently) free'd
> ==38==
> ==38==
> ==38== Process terminating with default action of signal 11 (SIGSEGV)
> ==38==  General Protection Fault
> ==38==    at 0x4057551: decode_vec.constprop.5 (dynlink.c:171)
> ==38==    by 0x405825C: __libc_exit_fini (dynlink.c:1197)
> ==38==    by 0x4016233: exit (exit.c:31)
> ==38==    by 0x401D635: (below main) (__libc_start_main.c:74)
> ==38==
> 
> 
> The first 32 bytes of one of the dso struct are manifestly corrupted.
> 
> Patching reclaim_gap as follows fixes the segfault:
> 

there is a corruption, but it is not caused by
reclaim gaps, removing it just hides the problem
(becaues then early mallocs happen to go elsewhere
where they are not corrupted).

needs more investigation.

> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index 87f3b7f..f897dbd 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -484,9 +484,9 @@ static void reclaim_gaps(struct dso *dso)
>         for (; phcnt--; ph=(void *)((char *)ph+dso->phentsize)) {
>                 if (ph->p_type!=PT_LOAD) continue;
>                 if ((ph->p_flags&(PF_R|PF_W))!=(PF_R|PF_W)) continue;
> -               reclaim(dso, ph->p_vaddr & -PAGE_SIZE, ph->p_vaddr);
> -               reclaim(dso, ph->p_vaddr+ph->p_memsz,
> -                       ph->p_vaddr+ph->p_memsz+PAGE_SIZE-1 & -PAGE_SIZE);
> +               //reclaim(dso, ph->p_vaddr & -PAGE_SIZE, ph->p_vaddr);
> +               //reclaim(dso, ph->p_vaddr+ph->p_memsz,
> +               //      ph->p_vaddr+ph->p_memsz+PAGE_SIZE-1 & -PAGE_SIZE);
>         }
>  }
> 
> 
> For more details: https://bugs.alpinelinux.org/issues/5123
> 
> Regards,
> Hugues


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-16 21:55 ` Szabolcs Nagy
@ 2016-02-16 22:02   ` Rich Felker
  2016-02-17  0:05   ` Hugues Bruant
  2016-02-17  7:03   ` Timo Teras
  2 siblings, 0 replies; 17+ messages in thread
From: Rich Felker @ 2016-02-16 22:02 UTC (permalink / raw)
  To: musl

On Tue, Feb 16, 2016 at 10:55:50PM +0100, Szabolcs Nagy wrote:
> * Hugues Bruant <hugues@aerofs.com> [2016-02-16 16:30:42 -0500]:
> > Affects both 1.1.12 and 1.1.13
> > 
> > Tracked down with valgrind in Alpine Linux 3.3.
> > 
> > The dmg tool build from https://github.com/aerofs/libdmg-hfsplus links to a
> > handful shared libs. The following message is seen immediately at start:
> > 
> > ==59== Invalid free() / delete / delete[] / realloc()
> > ==59==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
> > ==59==    by 0x4056F68: reclaim_gaps (dynlink.c:488)
> > ==59==    by 0x405743D: map_library (dynlink.c:708)
> > ==59==    by 0x4057EF3: load_library (dynlink.c:1014)
> > ==59==    by 0x4058CA8: load_preload (dynlink.c:1112)
> > ==59==    by 0x4058CA8: __dls3 (dynlink.c:1581)
> > ==59==    by 0x405856A: __dls2 (dynlink.c:1383)
> > ==59==    by 0x405655E: ??? (in /lib/ld-musl-x86_64.so.1)
> > ==59==    by 0x3: ???
> > ==59==    by 0xFFF000E3A: ???
> > ==59==    by 0xFFF000E3E: ???
> > ==59==    by 0xFFF000E44: ???
> > ==59==    by 0xFFF000E86: ???
> > 
> > Afterwards, the program proceeds with no issue, until it exists, at which
> > point a segfault is triggered when cleaning up shared libraries:
> > 
> 
> this is not a bug.

Well the crash is a bug, but it's not clear what the source of the bug
is.

> valgrind is not aware of dynamic linker internals,
> you have to use a musl specific suppression file
> to hide this message (but i dont know if anybody
> wrote such thing for valgrind).

Indeed, I'm not sure either.

I've commented with some further ideas on the bug tracker.

Rich


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-16 21:55 ` Szabolcs Nagy
  2016-02-16 22:02   ` Rich Felker
@ 2016-02-17  0:05   ` Hugues Bruant
  2016-02-17  0:21     ` Rich Felker
  2016-02-17  7:03   ` Timo Teras
  2 siblings, 1 reply; 17+ messages in thread
From: Hugues Bruant @ 2016-02-17  0:05 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1775 bytes --]

> > ==59== Invalid free() / delete / delete[] / realloc()
> > ==59==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
> > ==59==    by 0x4056F68: reclaim_gaps (dynlink.c:488)
> > ==59==    by 0x405743D: map_library (dynlink.c:708)
> > ==59==    by 0x4057EF3: load_library (dynlink.c:1014)
> > ==59==    by 0x4058CA8: load_preload (dynlink.c:1112)
> > ==59==    by 0x4058CA8: __dls3 (dynlink.c:1581)
> > ==59==    by 0x405856A: __dls2 (dynlink.c:1383)
> > ==59==    by 0x405655E: ??? (in /lib/ld-musl-x86_64.so.1)
> > ==59==    by 0x3: ???
> > ==59==    by 0xFFF000E3A: ???
> > ==59==    by 0xFFF000E3E: ???
> > ==59==    by 0xFFF000E44: ???
> > ==59==    by 0xFFF000E86: ???
> >
> > Afterwards, the program proceeds with no issue, until it exists, at which
> > point a segfault is triggered when cleaning up shared libraries:
> >
>
> this is not a bug.

How confident of that are you?

Something is reliably overwriting 32 bytes of a dso struct. Valgrind
supposedly catches out-of-bounds writes to heap-allocated arrays so
unless I'm mistaken, the absence of any other errors until the segfault
suggests that there is no out-of-bounds write and the logical conclusion
is that an allocation overlaps with the corrupted dso struct.

The program is not using any threads so if I understand correctly it sohuld
not be negatively affected by the small default stack size. In any case, I
enabled -fstack-protector-all and -fstack-check and this did not reveal any
issue so at this point I'm ruling out stack overflow as the source of the
corruption.

Quite frankly I'm hoping that the root cause is in libdmg-hfsplus because
it would be much easier to fix than musl but the evidence does not point
in that direction.

Any suggestions for further investigation would be appreciated.

[-- Attachment #2: Type: text/html, Size: 2357 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17  0:05   ` Hugues Bruant
@ 2016-02-17  0:21     ` Rich Felker
  2016-02-17  6:16       ` Hugues Bruant
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Felker @ 2016-02-17  0:21 UTC (permalink / raw)
  To: musl

On Tue, Feb 16, 2016 at 07:05:27PM -0500, Hugues Bruant wrote:
> > > ==59== Invalid free() / delete / delete[] / realloc()
> > > ==59==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
> > > ==59==    by 0x4056F68: reclaim_gaps (dynlink.c:488)
> > > ==59==    by 0x405743D: map_library (dynlink.c:708)
> > > ==59==    by 0x4057EF3: load_library (dynlink.c:1014)
> > > ==59==    by 0x4058CA8: load_preload (dynlink.c:1112)
> > > ==59==    by 0x4058CA8: __dls3 (dynlink.c:1581)
> > > ==59==    by 0x405856A: __dls2 (dynlink.c:1383)
> > > ==59==    by 0x405655E: ??? (in /lib/ld-musl-x86_64.so.1)
> > > ==59==    by 0x3: ???
> > > ==59==    by 0xFFF000E3A: ???
> > > ==59==    by 0xFFF000E3E: ???
> > > ==59==    by 0xFFF000E44: ???
> > > ==59==    by 0xFFF000E86: ???
> > >
> > > Afterwards, the program proceeds with no issue, until it exists, at which
> > > point a segfault is triggered when cleaning up shared libraries:
> > >
> >
> > this is not a bug.
> 
> How confident of that are you?
> 
> Something is reliably overwriting 32 bytes of a dso struct. Valgrind
> supposedly catches out-of-bounds writes to heap-allocated arrays so
> unless I'm mistaken, the absence of any other errors until the segfault
> suggests that there is no out-of-bounds write and the logical conclusion
> is that an allocation overlaps with the corrupted dso struct.
> 
> The program is not using any threads so if I understand correctly it sohuld
> not be negatively affected by the small default stack size. In any case, I
> enabled -fstack-protector-all and -fstack-check and this did not reveal any
> issue so at this point I'm ruling out stack overflow as the source of the
> corruption.
> 
> Quite frankly I'm hoping that the root cause is in libdmg-hfsplus because
> it would be much easier to fix than musl but the evidence does not point
> in that direction.
> 
> Any suggestions for further investigation would be appreciated.

Szabolcs Nagy has been trying to reproduce your crash but it doesn't
seem possible without valid input data for it to process. Could you
provide data that can be used to reproduce the crash?

Rich


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17  0:21     ` Rich Felker
@ 2016-02-17  6:16       ` Hugues Bruant
  2016-02-17  6:27         ` Hugues Bruant
  0 siblings, 1 reply; 17+ messages in thread
From: Hugues Bruant @ 2016-02-17  6:16 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 983 bytes --]

I packaged a small reproducer in a docker image:

docker run huguesb/dmg-musl-crash-repro dmg build /repro.dmg.hfs /repro.dmg

Should successfully create a dmg from the hfs file and return exit code 139
due to the segfault in __libc_exit_fini

The failure is 100% reproducible on all the hosts I've tried so far, which
gives me some confidence that it's not a bad interaction with the kernel:
  - boot2docker 1.10.1 with kernel 4.1.17 / docker 1.10.1 / aufs
  - CoreOS 899.6.0 with kernel 4.3.3 / docker 1.9.1 / overlayfs
  - Ubuntu 14.04 with kernel 3.19.0-43 / docker 1.9.1 / aufs
  - Alpine 3.3.1 with kernel 4.1.15 / docker 1.9.1 / overlayfs

Although my main use case is for this to run inside a container, for
completeness I copied the hfs file out of the docker image on the alpine
host (docker cp). The issue still manifests, which rules out any bad
interaction with docker.

Please let me know if there's anything more I can do.

Regards,
Hugues
​

[-- Attachment #2: Type: text/html, Size: 1181 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17  6:16       ` Hugues Bruant
@ 2016-02-17  6:27         ` Hugues Bruant
  0 siblings, 0 replies; 17+ messages in thread
From: Hugues Bruant @ 2016-02-17  6:27 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 350 bytes --]

I've haven't dived deeply in the reclaim_gap logic but now I'm curious to
know how it would interact with "creatively packed" ELF files:
http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

I'm not doing anything of the sort but it's worth considering how resilient
the loader code would be to an aggressively optimizing linker.
​

[-- Attachment #2: Type: text/html, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-16 21:55 ` Szabolcs Nagy
  2016-02-16 22:02   ` Rich Felker
  2016-02-17  0:05   ` Hugues Bruant
@ 2016-02-17  7:03   ` Timo Teras
  2016-02-17  9:15     ` Hugues Bruant
                       ` (2 more replies)
  2 siblings, 3 replies; 17+ messages in thread
From: Timo Teras @ 2016-02-17  7:03 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: musl

On Tue, 16 Feb 2016 22:55:50 +0100
Szabolcs Nagy <nsz@port70.net> wrote:

> * Hugues Bruant <hugues@aerofs.com> [2016-02-16 16:30:42 -0500]:
> > Affects both 1.1.12 and 1.1.13
> > 
> > Tracked down with valgrind in Alpine Linux 3.3.
> > 
> > The dmg tool build from https://github.com/aerofs/libdmg-hfsplus
> > links to a handful shared libs. The following message is seen
> > immediately at start:
> > 
> > ==59== Invalid free() / delete / delete[] / realloc()
> > ==59==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
> > ==59==    by 0x4056F68: reclaim_gaps (dynlink.c:488)
> > ==59==    by 0x405743D: map_library (dynlink.c:708)
> > ==59==    by 0x4057EF3: load_library (dynlink.c:1014)
> > ==59==    by 0x4058CA8: load_preload (dynlink.c:1112)
> > ==59==    by 0x4058CA8: __dls3 (dynlink.c:1581)
> > ==59==    by 0x405856A: __dls2 (dynlink.c:1383)
> > ==59==    by 0x405655E: ??? (in /lib/ld-musl-x86_64.so.1)
> > ==59==    by 0x3: ???
> > ==59==    by 0xFFF000E3A: ???
> > ==59==    by 0xFFF000E3E: ???
> > ==59==    by 0xFFF000E44: ???
> > ==59==    by 0xFFF000E86: ???
> > 
> > Afterwards, the program proceeds with no issue, until it exists, at
> > which point a segfault is triggered when cleaning up shared
> > libraries: 
> 
> this is not a bug.

It is compliance issue. POSIX says about free:
--
The free() function shall cause the space pointed to by ptr to be
deallocated; that is, made available for further allocation. If ptr is
a null pointer, no action shall occur. Otherwise, if the argument does
not match a pointer earlier returned by a function in POSIX.1-2008 that
allocates memory as if by malloc(), or if the space has been
deallocated by a call to free() or realloc(), the behavior is undefined.
--

While overloading allocators are not supported, they'd break at this
too. And it'll be highly annoying if someone decides to test a new
memory allocator inside musl and does not know about this one exception.

> valgrind is not aware of dynamic linker internals,
> you have to use a musl specific suppression file
> to hide this message (but i dont know if anybody
> wrote such thing for valgrind).

Well - musl really should introduce __donatemem or similar for this
purpose, and not overload the standard free() function. This would make
the valgrind warning go away.

I'd rather not write a suppression for the above, since the internals
are misusing/overloading a standard api call against posix.

Technically valgrind is detecting a valid case for misuse of free().
While in context of standard musl allocator it's ok.

Thanks,
Timo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17  7:03   ` Timo Teras
@ 2016-02-17  9:15     ` Hugues Bruant
  2016-02-17 15:25       ` Rich Felker
  2016-02-17 10:19     ` Szabolcs Nagy
  2016-02-17 15:17     ` Markus Wichmann
  2 siblings, 1 reply; 17+ messages in thread
From: Hugues Bruant @ 2016-02-17  9:15 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 719 bytes --]

Finally figured it out:

1. musl is reclaiming space from the executable starting at offset
0x224B20, i.e. at the end of the bss
2. this reclaimed space gets used for the dso struct of the first shared lib
3. the last variable in the bss appears to be scratch space for checksum
computation
4. the code is assuming "unsigned long" to be 4 bytes, which isn't the case
on 64bit platforms
5. the checksum code overflows out of the bss, corrupting the dso struct
6. this issue is masked in a glibc environment because the loader doesn't
make the unused part of the program pages available to malloc.
7. valgrind doesn't catch the problem because it doesn't bound-check globals

Sorry about the noise.

​

[-- Attachment #2: Type: text/html, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17  7:03   ` Timo Teras
  2016-02-17  9:15     ` Hugues Bruant
@ 2016-02-17 10:19     ` Szabolcs Nagy
  2016-02-18 18:05       ` Szabolcs Nagy
  2016-02-17 15:17     ` Markus Wichmann
  2 siblings, 1 reply; 17+ messages in thread
From: Szabolcs Nagy @ 2016-02-17 10:19 UTC (permalink / raw)
  To: Timo Teras; +Cc: musl

* Timo Teras <timo.teras@iki.fi> [2016-02-17 09:03:27 +0200]:
> On Tue, 16 Feb 2016 22:55:50 +0100
> Szabolcs Nagy <nsz@port70.net> wrote:
> 
> > * Hugues Bruant <hugues@aerofs.com> [2016-02-16 16:30:42 -0500]:
> > > Affects both 1.1.12 and 1.1.13
> > > 
> > > Tracked down with valgrind in Alpine Linux 3.3.
> > > 
> > > The dmg tool build from https://github.com/aerofs/libdmg-hfsplus
> > > links to a handful shared libs. The following message is seen
> > > immediately at start:
> > > 
> > > ==59== Invalid free() / delete / delete[] / realloc()
> > > ==59==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
> > > ==59==    by 0x4056F68: reclaim_gaps (dynlink.c:488)
> > > ==59==    by 0x405743D: map_library (dynlink.c:708)
> > > ==59==    by 0x4057EF3: load_library (dynlink.c:1014)
> > > ==59==    by 0x4058CA8: load_preload (dynlink.c:1112)
> > > ==59==    by 0x4058CA8: __dls3 (dynlink.c:1581)
> > > ==59==    by 0x405856A: __dls2 (dynlink.c:1383)
> > > ==59==    by 0x405655E: ??? (in /lib/ld-musl-x86_64.so.1)
> > > ==59==    by 0x3: ???
> > > ==59==    by 0xFFF000E3A: ???
> > > ==59==    by 0xFFF000E3E: ???
> > > ==59==    by 0xFFF000E44: ???
> > > ==59==    by 0xFFF000E86: ???
> > > 
> > > Afterwards, the program proceeds with no issue, until it exists, at
> > > which point a segfault is triggered when cleaning up shared
> > > libraries: 
> > 
> > this is not a bug.
> 
> It is compliance issue. POSIX says about free:
> --
> The free() function shall cause the space pointed to by ptr to be
> deallocated; that is, made available for further allocation. If ptr is
> a null pointer, no action shall occur. Otherwise, if the argument does
> not match a pointer earlier returned by a function in POSIX.1-2008 that
> allocates memory as if by malloc(), or if the space has been
> deallocated by a call to free() or realloc(), the behavior is undefined.
> --
> 
> While overloading allocators are not supported, they'd break at this
> too. And it'll be highly annoying if someone decides to test a new
> memory allocator inside musl and does not know about this one exception.
> 

interception of malloc/free/... is supported, just not
calls inside the libc (so you can intercept allocations
that happen in user code, but libc internal calls will
keep using libc allocators).

libc internal calls are unobservable on the language
level, so i don't think there is a conformance issue.

valgrind does not use elf preloading to intercept free,
but emulates the instructions, one basic block at a time,
and if it sees a call to 'free' according to its symbol
tables, then it redirects the call to its own free
(which happens to sit in a preloaded module so it is in
'valgrind client space' and runs on the simulated cpu).

note that valgrind does the redirection based on the
called address and not using GOT entries, so it discards
symbol visibility rules.

in principle this could arbitrarily break, but in
practice it happens to work.. except reclaim gaps
in case of musl. (there are a large amount of hacks
in valgrind to make it not spew out errors on glibc
ld.so and early startup code, musl only needs this
one hack.)

> > valgrind is not aware of dynamic linker internals,
> > you have to use a musl specific suppression file
> > to hide this message (but i dont know if anybody
> > wrote such thing for valgrind).
> 
> Well - musl really should introduce __donatemem or similar for this
> purpose, and not overload the standard free() function. This would make
> the valgrind warning go away.
> 
> I'd rather not write a suppression for the above, since the internals
> are misusing/overloading a standard api call against posix.
> 

i would not call it 'overloading the standard free',
this is a libc internal call.

using a different symbol only helps if it has different
address as well (an alias would be still redirected).

to please valgrind the options are

1) have an internal free which valgrind does not know
   about, but public free calls it, so all public calls
   of free would go through an extra indirection.

2) have a copy of the internal logic of free under a
   different name, which means maintenance work and
   code size increase.

3) or have a suppression file.

i think 3) is a reasonable solution.

> Technically valgrind is detecting a valid case for misuse of free().
> While in context of standard musl allocator it's ok.
> 
> Thanks,
> Timo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17  7:03   ` Timo Teras
  2016-02-17  9:15     ` Hugues Bruant
  2016-02-17 10:19     ` Szabolcs Nagy
@ 2016-02-17 15:17     ` Markus Wichmann
  2 siblings, 0 replies; 17+ messages in thread
From: Markus Wichmann @ 2016-02-17 15:17 UTC (permalink / raw)
  To: musl

On Wed, Feb 17, 2016 at 09:03:27AM +0200, Timo Teras wrote:
> It is compliance issue. POSIX says about free:

Getting ahead of ourselves just a bit here, aren't we?

musl is a libc implementation. As such, it provides an implementation of
malloc(), realloc() and free(). Therefore, it also knows the internals
of its malloc() implementation and can therefore successfully simulate a
malloc() header, just enough so that free() can pick it up.

I don't think superposition applies to this point as well, because the
reference to free() will be resolved internally in the shared lib, and
in the static lib, this code doesn't matter, and even if it did, you
couldn't link you own malloc implementation with it, because it is libc
internal.

> The free() function shall cause the space pointed to by ptr to be
> deallocated; that is, made available for further allocation. If ptr is
> a null pointer, no action shall occur. Otherwise, if the argument does
> not match a pointer earlier returned by a function in POSIX.1-2008 that
> allocates memory as if by malloc(), or if the space has been
> deallocated by a call to free() or realloc(), the behavior is undefined.

"The behavior is undefined" means POSIX does not define the behavior,
and people merely using the interface should avoid this case. But musl
doesn't use this interface, it uses a guarenteed implementation.

> 
> While overloading allocators are not supported, they'd break at this
> too. And it'll be highly annoying if someone decides to test a new
> memory allocator inside musl and does not know about this one exception.
> 

Are you saying that people modifying musl better know what they are
doing? Because yeah, I kinda guessed that.

> Well - musl really should introduce __donatemem or similar for this
> purpose, and not overload the standard free() function. This would make
> the valgrind warning go away.
> 

Aha. Write the code around the bug checker, yeah, that seems sane.

musl isn't overloading anything here, either. It is calling free(), and
that is the same free() users of malloc() should call. It does so with a
pointer to a specially crafted memory area.

> I'd rather not write a suppression for the above, since the internals
> are misusing/overloading a standard api call against posix.
> 

You appear to not understand that musl isn't POSIX. musl is a specific
implementation of a few standards, POSIX being among them. Don't ix up
interface and implementation.

Ciao,
Markus


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17  9:15     ` Hugues Bruant
@ 2016-02-17 15:25       ` Rich Felker
  2016-02-17 17:34         ` Hugues Bruant
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Felker @ 2016-02-17 15:25 UTC (permalink / raw)
  To: musl

On Wed, Feb 17, 2016 at 04:15:33AM -0500, Hugues Bruant wrote:
> Finally figured it out:
> 
> 1. musl is reclaiming space from the executable starting at offset
> 0x224B20, i.e. at the end of the bss
> 2. this reclaimed space gets used for the dso struct of the first shared lib
> 3. the last variable in the bss appears to be scratch space for checksum
> computation
> 4. the code is assuming "unsigned long" to be 4 bytes, which isn't the case
> on 64bit platforms
> 5. the checksum code overflows out of the bss, corrupting the dso struct
> 6. this issue is masked in a glibc environment because the loader doesn't
> make the unused part of the program pages available to malloc.
> 7. valgrind doesn't catch the problem because it doesn't bound-check globals
> 
> Sorry about the noise.

No problem. Nice find though -- you caught a bug even valgrind
couldn't, and one that probably could have been nasty if the linker
happened to reorder bss differently such that the overflowed object
was no longer at the end. Have you sent a patch upstream?

Rich


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17 15:25       ` Rich Felker
@ 2016-02-17 17:34         ` Hugues Bruant
  2016-02-17 18:14           ` Szabolcs Nagy
  0 siblings, 1 reply; 17+ messages in thread
From: Hugues Bruant @ 2016-02-17 17:34 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 226 bytes --]

> Have you sent a patch upstream?
>
https://github.com/aerofs/libdmg-hfsplus/commit/ab55551b3ebf196a2a8f35a1e8a1fee95a0c5508


On a related note, how should I go about submitting this package for
inclusion in the Alpine repo?

[-- Attachment #2: Type: text/html, Size: 641 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17 17:34         ` Hugues Bruant
@ 2016-02-17 18:14           ` Szabolcs Nagy
  2016-02-17 18:46             ` Isaac Dunham
  0 siblings, 1 reply; 17+ messages in thread
From: Szabolcs Nagy @ 2016-02-17 18:14 UTC (permalink / raw)
  To: musl

* Hugues Bruant <hugues@aerofs.com> [2016-02-17 12:34:30 -0500]:
> > Have you sent a patch upstream?
> >
> https://github.com/aerofs/libdmg-hfsplus/commit/ab55551b3ebf196a2a8f35a1e8a1fee95a0c5508
> 
> On a related note, how should I go about submitting this package for
> inclusion in the Alpine repo?

use the mailing lists:
http://alpinelinux.org/community/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17 18:14           ` Szabolcs Nagy
@ 2016-02-17 18:46             ` Isaac Dunham
  0 siblings, 0 replies; 17+ messages in thread
From: Isaac Dunham @ 2016-02-17 18:46 UTC (permalink / raw)
  To: musl

On Wed, Feb 17, 2016 at 07:14:30PM +0100, Szabolcs Nagy wrote:
> * Hugues Bruant <hugues@aerofs.com> [2016-02-17 12:34:30 -0500]:
> > > Have you sent a patch upstream?
> > >
> > https://github.com/aerofs/libdmg-hfsplus/commit/ab55551b3ebf196a2a8f35a1e8a1fee95a0c5508
> > 
> > On a related note, how should I go about submitting this package for
> > inclusion in the Alpine repo?
> 
> use the mailing lists:
> http://alpinelinux.org/community/

Specifically, send a patch adding it to testing/ in the aports tree to
the alpine-aports mailing list.
http://wiki.alpinelinux.org/wiki/Creating_an_Alpine_package
is a more complete overview, although the reference to alpine-devel
is out of date. (alpine-aports has Patchwork watching it.)
If you have more questions, ask on that mailing list.

HTH,
Isaac Dunham



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-17 10:19     ` Szabolcs Nagy
@ 2016-02-18 18:05       ` Szabolcs Nagy
  2016-02-18 18:33         ` Rich Felker
  0 siblings, 1 reply; 17+ messages in thread
From: Szabolcs Nagy @ 2016-02-18 18:05 UTC (permalink / raw)
  To: Timo Teras, musl

* Szabolcs Nagy <nsz@port70.net> [2016-02-17 11:19:17 +0100]:
> * Timo Teras <timo.teras@iki.fi> [2016-02-17 09:03:27 +0200]:
> > Well - musl really should introduce __donatemem or similar for this
> > purpose, and not overload the standard free() function. This would make
> > the valgrind warning go away.
> 
> to please valgrind the options are
> 
> 1) have an internal free which valgrind does not know
>    about, but public free calls it, so all public calls
>    of free would go through an extra indirection.
> 
> 2) have a copy of the internal logic of free under a
>    different name, which means maintenance work and
>    code size increase.
> 
> 3) or have a suppression file.
> 
> i think 3) is a reasonable solution.

i looked at this again: i think moving most of reclaim()
function into src/malloc makes sense, so all malloc
internal knowledge is at one place (even if dynlink.c
is the only user of this code).

but i don't see an easy way to do the reclaim without
calling free (so the valgrind problem is not solved,
only code maintenance gets better)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini
  2016-02-18 18:05       ` Szabolcs Nagy
@ 2016-02-18 18:33         ` Rich Felker
  0 siblings, 0 replies; 17+ messages in thread
From: Rich Felker @ 2016-02-18 18:33 UTC (permalink / raw)
  To: musl

On Thu, Feb 18, 2016 at 07:05:13PM +0100, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@port70.net> [2016-02-17 11:19:17 +0100]:
> > * Timo Teras <timo.teras@iki.fi> [2016-02-17 09:03:27 +0200]:
> > > Well - musl really should introduce __donatemem or similar for this
> > > purpose, and not overload the standard free() function. This would make
> > > the valgrind warning go away.
> > 
> > to please valgrind the options are
> > 
> > 1) have an internal free which valgrind does not know
> >    about, but public free calls it, so all public calls
> >    of free would go through an extra indirection.
> > 
> > 2) have a copy of the internal logic of free under a
> >    different name, which means maintenance work and
> >    code size increase.
> > 
> > 3) or have a suppression file.
> > 
> > i think 3) is a reasonable solution.
> 
> i looked at this again: i think moving most of reclaim()
> function into src/malloc makes sense, so all malloc
> internal knowledge is at one place (even if dynlink.c
> is the only user of this code).
> 
> but i don't see an easy way to do the reclaim without
> calling free (so the valgrind problem is not solved,
> only code maintenance gets better)

I think it could be done by making free a wrapper with zero cost. See
how free starts out with:

	if (!p) return;

This could instead be:

	if (p) return do_free(p);
	/* end of function */

and the return statement is just a conditional tail-call jump, same
cost as the conditional branch in the current code.

This would also fix malloc-internal calls to free (which might confuse
valgrind?) and eliminate the useless branch to test for null pointer
when free is called internally from malloc/realloc.

Rich


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-02-18 18:33 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-16 21:30 dynlink.c: bug in reclaim_gaps leading to segfault in __libc_exit_fini Hugues Bruant
2016-02-16 21:55 ` Szabolcs Nagy
2016-02-16 22:02   ` Rich Felker
2016-02-17  0:05   ` Hugues Bruant
2016-02-17  0:21     ` Rich Felker
2016-02-17  6:16       ` Hugues Bruant
2016-02-17  6:27         ` Hugues Bruant
2016-02-17  7:03   ` Timo Teras
2016-02-17  9:15     ` Hugues Bruant
2016-02-17 15:25       ` Rich Felker
2016-02-17 17:34         ` Hugues Bruant
2016-02-17 18:14           ` Szabolcs Nagy
2016-02-17 18:46             ` Isaac Dunham
2016-02-17 10:19     ` Szabolcs Nagy
2016-02-18 18:05       ` Szabolcs Nagy
2016-02-18 18:33         ` Rich Felker
2016-02-17 15:17     ` Markus Wichmann

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).