mailing list of musl libc
* [musl] Memory leak issue in multi-threaded program
From: Leesoo Ahn @ 2020-01-28  5:44 UTC (permalink / raw)
  To: musl

Dear musl developers,

Hello! It seems that musl currently has a memory leak in multi-threaded
programs. It occurs in the situation described below with the latest
(v1.1.24) source, in both 32-bit[1] and 64-bit[2] environments.

When a program creates and runs two or more threads with the pthread
APIs, the program's VSZ as reported by the ps command keeps increasing.
Strangely, the problem does not occur if only ONE pthread is created
and run.

To confirm the issue on your host machine, please follow these steps:

0. Clone the musl git repository and cd into it.
1. Configure a static-only build and install it:
   ./configure --prefix=$(pwd)/_build_dir --disable-shared && make && make install
2. Download the test code[3], then build it with:
   ./_build_dir/bin/musl-gcc ./test.c
3. Run the program in the background and monitor it with:
   ./a.out & while true; do ps aux | grep '[a].out'; sleep 1; done

You should see VSZ keep increasing.
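As an alternative to polling ps from a shell loop, the test program itself can log its VSZ. A minimal sketch, assuming Linux procfs, where the first field of /proc/self/statm is the total program size in pages (the quantity ps reports as VSZ, in KiB); read_vsz_kib is a hypothetical helper, not taken from the test code[3]:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return this process's virtual size in KiB
 * (the value ps shows as VSZ), or -1 on error.
 * Assumes Linux procfs: the first field of /proc/self/statm is the
 * total program size measured in pages. */
long read_vsz_kib(void)
{
	FILE *f = fopen("/proc/self/statm", "r");
	unsigned long pages;
	long page_size = sysconf(_SC_PAGESIZE);

	if (!f) return -1;
	if (fscanf(f, "%lu", &pages) != 1) {
		fclose(f);
		return -1;
	}
	fclose(f);
	if (page_size <= 0) return -1;
	/* pages * bytes-per-page, converted to KiB like ps does */
	return (long)(pages * (unsigned long)page_size / 1024);
}
```

For example, calling printf("VSZ: %ld KiB\n", read_vsz_kib()); once per second from the main thread while the worker threads run would show the same growth without an external script.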

However, when I apply the workaround diff[4], which makes malloc always
allocate memory directly from the kernel with mmap, the issue never
occurs, even with more than two pthreads.

I would be very grateful if you could confirm this and find a way to
fix the bug.

Thank you in advance and take care.

Best Regards,
Leesoo

----
[1] 32-bits env: https://pastebin.com/xR4PySaM
[2] 64-bits env: https://pastebin.com/stdVQXdE
[3] test code: https://pastebin.com/0s8nmdUv
[4] workaround patch:

diff --git a/src/malloc/malloc.c b/src/malloc/malloc.c
index 9698259..3d39be7 100644
--- a/src/malloc/malloc.c
+++ b/src/malloc/malloc.c
@@ -288,7 +288,11 @@ void *malloc(size_t n)

  	if (adjust_size(&n) < 0) return 0;

+#if 1
+	if ( 1 ) {
+#else
  	if (n > MMAP_THRESHOLD) {
+#endif
  		size_t len = n + OVERHEAD + PAGE_SIZE - 1 & -PAGE_SIZE;
  		char *base = __mmap(0, len, PROT_READ|PROT_WRITE,
  			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);




* Re: [musl] Memory leak issue in multi-threaded program
From: Rich Felker @ 2020-01-28 13:29 UTC (permalink / raw)
  To: Leesoo Ahn; +Cc: musl

On Tue, Jan 28, 2020 at 02:44:07PM +0900, Leesoo Ahn wrote:
> Dear musl developers,
> 
> Hello!, it seems that musl currently has a memory leak issue in
> multi-threaded program. It occurs in the below situation of latest
> (v1.1.24) source. Also, not only in 32-bits[1], but also 64-bits[2]
> as well.
> 
> When a program create and run, at least, two threads or more with
> pthread APIs, VSZ of the program by ps command keeps increasing. But
> here is a weird thing that it is fine 'IF ONLY ONE' pthread is
> created and run.
> 
> To confirm the issue in your host machine, please follow the instructions,
> 
> 0. Clone the musl git and get inside.
> 1. Build with these options for static build, ./configure
> --prefix=$(pwd)/_build_dir --disable-shared
> 2. Download the test code[3], then build with the command,
> ./_build_dir/bin/musl-gcc ./test.c
> 3. Run this script, ./a.out &; while [ 1 ]; do { ps aux | grep
> [a].out | grep -v grep; sleep 1; } done
> 
> You may figure out that VSZ keeps increasing.
> 
> BUT, when I make it to try to allocate memory all the time by kernel
> mmap with this diff[4] as workaround, although it creates more
> pthreads than 2, the issue never happens.
> 
> It would be really thankful if you guys could confirm it and find
> out the way to fix the bug.

This is a known issue described in:

https://www.openwall.com/lists/musl/2018/10/30/2

and likely several times before that, though it was not realized that
people were hitting it in practice (vs it just being theoretical)
until around that time. I posted an experimental mitigation patch last
spring:

https://www.openwall.com/lists/musl/2019/04/12/4

but it's not heavily tested and its impact on performance is
significant. I think it should be ok if you need an immediate fix, but
you should do some testing to make sure. If you go this route, reports
of any problems (or success) would be nice to hear about.

Further work in that direction was not done because it was already
planned that musl's malloc implementation will be replaced, and that
the replacement will solve this and other problems in much better
ways. This is work in progress and is intended for merge in the next
release cycle:

https://www.openwall.com/lists/musl/2019/10/22/3
https://github.com/richfelker/mallocng-draft

Hope this information helps.

Rich


* Re: [musl] Memory leak issue in multi-threaded program
From: Leesoo Ahn @ 2020-01-29  1:55 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

Dear Rich,

Thank you for the quick feedback. I am currently looking at the
hotfix patch and doing stress testing.

That said, I can't wait for the new next-generation malloc implementation!

Cheers,
Leesoo


* Re: [musl] Memory leak issue in multi-threaded program
From: Leesoo Ahn @ 2020-02-05 10:17 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

Dear Rich,

My coworker and I have been trying to solve this leak on our product,
an embedded system based on OpenWRT running on ARM64 and currently
using musl 1.1.16. However, we found that backporting the patch you
referred to below (written against 1.1.24) to 1.1.16 is quite
difficult: we ran into problems such as translation faults. We also
tried another approach without the patch, serializing malloc/free with
a global lock using the changes in [1], but hit a double-locking issue.

We see the same problems not only on 1.1.16 but also on 1.1.24, which
we tested as well. So right now we feel stranded, like being in the
middle of the sea without any food. This is a big risk for our product.

We are considering keeping 1.1.16 as the base for our product because,
although many bugs have been fixed in 1.1.24, nobody can guarantee that
our product will work correctly if we move it to 1.1.24.

Could you please give us any ideas for fixing the issue in v1.1.16? We
are really struggling with this...

Alternatively, what do you think of having every process always request
memory from the kernel via the mmap syscall? Would that solve the
issue, even though performance would be worse?

I hope we can solve this problem soon.

Best regards,
Leesoo

----
[1]
diff --git a/src/malloc/malloc.c b/src/malloc/malloc.c
index 9698259..f914cff 100644
--- a/src/malloc/malloc.c
+++ b/src/malloc/malloc.c
@@ -14,6 +14,10 @@
  #define inline inline __attribute__((always_inline))
  #endif

+#include <pthread.h>
+pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
+
  static struct {
  	volatile uint64_t binmap;
  	struct bin bins[64];
@@ -281,8 +285,25 @@ static void trim(struct chunk *self, size_t n)
  	__bin_chunk(split);
  }

+#if 1
+static void *__malloc(size_t n);
+
  void *malloc(size_t n)
  {
+	void *new_heap;
+
+	pthread_mutex_lock(&lock);
+	new_heap = __malloc(n);
+	pthread_mutex_unlock(&lock);
+
+	return new_heap;
+}
+
+static void *__malloc(size_t n)
+#else
+void *malloc(size_t n)
+#endif
+{
  	struct chunk *c;
  	int i, j;

@@ -516,8 +537,21 @@ static void unmap_chunk(struct chunk *self)
  	__munmap(base, len);
  }

+#if 1
+static void __free(void *p);
+
  void free(void *p)
  {
+	pthread_mutex_lock(&lock);
+	__free(p);
+	pthread_mutex_unlock(&lock);
+}
+
+static void __free(void *p)
+#else
+void free(void *p)
+#endif
+{
  	if (!p) return;

  	struct chunk *self = MEM_TO_CHUNK(p);



* Re: [musl] Memory leak issue in multi-threaded program
From: Rich Felker @ 2020-02-05 20:00 UTC (permalink / raw)
  To: Leesoo Ahn; +Cc: musl

On Wed, Feb 05, 2020 at 07:17:05PM +0900, Leesoo Ahn wrote:
> Dear Rich,
> 
> My coworker and I had been trying to solve this leak issue in
> embedded system which is based on OpenWRT, ARM64 arch and currently
> musl-1.1.16 for our product. However, musl-1.1.24 patch you referred
> below, we figured out that backporting of the patch into 1.1.16 is
> quite difficult by such problems, for examples, translation faults
> raised, or in another way of without the patch, double-locking issue
> in atomically calling malloc/free with this changes[1].
> 
> But not only in 1.1.16, but also 1.1.24 that we tested with, has the
> same problems as well. So, we are currently like in the middle of
> Sea without any foods. It has a big risk and so much dangerous for
> our product.
> 
> We are considering to keep 1.1.16 as our base in product, because
> although in 1.1.24, a lot of bugs fixed, nobody can guarantee for
> our product when we put 1.1.24 on it.
> 
> Could you give us any ideas for fixing the issue in v1.1.16, please?
> Ah, we are in so much pain...
> 
> Or what do you think this case that all the time, all processes ask
> to kernel via mmap syscall? Does this solve the issue...even though
> it has bad performance...?
> 
> I wish I can solve this problem sooner.

Unconditional use of mmap may be okay, but it will significantly harm
performance and increase memory usage (even a 10-byte allocation will
consume 4k!) and it would require some review to make sure there are
no assumptions that mmap is only used for larger sizes.
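The 4k figure follows from the length computation in the always-mmap workaround patch from the first message, len = n + OVERHEAD + PAGE_SIZE - 1 & -PAGE_SIZE. A minimal sketch of that arithmetic, assuming a 4096-byte page and an OVERHEAD of 2*sizeof(size_t) (an assumption about musl's old malloc, not something stated in this thread); mmap_len is a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper mirroring the workaround patch's expression
 *   len = n + OVERHEAD + PAGE_SIZE - 1 & -PAGE_SIZE;
 * Since '+' and '-' bind tighter than '&', that expression means
 * (n + OVERHEAD + PAGE_SIZE - 1) & -PAGE_SIZE: the request plus the
 * chunk header, rounded up to a whole number of pages.
 * page_size must be a power of two. */
size_t mmap_len(size_t n, size_t overhead, size_t page_size)
{
	/* For a power of two p, -p in size_t arithmetic equals ~(p - 1),
	 * so the mask clears the low bits and rounds down to a page
	 * boundary after the "+ page_size - 1" bump rounds up. */
	return (n + overhead + page_size - 1) & -page_size;
}
```

With those values, mmap_len(10, 16, 4096) is 4096: the 10 requested bytes plus 16 bytes of header round up to one full page, which is why small allocations become so expensive under that workaround.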

Your approach with wrapping malloc and free with big global locks
should be safe, but you also need to wrap realloc. (There are other
functions but I think they all call malloc, free, or realloc as
backends so just those three should suffice.) This is probably the
easiest solution available to you.
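Wrapping realloc has one subtlety that may be one way to end up with double locking: if realloc is wrapped with the same non-recursive mutex and its body still calls the public (now locked) malloc or free, the thread tries to acquire a lock it already holds, which POSIX leaves undefined for a default mutex and which typically deadlocks. The sketch below shows the pattern outside of libc under stated assumptions: big_lock, the impl_* functions, and the locked_* wrappers are hypothetical names, and the impl_* functions are toy stand-ins layered on the system allocator (with a small size header so realloc can copy), not musl's internal code. Each public wrapper takes the lock exactly once, and the unlocked internals only ever call each other.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One global lock serializing every allocator entry point. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy header recording the user size; the union keeps the returned
 * pointer aligned like malloc's result. */
typedef union { size_t size; max_align_t align; } hdr;

/* Unlocked "internals" (hypothetical toy versions). */
static void *impl_malloc(size_t n)
{
	hdr *h = malloc(sizeof(hdr) + n);
	if (!h) return NULL;
	h->size = n;
	return h + 1;
}

static void impl_free(void *p)
{
	if (p) free((hdr *)p - 1);
}

/* realloc built from malloc + copy + free. It calls the UNLOCKED
 * impl_* functions; calling locked_malloc here while locked_realloc
 * holds big_lock would re-lock a non-recursive mutex. */
static void *impl_realloc(void *p, size_t n)
{
	void *q;
	size_t old;

	if (!p) return impl_malloc(n);
	old = ((hdr *)p - 1)->size;
	q = impl_malloc(n);
	if (!q) return NULL; /* original block stays valid */
	memcpy(q, p, old < n ? old : n);
	impl_free(p);
	return q;
}

/* Public wrappers: each takes the lock exactly once. */
void *locked_malloc(size_t n)
{
	void *p;
	pthread_mutex_lock(&big_lock);
	p = impl_malloc(n);
	pthread_mutex_unlock(&big_lock);
	return p;
}

void locked_free(void *p)
{
	pthread_mutex_lock(&big_lock);
	impl_free(p);
	pthread_mutex_unlock(&big_lock);
}

void *locked_realloc(void *p, size_t n)
{
	void *q;
	pthread_mutex_lock(&big_lock);
	q = impl_realloc(p, n);
	pthread_mutex_unlock(&big_lock);
	return q;
}
```

Applied to a patch like [1] above, the same rule would mean wrapping realloc too and checking that, while the lock is held, nothing in the call path calls back into the locked malloc, free, or realloc.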

I don't think backporting the patch I showed you to 1.1.16 would be a
lot of work, and I could send you a quote for it as a paid support
service if you're interested.

If you were using 1.1.20 or later, another option would be to just
link in an alternate malloc implementation, but that is not supported
and not safe in earlier versions of musl like 1.1.16. And trying to
make it work without understanding why it's unsafe would be a recipe
for really nasty subtle breakage.

Rich
