From: Dominique MARTINET <dominique.martinet@atmark-techno.com>
To: musl@lists.openwall.com
Subject: [musl] infinite loop in mallocng's try_avail
Date: Mon, 23 Jan 2023 10:33:39 +0900
Message-ID: <Y83j89CsGRHIfqWC@atmark-techno.com>
Hi,
A coworker of mine has been observing occasional hangs during shutdown
on our ARM alpine/openrc board: once in a blue moon, the openrc
shutdown process gets stuck in userspace after having stopped almost
everything (running reboot in a loop, this reproduces roughly once
every three days).
It turns out it's stuck in malloc() as follows:
(gdb) bt
#0 try_avail (pm=0xb6f692e8 <__malloc_context+48>) at src/malloc/mallocng/malloc.c:156
#1 alloc_slot (sc=sc@entry=0, req=req@entry=8) at src/malloc/mallocng/malloc.c:288
#2 0xb6efdc94 in __libc_malloc_impl (n=8) at src/malloc/mallocng/malloc.c:369
#3 0xb6e9d532 in xmalloc (size=size@entry=8) at ../src/shared/helpers.h:64
#4 0xb6e9d598 in rc_stringlist_new () at ../src/librc/librc-stringlist.c:24
#5 0xb6e9b91e in get_provided (depinfo=depinfo@entry=0xb6f585d0, runlevel=runlevel@entry=0xbe940f99 "shutdown", options=options@entry=3)
at ../src/librc/librc-depend.c:269
#6 0xb6e9bce8 in visit_service (options=3, runlevel=0xbe940f99 "shutdown", depinfo=0xb6f5ee30, visited=0xb6f57b00, sorted=0xb6f57af0, types=0xb6f575a0,
deptree=0xb6ed8ec0) at ../src/librc/librc-depend.c:419
#7 visit_service (deptree=0xb6ed8ec0, types=0xb6f575a0, sorted=0xb6f57af0, visited=0xb6f57b00, depinfo=0xb6f5ee30, runlevel=0xbe940f99 "shutdown", options=3)
at ../src/librc/librc-depend.c:357
#8 0xb6e9bcb2 in visit_service (options=3, runlevel=0xbe940f99 "shutdown", depinfo=0xb6f62f50, visited=0xb6f57b00, sorted=0xb6f57af0, types=0xb6f575a0,
deptree=0xb6ed8ec0) at ../src/librc/librc-depend.c:405
#9 visit_service (deptree=0xb6ed8ec0, types=0xb6f575a0, sorted=0xb6f57af0, visited=0xb6f57b00, depinfo=0xb6f62f50, runlevel=0xbe940f99 "shutdown", options=3)
at ../src/librc/librc-depend.c:357
#10 0xb6e9bcb2 in visit_service (options=3, runlevel=0xbe940f99 "shutdown", depinfo=0xb6ed85a0, visited=0xb6f57b00, sorted=0xb6f57af0, types=0xb6f575a0,
deptree=0xb6ed8ec0) at ../src/librc/librc-depend.c:405
#11 visit_service (deptree=0xb6ed8ec0, types=0xb6f575a0, sorted=0xb6f57af0, visited=0xb6f57b00, depinfo=0xb6ed85a0, runlevel=0xbe940f99 "shutdown", options=3)
at ../src/librc/librc-depend.c:357
#12 0xb6e9bffa in rc_deptree_depends (deptree=deptree@entry=0xb6ed8ec0, types=types@entry=0xb6f575a0, services=services@entry=0xb6f57ab0,
runlevel=0xbe940f99 "shutdown", options=options@entry=3) at ../src/librc/librc-depend.c:482
#13 0x004226ea in do_stop_services (going_down=true, parallel=true, newlevel=0xbe940f99 "shutdown", deptree=0xb6ed8ec0, stop_services=<optimized out>,
start_services=0xb6f57d70, types_nw=0xb6f575a0) at ../src/openrc/rc.c:592
#14 main (argc=<optimized out>, argv=<optimized out>) at ../src/openrc/rc.c:1022
(this is musl 1.2.4 with a couple of patches, none of them touching
malloc:
https://gitlab.alpinelinux.org/alpine/aports/-/tree/3.17-stable/main/musl
)
For convenience, I've copied the offending loop here:
int cnt = m->mem->active_idx + 2;
int size = size_classes[m->sizeclass]*UNIT;
int span = UNIT + size*cnt;
// activate up to next 4k boundary
while ((span^(span+size-1)) < 4096) {
	cnt++;
	span += size;
}
Here we have UNIT=16 (a constant), and we're considering a malloc(8),
so sizeclass is 0 -> size_classes[0] = 1 -> size = 16. That means we're
looping on this condition:
while ((span ^ (span + 15)) < 4096)
with span being a multiple of 16.
I'm not good at maths, but I'm fairly confident this never becomes
false:
0x0ff0 ^ 0x0fff = 0xf
0x1000 ^ 0x100f = 0xf
The exit condition would become reachable without the -1 (i.e.
span+16); is that what was intended?
(It works for other sizes because of the UNIT offset in span, so
e.g. for size=32 we have 0x0ff0 ^ 0x100f, which is 8191 and exits all
right, but as far as I can tell that would also work without the -1)
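To make this concrete, here's a minimal standalone sketch of just the
arithmetic (my own demo, not musl code; the starting cnt of 31 matches
the active_idx = 29 seen in the dump below, but any value shows the
same behaviour):

#include <stdio.h>

int main(void)
{
	int size = 16;             /* size_classes[0] * UNIT */
	int span = 16 + size * 31; /* UNIT + size*cnt, cnt = active_idx + 2 */
	/* span stays a multiple of 16, so the XOR with span+15 only
	 * ever touches the low 4 bits and is always 0xf; the cap is
	 * only here so this demo itself terminates */
	for (int i = 0; i < 1000; i++) {
		if ((span ^ (span + size - 1)) >= 4096) {
			printf("loop would exit at span=%#x\n", span);
			return 0;
		}
		span += size;
	}
	printf("no exit after 1000 iterations: xor stays %#x\n",
	       span ^ (span + size - 1));
	return 0;
}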
Anyway, I'm not ruling out a problem in openrc yet -- rc_parallel is
known to be buggy -- but looking at the mallocng ctx I can't see any
obvious problem, so I'm asking here first:
Is this supposed to work? Now that I've narrowed it down this far it
might be possible to build a reproducer; unfortunately I do not know
what ctx looked like when it entered try_avail (the function took the
if (!mask) branch, so the first entry was dequeued), but here's its
current state (please ask if you need anything else):
------
(gdb) p __malloc_context
$94 = {
secret = 15756413639004407235,
init_done = 1,
mmap_counter = 135,
free_meta_head = 0x0,
avail_meta = 0x18a3f70,
avail_meta_count = 6,
avail_meta_area_count = 0,
meta_alloc_shift = 0,
meta_area_head = 0x18a3000,
meta_area_tail = 0x18a3000,
avail_meta_areas = 0x18a4000 <error: Cannot access memory at address 0x18a4000>,
active = {0x18a3e98, 0x18a3eb0, 0x18a3208, 0x18a3280, 0x0, 0x0, 0x0, 0x18a31c0, 0x0, 0x0, 0x0, 0x18a3148, 0x0, 0x0, 0x0, 0x18a3dd8, 0x0, 0x0, 0x0, 0x18a3d90, 0x0,
0x18a31f0, 0x0, 0x18a3b68, 0x0, 0x18a3f28, 0x0, 0x0, 0x0, 0x18a3238, 0x0 <repeats 18 times>},
usage_by_class = {2580, 600, 10, 7, 0 <repeats 11 times>, 96, 0, 0, 0, 20, 0, 3, 0, 8, 0, 3, 0, 0, 0, 3, 0 <repeats 18 times>},
unmap_seq = '\000' <repeats 31 times>,
bounces = '\000' <repeats 18 times>, "w", '\000' <repeats 12 times>,
seq = 1 '\001',
brk = 25837568
}
(gdb) p *__malloc_context->active[0]
$95 = {
prev = 0x18a3f40,
next = 0x18a3e80,
mem = 0xb6f57b30,
avail_mask = 1073741822,
freed_mask = 0,
last_idx = 29,
freeable = 1,
sizeclass = 0,
maplen = 0
}
(gdb) p *__malloc_context->active[0]->mem
$97 = {
meta = 0x18a3e98,
active_idx = 29 '\035',
pad = "\000\000\000\000\000\000\000\000\377\000",
storage = 0xb6f57b40 ""
}
------
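(For what it's worth, my reading of that dump -- assuming I have
mallocng's mask semantics right: avail_mask = 1073741822 = 0x3ffffffe,
i.e. bits 1..29 set, so every slot except index 0 is still available,
consistent with last_idx = 29; and freed_mask = 0.)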
I'm leaving that process stuck in limbo for a few days so I can gather
more info.
Meanwhile we'll also be trying to see if the following patch helps --
given we do not have a good reproducer, there won't be any feedback
from our side for a while.
----
From f46bbe3117674becb3c5e4ffb7c8f4e4776084db Mon Sep 17 00:00:00 2001
From: Dominique Martinet <dominique.martinet@atmark-techno.com>
Date: Mon, 23 Jan 2023 10:12:55 +0900
Subject: [PATCH] mallocng: try_avail: fix infinite loop
For sizeclass=0, span is a multiple of size = 16, so the condition
"(span^(span+size-1)) < 4096" never becomes false.
Removing the -1 makes the exit condition reachable for all size
classes.
---
src/malloc/mallocng/malloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/malloc/mallocng/malloc.c b/src/malloc/mallocng/malloc.c
index d695ab8ec982..9ef0b176c31e 100644
--- a/src/malloc/mallocng/malloc.c
+++ b/src/malloc/mallocng/malloc.c
@@ -151,7 +151,7 @@ static uint32_t try_avail(struct meta **pm)
 	int size = size_classes[m->sizeclass]*UNIT;
 	int span = UNIT + size*cnt;
 	// activate up to next 4k boundary
-	while ((span^(span+size-1)) < 4096) {
+	while ((span^(span+size)) < 4096) {
 		cnt++;
 		span += size;
 	}
----
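As a quick sanity check of the patched condition (again a standalone
sketch of the arithmetic, not musl itself): for size=16 the loop now
exits at the first 4k crossing.

#include <stdio.h>

int main(void)
{
	int size = 16, span = 16 + size * 31;
	/* with the -1 removed, the XOR jumps past 4096 as soon as
	 * span+size crosses a 4k boundary */
	while ((span ^ (span + size)) < 4096)
		span += size;
	/* exits at span = 0x0ff0: 0x0ff0 ^ 0x1000 = 0x1ff0 >= 4096 */
	printf("exits at span=%#x\n", span);
	return 0;
}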
Thank you,
--
Dominique Martinet