From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_MSPIKE_H2 autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 23840 invoked from network); 23 Jan 2023 01:34:12 -0000 Received: from second.openwall.net (193.110.157.125) by inbox.vuxu.org with ESMTPUTF8; 23 Jan 2023 01:34:12 -0000 Received: (qmail 30649 invoked by uid 550); 23 Jan 2023 01:34:06 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 30607 invoked from network); 23 Jan 2023 01:34:05 -0000 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-disposition:mime-version:message-id:subject:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=kX6shpV92TwM+5KkHkqNRphD/QcKCHmd2axBtFNQ0sI=; b=kG+PsTO0mFiwbCkYobs6gk5F4eiowJxSDtxjH7mSP9gBOhPe0aoLRDOH92Bfxx7piq vT+WH/0SLparBM4+ncCfav3cFfhrigldVfHcnqi7VS41YwmoiC7Vzc6vFrp9W0EC8+qo KrlA8Dyj5zgQ2UjmB7W75tZY0H3Z3ImY2uKox2E7iL31mFem3mFebhaJFSOKU15wpTe4 haEbdSDhx30iR+1KjhHVhWuCQ7aiInrhOyyBQ5lhtHv1M8IJUwSQ0Zxz0I8cAFmg4oUt 0U9wUZU6fc/nQDpKjAeuup33lddWeyEqYPv0bqc5Rqv8vHVa4rTjVXcwMl2kpUQ1MIeI pk3w== X-Gm-Message-State: AFqh2kq609Qy873MnPwTIfN8Xex+PblGKGs2huUHlBYR8i3cT2hhMEoW JA6bgwg0qT0pYXvFaFzp1hAD3zGcZxws6nckn0/6mIHmvEjxxfKohT3WtJFEB9BprkiSrw8hmNu ndR6MqyuCIu8wxL5hMv4gig== X-Received: by 2002:aa7:910c:0:b0:58d:a924:9ce with SMTP id 12-20020aa7910c000000b0058da92409cemr22386188pfh.7.1674437631525; Sun, 22 Jan 2023 17:33:51 -0800 (PST) X-Google-Smtp-Source: AMrXdXvvru2L5mVd1i7bzH9SrFtsbkgR+vLBuBWhPtelVW7++/ihnZRcfZGgPvYPrZMF3gpOZk78cQ== X-Received: by 2002:aa7:910c:0:b0:58d:a924:9ce with SMTP id 12-20020aa7910c000000b0058da92409cemr22386165pfh.7.1674437631036; Sun, 22 Jan 2023 17:33:51 -0800 (PST) Date: Mon, 23 Jan 2023 10:33:39 +0900 From: Dominique MARTINET To: musl@lists.openwall.com Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Subject: [musl] infinite loop in mallocng's try_avail Hi, A coworker of mine has been observing hangs during shutdown on our ARM alpine/openrc board, where the openrc shutdown process is stuck in userspace after having stopped almost everything once in a blue moon (running reboot in a loop, this is reproducible roughly one time every three days) It turns out it's stuck in malloc() as follow: (gdb) bt #0 try_avail (pm=0xb6f692e8 <__malloc_context+48>) at src/malloc/mallocng/malloc.c:156 #1 alloc_slot (sc=sc@entry=0, req=req@entry=8) at src/malloc/mallocng/malloc.c:288 #2 0xb6efdc94 in __libc_malloc_impl (n=8) at src/malloc/mallocng/malloc.c:369 #3 0xb6e9d532 in xmalloc (size=size@entry=8) at ../src/shared/helpers.h:64 #4 0xb6e9d598 in rc_stringlist_new () at ../src/librc/librc-stringlist.c:24 #5 0xb6e9b91e in get_provided (depinfo=depinfo@entry=0xb6f585d0, runlevel=runlevel@entry=0xbe940f99 "shutdown", options=options@entry=3) at ../src/librc/librc-depend.c:269 #6 0xb6e9bce8 in visit_service (options=3, runlevel=0xbe940f99 "shutdown", depinfo=0xb6f5ee30, visited=0xb6f57b00, sorted=0xb6f57af0, types=0xb6f575a0, deptree=0xb6ed8ec0) at ../src/librc/librc-depend.c:419 #7 visit_service (deptree=0xb6ed8ec0, types=0xb6f575a0, sorted=0xb6f57af0, visited=0xb6f57b00, depinfo=0xb6f5ee30, runlevel=0xbe940f99 "shutdown", options=3) at ../src/librc/librc-depend.c:357 #8 0xb6e9bcb2 in visit_service (options=3, runlevel=0xbe940f99 "shutdown", depinfo=0xb6f62f50, visited=0xb6f57b00, sorted=0xb6f57af0, types=0xb6f575a0, deptree=0xb6ed8ec0) at ../src/librc/librc-depend.c:405 #9 visit_service (deptree=0xb6ed8ec0, types=0xb6f575a0, sorted=0xb6f57af0, visited=0xb6f57b00, depinfo=0xb6f62f50, runlevel=0xbe940f99 "shutdown", options=3) at ../src/librc/librc-depend.c:357 #10 0xb6e9bcb2 in visit_service (options=3, runlevel=0xbe940f99 "shutdown", depinfo=0xb6ed85a0, visited=0xb6f57b00, sorted=0xb6f57af0, types=0xb6f575a0, deptree=0xb6ed8ec0) at ../src/librc/librc-depend.c:405 #11 visit_service (deptree=0xb6ed8ec0, types=0xb6f575a0, sorted=0xb6f57af0, visited=0xb6f57b00, depinfo=0xb6ed85a0, runlevel=0xbe940f99 "shutdown", options=3) at ../src/librc/librc-depend.c:357 #12 0xb6e9bffa in rc_deptree_depends (deptree=deptree@entry=0xb6ed8ec0, types=types@entry=0xb6f575a0, services=services@entry=0xb6f57ab0, runlevel=0xbe940f99 "shutdown", options=options@entry=3) at ../src/librc/librc-depend.c:482 #13 0x004226ea in do_stop_services (going_down=true, parallel=true, newlevel=0xbe940f99 "shutdown", deptree=0xb6ed8ec0, stop_services=, start_services=0xb6f57d70, types_nw=0xb6f575a0) at ../src/openrc/rc.c:592 #14 main (argc=, argv=) at ../src/openrc/rc.c:1022 (this is musl 1.2.4 with a couple of patchs, none around malloc: https://gitlab.alpinelinux.org/alpine/aports/-/tree/3.17-stable/main/musl ) For convenience, I've copied the incriminated loop here: int cnt = m->mem->active_idx + 2; int size = size_classes[m->sizeclass]*UNIT; int span = UNIT + size*cnt; // activate up to next 4k boundary while ((span^(span+size-1)) < 4096) { cnt++; span += size; } Here, we have UNIT=16 (constant), we're considering a malloc(8) so sizeclass is 0 -> size_classes[0] = 1 -> size = 16, so we're looping on this condition: while (span ^ (span + 15) < 4096) with span being a multiple of 16. I'm not good at maths, but I'm fairly confindent this never gets false: 0x0ff0 ^ 0x0fff = f 0x1000 ^ 0x100f = f It could become true without the -1 (span+16), is that was intended ? (It works for other sizes because of the UNIT offset in size, so e.g. for size=32 we have 0x0ff0 ^ 0x100f which is 8191 alright, but that'd also work without the -1 as far as I can tell) Anyway, I'm not ruling out a problem in openrc yet -- rc_parallel is known to be buggy -- but looking at the mallocng ctx I can't see any obvious problem so I'm asking here first: Is that supposed to work? Now I've gotten it down so far it might be possible to build a reproducer; unfortunately I do not know how ctx was when it entered try_avail (the function entered the if (!mask) so the first entry was dequeued), but here's it's current state (please ask if you need anything else) ------ (gdb) p __malloc_context $94 = { secret = 15756413639004407235, init_done = 1, mmap_counter = 135, free_meta_head = 0x0, avail_meta = 0x18a3f70, avail_meta_count = 6, avail_meta_area_count = 0, meta_alloc_shift = 0, meta_area_head = 0x18a3000, meta_area_tail = 0x18a3000, avail_meta_areas = 0x18a4000 , active = {0x18a3e98, 0x18a3eb0, 0x18a3208, 0x18a3280, 0x0, 0x0, 0x0, 0x18a31c0, 0x0, 0x0, 0x0, 0x18a3148, 0x0, 0x0, 0x0, 0x18a3dd8, 0x0, 0x0, 0x0, 0x18a3d90, 0x0, 0x18a31f0, 0x0, 0x18a3b68, 0x0, 0x18a3f28, 0x0, 0x0, 0x0, 0x18a3238, 0x0 }, usage_by_class = {2580, 600, 10, 7, 0 , 96, 0, 0, 0, 20, 0, 3, 0, 8, 0, 3, 0, 0, 0, 3, 0 }, unmap_seq = '\000' , bounces = '\000' , "w", '\000' , seq = 1 '\001', brk = 25837568 } (gdb) p *__malloc_context->active[0] $95 = { prev = 0x18a3f40, next = 0x18a3e80, mem = 0xb6f57b30, avail_mask = 1073741822, freed_mask = 0, last_idx = 29, freeable = 1, sizeclass = 0, maplen = 0 } (gdb) p *__malloc_context->active[0]->mem $97 = { meta = 0x18a3e98, active_idx = 29 '\035', pad = "\000\000\000\000\000\000\000\000\377\000", storage = 0xb6f57b40 "" } ------ I'm leaving that process stuck in limbo so I can get more infos for a few days. Meanwhile we'll also be trying to see if the following patch helps -- given we do not have a good reproducer there won't be any feedback on this side for a while. ---- >From f46bbe3117674becb3c5e4ffb7c8f4e4776084db Mon Sep 17 00:00:00 2001 From: Dominique Martinet Date: Mon, 23 Jan 2023 10:12:55 +0900 Subject: [PATCH] mallocng: try_avail: fix infinite loop sizeclass=0 has span being a multiple of size = 16, which will never stop having "(span^(span+size-1)) < 4096". Removing the -1 makes that ending condition reachable for all class sizes. --- src/malloc/mallocng/malloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/malloc/mallocng/malloc.c b/src/malloc/mallocng/malloc.c index d695ab8ec982..9ef0b176c31e 100644 --- a/src/malloc/mallocng/malloc.c +++ b/src/malloc/mallocng/malloc.c @@ -151,7 +151,7 @@ static uint32_t try_avail(struct meta **pm) int size = size_classes[m->sizeclass]*UNIT; int span = UNIT + size*cnt; // activate up to next 4k boundary - while ((span^(span+size-1)) < 4096) { + while ((span^(span+size)) < 4096) { cnt++; span += size; } ---- Thank you, -- Dominique Martinet