Date: Wed, 25 Jan 2023 00:53:23 -0500
From: Rich Felker
To: Dominique MARTINET
Cc: musl@lists.openwall.com
Message-ID: <20230125055323.GK4163@brightrain.aerifal.cx>
References: <20230124083747.GI4163@brightrain.aerifal.cx>
Subject: Re: [musl] infinite loop in mallocng's try_avail

On Wed, Jan 25, 2023 at 09:33:52AM +0900, Dominique MARTINET wrote:
> > If this code is being reached, either the allocator state has been
> > corrupted by some UB in the application, or there's a logic bug in
> > mallocng. The sequence of events that seem to have to happen to get
> > there are:
> >
> > 1. Previously active group has no more available slots (line 120).
>
> Right, that one has already likely been dequeued (or at least
> traversed), so I do not see how to look at it but that sounds possible.
>
> > 2. Freed mask of newly activating group (line 131 or 138) is either
> >    zero (line 145) or the active_idx (read from in-band memory
> >    susceptible to application buffer overflows etc) is wrong and
> >    produces zero when its bits are anded with the freed mask (line
> >    145).
>
> m->freed_mask looks like it is zero from values below; I cannot tell if
> that comes from a corruption outside of musl or not.
>
> > > (gdb) p __malloc_context
> > > $94 = {
> > >   secret = 15756413639004407235,
> > >   init_done = 1,
> > >   mmap_counter = 135,
> > >   free_meta_head = 0x0,
> > >   avail_meta = 0x18a3f70,
> > >   avail_meta_count = 6,
> > >   avail_meta_area_count = 0,
> > >   meta_alloc_shift = 0,
> > >   meta_area_head = 0x18a3000,
> > >   meta_area_tail = 0x18a3000,
> > >   avail_meta_areas = 0x18a4000 ,
> > >   active = {0x18a3e98, 0x18a3eb0, 0x18a3208, 0x18a3280, 0x0, 0x0, 0x0, 0x18a31c0, 0x0, 0x0, 0x0, 0x18a3148, 0x0, 0x0, 0x0, 0x18a3dd8, 0x0, 0x0, 0x0, 0x18a3d90, 0x0,
> > >     0x18a31f0, 0x0, 0x18a3b68, 0x0, 0x18a3f28, 0x0, 0x0, 0x0, 0x18a3238, 0x0 },
> > >   usage_by_class = {2580, 600, 10, 7, 0 , 96, 0, 0, 0, 20, 0, 3, 0, 8, 0, 3, 0, 0, 0, 3, 0 },
> > >   unmap_seq = '\000' ,
> > >   bounces = '\000' , "w", '\000' ,
> > >   seq = 1 '\001',
> > >   brk = 25837568
> > > }
> > > (gdb) p *__malloc_context->active[0]
> > > $95 = {
> > >   prev = 0x18a3f40,
> > >   next = 0x18a3e80,
> > >   mem = 0xb6f57b30,
> > >   avail_mask = 1073741822,
> > >   freed_mask = 0,
> > >   last_idx = 29,
> > >   freeable = 1,
> > >   sizeclass = 0,
> > >   maplen = 0
> > > }
> > > (gdb) p *__malloc_context->active[0]->mem
> > > $97 = {
> > >   meta = 0x18a3e98,
> > >   active_idx = 29 '\035',
> > >   pad = "\000\000\000\000\000\000\000\000\377\000",
> > >   storage = 0xb6f57b40 ""
> > > }
> >
> > This is really weird, because at the point of the infinite loop, the
> > new group should not yet be activated (line 163), so
> > __malloc_context->active[0] should still point to the old active
> > group. But its avail_mask has all bits set and active_idx is not
> > corrupted, so try_avail should just have obtained an available slot
> > from it without ever entering the block at line 120. So I'm confused
> > how it got to the loop.
>
> try_avail's pm is `__malloc_context->active[0]`, which is overwritten by
> either dequeue(pm, m) or *pm = m (lines 123,128), so the original
> m->avail_mask could have been zero, with the next element having a zero
> freed mask?

No, avail_mask is only supposed to be able to be nonzero after
activate_group, which is only called on the head of an active list
(free.c:86 or malloc.c:163) and which atomically pulls bits off
freed_mask to move them to avail_mask. If we're observing avail_mask
nonzero at the point you saw it, some invariant seems to have been
violated.

> > One odd thing I noticed is that the backtrace pm=0xb6f692e8 does not
> > match the __malloc_context->active[0] address. Were these from
> > different runs?
>
> These were from the same run, I've only observed this single occurrence
> first-hand.
>
> pm is &__malloc_context->active[0], so it's not 0x18a3e98 (first value
> of active) but its address (e.g. __malloc_context+48 as per gdb symbol
> resolution in the backtrace)
> I didn't print __malloc_context but I don't see why gdb would have
> gotten that wrong.

Ah, I forgot I was looking at an additional level of indirection here.

It would be nice to know if m is the same active[0] as at entry; that
would help figure out where things went wrong...

Rich
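
For readers following the mask bookkeeping discussed above, here is a
minimal sketch in C of the "pull bits off freed_mask, move them to
avail_mask" step. It is a simplified model, not the mallocng source:
struct toy_meta and toy_activate are hypothetical names, active_idx is
stored directly in the struct rather than read from in-band memory, and
C11 atomics stand in for whatever atomic primitive the real code uses.
The mask (2u << active_idx) - 1 is assumed here to select slots
0..active_idx.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one group's metadata. */
struct toy_meta {
    uint32_t avail_mask;          /* slots ready to be handed out */
    _Atomic uint32_t freed_mask;  /* slots freed but not yet reusable */
    int active_idx;               /* highest slot index in the active range */
};

/*
 * Move the freed bits that fall inside the active range into avail_mask.
 * Assumed precondition, mirroring the invariant described in the thread:
 * avail_mask is zero before activation.
 * Returns the new avail_mask; zero means nothing usable was found.
 */
static uint32_t toy_activate(struct toy_meta *m)
{
    assert(m->avail_mask == 0);

    /* Bits 0..active_idx; unsigned wraparound makes active_idx == 31 work. */
    uint32_t act = (2u << m->active_idx) - 1;

    /* Atomically clear the in-range bits from freed_mask.
     * On CAS failure, mask is reloaded and the new value recomputed. */
    uint32_t mask = atomic_load(&m->freed_mask);
    while (!atomic_compare_exchange_weak(&m->freed_mask, &mask, mask & ~act))
        ;

    m->avail_mask = mask & act;
    return m->avail_mask;
}
```

In this model a zero result can come from two places: freed_mask is
already zero, or active_idx is too small (or wrong) so that none of the
freed bits fall inside the active range. Those are the two cases listed
in point 2 of the quoted sequence.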