Date: Wed, 25 Jan 2023 09:33:52 +0900
From: Dominique MARTINET
To: Rich Felker
Cc: musl@lists.openwall.com
Subject: Re: [musl] infinite loop in mallocng's try_avail
In-Reply-To: <20230124083747.GI4163@brightrain.aerifal.cx>
References: <20230124083747.GI4163@brightrain.aerifal.cx>

Thanks for the reply,

Rich Felker wrote on Tue, Jan 24, 2023 at 03:37:48AM -0500:
> > (this is musl 1.2.4 with a couple of patches, none around malloc:

(I had meant 1.2.3)

> > https://gitlab.alpinelinux.org/alpine/aports/-/tree/3.17-stable/main/musl
> > )
> >
> > For convenience, I've copied the offending loop here:
> > 	int cnt = m->mem->active_idx + 2;
> > 	int size = size_classes[m->sizeclass]*UNIT;
> > 	int span = UNIT + size*cnt;
> > 	// activate up to next 4k boundary
> > 	while ((span^(span+size-1)) < 4096) {
> > 		cnt++;
> > 		span += size;
> > 	}
>
> This code should not be reachable for size class 0 or any size class
> allocated inside a larger-size-class slot.
> That case has active_idx = cnt-1 (set at line 272).

I figured that it might be "normally" unreachable but did not see why;
thanks for confirming that intention.

> If this code is being reached, either the allocator state has been
> corrupted by some UB in the application, or there's a logic bug in
> mallocng. The sequence of events that seem to have to happen to get
> there are:
>
> 1. Previously active group has no more available slots (line 120).

Right, that one has already likely been dequeued (or at least
traversed), so I do not see how to look at it, but that sounds possible.

> 2. Freed mask of newly activating group (line 131 or 138) is either
> zero (line 145) or the active_idx (read from in-band memory
> susceptible to application buffer overflows etc) is wrong and
> produces zero when its bits are anded with the freed mask (line
> 145).
m->freed_mask looks like it is zero from the values below; I cannot tell
if that comes from a corruption outside of musl or not.

> > (gdb) p __malloc_context
> > $94 = {
> >   secret = 15756413639004407235,
> >   init_done = 1,
> >   mmap_counter = 135,
> >   free_meta_head = 0x0,
> >   avail_meta = 0x18a3f70,
> >   avail_meta_count = 6,
> >   avail_meta_area_count = 0,
> >   meta_alloc_shift = 0,
> >   meta_area_head = 0x18a3000,
> >   meta_area_tail = 0x18a3000,
> >   avail_meta_areas = 0x18a4000 ,
> >   active = {0x18a3e98, 0x18a3eb0, 0x18a3208, 0x18a3280, 0x0, 0x0, 0x0,
> >     0x18a31c0, 0x0, 0x0, 0x0, 0x18a3148, 0x0, 0x0, 0x0, 0x18a3dd8, 0x0,
> >     0x0, 0x0, 0x18a3d90, 0x0, 0x18a31f0, 0x0, 0x18a3b68, 0x0, 0x18a3f28,
> >     0x0, 0x0, 0x0, 0x18a3238, 0x0 },
> >   usage_by_class = {2580, 600, 10, 7, 0 , 96, 0, 0, 0, 20, 0, 3, 0, 8,
> >     0, 3, 0, 0, 0, 3, 0 },
> >   unmap_seq = '\000' ,
> >   bounces = '\000' , "w", '\000' ,
> >   seq = 1 '\001',
> >   brk = 25837568
> > }
> > (gdb) p *__malloc_context->active[0]
> > $95 = {
> >   prev = 0x18a3f40,
> >   next = 0x18a3e80,
> >   mem = 0xb6f57b30,
> >   avail_mask = 1073741822,
> >   freed_mask = 0,
> >   last_idx = 29,
> >   freeable = 1,
> >   sizeclass = 0,
> >   maplen = 0
> > }
> > (gdb) p *__malloc_context->active[0]->mem
> > $97 = {
> >   meta = 0x18a3e98,
> >   active_idx = 29 '\035',
> >   pad = "\000\000\000\000\000\000\000\000\377\000",
> >   storage = 0xb6f57b40 ""
> > }
>
> This is really weird, because at the point of the infinite loop, the
> new group should not yet be activated (line 163), so
> __malloc_context->active[0] should still point to the old active
> group. But its avail_mask has all bits set and active_idx is not
> corrupted, so try_avail should just have obtained an available slot
> from it without ever entering the block at line 120. So I'm confused
> how it got to the loop.
try_avail's pm is `__malloc_context->active[0]`, which is overwritten by
either dequeue(pm, m) or *pm = m (lines 123,128), so the original
m->avail_mask could have been zero, with the next element having a zero
freed mask?
I'm really not familiar with the slot management logic here; that might
not normally be possible without corruption, but the structures look
fairly sensible to me...
Not that it proves there wasn't some sort of outside corruption; I wish
this was easier to reproduce, so I could just run it in valgrind or asan
to detect overflows...

> One odd thing I noticed is that the backtrace pm=0xb6f692e8 does not
> match the __malloc_context->active[0] address. Were these from
> different runs?

These were from the same run; I've only observed this single occurrence
first-hand.
pm is &__malloc_context->active[0], so it's not 0x18a3e98 (the first
value of active) but its address (e.g. __malloc_context+48, as per gdb
symbol resolution in the backtrace).
I didn't print __malloc_context's address, but I don't see why gdb would
have gotten that wrong.

Cheers,
-- 
Dominique