Date: Wed, 25 Jan 2023 09:33:52 +0900
From: Dominique MARTINET
To: Rich Felker
Cc: musl@lists.openwall.com
Subject: Re: [musl] infinite loop in mallocng's try_avail
In-Reply-To: <20230124083747.GI4163@brightrain.aerifal.cx>
References: <20230124083747.GI4163@brightrain.aerifal.cx>

Thanks for the reply,

Rich Felker wrote on Tue, Jan 24, 2023 at 03:37:48AM -0500:
> > (this is musl 1.2.4 with a couple of patches, none around malloc:

(I had meant 1.2.3)

> > https://gitlab.alpinelinux.org/alpine/aports/-/tree/3.17-stable/main/musl
> > )
> >
> > For convenience, I've copied the offending loop here:
> > 	int cnt = m->mem->active_idx + 2;
> > 	int size = size_classes[m->sizeclass]*UNIT;
> > 	int span = UNIT + size*cnt;
> > 	// activate up to next 4k boundary
> > 	while ((span^(span+size-1)) < 4096) {
> > 		cnt++;
> > 		span += size;
> > 	}
>
> This code should not be reachable for size class 0 or any size class
> allocated inside a larger-size-class slot.
> That case has active_idx = cnt-1 (set at line 272).

I figured that it might be "normally" unreachable but did not see why;
thanks for confirming that intention.

> If this code is being reached, either the allocator state has been
> corrupted by some UB in the application, or there's a logic bug in
> mallocng. The sequence of events that seem to have to happen to get
> there are:
>
> 1. Previously active group has no more available slots (line 120).

Right, that one has already likely been dequeued (or at least
traversed), so I do not see how to look at it, but that sounds possible.

> 2. Freed mask of newly activating group (line 131 or 138) is either
> zero (line 145) or the active_idx (read from in-band memory
> susceptible to application buffer overflows etc) is wrong and
> produces zero when its bits are anded with the freed mask (line
> 145).
m->freed_mask looks like it is zero from the values below; I cannot tell
if that comes from a corruption outside of musl or not.

> > (gdb) p __malloc_context
> > $94 = {
> >   secret = 15756413639004407235,
> >   init_done = 1,
> >   mmap_counter = 135,
> >   free_meta_head = 0x0,
> >   avail_meta = 0x18a3f70,
> >   avail_meta_count = 6,
> >   avail_meta_area_count = 0,
> >   meta_alloc_shift = 0,
> >   meta_area_head = 0x18a3000,
> >   meta_area_tail = 0x18a3000,
> >   avail_meta_areas = 0x18a4000 ,
> >   active = {0x18a3e98, 0x18a3eb0, 0x18a3208, 0x18a3280, 0x0, 0x0, 0x0,
> >     0x18a31c0, 0x0, 0x0, 0x0, 0x18a3148, 0x0, 0x0, 0x0, 0x18a3dd8, 0x0,
> >     0x0, 0x0, 0x18a3d90, 0x0, 0x18a31f0, 0x0, 0x18a3b68, 0x0, 0x18a3f28,
> >     0x0, 0x0, 0x0, 0x18a3238, 0x0 },
> >   usage_by_class = {2580, 600, 10, 7, 0 , 96, 0, 0, 0, 20, 0, 3, 0, 8,
> >     0, 3, 0, 0, 0, 3, 0 },
> >   unmap_seq = '\000' ,
> >   bounces = '\000' , "w", '\000' ,
> >   seq = 1 '\001',
> >   brk = 25837568
> > }
> > (gdb) p *__malloc_context->active[0]
> > $95 = {
> >   prev = 0x18a3f40,
> >   next = 0x18a3e80,
> >   mem = 0xb6f57b30,
> >   avail_mask = 1073741822,
> >   freed_mask = 0,
> >   last_idx = 29,
> >   freeable = 1,
> >   sizeclass = 0,
> >   maplen = 0
> > }
> > (gdb) p *__malloc_context->active[0]->mem
> > $97 = {
> >   meta = 0x18a3e98,
> >   active_idx = 29 '\035',
> >   pad = "\000\000\000\000\000\000\000\000\377\000",
> >   storage = 0xb6f57b40 ""
> > }
>
> This is really weird, because at the point of the infinite loop, the
> new group should not yet be activated (line 163), so
> __malloc_context->active[0] should still point to the old active
> group. But its avail_mask has all bits set and active_idx is not
> corrupted, so try_avail should just have obtained an available slot
> from it without ever entering the block at line 120. So I'm confused
> how it got to the loop.
try_avail's pm is `__malloc_context->active[0]`, which is overwritten by
either dequeue(pm, m) or *pm = m (lines 123,128), so the original
m->avail_mask could have been zero, with the next element having a zero
freed mask?
I'm really not familiar with the slot management logic here; that might
not normally be possible without corruption, but the structures look
fairly sensible to me...
Not that it proves there wasn't some sort of outside corruption; I wish
this was easier to reproduce, so I could just run it in valgrind or asan
to detect overflows...

> One odd thing I noticed is that the backtrace pm=0xb6f692e8 does not
> match the __malloc_context->active[0] address. Were these from
> different runs?

These were from the same run; I've only observed this single occurrence
first-hand.
pm is &__malloc_context->active[0], so it's not 0x18a3e98 (the first
value of active) but its address (e.g. __malloc_context+48, as per gdb
symbol resolution in the backtrace).
I didn't print __malloc_context's address, but I don't see why gdb would
have gotten that wrong.

Cheers,
-- 
Dominique