From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FROM,HTML_FONT_FACE_BAD,HTML_MESSAGE, MAILING_LIST_MULTI,RCVD_IN_MSPIKE_H2,T_KAM_HTML_FONT_INVALID autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 25298 invoked from network); 20 Sep 2022 02:35:15 -0000 Received: from second.openwall.net (193.110.157.125) by inbox.vuxu.org with ESMTPUTF8; 20 Sep 2022 02:35:15 -0000 Received: (qmail 13467 invoked by uid 550); 20 Sep 2022 02:35:12 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 13447 invoked from network); 20 Sep 2022 02:35:11 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=message-id:mime-version:references:subject:cc:to:from:date:from:to :cc:subject:date; bh=z+bK7E4CAUu1nbPyDA8LMxdjPtLDcTJi2DjB2p8f1bo=; b=MtpR4oZBkt8s16dGYcuPVc9nRK7rcGmdMEbcaaqTpwq6MhdjmT6KSxkH9JT6Qlg27x D9DjXA2oFpY4rBOsVhZA7NQTyNkIzl80yHCqTxewadj4NMGB5YyjbWXj74jTLG7yiVn7 wzD2ONhzeSRyqzrh6QK3bt5gHXFxapcHn92sYoLNI8GuNIx/IBiGOhoGF97dJj2wufAk fjCZhbj9jLKKSxdpnVP3dwQWyrsgwfTgY2s3ouhYhCoffzm8bdZD9G+ytpcuazKM6vwJ yVzgKolC5bo/OGlxY1UMD+tQAh/EMiOljD5JLA7EMv8XSvEV+Ofi0yTL6WNLOvZ+L9Nm 9hMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=message-id:mime-version:references:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date; bh=z+bK7E4CAUu1nbPyDA8LMxdjPtLDcTJi2DjB2p8f1bo=; b=4aCvpbdYmzG7FACHQ88FIZC/yODDxsxOvXQIalBLIyEbKWDZMcsYhlijOpRrkKGgjJ 7ORhpkuN3FFaFthGTxZs13BkBzYrd12Sv5qH0pwr9eBRIm1G0GS7rEa9aQ00H7C5Zyva Y5KTwiNyFrxAaHx4G2kmIolO2dbHPp/qz4x7VRFcmssn5J9HsBwTDIPAl3rj58aRIHla iGN9FFTbsbqTezZKoqKfdlnjQrXITchwES8zi0P21RQ3uUe8WgSSg8bsrKMS/nEVq/9a 
LtRHWfTvqc2NfQDkvGy4ZQGAfQUbCc47j0KwZLWysLmq9WlqQYNpGeYVOv9wMf+Z0Rhr gVmQ== X-Gm-Message-State: ACrzQf1FplbO8/ZHP9QeyetjygLFYddt/uUYmRKkDwUeJFJrcqDU+ATV SE7l3Y3nQUbFBg0d/tgO3fMuY5yq/TIdzg== X-Google-Smtp-Source: AMsMyM58IWQL10FRH6uReCDGcSKBOT+Pi9o99fEqzFT5HBL6spzbDE86EgKg8K9EU19ToPLVf9g0fw== X-Received: by 2002:a17:90b:390e:b0:202:5d4e:c1f2 with SMTP id ob14-20020a17090b390e00b002025d4ec1f2mr1379980pjb.45.1663641299136; Mon, 19 Sep 2022 19:34:59 -0700 (PDT) Date: Tue, 20 Sep 2022 10:35:02 +0800 From: baiyang To: "Rich Felker" Cc: musl References: <874jx3h76u.fsf@oldenburg.str.redhat.com>, <20220919134659.GO9709@brightrain.aerifal.cx>, , <2022092001404698842815@gmail.com>, , <2022092008254998320584@gmail.com>, <20220920003811.GF9709@brightrain.aerifal.cx>, <2022092008470636285288@gmail.com>, <20220920010056.GG9709@brightrain.aerifal.cx>, <2022092009180277847194@gmail.com>, <20220920021511.GH9709@brightrain.aerifal.cx> X-Priority: 3 X-GUID: 69683B91-E9C0-4739-B757-3BEC8A1C52CB X-Has-Attach: no X-Mailer: Foxmail 7.2.23.116[cn] Mime-Version: 1.0 Message-ID: <20220920103500598557106@gmail.com> Content-Type: multipart/alternative; boundary="----=_001_NextPart682342858752_=----" Subject: Re: Re: [musl] The heap memory performance (malloc/free/realloc) is significantly degraded in musl 1.2 (compared to 1.1) This is a multi-part message in MIME format. 
------=_001_NextPart682342858752_=---- Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 PiBZb3Ugc2VlbSB0byB0aGluayB0aGF0IGlmIHRoZSBncm91cCBzdHJpZGUgd2FzIDgxMDAsIGNh bGxpbmcgcmVhbGxvYyBtaWdodCBtZW1jcHkgdXAgdG8gODEwMCBieXRlcy4gVGhpcyBpcyBub3Qg dGhlIGNhc2UuDQoNClllcywgSSBhbHJlYWR5IHVuZGVyc3Rvb2QgdGhhdCBtYWxsb2NuZyB3b3Vs ZCBvbmx5IG1lbWNweSA2NjAwIGJ5dGVzIHdoZW4gSSB3YXMgdG9sZCB0aGF0IG1hbGxvY191c2Fi bGVfc2l6ZSB3aWxsIHJldHVybiB0aGUgc2l6ZSByZXF1ZXN0ZWQgYnkgdGhlIHVzZXIuDQoNCkJ1 dCBBRkFJSywgbWFueSBvdGhlciBtYWxsb2MgaW1wbGVtZW50YXRpb25zIGJhc2ljYWxseSBkb24n dCBrZWVwIDY2MDAgYnl0ZXMgb2YgZGF0YS4gU28gdGhleSdyZSBhY3R1YWxseSBnb2luZyB0byBt ZW1jcHkgdGhlIDgxMDAgYnl0ZXMuDQoNCj4gWW91IGFsc28gc2VlbSB0byBiZSB1bmRlciB0aGUg aW1wcmVzc2lvbiB0aGF0IHRoZSB3b3JrIHRvIGRldGVybWluZQ0KPiB0aGF0IHRoZSBzaXplIHdh cyA2NjAwIGFuZCBub3QgODEwMCBpcyB3aGVyZSBtb3N0IChvciBhdCBsZWFzdCBhDQo+IHNpZ25p ZmljYW50IHBvcnRpb24gb2YpIHRoZSB0aW1lIGlzIHNwZW50LiAgVGhpcyBpcyBhbHNvIG5vdCB0 aGUgY2FzZS4NCj4gVGhlIG1ham9yaXR5IG9mIHRoZSBtZXRhZGF0YSBwcm9jZXNzaW5nIHRpbWUg aXMgY2hhc2luZyBwb2ludGVycyBiYWNrDQo+IHRvIHRoZSBvdXQtb2YtYmFuZCBtZXRhZGF0YSwg dmFsaWRhdGluZyBpdCwgdmFsaWRhdGluZyB0aGF0IGl0DQo+IHJvdW5kLXRyaXBzIGJhY2ssIGFu ZCB2YWxpZGF0aW5nIHZhcmlvdXMgb3RoZXIgdGhpbmdzLiBTb21lIG9mIHRoZXNlDQo+IGNvdWxk IGluIHByaW5jaXBsZSBiZSBvbWl0dGVkIGF0IHRoZSBjb3N0IG9mIGxvc3Mtb2YtaGFyZGVuaW5n Lg0KDQpZZXMsIGFjY29yZGluZyB0byBteSBwcmV2aW91cyB1bmRlcnN0YW5kaW5nICh3aGljaCBz ZWVtcyB3cm9uZyBub3cpLCBzaW5jZSBvdGhlciBtYWxsb2NfdXNhYmxlX3NpemUgaW1wbGVtZW50 YXRpb25zIHRoYXQgZGlyZWN0bHkgcmV0dXJuIDgxMDAgKHRoZSBhY3R1YWwgYWxsb2NhdGVkIHNp emUgY2xhc3MgbGVuZ3RoKSBzdWNoIGFzIHRjbWFsbG9jIGFyZSBhbGwgdmVyeSBmYXN0LCBzbyBJ IGNhbiBvbmx5IHVuZGVyc3RhbmQgdGhhdCBtYWxsb2NuZyBpcyBzbyBtdWNoIHNsb3dlciB0aGFu IHRoZW0gYmVjYXVzZSBpdCBoYXMgdG8gcmV0dXJuIDY2MDAsIG5vdCA4MTAwLiBBcGFydCBmcm9t IHRoaXMgZGlmZmVyZW5jZSwgdGhlcmUgaXMgbm8gcmVhc29uIGl0IGlzIHNsb3dlciB0aGFuIG90 aGVyIGltcGxlbWVudGF0aW9ucyBvZiBtYWxsb2NfdXNhYmxlX3NpemUgYXMgSSB1bmRlcnN0YW5k 
IGl0Lg0KDQpJZiB0aGlzIGlzIG5vdCB0aGUgbWFpbiByZWFzb24sIGNhbiB3ZSBzcGVlZCB1cCB0 aGlzIGFsZ29yaXRobSB3aXRoIHRoZSBoZWxwIG9mIGEgZmFzdCBsb29rdXAgdGFibGUgbWVjaGFu aXNtIGxpa2UgdGNtYWxsb2M/IEFzIEkgc2FpZCBiZWZvcmUsIHRoaXMgbm90IG9ubHkgZ3JlYXRs eSBpbmNyZWFzZXMgdGhlIHBlcmZvcm1hbmNlIG9mIG1hbGxvY191c2FibGVfc2l6ZSAsIGJ1dCBh bHNvIHRoZSBwZXJmb3JtYW5jZSBvZiByZWFsbG9jIGFuZCBmcmVlIC4NCg0KVGhhbmtzIDotKQ0K IA0KLS0NCg0KICAgQmVzdCBSZWdhcmRzDQogIEJhaVlhbmcNCiAgYmFpeWFuZ0BnbWFpbC5jb20N CiAgaHR0cDovL2kuYmFpeS5jbg0KKioqKiA8IEVORCBPRiBFTUFJTCA+ICoqKiogDQogDQogDQpG cm9tOiBSaWNoIEZlbGtlcg0KRGF0ZTogMjAyMi0wOS0yMCAxMDoxNQ0KVG86IGJhaXlhbmcNCkND OiBtdXNsDQpTdWJqZWN0OiBSZTogUmU6IFttdXNsXSBUaGUgaGVhcCBtZW1vcnkgcGVyZm9ybWFu Y2UgKG1hbGxvYy9mcmVlL3JlYWxsb2MpIGlzIHNpZ25pZmljYW50bHkgZGVncmFkZWQgaW4gbXVz bCAxLjIgKGNvbXBhcmVkIHRvIDEuMSkNCk9uIFR1ZSwgU2VwIDIwLCAyMDIyIGF0IDA5OjE4OjA0 QU0gKzA4MDAsIGJhaXlhbmcgd3JvdGU6DQo+ID4gVGhlcmUgaXMgbm8gaGlkZGVuICJzaXplIGFj dHVhbGx5IGFsbG9jYXRlZCBpbnRlcm5hbGx5Ii4gVGhlIHNpemUgeW91DQo+ID4gZ2V0IGlzIHRo ZSBzaXplIHlvdSByZXF1ZXN0ZWQuIEV2ZXJ5dGhpbmcgZWxzZSBpcyBhbGxvY2F0b3IgZGF0YQ0K PiA+IHN0cnVjdHVyZXMgKm91dHNpZGUgb2YgdGhlIG9iamVjdCogdGhhdCB0aGUgY2FsbGVyIGhh cyBubyBlbnRpdGxlbWVudA0KPiA+IHRvIHBlZWsgb3IgcG9rZSBhdCwgYW5kIG1hbGxvY191c2Fi bGVfc2l6ZSdzIHJldHVybiB2YWx1ZSByZWZsZWN0cw0KPiA+IHRoYXQuDQo+IA0KPiBJZiBJIHVu ZGVyc3RhbmQgY29ycmVjdGx5LCBhY2NvcmRpbmcgdG8gdGhlIGRlZmluaXRpb24gb2Ygc2l6ZV9j bGFzc2VzIGluIHRoZSBtYWxsb2NuZyBjb2RlOiANCj4gMS4gV2hlbiBJIGNhbGwgYHZvaWQqIHAg PSBtYWxsb2MoNjYwMClgLCBtYWxsb2NuZyBhY3R1YWxseSBhbGxvY2F0ZXMNCj4gbW9yZSB0aGFu IDgxMDAgYnl0ZXMgb2YgdXNhYmxlIHNwYWNlLCByaWdodD8NCiANCk5vLCBpdCB1c2VzIHNwYWNl IGZyb20gYSBzaXplLWNsYXNzLTgxNzYgZ3JvdXAgKH49c2xhYikgdG8gcHJvZHVjZSBhbg0KYWxs b2NhdGlvbiBvZiBzaXplIDY2MDAuIFRoZSAqYWxsb2NhdGlvbiogaXMgdGhlIHBhcnQgdGhhdCBi ZWxvbmdzIHRvDQp0aGUgY2FsbGVyLiBFdmVyeXRoaW5nIGVsc2UgaXMgcGFydCBvZiB0aGUgYWxs b2NhdG9yIGRhdGEgc3RydWN0dXJlcy4NCiANCj4gMi4gQWNjb3JkaW5nIHRvIHlvdXIgcHJldmlv 
dXMgZXhwbGFuYXRpb24sIGNhbGxpbmcNCj4gbWFsbG9jX3VzYWJsZV9zaXplKHApIGF0IHRoaXMg dGltZSByZXR1cm5zIDY2MDAsIHJpZ2h0Pw0KIA0KWWVzLg0KIA0KPiBNeSBxdWVzdGlvbiBpcywg aWYgbWFsbG9jX3VzYWJsZV9zaXplKHApIGNhbiBkaXJlY3RseSByZXR1cm4gODE5MQ0KPiAob3Ig c2ltaWxhciBhY3R1YWwgYWxsb2NhdGVkIHNpemUsIGFzIG90aGVyIGxpYmMgZG8pIGluc3RlYWQg b2YNCj4gNjYwMCwgaXMgaXQgcG9zc2libGUgdG8gbWFrZSBtYWxsb2NuZyBhY2hpZXZlIGhpZ2hl ciBwZXJmb3JtYW5jZQ0KPiBib3RoIGluIHRpbWUgYW5kIHNwYWNlPw0KIA0KTm8sIGFuZCB0aGUg cmVhc29uIHlvdSBzYWlkIHlvdSB3YW50IGl0IHRvIGRvZXMgbm90IG1ha2Ugc2Vuc2UuIFlvdQ0K c2VlbSB0byB0aGluayB0aGF0IGlmIHRoZSBncm91cCBzdHJpZGUgd2FzIDgxMDAsIGNhbGxpbmcg cmVhbGxvYyBtaWdodA0KbWVtY3B5IHVwIHRvIDgxMDAgYnl0ZXMuIFRoaXMgaXMgbm90IHRoZSBj YXNlLiBJZiByZWFsbG9jIGhhcyB0bw0KYWxsb2NhdGUgYSBuZXcgb2JqZWN0LCB0aGUgYW1vdW50 IGNvcGllZCB3aWxsIGJlIDY2MDAgb3IgZXhhY3RseQ0Kd2hhdGV2ZXIgdGhlIGFsbG9jYXRlZCBv YmplY3Qgc2l6ZSB3YXMgKG9yIHRoZSBuZXcgc2l6ZSwgaWYgc21hbGxlcikuDQpUaGlzIGlzIHRo ZSBvbmx5IG1lYW5pbmdmdWwgbnVtYmVyLg0KIA0KWW91IGFsc28gc2VlbSB0byBiZSB1bmRlciB0 aGUgaW1wcmVzc2lvbiB0aGF0IHRoZSB3b3JrIHRvIGRldGVybWluZQ0KdGhhdCB0aGUgc2l6ZSB3 YXMgNjYwMCBhbmQgbm90IDgxMDAgaXMgd2hlcmUgbW9zdCAob3IgYXQgbGVhc3QgYQ0Kc2lnbmlm aWNhbnQgcG9ydGlvbiBvZikgdGhlIHRpbWUgaXMgc3BlbnQuIFRoaXMgaXMgYWxzbyBub3QgdGhl IGNhc2UuDQpUaGUgbWFqb3JpdHkgb2YgdGhlIG1ldGFkYXRhIHByb2Nlc3NpbmcgdGltZSBpcyBj aGFzaW5nIHBvaW50ZXJzIGJhY2sNCnRvIHRoZSBvdXQtb2YtYmFuZCBtZXRhZGF0YSwgdmFsaWRh dGluZyBpdCwgdmFsaWRhdGluZyB0aGF0IGl0DQpyb3VuZC10cmlwcyBiYWNrLCBhbmQgdmFsaWRh dGluZyB2YXJpb3VzIG90aGVyIHRoaW5ncy4gU29tZSBvZiB0aGVzZQ0KY291bGQgaW4gcHJpbmNp cGxlIGJlIG9taXR0ZWQgYXQgdGhlIGNvc3Qgb2YgbG9zcy1vZi1oYXJkZW5pbmcuDQogDQpGaWd1 cmluZyBvdXQgdGhhdCB0aGUgYWxsb2NhdGlvbiBpcyA2NjAwIGJ5dGVzLCBvbmNlIHlvdSBhbHJl YWR5IGtub3cNCnRoZSBzaXplIGNsYXNzIGFuZCBvdXQtb2YtYmFuZCBtZXRhZGF0YSwgaXMgcXVp dGUgdHJpdmlhbCBhbmQgaGFyZGx5DQp0YWtlcyBhbnkgb2YgdGhlIHRpbWUuIChJdCBhbHNvIGhh cyBhIGZldyB2YWxpZGF0aW9uIGNoZWNrcyB0aGF0IGNvdWxkDQpiZSBvbWl0dGVkIGF0IHRoZSBj 
b3N0IG9mIGxvc3Mgb2YgaGFyZGVuaW5nLCBidXQgdGhlc2UgYXJlDQpwcm9wb3J0aW9uYWxseSBt dWNoIHNtYWxsZXIuKQ0KIA0KUmljaA0K ------=_001_NextPart682342858752_=---- Content-Type: text/html; charset="utf-8" Content-Transfer-Encoding: quoted-printable =0A
> You seem to think that if the group stride was 8100, calling realloc might memcpy up to 8100 bytes. This is not the case.

Yes, I understood that mallocng would memcpy only 6600 bytes once I was told that malloc_usable_size returns the size requested by the user.

But AFAIK, many other malloc implementations do not record the requested size (6600 here), so they actually memcpy the full 8100 bytes.

> You also seem to be under the impression that the work to determine
> that the size was 6600 and not 8100 is where most (or at least a
> significant portion of) the time is spent.  This is also not the case.
> The majority of the metadata processing time is chasing pointers back
> to the out-of-band metadata, validating it, validating that it
> round-trips back, and validating various other things. Some of these
> could in principle be omitted at the cost of loss-of-hardening.

Yes. My previous understanding (which now seems to be wrong) was this: other malloc_usable_size implementations, such as tcmalloc's, directly return 8100 (the actual length of the allocated size class) and are all very fast, so I assumed that mallocng is so much slower because it has to return 6600 rather than 8100. As I understood it, apart from this difference there was no reason for it to be slower than other implementations of malloc_usable_size.

If this is not the main reason, can we speed up this algorithm with a fast lookup table mechanism like tcmalloc's? As I said before, this would greatly improve the performance not only of malloc_usable_size, but also of realloc and free.
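
To make the lookup-table idea concrete, here is a minimal sketch of a table-driven size query. It is not tcmalloc's or mallocng's actual code: the arena, the page size, the `class_size` values and the names `tag_pages` and `table_usable_size` are all assumptions made up for illustration. It shows why such a lookup is cheap (one shift and two array reads) and also that it can only report the slot size of the class, not the size the caller requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy layout: a fixed arena split into 4 KiB pages. */
#define PAGE_SHIFT 12
#define ARENA_PAGES 64
#define NUM_CLASSES 4

/* Hypothetical size classes (bytes per slot); class 0 means "page not in use". */
static const size_t class_size[NUM_CLASSES] = { 0, 64, 1024, 8176 };

static unsigned char arena[ARENA_PAGES << PAGE_SHIFT];
/* One byte per page: the size class that page was carved up for. */
static uint8_t page_class[ARENA_PAGES];

/* Record that pages [first, first + count) serve the given size class. */
void tag_pages(size_t first, size_t count, uint8_t cls)
{
    assert(cls < NUM_CLASSES);
    for (size_t i = 0; i < count && first + i < ARENA_PAGES; i++)
        page_class[first + i] = cls;
}

/* Table-driven lookup with no validation of any metadata.
 * Returns the slot size of the class, or 0 for a pointer outside the arena. */
size_t table_usable_size(const void *p)
{
    uintptr_t off = (uintptr_t)p - (uintptr_t)arena;
    if (off >= sizeof arena)
        return 0;
    return class_size[page_class[off >> PAGE_SHIFT]];
}
```

In this toy, a pointer into a page tagged with class 3 always reports 8176, whatever size was originally requested, and nothing checks that the table itself has not been corrupted.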

Thanks :-)

--

   Best Regards
  BaiYang
  baiyang@gmail.com
  http://i.baiy.cn
**** < END OF EMAIL > ****


From: Rich Felker
Date: 2022-09-20 10:15
To: baiyang
CC: musl
Subject: Re: Re: [musl] The heap memory performance (malloc/free/realloc) is significantly degraded in musl 1.2 (compared to 1.1)
On Tue, Sep 20, 2022 at 09:18:04AM +0800, baiyang wrote:
> > There is no hidden "size actually allocated internally". The size you
> > get is the size you requested. Everything else is allocator data
> > structures *outside of the object* that the caller has no entitlement
> > to peek or poke at, and malloc_usable_size's return value reflects
> > that.
>
> If I understand correctly, according to the definition of size_classes in the mallocng code:
> 1. When I call `void* p = malloc(6600)`, mallocng actually allocates
> more than 8100 bytes of usable space, right?

No, it uses space from a size-class-8176 group (~=slab) to produce an
allocation of size 6600. The *allocation* is the part that belongs to
the caller. Everything else is part of the allocator data structures.

> 2. According to your previous explanation, calling
> malloc_usable_size(p) at this time returns 6600, right?

Yes.
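
The exchange above can be checked on any given libc with a short probe. This is only a sketch: `usable_size_for` is a helper name invented here, and `malloc_usable_size` is a non-standard extension declared in `<malloc.h>` on glibc and musl. The only property the code relies on is that the reported size is at least the requested size; the exact value depends on the allocator.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size: a libc extension, not ISO C */
#include <stdlib.h>

/* Allocate `request` bytes, ask the allocator how many bytes the caller
 * may use, then free the block. Returns 0 if the allocation fails. */
size_t usable_size_for(size_t request)
{
    void *p = malloc(request);
    if (!p)
        return 0;
    size_t usable = malloc_usable_size(p);
    /* The documented contract: never less than what was requested. */
    assert(usable >= request);
    free(p);
    return usable;
}
```

Going by the answer in this thread, `usable_size_for(6600)` would give 6600 with mallocng, while an allocator that reports the slot size of its size class would give a larger number.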

> My question is, if malloc_usable_size(p) can directly return 8191
> (or similar actual allocated size, as other libc do) instead of
> 6600, is it possible to make mallocng achieve higher performance
> both in time and space?

No, and the reason you said you want it to does not make sense. You
seem to think that if the group stride was 8100, calling realloc might
memcpy up to 8100 bytes. This is not the case. If realloc has to
allocate a new object, the amount copied will be 6600 or exactly
whatever the allocated object size was (or the new size, if smaller).
This is the only meaningful number.
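
The copy rule described above can be sketched as follows. This is an illustration, not mallocng's realloc: `realloc_copy_len` and `sketch_realloc` are hypothetical names, and the old object size is passed in as a parameter where a real allocator would read it from its own metadata.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bytes that must survive a moving realloc: the smaller of the old object
 * size and the new request. The slot or stride size never enters into it. */
size_t realloc_copy_len(size_t old_size, size_t new_size)
{
    return old_size < new_size ? old_size : new_size;
}

/* Hypothetical moving realloc for an allocator that knows each object's
 * exact size. Always moves the object, to keep the sketch short. */
void *sketch_realloc(void *old, size_t old_size, size_t new_size)
{
    assert(old != NULL);
    void *fresh = malloc(new_size);
    if (!fresh)
        return NULL;   /* the old object stays valid on failure */
    memcpy(fresh, old, realloc_copy_len(old_size, new_size));
    free(old);
    return fresh;
}
```

For the case in this thread, growing a 6600-byte object copies 6600 bytes even if the slot it lived in was 8176 bytes wide.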
 
You also seem to be under the impression that the work to determine
that the size was 6600 and not 8100 is where most (or at least a
significant portion of) the time is spent. This is also not the case.
The majority of the metadata processing time is chasing pointers back
to the out-of-band metadata, validating it, validating that it
round-trips back, and validating various other things. Some of these
could in principle be omitted at the cost of loss-of-hardening.
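
As a rough picture of what "validating that it round-trips back" can mean, here is a minimal sketch. It is not the mallocng source: the struct layouts, the `check` secret and the name `checked_meta` are assumptions for illustration, and the sketch returns NULL where a hardened allocator would more likely abort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_meta;

/* In-band group header: sits next to the user slots and points out of band. */
struct sketch_group {
    struct sketch_meta *meta;
};

/* Out-of-band record, kept away from user data. */
struct sketch_meta {
    struct sketch_group *mem;   /* back-pointer to the group it describes */
    uint64_t check;             /* must equal the allocator's secret */
    size_t slot_size;
};

/* Follow group -> meta and refuse anything that does not round-trip.
 * Each dependent load here is a potential cache miss, which is where
 * the time goes; the comparisons themselves are cheap. */
const struct sketch_meta *checked_meta(const struct sketch_group *g,
                                       uint64_t secret)
{
    assert(g != NULL);
    const struct sketch_meta *m = g->meta;
    if (!m)
        return NULL;
    if (m->mem != g)            /* meta does not point back at this group */
        return NULL;
    if (m->check != secret)     /* meta was not created by this allocator */
        return NULL;
    return m;
}
```

Dropping the two comparisons would save little, since the loads that reach the out-of-band record are still needed to learn anything about the allocation.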

Figuring out that the allocation is 6600 bytes, once you already know
the size class and out-of-band metadata, is quite trivial and hardly
takes any of the time. (It also has a few validation checks that could
be omitted at the cost of loss of hardening, but these are
proportionally much smaller.)

Rich
------=_001_NextPart682342858752_=------