From mboxrd@z Thu Jan  1 00:00:00 1970
From: ritesh sonawane
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Changing MMAP_THRESHOLD in malloc() implementation.
Date: Thu, 5 Jul 2018 12:32:24 +0530
References: <20180703144331.GL1392@brightrain.aerifal.cx> <20180704145629.GQ1392@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
> If this is a custom architecture, have you considered using variable
> pagesize like x86 and others? Unconditionally using large/huge pages
> for everything seems like a really, really bad idea. Aside from
> wasting memory, it makes COW latency spikes really high (memcpy a
> whole 2MB or 64MB on fault) and even worse for paging in mmapped files
> from a block-device-backed filesystem.

Currently our architecture does not support COW or demand paging. The
page size is decided at compile time only; it cannot be passed to the
mmap() syscall at run time, as on x86, to change the page size for new
memory requests.

On Thu, Jul 5, 2018 at 12:22 PM, ritesh sonawane wrote:

> > How does this happen? The behavior you should see is just rounding up
> > of the request to a multiple of the page size, not scaling of the
> > request.
> > Maybe I don't understand what you're saying here.
>
> In our case, if the threshold value is 224KB, then all requests larger
> than 224KB will allocate memory using mmap() only. In that case the
> size will be aligned to the page boundary (64MB). So if the
> application requests 225KB multiple times, then for every 225KB
> request one (64MB) page will be allocated. It means that if the user
> requests 225KB five times, then as far as the user is concerned
> (225KB x 5 =) 1125KB is consumed, but the actual memory consumed is
> (64MB x 5 =) 320MB.
>
> On Wed, Jul 4, 2018 at 8:26 PM, Rich Felker wrote:
>
>> On Wed, Jul 04, 2018 at 12:35:02PM +0530, ritesh sonawane wrote:
>> > The statistics above are with a 64MB page size. We are using a
>> > system which has 2MB (large) and 64MB (huge) pages.
>>
>> If this is a custom architecture, have you considered using variable
>> pagesize like x86 and others? Unconditionally using large/huge pages
>> for everything seems like a really, really bad idea. Aside from
>> wasting memory, it makes COW latency spikes really high (memcpy a
>> whole 2MB or 64MB on fault) and even worse for paging in mmapped
>> files from a block-device-backed filesystem.
>>
>> Rich
>>
>> > On Wed, Jul 4, 2018 at 10:54 AM, ritesh sonawane
>> > <rdssonawane2317@gmail.com> wrote:
>> >
>> > > Thank you very much for the instant reply.
>> > >
>> > > Yes, it is wasting memory for each shared library. But the memory
>> > > wastage is even worse when a program requests memory with a size
>> > > of more than 224KB (the threshold value):
>> > >
>> > > -> If a program requests 1GB per request, it can use 45GB at the most.
>> > > -> If a program requests 512MB per request, it can use 41.5GB at the most.
>> > > -> If a program requests 225KB per request, it can use about 167MB at the most.
>> > >
>> > > As we ported musl-1.1.14 to our architecture, we are bound to
>> > > make changes in that code base.
>> > > We have increased MMAP_THRESHOLD to 1GB and also changed the
>> > > calculation for the bin index; after that we observed an
>> > > improvement in memory utilization, i.e. for size 225KB the
>> > > memory used is 47.6GB.
>> > >
>> > > But now we are facing a problem in multi-threaded applications,
>> > > as we haven't changed the function pretrim(): there are some
>> > > hard-coded values like '40' and '3' in it, and we are unable to
>> > > understand how these values were decided.
>> > >
>> > > static int pretrim(struct chunk *self, size_t n, int i, int j)
>> > > {
>> > >         size_t n1;
>> > >         struct chunk *next, *split;
>> > >
>> > >         /* We cannot pretrim if it would require re-binning. */
>> > >         if (j < 40) return 0;
>> > >         if (j < i+3) {
>> > >                 if (j != 63) return 0;
>> > >                 n1 = CHUNK_SIZE(self);
>> > >                 if (n1-n <= MMAP_THRESHOLD) return 0;
>> > >         } else {
>> > >                 n1 = CHUNK_SIZE(self);
>> > >         }
>> > >         ...
>> > > }
>> > >
>> > > Any clue as to how these values were decided would be very
>> > > helpful for us.
>> > >
>> > > Best Regards,
>> > > Ritesh Sonawane
>> > >
>> > > On Tue, Jul 3, 2018 at 8:13 PM, Rich Felker wrote:
>> > >
>> > >> On Tue, Jul 03, 2018 at 12:58:04PM +0530, ritesh sonawane wrote:
>> > >> > Hi All,
>> > >> >
>> > >> > We are using musl-1.1.14 for our architecture. It has a page
>> > >> > size of 2MB. Due to the low threshold value there is more
>> > >> > memory wastage, so we want to change the value of
>> > >> > MMAP_THRESHOLD.
>> > >> >
>> > >> > Can anyone please guide us on which factors need to be
>> > >> > considered to change this threshold value?
>> > >>
>> > >> It's not a parameter that can be changed; it's linked to the
>> > >> scale of bin sizes. There is no framework to track and reuse
>> > >> freed chunks which are larger than MMAP_THRESHOLD, so you'd be
>> > >> replacing recoverable waste from page granularity with
>> > >> unrecoverable waste from the inability to reuse these larger
>> > >> freed chunks, except by breaking them up into pieces to satisfy
>> > >> smaller requests.
>> > >>
>> > >> I may look into handling this better when replacing musl's
>> > >> malloc at some point, but if your system forces you to use
>> > >> ridiculously large pages like 2MB, you've basically committed
>> > >> to wasting huge amounts of memory anyway (at least 2MB for each
>> > >> shared library in each process)...
>> > >>
>> > >> With musl git-master and future releases, you have the option
>> > >> to link a malloc replacement library that might be a decent
>> > >> solution to your problem.
>> > >>
>> > >> Rich