mailing list of musl libc
 help / color / mirror / code / Atom feed
* Query regarding malloc if statement
@ 2017-06-19 15:16 Jamie Mccrae
  2017-06-19 18:34 ` Markus Wichmann
  0 siblings, 1 reply; 7+ messages in thread
From: Jamie Mccrae @ 2017-06-19 15:16 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1131 bytes --]

Hi,

I'm using musl to compile a cross-distro application which I've been having problems with and whilst discussing the problem the developer of another project, was shown a musl malloc function which manually checks the contents of each byte and changes it to 0 if the byte is non-0. This code is in src/malloc/malloc.c as so:

void *__malloc0(size_t n)
{
        void *p = malloc(n);
        if (p && !IS_MMAPPED(MEM_TO_CHUNK(p))) {
                size_t *z;
                n = (n + sizeof *z - 1)/sizeof *z;
                for (z=p; n; n--, z++) if (*z) *z=0;
        }
        return p;
}


This code causes thousands of errors when using valgrind (in excess of 800,000 for my application) due to checking the value of each byte before it has been set and I have to agree with this other developer that I'm at a loss as to why this is performed. If you step through the array and just set each byte to 0 then there will be no read-before-initialisation error and the function will run much faster due to not having to retrieve the data. Why not instead use:

                for (z=p; n; n--, z++) *z=0;

[-- Attachment #2: Type: text/html, Size: 2093 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Query regarding malloc if statement
  2017-06-19 15:16 Query regarding malloc if statement Jamie Mccrae
@ 2017-06-19 18:34 ` Markus Wichmann
  2017-06-19 21:02   ` Jamie Mccrae
  0 siblings, 1 reply; 7+ messages in thread
From: Markus Wichmann @ 2017-06-19 18:34 UTC (permalink / raw)
  To: musl

On Mon, Jun 19, 2017 at 03:16:16PM +0000, Jamie Mccrae wrote:
> Hi,
> 
> I'm using musl to compile a cross-distro application which I've been having problems with and whilst discussing the problem the developer of another project, was shown a musl malloc function which manually checks the contents of each byte and changes it to 0 if the byte is non-0. This code is in src/malloc/malloc.c as so:
> 
> void *__malloc0(size_t n)
> {
>         void *p = malloc(n);
>         if (p && !IS_MMAPPED(MEM_TO_CHUNK(p))) {
>                 size_t *z;
>                 n = (n + sizeof *z - 1)/sizeof *z;
>                 for (z=p; n; n--, z++) if (*z) *z=0;
>         }
>         return p;
> }
> 
> 
> This code causes thousands of errors when using valgrind (in excess of 800,000 for my application) due to checking the value of each byte before it has been set and I have to agree with this other developer that I'm at a loss as to why this is performed. If you step through the array and just set each byte to 0 then there will be no read-before-initialisation error and the function will run much faster due to not having to retrieve the data. Why not instead use:
> 
>                 for (z=p; n; n--, z++) *z=0;

Ah, yet another valgrind false positive. If the memory was allocated
with mmap() (which is different from IS_MMAPPED(), because the latter
means that ONLY the chunk is in that map), then the first write access
will cause a page fault. Avoiding write access therefore improves
performance. A lot. Such a mapping will be read as zero without
consequence.

My advice: Get valgrind to ignore the system library, as it doesn't know
what it's doing there. We already had a lot of reclaim_gaps() fun there.

Ciao,
Markus


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Query regarding malloc if statement
  2017-06-19 18:34 ` Markus Wichmann
@ 2017-06-19 21:02   ` Jamie Mccrae
  2017-06-20  4:14     ` Markus Wichmann
  0 siblings, 1 reply; 7+ messages in thread
From: Jamie Mccrae @ 2017-06-19 21:02 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2048 bytes --]



> Ah, yet another valgrind false positive. If the memory was allocated
> with mmap() (which is different from IS_MMAPPED(), because the latter
> means that ONLY the chunk is in that map), then the first write access
> will cause a page fault. Avoiding write access therefore improves
> performance. A lot. Such a mapping will be read as zero without
> consequence.

My understanding is that doing a read followed by a possible write is slower than always doing a write for the reason that upon doing a read the process will halt
until the memory is brought into the CPU's cache which isn't a problem when just doing a write. I've just thrown together a simple application to test this (testing on a modern PC running alpine linux 64-bit in a virtualbox VM with 512MB RAM and 1 CPU core) with a normal musl library and a modified one whereby I've removed the 'if' check:

#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

void TimedFunc()
{
    uint32_t loops = 64;
    uint32_t *ptr;
    while (loops > 0)
    {
        ptr = calloc(64, 2);
        free(ptr);
        --loops;
    }
}

void main()
{
    clock_t stime, etime;
    stime = clock();

    uint32_t runs = 0;
    while (runs < 16384)
    {
        TimedFunc();
        ++runs;
    }

    etime = clock();
    printf("%d loops in %d ms\r\n", runs, ((etime - stime) * 1000 / CLOCKS_PER_SEC));
}


Results are 74-148ms for the normal library and 70-72ms when the if statement is removed (about twice as fast). I've also got am original raspberry pi with a single CPU and have alpine linux on that so I've performed the same test using 32 loops, calloc(32, 2) and 8192 loops instead and see a similar result although it's much closer 411-412ms for the normal library and 405-408ms when the if statement is removed.
Surely a page fault will occur when attempting to read memory not writing it, it doesn't need to bring the page into the cache if no read is taking place therefore a page fault will not occur?



[-- Attachment #2: Type: text/html, Size: 3670 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Query regarding malloc if statement
  2017-06-19 21:02   ` Jamie Mccrae
@ 2017-06-20  4:14     ` Markus Wichmann
  2017-06-20 14:35       ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: Markus Wichmann @ 2017-06-20  4:14 UTC (permalink / raw)
  To: musl

On Mon, Jun 19, 2017 at 09:02:00PM +0000, Jamie Mccrae wrote:
> My understanding is that doing a read followed by a possible write is slower than always doing a write for the reason that upon doing a read the process will halt
> until the memory is brought into the CPU's cache which isn't a problem when just doing a write. I've just thrown together a simple application to test this (testing on a modern PC running alpine linux 64-bit in a virtualbox VM with 512MB RAM and 1 CPU core) with a normal musl library and a modified one whereby I've removed the 'if' check:
> 

Woah, you're mixing up a few things here. A cache miss and a page fault
are two very different things.

Besides, doesn't a cache miss on write mean that a cache-line for the
write area has to be allocated first?

> #include <time.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <stdint.h>
> 
> void TimedFunc()
> {
>     uint32_t loops = 64;
>     uint32_t *ptr;
>     while (loops > 0)
>     {
>         ptr = calloc(64, 2);
>         free(ptr);
>         --loops;
>     }
> }
> 
> void main()
> {
>     clock_t stime, etime;
>     stime = clock();
> 
>     uint32_t runs = 0;
>     while (runs < 16384)
>     {
>         TimedFunc();
>         ++runs;
>     }
> 
>     etime = clock();
>     printf("%d loops in %d ms\r\n", runs, ((etime - stime) * 1000 / CLOCKS_PER_SEC));
> }
> 

Hmm... looks about right (except for "void main", but let's not be
pedantic here). But, as I said, the whole thing only works if brk() is
disabled. If you don't want to recompile your kernel, you can use a
seccomp filter to disallow that system call. This forces musl to fall
back to allocating heap with mmap().

Also, you are allocating 128 bytes, which is too small to trigger the
effect. Try 100kB (if my maths did not fail me, for a 32-bit platform
the mmap threshold is at 112kB, and for a 64-bit platform it is twice
that, so 100kB is well below that).
> 
> Results are 74-148ms for the normal library and 70-72ms when the if statement is removed (about twice as fast). I've also got am original raspberry pi with a single CPU and have alpine linux on that so I've performed the same test using 32 loops, calloc(32, 2) and 8192 loops instead and see a similar result although it's much closer 411-412ms for the normal library and 405-408ms when the if statement is removed.

Interesting. So it appears to not be beneficial, time-wise, for small
allocations.

> Surely a page fault will occur when attempting to read memory not writing it, it doesn't need to bring the page into the cache if no read is taking place therefore a page fault will not occur?

No, not really. See, if Linux is doing the right thing, then it will
always have a zero page handy. If an application requests memory via
mmap() with anonymous pages, what Linux should do is write into the page
tables in the CPU-facing bytes that the pages exist and all point to the
zero page and are read-only. In the OS-facing bits, it needs to record
that those pages are copy-on-write, of course. Then a read of those
pages will return bytes from the zero page (so always zero), and a write
will cause a page fault.

Linux will of course handle that page fault by allocating a fresh
physical page and copying the zero page there and rewriting the page
tables and invalidating the page table cache. Before continuing the
program.

Of course, I don't know if Linux really does that. It might just answer
a request for memory with completely inaccessible pages that cause a
fault as soon as they are accessed in any way. The interface would be
fulfilled either way.

Oh, and the CPU cache doesn't have anything to do with this. The page
fault mechanism is so slow that a cache miss or two make no odds here.

Ciao,
Markus


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Query regarding malloc if statement
  2017-06-20  4:14     ` Markus Wichmann
@ 2017-06-20 14:35       ` Szabolcs Nagy
  2017-06-20 20:04         ` Markus Wichmann
  0 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2017-06-20 14:35 UTC (permalink / raw)
  To: musl

* Markus Wichmann <nullplan@gmx.net> [2017-06-20 06:14:29 +0200]:
> On Mon, Jun 19, 2017 at 09:02:00PM +0000, Jamie Mccrae wrote:
> > My understanding is that doing a read followed by a possible write is slower than always doing a write for the reason that upon doing a read the process will halt
> > until the memory is brought into the CPU's cache which isn't a problem when just doing a write. I've just thrown together a simple application to test this (testing on a modern PC running alpine linux 64-bit in a virtualbox VM with 512MB RAM and 1 CPU core) with a normal musl library and a modified one whereby I've removed the 'if' check:
> > 
> 
> Woah, you're mixing up a few things here. A cache miss and a page fault
> are two very different things.
> 
> Besides, doesn't a cache miss on write mean that a cache-line for the
> write area has to be allocated first?
> 

are you arguing with somebody off-list?
(i only see your replies)

in any case the calloc code should not be
controversial on linux.

writing to each allocated page when they are
potentially unused is a huge waste of memory
not just time. (a benchmark that calls calloc/free
in a loop is obviously bogous, try

calloc(1000*1000*1000,1)

vs

memset(malloc(1000*1000*1000),0,1000*1000*1000))


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Query regarding malloc if statement
  2017-06-20 14:35       ` Szabolcs Nagy
@ 2017-06-20 20:04         ` Markus Wichmann
  2017-06-20 23:08           ` A. Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Markus Wichmann @ 2017-06-20 20:04 UTC (permalink / raw)
  To: musl

On Tue, Jun 20, 2017 at 04:35:49PM +0200, Szabolcs Nagy wrote:
> are you arguing with somebody off-list?
> (i only see your replies)
> 

I had a feeling something strange was going on lately. However, Jamie
did address his mails to the list. Could someone investigate where they
got stuck? Is he in your killfile, perchance?

> in any case the calloc code should not be
> controversial on linux.
> 

It isn't, but valgrind likes to disagree (that's why I alluded to
reclaim() before; valgrind has a habit of playing contrarian to musl).

> writing to each allocated page when they are
> potentially unused is a huge waste of memory
> not just time. (a benchmark that calls calloc/free
> in a loop is obviously bogous, try
> 
> calloc(1000*1000*1000,1)
> 
> vs
> 
> memset(malloc(1000*1000*1000),0,1000*1000*1000))

Well, that doesn't test the line in contention as the allocated size is
way beyond the mmap threshold; thus __malloc0() won't do anything at
all, except call malloc().

The only way that line would make a positive difference is if malloc()
were to call expand_heap(), and expand_heap() were to use mmap() to get
more memory, and the returned chunk would contain unallocated pages.
Since each heap expansion requires the writing of a header and a footer,
something between three pages and the mmap threshold would have to be
allocated. That's why I suggested using 100 kB.

Ciao,
Markus


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Query regarding malloc if statement
  2017-06-20 20:04         ` Markus Wichmann
@ 2017-06-20 23:08           ` A. Wilcox
  0 siblings, 0 replies; 7+ messages in thread
From: A. Wilcox @ 2017-06-20 23:08 UTC (permalink / raw)
  To: musl

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 20/06/17 15:04, Markus Wichmann wrote:
> On Tue, Jun 20, 2017 at 04:35:49PM +0200, Szabolcs Nagy wrote:
>> are you arguing with somebody off-list? (i only see your
>> replies)
>> 
> 
> I had a feeling something strange was going on lately. However,
> Jamie did address his mails to the list. Could someone investigate
> where they got stuck? Is he in your killfile, perchance?


For what it's worth, I see both sides of this from here.

- --arw


- -- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
http://adelielinux.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJZSarZAAoJEMspy1GSK50UEIMQAIBuz40Y7x2q/JmbcIP4yD2x
xMC83DzVY7Fle6FuBJrrdKP8/7eF6X4xRD2jRPYKTZJ8UUmBceDGbkOFB+cPFa+V
4zo4e0J39c7l0FnFSnQAdvk4gjR6hf0vOq/Rq7rYusmduVImTwORcgYaO4AmnwJi
cZ1OLkaijnao5f/xkY4KymJ3NoLZlMChmqunVmz4Kk43UEz+8fK0hnGkEc7MVneH
GAeyLYHYL3STmr5GrrtvEZ6fBAfmLXLfnSM5L+uM3EGL2HUH6v+yAvBC+EddI6Jn
moQISnohbeFF+JYzEHzN7mnPkXGSifxSSnxKpT0kS3D25kTXMbgLqFg2tsMTgCew
erpNZQvMDyfwiOV07B4/Hf1cG5vM0EP2zvRhC6mNG8hQVIMYsQGhBMwRTf9uXejw
znQVKzW7xB6VW6XsZgz9+H3Vjx5FfUTRrBgYDDH/9bJHXXQ9MW5moZPU6n41amTW
BKbarTqP2GpZJovejhW4yE/Mtg4lVfBAhTmh2Vc/6Us2eECVOBGp3IF9tfRASfij
e52N6yi+ZdrHhDszjxx297cO9ETzWAxhfNofoCEgYcJd3gRTii8feqfz0QLK8NRs
1mU+NwpOJzJcdjR8EUPpGf4ovnW0Hjxz7FgasqI0gNtVDvobOs3JsuUZfWLE1qqA
xXCEpoiLXMsQJXLKEjX7
=f7y5
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-06-20 23:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-19 15:16 Query regarding malloc if statement Jamie Mccrae
2017-06-19 18:34 ` Markus Wichmann
2017-06-19 21:02   ` Jamie Mccrae
2017-06-20  4:14     ` Markus Wichmann
2017-06-20 14:35       ` Szabolcs Nagy
2017-06-20 20:04         ` Markus Wichmann
2017-06-20 23:08           ` A. Wilcox

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).