* Query regarding malloc if statement @ 2017-06-19 15:16 Jamie Mccrae 2017-06-19 18:34 ` Markus Wichmann 0 siblings, 1 reply; 7+ messages in thread From: Jamie Mccrae @ 2017-06-19 15:16 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 1131 bytes --] Hi, I'm using musl to compile a cross-distro application which I've been having problems with and whilst discussing the problem the developer of another project, was shown a musl malloc function which manually checks the contents of each byte and changes it to 0 if the byte is non-0. This code is in src/malloc/malloc.c as so: void *__malloc0(size_t n) { void *p = malloc(n); if (p && !IS_MMAPPED(MEM_TO_CHUNK(p))) { size_t *z; n = (n + sizeof *z - 1)/sizeof *z; for (z=p; n; n--, z++) if (*z) *z=0; } return p; } This code causes thousands of errors when using valgrind (in excess of 800,000 for my application) due to checking the value of each byte before it has been set and I have to agree with this other developer that I'm at a loss as to why this is performed. If you step through the array and just set each byte to 0 then there will be no read-before-initialisation error and the function will run much faster due to not having to retrieve the data. Why not instead use: for (z=p; n; n--, z++) *z=0; [-- Attachment #2: Type: text/html, Size: 2093 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Query regarding malloc if statement 2017-06-19 15:16 Query regarding malloc if statement Jamie Mccrae @ 2017-06-19 18:34 ` Markus Wichmann 2017-06-19 21:02 ` Jamie Mccrae 0 siblings, 1 reply; 7+ messages in thread From: Markus Wichmann @ 2017-06-19 18:34 UTC (permalink / raw) To: musl On Mon, Jun 19, 2017 at 03:16:16PM +0000, Jamie Mccrae wrote: > Hi, > > I'm using musl to compile a cross-distro application which I've been having problems with and whilst discussing the problem the developer of another project, was shown a musl malloc function which manually checks the contents of each byte and changes it to 0 if the byte is non-0. This code is in src/malloc/malloc.c as so: > > void *__malloc0(size_t n) > { > void *p = malloc(n); > if (p && !IS_MMAPPED(MEM_TO_CHUNK(p))) { > size_t *z; > n = (n + sizeof *z - 1)/sizeof *z; > for (z=p; n; n--, z++) if (*z) *z=0; > } > return p; > } > > > This code causes thousands of errors when using valgrind (in excess of 800,000 for my application) due to checking the value of each byte before it has been set and I have to agree with this other developer that I'm at a loss as to why this is performed. If you step through the array and just set each byte to 0 then there will be no read-before-initialisation error and the function will run much faster due to not having to retrieve the data. Why not instead use: > > for (z=p; n; n--, z++) *z=0; Ah, yet another valgrind false positive. If the memory was allocated with mmap() (which is different from IS_MMAPPED(), because the latter means that ONLY the chunk is in that map), then the first write access will cause a page fault. Avoiding write access therefore improves performance. A lot. Such a mapping will be read as zero without consequence. My advice: Get valgrind to ignore the system library, as it doesn't know what it's doing there. We already had a lot of reclaim_gaps() fun there. Ciao, Markus ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Query regarding malloc if statement 2017-06-19 18:34 ` Markus Wichmann @ 2017-06-19 21:02 ` Jamie Mccrae 2017-06-20 4:14 ` Markus Wichmann 0 siblings, 1 reply; 7+ messages in thread From: Jamie Mccrae @ 2017-06-19 21:02 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 2048 bytes --] > Ah, yet another valgrind false positive. If the memory was allocated > with mmap() (which is different from IS_MMAPPED(), because the latter > means that ONLY the chunk is in that map), then the first write access > will cause a page fault. Avoiding write access therefore improves > performance. A lot. Such a mapping will be read as zero without > consequence. My understanding is that doing a read followed by a possible write is slower than always doing a write for the reason that upon doing a read the process will halt until the memory is brought into the CPU's cache which isn't a problem when just doing a write. I've just thrown together a simple application to test this (testing on a modern PC running alpine linux 64-bit in a virtualbox VM with 512MB RAM and 1 CPU core) with a normal musl library and a modified one whereby I've removed the 'if' check: #include <time.h> #include <stdlib.h> #include <stdio.h> #include <stdint.h> void TimedFunc() { uint32_t loops = 64; uint32_t *ptr; while (loops > 0) { ptr = calloc(64, 2); free(ptr); --loops; } } void main() { clock_t stime, etime; stime = clock(); uint32_t runs = 0; while (runs < 16384) { TimedFunc(); ++runs; } etime = clock(); printf("%d loops in %d ms\r\n", runs, ((etime - stime) * 1000 / CLOCKS_PER_SEC)); } Results are 74-148ms for the normal library and 70-72ms when the if statement is removed (about twice as fast). I've also got am original raspberry pi with a single CPU and have alpine linux on that so I've performed the same test using 32 loops, calloc(32, 2) and 8192 loops instead and see a similar result although it's much closer 411-412ms for the normal library and 405-408ms when the if statement is removed. Surely a page fault will occur when attempting to read memory not writing it, it doesn't need to bring the page into the cache if no read is taking place therefore a page fault will not occur? [-- Attachment #2: Type: text/html, Size: 3670 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Query regarding malloc if statement 2017-06-19 21:02 ` Jamie Mccrae @ 2017-06-20 4:14 ` Markus Wichmann 2017-06-20 14:35 ` Szabolcs Nagy 0 siblings, 1 reply; 7+ messages in thread From: Markus Wichmann @ 2017-06-20 4:14 UTC (permalink / raw) To: musl On Mon, Jun 19, 2017 at 09:02:00PM +0000, Jamie Mccrae wrote: > My understanding is that doing a read followed by a possible write is slower than always doing a write for the reason that upon doing a read the process will halt > until the memory is brought into the CPU's cache which isn't a problem when just doing a write. I've just thrown together a simple application to test this (testing on a modern PC running alpine linux 64-bit in a virtualbox VM with 512MB RAM and 1 CPU core) with a normal musl library and a modified one whereby I've removed the 'if' check: > Woah, you're mixing up a few things here. A cache miss and a page fault are two very different things. Besides, doesn't a cache miss on write mean that a cache-line for the write area has to be allocated first? > #include <time.h> > #include <stdlib.h> > #include <stdio.h> > #include <stdint.h> > > void TimedFunc() > { > uint32_t loops = 64; > uint32_t *ptr; > while (loops > 0) > { > ptr = calloc(64, 2); > free(ptr); > --loops; > } > } > > void main() > { > clock_t stime, etime; > stime = clock(); > > uint32_t runs = 0; > while (runs < 16384) > { > TimedFunc(); > ++runs; > } > > etime = clock(); > printf("%d loops in %d ms\r\n", runs, ((etime - stime) * 1000 / CLOCKS_PER_SEC)); > } > Hmm... looks about right (except for "void main", but let's not be pedantic here). But, as I said, the whole thing only works if brk() is disabled. If you don't want to recompile your kernel, you can use a seccomp filter to disallow that system call. This forces musl to fall back to allocating heap with mmap(). Also, you are allocating 128 bytes, which is too small to trigger the effect. Try 100kB (if my maths did not fail me, for a 32-bit platform the mmap threshold is at 112kB, and for a 64-bit platform it is twice that, so 100kB is well below that). > > Results are 74-148ms for the normal library and 70-72ms when the if statement is removed (about twice as fast). I've also got am original raspberry pi with a single CPU and have alpine linux on that so I've performed the same test using 32 loops, calloc(32, 2) and 8192 loops instead and see a similar result although it's much closer 411-412ms for the normal library and 405-408ms when the if statement is removed. Interesting. So it appears to not be beneficial, time-wise, for small allocations. > Surely a page fault will occur when attempting to read memory not writing it, it doesn't need to bring the page into the cache if no read is taking place therefore a page fault will not occur? No, not really. See, if Linux is doing the right thing, then it will always have a zero page handy. If an application requests memory via mmap() with anonymous pages, what Linux should do is write into the page tables in the CPU-facing bytes that the pages exist and all point to the zero page and are read-only. In the OS-facing bits, it needs to record that those pages are copy-on-write, of course. Then a read of those pages will return bytes from the zero page (so always zero), and a write will cause a page fault. Linux will of course handle that page fault by allocating a fresh physical page and copying the zero page there and rewriting the page tables and invalidating the page table cache. Before continuing the program. Of course, I don't know if Linux really does that. It might just answer a request for memory with completely inaccessible pages that cause a fault as soon as they are accessed in any way. The interface would be fulfilled either way. Oh, and the CPU cache doesn't have anything to do with this. The page fault mechanism is so slow that a cache miss or two make no odds here. Ciao, Markus ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Query regarding malloc if statement 2017-06-20 4:14 ` Markus Wichmann @ 2017-06-20 14:35 ` Szabolcs Nagy 2017-06-20 20:04 ` Markus Wichmann 0 siblings, 1 reply; 7+ messages in thread From: Szabolcs Nagy @ 2017-06-20 14:35 UTC (permalink / raw) To: musl * Markus Wichmann <nullplan@gmx.net> [2017-06-20 06:14:29 +0200]: > On Mon, Jun 19, 2017 at 09:02:00PM +0000, Jamie Mccrae wrote: > > My understanding is that doing a read followed by a possible write is slower than always doing a write for the reason that upon doing a read the process will halt > > until the memory is brought into the CPU's cache which isn't a problem when just doing a write. I've just thrown together a simple application to test this (testing on a modern PC running alpine linux 64-bit in a virtualbox VM with 512MB RAM and 1 CPU core) with a normal musl library and a modified one whereby I've removed the 'if' check: > > > > Woah, you're mixing up a few things here. A cache miss and a page fault > are two very different things. > > Besides, doesn't a cache miss on write mean that a cache-line for the > write area has to be allocated first? > are you arguing with somebody off-list? (i only see your replies) in any case the calloc code should not be controversial on linux. writing to each allocated page when they are potentially unused is a huge waste of memory not just time. (a benchmark that calls calloc/free in a loop is obviously bogous, try calloc(1000*1000*1000,1) vs memset(malloc(1000*1000*1000),0,1000*1000*1000)) ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Query regarding malloc if statement 2017-06-20 14:35 ` Szabolcs Nagy @ 2017-06-20 20:04 ` Markus Wichmann 2017-06-20 23:08 ` A. Wilcox 0 siblings, 1 reply; 7+ messages in thread From: Markus Wichmann @ 2017-06-20 20:04 UTC (permalink / raw) To: musl On Tue, Jun 20, 2017 at 04:35:49PM +0200, Szabolcs Nagy wrote: > are you arguing with somebody off-list? > (i only see your replies) > I had a feeling something strange was going on lately. However, Jamie did address his mails to the list. Could someone investigate where they got stuck? Is he in your killfile, perchance? > in any case the calloc code should not be > controversial on linux. > It isn't, but valgrind likes to disagree (that's why I alluded to reclaim() before; valgrind has a habit of playing contrarian to musl). > writing to each allocated page when they are > potentially unused is a huge waste of memory > not just time. (a benchmark that calls calloc/free > in a loop is obviously bogous, try > > calloc(1000*1000*1000,1) > > vs > > memset(malloc(1000*1000*1000),0,1000*1000*1000)) Well, that doesn't test the line in contention as the allocated size is way beyond the mmap threshold; thus __malloc0() won't do anything at all, except call malloc(). The only way that line would make a positive difference is if malloc() were to call expand_heap(), and expand_heap() were to use mmap() to get more memory, and the returned chunk would contain unallocated pages. Since each heap expansion requires the writing of a header and a footer, something between three pages and the mmap threshold would have to be allocated. That's why I suggested using 100 kB. Ciao, Markus ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Query regarding malloc if statement 2017-06-20 20:04 ` Markus Wichmann @ 2017-06-20 23:08 ` A. Wilcox 0 siblings, 0 replies; 7+ messages in thread From: A. Wilcox @ 2017-06-20 23:08 UTC (permalink / raw) To: musl -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 On 20/06/17 15:04, Markus Wichmann wrote: > On Tue, Jun 20, 2017 at 04:35:49PM +0200, Szabolcs Nagy wrote: >> are you arguing with somebody off-list? (i only see your >> replies) >> > > I had a feeling something strange was going on lately. However, > Jamie did address his mails to the list. Could someone investigate > where they got stuck? Is he in your killfile, perchance? For what it's worth, I see both sides of this from here. - --arw - -- A. Wilcox (awilfox) Project Lead, Adélie Linux http://adelielinux.org -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJZSarZAAoJEMspy1GSK50UEIMQAIBuz40Y7x2q/JmbcIP4yD2x xMC83DzVY7Fle6FuBJrrdKP8/7eF6X4xRD2jRPYKTZJ8UUmBceDGbkOFB+cPFa+V 4zo4e0J39c7l0FnFSnQAdvk4gjR6hf0vOq/Rq7rYusmduVImTwORcgYaO4AmnwJi cZ1OLkaijnao5f/xkY4KymJ3NoLZlMChmqunVmz4Kk43UEz+8fK0hnGkEc7MVneH GAeyLYHYL3STmr5GrrtvEZ6fBAfmLXLfnSM5L+uM3EGL2HUH6v+yAvBC+EddI6Jn moQISnohbeFF+JYzEHzN7mnPkXGSifxSSnxKpT0kS3D25kTXMbgLqFg2tsMTgCew erpNZQvMDyfwiOV07B4/Hf1cG5vM0EP2zvRhC6mNG8hQVIMYsQGhBMwRTf9uXejw znQVKzW7xB6VW6XsZgz9+H3Vjx5FfUTRrBgYDDH/9bJHXXQ9MW5moZPU6n41amTW BKbarTqP2GpZJovejhW4yE/Mtg4lVfBAhTmh2Vc/6Us2eECVOBGp3IF9tfRASfij e52N6yi+ZdrHhDszjxx297cO9ETzWAxhfNofoCEgYcJd3gRTii8feqfz0QLK8NRs 1mU+NwpOJzJcdjR8EUPpGf4ovnW0Hjxz7FgasqI0gNtVDvobOs3JsuUZfWLE1qqA xXCEpoiLXMsQJXLKEjX7 =f7y5 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-06-20 23:08 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-06-19 15:16 Query regarding malloc if statement Jamie Mccrae 2017-06-19 18:34 ` Markus Wichmann 2017-06-19 21:02 ` Jamie Mccrae 2017-06-20 4:14 ` Markus Wichmann 2017-06-20 14:35 ` Szabolcs Nagy 2017-06-20 20:04 ` Markus Wichmann 2017-06-20 23:08 ` A. Wilcox
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).