Thanks for the quick answer. I've taken a look at the pre-2011 fwrite.c code, and using SYS_writev is indeed much cleaner! See below for my proposed answer to your question about what the "cutoff" for copying should be. (SYS_writev is fully retained, but in this proposal the 2nd iovec element will very often be zero-length, which makes emulation of SYS_writev much more efficient.)

On Sat, 1 Dec 2018 at 03:10, Rich Felker wrote:
>
> It would probably be welcome to make __stdio_write make use of
> SYS_write when it would be expected to be faster (len very small), but
> I'm not sure what the exact cutoff should be.
>

My proposal is that the cutoff be 5-8 bytes on 32-bit CPUs and 9-16 bytes on 64-bit CPUs.

The cutoffs are chosen so that the "no copy" loop (searching for '\n') always ends on a word-aligned position, opening the door to future optimisations that search for '\n' a word at a time instead of a byte at a time. The "copy" branch is also guaranteed to cover at most a double word and at least a single word, which allows a two-word memcpy to be done with just a 2x load/mask/store word code sequence.

Some example code is shown below to give a general idea of word-aligning the cutoff amount (it does not yet do the word-at-a-time search for '\n' or the optimised two-word memcpy).
*CURRENT __fwritex.c CODE (lines 12-20)*:

	if (f->lbf >= 0) {
		/* Match /^(.*\n|)/ */
		for (i=l; i && s[i-1] != '\n'; i--);
		if (i) {
			size_t n = f->write(f, s, i);
			if (n < i) return n;
			s += i;
			l -= i;
		}
	}

*PROPOSED*:

	if (f->lbf >= 0) {
		/* ALIGN() is assumed to round down to a sizeof(size_t)
		 * boundary, so head = t-s is 5-8 bytes (32-bit) or
		 * 9-16 bytes (64-bit). */
		const unsigned char *t = ALIGN(s+sizeof(size_t)*2);
		size_t head = t-s;

		/* LONG LINE - match /^(.*\n|)/ in [t, s+l), no copying */
		for (i = l > head ? l-head : 0; i && t[i-1] != '\n'; i--);
		if (i) {
			i += head;
			size_t n = f->write(f, s, i);
			if (n < i) return n;
		} else {
			/* SHORT LINE - copy up to 2 words into the f->wpos
			 * buffer, then flush it (2nd iovec is empty) */
			for (i = l < head ? l : head; i && s[i-1] != '\n'; i--);
			if (i) {
				memcpy(f->wpos, s, i);
				f->wpos += i;
				f->write(f, s, 0);
				/* __stdio_write zeroes wend on failure */
				if (!f->wend) return 0;
			}
		}
		/* i = bytes already handled, as the trailing return l+i expects */
		s += i;
		l -= i;
	}