From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: stdio glitch & questions
Date: Fri, 30 Nov 2018 19:02:29 -0500
Message-ID: <20181201000229.GT23599@brightrain.aerifal.cx>
References: <20181130160951.GS23599@brightrain.aerifal.cx>

On Sat, Dec 01, 2018 at 09:15:56AM +1100, Xan Phung wrote:
> Thanks for the quick answer, and I've taken a look at the pre-2011 fwrite.c
> code, and using SYS_writev is indeed much cleaner code!
>
> See below on my proposed answer to your question about what the "cutoff"
> should be for copying.
> (SYS_writev is fully retained, but the 2nd iovec
> element will very often be zero length in this proposal, which makes
> emulation of SYS_writev much more efficient).
>
> On Sat, 1 Dec 2018 at 03:10, Rich Felker wrote:
>
> >
> > It would probably be welcome to make __stdio_write make use of
> > SYS_write when it would be expected to be faster (len very small), but
> > I'm not sure what the exact cutoff should be.
> >
> >
> My proposal is the cutoff be 5-8 bytes (on 32 bit CPUs) , and on 64 bit
> CPUs, 9-16 bytes.
>
> The cutoffs are selected in such a way that the "no copy" loop (searching
> for '\n') always ends on a word aligned position (opening the door to
> future optimisations by using word-at-a-time search for '\n' instead of
> byte-at-a-time). The "copy" branch is also guaranteed to only be a double
> word at most, but a minimum of a single word (allowing a two word memcpy to
> be done with just a 2x load/mask/store word code sequence). Some example
> code is shown to give a general idea of word-aligning the cutoff amount
> (but not yet doing word-at-a-time searching of '\n', or optimised two word
> memcpy).
>
> *CURRENT __fwritex.c CODE (lines 12-20)*:
>
> > if (f->lbf >= 0) {
> >         /* Match /^(.*\n|)/ */
> >         for (i=l; i && s[i-1] != '\n'; i--);
> >         if (i) {
> >                 size_t n = f->write(f, s, i);
> >                 if (n < i) return n;
> >                 s += i;
> >                 l -= i;
> >         }
> > }
>
> *PROPOSED*:
>
> size_t i, len;
> if (f->lbf >= 0) {
>     const unsigned char *t = ALIGN(s+sizeof(size_t)*2);
>     for (i = l+s-t; ; i--) {
>         if (i <= 0) { /* SHORT LINE - copy up to 16 bytes into f->wpos
> buffer and then flush line */
>             for (j = t-s; j && s[j-1] != '\n'; j--);
>             if (j) {
>                 memcpy(f->wpos, s, j); f->wpos += j;
>                 size_t n = f->write(f, t, 0);
>                 if (n < 0) return n;
>                 s += j;
>                 l -= j;
>             } break;
>         }
>         if (t[i-1] == '\n') {
>             size_t n = f->write(f, s, len = i+t-s);
>             if (n < len) return n;
>             s += len;
>             l -= len;
>             break;
>         }
>     }
> }

I've been trying to understand what you're trying to do.
It seems you chose to work at the point of the line-buffered flush logic,
since that happens to be the only case where f->write is called with an
argument that might fit in the remaining buffer space.

As written, the alignment logic and pointer arithmetic are invalid; the
sums/differences are out of bounds of the array, and i<=0 is not
meaningful since i has an unsigned type (and so does l+s-t). But even
if it could be made correct, it's all completely unnecessary and just
makes the code slower and less readable.

If __fwritex were the right place for this code, all you would need to
do is check whether i<16 (or whatever threshold) before calling
f->write, and if so, memcpy the data to the buffer and then call
f->write with a length of 0. However, then you could not use the return
value of f->write to determine if it succeeded (see how fflush and
fseek have to deal with this case). Contrary to what your code assumes,
f->write does not (and cannot, since the type is unsigned) return a
negative value on error.

Instead, I think it probably makes more sense to put the logic in
__stdio_write, but this will also be somewhat nontrivial to work in. At
least the "iovcnt == 2 ? ..." logic needs to be adapted to something
like "rem > len ? ...". Before the loop there should probably be
something like "if (len < f->wend-f->wpos && len <= 16) ..." to
conditionally copy the new data into the buffer.

Do you see any reason to prefer doing it in __fwritex?

Rich
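
For illustration only, here is a minimal sketch of two points from the
message above: the "len < f->wend-f->wpos && len <= 16" check, and why a
test like i<=0 cannot catch underflow on an unsigned type. It is not
musl's __stdio_write. The struct sketch_file, the SMALL_WRITE_MAX
constant, and all function names are hypothetical; only the field names
wpos and wend and the example threshold of 16 come from the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the buffer fields of a stdio FILE.
 * Only the names wpos and wend are taken from the message. */
struct sketch_file {
	unsigned char buf[64];
	unsigned char *wpos;  /* next free byte in buf */
	unsigned char *wend;  /* one past the last usable byte of buf */
};

/* Assumed threshold; the thread leaves the exact cutoff open. */
#define SMALL_WRITE_MAX 16

static void sketch_init(struct sketch_file *f)
{
	f->wpos = f->buf;
	f->wend = f->buf + sizeof f->buf;
}

/* Copy a small write into the buffer if it fits with room to spare.
 * Returns 1 if the data was buffered, 0 if the caller still has to
 * write it out some other way. */
static int buffer_small_write(struct sketch_file *f,
                              const unsigned char *s, size_t len)
{
	if (len < (size_t)(f->wend - f->wpos) && len <= SMALL_WRITE_MAX) {
		memcpy(f->wpos, s, len);
		f->wpos += len;
		return 1;
	}
	return 0;
}

/* Decrementing an unsigned zero wraps to the maximum value instead of
 * going negative, so a later "i <= 0" test only ever matches i == 0. */
static size_t wrapped_decrement(size_t i)
{
	return i - 1;
}
```

With this sketch, a 3-byte line is copied into the buffer, a 32-byte
write is rejected and left to the caller, and wrapped_decrement(0)
yields SIZE_MAX rather than a negative value.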