From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/13489
Path: news.gmane.org!.POSTED!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: stdio glitch & questions
Date: Fri, 30 Nov 2018 19:02:29 -0500
Message-ID: <20181201000229.GT23599@brightrain.aerifal.cx>
References: <CAO6moYvRndzizm+7Q3TbyB_TYr6PatkkS9mw=OMszti5=iBB1w@mail.gmail.com>
 <20181130160951.GS23599@brightrain.aerifal.cx>
 <CAO6moYvBB0Lb+g23hb0hc=WXsXoACtnSBjvQ7fSep-0Zky=W_g@mail.gmail.com>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: blaine.gmane.org 1543622437 5023 195.159.176.226 (1 Dec 2018 00:00:37 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sat, 1 Dec 2018 00:00:37 +0000 (UTC)
User-Agent: Mutt/1.5.21 (2010-09-15)
To: musl@lists.openwall.com
Original-X-From: musl-return-13505-gllmg-musl=m.gmane.org@lists.openwall.com Sat Dec 01 01:00:33 2018
Return-path: <musl-return-13505-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from <musl-return-13505-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1gSsi1-0001DD-Az
	for gllmg-musl@m.gmane.org; Sat, 01 Dec 2018 01:00:33 +0100
Original-Received: (qmail 13404 invoked by uid 550); 1 Dec 2018 00:02:42 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 13381 invoked from network); 1 Dec 2018 00:02:41 -0000
Content-Disposition: inline
In-Reply-To: <CAO6moYvBB0Lb+g23hb0hc=WXsXoACtnSBjvQ7fSep-0Zky=W_g@mail.gmail.com>
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:13489
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/13489>

On Sat, Dec 01, 2018 at 09:15:56AM +1100, Xan Phung wrote:
> Thanks for the quick answer, and I've taken a look at the pre-2011 fwrite.c
> code, and using SYS_writev is indeed much cleaner code!
> 
> See below on my proposed answer to your question about what the "cutoff"
> should be for copying.  (SYS_writev is fully retained, but the 2nd iovec
> element will very often be zero length in this proposal, which makes
> emulation of SYS_writev much more efficient).
> 
> On Sat, 1 Dec 2018 at 03:10, Rich Felker <dalias@libc.org> wrote:
> 
> >
> > It would probably be welcome to make __stdio_write make use of
> > SYS_write when it would be expected to be faster (len very small), but
> > I'm not sure what the exact cutoff should be.
> >
> >
> My proposal is that the cutoff be 5-8 bytes on 32-bit CPUs and 9-16 bytes
> on 64-bit CPUs.
> 
> The cutoffs are selected in such a way that the "no copy" loop (searching
> for '\n') always ends on a word aligned position (opening the door to
> future optimisations by using word-at-a-time search for '\n' instead of
> byte-at-a-time).  The "copy" branch is also guaranteed to only be a double
> word at most, but a minimum of a single word (allowing a two word memcpy to
> be done with just a 2x load/mask/store word code sequence).  Some example
> code is shown to give a general idea of word-aligning the cutoff amount
> (but not yet doing word-at-a-time searching of '\n', or optimised two word
> memcpy).
> 
> *CURRENT __fwritex.c CODE (lines 12-20)*:
> 
> 
> 	if (f->lbf >= 0) {
> 		/* Match /^(.*\n|)/ */
> 		for (i=l; i && s[i-1] != '\n'; i--);
> 		if (i) {
> 			size_t n = f->write(f, s, i);
> 			if (n < i) return n;
> 			s += i;
> 			l -= i;
> 		}
> 	}
> 
> 
> 
> *PROPOSED*:
> 
> 	size_t i, len;
> 	if (f->lbf >= 0) {
> 		const unsigned char *t = ALIGN(s+sizeof(size_t)*2);
> 		for (i = l+s-t; ; i--) {
> 			if (i <= 0) {
> 				/* SHORT LINE - copy up to 16 bytes into f->wpos
> 				 * buffer and then flush line */
> 				for (j = t-s; j && s[j-1] != '\n'; j--);
> 				if (j) {
> 					memcpy(f->wpos, s, j);
> 					f->wpos += j;
> 					size_t n = f->write(f, t, 0);
> 					if (n < 0) return n;
> 					s += j;
> 					l -= j;
> 				}
> 				break;
> 			}
> 			if (t[i-1] == '\n') {
> 				size_t n = f->write(f, s, len = i+t-s);
> 				if (n < len) return n;
> 				s += len;
> 				l -= len;
> 				break;
> 			}
> 		}
> 	}

I've been trying to understand what you're trying to do. It seems you
chose to work at the point of line-buffered flush logic, since that
happens to be the only case where f->write is called with an argument
that might fit in the remaining buffer space.

As written, the alignment logic and pointer arithmetic are invalid: the
sums/differences are out of bounds of the array, and i<=0 is not
meaningful since i has an unsigned type (and so does l+s-t). But even
if it could be made correct, it's all completely unnecessary and just
makes the code slower and less readable.
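
To make the unsigned issue concrete, here is a minimal sketch. The
helper names are hypothetical, and plain integer offsets stand in for
the pointers; it only models what happens once the difference has been
stored in a size_t such as i:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "i = l+s-t": l is the data length and skip
 * stands in for t-s, the distance from s to the aligned cutoff. */
static size_t stored_index(size_t l, size_t skip)
{
	return l - skip;	/* wraps to a huge value when skip > l */
}

/* The proposal's short-line test, applied to a size_t. */
static int short_line_test(size_t i)
{
	return i <= 0;		/* can only be true when i == 0 */
}

/* A 3-byte line with a 16-byte cutoff: the "short line" branch is
 * never taken, because the stored index wrapped instead of going
 * negative. */
void check_unsigned_model(void)
{
	size_t i = stored_index(3, 16);
	assert(i == SIZE_MAX - 12);
	assert(!short_line_test(i));
}
```

So for any line shorter than the cutoff, the loop would start from an
enormous index rather than falling into the copy branch.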

If __fwritex were the right place for this code, all you would need to
do is check whether i<16 (or whatever threshold) before calling
f->write and, if so, memcpy the data to the buffer and then call
f->write with a length of 0. However, you could then not use the
return value of f->write to determine whether it succeeded (see how
fflush and fseek have to deal with this case). Contrary to what your
code assumes, f->write does not (and cannot, since the type is
unsigned) return a negative value on error.
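
A rough sketch of that simpler check, using a toy stand-in for FILE.
Everything here (struct toy_file, toy_write, flush_line, the fail and
flushed fields, SHORT_LINE_MAX) is hypothetical and not musl code; it
assumes a failed write signals the error by clearing the buffer
pointers, which is why success is tested via wpos rather than the
return value:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for FILE; only the w* names come from the thread. */
struct toy_file {
	unsigned char *wbase, *wpos, *wend;
	size_t (*write)(struct toy_file *, const unsigned char *, size_t);
	int fail;	/* hypothetical: make the next write fail */
	size_t flushed;	/* hypothetical: total bytes "written" */
};

/* Toy f->write: flushes the buffer plus s[0..len). The return type is
 * unsigned, so an error shows up as a short count, never as < 0. */
static size_t toy_write(struct toy_file *f, const unsigned char *s, size_t len)
{
	(void)s;
	if (f->fail) {
		f->wpos = f->wbase = f->wend = 0;
		return 0;
	}
	f->flushed += (size_t)(f->wpos - f->wbase) + len;
	f->wpos = f->wbase;
	return len;
}

#define SHORT_LINE_MAX 16	/* assumed threshold */

/* Flush the line s[0..i); short lines are first copied into the
 * buffer. Returns how many bytes of s were consumed. */
static size_t flush_line(struct toy_file *f, const unsigned char *s, size_t i)
{
	assert(f->wpos && f->wend >= f->wpos);
	if (i < SHORT_LINE_MAX && i <= (size_t)(f->wend - f->wpos)) {
		memcpy(f->wpos, s, i);
		f->wpos += i;
		f->write(f, s, 0);
		/* write returned 0 whether it worked or not, so the
		 * only visible sign of failure is the cleared wpos */
		return f->wpos ? i : 0;
	}
	return f->write(f, s, i);
}
```

The awkward part is the last few lines of the short branch: a zero
length write returns 0 on success and on failure alike.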

Instead, I think it probably makes more sense to put the logic in
__stdio_write, but this will also be somewhat nontrivial to work in.
At least the "iovcnt == 2 ? ..." logic needs to be adapted to
something like "rem > len ? ...". Before the loop there should
probably be something like "if (len < f->wend-f->wpos && len <= 16)
..." to conditionally copy the new data into the buffer.
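
Very roughly, and only as an untested-against-musl sketch of the shape
this could take: the struct, the function name toy_stdio_write, the
err field, COPY_MAX and the use of the POSIX write/writev wrappers
(rather than raw syscalls) are all assumptions made so the example
stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

struct toy_file {	/* hypothetical stand-in for FILE */
	int fd;
	unsigned char *buf;
	size_t buf_size;
	unsigned char *wbase, *wpos, *wend;
	int err;
};

#define COPY_MAX 16	/* assumed cutoff */

static size_t toy_stdio_write(struct toy_file *f, const unsigned char *s, size_t len)
{
	struct iovec iovs[2], *iov = iovs;
	int iovcnt = 2;
	size_t rem;
	ssize_t cnt;

	assert(f->wpos && f->wend >= f->wpos);
	/* Small data that fits: append it to the buffer so that a
	 * single plain write() is enough. */
	if (len < (size_t)(f->wend - f->wpos) && len <= COPY_MAX) {
		if (len) memcpy(f->wpos, s, len);
		f->wpos += len;
		iovcnt = 1;
	}
	iovs[0].iov_base = f->wbase;
	iovs[0].iov_len = (size_t)(f->wpos - f->wbase);
	rem = iovs[0].iov_len;
	if (iovcnt == 2) {
		iovs[1].iov_base = (void *)s;
		iovs[1].iov_len = len;
		rem += len;
	}
	for (;;) {
		cnt = iovcnt == 1
			? write(f->fd, iov[0].iov_base, iov[0].iov_len)
			: writev(f->fd, iov, iovcnt);
		if (cnt < 0) {
			f->wpos = f->wbase = f->wend = 0;
			f->err = 1;
			/* rem > len: none of the caller's data went out */
			return rem > len ? 0 : len - rem;
		}
		if ((size_t)cnt == rem) {
			f->wend = f->buf + f->buf_size;
			f->wpos = f->wbase = f->buf;
			return len;
		}
		rem -= (size_t)cnt;
		if (iovcnt == 2 && (size_t)cnt > iov[0].iov_len) {
			cnt -= (ssize_t)iov[0].iov_len;
			iov++; iovcnt--;
		}
		iov[0].iov_base = (char *)iov[0].iov_base + cnt;
		iov[0].iov_len -= (size_t)cnt;
	}
}
```

The error return depends only on rem and len, so it gives the same
answer whether or not the new data was copied into the buffer first.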

Do you see any reason to prefer doing it in __fwritex?

Rich