From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/13491
Path: news.gmane.org!.POSTED!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: stdio glitch & questions
Date: Fri, 30 Nov 2018 22:17:36 -0500
Message-ID: <20181201031736.GU23599@brightrain.aerifal.cx>
References: <CAO6moYvRndzizm+7Q3TbyB_TYr6PatkkS9mw=OMszti5=iBB1w@mail.gmail.com>
 <20181130160951.GS23599@brightrain.aerifal.cx>
 <CAO6moYvBB0Lb+g23hb0hc=WXsXoACtnSBjvQ7fSep-0Zky=W_g@mail.gmail.com>
 <20181201000229.GT23599@brightrain.aerifal.cx>
 <CAO6moYsYGOhGTzdrqRLj=J6cxidjmid2YaSnAOP6Gz5zPK24fw@mail.gmail.com>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: blaine.gmane.org 1543634146 6981 195.159.176.226 (1 Dec 2018 03:15:46 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sat, 1 Dec 2018 03:15:46 +0000 (UTC)
User-Agent: Mutt/1.5.21 (2010-09-15)
To: musl@lists.openwall.com
Original-X-From: musl-return-13507-gllmg-musl=m.gmane.org@lists.openwall.com Sat Dec 01 04:15:41 2018
Return-path: <musl-return-13507-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from <musl-return-13507-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1gSvkq-0001kq-Bd
	for gllmg-musl@m.gmane.org; Sat, 01 Dec 2018 04:15:40 +0100
Original-Received: (qmail 19676 invoked by uid 550); 1 Dec 2018 03:17:49 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 19657 invoked from network); 1 Dec 2018 03:17:49 -0000
Content-Disposition: inline
In-Reply-To: <CAO6moYsYGOhGTzdrqRLj=J6cxidjmid2YaSnAOP6Gz5zPK24fw@mail.gmail.com>
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:13491
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/13491>

On Sat, Dec 01, 2018 at 01:42:30PM +1100, Xan Phung wrote:
> On Sat, 1 Dec 2018 at 11:02, Rich Felker <dalias@libc.org> wrote:
> 
> >
> > I've been trying to understand what you're trying to do. It seems you
> > chose to work at the point of line-buffered flush logic, since that
> > happens to be the only case where f->write is called with an argument
> > that might fit in the remaining buffer space.
> >
> >
> Yes that's correct... Also, the other reason I chose __fwrite.c is it's the
> only place where the search for '\n' is done.
> 
> An additional objective I had was to split the loop searching for '\n' into
> two stages:
> (i) Stage 1: search for '\n' word by word ie: 8 bytes at a time ... if '\n'
> found at >~16 bytes before start of "s" array then go straight to f->write
> without copying bytes
> (I don't show it in my code, but the algorithm to do stage 1 would be
> similar to memchr.c)
> (ii) Stage 2: in final 9~16 bytes, drop down into byte-by-byte searching
> for  '\n' (also doing copy of residual bytes into buffer when '\n' found)
> 
> 
> > As written the alignment logic and pointer arithmetic is invalid; the
> > sums/differences are out of bounds of the array, and i<=0 is not
> > meaningful since i has an unsigned type (and so does l+s-t). But even
> > if it could be made correct, it's all completely unnecessary and just
> > making the code slower and less readable. If __fwritex were the right
> > place for this code, all you would need to
> >
> do is check whether i<16 (or whatever threshold) before calling
> > f->write, and if so, memcpy'ing it to the buffer then calling f->write
> > with a length of 0.
> 
> 
> I agree, the check for i <= X is a simpler way of expressing the algorithm.
> [X is a value between 9 and 16 that guarantees (s + X) will be word aligned]
> 
> In my version, I introduced a new array base "t" and rebase i to index into
> "t" because "t" is a guaranteed word aligned pointer.
> However, this word alignment of "t" is not exploited in the current
> byte-at-a-time searching for '\n', so yes at present it's unnecessary.
> 
> 
> > However, then you could not use the return value
> > of f->write to determine if it succeeded (see how fflush and fseek
> > have to deal with this case). Contrary to what your code assumes,
> > f->write does not (and cannot, since the type is unsigned) return a
> > negative value on error.
> >
> >
> Ok, noted, the expression to check for error should be (!f->wpos) and not n
> < 0
> 
> Instead, I think it probably makes more sense to put the logic in
> > __stdio_write, but this will also be somewhat nontrivial to work in.
> > At least the "iovcnt == 2 ? ..." logic needs to be adapted to
> > something like "rem > len ? ...". Before the loop should probably be
> > something like "if (len < f->wend-f->wpos && len <= 16) ..." to
> > conditionally copy the new data into the buffer.
> >
> > Do you see any reason to prefer doing it in __fwritex?
> >
> Wouldn't putting check for i < X in __fwrite.c better because it:
> (a) facilitates word-by-word searching objective outlined above

Whether a faster search for '\n' is possible or makes sense is
completely independent of where or whether you copy into the buffer.
The search is necessary either way, and the scope of the search has to
be the entire input array.

There is a namespace-safe __memrchr function that could be used here
if it were optimized, but I don't think open-coding an optimized
version of a string function inside a stdio function makes sense.

> (b) check for space in buffer is already done in line 10 of __fwrite.c,
> hence avoids need for any more buffer length checks

Indeed, that is potentially a reason for doing it here.

> (c) ?? speeds up writes which doesn't use __stdio_write

You mean other backends like fmemopen, open_memstream, fopencookie,
etc.? This is possibly also a good reason, but in order for it to
help, some of their write functions might need improvements too. For
example I just noticed that fopencookie's cookiewrite() function
unnecessarily calls the application's write function a second time
with length 0 when length 0 was passed to it. This should probably be
avoided for other reasons, since it's not clear that the callback
needs to accept 0 as a valid length.

Note that __stdio_write also needs some changes to avoid unnecessary
writev syscalls when write would suffice.

So anyway, I think there are probably some good reasons to do the
merging in __fwritex rather than in the __stdio_write backend. But it
should be a lot simpler than what you initially proposed.

Rich