From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/13488
From: Xan Phung
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: stdio glitch & questions
Date: Sat, 1 Dec 2018 09:15:56 +1100
References: <20181130160951.GS23599@brightrain.aerifal.cx>
In-Reply-To: <20181130160951.GS23599@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000d99d25057be926be"

--000000000000d99d25057be926be
Content-Type: text/plain; charset="UTF-8"

Thanks for the quick answer. I've taken a look at the pre-2011 fwrite.c
code, and using SYS_writev is indeed much cleaner!

See below for my proposed answer to your question about what the
"cutoff" for copying should be. (SYS_writev is fully retained, but in
this proposal the 2nd iovec element will very often be zero length,
which makes emulation of SYS_writev much more efficient.)

On Sat, 1 Dec 2018 at 03:10, Rich Felker wrote:
>
> It would probably be welcome to make __stdio_write make use of
> SYS_write when it would be expected to be faster (len very small), but
> I'm not sure what the exact cutoff should be.
>

My proposal is that the cutoff be 5-8 bytes on 32-bit CPUs and 9-16
bytes on 64-bit CPUs.

The cutoffs are selected so that the "no copy" loop (searching for
'\n') always ends on a word-aligned position, which opens the door to
future optimisations that search for '\n' a word at a time instead of a
byte at a time. The "copy" branch is also guaranteed to handle at most a
double word and at least a single word, which allows the two-word memcpy
to be done with just a 2x load/mask/store word code sequence.

The example code below gives a general idea of word-aligning the cutoff
amount. It does not yet do the word-at-a-time search for '\n' or the
optimised two-word memcpy.

*CURRENT __fwritex.c CODE (lines 12-20)*:

	if (f->lbf >= 0) {
		/* Match /^(.*\n|)/ */
		for (i=l; i && s[i-1] != '\n'; i--);
		if (i) {
			size_t n = f->write(f, s, i);
			if (n < i) return n;
			s += i;
			l -= i;
		}
	}

*PROPOSED*:

	size_t i, j, len;
	if (f->lbf >= 0) {
		const unsigned char *t = ALIGN(s+sizeof(size_t)*2);
		for (i = l+s-t; ; i--) {
			if (i <= 0) {
				/* SHORT LINE - copy up to 16 bytes into
				 * f->wpos buffer and then flush line */
				for (j = t-s; j && s[j-1] != '\n'; j--);
				if (j) {
					memcpy(f->wpos, s, j);
					f->wpos += j;
					size_t n = f->write(f, t, 0);
					if (n < 0) return n;
					s += j;
					l -= j;
				}
				break;
			}
			if (t[i-1] == '\n') {
				size_t n = f->write(f, s, len = i+t-s);
				if (n < len) return n;
				s += len;
				l -= len;
				break;
			}
		}
	}
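To make the cutoff arithmetic concrete, here is a minimal standalone
sketch. It is not musl code and performs no writes; it only classifies
a buffer. It assumes that ALIGN rounds a pointer down to a multiple of
sizeof(size_t), which is consistent with the 5-8 and 9-16 byte ranges
given above but is not confirmed by the proposal itself. The names
head_len, classify, struct split and the enum values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: ALIGN rounds down to a word boundary. head_len returns the
 * number of bytes of s that lie before t = ALIGN(s + 2*sizeof(size_t)),
 * clamped to l so that short inputs are never read past their end. */
static size_t head_len(const unsigned char *s, size_t l)
{
	uintptr_t t = (uintptr_t)s + 2*sizeof(size_t);
	t &= ~(uintptr_t)(sizeof(size_t) - 1);  /* round down to a word */
	size_t head = (size_t)(t - (uintptr_t)s);
	return head < l ? head : l;
}

enum path {
	NO_NEWLINE,    /* nothing to flush; caller just buffers the data */
	COPY_SHORT,    /* last '\n' is inside the head: copy, then flush */
	WRITE_DIRECT   /* last '\n' is past the cutoff: write without copy */
};

struct split {
	enum path path;
	size_t flush_len;  /* bytes up to and including the last '\n' */
};

static struct split classify(const unsigned char *s, size_t l)
{
	size_t head = head_len(s, l);
	assert(head <= l);

	/* "No copy" search: scan the tail backwards, stopping at the
	 * word-aligned cutoff. The counter stays unsigned and is compared
	 * against head, so it never has to go below zero. */
	for (size_t i = l; i > head; i--)
		if (s[i-1] == '\n')
			return (struct split){ WRITE_DIRECT, i };

	/* "Copy" search: at most two words remain to be examined. */
	for (size_t j = head; j > 0; j--)
		if (s[j-1] == '\n')
			return (struct split){ COPY_SHORT, j };

	return (struct split){ NO_NEWLINE, 0 };
}
```

With a word-aligned s, head_len returns 2*sizeof(size_t); with s one
byte short of a word boundary it returns sizeof(size_t)+1, which gives
the 5-8 and 9-16 byte ranges. A 3-byte "hi\n" buffer is classified as
COPY_SHORT with flush_len 3.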
--000000000000d99d25057be926be--