From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: Re: [9fans] Go and 21-bit runes (and a bit of Go status)
Date: Sun, 2 Jun 2013 10:10:00 -0400
Message-ID: <ed8f64a5984a128fa8abd3236a1d901d@kw.quanstro.net>
In-Reply-To: <f1f540c3c271b35b940f50607d11a615@proxima.alt.za>
[-- Attachment #1: Type: text/plain, Size: 1570 bytes --]
> Regarding the latter, Plan 9 does not allow floating point
> instructions to be executed within note handling, but erring on the
> side of caution also forbids instructions such as MOVOU (don't ask me)
> which is part of the SSE(2?) extension, but hardly qualifies as a
> floating point instruction.
movou (movdqu in the manual) is a sse2 data movement instruction.
not all sse2 instructions require that sse be turned on (pause, for example),
but movou uses at least one xmm register so is clearly using the sse
unit, thus requiring that it be turned on.
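for reference, the check the attached memmove makes before touching an xmm register is a test of bit 26 (0x4000000) of the edx value from cpuid leaf 1. a minimal sketch in portable c, assuming edx has already been fetched; has_sse2 and CPUID_EDX_SSE2 are made-up names, not anything in the go runtime or plan 9. note that this bit only says the cpu implements sse2, not that the kernel has turned the unit on for the current context.

```c
#include <stdint.h>

/* bit 26 of edx from cpuid leaf 1 reports sse2 support */
enum { CPUID_EDX_SSE2 = 1u << 26 };	/* 0x4000000 */

/*
 * edx is the edx value returned by cpuid leaf 1, obtained elsewhere.
 * returns 1 if the cpu advertises sse2, 0 otherwise.
 */
static int
has_sse2(uint32_t edx)
{
	return (edx & CPUID_EDX_SSE2) != 0;
}
```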
the go runtime memmove uses movou for memmoves between 33 and 128
bytes. i only see a 10 cycle difference for these cases on my atom machine
(maximum 13%), so we're not missing out on much here by not using sse.
the real win, or loss for the plan 9 memmove, is in the short memmoves.
but this is a µbenchmark, and it would be more convincing with a real
world test.
- erik
harness; 8.memmovetest
memmove
1 92.42578 cycles/op
2 81.28125 cycles/op
4 56.47266 cycles/op
8 58.32422 cycles/op
16 62.28516 cycles/op
32 70.26563 cycles/op
64 86.32031 cycles/op
128 118.3125 cycles/op
512 323.5078 cycles/op
1024 587.1094 cycles/op
4096 2119.242 cycles/op
131072 133058.5 cycles/op
rt·memmove
1 20.60156 cycles/op
2 20.34375 cycles/op
4 24.46875 cycles/op
8 22.42969 cycles/op
16 27.45703 cycles/op
32 52.82813 cycles/op
64 79.19531 cycles/op
128 129.1289 cycles/op
512 314.4492 cycles/op
1024 569.9648 cycles/op
4096 2132.297 cycles/op
131072 135378.3 cycles/op
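the per-size gap can be worked out from the two tables above. a small sketch in portable c (not part of the attached harness; saved and report are made-up names) that tabulates plan 9 memmove minus rt·memmove, in cycles and as a percentage of the plan 9 figure. from the numbers above, rt·memmove is about 7 cycles ahead at 64 bytes and about 11 cycles behind at 128, while at 1 byte it is roughly 72 cycles ahead.

```c
#include <stdio.h>

/* cycles/op copied from the two tables above */
static const int size[] = {1, 2, 4, 8, 16, 32, 64, 128, 512, 1024, 4096, 131072};
static const double plan9[] = {
	92.42578, 81.28125, 56.47266, 58.32422, 62.28516, 70.26563,
	86.32031, 118.3125, 323.5078, 587.1094, 2119.242, 133058.5,
};
static const double gort[] = {
	20.60156, 20.34375, 24.46875, 22.42969, 27.45703, 52.82813,
	79.19531, 129.1289, 314.4492, 569.9648, 2132.297, 135378.3,
};

/* cycles saved per op by rt·memmove at table index i; negative means it is slower */
static double
saved(int i)
{
	return plan9[i] - gort[i];
}

/* print size, absolute difference and difference relative to plan 9 memmove */
static void
report(void)
{
	int i;

	for(i = 0; i < (int)(sizeof size/sizeof size[0]); i++)
		printf("%d\t%+.1f cycles\t%+.0f%%\n",
			size[i], saved(i), 100*saved(i)/plan9[i]);
}
```

calling report() from main prints one line per size.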
[-- Attachment #2: memmovetest.c --]
[-- Type: text/plain, Size: 1526 bytes --]
#include <u.h>
#include <libc.h>

typedef struct Movtab Movtab;
typedef void* (*Movfn)(void*, void*, ulong);

// runtimememmove tests this for the sse2 bit (bit 26 of cpuid edx); force it on
u32int runtimecpuid_edx = 0x4000000;
extern void* runtimememmove(void*, void*, ulong);

struct Movtab {
	Movfn	f;
	char	*name;
};

uvlong hz;
int sztab[] = {1, 2, 4, 8, 16, 32, 64, 128, 512, 1024, 4096, 128*1024, };
uchar buf0[128*1024];
uchar buf1[128*1024];
Movtab movtab[] = {memmove, "memmove", runtimememmove, "rt·memmove", };
//Movfn movtab[] = {memmove, runtimememmove};

// read the clock frequency, the fourth field of /dev/time
uvlong
gethz(void)
{
	char buf[1024], *f[5];
	int n, fd;

	fd = open("/dev/time", OREAD);
	if(fd == -1)
		sysfatal("%s: open /dev/time: %r", argv0);
	n = pread(fd, buf, sizeof buf-1, 0);
	if(n <= 0)
		sysfatal("%s: read /dev/time: %r", argv0);
	buf[n] = 0;
	n = tokenize(buf, f, nelem(f));
	if(n < 4)
		sysfatal("%s: /dev/time: unexpected fmt", argv0);
	return strtoull(f[3], 0, 0);
}

// 1024 back-to-back copies of sz bytes from buf0 to buf1
void
inner(Movfn f, ulong sz)
{
	int i;

	for(i = 0; i < 1024; i++)
		f(buf1, buf0, sz);
}

void
main(int argc, char **argv)
{
	int i, j;
	uvlong t[2], c[nelem(movtab)][nelem(sztab)];
//	double dhz;
	Movfn f;

	ARGBEGIN{
	}ARGEND

	hz = gethz();
//	dhz = hz;
	// for each memmove, time 1024 calls at each size and print the mean
	for(i = 0; i < nelem(movtab); i++){
		print("%s\n", movtab[i].name);
		f = movtab[i].f;
		for(j = 0; j < nelem(sztab); j++){
			cycles(t + 0);
			inner(f, sztab[j]);
			cycles(t + 1);
			c[i][j] = t[1] - t[0];
			print("%d %g cycles/op\n", sztab[j], c[i][j]/1024.);
			sleep(0);
		}
		print("\n");
	}
	exits("");
}
[-- Attachment #3.1: Type: text/plain, Size: 331 bytes --]
from postmaster@kw:
The following attachment had content that we can't
prove to be harmless. To avoid possible automatic
execution, we changed the content headers.
The original header was:
Content-Disposition: attachment; filename=memmove_386.s
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
[-- Attachment #3.2: memmove_386.s.suspect --]
[-- Type: application/octet-stream, Size: 4116 bytes --]
// Inferno's libkern/memmove-386.s
// http://code.google.com/p/inferno-os/source/browse/libkern/memmove-386.s
//
// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
// Portions Copyright 2009 The Go Authors. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

#define MOVOU MOVDQU

TEXT runtimememmove(SB), $0
	MOVL	to+0(FP), DI
	MOVL	fr+4(FP), SI
	MOVL	n+8(FP), BX

	// REP instructions have a high startup cost, so we handle small sizes
	// with some straightline code. The REP MOVSL instruction is really fast
	// for large sizes. The cutover is approximately 1K. We implement up to
	// 128 because that is the maximum SSE register load (loading all data
	// into registers lets us ignore copy direction).
tail:
	TESTL	BX, BX
	JEQ	move_0
	CMPL	BX, $2
	JBE	move_1or2
	CMPL	BX, $4
	JBE	move_3or4
	CMPL	BX, $8
	JBE	move_5through8
	CMPL	BX, $16
	JBE	move_9through16
	TESTL	$0x4000000, runtimecpuid_edx(SB) // check for sse2
	JEQ	nosse2
	CMPL	BX, $32
	JBE	move_17through32
	CMPL	BX, $64
	JBE	move_33through64
	CMPL	BX, $128
	JBE	move_65through128
	// TODO: use branch table and BSR to make this just a single dispatch

nosse2:
/*
 * check and set for backwards
 */
	CMPL	SI, DI
	JLS	back

/*
 * forward copy loop
 */
forward:
	MOVL	BX, CX
	SHRL	$2, CX
	ANDL	$3, BX

	REP;	MOVSL
	JMP	tail

/*
 * check overlap
 */
back:
	MOVL	SI, CX
	ADDL	BX, CX
	CMPL	CX, DI
	JLS	forward

/*
 * whole thing backwards has
 * adjusted addresses
 */
	ADDL	BX, DI
	ADDL	BX, SI
	STD

/*
 * copy
 */
	MOVL	BX, CX
	SHRL	$2, CX
	ANDL	$3, BX

	SUBL	$4, DI
	SUBL	$4, SI
	REP;	MOVSL

	CLD
	ADDL	$4, DI
	ADDL	$4, SI
	SUBL	BX, DI
	SUBL	BX, SI
	JMP	tail

move_1or2:
	MOVB	(SI), AX
	MOVB	-1(SI)(BX*1), CX
	MOVB	AX, (DI)
	MOVB	CX, -1(DI)(BX*1)
move_0:
	RET
move_3or4:
	MOVW	(SI), AX
	MOVW	-2(SI)(BX*1), CX
	MOVW	AX, (DI)
	MOVW	CX, -2(DI)(BX*1)
	RET
move_5through8:
	MOVL	(SI), AX
	MOVL	-4(SI)(BX*1), CX
	MOVL	AX, (DI)
	MOVL	CX, -4(DI)(BX*1)
	RET
move_9through16:
	MOVL	(SI), AX
	MOVL	4(SI), CX
	MOVL	-8(SI)(BX*1), DX
	MOVL	-4(SI)(BX*1), BP
	MOVL	AX, (DI)
	MOVL	CX, 4(DI)
	MOVL	DX, -8(DI)(BX*1)
	MOVL	BP, -4(DI)(BX*1)
	RET
move_17through32:
	MOVOU	(SI), X0
	MOVOU	-16(SI)(BX*1), X1
	MOVOU	X0, (DI)
	MOVOU	X1, -16(DI)(BX*1)
	RET
move_33through64:
	MOVOU	(SI), X0
	MOVOU	16(SI), X1
	MOVOU	-32(SI)(BX*1), X2
	MOVOU	-16(SI)(BX*1), X3
	MOVOU	X0, (DI)
	MOVOU	X1, 16(DI)
	MOVOU	X2, -32(DI)(BX*1)
	MOVOU	X3, -16(DI)(BX*1)
	RET
move_65through128:
	MOVOU	(SI), X0
	MOVOU	16(SI), X1
	MOVOU	32(SI), X2
	MOVOU	48(SI), X3
	MOVOU	-64(SI)(BX*1), X4
	MOVOU	-48(SI)(BX*1), X5
	MOVOU	-32(SI)(BX*1), X6
	MOVOU	-16(SI)(BX*1), X7
	MOVOU	X0, (DI)
	MOVOU	X1, 16(DI)
	MOVOU	X2, 32(DI)
	MOVOU	X3, 48(DI)
	MOVOU	X4, -64(DI)(BX*1)
	MOVOU	X5, -48(DI)(BX*1)
	MOVOU	X6, -32(DI)(BX*1)
	MOVOU	X7, -16(DI)(BX*1)
	RET
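
the straightline cases above all use the same trick: load the head and the tail of the source into registers as two (or more) possibly overlapping words, then store them. since every load happens before any store, overlapping source and destination need no choice of copy direction. a sketch of the 5 to 8 byte case in portable c, with the corresponding instruction from move_5through8 beside each step; the function name is made up for illustration and the caller is assumed to guarantee 5 <= n <= 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * copy n bytes, 5 <= n <= 8, as two 4-byte words: the first four
 * bytes and the last four. the words overlap when n < 8. both loads
 * complete before either store, so src and dst may overlap.
 */
static void
move5through8(unsigned char *dst, const unsigned char *src, size_t n)
{
	uint32_t head, tail;

	memcpy(&head, src, 4);		/* MOVL (SI), AX */
	memcpy(&tail, src + n - 4, 4);	/* MOVL -4(SI)(BX*1), CX */
	memcpy(dst, &head, 4);		/* MOVL AX, (DI) */
	memcpy(dst + n - 4, &tail, 4);	/* MOVL CX, -4(DI)(BX*1) */
}
```

the sse2 cases are the same idea with 16-byte xmm loads in place of the 4-byte words.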