From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Sun, 2 Jun 2013 10:10:00 -0400
To: 9fans@9fans.net
Subject: Re: [9fans] Go and 21-bit runes (and a bit of Go status)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="upas-wnnfksrrovlprufyikfxhroafi"

--upas-wnnfksrrovlprufyikfxhroafi
Content-Disposition: inline
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

> Regarding the latter, Plan 9 does not allow floating point
> instructions to be executed within note handling, but erring on the
> side of caution also forbids instructions such as MOVOU (don't ask me)
> which is part of the SSE(2?) extension, but hardly qualifies as a
> floating point instruction.

movou (movdqu in the manual) is an sse2 data movement instruction.
not all sse2 instructions require that sse be turned on (pause, for example),
but movou uses at least one xmm register, so it is clearly using the sse
unit and therefore requires that sse be turned on.

the go runtime memmove uses movou for memmoves between 33 and 128 bytes.
i only see a 10 cycle difference for these cases on my atom machine
(maximum 13%), so we're not missing out on much here by not using sse.

the real win, or loss for the plan 9 memmove, is in the short memmoves:
for 1 to 16 bytes, rt·memmove takes 20 to 27 cycles/op against 56 to 92
for the plan 9 memmove.  but this is a µbenchmark, and it would be more
convincing with a real world test.
- erik

harness; 8.memmovetest
memmove
1 92.42578 cycles/op
2 81.28125 cycles/op
4 56.47266 cycles/op
8 58.32422 cycles/op
16 62.28516 cycles/op
32 70.26563 cycles/op
64 86.32031 cycles/op
128 118.3125 cycles/op
512 323.5078 cycles/op
1024 587.1094 cycles/op
4096 2119.242 cycles/op
131072 133058.5 cycles/op

rt·memmove
1 20.60156 cycles/op
2 20.34375 cycles/op
4 24.46875 cycles/op
8 22.42969 cycles/op
16 27.45703 cycles/op
32 52.82813 cycles/op
64 79.19531 cycles/op
128 129.1289 cycles/op
512 314.4492 cycles/op
1024 569.9648 cycles/op
4096 2132.297 cycles/op
131072 135378.3 cycles/op

--upas-wnnfksrrovlprufyikfxhroafi
Content-Disposition: attachment; filename=memmovetest.c
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

#include <u.h>
#include <libc.h>

typedef struct Movtab Movtab;
typedef void* (*Movfn)(void*, void*, ulong);

/* cpuid edx bit 26 (sse2) set, so runtimememmove takes its movou paths */
u32int	runtimecpuid_edx = 0x4000000;
extern	void*	runtimememmove(void*, void*, ulong);

struct Movtab {
	Movfn	f;
	char	*name;
};

uvlong	hz;
int	sztab[] = {1, 2, 4, 8, 16, 32, 64, 128, 512, 1024, 4096, 128*1024, };
uchar	buf0[128*1024];
uchar	buf1[128*1024];

Movtab	movtab[] = {
	{memmove,	"memmove"},
	{runtimememmove,	"rt·memmove"},
};

/* ticks per second: the fourth field of /dev/time */
uvlong
gethz(void)
{
	char buf[1024], *f[5];
	int n, fd;

	fd = open("/dev/time", OREAD);
	if(fd == -1)
		sysfatal("%s: open /dev/time: %r", argv0);
	n = pread(fd, buf, sizeof buf-1, 0);
	if(n <= 0)
		sysfatal("%s: read /dev/time: %r", argv0);
	buf[n] = 0;
	n = tokenize(buf, f, nelem(f));
	if(n < 4)
		sysfatal("%s: /dev/time: unexpected fmt", argv0);
	return strtoull(f[3], 0, 0);
}

/* 1024 copies of sz bytes from buf0 to buf1 */
void
inner(Movfn f, ulong sz)
{
	int i;

	for(i = 0; i < 1024; i++)
		f(buf1, buf0, sz);
}

void
main(int argc, char **argv)
{
	int i, j;
	uvlong t[2], c[nelem(movtab)][nelem(sztab)];
	Movfn f;

	ARGBEGIN{
	}ARGEND

	hz = gethz();
	for(i = 0; i < nelem(movtab); i++){
		print("%s\n", movtab[i].name);
		f = movtab[i].f;
		for(j = 0; j < nelem(sztab); j++){
			/* cycles(2) reads the processor's cycle counter */
			cycles(t + 0);
			inner(f, sztab[j]);
			cycles(t + 1);
			c[i][j] = t[1] - t[0];
			print("%d %g cycles/op\n", sztab[j], c[i][j]/1024.);
			sleep(0);
		}
		print("\n");
	}
	exits("");
}

--upas-wnnfksrrovlprufyikfxhroafi
Content-Type: multipart/mixed; boundary="upas-mfqjrtepgmkpqtrwftyogceoyk"
Content-Disposition: inline

--upas-mfqjrtepgmkpqtrwftyogceoyk
Content-Disposition: attachment; filename=memmove_386.s
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

// Inferno's libkern/memmove-386.s
// http://code.google.com/p/inferno-os/source/browse/libkern/memmove-386.s
//
// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
// Portions Copyright 2009 The Go Authors. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

#define MOVOU MOVDQU

TEXT runtimememmove(SB), $0
	MOVL	to+0(FP), DI
	MOVL	fr+4(FP), SI
	MOVL	n+8(FP), BX

	// REP instructions have a high startup cost, so we handle small sizes
	// with some straightline code. The REP MOVSL instruction is really fast
	// for large sizes. The cutover is approximately 1K. We implement up to
	// 128 because that is the maximum SSE register load (loading all data
	// into registers lets us ignore copy direction).
tail:
	TESTL	BX, BX
	JEQ	move_0
	CMPL	BX, $2
	JBE	move_1or2
	CMPL	BX, $4
	JBE	move_3or4
	CMPL	BX, $8
	JBE	move_5through8
	CMPL	BX, $16
	JBE	move_9through16
	TESTL	$0x4000000, runtimecpuid_edx(SB) // check for sse2
	JEQ	nosse2
	CMPL	BX, $32
	JBE	move_17through32
	CMPL	BX, $64
	JBE	move_33through64
	CMPL	BX, $128
	JBE	move_65through128
	// TODO: use branch table and BSR to make this just a single dispatch

nosse2:
/*
 * check and set for backwards
 */
	CMPL	SI, DI
	JLS	back

/*
 * forward copy loop
 */
forward:
	MOVL	BX, CX
	SHRL	$2, CX
	ANDL	$3, BX

	REP;	MOVSL
	JMP	tail
/*
 * check overlap
 */
back:
	MOVL	SI, CX
	ADDL	BX, CX
	CMPL	CX, DI
	JLS	forward
/*
 * whole thing backwards has
 * adjusted addresses
 */

	ADDL	BX, DI
	ADDL	BX, SI
	STD

/*
 * copy
 */
	MOVL	BX, CX
	SHRL	$2, CX
	ANDL	$3, BX

	SUBL	$4, DI
	SUBL	$4, SI
	REP;	MOVSL

	CLD
	ADDL	$4, DI
	ADDL	$4, SI
	SUBL	BX, DI
	SUBL	BX, SI
	JMP	tail

// each move_* case loads the head and tail of the source (the two may
// overlap) into registers before storing anything, so it is correct
// whichever way the buffers overlap.
move_1or2:
	MOVB	(SI), AX
	MOVB	-1(SI)(BX*1), CX
	MOVB	AX, (DI)
	MOVB	CX, -1(DI)(BX*1)
move_0:
	RET
move_3or4:
	MOVW	(SI), AX
	MOVW	-2(SI)(BX*1), CX
	MOVW	AX, (DI)
	MOVW	CX, -2(DI)(BX*1)
	RET
move_5through8:
	MOVL	(SI), AX
	MOVL	-4(SI)(BX*1), CX
	MOVL	AX, (DI)
	MOVL	CX, -4(DI)(BX*1)
	RET
move_9through16:
	MOVL	(SI), AX
	MOVL	4(SI), CX
	MOVL	-8(SI)(BX*1), DX
	MOVL	-4(SI)(BX*1), BP
	MOVL	AX, (DI)
	MOVL	CX, 4(DI)
	MOVL	DX, -8(DI)(BX*1)
	MOVL	BP, -4(DI)(BX*1)
	RET
move_17through32:
	MOVOU	(SI), X0
	MOVOU	-16(SI)(BX*1), X1
	MOVOU	X0, (DI)
	MOVOU	X1, -16(DI)(BX*1)
	RET
move_33through64:
	MOVOU	(SI), X0
	MOVOU	16(SI), X1
	MOVOU	-32(SI)(BX*1), X2
	MOVOU	-16(SI)(BX*1), X3
	MOVOU	X0, (DI)
	MOVOU	X1, 16(DI)
	MOVOU	X2, -32(DI)(BX*1)
	MOVOU	X3, -16(DI)(BX*1)
	RET
move_65through128:
	MOVOU	(SI), X0
	MOVOU	16(SI), X1
	MOVOU	32(SI), X2
	MOVOU	48(SI), X3
	MOVOU	-64(SI)(BX*1), X4
	MOVOU	-48(SI)(BX*1), X5
	MOVOU	-32(SI)(BX*1), X6
	MOVOU	-16(SI)(BX*1), X7
	MOVOU	X0, (DI)
	MOVOU	X1, 16(DI)
	MOVOU	X2, 32(DI)
	MOVOU	X3, 48(DI)
	MOVOU	X4, -64(DI)(BX*1)
	MOVOU	X5, -48(DI)(BX*1)
	MOVOU	X6, -32(DI)(BX*1)
	MOVOU	X7, -16(DI)(BX*1)
	RET

--upas-mfqjrtepgmkpqtrwftyogceoyk--

--upas-wnnfksrrovlprufyikfxhroafi--
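The attached memmove_386.s rests on two ideas. For short copies it loads the head and tail of the source into registers before storing anything, so overlapping buffers need no special case. For longer copies without sse2 it chooses a copy direction with the two comparisons at nosse2 and back. Below is a rough portable sketch of both, assuming ISO C (<string.h>, <stdint.h>) rather than Plan 9 C. It is an illustration, not the runtime's code: plain byte loops stand in for REP MOVSL and the MOVOU cases, and the names move_small, sketch_memmove and sketch_selftest are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* copy 1..16 bytes: load the head and tail of src (which may overlap)
   into locals before storing anything, as move_1or2..move_9through16 do.
   memcpy into a fixed-width local is the portable way to write an
   unaligned load. */
static void
move_small(unsigned char *dst, const unsigned char *src, size_t n)
{
	assert(n <= 16);
	if(n == 0)
		return;
	if(n <= 2){
		uint8_t a = src[0], b = src[n-1];
		dst[0] = a;
		dst[n-1] = b;
	}else if(n <= 4){
		uint16_t a, b;
		memcpy(&a, src, 2);
		memcpy(&b, src + n - 2, 2);
		memcpy(dst, &a, 2);
		memcpy(dst + n - 2, &b, 2);
	}else if(n <= 8){
		uint32_t a, b;
		memcpy(&a, src, 4);
		memcpy(&b, src + n - 4, 4);
		memcpy(dst, &a, 4);
		memcpy(dst + n - 4, &b, 4);
	}else{
		/* 9..16 bytes: two words from the front, two from the back */
		uint32_t a, b, c, d;
		memcpy(&a, src, 4);
		memcpy(&b, src + 4, 4);
		memcpy(&c, src + n - 8, 4);
		memcpy(&d, src + n - 4, 4);
		memcpy(dst, &a, 4);
		memcpy(dst + 4, &b, 4);
		memcpy(dst + n - 8, &c, 4);
		memcpy(dst + n - 4, &d, 4);
	}
}

/* hypothetical name; byte loops stand in for REP MOVSL and MOVOU */
void *
sketch_memmove(void *to, const void *from, size_t n)
{
	unsigned char *d = to;
	const unsigned char *s = from;
	uintptr_t si = (uintptr_t)s, di = (uintptr_t)d;
	size_t i;

	if(n <= 16){
		move_small(d, s, n);
		return to;
	}
	/* same test as nosse2/back: copy backwards only when
	   src <= dst and [src, src+n) reaches into dst */
	if(si <= di && si + n > di){
		for(i = n; i > 0; i--)
			d[i-1] = s[i-1];
	}else{
		for(i = 0; i < n; i++)
			d[i] = s[i];
	}
	return to;
}

/* hypothetical check against the C library memmove: every length up to
   64, every src/dst offset below 16, overlapping or not; 1 if all agree */
int
sketch_selftest(void)
{
	unsigned char a[96], b[96];
	size_t n, so, doff, i;

	for(n = 0; n <= 64; n++)
		for(so = 0; so < 16; so++)
			for(doff = 0; doff < 16; doff++){
				for(i = 0; i < sizeof a; i++)
					a[i] = b[i] = (unsigned char)(i*7 + 1);
				memmove(b + doff, b + so, n);
				sketch_memmove(a + doff, a + so, n);
				if(memcmp(a, b, sizeof a) != 0)
					return 0;
			}
	return 1;
}
```

sketch_selftest exercises the overlapping cases in both directions, which is where the head/tail trick and the backward copy earn their keep. The short path has no loop at all, which fits the benchmark above, where the gap between the two memmoves is widest below 16 bytes.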