9fans - fans of the OS Plan 9 from Bell Labs
From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: Re: [9fans] Go and 21-bit runes (and a bit of Go status)
Date: Sun,  2 Jun 2013 10:10:00 -0400
Message-ID: <ed8f64a5984a128fa8abd3236a1d901d@kw.quanstro.net>
In-Reply-To: <f1f540c3c271b35b940f50607d11a615@proxima.alt.za>

[-- Attachment #1: Type: text/plain, Size: 1570 bytes --]

> Regarding the latter, Plan 9 does not allow floating point
> instructions to be executed within note handling, but erring on the
> side of caution also forbids instructions such as MOVOU (don't ask me)
> which is part of the SSE(2?) extension, but hardly qualifies as a
> floating point instruction.

movou (movdqu in the manual) is an sse2 data movement instruction.
not all sse2 instructions require that sse be turned on (pause, for example),
but movou uses at least one xmm register so is clearly using the sse
unit, thus requiring that it be turned on.

the go runtime memmove uses movou for memmoves between 33 and 128
bytes.  i only see a 10 cycle difference for these cases on my atom machine
(maximum 13%), so we're not missing out on much here by not using sse.

the real win, or loss for the plan 9 memmove, is in the short memmoves.
but this is a µbenchmark, and it would be more convincing with a real
world test.

- erik

harness; 8.memmovetest
memmove
1	92.42578 cycles/op
2	81.28125 cycles/op
4	56.47266 cycles/op
8	58.32422 cycles/op
16	62.28516 cycles/op
32	70.26563 cycles/op
64	86.32031 cycles/op
128	118.3125 cycles/op
512	323.5078 cycles/op
1024	587.1094 cycles/op
4096	2119.242 cycles/op
131072	133058.5 cycles/op

rt·memmove
1	20.60156 cycles/op
2	20.34375 cycles/op
4	24.46875 cycles/op
8	22.42969 cycles/op
16	27.45703 cycles/op
32	52.82813 cycles/op
64	79.19531 cycles/op
128	129.1289 cycles/op
512	314.4492 cycles/op
1024	569.9648 cycles/op
4096	2132.297 cycles/op
131072	135378.3 cycles/op

[-- Attachment #2: memmovetest.c --]
[-- Type: text/plain, Size: 1526 bytes --]

#include <u.h>
#include <libc.h>

typedef struct Movtab Movtab;
typedef void* (*Movfn)(void*, void*, ulong);

/* cpuid edx feature bits read by runtimememmove; bit 26 (sse2) is forced on here */
	u32int	runtimecpuid_edx = 0x4000000;
extern	void*	runtimememmove(void*, void*, ulong);

struct Movtab {
	Movfn	f;
	char	*name;
};

uvlong	hz;
int	sztab[] = {1, 2, 4, 8, 16, 32, 64, 128, 512, 1024, 4096, 128*1024, };
uchar	buf0[128*1024];
uchar	buf1[128*1024];
Movtab	movtab[] = {memmove, "memmove",  runtimememmove, "rt·memmove", };
//Movfn	movtab[] = {memmove, runtimememmove};

uvlong
gethz(void)
{
	char buf[1024], *f[5];
	int n, fd;

	fd = open("/dev/time", OREAD);
	if(fd == -1)
		sysfatal("%s: open /dev/time: %r", argv0);
	n = pread(fd, buf, sizeof buf-1, 0);
	if(n <= 0)
		sysfatal("%s: read /dev/time: %r", argv0);

	buf[n] = 0;
	n = tokenize(buf, f, nelem(f));
	if(n < 4)
		sysfatal("%s: /dev/time: unexpected fmt", argv0);

	return strtoull(f[3], 0, 0);
}

/* call f 1024 times so the cost of reading the cycle counter is amortized */
void
inner(Movfn f, ulong sz)
{
	int i;

	for(i = 0; i < 1024; i++)
		f(buf1, buf0, sz);
}

void
main(int argc, char **argv)
{
	int i, j;
	uvlong t[2], c[nelem(movtab)][nelem(sztab)];
//	double dhz;
	Movfn f;

	ARGBEGIN{
	}ARGEND

	hz = gethz();
//	dhz = hz;

	for(i = 0; i < nelem(movtab); i++){
		print("%s\n", movtab[i].name);
		f = movtab[i].f;
		for(j = 0; j < nelem(sztab); j++){
			cycles(t + 0);
			inner(f, sztab[j]);
			cycles(t + 1);
			c[i][j] = t[1] - t[0];

			print("%d	%g cycles/op\n", sztab[j], c[i][j]/1024.);
			sleep(0);
		}
		print("\n");
	}

	exits("");
}

[-- Attachment #3.1: Type: text/plain, Size: 331 bytes --]

from postmaster@kw:
The following attachment had content that we can't
prove to be harmless.  To avoid possible automatic
execution, we changed the content headers.
The original header was:

	Content-Disposition: attachment; filename=memmove_386.s
	Content-Type: text/plain; charset="UTF-8"
	Content-Transfer-Encoding: 8bit

[-- Attachment #3.2: memmove_386.s.suspect --]
[-- Type: application/octet-stream, Size: 4116 bytes --]

// Inferno's libkern/memmove-386.s
// http://code.google.com/p/inferno-os/source/browse/libkern/memmove-386.s
//
//         Copyright © 1994-1999 Lucent Technologies Inc.  All rights reserved.
//         Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com).  All rights reserved.
//         Portions Copyright 2009 The Go Authors. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
#define MOVOU	MOVDQU

TEXT runtimememmove(SB), $0
	MOVL	to+0(FP), DI
	MOVL	fr+4(FP), SI
	MOVL	n+8(FP), BX

	// REP instructions have a high startup cost, so we handle small sizes
	// with some straightline code.  The REP MOVSL instruction is really fast
	// for large sizes.  The cutover is approximately 1K.  We implement up to
	// 128 because that is the maximum SSE register load (loading all data
	// into registers lets us ignore copy direction).
tail:
	TESTL	BX, BX
	JEQ	move_0
	CMPL	BX, $2
	JBE	move_1or2
	CMPL	BX, $4
	JBE	move_3or4
	CMPL	BX, $8
	JBE	move_5through8
	CMPL	BX, $16
	JBE	move_9through16
	TESTL	$0x4000000, runtimecpuid_edx(SB) // check for sse2
	JEQ	nosse2
	CMPL	BX, $32
	JBE	move_17through32
	CMPL	BX, $64
	JBE	move_33through64
	CMPL	BX, $128
	JBE	move_65through128
	// TODO: use branch table and BSR to make this just a single dispatch

nosse2:
/*
 * check and set for backwards
 */
	CMPL	SI, DI
	JLS	back

/*
 * forward copy loop
 */
forward:
	MOVL	BX, CX
	SHRL	$2, CX
	ANDL	$3, BX

	REP;	MOVSL
	JMP	tail
/*
 * check overlap
 */
back:
	MOVL	SI, CX
	ADDL	BX, CX
	CMPL	CX, DI
	JLS	forward
/*
 * whole thing backwards has
 * adjusted addresses
 */

	ADDL	BX, DI
	ADDL	BX, SI
	STD

/*
 * copy
 */
	MOVL	BX, CX
	SHRL	$2, CX
	ANDL	$3, BX

	SUBL	$4, DI
	SUBL	$4, SI
	REP;	MOVSL

	CLD
	ADDL	$4, DI
	ADDL	$4, SI
	SUBL	BX, DI
	SUBL	BX, SI
	JMP	tail

move_1or2:
	MOVB	(SI), AX
	MOVB	-1(SI)(BX*1), CX
	MOVB	AX, (DI)
	MOVB	CX, -1(DI)(BX*1)
move_0:
	RET
move_3or4:
	MOVW	(SI), AX
	MOVW	-2(SI)(BX*1), CX
	MOVW	AX, (DI)
	MOVW	CX, -2(DI)(BX*1)
	RET
move_5through8:
	MOVL	(SI), AX
	MOVL	-4(SI)(BX*1), CX
	MOVL	AX, (DI)
	MOVL	CX, -4(DI)(BX*1)
	RET
move_9through16:
	MOVL	(SI), AX
	MOVL	4(SI), CX
	MOVL	-8(SI)(BX*1), DX
	MOVL	-4(SI)(BX*1), BP
	MOVL	AX, (DI)
	MOVL	CX, 4(DI)
	MOVL	DX, -8(DI)(BX*1)
	MOVL	BP, -4(DI)(BX*1)
	RET
move_17through32:
	MOVOU	(SI), X0
	MOVOU	-16(SI)(BX*1), X1
	MOVOU	X0, (DI)
	MOVOU	X1, -16(DI)(BX*1)
	RET
move_33through64:
	MOVOU	(SI), X0
	MOVOU	16(SI), X1
	MOVOU	-32(SI)(BX*1), X2
	MOVOU	-16(SI)(BX*1), X3
	MOVOU	X0, (DI)
	MOVOU	X1, 16(DI)
	MOVOU	X2, -32(DI)(BX*1)
	MOVOU	X3, -16(DI)(BX*1)
	RET
move_65through128:
	MOVOU	(SI), X0
	MOVOU	16(SI), X1
	MOVOU	32(SI), X2
	MOVOU	48(SI), X3
	MOVOU	-64(SI)(BX*1), X4
	MOVOU	-48(SI)(BX*1), X5
	MOVOU	-32(SI)(BX*1), X6
	MOVOU	-16(SI)(BX*1), X7
	MOVOU	X0, (DI)
	MOVOU	X1, 16(DI)
	MOVOU	X2, 32(DI)
	MOVOU	X3, 48(DI)
	MOVOU	X4, -64(DI)(BX*1)
	MOVOU	X5, -48(DI)(BX*1)
	MOVOU	X6, -32(DI)(BX*1)
	MOVOU	X7, -16(DI)(BX*1)
	RET
