From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@coraid.com>
Date: Fri,  5 Dec 2008 14:40:34 -0500
To: 9fans@9fans.net
Message-ID: <c14dcfe176eba456974fe04fb82948af@coraid.com>
In-Reply-To: <4939815F.9020509@telus.net>
References: <13426df10812042239pde2100dw696049def0160c4a@mail.gmail.com>
	<39cb2be32e592403f7336c6200cf56a3@quanstro.net>
	<dd6fe68a0812051027t5661d7ebs81cce9a4ca0a6b7f@mail.gmail.com>
	<dd6fe68a0812051032i7fad822bx34fdb9cc704f280e@mail.gmail.com>
	<13426df10812051049j40b40b78u4ae74a3fc7df07a3@mail.gmail.com>
	<49397F3E.9070801@telus.net>
	<57cb40901c57600ac592ec15ccb1a687@coraid.com>
	<4939815F.9020509@telus.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] image/memimage speed
Topicbox-Message-UUID: 5b4262dc-ead4-11e9-9d60-3106f5b1d025

On Fri Dec  5 14:32:56 EST 2008, plalonde@telus.net wrote:
> But random access patterns suck at being speculatively cached.
> Linear access patterns still require reasonably careful work for the
> caching to do the right thing.
> Expecting your entire frame buffer to be cached in L2 isn't particularly
> reasonable.
>
> Paul

i'm just not convinced that nvidia's poor performance has
anything to do with pcie latency or processor stalls.
a 500x500 window takes ~1sec to uncover.  that's like
2 billion instructions.  since a cacheline is ~128 bytes
(close enough)  that's ~8000 stall opportunities.  if it
takes all of them at ~1000 instructions apiece, that's only
8 million instructions.  well under 1/100th of the actual
delay.  if WC
were the issue, i should see 100x improvement in reading
from the card.
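
a rough sketch of the arithmetic above, for anyone who wants to
plug in their own numbers.  two figures here are assumptions and
are not stated in the message: 4 bytes per pixel (which is what
makes ~8000 cachelines come out) and ~1000 instructions lost per
stall (which is what makes ~8 million come out).  the function
name is made up for illustration.

```python
# back-of-envelope check of the stall estimate.
# assumptions not stated in the message: 4 bytes per pixel and
# ~1000 instructions lost per stall (chosen so the totals match).
WIDTH = HEIGHT = 500
BYTES_PER_PIXEL = 4             # assumed 32-bit pixels
CACHELINE = 128                 # bytes, "close enough"
INSTR_PER_STALL = 1000          # assumed cost of one stall
INSTR_PER_SEC = 2_000_000_000   # ~2 billion instructions in ~1 sec


def stall_budget(width, height, bpp, line, per_stall):
    """return (cachelines touched, instructions lost if every line stalls)."""
    nbytes = width * height * bpp
    lines = -(-nbytes // line)  # ceiling division
    return lines, lines * per_stall


lines, lost = stall_budget(WIDTH, HEIGHT, BYTES_PER_PIXEL,
                           CACHELINE, INSTR_PER_STALL)
fraction = lost / INSTR_PER_SEC  # share of the ~1 sec delay explained
print(lines, lost, fraction)
```

with these inputs the stalls account for well under 1% of the
observed delay, so even a much larger per-stall cost would not
close the gap.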

- erik