[9fans] image/memimage speed

* [9fans] image/memimage speed
@ 2008-11-30 22:00 Iruata Souza
  2008-11-30 23:54 ` Iruata Souza
  0 siblings, 1 reply; 25+ messages in thread
From: Iruata Souza @ 2008-11-30 22:00 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

9fans,

I'm playing with nearest-neighbor image resampling and wrote two
simple implementations - http://tmp.oitobits.net/iru/nn.c,
http://tmp.oitobits.net/iru/nnmem.c - one using draw(2) and the other
using memdraw(2).

running them on the same image on disk shows that nnmem is way faster:

cpu% nnmem acme.wd
spent 0.127344 seconds on resampling
cpu% nn acme.wd
spent 6.111893 seconds on resampling

looking at the code you see nn.c calls unloadimage() to fill oscan
with the data from m; oscan is then used for the interpolation. that
pass is not needed in nnmem.c because byteaddr() gives us the
address of the first byte of data in m, the memimage in question.
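
For reference, nearest-neighbor resampling over a raw pixel buffer can be
sketched as below. This is a minimal Python sketch, not the code from nn.c
or nnmem.c; the function name, the packed row-major layout and the
bytes-per-pixel parameter are assumptions made for illustration. It works
on an in-memory buffer, which is what nnmem.c gets directly from byteaddr()
and what nn.c has to build with unloadimage().

```python
def resample_nearest(src, sw, sh, dw, dh, bpp=1):
    """Nearest-neighbor resample of a packed, row-major pixel buffer.

    src is a bytes-like object of sw*sh pixels, bpp bytes each
    (layout and names are illustrative assumptions).
    Returns a new bytearray of dw*dh pixels.
    """
    if len(src) != sw * sh * bpp:
        raise ValueError("source buffer size does not match dimensions")
    dst = bytearray(dw * dh * bpp)
    for dy in range(dh):
        # Map the destination row back to a source row (floor mapping).
        sy = dy * sh // dh
        for dx in range(dw):
            sx = dx * sw // dw
            s = (sy * sw + sx) * bpp
            d = (dy * dw + dx) * bpp
            dst[d:d + bpp] = src[s:s + bpp]
    return dst


# Doubling a 2x2 greyscale image repeats each pixel in a 2x2 block.
small = bytes([10, 20,
               30, 40])
big = resample_nearest(small, 2, 2, 4, 4)
```

The inner loop only indexes memory, so its cost depends on how quickly the
source bytes can be reached rather than on the arithmetic.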

what I'm seeking is a way to avoid the unloadimage() call in nn.c, if
that's possible - which, by my understanding of the manual and code,
is not.
alternatively I could try drawing the memimage to the screen, which I
did not find possible directly, only by converting it to an image.
any ideas?

sorry if I'm missing the obvious.

iru

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [9fans] image/memimage speed
  2008-11-30 22:00 [9fans] image/memimage speed Iruata Souza
@ 2008-11-30 23:54 ` Iruata Souza
  2008-12-01  1:29   ` erik quanstrom
  0 siblings, 1 reply; 25+ messages in thread
From: Iruata Souza @ 2008-11-30 23:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sun, Nov 30, 2008 at 8:00 PM, Iruata Souza <iru.muzgo@gmail.com> wrote:
> 9fans,
>
> I'm playing with nearest-neighbor image resampling and wrote two
> simple implementations - http://tmp.oitobits.net/iru/nn.c,
> http://tmp.oitobits.net/iru/nnmem.c - one using draw(2) and the other
> using memdraw(2).
>
> running them on the same image on disk shows that nnmem is way faster:
>
> cpu% nnmem acme.wd
> spent 0.127344 seconds on resampling
> cpu% nn acme.wd
> spent 6.111893 seconds on resampling
>
> looking at the code you see nn.c calls unloadimage() to fill oscan
> with the data from m; oscan is then used for the interpolation. that
> pass is not needed in nnmem.c because byteaddr() gives us the
> address of the first byte of data in m, the memimage in question.
>
> what I'm seeking is a way to avoid the unloadimage() call in nn.c, if
> that's possible - which, by my understanding of the manual and code,
> is not.
> alternatively I could try drawing the memimage to the screen, which I
> did not find possible directly, only by converting it to an image.
> any ideas?
>
> sorry if I'm missing the obvious.
>

mostly everything here is now understood by me. sorry for the noise.

iru



* Re: [9fans] image/memimage speed
  2008-11-30 23:54 ` Iruata Souza
@ 2008-12-01  1:29   ` erik quanstrom
  2008-12-01  1:54     ` andrey mirtchovski
  2008-12-01 14:19     ` Steve Simon
  0 siblings, 2 replies; 25+ messages in thread
From: erik quanstrom @ 2008-12-01  1:29 UTC (permalink / raw)
  To: 9fans

>> what I'm seeking is a way to avoid the unloadimage() call in nn.c, if
>> that's possible - which, by my understanding of the manual and code,
>> is not.
>> alternatively I could try drawing the memimage to the screen, which I
>> did not find possible directly, only by converting it to an image.
>> any ideas?
>>
>> sorry if I'm missing the obvious.
>>
>
> mostly everything here is now understood by me. sorry for the noise.

i think this is a good point.  reading from the frame buffer can
be deathly slow on a lot of modern video cards.  you're seeing a
factor of 60.  it might be a good idea to keep a copy of the
framebuffer in kernel memory.
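
As a quick check on the scale involved, the two timings from the original
post can be divided directly. This is only a sketch of the arithmetic (the
function name is made up for illustration); the ratio comes out at roughly
48, the same order of magnitude as the factor quoted above.

```python
def speedup(slow_seconds, fast_seconds):
    """How many times longer the slow run took than the fast one."""
    return slow_seconds / fast_seconds


# Timings reported in the original post.
nn_seconds = 6.111893      # draw(2) version, goes through unloadimage()
nnmem_seconds = 0.127344   # memdraw(2) version, reads memory directly

ratio = speedup(nn_seconds, nnmem_seconds)
print(f"nn took about {ratio:.0f}x as long as nnmem")
```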

i have been using a write-combining framebuffer for about four
months.  (implemented for the x86 architecture via the pat
bits in the page table.)  it has made drawing (writes to the
framebuffer) much faster, but, since reads from the frame buffer
are slow for different reasons, it doesn't help at all for operations
like unhiding windows.

- erik

* Re: [9fans] image/memimage speed
  2008-12-01  1:29   ` erik quanstrom
@ 2008-12-01  1:54     ` andrey mirtchovski
  2008-12-01  2:35       ` erik quanstrom
  2008-12-01 14:19     ` Steve Simon
  1 sibling, 1 reply; 25+ messages in thread
From: andrey mirtchovski @ 2008-12-01  1:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

can you report timings on the xscreensaver hacks (link at bottom)?

they have a "benchmarking" option -b S which lets you see how many fps
they're doing:

mk all; for (i in 8.*) { echo -n $i^': '; $i -b 5 }

would run each hack for 5 seconds and let you know what their fps is.

i used to get incredible fps reports in parallels, where everything
was in memory and read backs were fast.

andrey

http://mirtchovski.com/p9/xscr/xscr.tgz

* Re: [9fans] image/memimage speed
  2008-12-01  1:54     ` andrey mirtchovski
@ 2008-12-01  2:35       ` erik quanstrom
  2008-12-01  3:30         ` andrey mirtchovski
  2008-12-01  6:41         ` Paul Lalonde
  0 siblings, 2 replies; 25+ messages in thread
From: erik quanstrom @ 2008-12-01  2:35 UTC (permalink / raw)
  To: 9fans

> can you report timings on the xscreensaver hacks (link at bottom)?

sure.  a few of the programs didn't seem to work (decayscreen
and polydominoes).  mountain, sphere and spirograph seemed
to hum right along.

- erik

; cat '#P/archctl'
cpu AMD64 2604 pge
pge on
coherence mfence
cmpswap cmpswap486
i8253set on

; cat /dev/wsys/^`{cat /dev/winid}^/wctl
         60         505         765        1034 current visible

8.anemone: fps: 30.796417
8.anemotaxis: fps: 25.595581
8.attraction: fps: 55.993117
8.blaster: fps: 71.190817
8.boxfit: fps: 20.596372
8.ccurve: fps: 0.999865
8.cloudlife: fps: 22.795733
8.coral: fps: 70.787094
8.critical: fps: 151.975499
8.decayscreen: fps: 0.000000
8.deco: fps: 0.199968
8.deluxe: fps: 14.197528
8.demon: fps: 379.542993
8.discrete: fps: 17.997126
8.drift: fps: 8.998634
8.eruption: fps: 25.996981
8.euler2d: fps: 43.295463
8.fadeplot: fps: 51.393631
8.flame: fps: 7.598581
8.flow: fps: 45.593024
8.fluidballs: fps: 13.798253
8.forest: fps: 8.598666
8.fuzzyflakes: fps: 41.790739
8.galaxy: fps: 160.172456
8.glenda: fps: 46.295627
8.halftone: fps: 18.597379
8.helix: fps: 0.399954
8.hopalong: fps: 414.148296
8.hypercube: fps: 34.194590
8.hyperglenda: fps: 30.996313
8.ifs: fps: 37.993663
8.imsmap: fps: 0.199964
8.interaggregate: fps: 9.198770
8.interference: fps: 5.599390
8.julia: fps: 45.392443
8.lyap: fps: 45674.831450
8.maze: fps: 1.199848
8.moire: fps: 0.999855
8.mountain: fps: 31790.873361
8.munch: fps: 0.199971
8.nerverot: fps: 26.795984
8.pacman: fps: 59.190800
8.pedal: fps: 0.599921
8.petri: fps: 76.190348
8.pyro: fps: 305.364824
8.rd-bomb: fps: 32.995072
8.ripples: fps: 27.196419
8.rorschach: fps: 0.000000
8.rotzoomer: fps: 34.194205
8.scrdump: fps: 0.000000
8.sierpinski: fps: 46.592159
8.slip: fps: 1.799762
8.sphere: fps: 647.526941
8.spirograph: fps: 23604.015568
8.squiral: fps: 2129.662538
8.starfish: fps: 49.190284
8.strange: fps: 47.992470
8.substrate: fps: 0.000000
8.swirl: fps: 551.526029
8.thornbird: fps: 28.196393
8.triangle: fps: 66.588276
8.vermiculate: fps: 1357.648605
8.vines: fps: 87.190306
8.wander: fps: 51699.256387
8.whirlwindwarp: fps: 55.592024
8.wormhole: fps: 32.196066
8.zoom: fps: 2.399737




* Re: [9fans] image/memimage speed
  2008-12-01  2:35       ` erik quanstrom
@ 2008-12-01  3:30         ` andrey mirtchovski
  2008-12-01  6:41         ` Paul Lalonde
  1 sibling, 0 replies; 25+ messages in thread
From: andrey mirtchovski @ 2008-12-01  3:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

cool, thanks!

those do almost everything: loadimage() from bytes, drawing onto the
screen using itself as a source, drawing primitives (lines and
circles), alpha blending (glenda and hyperglenda) as well as
unloadimage() for the case where the screen is being resized.

i'll try to dig out a parallels installation tomorrow to compare your
results with.

andrey

* Re: [9fans] image/memimage speed
  2008-12-01  2:35       ` erik quanstrom
  2008-12-01  3:30         ` andrey mirtchovski
@ 2008-12-01  6:41         ` Paul Lalonde
  1 sibling, 0 replies; 25+ messages in thread
From: Paul Lalonde @ 2008-12-01  6:41 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs; +Cc: 9fans

Minor gripe as a graphics person - I loathe frames per second as a
performance measure.  Because fps is the inverse of frame time, you
can't compare differences in it directly.  Consider going from 1fps to
2fps versus going from 29fps to 30fps.  Both improve by 1fps, but they
represent dramatically different performance gains.

If you can at all, report milliseconds per frame, ideally with fixed
system overhead removed.
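
The 1fps/2fps versus 29fps/30fps comparison can be worked through in a few
lines. This is a minimal sketch of the conversion only; the function names
are illustrative, and it does not attempt the "fixed system overhead
removed" part, which depends on the program being measured.

```python
def ms_per_frame(fps):
    """Convert frames per second to milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def ms_saved(old_fps, new_fps):
    """Frame time saved, in milliseconds, going from old_fps to new_fps."""
    return ms_per_frame(old_fps) - ms_per_frame(new_fps)


# The same +1 fps step at both ends of the scale.
low_end = ms_saved(1, 2)      # 1000 ms -> 500 ms per frame
high_end = ms_saved(29, 30)   # about 34.5 ms -> 33.3 ms per frame
print(f"1 -> 2 fps saves {low_end:.1f} ms per frame")
print(f"29 -> 30 fps saves {high_end:.2f} ms per frame")
```

The first step saves 500 ms per frame and the second a little over 1 ms,
which is the difference the fps figures hide.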

The only time fps is appropriate is if you're targeting your v-blank
rate.

Yes, I've been reviewing too many papers with bar charts of fps
measures lately...

Paul

* Re: [9fans] image/memimage speed
  2008-12-01  1:29   ` erik quanstrom
  2008-12-01  1:54     ` andrey mirtchovski
@ 2008-12-01 14:19     ` Steve Simon
  2008-12-01 14:33       ` erik quanstrom
  1 sibling, 1 reply; 25+ messages in thread
From: Steve Simon @ 2008-12-01 14:19 UTC (permalink / raw)
  To: 9fans

> i think this is a good point.  reading from the frame buffer can
> be deathly slow on a lot of modern video cards.

Very true, the only exception to this I know of is some of the modern
Dual PCIExpress cards which use a bus in each direction.

-Steve



* Re: [9fans] image/memimage speed
  2008-12-01 14:19     ` Steve Simon
@ 2008-12-01 14:33       ` erik quanstrom
  2008-12-05  6:39         ` ron minnich
  0 siblings, 1 reply; 25+ messages in thread
From: erik quanstrom @ 2008-12-01 14:33 UTC (permalink / raw)
  To: 9fans

>
> Very true, the only exception to this I know of is some of the modern
> Dual PCIExpress cards which use a bus in each direction.
>

do you have a reference for "dual pciexpress"?  as far as i know,
pcie/agp/pci cards only have a single bus that goes both ways.

my limited understanding was that the reason that reading the
framebuffer was slow, was that there is no framebuffer.  it's an
illusion that the card provides that's easy to write but not so
easy to fake on reads.

someone with real understanding of graphics please correct me.

- erik

* Re: [9fans] image/memimage speed
  2008-12-01 14:33       ` erik quanstrom
@ 2008-12-05  6:39         ` ron minnich
  2008-12-05 13:35           ` erik quanstrom
  0 siblings, 1 reply; 25+ messages in thread
From: ron minnich @ 2008-12-05  6:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Dec 1, 2008 at 6:33 AM, erik quanstrom <quanstro@coraid.com> wrote:
>>
>> Very true, the only exception to this I know of is some of the modern
>> Dual PCIExpress cards which use a bus in each direction.
>>
>
> do you have a reference for "dual pciexpress"?  as far as i know,
> pcie/agp/pci cards only have a single bus that goes both ways.

how about this:
"SANTA CLARA, CA—JUNE 28, 2004—NVIDIA Corporation (Nasdaq: NVDA), the
worldwide leader in visual processing solutions, broadened its already
expansive graphics line today with the introduction of four new NVIDIA
Quadro(R) professional graphics solutions based on PCI Express™.
Leveraging this next-generation bus architecture, NVIDIA doubles the
bandwidth of its AGP 8X-based products to over 4GB per second in both
upstream and downstream data transfers.  "

I think the AGP asymmetry is long gone. Back when I was at LANL the
graphics guys were telling me that read bandwidth was no longer an
issue with the new pcie cards.

If you're still seeing bad performance it may be because you need to
fix up the MTRR or GART settings. I've done this dance and have no
memory at this point of what you do, but vague memory is that proper
MTRR settings with a good PCIe card will give you far better bandwidth
than the old AGP cards. There is nothing like the 60x asymmetry.

ron

* Re: [9fans] image/memimage speed
  2008-12-05  6:39         ` ron minnich
@ 2008-12-05 13:35           ` erik quanstrom
  2008-12-05 18:27             ` Russ Cox
  0 siblings, 1 reply; 25+ messages in thread
From: erik quanstrom @ 2008-12-05 13:35 UTC (permalink / raw)
  To: 9fans

> If you're still seeing bad performance it may be because you need to
> fix up the MTRR or GART settings. I've done this dance and have no
> memory at this point of what you do, but vague memory is that proper
> MTRR settings with a good PCIe card will give you far better bandwidth
> than the old AGP cards. There is nothing like the 60x asymmetry.

i don't think this is the case.  if you recall from the original
post, i have used the pat registers to set up memory types on
a pcie card and i do see dramatic speedups for drawing to
the screen.  however, reading from the screen is just as slow
as before.

according to intel's x86 arch guide vol 3a, §10-8, p. 466
speculative reads are allowed for WC/WT/WB memory.  so
i wouldn't think that it's a bus problem at all.

if you recall, the only difficulty in using subpixel
fonts a few years ago was the fact that for hold mode and
deselection, the α draw was done with the new mask and
the on-screen image, which was read from the frame buffer.
not only did this result in a squared α, it was also sloooow,
especially on nvidia cards.  oddly, the via machines driven
in vesa mode were the fastest.  the speed up was at least a
factor of 10; you could easily see the speedup.

at this point, you probably don't believe me.  so maybe
some numbers will help make the case:

; time dd -if /dev/wsys/1/screen -of /dev/null -bs 512k
0+1201 records in
0+1201 records out
0.00u 0.03s 4.04r 	 dd -if /dev/wsys/1/screen -of /dev/null ...
; time dd -if /dev/zero -of /dev/null -bs 512k -count 1201
1201+0 records in
1201+0 records out
0.00u 0.14s 0.14r 	 dd -if /dev/zero -of /dev/null ...

i've seen the same behavior on a number of different nvidia
cards of different generations.  newer cards seem to be worse
than older cards.  (if you have the programming interface
manuals, i'd be happy to double-check the settings.  ☺.)

can you explain what the downside of double-buffering
would be?  it's not like the days where we asked, hey buddy,
have you got 4 megs to spare?

- erik

* Re: [9fans] image/memimage speed
  2008-12-05 13:35           ` erik quanstrom
@ 2008-12-05 18:27             ` Russ Cox
  2008-12-05 18:32               ` Russ Cox
  2008-12-07 17:00               ` Aki Nyrhinen
  0 siblings, 2 replies; 25+ messages in thread
From: Russ Cox @ 2008-12-05 18:27 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> i don't think this is the case.  if you recall from the original
> post, i have used the pat registers to set up memory types on
> a pcie card and i do see dramatic speedups for drawing to
> the screen.  however, reading from the screen is just as slow
> as before.

I think the problem of

> can you explain what the downside of double-buffering
> would be?  it's not like the days where we asked, hey buddy,
> have you got 4 megs to spare?

you mean using a soft screen (a kernel copy of the video
memory, so that you only ever write to the video card).
double buffering is switching the screen between two
different copies of the screen image, only ever drawing
on the one that is not currently on the screen.

in answer to your question, that might be a fine thing to do
now that memory is more plentiful.  no one has been bothered
enough to do it.  you would lose the hardware acceleration
for fill and scroll, since you can't have the video card editing
the frame buffer directly--your copy would be out of sync.
on the other hand, writes are so fast that it probably wouldn't
matter.  the win for hw scroll is not reading from the frame buffer.

i think it's a pretty trivial change, since the relevant code
is all there for non-direct-mapped frame buffers anyway.
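
The soft-screen idea described above can be sketched as a shadow buffer
that answers every read from main memory and forwards every write to the
card. This is a minimal Python sketch under stated assumptions: the class,
its method names and the bytearray standing in for video memory are all
hypothetical, and none of it is Plan 9 kernel code.

```python
class SoftScreen:
    """Shadow copy of a frame buffer: reads hit RAM, writes go to both.

    `device` stands in for the card's memory; the class and its names
    are illustrative assumptions, not Plan 9 kernel code.
    """

    def __init__(self, device):
        self.device = device              # slow to read, fast to write
        self.shadow = bytearray(device)   # one-time copy into main memory
        self.device_reads = 1             # the initial snapshot
        self.device_writes = 0

    def read(self, off, n):
        # Served entirely from the shadow; the card is never read.
        return bytes(self.shadow[off:off + n])

    def write(self, off, data):
        if off < 0 or off + len(data) > len(self.shadow):
            raise ValueError("write outside the frame buffer")
        # Update the shadow first, then push the same bytes to the card.
        self.shadow[off:off + len(data)] = data
        self.device[off:off + len(data)] = data
        self.device_writes += 1


fb = bytearray(16)                 # pretend frame buffer
screen = SoftScreen(fb)
screen.write(4, b"\x01\x02\x03")
pixels = screen.read(4, 3)         # comes from the shadow copy
```

The sketch only stays correct while every change goes through write();
anything that edits the card's memory directly, such as hardware fill or
scroll, would leave the shadow out of sync, which is the trade-off noted
above.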

go for it.

russ

* Re: [9fans] image/memimage speed
  2008-12-05 18:27             ` Russ Cox
@ 2008-12-05 18:32               ` Russ Cox
  2008-12-05 18:49                 ` ron minnich
  2008-12-07 17:00               ` Aki Nyrhinen
  1 sibling, 1 reply; 25+ messages in thread
From: Russ Cox @ 2008-12-05 18:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Dec 5, 2008 at 10:27 AM, Russ Cox <rsc@swtch.com> wrote:
>> i don't think this is the case.  if you recall from the original
>> post, i have used the pat registers to set up memory types on
>> a pcie card and i do see dramatic speedups for drawing to
>> the screen.  however, reading from the screen is just as slow
>> as before.
>
> I think the problem of

I think the problem is that when video card manufacturers
optimize read bandwidth, they are working on the read bw
of the video card's RPC-like interface, not the read bw from
the video memory.

To a first approximation, no one reads directly from video memory.

Russ


* Re: [9fans] image/memimage speed
  2008-12-05 18:32               ` Russ Cox
@ 2008-12-05 18:49                 ` ron minnich
  2008-12-05 19:21                   ` Paul Lalonde
  0 siblings, 1 reply; 25+ messages in thread
From: ron minnich @ 2008-12-05 18:49 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Dec 5, 2008 at 10:32 AM, Russ Cox <rsc@swtch.com> wrote:

> To a first approximation, no one reads directly from video memory.

That is certainly true, but it's been a concern for some time for GPU
computing, and the chipset folks are paying attention.

ron



* Re: [9fans] image/memimage speed
  2008-12-05 18:49                 ` ron minnich
@ 2008-12-05 19:21                   ` Paul Lalonde
  2008-12-05 19:25                     ` erik quanstrom
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Lalonde @ 2008-12-05 19:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Again, you can stream a whole frame buffer reasonably fast - that should
be nearly full-rate; it should be full rate if you pre-fetch with
sufficient advance notice (500-1000 clocks), or DMA.  But random access
reads *have* to be slow: you get a stall while the system goes to PCIe
for each cache line you attempt to read from.

Paul

* Re: [9fans] image/memimage speed
  2008-12-05 19:21                   ` Paul Lalonde
@ 2008-12-05 19:25                     ` erik quanstrom
  2008-12-05 19:30                       ` Paul Lalonde
  0 siblings, 1 reply; 25+ messages in thread
From: erik quanstrom @ 2008-12-05 19:25 UTC (permalink / raw)
  To: 9fans

On Fri Dec  5 14:23:22 EST 2008, plalonde@telus.net wrote:
> Again, you can stream a whole frame buffer reasonably fast - that should
> be nearly full-rate; it should be full rate if you pre-fetch with
> sufficient advance notice (500-1000 clocks), or DMA.  But random access
> reads *have* to be slow: you get a stall while the system goes to PCIe
> for each cache line you attempt to read from.
>
> Paul

the cpu is allowed to speculatively cache WC memory.

- erik



* Re: [9fans] image/memimage speed
  2008-12-05 19:25                     ` erik quanstrom
@ 2008-12-05 19:30                       ` Paul Lalonde
  2008-12-05 19:40                         ` erik quanstrom
  2008-12-05 20:11                         ` ron minnich
  0 siblings, 2 replies; 25+ messages in thread
From: Paul Lalonde @ 2008-12-05 19:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

But random access patterns suck at being speculatively cached.
Linear access patterns still require reasonably careful work for the
caching to do the right thing.
Expecting your entire frame buffer to be cached in L2 isn't particularly
reasonable.

Paul

* Re: [9fans] image/memimage speed
  2008-12-05 19:30                       ` Paul Lalonde
@ 2008-12-05 19:40                         ` erik quanstrom
  2008-12-05 20:11                         ` ron minnich
  1 sibling, 0 replies; 25+ messages in thread
From: erik quanstrom @ 2008-12-05 19:40 UTC (permalink / raw)
  To: 9fans

On Fri Dec  5 14:32:56 EST 2008, plalonde@telus.net wrote:
> But random access patterns suck at being speculatively cached.
> Linear access patterns still require reasonably careful work for the
> caching to do the right thing.
> Expecting your entire frame buffer to be cached in L2 isn't particularly
> reasonable.
>
> Paul

i'm just not convinced that nvidia's poor performance has
anything to do with pcie latency or processor stalls.
a 500x500 window takes ~1sec to uncover.  that's like
2 billion instructions.  since a cacheline is ~128 bytes
(close enough)  that's ~8000 stall opportunities.  if it
takes all of them, that's only 8 million instructions.
on the order of 1/1000th of the actual delay.  if WC
were the issue, i should see 100x improvement in reading
from the card.
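
The back-of-the-envelope estimate above can be redone with the inputs made
explicit. This is a sketch only: 4 bytes per pixel is an assumption (the
message does not give a depth), 1000 clocks per stall is what the step
from ~8000 stalls to 8 million implies, and instructions and clocks are
treated as interchangeable, as the message does.

```python
import math


def stall_estimate(width, height, bytes_per_pixel, cacheline_bytes,
                   clocks_per_stall, clocks_available):
    """Return (cache lines touched, total stall clocks, fraction of budget)."""
    total_bytes = width * height * bytes_per_pixel
    lines = math.ceil(total_bytes / cacheline_bytes)
    stall_clocks = lines * clocks_per_stall
    return lines, stall_clocks, stall_clocks / clocks_available


lines, stall_clocks, fraction = stall_estimate(
    width=500, height=500,
    bytes_per_pixel=4,               # assumption: 32-bit pixels
    cacheline_bytes=128,             # figure used in the message
    clocks_per_stall=1000,           # assumption implied by 8000 -> 8 million
    clocks_available=2_000_000_000,  # ~1 second, as in the message
)
print(f"{lines} cache lines, {stall_clocks} stall clocks, "
      f"{fraction:.2%} of the observed delay")
```

Under these assumptions the stalls account for well under 1% of the
observed second, so the conclusion that per-line stalls alone do not
explain the delay holds across a wide range of inputs.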

- erik

* Re: [9fans] image/memimage speed
  2008-12-05 19:30                       ` Paul Lalonde
  2008-12-05 19:40                         ` erik quanstrom
@ 2008-12-05 20:11                         ` ron minnich
  2008-12-06  5:52                           ` Paul Lalonde
  1 sibling, 1 reply; 25+ messages in thread
From: ron minnich @ 2008-12-05 20:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Dec 5, 2008 at 11:30 AM, Paul Lalonde <plalonde@telus.net> wrote:
> But random access patterns suck at being speculatively cached.
> Linear access patterns still require reasonably careful work for the caching
> to do the right thing.
> Expecting your entire frame buffer to be cached in L2 isn't particularly
> reasonable.
>

I'm pretty sure we can put some #s on this discussion. It's too fuzzy for me.

Forget speculative reads, for now. Paul, what kind of time are you
seeing on your measurements to load a cache line over pcie from a
card?

ron



* Re: [9fans] image/memimage speed
  2008-12-05 20:11                         ` ron minnich
@ 2008-12-06  5:52                           ` Paul Lalonde
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Lalonde @ 2008-12-06  5:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I'll try to track down an actual PCIe card rather than a simulator and
run down some numbers on Monday.

Paul

* Re: [9fans] image/memimage speed
  2008-12-05 18:27             ` Russ Cox
  2008-12-05 18:32               ` Russ Cox
@ 2008-12-07 17:00               ` Aki Nyrhinen
  2008-12-07 23:22                 ` erik quanstrom
  1 sibling, 1 reply; 25+ messages in thread
From: Aki Nyrhinen @ 2008-12-07 17:00 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

for vgavesa, you can find this on /n/sources/patch/saved/vesasoftscreen.
it's been there for some time.

for all the cards that have support outside the vesa driver, it's probably
easiest to say monitor=vesa too, since you lose acceleration anyway.
the related mtrr or the pat patch is good to have with this, or take away
the mtrr() call from the patch.

i remember we discussed this a few years back, and you were mostly
concerned about losing accelerated ops then.

* Re: [9fans] image/memimage speed
  2008-12-07 17:00               ` Aki Nyrhinen
@ 2008-12-07 23:22                 ` erik quanstrom
  2008-12-08  0:17                   ` Aki Nyrhinen
  0 siblings, 1 reply; 25+ messages in thread
From: erik quanstrom @ 2008-12-07 23:22 UTC (permalink / raw)
  To: 9fans

On Sun Dec  7 12:02:10 EST 2008, anyrhine@gmail.com wrote:
> for vgavesa, you can find this on /n/sources/patch/saved/vesasoftscreen.
> it's been there for some time.
>
> for all the cards that have support outside the vesa driver, it's probably
> easiest to say monitor=vesa too, since you lose acceleration anyway.
> the related mtrr or the pat patch is good to have with this, or take away
> the mtrr() call from the patch.
>
> i remember we discussed this a few years back, and you were mostly
> concerned about losing accelerated ops then.

sure, but
- what about resolutions higher than vesa specifies,
- what about working with multiple processors
- have you had any luck with vesa on nvidia cards at all?

- erik



* Re: [9fans] image/memimage speed
  2008-12-07 23:22                 ` erik quanstrom
@ 2008-12-08  0:17                   ` Aki Nyrhinen
  0 siblings, 0 replies; 25+ messages in thread
From: Aki Nyrhinen @ 2008-12-08  0:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Dec 8, 2008 at 1:22 AM, erik quanstrom <quanstro@quanstro.net> wrote:
> On Sun Dec  7 12:02:10 EST 2008, anyrhine@gmail.com wrote:
>> for vgavesa, you can find this on /n/sources/patch/saved/vesasoftscreen.
>> it's been there for some time.
>>
>> for all the cards that have support outside the vesa driver, it's probably
>> easiest to say monitor=vesa too, since you lose acceleration anyway.
>> the related mtrr or the pat patch is good to have with this, or take away
>> the mtrr() call from the patch.
>>
>> i remember we discussed this a few years back, and you were mostly
>> concerned about losing accelerated ops then.
>
> sure, but
> - what about resolutions higher than vesa specifies,
> - what about working with multiple processors
> - have you had any luck with vesa on nvidia cards at all?

actually, i'm using a hack very similar to the vgavesa stuff above,
except for vganvidia, precisely for the reasons you mention (all of them),
plus the fact that acceleration is broken for my current nvidia card and i
could not care less.

the changes are just as trivial (and the same code can be used).
the reason why i didn't originally post the mtrr hack with all vga
drivers changed was that while the added buffering makes performance
much more constant, it is a loss for non-obscured foreground windows
for things like 'time du -a /usr', due to losing hwfill and hwscroll.

* Re: [9fans] image/memimage speed
  2008-12-01 15:24 plalonde
@ 2008-12-05  5:22 ` sqweek
  0 siblings, 0 replies; 25+ messages in thread
From: sqweek @ 2008-12-05  5:22 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Dec 2, 2008 at 12:24 AM,  <plalonde@telus.net> wrote:
> I think the real performance issue for hardware where the frame buffer is in
> the PCIe shared memory aperture is that writes are write-through/coalesced
> on their way across the PCIe, but reads can't be, and so incur huge stalls.

 Hah, this sounds just like 9p's latency woes.
-sqweek



* Re: [9fans] image/memimage speed
@ 2008-12-01 15:24 plalonde
  2008-12-05  5:22 ` sqweek
  0 siblings, 1 reply; 25+ messages in thread
From: plalonde @ 2008-12-01 15:24 UTC (permalink / raw)
  To: 9fans; +Cc: 9fans

[-- Attachment #1: Type: text/html, Size: 1202 bytes --]
