9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] nvidia scrolling performance
@ 2006-04-29 21:39 erik quanstrom
  2006-04-30  4:00 ` jmk
  2006-04-30 16:10 ` Russ Cox
  0 siblings, 2 replies; 27+ messages in thread
From: erik quanstrom @ 2006-04-29 21:39 UTC (permalink / raw)
  To: 9fans

i've been using 1280x1024x32 on an nv18 with a greyscale font.
graphics performance is good.  page is able to pan large images
smoothly.  however, for a ~1000x750 window, i get these numbers
with various other load on the system:

	0.00u 0.00s 49.32r 	 cat /sys/games/lib/fortunes		# system is idle
	0.00u 0.00s 50.29r 	 cat /sys/games/lib/fortunes		# load avg of ~ 0.5
	0.00u 0.00s 50.71r 	 cat /sys/games/lib/fortunes		# load avg of ~ 0.9
	0.00u 0.01s 63.46r 	 cat /sys/games/lib/fortunes		# load avg > 1

running the same test with pelm.8, i get

	0.00u 0.00s 1.08r 	 cat /sys/games/lib/fortunes

the only obvious difference i see between the plan 9 driver and the xf86 stuff is
that the xf86 stuff tries to buffer dma access a bit.  is this the key, or should i be looking
at something else?

- erik



* Re: [9fans] nvidia scrolling performance
  2006-04-29 21:39 [9fans] nvidia scrolling performance erik quanstrom
@ 2006-04-30  4:00 ` jmk
  2006-04-30 16:10 ` Russ Cox
  1 sibling, 0 replies; 27+ messages in thread
From: jmk @ 2006-04-30  4:00 UTC (permalink / raw)
  To: 9fans

On Sat Apr 29 17:51:02 EDT 2006, quanstro@quanstro.net wrote:
> ...
> the only obvious difference i see between the plan 9 driver and the xf86 stuff is
> that the xf86 stuff tries to buffer dma access a bit.  is this the key, or should i be looking
> at something else?
>
> - erik

i think we've all noticed performance issues with the nvidia driver,
but we are at the mercy of the xfree86 code, which uses a different
model, and of finding the time to figure out what's up or to find a
different way of doing things. we'd all be grateful if you figured
out what's wrong. i spent
a little time on it last week after getting the geforce 6600 to work
with a dell 2405fpw but i couldn't see where to make it faster.

--jim



* Re: [9fans] nvidia scrolling performance
  2006-04-29 21:39 [9fans] nvidia scrolling performance erik quanstrom
  2006-04-30  4:00 ` jmk
@ 2006-04-30 16:10 ` Russ Cox
  2006-04-30 18:12   ` Steve Simon
  1 sibling, 1 reply; 27+ messages in thread
From: Russ Cox @ 2006-04-30 16:10 UTC (permalink / raw)
  To: 9fans

> the only obvious difference i see between the plan 9 driver and the xf86 stuff is
> that the xf86 stuff tries to buffer dma access a bit.  is this the key, or should i be looking
> at something else?

This isn't the nvidia driver.
The problem is that you're using greyscale fonts.
That requires dropping into the generic alpha code
instead of using the boolean alpha code that regular
character drawing uses.  The boolean alpha code
avoids some multiplies, which helps a little, and also
avoids reading from the frame buffer memory, which
helps a *lot*.
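
To make the difference concrete, here is a minimal sketch in C, not
the actual memdraw code.  It assumes one 8-bit channel per pixel; the
names draw_bool, draw_alpha, fb_read and fb_reads are hypothetical,
and fb_reads only counts how often the destination is read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter: number of reads from the destination buffer. */
size_t fb_reads;

static uint8_t fb_read(const uint8_t *fb, size_t i)
{
	fb_reads++;
	return fb[i];
}

/* Boolean mask: a pixel is either overwritten or left alone.
 * The destination is never read. */
void draw_bool(uint8_t *dst, const uint8_t *mask, uint8_t src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (mask[i])
			dst[i] = src;
}

/* Fractional (greyscale) mask: dst = (src*a + dst*(255-a)) / 255,
 * rounded.  Every pixel needs the old destination value. */
void draw_alpha(uint8_t *dst, const uint8_t *mask, uint8_t src, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned a = mask[i];
		unsigned d = fb_read(dst, i);
		dst[i] = (uint8_t)((src * a + d * (255 - a) + 127) / 255);
	}
}
```

In this sketch draw_bool leaves fb_reads untouched, while draw_alpha
does one destination read and two multiplies per pixel.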

Frame buffer memory is very very slow to read from,
and not just on nvidia.  When I did some timings six years
ago, I found that reading from frame buffer memory
was slower than reading from disk.  I'm sure the situation
hasn't gotten better.  It's not on the fast path for any
other system, so the vendors just don't care.

Now that memories have gotten bigger, it might be
worth keeping a copy of the screen image like in
the X port, but it's not clear to me how to reconcile
that with using acceleration.

Greyscale fonts are slow and ugly.  For more than
occasional use, just don't do it.

Russ




* Re: [9fans] nvidia scrolling performance
  2006-04-30 16:10 ` Russ Cox
@ 2006-04-30 18:12   ` Steve Simon
  2006-04-30 22:34     ` Ronald G Minnich
  0 siblings, 1 reply; 27+ messages in thread
From: Steve Simon @ 2006-04-30 18:12 UTC (permalink / raw)
  To: 9fans

> Frame buffer memory is very very slow to read from,
> and not just on nvidia.  When I did some timings six years
> ago, I found that reading from frame buffer memory
> was slower than reading from disk.  I'm sure the situation
> hasn't gotten better.  It's not on the fast path for any
> other system, so the vendors just don't care.

I may be talking rubbish but I understood this is a fundamental
problem with reading VGA memory over the PCI bus. VGA cards are
designed for fast writes and not fast reads.

People have been very interested in using the GPUs in graphics cards
to do video processing (to disk rather than for display) but the limiting
factor seems to have been the speed at which data can be read back.
I do hear that some cards are appearing with dual PCIX which will allow
symmetric access speeds to the frame buffer.

-Steve



* Re: [9fans] nvidia scrolling performance
  2006-04-30 18:12   ` Steve Simon
@ 2006-04-30 22:34     ` Ronald G Minnich
  2006-05-01  6:52       ` Nigel Roles
  0 siblings, 1 reply; 27+ messages in thread
From: Ronald G Minnich @ 2006-04-30 22:34 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Steve Simon wrote:

> I may be talking rubbish but I understood this is a fundamental
> problem with reading VGA memory over the PCI bus. VGA cards are
> designed for fast writes and not fast reads.

depends.

on pci and AGP it's an issue.
AGP has the asymmetric bandwidth built in.

on PCI express, it's supposed to get much better.


> I do hear that some cards are appearing with dual PCIX which will allow
> symmetric access speeds to the frame buffer.
>

Note that PCIX is not pci express (PCIe or PCI-e or PCI-E). It's so
confusing. PCIX is PCI-X which is an old, slow bus.

ron



* RE: [9fans] nvidia scrolling performance
  2006-04-30 22:34     ` Ronald G Minnich
@ 2006-05-01  6:52       ` Nigel Roles
  2006-05-01 19:58         ` Ronald G Minnich
  0 siblings, 1 reply; 27+ messages in thread
From: Nigel Roles @ 2006-05-01  6:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

9fans-bounces+ngr=9fs.org@cse.psu.edu wrote:
> Steve Simon wrote:
> 
>> I may be talking rubbish but I understood this is a fundamental
>> problem with reading VGA memory over the PCI bus. VGA cards are
>> designed for fast writes and not fast reads.
> 
> depends.
> 
> on pci and AGP it's an issue.
> AGP has the asymmetric bandwidth built in.
> 
> on PCI express, it's supposed to get much better.
> 
> 
>> I do hear that some cards are appearing with dual PCIX which will
>> allow symmetric access speeds to the frame buffer.
>> 
> 
> Note that PCIX is not pci express (PCIe or PCI-e or PCI-E). It's so
> confusing. PCIX is PCI-X which is an old, slow bus.
> 
> ron

PCIX is slow? Is 34Gbps not quick enough for you?





* Re: [9fans] nvidia scrolling performance
  2006-05-01  6:52       ` Nigel Roles
@ 2006-05-01 19:58         ` Ronald G Minnich
  2006-05-01 20:10           ` David Leimbach
  0 siblings, 1 reply; 27+ messages in thread
From: Ronald G Minnich @ 2006-05-01 19:58 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Nigel Roles wrote:
> 9fans-bounces+ngr=9fs.org@cse.psu.edu wrote:

> PCIX is slow? Is 34Gbps not quick enough for you?

I assume you mean PCIX, not pci-e.

And, no, it's not. YMMV.

ron



* Re: [9fans] nvidia scrolling performance
  2006-05-01 19:58         ` Ronald G Minnich
@ 2006-05-01 20:10           ` David Leimbach
  0 siblings, 0 replies; 27+ messages in thread
From: David Leimbach @ 2006-05-01 20:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 5/1/06, Ronald G Minnich <rminnich@lanl.gov> wrote:
> Nigel Roles wrote:
> > 9fans-bounces+ngr=9fs.org@cse.psu.edu wrote:
>
> > PCIX is slow? Is 34Gbps not quick enough for you?
>
> I assume you mean PCIX, not pci-e.
>
> And, no, it's not. YMMV.
>
> ron
>
How about that new inter-chassis Hypertransport? :)



* Re: [9fans] nvidia scrolling performance
@ 2006-05-05 18:08 erik quanstrom
  0 siblings, 0 replies; 27+ messages in thread
From: erik quanstrom @ 2006-05-05 18:08 UTC (permalink / raw)
  To: 9fans

perhaps there is some database-related literature that would be helpful?
SQL is the closest analog i can think of.

- erik

On Fri May  5 12:33:48 CDT 2006, plalonde@telus.net wrote:
> 
> 
> On 5-May-06, at 10:22 AM, erik quanstrom wrote:
> >
> > has anyone considered a declarative language for this sort of
> > programming?
> > or are shaders too irregular for that sort of thing?
> 
> Certainly that's part of what we're considering at Neoptica.  Shaders  
> aren't nearly as irregular as I used to think they could be.
> We're working on GPU/SPU languages & synchronization mechanisms that
> let the user specify what (declaratively) and let us generate the  
> how.  But the literature is sparse, particularly in this domain.
> 
> Paul



* Re: [9fans] nvidia scrolling performance
  2006-05-05 17:22 erik quanstrom
@ 2006-05-05 17:32 ` Paul Lalonde
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Lalonde @ 2006-05-05 17:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On 5-May-06, at 10:22 AM, erik quanstrom wrote:
>
> has anyone considered a declarative language for this sort of
> programming?
> or are shaders too irregular for that sort of thing?

Certainly that's part of what we're considering at Neoptica.  Shaders  
aren't nearly as irregular as I used to think they could be.
We're working on GPU/SPU languages & synchronization mechanisms that
let the user specify what (declaratively) and let us generate the  
how.  But the literature is sparse, particularly in this domain.

Paul

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEW4w/pJeHo/Fbu1wRAhyJAKCSpaWFH6UPXbx6RlmC9zf15E7AoACaA/Fj
Y9FVp5YtH4pIhzISTelr+K8=
=ORpa
-----END PGP SIGNATURE-----



* Re: [9fans] nvidia scrolling performance
  2006-05-05 17:07   ` Ronald G Minnich
@ 2006-05-05 17:30     ` Paul Lalonde
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Lalonde @ 2006-05-05 17:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm going to claim there is a difference.  The GPU handles all the  
threading issues needed to deal with the parallelism the GPU offers,  
at the cost of a restricted programming model (some random-access  
reads, no random access writes); the SPU offers a more general  
computing model, but you have to handle all the memory movement and  
synchronization issues.
You can choose to use the SPU as a GPU with many more registers and  
some extra random-access storage, but more general models can be  
applied, and useful.
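
A rough illustration in plain C of what "handle all the memory
movement" means.  This is not Cell SDK code: memcpy stands in for a
DMA transfer into the local store, and LOCAL_N and
sum_via_local_store are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical local-store capacity, in elements. */
enum { LOCAL_N = 4 };

/* Sum an array that lives in "main memory" by staging it through a
 * small local buffer one chunk at a time.  The programmer, not the
 * hardware, decides what is moved and when. */
float sum_via_local_store(const float *main_mem, size_t n)
{
	float local[LOCAL_N];
	float total = 0.0f;

	for (size_t off = 0; off < n; off += LOCAL_N) {
		size_t chunk = n - off < LOCAL_N ? n - off : LOCAL_N;

		/* stands in for a DMA "get" into the local store */
		memcpy(local, main_mem + off, chunk * sizeof local[0]);

		for (size_t i = 0; i < chunk; i++)
			total += local[i];
	}
	return total;
}
```

On a GPU of this era the equivalent staging is done by the hardware
and driver; here every transfer is written out by hand.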

Paul

On 5-May-06, at 10:07 AM, Ronald G Minnich wrote:

> Paul Lalonde wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>> Aw, but I'd claim all that fancy 3-D graphics stuff is real   
>> computation :-)
>> But yeah, GPU abuse for general purpose computation is just plain   
>> scary.  I thank my lucky stars that there is plenty of FLOPS to  
>> go  around in the Cell's SPUs.
>
> uh, there's no real difference between GPU computation and SPU  
> computation, saving there's more of them SPUs. All the problems apply.
>
> ron

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEW4unpJeHo/Fbu1wRAlkCAKC9dTCVlcN2jULQKEU5dMyK4L9gJwCeKvdN
2YjgFjnKlb9MCAeZ9bqzFLg=
=f0JX
-----END PGP SIGNATURE-----



* Re: [9fans] nvidia scrolling performance
@ 2006-05-05 17:22 erik quanstrom
  2006-05-05 17:32 ` Paul Lalonde
  0 siblings, 1 reply; 27+ messages in thread
From: erik quanstrom @ 2006-05-05 17:22 UTC (permalink / raw)
  To: 9fans

the abstract for glift contains this gem:

	Glift enables GPU programmers to separate algorithms from data structure 
	definitions; thereby greatly simplifying algorithmic development [...]

has anyone considered a declarative language for this sort of programming?
or are shaders too irregular for that sort of thing?

- erik

On Fri May  5 11:23:05 CDT 2006, plalonde@telus.net wrote:
> 
> I'm really hoping that we can find a way to get our users off the C/C++
> bandwagon (and that includes the high-level shading languages as
> well) and using something that can express the required computations
> more naturally.  There are some promising-looking functional
> approaches, but there's a huge barrier to adoption if it doesn't look  
> like C.
> 
> Paul
> 



* Re: [9fans] nvidia scrolling performance
  2006-05-05 15:56 ` Paul Lalonde
  2006-05-05 16:01   ` David Leimbach
  2006-05-05 16:05   ` Wes
@ 2006-05-05 17:07   ` Ronald G Minnich
  2006-05-05 17:30     ` Paul Lalonde
  2 siblings, 1 reply; 27+ messages in thread
From: Ronald G Minnich @ 2006-05-05 17:07 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Paul Lalonde wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Aw, but I'd claim all that fancy 3-D graphics stuff is real  computation 
> :-)
> But yeah, GPU abuse for general purpose computation is just plain  
> scary.  I thank my lucky stars that there is plenty of FLOPS to go  
> around in the Cell's SPUs.

uh, there's no real difference between GPU computation and SPU 
computation, saving there's more of them SPUs. All the problems apply.

ron



* Re: [9fans] nvidia scrolling performance
  2006-05-05 16:21     ` Paul Lalonde
@ 2006-05-05 16:59       ` David Leimbach
  0 siblings, 0 replies; 27+ messages in thread
From: David Leimbach @ 2006-05-05 16:59 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> I'm really hoping that we can find a way to get our users off the C/C++
> bandwagon (and that includes the high-level shading languages as
> well) and using something that can express the required computations
> more naturally.  There are some promising-looking functional
> approaches, but there's a huge barrier to adoption if it doesn't look
> like C.

DSLs (Domain Specific Languages) aren't as hip as they should be IMO. 
People like to do evil metaprogramming stuff to twist languages into
weird syntactical sugary notation, then want to cry when the compiler
produces some unintelligible error messages about what's going on.

General purpose languages have their place of course... but a language
that lets you work closer to the problem space seems to almost always
produce more elegant code.

If C were everything to everyone, Fortran wouldn't still be in heavy
use in certain circles.

Dave


>
> Paul
>
> On 5-May-06, at 9:01 AM, David Leimbach wrote:
>
> > On 5/5/06, Paul Lalonde <plalonde@telus.net> wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> Aw, but I'd claim all that fancy 3-D graphics stuff is real
> >> computation :-)
> >> But yeah, GPU abuse for general purpose computation is just plain
> >> scary.  I thank my lucky stars that there is plenty of FLOPS to go
> >> around in the Cell's SPUs.
> >>
> >
> > Eh, nvidia's working on making the GPUs more accessible (via
> > compilers, kind of like Cell) for more general purpose computation.
> >
> > The problem with FPGA, GPU, and "non-local" coprocessing cores is
> > usually the moving of data to them fast enough.  Cell shouldn't have
> > this problem and with the new hypertransport stuff coming out, it
> > looks like one can easily do NUMA like things inter-chassis too.
> >
> > I don't know if this is cost effective, but streaming parallelism to
> > special coprocessors can be a big win in HPC.
> >
> > Dave
> >
> >> Paul
> >>
> >> On 5-May-06, at 8:46 AM, erik quanstrom wrote:
> >>
> >> > if i were doing real computation, i wouldn't use a gpu i'd use a
> >> > cpu. ;-)
> >> >
> >> > - erik
> >> >
> >> >> 8G/s? Nowhere near enough.  Enough for text, but try doing real
> >> >> computation using that GPU...
> >> >> PS3 is running 25G/s bi-directional.  Those bits move.
> >> >>
> >>
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG v1.4.1 (Darwin)
> >>
> >> iD8DBQFEW3W0pJeHo/Fbu1wRAoOzAJ9C4d5WBnPm4hH1scoknQI1sFfuTgCgqC9c
> >> Ft6mIE9ogrlaD9ltrNkMmjg=
> >> =qWgd
> >> -----END PGP SIGNATURE-----
> >>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (Darwin)
>
> iD8DBQFEW3ulpJeHo/Fbu1wRAtbrAJ0RH5SpW4ZIx0W7BZIh3QXCRXt5MwCfYLVG
> 4DsnaEAu+s0hp/wAVsJZ5+U=
> =R5Qr
> -----END PGP SIGNATURE-----
>



* Re: [9fans] nvidia scrolling performance
  2006-05-05 16:08 erik quanstrom
@ 2006-05-05 16:42 ` David Leimbach
  0 siblings, 0 replies; 27+ messages in thread
From: David Leimbach @ 2006-05-05 16:42 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 5/5/06, erik quanstrom <quanstro@quanstro.net> wrote:
> conversely, isn't doing gpu work on the cpu cpu abuse? ;-)
> i mean we've got to have the proper caste system for processors
> and unwashed masses of special assemblers.
>
> - erik
>

I don't see it as abuse, just optimization.

> On Fri May  5 10:57:55 CDT 2006, plalonde@telus.net wrote:
>
> > Aw, but I'd claim all that fancy 3-D graphics stuff is real
> > computation :-)
> > But yeah, GPU abuse for general purpose computation is just plain
> > scary.  I thank my lucky stars that there is plenty of FLOPS to go
> > around in the Cell's SPUs.
> >
> > Paul
> >
> > On 5-May-06, at 8:46 AM, erik quanstrom wrote:
> >
> > > if i were doing real computation, i wouldn't use a gpu i'd use a
> > > cpu. ;-)
> > >
> > > - erik
> > >
> > >> 8G/s? Nowhere near enough.  Enough for text, but try doing real
> > >> computation using that GPU...
> > >> PS3 is running 25G/s bi-directional.  Those bits move.
> > >>
> >
>



* Re: [9fans] nvidia scrolling performance
  2006-05-05 16:01   ` David Leimbach
@ 2006-05-05 16:21     ` Paul Lalonde
  2006-05-05 16:59       ` David Leimbach
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Lalonde @ 2006-05-05 16:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The GPGPU stuff is making the computation much more accessible,
but at least in my field (games - uh make that real-time interactive  
graphical simulations) if you give me more throughput near video ram,  
I'll be sorely tempted to (surprise) use it to make more or better  
pictures.

The interesting thing about the GPU is that it exposes a (fairly)  
strict streaming computation model in which the user really only has  
control over the computation kernel, and very little control over the  
iteration construct.  That makes using the high levels of parallelism  
relatively easy and efficient.  The challenge is in expressing
non-trivial algorithms in streaming ways.  Aaron Lefohn's Glift suite
(http://graphics.idav.ucdavis.edu/graphics/publications/print_pub?pub_id=837)
is a nice wrapper around that material for more general
data structures on the GPU.  The downside is that you had better have  
a C++ compiler that does templates well.
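
The streaming model can be sketched on a CPU in a few lines of C.
This is only an analogy, not GPU or Glift code: stream_map plays the
part of the runtime that owns the iteration, and scale_bias is a
user-supplied kernel.  Both names are made up for the illustration.

```c
#include <stddef.h>

/* A kernel sees one input element and produces one output element. */
typedef float (*kernel_fn)(float);

/* The "runtime" owns the loop.  Order and partitioning are its
 * business, so it is free to run iterations in parallel. */
void stream_map(kernel_fn k, const float *in, float *out, size_t n)
{
	for (size_t i = 0; i < n; i++)
		out[i] = k(in[i]);
}

/* User code: no access to neighbours, no random-access writes. */
float scale_bias(float x)
{
	return 2.0f * x + 1.0f;
}
```

Calling stream_map(scale_bias, in, out, n) is all the user writes;
anything that needs scatter, or a dependency between iterations, has
to be recast to fit this shape.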

I'm really hoping that we can find a way to get our users off the C/C++
bandwagon (and that includes the high-level shading languages as
well) and using something that can express the required computations
more naturally.  There are some promising-looking functional
approaches, but there's a huge barrier to adoption if it doesn't look  
like C.

Paul

On 5-May-06, at 9:01 AM, David Leimbach wrote:

> On 5/5/06, Paul Lalonde <plalonde@telus.net> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Aw, but I'd claim all that fancy 3-D graphics stuff is real
>> computation :-)
>> But yeah, GPU abuse for general purpose computation is just plain
>> scary.  I thank my lucky stars that there is plenty of FLOPS to go
>> around in the Cell's SPUs.
>>
>
> Eh, nvidia's working on making the GPUs more accessible (via
> compilers, kind of like Cell) for more general purpose computation.
>
> The problem with FPGA, GPU, and "non-local" coprocessing cores is
> usually the moving of data to them fast enough.  Cell shouldn't have
> this problem and with the new hypertransport stuff coming out, it
> looks like one can easily do NUMA like things inter-chassis too.
>
> I don't know if this is cost effective, but streaming parallelism to
> special coprocessors can be a big win in HPC.
>
> Dave
>
>> Paul
>>
>> On 5-May-06, at 8:46 AM, erik quanstrom wrote:
>>
>> > if i were doing real computation, i wouldn't use a gpu i'd use a
>> > cpu. ;-)
>> >
>> > - erik
>> >
>> >> 8G/s? Nowhere near enough.  Enough for text, but try doing real
>> >> computation using that GPU...
>> >> PS3 is running 25G/s bi-directional.  Those bits move.
>> >>
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.1 (Darwin)
>>
>> iD8DBQFEW3W0pJeHo/Fbu1wRAoOzAJ9C4d5WBnPm4hH1scoknQI1sFfuTgCgqC9c
>> Ft6mIE9ogrlaD9ltrNkMmjg=
>> =qWgd
>> -----END PGP SIGNATURE-----
>>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEW3ulpJeHo/Fbu1wRAtbrAJ0RH5SpW4ZIx0W7BZIh3QXCRXt5MwCfYLVG
4DsnaEAu+s0hp/wAVsJZ5+U=
=R5Qr
-----END PGP SIGNATURE-----



* Re: [9fans] nvidia scrolling performance
@ 2006-05-05 16:08 erik quanstrom
  2006-05-05 16:42 ` David Leimbach
  0 siblings, 1 reply; 27+ messages in thread
From: erik quanstrom @ 2006-05-05 16:08 UTC (permalink / raw)
  To: 9fans

conversely, isn't doing gpu work on the cpu cpu abuse? ;-)
i mean we've got to have the proper caste system for processors
and unwashed masses of special assemblers.

- erik

On Fri May  5 10:57:55 CDT 2006, plalonde@telus.net wrote:

> Aw, but I'd claim all that fancy 3-D graphics stuff is real  
> computation :-)
> But yeah, GPU abuse for general purpose computation is just plain  
> scary.  I thank my lucky stars that there is plenty of FLOPS to go  
> around in the Cell's SPUs.
> 
> Paul
> 
> On 5-May-06, at 8:46 AM, erik quanstrom wrote:
> 
> > if i were doing real computation, i wouldn't use a gpu i'd use a  
> > cpu. ;-)
> >
> > - erik
> >
> >> 8G/s? Nowhere near enough.  Enough for text, but try doing real
> >> computation using that GPU...
> >> PS3 is running 25G/s bi-directional.  Those bits move.
> >>
> 



* Re: [9fans] nvidia scrolling performance
  2006-05-05 15:56 ` Paul Lalonde
  2006-05-05 16:01   ` David Leimbach
@ 2006-05-05 16:05   ` Wes
  2006-05-05 17:07   ` Ronald G Minnich
  2 siblings, 0 replies; 27+ messages in thread
From: Wes @ 2006-05-05 16:05 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


What kind of plan9 kernel would you like to compile:
auth-server
cpu-server
gpu-server

On 5/6/06, Paul Lalonde <plalonde@telus.net> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Aw, but I'd claim all that fancy 3-D graphics stuff is real
> computation :-)
> But yeah, GPU abuse for general purpose computation is just plain
> scary.  I thank my lucky stars that there is plenty of FLOPS to go
> around in the Cell's SPUs.
>
> Paul
>
> On 5-May-06, at 8:46 AM, erik quanstrom wrote:
>
> > if i were doing real computation, i wouldn't use a gpu i'd use a
> > cpu. ;-)
> >
> > - erik
> >
> >> 8G/s? Nowhere near enough.  Enough for text, but try doing real
> >> computation using that GPU...
> >> PS3 is running 25G/s bi-directional.  Those bits move.
> >>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (Darwin)
>
> iD8DBQFEW3W0pJeHo/Fbu1wRAoOzAJ9C4d5WBnPm4hH1scoknQI1sFfuTgCgqC9c
> Ft6mIE9ogrlaD9ltrNkMmjg=
> =qWgd
> -----END PGP SIGNATURE-----
>



* Re: [9fans] nvidia scrolling performance
  2006-05-05 15:56 ` Paul Lalonde
@ 2006-05-05 16:01   ` David Leimbach
  2006-05-05 16:21     ` Paul Lalonde
  2006-05-05 16:05   ` Wes
  2006-05-05 17:07   ` Ronald G Minnich
  2 siblings, 1 reply; 27+ messages in thread
From: David Leimbach @ 2006-05-05 16:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 5/5/06, Paul Lalonde <plalonde@telus.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Aw, but I'd claim all that fancy 3-D graphics stuff is real
> computation :-)
> But yeah, GPU abuse for general purpose computation is just plain
> scary.  I thank my lucky stars that there is plenty of FLOPS to go
> around in the Cell's SPUs.
>

Eh, nvidia's working on making the GPUs more accessible (via
compilers, kind of like Cell) for more general purpose computation.

The problem with FPGA, GPU, and "non-local" coprocessing cores is
usually the moving of data to them fast enough.  Cell shouldn't have
this problem and with the new hypertransport stuff coming out, it
looks like one can easily do NUMA like things inter-chassis too.

I don't know if this is cost effective, but streaming parallelism to
special coprocessors can be a big win in HPC.

Dave

> Paul
>
> On 5-May-06, at 8:46 AM, erik quanstrom wrote:
>
> > if i were doing real computation, i wouldn't use a gpu i'd use a
> > cpu. ;-)
> >
> > - erik
> >
> >> 8G/s? Nowhere near enough.  Enough for text, but try doing real
> >> computation using that GPU...
> >> PS3 is running 25G/s bi-directional.  Those bits move.
> >>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (Darwin)
>
> iD8DBQFEW3W0pJeHo/Fbu1wRAoOzAJ9C4d5WBnPm4hH1scoknQI1sFfuTgCgqC9c
> Ft6mIE9ogrlaD9ltrNkMmjg=
> =qWgd
> -----END PGP SIGNATURE-----
>



* Re: [9fans] nvidia scrolling performance
  2006-05-05 15:46 erik quanstrom
@ 2006-05-05 15:56 ` Paul Lalonde
  2006-05-05 16:01   ` David Leimbach
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Paul Lalonde @ 2006-05-05 15:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aw, but I'd claim all that fancy 3-D graphics stuff is real  
computation :-)
But yeah, GPU abuse for general purpose computation is just plain  
scary.  I thank my lucky stars that there is plenty of FLOPS to go  
around in the Cell's SPUs.

Paul

On 5-May-06, at 8:46 AM, erik quanstrom wrote:

> if i were doing real computation, i wouldn't use a gpu i'd use a  
> cpu. ;-)
>
> - erik
>
>> 8G/s? Nowhere near enough.  Enough for text, but try doing real
>> computation using that GPU...
>> PS3 is running 25G/s bi-directional.  Those bits move.
>>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEW3W0pJeHo/Fbu1wRAoOzAJ9C4d5WBnPm4hH1scoknQI1sFfuTgCgqC9c
Ft6mIE9ogrlaD9ltrNkMmjg=
=qWgd
-----END PGP SIGNATURE-----



* Re: [9fans] nvidia scrolling performance
@ 2006-05-05 15:46 erik quanstrom
  2006-05-05 15:56 ` Paul Lalonde
  0 siblings, 1 reply; 27+ messages in thread
From: erik quanstrom @ 2006-05-05 15:46 UTC (permalink / raw)
  To: 9fans

if i were doing real computation, i wouldn't use a gpu i'd use a cpu. ;-)

- erik

> 8G/s? Nowhere near enough.  Enough for text, but try doing real  
> computation using that GPU...
> PS3 is running 25G/s bi-directional.  Those bits move.
> 



* Re: [9fans] nvidia scrolling performance
  2006-05-01 17:57   ` Artem Letko
@ 2006-05-01 19:02     ` Russ Cox
  0 siblings, 0 replies; 27+ messages in thread
From: Russ Cox @ 2006-05-01 19:02 UTC (permalink / raw)
  To: 9fans

> what if we use hardware to do blits with alpha?

easier said than done.




* Re: [9fans] nvidia scrolling performance
  2006-05-01 17:44 ` Russ Cox
@ 2006-05-01 17:57   ` Artem Letko
  2006-05-01 19:02     ` Russ Cox
  0 siblings, 1 reply; 27+ messages in thread
From: Artem Letko @ 2006-05-01 17:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

what if we use hardware to do blits with alpha?

-art

On 5/1/06, Russ Cox <rsc@swtch.com> wrote:
> > why does the framebuffer need to be consulted when the background is a known,
> > solid color?
>
> the underlying operation is just plain draw,
> and drawing text corresponds to drawing
> solid black through the font (as a mask)
> onto the destination image.  if the mask has
> fractional alpha, that requires reading the
> destination image to do the mixing.
>
> the destination image might in this case
> be a known solid color, but in general it
> need not be.
>
> you could address this by adding a fourth argument
> to memdraw and then using it inside devdraw
> to specify a "read from this instead of the destination"
> image.  it's not clear to me that this is worth the bother,
> and it makes the interface less clean.  you'd also have
> to redo libframe to use stringbg everywhere.
>
> russ
>
>



* Re: [9fans] nvidia scrolling performance
  2006-05-01  0:30 erik quanstrom
@ 2006-05-01 17:44 ` Russ Cox
  2006-05-01 17:57   ` Artem Letko
  0 siblings, 1 reply; 27+ messages in thread
From: Russ Cox @ 2006-05-01 17:44 UTC (permalink / raw)
  To: 9fans

> why does the framebuffer need to be consulted when the background is a known,
> solid color?

the underlying operation is just plain draw,
and drawing text corresponds to drawing 
solid black through the font (as a mask)
onto the destination image.  if the mask has
fractional alpha, that requires reading the
destination image to do the mixing.

the destination image might in this case 
be a known solid color, but in general it
need not be.

you could address this by adding a fourth argument
to memdraw and then using it inside devdraw
to specify a "read from this instead of the destination"
image.  it's not clear to me that this is worth the bother,
and it makes the interface less clean.  you'd also have
to redo libframe to use stringbg everywhere.
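
A sketch of what such a "read from this instead" operation might look
like at the pixel level.  It is purely hypothetical: draw_from_bg is
not the memdraw interface, and it assumes one 8-bit channel per pixel.
bg is an image in ordinary memory; dst stands for the frame buffer
and is only ever written.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend src through a fractional mask, taking the "old" pixel from
 * bg rather than from dst.  Hypothetical helper, not memdraw. */
void draw_from_bg(uint8_t *dst, const uint8_t *bg, const uint8_t *mask,
	uint8_t src, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned a = mask[i];
		unsigned d = bg[i];	/* read ordinary memory, not dst */
		dst[i] = (uint8_t)((src * a + d * (255 - a) + 127) / 255);
	}
}
```

The result is only correct when bg really matches what is on screen
under the text, which is the bookkeeping that makes the interface
less clean.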

russ




* Re: [9fans] nvidia scrolling performance
  2006-05-01  0:52 erik quanstrom
@ 2006-05-01  1:00 ` Paul Lalonde
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Lalonde @ 2006-05-01  1:00 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

8G/s? Nowhere near enough.  Enough for text, but try doing real
computation using that GPU...
PS3 is running 25G/s bi-directional.  Those bits move.

Paul

On 30-Apr-06, at 5:52 PM, erik quanstrom wrote:

> i only have a pci card, but someone with an agp card and a machine
> that
> allows bios-controlled agp bandwidth could eliminate some
> possibilities.
> if it's bus limited, then the total performance should be linear in
> agp
> bus speed, right?  of course we still wouldn't know which direction on
> the bus was limiting.
>
> perhaps it's time to add another machine to the second-hand
> hardware collection.
> ;-)
>
> pci-x (1.066G/s) has the same bandwidth as PCIe x4 (1G/s).
> PCIe SLI = 2 * 16x = 8G/s, which ought to be enough for just about
> anyone.
>
> - erik
>
> On Sun Apr 30 13:13:21 CDT 2006, steve@quintile.net wrote:
>>> Frame buffer memory is very very slow to read from,
>>> and not just on nvidia.  When I did some timings six years
>>> ago, I found that reading from frame buffer memory
>>> was slower than reading from disk.  I'm sure the situation
>>> hasn't gotten better.  It's not on the fast path for any
>>> other system, so the vendors just don't care.
>>
>> I may be talking rubbish but I understood this is a fundamental
>> problem with reading VGA memory over the PCI bus. VGA cards are
>> designed for fast writes and not fast reads.
>>
>> People have been very interested in using the GPUs in graphics cards
>> to do video processing (to disk rather than for display) but the limiting
>> factor seems to have been the speed at which data can be read back.
>> I do hear that some cards are appearing with dual PCIX which will allow
>> symmetric access speeds to the frame buffer.
>>
>> -Steve
>>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEVV2gpJeHo/Fbu1wRAowpAJ0SaqAQvQO7LWvyWvbDkAsa9FY6lACgyZaL
A9qxHIBefBlanunFaLXl9Nc=
=PktC
-----END PGP SIGNATURE-----



* Re: [9fans] nvidia scrolling performance
@ 2006-05-01  0:52 erik quanstrom
  2006-05-01  1:00 ` Paul Lalonde
  0 siblings, 1 reply; 27+ messages in thread
From: erik quanstrom @ 2006-05-01  0:52 UTC (permalink / raw)
  To: 9fans

i only have a pci card, but someone with an agp card and a machine that
allows bios-controlled agp bandwidth could eliminate some possibilities.
if it's bus limited, then the total performance should be linear in agp
bus speed, right?  of course we still wouldn't know which direction on
the bus was limiting.
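
The linearity check could be sketched like this, assuming the same test is timed at two bus settings. The function name `busscaling` and its parameters are hypothetical, and no measurements are implied.

```c
/*
 * s1, s2: bus speeds in any consistent unit (e.g. agp 1x, 2x, 4x multipliers).
 * t1, t2: elapsed seconds for the same test at those speeds.
 * returns the observed speedup divided by the bus speed ratio:
 * about 1.0 if run time scales with the bus (bus limited),
 * about s1/s2 if the bus speed makes no difference.
 */
static double
busscaling(double s1, double t1, double s2, double t2)
{
	return (t1/t2) / (s2/s1);
}
```

A result near 1.0 would point at the bus, but as noted it would not say whether reads or writes are the limit.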

perhaps it's time to add another machine to the second-hand hardware collection.
;-)

pci-x (1.066G/s) has the same bandwidth as PCIe x4 (1G/s).
PCIe SLI = 2 * 16x = 8G/s, which ought to be enough for just about anyone.
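
The arithmetic behind these figures, as a sketch. It assumes pci-x at 133MHz with a 64-bit data path and first-generation PCIe at 250MB/s per lane in each direction; the function names are made up for illustration.

```c
/* peak MB/s of a parallel bus: clock in MHz times width in bits, over 8 */
static double
parallelmbs(double mhz, int bits)
{
	return mhz * bits / 8.0;
}

/* peak MB/s per direction of a first-generation PCIe link */
static double
pciembs(int lanes)
{
	return 250.0 * lanes;
}
```

parallelmbs(133.33, 64) is about 1066MB/s, pciembs(4) is 1000MB/s and 2*pciembs(16) is 8000MB/s. For comparison, plain 32-bit 33MHz pci comes to about 133MB/s by the same formula. All of these are peak rates, not measured throughput.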

- erik

On Sun Apr 30 13:13:21 CDT 2006, steve@quintile.net wrote:
> > Frame buffer memory is very very slow to read from,
> > and not just on nvidia.  When I did some timings six years
> > ago, I found that reading from frame buffer memory
> > was slower than reading from disk.  I'm sure the situation
> > hasn't gotten better.  It's not on the fast path for any
> > other system, so the vendors just don't care.
>
> I may be talking rubbish but I understood this is a fundamental
> problem with reading VGA memory over the PCI bus. VGA cards are
> designed for fast writes and not fast reads.
>
> People have been very interested in using the GPUs in graphics cards
> to do video processing (to disk rather than for display) but the limiting
> factor seems to have been the speed at which data can be read back.
> I do hear that some cards are appearing with dual PCIX which will allow
> symmetric access speeds to the frame buffer.
>
> -Steve
>



* Re: [9fans] nvidia scrolling performance
@ 2006-05-01  0:30 erik quanstrom
  2006-05-01 17:44 ` Russ Cox
  0 siblings, 1 reply; 27+ messages in thread
From: erik quanstrom @ 2006-05-01  0:30 UTC (permalink / raw)
  To: 9fans

i don't think the extra multiplication has much to do with it.
the system load had essentially zero effect on the drawing time until
both processors were >100% utilization.  at that point, i think
scheduling, rather than actual cpu usage, was to blame. (there were
occasional long pauses.)  for moderate loads, the time was +/- 5%.

why does the framebuffer need to be consulted when the background is a known,
solid color?

as far as greyscale fonts go, both cyberbit and code2000 look pretty
good to me.  pelm does not look very good with my monitor's large
dot pitch, and allows much less text in the same size window.
i used to use it when i had a better monitor.  also, pelm is fixed-width
and has too many pjws at codepoints that are important to me.

what to you might be slow and ugly might be slow and good looking
to me.  however i'm the one looking at the monitor!

- erik

On Sun Apr 30 10:09:57 CDT 2006, rsc@swtch.com wrote:
> > the only obvious difference i see between the plan 9 driver and the xf86 stuff is
> > that the xf86 stuff tries to buffer dma access a bit.  is this the key, or should i be looking
> > at something else?
>
> This isn't the nvidia driver.
> The problem is that you're using greyscale fonts.
> That requires dropping into the generic alpha code
> instead of using the boolean alpha code that regular
> character drawing uses.  The boolean alpha code
> avoids some multiplies, which helps a little, and also
> avoids reading from the frame buffer memory, which
> helps a *lot*.
>
> Frame buffer memory is very very slow to read from,
> and not just on nvidia.  When I did some timings six years
> ago, I found that reading from frame buffer memory
> was slower than reading from disk.  I'm sure the situation
> hasn't gotten better.  It's not on the fast path for any
> other system, so the vendors just don't care.
>
> Now that memories have gotten bigger, it might be
> worth keeping a copy of the screen image like in
> the X port, but it's not clear to me how to reconcile
> that with using acceleration.
>
> Greyscale fonts are slow and ugly.  For more than
> occasional use, just don't do it.
>
> Russ
>
>



Thread overview: 27+ messages
2006-04-29 21:39 [9fans] nvidia scrolling performance erik quanstrom
2006-04-30  4:00 ` jmk
2006-04-30 16:10 ` Russ Cox
2006-04-30 18:12   ` Steve Simon
2006-04-30 22:34     ` Ronald G Minnich
2006-05-01  6:52       ` Nigel Roles
2006-05-01 19:58         ` Ronald G Minnich
2006-05-01 20:10           ` David Leimbach
2006-05-01  0:30 erik quanstrom
2006-05-01 17:44 ` Russ Cox
2006-05-01 17:57   ` Artem Letko
2006-05-01 19:02     ` Russ Cox
2006-05-01  0:52 erik quanstrom
2006-05-01  1:00 ` Paul Lalonde
2006-05-05 15:46 erik quanstrom
2006-05-05 15:56 ` Paul Lalonde
2006-05-05 16:01   ` David Leimbach
2006-05-05 16:21     ` Paul Lalonde
2006-05-05 16:59       ` David Leimbach
2006-05-05 16:05   ` Wes
2006-05-05 17:07   ` Ronald G Minnich
2006-05-05 17:30     ` Paul Lalonde
2006-05-05 16:08 erik quanstrom
2006-05-05 16:42 ` David Leimbach
2006-05-05 17:22 erik quanstrom
2006-05-05 17:32 ` Paul Lalonde
2006-05-05 18:08 erik quanstrom
