From: sirjofri <sirjofri+ml-9front@sirjofri.de>
To: 9front@9front.org
Subject: Re: [9front] Thoughts on Wayland?
Date: Mon, 5 Aug 2024 14:26:44 +0200 (GMT+02:00)
Message-ID: <b63ecf29-6b2a-4a12-9a0e-06a13dbf2468@sirjofri.de>
In-Reply-To: <20240805110501.asula7k52eo5gdld@black>
05.08.2024 13:09:41 Shawn Rutledge <lists@ecloud.org>:
> Dedicated GPUs are like that, so portable APIs need to support working
> that way. But often on integrated graphics, GPU memory is just a chunk
> of system memory, which makes the "upload" trivial in practice. Perhaps
> it can even be zero-copy sometimes, if there's a way to just map a chunk
> into GPU-managed space after it's populated?
I don't know that much about integrated graphics, but in the end the whole beast is plastered over with API middleware like OpenGL or DirectX, which takes care of everything happening underneath. I assume these APIs also handle the copying or mapping on integrated graphics. With dedicated GPUs, they certainly upload the data, either directly (blocking) or asynchronously (saving it until the GPU is ready, then uploading).
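To make that concrete, here is a rough sketch of the two strategies in plain C. The gpu_busy/gpu_copy calls and the staging queue are made up for illustration; real drivers hide all of this behind the API:

/* Hypothetical sketch of blocking vs. asynchronous upload.
 * gpu_busy() and gpu_copy() are invented stubs standing in for
 * whatever the driver really does. */
#include <stdlib.h>
#include <string.h>

static int  gpu_busy(void) { return 0; }                      /* stub */
static void gpu_copy(void *data, long len) { (void)data; (void)len; }

typedef struct Upload Upload;
struct Upload {
	void *data;
	long len;
	Upload *next;
};

static Upload *pending;   /* staging queue for the async path */

/* blocking: wait for the GPU, then transfer directly */
void
upload_blocking(void *data, long len)
{
	while(gpu_busy())
		;
	gpu_copy(data, len);
}

/* nonblocking: save a private copy until the GPU is ready,
 * so the caller can reuse its buffer immediately */
void
upload_async(void *data, long len)
{
	Upload *u = malloc(sizeof *u);
	u->data = malloc(len);
	memcpy(u->data, data, len);
	u->len = len;
	u->next = pending;
	pending = u;
}

/* called later, once the GPU signals readiness */
void
flush_uploads(void)
{
	Upload *u;

	while((u = pending) != NULL){
		pending = u->next;
		gpu_copy(u->data, u->len);
		free(u->data);
		free(u);
	}
}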
> It's very useful for AI applications when there's a lot of system memory
> and the boundary for "dedicated" GPU memory is fully flexible, as on
> Apple's unified memory. (For example I found that I can run a fully
> offline LLM with ollama on a modern mac; and after that, I stay away
> from all the online LLMs... I don't trust them, they are ultimately
> greedy, and if something is "free" it must be that we're helping them
> train it better by using their throttled interfaces.) So I'm not sure,
> but maybe we can expect that kind of architecture to be more common in
> the future.
Probably, who knows. The GPU is becoming a core part of any system, and AMD is working on putting dedicated GPU power in the same package as the CPU. Modern APUs (as in handheld gaming consoles) are surprisingly powerful (still not comparable to true dedicated graphics, but you can run modern games on them), and ARM also follows the market by integrating GPUs (Mali and the like). I assume that with the rise of AI in standard computer systems, the market will be forced towards dedicated GPU power, and unified memory (Apple) sounds like an interesting way forward.
> And it would be nice to have a way to avoid spending system memory at
> all for GPU resources, to be able to stream directly from the original
> source (e.g. a file) to GPU memory. This is an issue in Qt (my day
> job)... we have caching features in multiple places, we try to avoid
> reading files and remote resources more than once; but if you're playing
> an animation, and the desired end result is to have the frames as GPU
> textures, and you have enough GPU memory, then it should be ok to lose
> the CPU-side cache. Especially if the cache was not already in
> GPU-texture form. Decoding simple formats like png and gif is fast
> enough that it doesn't even matter if you need to do it multiple times:
> not worth caching frames from them, IMO, unless the cache is on the GPU
> side. But I think the tradeoffs are different for different sizes. In
> some cases, an animated gif can be very compact and yet a full set of
> decoded frames can be enormous, so it doesn't make sense to cache it
> anywhere. Decoding one frame at a time is the cheapest. Even if you
> had to keep re-reading the file, doesn't the OS cache the file contents
> in RAM anyway? (A controversial position, I'm sure.) So how and whether
> to decide at runtime what to cache how and where, or leave it up to the
> application developer by exposing all the suitable APIs, is Qt's
> problem... sorry for the digression, my point is just that
> upload-and-forget is not the only way that a GPU needs to be used.
> Likewise large games are often streaming assets and geometry to the GPU
> more or less continuously, from what I've heard: depends which assets
> they can reasonably expect to be reused and to have enough GPU memory to
> retain them, I suppose?
Streaming textures (and other data) to the GPU is indeed a complex topic. Large games usually calculate what a frame needs, then load and upload only the data that's not already on the GPU. They draw a sharp distinction between streamed data (e.g. world textures) and non-streamed data (e.g. UI textures).
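The shape of that decision, sketched in plain C; the residency bitmap and upload_texture() are made up for illustration, nothing here is a real engine API:

/* Hypothetical streaming decision: skip what is already resident
 * on the GPU, upload the rest. All names invented. */
#include <stdint.h>

enum { NTEX = 4096 };

static uint8_t resident[NTEX/8];   /* one bit per texture id */

static int  isresident(int id) { return (resident[id/8] >> (id%8)) & 1; }
static void setresident(int id) { resident[id/8] |= 1 << (id%8); }

static void upload_texture(int id) { (void)id; }   /* stub transfer */

void
stream_frame(int *needed, int n)   /* ids the frame needs */
{
	int i;

	for(i = 0; i < n; i++)
		if(!isresident(needed[i])){
			upload_texture(needed[i]);
			setresident(needed[i]);
		}
}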
Microsoft is also working on DirectStorage, which makes decoding/unpacking on the CPU obsolete by uploading the compressed data to the GPU and unpacking it there. I think they are also working on using other bus paths to transfer the data, but as far as I know, the way hardware is currently built that's not easily possible, and you always need the CPU (and system memory) in the streaming path. I don't know the exact details though...
Modern game engines (I'm biased towards Unreal) have really perfected streaming, helped by the incredible speeds of SSDs. Nanite, for example, describes a hierarchy of clusters of polygons, each cluster with its own bounds. The cluster data is uploaded to the GPU, and the GPU (in a shader) does some fancy culling (frustum, occlusion, and size). The result then "tells" the CPU which clusters to load, and only that data is loaded. It also stays on the GPU for as long as it's needed and is reused each frame. So per frame, only new clusters are streamed in, for example when moving around. In practice each frame is slightly different, so you always have a few new clusters coming in here and there, but compared to issuing full draw calls as in a classic pipeline, that's a huge difference!
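Per cluster, the culling step boils down to something like the following, shown in plain C for readability (I obviously don't have Epic's shader source; the Cluster/Plane layout and the size test are made up, and occlusion culling is omitted):

/* Hypothetical per-cluster culling as Nanite-style systems run it
 * in a compute shader. All types and thresholds invented. */
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 n; float d; } Plane;     /* dot(n,p)+d >= 0: inside */
typedef struct { Vec3 center; float radius; } Cluster;

static float
dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

/* returns 1 if the cluster's bounding sphere survives culling;
 * dist is the distance from the camera to the cluster (> 0) */
int
cluster_visible(Cluster *c, Plane frustum[6], float minsize, float dist)
{
	int i;

	/* frustum cull: sphere entirely behind any plane is invisible */
	for(i = 0; i < 6; i++)
		if(dot(frustum[i].n, c->center) + frustum[i].d < -c->radius)
			return 0;

	/* size cull: clusters that project too small are skipped
	 * (crude approximation; real engines use the projection matrix) */
	if(c->radius / dist < minsize)
		return 0;

	return 1;
}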
> It's also my daydream to get the GPU to take care of UI rendering more
> completely, even for simple 2D stuff, and free up the CPU. It's one
> thing I'm hoping to achieve with my 9p scenegraph project (which is
> a slow-moving side project, not a Qt project). But in general, there
> might also be a backlash against excessive GPU usage coming, if people
> expect to use the GPU mainly for "hard problems" or embarrassingly-
> parallel algorithms like AI and 3D graphics, and not load it down with
> simple stuff that the CPU can just as well do for itself. And battery
> consumption might be a concern sometimes too. My attitude towards old
> CPU-based paint engines like draw and QPainter has been kind of negative
> since I started at Qt, because we've been trying to sell the idea that
> you have a GPU, so you might as well use it to get nice AA on all your
> graphics, animations "for free", alpha blending, and stuff like that. I
> still think AA is really a killer feature though. Just about makes
> 2D-on-the-gpu worthwhile all on its own. But Plan 9's draw could not
> have AA on everything, could it?
In fact, GPUs can still be used as 2D accelerators. In the end, it comes down to how you program them.
In the beginning, the GPU was more like devdraw: it could only draw 2D stuff based on simple draw calls. With shaders, you are free to do what you want. You can, in fact, upload some scene data structure and have your shader do the interpolation, rasterization, etc. Sometimes that's even cheaper in a shader than using the dedicated hardware components.
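As a toy illustration, here is the core of such a rasterizer in plain C; a compute shader would run the same per-pixel edge test in parallel. The framebuffer size and layout are made up:

/* Hypothetical toy rasterizer using edge functions: the same test a
 * compute-shader rasterizer runs per pixel. Vertices are assumed to
 * be in consistent (counter-clockwise) order. */
enum { W = 640, H = 480 };
static unsigned char fb[W*H];   /* one coverage byte per pixel */

/* twice the signed area of triangle (a,b,p); the sign says which
 * side of edge a->b the point p lies on */
static int
edge(int ax, int ay, int bx, int by, int px, int py)
{
	return (bx-ax)*(py-ay) - (by-ay)*(px-ax);
}

void
raster_triangle(int x0, int y0, int x1, int y1, int x2, int y2)
{
	int x, y;

	/* a real rasterizer only walks the bounding box; scanning the
	 * whole framebuffer keeps the example short */
	for(y = 0; y < H; y++)
	for(x = 0; x < W; x++)
		if(edge(x0,y0, x1,y1, x,y) >= 0
		&& edge(x1,y1, x2,y2, x,y) >= 0
		&& edge(x2,y2, x0,y0, x,y) >= 0)
			fb[y*W + x] = 0xff;
}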
> So while _I'm_ still interested in 2D on the GPU, I admit that you might
> be onto something with your gpufs proposal, to focus on treating it more
> like a computing resource than a fundamental piece of the graphics
> pipeline. But I think we should have both options. ;-)
The trend goes towards indirect rendering for good reasons. The standard graphics pipeline is very strict about what it does, what it expects, and how it works. But your specific application may not need all of its stages, or may need some of them to work differently.
Nanite is an example of this too: large triangles are faster to rasterize with the dedicated hardware rasterizer, but small triangles are faster to render with a custom software rasterizer. It gets even more complex with other use cases, like particle rendering and displacement.
Additionally, the standard pipeline makes you run a lot of boilerplate that you don't really need. For Nanite, they even plan to implement compute-based shading (running the pixel shader as a compute shader), and they expect a performance gain from it.
I expect that at some point we won't have dedicated hardware for this specialized rendering pipeline anymore. It will still exist as a concept, but the API (OpenGL, DirectX, ...) will emulate it.
>> With complex applications with hungry code and hungry graphics (many primitive draws)
>
> Many draw calls are the enemy of performance when it comes to GPU
> graphics. This is the main reason for needing a scene graph: first get
> the whole collection of everything that needs to go to the screen
> together into one pile, in an efficient form that is quick to traverse,
> and then traverse it and figure out how to combine the draw calls.
> (There is an impedance mismatch between any turtle-graphics or
> paint-engine API, and the GPU. You can solve it if you can intercept the
> draw calls and have them only populate the scene graph instead. Never do
> any drawing immediately on-demand. Or, give up on the imperative API
> altogether: be declarative.) This is the thing that Qt Quick is good at.
> The interesting question for me now is how best to map that idea to a
> filesystem.
It's not uncommon nowadays to have a command list builder that accepts commands from many different CPU threads. Those command lists are then submitted in bundles, which reduces the number of individual draw calls.
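A minimal sketch of that pattern in plain C: each worker thread records into its own list (so no locking is needed), and one merged submission replaces many individual calls. The Cmd format and gpu_submit() are invented for illustration:

/* Hypothetical command-list builder. All names made up. */
enum { NTHREAD = 4, MAXCMD = 1024 };

typedef struct { int op; int arg[4]; } Cmd;

typedef struct {
	Cmd cmd[MAXCMD];
	int n;
} CmdList;

static CmdList lists[NTHREAD];   /* one list per worker thread */

void
record(int tid, Cmd c)   /* called from worker thread tid only */
{
	CmdList *l = &lists[tid];

	if(l->n < MAXCMD)
		l->cmd[l->n++] = c;
}

static void gpu_submit(Cmd *c, int n) { (void)c; (void)n; }   /* stub */

void
submit_frame(void)   /* called once per frame, after workers finish */
{
	int i;

	for(i = 0; i < NTHREAD; i++){
		gpu_submit(lists[i].cmd, lists[i].n);
		lists[i].n = 0;
	}
}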
> Are you trying to have gpufs as an endpoint for uploading assets that
> could be invoked via draw calls too? Or just as a way to invoke the GPU
> to do embarrassingly-parallel computation at whatever stage one wants to
> use it for? (Why not make it possible to use it both ways, eventually?)
> But I would expect that this fs will have a close-to-the-hardware
> design, right?
Currently, gpufs is modeled after Vulkan, which is much closer to the hardware than OpenGL. It focuses on interacting with the hardware instead of drawing graphics.
I'd like to treat assets and graphics as "just data". Your application (including its shaders) defines what the data means: it could be a 3D model, a texture, a framebuffer, an animation; it fully depends on the application.
That way, you upload your assets and your programs, and in the end you get the final frame back as data, which can be interpreted as an image, for example. You could just as well upload assets and programs and get back data that is an animation. As a game developer, I have to build graphics, though...
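Just to make the "everything is data" idea concrete, a client could look roughly like this; the /dev/gpu file names and the ctl command here are invented for the sketch, not the actual gpufs interface:

/* Hypothetical gpufs client in Plan 9 C. The file layout and ctl
 * protocol are made up for illustration. */
#include <u.h>
#include <libc.h>

static char assetdata[] = "raw bytes; the shader decides what they mean";

void
main(void)
{
	int ctl, fd;
	char buf[8192];
	long n;

	/* upload "just data" into a buffer */
	fd = open("/dev/gpu/buf0", OWRITE);
	write(fd, assetdata, sizeof assetdata - 1);
	close(fd);

	/* run a program over it */
	ctl = open("/dev/gpu/ctl", OWRITE);
	fprint(ctl, "exec myshader buf0 buf1\n");
	close(ctl);

	/* read the result back; interpret it as an image, or anything else */
	fd = open("/dev/gpu/buf1", OREAD);
	while((n = read(fd, buf, sizeof buf)) > 0)
		write(1, buf, n);
	close(fd);
	exits(nil);
}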
sirjofri