Date: Mon, 5 Aug 2024 13:05:06 +0200
From: Shawn Rutledge
To: 9front@9front.org
Message-ID: <20240805110501.asula7k52eo5gdld@black>
References:
 <7003a121-ae98-4a24-b0dc-778c3b086310@sirjofri.de>
In-Reply-To: <7003a121-ae98-4a24-b0dc-778c3b086310@sirjofri.de>
Subject: Re: [9front] Thoughts on Wayland?

On Mon, Aug 05, 2024 at 10:52:58AM +0200, sirjofri wrote:
> 05.08.2024 10:34:45 ori@eigenstate.org:
> > shared memory just isn't a very good model for distributed systems.
> > devdraw is more or less what you get if you make a slimmed down cairo
> > the native drawing model, and talk to it over a pipe for rpcs, but your
> > musings are gradually reinventing devdraw :)
>
> That's what I like about GPUs: they have dedicated memory and you have
> to upload manually. No synchronization issues, just what you upload is
> there. It's basically a different machine.
>
> Also, the design is extremely similar to devdraw, just that you can
> upload programs and arbitrary data. Imagine all your graphics routines
> are executed on the devdraw server side, and you just kick off those
> programs from the client. That would mean fluent animations with many
> frames, and the cpu that runs the client application can just relax.

Dedicated GPUs are like that, so portable APIs need to support working
that way. But often on integrated graphics, GPU memory is just a chunk
of system memory, which makes the "upload" trivial in practice. Perhaps
it can even be zero-copy sometimes, if there's a way to just map a chunk
into GPU-managed space after it's populated?

It's very useful for AI applications when there's a lot of system memory
and the boundary for "dedicated" GPU memory is fully flexible, as on
Apple's unified memory. (For example I found that I can run a fully
offline LLM with ollama on a modern Mac; and after that, I stay away
from all the online LLMs...
I don't trust them, they are ultimately greedy, and if something is
"free" it must be that we're helping them train it better by using their
throttled interfaces.) So I'm not sure, but maybe we can expect that
kind of architecture to become more common in the future.

And it would be nice to have a way to avoid spending system memory at
all for GPU resources: to be able to stream directly from the original
source (e.g. a file) to GPU memory. This is an issue in Qt (my day
job)... we have caching features in multiple places, and we try to avoid
reading files and remote resources more than once; but if you're playing
an animation, and the desired end result is to have the frames as GPU
textures, and you have enough GPU memory, then it should be ok to lose
the CPU-side cache, especially if the cache was not already in
GPU-texture form.

Decoding simple formats like png and gif is fast enough that it doesn't
even matter if you need to do it multiple times: not worth caching
frames from them, IMO, unless the cache is on the GPU side. But I think
the tradeoffs are different for different sizes. In some cases, an
animated gif can be very compact and yet a full set of decoded frames
can be enormous, so it doesn't make sense to cache it anywhere: decoding
one frame at a time is the cheapest. Even if you had to keep re-reading
the file, doesn't the OS cache the file contents in RAM anyway? (A
controversial position, I'm sure.)

So deciding at runtime whether, how, and where to cache, or leaving it
up to the application developer by exposing all the suitable APIs, is
Qt's problem... sorry for the digression; my point is just that
upload-and-forget is not the only way that a GPU needs to be used.
Likewise, large games are often streaming assets and geometry to the GPU
more or less continuously, from what I've heard: it depends which assets
they can reasonably expect to be reused, and whether they have enough
GPU memory to retain them, I suppose?
It's also my daydream to get the GPU to take care of UI rendering more
completely, even for simple 2D stuff, and free up the CPU. It's one
thing I'm hoping to achieve with my 9p scenegraph project (which is a
slow-moving side project, not a Qt project). But in general, there might
also be a backlash against excessive GPU usage coming, if people expect
to use the GPU mainly for "hard problems" or embarrassingly parallel
algorithms like AI and 3D graphics, and not load it down with simple
stuff that the CPU can just as well do for itself. And battery
consumption might be a concern sometimes too.

My attitude towards old CPU-based paint engines like draw and QPainter
has been kind of negative since I started at Qt, because we've been
trying to sell the idea that you have a GPU, so you might as well use it
to get nice AA on all your graphics, animations "for free", alpha
blending, and stuff like that. I still think AA is really a killer
feature, though; it just about makes 2D-on-the-GPU worthwhile all on its
own. But Plan 9's draw could not have AA on everything, could it? So
while _I'm_ still interested in 2D on the GPU, I admit that you might be
onto something with your gpufs proposal: treating the GPU more like a
computing resource than a fundamental piece of the graphics pipeline.
But I think we should have both options. ;-)

> With complex applications with hungry code and hungry graphics (many
> primitive draws)

Many draw calls are the enemy of performance when it comes to GPU
graphics. This is the main reason for needing a scene graph: first get
the whole collection of everything that needs to go to the screen
together into one pile, in an efficient form that is quick to traverse,
and then traverse it and figure out how to combine the draw calls.
(There is an impedance mismatch between any turtle-graphics or
paint-engine API and the GPU. You can solve it if you can intercept the
draw calls and have them only populate the scene graph instead.
Never do any drawing immediately on demand. Or, give up on the
imperative API altogether: be declarative.) This is the thing that Qt
Quick is good at.

The interesting question for me now is how best to map that idea to a
filesystem. Are you trying to have gpufs as an endpoint for uploading
assets that could be invoked via draw calls too? Or just as a way to
invoke the GPU to do embarrassingly-parallel computation at whatever
stage one wants to use it for? (Why not make it possible to use it both
ways, eventually?) But I would expect that this fs will have a
close-to-the-hardware design, right?