Re: [9front] Two file servers sharing an auth server


From: sirjofri <sirjofri+ml-9front@sirjofri.de>
To: 9front@9front.org
Subject: Re: [9front] Two file servers sharing an auth server
Date: Fri, 4 Nov 2022 09:29:43 +0100 (GMT+01:00)
Message-ID: <262944d5-3974-47fb-b8e3-2327a78f5edb@sirjofri.de>
In-Reply-To: <83E1A1F40327CC6C521C846162693736@thinktankworkspaces.com>

04.11.2022 08:11:26 william@thinktankworkspaces.com:

> Wow. I'm really interested in learning more about this. I kind of figured performance would be better
> local rather than served remotely. I might want to dig into this further and I'm curious about
> concurrency. Can multiple rc-httpd services be started to handle concurrency?
>
> Examples greatly welcomed
>
> What about non-script apps. I was really interested in some of the web features golang has and if it will work well on another node

Well, I'd say it depends on the application and what it does. Let's forget about user data for a moment, but I'm sure you can figure that out yourself.

Let's imagine a single-binary application for this. That binary starts once and runs forever, and it uses sed (as a separate application) for some task.

This binary could spawn one sed process and feed it data. In this case it doesn't really make much sense to copy the binaries to ramfs and start from there since it would only load both binaries once from the fileserver, start the processes and leave everything in memory. It doesn't matter if your application streams data to sed once per minute or 100 times in a second. The process is loaded already and no additional disk reads will happen.

Or, the application could spawn a sed process every time it uses it. This would mean, however, that the sed binary has to be loaded each time, which can be a huge load for the disk. In this scenario it makes sense to copy the sed binary to some ramfs (or to local disk if you need the memory for something else, but sed is small).

It could be even worse if sed is dynamically linked, since the loader then has to load all the libraries sed needs as well.
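To make the two strategies above concrete, here is a rough sketch in Python rather than rc or C. It is only an illustration: FILTER is a hypothetical stand-in for sed (a small child process that replaces "a" with "b" on every line), and the function names are made up. spawn_each_time starts a new process for every line, so the binary is loaded every time; spawn_once starts one process and streams all lines through it.

```python
import subprocess
import sys

# Hypothetical stand-in for sed: a child that replaces "a" with "b" on
# every line. A real setup would exec sed here instead.
FILTER = [sys.executable, "-u", "-c",
          "import sys\n"
          "for line in sys.stdin:\n"
          "    sys.stdout.write(line.replace('a', 'b'))\n"
          "    sys.stdout.flush()\n"]

def spawn_each_time(lines):
    """Start a new filter process per line: the binary is loaded each time."""
    out = []
    for line in lines:
        r = subprocess.run(FILTER, input=line + "\n",
                           capture_output=True, text=True, check=True)
        out.append(r.stdout.rstrip("\n"))
    return out

def spawn_once(lines):
    """Start one filter process and stream every line through it."""
    p = subprocess.Popen(FILTER, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True, bufsize=1)
    out = []
    try:
        for line in lines:
            p.stdin.write(line + "\n")
            p.stdin.flush()
            out.append(p.stdout.readline().rstrip("\n"))
    finally:
        p.stdin.close()
        p.wait()
        p.stdout.close()
    return out
```

Both functions return the same result; the difference is only how often the filter binary has to be fetched and started, which is what matters when the binary lives on a remote file server.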

rc-httpd is one of the worst examples though. It's totally fine to use it at some scale, but rc-httpd forks many processes per connection.

Regarding your question about concurrency and rc-httpd: On Plan 9 you usually start one process per connection, so each connection from a client gets its own rc-httpd process (see /rc/bin/service/tcp80 or !tcp80). To keep state across connections you need something persistent, like a filesystem or "real" files (whatever this means).
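The process-per-connection model can be sketched roughly like this, again in Python and not as the actual listen(8) or rc-httpd code. HANDLER is a hypothetical stand-in for the per-connection service script; serve and ask are made-up helper names. Each accepted connection is handed to a fresh process as its stdin and stdout, and the handler replies with its own process id so you can see that no two connections share a process (and therefore no in-memory state).

```python
import socket
import subprocess
import sys
import threading

# Hypothetical per-connection handler, standing in for the script that
# would be run for each connection: reads one line and replies with its
# own process id followed by that line.
HANDLER = [sys.executable, "-c",
           "import os, sys\n"
           "line = sys.stdin.readline().strip()\n"
           "print(os.getpid(), line, flush=True)\n"]

def serve(listener, nconn):
    """Accept nconn connections and start one fresh handler process for each."""
    for _ in range(nconn):
        conn, _addr = listener.accept()
        # The handler gets the connection as its stdin and stdout.
        child = subprocess.Popen(HANDLER, stdin=conn, stdout=conn)
        conn.close()  # the child holds its own copy of the descriptor
        threading.Thread(target=child.wait, daemon=True).start()  # reap it

def ask(port, text):
    """Connect, send one line, and return the reply split into words."""
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(text.encode() + b"\n")
        return c.makefile().readline().split()
```

Two calls to ask against the same listener come back with different process ids, so anything the handlers need to share has to live outside the process, for example in files.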

I think it would be possible to rewrite rc-httpd to spawn processes in the beginning and stream data to them, but I guess that would make rc-httpd more complicated and if you wanna do that you might as well use httpd or tcp80 or anything else.

If I wanted to design a scalable application for concurrency with Plan 9 in mind I'd probably create a filesystem that handles all synchronization, provides data for the compute nodes and receives results from them. That filesystem can be imported on all compute machines, and then we could start the compute node processes there. In theory the binary itself could be part of the filesystem, which would let us distribute the task basically automatically.

Something similar to this:
- /mnt/concfs/
  - ctl (every process should read this and announce their presence and what they do)
  - data
  - program

Starting a compute process might be just importing the filesystem from the server, then running the "program" file (which handles everything else, like reading from ctl and doing the actual processing). You can see that it easily fits a very small shell script.
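As a sketch of how little the compute node has to know, here is the launcher step in Python. It assumes the filesystem has already been imported and is visible at some mount point (the import itself is not shown), and that "program" is an executable file in it. start_compute_node is a made-up name, and running the program with the mount point as its working directory is my assumption, not part of the design above.

```python
import os
import subprocess

def start_compute_node(mountpoint):
    """Run the "program" file from an already imported filesystem.

    The launcher knows nothing about ctl or data; it only knows which
    file to execute. Returns the program's exit status.
    """
    program = os.path.join(mountpoint, "program")
    # Assumption: the program finds ctl and data relative to its
    # working directory, so we start it inside the mount point.
    result = subprocess.run([program], cwd=mountpoint, check=True)
    return result.returncode
```

Something like start_compute_node("/mnt/concfs") would then be the whole client side.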

data would be something like the endpoint. For rendering an image it might be some structure for pixels, and the processes can just read scene data they need to process and write the pixel result. The data file could also be abstracted to a full filesystem hierarchy, which sometimes makes sense.

ctl would be the synchronization point. The server just tells the client which pixel to compute and the client just announces when it's done (after writing the pixel value to the data file). But that protocol would be hidden inside the "program" file, so the client machine actually knows nothing about what to compute and how the protocol works. It just has to know what file to execute.
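The ctl/data handshake described above could look roughly like the following toy model. It is an in-memory stand-in, not a 9P file server: ConcFS, read_ctl, write_data, write_ctl, program and render are all hypothetical names, the workers are threads instead of remote machines, and the "rendering" is a placeholder formula. It only shows the flow: read ctl to get a pixel, write the value to data, write ctl to announce completion.

```python
import threading

class ConcFS:
    """In-memory stand-in for the file server; methods mirror file operations."""

    def __init__(self, width, height):
        self.lock = threading.Lock()
        self.todo = [(x, y) for y in range(height) for x in range(width)]
        self.busy = {}   # worker name -> pixel it was handed
        self.data = {}   # pixel -> value, the "data" file
        self.done = []   # (worker, pixel) completion log

    def read_ctl(self, worker):
        """Reading ctl: announce presence and get the next pixel, or None."""
        with self.lock:
            if not self.todo:
                return None
            pixel = self.todo.pop(0)
            self.busy[worker] = pixel
            return pixel

    def write_data(self, pixel, value):
        """Writing data: store the computed pixel value."""
        with self.lock:
            self.data[pixel] = value

    def write_ctl(self, worker):
        """Writing ctl: announce that the assigned pixel is finished."""
        with self.lock:
            self.done.append((worker, self.busy.pop(worker)))

def program(fs, worker):
    """What the "program" file would do on a compute node."""
    while (pixel := fs.read_ctl(worker)) is not None:
        x, y = pixel
        fs.write_data(pixel, (x * y) % 256)  # placeholder "rendering"
        fs.write_ctl(worker)

def render(width, height, nworkers):
    """Run nworkers compute nodes (threads here) until all pixels are done."""
    fs = ConcFS(width, height)
    threads = [threading.Thread(target=program, args=(fs, f"node{i}"))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fs
```

The lock inside ConcFS plays the role of the file server serializing requests, so every pixel is handed out exactly once no matter how many nodes are reading ctl.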

Note that it's still possible to write standard multithreaded programs for stuff like that, using libthread, fork or anything like that. It all depends on the use case and the amount of scale.

It would actually be possible to make the "program" multithreaded, so each compute node would run not just one process but many.

Note that I never built a system like that, it's just some idea that would need refinement.

sirjofri

