* [9fans] 9p server to multiply 9p messages? @ 2022-05-28 16:02 fgergo 2022-05-28 18:43 ` Skip Tavakkolian 2022-05-29 23:16 ` Bakul Shah 0 siblings, 2 replies; 25+ messages in thread From: fgergo @ 2022-05-28 16:02 UTC (permalink / raw) To: 9fans Has anybody considered (or maybe even implemented) a 9p server to multiply incoming 9p messages to 2 or more 9p servers? Maybe with 2 different strategies for responding to the original request? 1. respond as soon as at least 1 response from one of the 9p servers is received, 2. respond only after all responses had been received. thanks! ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-Me76b8d1bf6427627eb2fa9f3 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-28 16:02 [9fans] 9p server to multiply 9p messages? fgergo @ 2022-05-28 18:43 ` Skip Tavakkolian 2022-05-28 19:21 ` ron minnich 2022-05-29 10:23 ` fgergo 2022-05-29 23:16 ` Bakul Shah 1 sibling, 2 replies; 25+ messages in thread From: Skip Tavakkolian @ 2022-05-28 18:43 UTC (permalink / raw) To: 9fans Interesting idea! This assumes the downstream servers have identical namespace hierarchy; right? State management could be messy or impossible unless some sort of transaction structure is imposed on the {walk, [open/create, read/write]|[stat/wstat], clunk} sequences, where the server that replies to walk first, gets that transaction. On Sat, May 28, 2022 at 9:04 AM <fgergo@gmail.com> wrote: > > Has anybody considered (or maybe even implemented) a 9p server to > multiply incoming 9p messages to 2 or more 9p servers? > Maybe with 2 different strategies for responding to the original request? > 1. respond as soon as at least 1 response from one of the 9p servers > is received, > 2. respond only after all responses had been received. > thanks! ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M86625b23fdf7710ba07da4a4 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-28 18:43 ` Skip Tavakkolian @ 2022-05-28 19:21 ` ron minnich 2022-05-29 10:33 ` fgergo 2022-05-29 10:23 ` fgergo 1 sibling, 1 reply; 25+ messages in thread From: ron minnich @ 2022-05-28 19:21 UTC (permalink / raw) To: 9fans not for 9p, but in 1993, when Gene Kim interned with me at the Supercomputing Research Center, we did this: https://www.semanticscholar.org/paper/Bigfoot-NFS-%3A-A-Parallel-File-Striping-NFS-Server-(-Kim/19cb61337bab7b4de856fcbf29b55965647be091, similar in spirit to your idea. The core idea was that we distributed the files over the set of servers, and replicated directory trees to avoid the usual troubles: we did not want to implement a metadata server. So each NFS request was fanned out to 32 machines, vector rpc style, and because the networks of that time were so slow, it was not that much slower than you'd expect. It worked, it gave us what was at the time a really big NFS server, the paper got rejected twice at usenix, the original full paper is long lost, the code? probably lost too. It was available from super.org in 1993, but ... that's not on the wayback machine. On Sat, May 28, 2022 at 11:45 AM Skip Tavakkolian <skip.tavakkolian@gmail.com> wrote: > > Interesting idea! > > This assumes the downstream servers have identical namespace hierarchy; right? > > State management could be messy or impossible unless some sort of > transaction structure is imposed on the {walk, [open/create, > read/write]|[stat/wstat], clunk} sequences, where the server that > replies to walk first, gets that transaction. > > On Sat, May 28, 2022 at 9:04 AM <fgergo@gmail.com> wrote: > > > > Has anybody considered (or maybe even implemented) a 9p server to > > multiply incoming 9p messages to 2 or more 9p servers? > > Maybe with 2 different strategies for responding to the original request? > > 1. respond as soon as at least 1 response from one of the 9p servers > > is received, > > 2. 
respond only after all responses had been received. > > thanks! ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M7413556639c5150cb4c6e116 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-28 19:21 ` ron minnich @ 2022-05-29 10:33 ` fgergo 0 siblings, 0 replies; 25+ messages in thread From: fgergo @ 2022-05-29 10:33 UTC (permalink / raw) To: 9fans Thanks yes, this would be one use-case. On 5/28/22, ron minnich <rminnich@gmail.com> wrote: > not for 9p, but in 1993, when Gene Kim interned with me at the > Supercomputing Research Center, we did this: > https://www.semanticscholar.org/paper/Bigfoot-NFS-%3A-A-Parallel-File-Striping-NFS-Server-(-Kim/19cb61337bab7b4de856fcbf29b55965647be091, > similar in spirit to your idea. > > The core idea was that we distributed the files over the set of > servers, and replicated directory trees to avoid the usual troubles: > we did not want to implement a metadata server. So each NFS request > was fanned out to 32 machines, vector rpc style, and because the > networks of that time were so slow, it was not that much slower than > you'd expect. > > It worked, it gave us what was at the time a really big NFS server, > the paper got rejected twice at usenix, the original full paper is > long lost, the code? probably lost too. > > It was available from super.org in 1993, but ... that's not on the > wayback machine. > > On Sat, May 28, 2022 at 11:45 AM Skip Tavakkolian > <skip.tavakkolian@gmail.com> wrote: >> >> Interesting idea! >> >> This assumes the downstream servers have identical namespace hierarchy; >> right? >> >> State management could be messy or impossible unless some sort of >> transaction structure is imposed on the {walk, [open/create, >> read/write]|[stat/wstat], clunk} sequences, where the server that >> replies to walk first, gets that transaction. >> >> On Sat, May 28, 2022 at 9:04 AM <fgergo@gmail.com> wrote: >> > >> > Has anybody considered (or maybe even implemented) a 9p server to >> > multiply incoming 9p messages to 2 or more 9p servers? >> > Maybe with 2 different strategies for responding to the original >> > request? >> > 1. 
respond as soon as at least 1 response from one of the 9p servers >> > is received, >> > 2. respond only after all responses had been received. >> > thanks! ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-Mb23151be1bea68b8a1ac928c Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-28 18:43 ` Skip Tavakkolian 2022-05-28 19:21 ` ron minnich @ 2022-05-29 10:23 ` fgergo 2022-05-29 11:41 ` fgergo 1 sibling, 1 reply; 25+ messages in thread From: fgergo @ 2022-05-29 10:23 UTC (permalink / raw) To: 9fans As a first approximation - assuming identical namespaces - this multiplier 9p server (9plier? multi9plier?) could be trivially(?) useful, used with recover(4) on all connections and with an independent synchronization mechanism, in case states would fall out of sync. Furthermore I would not rule out usefulness if the namespaces are not identical, though I think a higher level model (over 9p) would need to be considered to build anything useful. Thanks for your insight! On 5/28/22, Skip Tavakkolian <skip.tavakkolian@gmail.com> wrote: > Interesting idea! > > This assumes the downstream servers have identical namespace hierarchy; > right? > > State management could be messy or impossible unless some sort of > transaction structure is imposed on the {walk, [open/create, > read/write]|[stat/wstat], clunk} sequences, where the server that > replies to walk first, gets that transaction. > > On Sat, May 28, 2022 at 9:04 AM <fgergo@gmail.com> wrote: >> >> Has anybody considered (or maybe even implemented) a 9p server to >> multiply incoming 9p messages to 2 or more 9p servers? >> Maybe with 2 different strategies for responding to the original request? >> 1. respond as soon as at least 1 response from one of the 9p servers >> is received, >> 2. respond only after all responses had been received. >> thanks! ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M97ea1f3fe6561095767a222e Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-29 10:23 ` fgergo @ 2022-05-29 11:41 ` fgergo 0 siblings, 0 replies; 25+ messages in thread From: fgergo @ 2022-05-29 11:41 UTC (permalink / raw) To: 9fans s/over 9p/higher than 9p/ On 5/29/22, fgergo@gmail.com <fgergo@gmail.com> wrote: > As a first approximation - assuming identical namespaces - this > multiplier 9p server (9plier? multi9plier?) could be trivially(?) > useful, used with recover(4) on all connections and with an > independent synchronization mechanism, in case states would fall out > of sync. > > Furthermore I would not rule out usefulness if the namespaces are not > identical, though I think a higher level model (over 9p) would need to > be considered to built anything useful. > Thanks for your insight! > > On 5/28/22, Skip Tavakkolian <skip.tavakkolian@gmail.com> wrote: >> Interesting idea! >> >> This assumes the downstream servers have identical namespace hierarchy; >> right? >> >> State management could be messy or impossible unless some sort of >> transaction structure is imposed on the {walk, [open/create, >> read/write]|[stat/wstat], clunk} sequences, where the server that >> replies to walk first, gets that transaction. >> >> On Sat, May 28, 2022 at 9:04 AM <fgergo@gmail.com> wrote: >>> >>> Has anybody considered (or maybe even implemented) a 9p server to >>> multiply incoming 9p messages to 2 or more 9p servers? >>> Maybe with 2 different strategies for responding to the original >>> request? >>> 1. respond as soon as at least 1 response from one of the 9p servers >>> is received, >>> 2. respond only after all responses had been received. >>> thanks! ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M176ab7d3e0344d1d02e485e3 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-28 16:02 [9fans] 9p server to multiply 9p messages? fgergo 2022-05-28 18:43 ` Skip Tavakkolian @ 2022-05-29 23:16 ` Bakul Shah 2022-05-30 4:59 ` ori 1 sibling, 1 reply; 25+ messages in thread From: Bakul Shah @ 2022-05-29 23:16 UTC (permalink / raw) To: 9fans On May 28, 2022, at 9:02 AM, fgergo@gmail.com wrote: > > Has anybody considered (or maybe even implemented) a 9p server to > multiply incoming 9p messages to 2 or more 9p servers? > Maybe with 2 different strategies for responding to the original request? > 1. respond as soon as at least 1 response from one of the 9p servers > is received, > 2. respond only after all responses had been received. Some variation of this would be interesting for a clustered or distributed filesystem. The challenge would be doing this in an understandable way, cleanly and with good performance. Probably using separate namespaces for control & management operations. [Just brainstorming here...] Maybe think about this using a clean slate approach. Features that can be of use:
- fault tolerance (more than one node storing the same bits)
- scalable (in capacity, throughput, clients and server nodes)
- consistent view of the "same" FS by its clients
- file migration (transparent to a client e.g. to reduce latency)
- controlled sharing
- file sizes that can exceed the capacity of a single node
- nodes can show up/go away dynamically
- provide multiple security domains (one bank, many customers!)
- access to older snapshots
- allow use of any local FS for storage
- easy to provision & manage
------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M30f1ef13e6748428ffc79346 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-29 23:16 ` Bakul Shah @ 2022-05-30 4:59 ` ori 2022-05-30 7:19 ` Bakul Shah 2022-05-30 8:33 ` hiro 0 siblings, 2 replies; 25+ messages in thread From: ori @ 2022-05-30 4:59 UTC (permalink / raw) To: 9fans Quoth Bakul Shah <bakul@iitbombay.org>: > > Some variation of this would be interesting for a clustered > or distributed filesystem. The challenge would be doing this > in an understandable way, cleanly and with good performance. > Probably using separate namespaces for control & management > operations. the challenge is that 9p is stateful, so all servers must replay the same messages in the same order; this means that if one of the replicas fails or returns a result that is not the same as the other, the front falls off. this means mirroring messages naïvely reduces reliability and performance, rather than increasing it. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M932bce5897117e1749590e6b Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-30 4:59 ` ori @ 2022-05-30 7:19 ` Bakul Shah 2022-05-30 8:03 ` fgergo ` (2 more replies) 2022-05-30 8:33 ` hiro 1 sibling, 3 replies; 25+ messages in thread From: Bakul Shah @ 2022-05-30 7:19 UTC (permalink / raw) To: 9fans On May 29, 2022, at 10:01 PM, ori@eigenstate.org wrote: > > the challenge is that 9p is stateful, so all servers must > replay the same messages in the same order; this means that > if one of the replicas fails or returns a result that is not > the same as the other, the front falls off. > > this means mirroring messages naïvely reduces reliability > and performance, rather than increasing it. I was not thinking of mirroring. I was thinking of clustered or distributed systems like CephFS, IPFS, GlusterFS etc., but thinking a cleaner & simpler design might be possible. But it is quite possible my brainstorm/pipedream is not realistic! 9p itself is low performance but that is a separate issue. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M356e62122d1f1d87bf554dca Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-30 7:19 ` Bakul Shah @ 2022-05-30 8:03 ` fgergo 2022-05-30 8:35 ` hiro 2022-05-31 16:14 ` ron minnich 2 siblings, 0 replies; 25+ messages in thread From: fgergo @ 2022-05-30 8:03 UTC (permalink / raw) To: 9fans On 5/30/22, Bakul Shah <bakul@iitbombay.org> wrote: > On May 29, 2022, at 10:01 PM, ori@eigenstate.org wrote: >> >> the challenge is that 9p is stateful, so all servers must >> replay the same messages in the same order; this means that >> if one of the replicas fails or returns a result that is not >> the same as the other, the front falls off. >> >> this means mirroring messages naïvely reduces reliability >> and performance, rather than increasing it. > > I was not thinking of mirroring. > > I was thinking of clustered or distributed systems like > CephFS, IPFS, GlusterFS etc but thinking a cleaner & > simpler design might be possible. But it is quite possible > my brainstorm/pipedream is not realistic! > Besides the trivial applications (e.g. mirroring with an out-of-band consolidation mechanism), I've been thinking more along these lines as well. Sure, for these applications multiplying would be just a basic function, and other, more interesting 9p servers would manipulate the namespaces of the different 9p servers, serving different parts of the "goal-namespace". A 9p multiplier would be just the first lego brick. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-Med52d098c5edf414b8d150ba Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-30 7:19 ` Bakul Shah 2022-05-30 8:03 ` fgergo @ 2022-05-30 8:35 ` hiro 2022-05-31 16:14 ` ron minnich 2 siblings, 0 replies; 25+ messages in thread From: hiro @ 2022-05-30 8:35 UTC (permalink / raw) To: 9fans > 9p itself is low performance but that is a separate issue. wrong ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M3261fe8a162f5dd9e1dc09c7 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-30 7:19 ` Bakul Shah 2022-05-30 8:03 ` fgergo 2022-05-30 8:35 ` hiro @ 2022-05-31 16:14 ` ron minnich 2022-05-31 18:27 ` hiro 2022-06-01 4:26 ` Bakul Shah 2 siblings, 2 replies; 25+ messages in thread From: ron minnich @ 2022-05-31 16:14 UTC (permalink / raw) To: 9fans On Mon, May 30, 2022 at 12:21 AM Bakul Shah <bakul@iitbombay.org> wrote: > 9p itself is low performance but that is a separate issue. Bakul, what are the units? It might be helpful to quantify this statement. Are you possibly conflating Plan 9 file systems being slow and 9p being slow? As Rob pointed out in 2013, "If go install is slow on Plan 9, it's because Plan 9's file system is slow (which it is and always has been)", so slowness in Plan 9 file systems is to be expected. 9p itself does have its limits, which is why Bell Labs Antwerp started an effort in 2011 to replace it, but the new work never went very far. I also know of a number of efforts in the virtualization world where 9p was discarded for performance reasons. It's hard to argue with the 100x performance improvement that comes with virtiofs, for example. Gvisor is replacing 9p: https://github.com/google/gvisor/milestone/6. Although, in the latter case, I would argue the problem is more with Linux limitations than 9p limitations -- linux can't seem to walk more than one pathname component at a time, for example, since it has the old school namei loop. But I'm wondering if you have a measurement with numbers. For rough order of magnitude, HPC file systems can deliver 10 Gbytes/ second for file reads nowadays, but getting there took 20 years of work. When we ran Plan 9 on Blue Gene, with the 6 Gbyte/second toroidal mesh connect for each node, we never came remotely close to that figure. 
------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M410e08e9297838b9bb37bb5a Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-31 16:14 ` ron minnich @ 2022-05-31 18:27 ` hiro 2022-05-31 18:35 ` ori 2022-06-01 12:00 ` ron minnich 2022-06-01 4:26 ` Bakul Shah 1 sibling, 2 replies; 25+ messages in thread From: hiro @ 2022-05-31 18:27 UTC (permalink / raw) To: 9fans so virtiofs is not using 9p any more? and with 10 million parallel requests, why shouldn't 9p be able to deliver 10GB/s ?! On 5/31/22, ron minnich <rminnich@gmail.com> wrote: > On Mon, May 30, 2022 at 12:21 AM Bakul Shah <bakul@iitbombay.org> wrote: >> 9p itself is low performance but that is a separate issue. > > Bakul, what are the units? It might be helpful to quantify this > statement. Are you possibly conflating Plan 9 file systems being slow > and 9p being slow? > > As Rob pointed out in 2013, "If go install is slow on Plan 9, it's > because Plan 9's file system is > slow (which it is and always has been)", so slowness in Plan 9 file > systems is to be expected. > > 9p itself does have its limits, which is why Bell Labs Antwerp started > an effort in 2011 to replace it, but the new work never went very far. > > I also know of a number of efforts in the virtualization world where > 9p was discarded for performance reasons. It's hard to argue with the > 100x performance improvement that comes with virtiofs, for example. > > Gvisor is replacing 9p: https://github.com/google/gvisor/milestone/6. > Although, in the latter case, I would argue the problem is more with > Linux limitations than 9p limitations -- linux can't seem to walk more > than one pathname component at a time, for example, since it has the > old school namei loop. > > But I'm wondering if you have a measurement with numbers. > > For rough order of magnitude, HPC file systems can deliver 10 Gbytes/ > second for file reads nowadays, but getting there took 20 years of > work. 
When we ran Plan 9 on Blue Gene, with the 6 Gbyte/second > toroidal mesh connect for each node, we never came remotely close to > that figure. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M650fba778076835adf9ce8df Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-31 18:27 ` hiro @ 2022-05-31 18:35 ` ori 2022-06-01 12:00 ` ron minnich 1 sibling, 0 replies; 25+ messages in thread From: ori @ 2022-05-31 18:35 UTC (permalink / raw) To: 9fans Quoth hiro <23hiro@gmail.com>: > > and with 10 million parallel requests, why shouldn't 9p be able to > deliver 10GB/s ?! the tag field is 16 bits. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M07ead5c0290adc1a3494d29c Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-31 18:27 ` hiro 2022-05-31 18:35 ` ori @ 2022-06-01 12:00 ` ron minnich 2022-06-01 14:51 ` ori ` (2 more replies) 1 sibling, 3 replies; 25+ messages in thread From: ron minnich @ 2022-06-01 12:00 UTC (permalink / raw) To: 9fans On Tue, May 31, 2022 at 11:29 AM hiro <23hiro@gmail.com> wrote: > > so virtiofs is not using 9p any more? > > and with 10 million parallel requests, why shouldn't 9p be able to > deliver 10GB/s ?! Everyone always says this. I used to say it too. 9p requires a certain degree of ordering -- as Andrey once pointed out, it's not productive to close a file, then write it. So there is a tricky ordering requirement you need to get right, due to Plan 9 being stateful. The way we use 9p in Plan 9, as a general purpose protocol for everything, like devices, requires that each Tread or Twrite occur in order, but also requires that each T be retired before the next T is issued. devmnt does this. If you don't do this, hardware can get confused (e.g. ordering of Twrite followed by Tread followed by Twrite needs to be maintained. E.g. you don't want to issue the Tread before you know the Twrite happened. E.g. pre-posting 100 Treads to /dev/mouse is not a good idea if you suddenly want to do a Twrite in the middle of it). This is why 9p starts to perform poorly in networks with high bandwidth*delay products -- if you watch the net traffic, you see each T op on fid blocked by the previous Reply (by devmnt). I never figured out a way to fix this without fixing devmnt -- by removing its general nature. But, more to the point, whether or not 9p should be able to do all these parallel requests and get high performance, nobody has yet done it. The only numbers ever reported for making high bandwidth*delay networks better were in Floren's thesis, when he added Tstream. After 20+ years of this discussion, I start wondering whether it's harder than it looks. 
ron ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M71f56ad40eb62ce87f0917e3 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-06-01 12:00 ` ron minnich @ 2022-06-01 14:51 ` ori 2022-06-01 15:31 ` hiro 2022-06-01 16:01 ` ori 2 siblings, 0 replies; 25+ messages in thread From: ori @ 2022-06-01 14:51 UTC (permalink / raw) To: 9fans Quoth ron minnich <rminnich@gmail.com>: > This is why 9p starts to perform poorly in networks with high > bandwidth*delay products -- if you watch the net traffic, you see each > T op on fid blocked by the previous Reply (by devmnt). > > I never figured out a way to fix this without fixing devmnt -- by > removing its general nature. I suspect there are 2 changes that would be needed. First, a shallow protocol change to 9p, where a 'bundle' tag is added, such that if an Rerror is returned for any message in the same bundle, the rest of the bundle is not executed. Second, the userspace API would need to change so that reads and writes can return without waiting for a result, this is harder, and I haven't come up with anything satisfying. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M087325fe3d809fc254b1b283 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-06-01 12:00 ` ron minnich 2022-06-01 14:51 ` ori @ 2022-06-01 15:31 ` hiro 2022-06-01 15:39 ` hiro 2022-06-01 16:01 ` ori 2 siblings, 1 reply; 25+ messages in thread From: hiro @ 2022-06-01 15:31 UTC (permalink / raw) To: 9fans I don't think the reason nobody is doing this is that it's difficult per se. Fcp also achieves parallelism without any changes to 9p. And posix fs also share some of our statefulness. A file system can have offsets, readahead can help. Other synthetic FS need different tricks, but we can exchange some guarantees that are only needed in seekable files for an optimization that shall only be done on pipes and streaming access. There's some trivial heuristic solutions for this but they are not generic naturally. If one were to do this right, after a few increments one will see that bandwidth limits are hit, which is a new problem that is much harder to solve and impossible without even more heuristics classifications possibly applied by a distributed 9p scheduler (dynamic multi hop network congestion awareness anybody?) On 6/1/22, ron minnich <rminnich@gmail.com> wrote: > On Tue, May 31, 2022 at 11:29 AM hiro <23hiro@gmail.com> wrote: >> >> so virtiofs is not using 9p any more? >> >> and with 10 million parallel requests, why shouldn't 9p be able to >> deliver 10GB/s ?! > > Everyone always says this. I used to say it too. > > 9p requires a certain degree of ordering -- as Andrey once pointed > out, it's not productive to close a file, then write it. So there is a > tricky ordering requirement you need to get right, due to Plan 9 being > stateful. > > The way we use 9p in Plan 9, as a general purpose protocol for > everything, like devices, requires that each Tread or Twrite occur in > order, but also requires that each T be retired before the next T is > issued. devmnt does this. If you don't do this, hardware can get > confused (e.g. 
ordering of Twrite followed by Tread followed by Twrite > needs to be maintained. E.g. you don't want to issue the Tread before > you know the Twrite happened. E.g. pre-posting 100 Treads to > /dev/mouse is not a good idea if you suddenly want to do a Twrite in > the middle of it). > > This is why 9p starts to perform poorly in networks with high > bandwidth*delay products -- if you watch the net traffic, you see each > T op on fid blocked by the previous Reply (by devmnt). > > I never figured out a way to fix this without fixing devmnt -- by > removing its general nature. > > But, more to the point, whether or not 9p should be able to do all > these parallel requests and get high performance, nobody has yet done > it. The only numbers ever reported for making high bandhwidth*delay > networks better were in Floren's thesis, when he added Tstream. > > After 20+ years of this discussion, I start to wondering whether it's > harder than it looks. > > ron ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M2be93c6a04bb2586cba3b797 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-06-01 15:31 ` hiro @ 2022-06-01 15:39 ` hiro 0 siblings, 0 replies; 25+ messages in thread From: hiro @ 2022-06-01 15:39 UTC (permalink / raw) To: 9fans In case this is not immediately clear: theoretically preventable 1rtt minimum delays are much less bad than the practically unbounded maximum delays in congested networks. Put in another way: making some few things fast is much more easy than making sure that everything else doesn't get infinitely slow as a result to this. Right now huge streams don't get huge unfair advantages unless the rtt is very small or the parallelism very high On 6/1/22, hiro <23hiro@gmail.com> wrote: > I don't think the reason nobody is doing this is that it's difficult per > se. > > Fcp also achieves parallelism without any changes to 9p. > > And posix fs also share some of our statefulness. > > A file system can have offsets, readahead can help. > > Other synthetic FS need different tricks, but we can exchange some > guarantees that are only needed in seekable files for an optimization > that shall only be done on pipes and streaming access. > > There's some trivial heuristic solutions for this but they are not > generic naturally. > > If one were to do this right, after a few increments one will see that > bandwidth limits are hit, which is a new problem that is much harder > to solve and impossible without even more heuristics classifications > possibly applied by a distributed 9p scheduler (dynamic multi hop > network congestion awareness anybody?) > > On 6/1/22, ron minnich <rminnich@gmail.com> wrote: >> On Tue, May 31, 2022 at 11:29 AM hiro <23hiro@gmail.com> wrote: >>> >>> so virtiofs is not using 9p any more? >>> >>> and with 10 million parallel requests, why shouldn't 9p be able to >>> deliver 10GB/s ?! >> >> Everyone always says this. I used to say it too. 
>> >> 9p requires a certain degree of ordering -- as Andrey once pointed >> out, it's not productive to close a file, then write it. So there is a >> tricky ordering requirement you need to get right, due to Plan 9 being >> stateful. >> >> The way we use 9p in Plan 9, as a general purpose protocol for >> everything, like devices, requires that each Tread or Twrite occur in >> order, but also requires that each T be retired before the next T is >> issued. devmnt does this. If you don't do this, hardware can get >> confused (e.g. ordering of Twrite followed by Tread followed by Twrite >> needs to be maintained. E.g. you don't want to issue the Tread before >> you know the Twrite happened. E.g. pre-posting 100 Treads to >> /dev/mouse is not a good idea if you suddenly want to do a Twrite in >> the middle of it). >> >> This is why 9p starts to perform poorly in networks with high >> bandwidth*delay products -- if you watch the net traffic, you see each >> T op on fid blocked by the previous Reply (by devmnt). >> >> I never figured out a way to fix this without fixing devmnt -- by >> removing its general nature. >> >> But, more to the point, whether or not 9p should be able to do all >> these parallel requests and get high performance, nobody has yet done >> it. The only numbers ever reported for making high bandhwidth*delay >> networks better were in Floren's thesis, when he added Tstream. >> >> After 20+ years of this discussion, I start to wondering whether it's >> harder than it looks. >> >> ron ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M099a56feb7c401cc1d0b3ed6 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-06-01 12:00 ` ron minnich 2022-06-01 14:51 ` ori 2022-06-01 15:31 ` hiro @ 2022-06-01 16:01 ` ori 2 siblings, 0 replies; 25+ messages in thread From: ori @ 2022-06-01 16:01 UTC (permalink / raw) To: 9fans Quoth ron minnich <rminnich@gmail.com>: > This is why 9p starts to perform poorly in networks with high > bandwidth*delay products -- if you watch the net traffic, you see each > T op on fid blocked by the previous Reply (by devmnt). > > I never figured out a way to fix this without fixing devmnt -- by > removing its general nature. I suspect there are two changes that would be needed. First, a shallow protocol change to 9p, where a 'bundle' tag is added, such that if an Rerror is returned for any message in the same bundle, the rest of the bundle is not executed. Second, the userspace API would need to change so that reads and writes can return without waiting for a result; this is harder, and I haven't come up with anything satisfying. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M5fd4417f3e9ea39b38e54f35 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
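[Editor's note: the proposed bundle tag is not part of any real 9p implementation or wire format. As a purely hypothetical sketch of the semantics, with the callable convention and message names invented for illustration:]

```python
def run_bundle(ops):
    """Execute a bundle of operations in order; an Rerror aborts the rest.

    Each op is a callable returning an (rtype, payload) tuple. This models
    only the proposed abort-on-error semantics; there is no wire format here.
    """
    replies = []
    for op in ops:
        reply = op()
        replies.append(reply)
        if reply[0] == "Rerror":
            break  # remaining messages in the bundle are not executed
    return replies

# walk succeeds, open fails, so the read is never attempted
bundle = [
    lambda: ("Rwalk", "qid"),
    lambda: ("Rerror", "permission denied"),
    lambda: ("Rread", b"data"),
]
replies = run_bundle(bundle)
```

The attraction of such a change is that a client could ship walk+open+read as one bundle and pay one round trip instead of three, without losing the ordering guarantees the earlier messages in the thread describe.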
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-31 16:14 ` ron minnich 2022-05-31 18:27 ` hiro @ 2022-06-01 4:26 ` Bakul Shah 2022-06-01 7:25 ` hiro 2022-06-01 15:55 ` Jacob Moody 1 sibling, 2 replies; 25+ messages in thread From: Bakul Shah @ 2022-06-01 4:26 UTC (permalink / raw) To: 9fans On May 31, 2022, at 9:14 AM, ron minnich <rminnich@gmail.com> wrote: > > On Mon, May 30, 2022 at 12:21 AM Bakul Shah <bakul@iitbombay.org> wrote: >> 9p itself is low performance but that is a separate issue. > > Bakul, what are the units? It might be helpful to quantify this > statement. Are you possibly conflating Plan 9 file systems being slow > and 9p being slow? I did a quick test: From a 9front VM to another machine I get about 11.7 MBps (cached. The first time around it was close to 7.3 MBps). From an Ubuntu VM to another machine I get about 111 MBps (cached. The first time around it was close to 62 MBps). Both VMs run on the same host. Test copies to the same target machine. I used 9p read for 9front, scp for Linux, copy to /dev/null. The target machine is freebsd. The VMs talk to the target over a 1Gbps ethernet (so 111 MBps is the wirespeed limit). 9front uses hjfs. Ubuntu uses ext4. On the host I give a file as the guest "disk", using 'nvme' type device on bhyve to each VM. Both 9front and ubuntu are 64 bit kernels. This is a very rough measurement as there are many differences between the systems. The filesystem overhead is clearly an issue but 10 times worse? ----- Looking at the protocol: For read/write 9p uses 4 bytes for size so in theory you can send very large packets but then you have to buffer up a lot of data. Ideally you want streaming (some sort of sliding window). Maybe you can use the tag field to do something more intelligent. Not sure any implementations do so. You also have head of line blocking if you can have only one TCP connection to a server. 
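[Editor's note: the sliding-window suggestion above changes the arithmetic directly. With one request in flight a fid moves msize bytes per RTT; with a window of outstanding requests it moves roughly window * msize per RTT, until link bandwidth becomes the binding limit. A rough model, ignoring per-message overhead, with illustrative numbers:]

```python
def windowed_throughput(msize_bytes, rtt_seconds, window):
    """Rough model: `window` outstanding requests each deliver msize per RTT."""
    return window * msize_bytes / rtt_seconds

rtt = 0.05  # 50 ms WAN round trip, illustrative
serial = windowed_throughput(8192, rtt, 1)    # ~164 KB/s
piped  = windowed_throughput(8192, rtt, 64)   # ~10.5 MB/s
```

The model also shows why the tag field is a tempting place to hang a window: tags already let multiple T-messages be outstanding on one connection, so the protocol change would be in client and server behavior rather than in the message format.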
> As Rob pointed out in 2013, "If go install is slow on Plan 9, it's > because Plan 9's file system is > slow (which it is and always has been)", so slowness in Plan 9 file > systems is to be expected. > > 9p itself does have its limits, which is why Bell Labs Antwerp started > an effort in 2011 to replace it, but the new work never went very far. > > I also know of a number of efforts in the virtualization world where > 9p was discarded for performance reasons. It's hard to argue with the > 100x performance improvement that comes with virtiofs, for example. Why is virtiofs 100x faster? Just a lot of hard work and tuning? Maybe that is a good place to look to learn what needs to change (in case someone wants to replace 9p with something else)? > Gvisor is replacing 9p: https://github.com/google/gvisor/milestone/6. > Although, in the latter case, I would argue the problem is more with > Linux limitations than 9p limitations -- linux can't seem to walk more > than one pathname component at a time, for example, since it has the > old school namei loop. > > But I'm wondering if you have a measurement with numbers. > > For rough order of magnitude, HPC file systems can deliver 10 Gbytes/ > second for file reads nowadays, but getting there took 20 years of > work. When we ran Plan 9 on Blue Gene, with the 6 Gbyte/second > toroidal mesh connect for each node, we never came remotely close to > that figure. Given that experience, why do you need "numbers"? :-) Running 10Gbps links even @ home is quite doable now. With TCP you can achieve decent performance if not quite wirespeed. NVMe "disks" are pretty damn fast - you can easily get 2-4 GBps. But I think at the remote filesystem protocol level you'd have to optimize multiple things in order to get close to wirespeed performance. Minimize copying, increase concurrency, reduce overhead in frequently used common path code, reduce user/kernel crossings etc. 
I think rdma and mmap will probably get used a lot too (obviously on non-plan9 OSes!). Maybe if you pushed 9p knowledge down to a smart NIC, it could map a tag value to the compute location where the data needs to go. But all this is just handwaving. Without a real project and funding it is hard to get sufficiently motivated to do more. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-Mf37a0689afc5c54c9aba65d7 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-06-01 4:26 ` Bakul Shah @ 2022-06-01 7:25 ` hiro 2022-06-01 15:55 ` Jacob Moody 1 sibling, 0 replies; 25+ messages in thread From: hiro @ 2022-06-01 7:25 UTC (permalink / raw) To: 9fans And fcp? On 6/1/22, Bakul Shah <bakul@iitbombay.org> wrote: > On May 31, 2022, at 9:14 AM, ron minnich <rminnich@gmail.com> wrote: >> >> On Mon, May 30, 2022 at 12:21 AM Bakul Shah <bakul@iitbombay.org> wrote: >>> 9p itself is low performance but that is a separate issue. >> >> Bakul, what are the units? It might be helpful to quantify this >> statement. Are you possibly conflating Plan 9 file systems being slow >> and 9p being slow? > > I did a quick test: > > From a 9front VM to another machine I get about 11.7 MBps > caching. The first time around it was close to 7.3 MBps). > > From an Ubuntu VM to another machine I get about 111 MBps > (cached. The first time around it was close to 62 MBps). > > Both VMs run on the same host. Test copies to the same target > machine. I used 9p read for 9front, scp for Linux, copy to > /dev/null. The target machine is freebsd. The VMs talk to > the target over a 1Gbps ethernet (so 111 MBps is the wirespeed > limit). > > 9front uses hjfs. Ubuntu uses ext4. On the host I give a file > as the guest "disk", using 'nvme' type device on bhyve to each > VM. Both 9front and ubuntu are 64 bit kernels. > > This is a very rough measurement as there are many differences > between the systems. The filesystem overhead is clearly an issue > but 10 times worse? > ----- > Looking at the protocol: > > For read/write 9p uses 4 byte for size so in theory you can send > very large packets but then you have to buffer up a lot of data. > Ideally you want streaming (some sort of sliding window). May be > you can use the tag field to do something more intelligent. Not > sure any implementations do so. You also have head of line blocking > if you can have only one TCP connection to a server. 
> >> As Rob pointed out in 2013, "If go install is slow on Plan 9, it's >> because Plan 9's file system is >> slow (which it is and always has been)", so slowness in Plan 9 file >> systems is to be expected. >> >> 9p itself does have its limits, which is why Bell Labs Antwerp started >> an effort in 2011 to replace it, but the new work never went very far. >> >> I also know of a number of efforts in the virtualization world where >> 9p was discarded for performance reasons. It's hard to argue with the >> 100x performance improvement that comes with virtiofs, for example. > > > Why is virtiofs 100x faster? Just lot of hardwork and tuning? > May be that is good place to look to learn what needs to change > (in case someone wants to replace 9p with something else)? > >> Gvisor is replacing 9p: https://github.com/google/gvisor/milestone/6. >> Although, in the latter case, I would argue the problem is more with >> Linux limitations than 9p limitations -- linux can't seem to walk more >> than one pathname component at a time, for example, since it has the >> old school namei loop. >> >> But I'm wondering if you have a measurement with numbers. >> >> For rough order of magnitude, HPC file systems can deliver 10 Gbytes/ >> second for file reads nowadays, but getting there took 20 years of >> work. When we ran Plan 9 on Blue Gene, with the 6 Gbyte/second >> toroidal mesh connect for each node, we never came remotely close to >> that figure. > > Given that experience, why do you need "numbers"? :-) > > Running 10Gbps links even @ home is quite doable now. With TCP you > can achieve decent performance if not quite wirespeed. NVMe "disks" > are pretty damn fast - you can easily get 2-4 GBps. But I think at > remote filesystem protocol level you'd have to optimize multiple > things in order to get close to wirespeed performance. Minimize > copying, increase concurrency, reduce overhead in frequently used > common path code, reduce user/kernel crossings etc. 
I think rdma and > mmap will probably get used a lot too (obviously on non-plan9 OSes!). > May be if you pushed 9p knowledge down to a smart NIC, it can map a > tag value to compute location where the data needs to go. > > But all this is just handwaving. Without a real project and funding > it is hard to get sufficiently motivated to do more. > ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M9900ebe3ebf76b5d4d4426bd Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-06-01 4:26 ` Bakul Shah 2022-06-01 7:25 ` hiro @ 2022-06-01 15:55 ` Jacob Moody 2022-06-01 17:56 ` Steve Simon 1 sibling, 1 reply; 25+ messages in thread From: Jacob Moody @ 2022-06-01 15:55 UTC (permalink / raw) To: 9fans hjfs is not exactly known for it's speed[0]. Running a cwfs without a worm[1] is likely a more interesting comparison. I also would recommend using kvik's clone[2] for copying in parallel. Would be curious how that stacks up. Thanks, moody [0] http://fqa.9front.org/fqa4.html#4.3.6 [1] http://fqa.9front.org/fqa4.html#4.3.6.1 [2] https://git.sr.ht/~kvik/clone ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M0f93091b620bfd87f9d0f56a Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-06-01 15:55 ` Jacob Moody @ 2022-06-01 17:56 ` Steve Simon 2022-06-01 22:29 ` hiro 0 siblings, 1 reply; 25+ messages in thread From: Steve Simon @ 2022-06-01 17:56 UTC (permalink / raw) To: 9fans for performance testing why not copy from ramfs on one machine to ramfs on another? the suggestion from a 9con passim was to have fossil/cwfs/hjfs etc add a Qid type flag to files indicating they are from backing store (QTSTABLE ?)and thus may be copied in parallel. devices and synthetic would not normally have this flag forcing the read or write be sequential. you could even make the file server set this flag only on files that have not changed in X days, and thus the contents are more likely to be stable (idea from the SRC package from DEC) perhaps i missed something but i always thought the idea had legs. -Steve > On 1 Jun 2022, at 4:56 pm, Jacob Moody <moody@posixcafe.org> wrote: > > hjfs is not exactly known for it's speed[0]. Running a cwfs > without a worm[1] is likely a more interesting comparison. > > I also would recommend using kvik's clone[2] for copying > in parallel. > > Would be curious how that stacks up. > > Thanks, > moody > > [0] http://fqa.9front.org/fqa4.html#4.3.6 > [1] http://fqa.9front.org/fqa4.html#4.3.6.1 > [2] https://git.sr.ht/~kvik/clone ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M45fdc663ef275e87c9b77f37 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-06-01 17:56 ` Steve Simon @ 2022-06-01 22:29 ` hiro 0 siblings, 0 replies; 25+ messages in thread From: hiro @ 2022-06-01 22:29 UTC (permalink / raw) To: 9fans On 6/1/22, Steve Simon <steve@quintile.net> wrote: > for performance testing why not copy from ramfs on one machine to ramfs on > another? ramfs is single-process and thus quite slow. > the suggestion from a 9con passim was to have fossil/cwfs/hjfs etc add a Qid > type flag to files indicating they are from backing store (QTSTABLE ?)and > thus may be copied in parallel. devices and synthetic would not normally > have this flag forcing the read or write be sequential. yeah, that's what my comment about readahead is based on. this work has been already done at least for cwfs. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-M4ded96baa1173c231b64685d Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [9fans] 9p server to multiply 9p messages? 2022-05-30 4:59 ` ori 2022-05-30 7:19 ` Bakul Shah @ 2022-05-30 8:33 ` hiro 1 sibling, 0 replies; 25+ messages in thread From: hiro @ 2022-05-30 8:33 UTC (permalink / raw) To: 9fans > the challenge is that 9p is stateful, so all servers must > replay the same messages in the same order no, not all servers. 9p state could be faked, that's not the main problem here. the main problem is the higher layer application logic per server. this is both good and bad. e.g. some very few servers are stateless in this sense. a lot of servers are stateful in a way that it becomes near-impossible to recreate the state somewhere else. there's also the "most stateful" of servers, the fileserver, which we *can* reconnect to trivially in some edge-cases, because disk files always support seeking, if we sync the open/seeked files and positions. by ignoring those layers of abstraction on top of 9p it looks like 9p can provide for some kind of easy magic solutions. but that's just because 9p is simple and doesn't do much at all, not much magic either. impossible. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T769854fafd2b7d35-Me26b23970f5aba92282f4652 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription ^ permalink raw reply [flat|nested] 25+ messages in thread
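[Editor's note: the thread's original question — fan each T-message out to several servers and reply on the first response, or on all of them — can be sketched independently of the state-replay problems described above. A toy model of the two reply strategies follows; the server callables and message tuples are stand-ins, with no real 9p marshaling:]

```python
import concurrent.futures

def fan_out(servers, tmsg, wait_all=False):
    """Send tmsg to every server concurrently.

    wait_all=False: reply as soon as any server responds (strategy 1).
    wait_all=True:  collect every reply before responding (strategy 2).
    """
    with concurrent.futures.ThreadPoolExecutor(len(servers)) as ex:
        futures = [ex.submit(srv, tmsg) for srv in servers]
        if wait_all:
            return [f.result() for f in futures]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

servers = [lambda t: ("Rread", b"a"), lambda t: ("Rread", b"b")]
first = fan_out(servers, ("Tread",))                 # whichever replies first
both = fan_out(servers, ("Tread",), wait_all=True)   # both replies, in order
```

As the messages above point out, the hard part is not this dispatch loop but keeping fids, tags, and per-server state consistent when the downstream servers are stateful; the sketch sidesteps all of that.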
end of thread, other threads:[~2022-06-02 3:08 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-05-28 16:02 [9fans] 9p server to multiply 9p messages? fgergo 2022-05-28 18:43 ` Skip Tavakkolian 2022-05-28 19:21 ` ron minnich 2022-05-29 10:33 ` fgergo 2022-05-29 10:23 ` fgergo 2022-05-29 11:41 ` fgergo 2022-05-29 23:16 ` Bakul Shah 2022-05-30 4:59 ` ori 2022-05-30 7:19 ` Bakul Shah 2022-05-30 8:03 ` fgergo 2022-05-30 8:35 ` hiro 2022-05-31 16:14 ` ron minnich 2022-05-31 18:27 ` hiro 2022-05-31 18:35 ` ori 2022-06-01 12:00 ` ron minnich 2022-06-01 14:51 ` ori 2022-06-01 15:31 ` hiro 2022-06-01 15:39 ` hiro 2022-06-01 16:01 ` ori 2022-06-01 4:26 ` Bakul Shah 2022-06-01 7:25 ` hiro 2022-06-01 15:55 ` Jacob Moody 2022-06-01 17:56 ` Steve Simon 2022-06-01 22:29 ` hiro 2022-05-30 8:33 ` hiro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).