From: Roman V Shaposhnik
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Date: Mon, 3 Aug 2009 18:32:02 -0700
Subject: Re: [9fans] ceph

On Sat, 2009-08-01 at 08:47 -0700, ron minnich wrote:
> > What are their requirements as far as POSIX is concerned?
>
> 10,000 machines, working on a single app, must have access to a common
> file store with full POSIX semantics, and it all has to work as though
> it were one machine (their desktop, of course).
>
> This gets messy. It turns into an exercise in attempting to manage a
> competing set of race conditions. It's like tuning a multi-carbureted
> engine from years gone by, assuming we ever had an engine with 10,000
> cylinders.

Well, with Linux you at least have the benefit of gazillions of FS
clients being available, either natively or via FUSE. With Solaris...
oh well...

> > How much storage are we talking about?
>
> In round numbers, for the small clusters, usually a couple hundred T.
> For anything else, more.

Is all of this storage attached to a very small number of I/O nodes, or
is it spread evenly across the cluster? In fact, I'm interested in both
scenarios, so here come two questions:

  1. Is anybody out there successfully managing that much storage
     (let's say ~100T) via something like a humongous fossil
     installation (or kenfs, for that matter)?

  2. Is anybody out there successfully managing that much storage when
     it is also spread across the nodes? And if so, what are the best
     practices for keeping the client from worrying about where the
     storage actually comes from (IOW, any kind of proxying of I/O,
     etc.)?

I'm trying to see what life after NFSv4 or AFS might look like for
clients still clinging to the old ways of doing things, yet trying to
cooperatively use hundreds of T of storage.

> > I'd be interested in discussing some aspects of what you're trying
> > to accomplish with 9P for the HPC guys.
>
> The request: for each of the (lots of) compute nodes, have them mount
> over 9P to, say, 100x fewer I/O nodes, each of those running Lustre.

Sorry for being dense, but what exactly is going to be accomplished by
proxying I/O in such a way?

Thanks,
Roman.
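
P.S. Just so we're picturing the same thing on the compute-node side: I
assume something like the Linux in-kernel 9P client (v9fs) doing the
mount against its assigned I/O node, roughly as sketched below. The I/O
node address 10.1.1.1, port 564, and mount point /mnt/io are
placeholders of mine, not anything from your setup.

/*
 * Sketch only: the compute-node end of the 9P mount on Linux, using
 * the in-kernel v9fs client via mount(2).  Equivalent to:
 *   mount -t 9p -o trans=tcp,port=564,msize=65536 10.1.1.1 /mnt/io
 * Needs root to actually succeed; address and paths are placeholders.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/mount.h>

int
main(void)
{
	if (mount("10.1.1.1", "/mnt/io", "9p", 0,
	          "trans=tcp,port=564,msize=65536") < 0) {
		fprintf(stderr, "9p mount of 10.1.1.1 failed: %s\n",
		        strerror(errno));
		return 1;
	}
	printf("mounted 10.1.1.1 on /mnt/io over 9p\n");
	return 0;
}

Each compute node would do this against one of the 100x-fewer I/O
nodes, which then serves its Lustre-backed tree over 9P. That, at
least, is my reading of the request.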