From: Roman V Shaposhnik
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Date: Mon, 3 Aug 2009 18:32:02 -0700
Subject: Re: [9fans] ceph

On Sat, 2009-08-01 at 08:47 -0700, ron minnich wrote:
> > What are their requirements as far as POSIX is concerned?
>
> 10,000 machines, working on a single app, must have access to a common
> file store with full POSIX semantics, and it all has to work as though
> it were one machine (their desktop, of course).
>
> This gets messy. It turns into an exercise in attempting to manage a
> competing set of race conditions. It's like tuning a multi-carbureted
> engine from years gone by, assuming we ever had an engine with 10,000
> cylinders.

Well, with Linux you at least have the benefit of gazillions of FS
clients being available, either natively or via FUSE. With Solaris...
oh well...

> > How much storage are we talking about?
>
> In round numbers, for the small clusters, usually a couple hundred T.
> For anything else, more.

Is all of this storage attached to a very small number of I/O nodes, or
is it spread evenly across the cluster? In fact, I'm interested in both
scenarios, so here come two questions:

  1. Is anybody out there successfully managing that much storage
     (let's say ~100T) via something like a humongous fossil
     installation (or kenfs, for that matter)?

  2. Is anybody out there successfully managing that much storage when
     it is also spread across the nodes? And if so, what are the best
     practices for keeping the client from worrying about where the
     storage actually comes from (IOW, any kind of proxying of I/O,
     etc.)?

I'm trying to see what life after NFSv4 or AFS might look like for
clients still clinging to the old ways of doing things, yet trying to
cooperatively use hundreds of T of storage.

> > I'd be interested in discussing some aspects of what you're trying
> > to accomplish with 9P for the HPC guys.
>
> The request: for each of the (lots of) compute nodes, have them mount
> over 9P to, say, 100x fewer I/O nodes, each of those running Lustre.

Sorry for being dense, but what exactly is going to be accomplished by
proxying I/O in such a way?

Thanks,
Roman.
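
P.S. Just so we're picturing the same thing on the compute-node side: I
assume something like the Linux in-kernel 9P client (v9fs) doing the
mount against its assigned I/O node, roughly as sketched below. The I/O
node address 10.1.1.1, port 564, and mount point /mnt/io are
placeholders of mine, not anything from your setup.

/*
 * Sketch only: the compute-node end of the 9P mount on Linux, using
 * the in-kernel v9fs client via mount(2).  Equivalent to:
 *   mount -t 9p -o trans=tcp,port=564,msize=65536 10.1.1.1 /mnt/io
 * Needs root to actually succeed; address and paths are placeholders.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/mount.h>

int
main(void)
{
	if (mount("10.1.1.1", "/mnt/io", "9p", 0,
	          "trans=tcp,port=564,msize=65536") < 0) {
		fprintf(stderr, "9p mount of 10.1.1.1 failed: %s\n",
		        strerror(errno));
		return 1;
	}
	printf("mounted 10.1.1.1 on /mnt/io over 9p\n");
	return 0;
}

Each compute node would do this against one of the 100x-fewer I/O
nodes, which then serves its Lustre-backed tree over 9P. That, at
least, is my reading of the request.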