From: erik quanstrom
Date: Sun, 24 Apr 2011 14:58:30 -0400
To: 9fans@9fans.net
Subject: Re: [9fans] Q: moving directories? hard links?

> > since inside a disk drive, there is also striping across platters and
> > wierd remapping games (and then there's flash), and i don't see
> > any justification for calling this a "different fs layout". you wouldn't
> > say you changed datastructures if you use 8x1gb dimms instead of
> > 4x2gb, would you?
>
> I am not getting through....
>
> Check out some papers on http://www.cs.cmu.edu/~garth/
> See http://www.pdl.cmu.edu/PDL-FTP/NASD/asplos98.pdf for instance.

no, you're getting through.  i just don't accept the zfs-inspired theory
that the filesystem must do these things, or the other theory that one
gigantic fs, with supposedly global visibility through redundancy layers,
etc., is the only game in town.  since two theories are better than one,
this often becomes the one big ball of goo grand unification theory of
storage.

it could be that this has a lot of advantages, but i can't overcome the
gut (sorry) reaction that big complicated things are just too hard to get
right in practice.  maybe one can overcome this with arbitrarily many
programmers, but if i thought that were the way to go, i wouldn't bother
with 9fans.

it seems to me that there are layers we really have no visibility into
already.  disk drives give one lbas these days, and it's important to
remember that the "l" stands for logical; that is, we cannot infer with
certainty where a block is stored based on its lba.  thus an elevator
algorithm might do exactly the wrong thing, and the difference can be
astounding, like 250ms.  and if there's trouble reading a sector, that
can slow things by a second or so.  further, we often deal with
virtualized things like drive arrays, drive caches, flash caches, ssds,
or lvm-like things that make it even harder for a fs to out-guess the
storage.

i suppose there are two ways to go with this situation: remove all the
layers between you and the storage controller and hope you can out-guess
what's left, or give up, declare storage opaque, and leave it to the
storage guys.  i'm taking the second position.  it allows for innovation
in the layers below the fs without changing the fs.

it's surprising to me that venti hasn't opened more eyes to what's
possible with block storage.  unfortunately, venti is optimized for
deduplication, not performance, and except for backup that's the opposite
of what one wants.

just as a simple counterexample to your 10.5pb example, consider a large
vmware install.  you may have 1000s of vmfs "files" layered on your san,
but file i/o does not couple them; they're all independent.  i don't see
how a large fs would help at all.

- erik
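
a minimal sketch of the elevator point above, in c; the request list, the
remap table, and the phys field are invented for illustration (a real
host never sees where the firmware put a block):

	/* an elevator scheduler sorts pending i/o by lba, but "l" means
	 * logical: the firmware may have remapped a block anywhere, so the
	 * "optimized" sweep can still seek all over the disk. */
	#include <stdio.h>
	#include <stdlib.h>

	enum { Nreq = 4 };

	typedef struct Req Req;
	struct Req {
		unsigned long long lba;		/* what the fs sees */
		unsigned long long phys;	/* where the firmware put it (hidden) */
	};

	static int
	bylba(const void *a, const void *b)
	{
		const Req *x = a, *y = b;
		return (x->lba > y->lba) - (x->lba < y->lba);
	}

	int
	main(void)
	{
		/* invented example: lba 20 was remapped to a spare region far away */
		Req q[Nreq] = {
			{900, 910},
			{10, 11},
			{500, 505},
			{20, 800000},
		};
		int i;

		qsort(q, Nreq, sizeof q[0], bylba);	/* the elevator's sorted sweep */
		for(i = 0; i < Nreq; i++)
			printf("lba %6llu -> physical %6llu\n", q[i].lba, q[i].phys);
		/* the sweep 10, 20, 500, 900 actually visits 11, 800000, 505, 910:
		 * a full-stroke seek in the middle of a supposedly sorted pass. */
		return 0;
	}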