9fans - fans of the OS Plan 9 from Bell Labs
From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: Re: [9fans] Q: moving directories? hard links?
Date: Sun, 24 Apr 2011 14:58:30 -0400
Message-ID: <104fd12b2053326254839134b6464e6c@ladd.quanstro.net>
In-Reply-To: <20110422174708.26B5CB827@mail.bitblocks.com>

> > since inside a disk drive, there is also striping across platters and
> > weird remapping games (and then there's flash), and i don't see
> > any justification for calling this a "different fs layout".  you wouldn't
> > say you changed data structures if you use 8x1gb dimms instead of
> > 4x2gb, would you?
>
> I am not getting through....
>
> Check out some papers on http://www.cs.cmu.edu/~garth/
> See http://www.pdl.cmu.edu/PDL-FTP/NASD/asplos98.pdf for instance.

no, you're getting through.  i just don't accept the zfs-inspired theory
that the filesystem must do these things, nor the other theory that one
gigantic fs with supposedly global visibility through redundancy layers,
etc., is the only game in town.  since two theories are better than one,
this often becomes the one big ball of goo grand unification theory of
storage.  it could be that this has a lot of advantages, but i can't
overcome the gut (sorry) reaction that big complicated things are just
too hard to get right in practice.  maybe one could overcome this with
arbitrarily many programmers.  but if i thought that were the way to go,
i wouldn't bother with 9fans.

it seems to me that there are layers we really have no visibility into
already.  disk drives give one lbas these days.  it's important to
remember that "l" stands for logical; that is, we cannot infer with
certainty where a block is physically stored based on its lba.  thus an
elevator algorithm might do exactly the wrong thing, and the difference
can be astounding, like 250ms.  and if there's trouble reading a sector,
this can slow things by a second or so.  further, we often deal with
virtualized layers like drive arrays, drive caches, flash caches, ssds,
or lvm-like volume managers that make it even harder for a fs to
out-guess the storage.
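
to make the elevator point concrete, here's a minimal sketch (portable
c, made-up names, not real driver code) of a one-way scan: sort the
pending requests by lba and sweep upward from the current head
position.  the whole optimization rests on the assumption that
ascending lbas are physically adjacent, which is exactly the assumption
remapping breaks.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Req Req;
struct Req {
	unsigned long long lba;	/* logical block address */
};

static int
lbacmp(const void *a, const void *b)
{
	unsigned long long x, y;

	x = ((const Req*)a)->lba;
	y = ((const Req*)b)->lba;
	return (x > y) - (x < y);
}

/* reorder q[0..n-1] into one ascending sweep starting at head:
 * first every request at or beyond head, then wrap to the low ones. */
void
elevator(Req *q, int n, unsigned long long head)
{
	Req *tmp;
	int i, j;

	qsort(q, n, sizeof(Req), lbacmp);
	tmp = malloc(n * sizeof(Req));
	j = 0;
	for(i = 0; i < n; i++)
		if(q[i].lba >= head)
			tmp[j++] = q[i];
	for(i = 0; i < n; i++)
		if(q[i].lba < head)
			tmp[j++] = q[i];
	memcpy(q, tmp, n * sizeof(Req));
	free(tmp);
}

int
main(void)
{
	Req q[] = {{900}, {12}, {7000}, {400}, {350}};
	int i, n;

	n = sizeof q / sizeof q[0];
	elevator(q, n, 380);
	for(i = 0; i < n; i++)
		printf("%llu ", q[i].lba);	/* 400 900 7000 12 350 */
	printf("\n");
	return 0;
}

if the drive has quietly remapped those blocks, that carefully sorted
order buys nothing, and can be exactly wrong.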

i suppose there are two ways to go with this situation: remove all the
layers between you and the storage controller and hope that you can
out-guess what's left, or give up, declare storage opaque, and leave it
to the storage guys.  i'm taking the second position.  it allows for
innovation in layers below the fs without changing the fs.
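
to pin down what "opaque" means here, a minimal sketch (the Store name
and fields are hypothetical, not an existing plan 9 interface): the fs
programs against a flat array of logical blocks and nothing else, so a
raw disk, an array, an ssd, or a venti can sit behind it without the fs
changing.

/* hypothetical opaque block store: the fs sees only this */
typedef struct Store Store;
struct Store {
	void	*priv;		/* backend state: disk, array, ssd, ... */
	int	blksize;	/* bytes per block */
	int	(*read)(Store*, unsigned long long lba, void *buf, int nblk);
	int	(*write)(Store*, unsigned long long lba, void *buf, int nblk);
};

swap the function pointers and the fs is none the wiser; that's the
whole point.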

it's surprising to me that venti hasn't opened more eyes to what's
possible with block storage.
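
the core trick is just content addressing: a block's address (its
score) is a hash of its contents, so writing the same bytes twice
stores them once.  a toy sketch of the write side (real venti uses
sha-1 scores and an on-disk index; the fnv hash and in-memory table
here are stand-ins):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { NBUCKET = 1024 };

typedef struct Block Block;
struct Block {
	unsigned long long score;
	char	*data;
	int	len;
	Block	*next;
};

static Block *tab[NBUCKET];

/* 64-bit fnv-1a, standing in for venti's sha-1 score */
static unsigned long long
score(char *p, int n)
{
	unsigned long long h;

	h = 14695981039346656037ULL;
	while(n-- > 0)
		h = (h ^ (unsigned char)*p++) * 1099511628211ULL;
	return h;
}

/* store a block, return its score; writing a duplicate is a no-op */
unsigned long long
ventiwrite(char *p, int n)
{
	unsigned long long s;
	Block *b;

	s = score(p, n);
	for(b = tab[s % NBUCKET]; b != NULL; b = b->next)
		if(b->score == s && b->len == n && memcmp(b->data, p, n) == 0)
			return s;	/* seen these bytes before: dedup */
	b = malloc(sizeof *b);
	b->score = s;
	b->len = n;
	b->data = malloc(n);
	memcpy(b->data, p, n);
	b->next = tab[s % NBUCKET];
	tab[s % NBUCKET] = b;
	return s;
}

int
main(void)
{
	unsigned long long a, b;

	a = ventiwrite("hello", 5);
	b = ventiwrite("hello", 5);
	printf("%llx %llx same=%d\n", a, b, a == b);
	return 0;
}

a read is the inverse: look the block up by its score.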

unfortunately, venti is optimized for deduplication, not performance.
except for backup, this is the opposite of what one wants.

just as a simple counterexample to your 10.5pb example, consider
a large vmware install.  you may have thousands of vmfs "files" layered
on your san, but file i/o does not couple them; they're all independent.
i don't see how a large fs would help at all.

- erik


