From: erik quanstrom
Date: Sun, 24 Apr 2011 14:58:30 -0400
To: 9fans@9fans.net
Subject: Re: [9fans] Q: moving directories? hard links?

> > since inside a disk drive, there is also striping across platters and
> > wierd remapping games (and then there's flash), and i don't see
> > any justification for calling this a "different fs layout". you wouldn't
> > say you changed datastructures if you use 8x1gb dimms instead of
> > 4x2gb, would you?
>
> I am not getting through....
>
> Check out some papers on http://www.cs.cmu.edu/~garth/
> See http://www.pdl.cmu.edu/PDL-FTP/NASD/asplos98.pdf for instance.

no, you're getting through.  i just don't accept the zfs-inspired theory
that the filesystem must do these things, or the other theory that one
gigantic fs, with supposedly global visibility through redundancy layers,
etc., is the only game in town.  since two theories are better than one,
this often becomes the one big ball of goo grand unification theory of
storage.

it could be that this has a lot of advantages, but i can't overcome the
gut (sorry) reaction that big complicated things are just too hard to get
right in practice.  maybe one can overcome this with arbitrarily many
programmers, but if i thought that were the way to go, i wouldn't bother
with 9fans.

it seems to me that there are layers we really have no visibility into
already.  disk drives give one lbas these days, and it's important to
remember that the "l" stands for logical; that is, we cannot infer with
certainty where a block is stored based on its lba.  thus an elevator
algorithm might do exactly the wrong thing, and the difference can be
astounding, like 250ms.  and if there's trouble reading a sector, that
can slow things by a second or so.  further, we often deal with
virtualized things like drive arrays, drive caches, flash caches, ssds,
or lvm-like things that make it even harder for a fs to out-guess the
storage.

i suppose there are two ways to go with this situation: remove all the
layers between you and the storage controller and hope you can out-guess
what's left, or give up, declare storage opaque, and leave it to the
storage guys.  i'm taking the second position.  it allows for innovation
in the layers below the fs without changing the fs.

it's surprising to me that venti hasn't opened more eyes to what's
possible with block storage.  unfortunately, venti is optimized for
deduplication, not performance, and except for backup that's the opposite
of what one wants.

just as a simple counterexample to your 10.5pb example, consider a large
vmware install.  you may have 1000s of vmfs "files" layered on your san,
but file i/o does not couple them; they're all independent.  i don't see
how a large fs would help at all.

- erik
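
a minimal sketch of the elevator point above, in c; the request list, the
remap table, and the phys field are invented for illustration (a real
host never sees where the firmware put a block):

	/* an elevator scheduler sorts pending i/o by lba, but "l" means
	 * logical: the firmware may have remapped a block anywhere, so the
	 * "optimized" sweep can still seek all over the disk. */
	#include <stdio.h>
	#include <stdlib.h>

	enum { Nreq = 4 };

	typedef struct Req Req;
	struct Req {
		unsigned long long lba;		/* what the fs sees */
		unsigned long long phys;	/* where the firmware put it (hidden) */
	};

	static int
	bylba(const void *a, const void *b)
	{
		const Req *x = a, *y = b;
		return (x->lba > y->lba) - (x->lba < y->lba);
	}

	int
	main(void)
	{
		/* invented example: lba 20 was remapped to a spare region far away */
		Req q[Nreq] = {
			{900, 910},
			{10, 11},
			{500, 505},
			{20, 800000},
		};
		int i;

		qsort(q, Nreq, sizeof q[0], bylba);	/* the elevator's sorted sweep */
		for(i = 0; i < Nreq; i++)
			printf("lba %6llu -> physical %6llu\n", q[i].lba, q[i].phys);
		/* the sweep 10, 20, 500, 900 actually visits 11, 800000, 505, 910:
		 * a full-stroke seek in the middle of a supposedly sorted pass. */
		return 0;
	}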