From mboxrd@z Thu Jan 1 00:00:00 1970
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
In-reply-to: Your message of "Thu, 21 Apr 2011 18:41:25 EDT." <9482032322d5daaadceace1f6875dad3@coraid.com>
References: <255556ff42dac9585ddf5e7f766d7175@hamnavoe.com> <20110421211046.C474DB835@mail.bitblocks.com> <9482032322d5daaadceace1f6875dad3@coraid.com>
Date: Fri, 22 Apr 2011 01:03:52 -0700
From: Bakul Shah
Message-Id: <20110422080352.DF703B835@mail.bitblocks.com>
Subject: Re: [9fans] Q: moving directories? hard links?
Topicbox-Message-UUID: d23902ea-ead6-11e9-9d60-3106f5b1d025

On Thu, 21 Apr 2011 18:41:25 EDT erik quanstrom wrote:
> > IIRC companies such as Panasas separate file names and other
> > metadata from file storage. One way to get a single FS
> > namespace that spans multiple disks or nodes for increasing
> > data redundancy, file size beyond the largest disk size,
> > throughput (and yes, complexity).
>
> that certainly does seem like the hard way to do things.
> why should the structure of the data depend on where it's
> located? certainly ken's fs doesn't change the format of
> the worm if you concatinate several devices for the worm
> or use just one.

It all boils down to having to cope with individual units'
limits and failures.

If a file needs to be larger than the capacity of the largest
disk, you stripe data across multiple disks. To handle disk
failures you use mirroring or parity across multiple disks.
To increase performance beyond what a single controller can
do, you add multiple disk controllers. When you want higher
capacity and throughput than is possible on a single node,
you use a set of nodes and stripe data across them. To handle
a single node failure you mirror data across multiple nodes.
To support more lookups and metadata operations, you separate
metadata storage and nodes from file storage and nodes, since
lookups and metadata operations have a different access
pattern from file data access.
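The stripe-plus-parity idea above can be sketched in a few lines
of Python. This is a toy illustration only, not any particular
product's layout: the block size, function names, and the fixed
parity "disk" are all made up for readability.

```python
from functools import reduce

BLOCK = 4  # tiny block size so the example stays readable

def xor_blocks(blocks):
    """XOR a list of equal-sized byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def stripe_with_parity(data, ndisks):
    """Split data into BLOCK-sized chunks, place ndisks-1 data
    blocks per stripe, and store one XOR parity block per stripe
    on the last disk (a simplified RAID-5-like layout)."""
    per_stripe = ndisks - 1
    chunks = [data[i:i+BLOCK].ljust(BLOCK, b'\0')
              for i in range(0, len(data), BLOCK)]
    while len(chunks) % per_stripe:          # pad the final stripe
        chunks.append(b'\0' * BLOCK)
    disks = [[] for _ in range(ndisks)]
    for s in range(0, len(chunks), per_stripe):
        stripe = chunks[s:s+per_stripe]
        for d, blk in enumerate(stripe):
            disks[d].append(blk)
        disks[-1].append(xor_blocks(stripe))  # parity block
    return disks

def rebuild(disks, lost):
    """Reconstruct a lost disk by XORing the survivors, block by
    block; works for a data disk or the parity disk alike."""
    survivors = [d for i, d in enumerate(disks) if i != lost]
    return [xor_blocks([d[b] for d in survivors])
            for b in range(len(survivors[0]))]
```

Losing any one disk and XORing the rest recovers its contents,
which is the whole point: capacity scales with the number of
disks while surviving a single-unit failure.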
To handle more concurrent access you add more network
bandwidth and balance it across nodes.

From an administrative point of view a single global
namespace is much easier to manage. One should be able to add
or replace individual units (disks, nodes, network capacity)
quickly, as and when needed, without taking the FS down (to
reduce administrative costs and avoid downtime). Then you
have to worry about backups (on site and off site).

In such a complex system the concept of a single `volume'
doesn't work well. In any case, users don't care what data
layout is used as long as the system can grow to fill their
needs.