From: "John E. Barham"
To: 9fans@cse.psu.edu
Subject: [9fans] Distributed filesystems: Plan 9 vs. Linux
Date: Sun, 29 Feb 2004 11:59:48 -0800

I'm working with a system that stores tens of thousands of high-definition images per project (think movies). Storing all of the files on a single file server is not feasible (even with 3 x 1.8 TB arrays per Linux file server). Even if it were possible, it would be sub-optimal from a network standpoint, since literally hundreds of image processing nodes could be hitting the server simultaneously.

Keeping track of where the files for a particular project are physically stored is a nuisance, and tracking down the location of a particular file can be time-consuming, especially for some of the less technically minded users (e.g., artists). Our solution is to develop a virtual mapping service that maps logical URLs (e.g., /project/shot/frame010000.tiff) to physical locations (e.g., /server10/home/user/project/shot/frame010000.tiff), but client applications need added support to work with it. It's cumbersome, but still a lot cheaper than buying high-end dedicated storage solutions, and even with those we'd run out of space. (We considered using HTTP redirection but couldn't see how to make it work efficiently for PUT operations.)

It occurred to me that Plan 9 has already solved this problem through its ability to (securely) mount remote filesystems and form unions of directories. (Correct me if I'm wrong, but IIRC creating a file in a union directory would add the file to the original folder location.)

Even something as seemingly simple as collecting per-server drive usage statistics is a pain on Linux. For the moment we're running du over ssh using an expect-style module in Python.
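For comparison, here is a minimal sketch of the du-over-ssh approach without an expect-style module. It assumes key-based ssh authentication is already set up on every file server (so no password prompt needs to be scripted) and that each server has a du supporting `-s` and `-k`. The helper names (`du_command`, `parse_du_output`, `usage_per_server`) and the host names are hypothetical.

```python
import shlex
import subprocess
from typing import Callable, Dict, Iterable, List


def du_command(host: str, path: str) -> List[str]:
    """Build an ssh command that runs `du -sk` on a remote host.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so this assumes key-based authentication is already configured.
    ssh joins the remote command into one string for the remote shell,
    hence shlex.quote() on the path.
    """
    return ["ssh", "-o", "BatchMode=yes", host, "du", "-sk", shlex.quote(path)]


def parse_du_output(output: str) -> int:
    """Return the size in KiB from `du -sk` output ("<kib>\t<path>")."""
    first_line = output.strip().splitlines()[0]
    return int(first_line.split()[0])


def usage_per_server(
    hosts: Iterable[str],
    path: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[str, int]:
    """Collect `du -sk` totals (KiB) for the same path on several servers.

    `run` defaults to subprocess.run; it is a parameter so the ssh call
    can be replaced in tests.
    """
    results: Dict[str, int] = {}
    for host in hosts:
        proc = run(du_command(host, path), capture_output=True, text=True)
        # du exits non-zero when some subdirectories are unreadable but still
        # prints a total, so only treat missing output as a failure.
        if not proc.stdout.strip():
            raise RuntimeError(f"du failed on {host}: {proc.stderr.strip()}")
        results[host] = parse_du_output(proc.stdout)
    return results


if __name__ == "__main__":
    # Hypothetical host names; substitute your own file servers.
    for host, kib in usage_per_server(["server10", "server11"], "/home").items():
        print(f"{host}: {kib / 1024 ** 2:.1f} GiB")
```

This still needs one ssh round trip per server and a parser for du's text output, which is exactly the plumbing that mounting the remote file servers into a single namespace would avoid.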
Again, Plan 9 would make that trivial, either by mounting the remote file server directly or possibly by running the script over a cpu connection.

Anyway, it's a truism that the more powerful hardware gets (in both CPU power and storage capacity), the more ways we find to use it to capacity. So the mechanisms Plan 9 provides for presenting a seamless view of distributed resources are becoming more necessary, not less.

John

P.S. Forgive me if I'm hazy on the details of the Plan 9 commands, but the website appears to be down at the moment...