From mboxrd@z Thu Jan 1 00:00:00 1970
To: 9fans@cse.psu.edu
From: anothy@cosym.net
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Message-Id: <20011209025927.703EF199DD@mail.cse.psu.edu>
Subject: [9fans] metadata in file systems
Date: Sat, 8 Dec 2001 21:59:09 -0500
Topicbox-Message-UUID: 31604920-eaca-11e9-9e20-41e7f4b1d025

this is longer than i meant for it to be, but whatever.

it's a bit more radical than the "store attributes in some file" approach, but what do people here think of the approach to file system design represented by MacOS's HFS+ and BeOS's BFS (and presumably others, although those are the ones i've used)? at a minimum, it seems many of the metadata issues we're talking about are addressed by these systems and their files' "attributes" or "resource forks". they've worked in those systems, and have found application in more recent systems, like AtheOS (which i've never used, so can't comment intelligently on), as well.

[please avoid the obvious criticisms of the implementations named above, like case-insensitivity - i'm interested in the ideas more generally than these implementations]

in particular, many (certainly not all) of the problems with managing (or at least effectively storing) metadata are addressed by moving the question to fs-level rather than app-level. the file server could track things like a mime type for a file in much the same way it tracks other metadata currently - things like mod time, owner, and so on. it's in a good position to do this, potentially getting the information through the established means (see below for the "potentially").

problems remain, of course. the most obvious should be the broad nature of the change - it'd require either changes to syscalls like stat, "alternate" versions of them (eg "nstat" returning the "augmented" Dir structure), or entirely new syscalls (eg "metadata" returning just the new info).
the first option is probably cleaner (keeps the number of syscalls and 9p messages down, reduces the additional pain of writing a file server), but the second and third reduce the pain of updating existing programs.

the next obvious concern is the need for tools to use/manipulate this metadata. this probably isn't that big a deal, at least in one direction (more on that in just a second), but it would need to be done. something along the lines of additions to ls and a new member of the chmod/chgrp family.

simple versions of this scheme work well in one direction (given a file, return its metadata), but less so in the other (given some metadata, find matching files). particularly in the plan 9 world, it's difficult to see how to do this without simply walking a tree. and while that's not too exciting, maybe it's okay; we don't do anything else for the metadata the fs currently stores, like owner or mode. the only place i see it becoming really problematic is on a jukebox. but again, this isn't any different from the situation now.

the alternative is for either the fs itself or some user-land application to cache the data somewhere. each has problems. the fs seems in a more promising position: it's guaranteed to get notice of any changes, and can best schedule any indexing that needs to be done. but in a plan 9 world, the fs can't know what things look like to the user (and user-land apps). while a user-land program is in a better position to know what the actual namespace looks like, accounting for changes to files would involve a good bit of work, and the namespace could still change underfoot.

the only place where having file servers keep track of metadata more actively would be nice is in implementing something like BeOS's "persistent queries". there you could basically say "show me all image files newer than Dec 5 2001, larger than 1KB" and it would, continually updating as things entered or left the set. since BeOS allowed a wide (arbitrary?)
range of metadata, queries like this could be used even for things like monitoring new mail. while that capability's interesting, it (as far as i can tell) mandates putting those smarts into the fs, and i'm not sure the benefits are worth the cost (significantly increased complexity in the server, and the need for some way to get queries in/out). but having the _data_ stored by the fs seems like an interesting angle.

anyway, that's more than enough out of me for a bit. thoughts?
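
ps. to make the "given some metadata, find matching files" direction concrete, here's a minimal sketch of the walk-the-tree approach, in python rather than anything plan 9 specific. it's only an illustration under assumptions: the names `Query` and `find_matching` are made up, the "image" test guesses from the file name via the standard `mimetypes` module (standing in for a mime type the fs would actually store), and size and mod time come from an ordinary stat. it's a one-shot query, not a BeOS-style persistent one.

```python
import mimetypes
import os
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Optional


@dataclass
class Query:
    """hypothetical query; any field left as None is ignored."""
    mime_prefix: Optional[str] = None      # e.g. "image/"
    newer_than: Optional[datetime] = None  # compared against mod time
    larger_than: Optional[int] = None      # size in bytes

    def matches(self, path: str, st: os.stat_result) -> bool:
        if self.mime_prefix is not None:
            # assumption: guess the type from the name; a file server
            # storing the attribute would simply return it instead
            mime, _ = mimetypes.guess_type(path)
            if mime is None or not mime.startswith(self.mime_prefix):
                return False
        if self.newer_than is not None:
            if st.st_mtime <= self.newer_than.timestamp():
                return False
        if self.larger_than is not None:
            if st.st_size <= self.larger_than:
                return False
        return True


def find_matching(root: str, query: Query) -> Iterator[str]:
    """walk the tree under root, yielding paths whose metadata matches."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or can't be stat'd; skip it
            if query.matches(path, st):
                yield path


if __name__ == "__main__":
    # "all image files newer than Dec 5 2001, larger than 1KB"
    q = Query(mime_prefix="image/",
              newer_than=datetime(2001, 12, 5),
              larger_than=1024)
    for p in find_matching(".", q):
        print(p)
```

every query costs a stat per file in the tree, which is exactly the part that hurts on a jukebox; keeping the result set up to date as files change is the piece that seems to need help from the fs.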