From mboxrd@z Thu Jan 1 00:00:00 1970
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
In-reply-to: Your message of "Sat, 19 May 2012 00:45:58 +0200." <23ED89F3-F760-428A-8CF4-0A046F52675B@lsub.org>
References: <9F03A819-F521-407C-A6BD-13A04A3AC877@lsub.org> <20120518222257.386CFB827@mail.bitblocks.com> <23ED89F3-F760-428A-8CF4-0A046F52675B@lsub.org>
Date: Sat, 19 May 2012 20:13:08 -0700
From: Bakul Shah
Message-Id: <20120520031308.C9D3EB827@mail.bitblocks.com>
Subject: Re: [9fans] The creepy WORM. (was: Re: Thinkpad T61 Installation Experience)
Topicbox-Message-UUID: 936a0018-ead7-11e9-9d60-3106f5b1d025

On Sat, 19 May 2012 00:45:58 +0200 Francisco J Ballesteros wrote:
> > Just curious.
> > If the tree doesn't fit in memory, how do you decide who to
> > kick out? LRU? Sounds much like a cache fs. What does it buy
> > you over existing cache filesystems? Speaking more generally,
> > not just in the plan9 context.
>
> lru for clean blocks. but you really have the tree you use in memory, all if
> it fits. what it buys is simplicity, thus reliability, and speed.
> instead of a single program doing everything, you have several trying to use
> their memory and to avoid copying blocks in the main server.
> plus, it's going to be modified to exploit the upcoming nix zero copy framework.

This last point is more or less independent of the FS (as long as an
io buffer is page aligned and the io count is a multiple of the page
size).

> it's not cow. you reuse the memory of a frozen block instead of copying.
> you just melt it and reuse it.
> all this is in memory. cow happens only on the disk, but you don't wait for that.
> that's the main difference wrt others.

How often would you flush to disk? You still need to worry about the
order of writing metadata.

> >> When the disk gets full, all reachable blocks are marked and
> >> all other blocks are considered available for growing the
> >> log (this is a description of semantics, not of the
> >> implementation). Thus, the log is circular but jumps to the
> >> next available block each time it grows. If, after the mark
> >> process, the disk is still full, the file system becomes read
> >> only but for removing files.
> >
> > Why does circularity matter? It would make more sense to allocate
> > new blocks for a given file near its existing blocks regardless of
> > writing order.
>
> for simplicity, I removed most of the fanciest things I had in place in
> previous versions that could be a source of bugs. there are no ref. counters,
> for example. it's designed to operate on main memory, and it seems it
> does well even though the disk algorithms are naive.

You do have to keep track of free disk blocks. On disk. So a linked
list would require you to visit every freed block.

> > Why not just use venti or some existing FS underneath rather than
> > come up with a new disk format?
>
> to avoid complexity, latency, and bugs.

I think an incore FS is the easy case, but you will have to face the
issues of corner/boundary cases, various error conditions, and
efficiency when dealing with real disks. These things are what
introduce complexity and bugs. "Soft updates" in FFS took quite a
while to shake out bugs. zfs took a long time. Hammer fs of DragonFly
took a while. Pretty much every FS design has taken a while to become
rock solid, far longer than the original designers' estimates, I
think.

> that was the motivation, exploiting large main memories and keeping things
> simple and reliable. Time will tell if we managed to achieve that or not :)

Ah. I have been looking at SBCs with memories in the 128MB to 512MB
range! Can't afford an incore FS! But even if there are gigabytes of
memory, why would I want to dedicate a lot of it to a filesystem?
Most of the FS data is going to be "cold" most of the time. When you
suddenly need lots of memory for some memory intensive computation,
it may be too late to evacuate the memory of your FS data.
But memory is just a file cache; this data can be thrown away at any
time if you need more space. And by making sure the cache holds a
bounded amount of dirty data, you lose no more than that amount of
data in case of a crash.

> sorry I wrote in Sioux this time. it's been a long day here :)

Thanks for taking the time. Always nice to see yet another attempt at
getting this right :-)