From mboxrd@z Thu Jan 1 00:00:00 1970
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
In-reply-to: Your message of "Sat, 19 May 2012 00:45:58 +0200." <23ED89F3-F760-428A-8CF4-0A046F52675B@lsub.org>
References: <9F03A819-F521-407C-A6BD-13A04A3AC877@lsub.org> <20120518222257.386CFB827@mail.bitblocks.com> <23ED89F3-F760-428A-8CF4-0A046F52675B@lsub.org>
Date: Sat, 19 May 2012 20:13:08 -0700
From: Bakul Shah
Message-Id: <20120520031308.C9D3EB827@mail.bitblocks.com>
Subject: Re: [9fans] The creepy WORM. (was: Re: Thinkpad T61 Installation Experience)
Topicbox-Message-UUID: 936a0018-ead7-11e9-9d60-3106f5b1d025

On Sat, 19 May 2012 00:45:58 +0200 Francisco J Ballesteros wrote:
> > Just curious.
> > If the tree doesn't fit in memory, how do you decide who to
> > kick out? LRU? Sounds much like a cache fs. What does it buy
> > you over existing cache filesystems? Speaking more generally,
> > not just in the plan9 context.
>
> lru for clean blocks. but you really have the tree you use in memory, all if
> it fits. what it buys is simplicity, thus reliability, and speed.
> instead of a single program doing everything, you have several trying to use
> their memory and to avoid copying blocks in the main server.
> plus, it's going to be modified to exploit the upcoming nix zero copy framework.

This last point is more or less independent of the FS (as long as an
io buffer is page aligned and the io count is a multiple of the page
size).

> it's not cow. you reuse the memory of a frozen block instead of copying.
> you just melt it and reuse it.
> all this is in memory. cow happens only on the disk, but you don't wait for that.
> that's the main difference wrt others.

How often would you flush to disk? You still need to worry about the
order of writing metadata.

> >> When the disk gets full, all reachable blocks are marked and
> >> all other blocks are considered available for growing the
> >> log (this is a description of semantics, not of the
> >> implementation). Thus, the log is circular but jumps to the
> >> next available block each time it grows. If, after the mark
> >> process, the disk is still full, the file system becomes read
> >> only but for removing files.
> >
> > Why does circularity matter? It would make more sense to allocate
> > new blocks for a given file near its existing blocks regardless of
> > writing order.
>
> for simplicity, I removed most of the fanciest things I had in place in
> previous versions that could be a source of bugs. there are no ref. counters,
> for example. it's designed to operate on main memory, and it seems it
> does well even though the disk algorithms are naive.

You do have to keep track of free disk blocks. On disk. So a linked
list would require you to visit every freed block.

> > Why not just use venti or some existing FS underneath rather than
> > come up with a new disk format?
>
> to avoid complexity, latency, and bugs.

I think an incore FS is the easy case, but you will have to face the
issues of corner/boundary cases, various error conditions, and
efficiency when dealing with real disks. These things are what
introduce complexity and bugs. "Soft updates" in FFS took quite a
while to shake out bugs. zfs took a long time. Hammer fs of DragonFly
took a while. Pretty much every FS design has taken a while to become
rock solid, far longer than the original designers' estimates, I
think.

> that was the motivation, exploiting large main memories and keeping things
> simple and reliable. Time will tell if we managed to achieve that or not :)

Ah. I have been looking at SBCs with memories in the 128MB to 512MB
range! Can't afford an incore FS! But even if there are gigabytes of
memory, why would I want to dedicate a lot of it to a filesystem?
Most of the FS data is going to be "cold" most of the time. When you
suddenly need lots of memory for some memory intensive computation,
it may be too late to evacuate the memory of your FS data.
But memory is just a file cache; this data can be thrown away at any
time if you need more space. And by making sure the cache holds a
bounded amount of dirty data, you lose no more than that amount of
data in case of a crash.

> sorry I wrote in Sioux this time. it's been a long day here :)

Thanks for taking the time. Always nice to see yet another attempt at
getting this right :-)