From mboxrd@z Thu Jan 1 00:00:00 1970
References: <9F03A819-F521-407C-A6BD-13A04A3AC877@lsub.org>
 <20120518222257.386CFB827@mail.bitblocks.com>
From: Francisco J Ballesteros
Content-Type: multipart/alternative;
 boundary=Apple-Mail-8ADA6F06-3BA9-41BE-90CE-CF3D126F8399
In-Reply-To: <20120518222257.386CFB827@mail.bitblocks.com>
Message-Id: <23ED89F3-F760-428A-8CF4-0A046F52675B@lsub.org>
Date: Sat, 19 May 2012 00:45:58 +0200
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (1.0)
Subject: Re: [9fans] The creepy WORM. (was: Re: Thinkpad T61 Installation Experience)
Topicbox-Message-UUID: 926fb0e0-ead7-11e9-9d60-3106f5b1d025

--Apple-Mail-8ADA6F06-3BA9-41BE-90CE-CF3D126F8399
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=us-ascii

using ipad keyboard. excuse any typos.

> Just curious.
> If the tree doesn't fit in memory, how do you decide who to
> kick out? LRU? Sounds much like a cache fs. What does it buy
> you over existing cache filesystems? Speaking more generally,
> not just in the plan9 context.
>

lru for clean blocks. but you really have the tree you use in memory,
all of it if it fits.
what it buys is simplicity, thus reliability, and speed.
instead of a single program doing everything, you have several trying
to use their memory and to avoid copying blocks in the main server.
plus, it's going to be modified to exploit the upcoming nix zero copy
framework.

>> The disk is organized as a log of blocks. When a new version
>> of the tree must be written to disk, all blocks that changed
>> are given disk addresses and are appended to the log. Once
>> written, they are frozen. If new changes are made to the
>> tree, blocks are melted and forget their previous addresses:
>> each time they are written again, they are assigned new
>> ones.
>
> I don't understand use of the words frozen & melted here. How
> is this different from how things work now?
> Something worse than what venti or zfs do, which is to leave the old
> blocks alone and allocate new space for new blocks.
>

it's not cow. you reuse the memory of a frozen block instead of copying.
you just melt it and reuse it.

all this is in memory. cow happens only on the disk, but you don't wait
for that. that's the main difference wrt others.

>> When the disk gets full, all reachable blocks are marked and
>> all other blocks are considered available for growing the
>> log (this is a description of semantics, not of the
>> implementation). Thus, the log is circular but jumps to the next
>> available block each time it grows. If, after the mark
>> process, the disk is still full, the file system becomes read
>> only but for removing files.
>
> Why does circularity matter? It would make more sense to allocate
> new blocks for a given file near its existing blocks regardless of
> writing order.
>

for simplicity, I removed most of the fanciest things I had in place in
previous versions that could be a source of bugs. there are no ref.
counters, for example. it's designed to operate on main memory, and it
seems it does well even though the disk algorithms are naive.

> Why not just use venti or some existing FS underneath than
> come up with a new disk format?
>

to avoid complexity, latency, and bugs.

it's now a set of tools, you can archive creepy into venti if you want,
or archive fossil into a creepy rip (it's the same program, actually).

for archival, you are going to use a pipe, and not a tcp connection.

you have a program half the size, or 1/4 depending on how you wc.

it takes half the time fossil takes in the silly tests I made, and you
can understand the code the first time you read it, which is not trivial
with the others, but for Ken's.

last, it's expected not to give you corrupted files despite power
failures, which we had in both fossil and venti (I'm not saying it's
their fault, our environment is bumpy).

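to make the freeze/melt and mark descriptions above a bit more concrete,
here is a minimal sketch in Python. it is not the creepy code. the names
(Block, Disk, sync), the slot array, and the rule that only the current
tree counts as reachable are all assumptions made for illustration.

```python
# Illustrative sketch only: Block, Disk and sync are hypothetical names,
# not taken from the creepy sources.

class Block:
    """In-memory tree node; addr is None while the block is melted."""

    def __init__(self, data, parent=None):
        self.data = data
        self.parent = parent
        self.children = []
        self.addr = None      # disk address, assigned when frozen
        self.frozen = False
        if parent is not None:
            parent.melt()     # the parent's on-disk image will change
            parent.children.append(self)

    def melt(self):
        # Reuse the same memory (no copy) and forget the old address.
        # Ancestors record child addresses, so they are melted as well.
        b = self
        while b is not None and b.frozen:
            b.frozen = False
            b.addr = None
            b = b.parent

    def write(self, data):
        self.melt()
        self.data = data


class Disk:
    """Fixed array of slots used as a circular log (assumed layout)."""

    def __init__(self, nslots):
        self.slots = [None] * nslots
        self.free = set(range(nslots))
        self.head = 0
        self.readonly = False

    def alloc(self):
        # Jump to the next available slot, scanning circularly from head.
        n = len(self.slots)
        for i in range(n):
            a = (self.head + i) % n
            if a in self.free:
                self.free.remove(a)
                self.head = (a + 1) % n
                return a
        return None

    def mark(self, root):
        # Assumption: only blocks reachable from the current root are kept.
        reachable = set()
        stack = [root]
        while stack:
            b = stack.pop()
            if b.addr is not None:
                reachable.add(b.addr)
            stack.extend(b.children)
        self.free = set(range(len(self.slots))) - reachable


def sync(root, disk):
    """Append every melted block under root to the log, then freeze it."""
    def visit(b):
        if b.frozen:          # a frozen block has only frozen descendants
            return
        for c in b.children:  # children first: parents store child addrs
            visit(c)
        addr = disk.alloc()
        if addr is None:      # disk full: mark, then try once more
            disk.mark(root)
            addr = disk.alloc()
        if addr is None:
            disk.readonly = True
            raise RuntimeError("disk full: file system is now read-only")
        b.addr = addr
        disk.slots[addr] = (b.data, [c.addr for c in b.children])
        b.frozen = True
    visit(root)


disk = Disk(4)
root = Block("root")
a = Block("a", root)
b = Block("b", root)
sync(root, disk)      # a, b, root are appended and frozen
a.write("a2")         # melts a and root in place; b stays frozen
sync(root, disk)      # a takes the last slot; root forces a mark
print(a.addr, b.addr, root.addr)
```

the point of the sketch is that melting never copies a block in memory:
the block only loses its address, and new addresses are handed out when
the next version is appended to the log.
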
that was the motivation, exploiting large main memories and keeping
things simple and reliable. Time will tell if we managed to achieve that
or not :)

sorry I wrote in Sioux this time. it's been a long day here :)

> Sounds like a fun project but it would be nice to see the
> rationale for it.
>
> Thanks!

--Apple-Mail-8ADA6F06-3BA9-41BE-90CE-CF3D126F8399
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=utf-8



--Apple-Mail-8ADA6F06-3BA9-41BE-90CE-CF3D126F8399--