From mboxrd@z Thu Jan  1 00:00:00 1970
References: <f350de69a2df1fb900b5f1f54cd12826@rei2.9hal>
	<9F03A819-F521-407C-A6BD-13A04A3AC877@lsub.org>
	<20120518222257.386CFB827@mail.bitblocks.com>
From: Francisco J Ballesteros <nemo@lsub.org>
Content-Type: multipart/alternative;
	boundary=Apple-Mail-8ADA6F06-3BA9-41BE-90CE-CF3D126F8399
In-Reply-To: <20120518222257.386CFB827@mail.bitblocks.com>
Message-Id: <23ED89F3-F760-428A-8CF4-0A046F52675B@lsub.org>
Date: Sat, 19 May 2012 00:45:58 +0200
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (1.0)
Subject: Re: [9fans] The creepy WORM. (was: Re: Thinkpad T61 Installation
	Experience)
Topicbox-Message-UUID: 926fb0e0-ead7-11e9-9d60-3106f5b1d025


--Apple-Mail-8ADA6F06-3BA9-41BE-90CE-CF3D126F8399
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii




using ipad keyboard. excuse any typos.

>>
>
> Just curious.
> If the tree doesn't fit in memory, how do you decide who to
> kick out? LRU? Sounds much like a cache fs. What does it buy
> you over existing cache filesystems? Speaking more generally,
> not just in the plan9 context.
>
>

lru for clean blocks. but you really have the tree you use in memory, all
of it if it fits.
what it buys is simplicity, thus reliability, and speed.
instead of a single program doing everything, you have several trying to use
their memory and to avoid copying blocks in the main server.
plus, it's going to be modified to exploit the upcoming nix zero copy
framework.
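
to make the eviction rule concrete, here is a minimal sketch in python (not
creepy's code; BlockCache, mark_clean and the block-count capacity are made-up
names for illustration). it assumes dirty blocks are never evicted and clean
ones are dropped in least-recently-used order:

```python
from collections import OrderedDict


class BlockCache:
    """Toy in-memory block cache: only clean blocks are eviction candidates."""

    def __init__(self, capacity):
        self.capacity = capacity
        # key -> [data, dirty]; least recently used entries come first
        self.blocks = OrderedDict()

    def get(self, key):
        entry = self.blocks[key]
        self.blocks.move_to_end(key)  # now the most recently used
        return entry[0]

    def put(self, key, data, dirty):
        self.blocks[key] = [data, dirty]
        self.blocks.move_to_end(key)
        self._evict()

    def mark_clean(self, key):
        # Hypothetical hook: called once the block has reached the disk.
        self.blocks[key][1] = False
        self._evict()

    def _evict(self):
        # Walk from least to most recently used, dropping clean blocks only.
        for key in list(self.blocks):
            if len(self.blocks) <= self.capacity:
                break
            if not self.blocks[key][1]:
                del self.blocks[key]
```

note that the cache may exceed its capacity while everything in it is dirty;
in this sketch it shrinks again only after blocks are marked clean.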


>>          The disk is organized as a log of blocks. When a new version
>>          of the tree must be written to disk, all blocks that changed
>>          are given disk addresses and are appended to the log. Once
>>          written, they are frozen.  If new changes are made to the
>>          tree, blocks are melted and forget their previous addresses:
>>          each time they are written again, they are assigned new
>>          ones.
>
> I don't understand use of the words frozen & melted here.  How
> is this different from how things work now? Something worse
> than what venti or zfs do, which is to leave the old blocks
> alone and allocate new space for new blocks.
>

it's not cow. you reuse the memory of a frozen block instead of copying.
you just melt it and reuse it.

all this is in memory. cow happens only on the disk, but you don't wait for
that.
that's the main difference wrt others.
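
a minimal python sketch of the freeze/melt idea, under loud assumptions: Block,
Log, write and melt are hypothetical names, and a python list stands in for the
on-disk log. it is not creepy's implementation, just an illustration that the
same in-memory buffer is reused while each write lands at a new address:

```python
class Block:
    def __init__(self, data):
        self.data = bytearray(data)  # in-memory buffer, reused across versions
        self.addr = None             # disk address; None while melted
        self.frozen = False


class Log:
    def __init__(self):
        self.disk = []  # stand-in for the append-only on-disk log

    def write(self, blk):
        """Append a melted block to the log and freeze it at its new address."""
        if blk.frozen:
            return blk.addr  # unchanged since the last write
        self.disk.append(bytes(blk.data))
        blk.addr = len(self.disk) - 1
        blk.frozen = True
        return blk.addr


def melt(blk):
    """Make a frozen block writable again; it forgets its old address."""
    blk.addr = None
    blk.frozen = False
    return blk
```

the old version stays untouched in the log (that is the on-disk cow part),
while in memory nothing is copied: melt only clears the address.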



>>          When the disk gets full, all reachable blocks are marked and
>>          all other blocks are considered available for growing the
>>          log (this is a description of semantics, not of the
>>          implementation). Thus, the log is circular but jumps to the
>>          next available block each time it grows.  If, after the mark
>>          process, the disk is still full, the file system becomes read
>>          only but for removing files.
>
> Why does circularity matter? It would make more sense to allocate
> new blocks for a given file near its existing blocks regardless of
> writing order.
>

for simplicity, I removed most of the fancier things I had in previous
versions that could be a source of bugs. there are no ref. counters,
for example. it's designed to operate on
main memory, and it seems it does well even though the disk algorithms are
naive.
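
the semantics quoted above (mark what is reachable, then let the log jump to
the next available block) can be sketched in python as follows. this is only
an illustration: mark, next_free, the children map and the integer addresses
are all assumptions, and the quoted text itself says it describes semantics,
not the implementation:

```python
def mark(root, children):
    """Return the set of disk addresses reachable from root."""
    seen, stack = set(), [root]
    while stack:
        addr = stack.pop()
        if addr not in seen:
            seen.add(addr)
            stack.extend(children.get(addr, ()))
    return seen


def next_free(cursor, nblocks, used):
    """Scan circularly from cursor for an address not in used.

    Returns None when every block is in use, which is the case where
    the file system would become read only but for removing files.
    """
    for i in range(nblocks):
        addr = (cursor + i) % nblocks
        if addr not in used:
            return addr
    return None
```

no reference counters are needed here: whatever the mark pass did not reach is
simply considered available.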


> Why not just use venti or some existing FS underneath than
> come up with a new disk format?
>

to avoid complexity, latency, and bugs.

it's now a set of tools, you can archive creepy into venti if you want, or
archive fossil into a creepy rip (it's the same program, actually).

for archival, you are going to use a pipe, and not a tcp connection.

you have a program half the size, or 1/4 depending on how you wc.

it takes half the time fossil takes in the silly tests I made, and you can
understand the code the first time you read it, which is not trivial with
the others, except for Ken's.

last, it's expected not to give you corrupted files despite power failures,
which we had in both fossil and venti (I'm not saying it's their fault, our
environment is bumpy).

that was the motivation, exploiting large main memories and keeping things
simple and reliable. Time will tell if we managed to achieve that or not :)

sorry I wrote in Sioux this time. it's been a long day here :)

> Sounds like a fun project but it would be nice to see the
> rationale for it.
>
> Thanks!


--Apple-Mail-8ADA6F06-3BA9-41BE-90CE-CF3D126F8399--