From: Venkatesh Srinivas
Date: Wed, 27 Oct 2010 10:48:01 -0400
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] A little more ado about async Tclunk

First, let me hop up and down and say: this was me seeing what could be
accomplished in roughly two hours of hacking; it is by far not the best
(or even a near approximation of) way to accomplish delayed Tclunk.
There was no thought given at all to how to handle build-up of files to
close or how to overlap sending Tclunks. I do track the peak number of
outstanding files to close; in this simple implementation, while fcp
-R 16 -W 16-ing down the p9 kernel sources, the peak number of files
waiting to close I've seen was 10. I did not try using more than one
clunk process.

brucee solved this ten years ago; I'd love to hear how. Plan 9 has had
delayed close for tearing down fdgrps for nearly 4 years; dual-purposing
that logic to handle this case would be very straightforward.

erik quanstrom wrote:
> how do you flow control the clunking processes?  are you using
> qio?  if so, what's the size of the buffer?
>
> also, do you have any test results for different latencies?  how
> about a local network with ~0.5ms rtt and a long distance connection
> at ~150ms?
>
> have you tried this in user space?  one could imagine passing the
> fds to a separate proc that does the closes without involving the
> kernel.

The clunking process just grabs Chans to close off a list, clunks them
one at a time, and sleeps if the list is empty. It's woken by a deferred
clunk call. I tried a little hysteresis there, letting it sleep longer
but burst close requests above another threshold, but that was
considerably slower overall than even the default (synchronous Tclunk)
case. I do not use qio. There is no limit on the number of outstanding
files.
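For a concrete picture, here is a minimal userspace sketch of the same
idea, in the spirit of the "separate proc that does the closes" erik
mentions above: one worker thread drains a list of file descriptors and
closes them one at a time, sleeping while the list is empty. This is
illustrative only, not the inferno-npe change (the real diff, linked
below, queues Chans inside the kernel); the names Pending,
deferred_close, and closeproc are made up here.

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct Pending Pending;
struct Pending {
	int fd;
	Pending *next;
};

static Pending *pending;	/* fds waiting to be closed */
static int npending, peak;	/* current and peak list length */
static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t haswork = PTHREAD_COND_INITIALIZER;

/* called in place of close(): queue the fd and wake the worker */
void
deferred_close(int fd)
{
	Pending *p;

	p = malloc(sizeof *p);
	if(p == NULL){
		close(fd);	/* no memory: fall back to a synchronous close */
		return;
	}
	p->fd = fd;
	pthread_mutex_lock(&lk);
	p->next = pending;
	pending = p;
	if(++npending > peak)
		peak = npending;	/* track peak outstanding, as mentioned above */
	pthread_mutex_unlock(&lk);
	pthread_cond_signal(&haswork);
}

/* the "clunking process": pop one entry at a time, close it,
   and sleep when the list is empty */
void *
closeproc(void *arg)
{
	Pending *p;

	for(;;){
		pthread_mutex_lock(&lk);
		while(pending == NULL)
			pthread_cond_wait(&haswork, &lk);
		p = pending;
		pending = p->next;
		npending--;
		pthread_mutex_unlock(&lk);

		close(p->fd);	/* this is where the Tclunk would go out */
		free(p);
	}
	return arg;	/* not reached */
}

Start the worker once with pthread_create(&tid, NULL, closeproc, NULL)
and build with -lpthread; the kernel version sleeps and wakes a kernel
proc instead, mirroring the sleep/wakeup described above.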
If you'd like to look it over, the main diff is visible at:
http://code.google.com/p/inferno-npe/source/detail?r=09a2e719616e1c842e602485edee9d03020909a6
Plan 9 folks will notice a strong similarity to 9/port/chan.c:ccloseq
and :closeproc.

I do not have tests at different latencies (reliable ones, anyway); for
whatever reason, sources.cs.bell-labs.com was varying between 20ms and
300ms from me yesterday, even over short timespans. The code is very
easy to test, though: just grab and build inferno-npe and pass the '-j'
flag to mount in addition to whatever you'd normally use. You can
monitor the number of files waiting to close by reading /dev/vmstat.

I've not tried this in userspace (recently). Wes and I did try it about
a year ago as a side project to the Journal Callbacks work; I'd have to
look back to see what the numbers looked like then, but IIRC there was a
significant penalty to using our userspace wrapper process and rewriting
lots of 9P traffic, though that was one of my first Limbo programs and
could be massively improved upon.

Eric Van Hensbergen wrote:
> More specifically, with the semantic restrictions that VS imposes, are
> there remaining semantic problems when using this with cacheable file
> systems?

All this approach does is delay Tclunk messages, generally by very
little, and only on files not marked ORCLOSE (perhaps it should also
consider OEXCL) and only on mounts flagged Sys->MCACHE. It would
certainly be possible to write a file server for which this approach is
bad, but for the vast majority of them this is a safe optimization. It
does not depend on MCACHE caching; inferno's port/cache.c is just a
stub.
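To make that restriction concrete, here is a small illustrative test for
whether a clunk may be deferred. This is not the actual inferno-npe
code: the flag values, the Chan fields, and the name candefer are
placeholders standing in for the real open-mode and mount-flag bits.

enum {
	ORCLOSE = 0x40,		/* placeholder: remove-on-close open mode */
	OEXCL   = 0x1000,	/* placeholder: exclusive-use open mode */
	MCACHE  = 0x10		/* placeholder: mount marked cacheable */
};

typedef struct Chan Chan;
struct Chan {
	int mode;		/* how the file was opened */
	int mflag;		/* flags from the mount it came through */
};

/* defer the Tclunk only when nothing depends on it happening promptly */
static int
candefer(Chan *c)
{
	if(c->mode & (ORCLOSE|OEXCL))
		return 0;	/* ORCLOSE (and arguably OEXCL) want a prompt clunk */
	if((c->mflag & MCACHE) == 0)
		return 0;	/* only on mounts the user flagged MCACHE */
	return 1;
}

Anything that fails this test takes the normal synchronous clunk path.

-- vs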