From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.0 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
	HTML_MESSAGE,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=ham
	autolearn_force=no version=3.4.4
Received: (qmail 23294 invoked from network); 4 Apr 2023 17:16:14 -0000
Received: from tb-ob1.topicbox.com (64.147.108.173)
  by inbox.vuxu.org with ESMTPUTF8; 4 Apr 2023 17:16:14 -0000
Received: from tb-mx1.topicbox.com (tb-mx1.nyi.icgroup.com [10.90.30.61])
	by tb-ob1.topicbox.com (Postfix) with ESMTP id 5B59929E76
	for <ml@inbox.vuxu.org>; Tue,  4 Apr 2023 13:16:10 -0400 (EDT)
	(envelope-from bounce.mMa4bc3ee3ea80defbb239f382.r522be890-2105-11eb-b15e-8d699134e1fa@9fans.bounce.topicbox.com)
Received: by tb-mx1.topicbox.com (Postfix, from userid 1132)
	id 7A5C717A732A; Tue,  4 Apr 2023 13:16:10 -0400 (EDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=9fans.net; h=from:to
	:message-id:date:mime-version:content-type
	:content-transfer-encoding:list-help:list-id:list-post
	:list-subscribe:reply-to:subject:list-unsubscribe; s=dkim-1; t=
	1680628570; x=1680714970; bh=tEVURg8/zQvqtD/OKupRMoQUbA+hNE2NOTN
	IKVSDVg8=; b=DaB/Wa+6a/7RKAzigoyHpNnRgDq6U5yZk5E5f2iqlN3R82zmHLx
	aXU+cK78qgyqU7eNjW+AfhCQIvkLgMvJq0zgV9YXNwn/RuQOOQii7R5myL5PjS5L
	SFwWSA8RedqpbLkRnofrWxmGaWtjE6g+hr4NpVALBZFlaI1QhaGiMUu8=
From: noam@pixelhero.dev
To: 9fans <9fans@9fans.net>
Message-Id: <16806285560.EA879D18D.575584@composer.9fans.topicbox.com>
Date: Tue, 4 Apr 2023 13:15:56 -0400
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=16806285561.Ea12aFb.575584
Content-Transfer-Encoding: 7bit
Topicbox-Policy-Reasoning: allow: sender is a member
Topicbox-Message-UUID: 5d060da2-d30c-11ed-81d5-5344262d11b0
Archived-At: =?UTF-8?B?PGh0dHBzOi8vOWZhbnMudG9waWNib3guY29tL2dyb3Vwcy85?=
 =?UTF-8?B?ZmFucy9UMzU0ZmU3MDJlMWU5ZDVlOS1NYTRiYzNlZTNlYTgwZGVmYmIyMzlm?=
 =?UTF-8?B?MzgyPg==?=
List-Help: <https://9fans.topicbox.com/groups/9fans>
List-Id: "9fans" <9fans.9fans.net>
List-Post: <mailto:9fans@9fans.net>
List-Software: Topicbox v0
List-Subscribe: <https://9fans.topicbox.com/groups/9fans>
Precedence: list
Reply-To: 9fans <9fans@9fans.net>
Subject: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
List-Unsubscribe: <https://9fans.topicbox.com/groups/9fans>,
 <mailto:9fans+unsubscribe@9fans.net?subject=x-tx-unsubscribe:2:9fans:437d30aa-c441-11e9-8a57-d036212d11b0:522be890-2105-11eb-b15e-8d699134e1fa:Ma4bc3ee3ea80defbb239f382:1:MHa5PCl7ACxyaWlWSGogoiIzR0Vj32GHk1KmOPwD6cY>
Topicbox-Delivery-ID:
 2:9fans:437d30aa-c441-11e9-8a57-d036212d11b0:522be890-2105-11eb-b15e-8d699134e1fa:Ma4bc3ee3ea80defbb239f382:1:gl-z95Sf8qbeFDDeghPzgmqJ6qSVSCptz8mWI7Qh6AE


--16806285561.Ea12aFb.575584
Date: Tue, 4 Apr 2023 13:15:56 -0400
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

I've sporadically encountered a deadlock in fossil. Naturally, when your ro=
ot file system crashes, it can be hard to debug. My solution: stop having a=
 root file system. Was able to attach acid using mycroft's tooling from ANT=
S, and get a clean stack trace (https://pixelhero.dev/notebook/fossil/stack=
s/2023-04-03.1).

After a few hours yesterday (https://pixelhero.dev/notebook/fossil/2023-04-=
03.html <https://pixelhero.dev/2023-04-03-fossil.html>), I eventually track=
ed down the deadlock. When blockWrite is told to flush a clean block to dis=
k - i.e. one which is already flushed - it removes the block from the cache=
's free list, locks the block, detects that it's clean, and then... drops t=
he reference. While keeping the block locked. And in the cache.

This leak of the lock, of course, means that the *next* access to the block=
 - which is still in the cache! - hangs indefinitely. This is seen exactly =
in the stack trace:

_cacheLocal grabs the block from the cache, tries to lock it, and hangs ind=
efinitely. Worse, it does so under a call to fileWalk, which holds a differ=
ent lock, so the effect spreads out and makes even more of the file system =
inaccessible as well (the fileMetaFlush proc hangs waiting on this file loc=
k).

This patch just ensures we call blockPut on the BioClean path as well, thus=
 unlocking the block and readding it to the cache's free lists.

The patch is on my branch - https://git.sr.ht/~pixelherodev/plan9/commit/1b=
f8bd4f44e058261da7e89d87527b12073c9e0f - but I figured I should probably po=
st it here as well.

If anyone has any other patches that weren't in the 9legacy download as of =
~2018, please let me know! :)

---
sys/src/cmd/fossil/cache.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sys/src/cmd/fossil/cache.c b/sys/src/cmd/fossil/cache.c
index f473d211e..2fec44949 100644
--- a/sys/src/cmd/fossil/cache.c
+++ b/sys/src/cmd/fossil/cache.c
@@ -1203,8 +1203,10 @@ blockWrite(Block *b, int waitlock)
fprint(2, "%s: %d:%x:%d iostate is %d in blockWrite\n",
argv0, bb->part, bb->addr, bb->l.type, bb->iostate);
/* probably BioWriting if it happens? */
-                       if(bb->iostate =3D=3D BioClean)
+                       if(bb->iostate =3D=3D BioClean){
+                               blockPut(bb);
goto ignblock;
+                       }
}

blockPut(bb);
--

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Ma4bc3=
ee3ea80defbb239f382
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription

--16806285561.Ea12aFb.575584
Date: Tue, 4 Apr 2023 13:15:56 -0400
MIME-Version: 1.0
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE html>
<html><html><html><body><div>I&#39;ve sporadically encountered a deadlock i=
n fossil. Naturally, when your root file system crashes, it can be hard to =
debug. My solution: stop having a root file system. Was able to attach acid=
 using mycroft&#39;s tooling from ANTS, and get a clean stack trace (<a hre=
f=3D"https://pixelhero.dev/notebook/fossil/stacks/2023-04-03.1">https://pix=
elhero.dev/notebook/fossil/stacks/2023-04-03.1</a>).<br /></div><div><br />=
</div><div>After a few hours yesterday (<a href=3D"https://pixelhero.dev/20=
23-04-03-fossil.html">https://pixelhero.dev/notebook/fossil/2023-04-03.html=
</a>), I eventually tracked down the deadlock. When blockWrite is told to f=
lush a clean block to disk - i.e. one which is already flushed - it removes=
 the block from the cache&#39;s free list, locks the block, detects that it=
&#39;s clean, and then... drops the reference. While keeping the block lock=
ed. And in the cache.<br /></div><div><br /></div><div>This leak of the loc=
k, of course, means that the *next* access to the block - which is still in=
 the cache! - hangs indefinitely. This is seen exactly in the stack trace:<=
br /></div><div><br /></div><div>_cacheLocal grabs the block from the cache=
, tries to lock it, and hangs indefinitely. Worse, it does so under a call =
to fileWalk, which holds a different lock, so the effect spreads out and ma=
kes even more of the file system inaccessible as well (the fileMetaFlush pr=
oc hangs waiting on this file lock).<br /></div><div><br /></div><div>This =
patch just ensures we call blockPut on the BioClean path as well, thus unlo=
cking the block and readding it to the cache&#39;s free lists.<br /></div><=
div><br /></div><div>The patch is on my branch - <a href=3D"https://git.sr.=
ht/~pixelherodev/plan9/commit/1bf8bd4f44e058261da7e89d87527b12073c9e0f">htt=
ps://git.sr.ht/~pixelherodev/plan9/commit/1bf8bd4f44e058261da7e89d87527b120=
73c9e0f</a> - but I figured I should probably post it here as well.<br /></=
div><div><br /></div><div>If anyone has any other patches that weren&#39;t =
in the 9legacy download as of ~2018, please let me know! :)<br /></div><div=
><br /></div><div>---<br /></div><div>sys/src/cmd/fossil/cache.c | 4 +++-<b=
r /></div><div>1 file changed, 3 insertions(+), 1 deletion(-)<br /></div><d=
iv><br /></div><div>diff --git a/sys/src/cmd/fossil/cache.c b/sys/src/cmd/f=
ossil/cache.c<br /></div><div>index f473d211e..2fec44949 100644<br /></div>=
<div>--- a/sys/src/cmd/fossil/cache.c<br /></div><div>+++ b/sys/src/cmd/fos=
sil/cache.c<br /></div><div>@@ -1203,8 +1203,10 @@ blockWrite(Block *b, int=
 waitlock)<br /></div><div>fprint(2, &quot;%s: %d:%x:%d iostate is %d in bl=
ockWrite\n&quot;,<br /></div><div>argv0, bb-&gt;part, bb-&gt;addr, bb-&gt;l=
.type, bb-&gt;iostate);<br /></div><div>/* probably BioWriting if it happen=
s? */<br /></div><div>-			if(bb-&gt;iostate =3D=3D BioClean)<br /></div><di=
v>+			if(bb-&gt;iostate =3D=3D BioClean){<br /></div><div>+				blockPut(bb)=
;<br /></div><div>goto ignblock;<br /></div><div>+			}<br /></div><div>}<br=
 /></div><div><br /></div><div>blockPut(bb);<br /></div><div>--<br /></div>=
<div><br /></div><div id=3D"topicbox-footer" style=3D"margin:10px 0 0;borde=
r-top:1px solid #ddd;border-color:rgba(0,0,0,.15);padding:7px 0;">

<strong><a href=3D"https://9fans.topicbox.com/latest" style=3D"color:inheri=
t;text-decoration:none">9fans</a></strong>
  / 9fans / see
<a href=3D"https://9fans.topicbox.com/groups/9fans">discussions</a>
  +
<a href=3D"https://9fans.topicbox.com/groups/9fans/members">participants</a>
  +
<a href=3D"https://9fans.topicbox.com/groups/9fans/subscription">delivery&n=
bsp;options</a>
<a href=3D"https://9fans.topicbox.com/groups/9fans/T354fe702e1e9d5e9-Ma4bc3=
ee3ea80defbb239f382" style=3D"float:right">Permalink</a>
</div>
</body></html></html></html>=

--16806285561.Ea12aFb.575584--