9fans - fans of the OS Plan 9 from Bell Labs
From: Skip Tavakkolian <skip.tavakkolian@gmail.com>
To: 9fans <9fans@9fans.net>
Subject: Re: [9fans] [PATCH] fossil: fix a deadlock in the caching logic
Date: Tue, 4 Apr 2023 11:34:09 -0700
Message-ID: <CAJSxfmLke7BPuO7c94PtBD_GpF8w3Rdj+87hH_dO_1=PE2TOTg@mail.gmail.com>
In-Reply-To: <EB8D37D9-6D99-4F9C-B7F1-050CB5BE7D7E@quintile.net>

It definitely was not me. My bet would be on rsc, geoff, richard,
forsyth, quanstrom or djc.

On Tue, Apr 4, 2023 at 11:05 AM Steve Simon <steve@quintile.net> wrote:
>
>
> was this hard to reproduce?
>
> i have not seen fossil deadlocking and have used it since i installed my first home server in 2004.
>
> there definitely _was_ a problem in the snapshot code which was finally resolved around 2015 (roughly), i think perhaps skip, or forsyth found it - i apologise if i have the attribution wrong.
>
> fossil is also unhelpful if it runs out of space - i don’t believe brucee ever forgave it for that.
> this is less of a problem when it is run with venti of course.
>
> -Steve
>
>
> On 4 Apr 2023, at 6:16 pm, noam@pixelhero.dev wrote:
>
> 
> I've sporadically encountered a deadlock in fossil. Naturally, when your root file system crashes, it can be hard to debug. My solution: stop having a root file system. Was able to attach acid using mycroft's tooling from ANTS, and get a clean stack trace (https://pixelhero.dev/notebook/fossil/stacks/2023-04-03.1).
>
> After a few hours yesterday (https://pixelhero.dev/notebook/fossil/2023-04-03.html), I eventually tracked down the deadlock. When blockWrite is told to flush a clean block to disk - i.e. one which is already flushed - it removes the block from the cache's free list, locks the block, detects that it's clean, and then... drops the reference. While keeping the block locked. And in the cache.
>
> This leak of the lock, of course, means that the *next* access to the block - which is still in the cache! - hangs indefinitely. This is seen exactly in the stack trace:
>
> _cacheLocal grabs the block from the cache, tries to lock it, and hangs indefinitely. Worse, it does so under a call to fileWalk, which holds a different lock, so the effect spreads and makes even more of the file system inaccessible (the fileMetaFlush proc hangs waiting on this file lock).
>
> This patch just ensures we call blockPut on the BioClean path as well, thus unlocking the block and re-adding it to the cache's free list.
>
> The patch is on my branch - https://git.sr.ht/~pixelherodev/plan9/commit/1bf8bd4f44e058261da7e89d87527b12073c9e0f - but I figured I should probably post it here as well.
>
> If anyone has any other patches that weren't in the 9legacy download as of ~2018, please let me know! :)
>
> ---
> sys/src/cmd/fossil/cache.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/sys/src/cmd/fossil/cache.c b/sys/src/cmd/fossil/cache.c
> index f473d211e..2fec44949 100644
> --- a/sys/src/cmd/fossil/cache.c
> +++ b/sys/src/cmd/fossil/cache.c
> @@ -1203,8 +1203,10 @@ blockWrite(Block *b, int waitlock)
>  			fprint(2, "%s: %d:%x:%d iostate is %d in blockWrite\n",
>  				argv0, bb->part, bb->addr, bb->l.type, bb->iostate);
>  			/* probably BioWriting if it happens? */
> -			if(bb->iostate == BioClean)
> +			if(bb->iostate == BioClean){
> +				blockPut(bb);
>  				goto ignblock;
> +			}
>  		}
>
>  		blockPut(bb);
> --
>
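The lock leak described in the quoted message can be modelled outside fossil. What follows is only a minimal sketch under stated assumptions: `Blk`, `blk_init`, `blk_get`, `blk_put`, `flush_prior_buggy`, `flush_prior_fixed`, `blk_lock_is_free` and `demo` are hypothetical names, and POSIX mutexes stand in for fossil's own locking. It is not the fossil source. The buggy path returns early on a clean block without the matching put, so the lock stays held; the fixed path puts the block first, as the patch does with blockPut. A trylock is used to observe the leaked lock without actually hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cached block; not fossil's Block struct. */
typedef struct Blk {
	pthread_mutex_t lk;	/* per-block lock */
	bool clean;		/* stand-in for iostate == BioClean */
} Blk;

void
blk_init(Blk *b, bool clean)
{
	pthread_mutex_init(&b->lk, NULL);	/* default (non-recursive) mutex */
	b->clean = clean;
}

/* Stand-in for fetching a block from the cache: returns with it locked. */
void
blk_get(Blk *b)
{
	pthread_mutex_lock(&b->lk);
}

/* Stand-in for blockPut: releases the lock taken by blk_get. */
void
blk_put(Blk *b)
{
	pthread_mutex_unlock(&b->lk);
}

/* Buggy shape: the clean-block path skips the put, leaking the lock. */
void
flush_prior_buggy(Blk *b)
{
	blk_get(b);
	if(b->clean)
		return;		/* lock is still held here */
	blk_put(b);
}

/* Fixed shape: every exit path puts the block before leaving. */
void
flush_prior_fixed(Blk *b)
{
	blk_get(b);
	if(b->clean){
		blk_put(b);
		return;
	}
	blk_put(b);
}

/* Reports whether the lock is free, using trylock so it never blocks. */
bool
blk_lock_is_free(Blk *b)
{
	if(pthread_mutex_trylock(&b->lk) != 0)
		return false;
	pthread_mutex_unlock(&b->lk);
	return true;
}

/* Exercises both paths on clean blocks and checks the lock state. */
void
demo(void)
{
	Blk a, b;

	blk_init(&a, true);
	blk_init(&b, true);
	flush_prior_buggy(&a);
	flush_prior_fixed(&b);
	assert(!blk_lock_is_free(&a));	/* a later blocking lock would hang */
	assert(blk_lock_is_free(&b));	/* the next accessor can proceed */
}
```

In this model, any later caller that takes a blocking lock on the block left behind by `flush_prior_buggy` would wait forever, which corresponds to the hang reported in _cacheLocal.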


