9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
From: Richard Miller <9fans@hamnavoe.com>
To: 9fans@9fans.net
Subject: Re: [9fans] fossil deadlock
Date: Thu, 22 Mar 2012 11:57:10 +0000	[thread overview]
Message-ID: <635e0a367d595f2c6e2b38c2480d2cf9@hamnavoe.com> (raw)
In-Reply-To: <20110908124823.9235665D38@server.hemiola.co.uk>

Thanks to detective work by Rod at hemiola and David du Colombier,
I've been able to find & fix the long-standing deadlock in fossil's
snapshot code.

The problem is the lock c->dirtylk, which prevents any thread from
dirtying new blocks while flushThread is writing old dirty blocks to
disk.  Deadlock occurs when a thread T tries to dirty a block B while
holding a lock on an already-dirty block A. T can't proceed until
c->dirtylk is unlocked, which won't happen until flushThread has written
out block A; but block A can't be written out until T unlocks it.

The deadlock is usually observed when a fossil 'sync' occurs (eg at the
beginning of a periodic temporary snapshot) while 'snap -a' is running;
in this case thread T is archThread, updating the label in block B for a
dirty block A which it has just sent to venti.  But there are many other
cases where a thread locks two blocks and marks one dirty, which are
vulnerable to this deadlock if a snapshot occurs just at the wrong time.

The solution is simple (though I only realised it after trying and
rejecting several complicated ones).  It turns out that c->dirtylk
serves no useful purpose and can be eliminated altogether.  Its use
is explained by a comment in cacheFlush:
	/*
	 * Lock c->dirtylk so that more blocks aren't being dirtied
	 * while we try to write out what's already here.
	 * Otherwise we might not ever finish!
	 */
But whenever cacheFlush is called with wait=1, a write lock is already
held on the "epoch lock" fs->elk, which prevents any new 9p requests
from touching the file system.  Any dirtying of blocks is limited to
internal fossil activities for requests which were already in progress
when cacheFlush was called, and thus strictly bounded even without
c->dirtylk.

Patch is fossil-snap-deadlock.




  parent reply	other threads:[~2012-03-22 11:57 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-08 12:48 rod
2011-09-15 14:28 ` Ethan Grammatikidis
2011-09-15 14:32   ` erik quanstrom
2012-03-22 11:57 ` Richard Miller [this message]
2012-03-22 13:13   ` Gorka Guardiola
2012-03-22 13:41   ` David du Colombier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=635e0a367d595f2c6e2b38c2480d2cf9@hamnavoe.com \
    --to=9fans@hamnavoe.com \
    --cc=9fans@9fans.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).