From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: To: 9fans@9fans.net From: Richard Miller <9fans@hamnavoe.com> Date: Thu, 22 Mar 2012 16:30:17 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Subject: [9fans] fossil archive corruption - solved Topicbox-Message-UUID: 6e16ec90-ead7-11e9-9d60-3106f5b1d025 I've finally solved the mystery of data corruption in fossil archives. As usual, the cause is obvious once you've seen it. The clue is this line and comment in /sys/src/cmd/fossil/cache.c:/^doRemoveLink: /* * We're not really going to overwrite b, but if we're not * going to look at its contents, there is no point in reading * them from the disk. */ b = cacheLocalData(c, p->addr, p->type, p->tag, recurse ? OReadOnly : OOverWrite, 0); The context is that b is a copy-on-write block from an earlier epoch which has just been copied, and doRemoveLink needs only the block's label (not its data) in order to remove it from the active file system by setting its state to BsClosed. The problem is that when cacheLocalData loads a block from disk with mode OOverWrite, it sets its iostate to BioClean even though only the block's label has been read. In all other instances of OOverWrite, this is benign because the caller immediately overwrites the block's data and sets its iostate to BioDirty before unlocking it. Even in the case of doRemoveLink, usually the block won't have needed to be loaded from disk because it has recently been copied and therefore is still in the cache with valid data. But if the block did need to be loaded again, the cache entry will will now be marked BioClean but with invalid data (ie the contents of whatever other block most recently used that buffer). Now suppose a 'snap -a' is running concurrently, and the copied block in question is from the just-finished epoch but hasn't been archived yet. If the timing is right (wrong), the snapshot thread will find the block in the cache and write its bad data permanently to venti. The consequence will depend on the type of the invalid block. If it's a pointer or dir block, the archive operation may fail with "illegal block address" errors when trying to walk to its descendents; if it's a meta data block, some portion of the archived tree may become inaccessible with "corrupted meta data" messages. Both of these have been reported from time to time in 9fans. If it's a data block, the archive will simply contain some bad data. I've observed this at least once, and who knows how many times it has happened unnoticed? The simplest fix would be to revert to the original fossil code in which doRemoveLink always loaded the block with OReadOnly. But in line with the Principle of Least Astonishment (and to guard against anyone else making a mistake with a similar "optimisation"), I think the right thing is for cacheLocalData to set the block's iostate to BioLabel, not BioClean, when only the label has been read. Patch is fossil-archive-corruption.