From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <a0b4fdc70c9531ba9a41193c219915fe@coraid.com>
From: erik quanstrom <quanstro@coraid.com>
Date: Mon,  9 Apr 2007 12:23:40 -0400
To: 9fans@cse.psu.edu
Subject: Re: [9fans] bell-labs website and plan9
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Topicbox-Message-UUID: 42495d00-ead2-11e9-9d60-3106f5b1d025

On Mon Apr  9 11:50:41 EDT 2007, rsc@swtch.com wrote:
> erik:
> > i have also noticed that replica/applylog has a problem.  when i started
> > experimenting with copying history from our old fileserver to the new
> > one, i started using replica/updatedb and replica/applylog.  updatedb
> > worked very well, but applylog hung for me pretty consistently.
> 
> Did you ever use acid to get a stack trace from the `hung' applylogs?
> The only threading in applylog is an implementation of something
> like fcp to copy files using multiple outstanding 9P read requests.
> Since no one else seems to have had problems, I would guess that
> there were just some requests that made your file server thrash.
> But stack traces would make the answer very clear.

i apologize for not having a backtrace; they looked uninformative at the
time.  what i remember is that applylog was not doing any i/o at the time
(unless it was reading the same blocks over and over).
in one instance, the contents of /proc/$pid/fd did not change for 4 hrs
and applylog generated no system load at all.

one problem i do see, though it was not my case (i was working on two
successive days from the dump), is that there is no maximum number of
tries when a file keeps changing underfoot.  a log file competing with
a slow link could be problematic: if it changes again before each copy
finishes, applylog could retry indefinitely.
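
to illustrate what a bound on the retries might look like, here is a
minimal sketch in portable c.  it is not applylog's code: copyonce,
copystable and Maxtries are hypothetical names, and comparing size and
mtime before and after the copy is only an assumed way of detecting a
change.

```c
#include <stdio.h>
#include <sys/stat.h>

enum { Maxtries = 5 };	/* hypothetical limit; not a constant from applylog */

/* copy src to dst once; return 0 on success, -1 on error */
static int
copyonce(const char *src, const char *dst)
{
	char buf[8192];
	size_t n;
	int ok;
	FILE *in, *out;

	if((in = fopen(src, "rb")) == NULL)
		return -1;
	if((out = fopen(dst, "wb")) == NULL){
		fclose(in);
		return -1;
	}
	ok = 1;
	while((n = fread(buf, 1, sizeof buf, in)) > 0)
		if(fwrite(buf, 1, n, out) != n){
			ok = 0;
			break;
		}
	if(ferror(in))
		ok = 0;
	fclose(in);
	if(fclose(out) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * copy src to dst, retrying while src changes during the copy.
 * returns the number of attempts used on success, or -1 on error
 * or if src was still changing after maxtries attempts.
 * a change is assumed to show up as a different size or mtime;
 * mtime has one-second resolution here, so this is only a heuristic.
 */
int
copystable(const char *src, const char *dst, int maxtries)
{
	struct stat before, after;
	int attempt;

	for(attempt = 1; attempt <= maxtries; attempt++){
		if(stat(src, &before) < 0)
			return -1;
		if(copyonce(src, dst) < 0)
			return -1;
		if(stat(src, &after) < 0)
			return -1;
		if(before.st_size == after.st_size
		&& before.st_mtime == after.st_mtime)
			return attempt;
	}
	return -1;	/* gave up: file kept changing underfoot */
}
```

a caller would use something like copystable(src, dst, Maxtries) and
report the file as skipped on -1, rather than looping until the copy
happens to win the race.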

restarting it where it left off (with an initial line number) generally
fixed the problem.  i didn't mention it at the time because i 
didn't get to the bottom of the problem.

i'll try to recreate the problem with a backtrace, but anyone else is welcome
to beat me to it.

- erik