From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <509071940709280935u4c3de703ua00380231d451857@mail.gmail.com>
Date: Fri, 28 Sep 2007 12:35:35 -0400
From: "Anthony Sorace"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] More venti sync woes.
In-Reply-To: <46FCC2CF.1060501@gmx.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <509071940709261232t1046ecv2ce6800d549c180c@mail.gmail.com>
	<509071940709271549l57a778e5k9d0f70e3d3a670be@mail.gmail.com>
	<46FCC2CF.1060501@gmx.de>
Topicbox-Message-UUID: c6d7f2f2-ead2-11e9-9d60-3106f5b1d025

agreed. 'ps -a | grep venti' shows the following after about 15 minutes:

glenda  198  3:04  3:44  104508K  Rendez  venti [main]
glenda  199  0:00  0:00  104508K  Rendez  venti
glenda  200  0:00  0:00  104508K  Sleep   venti
glenda  201  0:00  0:00  104508K  Rendez  venti [icachewriteproc:/dev/sdC0/isect]
glenda  202  4:49  4:23  104508K  Rendez  venti [icachewritecoord]
glenda  203  0:00  0:00  104508K  Sleep   venti [delaykickproc icache]
glenda  204  0:23  1:11  104508K  Rendez  venti [flushproc]
glenda  205  0:00  0:00  104508K  Rendez  venti [delaykickproc dcache]
glenda  206  0:00  0:00  104508K  Rendez  venti
glenda  206  0:00  0:00  104508K  Rendez  venti [bloomwriteproc]

once it hits "sync...", load, context, and syscall are pegged in
stats; memory ramps up a bit over roughly the first half minute, but
then levels out.

for the big three processes, here's everything over 3% in tprof:

:; tprof 198
total: 3040
TEXT 00001000
    ms      %  sym
   480   15.7  _tas
   240    7.8  runthread
   230    7.5  lock
   180    5.9  _threadrendezvous
   170    5.5  rendezvous
   140    4.6  qlock
   130    4.2  _sched
   110    3.6  trace
   110    3.6  _threadready
   100    3.2  waitforkick
   100    3.2  icachewritecoord

:; tprof 202
total: 7570
TEXT 00001000
    ms      %  sym
  1090   14.3  _tas
   520    6.8  runthread
   500    6.6  rendezvous
   490    6.4  _threadrendezvous
   490    6.4  lock
   290    3.8  icachewritecoord
   280    3.6  _sched
   260    3.4  qlock
   240    3.1  trace
   230    3.0  _threadready

:; tprof 204
total: 14040
TEXT 00001000
    ms      %  sym
  2020   14.3  _tas
  1010    7.1  _threadrendezvous
   950    6.7  rendezvous
   930    6.6  runthread
   920    6.5  lock
   590    4.2  icachewritecoord
   540    3.8  trace
   510    3.6  _sched
   470    3.3  _threadready

all three are tight loops, with most of their time spent in the thread
library. poking around with acid now to get more info.

On 9/28/07, Kernel Panic wrote:
> Russ Cox wrote:
>
> >dma is worth around 10x, certainly less than 50.
> >i agree that your venti server is taking a very long
> >time to come back. i reboot mine all the time
> >and don't have this problem.
> >
> >i am at a loss for what could be taking it so long.
> >it's probably not going to hurt any to stop it.
> >it could take forever -- maybe it's looping!
> >
> >
> It is...
>
> while(1){
> 	proc main: kick icache
> 	work icachewritecoord: start
> 	proc icachewritecoord: icachewritecoord kick dcache
> 	work flushproc: start
> 	proc flushproc: build t=131
> 	proc flushproc: writeblocks t=991
> 	proc flushproc: writeblocks.1 t=1632
> 	proc flushproc: writeblocks.2 t=2296
> 	proc flushproc: writeblocks.3 t=2944
> 	proc flushproc: undirty.4 t=3564
> 	work flushproc: finish
> 	proc icachewritecoord: kick dcache
> 	proc icachewritecoord: icachewritecoord kicked dcache
> 	proc icachewritecoord: icachewritecoord start flush
> 	proc icachewritecoord: icachedirty enter
> 	proc icachewritecoord: icachedirty exit
> 	proc icachewritecoord: icachewritecoord sleep
> 	proc main: kick icache
> }
>
> the main proc loops in icachealloc():
>
> 	while(icache.ndirty == icache.entries){
> 		/*
> 		 * This is a bit suspect. Kickicache will wake up the
> 		 * icachewritecoord, but if all the index entries are for
> 		 * unflushed disk blocks, icachewritecoord won't be
> 		 * able to do much. It always rewakes everyone when
> 		 * it thinks it is done, though, so at least we'll go around
> 		 * the while loop again. Also, if icachewritecoord sees
> 		 * that the disk state hasn't change at all since the last
> 		 * time around, it kicks the disk. This needs to be
> 		 * rethought, but it shouldn't deadlock anymore.
> 		 */
> 		kickicache();
> 		rsleep(&icache.full);
> 	}
>
> but icache.ndirty never changes... so it hangs forever in
> "sync..." because it cant allocate ientries.
>
> >when you manage to boot in other means,
> >it would be nice to see what ps -a|grep venti
> >says. venti sets its proc args that show up in ps -a
> >to tell you what each proc does.
> >
> >the new venti is very careful both about the
> >consistency of what is stored on disk and about
> >recovering quickly after a disk failure
> >(there's not a lot to do -- just pick up the unindexed
> >arena entries from the arena tocs and toss them
> >back into the index write buffer where they were
> >when you restarted the system).
> >
> >what you're describing could happen if you were
> >running a new venti (which buffers index updates
> >quite aggressively) and then on reboot managed
> >to start an old venti (which would then process the
> >unindexed new blocks one at a time instead of
> >buffering the updates, with about 3 seeks per block).
> >
> >without more information i'm afraid i have no good answers.
> >
> >russ
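
for what it's worth, the wait loop quoted above can be modelled in a
few lines of plain C. this is only a sketch, not venti's code:
ICacheModel, coordround and allocwait are names made up for
illustration, the flushable field is an assumption about what "index
entries for unflushed disk blocks" amounts to, and a direct function
call stands in for the kickicache()/rsleep() pair. it shows that when
every entry is dirty and none can be written out, the loop condition
never changes, so only the added round limit ends the loop.

```c
#include <assert.h>

/* Hypothetical stand-in for the index cache counters; not venti's struct. */
typedef struct ICacheModel ICacheModel;
struct ICacheModel {
	int entries;	/* total index cache entries */
	int ndirty;	/* entries not yet written to the index */
	int flushable;	/* dirty entries whose data blocks are already on disk (assumed) */
};

/*
 * One round of the write coordinator in this model: it can only
 * clean entries that are flushable. Returns how many it cleaned.
 */
static int
coordround(ICacheModel *c)
{
	int n;

	assert(c->flushable >= 0 && c->flushable <= c->ndirty);
	n = c->flushable;
	c->ndirty -= n;
	c->flushable = 0;
	return n;
}

/*
 * Model of the allocation wait loop. The real loop has no round
 * limit; maxrounds is added here so the stuck case terminates.
 * Returns the number of rounds waited, or -1 if the cache was
 * still completely dirty after maxrounds rounds.
 */
static int
allocwait(ICacheModel *c, int maxrounds)
{
	int rounds;

	rounds = 0;
	while(c->ndirty == c->entries){
		if(rounds == maxrounds)
			return -1;
		coordround(c);	/* stands in for kickicache() + rsleep() */
		rounds++;
	}
	return rounds;
}
```

with entries=4, ndirty=4, flushable=0, allocwait(&c, 1000) returns -1:
no round makes progress, which matches the repeating trace. with
flushable=2 it returns 1 and leaves ndirty at 2.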