* [9fans] Plan 9 server controls impaired and steps taken before failure.
From: Vester Thacker @ 2005-05-10 1:43 UTC
To: Fans of the OS Plan 9 from Bell Labs
I am running a fossil+venti on an AMD64 machine. After about 20 to 30
minutes the machine ceases to receive text or chording input but the
mouse cursor still moves and the stats display continues to run. Also,
I am unable to cpu to it.
Any suggestions for a fix?
Btw, I have rebooted the machine numerous times for troubleshooting. I
disabled cron once while troubleshooting but that didn't appear to be
the culprit.
If anyone has successfully installed Plan 9 with the fossil+venti
option please state an overview of the procedures that allowed for a
hassle free installation.
Here are the standard steps taken:
1) started with the default fossil+venti option
2) modified fossil.conf (e.g., added -AWP)
3) reviewed plan9.ini to ensure a venti line was added
4) reboot
5) login as glenda
6) modified /lib/ndb/local (e.g., added dns, dnsdomain, ip address,
name, netmask, cpu, fs, auth, etc.)
7) modified /rc/bin/cpurc (e.g., added devices, set IP address,
factotum, secstored, keyfs, /ndb/dns -r, etc.)
8) added user...a new hostowner
9) modified /lib/ndb/auth
10) reboot
11) login as hostowner
12) replica/pull -v /dist/replica/network
13) stopped to fix this annoying problem that occurs every 20 to 30 minutes ;)
If I am forgetting something or doing something out of order, I'd like
to know. I've tried variations on my steps and even minimized the steps,
but the results all lead to the same conclusion.
What really gets my goat is that this is the third machine on which I've
attempted to get fossil+venti working correctly. All machines have the
same problem. I'm really thinking that it isn't an issue with hardware
but rather a missed step. If it turns out to be a hardware issue in
all 3 cases, then I am an unlucky guy.
-vester
* Re: [9fans] Plan 9 server controls impaired and steps taken before failure.
From: andrey mirtchovski @ 2005-05-10 1:50 UTC
To: Vester Thacker, Fans of the OS Plan 9 from Bell Labs
On 5/9/05, Vester Thacker <vester.thacker@gmail.com> wrote:
> I am running a fossil+venti on an AMD64 machine. After about 20 to 30
> minutes the machine ceases to receive text or chording input but the
> mouse cursor still moves and the stats display continues to run. Also,
> I am unable to cpu to it.
>
fossil is deadlocked while dumping to venti. it still does its job but
no new files can be opened, consequently nothing works except the
programs that are already started.
give it enough time and it'll release the deadlock. i've left it
overnight for big dumps and it has taken up to 12 hours to have a
useable system before, for something like 10+ gigs.
* Re: [9fans] Plan 9 server controls impaired and steps taken before failure.
From: Bruce Ellis @ 2005-05-10 21:28 UTC
To: andrey mirtchovski, Fans of the OS Plan 9 from Bell Labs
dma may be your friend. it makes things 100 times quicker on fast machines.
brucee
On 5/10/05, andrey mirtchovski <mirtchovski@gmail.com> wrote:
> On 5/9/05, Vester Thacker <vester.thacker@gmail.com> wrote:
> > I am running a fossil+venti on an AMD64 machine. After about 20 to 30
> > minutes the machine ceases to receive text or chording input but the
> > mouse cursor still moves and the stats display continues to run. Also,
> > I am unable to cpu to it.
> >
>
> fossil is deadlocked while dumping to venti. it still does its job but
> no new files can be opened, consequently nothing works except the
> programs that are already started.
>
> give it enough time and it'll release the deadlock. i've left it
> overnight for big dumps and it has taken up to 12 hours to have a
> useable system before, for something like 10+ gigs.
>
* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: andrey mirtchovski @ 2005-05-10 23:14 UTC
To: 9fans
> dma may be your friend. it makes things 100 times quicker on fast machines.
>
> brucee
my experiences are definitely with dma on :)
* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Vester Thacker @ 2005-05-11 0:33 UTC
To: Fans of the OS Plan 9 from Bell Labs
On 5/11/05, andrey mirtchovski <mirtchov@cpsc.ucalgary.ca> wrote:
> > dma may be your friend. it makes things 100 times quicker on fast machines.
> >
> > brucee
>
> my experiences are definitely with dma on :)
If I take that literally, then I can expect a working machine
within 200 days with dma turned off. ;)
Thanks for the suggestion, Brucee, I'll turn dma on. I'll add it to
the installation guide.
Andrey, thanks for your help too.
Kenji Arisawa sent me a pccpuf config that has worked well for him and
that I plan to use.
If anyone following this thread anticipates using the
fossil+venti option during an install, this should help. Thanks go to
Kenji Arisawa for the following config:
The configuration (/sys/src/9/pc/pccpuf) is as follows:
dev
	root
	cons
	arch
	pnp pci
	env
	pipe
	proc
	mnt
	srv
	dup
	rtc
	ssl
	tls
	bridge log
	sdp thwack unthwack
	cap
	kprof
	fs
	ether netif
	ip arp chandial ip ipv6 ipaux iproute netlog nullmedium pktmedium ptclbsum386 inferno
	draw screen vga vgax
	mouse mouse
	vga
	sd
	floppy dma
	uart
	usb
link
	ether2000 ether8390
	ether2114x pci
	ether79c970 pci
	ether8003 ether8390
	ether8139 pci
	ether82543gc pci
	ether82557 pci
	ether83815 pci
	etherelnk3 pci
	etherga620 pci
	etherigbe pci ethermii
	etherrhine pci ethermii
	ethersink
	ethermedium
	netdevmedium
	loopbackmedium
	usbuhci
misc
	archmp mp apic
	uarti8250
	uartpci pci
	sdata pci sdscsi
	sd53c8xx pci sdscsi
	vga3dfx +cur
	vgaark2000pv +cur
	vgabt485 =cur
	vgaclgd542x +cur
	vgaclgd546x +cur
	vgact65545 +cur
	vgacyber938x +cur
	vgaet4000 +cur
	vgahiqvideo +cur
	vgai81x +cur
	vgamach64xx +cur
	vgamga2164w +cur
	vgamga4xx +cur
	vganeomagic +cur
	vganvidia +cur
	vgargb524 =cur
	vgas3 +cur vgasavage
	vgat2r4 +cur
	vgatvp3020 =cur
	vgatvp3026 =cur
	vgavmware +cur
ip
	il
	tcp
	udp
	ipifc
	icmp
	icmp6
	gre
	ipmux
	esp
	rudp
port
	int cpuserver = 1;
boot cpu boot #S/sdC0/
	tcp
	il
	local
bootdir
	bootpccpuf.out boot
	/386/bin/ip/ipconfig
	/386/bin/auth/factotum
	/386/bin/disk/kfs
	/386/bin/fossil/fossil
	/386/bin/venti/venti
--
-vester
* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Russ Cox @ 2005-05-13 16:26 UTC
To: Vester Thacker, Fans of the OS Plan 9 from Bell Labs
> Kenji Arisawa sent me a pccpuf config that has worked well for him and
> that I plan to use.
> Should anyone be following this thread and anticipates using the
> fossil+venti option during an install, this should help. Thanks go to
> Kenji Arisawa for the following config:
Unless you're using a laptop as your cpu server,
this config won't work any differently from the standard 9pccpuf.
It's the standard one with some laptop ethernet drivers added.
Russ
* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: arisawa @ 2005-05-13 21:29 UTC
To: Fans of the OS Plan 9 from Bell Labs
Hello,
> Kenji Arisawa sent me a pccpuf config that has worked well for him and
> that I plan to use.
>
That depends on the kernel code.
It seems fossil (at least in Jan. and Feb. of this year) locked the
file service during "snap" and "snap -a".
Does this continue?
Kenji Arisawa
* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Vester Thacker @ 2005-05-14 17:11 UTC
To: Fans of the OS Plan 9 from Bell Labs
On 5/14/05, arisawa@ar.aichi-u.ac.jp <arisawa@ar.aichi-u.ac.jp> wrote:
>
> It seems fossil (at least in Jan. and Feb. of this year) locked the
> file service during "snap" and "snap -a".
> Does this continue?
I am not sure, but I am on Day 4 of the initial snap. I'm not sure how
long I need to wait until I consider the current installation process
a failure. Perhaps 12 more days of waiting and I'll call it quits.
Btw I have a 40 GB fossil and a 210 GB venti running on an AMD64
machine. There is approximately 300 MB of files on the fossil. I have
dma turned on. My hard disk is an ATA 133 w/ 16 MB of cache. I don't
understand *why* it takes so long for a snap to complete.
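For a rough sense of scale, here is a back-of-the-envelope estimate, offered only as a sketch. It assumes the file total quoted above is about 300 megabytes, and it uses deliberately pessimistic sustained throughputs of 1 and 10 MB/s. Those rates are hypothetical, not measurements from this machine, and the function name is made up for illustration.

```python
# Back-of-the-envelope check: how long should copying ~300 MB take?
# ASSUMPTIONS (not from the thread): the file total is 300 megabytes, and
# the throughputs below are hypothetical, deliberately pessimistic rates.

SECONDS_PER_DAY = 24 * 60 * 60


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to move size_mb megabytes at a sustained rate, ignoring overhead."""
    return size_mb / rate_mb_per_s


if __name__ == "__main__":
    size_mb = 300
    observed = 4 * SECONDS_PER_DAY  # "Day 4 of the initial snap"
    for rate in (1, 10):  # hypothetical MB/s
        expected = transfer_seconds(size_mb, rate)
        print(f"{rate} MB/s: {expected:.0f} s expected, "
              f"observed wait is {observed / expected:.0f}x longer")
```

Even at 1 MB/s the copy itself would take about five minutes, so under these assumptions a multi-day wait is hard to attribute to raw disk speed alone.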
Sorry if I come off as appearing frustrated about the wait, but I am
*frustrated* about this.
This isn't something you can recommend that your friends try, or even
present to a crowd at an expo.
-vester
* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Russ Cox @ 2005-05-14 18:03 UTC
To: Vester Thacker, Fans of the OS Plan 9 from Bell Labs
I'd be frustrated too. I've never seen a wait that long.
I made a bad design choice in the locking of fossil blocks
and I apologize. My suggestion would be to run sync
and then halt at the console, reboot, and let it start
again.
There is a window (I think ten seconds) between snap -a
and fossil deciding to start archiving. If you access any
file in those ten seconds then enough of the root gets
copied-on-write that you shouldn't see the deadlock at all.
Rebooting should cause enough file activity at startup
to get around the deadlock.
Russ
* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: arisawa @ 2005-05-14 22:06 UTC
To: Fans of the OS Plan 9 from Bell Labs
Hello Russ,
> There is a window (I think ten seconds) between snap -a
> and fossil deciding to start archiving. If you access any
> file in those ten seconds then enough of the root gets
> copied-on-write that you shouldn't see the deadlock at all.
>
What happens if some accesses come from the Internet during that time?
Files in /sys/log/* are big enough.
Sorry, I couldn't understand "the root".
Kenji Arisawa
* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Russ Cox @ 2005-05-14 22:33 UTC
To: Fans of the OS Plan 9 from Bell Labs
> > There is a window (I think ten seconds) between snap -a
> > and fossil deciding to start archiving. If you access any
> > file in those ten seconds then enough of the root gets
> > copied-on-write that you shouldn't see the deadlock at all.
> >
>
> What happens if some accesses come from Internet during that time ?
> Files in /sys/log/* are big enough.
> Sorry I couldn't understand "the root".
The root of the tree of files and blocks. It's copy-on-write
after a snapshot but snap -a locks the blocks while it is
archiving. If the block has already been copied, no big deal.
If it's still the one in the file tree (not been copied-on-write yet)
then you can't access it until the archiver finishes. I should
fix this to be some sort of read lock but it's not completely
straightforward.
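The interaction described above can be sketched with a toy model. This is not fossil's code: the class and method names are hypothetical, a single block stands in for "the root", and an "access" is modelled simply as whatever forces the live tree to take its own copy of that block.

```python
# Toy model of the behaviour described above. This is NOT fossil's code:
# the names are hypothetical and one block stands in for "the root".


class Block:
    def __init__(self, data: str) -> None:
        self.data = data
        self.locked = False  # True while the archiver holds the block


class ToyFileSystem:
    def __init__(self, data: str) -> None:
        self.live_root = Block(data)  # root block of the active file tree
        self.snap_root = None         # root block captured by the snapshot

    def snap(self) -> None:
        # Right after a snapshot, live tree and snapshot share one block.
        self.snap_root = self.live_root

    def can_access(self) -> bool:
        # The live tree is usable only if its root is not the locked block.
        return not self.live_root.locked

    def touch(self) -> None:
        # An access that arrives after the archiver took the lock must wait.
        if not self.can_access():
            raise RuntimeError("blocked until the archiver finishes")
        # First access after the snapshot: copy-on-write gives the live
        # tree its own private copy of the root.
        if self.live_root is self.snap_root:
            self.live_root = Block(self.snap_root.data)

    def start_archive(self) -> None:
        if self.snap_root is None:
            raise RuntimeError("call snap() first")
        # The archiver locks the snapshot's block for the whole archive.
        self.snap_root.locked = True


if __name__ == "__main__":
    stuck = ToyFileSystem("root")
    stuck.snap()
    stuck.start_archive()  # nobody touched a file inside the window
    print("usable without an early access:", stuck.can_access())

    ok = ToyFileSystem("root")
    ok.snap()
    ok.touch()             # a file was accessed inside the window
    ok.start_archive()
    print("usable with an early access:", ok.can_access())
```

In the first scenario the live tree still points at the locked block, so it stays unusable until archiving ends. In the second, the early access has already split the live root from the snapshot, so the lock does not get in the way.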
Russ
* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: YAMANASHI Takeshi @ 2005-05-16 6:10 UTC
To: 9fans
hi,
> I am not sure, but I am on Day 4 of the initial snap. I'm not sure how
> long I need to wait until I consider the current installation process
> a failure. Perhaps 12 more days of waiting and I'll call it quits.
My idea for working around this is to take the initial "snap -a" during
the installation. I'm assuming you are installing from CD on sdD0 to
sdC0 with the default partition names.
1: finish the installation up to the point where the "Feel free to turn
off your computer." message appears.
2: sweep a new window and follow the steps below.
term% mount /srv/9660 /n/dist /dev/sdD0/data
term% cd /n/dist/386/bin
term% {echo srv -p fscons; ./fossil/conf /dev/sdC0/fossil} > /tmp/f
term% ./venti/venti -c /dev/sdC0/arenas -a 'tcp!127.1!17034' -h 'tcp!127.1!8888'
term% venti=tcp!127.1!17034
term% ./fossil/fossil -c '. /tmp/f'
term% ./con -l /srv/fscons
prompt: fsys main snap -a
__wait until archive:.... shows__
prompt: fsys all sync
prompt: fsys all halt
__quit con__
>>>q
term% kill fossil | rc
term% ./fossil/fossil -c '. /tmp/f'
term% ./con -l /srv/fscons
prompt: fsys main df
__ the usage should be low once archived to venti __
prompt: fsys all sync
prompt: fsys all halt
__ quit con __
term% kill fossil | rc
term% ./venti/sync
<important>
DON'T HIT delete key IN THIS WINDOW. IT WILL KILL venti.
</important>
3: you can check the progress of snapshotting in another window.
term% mount /srv/9660 /n/dist /dev/sdD0/data
term% cd /n/dist/386/bin
term% telnet tcp!127.1!8888
GET /storage
__venti usage shows__
This procedure may or may not be worth folding into the installation process.
--
* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Richard Miller @ 2005-05-16 9:04 UTC
To: 9fans
> <important>
> DON'T HIT delete key IN THIS WINDOW. IT WILL KILL venti.
> </important>
Venti remains interruptible even when running in the background -
bug or feature? If it's a bug, I reckon this is the fix:
===============
/sys/src/cmd/venti/venti.c:80,85 - venti.c:80,88
if(config == nil)
config = "venti.conf";
> if(background)
> rfork(RFNOTEG);
>
vtAttach();
if(!initArenaSum())
===============
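For readers more familiar with POSIX than Plan 9, a loose analogy to the patch above: rfork(RFNOTEG) moves the process into a new note group, and the nearest POSIX counterpart is placing a child in its own process group so that an interrupt aimed at the parent's group is not delivered to it. The sketch below is that analogy only, not Plan 9 code, and the helper name spawn_detached is hypothetical.

```python
# POSIX analogy only (not Plan 9 code): start a child in its own process
# group so an interrupt aimed at the parent's group is not delivered to it.
import os
import subprocess
import sys


def spawn_detached(argv):
    """Start argv in a new process group and return the Popen object."""
    # preexec_fn runs in the child after fork and before exec;
    # os.setpgrp() makes the child the leader of a fresh process group.
    return subprocess.Popen(argv, preexec_fn=os.setpgrp)


if __name__ == "__main__":
    child = spawn_detached([sys.executable, "-c", "import time; time.sleep(5)"])
    try:
        print("parent group:", os.getpgrp())
        print("child group: ", os.getpgid(child.pid))
    finally:
        child.terminate()
        child.wait()
```

The two group IDs printed should differ, which is the property the patch is after: the background server no longer shares an interrupt group with the window that started it.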
-- Richard
Thread overview: 13+ messages
2005-05-10 1:43 [9fans] Plan 9 server controls impaired and steps taken before failure Vester Thacker
2005-05-10 1:50 ` andrey mirtchovski
2005-05-10 21:28 ` Bruce Ellis
2005-05-10 23:14 ` [9fans] Plan 9 server controls impaired and steps taken before andrey mirtchovski
2005-05-11 0:33 ` Vester Thacker
2005-05-13 16:26 ` Russ Cox
2005-05-13 21:29 ` arisawa
2005-05-14 17:11 ` Vester Thacker
2005-05-14 18:03 ` Russ Cox
2005-05-14 22:06 ` arisawa
2005-05-14 22:33 ` Russ Cox
2005-05-16 6:10 YAMANASHI Takeshi
2005-05-16 9:04 ` Richard Miller