9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] Plan 9 server controls impaired and steps taken before failure.
From: Vester Thacker @ 2005-05-10  1:43 UTC
  To: Fans of the OS Plan 9 from Bell Labs

I am running a fossil+venti on an AMD64 machine. After about 20 to 30
minutes the machine ceases to receive text or chording input but the
mouse cursor still moves and the stats display continues to run. Also,
I am unable to cpu to it.

Any suggestions for a fix?

Btw, I have rebooted the machine numerous times for troubleshooting. I
disabled cron once while troubleshooting but that didn't appear to be
the culprit.

If anyone has successfully installed Plan 9 with the fossil+venti
option, please give an overview of the procedure that allowed for a
hassle-free installation.

Here are the steps I took:
1) started with the default fossil+venti option
2) modified fossil.conf (e.g., added -AWP)
3) reviewed plan9.ini to ensure that a venti line had been added
4) rebooted
5) logged in as glenda
6) modified /lib/ndb/local (i.e., added dns, dnsdomain, ip address,
name, netmask, cpu, fs, auth, etc.)
7) modified /rc/bin/cpurc (e.g., added devices, set IP address,
factotum, secstored, keyfs, /ndb/dns -r, etc.)
8) added a user: a new hostowner
9) modified /lib/ndb/auth
10) rebooted
11) logged in as hostowner
12) replica/pull -v /dist/replica/network
13) stopped to fix this annoying problem that occurs every 20 to 30 minutes ;)

If I am forgetting something or doing something out of order, I'd like
to know. I've tried variations on my steps and even minimized the steps,
but the result is always the same.

What really gets my goat is that this is the third machine on which I've
attempted to get fossil+venti working correctly. All machines have the
same problem. I'm really thinking that it isn't an issue with hardware
but rather a missed step. If it turns out to be a hardware issue in
all 3 cases, then I am an unlucky guy.

-vester


* Re: [9fans] Plan 9 server controls impaired and steps taken before failure.
From: andrey mirtchovski @ 2005-05-10  1:50 UTC
  To: Vester Thacker, Fans of the OS Plan 9 from Bell Labs

On 5/9/05, Vester Thacker <vester.thacker@gmail.com> wrote:
> I am running a fossil+venti on an AMD64 machine. After about 20 to 30
> minutes the machine ceases to receive text or chording input but the
> mouse cursor still moves and the stats display continues to run. Also,
> I am unable to cpu to it.
> 

fossil is deadlocked while dumping to venti. it still does its job but
no new files can be opened, consequently nothing works except the
programs that are already started.

give it enough time and it'll release the deadlock. i've left it
overnight for big dumps and it has taken up to 12 hours to have a
useable system before, for something like 10+ gigs.


* Re: [9fans] Plan 9 server controls impaired and steps taken before failure.
From: Bruce Ellis @ 2005-05-10 21:28 UTC
  To: andrey mirtchovski, Fans of the OS Plan 9 from Bell Labs

dma may be your friend.  it makes things 100 times quicker on fast machines.

brucee


* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: andrey mirtchovski @ 2005-05-10 23:14 UTC
  To: 9fans

> dma may be your friend.  it makes things 100 times quicker on fast machines.
> 
> brucee

my experiences are definitely with dma on :)



* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Vester Thacker @ 2005-05-11  0:33 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On 5/11/05, andrey mirtchovski <mirtchov@cpsc.ucalgary.ca> wrote:
> > dma may be your friend.  it makes things 100 times quicker on fast machines.
> >
> > brucee
> 
> my experiences are definitely with dma on :)

If I take that literally, then I can expect a working machine
within 200 days with dma turned off. ;)

Thanks for the suggestion, Brucee, I'll turn dma on. I'll add it to
the installation guide.
Andrey, thanks for your help too.

Kenji Arisawa sent me a pccpuf config that has worked well for him and
that I plan to use.
Should anyone following this thread anticipate using the
fossil+venti option during an install, this should help. Thanks go to
Kenji Arisawa for the following config:

The configuration (/sys/src/9/pc/pccpuf) is as follows:

dev
        root
        cons
        arch
        pnp             pci
        env
        pipe
        proc
        mnt
        srv
        dup
        rtc
        ssl
        tls
        bridge          log
        sdp             thwack unthwack
        cap
        kprof
        fs

        ether           netif
        ip              arp chandial ip ipv6 ipaux iproute netlog nullmedium pktmedium ptclbsum386 inferno

        draw            screen vga vgax
        mouse           mouse
        vga

        sd
        floppy          dma

        uart
        usb

link
        ether2000       ether8390
        ether2114x      pci
        ether79c970     pci
        ether8003       ether8390
        ether8139       pci
        ether82543gc    pci
        ether82557      pci
        ether83815      pci
        etherelnk3      pci
        etherga620      pci
        etherigbe       pci ethermii
        etherrhine      pci ethermii
        ethersink
        ethermedium
        netdevmedium
        loopbackmedium
        usbuhci

misc
        archmp          mp apic

        uarti8250
        uartpci         pci

        sdata           pci sdscsi
        sd53c8xx        pci sdscsi

        vga3dfx         +cur
        vgaark2000pv    +cur
        vgabt485        =cur
        vgaclgd542x     +cur
        vgaclgd546x     +cur
        vgact65545      +cur
        vgacyber938x    +cur
        vgaet4000       +cur
        vgahiqvideo     +cur
        vgai81x         +cur
        vgamach64xx     +cur
        vgamga2164w     +cur
        vgamga4xx       +cur
        vganeomagic     +cur
        vganvidia       +cur
        vgargb524       =cur
        vgas3           +cur vgasavage
        vgat2r4         +cur
        vgatvp3020      =cur
        vgatvp3026      =cur
        vgavmware       +cur

ip
        il
        tcp
        udp
        ipifc
        icmp
        icmp6
        gre
        ipmux
        esp
        rudp

port
        int cpuserver = 1;

boot cpu boot #S/sdC0/
        tcp
        il
        local

bootdir
        bootpccpuf.out boot
        /386/bin/ip/ipconfig
        /386/bin/auth/factotum
        /386/bin/disk/kfs
        /386/bin/fossil/fossil
        /386/bin/venti/venti

--

-vester


* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Russ Cox @ 2005-05-13 16:26 UTC
  To: Vester Thacker, Fans of the OS Plan 9 from Bell Labs

> Kenji Arisawa sent me a pccpuf config that has worked well for him and
> that I plan to use.
> Should anyone following this thread anticipate using the
> fossil+venti option during an install, this should help. Thanks go to
> Kenji Arisawa for the following config:

Unless you're using a laptop as your cpu server,
this config won't work any differently from the standard 9pccpuf.
It's the standard one with some laptop ethernet drivers added.

Russ


* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: arisawa @ 2005-05-13 21:29 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Hello,

> Kenji Arisawa sent me a pccpuf config that has worked well for him and
> that I plan to use.
>

That depends on the kernel code.

It seems fossil (at least in Jan. and Feb. of this year) locked the 
file service during "snap" and "snap -a".
Does this continue?

Kenji Arisawa



* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Vester Thacker @ 2005-05-14 17:11 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On 5/14/05, arisawa@ar.aichi-u.ac.jp <arisawa@ar.aichi-u.ac.jp> wrote:
> 
> It seems fossil (at least in Jan. and Feb. of this year) locked the
> file service during "snap" and "snap -a".
> Does this continue?

I am not sure, but I am on Day 4 of the initial snap. I'm not sure how
long I need to wait until I consider the current installation process
a failure. Perhaps 12 more days of waiting and I'll call it quits.

Btw, I have a 40 GB fossil and a 210 GB venti running on an AMD64
machine. There is approximately 300 MB of files on the fossil. I have
dma turned on. My hard disk is an ATA 133 drive with 16 MB of cache. I don't
understand *why* it takes so long for a snap to complete.

Sorry if I come across as frustrated about the wait, but I am
*frustrated* about this.
This isn't something you can recommend to your friends, or even
present to a crowd at an expo.

-vester
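
A rough sanity check on the figures quoted in this thread, as a sketch
only. It assumes decimal units, assumes the data to be archived is about
the 300 MB mentioned above, and reads the earlier "12 hours for 10+ gigs"
report as exactly 10 GB in 12 hours. The function name implied_rate is
hypothetical.

```python
MB = 10**6
GB = 10**9


def implied_rate(total_bytes, seconds):
    """Average bytes per second implied by archiving total_bytes in the given time."""
    return total_bytes / seconds


# Earlier report in this thread: roughly 10 GB dumped in about 12 hours.
slow_dump = implied_rate(10 * GB, 12 * 3600)

# This installation: about 300 MB of files, 4 days in and still not finished,
# so this is an upper bound on the rate actually achieved.
stuck_snap = implied_rate(300 * MB, 4 * 86400)

print(f"slow dump : {slow_dump:,.0f} bytes/s")
print(f"stuck snap: {stuck_snap:,.0f} bytes/s")
print(f"ratio     : {slow_dump / stuck_snap:,.0f}x")
```

Under those assumptions the current snap would be averaging under 1 KB/s,
more than two orders of magnitude below the slow-but-progressing dump
reported earlier, which suggests a snap that is stalled rather than merely
slow.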


* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Russ Cox @ 2005-05-14 18:03 UTC
  To: Vester Thacker, Fans of the OS Plan 9 from Bell Labs

I'd be frustrated too.  I've never seen a wait that long.
I made a bad design choice in the locking of fossil blocks
and I apologize.  My suggestion would be to run sync
and then halt at the console, reboot, and let it start
again.  

There is a window (I think ten seconds) between snap -a
and fossil deciding to start archiving.  If you access any
file in those ten seconds then enough of the root gets
copied-on-write that you shouldn't see the deadlock at all.
Rebooting should cause enough file activity at startup
to get around the deadlock.

Russ
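
The timing argument above can be written down as a toy check. This is a
sketch of the reasoning only, not of fossil's code: the ten-second delay
is the guess from the message ("I think ten seconds"), and the function
name and the list of access times are assumptions made for illustration.

```python
ARCHIVE_DELAY = 10.0  # seconds; the "I think ten seconds" guess, not a verified constant


def root_copied_in_time(snap_time, access_times, delay=ARCHIVE_DELAY):
    """Return True if some file access falls between snap -a and the start of archiving.

    In this toy model, such an access is what triggers the copy-on-write
    of the root, so the archiver later locks a block nobody else needs.
    """
    return any(snap_time <= t < snap_time + delay for t in access_times)


# Busy system (for example, one that has just rebooted): accesses land in the window.
print(root_copied_in_time(0.0, [0.5, 2.0, 30.0]))  # True

# Idle system: the first access arrives after archiving has already started.
print(root_copied_in_time(0.0, [45.0]))  # False
```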


* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: arisawa @ 2005-05-14 22:06 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Hello Russ,

> There is a window (I think ten seconds) between snap -a
> and fossil deciding to start archiving.  If you access any
> file in those ten seconds then enough of the root gets
> copied-on-write that you shouldn't see the deadlock at all.
>

What happens if some accesses come from the Internet during that time?
Files in /sys/log/* are big enough.
Sorry, I couldn't understand what "the root" refers to.

Kenji Arisawa



* Re: [9fans] Plan 9 server controls impaired and steps taken before
From: Russ Cox @ 2005-05-14 22:33 UTC
  To: Fans of the OS Plan 9 from Bell Labs

> > There is a window (I think ten seconds) between snap -a
> > and fossil deciding to start archiving.  If you access any
> > file in those ten seconds then enough of the root gets
> > copied-on-write that you shouldn't see the deadlock at all.
> >
> 
> What happens if some accesses come from the Internet during that time?
> Files in /sys/log/* are big enough.
> Sorry, I couldn't understand what "the root" refers to.

The root of the tree of files and blocks.  It's copy-on-write
after a snapshot but snap -a locks the blocks while it is
archiving.  If the block has already been copied, no big deal.
If it's still the one in the file tree (not been copied-on-write yet)
then you can't access it until the archiver finishes.  I should
fix this to be some sort of read lock but it's not completely
straightforward.

Russ
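
The interaction described above can be illustrated with a deliberately
tiny model. This is a sketch under stated assumptions, not fossil's
implementation: the classes Block and ToyFS and all of their methods are
hypothetical, the "tree" is reduced to a single root block, and a blocked
access is reported as False instead of actually waiting.

```python
class Block:
    """One block; 'locked' stands in for the lock the archiver holds."""

    def __init__(self, data):
        self.data = data
        self.locked = False


class ToyFS:
    """Single-root toy model of snapshot, copy-on-write and archiving."""

    def __init__(self):
        self.active_root = Block("root")
        self.snapshot_root = None

    def snap(self):
        # After a snapshot the active tree and the snapshot share the same block.
        self.snapshot_root = self.active_root

    def touch(self):
        # Copy-on-write: the first change after a snapshot gives the active
        # tree its own copy, leaving the snapshot's block untouched.
        if self.active_root is self.snapshot_root:
            self.active_root = Block(self.active_root.data)

    def start_archive(self):
        # The archiver locks the snapshot's block for the whole archive run.
        self.snapshot_root.locked = True

    def finish_archive(self):
        self.snapshot_root.locked = False

    def can_access(self):
        # The active tree is usable only if its root is not the locked block.
        return not self.active_root.locked


# Case 1: nothing touches the tree before archiving starts, so the active
# root is still the shared block and every access has to wait.
idle = ToyFS()
idle.snap()
idle.start_archive()
print(idle.can_access())   # False
idle.finish_archive()
print(idle.can_access())   # True

# Case 2: a file is touched before archiving starts, so the root has
# already been copied and the archiver's lock does not get in the way.
busy = ToyFS()
busy.snap()
busy.touch()
busy.start_archive()
print(busy.can_access())   # True
```

In this model the exclusive lock is the whole problem: a shared (read)
lock on the snapshot block would let case 1 proceed as well, which is the
direction hinted at above, although the model says nothing about how hard
that is to do in the real code.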

