9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] bind /net
@ 2012-08-04  4:00 cinap_lenrek
  2012-08-04 13:33 ` erik quanstrom
  0 siblings, 1 reply; 3+ messages in thread
From: cinap_lenrek @ 2012-08-04  4:00 UTC
  To: 9fans

I recently helped debug a strange Plan 9 server problem. The
machine is a combined cpu/auth/file server that does basically
everything: serving http with rc-httpd, accepting mail, serving
dns, and running a bunch of cron jobs doing various things. The
machine is quite busy.

It worked quite well for some time. Then it would stop
accepting cpu logins; the client's cpu process would just hang
there. Http would continue to serve fine for a while until
that stopped working too, and finally the machine would lock up
and reboot.

This happened about every 2 days.

After some time, we were able to get a picture of what seemed
to be going on.

There would be many processes blocked opening /mnt/factotum/rpc.
Trying to ls /mnt would hang the ls... The machine would slowly
accumulate locked-up processes until it reached the 2k process
limit...

The problem was that factotum seemed busy in some auth protocol.
(This really sucks. factotum is mounted directly on /mnt instead
of /mnt/factotum and is single-threaded, so while it's doing some
auth business, no one can walk /mnt... This can even cause a
deadlock with authsrv, which tries to access /mnt/keys on the
same machine... but that's a different thing...)

But there were no tcp567 or authsrv processes around (the machine
is itself an auth server).

Netstat showed 2 established port 567 (ticket) connections: one
outgoing (to itself) and one incoming (from itself).

So where was that authsrv process?

We grepped for these 2 tcp connections in /proc/*/fd, and it turned
out that the incoming one was showing up in the file descriptor
table of *exportfs* processes, which were used to import /net from
that machine, instead of in any authsrv.
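
The search amounts to scanning every process's fd table for one
connection path. A minimal sketch of that idea in Python follows; it
is not what we ran (that was plain grep on the server), and the
sample table text, the pids, and the path /net/tcp/12/data are all
made up for illustration and do not reproduce the real /proc/n/fd
format.

```python
def find_holders(fd_tables, conn_path):
    """Return the sorted pids whose fd table text mentions conn_path.

    fd_tables maps a pid to the text of that process's fd table.
    Matching is done on whole whitespace-separated fields, so
    /net/tcp/12/data does not match /net/tcp/123/data.
    """
    holders = []
    for pid, text in fd_tables.items():
        if any(conn_path in line.split() for line in text.splitlines()):
            holders.append(pid)
    return sorted(holders)


# Hypothetical sample data: simplified lines, invented pids and paths.
fd_tables = {
    101: "0 r /dev/cons\n1 w /dev/cons",
    202: "0 rw /net/tcp/12/data\n1 rw /net/tcp/12/data",
    303: "0 r /net/tcp/123/data",
}

holders = find_holders(fd_tables, "/net/tcp/12/data")
print(holders)
```

The point of matching whole fields rather than substrings is to avoid
blaming a process that merely holds a connection with a longer number.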

How was this possible?

A terminal that was importing /net from this machine used to run
aux/listen1 -t to start some local service before importing /net
in the same namespace. Why is this a problem? Well, the -t option
causes listen1 not to fork its namespace, so it will notice when
we later overmount /net. On startup, it succeeds in announcing
the port on the original /net and starts listening. Then the parent
process changes the /net under its feet. When a new connection
comes in and listen1 tries to accept it and open its data file,
it grabs some random connection on the *server's* /net instead
of the one it was originally listening on!
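
The mechanism can be shown with a toy model in Python. This is only a
sketch and not Plan 9 code: the Namespace class, the conversation
number 4, and the strings standing in for connections are all
invented. The one thing it tries to capture is that a path is
resolved against whatever is mounted at the moment of the open, so a
listener sharing the namespace sees the new /net while one with its
own forked copy does not.

```python
class Namespace:
    """Toy namespace: mount point -> tree (relative path -> contents)."""

    def __init__(self):
        self.mounts = {}

    def bind(self, mountpoint, tree):
        # Replace whatever was mounted here, like overmounting /net.
        self.mounts[mountpoint] = tree

    def fork(self):
        # Private copy: later binds in the original do not affect it.
        copy = Namespace()
        copy.mounts = dict(self.mounts)
        return copy

    def open(self, path):
        # Resolve the path against the mounts as they are *now*.
        for mountpoint, tree in self.mounts.items():
            prefix = mountpoint + "/"
            if path.startswith(prefix):
                return tree[path[len(prefix):]]
        raise FileNotFoundError(path)


# Hypothetical contents of the two /net trees.
local_net = {"tcp/4/data": "local: call for the local service"}
remote_net = {"tcp/4/data": "server: ticket connection on port 567"}

shared = Namespace()             # namespace a listen1 -t keeps sharing
shared.bind("/net", local_net)
private = shared.fork()          # what a forked namespace would look like

conn_dir = "/net/tcp/4"          # path learned while /net was still local
shared.bind("/net", remote_net)  # parent imports the server's /net

got_shared = shared.open(conn_dir + "/data")
got_private = private.open(conn_dir + "/data")
print(got_shared)
print(got_private)
```

In the model, the shared namespace hands back the server's connection
for the same path, while the forked copy still reaches the local one.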

We grepped for the mysterious ticket connection path on the terminal
and found it as the stdin of the completely unrelated "local" service
on that terminal. And its /proc/xxx/ns file confirmed it was using
the remote /net. Killing that process immediately made the server
unblock itself and continue normal operation.

So don't do this at home, kids. Use /net.alt or face the consequences.

--
cinap




* Re: [9fans] bind /net
  2012-08-04  4:00 [9fans] bind /net cinap_lenrek
@ 2012-08-04 13:33 ` erik quanstrom
  2012-08-05  0:55   ` cinap_lenrek
  0 siblings, 1 reply; 3+ messages in thread
From: erik quanstrom @ 2012-08-04 13:33 UTC
  To: 9fans

> Problem was that factotum seemed busy in some auth protocol.
> (this really sucks. factotum is mounted directly on /mnt instead
> of /mnt/factotum and is single threaded so when it's doing some
> auth business, no one can walk /mnt... this can even cause
> deadlock with authsrv which tries to access /mnt/keys on the
> same machine... but that's a different thing...)

the rsc factotum from 9atom is multithreaded.

- erik




* Re: [9fans] bind /net
  2012-08-04 13:33 ` erik quanstrom
@ 2012-08-05  0:55   ` cinap_lenrek
  0 siblings, 0 replies; 3+ messages in thread
From: cinap_lenrek @ 2012-08-05  0:55 UTC
  To: 9fans

very good. i'll look into this. thanks :)

--
cinap


