Date: Tue, 3 Jan 2006 14:46:56 -0500
From: Russ Cox
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] clunk clunk

> When a process exits it closes it's fds, so it sends a
> Tclunk ...but if the process it sends it to is exiting it can't
> respond. It's in the same position. Call it "deadly embrace".
>
> Think about it.

This only happens if two file servers have mounted each other, which creates many other possibilities for deadlock too. Usually file servers are careful to dissociate from the name spaces in which they mount themselves, so that they don't access their own files and cause ref count problems. This has the added effect that future servers that get mounted into the name space don't end up mounted in the first server's name space. So these kinds of loops basically don't happen.

What processes do you have running that are in this deadly embrace? Standard programs or ones you wrote? If the former, which ones? Are you sure they're in Tclunk?

There is one exception in Plan 9: upas/fs and plumber have each other mounted, so that plumber can send around references to upas/fs's files. It sometimes happens that they end up sticking around just because of the circular ref count, if somehow the session ends without a hangup note being sent to the note group. Even in this case, though, the Tclunk thing doesn't happen, because plumber doesn't keep any of upas/fs's files open.
The Tclunk hang could possibly happen if the plumber managed to get killed in the middle of walking one of the upas paths during a stat, but that wouldn't happen hundreds of times on a single system.

Instead of making us read through the Inferno code, why not tell us what you did to fix it? A separate kproc to run all clunks? Close all the non-devmnt chans first?

Russ