From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Sun, 21 Feb 2016 21:18:26 -0800
To: 9fans@9fans.net
Message-ID:
In-Reply-To:
References: <8FB7CBFD-7334-4F9F-8C71-571DEF9FAD31@ar.aichi-u.ac.jp> <21e3f8eee50170d6fcd21c43384eba04@felloff.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] file descriptor leak
Topicbox-Message-UUID: 87074216-ead9-11e9-9d60-3106f5b1d025

On Tue Feb 16 13:19:29 PST 2016, charles.forsyth@gmail.com wrote:
> On 16 February 2016 at 18:01, wrote:
>
> > and the parent proc doesnt need the fd to /dev/null, it could as well just
> > open it in the child like:
> >
> > close(0); open("/dev/null", OREAD);
> >
>
>
> There's no harm in making and using a more general function, even in a
> specific way, so that part's ok.
> The caller just needs to play its part properly.
>
> > after spending 5 minutes writing the code fixing all these issues mentiond
> > above, i'll just throw it all away and delete the whole remounting logic
> > for /net.alt in 9front.
> >
>
> It's often better to use the Erlang fail-fast ("just fail") and restart
> approach for persistent services.
>
> More important would be to look at /proc/N/fd on a failing system.
> I've a feeling that the system/outside stuff isn't actually the problem,
> since I've seen the diagnostic on a system that wasn't using /net.alt.
> In that case, the problem (as I remember it) was that an Internet link
> further on was down,
> so no messages got through to remote DNS, and file descriptors were
> building up in slave processes
> waiting for replies on /net/udp. Once the link was up, it went back to
> normal.

we saw this a lot at coraid, but never did catch a smoking-gun process.
i don't recall a perfect correlation to internet down.

- erik