From mboxrd@z Thu Jan 1 00:00:00 1970
From: pino@dohd.org (Martijn van Buul)
Date: Mon, 5 Feb 2001 22:24:46 +0100
Subject: [pups] Strange problems on an uPDP 11/53+
Message-ID: <20010205222446.A3608@mud.stack.nl>

After 52 days, my uPDP 11/53+ has suddenly been acting rather strange.
/usr/include got 'replaced' by /usr/new, to be precise. At the time,
I was the only user. Seeing this, I immediately halted the system,
expecting a load of file system errors upon boot. None showed up, and
/usr/include is back to itself again. However, programs which *used*
to be running perfectly (like my work-in-progress ps) suddenly fail,
with a "not enough memory for saving info".

Any hints?

--
Martijn van Buul - Pino at dohd.org - http://www.stack.nl/~martijnb/
Geek code: G-- - Visit OuterSpace: mud.stack.nl 3333
Kees J. Bot: The sum of CPU power and user brain power is a constant.

Received: (from major at localhost)
    by minnie.cs.adfa.edu.au (8.9.3/8.9.3) id LAA70139
    for pups-liszt; Tue, 6 Feb 2001 11:46:34 +1100 (EST)
    (envelope-from owner-pups at minnie.cs.adfa.edu.au)
Received: from moe.2bsd.com (MOE.2BSD.COM [206.139.202.200])
    by minnie.cs.adfa.edu.au (8.9.3/8.9.3) with ESMTP id LAA70135
    for ; Tue, 6 Feb 2001 11:46:30 +1100 (EST)
    (envelope-from sms at moe.2bsd.com)
Received: (from sms at localhost)
    by moe.2bsd.com (8.10.1/8.10.1) id f160ZHg18114
    for pups at minnie.cs.adfa.edu.au; Mon, 5 Feb 2001 16:35:17 -0800 (PST)
Date: Mon, 5 Feb 2001 16:35:17 -0800 (PST)
From: "Steven M. Schultz"
Message-Id: <200102060035.f160ZHg18114 at moe.2bsd.com>
To: pups at minnie.cs.adfa.edu.au
Subject: Re: [pups] Strange problems on an uPDP 11/53+
Sender: owner-pups at minnie.cs.adfa.edu.au
Precedence: bulk

Hi -

> From: Martijn van Buul
> After 52 days, my uPDP 11/53+ has suddenly been acting rather strange.
> /usr/include got 'replaced' by /usr/new, to be precise. At the time,

    Oops!

> I was the only user. Seeing this, I immediately halted the system,
> expecting a load of file system errors upon boot. None showed up, and
> /usr/include is back to itself again. However, programs which *used*
> to be running perfectly (like my work-in-progress ps) suddenly fail,
> with a "not enough memory for saving info".

> Any hints?

    How much memory is on the system now, after the reboot?

    The only thing that pops into mind is that the system is running
    without enough memory. If part of the memory on the system dropped
    out earlier, that would (possibly) explain the strange behaviour that
    was seen. Rebooting/resetting the system would cause the system to
    recount memory.

    A program can get 'ENOMEM' as an error two ways: 1) exceeding the
    maximum 64KB dataspace (stack + data) or 2) the system has run out
    of swap or the maps ('coremap' and/or 'swapmap') have become too
    fragmented.

    Two commands that can be useful in obtaining more information are

        sysctl hw
    and
        pstat -s

    "sysctl hw" will give several lines of output - the two you'd be
    interested in are

        hw.physmem = 2097152
        hw.usermem = 415744

    'physmem' is the amount of memory physically present and 'usermem' is
    the amount currently free and available for user programs.

    "pstat -s" will give a swap space usage summary.

    Steven Schultz
    sms at Moe.2bsd.com

Received: (from major at localhost)
    by minnie.cs.adfa.edu.au (8.9.3/8.9.3) id SAA71926
    for pups-liszt; Tue, 6 Feb 2001 18:41:36 +1100 (EST)
    (envelope-from owner-pups at minnie.cs.adfa.edu.au)
Received: from mud.stack.nl (mud.stack.nl [131.155.141.98])
    by minnie.cs.adfa.edu.au (8.9.3/8.9.3) with ESMTP id SAA71922
    for ; Tue, 6 Feb 2001 18:41:32 +1100 (EST)
    (envelope-from martijnb at stack.nl)
Received: by mud.stack.nl (Postfix, from userid 587)
    id D00657F08; Tue, 6 Feb 2001 08:39:28 +0100 (CET)
Date: Tue, 6 Feb 2001 08:39:28 +0100
From: Martijn van Buul
To: "Steven M. Schultz"
Cc: pups at minnie.cs.adfa.edu.au
Subject: Re: [pups] Strange problems on an uPDP 11/53+
Message-ID: <20010206083928.A15141 at mud.stack.nl>
Reply-To: Martijn van Buul
References: <200102060035.f160ZHg18114 at moe.2bsd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.3i
In-Reply-To: <200102060035.f160ZHg18114 at moe.2bsd.com>; from sms at moe.2bsd.com on Mon, Feb 05, 2001 at 04:35:17PM -0800
Sender: owner-pups at minnie.cs.adfa.edu.au
Precedence: bulk

Steven M. Schultz wrote:
> Hi -
>
> > From: Martijn van Buul
> > After 52 days, my uPDP 11/53+ has suddenly been acting rather strange.
> > /usr/include got 'replaced' by /usr/new, to be precise. At the time,
>
>     Oops!

Well, strange things are afoot indeed. About the same time, 1 machine
crashed (a DEC Alpha running OpenBSD), 2 started acting very strangely
and had to be rebooted (my PDP, and a Wintel box running Windows 2000),
and a 4th machine (a Wintel box running Minix-VMD) suddenly had some
problems reading its hard disk and using its network (but recovered).

The strange thing is that these machines aren't related in any way but one:
they're standing quite near to each other. Do I hear EMC somewhere?

> > Any hints?
>
>     How much memory is on the system now, after the reboot?

1.5 MB. 768 Kilowords.

>     The only thing that pops into mind is that the system is running
>     without enough memory. If part of the memory on the system dropped
>     out earlier, that would (possibly) explain the strange behaviour that
>     was seen. Rebooting/resetting the system would cause the system to
>     recount memory.

Well, the machine had 1.5 MB before it crashed.. It's doubtlessly some
memory fault, but it *seems* to be a temporary one.

>
>     "sysctl hw" will give several lines of output - the two you'd be
>     interested in are
>
>         hw.physmem = 2097152
hw.physmem = 1572864
>         hw.usermem = 415744
hw.usermem = 313472

>     'physmem' is the amount of memory physically present and 'usermem' is
>     the amount currently free and available for user programs.

Should be enough. 'cc' works without problems - only my ps with debug
info seems to be affected; it might not be a memory issue, but a "ps can't
determine the right amount of processes"-issue..

I've checked it, and this seems to be the case. Ps thinks that there are
0 processes running, and does a

    outargs = (struct psout *)calloc(nproc, sizeof(struct psout));

on that. With 'nproc' being 0, this returns a NULL pointer, but that doesn't
mean that the process is out of memory.

Having no ps is very annoying; finding those 4 children spawned
by a httpd can be a nuisance then. pstat -p works, but it isn't comfortable:)

>     "pstat -s" will give a swap space usage summary.

15/59 swapmap entries
910 kbytes swap used, 6263 kbytes free

--
Martijn van Buul - Pino at dohd.org - http://www.stack.nl/~martijnb/
Geek code: G-- - Visit OuterSpace: mud.stack.nl 3333
Kees J. Bot: The sum of CPU power and user brain power is a constant.

Received: (from major at localhost)
    by minnie.cs.adfa.edu.au (8.9.3/8.9.3) id DAA75027
    for pups-liszt; Wed, 7 Feb 2001 03:47:05 +1100 (EST)
    (envelope-from owner-pups at minnie.cs.adfa.edu.au)
Received: from moe.2bsd.com (MOE.2BSD.COM [206.139.202.200])
    by minnie.cs.adfa.edu.au (8.9.3/8.9.3) with ESMTP id DAA75023
    for ; Wed, 7 Feb 2001 03:46:56 +1100 (EST)
    (envelope-from sms at moe.2bsd.com)
Received: (from sms at localhost)
    by moe.2bsd.com (8.10.1/8.10.1) id f16Ga3301595;
    Tue, 6 Feb 2001 08:36:03 -0800 (PST)
Date: Tue, 6 Feb 2001 08:36:03 -0800 (PST)
From: "Steven M. Schultz"
Message-Id: <200102061636.f16Ga3301595 at moe.2bsd.com>
To: pino at dohd.org, sms at moe.2bsd.com
Subject: Re: [pups] Strange problems on an uPDP 11/53+
Cc: pups at minnie.cs.adfa.edu.au
Sender: owner-pups at minnie.cs.adfa.edu.au
Precedence: bulk

Hi --

> Well, strange things are afoot indeed. About the same time, 1 machine...
> they're standing quite near to each other. Do I hear EMC somewhere?

    Time to increase the shielding around the computer room, eh? ;-)

> Well, the machine had 1.5 MB before it crashed.. It's doubtlessly some
> memory fault, but it *seems* to be a temporary one.

    I do not think it is a memory/hardware problem - that was just
    a guess (not a very good one at that ;)).

> hw.usermem = 313472

    That's fine.

> Should be enough. 'cc' works without problems - only my ps with debug

    What about the standard 'ps' that came with the system?

> info seems to be affected; it might not be a memory issue, but a "ps can't
> determine the right amount of processes"-issue..
> I've checked it, and this seems to be the case. Ps thinks that there are
> 0 processes running, and does a
>     outargs = (struct psout *)calloc(nproc, sizeof(struct psout));

    Ah, ok - malloc() used to actually return a non-NULL pointer when
    presented with a size request of 0. That was an error and was
    changed (I forget the exact update/patch number). There were
    a couple of programs in the system that relied on the old behaviour
    and those had to be fixed.

> on that. With 'nproc' being 0, this returns a NULL pointer, but that doesn't
> mean that the process is out of memory.

    Right, the ENOMEM error was overloaded by malloc(). An argument
    can be made that EINVAL should have been returned instead by
    malloc() if 0 was passed in.

> Having no ps is very annoying; finding those 4 children spawned
> by a httpd can be a nuisance then. pstat -p works, but it isn't comfortable:)

    Are you using the traditional 'nlist()' method of reading
    the kernel symbol table to look for 'nproc' and '_proc'?
    If so, is there a permissions problem? /dev/*mem needs to be
    group=kmem, mode 640, the /unix image should be mode 644 and the
    'ps' program setgid to kmem.

    If there is a problem reading the kernel symbol table then 'nproc'
    will remain 0, which is what you're seeing.

    Another way of examining some kernel variables (proc table, file
    table, etc) is with the "sysctl" call. It's much faster since it
    doesn't have to do a sequential scan of the /unix symbol table.
    You can look in /usr/src/ucb/w.c at the function 'readpr()' to see
    how to examine the proc table using sysctl.

    Steve
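
[Editorial illustration, not part of the original thread.] The calloc() exchange above can be sketched in C. This is only a minimal sketch under stated assumptions: the helper name alloc_outargs() and the fields of struct psout are hypothetical and are not taken from the real ps source. It follows the suggestion in the reply that a zero count is better reported as EINVAL, so a caller can tell "the process table was never read" apart from a genuine ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real struct psout used by ps. */
struct psout {
    int  pid;
    char args[64];
};

/*
 * Allocate a zeroed table of nproc entries and store it in *out.
 * Returns 0 on success, or -1 with errno set:
 *   EINVAL  nproc was 0 (for example, the kernel symbol lookup failed
 *           and the count was never filled in)
 *   ENOMEM  the allocation itself failed
 * Checking the count first avoids relying on what calloc() does for a
 * zero-sized request, which differs between C libraries.
 */
static int alloc_outargs(size_t nproc, struct psout **out)
{
    assert(out != NULL);
    *out = NULL;

    if (nproc == 0) {
        errno = EINVAL;
        return -1;
    }

    *out = calloc(nproc, sizeof(struct psout));
    if (*out == NULL) {
        errno = ENOMEM;
        return -1;
    }
    return 0;
}
```

A caller would test the return value and then inspect errno: on EINVAL it can print something like "could not determine the number of processes" instead of the misleading out-of-memory message, and then go on to check the /dev/*mem and /unix permissions mentioned above.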