From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (qmail 19945 invoked from network); 27 Apr 1999 21:15:56 -0000
Received: from sunsite.auc.dk (130.225.51.30) by ns1.primenet.com.au with SMTP; 27 Apr 1999 21:15:56 -0000
Received: (qmail 19176 invoked by alias); 27 Apr 1999 21:15:14 -0000
Mailing-List: contact zsh-workers-help@sunsite.auc.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 6126
Received: (qmail 19169 invoked from network); 27 Apr 1999 21:15:13 -0000
Date: Wed, 28 Apr 1999 00:14:46 +0300
From: Ville Herva
To: Peter Stephenson , Zsh hackers list
Cc: sak@iki.fi
Subject: [Solved] Re: Terminal problem with linux-2.0.34
Message-ID: <19990428001446.A4114@babbage.tky.hut.fi>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.94.15i
In-Reply-To: <9902171603.AA23593@ibmth.df.unipi.it>; from Peter Stephenson on Wed, Feb 17, 1999 at 05:03:02PM +0100

On Wed, Feb 17, 1999 at 05:03:02PM +0100, you [Peter Stephenson] claimed:

> Ville Herva wrote:
> > Today, I came across a very interesting problem with the same linux
> > machine: the clock() function would always return -1!
>
> clock() isn't used directly in 3.1.5, but times(), which is closely
> related, is used. You can see if the bug shows up there just by typing
> `times' (plus two returns :-(), which would typically report a few seconds'
> usage. However, it doesn't look like it should be crucial, though if
> there's a kernel bug around, all bets are off.
>
> If system calls are tickling a deeper problem, then apart from times() the
> other chief suspect might be getrlimit(), because of its association with
> times for processes, though times() is more likely. A brief trial on
> 2.0.32 suggests neither /bin/bash (1.14.7(1)) nor /bin/tcsh (6.07.02) use
> times() in their initialisation, and only call getrlimit() for
> RLIMIT_NOFILE (bash) or only when told to (tcsh) --- zsh calls times()
> after every command and reads all the limits when starting.
> If you feel interested enough to comment out all the calls to times(),
> that should be harmless enough in terms of side effects.

After finding that interesting feature of clock() (which was fixed by the
glibc maintainers), I didn't pay attention to this for quite a while. Then
somebody with similar problems pointed out to me that a select() call had a
rather weird timeout value in the strace output:

> select(11, [10], NULL, NULL, {20976515, 300}) = 1 (in [10], left
> {20976512, 870000})

This turned out to be the select() call at line 517 of Zle/zle_main.c
(vanilla zsh-3.1.5):

    if (!kungetct && select(SHTTY+1, (SELECT_ARG_2_T) &foofd,
                            NULL, NULL, &tv) <= 0)

If I add the following line before that call, zsh works fine:

    tv.tv_sec = 0;

As far as I can see, this is not a bug in zsh. As we know, on Linux select()
modifies the timeout struct to reflect the time it did not spend waiting for
I/O: if the limit is, say, 5 seconds and select() waits 2 of them, it sets tv
to 3 seconds. zle_main.c does reinitialize tv.tv_usec before each call, but it
sets tv.tv_sec to zero only once, which is sane, because select() should only
ever decrement the value. After 248 days of uptime, however, select() seems to
start writing weird values (like 13126839) into tv.tv_sec. It is not obvious
to me at first glance where in the kernel the bug is, but I'll try to find it.

The Linux select man page does say "Consider timeout to be undefined after
select returns," so perhaps this is well-defined behaviour ;). Anyway, the fix
above should not break anything, so you may want to merge it.

-- v --

v@iki.fi
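For anyone hitting the same thing elsewhere, here is a minimal sketch of the
defensive pattern in plain C. It is not the zsh code: poll_readable() and
poll_readable_demo() are hypothetical names, and it assumes a POSIX system
with <sys/select.h>. The idea is simply that the whole struct timeval,
tv_sec as well as tv_usec, is set right before every select() call instead
of once up front, so whatever the kernel wrote back last time is never
reused.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from zsh): wait up to timeout_usec
 * microseconds for fd to become readable.  Returns the select()
 * result: 1 if fd is readable, 0 on timeout, -1 on error.
 */
int poll_readable(int fd, long timeout_usec)
{
    fd_set readfds;
    struct timeval tv;

    /* select() overwrites the fd set with the ready descriptors,
     * so rebuild it on every call. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Rebuild the *whole* timeout too, tv_sec included.  Linux
     * select() writes the unslept time back into tv, and the man
     * page says to treat tv as undefined afterwards, so a value
     * left over from an earlier call must not be trusted. */
    tv.tv_sec = timeout_usec / 1000000L;
    tv.tv_usec = timeout_usec % 1000000L;

    return select(fd + 1, &readfds, NULL, NULL, &tv);
}

/* Hypothetical usage sketch: a pipe holding one unread byte stays
 * readable across repeated polls, each with a fresh timeout. */
void poll_readable_demo(void)
{
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);

    rc = poll_readable(p[0], 0);          /* nothing written yet */
    assert(rc == 0);

    rc = (int) write(p[1], "x", 1);
    assert(rc == 1);

    for (int i = 0; i < 3; i++) {
        rc = poll_readable(p[0], 100000L);  /* 0.1 s budget each time */
        assert(rc == 1);
    }

    close(p[0]);
    close(p[1]);
}
```

Recomputing the timeout on each iteration costs next to nothing and does not
depend on what select() leaves behind in tv, whether that is the documented
"undefined" value or the odd numbers seen here after a long uptime.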