9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] broken smtpd
@ 2008-11-06  4:31 Kenji Arisawa
  2008-11-06 10:30 ` Richard Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Kenji Arisawa @ 2008-11-06  4:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hello,

I sometimes observe broken smtpd.

ar% ps
...
none        1948578    0:01  44:55      264K Broken   smtpd
none        2521172    0:01  44:58      264K Broken   smtpd
none        3190037    0:01  45:03      264K Broken   smtpd
...

ar% ACID 3190037
/proc/3190037/text:386 plan 9  executable
/sys/lib/acid/port
/sys/lib/acid/386
acid: lstk()
errstr()+0x7 /sys/src/libc/9syscall/errstr.s:5
syslog(logname=0x3760a,cons=0x0,fmt=0x37610)+0x20 /sys/src/libc/9sys/
syslog.c:46
	err=0x616d2f27
	d=0x0
	ctim=0x0
	buf=0x0
	p=0x0
	arg=0x0
	n=0x0
catchalarm(msg=0xdfffe1ac)+0x5d /sys/src/cmd/upas/smtp/smtpd.c:68
	rv=0x1
notifier+0x30 /sys/src/libc/port/atnotify.c:15
acid:


ar% pr -tn /sys/src/libc/9sys/syslog.c
...
    34	void
    35	syslog(int cons, char *logname, char *fmt, ...)
    36	{
    37		char buf[1024];
    38		char *ctim, *p;
    39		va_list arg;
    40		int n;
    41		Dir *d;
    42		char err[ERRMAX];
    43
    44		err[0] = '\0';
    45		errstr(err, sizeof err);
    46		lock(&sl);
    47
...


Is this phenomenon specific to me?

Kenji Arisawa





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] broken smtpd
  2008-11-06  4:31 [9fans] broken smtpd Kenji Arisawa
@ 2008-11-06 10:30 ` Richard Miller
  2008-11-06 12:59   ` erik quanstrom
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Miller @ 2008-11-06 10:30 UTC (permalink / raw)
  To: 9fans

> Is this phenomenon specific to me?

No, it isn't:

term% ps|grep Broken
none            917    2:08  42:50      264K Broken   smtpd




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] broken smtpd
  2008-11-06 10:30 ` Richard Miller
@ 2008-11-06 12:59   ` erik quanstrom
  2008-11-17 12:04     ` Kenji Arisawa
  0 siblings, 1 reply; 10+ messages in thread
From: erik quanstrom @ 2008-11-06 12:59 UTC (permalink / raw)
  To: 9fans

>> Is this phenomenon specific to me?
>
> No, it isn't:
>
> term% ps|grep Broken
> none            917    2:08  42:50      264K Broken   smtpd

impressive!

i don't see this problem, but i have implemented
some fairly agressive spam control.  perhaps this
shows up after i tend to hang up.

(it is possible to use nupas smtpd with regular old
upas.)

would anyone mind sending me a full backtrace?

- erik




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] broken smtpd
  2008-11-06 12:59   ` erik quanstrom
@ 2008-11-17 12:04     ` Kenji Arisawa
  2008-11-17 12:51       ` erik quanstrom
  0 siblings, 1 reply; 10+ messages in thread
From: Kenji Arisawa @ 2008-11-17 12:04 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hello,

Today, I observed smtpd that was in endless loop and then broken.



ar% ps
...
none          20392    0:01  28:27      264K Running  smtpd
...



ar% ACID 20392
/proc/20392/text:386 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/386
acid: lstk()
receiver(path=0x41f58)+0x1c5 /sys/src/cmd/upas/smtp/smtpd.c:625
	sender=0x379bf
	rcpt=0x41ad8
zzparse()+0x2c9 /sys/src/cmd/upas/smtp/smtpd.y:773
	save1=0x0
	save2=0x0
	save3=0x0
	save4=0x0
	zzstate=0xf
	zzchar=0xffffffff
	zzs=0x3f36b
	zzp=0xdfffe41c
	zzn=0xfffffc18
main(argv=0xdfffefa4,argc=0x0)+0x261 /sys/src/cmd/upas/smtp/smtpd.c:194
	netdir=0xdfffefbc
	_argc=0x6e
	_args=0x376c2
	buf=0x0
_main+0x31 /sys/src/libc/386/main9.s:16
acid: src(receiver+0x1c5)
/sys/src/cmd/upas/smtp/smtpd.c:625
  620
  621		logged = 0;
  622
  623		/* forwarding() can modify 'path' on loopback request */
  624		if(filterstate == ACCEPT && fflag && !authenticated &&
forwarding(path)) {
 >625			syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)",
  626				s_to_c(senders.last->p), him, nci->rsys, s_to_c(path));
  627			rejectcount++;
  628			reply("550 5.7.1 we don't relay.  send to your-path@[] for "
  629				"loopback.\r\n");
  630			return;


I don't observe "Bad Forward" of this mail in /sys/log/smtpd.

After some amount of time,

ar% ps
...
none          20392    0:02  44:57      264K Broken   smtpd
...

acid: lstk()
errstr()+0x7 /sys/src/libc/9syscall/errstr.s:5
syslog(logname=0x3760a,cons=0x0,fmt=0x37610)+0x20 /sys/src/libc/9sys/
syslog.c:46
	err=0x616d2f27
	d=0x0
	ctim=0x0
	buf=0x0
	p=0x0
	arg=0x0
	n=0x0
catchalarm(msg=0xdfffe1ac)+0x5d /sys/src/cmd/upas/smtp/smtpd.c:68
	rv=0x1
notifier+0x30 /sys/src/libc/port/atnotify.c:15
acid: src(syslog+0x29)
/sys/src/libc/9sys/syslog.c:46
  41		Dir *d;
  42		char err[ERRMAX];
  43
  44		err[0] = '\0';
  45		errstr(err, sizeof err);
 >46		lock(&sl);
  47
  48		/*
  49		 *  paranoia makes us stat to make sure a fork+close
  50		 *  hasn't broken our fd's
  51		 */
acid:


I hope the above information is helpful to fix the problem.

Kenji Arisawa


On 2008/11/06, at 21:59, erik quanstrom wrote:

>>> Is this phenomenon specific to me?
>>
>> No, it isn't:
>>
>> term% ps|grep Broken
>> none            917    2:08  42:50      264K Broken   smtpd
>
> impressive!
>
> i don't see this problem, but i have implemented
> some fairly agressive spam control.  perhaps this
> shows up after i tend to hang up.
>
> (it is possible to use nupas smtpd with regular old
> upas.)
>
> would anyone mind sending me a full backtrace?
>
> - erik
>




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] broken smtpd
  2008-11-17 12:04     ` Kenji Arisawa
@ 2008-11-17 12:51       ` erik quanstrom
  2008-11-17 13:33         ` Kernel Panic
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: erik quanstrom @ 2008-11-17 12:51 UTC (permalink / raw)
  To: 9fans

d'oh!  the return value from catchalarm looks reversed.  from notify(2)

          [...] A handler must
          return a non-zero number if the note was recognized (and
          resolved); otherwise it must return zero.  When the system

i think you're getting into some sort of note loop.  i think it would
be helpful to log all notes.  i would try something like this

static int
catchalarm(void *, char *msg)
{
	static int chattycathy;

	if(chattycathy++ > 5)
		return 0;
	if(senders.first && rcvers.first)
		syslog(0, "smtpd", "note: %s->%s: %s",
			s_to_c(senders.first->p),
			s_to_c(rcvers.first->p), msg);
	else
		syslog(0, "smtpd", "note: %s", msg);
	if(pp){
		syskillpg(pp->pid);
		pp = 0;
	}
	return strstr(msg, "alarm") != 0;
}

but at a minimum, i would reverse the return values.

- erik




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] broken smtpd
  2008-11-17 12:51       ` erik quanstrom
@ 2008-11-17 13:33         ` Kernel Panic
  2008-11-17 13:45           ` erik quanstrom
  2008-11-17 14:02         ` Charles Forsyth
  2008-11-19  0:25         ` Kenji Arisawa
  2 siblings, 1 reply; 10+ messages in thread
From: Kernel Panic @ 2008-11-17 13:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

erik quanstrom wrote:
> d'oh!  the return value from catchalarm looks reversed.  from notify(2)
>
>           [...] A handler must
>           return a non-zero number if the note was recognized (and
>           resolved); otherwise it must return zero.  When the system
>
> i think you're getting into some sort of note loop.  i think it would
> be helpful to log all notes.  i would try something like this
>
its not a good idea to call syslog() in a note handler, because it
aquires locks
and calls into the memory allocator by strdup() (that too aquires locks).

if the note interrupts the program while in syslog() or the allocator,
it will deadlock.

--
cinap
> static int
> catchalarm(void *, char *msg)
> {
> 	static int chattycathy;
>
> 	if(chattycathy++ > 5)
> 		return 0;
> 	if(senders.first && rcvers.first)
> 		syslog(0, "smtpd", "note: %s->%s: %s",
> 			s_to_c(senders.first->p),
> 			s_to_c(rcvers.first->p), msg);
> 	else
> 		syslog(0, "smtpd", "note: %s", msg);
> 	if(pp){
> 		syskillpg(pp->pid);
> 		pp = 0;
> 	}
> 	return strstr(msg, "alarm") != 0;
> }
>
> but at a minimum, i would reverse the return values.
>
> - erik
>




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] broken smtpd
  2008-11-17 13:33         ` Kernel Panic
@ 2008-11-17 13:45           ` erik quanstrom
  0 siblings, 0 replies; 10+ messages in thread
From: erik quanstrom @ 2008-11-17 13:45 UTC (permalink / raw)
  To: 9fans

> its not a good idea to call syslog() in a note handler, because it
> aquires locks
> and calls into the memory allocator by strdup() (that too aquires locks).
>
> if the note interrupts the program while in syslog() or the allocator,
> it will deadlock.

excellent point.  in this case, i think it is more likely that the problem
is more mundane than that.  in any event, we are chewing gobs
of cpu time, so we don't have a literal deadlock.

- erik



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] broken smtpd
  2008-11-17 12:51       ` erik quanstrom
  2008-11-17 13:33         ` Kernel Panic
@ 2008-11-17 14:02         ` Charles Forsyth
  2008-11-19  0:25         ` Kenji Arisawa
  2 siblings, 0 replies; 10+ messages in thread
From: Charles Forsyth @ 2008-11-17 14:02 UTC (permalink / raw)
  To: 9fans

if you look at /dev/kmesg you might find many messages about "checked N page table entries".
i think smtpd is getting a trap (probably a null pointer) and because the value returned
from catchalarm is the wrong way round, it continues to retry using the bad pointer until
the alarm goes off (after a long time), when it breaks (because no note handler has
handled the alarm).

i didn't really understand why it was trying to continue after the alarm,
rather than logging and quitting.  does it not always mean that a caller
has connected but hasn't completed a request after a long time?
as cinap noted(!) there's a separate problem of what happens if the trap or alarm
happens when locks are held, but i think that's less likely to be a problem in practice:
an alarm will probably go off when nothing much has happened (ie, the program isn't
holding any other lock but waiting for input).  it's harder to be sure about
bad pointer traps, but if those happen with locks held,
i'd suspect those are more likely to be locks unrelated to those needed by syslog and print.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] broken smtpd
  2008-11-17 12:51       ` erik quanstrom
  2008-11-17 13:33         ` Kernel Panic
  2008-11-17 14:02         ` Charles Forsyth
@ 2008-11-19  0:25         ` Kenji Arisawa
  2008-11-19  0:33           ` erik quanstrom
  2 siblings, 1 reply; 10+ messages in thread
From: Kenji Arisawa @ 2008-11-19  0:25 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hello,

I added some debugging codes to catchalarm() in smtpd.c

static int
catchalarm(void *a, char *msg)
{
	int rv = 1;
	static int count = 0; // DEBUG by Kenar
	count++;
	if(count > 3)
		return 0;

	USED(a);

	/* log alarms but continue */
	if(strstr(msg, "alarm")){
		if(senders.first && rcvers.first)
			syslog(0, "smtpd", "note: %s->%s: %s",
				s_to_c(senders.first->p),
				s_to_c(rcvers.first->p), msg);
		else
			syslog(0, "smtpd", "note: %s", msg);
		rv = 0;
	}

	syslog(0, "smtpd", "note: %s", msg); // DEBUG by Kenar

	/* kill the children if there are any */
	if(pp)
		syskillpg(pp->pid);

	return rv;
}

after that I found broken smtpd:
none          50392    0:00   0:00      264K Broken   smtpd

note that wasted cpu time is now 0:00

/sys/log/smtpd shows:
ar Nov 19 02:23:05 ehlo from 124.8.67.36 as sotcndhz.com
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463

Kenji Arisawa

On 2008/11/17, at 21:51, erik quanstrom wrote:

> d'oh!  the return value from catchalarm looks reversed.  from
> notify(2)
>
>          [...] A handler must
>          return a non-zero number if the note was recognized (and
>          resolved); otherwise it must return zero.  When the system
>
> i think you're getting into some sort of note loop.  i think it would
> be helpful to log all notes.  i would try something like this
>
> static int
> catchalarm(void *, char *msg)
> {
> 	static int chattycathy;
>
> 	if(chattycathy++ > 5)
> 		return 0;
> 	if(senders.first && rcvers.first)
> 		syslog(0, "smtpd", "note: %s->%s: %s",
> 			s_to_c(senders.first->p),
> 			s_to_c(rcvers.first->p), msg);
> 	else
> 		syslog(0, "smtpd", "note: %s", msg);
> 	if(pp){
> 		syskillpg(pp->pid);
> 		pp = 0;
> 	}
> 	return strstr(msg, "alarm") != 0;
> }
>
> but at a minimum, i would reverse the return values.
>
> - erik
>




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] broken smtpd
  2008-11-19  0:25         ` Kenji Arisawa
@ 2008-11-19  0:33           ` erik quanstrom
  0 siblings, 0 replies; 10+ messages in thread
From: erik quanstrom @ 2008-11-19  0:33 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 1047 bytes --]

acid tells us that we're indirecting a nil senders.last->p.
we know it's senders.last->p because it's at offset 4.
(that was the fault address.)  the Lock at the beginning
of String is bigger than 4 bytes by itself, so it's not the
String and rsys is not the 2d pointer in NetConnInfo,
so that eliminates rsys.

acid: src(0x7463)
/sys/src/cmd/upas/smtp/smtpd.c:628
 623		logged = 0;
 624
 625		/* forwarding() can modify 'path' on loopback request */
 626		if(filterstate == ACCEPT && fflag && !authenticated && forwarding(path)) {
 627			syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)",
>628				s_to_c(senders.last->p), him, nci->rsys, s_to_c(path));
 629			rejectcount++;
 630			reply("550 5.7.1 we don't relay.  send to your-path@[] for "
 631				"loopback.\r\n");
 632			return;
 633		}

replace lines 627,628 with

		syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)",
			sender, him, nci->rsys, rcpt);

i made this change some time ago and it appears to be working fine.
i may have seen a similar fault.

- erik

[-- Attachment #2: Type: message/rfc822, Size: 3885 bytes --]

From: Kenji Arisawa <arisawa@ar.aichi-u.ac.jp>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] broken smtpd
Date: Wed, 19 Nov 2008 09:25:55 +0900
Message-ID: <5280BC21-18A9-4287-A1AE-23EEAFAE42CF@ar.aichi-u.ac.jp>

Hello,

I added some debugging codes to catchalarm() in smtpd.c

static int
catchalarm(void *a, char *msg)
{
	int rv = 1;
	static int count = 0; // DEBUG by Kenar
	count++;
	if(count > 3)
		return 0;

	USED(a);

	/* log alarms but continue */
	if(strstr(msg, "alarm")){
		if(senders.first && rcvers.first)
			syslog(0, "smtpd", "note: %s->%s: %s",
				s_to_c(senders.first->p),
				s_to_c(rcvers.first->p), msg);
		else
			syslog(0, "smtpd", "note: %s", msg);
		rv = 0;
	}

	syslog(0, "smtpd", "note: %s", msg); // DEBUG by Kenar

	/* kill the children if there are any */
	if(pp)
		syskillpg(pp->pid);

	return rv;
}

after that I found broken smtpd:
none          50392    0:00   0:00      264K Broken   smtpd

note that wasted cpu time is now 0:00

/sys/log/smtpd shows:
ar Nov 19 02:23:05 ehlo from 124.8.67.36 as sotcndhz.com
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463

Kenji Arisawa

On 2008/11/17, at 21:51, erik quanstrom wrote:

> d'oh!  the return value from catchalarm looks reversed.  from
> notify(2)
>
>          [...] A handler must
>          return a non-zero number if the note was recognized (and
>          resolved); otherwise it must return zero.  When the system
>
> i think you're getting into some sort of note loop.  i think it would
> be helpful to log all notes.  i would try something like this
>
> static int
> catchalarm(void *, char *msg)
> {
> 	static int chattycathy;
>
> 	if(chattycathy++ > 5)
> 		return 0;
> 	if(senders.first && rcvers.first)
> 		syslog(0, "smtpd", "note: %s->%s: %s",
> 			s_to_c(senders.first->p),
> 			s_to_c(rcvers.first->p), msg);
> 	else
> 		syslog(0, "smtpd", "note: %s", msg);
> 	if(pp){
> 		syskillpg(pp->pid);
> 		pp = 0;
> 	}
> 	return strstr(msg, "alarm") != 0;
> }
>
> but at a minimum, i would reverse the return values.
>
> - erik
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-11-19  0:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-06  4:31 [9fans] broken smtpd Kenji Arisawa
2008-11-06 10:30 ` Richard Miller
2008-11-06 12:59   ` erik quanstrom
2008-11-17 12:04     ` Kenji Arisawa
2008-11-17 12:51       ` erik quanstrom
2008-11-17 13:33         ` Kernel Panic
2008-11-17 13:45           ` erik quanstrom
2008-11-17 14:02         ` Charles Forsyth
2008-11-19  0:25         ` Kenji Arisawa
2008-11-19  0:33           ` erik quanstrom

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).