* [9fans] broken smtpd
@ 2008-11-06 4:31 Kenji Arisawa
2008-11-06 10:30 ` Richard Miller
0 siblings, 1 reply; 10+ messages in thread
From: Kenji Arisawa @ 2008-11-06 4:31 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Hello,
I sometimes observe broken smtpd.
ar% ps
...
none 1948578 0:01 44:55 264K Broken smtpd
none 2521172 0:01 44:58 264K Broken smtpd
none 3190037 0:01 45:03 264K Broken smtpd
...
ar% ACID 3190037
/proc/3190037/text:386 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/386
acid: lstk()
errstr()+0x7 /sys/src/libc/9syscall/errstr.s:5
syslog(logname=0x3760a,cons=0x0,fmt=0x37610)+0x20 /sys/src/libc/9sys/
syslog.c:46
err=0x616d2f27
d=0x0
ctim=0x0
buf=0x0
p=0x0
arg=0x0
n=0x0
catchalarm(msg=0xdfffe1ac)+0x5d /sys/src/cmd/upas/smtp/smtpd.c:68
rv=0x1
notifier+0x30 /sys/src/libc/port/atnotify.c:15
acid:
ar% pr -tn /sys/src/libc/9sys/syslog.c
...
34 void
35 syslog(int cons, char *logname, char *fmt, ...)
36 {
37 char buf[1024];
38 char *ctim, *p;
39 va_list arg;
40 int n;
41 Dir *d;
42 char err[ERRMAX];
43
44 err[0] = '\0';
45 errstr(err, sizeof err);
46 lock(&sl);
47
...
Is this phenomenon specific to me?
Kenji Arisawa
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd
2008-11-06 4:31 [9fans] broken smtpd Kenji Arisawa
@ 2008-11-06 10:30 ` Richard Miller
2008-11-06 12:59 ` erik quanstrom
0 siblings, 1 reply; 10+ messages in thread
From: Richard Miller @ 2008-11-06 10:30 UTC (permalink / raw)
To: 9fans
> Is this phenomenon specific to me?
No, it isn't:
term% ps|grep Broken
none 917 2:08 42:50 264K Broken smtpd
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd
2008-11-06 10:30 ` Richard Miller
@ 2008-11-06 12:59 ` erik quanstrom
2008-11-17 12:04 ` Kenji Arisawa
0 siblings, 1 reply; 10+ messages in thread
From: erik quanstrom @ 2008-11-06 12:59 UTC (permalink / raw)
To: 9fans
>> Is this phenomenon specific to me?
>
> No, it isn't:
>
> term% ps|grep Broken
> none 917 2:08 42:50 264K Broken smtpd
impressive!
i don't see this problem, but i have implemented
some fairly agressive spam control. perhaps this
shows up after i tend to hang up.
(it is possible to use nupas smtpd with regular old
upas.)
would anyone mind sending me a full backtrace?
- erik
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd
2008-11-06 12:59 ` erik quanstrom
@ 2008-11-17 12:04 ` Kenji Arisawa
2008-11-17 12:51 ` erik quanstrom
0 siblings, 1 reply; 10+ messages in thread
From: Kenji Arisawa @ 2008-11-17 12:04 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Hello,
Today, I observed smtpd that was in endless loop and then broken.
ar% ps
...
none 20392 0:01 28:27 264K Running smtpd
...
ar% ACID 20392
/proc/20392/text:386 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/386
acid: lstk()
receiver(path=0x41f58)+0x1c5 /sys/src/cmd/upas/smtp/smtpd.c:625
sender=0x379bf
rcpt=0x41ad8
zzparse()+0x2c9 /sys/src/cmd/upas/smtp/smtpd.y:773
save1=0x0
save2=0x0
save3=0x0
save4=0x0
zzstate=0xf
zzchar=0xffffffff
zzs=0x3f36b
zzp=0xdfffe41c
zzn=0xfffffc18
main(argv=0xdfffefa4,argc=0x0)+0x261 /sys/src/cmd/upas/smtp/smtpd.c:194
netdir=0xdfffefbc
_argc=0x6e
_args=0x376c2
buf=0x0
_main+0x31 /sys/src/libc/386/main9.s:16
acid: src(receiver+0x1c5)
/sys/src/cmd/upas/smtp/smtpd.c:625
620
621 logged = 0;
622
623 /* forwarding() can modify 'path' on loopback request */
624 if(filterstate == ACCEPT && fflag && !authenticated &&
forwarding(path)) {
>625 syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)",
626 s_to_c(senders.last->p), him, nci->rsys, s_to_c(path));
627 rejectcount++;
628 reply("550 5.7.1 we don't relay. send to your-path@[] for "
629 "loopback.\r\n");
630 return;
I don't observe "Bad Forward" of this mail in /sys/log/smtpd.
After some amount of time,
ar% ps
...
none 20392 0:02 44:57 264K Broken smtpd
...
acid: lstk()
errstr()+0x7 /sys/src/libc/9syscall/errstr.s:5
syslog(logname=0x3760a,cons=0x0,fmt=0x37610)+0x20 /sys/src/libc/9sys/
syslog.c:46
err=0x616d2f27
d=0x0
ctim=0x0
buf=0x0
p=0x0
arg=0x0
n=0x0
catchalarm(msg=0xdfffe1ac)+0x5d /sys/src/cmd/upas/smtp/smtpd.c:68
rv=0x1
notifier+0x30 /sys/src/libc/port/atnotify.c:15
acid: src(syslog+0x29)
/sys/src/libc/9sys/syslog.c:46
41 Dir *d;
42 char err[ERRMAX];
43
44 err[0] = '\0';
45 errstr(err, sizeof err);
>46 lock(&sl);
47
48 /*
49 * paranoia makes us stat to make sure a fork+close
50 * hasn't broken our fd's
51 */
acid:
I hope the above information is helpful to fix the problem.
Kenji Arisawa
On 2008/11/06, at 21:59, erik quanstrom wrote:
>>> Is this phenomenon specific to me?
>>
>> No, it isn't:
>>
>> term% ps|grep Broken
>> none 917 2:08 42:50 264K Broken smtpd
>
> impressive!
>
> i don't see this problem, but i have implemented
> some fairly agressive spam control. perhaps this
> shows up after i tend to hang up.
>
> (it is possible to use nupas smtpd with regular old
> upas.)
>
> would anyone mind sending me a full backtrace?
>
> - erik
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd
2008-11-17 12:04 ` Kenji Arisawa
@ 2008-11-17 12:51 ` erik quanstrom
2008-11-17 13:33 ` Kernel Panic
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: erik quanstrom @ 2008-11-17 12:51 UTC (permalink / raw)
To: 9fans
d'oh! the return value from catchalarm looks reversed. from notify(2)
[...] A handler must
return a non-zero number if the note was recognized (and
resolved); otherwise it must return zero. When the system
i think you're getting into some sort of note loop. i think it would
be helpful to log all notes. i would try something like this
static int
catchalarm(void *, char *msg)
{
static int chattycathy;
if(chattycathy++ > 5)
return 0;
if(senders.first && rcvers.first)
syslog(0, "smtpd", "note: %s->%s: %s",
s_to_c(senders.first->p),
s_to_c(rcvers.first->p), msg);
else
syslog(0, "smtpd", "note: %s", msg);
if(pp){
syskillpg(pp->pid);
pp = 0;
}
return strstr(msg, "alarm") != 0;
}
but at a minimum, i would reverse the return values.
- erik
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd
2008-11-17 12:51 ` erik quanstrom
@ 2008-11-17 13:33 ` Kernel Panic
2008-11-17 13:45 ` erik quanstrom
2008-11-17 14:02 ` Charles Forsyth
2008-11-19 0:25 ` Kenji Arisawa
2 siblings, 1 reply; 10+ messages in thread
From: Kernel Panic @ 2008-11-17 13:33 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
erik quanstrom wrote:
> d'oh! the return value from catchalarm looks reversed. from notify(2)
>
> [...] A handler must
> return a non-zero number if the note was recognized (and
> resolved); otherwise it must return zero. When the system
>
> i think you're getting into some sort of note loop. i think it would
> be helpful to log all notes. i would try something like this
>
its not a good idea to call syslog() in a note handler, because it
aquires locks
and calls into the memory allocator by strdup() (that too aquires locks).
if the note interrupts the program while in syslog() or the allocator,
it will deadlock.
--
cinap
> static int
> catchalarm(void *, char *msg)
> {
> static int chattycathy;
>
> if(chattycathy++ > 5)
> return 0;
> if(senders.first && rcvers.first)
> syslog(0, "smtpd", "note: %s->%s: %s",
> s_to_c(senders.first->p),
> s_to_c(rcvers.first->p), msg);
> else
> syslog(0, "smtpd", "note: %s", msg);
> if(pp){
> syskillpg(pp->pid);
> pp = 0;
> }
> return strstr(msg, "alarm") != 0;
> }
>
> but at a minimum, i would reverse the return values.
>
> - erik
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd
2008-11-17 13:33 ` Kernel Panic
@ 2008-11-17 13:45 ` erik quanstrom
0 siblings, 0 replies; 10+ messages in thread
From: erik quanstrom @ 2008-11-17 13:45 UTC (permalink / raw)
To: 9fans
> its not a good idea to call syslog() in a note handler, because it
> aquires locks
> and calls into the memory allocator by strdup() (that too aquires locks).
>
> if the note interrupts the program while in syslog() or the allocator,
> it will deadlock.
excellent point. in this case, i think it is more likely that the problem
is more mundane than that. in any event, we are chewing gobs
of cpu time, so we don't have a literal deadlock.
- erik
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd
2008-11-17 12:51 ` erik quanstrom
2008-11-17 13:33 ` Kernel Panic
@ 2008-11-17 14:02 ` Charles Forsyth
2008-11-19 0:25 ` Kenji Arisawa
2 siblings, 0 replies; 10+ messages in thread
From: Charles Forsyth @ 2008-11-17 14:02 UTC (permalink / raw)
To: 9fans
if you look at /dev/kmesg you might find many messages about "checked N page table entries".
i think smtpd is getting a trap (probably a null pointer) and because the value returned
from catchalarm is the wrong way round, it continues to retry using the bad pointer until
the alarm goes off (after a long time), when it breaks (because no note handler has
handled the alarm).
i didn't really understand why it was trying to continue after the alarm,
rather than logging and quitting. does it not always mean that a caller
has connected but hasn't completed a request after a long time?
as cinap noted(!) there's a separate problem of what happens if the trap or alarm
happens when locks are held, but i think that's less likely to be a problem in practice:
an alarm will probably go off when nothing much has happened (ie, the program isn't
holding any other lock but waiting for input). it's harder to be sure about
bad pointer traps, but if those happen with locks held,
i'd suspect those are more likely to be locks unrelated to those needed by syslog and print.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd
2008-11-17 12:51 ` erik quanstrom
2008-11-17 13:33 ` Kernel Panic
2008-11-17 14:02 ` Charles Forsyth
@ 2008-11-19 0:25 ` Kenji Arisawa
2008-11-19 0:33 ` erik quanstrom
2 siblings, 1 reply; 10+ messages in thread
From: Kenji Arisawa @ 2008-11-19 0:25 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Hello,
I added some debugging codes to catchalarm() in smtpd.c
static int
catchalarm(void *a, char *msg)
{
int rv = 1;
static int count = 0; // DEBUG by Kenar
count++;
if(count > 3)
return 0;
USED(a);
/* log alarms but continue */
if(strstr(msg, "alarm")){
if(senders.first && rcvers.first)
syslog(0, "smtpd", "note: %s->%s: %s",
s_to_c(senders.first->p),
s_to_c(rcvers.first->p), msg);
else
syslog(0, "smtpd", "note: %s", msg);
rv = 0;
}
syslog(0, "smtpd", "note: %s", msg); // DEBUG by Kenar
/* kill the children if there are any */
if(pp)
syskillpg(pp->pid);
return rv;
}
after that I found broken smtpd:
none 50392 0:00 0:00 264K Broken smtpd
note that wasted cpu time is now 0:00
/sys/log/smtpd shows:
ar Nov 19 02:23:05 ehlo from 124.8.67.36 as sotcndhz.com
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
Kenji Arisawa
On 2008/11/17, at 21:51, erik quanstrom wrote:
> d'oh! the return value from catchalarm looks reversed. from
> notify(2)
>
> [...] A handler must
> return a non-zero number if the note was recognized (and
> resolved); otherwise it must return zero. When the system
>
> i think you're getting into some sort of note loop. i think it would
> be helpful to log all notes. i would try something like this
>
> static int
> catchalarm(void *, char *msg)
> {
> static int chattycathy;
>
> if(chattycathy++ > 5)
> return 0;
> if(senders.first && rcvers.first)
> syslog(0, "smtpd", "note: %s->%s: %s",
> s_to_c(senders.first->p),
> s_to_c(rcvers.first->p), msg);
> else
> syslog(0, "smtpd", "note: %s", msg);
> if(pp){
> syskillpg(pp->pid);
> pp = 0;
> }
> return strstr(msg, "alarm") != 0;
> }
>
> but at a minimum, i would reverse the return values.
>
> - erik
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd
2008-11-19 0:25 ` Kenji Arisawa
@ 2008-11-19 0:33 ` erik quanstrom
0 siblings, 0 replies; 10+ messages in thread
From: erik quanstrom @ 2008-11-19 0:33 UTC (permalink / raw)
To: 9fans
[-- Attachment #1: Type: text/plain, Size: 1047 bytes --]
acid tells us that we're indirecting a nil senders.last->p.
we know it's senders.last->p because it's at offset 4.
(that was the fault address.) the Lock at the beginning
of String is bigger than 4 bytes by itself, so it's not the
String and rsys is not the 2d pointer in NetConnInfo,
so that eliminates rsys.
acid: src(0x7463)
/sys/src/cmd/upas/smtp/smtpd.c:628
623 logged = 0;
624
625 /* forwarding() can modify 'path' on loopback request */
626 if(filterstate == ACCEPT && fflag && !authenticated && forwarding(path)) {
627 syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)",
>628 s_to_c(senders.last->p), him, nci->rsys, s_to_c(path));
629 rejectcount++;
630 reply("550 5.7.1 we don't relay. send to your-path@[] for "
631 "loopback.\r\n");
632 return;
633 }
replace lines 627,628 with
syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)",
sender, him, nci->rsys, rcpt);
i made this change some time ago and it appears to be working fine.
i may have seen a similar fault.
- erik
[-- Attachment #2: Type: message/rfc822, Size: 3885 bytes --]
From: Kenji Arisawa <arisawa@ar.aichi-u.ac.jp>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] broken smtpd
Date: Wed, 19 Nov 2008 09:25:55 +0900
Message-ID: <5280BC21-18A9-4287-A1AE-23EEAFAE42CF@ar.aichi-u.ac.jp>
Hello,
I added some debugging codes to catchalarm() in smtpd.c
static int
catchalarm(void *a, char *msg)
{
int rv = 1;
static int count = 0; // DEBUG by Kenar
count++;
if(count > 3)
return 0;
USED(a);
/* log alarms but continue */
if(strstr(msg, "alarm")){
if(senders.first && rcvers.first)
syslog(0, "smtpd", "note: %s->%s: %s",
s_to_c(senders.first->p),
s_to_c(rcvers.first->p), msg);
else
syslog(0, "smtpd", "note: %s", msg);
rv = 0;
}
syslog(0, "smtpd", "note: %s", msg); // DEBUG by Kenar
/* kill the children if there are any */
if(pp)
syskillpg(pp->pid);
return rv;
}
after that I found broken smtpd:
none 50392 0:00 0:00 264K Broken smtpd
note that wasted cpu time is now 0:00
/sys/log/smtpd shows:
ar Nov 19 02:23:05 ehlo from 124.8.67.36 as sotcndhz.com
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463
Kenji Arisawa
On 2008/11/17, at 21:51, erik quanstrom wrote:
> d'oh! the return value from catchalarm looks reversed. from
> notify(2)
>
> [...] A handler must
> return a non-zero number if the note was recognized (and
> resolved); otherwise it must return zero. When the system
>
> i think you're getting into some sort of note loop. i think it would
> be helpful to log all notes. i would try something like this
>
> static int
> catchalarm(void *, char *msg)
> {
> static int chattycathy;
>
> if(chattycathy++ > 5)
> return 0;
> if(senders.first && rcvers.first)
> syslog(0, "smtpd", "note: %s->%s: %s",
> s_to_c(senders.first->p),
> s_to_c(rcvers.first->p), msg);
> else
> syslog(0, "smtpd", "note: %s", msg);
> if(pp){
> syskillpg(pp->pid);
> pp = 0;
> }
> return strstr(msg, "alarm") != 0;
> }
>
> but at a minimum, i would reverse the return values.
>
> - erik
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-11-19 0:33 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-06 4:31 [9fans] broken smtpd Kenji Arisawa
2008-11-06 10:30 ` Richard Miller
2008-11-06 12:59 ` erik quanstrom
2008-11-17 12:04 ` Kenji Arisawa
2008-11-17 12:51 ` erik quanstrom
2008-11-17 13:33 ` Kernel Panic
2008-11-17 13:45 ` erik quanstrom
2008-11-17 14:02 ` Charles Forsyth
2008-11-19 0:25 ` Kenji Arisawa
2008-11-19 0:33 ` erik quanstrom
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).