* [9fans] broken smtpd @ 2008-11-06 4:31 Kenji Arisawa 2008-11-06 10:30 ` Richard Miller 0 siblings, 1 reply; 10+ messages in thread From: Kenji Arisawa @ 2008-11-06 4:31 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello, I sometimes observe broken smtpd. ar% ps ... none 1948578 0:01 44:55 264K Broken smtpd none 2521172 0:01 44:58 264K Broken smtpd none 3190037 0:01 45:03 264K Broken smtpd ... ar% ACID 3190037 /proc/3190037/text:386 plan 9 executable /sys/lib/acid/port /sys/lib/acid/386 acid: lstk() errstr()+0x7 /sys/src/libc/9syscall/errstr.s:5 syslog(logname=0x3760a,cons=0x0,fmt=0x37610)+0x20 /sys/src/libc/9sys/ syslog.c:46 err=0x616d2f27 d=0x0 ctim=0x0 buf=0x0 p=0x0 arg=0x0 n=0x0 catchalarm(msg=0xdfffe1ac)+0x5d /sys/src/cmd/upas/smtp/smtpd.c:68 rv=0x1 notifier+0x30 /sys/src/libc/port/atnotify.c:15 acid: ar% pr -tn /sys/src/libc/9sys/syslog.c ... 34 void 35 syslog(int cons, char *logname, char *fmt, ...) 36 { 37 char buf[1024]; 38 char *ctim, *p; 39 va_list arg; 40 int n; 41 Dir *d; 42 char err[ERRMAX]; 43 44 err[0] = '\0'; 45 errstr(err, sizeof err); 46 lock(&sl); 47 ... Is this phenomenon specific to me? Kenji Arisawa ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd 2008-11-06 4:31 [9fans] broken smtpd Kenji Arisawa @ 2008-11-06 10:30 ` Richard Miller 2008-11-06 12:59 ` erik quanstrom 0 siblings, 1 reply; 10+ messages in thread From: Richard Miller @ 2008-11-06 10:30 UTC (permalink / raw) To: 9fans > Is this phenomenon specific to me? No, it isn't: term% ps|grep Broken none 917 2:08 42:50 264K Broken smtpd ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd 2008-11-06 10:30 ` Richard Miller @ 2008-11-06 12:59 ` erik quanstrom 2008-11-17 12:04 ` Kenji Arisawa 0 siblings, 1 reply; 10+ messages in thread From: erik quanstrom @ 2008-11-06 12:59 UTC (permalink / raw) To: 9fans >> Is this phenomenon specific to me? > > No, it isn't: > > term% ps|grep Broken > none 917 2:08 42:50 264K Broken smtpd impressive! i don't see this problem, but i have implemented some fairly agressive spam control. perhaps this shows up after i tend to hang up. (it is possible to use nupas smtpd with regular old upas.) would anyone mind sending me a full backtrace? - erik ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd 2008-11-06 12:59 ` erik quanstrom @ 2008-11-17 12:04 ` Kenji Arisawa 2008-11-17 12:51 ` erik quanstrom 0 siblings, 1 reply; 10+ messages in thread From: Kenji Arisawa @ 2008-11-17 12:04 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello, Today, I observed smtpd that was in endless loop and then broken. ar% ps ... none 20392 0:01 28:27 264K Running smtpd ... ar% ACID 20392 /proc/20392/text:386 plan 9 executable /sys/lib/acid/port /sys/lib/acid/386 acid: lstk() receiver(path=0x41f58)+0x1c5 /sys/src/cmd/upas/smtp/smtpd.c:625 sender=0x379bf rcpt=0x41ad8 zzparse()+0x2c9 /sys/src/cmd/upas/smtp/smtpd.y:773 save1=0x0 save2=0x0 save3=0x0 save4=0x0 zzstate=0xf zzchar=0xffffffff zzs=0x3f36b zzp=0xdfffe41c zzn=0xfffffc18 main(argv=0xdfffefa4,argc=0x0)+0x261 /sys/src/cmd/upas/smtp/smtpd.c:194 netdir=0xdfffefbc _argc=0x6e _args=0x376c2 buf=0x0 _main+0x31 /sys/src/libc/386/main9.s:16 acid: src(receiver+0x1c5) /sys/src/cmd/upas/smtp/smtpd.c:625 620 621 logged = 0; 622 623 /* forwarding() can modify 'path' on loopback request */ 624 if(filterstate == ACCEPT && fflag && !authenticated && forwarding(path)) { >625 syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)", 626 s_to_c(senders.last->p), him, nci->rsys, s_to_c(path)); 627 rejectcount++; 628 reply("550 5.7.1 we don't relay. send to your-path@[] for " 629 "loopback.\r\n"); 630 return; I don't observe "Bad Forward" of this mail in /sys/log/smtpd. After some amount of time, ar% ps ... none 20392 0:02 44:57 264K Broken smtpd ... acid: lstk() errstr()+0x7 /sys/src/libc/9syscall/errstr.s:5 syslog(logname=0x3760a,cons=0x0,fmt=0x37610)+0x20 /sys/src/libc/9sys/ syslog.c:46 err=0x616d2f27 d=0x0 ctim=0x0 buf=0x0 p=0x0 arg=0x0 n=0x0 catchalarm(msg=0xdfffe1ac)+0x5d /sys/src/cmd/upas/smtp/smtpd.c:68 rv=0x1 notifier+0x30 /sys/src/libc/port/atnotify.c:15 acid: src(syslog+0x29) /sys/src/libc/9sys/syslog.c:46 41 Dir *d; 42 char err[ERRMAX]; 43 44 err[0] = '\0'; 45 errstr(err, sizeof err); >46 lock(&sl); 47 48 /* 49 * paranoia makes us stat to make sure a fork+close 50 * hasn't broken our fd's 51 */ acid: I hope the above information is helpful to fix the problem. Kenji Arisawa On 2008/11/06, at 21:59, erik quanstrom wrote: >>> Is this phenomenon specific to me? >> >> No, it isn't: >> >> term% ps|grep Broken >> none 917 2:08 42:50 264K Broken smtpd > > impressive! > > i don't see this problem, but i have implemented > some fairly agressive spam control. perhaps this > shows up after i tend to hang up. > > (it is possible to use nupas smtpd with regular old > upas.) > > would anyone mind sending me a full backtrace? > > - erik > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd 2008-11-17 12:04 ` Kenji Arisawa @ 2008-11-17 12:51 ` erik quanstrom 2008-11-17 13:33 ` Kernel Panic ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: erik quanstrom @ 2008-11-17 12:51 UTC (permalink / raw) To: 9fans d'oh! the return value from catchalarm looks reversed. from notify(2) [...] A handler must return a non-zero number if the note was recognized (and resolved); otherwise it must return zero. When the system i think you're getting into some sort of note loop. i think it would be helpful to log all notes. i would try something like this static int catchalarm(void *, char *msg) { static int chattycathy; if(chattycathy++ > 5) return 0; if(senders.first && rcvers.first) syslog(0, "smtpd", "note: %s->%s: %s", s_to_c(senders.first->p), s_to_c(rcvers.first->p), msg); else syslog(0, "smtpd", "note: %s", msg); if(pp){ syskillpg(pp->pid); pp = 0; } return strstr(msg, "alarm") != 0; } but at a minimum, i would reverse the return values. - erik ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd 2008-11-17 12:51 ` erik quanstrom @ 2008-11-17 13:33 ` Kernel Panic 2008-11-17 13:45 ` erik quanstrom 2008-11-17 14:02 ` Charles Forsyth 2008-11-19 0:25 ` Kenji Arisawa 2 siblings, 1 reply; 10+ messages in thread From: Kernel Panic @ 2008-11-17 13:33 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs erik quanstrom wrote: > d'oh! the return value from catchalarm looks reversed. from notify(2) > > [...] A handler must > return a non-zero number if the note was recognized (and > resolved); otherwise it must return zero. When the system > > i think you're getting into some sort of note loop. i think it would > be helpful to log all notes. i would try something like this > its not a good idea to call syslog() in a note handler, because it aquires locks and calls into the memory allocator by strdup() (that too aquires locks). if the note interrupts the program while in syslog() or the allocator, it will deadlock. -- cinap > static int > catchalarm(void *, char *msg) > { > static int chattycathy; > > if(chattycathy++ > 5) > return 0; > if(senders.first && rcvers.first) > syslog(0, "smtpd", "note: %s->%s: %s", > s_to_c(senders.first->p), > s_to_c(rcvers.first->p), msg); > else > syslog(0, "smtpd", "note: %s", msg); > if(pp){ > syskillpg(pp->pid); > pp = 0; > } > return strstr(msg, "alarm") != 0; > } > > but at a minimum, i would reverse the return values. > > - erik > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd 2008-11-17 13:33 ` Kernel Panic @ 2008-11-17 13:45 ` erik quanstrom 0 siblings, 0 replies; 10+ messages in thread From: erik quanstrom @ 2008-11-17 13:45 UTC (permalink / raw) To: 9fans > its not a good idea to call syslog() in a note handler, because it > aquires locks > and calls into the memory allocator by strdup() (that too aquires locks). > > if the note interrupts the program while in syslog() or the allocator, > it will deadlock. excellent point. in this case, i think it is more likely that the problem is more mundane than that. in any event, we are chewing gobs of cpu time, so we don't have a literal deadlock. - erik ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd 2008-11-17 12:51 ` erik quanstrom 2008-11-17 13:33 ` Kernel Panic @ 2008-11-17 14:02 ` Charles Forsyth 2008-11-19 0:25 ` Kenji Arisawa 2 siblings, 0 replies; 10+ messages in thread From: Charles Forsyth @ 2008-11-17 14:02 UTC (permalink / raw) To: 9fans if you look at /dev/kmesg you might find many messages about "checked N page table entries". i think smtpd is getting a trap (probably a null pointer) and because the value returned from catchalarm is the wrong way round, it continues to retry using the bad pointer until the alarm goes off (after a long time), when it breaks (because no note handler has handled the alarm). i didn't really understand why it was trying to continue after the alarm, rather than logging and quitting. does it not always mean that a caller has connected but hasn't completed a request after a long time? as cinap noted(!) there's a separate problem of what happens if the trap or alarm happens when locks are held, but i think that's less likely to be a problem in practice: an alarm will probably go off when nothing much has happened (ie, the program isn't holding any other lock but waiting for input). it's harder to be sure about bad pointer traps, but if those happen with locks held, i'd suspect those are more likely to be locks unrelated to those needed by syslog and print. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd 2008-11-17 12:51 ` erik quanstrom 2008-11-17 13:33 ` Kernel Panic 2008-11-17 14:02 ` Charles Forsyth @ 2008-11-19 0:25 ` Kenji Arisawa 2008-11-19 0:33 ` erik quanstrom 2 siblings, 1 reply; 10+ messages in thread From: Kenji Arisawa @ 2008-11-19 0:25 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello, I added some debugging codes to catchalarm() in smtpd.c static int catchalarm(void *a, char *msg) { int rv = 1; static int count = 0; // DEBUG by Kenar count++; if(count > 3) return 0; USED(a); /* log alarms but continue */ if(strstr(msg, "alarm")){ if(senders.first && rcvers.first) syslog(0, "smtpd", "note: %s->%s: %s", s_to_c(senders.first->p), s_to_c(rcvers.first->p), msg); else syslog(0, "smtpd", "note: %s", msg); rv = 0; } syslog(0, "smtpd", "note: %s", msg); // DEBUG by Kenar /* kill the children if there are any */ if(pp) syskillpg(pp->pid); return rv; } after that I found broken smtpd: none 50392 0:00 0:00 264K Broken smtpd note that wasted cpu time is now 0:00 /sys/log/smtpd shows: ar Nov 19 02:23:05 ehlo from 124.8.67.36 as sotcndhz.com ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463 ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463 ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463 Kenji Arisawa On 2008/11/17, at 21:51, erik quanstrom wrote: > d'oh! the return value from catchalarm looks reversed. from > notify(2) > > [...] A handler must > return a non-zero number if the note was recognized (and > resolved); otherwise it must return zero. When the system > > i think you're getting into some sort of note loop. i think it would > be helpful to log all notes. i would try something like this > > static int > catchalarm(void *, char *msg) > { > static int chattycathy; > > if(chattycathy++ > 5) > return 0; > if(senders.first && rcvers.first) > syslog(0, "smtpd", "note: %s->%s: %s", > s_to_c(senders.first->p), > s_to_c(rcvers.first->p), msg); > else > syslog(0, "smtpd", "note: %s", msg); > if(pp){ > syskillpg(pp->pid); > pp = 0; > } > return strstr(msg, "alarm") != 0; > } > > but at a minimum, i would reverse the return values. > > - erik > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [9fans] broken smtpd 2008-11-19 0:25 ` Kenji Arisawa @ 2008-11-19 0:33 ` erik quanstrom 0 siblings, 0 replies; 10+ messages in thread From: erik quanstrom @ 2008-11-19 0:33 UTC (permalink / raw) To: 9fans [-- Attachment #1: Type: text/plain, Size: 1047 bytes --] acid tells us that we're indirecting a nil senders.last->p. we know it's senders.last->p because it's at offset 4. (that was the fault address.) the Lock at the beginning of String is bigger than 4 bytes by itself, so it's not the String and rsys is not the 2d pointer in NetConnInfo, so that eliminates rsys. acid: src(0x7463) /sys/src/cmd/upas/smtp/smtpd.c:628 623 logged = 0; 624 625 /* forwarding() can modify 'path' on loopback request */ 626 if(filterstate == ACCEPT && fflag && !authenticated && forwarding(path)) { 627 syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)", >628 s_to_c(senders.last->p), him, nci->rsys, s_to_c(path)); 629 rejectcount++; 630 reply("550 5.7.1 we don't relay. send to your-path@[] for " 631 "loopback.\r\n"); 632 return; 633 } replace lines 627,628 with syslog(0, "smtpd", "Bad Forward %s (%s/%s) (%s)", sender, him, nci->rsys, rcpt); i made this change some time ago and it appears to be working fine. i may have seen a similar fault. - erik [-- Attachment #2: Type: message/rfc822, Size: 3885 bytes --] From: Kenji Arisawa <arisawa@ar.aichi-u.ac.jp> To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Subject: Re: [9fans] broken smtpd Date: Wed, 19 Nov 2008 09:25:55 +0900 Message-ID: <5280BC21-18A9-4287-A1AE-23EEAFAE42CF@ar.aichi-u.ac.jp> Hello, I added some debugging codes to catchalarm() in smtpd.c static int catchalarm(void *a, char *msg) { int rv = 1; static int count = 0; // DEBUG by Kenar count++; if(count > 3) return 0; USED(a); /* log alarms but continue */ if(strstr(msg, "alarm")){ if(senders.first && rcvers.first) syslog(0, "smtpd", "note: %s->%s: %s", s_to_c(senders.first->p), s_to_c(rcvers.first->p), msg); else syslog(0, "smtpd", "note: %s", msg); rv = 0; } syslog(0, "smtpd", "note: %s", msg); // DEBUG by Kenar /* kill the children if there are any */ if(pp) syskillpg(pp->pid); return rv; } after that I found broken smtpd: none 50392 0:00 0:00 264K Broken smtpd note that wasted cpu time is now 0:00 /sys/log/smtpd shows: ar Nov 19 02:23:05 ehlo from 124.8.67.36 as sotcndhz.com ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463 ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463 ar Nov 19 02:23:08 note: sys: trap: fault read addr=0x4 pc=0x00007463 Kenji Arisawa On 2008/11/17, at 21:51, erik quanstrom wrote: > d'oh! the return value from catchalarm looks reversed. from > notify(2) > > [...] A handler must > return a non-zero number if the note was recognized (and > resolved); otherwise it must return zero. When the system > > i think you're getting into some sort of note loop. i think it would > be helpful to log all notes. i would try something like this > > static int > catchalarm(void *, char *msg) > { > static int chattycathy; > > if(chattycathy++ > 5) > return 0; > if(senders.first && rcvers.first) > syslog(0, "smtpd", "note: %s->%s: %s", > s_to_c(senders.first->p), > s_to_c(rcvers.first->p), msg); > else > syslog(0, "smtpd", "note: %s", msg); > if(pp){ > syskillpg(pp->pid); > pp = 0; > } > return strstr(msg, "alarm") != 0; > } > > but at a minimum, i would reverse the return values. > > - erik > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-11-19 0:33 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-11-06 4:31 [9fans] broken smtpd Kenji Arisawa 2008-11-06 10:30 ` Richard Miller 2008-11-06 12:59 ` erik quanstrom 2008-11-17 12:04 ` Kenji Arisawa 2008-11-17 12:51 ` erik quanstrom 2008-11-17 13:33 ` Kernel Panic 2008-11-17 13:45 ` erik quanstrom 2008-11-17 14:02 ` Charles Forsyth 2008-11-19 0:25 ` Kenji Arisawa 2008-11-19 0:33 ` erik quanstrom
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).