* Re: [9fans] DNS @ 2017-04-01 1:28 cinap_lenrek 2017-04-01 3:40 ` Skip Tavakkolian 0 siblings, 1 reply; 71+ messages in thread From: cinap_lenrek @ 2017-04-01 1:28 UTC (permalink / raw) To: 9fans yes. raising Maxretries to > 5 in dnresolve.c fixes it. this parameter limits the chain of cname redirects. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] DNS 2017-04-01 1:28 [9fans] DNS cinap_lenrek @ 2017-04-01 3:40 ` Skip Tavakkolian 2017-04-01 9:46 ` Alexandru Gheorghe 0 siblings, 1 reply; 71+ messages in thread From: Skip Tavakkolian @ 2017-04-01 3:40 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 257 bytes --] Thanks! I should have checked the sources. On Fri, Mar 31, 2017 at 6:28 PM <cinap_lenrek@felloff.net> wrote: > yes. raising Maxretries to > 5 in dnresolve.c fixes it. this parameter > limits the chain of cname redirects. > > -- > cinap > > [-- Attachment #2: Type: text/html, Size: 647 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] DNS 2017-04-01 3:40 ` Skip Tavakkolian @ 2017-04-01 9:46 ` Alexandru Gheorghe 2017-04-01 17:04 ` Steve Simon 0 siblings, 1 reply; 71+ messages in thread From: Alexandru Gheorghe @ 2017-04-01 9:46 UTC (permalink / raw) To: 9fans That's a weird name for CNAME traversing. Should've been (maybe more appropriately): "MaxCnameDepth". On 04/01/2017 04:40 AM, Skip Tavakkolian wrote: > Maxretries to > 5 -- ; Alexandru Gheorghe ; ; <alghe.global gmail com> ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] DNS 2017-04-01 9:46 ` Alexandru Gheorghe @ 2017-04-01 17:04 ` Steve Simon 2017-04-03 7:06 ` David Arroyo 0 siblings, 1 reply; 71+ messages in thread From: Steve Simon @ 2017-04-01 17:04 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs possibly. however it didn't take the plan9 community long to figure out what needed to be changed. Thus, by definition, it was not too obscure. -Steve > On 1 Apr 2017, at 10:46, Alexandru Gheorghe <alghe.global@gmail.com> wrote: > > That's a weird name for CNAME traversing. Should've been (maybe more > appropriately): "MaxCnameDepth". > > >> On 04/01/2017 04:40 AM, Skip Tavakkolian wrote: >> Maxretries to > 5 > > -- > ; Alexandru Gheorghe > ; > ; <alghe.global gmail com> > ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] DNS 2017-04-01 17:04 ` Steve Simon @ 2017-04-03 7:06 ` David Arroyo 0 siblings, 0 replies; 71+ messages in thread From: David Arroyo @ 2017-04-03 7:06 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 1174 bytes --] I had a similar problem with 'dnscache' from the TinyDNS suite of tools a couple years back. Certain DNS names hosted in AWS Route53 would exceed the max number of requests to be resolved at random, because Route53 randomized the NS records it gave for each subdomain, so every once in awhile during the dns 'walk' dnscache would have to resolve their .net, .co.uk, .org, and .com NS records, which blows up the number of resolutions pretty quickly. Akamai has the same problem. The internet was much simpler 20 years ago :\. David On Sat, Apr 1, 2017 at 10:04 AM Steve Simon <steve@quintile.net> wrote: > possibly. > > however it didn't take the plan9 community long to figure out what needed > to be changed. Thus, by definition, it was not too obscure. > > -Steve > > > On 1 Apr 2017, at 10:46, Alexandru Gheorghe <alghe.global@gmail.com> > wrote: > > > > That's a weird name for CNAME traversing. Should've been (maybe more > > appropriately): "MaxCnameDepth". > > > > > >> On 04/01/2017 04:40 AM, Skip Tavakkolian wrote: > >> Maxretries to > 5 > > > > -- > > ; Alexandru Gheorghe > > ; > > ; <alghe.global gmail com> > > > > > [-- Attachment #2: Type: text/html, Size: 2070 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* [9fans] DNS @ 2017-03-20 7:26 Skip Tavakkolian 2017-03-21 19:19 ` Skip Tavakkolian 0 siblings, 1 reply; 71+ messages in thread From: Skip Tavakkolian @ 2017-03-20 7:26 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 260 bytes --] It seems Plan 9 dns can't resolve www.paypal.com correctly; I'm not sure why. Can anyone with a 9front installation try this to see if it resolves to the an IP address (i.e. see a final "answer" output)? ndb/dnsdebug www.paypal.com Thanks, -Skip [-- Attachment #2: Type: text/html, Size: 465 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] DNS 2017-03-20 7:26 Skip Tavakkolian @ 2017-03-21 19:19 ` Skip Tavakkolian 2017-03-30 22:19 ` Steve Simon 2017-03-31 7:43 ` Peter Hull 0 siblings, 2 replies; 71+ messages in thread From: Skip Tavakkolian @ 2017-03-21 19:19 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 393 bytes --] trying again. On Mon, Mar 20, 2017 at 12:26 AM Skip Tavakkolian < skip.tavakkolian@gmail.com> wrote: > It seems Plan 9 dns can't resolve www.paypal.com correctly; I'm not sure > why. > > Can anyone with a 9front installation try this to see if it resolves to > the an IP address (i.e. see a final "answer" output)? > > ndb/dnsdebug www.paypal.com > > Thanks, > -Skip > > [-- Attachment #2: Type: text/html, Size: 1148 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] DNS 2017-03-21 19:19 ` Skip Tavakkolian @ 2017-03-30 22:19 ` Steve Simon 2017-03-31 7:43 ` Peter Hull 1 sibling, 0 replies; 71+ messages in thread From: Steve Simon @ 2017-03-30 22:19 UTC (permalink / raw) To: 9fans fails for me too. dnsdebug is happy though it returns more than I would expect. acording to these guys: https://intodns.com/paypal.com www.paypal.com jumps though quite a few indirections: www.paypal.com -> www.paypal.com.akadns.net -> pprussia.www.paypal.com.akadns.net -> ppdirect.paypal.com.akadns.net -> wlb.paypal.com.akadns.net -> www.paypal.com.edgekey.net -> e3694.a.akamaiedge.net -> [ 104.86.233.120 ] I don't know if dns(1) has an upper limit to the number of redirections it will allow but it wouldn't surprise me. -Steve ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] DNS 2017-03-21 19:19 ` Skip Tavakkolian 2017-03-30 22:19 ` Steve Simon @ 2017-03-31 7:43 ` Peter Hull 1 sibling, 0 replies; 71+ messages in thread From: Peter Hull @ 2017-03-31 7:43 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Doesn’t work quite right for me. It resolves a few steps and gets to an IP address but it never prints out the “answer” between ----- lines. If I do it for www.google.com then it does. Also it prints some console messages - "no rr from dblookup; crapped out" Pete ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns @ 2012-08-26 14:48 cinap_lenrek 0 siblings, 0 replies; 71+ messages in thread From: cinap_lenrek @ 2012-08-26 14:48 UTC (permalink / raw) To: 9fans 9fans stoped sending me mail again, i read your response on the 9fans archive. you are indeed correct, i missed that it used rralloc() wich allocates all the structures. so false alarm from me again, but at least you spoted the memory leak. another thing, the DN* rr->sig->signer doesnt seem to get marked in dnchangenever() and dnageall(), which would cause it to get accidently freed unless its referenced somewhere else no? -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* [9fans] dns @ 2012-08-21 1:51 Jeff Sickel 2012-08-21 1:59 ` erik quanstrom ` (2 more replies) 0 siblings, 3 replies; 71+ messages in thread From: Jeff Sickel @ 2012-08-21 1:51 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Has anyone else been seeing their Plan 9 dns servers work for a little while and then stop responding? I've seen this happen after delegation loops show up in the logs: Aug 20 00:24:10 delegation loop 68.23.115.in-addr.arpa ns rev1.kornet.net -> ns H.ROOT-SERVERS.NET from 211.216.50.180 and no answers Aug 20 00:24:11 delegation loop 68.23.115.in-addr.arpa ns rev1.kornet.net -> ns J.ROOT-SERVERS.NET from 211.216.50.170 Aug 20 00:24:11 and no answers Aug 20 00:24:11 delegation loop 68.23.115.in-addr.arpa ns rev1.kornet.net -> ns G.ROOT-SERVERS.NET from 211.216.50.180 Aug 20 00:24:11 and no answers Aug 20 00:24:13 delegation loop 68.23.115.in-addr.arpa ns rev1.kornet.net -> ns J.ROOT-SERVERS.NET from 211.216.50.170 Aug 20 00:24:13 and no answers Aug 20 00:24:15 delegation loop 68.23.115.in-addr.arpa ns rev1.kornet.net -> ns H.ROOT-SERVERS.NET from 211.216.50.180 ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 1:51 Jeff Sickel @ 2012-08-21 1:59 ` erik quanstrom 2012-08-21 4:23 ` arisawa 2012-08-21 5:08 ` Benjamin Huntsman 2 siblings, 0 replies; 71+ messages in thread From: erik quanstrom @ 2012-08-21 1:59 UTC (permalink / raw) To: 9fans [-- Attachment #1: Type: text/plain, Size: 439 bytes --] i'm using a modified version of dns. i found that aktomi redirections too unreliable. even so, i still get crashes, which have become more frequent in recent weeks. i've attached a copy of "restartdns" which is ment to be called from cron on short intervals. contrib quanstro/ndb has the whole nine yards. one of these days i will redo dns with a better (and maintainable) structure. :-). but please beat me to it. - erik [-- Attachment #2: restartdns --] [-- Type: text/plain, Size: 2302 bytes --] #!/bin/rc rfork en nl=' ' mailto=(quanstro) allow=(ladd) recursive=() gcidr = ( # only blocks that can map to google's a records 72.14.192.0/18 74.125.0.0/16 209.85.128.0/17 216.239.32.0/19 173.194.0.0/16 ) if(! ~ `{cat /dev/user} `{cat /dev/hostowner}){ echo 'restartdns: must be hostowner' >[1=2]; exit user } 9fs other fn syslog{ echo $sysname `{date} restartdns: $* > /sys/log/dns } fn pgroup{ ifs=$nl g=`{cat /proc/$1/noteid} for(i in `{grep -l $g /proc/*/noteid | sed 's:/proc/([^/]+)/noteid:\1:g'}) if(test -d /proc/$i) echo $i } fn reaper{ nbroken=() for(i in `{ps | awk '$6 == "Broken" && $7 == "dns" {print $2}'}){ r = /n/other/$user/dnssnap/$sysname.$i.`{date -n} snap -o $r `{pgroup $i} nbroken = ($nbroken $r) } } fn getips{ ndb/dnsquery $* | sed 's/.*[ ]//g' } fn google{ google=() if(! ip/cidr -rf <{getips google.com} <{echo $gcidr}) google=1 if(ip/cidr -f /lib/badcidr <{getips 9fans.net} ) google=($google 2) } fn why{ if(! ~ $#nbroken 0){ echo getting mediæval on $#nbroken broken dns processes. for(i in $nbroken) echo $i } if(! ~ $#nwait 0){ echo getting mediæval on $#nwait deadlocked dns processes. for(i in $nwait) echo $i } if(! ~ $#google 0){ echo google broken ndb/dnsquery google.com ndb/dnsquery 9fans.net any } } flagfmt='p,f' args=() if(! ifs=() eval `{aux/getflags $*} || ! ~ $#* 0){ aux/usage exit usage } if(~ $#flagf 0){ if(! ~ $sysname $allow) exit 'wrong system' reaper ifs=$nl nwait=`{ps -a |sed -n 's/.* +dns \[query lock wait for(.*)\]/\1/gp' | sort | uniq -c | awk '$1>2'} google if(~ $#nbroken 0 && ~ $#nwait 0 && ~ $#google 0) exit 'none broken' why if(~ $service rx) {date; echo; why; echo; ps -a | grep ' dns ' }| mail -s 'restartdns: '^$sysname $mailto } if(~ $flagp 1) exit '' syslog slaying broken $#nbroken nwait $#nwait google $#google dns = ndb/dns slaydns = `{echo $dns | sed 's:.*/::g'} slay $slaydns | rc unmount '#s/dns' /net/dns >[2=] unmount '#s/dns_net.alt' /net.alt/dns >[2=] rm -f '#s/dns' '#s/dns_net.alt' $dns -N 20000 -s if(~ $sysname $recursive) $dns -sx /net.alt -f /lib/ndb/external if not $dns -Rrsx /net.alt -f /lib/ndb/external ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 1:51 Jeff Sickel 2012-08-21 1:59 ` erik quanstrom @ 2012-08-21 4:23 ` arisawa 2012-08-21 5:08 ` Jeff Sickel 2012-08-21 5:08 ` Benjamin Huntsman 2 siblings, 1 reply; 71+ messages in thread From: arisawa @ 2012-08-21 4:23 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello Recent version of dns crashes. Old version (for example, Apr this year) is OK. It seems crashed dns make other side effect: Many processes stop with Fauth status. # I don't know the reason. Kenji Arisawa On 2012/08/21, at 10:51, Jeff Sickel wrote: > Has anyone else been seeing their Plan 9 dns servers work for a > little while and then stop responding? I've seen this happen > after delegation loops show up in the logs: > > Aug 20 00:24:10 delegation loop 68.23.115.in-addr.arpa ns rev1.kornet.net -> ns H.ROOT-SERVERS.NET from 211.216.50.180 > and no answers > Aug 20 00:24:11 delegation loop 68.23.115.in-addr.arpa ns rev1.kornet.net -> ns J.ROOT-SERVERS.NET from 211.216.50.170 > Aug 20 00:24:11 and no answers > Aug 20 00:24:11 delegation loop 68.23.115.in-addr.arpa ns rev1.kornet.net -> ns G.ROOT-SERVERS.NET from 211.216.50.180 > Aug 20 00:24:11 and no answers > Aug 20 00:24:13 delegation loop 68.23.115.in-addr.arpa ns rev1.kornet.net -> ns J.ROOT-SERVERS.NET from 211.216.50.170 > Aug 20 00:24:13 and no answers > Aug 20 00:24:15 delegation loop 68.23.115.in-addr.arpa ns rev1.kornet.net -> ns H.ROOT-SERVERS.NET from 211.216.50.180 > > ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 4:23 ` arisawa @ 2012-08-21 5:08 ` Jeff Sickel 2012-08-21 8:42 ` Kenji Arisawa 0 siblings, 1 reply; 71+ messages in thread From: Jeff Sickel @ 2012-08-21 5:08 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs As in using: Apr 12 22:46:05 CDT 2012 /n/sourcesdump/2012/0501/plan9/386/bin/ndb/dns 310819 [jmk] Oct 14 13:32:38 CDT 2011 /n/sourcesdump/2012/0412/plan9/386/bin/ndb/dns 310519 [sys] There are a few differences if you compare with sourcesdump. Knowing one that works should help narrow down the changes to get the latest working cleanly again. -jas On Aug 20, 2012, at 11:23 PM, arisawa <arisawa@ar.aichi-u.ac.jp> wrote: > Hello > > Recent version of dns crashes. > Old version (for example, Apr this year) is OK. > It seems crashed dns make other side effect: > Many processes stop with Fauth status. # I don't know the reason. > > Kenji Arisawa ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 5:08 ` Jeff Sickel @ 2012-08-21 8:42 ` Kenji Arisawa 2012-08-21 11:27 ` cinap_lenrek 0 siblings, 1 reply; 71+ messages in thread From: Kenji Arisawa @ 2012-08-21 8:42 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello, I suspect the problem is in: --rwxrwxr-x M 21231 glenda sys 311768 Jul 14 11:46 dns I don't have 2012/0501 version. I am lazy sorry. My current dns is --rwxrwxr-x M 131526 sys sys 310819 Apr 13 12:46 /386/bin/ndb/dns That works 12 days since last boot without any troubles. ar% ps .... none 99 0:00 0:00 184K Open listen none 100 0:00 0:00 184K Open listen bootes 540 0:00 0:01 240K Await patrol none 241197 0:00 0:00 240K Await tcp110 arisawa 241202 0:00 0:00 360K Pread pop3 where patrol is a program that reboot cpu server if it detects a process that stops with Fauth status. the contents is ar% cat /usr/bootes/bin/rc/patrol #!/bin/rc rfork e while(sleep 60){ a=`{ps|grep Fauth|wc} if(! ~ $a(1) 0){ /usr/local/bin/386/logit -l reboot Fauth:$a(1) # for logging (my program). don't mind reboot } } ar% Kenji Arisawa On 2012/08/21, at 14:08, Jeff Sickel wrote: > As in using: > > Apr 12 22:46:05 CDT 2012 /n/sourcesdump/2012/0501/plan9/386/bin/ndb/dns 310819 [jmk] > Oct 14 13:32:38 CDT 2011 /n/sourcesdump/2012/0412/plan9/386/bin/ndb/dns 310519 [sys] > > There are a few differences if you compare with sourcesdump. Knowing one that > works should help narrow down the changes to get the latest working cleanly again. > > -jas > > On Aug 20, 2012, at 11:23 PM, arisawa <arisawa@ar.aichi-u.ac.jp> wrote: > >> Hello >> >> Recent version of dns crashes. >> Old version (for example, Apr this year) is OK. >> It seems crashed dns make other side effect: >> Many processes stop with Fauth status. # I don't know the reason. >> >> Kenji Arisawa > > ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 8:42 ` Kenji Arisawa @ 2012-08-21 11:27 ` cinap_lenrek 2012-08-21 12:18 ` Lucio De Re ` (3 more replies) 0 siblings, 4 replies; 71+ messages in thread From: cinap_lenrek @ 2012-08-21 11:27 UTC (permalink / raw) To: 9fans nothing wrong with diffing the changes and see if theres a clue, but to solve this one really needs to find the underlying cause no matter what. changes can just hide bugs or make them more or less likely to appear. can anyone provide at least a stacktrace or process snapshot of the crashed dns processes? from that you try to build a theory of what might be going wrong by thinking really really hard... (the thinking should be directly proportional to the time it takes to reproduce the bug) and then you work on how to prove that theory. just changing stuff without knowing what exactly was the problem with the old code is sometimes tempting, but wrong and dangerous. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 11:27 ` cinap_lenrek @ 2012-08-21 12:18 ` Lucio De Re 2012-08-21 12:37 ` arisawa ` (2 subsequent siblings) 3 siblings, 0 replies; 71+ messages in thread From: Lucio De Re @ 2012-08-21 12:18 UTC (permalink / raw) To: 9fans > can anyone provide at least a stacktrace or process snapshot > of the crashed dns processes? I've had double-free errors from dns on my server, but the same machine suffers from fossil/venti errors, so I can't use it as diagnostics. I do think that Plan 9 dns is a bit fragile, though not in the way BIND is monstrously complex. ++L ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 11:27 ` cinap_lenrek 2012-08-21 12:18 ` Lucio De Re @ 2012-08-21 12:37 ` arisawa 2012-08-21 18:32 ` erik quanstrom 2012-08-25 0:10 ` Kenji Arisawa 3 siblings, 0 replies; 71+ messages in thread From: arisawa @ 2012-08-21 12:37 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello, Files in /sys/src/cmd/ndb are not be changed since this Mar 31. However dns in /n/sources/plan9/386/bin/ndb have changed two or three times since that day. Probably the difference comes from others (library?). hera% pwd /sys/src/cmd/ndb hera% ls -lt --rw-rw-r-- M 21422 sys sys 34365 Mar 31 05:41 cs.c --rw-rw-r-- M 21422 sys sys 40329 Mar 29 08:07 dn.c --rw-rw-r-- M 21422 sys sys 20706 Mar 29 08:01 dns.c --rw-rw-r-- M 21422 sys sys 6916 Oct 14 2011 dnudpserver.c --rw-rw-r-- M 21422 sys sys 39487 Oct 14 2011 dnresolve.c --rw-rw-r-- M 21422 sys sys 12571 Oct 14 2011 convM2DNS.c hera% ls -lt /n/sources/plan9/sys/src/cmd/ndb --rw-rw-r-- M 21436 glenda sys 34365 Mar 31 05:41 /n/sources/plan9/sys/src/cmd/ndb/cs.c --rw-rw-r-- M 21436 glenda sys 40329 Mar 29 08:07 /n/sources/plan9/sys/src/cmd/ndb/dn.c --rw-rw-r-- M 21436 glenda sys 20706 Mar 29 08:01 /n/sources/plan9/sys/src/cmd/ndb/dns.c --rw-rw-r-- M 21436 glenda sys 6916 Oct 14 2011 /n/sources/plan9/sys/src/cmd/ndb/dnudpserver.c --rw-rw-r-- M 21436 glenda sys 39487 Oct 14 2011 /n/sources/plan9/sys/src/cmd/ndb/dnresolve.c --rw-rw-r-- M 21436 glenda sys 12571 Oct 14 2011 /n/sources/plan9/sys/src/cmd/ndb/convM2DNS.c >the old code is sometimes tempting, but wrong and dangerous. I agree. And I wonder why dns runs as bootes (sever owner). Kenji Arisawa On 2012/08/21, at 20:27, cinap_lenrek@gmx.de wrote: > nothing wrong with diffing the changes and see if theres a clue, but > to solve this one really needs to find the underlying cause no matter > what. changes can just hide bugs or make them more or less likely to > appear. can anyone provide at least a stacktrace or process snapshot > of the crashed dns processes? from that you try to build a theory of > what might be going wrong by thinking really really hard... (the > thinking should be directly proportional to the time it takes to > reproduce the bug) and then you work on how to prove that theory. > just changing stuff without knowing what exactly was the problem with > the old code is sometimes tempting, but wrong and dangerous. > > -- > cinap > ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 11:27 ` cinap_lenrek 2012-08-21 12:18 ` Lucio De Re 2012-08-21 12:37 ` arisawa @ 2012-08-21 18:32 ` erik quanstrom 2012-08-21 20:05 ` Charles Forsyth ` (3 more replies) 2012-08-25 0:10 ` Kenji Arisawa 3 siblings, 4 replies; 71+ messages in thread From: erik quanstrom @ 2012-08-21 18:32 UTC (permalink / raw) To: 9fans On Tue Aug 21 07:31:55 EDT 2012, cinap_lenrek@gmx.de wrote: > nothing wrong with diffing the changes and see if theres a clue, but > to solve this one really needs to find the underlying cause no matter > what. changes can just hide bugs or make them more or less likely to > appear. can anyone provide at least a stacktrace or process snapshot > of the crashed dns processes? from that you try to build a theory of > what might be going wrong by thinking really really hard... (the > thinking should be directly proportional to the time it takes to > reproduce the bug) and then you work on how to prove that theory. > just changing stuff without knowing what exactly was the problem with > the old code is sometimes tempting, but wrong and dangerous. very good point. in the past, much of the trouble has been that the rr records get smashed and you crash much later on. this makes debugging from a crash frustrating. but as it happily turns out, i keep all the snaps of all my broken dns processes. i've had ~35 since january, when i last looked at dns. as it turns out all of the crashes were due to an off-by-one in dnresolv:/^serveraddrs the final loop in the function should be limited by Maxdest-1, since Maxdest is an index not a count. with this fixed, it will be interesting to see if we see better behavior or not. my big concern right now is occassionally corrupted results we see. the crashes and whatnot are relatively easy to clean up after. if anyone else is interested in my compendium of dnssnaps, i'd be happy to send them along. there's only a gb of them! :-). - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 18:32 ` erik quanstrom @ 2012-08-21 20:05 ` Charles Forsyth [not found] ` <CAOw7k5hALBex13jJUbscVt4th-Z9urpmAP0eOBi7b1HuBQESrw@mail.gmail.c> ` (2 subsequent siblings) 3 siblings, 0 replies; 71+ messages in thread From: Charles Forsyth @ 2012-08-21 20:05 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 369 bytes --] Doh! I can't tell you how often I've looked at that very line of code, but evidently my eyes are shallow. On 21 August 2012 14:32, erik quanstrom <quanstro@quanstro.net> wrote: > all of the crashes were due to an off-by-one in dnresolv:/^serveraddrs > the final loop in the function should be limited by Maxdest-1, since > Maxdest is an index not a count. > [-- Attachment #2: Type: text/html, Size: 664 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
[parent not found: <CAOw7k5hALBex13jJUbscVt4th-Z9urpmAP0eOBi7b1HuBQESrw@mail.gmail.c>]
* Re: [9fans] dns [not found] ` <CAOw7k5hALBex13jJUbscVt4th-Z9urpmAP0eOBi7b1HuBQESrw@mail.gmail.c> @ 2012-08-21 20:41 ` erik quanstrom 0 siblings, 0 replies; 71+ messages in thread From: erik quanstrom @ 2012-08-21 20:41 UTC (permalink / raw) To: 9fans On Tue Aug 21 16:06:56 EDT 2012, charles.forsyth@gmail.com wrote: > Doh! I can't tell you how often I've looked at that very line of code, > but evidently my eyes are shallow. > i missed it last time, too. - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 18:32 ` erik quanstrom 2012-08-21 20:05 ` Charles Forsyth [not found] ` <CAOw7k5hALBex13jJUbscVt4th-Z9urpmAP0eOBi7b1HuBQESrw@mail.gmail.c> @ 2012-08-21 22:44 ` cinap_lenrek 2012-08-22 10:32 ` cinap_lenrek 3 siblings, 0 replies; 71+ messages in thread From: cinap_lenrek @ 2012-08-21 22:44 UTC (permalink / raw) To: 9fans excellent! :) -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 18:32 ` erik quanstrom ` (2 preceding siblings ...) 2012-08-21 22:44 ` cinap_lenrek @ 2012-08-22 10:32 ` cinap_lenrek 2012-08-22 12:09 ` Kenji Arisawa 3 siblings, 1 reply; 71+ messages in thread From: cinap_lenrek @ 2012-08-22 10:32 UTC (permalink / raw) To: 9fans wait, Maxdest is a count, see: dest = emalloc(Maxdest * sizeof *dest); /* dest can't be on stack */ code like: Dest *curdest; /* pointer to next to fill */ p = dp->dest; ... if (qp->ndest > qp->curdest - p) { and for(p = qp->dest; p < qp->curdest; p++) indicates that dp->curdest is an end pointer. so it should be perfectly valid for it to point at &dp->dest[Maxdest]. the check at the top of serveraddrs() should really be: if(nd >= Maxdest) /* dest array is full? */ return Maxdest; serveraddr() really returns a count. which is the same as an end pointer index. the result check of that serveraddr() call should really be: if (j < 0 || j > Maxdest) { dnslog("serveraddrs() result %d out of range", j); abort(); } qp->curdest = &qp->dest[j]; and the destck(dp->curdest); should be removed. --- a/sys/src/cmd/ndb/dnresolve.c Wed Aug 22 00:11:42 2012 +0200 +++ b/sys/src/cmd/ndb/dnresolve.c Wed Aug 22 12:28:34 2012 +0200 @@ -832,7 +832,7 @@ Dest *cur; if(nd >= Maxdest) /* dest array is full? */ - return Maxdest - 1; + return Maxdest; /* * look for a server whose address we already know. @@ -1080,13 +1080,12 @@ */ if (qp->ndest > qp->curdest - p) { j = serveraddrs(qp, qp->curdest - p, depth); - if (j < 0 || j >= Maxdest) { + if (j < 0 || j > Maxdest) { dnslog("serveraddrs() result %d out of range", j); abort(); } qp->curdest = &qp->dest[j]; } - destck(qp->curdest); /* no servers, punt */ if (qp->ndest == 0) -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 10:32 ` cinap_lenrek @ 2012-08-22 12:09 ` Kenji Arisawa 2012-08-22 12:35 ` cinap_lenrek 0 siblings, 1 reply; 71+ messages in thread From: Kenji Arisawa @ 2012-08-22 12:09 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 1139 bytes --] Hello, On 2012/08/22, at 19:32, cinap_lenrek@gmx.de wrote: > the result check of that serveraddr() call should really be: > > if (j < 0 || j > Maxdest) { > dnslog("serveraddrs() result %d out of range", j); > abort(); > } > qp->curdest = &qp->dest[j]; what happens if j == Maxdest ? note that j is index. I rather notice the foolowing code. /* use any addresses that we found */ for(trp = arp; trp && nd < Maxdest; trp = trp->next){ cur = &qp->dest[nd]; parseip(cur->a, trp->ip->name); /* * straddling servers can reject all nameservers if they are all * inside, so be sure to list at least one outside ns at * the end of the ns list in /lib/ndb for `dom='. */ if (ipisbm(cur->a) || cfg.straddle && !insideaddr(qp->dp->name) && insidens(cur->a)) continue; cur->nx = 0; cur->s = trp->owner; cur->code = Rtimeout; nd++; } lock(&dnlock); rrfreelist(arp); unlock(&dnlock); return nd; returned value may be Maxdest. This code is in function serveraddrs(), and the function must return index. (must not be Maxdest) Kenji Arisawa [-- Attachment #2: Type: text/html, Size: 4691 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 12:09 ` Kenji Arisawa @ 2012-08-22 12:35 ` cinap_lenrek 2012-08-22 13:05 ` erik quanstrom 2012-08-22 13:23 ` Lucio De Re 0 siblings, 2 replies; 71+ messages in thread From: cinap_lenrek @ 2012-08-22 12:35 UTC (permalink / raw) To: 9fans look closely at: > qp->curdest = &qp->dest[j]; this just takes the address, it doesnt actually dereference anything. you could write this as qp->curdest = qp->dest + j. that destck(qp->curdest); call and the j >= Maxdest check after the serveraddrs() call seem like a confused debug attempt to me. or is here any reason why serveraddrs() should *not* fill the whole array up to the last element and stop at Maxdest-2? look how qp->curdest is used in the code. it is a pointer pointing *past* the last filled entry (or represents the next entry to be filled). if the whole array is filled, it should point at qp->dest + Maxdest (and serveraddrs even checked exactly for that). except that bogus destck(), dp->curdest is never dereferenced in the code and just used in loops as the end marker with the p < dp->curdest or p != dp->curdest condition. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 12:35 ` cinap_lenrek @ 2012-08-22 13:05 ` erik quanstrom 2012-08-22 13:08 ` erik quanstrom ` (2 more replies) 2012-08-22 13:23 ` Lucio De Re 1 sibling, 3 replies; 71+ messages in thread From: erik quanstrom @ 2012-08-22 13:05 UTC (permalink / raw) To: 9fans On Wed Aug 22 08:38:51 EDT 2012, cinap_lenrek@gmx.de wrote: > look closely at: > > > qp->curdest = &qp->dest[j]; > > this just takes the address, it doesnt actually dereference anything. you > could write this as qp->curdest = qp->dest + j. > > that destck(qp->curdest); call and the j >= Maxdest check after the > serveraddrs() call seem like a confused debug attempt to me. or is here > any reason why serveraddrs() should *not* fill the whole array up > to the last element and stop at Maxdest-2? it will stop at setting qp->curdest = &qp->dest[Maxdest-1], since the maximum return value is the loop limit + 1. since qb->curdest is indirected, for example by setdestoutns, this must be a valid pointer to remain safe. and further, after failing to get an answer from 24 different servers, it's unlikely that the 25th will trying 24 different servers for the same address is already i think it should be smaller. but that's just me. :-). i think we can all agree that this code is hard to read. - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 13:05 ` erik quanstrom @ 2012-08-22 13:08 ` erik quanstrom 2012-08-22 13:11 ` Charles Forsyth [not found] ` <CAOw7k5gGhL-=E=V1C9ffQP7qMyUVMhtrgpzLMidnCsHFGDzHwA@mail.gmail.c> 2 siblings, 0 replies; 71+ messages in thread From: erik quanstrom @ 2012-08-22 13:08 UTC (permalink / raw) To: 9fans > and further, after failing to get an answer from 24 different servers, it's unlikely > that the 25th will trying 24 different servers for the same address is already > i think it should be smaller. but that's just me. :-). good grief. what that was supposed to say was "the 25th is unlikely to be helpful if the first 24 fail. 24 already seems excessive." - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 13:05 ` erik quanstrom 2012-08-22 13:08 ` erik quanstrom @ 2012-08-22 13:11 ` Charles Forsyth [not found] ` <CAOw7k5gGhL-=E=V1C9ffQP7qMyUVMhtrgpzLMidnCsHFGDzHwA@mail.gmail.c> 2 siblings, 0 replies; 71+ messages in thread From: Charles Forsyth @ 2012-08-22 13:11 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 338 bytes --] for(n = 0; n < Maxdest; n++, qp->curdest++) if (setdestoutns(qp->curdest, n) < 0) On 22 August 2012 09:05, erik quanstrom <quanstro@quanstro.net> wrote: > since qb->curdest is indirected, for example by setdestoutns, this must be > a valid pointer to remain safe. > only when the pointer is valid. I think cinap is right. [-- Attachment #2: Type: text/html, Size: 794 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
[parent not found: <CAOw7k5gGhL-=E=V1C9ffQP7qMyUVMhtrgpzLMidnCsHFGDzHwA@mail.gmail.c>]
* Re: [9fans] dns [not found] ` <CAOw7k5gGhL-=E=V1C9ffQP7qMyUVMhtrgpzLMidnCsHFGDzHwA@mail.gmail.c> @ 2012-08-22 13:18 ` erik quanstrom 2012-08-22 13:21 ` Charles Forsyth 2012-08-22 13:31 ` cinap_lenrek 0 siblings, 2 replies; 71+ messages in thread From: erik quanstrom @ 2012-08-22 13:18 UTC (permalink / raw) To: 9fans On Wed Aug 22 09:12:24 EDT 2012, charles.forsyth@gmail.com wrote: > for(n = 0; n < Maxdest; n++, qp->curdest++) > if (setdestoutns(qp->curdest, n) < 0) > > > On 22 August 2012 09:05, erik quanstrom <quanstro@quanstro.net> wrote: > > > since qb->curdest is indirected, for example by setdestoutns, this must be > > a valid pointer to remain safe. > > > > only when the pointer is valid. I think cinap is right. i don't think so. destck will still indirect it: if (qp->ndest > qp->curdest - p) { [...] qp->curdest = &qp->dest[j]; destsum(qp, j); } >> destck(qp->curdest); the definition if destck: static void destck(Dest *p) { assert(p); assert(p->magic == Destmagic); } you can argue that this is a mistake, but i'm sure there are others. so imho, this is not worth fixing. dns needs replacing. - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 13:18 ` erik quanstrom @ 2012-08-22 13:21 ` Charles Forsyth 2012-08-22 13:31 ` cinap_lenrek 1 sibling, 0 replies; 71+ messages in thread From: Charles Forsyth @ 2012-08-22 13:21 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 197 bytes --] that's why cinap removed that line in his diff. On 22 August 2012 09:18, erik quanstrom <quanstro@quanstro.net> wrote: > you can argue that this is a mistake, but i'm sure there are others. [-- Attachment #2: Type: text/html, Size: 447 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 13:18 ` erik quanstrom 2012-08-22 13:21 ` Charles Forsyth @ 2012-08-22 13:31 ` cinap_lenrek 2012-08-22 14:22 ` erik quanstrom 1 sibling, 1 reply; 71+ messages in thread From: cinap_lenrek @ 2012-08-22 13:31 UTC (permalink / raw) To: 9fans i just said it like 2 times already that i think this destck() and the result serveraddrs() check is bogus and is the source of all this confustion. the "curdest" is handled as an end pointer in the rest of the code. its where you start adding stuff, and after writing it, you increment it to the next element. so it always points past the last valid element. this is just idiomatic c code. serveraddrs() should take a start index (current count) and return a count. and at the beginning check if the start index is already the maximum size of the dest array. this makes the most sense and is not hard to understand. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 13:31 ` cinap_lenrek @ 2012-08-22 14:22 ` erik quanstrom 2012-08-22 14:41 ` Charles Forsyth 2012-08-22 14:47 ` cinap_lenrek 0 siblings, 2 replies; 71+ messages in thread From: erik quanstrom @ 2012-08-22 14:22 UTC (permalink / raw) To: 9fans On Wed Aug 22 09:34:24 EDT 2012, cinap_lenrek@gmx.de wrote: > i just said it like 2 times already that i think this destck() and the > result serveraddrs() check is bogus and is the source of all this > confustion. > > the "curdest" is handled as an end pointer in the rest of the code. doesn't this code from procansw also treat qp->curdest as something other than an end pointer? if(p != qp->curdest) p->code = Rserver; since, if qp->curdest is an end pointer, then p!=qp->curdest ≡ 1. - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 14:22 ` erik quanstrom @ 2012-08-22 14:41 ` Charles Forsyth 2012-08-22 14:47 ` cinap_lenrek 1 sibling, 0 replies; 71+ messages in thread From: Charles Forsyth @ 2012-08-22 14:41 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 177 bytes --] No, see the call to procansw On 22 August 2012 10:22, erik quanstrom <quanstro@quanstro.net> wrote: > since, if qp->curdest is an end pointer, then p!=qp->curdest ≡ 1. [-- Attachment #2: Type: text/html, Size: 425 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 14:22 ` erik quanstrom 2012-08-22 14:41 ` Charles Forsyth @ 2012-08-22 14:47 ` cinap_lenrek 2012-08-22 15:26 ` erik quanstrom ` (2 more replies) 1 sibling, 3 replies; 71+ messages in thread From: cinap_lenrek @ 2012-08-22 14:47 UTC (permalink / raw) To: 9fans this is interesting. the p != qp->curdest check would just support my point because it effectively checks if p is valid. if p would be at qp->curdest, it would be past the last valid entry and hence invalid so its not written. but theres another thing, look in queryns() how p comes to be: /* find responder */ // dnslog("queryns got reply from %I", srcip); for(p = qp->dest; p < qp->curdest; p++) if(memcmp(p->a, srcip, sizeof p->a) == 0) break; for(np = qp->dest; np < qp->curdest; np++) if(np->s == p->s) <-- oops, p might be qp->curdest here p->nx = Maxtrans; <-- fuck! rv = procansw(qp, &m, srcip, depth, p); i think we also need to check p != qp->curdest before that 2nd for loop or it would trash the entry at curdist. @@ -1439,9 +1438,10 @@ break; /* remove all addrs of responding server from list */ - for(np = qp->dest; np < qp->curdest; np++) - if(np->s == p->s) - p->nx = Maxtrans; + if(p != qp->curdest) + for(np = qp->dest; np < qp->curdest; np++) + if(np->s == p->s) + p->nx = Maxtrans; /* free or incorporate RRs in m */ rv = procansw(qp, &m, srcip, depth, p); -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 14:47 ` cinap_lenrek @ 2012-08-22 15:26 ` erik quanstrom 2012-08-22 15:33 ` erik quanstrom [not found] ` <CAOw7k5jOE0Bf13fXJ04dUF4vGniRTJ1LwEMV1Oqw=AFKg77boA@mail.gmail.com> 2 siblings, 0 replies; 71+ messages in thread From: erik quanstrom @ 2012-08-22 15:26 UTC (permalink / raw) To: 9fans On Wed Aug 22 10:51:02 EDT 2012, cinap_lenrek@gmx.de wrote: > this is interesting. the p != qp->curdest check would just > support my point because it effectively checks if p is valid. > if p would be at qp->curdest, it would be past the last valid > entry and hence invalid so its not written. if that's true, then this could be some of the most convoluted code we've got. - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 14:47 ` cinap_lenrek 2012-08-22 15:26 ` erik quanstrom @ 2012-08-22 15:33 ` erik quanstrom 2012-08-22 15:35 ` erik quanstrom [not found] ` <CAOw7k5jOE0Bf13fXJ04dUF4vGniRTJ1LwEMV1Oqw=AFKg77boA@mail.gmail.com> 2 siblings, 1 reply; 71+ messages in thread From: erik quanstrom @ 2012-08-22 15:33 UTC (permalink / raw) To: 9fans [-- Attachment #1: Type: text/plain, Size: 224 bytes --] here's my copy. it probablly won't work if *ncpu>1 on the standard system because vmap's no up to the task. i didn't re-verify that all the vmap() wiggles are necessary with nix; they are with the pae kernel. - erik [-- Attachment #2: realmode.c --] [-- Type: text/plain, Size: 1637 bytes --] #include "u.h" #include "../port/lib.h" #include "mem.h" #include "dat.h" #include "fns.h" #include "io.h" #include "ureg.h" #include "../port/error.h" #include "amd64.h" /* * we depend on aux/realemu to emulate realmode calls. */ enum { LORMBUF = 0x9000, }; static ulong wtab[] = { LORMBUF, LORMBUF+PGSZ, /* realmode buffer page */ 0xA0000, 0xC0000, /* mda/vga range */ }; static void* lomap(uint off) { static uintptr va; #define O PGSZ if(va == 0){ va = PTR2UINT(vmappat(O, 0x100000 - O, PATWB)); if(va == 0) error("can't map lo memory"); } return UINT2PTR(va+off - O); } static long rmemrw(int isr, void *va, long n, vlong off) { uchar *a; int i; a = va; if(off >= 0x100000) return 0; if(off < 0 || n < 0) error("bad offset/count"); if(off+n > 0x100000) n = 0x100000 - off; if(isr){ if(off == 0){ if(n >= PGSZ) memmove(a, KADDR(0), PGSZ); else memmove(a, KADDR(0), n); n -= PGSZ; off += PGSZ; a += PGSZ; } if(n > 0) memmove(a, lomap((ulong)off), n); }else{ /* writes are more restricted */ for(i = 0;; i += 2) if(i == nelem(wtab)) error("bad offset/count in write"); else if(off >= wtab[i] && off+n <= wtab[i+1]){ memmove(lomap((ulong)off), a, n); break; } } return n; } static long rmemread(Chan*, void *a, long n, vlong off) { return rmemrw(1, a, n, off); } static long rmemwrite(Chan*, void *a, long n, vlong off) { return rmemrw(0, a, n, off); } void realmodelink(void) { addarchfile("realmodemem", 0660, rmemread, rmemwrite); } ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 15:33 ` erik quanstrom @ 2012-08-22 15:35 ` erik quanstrom 0 siblings, 0 replies; 71+ messages in thread From: erik quanstrom @ 2012-08-22 15:35 UTC (permalink / raw) To: 9fans On Wed Aug 22 11:34:39 EDT 2012, quanstro@quanstro.net wrote: > here's my copy. it probablly won't work if *ncpu>1 on the standard > system because vmap's no up to the task. > > i didn't re-verify that all the vmap() wiggles are necessary with > nix; they are with the pae kernel. > > - erik sorry, fat fingered the message #. wrong list. so far nothing's gone right today! - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
[parent not found: <CAOw7k5jOE0Bf13fXJ04dUF4vGniRTJ1LwEMV1Oqw=AFKg77boA@mail.gmail.com>]
* Re: [9fans] dns [not found] ` <CAOw7k5jOE0Bf13fXJ04dUF4vGniRTJ1LwEMV1Oqw=AFKg77boA@mail.gmail.com> @ 2012-08-22 16:05 ` Charles Forsyth 0 siblings, 0 replies; 71+ messages in thread From: Charles Forsyth @ 2012-08-22 16:05 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 355 bytes --] I sent this to cinap earlier, who concurred. On 22 August 2012 11:12, Charles Forsyth <charles.forsyth@gmail.com> wrote: > Given the comment > > /* remove all addrs of responding server from list */ > for(np = qp->dest; np < qp->curdest; np++) > if(np->s == p->s) > p->nx = Maxtrans; > > I wonder whether np->nx = Maxtrans was meant. > > [-- Attachment #2: Type: text/html, Size: 926 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-22 12:35 ` cinap_lenrek 2012-08-22 13:05 ` erik quanstrom @ 2012-08-22 13:23 ` Lucio De Re 1 sibling, 0 replies; 71+ messages in thread From: Lucio De Re @ 2012-08-22 13:23 UTC (permalink / raw) To: 9fans > or is here > any reason why serveraddrs() should *not* fill the whole array up > to the last element and stop at Maxdest-2? I'm just guessing here, but Kenji pointed to the need to add a server address in the case of a straddling server. Would that perhaps be a factor here? ++L ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 11:27 ` cinap_lenrek ` (2 preceding siblings ...) 2012-08-21 18:32 ` erik quanstrom @ 2012-08-25 0:10 ` Kenji Arisawa 2012-08-25 10:54 ` cinap_lenrek 2012-08-25 21:56 ` cinap_lenrek 3 siblings, 2 replies; 71+ messages in thread From: Kenji Arisawa @ 2012-08-25 0:10 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello cinap, I got a one. I hope this a helpful. ar% cat broken/1345779846.41356 name=dns /proc/41356/text:386 plan 9 executable /sys/lib/acid/port /sys/lib/acid/386 acid: abort()+0x0 /sys/src/libc/9sys/abort.c:6 ppanic(p=0x3975c,fmt=0x394ec)+0x146 /sys/src/libc/port/malloc.c:166 pv=0x3e820 msg=0x3f804 v=0xdfffc800 n=0x2c D2B(p=0x3975c,v=0x497f8)+0x5a /sys/src/libc/port/pool.c:968 a=0x497f0 poolfreel(v=0x497f8,p=0x3975c)+0x20 /sys/src/libc/port/pool.c:1192 ab=0x3e820 poolfree(p=0x3975c,v=0x497f8)+0x41 /sys/src/libc/port/pool.c:1327 free(v=0x49800)+0x23 /sys/src/libc/port/malloc.c:250 mydnsquery(qp=0x88cf0,udppkt=0xc76f0,len=0x2a,medium=0x1)+0x185 /sys/src/cmd/ndb/dnresolve.c:1032 rv=0xc domain=0x49800 net=0x74656e2f conndir=0x74656e2f nci=0x52b59 belen=0x6e2f000f xmitquery(qp=0x88cf0,depth=0x1,medium=0x1,inns=0x1,obuf=0xc76f0,len=0x2a)+0x227 /sys/src/cmd/ndb/dnresolve.c:1114 p=0xc7950 j=0x1 n=0x0 buf=0x1b59c4c3 tcpquery(qp=0x88cf0,waitms=0x63f,obuf=0xc76f0,ibuf=0xa7530,depth=0x1,inns=0x1,len=0x2a,req=0x1d85,mp=0xdfffc9b4)+0xea /sys/src/cmd/ndb/dnresolve.c:1353 rv=0x0 endms=0x56ba1ef1 queryns(qp=0x88cf0,obuf=0xc76f0,depth=0x1,inns=0x1,waitms=0x63f,ibuf=0xa7530)+0x4d3 /sys/src/cmd/ndb/dnresolve.c:1428 req=0xa9961d85 len=0x2a dest=0xc7950 p=0xc7c30 ndest=0x1 endms=0x56ba1dcc replywaits=0x0 buf=0x9dfa996 m=0x1d85 srcip=0xdfffca18 rv=0x9dfa996 udpquery(mntpt=0x3f0e0,qp=0x88cf0,patient=0x0,depth=0x1,inns=0x1)+0x1b7 /sys/src/cmd/ndb/dnresolve.c:1578 ibuf=0xa7530 obuf=0xc76f0 fd=0xb msg=0x6faa pcntprob=0x3c reqtm=0x1f40 wait=0x63f rv=0x87710 netquery(depth=0x1,qp=0x88cf0)+0x2b5 /sys/src/cmd/ndb/dnresolve.c:1660 rv=0x0 dp=0x6d460 qlp=0x6d4fc lock=0x1 buf=0x3975c triedin=0x0 inname=0x1 netqueryns(qp=0x88cf0,nsrp=0x876b0,depth=0x1)+0x1e /sys/src/cmd/ndb/dnresolve.c:338 rv=0x88ce8 issuequery(class=0x1,qp=0x88cf0,depth=0x0,name=0xdfffce13,recurse=0x0)+0x50 /sys/src/cmd/ndb/dnresolve.c:359 nsrp=0x876b0 cp=0x88cf0 dbnsrp=0x8558 rp=0x0 dnresolve1(name=0xdfffce13,type=0xf,class=0x1,req=0xdfffcdd8,depth=0x0,recurse=0x0)+0x25c /sys/src/cmd/ndb/dnresolve.c:505 dp=0x6d460 rp=0x0 qp=0x88cf0 dnresolve(status=0xdfffcce0,depth=0x0,rooted=0x0,name=0xdfffce13,class=0x1,type=0xf,req=0xdfffcdd8,cn=0x0,recurse=0x0)+0xa8 /sys/src/cmd/ndb/dnresolve.c:198 procname=0x9cb50 rp=0x0 drp=0x71a98 nrp=0x9cb40 nname=0x48 dp=0xdfffcca8 loops=0x9cb90 lookupqueryold(p=0xdfffce13,mf=0xbac50,req=0xdfffcdd8,rooted=0x0,job=0xba810,errbuf=0xdfffcd0c,wantsav=0x0)+0x70 /sys/src/cmd/ndb/dns.c:864 status=0x0 rp=0x9cb48 rwrite(job=0xba810,mf=0xbac50,req=0xdfffcdd8)+0x2be /sys/src/cmd/ndb/dns.c:838 err=0x0 cnt=0x1b send=0x0 errbuf=0x0 atype=0xdfffce2c io()+0x39e /sys/src/cmd/ndb/dns.c:532 req=0x1 mdata=0x32 n=0x32 job=0xba810 mf=0xbac50 main(argv=0xdfffefb0,argc=0x0)+0x32c /sys/src/cmd/ndb/dns.c:267 ext=0x0 _argc=0x72 _args=0xdfffefc7 servefile=0x642f7323 dir=0x0 kid=0x0 _main+0x31 /sys/src/libc/386/main9.s:16 acid: echo kill > /proc/41356/ctl ar% Kenji Arisawa On 2012/08/21, at 20:27, cinap_lenrek@gmx.de wrote: > nothing wrong with diffing the changes and see if theres a clue, but > to solve this one really needs to find the underlying cause no matter > what. changes can just hide bugs or make them more or less likely to > appear. can anyone provide at least a stacktrace or process snapshot > of the crashed dns processes? from that you try to build a theory of > what might be going wrong by thinking really really hard... (the > thinking should be directly proportional to the time it takes to > reproduce the bug) and then you work on how to prove that theory. > just changing stuff without knowing what exactly was the problem with > the old code is sometimes tempting, but wrong and dangerous. > > -- > cinap > ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-25 0:10 ` Kenji Arisawa @ 2012-08-25 10:54 ` cinap_lenrek 2012-08-25 12:37 ` Kenji Arisawa 2012-08-25 21:56 ` cinap_lenrek 1 sibling, 1 reply; 71+ messages in thread From: cinap_lenrek @ 2012-08-25 10:54 UTC (permalink / raw) To: 9fans always make a process snapshot as the kernel might discard your broken process once it runs low on memory so you have time to debug: snap 41356 >/tmp/dns.snap char *domain strings alloc header seems to have been corrupted (or just freed by accident?). the string just gets allocated and freed in mydnsquery() so its unlikely a bug there. someone else has corrupted its alloc header? it looks more like corruption as we dont hand this pointer out to someone else but netmkaddr(). look at the raw data, often one can get a clue by what it got overridden with and try to figure out what the previous block before our block that got corrupted was. the pool allocator keeps the callerpc's of who allocated the block so you can use that to figure out what it is, or look at the contents. // dump the memory arround our corrupted block dump(0x497f8 - 0x10, 0x100, "X") maybe our block didnt got overridden but really freed with a call to free but with the wrong pointer? check the alloc magic! // check the contents, should be an ip address string dump(0x49800, 1, "s") run acid with -lpool -lleak and run blockdump() if its corrupted block, it might just stop at the block before our one and will print the allocpc's and give some diagnostics. i can try this if you provide process snapshot file. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-25 10:54 ` cinap_lenrek @ 2012-08-25 12:37 ` Kenji Arisawa 2012-08-25 13:22 ` cinap_lenrek 2012-08-27 22:44 ` arisawa 0 siblings, 2 replies; 71+ messages in thread From: Kenji Arisawa @ 2012-08-25 12:37 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello cinap, broken dns triggers Fauth problem, so I have rebooted. I will get snapshot at next crash. Kenji Arisawa On 2012/08/25, at 19:54, cinap_lenrek@gmx.de wrote: > always make a process snapshot as the kernel might discard > your broken process once it runs low on memory so you have > time to debug: > > snap 41356 >/tmp/dns.snap > > char *domain strings alloc header seems to have been corrupted > (or just freed by accident?). > > the string just gets allocated and freed in mydnsquery() so its > unlikely a bug there. someone else has corrupted its alloc header? > > it looks more like corruption as we dont hand this pointer out to > someone else but netmkaddr(). > > look at the raw data, often one can get a clue by what it got > overridden with and try to figure out what the previous block > before our block that got corrupted was. the pool allocator keeps > the callerpc's of who allocated the block so you can use that > to figure out what it is, or look at the contents. > > // dump the memory arround our corrupted block > dump(0x497f8 - 0x10, 0x100, "X") > > maybe our block didnt got overridden but really freed with > a call to free but with the wrong pointer? check the alloc > magic! > > // check the contents, should be an ip address string > dump(0x49800, 1, "s") > > run acid with -lpool -lleak and run blockdump() if its > corrupted block, it might just stop at the block before > our one and will print the allocpc's and give some > diagnostics. > > i can try this if you provide process snapshot file. > > -- > cinap > ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-25 12:37 ` Kenji Arisawa @ 2012-08-25 13:22 ` cinap_lenrek 2012-08-25 13:38 ` cinap_lenrek 2012-08-25 13:41 ` Charles Forsyth 2012-08-27 22:44 ` arisawa 1 sibling, 2 replies; 71+ messages in thread From: cinap_lenrek @ 2012-08-25 13:22 UTC (permalink / raw) To: 9fans did somebody notice that in dnresolve.c, netqueryns() frees the nsrp rr list? but in issuequery() it calls netqueryns() in a loop with the same nsrp pointer! -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-25 13:22 ` cinap_lenrek @ 2012-08-25 13:38 ` cinap_lenrek 2012-08-25 13:41 ` Charles Forsyth 1 sibling, 0 replies; 71+ messages in thread From: cinap_lenrek @ 2012-08-25 13:38 UTC (permalink / raw) To: 9fans no, sorry. missed that nsrp = nil, carry on... -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-25 13:22 ` cinap_lenrek 2012-08-25 13:38 ` cinap_lenrek @ 2012-08-25 13:41 ` Charles Forsyth 2012-08-25 13:44 ` cinap_lenrek 1 sibling, 1 reply; 71+ messages in thread From: Charles Forsyth @ 2012-08-25 13:41 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 337 bytes --] In my copy, which I'm sure I haven't changed, it gets a new nsrp from randomize(rrlookup... each time, or uses dbnsrp, which is randomize(dblookup..., so both are new and can be freed. On 25 August 2012 09:22, <cinap_lenrek@gmx.de> wrote: > but in issuequery() it calls netqueryns() in a loop with the same > nsrp pointer! > [-- Attachment #2: Type: text/html, Size: 615 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-25 13:41 ` Charles Forsyth @ 2012-08-25 13:44 ` cinap_lenrek 0 siblings, 0 replies; 71+ messages in thread From: cinap_lenrek @ 2012-08-25 13:44 UTC (permalink / raw) To: 9fans yes, missed that. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-25 12:37 ` Kenji Arisawa 2012-08-25 13:22 ` cinap_lenrek @ 2012-08-27 22:44 ` arisawa 2012-08-28 1:45 ` cinap_lenrek 1 sibling, 1 reply; 71+ messages in thread From: arisawa @ 2012-08-27 22:44 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello, I got a broken dns snapshot. You can download from: http://plan9.aichi-u.ac.jp/dns.snap.gz Kenji Arisawa ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-27 22:44 ` arisawa @ 2012-08-28 1:45 ` cinap_lenrek 2012-08-28 1:57 ` erik quanstrom 0 siblings, 1 reply; 71+ messages in thread From: cinap_lenrek @ 2012-08-28 1:45 UTC (permalink / raw) To: 9fans very good. thanks. one wired thing is that the string pointer (0xfb900) it tried to free (char *domain) points in the middle of the querylck array of a allocated DN. thats not a valid alloc block indeed. there migh'v been a block there, but it got accidently freed and then the space reused for that DN, or the pointer itself got corrupted (only possible from our current process as its stored on the stack which is private to our proc). theres a block (0xfb9a0) after it that satisfies the requirement of being the real thing (alloc callerpc is right after the smprint(), still valid ip address string). the char *domain pointer is stored on the stack at 0x74(SP). char conndir[40] starts at 0x4c(SP). 0x4c+40 = 0x74 so if dial overflows conndir (off by one error?) it could indeed trash that pointer overriding the lsb of char *domain resulting in bogus 0xfb900 address instead of 0xfb9a0. acid: dump(0xdfffc8b4, 44/4, "X") 0xdfffc8b4: 0x74656e2f <- char conndir[40] 0xdfffc8b8: 0x7063742f 0xdfffc8bc: 0x0000392f 0xdfffc8c0: 0x00000000 0xdfffc8c4: 0x00000000 0xdfffc8c8: 0x00000000 0xdfffc8cc: 0x00000000 0xdfffc8d0: 0x00000000 0xdfffc8d4: 0x00000000 0xdfffc8d8: 0x00000000 0xdfffc8dc: 0x000fb900 <- char *domain /sys/include/libc.h:480: #define NETPATHLEN 40 from sources /sys/src/libc/9sys/dial.c static int fillinds(DS *ds, Dest *dp) { Conn *conn; if (dp->winner < 0) return -1; conn = &dp->conn[dp->winner]; if (ds->cfdp) *ds->cfdp = conn->cfd; if (ds->dir) { strncpy(ds->dir, conn->dir, NETPATHLEN); ds->dir[NETPATHLEN] = '\0'; <---- fuck! } return conn->dfd; } this bug was introduced with the new parallel dial implementation. the old sequential dial doesnt have this bug so 9front systems are not affected. someone make a patch. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-28 1:45 ` cinap_lenrek @ 2012-08-28 1:57 ` erik quanstrom 2012-08-28 2:08 ` cinap_lenrek 0 siblings, 1 reply; 71+ messages in thread From: erik quanstrom @ 2012-08-28 1:57 UTC (permalink / raw) To: 9fans if (ds->dir) { > strncpy(ds->dir, conn->dir, NETPATHLEN); > ds->dir[NETPATHLEN] = '\0'; i think this is okay, since the definition is char dir[NETPATHLEN+1]; - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-28 1:57 ` erik quanstrom @ 2012-08-28 2:08 ` cinap_lenrek 2012-08-28 4:03 ` erik quanstrom 0 siblings, 1 reply; 71+ messages in thread From: cinap_lenrek @ 2012-08-28 2:08 UTC (permalink / raw) To: 9fans no. just look at all the call sites for announce() and dial(). ghost drivers! ghost drivers everywhere! -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-28 2:08 ` cinap_lenrek @ 2012-08-28 4:03 ` erik quanstrom 2012-09-08 17:37 ` Skip Tavakkolian 0 siblings, 1 reply; 71+ messages in thread From: erik quanstrom @ 2012-08-28 4:03 UTC (permalink / raw) To: 9fans On Mon Aug 27 22:11:14 EDT 2012, cinap_lenrek@gmx.de wrote: > no. > > just look at all the call sites for announce() and dial(). ah, you're right about dial. i misread that. i incorrectly considered the Conn and not the DS. both dial and announce could use a parameter declaring the size of the buffer. - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-28 4:03 ` erik quanstrom @ 2012-09-08 17:37 ` Skip Tavakkolian 2012-09-08 18:02 ` cinap_lenrek 2012-09-08 18:18 ` cinap_lenrek 0 siblings, 2 replies; 71+ messages in thread From: Skip Tavakkolian @ 2012-09-08 17:37 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs I'm seeing dns panics about once a day. It was built from sources that were updated about a week ago from sources. it's a double free error. any ideas? cpue% acid 7196 /proc/7196/text:386 plan 9 executable /sys/lib/acid/port /sys/lib/acid/386 acid: stk() abort()+0x0 /sys/src/libc/9sys/abort.c:6 ppanic(p=0x3975c,fmt=0x394ec)+0x146 /sys/src/libc/port/malloc.c:166 D2B(p=0x3975c,v=0xa71f8)+0x5a /sys/src/libc/port/pool.c:968 poolfreel(v=0xa71f8,p=0x3975c)+0x20 /sys/src/libc/port/pool.c:1192 poolfree(p=0x3975c,v=0xa71f8)+0x41 /sys/src/libc/port/pool.c:1327 free(v=0xa7200)+0x23 /sys/src/libc/port/malloc.c:250 mydnsquery(qp=0x1b48e0,udppkt=0x1906c0,len=0x24,medium=0x1)+0x185 /sys/src/cmd/ndb/dnresolve.c:1032 xmitquery(qp=0x1b48e0,depth=0x1,medium=0x1,inns=0x1,obuf=0x1906c0,len=0x24)+0x227 /sys/src/cmd/ndb/dnresolve.c:1114 tcpquery(qp=0x1b48e0,waitms=0x9e4,obuf=0x1906c0,ibuf=0x21db20,depth=0x1,inns=0x1,len=0x24,req=0x4486,mp=0xdfffe36c)+0xea /sys/src/cmd/ndb/dnresolve.c:1353 queryns(qp=0x1b48e0,obuf=0x1906c0,depth=0x1,inns=0x1,waitms=0x9e4,ibuf=0x21db20)+0x4d3 /sys/src/cmd/ndb/dnresolve.c:1428 udpquery(mntpt=0x3f0e0,qp=0x1b48e0,patient=0x0,depth=0x1,inns=0x1)+0x1b7 /sys/src/cmd/ndb/dnresolve.c:1578 netquery(depth=0x1,qp=0x1b48e0)+0x2b5 /sys/src/cmd/ndb/dnresolve.c:1660 netqueryns(qp=0x1b48e0,nsrp=0x21da60,depth=0x1)+0x1e /sys/src/cmd/ndb/dnresolve.c:338 issuequery(class=0x1,qp=0x1b48e0,depth=0x0,name=0x18ee00,recurse=0x0)+0x1e9 /sys/src/cmd/ndb/dnresolve.c:413 dnresolve1(name=0x18ee00,type=0x1,class=0x1,req=0xdfffe740,depth=0x0,recurse=0x0)+0x25c /sys/src/cmd/ndb/dnresolve.c:505 dnresolve(status=0x0,depth=0x0,rooted=0x1,name=0x18ee00,class=0x1,type=0x1,req=0xdfffe740,cn=0xdfffe774,recurse=0x0)+0xa8 /sys/src/cmd/ndb/dnresolve.c:198 doextquery(mp=0xdfffe760,req=0xdfffe740,recurse=0x0)+0x64 /sys/src/cmd/ndb/dnserver.c:186 dnserver(repp=0xdfffe760,reqp=0xdfffe788,rcode=0x0,srcip=0xdfffe7b0,req=0xdfffe740)+0x1fa /sys/src/cmd/ndb/dnserver.c:83 dnudpserver(mntpt=0x3f0e0)+0x58b /sys/src/cmd/ndb/dnudpserver.c:255 main(argv=0xdfffefb0,argc=0x0)+0x319 /sys/src/cmd/ndb/dns.c:264 _main+0x31 /sys/src/libc/386/main9.s:16 acid: On Mon, Aug 27, 2012 at 9:03 PM, erik quanstrom <quanstro@quanstro.net> wrote: > On Mon Aug 27 22:11:14 EDT 2012, cinap_lenrek@gmx.de wrote: >> no. >> >> just look at all the call sites for announce() and dial(). > > ah, you're right about dial. i misread that. i incorrectly considered > the Conn and not the DS. both dial and announce could use a parameter > declaring the size of the buffer. > > - erik > ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-09-08 17:37 ` Skip Tavakkolian @ 2012-09-08 18:02 ` cinap_lenrek 2012-09-08 18:18 ` cinap_lenrek 1 sibling, 0 replies; 71+ messages in thread From: cinap_lenrek @ 2012-09-08 18:02 UTC (permalink / raw) To: 9fans look at: free(v=0xa7200)+0x23 /sys/src/libc/port/malloc.c:250 notice the lsb is 00 exactly like in the case i solved here: http://9fans.net/archive/2012/08/249 i hate to repeat myself all the time. geoff needs to fix dial(). -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-09-08 17:37 ` Skip Tavakkolian 2012-09-08 18:02 ` cinap_lenrek @ 2012-09-08 18:18 ` cinap_lenrek 2012-09-08 21:53 ` Skip Tavakkolian 1 sibling, 1 reply; 71+ messages in thread From: cinap_lenrek @ 2012-09-08 18:18 UTC (permalink / raw) To: 9fans look at this: free(v=0xa7200)+0x23 /sys/src/libc/port/malloc.c:250 thats exactly what happend here: http://9fans.net/archive/2012/08/249 its not a double free. dial() corrupts your stack. geoff needs to fix fial(). -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-09-08 18:18 ` cinap_lenrek @ 2012-09-08 21:53 ` Skip Tavakkolian 2012-09-08 22:27 ` cinap_lenrek 0 siblings, 1 reply; 71+ messages in thread From: Skip Tavakkolian @ 2012-09-08 21:53 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs I there a patch? it's not obvious from the archive posting what should be changed. -Skip On Sat, Sep 8, 2012 at 11:18 AM, <cinap_lenrek@gmx.de> wrote: > look at this: > > free(v=0xa7200)+0x23 /sys/src/libc/port/malloc.c:250 > > thats exactly what happend here: > > http://9fans.net/archive/2012/08/249 > > its not a double free. dial() corrupts your stack. geoff needs > to fix fial(). > > -- > cinap > ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-09-08 21:53 ` Skip Tavakkolian @ 2012-09-08 22:27 ` cinap_lenrek 2012-09-09 2:09 ` Charles Forsyth 0 siblings, 1 reply; 71+ messages in thread From: cinap_lenrek @ 2012-09-08 22:27 UTC (permalink / raw) To: 9fans its pretty simple, just look where i made the <--- arrow in my original post. a patch would might look like this: static int fillinds(DS *ds, Dest *dp) { Conn *conn; if (dp->winner < 0) return -1; conn = &dp->conn[dp->winner]; if (dp->cfdp) *ds->cfdp = conn->cfd; if (ds->dir) { - strncpy(ds->dir, conn->dir, NETPATHLEN); - ds->dir[NETPATHLEN] = '\0'; + strncpy(ds->dir, conn->dir, NETPATHLEN-1); + ds->dir[NETPATHLEN-1] = '\0'; } return conn->dfd; } to be clear, everyone seems to get confused with conn->dir vs ds->dir. conn->dir has NETPATHLEN+1 capacity (why? makes no sense..). theres no overflow at conn->dir. but ds->dir is a pointer to the connection dir string passed by the caller of dial(). this buffer is just 40 (NETPATHLEN) bytes long (thats its required minimum size), so doing: ds->dir[NETPATHLEN] = '\0'; will write beyond it. theres no patch yet. geoff is notified of the issue. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-09-08 22:27 ` cinap_lenrek @ 2012-09-09 2:09 ` Charles Forsyth 2012-09-09 2:37 ` cinap_lenrek 0 siblings, 1 reply; 71+ messages in thread From: Charles Forsyth @ 2012-09-09 2:09 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 592 bytes --] it makes perfect sense: if the length of the path is NETPATHLEN, length of a string usually referring to non-zero bytes, you need to allow for the terminating zero byte. unfortunately, as you describe, it's a bit pointless because the dnsresolve callers rely on the manual page ("The path name is guaranteed to be less than 40 bytes long") and use an explicit constant 40 (others use something random, or know to use the undocumented NETPATHLEN). confusion! On 8 September 2012 23:27, <cinap_lenrek@gmx.de> wrote: > conn->dir has NETPATHLEN+1 capacity (why? makes no sense..). [-- Attachment #2: Type: text/html, Size: 894 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-09-09 2:09 ` Charles Forsyth @ 2012-09-09 2:37 ` cinap_lenrek 2012-09-09 3:01 ` Charles Forsyth [not found] ` <CAOw7k5jydN7wssxBEqTc6h9FtLErjkoyxyfuQZLgcBTZ-nRgiw@mail.gmail.c> 0 siblings, 2 replies; 71+ messages in thread From: cinap_lenrek @ 2012-09-09 2:37 UTC (permalink / raw) To: charles.forsyth, 9fans ah, so NETPATHLEN was introduced later and is really refering to the number of non null characters? and there was never an official named constant for the size of the netdir buffers for dial/annouce? thats indeed confusing. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-09-09 2:37 ` cinap_lenrek @ 2012-09-09 3:01 ` Charles Forsyth [not found] ` <CAOw7k5jydN7wssxBEqTc6h9FtLErjkoyxyfuQZLgcBTZ-nRgiw@mail.gmail.c> 1 sibling, 0 replies; 71+ messages in thread From: Charles Forsyth @ 2012-09-09 3:01 UTC (permalink / raw) To: cinap_lenrek; +Cc: 9fans [-- Attachment #1: Type: text/plain, Size: 641 bytes --] No, I didn't explain it well. NETPATHLEN isn't named in the manual pages, so ndb/dns doesn't use it. On the other hand, dial does, but because it's ...LEN, it's an easy mistake to think of it as the length of a string, which wouldn't include the terminating zero byte, instead of the length of the underlying array. On 9 September 2012 03:37, <cinap_lenrek@gmx.de> wrote: > ah, so NETPATHLEN was introduced later and is really refering to > the number of non null characters? and there was never an official > named constant for the size of the netdir buffers for dial/annouce? > thats indeed confusing. > > -- > cinap > [-- Attachment #2: Type: text/html, Size: 960 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
[parent not found: <CAOw7k5jydN7wssxBEqTc6h9FtLErjkoyxyfuQZLgcBTZ-nRgiw@mail.gmail.c>]
* Re: [9fans] dns [not found] ` <CAOw7k5jydN7wssxBEqTc6h9FtLErjkoyxyfuQZLgcBTZ-nRgiw@mail.gmail.c> @ 2012-09-09 4:26 ` erik quanstrom 2012-09-09 5:28 ` Skip Tavakkolian 0 siblings, 1 reply; 71+ messages in thread From: erik quanstrom @ 2012-09-09 4:26 UTC (permalink / raw) To: 9fans On Sat Sep 8 23:02:25 EDT 2012, charles.forsyth@gmail.com wrote: > No, I didn't explain it well. NETPATHLEN isn't named in the manual > pages, so ndb/dns doesn't use it. On the other hand, dial does, but > because it's ...LEN, it's an easy mistake to think of it as the length > of a string, which wouldn't include the terminating zero byte, instead > of the length of the underlying array. indeed, a fair number of recent dns issues have been this sort of confusion. there is something in ndb/dns (at least for me) that induces confusion. - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-09-09 4:26 ` erik quanstrom @ 2012-09-09 5:28 ` Skip Tavakkolian 0 siblings, 0 replies; 71+ messages in thread From: Skip Tavakkolian @ 2012-09-09 5:28 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Thanks all. it's interesting that announce.c doesn't have the same problem: cpue% grep -n NETPATHLEN /sys/src/libc/*/*.[ch] /sys/src/libc/9sys/announce.c:73: strncpy(dir, buf, NETPATHLEN); /sys/src/libc/9sys/announce.c:74: dir[NETPATHLEN-1] = 0; /sys/src/libc/9sys/announce.c:124: strncpy(newdir, buf, NETPATHLEN); /sys/src/libc/9sys/announce.c:125: newdir[NETPATHLEN-1] = 0; /sys/src/libc/9sys/dial.c:48: char dir[NETPATHLEN+1]; /sys/src/libc/9sys/dial.c:229: strncpy(ds->dir, conn->dir, NETPATHLEN); /sys/src/libc/9sys/dcpuial.c:230: ds->dir[NETPATHLEN] = '\0'; /sys/src/libc/9sys/dial.c:459: snprint(conn->dir, NETPATHLEN, "%s/%s", cname, name); On Sat, Sep 8, 2012 at 9:26 PM, erik quanstrom <quanstro@quanstro.net> wrote: > On Sat Sep 8 23:02:25 EDT 2012, charles.forsyth@gmail.com wrote: > >> No, I didn't explain it well. NETPATHLEN isn't named in the manual >> pages, so ndb/dns doesn't use it. On the other hand, dial does, but >> because it's ...LEN, it's an easy mistake to think of it as the length >> of a string, which wouldn't include the terminating zero byte, instead >> of the length of the underlying array. > > indeed, a fair number of recent dns issues have been this sort of confusion. > there is something in ndb/dns (at least for me) that induces confusion. > > - erik > ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-25 0:10 ` Kenji Arisawa 2012-08-25 10:54 ` cinap_lenrek @ 2012-08-25 21:56 ` cinap_lenrek 2012-08-26 13:16 ` cinap_lenrek 1 sibling, 1 reply; 71+ messages in thread From: cinap_lenrek @ 2012-08-25 21:56 UTC (permalink / raw) To: 9fans looking for bugs, that mydnsquery() function uses netmkaddr(), which stores the resulting dialstring in a global static buffer. the tcplock in the query makes sure theres just one tcp dial going on *per query* but there could be multiple queries in parallel i think which might override that buffer and corrupt it. its probably harmless (dial might try to connect to the wrong machine) and not the cause of your crash tho. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-25 21:56 ` cinap_lenrek @ 2012-08-26 13:16 ` cinap_lenrek 2012-08-26 13:36 ` Charles Forsyth 0 siblings, 1 reply; 71+ messages in thread From: cinap_lenrek @ 2012-08-26 13:16 UTC (permalink / raw) To: 9fans anyone look at rrcopy() in dn.c, it seems it assumes that structures like rr->soa, rr->cert, rr->key, rr->sig and rr->null where embedded into the RR struct but this is not true. (porting bug?) when you look at rralloc(), you can see that these sub structures are indeed malloced so rrcopy needs to copy these too (like it does for the Tsrv case). this would explain the double frees one sees indeed. -- cinap ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-26 13:16 ` cinap_lenrek @ 2012-08-26 13:36 ` Charles Forsyth 0 siblings, 0 replies; 71+ messages in thread From: Charles Forsyth @ 2012-08-26 13:36 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 629 bytes --] It does. In fact, it's the Tsrv case that looks wrong (it leaks space already allocated by rralloc). nrp = rralloc(rp->type); ... case Tsoa: soa = nrp->soa; --- save space allocated by rralloc *nrp = *rp; --- destroy pointer by copying onto it nrp->soa = soa; --- restore pointer *nrp->soa = *rp->soa; ---- copy substructure. ... On 26 August 2012 09:16, <cinap_lenrek@gmx.de> wrote: > when you look at rralloc(), you can see that these sub structures are > indeed malloced so rrcopy needs to copy these too (like it does for the > Tsrv case). > [-- Attachment #2: Type: text/html, Size: 1433 bytes --] ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2012-08-21 1:51 Jeff Sickel 2012-08-21 1:59 ` erik quanstrom 2012-08-21 4:23 ` arisawa @ 2012-08-21 5:08 ` Benjamin Huntsman 2 siblings, 0 replies; 71+ messages in thread From: Benjamin Huntsman @ 2012-08-21 5:08 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs >Has anyone else been seeing their Plan 9 dns servers work for a little while and then stop responding? I had that, and other problems with it left and right. I probably had to restart DNS a few times per week. I was able to get it to support a local Windows Active Directory domain though, but it was painful, so I switched to BIND on FreeBSD. I'd love to switch back someday though... the Plan 9-based DNS server was much simpler to deal with from a config standpoint... -Ben ^ permalink raw reply [flat|nested] 71+ messages in thread
* [9fans] dns @ 2007-08-15 19:26 erik quanstrom 0 siblings, 0 replies; 71+ messages in thread From: erik quanstrom @ 2007-08-15 19:26 UTC (permalink / raw) To: 9fans the hack i posted yesterday only attacks a symptom. the real problem is queries like dennis 7141063 0:00 0:00 11148K Rendez dns [reading outside reply from 10.128.1.22 for 190.73-94-123.dyn.dsl.cantv.net ip] which for us is bogus, but could be valid in some cases, causes an infinite lookup loop. trying to make a special case for unroutable addresses is probablly the wrong route. (you might have an internal dns server or two on an unrouteable address.) i think, in fact, that any lookup on a set of nameservers that are all unavailable will never complete. - erik p.s. it turns out that on most networks, ptr queries on *.168.192.in-addr.arpa work just fine returning a negative rcode. for us, our provider is dropping these packets. the ns is for 168.192.in-addr.arpa is 192.175.x.y. i suspect that someone's routing table has has a 192.168/12 instead of a /16. ^ permalink raw reply [flat|nested] 71+ messages in thread
* [9fans] dns @ 2007-07-03 20:54 erik quanstrom 2007-07-04 20:51 ` geoff 0 siblings, 1 reply; 71+ messages in thread From: erik quanstrom @ 2007-07-03 20:54 UTC (permalink / raw) To: 9fans i realize this is apolitical and offtopic. i apologize in advance. geoff's improvements in dns are really quite nice. dns appears do a good job in the face of well-behaved servers, but there are some ill behaved ones for popular sites that give me occasional fits. for example, www.apple.com decided to disappear last evening. i looked into the problem and nserver*.apple.com were not reachable. and the reason the local nameserver didn't have www.apple.com cached is that the ttl of www.apple.com is 60 seconds. interestingly, a dsl modem we have here continued to resolve www.apple.com during this time, though it also couldn't reach the nameserver. it seems that the dsl modem's resolver or upstream cache was just serving up a stale rr. this is a common problem around here (esp.) for akadns-served sites with very low ttls. i'm thinking it would make sense to either (in order of increasing preference andimplementation difficulty ) a) ignore very short, but nonzero, ttls and make them at least 3600s. b) serve up stale rrs if a fresh answer isn't available up to some multiple of the real ttl. c) activly refresh the "most active" cache entries starting at 1/2 the ttl and increasing in frequency until some mutiple of ttl has expired. - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2007-07-03 20:54 erik quanstrom @ 2007-07-04 20:51 ` geoff 2007-07-04 21:18 ` erik quanstrom 0 siblings, 1 reply; 71+ messages in thread From: geoff @ 2007-07-04 20:51 UTC (permalink / raw) To: 9fans I've seen the same thing with ibm.com and google.com; they tend to use ttls of 5 minutes or less. On the other hand, we use short ttls internally for zones like cs.bell-labs.com. It's tempting to set a floor on ttls for NS records at least, but ultimately I wonder why expired data isn't just getting replaced. Perhaps the time-outs on queries need to be adjusted. One problem we see locally is that we never seem to get negative answers to queries, notably for AAAA and CNAME queries, which rarely succeed, so we wait for the queries to time out instead of getting prompt negative answers from our upstream nameservers. ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2007-07-04 20:51 ` geoff @ 2007-07-04 21:18 ` erik quanstrom 2007-07-04 21:52 ` geoff 0 siblings, 1 reply; 71+ messages in thread From: erik quanstrom @ 2007-07-04 21:18 UTC (permalink / raw) To: 9fans ah, we're out of phase 180 degrees. the particular problem i happened to catch the other day found nserver*.apple.com unreachable starting from the root. i tried with dig on linux, too. that didn't work either. but perhaps trying harder would make sense. i noticed that you aren't seeing negative responses to cname queries from the code. is that something particular to your environment or is that something we are likely to see, too.? unfortunately akadns uses cnames in their scheme. this really stinks. it seems that the timeout in such a case should depend on the name. (awful, no?) i wonder if it might be safe to replace small tts with the expire time in the soa. being on the other end of that a decade ago, i remember to maintaining a server at an old ip address for 2*expiretime to ensure that everybody had updated. it was suprising how long the old ip address was cached. - erik ^ permalink raw reply [flat|nested] 71+ messages in thread
* Re: [9fans] dns 2007-07-04 21:18 ` erik quanstrom @ 2007-07-04 21:52 ` geoff 0 siblings, 0 replies; 71+ messages in thread From: geoff @ 2007-07-04 21:52 UTC (permalink / raw) To: 9fans I'm not sure if the lack of negative answers is a local phenomenon or not, but I suspect so. Our dns environment got more complex a few months ago and our upstream dns servers may be at fault. ^ permalink raw reply [flat|nested] 71+ messages in thread
* [9fans] dns @ 2004-03-20 2:25 David Presotto 0 siblings, 0 replies; 71+ messages in thread From: David Presotto @ 2004-03-20 2:25 UTC (permalink / raw) To: 9fans Fixed a problem in dns related to flush. You should pick up new dns source and rebuild it. ^ permalink raw reply [flat|nested] 71+ messages in thread
end of thread, other threads:[~2017-04-03 7:06 UTC | newest] Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-04-01 1:28 [9fans] DNS cinap_lenrek 2017-04-01 3:40 ` Skip Tavakkolian 2017-04-01 9:46 ` Alexandru Gheorghe 2017-04-01 17:04 ` Steve Simon 2017-04-03 7:06 ` David Arroyo -- strict thread matches above, loose matches on Subject: below -- 2017-03-20 7:26 Skip Tavakkolian 2017-03-21 19:19 ` Skip Tavakkolian 2017-03-30 22:19 ` Steve Simon 2017-03-31 7:43 ` Peter Hull 2012-08-26 14:48 [9fans] dns cinap_lenrek 2012-08-21 1:51 Jeff Sickel 2012-08-21 1:59 ` erik quanstrom 2012-08-21 4:23 ` arisawa 2012-08-21 5:08 ` Jeff Sickel 2012-08-21 8:42 ` Kenji Arisawa 2012-08-21 11:27 ` cinap_lenrek 2012-08-21 12:18 ` Lucio De Re 2012-08-21 12:37 ` arisawa 2012-08-21 18:32 ` erik quanstrom 2012-08-21 20:05 ` Charles Forsyth [not found] ` <CAOw7k5hALBex13jJUbscVt4th-Z9urpmAP0eOBi7b1HuBQESrw@mail.gmail.c> 2012-08-21 20:41 ` erik quanstrom 2012-08-21 22:44 ` cinap_lenrek 2012-08-22 10:32 ` cinap_lenrek 2012-08-22 12:09 ` Kenji Arisawa 2012-08-22 12:35 ` cinap_lenrek 2012-08-22 13:05 ` erik quanstrom 2012-08-22 13:08 ` erik quanstrom 2012-08-22 13:11 ` Charles Forsyth [not found] ` <CAOw7k5gGhL-=E=V1C9ffQP7qMyUVMhtrgpzLMidnCsHFGDzHwA@mail.gmail.c> 2012-08-22 13:18 ` erik quanstrom 2012-08-22 13:21 ` Charles Forsyth 2012-08-22 13:31 ` cinap_lenrek 2012-08-22 14:22 ` erik quanstrom 2012-08-22 14:41 ` Charles Forsyth 2012-08-22 14:47 ` cinap_lenrek 2012-08-22 15:26 ` erik quanstrom 2012-08-22 15:33 ` erik quanstrom 2012-08-22 15:35 ` erik quanstrom [not found] ` <CAOw7k5jOE0Bf13fXJ04dUF4vGniRTJ1LwEMV1Oqw=AFKg77boA@mail.gmail.com> 2012-08-22 16:05 ` Charles Forsyth 2012-08-22 13:23 ` Lucio De Re 2012-08-25 0:10 ` Kenji Arisawa 2012-08-25 10:54 ` cinap_lenrek 2012-08-25 12:37 ` Kenji Arisawa 2012-08-25 13:22 ` cinap_lenrek 2012-08-25 13:38 ` cinap_lenrek 2012-08-25 13:41 ` Charles Forsyth 2012-08-25 13:44 ` cinap_lenrek 2012-08-27 22:44 ` arisawa 2012-08-28 1:45 ` cinap_lenrek 2012-08-28 1:57 ` erik quanstrom 2012-08-28 2:08 ` cinap_lenrek 2012-08-28 4:03 ` erik quanstrom 2012-09-08 17:37 ` Skip Tavakkolian 2012-09-08 18:02 ` cinap_lenrek 2012-09-08 18:18 ` cinap_lenrek 2012-09-08 21:53 ` Skip Tavakkolian 2012-09-08 22:27 ` cinap_lenrek 2012-09-09 2:09 ` Charles Forsyth 2012-09-09 2:37 ` cinap_lenrek 2012-09-09 3:01 ` Charles Forsyth [not found] ` <CAOw7k5jydN7wssxBEqTc6h9FtLErjkoyxyfuQZLgcBTZ-nRgiw@mail.gmail.c> 2012-09-09 4:26 ` erik quanstrom 2012-09-09 5:28 ` Skip Tavakkolian 2012-08-25 21:56 ` cinap_lenrek 2012-08-26 13:16 ` cinap_lenrek 2012-08-26 13:36 ` Charles Forsyth 2012-08-21 5:08 ` Benjamin Huntsman 2007-08-15 19:26 erik quanstrom 2007-07-03 20:54 erik quanstrom 2007-07-04 20:51 ` geoff 2007-07-04 21:18 ` erik quanstrom 2007-07-04 21:52 ` geoff 2004-03-20 2:25 David Presotto
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).