* Patches for DNS resolving @ 2017-01-19 21:48 KARL BIELEFELDT 2017-01-19 21:49 ` [PATCH 1/2] Fix name_from_dns_search to handle errors properly KARL BIELEFELDT 2017-01-19 21:49 ` [PATCH 2/2] Fix digit offset on resolv.conf attempts KARL BIELEFELDT 0 siblings, 2 replies; 9+ messages in thread From: KARL BIELEFELDT @ 2017-01-19 21:48 UTC (permalink / raw) To: musl; +Cc: OMADA_TEAM Hello, Following are two patches for DNS resolving. The first is for two bugs that cancel each other out during normal operation, but when another error occurs, like a network timeout, it causes the search to terminate too early. We happened to be "fortunate" enough to have a Kubernetes cluster with the boundary condition for this to occur all the time. The second patch fixes a copy/paste error we happened to see while reading the code during debugging of the first issue. Musl is a great product. We hope you find these fixes useful. Sincerely, Karl Bielefeldt ADTRAN ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] Fix name_from_dns_search to handle errors properly. 2017-01-19 21:48 Patches for DNS resolving KARL BIELEFELDT @ 2017-01-19 21:49 ` KARL BIELEFELDT 2017-01-19 23:11 ` Rich Felker 2017-01-19 21:49 ` [PATCH 2/2] Fix digit offset on resolv.conf attempts KARL BIELEFELDT 1 sibling, 1 reply; 9+ messages in thread From: KARL BIELEFELDT @ 2017-01-19 21:49 UTC (permalink / raw) To: musl; +Cc: OMADA_TEAM, KARL BIELEFELDT This function previously exited after the first search failure due to an inverted test condition, and incorrect testing of return codes in name_from_dns. This commit corrects those self-cancelling errors that were only evident when another type of error such as a network timeout occurred. --- src/network/lookup_name.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c index fb7303a3..bfa76d38 100644 --- a/src/network/lookup_name.c +++ b/src/network/lookup_name.c @@ -164,8 +164,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static if (ctx.cnt) return ctx.cnt; if (alens[0] < 4 || (abuf[0][3] & 15) == 2) return EAI_AGAIN; - if ((abuf[0][3] & 15) == 0) return EAI_NONAME; - if ((abuf[0][3] & 15) == 3) return 0; + if ((abuf[0][3] & 15) == 3) return EAI_NONAME; + if ((abuf[0][3] & 15) == 0) return 0; return EAI_FAIL; } @@ -201,7 +201,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[ memcpy(canon+l+1, p, z-p); canon[z-p+1+l] = 0; int cnt = name_from_dns(buf, canon, canon, family, &conf); - if (cnt) return cnt; + if (!cnt) return cnt; } } -- 2.11.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] Fix name_from_dns_search to handle errors properly. 2017-01-19 21:49 ` [PATCH 1/2] Fix name_from_dns_search to handle errors properly KARL BIELEFELDT @ 2017-01-19 23:11 ` Rich Felker 2017-01-20 0:13 ` KARL BIELEFELDT 0 siblings, 1 reply; 9+ messages in thread From: Rich Felker @ 2017-01-19 23:11 UTC (permalink / raw) To: musl On Thu, Jan 19, 2017 at 09:49:01PM +0000, KARL BIELEFELDT wrote: > This function previously exited after the first search failure > due to an inverted test condition, and incorrect testing of > return codes in name_from_dns. This commit corrects those > self-cancelling errors that were only evident when another type > of error such as a network timeout occurred. > --- > src/network/lookup_name.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c > index fb7303a3..bfa76d38 100644 > --- a/src/network/lookup_name.c > +++ b/src/network/lookup_name.c > @@ -164,8 +164,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static > > if (ctx.cnt) return ctx.cnt; > if (alens[0] < 4 || (abuf[0][3] & 15) == 2) return EAI_AGAIN; > - if ((abuf[0][3] & 15) == 0) return EAI_NONAME; > - if ((abuf[0][3] & 15) == 3) return 0; > + if ((abuf[0][3] & 15) == 3) return EAI_NONAME; > + if ((abuf[0][3] & 15) == 0) return 0; > return EAI_FAIL; > } > > @@ -201,7 +201,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[ > memcpy(canon+l+1, p, z-p); > canon[z-p+1+l] = 0; > int cnt = name_from_dns(buf, canon, canon, family, &conf); > - if (cnt) return cnt; > + if (!cnt) return cnt; > } > } I think you're mistaken here. The intent throughout __lookup_name is that, when one backend returns 0 (no results), the lookup process continues to the next backend. Conversely, if a backend returns a positive number of results or an error, lookup must stop and report what was found. If this principle is not followed then transient errors are not reported to the caller but instead _silently_ change the result of the lookup. In the case of dns lookups, a response code of 0 from the nameserver with 0 records is a result of "this dns label exists, but doesn't have a record of the requested type". Usually that happens when only A was available but you requested AAAA, or vice versa. In that case we must return EAI_NONAME as the result rather than continuing the search. Otherwise, requests for AF_UNSPEC will return results from different sources than what you would get from at least one of AF_INET or AF_INET6. On the other hand, a result of NxDomain is not an error; it simply means the requested dns label does not exist at all. In that case, it's valid to continue the search. Rich ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] Fix name_from_dns_search to handle errors properly. 2017-01-19 23:11 ` Rich Felker @ 2017-01-20 0:13 ` KARL BIELEFELDT 2017-01-20 1:02 ` Rich Felker 0 siblings, 1 reply; 9+ messages in thread From: KARL BIELEFELDT @ 2017-01-20 0:13 UTC (permalink / raw) To: musl; +Cc: OMADA_TEAM On 01/19/2017 05:11 PM, Rich Felker wrote: > On Thu, Jan 19, 2017 at 09:49:01PM +0000, KARL BIELEFELDT wrote: >> This function previously exited after the first search failure >> due to an inverted test condition, and incorrect testing of >> return codes in name_from_dns. This commit corrects those >> self-cancelling errors that were only evident when another type >> of error such as a network timeout occurred. >> --- >> src/network/lookup_name.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c >> index fb7303a3..bfa76d38 100644 >> --- a/src/network/lookup_name.c >> +++ b/src/network/lookup_name.c >> @@ -164,8 +164,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static >> >> if (ctx.cnt) return ctx.cnt; >> if (alens[0] < 4 || (abuf[0][3] & 15) == 2) return EAI_AGAIN; >> - if ((abuf[0][3] & 15) == 0) return EAI_NONAME; >> - if ((abuf[0][3] & 15) == 3) return 0; >> + if ((abuf[0][3] & 15) == 3) return EAI_NONAME; >> + if ((abuf[0][3] & 15) == 0) return 0; >> return EAI_FAIL; >> } >> >> @@ -201,7 +201,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[ >> memcpy(canon+l+1, p, z-p); >> canon[z-p+1+l] = 0; >> int cnt = name_from_dns(buf, canon, canon, family, &conf); >> - if (cnt) return cnt; >> + if (!cnt) return cnt; >> } >> } > I think you're mistaken here. The intent throughout __lookup_name is > that, when one backend returns 0 (no results), the lookup process > continues to the next backend. Conversely, if a backend returns a > positive number of results or an error, lookup must stop and report > what was found. If this principle is not followed then transient > errors are not reported to the caller but instead _silently_ change > the result of the lookup. > > In the case of dns lookups, a response code of 0 from the nameserver > with 0 records is a result of "this dns label exists, but doesn't have > a record of the requested type". Usually that happens when only A was > available but you requested AAAA, or vice versa. In that case we must > return EAI_NONAME as the result rather than continuing the search. > Otherwise, requests for AF_UNSPEC will return results from different > sources than what you would get from at least one of AF_INET or > AF_INET6. > > On the other hand, a result of NxDomain is not an error; it simply > means the requested dns label does not exist at all. In that case, > it's valid to continue the search. > > > Rich > OK, I follow that. The problem we're seeing though is when one of the entries on the "search" line of /etc/resolv.conf causes a network timeout error, it never gets down to the last line of the function, which as far as I can tell tests the absolute name without any search list elements appended, and would have returned a successful result. Karl ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] Fix name_from_dns_search to handle errors properly. 2017-01-20 0:13 ` KARL BIELEFELDT @ 2017-01-20 1:02 ` Rich Felker 2017-01-20 16:23 ` KARL BIELEFELDT 0 siblings, 1 reply; 9+ messages in thread From: Rich Felker @ 2017-01-20 1:02 UTC (permalink / raw) To: musl On Fri, Jan 20, 2017 at 12:13:36AM +0000, KARL BIELEFELDT wrote: > On 01/19/2017 05:11 PM, Rich Felker wrote: > > On Thu, Jan 19, 2017 at 09:49:01PM +0000, KARL BIELEFELDT wrote: > >> This function previously exited after the first search failure > >> due to an inverted test condition, and incorrect testing of > >> return codes in name_from_dns. This commit corrects those > >> self-cancelling errors that were only evident when another type > >> of error such as a network timeout occurred. > >> --- > >> src/network/lookup_name.c | 6 +++--- > >> 1 file changed, 3 insertions(+), 3 deletions(-) > >> > >> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c > >> index fb7303a3..bfa76d38 100644 > >> --- a/src/network/lookup_name.c > >> +++ b/src/network/lookup_name.c > >> @@ -164,8 +164,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static > >> > >> if (ctx.cnt) return ctx.cnt; > >> if (alens[0] < 4 || (abuf[0][3] & 15) == 2) return EAI_AGAIN; > >> - if ((abuf[0][3] & 15) == 0) return EAI_NONAME; > >> - if ((abuf[0][3] & 15) == 3) return 0; > >> + if ((abuf[0][3] & 15) == 3) return EAI_NONAME; > >> + if ((abuf[0][3] & 15) == 0) return 0; > >> return EAI_FAIL; > >> } > >> > >> @@ -201,7 +201,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[ > >> memcpy(canon+l+1, p, z-p); > >> canon[z-p+1+l] = 0; > >> int cnt = name_from_dns(buf, canon, canon, family, &conf); > >> - if (cnt) return cnt; > >> + if (!cnt) return cnt; > >> } > >> } > > I think you're mistaken here. The intent throughout __lookup_name is > > that, when one backend returns 0 (no results), the lookup process > > continues to the next backend. Conversely, if a backend returns a > > positive number of results or an error, lookup must stop and report > > what was found. If this principle is not followed then transient > > errors are not reported to the caller but instead _silently_ change > > the result of the lookup. > > > > In the case of dns lookups, a response code of 0 from the nameserver > > with 0 records is a result of "this dns label exists, but doesn't have > > a record of the requested type". Usually that happens when only A was > > available but you requested AAAA, or vice versa. In that case we must > > return EAI_NONAME as the result rather than continuing the search. > > Otherwise, requests for AF_UNSPEC will return results from different > > sources than what you would get from at least one of AF_INET or > > AF_INET6. > > > > On the other hand, a result of NxDomain is not an error; it simply > > means the requested dns label does not exist at all. In that case, > > it's valid to continue the search. > > > OK, I follow that. The problem we're seeing though is when one of the > entries > on the "search" line of /etc/resolv.conf causes a network timeout error, > it never > gets down to the last line of the function, which as far as I can tell > tests the absolute > name without any search list elements appended, and would have returned > a successful > result. This is the only way to get consistent results; otherwise, which result you get changes depending on whether you experience a transient error on the search element. Note that this generally only happens with ndots>1. With the default, ndots=1, only queries with no dots in them (which are generally not valid as standalone dns lookups, although sometimes now they are with the ridiculous new tlds) are subject to search. Use of ndots>1 has lots of other issues like making dns lookups twice as slow. Rich ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] Fix name_from_dns_search to handle errors properly. 2017-01-20 1:02 ` Rich Felker @ 2017-01-20 16:23 ` KARL BIELEFELDT 0 siblings, 0 replies; 9+ messages in thread From: KARL BIELEFELDT @ 2017-01-20 16:23 UTC (permalink / raw) To: musl; +Cc: OMADA_TEAM On 01/19/2017 07:02 PM, Rich Felker wrote: > On Fri, Jan 20, 2017 at 12:13:36AM +0000, KARL BIELEFELDT wrote: >> On 01/19/2017 05:11 PM, Rich Felker wrote: >>> On Thu, Jan 19, 2017 at 09:49:01PM +0000, KARL BIELEFELDT wrote: >>>> This function previously exited after the first search failure >>>> due to an inverted test condition, and incorrect testing of >>>> return codes in name_from_dns. This commit corrects those >>>> self-cancelling errors that were only evident when another type >>>> of error such as a network timeout occurred. >>>> --- >>>> src/network/lookup_name.c | 6 +++--- >>>> 1 file changed, 3 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c >>>> index fb7303a3..bfa76d38 100644 >>>> --- a/src/network/lookup_name.c >>>> +++ b/src/network/lookup_name.c >>>> @@ -164,8 +164,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static >>>> >>>> if (ctx.cnt) return ctx.cnt; >>>> if (alens[0] < 4 || (abuf[0][3] & 15) == 2) return EAI_AGAIN; >>>> - if ((abuf[0][3] & 15) == 0) return EAI_NONAME; >>>> - if ((abuf[0][3] & 15) == 3) return 0; >>>> + if ((abuf[0][3] & 15) == 3) return EAI_NONAME; >>>> + if ((abuf[0][3] & 15) == 0) return 0; >>>> return EAI_FAIL; >>>> } >>>> >>>> @@ -201,7 +201,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[ >>>> memcpy(canon+l+1, p, z-p); >>>> canon[z-p+1+l] = 0; >>>> int cnt = name_from_dns(buf, canon, canon, family, &conf); >>>> - if (cnt) return cnt; >>>> + if (!cnt) return cnt; >>>> } >>>> } >>> I think you're mistaken here. The intent throughout __lookup_name is >>> that, when one backend returns 0 (no results), the lookup process >>> continues to the next backend. Conversely, if a backend returns a >>> positive number of results or an error, lookup must stop and report >>> what was found. If this principle is not followed then transient >>> errors are not reported to the caller but instead _silently_ change >>> the result of the lookup. >>> >>> In the case of dns lookups, a response code of 0 from the nameserver >>> with 0 records is a result of "this dns label exists, but doesn't have >>> a record of the requested type". Usually that happens when only A was >>> available but you requested AAAA, or vice versa. In that case we must >>> return EAI_NONAME as the result rather than continuing the search. >>> Otherwise, requests for AF_UNSPEC will return results from different >>> sources than what you would get from at least one of AF_INET or >>> AF_INET6. >>> >>> On the other hand, a result of NxDomain is not an error; it simply >>> means the requested dns label does not exist at all. In that case, >>> it's valid to continue the search. >>> >> OK, I follow that. The problem we're seeing though is when one of the >> entries >> on the "search" line of /etc/resolv.conf causes a network timeout error, >> it never >> gets down to the last line of the function, which as far as I can tell >> tests the absolute >> name without any search list elements appended, and would have returned >> a successful >> result. > This is the only way to get consistent results; otherwise, which > result you get changes depending on whether you experience a transient > error on the search element. > > Note that this generally only happens with ndots>1. With the default, > ndots=1, only queries with no dots in them (which are generally not > valid as standalone dns lookups, although sometimes now they are with > the ridiculous new tlds) are subject to search. Use of ndots>1 has > lots of other issues like making dns lookups twice as slow. > > Rich > That makes sense. I appreciate you taking the time to explain it. Karl ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] Fix digit offset on resolv.conf attempts. 2017-01-19 21:48 Patches for DNS resolving KARL BIELEFELDT 2017-01-19 21:49 ` [PATCH 1/2] Fix name_from_dns_search to handle errors properly KARL BIELEFELDT @ 2017-01-19 21:49 ` KARL BIELEFELDT 2017-01-20 3:50 ` Rich Felker 1 sibling, 1 reply; 9+ messages in thread From: KARL BIELEFELDT @ 2017-01-19 21:49 UTC (permalink / raw) To: musl; +Cc: OMADA_TEAM, KARL BIELEFELDT --- src/network/resolvconf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/network/resolvconf.c b/src/network/resolvconf.c index 2cf1f475..4c3e4c4b 100644 --- a/src/network/resolvconf.c +++ b/src/network/resolvconf.c @@ -45,8 +45,8 @@ int __get_resolv_conf(struct resolvconf *conf, char *search, size_t search_sz) if (z != p) conf->ndots = x > 15 ? 15 : x; } p = strstr(line, "attempts:"); - if (p && isdigit(p[6])) { - p += 6; + if (p && isdigit(p[9])) { + p += 9; unsigned long x = strtoul(p, &z, 10); if (z != p) conf->attempts = x > 10 ? 10 : x; } -- 2.11.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] Fix digit offset on resolv.conf attempts. 2017-01-19 21:49 ` [PATCH 2/2] Fix digit offset on resolv.conf attempts KARL BIELEFELDT @ 2017-01-20 3:50 ` Rich Felker 2017-01-20 16:22 ` KARL BIELEFELDT 0 siblings, 1 reply; 9+ messages in thread From: Rich Felker @ 2017-01-20 3:50 UTC (permalink / raw) To: musl On Thu, Jan 19, 2017 at 09:49:08PM +0000, KARL BIELEFELDT wrote: > --- > src/network/resolvconf.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/src/network/resolvconf.c b/src/network/resolvconf.c > index 2cf1f475..4c3e4c4b 100644 > --- a/src/network/resolvconf.c > +++ b/src/network/resolvconf.c > @@ -45,8 +45,8 @@ int __get_resolv_conf(struct resolvconf *conf, char *search, size_t search_sz) > if (z != p) conf->ndots = x > 15 ? 15 : x; > } > p = strstr(line, "attempts:"); > - if (p && isdigit(p[6])) { > - p += 6; > + if (p && isdigit(p[9])) { > + p += 9; > unsigned long x = strtoul(p, &z, 10); > if (z != p) conf->attempts = x > 10 ? 10 : x; > } This looks correct. Would you like to be credited as commit author? (I usually ask for new first-time contributors before putting name/email in permanent repo history.) Rich ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] Fix digit offset on resolv.conf attempts. 2017-01-20 3:50 ` Rich Felker @ 2017-01-20 16:22 ` KARL BIELEFELDT 0 siblings, 0 replies; 9+ messages in thread From: KARL BIELEFELDT @ 2017-01-20 16:22 UTC (permalink / raw) To: musl Yes, thanks. On 01/19/2017 09:50 PM, Rich Felker wrote: > On Thu, Jan 19, 2017 at 09:49:08PM +0000, KARL BIELEFELDT wrote: >> --- >> src/network/resolvconf.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/src/network/resolvconf.c b/src/network/resolvconf.c >> index 2cf1f475..4c3e4c4b 100644 >> --- a/src/network/resolvconf.c >> +++ b/src/network/resolvconf.c >> @@ -45,8 +45,8 @@ int __get_resolv_conf(struct resolvconf *conf, char *search, size_t search_sz) >> if (z != p) conf->ndots = x > 15 ? 15 : x; >> } >> p = strstr(line, "attempts:"); >> - if (p && isdigit(p[6])) { >> - p += 6; >> + if (p && isdigit(p[9])) { >> + p += 9; >> unsigned long x = strtoul(p, &z, 10); >> if (z != p) conf->attempts = x > 10 ? 10 : x; >> } > This looks correct. Would you like to be credited as commit author? (I > usually ask for new first-time contributors before putting name/email > in permanent repo history.) > > Rich > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-01-20 16:23 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-01-19 21:48 Patches for DNS resolving KARL BIELEFELDT 2017-01-19 21:49 ` [PATCH 1/2] Fix name_from_dns_search to handle errors properly KARL BIELEFELDT 2017-01-19 23:11 ` Rich Felker 2017-01-20 0:13 ` KARL BIELEFELDT 2017-01-20 1:02 ` Rich Felker 2017-01-20 16:23 ` KARL BIELEFELDT 2017-01-19 21:49 ` [PATCH 2/2] Fix digit offset on resolv.conf attempts KARL BIELEFELDT 2017-01-20 3:50 ` Rich Felker 2017-01-20 16:22 ` KARL BIELEFELDT
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).