mailing list of musl libc
 help / color / mirror / code / Atom feed
* Patches for DNS resolving
@ 2017-01-19 21:48 KARL BIELEFELDT
  2017-01-19 21:49 ` [PATCH 1/2] Fix name_from_dns_search to handle errors properly KARL BIELEFELDT
  2017-01-19 21:49 ` [PATCH 2/2] Fix digit offset on resolv.conf attempts KARL BIELEFELDT
  0 siblings, 2 replies; 9+ messages in thread
From: KARL BIELEFELDT @ 2017-01-19 21:48 UTC (permalink / raw)
  To: musl; +Cc: OMADA_TEAM


Hello,

Following are two patches for DNS resolving.  The first is for two bugs that
cancel each other out during normal operation, but when another error occurs,
like a network timeout, it causes the search to terminate too early.  We
happened to be "fortunate" enough to have a Kubernetes cluster with the
boundary condition for this to occur all the time.

The second patch fixes a copy/paste error we happened to see while reading the
code during debugging of the first issue.

Musl is a great product.  We hope you find these fixes useful.

Sincerely,

Karl Bielefeldt
ADTRAN



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] Fix name_from_dns_search to handle errors properly.
  2017-01-19 21:48 Patches for DNS resolving KARL BIELEFELDT
@ 2017-01-19 21:49 ` KARL BIELEFELDT
  2017-01-19 23:11   ` Rich Felker
  2017-01-19 21:49 ` [PATCH 2/2] Fix digit offset on resolv.conf attempts KARL BIELEFELDT
  1 sibling, 1 reply; 9+ messages in thread
From: KARL BIELEFELDT @ 2017-01-19 21:49 UTC (permalink / raw)
  To: musl; +Cc: OMADA_TEAM, KARL BIELEFELDT

This function previously exited after the first search failure
due to an inverted test condition, and incorrect testing of
return codes in name_from_dns.  This commit corrects those
self-cancelling errors that were only evident when another type
of error such as a network timeout occurred.
---
 src/network/lookup_name.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
index fb7303a3..bfa76d38 100644
--- a/src/network/lookup_name.c
+++ b/src/network/lookup_name.c
@@ -164,8 +164,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static
 
 	if (ctx.cnt) return ctx.cnt;
 	if (alens[0] < 4 || (abuf[0][3] & 15) == 2) return EAI_AGAIN;
-	if ((abuf[0][3] & 15) == 0) return EAI_NONAME;
-	if ((abuf[0][3] & 15) == 3) return 0;
+	if ((abuf[0][3] & 15) == 3) return EAI_NONAME;
+	if ((abuf[0][3] & 15) == 0) return 0;
 	return EAI_FAIL;
 }
 
@@ -201,7 +201,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
 			memcpy(canon+l+1, p, z-p);
 			canon[z-p+1+l] = 0;
 			int cnt = name_from_dns(buf, canon, canon, family, &conf);
-			if (cnt) return cnt;
+			if (!cnt) return cnt;
 		}
 	}
 
-- 
2.11.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] Fix digit offset on resolv.conf attempts.
  2017-01-19 21:48 Patches for DNS resolving KARL BIELEFELDT
  2017-01-19 21:49 ` [PATCH 1/2] Fix name_from_dns_search to handle errors properly KARL BIELEFELDT
@ 2017-01-19 21:49 ` KARL BIELEFELDT
  2017-01-20  3:50   ` Rich Felker
  1 sibling, 1 reply; 9+ messages in thread
From: KARL BIELEFELDT @ 2017-01-19 21:49 UTC (permalink / raw)
  To: musl; +Cc: OMADA_TEAM, KARL BIELEFELDT

---
 src/network/resolvconf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/network/resolvconf.c b/src/network/resolvconf.c
index 2cf1f475..4c3e4c4b 100644
--- a/src/network/resolvconf.c
+++ b/src/network/resolvconf.c
@@ -45,8 +45,8 @@ int __get_resolv_conf(struct resolvconf *conf, char *search, size_t search_sz)
 				if (z != p) conf->ndots = x > 15 ? 15 : x;
 			}
 			p = strstr(line, "attempts:");
-			if (p && isdigit(p[6])) {
-				p += 6;
+			if (p && isdigit(p[9])) {
+				p += 9;
 				unsigned long x = strtoul(p, &z, 10);
 				if (z != p) conf->attempts = x > 10 ? 10 : x;
 			}
-- 
2.11.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Fix name_from_dns_search to handle errors properly.
  2017-01-19 21:49 ` [PATCH 1/2] Fix name_from_dns_search to handle errors properly KARL BIELEFELDT
@ 2017-01-19 23:11   ` Rich Felker
  2017-01-20  0:13     ` KARL BIELEFELDT
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Felker @ 2017-01-19 23:11 UTC (permalink / raw)
  To: musl

On Thu, Jan 19, 2017 at 09:49:01PM +0000, KARL BIELEFELDT wrote:
> This function previously exited after the first search failure
> due to an inverted test condition, and incorrect testing of
> return codes in name_from_dns.  This commit corrects those
> self-cancelling errors that were only evident when another type
> of error such as a network timeout occurred.
> ---
>  src/network/lookup_name.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> index fb7303a3..bfa76d38 100644
> --- a/src/network/lookup_name.c
> +++ b/src/network/lookup_name.c
> @@ -164,8 +164,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static
>  
>  	if (ctx.cnt) return ctx.cnt;
>  	if (alens[0] < 4 || (abuf[0][3] & 15) == 2) return EAI_AGAIN;
> -	if ((abuf[0][3] & 15) == 0) return EAI_NONAME;
> -	if ((abuf[0][3] & 15) == 3) return 0;
> +	if ((abuf[0][3] & 15) == 3) return EAI_NONAME;
> +	if ((abuf[0][3] & 15) == 0) return 0;
>  	return EAI_FAIL;
>  }
>  
> @@ -201,7 +201,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
>  			memcpy(canon+l+1, p, z-p);
>  			canon[z-p+1+l] = 0;
>  			int cnt = name_from_dns(buf, canon, canon, family, &conf);
> -			if (cnt) return cnt;
> +			if (!cnt) return cnt;
>  		}
>  	}

I think you're mistaken here. The intent throughout __lookup_name is
that, when one backend returns 0 (no results), the lookup process
continues to the next backend. Conversely, if a backend returns a
positive number of results or an error, lookup must stop and report
what was found. If this principle is not followed then transient
errors are not reported to the caller but instead _silently_ change
the result of the lookup.

In the case of dns lookups, a response code of 0 from the nameserver
with 0 records is a result of "this dns label exists, but doesn't have
a record of the requested type". Usually that happens when only A was
available but you requested AAAA, or vice versa. In that case we must
return EAI_NONAME as the result rather than continuing the search.
Otherwise, requests for AF_UNSPEC will return results from different
sources than what you would get from at least one of AF_INET or
AF_INET6.

On the other hand, a result of NxDomain is not an error; it simply
means the requested dns label does not exist at all. In that case,
it's valid to continue the search.


Rich


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Fix name_from_dns_search to handle errors properly.
  2017-01-19 23:11   ` Rich Felker
@ 2017-01-20  0:13     ` KARL BIELEFELDT
  2017-01-20  1:02       ` Rich Felker
  0 siblings, 1 reply; 9+ messages in thread
From: KARL BIELEFELDT @ 2017-01-20  0:13 UTC (permalink / raw)
  To: musl; +Cc: OMADA_TEAM

On 01/19/2017 05:11 PM, Rich Felker wrote:
> On Thu, Jan 19, 2017 at 09:49:01PM +0000, KARL BIELEFELDT wrote:
>> This function previously exited after the first search failure
>> due to an inverted test condition, and incorrect testing of
>> return codes in name_from_dns.  This commit corrects those
>> self-cancelling errors that were only evident when another type
>> of error such as a network timeout occurred.
>> ---
>>   src/network/lookup_name.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
>> index fb7303a3..bfa76d38 100644
>> --- a/src/network/lookup_name.c
>> +++ b/src/network/lookup_name.c
>> @@ -164,8 +164,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static
>>   
>>   	if (ctx.cnt) return ctx.cnt;
>>   	if (alens[0] < 4 || (abuf[0][3] & 15) == 2) return EAI_AGAIN;
>> -	if ((abuf[0][3] & 15) == 0) return EAI_NONAME;
>> -	if ((abuf[0][3] & 15) == 3) return 0;
>> +	if ((abuf[0][3] & 15) == 3) return EAI_NONAME;
>> +	if ((abuf[0][3] & 15) == 0) return 0;
>>   	return EAI_FAIL;
>>   }
>>   
>> @@ -201,7 +201,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
>>   			memcpy(canon+l+1, p, z-p);
>>   			canon[z-p+1+l] = 0;
>>   			int cnt = name_from_dns(buf, canon, canon, family, &conf);
>> -			if (cnt) return cnt;
>> +			if (!cnt) return cnt;
>>   		}
>>   	}
> I think you're mistaken here. The intent throughout __lookup_name is
> that, when one backend returns 0 (no results), the lookup process
> continues to the next backend. Conversely, if a backend returns a
> positive number of results or an error, lookup must stop and report
> what was found. If this principle is not followed then transient
> errors are not reported to the caller but instead _silently_ change
> the result of the lookup.
>
> In the case of dns lookups, a response code of 0 from the nameserver
> with 0 records is a result of "this dns label exists, but doesn't have
> a record of the requested type". Usually that happens when only A was
> available but you requested AAAA, or vice versa. In that case we must
> return EAI_NONAME as the result rather than continuing the search.
> Otherwise, requests for AF_UNSPEC will return results from different
> sources than what you would get from at least one of AF_INET or
> AF_INET6.
>
> On the other hand, a result of NxDomain is not an error; it simply
> means the requested dns label does not exist at all. In that case,
> it's valid to continue the search.
>
>
> Rich
>
OK, I follow that.  The problem we're seeing though is when one of the 
entries
on the "search" line of /etc/resolv.conf causes a network timeout error, 
it never
gets down to the last line of the function, which as far as I can tell 
tests the absolute
name without any search list elements appended, and would have returned 
a successful
result.

Karl



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Fix name_from_dns_search to handle errors properly.
  2017-01-20  0:13     ` KARL BIELEFELDT
@ 2017-01-20  1:02       ` Rich Felker
  2017-01-20 16:23         ` KARL BIELEFELDT
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Felker @ 2017-01-20  1:02 UTC (permalink / raw)
  To: musl

On Fri, Jan 20, 2017 at 12:13:36AM +0000, KARL BIELEFELDT wrote:
> On 01/19/2017 05:11 PM, Rich Felker wrote:
> > On Thu, Jan 19, 2017 at 09:49:01PM +0000, KARL BIELEFELDT wrote:
> >> This function previously exited after the first search failure
> >> due to an inverted test condition, and incorrect testing of
> >> return codes in name_from_dns.  This commit corrects those
> >> self-cancelling errors that were only evident when another type
> >> of error such as a network timeout occurred.
> >> ---
> >>   src/network/lookup_name.c | 6 +++---
> >>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> >> index fb7303a3..bfa76d38 100644
> >> --- a/src/network/lookup_name.c
> >> +++ b/src/network/lookup_name.c
> >> @@ -164,8 +164,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static
> >>   
> >>   	if (ctx.cnt) return ctx.cnt;
> >>   	if (alens[0] < 4 || (abuf[0][3] & 15) == 2) return EAI_AGAIN;
> >> -	if ((abuf[0][3] & 15) == 0) return EAI_NONAME;
> >> -	if ((abuf[0][3] & 15) == 3) return 0;
> >> +	if ((abuf[0][3] & 15) == 3) return EAI_NONAME;
> >> +	if ((abuf[0][3] & 15) == 0) return 0;
> >>   	return EAI_FAIL;
> >>   }
> >>   
> >> @@ -201,7 +201,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
> >>   			memcpy(canon+l+1, p, z-p);
> >>   			canon[z-p+1+l] = 0;
> >>   			int cnt = name_from_dns(buf, canon, canon, family, &conf);
> >> -			if (cnt) return cnt;
> >> +			if (!cnt) return cnt;
> >>   		}
> >>   	}
> > I think you're mistaken here. The intent throughout __lookup_name is
> > that, when one backend returns 0 (no results), the lookup process
> > continues to the next backend. Conversely, if a backend returns a
> > positive number of results or an error, lookup must stop and report
> > what was found. If this principle is not followed then transient
> > errors are not reported to the caller but instead _silently_ change
> > the result of the lookup.
> >
> > In the case of dns lookups, a response code of 0 from the nameserver
> > with 0 records is a result of "this dns label exists, but doesn't have
> > a record of the requested type". Usually that happens when only A was
> > available but you requested AAAA, or vice versa. In that case we must
> > return EAI_NONAME as the result rather than continuing the search.
> > Otherwise, requests for AF_UNSPEC will return results from different
> > sources than what you would get from at least one of AF_INET or
> > AF_INET6.
> >
> > On the other hand, a result of NxDomain is not an error; it simply
> > means the requested dns label does not exist at all. In that case,
> > it's valid to continue the search.
> >
> OK, I follow that.  The problem we're seeing though is when one of the 
> entries
> on the "search" line of /etc/resolv.conf causes a network timeout error, 
> it never
> gets down to the last line of the function, which as far as I can tell 
> tests the absolute
> name without any search list elements appended, and would have returned 
> a successful
> result.

This is the only way to get consistent results; otherwise, which
result you get changes depending on whether you experience a transient
error on the search element.

Note that this generally only happens with ndots>1. With the default,
ndots=1, only queries with no dots in them (which are generally not
valid as standalone dns lookups, although sometimes now they are with
the ridiculous new tlds) are subject to search. Use of ndots>1 has
lots of other issues like making dns lookups twice as slow.

Rich


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] Fix digit offset on resolv.conf attempts.
  2017-01-19 21:49 ` [PATCH 2/2] Fix digit offset on resolv.conf attempts KARL BIELEFELDT
@ 2017-01-20  3:50   ` Rich Felker
  2017-01-20 16:22     ` KARL BIELEFELDT
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Felker @ 2017-01-20  3:50 UTC (permalink / raw)
  To: musl

On Thu, Jan 19, 2017 at 09:49:08PM +0000, KARL BIELEFELDT wrote:
> ---
>  src/network/resolvconf.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/network/resolvconf.c b/src/network/resolvconf.c
> index 2cf1f475..4c3e4c4b 100644
> --- a/src/network/resolvconf.c
> +++ b/src/network/resolvconf.c
> @@ -45,8 +45,8 @@ int __get_resolv_conf(struct resolvconf *conf, char *search, size_t search_sz)
>  				if (z != p) conf->ndots = x > 15 ? 15 : x;
>  			}
>  			p = strstr(line, "attempts:");
> -			if (p && isdigit(p[6])) {
> -				p += 6;
> +			if (p && isdigit(p[9])) {
> +				p += 9;
>  				unsigned long x = strtoul(p, &z, 10);
>  				if (z != p) conf->attempts = x > 10 ? 10 : x;
>  			}

This looks correct. Would you like to be credited as commit author? (I
usually ask for new first-time contributors before putting name/email
in permanent repo history.)

Rich


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] Fix digit offset on resolv.conf attempts.
  2017-01-20  3:50   ` Rich Felker
@ 2017-01-20 16:22     ` KARL BIELEFELDT
  0 siblings, 0 replies; 9+ messages in thread
From: KARL BIELEFELDT @ 2017-01-20 16:22 UTC (permalink / raw)
  To: musl

Yes, thanks.


On 01/19/2017 09:50 PM, Rich Felker wrote:
> On Thu, Jan 19, 2017 at 09:49:08PM +0000, KARL BIELEFELDT wrote:
>> ---
>>   src/network/resolvconf.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/network/resolvconf.c b/src/network/resolvconf.c
>> index 2cf1f475..4c3e4c4b 100644
>> --- a/src/network/resolvconf.c
>> +++ b/src/network/resolvconf.c
>> @@ -45,8 +45,8 @@ int __get_resolv_conf(struct resolvconf *conf, char *search, size_t search_sz)
>>   				if (z != p) conf->ndots = x > 15 ? 15 : x;
>>   			}
>>   			p = strstr(line, "attempts:");
>> -			if (p && isdigit(p[6])) {
>> -				p += 6;
>> +			if (p && isdigit(p[9])) {
>> +				p += 9;
>>   				unsigned long x = strtoul(p, &z, 10);
>>   				if (z != p) conf->attempts = x > 10 ? 10 : x;
>>   			}
> This looks correct. Would you like to be credited as commit author? (I
> usually ask for new first-time contributors before putting name/email
> in permanent repo history.)
>
> Rich
>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Fix name_from_dns_search to handle errors properly.
  2017-01-20  1:02       ` Rich Felker
@ 2017-01-20 16:23         ` KARL BIELEFELDT
  0 siblings, 0 replies; 9+ messages in thread
From: KARL BIELEFELDT @ 2017-01-20 16:23 UTC (permalink / raw)
  To: musl; +Cc: OMADA_TEAM

On 01/19/2017 07:02 PM, Rich Felker wrote:
> On Fri, Jan 20, 2017 at 12:13:36AM +0000, KARL BIELEFELDT wrote:
>> On 01/19/2017 05:11 PM, Rich Felker wrote:
>>> On Thu, Jan 19, 2017 at 09:49:01PM +0000, KARL BIELEFELDT wrote:
>>>> This function previously exited after the first search failure
>>>> due to an inverted test condition, and incorrect testing of
>>>> return codes in name_from_dns.  This commit corrects those
>>>> self-cancelling errors that were only evident when another type
>>>> of error such as a network timeout occurred.
>>>> ---
>>>>    src/network/lookup_name.c | 6 +++---
>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
>>>> index fb7303a3..bfa76d38 100644
>>>> --- a/src/network/lookup_name.c
>>>> +++ b/src/network/lookup_name.c
>>>> @@ -164,8 +164,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static
>>>>    
>>>>    	if (ctx.cnt) return ctx.cnt;
>>>>    	if (alens[0] < 4 || (abuf[0][3] & 15) == 2) return EAI_AGAIN;
>>>> -	if ((abuf[0][3] & 15) == 0) return EAI_NONAME;
>>>> -	if ((abuf[0][3] & 15) == 3) return 0;
>>>> +	if ((abuf[0][3] & 15) == 3) return EAI_NONAME;
>>>> +	if ((abuf[0][3] & 15) == 0) return 0;
>>>>    	return EAI_FAIL;
>>>>    }
>>>>    
>>>> @@ -201,7 +201,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
>>>>    			memcpy(canon+l+1, p, z-p);
>>>>    			canon[z-p+1+l] = 0;
>>>>    			int cnt = name_from_dns(buf, canon, canon, family, &conf);
>>>> -			if (cnt) return cnt;
>>>> +			if (!cnt) return cnt;
>>>>    		}
>>>>    	}
>>> I think you're mistaken here. The intent throughout __lookup_name is
>>> that, when one backend returns 0 (no results), the lookup process
>>> continues to the next backend. Conversely, if a backend returns a
>>> positive number of results or an error, lookup must stop and report
>>> what was found. If this principle is not followed then transient
>>> errors are not reported to the caller but instead _silently_ change
>>> the result of the lookup.
>>>
>>> In the case of dns lookups, a response code of 0 from the nameserver
>>> with 0 records is a result of "this dns label exists, but doesn't have
>>> a record of the requested type". Usually that happens when only A was
>>> available but you requested AAAA, or vice versa. In that case we must
>>> return EAI_NONAME as the result rather than continuing the search.
>>> Otherwise, requests for AF_UNSPEC will return results from different
>>> sources than what you would get from at least one of AF_INET or
>>> AF_INET6.
>>>
>>> On the other hand, a result of NxDomain is not an error; it simply
>>> means the requested dns label does not exist at all. In that case,
>>> it's valid to continue the search.
>>>
>> OK, I follow that.  The problem we're seeing though is when one of the
>> entries
>> on the "search" line of /etc/resolv.conf causes a network timeout error,
>> it never
>> gets down to the last line of the function, which as far as I can tell
>> tests the absolute
>> name without any search list elements appended, and would have returned
>> a successful
>> result.
> This is the only way to get consistent results; otherwise, which
> result you get changes depending on whether you experience a transient
> error on the search element.
>
> Note that this generally only happens with ndots>1. With the default,
> ndots=1, only queries with no dots in them (which are generally not
> valid as standalone dns lookups, although sometimes now they are with
> the ridiculous new tlds) are subject to search. Use of ndots>1 has
> lots of other issues like making dns lookups twice as slow.
>
> Rich
>
That makes sense.  I appreciate you taking the time to explain it.

Karl



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-01-20 16:23 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-19 21:48 Patches for DNS resolving KARL BIELEFELDT
2017-01-19 21:49 ` [PATCH 1/2] Fix name_from_dns_search to handle errors properly KARL BIELEFELDT
2017-01-19 23:11   ` Rich Felker
2017-01-20  0:13     ` KARL BIELEFELDT
2017-01-20  1:02       ` Rich Felker
2017-01-20 16:23         ` KARL BIELEFELDT
2017-01-19 21:49 ` [PATCH 2/2] Fix digit offset on resolv.conf attempts KARL BIELEFELDT
2017-01-20  3:50   ` Rich Felker
2017-01-20 16:22     ` KARL BIELEFELDT

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).