mailing list of musl libc
* DNS resolver patch
       [not found] <882247050.3003099.1544074074084.JavaMail.zimbra@totalphase.com>
@ 2018-12-06  5:31 ` Tarun Johar
  2018-12-06 14:13   ` Rich Felker
  2018-12-06 14:53   ` Florian Weimer
  0 siblings, 2 replies; 17+ messages in thread
From: Tarun Johar @ 2018-12-06  5:31 UTC (permalink / raw)
  To: musl; +Cc: Tarun Johar 


Hi Team, 

The VirtualBox --natdnsresolver does not support IPv6 AAAA address queries. It returns "NotImp" (code 4) for such queries. 

The MUSL library (https://www.musl-libc.org/) resolver does not recognize this code and retries the query until the timeout. This causes DNS lookups to take several seconds after which they are eventually successful. 

The GLIBC resolver works properly with the same configuration, suggesting that a fix should be made to MUSL to handle the "NotImp" response code. 

The root cause is this section of code in musl/src/network/res_msend.c:149 
	/* Only accept positive or negative responses;
	 * retry immediately on server failure, and ignore
	 * all other codes such as refusal. */
	switch (answers[next][3] & 15) {
	case 0:
	case 3:
		break;
	case 2:
		if (servfail_retry && servfail_retry--)
			sendto(fd, queries[i],
				qlens[i], MSG_NOSIGNAL,
				(void *)&ns[j], sl);
	default:
		continue;
	}

If "case 4" is added after "case 3" and before "break", the NotImp code is treated as a positive or negative response and the name resolution loop completes immediately. 
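
To make the quoted switch concrete, here is a minimal sketch of the same decision written as a standalone function. It is not musl's code: the names `dns_rcode`, `classify_reply` and the `reply_action` enum are made up for illustration, and the retry budget (`servfail_retry` in the quoted code) is left out.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum reply_action { REPLY_ACCEPT, REPLY_RETRY, REPLY_IGNORE };

/* RCODE is the low 4 bits of byte 3 of the DNS header (RFC 1035).
 * Returns -1 if the message is too short to contain that byte. */
static int dns_rcode(const unsigned char *msg, size_t len)
{
	return len >= 4 ? (msg[3] & 15) : -1;
}

/* Mirrors the quoted switch: 0 (NoError) and 3 (NXDomain) are accepted,
 * 2 (ServFail) asks for a retry, and everything else, including
 * 4 (NotImp), is ignored while waiting for another answer. */
static enum reply_action classify_reply(const unsigned char *msg, size_t len)
{
	switch (dns_rcode(msg, len)) {
	case 0:
	case 3:
		return REPLY_ACCEPT;
	case 2:
		return REPLY_RETRY;
	default:
		return REPLY_IGNORE;
	}
}
```

With this shape, the proposed change amounts to moving RCODE 4 from the `default` branch into the accepted group.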

Can the patch for this be included in MUSL 1.1.21 ? 

Thanks, 
Tarun 


* Re: DNS resolver patch
  2018-12-06  5:31 ` DNS resolver patch Tarun Johar
@ 2018-12-06 14:13   ` Rich Felker
  2018-12-06 15:23     ` Natanael Copa
  2018-12-06 14:53   ` Florian Weimer
  1 sibling, 1 reply; 17+ messages in thread
From: Rich Felker @ 2018-12-06 14:13 UTC (permalink / raw)
  To: musl

On Wed, Dec 05, 2018 at 09:31:55PM -0800, Tarun Johar wrote:
> Hi Team, 
> 
> The VirtualBox --natdnsresolver does not support IPv6 AAAA address
> queries. It returns "NotImp" (code 4) for such queries.
> 
> The MUSL library (https://www.musl-libc.org/) resolver does not
> recognize this code and retries the query until the timeout. This
> causes DNS lookups to take several seconds after which they are
> eventually successful.
> 
> The GLIBC resolver works properly with the same configuration,
> suggesting that a fix should be made to MUSL to handle the "NotImp"
> response code.
> 
> The root cause is this section of code in musl/src/network/res_msend.c:149 
> /* Only accept positive or negative responses; 
> * retry immediately on server failure, and ignore 
> * all other codes such as refusal. */ 
> switch (answers[next][3] & 15) { 
> case 0: 
> case 3: 
> break; 
> case 2: 
> if (servfail_retry && servfail_retry--) 
> sendto(fd, queries[i], 
> qlens[i], MSG_NOSIGNAL, 
> (void *)&ns[j], sl); 
> default: 
> continue; 
> } 
> 
> If "case 4" is added after "case 3" and before "break", the NotImp
> code is treated as a positive or negative response and the name
> resolution loop completes immediately.
> 
> Can the patch for this be included in MUSL 1.1.21 ? 

No, this is specifically wrong. If one buggy nameserver is responding
with "NotImp", the correct behavior is waiting for a response from a
different one that's not broken. It's possible that we could try to
remember such errors for each nameserver, and abort the lookup early
with an error (not a negative result, since this is not a result) if
*all* of them have failed, but it's not clear that that's the right
thing to do if there might be multiple actual servers behind each
logical one (ip address), which is probably the case for things like
8.8.8.8; in that case an error from one should not result in aborting
the query. Note also that treating it as an error would not help with
the practical need, since then the whole query would fail and you
wouldn't get the IPv4 results either.
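
As a rough sketch of the "remember such errors for each nameserver" idea, the bookkeeping could be a per-lookup bitmask. Everything here is hypothetical (`MAX_NS`, `struct ns_errors`, and both functions are invented names, and nothing like this exists in musl as far as this thread says). It also does not address the caveat above: one IP address may front several real servers, so "all configured addresses returned an error once" may still be the wrong moment to give up.

```c
#define MAX_NS 8  /* hypothetical cap, small enough for one unsigned bitmask */

struct ns_errors {
	unsigned failed; /* bit j is set once nameserver j answered with an error rcode */
	int nns;         /* number of configured nameservers, 1..MAX_NS */
};

/* Record that nameserver j returned an error; out-of-range j is ignored. */
static void ns_mark_failed(struct ns_errors *e, int j)
{
	if (j >= 0 && j < e->nns && j < MAX_NS)
		e->failed |= 1u << j;
}

/* Nonzero only when every configured nameserver has reported an error. */
static int ns_all_failed(const struct ns_errors *e)
{
	if (e->nns <= 0 || e->nns > MAX_NS) return 0;
	unsigned all = (1u << e->nns) - 1;
	return (e->failed & all) == all;
}
```

A caller would report an error (not a negative result) only once `ns_all_failed` becomes true, and keep waiting otherwise.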

The real fix here is just making VirtualBox's nameserver do the right
thing, or bypassing it and querying a real nameserver. Apparently
there's some reason it's desirable for use with certain NAT setups,
but I'm not clear on what this is. If it's returning real results, it
should just support AAAA and pass it through. If it's returning faked
results for some reason, and doesn't use IPv6 for them, it should just
return NxDomain for AAAA queries rather than an error. I'm happy to
help in explaining to upstream why the current behavior is problematic
if needed.

Rich



* Re: DNS resolver patch
  2018-12-06  5:31 ` DNS resolver patch Tarun Johar
  2018-12-06 14:13   ` Rich Felker
@ 2018-12-06 14:53   ` Florian Weimer
  2018-12-06 15:48     ` Natanael Copa
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2018-12-06 14:53 UTC (permalink / raw)
  To: Tarun Johar; +Cc: musl

* Tarun Johar:

> The VirtualBox --natdnsresolver does not support IPv6 AAAA address
> queries.  It returns "NotImp" (code 4) for such queries.

I think that's not the only bug, and glibc fails to work around all of
them.  We occasionally get bug reports about DNS resolution issues under
VirtualBox, too.  Oracle really needs to fix this properly.

Thanks,
Florian



* Re: DNS resolver patch
  2018-12-06 14:13   ` Rich Felker
@ 2018-12-06 15:23     ` Natanael Copa
  0 siblings, 0 replies; 17+ messages in thread
From: Natanael Copa @ 2018-12-06 15:23 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Thu, 6 Dec 2018 09:13:01 -0500
Rich Felker <dalias@libc.org> wrote:

> On Wed, Dec 05, 2018 at 09:31:55PM -0800, Tarun Johar wrote:
> > Hi Team, 
> > 
> > The VirtualBox --natdnsresolver does not support IPv6 AAAA address
> > queries. It returns "NotImp" (code 4) for such queries.
> > 
> > The MUSL library (https://www.musl-libc.org/) resolver does not
> > recognize this code and retries the query until the timeout. This
> > causes DNS lookups to take several seconds after which they are
> > eventually successful.
> > 
> > The GLIBC resolver works properly with the same configuration,
> > suggesting that a fix should be made to MUSL to handle the "NotImp"
> > response code.
> > 
> > The root cause is this section of code in musl/src/network/res_msend.c:149 
> > /* Only accept positive or negative responses; 
> > * retry immediately on server failure, and ignore 
> > * all other codes such as refusal. */ 
> > switch (answers[next][3] & 15) { 
> > case 0: 
> > case 3: 
> > break; 
> > case 2: 
> > if (servfail_retry && servfail_retry--) 
> > sendto(fd, queries[i], 
> > qlens[i], MSG_NOSIGNAL, 
> > (void *)&ns[j], sl); 
> > default: 
> > continue; 
> > } 
> > 
> > If "case 4" is added after "case 3" and before "break", the NotImp
> > code is treated as a positive or negative response and the name
> > resolution loop completes immediately.
> > 
> > Can the patch for this be included in MUSL 1.1.21 ?   
> 
> No, this is specifically wrong. If one buggy nameserver is responding
> with "NotImp", the correct behavior is waiting for a response from a
> different one that's not broken. It's possible that we could try to
> remember such errors for each nameserver, and abort the lookup early
> with an error (not a negative result, since this is not a result) if
> *all* of them have failed, but it's not clear that that's the right
> thing to do if there might be multiple actual servers behind each
> logical one (ip address), which is probably the case for things like
> 8.8.8.8; in that case an error from one should not result in aborting
> the query. Note also that treating it as an error would not help with
> the practical need, since then the whole query would fail and you
> wouldn't get the IPv4 results either.
> 
> The real fix here is just making VirtualBox's nameserver do the right
> thing, or bypassing it and querying a real nameserver. Apparently
> there's some reason it's desirable for use with certain NAT setups,
> but I'm not clear on what this is. If it's returning real results, it
> should just support AAAA and pass it through. If it's returning faked
> results for some reason, and doesn't use IPv6 for them, it should just
> return NxDomain for AAAA queries rather than an error. I'm happy to
> help in explaining to upstream why the current behavior is problematic
> if needed.

For the record, here is a good explanation:
https://nlnetlabs.nl/pipermail/unbound-users/2017-August/004866.html

RCODE 4 means "Not Implemented - The name server does not support the
requested kind of query." where "kind of query" is specified in OPCODE[1]
field (eg query or notify) and is not the RR type.
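
To illustrate the distinction, here is a small sketch showing that OPCODE and the RR type live in different parts of a DNS message: OPCODE is in the fixed header, while QTYPE follows the name in the question section. The function names are made up for this example, and the parser assumes a single, uncompressed question as a stub resolver would send.

```c
#include <stddef.h>

/* OPCODE: bits 3..6 of header byte 2 (0 = QUERY, 4 = NOTIFY). */
static int dns_opcode(const unsigned char *msg, size_t len)
{
	return len >= 12 ? ((msg[2] >> 3) & 15) : -1;
}

/* QTYPE of the first question: skip the 12-byte header and the QNAME
 * labels, then read a 16-bit big-endian value. Returns -1 on any
 * malformed or truncated input. */
static int dns_first_qtype(const unsigned char *msg, size_t len)
{
	size_t i = 12;
	if (len < 12) return -1;
	if ((msg[4] << 8 | msg[5]) < 1) return -1;   /* QDCOUNT must be >= 1 */
	while (i < len && msg[i]) {
		if (msg[i] & 0xc0) return -1;        /* compression pointer: not handled */
		i += (size_t)msg[i] + 1;
	}
	if (i >= len || len - i < 5) return -1;      /* root label + QTYPE + QCLASS */
	return msg[i+1] << 8 | msg[i+2];
}
```

An AAAA lookup is an ordinary QUERY (opcode 0) with QTYPE 28, so by the definition above a server that handles QUERY at all has no basis for answering it with RCODE 4.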

-nc

[1]: https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-5



* Re: DNS resolver patch
  2018-12-06 14:53   ` Florian Weimer
@ 2018-12-06 15:48     ` Natanael Copa
  2018-12-06 18:18       ` Florian Weimer
  2018-12-06 18:50       ` Tarun Johar
  0 siblings, 2 replies; 17+ messages in thread
From: Natanael Copa @ 2018-12-06 15:48 UTC (permalink / raw)
  To: Florian Weimer; +Cc: musl, Tarun Johar

On Thu, 06 Dec 2018 15:53:43 +0100
Florian Weimer <fweimer@redhat.com> wrote:

> * Tarun Johar:
> 
> > The VirtualBox --natdnsresolver does not support IPv6 AAAA address
> > queries.  It returns "NotImp" (code 4) for such queries.  
> 
> I think that's not the only bug, and glibc fails to work around all of
> them.  We occasionally get bug reports about DNS resolution issues under
> VirtualBox, too.  Oracle really needs to fix this properly.
> 
> Thanks,
> Florian

Problem is here:
https://www.virtualbox.org/browser/vbox/trunk/src/VBox/Devices/Network/slirp/hostres.c?rev=59202#L408

402	    if (   qtype != Type_A
403	        && qtype != Type_CNAME
404	        && qtype != Type_PTR
405	        && qtype != Type_ANY)
406	    {
407	        LogErr(("NAT: hostres: unsupported qtype %d\n", qtype));
408	        return refuse(pData, m, RCode_NotImp);
409	    }


They should return RCode_NXDomain instead of RCode_NotImp. It seems they
also have other invalid uses of NotImp.

-nc



* Re: DNS resolver patch
  2018-12-06 15:48     ` Natanael Copa
@ 2018-12-06 18:18       ` Florian Weimer
  2018-12-06 18:38         ` A. Wilcox
  2018-12-06 18:50       ` Tarun Johar
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2018-12-06 18:18 UTC (permalink / raw)
  To: Natanael Copa; +Cc: musl, Tarun Johar

* Natanael Copa:

> On Thu, 06 Dec 2018 15:53:43 +0100
> Florian Weimer <fweimer@redhat.com> wrote:
>
>> * Tarun Johar:
>> 
>> > The VirtualBox --natdnsresolver does not support IPv6 AAAA address
>> > queries.  It returns "NotImp" (code 4) for such queries.  
>> 
>> I think that's not the only bug, and glibc fails to work around all of
>> them.  We occasionally get bug reports about DNS resolution issues under
>> VirtualBox, too.  Oracle really needs to fix this properly.
>> 
>> Thanks,
>> Florian
>
> Problem is here:
> https://www.virtualbox.org/browser/vbox/trunk/src/VBox/Devices/Network/slirp/hostres.c?rev=59202#L408
>
> 402	    if (   qtype != Type_A
> 403	        && qtype != Type_CNAME
> 404	        && qtype != Type_PTR
> 405	        && qtype != Type_ANY)
> 406	    {
> 407	        LogErr(("NAT: hostres: unsupported qtype %d\n", qtype));
> 408	        return refuse(pData, m, RCode_NotImp);
> 409	    }
>
>
> They should return RCode_NXDomain instead of RCode_NotImp. Seems like
> they also have more of those invalid use of NotImp.

I think that's probably worse because NXDOMAIN says that there is no
data at that name, so there's no A record either.  It will confuse some
DNS resolvers.

The alternative, using a NOERROR/NODATA response, confuses musl search
processing.  In order to fix this properly, you need to pass through the
AAAA records (even if there's no actual IPv6 networking support in the
code; I haven't checked this and it does not matter for name
resolution).
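
For readers less familiar with the two negative answers, a minimal sketch of how a client might tell them apart from the header alone. The enum and function names are invented for this example, and the check deliberately ignores referrals and truncation, which also show up as NOERROR with an empty answer section.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum resp_kind { RESP_ANSWER, RESP_NODATA, RESP_NXDOMAIN, RESP_ERROR };

/* NXDOMAIN: RCODE 3, the name does not exist, so no record type exists.
 * NODATA:   RCODE 0 with an empty answer section, the name exists but
 *           has no records of the requested type. */
static enum resp_kind classify_answer(const unsigned char *hdr, size_t len)
{
	if (len < 12) return RESP_ERROR;
	int rcode = hdr[3] & 15;
	int ancount = hdr[6] << 8 | hdr[7];
	if (rcode == 3) return RESP_NXDOMAIN;
	if (rcode != 0) return RESP_ERROR;
	return ancount ? RESP_ANSWER : RESP_NODATA;
}
```

Under this reading, answering an AAAA query with NXDOMAIN also denies the A record at the same name, which is the concern raised above.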

Thanks,
Florian



* Re: DNS resolver patch
  2018-12-06 18:18       ` Florian Weimer
@ 2018-12-06 18:38         ` A. Wilcox
  2018-12-06 19:46           ` Laurent Bercot
  2018-12-06 20:36           ` Florian Weimer
  0 siblings, 2 replies; 17+ messages in thread
From: A. Wilcox @ 2018-12-06 18:38 UTC (permalink / raw)
  To: musl



On 12/06/18 12:18, Florian Weimer wrote:
> The alternative, using a NOERROR/NODATA response, confuses musl search
> processing.

???

The musl resolver should be able to handle a resolver returning NODATA.
That is popular for having a separate extranet infrastructure - your
extranet DNS only contains records for your local domain and returns
NODATA for requests outside that domain.

If you are correct that such a response "confuses musl search
processing", that's a bug in musl that needs to be fixed.

Best,
--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org



* Re: DNS resolver patch
  2018-12-06 15:48     ` Natanael Copa
  2018-12-06 18:18       ` Florian Weimer
@ 2018-12-06 18:50       ` Tarun Johar
  2018-12-06 19:36         ` Tarun Johar
  1 sibling, 1 reply; 17+ messages in thread
From: Tarun Johar @ 2018-12-06 18:50 UTC (permalink / raw)
  To: Natanael Copa, Florian Weimer; +Cc: musl


Hi Natanael/Florian, 

A couple of solutions for this are stated below. 

The code is in src/VBox/Devices/Network/slirp/hostres.c :517 

    if (   qtype != Type_A
        && qtype != Type_CNAME
        && qtype != Type_PTR
        && qtype != Type_ANY)
    {
        LogErr(("NAT: hostres: unsupported qtype %d\n", qtype));
        return refuse(res, RCode_NotImp);
    }

There are two possible fixes: 

- Add a conditional above this code for Type_AAAA where the resolver returns RCode_NXDomain instead of RCode_NotImp: 

    if (qtype == Type_AAAA) {
        LogErr(("NAT: hostres: cannot resolve qtype %d\n", qtype));
        return refuse(res, RCode_NXDomain);
    }

- Implement IPv6 resolution for AAAA records. The resolve() function at line 574 would need to be updated. 

I just came across the ticket Natanael filed with VirtualBox at https://www.virtualbox.org/ticket/18171 . Since one of us is already talking with them, could you propose the above solutions to them (and add me to the loop as well)?

Thanks, 
Tarun 


* Re: DNS resolver patch
  2018-12-06 18:50       ` Tarun Johar
@ 2018-12-06 19:36         ` Tarun Johar
  0 siblings, 0 replies; 17+ messages in thread
From: Tarun Johar @ 2018-12-06 19:36 UTC (permalink / raw)
  To: Natanael Copa, Florian Weimer; +Cc: musl


I managed to add the proposed fixes to https://www.virtualbox.org/ticket/18171 as a comment. Hopefully, this helps our cause.

Thanks, 
Tarun 


* Re: DNS resolver patch
  2018-12-06 18:38         ` A. Wilcox
@ 2018-12-06 19:46           ` Laurent Bercot
  2018-12-25  2:06             ` Rich Felker
  2018-12-06 20:36           ` Florian Weimer
  1 sibling, 1 reply; 17+ messages in thread
From: Laurent Bercot @ 2018-12-06 19:46 UTC (permalink / raw)
  To: musl

>The musl resolver should be able to handle a resolver returning NODATA.
>That is popular for having a separate extranet infrastructure - your
>extranet DNS only contains records for your local domain and returns
>NODATA for requests outside that domain.

No, you are talking about servers containing data. The musl client
(which is not a resolver, because it only performs recursive queries)
should not contact those directly. It should contact a real resolver,
a.k.a. cache, and the cache will contact the servers containing data.
If the domain has been configured properly, the servers are never asked
for data that are outside that domain.
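
In wire terms, "only performs recursive queries" means the client sets the RD (recursion desired) bit and expects the cache to do the rest. A minimal sketch of such a query header follows; the function names are made up for this example and this is not a copy of any libc's query builder.

```c
#include <stddef.h>
#include <string.h>

/* Writes a 12-byte DNS header for a standard query with RD set.
 * Returns 12 on success, or -1 if the buffer is too small. */
static int make_query_header(unsigned char *buf, size_t buflen, unsigned id)
{
	if (buflen < 12) return -1;
	memset(buf, 0, 12);
	buf[0] = (id >> 8) & 0xff;  /* transaction ID, big-endian */
	buf[1] = id & 0xff;
	buf[2] = 0x01;              /* QR=0, OPCODE=0 (QUERY), RD=1 */
	buf[5] = 1;                 /* QDCOUNT = 1 */
	return 12;
}

/* RD is the lowest bit of header byte 2. */
static int wants_recursion(const unsigned char *hdr)
{
	return hdr[2] & 1;
}
```

A server that only holds data for its own zones will typically not honor RD, which is why pointing a stub client at it directly tends to go wrong.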

It is the single most annoying, most bug-prone, and most confusing
flaw of DNS to have "communication between the DNS client and the DNS
cache" (recursive queries) and "communication between the DNS cache
and the DNS server" (non-recursive queries) happen on the same port.
I'd even take a different _protocol_ if it could stop people from
misconfiguring DNS.

The default usage of BIND, which was "one single daemon is both a
cache and a server and we entertain the confusion", did a lot of harm
to the Internet. As your post illustrates, this harm persists to this
day.

--
Laurent




* Re: DNS resolver patch
  2018-12-06 18:38         ` A. Wilcox
  2018-12-06 19:46           ` Laurent Bercot
@ 2018-12-06 20:36           ` Florian Weimer
  2018-12-06 21:01             ` Rich Felker
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2018-12-06 20:36 UTC (permalink / raw)
  To: A. Wilcox; +Cc: musl

* A. Wilcox:

> On 12/06/18 12:18, Florian Weimer wrote:
>> The alternative, using a NOERROR/NODATA response, confuses musl search
>> processing.
>
> ???
>
> The musl resolver should be able to handle a resolver returning NODATA.
> That is popular for having a separate extranet infrastructure - your
> extranet DNS only contains records for your local domain and returns
> NODATA for requests outside that domain.
>
> If you are correct that such a response "confuses musl search
> processing", that's a bug in musl that needs to be fixed.

<https://www.openwall.com/lists/musl/2018/03/31/2>

I don't know if it was merged.

Florian



* Re: DNS resolver patch
  2018-12-06 20:36           ` Florian Weimer
@ 2018-12-06 21:01             ` Rich Felker
  0 siblings, 0 replies; 17+ messages in thread
From: Rich Felker @ 2018-12-06 21:01 UTC (permalink / raw)
  To: Florian Weimer; +Cc: A. Wilcox, musl

On Thu, Dec 06, 2018 at 09:36:01PM +0100, Florian Weimer wrote:
> * A. Wilcox:
> 
> > On 12/06/18 12:18, Florian Weimer wrote:
> >> The alternative, using a NOERROR/NODATA response, confuses musl search
> >> processing.
> >
> > ???
> >
> > The musl resolver should be able to handle a resolver returning NODATA.
> > That is popular for having a separate extranet infrastructure - your
> > extranet DNS only contains records for your local domain and returns
> > NODATA for requests outside that domain.
> >
> > If you are correct that such a response "confuses musl search
> > processing", that's a bug in musl that needs to be fixed.
> 
> <https://www.openwall.com/lists/musl/2018/03/31/2>
> 
> I don't know if it was merged.

That patch didn't (and fundamentally can't) produce fully-consistent
results (consistent with non-A/AAAA queries). Something similar (with
configurable opt-in) might be sufficient if there were a need, but
supposedly the underlying issue with Cloudflare was fixed. So in
summary, it hasn't been merged.

Rich



* Re: DNS resolver patch
  2018-12-06 19:46           ` Laurent Bercot
@ 2018-12-25  2:06             ` Rich Felker
  2018-12-27 19:18               ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Felker @ 2018-12-25  2:06 UTC (permalink / raw)
  To: musl

On Thu, Dec 06, 2018 at 07:46:02PM +0000, Laurent Bercot wrote:
> >The musl resolver should be able to handle a resolver returning NODATA.
> >That is popular for having a separate extranet infrastructure - your
> >extranet DNS only contains records for your local domain and returns
> >NODATA for requests outside that domain.
> 
> No, you are talking about servers containing data. The musl client
> (which is not a resolver, because it only performs recursive queries)
> should not contact those directly. It should contact a real resolver,
> a.k.a. cache, and the cache will contact the servers containing data.
> If the domain has been configured properly, the servers are never asked
> for data that are outside that domain.
> 
> It is the single most annoying, most bug-prone, and most confusing
> flaw of DNS to have "communication between the DNS client and the DNS
> cache" (recursive queries) and "communication between the DNS cache
> and the DNS server" (non-recursive queries) happen on the same port.
> I'd even take a different _protocol_ if it could stop people from
> misconfiguring DNS.
> 
> The default usage of BIND, which was "one single daemon is both a
> cache and a server and we entertain the confusion", did a lot of harm
> to the Internet. As your post illustrates, this harm pertains to this
> day.

I'm not sure what the relation to the confusion between querying an
authoritative server and a recursive server is here, but the quoted
interpretation of NODATA above is wrong independent of any such
confusion. NODATA does not indicate that the server you asked doesn't
know about the queried name. It indicates that that queried name
exists but has no records of the requested type.

Rich



* Re: DNS resolver patch
  2018-12-25  2:06             ` Rich Felker
@ 2018-12-27 19:18               ` Florian Weimer
  2018-12-28 17:21                 ` Rich Felker
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2018-12-27 19:18 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> On Thu, Dec 06, 2018 at 07:46:02PM +0000, Laurent Bercot wrote:
>> >The musl resolver should be able to handle a resolver returning NODATA.
>> >That is popular for having a separate extranet infrastructure - your
>> >extranet DNS only contains records for your local domain and returns
>> >NODATA for requests outside that domain.
>> 
>> No, you are talking about servers containing data. The musl client
>> (which is not a resolver, because it only performs recursive queries)
>> should not contact those directly. It should contact a real resolver,
>> a.k.a. cache, and the cache will contact the servers containing data.
>> If the domain has been configured properly, the servers are never asked
>> for data that are outside that domain.
>> 
>> It is the single most annoying, most bug-prone, and most confusing
>> flaw of DNS to have "communication between the DNS client and the DNS
>> cache" (recursive queries) and "communication between the DNS cache
>> and the DNS server" (non-recursive queries) happen on the same port.
>> I'd even take a different _protocol_ if it could stop people from
>> misconfiguring DNS.
>> 
>> The default usage of BIND, which was "one single daemon is both a
>> cache and a server and we entertain the confusion", did a lot of harm
>> to the Internet. As your post illustrates, this harm pertains to this
>> day.
>
> I'm not sure what the relation to the confusion between querying an
> authoritative server and a recursive server is here, but the quoted
> interpretation of NODATA above is wrong independent of any such
> confusion. NODATA does not indicate that the server you asked doesn't
> know about the queried name. It indicates that that queried name
> exists but has no records of the requested type.

Maybe a referral looks like a NODATA response upon cursory inspection?

glibc has code which switches to the next configured nameserver upon
encountering what looks like a referral:

		if (anhp->rcode == NOERROR && anhp->ancount == 0
		    && anhp->aa == 0 && anhp->ra == 0 && anhp->arcount == 0) {
			goto next_ns;
		}

(Oops: When EDNS support is enabled, this check is buggy because
anhp->arcount is not necessarily zero due to the OPT record.)
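
For anyone who wants to experiment with that condition outside glibc, here is a sketch of the same test done on the raw 12-byte header instead of the `HEADER` bitfield struct. The function name is made up; the bit positions follow RFC 1035. It reproduces the EDNS problem as well: a reply carrying an OPT record has ARCOUNT of at least 1 and so never matches.

```c
#include <stddef.h>

/* Nonzero if the reply matches the referral-like pattern quoted above:
 * NOERROR, empty answer section, AA=0, RA=0, empty additional section. */
static int looks_like_referral(const unsigned char *hdr, size_t len)
{
	if (len < 12) return 0;
	int rcode   = hdr[3] & 15;
	int aa      = hdr[2] & 0x04;          /* authoritative answer */
	int ra      = hdr[3] & 0x80;          /* recursion available */
	int ancount = hdr[6] << 8 | hdr[7];
	int arcount = hdr[10] << 8 | hdr[11];
	return rcode == 0 && ancount == 0 && !aa && !ra && arcount == 0;
}
```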

REFUSED is handled the same way, so I think this enables the
misconfiguration A. Wilcox described.  Fortunately, we still only
support three name servers, so there is a limit to what people can do
with this.

Curiously this isn't something that was part of the original BIND stub
resolver code.  It's a fairly recent addition to the glibc stub
resolver, dating back to 2005 only.

Recognizing referrals reliably is quite hard; I wouldn't immediately
know how to implement that in a stub resolver.  (Back in 2005,
referrals with a non-empty answer section were still common, I think.)
It's easier in a recursive resolver because you can just follow the
referral (with some safeguards to deal with loops and other
nastiness).  And you can do a lameness check if you know that the
server should be authoritative.



* Re: DNS resolver patch
  2018-12-27 19:18               ` Florian Weimer
@ 2018-12-28 17:21                 ` Rich Felker
  2019-05-30  8:50                   ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Felker @ 2018-12-28 17:21 UTC (permalink / raw)
  To: musl

On Thu, Dec 27, 2018 at 08:18:16PM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Thu, Dec 06, 2018 at 07:46:02PM +0000, Laurent Bercot wrote:
> >> >The musl resolver should be able to handle a resolver returning NODATA.
> >> >That is popular for having a separate extranet infrastructure - your
> >> >extranet DNS only contains records for your local domain and returns
> >> >NODATA for requests outside that domain.
> >> 
> >> No, you are talking about servers containing data. The musl client
> >> (which is not a resolver, because it only performs recursive queries)
> >> should not contact those directly. It should contact a real resolver,
> >> a.k.a. cache, and the cache will contact the servers containing data.
> >> If the domain has been configured properly, the servers are never asked
> >> for data that are outside that domain.
> >> 
> >> It is the single most annoying, most bug-prone, and most confusing
> >> flaw of DNS to have "communication between the DNS client and the DNS
> >> cache" (recursive queries) and "communication between the DNS cache
> >> and the DNS server" (non-recursive queries) happen on the same port.
> >> I'd even take a different _protocol_ if it could stop people from
> >> misconfiguring DNS.
> >> 
> >> The default usage of BIND, which was "one single daemon is both a
> >> cache and a server and we entertain the confusion", did a lot of harm
> >> to the Internet. As your post illustrates, this harm pertains to this
> >> day.
> >
> > I'm not sure what the relation to the confusion between querying an
> > authoritative server and a recursive server is here, but the quoted
> > interpretation of NODATA above is wrong independent of any such
> > confusion. NODATA does not indicate that the server you asked doesn't
> > know about the queried name. It indicates that that queried name
> > exists but has no records of the requested type.
> 
> Maybe a referral looks like a NODATA response upon cursory inspection?
> 
> glibc has code which switches to the next configured nameserver upon
> encountering what looks like a referral:
> 
> 		if (anhp->rcode == NOERROR && anhp->ancount == 0
> 		    && anhp->aa == 0 && anhp->ra == 0 && anhp->arcount == 0) {
> 			goto next_ns;
> 		}

Can you elaborate or provide a citation on how this "looks like a
referral"? I don't see any obvious difference between this and a
nodata response except possibly RA==0, which would only happen when
you have an auth-only nameserver listed in your resolv.conf. This
would not be useful for unioning in musl because it depends on an
ordering between the nameservers rather than providing a true union;
at least one of the servers is going to be recursive and return an
nxdomain or nodata which could be seen before the auth-only local
server responds.

> (Oops: When EDNS support is enabled, this check is buggy because
> anhp->arcount is not necessarily zero due to the OPT record.)

FTR, also not an issue for musl since we intentionally don't do EDNS.

> REFUSED is handled the same way, so I think this enables the
> misconfiguration A. Wilcox described.  Fortunately, we still only
> support three name servers, so there is a limit to what people can do
> with this.

Refused and other errors are pretty much ignored in musl; they're at
least not conclusive unless all nameservers have responded with an
error, since another one could still succeed.
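
As a rough illustration of the above, here is a minimal sketch of that
classification. It is not the actual res_msend.c code; the enum and
function names are hypothetical, and only the handling of the RCODE
(the low four bits of header byte 3) follows what is described in this
thread:

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum reply_class { REPLY_CONCLUSIVE, REPLY_RETRY, REPLY_IGNORE };

/* Classify a raw DNS reply by the RCODE in the low 4 bits of byte 3. */
static enum reply_class classify_reply(const unsigned char *msg, size_t len)
{
	if (len < 4) return REPLY_IGNORE;  /* too short to carry an RCODE */
	switch (msg[3] & 15) {
	case 0:  /* NOERROR: a positive answer or NODATA */
	case 3:  /* NXDOMAIN */
		return REPLY_CONCLUSIVE;
	case 2:  /* SERVFAIL: worth resending the query */
		return REPLY_RETRY;
	default: /* REFUSED, NOTIMP, ...: wait for another server */
		return REPLY_IGNORE;
	}
}
```

A lookup built on this would fail only after every nameserver has either
timed out or produced a reply in the last category.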

Rich


* Re: DNS resolver patch
  2018-12-28 17:21                 ` Rich Felker
@ 2019-05-30  8:50                   ` Florian Weimer
  2019-05-30 13:54                     ` Rich Felker
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-05-30  8:50 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> On Thu, Dec 27, 2018 at 08:18:16PM +0100, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> > On Thu, Dec 06, 2018 at 07:46:02PM +0000, Laurent Bercot wrote:
>> >> >The musl resolver should be able to handle a resolver returning NODATA.
>> >> >That is popular for having a separate extranet infrastructure - your
>> >> >extranet DNS only contains records for your local domain and returns
>> >> >NODATA for requests outside that domain.
>> >> 
>> >> No, you are talking about servers containing data. The musl client
>> >> (which is not a resolver, because it only performs recursive queries)
>> >> should not contact those directly. It should contact a real resolver,
>> >> a.k.a. cache, and the cache will contact the servers containing data.
>> >> If the domain has been configured properly, the servers are never asked
>> >> for data that are outside that domain.
>> >> 
>> >> It is the single most annoying, most bug-prone, and most confusing
>> >> flaw of DNS to have "communication between the DNS client and the DNS
>> >> cache" (recursive queries) and "communication between the DNS cache
>> >> and the DNS server" (non-recursive queries) happen on the same port.
>> >> I'd even take a different _protocol_ if it could stop people from
>> >> misconfiguring DNS.
>> >> 
>> >> The default usage of BIND, which was "one single daemon is both a
>> >> cache and a server and we entertain the confusion", did a lot of harm
>> >> to the Internet. As your post illustrates, this harm persists to this
>> >> day.
>> >
>> > I'm not sure what the relation to the confusion between querying an
>> > authoritative server and a recursive server is here, but the quoted
>> > interpretation of NODATA above is wrong independent of any such
>> > confusion. NODATA does not indicate that the server you asked doesn't
>> > know about the queried name. It indicates that the queried name
>> > exists but has no records of the requested type.
>> 
>> Maybe a referral looks like a NODATA response upon cursory inspection?
>> 
>> glibc has code which switches to the next configured nameserver upon
>> encountering what looks like a referral:
>> 
>> 		if (anhp->rcode == NOERROR && anhp->ancount == 0
>> 		    && anhp->aa == 0 && anhp->ra == 0 && anhp->arcount == 0) {
>> 			goto next_ns;
>> 		}
>
> Can you elaborate or provide a citation on how this "looks like a
> referral"? I don't see any obvious difference between this and a
> nodata response except possibly RA==0, which would only happen when
> you have an auth-only nameserver listed in your resolv.conf.

But that's exactly the scenario when people want to ignore referrals.
A name server which provides recursive service will never send a
referral, after all.  If it cannot complete the recursion, it will
respond with SERVFAIL instead.
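
For anyone who wants to see the same condition in terms of the raw wire
format rather than glibc's header struct, here is a small sketch. The
function name is made up for illustration; the bit positions are those
of the standard 12-byte DNS header (AA is bit 0x04 of byte 2, RA is bit
0x80 of byte 3, RCODE is the low nibble of byte 3, and ANCOUNT and
ARCOUNT are big-endian 16-bit fields at offsets 6 and 10):

```c
#include <stddef.h>

/* Returns 1 if a raw DNS reply matches the quoted glibc condition:
 * NOERROR, no answer records, AA clear, RA clear, no additional records.
 * The function name is hypothetical. */
static int looks_like_referral(const unsigned char *msg, size_t len)
{
	if (len < 12) return 0;                /* DNS header is 12 bytes */
	unsigned rcode   = msg[3] & 0x0f;
	unsigned aa      = msg[2] & 0x04;      /* authoritative answer */
	unsigned ra      = msg[3] & 0x80;      /* recursion available */
	unsigned ancount = (unsigned)msg[6] << 8 | msg[7];
	unsigned arcount = (unsigned)msg[10] << 8 | msg[11];
	return rcode == 0 && ancount == 0 && !aa && !ra && arcount == 0;
}
```

A NODATA reply from a recursive server has RA set and so does not match,
which is the distinction I was getting at.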

> This would not be useful for unioning in musl because it depends on
> an ordering between the nameservers rather than providing a true
> union; at least one of the servers is going to be recursive and
> return an nxdomain or nodata which could be seen before the
> auth-only local server responds.

I expect that the authoritative-only server is put first in this case.

My position is that this is not really worth supporting, though.  It's
easy enough to run a local caching resolver which can implement such
policies, including forwarding queries for certain zones to certain
authoritative servers.  Then there's no need to resort to search path
hacks and listing non-recursive name servers in /etc/resolv.conf.


* Re: DNS resolver patch
  2019-05-30  8:50                   ` Florian Weimer
@ 2019-05-30 13:54                     ` Rich Felker
  0 siblings, 0 replies; 17+ messages in thread
From: Rich Felker @ 2019-05-30 13:54 UTC (permalink / raw)
  To: Florian Weimer; +Cc: musl

On Thu, May 30, 2019 at 10:50:22AM +0200, Florian Weimer wrote:
> * Rich Felker:
> > On Thu, Dec 27, 2018 at 08:18:16PM +0100, Florian Weimer wrote:
> >> glibc has code which switches to the next configured nameserver upon
> >> encountering what looks like a referral:
> >> 
> >> 		if (anhp->rcode == NOERROR && anhp->ancount == 0
> >> 		    && anhp->aa == 0 && anhp->ra == 0 && anhp->arcount == 0) {
> >> 			goto next_ns;
> >> 		}
> >
> > Can you elaborate or provide a citation on how this "looks like a
> > referral"? I don't see any obvious difference between this and a
> > nodata response except possibly RA==0, which would only happen when
> > you have an auth-only nameserver listed in your resolv.conf.
> 
> But that's exactly the scenario when people want to ignore referrals.
> A name server which provides recursive service will never send a
> referral, after all.  If it cannot complete the recursion, it will
> respond with SERVFAIL instead.

Yes, I just wasn't clear how to interpret that combination of bits. I
should re-read the spec I guess.

> > This would not be useful for unioning in musl because it depends on
> > an ordering between the nameservers rather than providing a true
> > union; at least one of the servers is going to be recursive and
> > return an nxdomain or nodata which could be seen before the
> > auth-only local server responds.
> 
> I expect that the authoritative-only server is put first in this case.

Note that musl does not have a concept of an order between the
nameservers; they're just treated as alternative sources for the same
data.
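
To make the consequence concrete, here is a simplified sketch (not
musl's actual code; the struct and function names are hypothetical).
Replies are examined in the order they arrive, and the position of a
server in resolv.conf plays no role in which one is accepted:

```c
#include <stddef.h>

/* Hypothetical record of one received reply, in arrival order. */
struct reply {
	int ns_index;          /* position of the server in resolv.conf */
	unsigned char rcode;   /* RCODE taken from the reply header */
};

/* Return the arrival position of the first NOERROR or NXDOMAIN reply,
 * or -1 if none of the replies is conclusive. ns_index is never
 * consulted. */
static int first_conclusive(const struct reply *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].rcode == 0 || r[i].rcode == 3)
			return (int)i;
	return -1;
}
```

So if a recursive server listed second answers NXDOMAIN before the
auth-only server listed first has replied, the NXDOMAIN is what the
caller gets.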

> My position is that this is not really worth supporting, though.  It's
> easy enough to run a local caching resolver which can implement such
> policies, including forwarding queries for certain zones to certain
> authoritative servers.  Then there's no need to resort to search path
> hacks and listing non-recursive name servers in /etc/resolv.conf.

I agree completely. Fancy policy things like unioning and remapping
are best done in an external process. Moreover, with DNS privacy and
integrity becoming a critical issue (in the future, or perhaps already
in the present), it's going to be mandatory to run a nameserver (a
proxy, at least) on localhost anyway to perform DNSSEC validation
and/or DNS-over-HTTPS.

Rich

