mailing list of musl libc
* [musl] Re: Outgoing DANE not working
       [not found]                 ` <CECAFB36-DA1B-4EFB-ACD1-294E3B121B2E@dukhovni.org>
@ 2020-04-13 18:35                   ` Rich Felker
       [not found]                     ` <20200413190412.GF41308@straasha.imrryr.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-04-13 18:35 UTC
  To: Postfix users; +Cc: musl

On Mon, Apr 13, 2020 at 02:15:14PM -0400, Viktor Dukhovni wrote:
> > On Apr 13, 2020, at 7:18 AM, Christian <list-christian@web.de> wrote:
> > 
> > FYI: I put your findings forward to the musl-libc mailing list and
> > asked what they now think should be done.
> 
> The problem can be partly resolved by setting the "AD" bit in the
> outgoing DNS query header sent by the musl-libc stub resolver.  Then
> the local iterative resolver will return the AD bit in its response.
> 
> However, lack of support for retrying truncated responses over TCP
> or support for disabling RES_DEFNAMES and RES_DNSRCH remain as issues.
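
As a minimal sketch of what "setting the AD bit in the outgoing query
header" means on the wire, the following builds an RFC 1035 query with
the AD flag set. The helper names (encode_name, build_query) and the
example name are hypothetical and only illustrate the header layout;
this is not musl's code.

```python
import struct

AD_FLAG = 0x0020  # "authentic data" bit in the 16-bit header flags word
RD_FLAG = 0x0100  # "recursion desired" bit

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.split("."):
        if label:  # skip the empty label produced by a trailing dot
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name, qtype, query_id, set_ad=False):
    """Build a one-question query; set_ad asks the resolver to report AD back."""
    flags = RD_FLAG | (AD_FLAG if set_ad else 0)
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    return header + question

# 52 is the TLSA record type used by DANE; the name is a placeholder.
query = build_query("example.org", 52, 0x1234, set_ad=True)
```

Whether the resolver actually echoes AD in its reply depends on the
resolver performing validation; the sketch only covers the query side.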

This has also been discussed some on the musl list already
(https://www.openwall.com/lists/musl/2020/04/13/1) but I'm replying
into this thread as well because I'd like to come to some mutually
acceptable solution.

musl's stub resolver intentionally speaks only RFC 1035 UDP, and the
intent has always been that DNSSEC validation and policy be the
responsibility of the nameserver running on localhost, not the stub
resolver or the calling application. The resolver is intentionally
stateless. It was probably a mistake to provide the fake _res
definition, and I'm interested in resolving that mistake either by
removing it or adding res_n* APIs that honor (parts of) it at some
point, but determining the right action here and coordinating with
distros to ensure they have fixes in place for anything that breaks
will take a while.

RES_DEFNAMES and RES_DNSRCH are irrelevant as search is never
performed by the res_* interfaces, and domain/search keywords are used
only by the high-level ones (getaddrinfo/getnameinfo and the old
legacy gethostby*).

What is relevant, as far as I can tell, is that Postfix wants a way to
perform an EDNS0 query that lets it distinguish between a valid signed
result and a valid unsigned result. This is currently not possible,
but would be practical to add based on "options edns0" in resolv.conf.
I'm not sure if or how soon that will happen, but determining that is
something I'd like to have come out of this discussion.
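
A rough sketch of the kind of EDNS0 query described above: an OPT
pseudo-record with the DO ("DNSSEC OK") bit is appended to a plain
query, and the AD bit of the response header is then what separates a
validated signed result from an unsigned one. The function names and
the payload size of 4096 are assumptions made for illustration, and
the response bytes in the usage lines are hand-built headers, not real
server output.

```python
import struct

AD_FLAG = 0x0020  # "authentic data" bit in the header flags word
DO_FLAG = 0x8000  # "DNSSEC OK" bit in the low 16 bits of the OPT TTL field
OPT_TYPE = 41     # record type of the EDNS0 OPT pseudo-RR

def build_opt_record(udp_payload_size, dnssec_ok):
    """Build an OPT pseudo-RR: root name, TYPE, CLASS=size, TTL=flags, RDLEN=0."""
    ttl = DO_FLAG if dnssec_ok else 0  # extended RCODE 0, version 0
    return b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_payload_size, ttl, 0)

def add_opt(query, opt):
    """Append the OPT record and increment ARCOUNT (header bytes 10..11)."""
    (arcount,) = struct.unpack("!H", query[10:12])
    return query[:10] + struct.pack("!H", arcount + 1) + query[12:] + opt

def is_authenticated(response):
    """True if the response header has the AD bit set."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & AD_FLAG)

# A plain A query for a placeholder name, then the same query with EDNS0 + DO.
plain = (struct.pack("!HHHHHH", 1, 0x0100, 1, 0, 0, 0)
         + b"\x07example\x03org\x00" + struct.pack("!HH", 1, 1))
edns_query = add_opt(plain, build_opt_record(4096, dnssec_ok=True))
```

This only shows the wire format; it says nothing about how an
"options edns0" setting would be plumbed through the resolver.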

From my perspective, what would work best with what's always been the
intended DNSSEC usage model of musl would be if Postfix supported use
of DANE with smtp_dns_support_level=enabled, i.e. outsourcing all
DNSSEC functionality to the nameserver.

Rich

* [musl] Re: Outgoing DANE not working
       [not found]                                     ` <878siucvqd.fsf@mid.deneb.enyo.de>
@ 2020-04-17 16:07                                       ` Rich Felker
  2020-04-18 17:14                                         ` [musl] TCP support in the stub resolver (was: Re: Outgoing DANE not working) Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-04-17 16:07 UTC
  To: Florian Weimer; +Cc: musl

On Fri, Apr 17, 2020 at 11:22:34AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Wed, Apr 15, 2020 at 08:27:08PM +0200, Florian Weimer wrote:
> >> >> I don't understand your PTR example.  It seems such a fringe case that
> >> >> people produce larger PTR responses because they add all virtual hosts
> >> >> to the reverse DNS zone.  Sure, it happens, but not often.
> >> >
> >> > I think it's probably more a matter of the concurrent lookups from
> >> > multiple nameservers (e.g. local, ISP, and G/CF, where local has
> >> > fastest round-trip but not much in cache, G/CF has nearly everything
> >> > in cache but slowest round trip, and ISP is middle on both) than lack
> >> > of tcp fallback that makes netstat etc. so much faster.
> >> 
> >> The question is: Why would you get a TC bit response?  Is the musl
> >> resolver code triggering some anti-spoofing measure that tries to
> >> validate source addresses over TCP?  (I forgot about this aspect of
> >> DNS.  Ugh.)
> >
> > TC bit is for truncation, and means that the complete response would
> > have been larger than 512 bytes and was truncated to whatever number
> > of whole RRs fit in 512 bytes.
> 
> You mentioned that TC processing added observable latency to the
> netstat tool.  netstat performs PTR queries.  Non-DNSSEC responses to
> PTR queries are rarely larger than 512 bytes.  (The only exception I
> have seen occur when people list all their HTTP virtual hosts in PTR
> records, but again, that's very rare.)  Typically, they are less than
> 150 bytes.  Non-minimal responses can be larger, but the additional
> data is removed without setting the TC bit.
> 
> This is why something very odd must have happened during your test.
> One explanation would be a middlebox that injects TC queries to
> validate source addresses.

I think this was just a misunderstanding. What I said was that things
like netstat run a lot faster in practice with musl than with other
resolvers in a range of typical setups, and the two potential factors
are concurrent requests to multiple nameservers and non-fallback to
TCP, and that I didn't have evidence for which it was. It sounds like
you have a good hypothesis that TCP is not the cause here.

> >> > However it's not clear how "fallback to tcp" logic should interact
> >> > with such concurrent requests -- switch to tcp for everything and
> >> > just one nameserver as soon as we get any TC response?
> >> 
> >> It's TCP for this query only, not all subsequent queries.  It makes
> >> sense to query the name server that provided the TC response: It
> >> reduces latency because that server is more likely to have the large
> >> response in its cache.
> >
> > I'm not talking about future queries but other unfinished queries that
> > are part of the same operation (presently just concurrent A and AAAA
> > lookups).
> 
> If the second response has TC set (but not the first), you can keep
> the first response.  Re-querying both over TCP increases the
> likelihood that you get a response from the same cluster node (so more
> consistency), but you won't get that over UDP, ever, so I don't think
> it matters.
> 
> If the first response has TC set, you have an open TCP connection you
> could use for the second query as well.  Pipelining of DNS requests
> has compatibility issues because there is no application-layer
> connection teardown (an equivalent to HTTP's Connection: close).  If
> the server closes the connection after sending the response to the
> first query, without reading the second, this is a TCP data loss
> event, which results in an RST segment and potentially, loss of the
> response to the first query.  Ideally, a client would wait for the
> second UDP response and the TCP response to arrive.  If the second UDP
> response is TC as well, the TCP query should be delayed until the
> first TCP response came back.
> 
> (We should move this discussion someplace else.)

Yes. I just took postfix-users off the CC and added musl. Discussing
it further on postfix-users does not seem constructive as the
arguments are mostly ideological (about what the roles of different
components "should be") vs practical (can we reasonably improve
behavior here?).

Indeed it sounds like one TCP connection would be needed per request,
so switchover would just be per-request if done.
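
A per-request switchover could be sketched roughly as follows: check
the TC bit of each UDP response and retry only the truncated ones over
TCP, keeping any complete response as-is. The function names and the
dict-of-responses shape are hypothetical; the sample responses are
bare 12-byte headers built by hand.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit header flags word

def is_truncated(response):
    """True if the response header has the TC bit set."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)

def needs_tcp_retry(responses):
    """Given {qtype: udp_response_bytes}, return the qtypes to retry over TCP.

    A complete response is kept even if the other one was truncated.
    """
    return sorted(q for q, r in responses.items() if is_truncated(r))

# Concurrent A (1) and AAAA (28) lookups where only the AAAA reply had TC set.
a_reply = b"\x00\x01\x81\x80" + b"\x00" * 8      # flags: qr rd ra
aaaa_reply = b"\x00\x02\x83\x80" + b"\x00" * 8   # flags: qr tc rd ra
retry = needs_tcp_retry({1: a_reply, 28: aaaa_reply})
```

Here retry contains only the AAAA query type, so the A result would be
used unchanged.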

My leaning is probably not to do fallback at all (complex logic,
potential for unexpected slowness, not needed by vast majority of
users) and just add TCP support with option use-vc for users who
really want complete replies. All of this would be contingent anyway
on making internal mechanisms able to handle variable result size
rather than fixed-size 512 bytes so it's not happening right away.
Doing it carelessly would create possibly dangerous bugs.

I'm still also somewhat of the opinion that users who want a resolver
library (res_* API) with lots of features should just link BIND's, but
it would be nice not to have to do that.

Rich

* [musl] TCP support in the stub resolver (was: Re: Outgoing DANE not working)
  2020-04-17 16:07                                       ` Rich Felker
@ 2020-04-18 17:14                                         ` Florian Weimer
  2020-04-19  0:03                                           ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2020-04-18 17:14 UTC
  To: Rich Felker; +Cc: musl

* Rich Felker:

> On Fri, Apr 17, 2020 at 11:22:34AM +0200, Florian Weimer wrote:
>> >> > However it's not clear how "fallback to tcp" logic should interact
>> >> > with such concurrent requests -- switch to tcp for everything and
>> >> > just one nameserver as soon as we get any TC response?
>> >> 
>> >> It's TCP for this query only, not all subsequent queries.  It makes
>> >> sense to query the name server that provided the TC response: It
>> >> reduces latency because that server is more likely to have the large
>> >> response in its cache.
>> >
>> > I'm not talking about future queries but other unfinished queries that
>> > are part of the same operation (presently just concurrent A and AAAA
>> > lookups).
>> 
>> If the second response has TC set (but not the first), you can keep
>> the first response.  Re-querying both over TCP increases the
>> likelihood that you get a response from the same cluster node (so more
>> consistency), but you won't get that over UDP, ever, so I don't think
>> it matters.
>> 
>> If the first response has TC set, you have an open TCP connection you
>> could use for the second query as well.  Pipelining of DNS requests
>> has compatibility issues because there is no application-layer
>> connection teardown (an equivalent to HTTP's Connection: close).  If
>> the server closes the connection after sending the response to the
>> first query, without reading the second, this is a TCP data loss
>> event, which results in an RST segment and potentially, loss of the
>> response to the first query.  Ideally, a client would wait for the
>> second UDP response and the TCP response to arrive.  If the second UDP
>> response is TC as well, the TCP query should be delayed until the
>> first TCP response came back.

> Indeed it sounds like one TCP connection would be needed per request,
> so switchover would just be per-request if done.

No, you can reuse the connection for the second query (in most cases).
However, for maximum robustness, you should not send the second query
until the first response has arrived (no pipelining).  You may still
need a new connection for the second query if the TCP stream ends
without a response, though.
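
The reuse-without-pipelining approach can be sketched like this. DNS
over TCP prefixes each message with a 2-byte length (RFC 1035, section
4.2.2). The send and recv callables stand in for a connected socket's
sendall and recv methods, and all function names are hypothetical.

```python
import struct

def frame(message):
    """Prefix a DNS message with its 2-byte big-endian length."""
    return struct.pack("!H", len(message)) + message

def read_exact(recv, n):
    """Read exactly n bytes via recv(max_bytes); None if the stream ends early."""
    buf = b""
    while len(buf) < n:
        chunk = recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def read_response(recv):
    """Read one length-prefixed response, or None if the stream ended."""
    head = read_exact(recv, 2)
    if head is None:
        return None
    (length,) = struct.unpack("!H", head)
    return read_exact(recv, length)

def query_sequentially(send, recv, queries):
    """Send each query only after the previous response has arrived.

    Returns (responses, unanswered). Anything in unanswered was cut off
    by the stream ending and would need a new connection.
    """
    responses = []
    for i, query in enumerate(queries):
        send(frame(query))
        response = read_response(recv)
        if response is None:
            return responses, list(queries[i:])
        responses.append(response)
    return responses, []
```

With a real socket this would be called as
query_sequentially(sock.sendall, sock.recv, [q1, q2]); timeouts and
error handling are left out of the sketch.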

> My leaning is probably not to do fallback at all (complex logic,
> potential for unexpected slowness, not needed by vast majority of
> users) and just add TCP support with option use-vc for users who
> really want complete replies. All of this would be contingent anyway
> on making internal mechanisms able to handle variable result size
> rather than fixed-size 512 bytes so it's not happening right away.
> Doing it carelessly would create possibly dangerous bugs.

I still think it's wrong.  The protocol says that you must perform TCP
fallback.  If you don't, it's rather confusing for the libresolv
interfaces.

> I'm still also somewhat of the opinion that users who want a resolver
> library (res_* API) with lots of features should just link BIND's, but
> it would be nice not to have to do that.

You could drop the res_* interfaces from musl.  They are mostly needed
for non-address queries, and those are the ones that tend to be larger
than 512 bytes.

Then it might be possible that no one will notice the missing TCP
fallback.

* Re: [musl] TCP support in the stub resolver (was: Re: Outgoing DANE not working)
  2020-04-18 17:14                                         ` [musl] TCP support in the stub resolver (was: Re: Outgoing DANE not working) Florian Weimer
@ 2020-04-19  0:03                                           ` Rich Felker
  2020-04-19  8:12                                             ` [musl] TCP support in the stub resolver Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-04-19  0:03 UTC
  To: Florian Weimer; +Cc: musl

On Sat, Apr 18, 2020 at 07:14:24PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Fri, Apr 17, 2020 at 11:22:34AM +0200, Florian Weimer wrote:
> >> >> > However it's not clear how "fallback to tcp" logic should interact
> >> >> > with such concurrent requests -- switch to tcp for everything and
> >> >> > just one nameserver as soon as we get any TC response?
> >> >> 
> >> >> It's TCP for this query only, not all subsequent queries.  It makes
> >> >> sense to query the name server that provided the TC response: It
> >> >> reduces latency because that server is more likely to have the large
> >> >> response in its cache.
> >> >
> >> > I'm not talking about future queries but other unfinished queries that
> >> > are part of the same operation (presently just concurrent A and AAAA
> >> > lookups).
> >> 
> >> If the second response has TC set (but not the first), you can keep
> >> the first response.  Re-querying both over TCP increases the
> >> likelihood that you get a response from the same cluster node (so more
> >> consistency), but you won't get that over UDP, ever, so I don't think
> >> it matters.
> >> 
> >> If the first response has TC set, you have an open TCP connection you
> >> could use for the second query as well.  Pipelining of DNS requests
> >> has compatibility issues because there is no application-layer
> >> connection teardown (an equivalent to HTTP's Connection: close).  If
> >> the server closes the connection after sending the response to the
> >> first query, without reading the second, this is a TCP data loss
> >> event, which results in an RST segment and potentially, loss of the
> >> response to the first query.  Ideally, a client would wait for the
> >> second UDP response and the TCP response to arrive.  If the second UDP
> >> response is TC as well, the TCP query should be delayed until the
> >> first TCP response came back.
> 
> > Indeed it sounds like one TCP connection would be needed per request,
> > so switchover would just be per-request if done.
> 
> No, you can reuse the connection for the second query (in most cases).
> However, for maximum robustness, you should not send the second query
> until the first response has arrived (no pipelining).  You may still
> need a new connection for the second query if the TCP stream ends
> without a response, though.

That's why you need one per request -- so you can make them
concurrently (can't assume pipelining).

> > My leaning is probably not to do fallback at all (complex logic,
> > potential for unexpected slowness, not needed by vast majority of
> > users) and just add TCP support with option use-vc for users who
> > really want complete replies. All of this would be contingent anyway
> > on making internal mechanisms able to handle variable result size
> > rather than fixed-size 512 bytes so it's not happening right away.
> > Doing it carelessly would create possibly dangerous bugs.
> 
> I still think it's wrong.  The protocol says that you must perform TCP
> fallback.  If you don't, it's rather confusing for the libresolv
> interfaces.

There's a clause I'd have to look up again, but that explicitly says
(roughly, I'm paraphrasing this from memory) you have the option not
to in settings where it wouldn't be appropriate to do so or where
you're happy with the truncated responses. The reason my leaning is to
make it require explicit configuration to use TCP is that the vast
majority of musl users seem happy with what it's doing now, which
*was* intentional; it'd be nice not to change that without explicit
user intent to do so. Also, making TCP available only in TCP-only
(use-vc) mode would perform badly with remote nameservers, which would
strongly encourage users who want large responses (which are almost
certainly things that do need DNSSEC validation) to set up a proper
local validating nameserver.

Of course all of this has prerequisite core changes that'd need to be
made before it could be done, so nothing's going to happen either way
in the short term.

> > I'm still also somewhat of the opinion that users who want a resolver
> > library (res_* API) with lots of features should just link BIND's, but
> > it would be nice not to have to do that.
> 
> You could drop the res_* interfaces from musl.  They are mostly needed
> for non-address queries, and those are the ones that tend to be larger
> than 512 bytes.

They're sufficient for pretty much everything that actually matters,
and very convenient to have. Removing them seems like it has no
advantages. If someone *really* wants more functionality they can link
BIND's libresolv, or we can evaluate adding the functionality they're
missing.

> Then it might be possible that no one will notice the missing TCP
> fallback.

Really almost no one has noticed it so far, and the places where it
was noticed were buggy (IIRC Google or Cloudflare) nameservers that
were sending an empty response on truncation rather than a properly
truncated response, which seems to have since been fixed. (And in this
case the fallback would have been a major performance hit, so it was
nice that it was caught and fixed instead).

Rich

* Re: [musl] TCP support in the stub resolver
  2020-04-19  0:03                                           ` Rich Felker
@ 2020-04-19  8:12                                             ` Florian Weimer
  2020-04-20  1:24                                               ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2020-04-19  8:12 UTC
  To: Rich Felker; +Cc: musl

* Rich Felker:

>> No, you can reuse the connection for the second query (in most cases).
>> However, for maximum robustness, you should not send the second query
>> until the first response has arrived (no pipelining).  You may still
>> need a new connection for the second query if the TCP stream ends
>> without a response, though.
>
> That's why you need one per request -- so you can make them
> concurrently (can't assume pipelining).

Since the other query has likely already been cached in the recursive
resolver due to the UDP query (which is already in progress), the
second TCP query only saves one round-trip, I think.  Is that really
worth it?

>> Then it might be possible that no one will notice the missing TCP
>> fallback.
>
> Really almost no one has noticed it so far, and the places where it
> was noticed were buggy (IIRC Google or Cloudflare) nameservers that
> were sending an empty response on truncation rather than a properly
> truncated response, which seems to have since been fixed. (And in this
> case the fallback would have been a major performance hit, so it was
> nice that it was caught and fixed instead).

SPF lookups for various domains return other TXT records, which push
the size of the response over the limit.  There is no way to fix this
on the recursive resolver side because the TXT RRset is itself larger
than 512 bytes.

TXT RRsets for DKIM can also approach the limit, but I have not seen
them cross it.

This is just one application, receiving mail with some form of
authentication, that requires TCP fallback.  I'm sure there are other
applications.

* Re: [musl] TCP support in the stub resolver
  2020-04-19  8:12                                             ` [musl] TCP support in the stub resolver Florian Weimer
@ 2020-04-20  1:24                                               ` Rich Felker
  2020-04-20  6:26                                                 ` Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-04-20  1:24 UTC
  To: Florian Weimer; +Cc: musl

On Sun, Apr 19, 2020 at 10:12:56AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> >> No, you can reuse the connection for the second query (in most cases).
> >> However, for maximum robustness, you should not send the second query
> >> until the first response has arrived (no pipelining).  You may still
> >> need a new connection for the second query if the TCP stream ends
> >> without a response, though.
> >
> > That's why you need one per request -- so you can make them
> > concurrently (can't assume pipelining).
> 
> Since the other query has likely already been cached in the recursive
> resolver due to the UDP query (which is already in progress), the
> second TCP query only saves one round-trip, I think.  Is that really
> worth it?

If the nameserver is not local, absolutely. A round trip can be over
500 ms.

> >> Then it might be possible that no one will notice the missing TCP
> >> fallback.
> >
> > Really almost no one has noticed it so far, and the places where it
> > was noticed were buggy (IIRC Google or Cloudflare) nameservers that
> > were sending an empty response on truncation rather than a properly
> > truncated response, which seems to have since been fixed. (And in this
> > case the fallback would have been a major performance hit, so it was
> > nice that it was caught and fixed instead).
> 
> SPF lookups for various domains return other TXT records, which push
> the size of the response over the limit.  There is no way to fix this
> on the recursive resolver side because the TXT RRset is itself larger
> than 512 bytes.
> 
> TXT RRsets for DKIM can also approach the limit, but I have not seen
> them cross it.
> 
> This is just one application, receiving mail with some form of
> authentication, that requires TCP fallback.  I'm sure there are other
> applications.

Yes. I don't claim there aren't potential cases where it's wanted,
just that it hasn't come up aside from the buggy NS with empty TC
response.

Anything related to mail is a case where you really really should be
running a local DNSSEC-validating nameserver, which adds to the appeal
of just doing TCP to begin with (activated by use-vc) or not at all.
But I think before making any decisions I should get started on the
prereq work to make it even possible. The core is currently built
around an array of up to 2 (for A and AAAA at the same time) 512-byte
buffers that the response packets go into. That's changeable without
too much work but requires some care since this is attack surface code.

Rich

* Re: [musl] TCP support in the stub resolver
  2020-04-20  1:24                                               ` Rich Felker
@ 2020-04-20  6:26                                                 ` Florian Weimer
  2020-04-20 17:39                                                   ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2020-04-20  6:26 UTC
  To: Rich Felker; +Cc: musl

* Rich Felker:

> On Sun, Apr 19, 2020 at 10:12:56AM +0200, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> >> No, you can reuse the connection for the second query (in most cases).
>> >> However, for maximum robustness, you should not send the second query
>> >> until the first response has arrived (no pipelining).  You may still
>> >> need a new connection for the second query if the TCP stream ends
>> >> without a response, though.
>> >
>> > That's why you need one per request -- so you can make them
>> > concurrently (can't assume pipelining).
>> 
>> Since the other query has likely already been cached in the recursive
>> resolver due to the UDP query (which is already in progress), the
>> second TCP query only saves one round-trip, I think.  Is that really
>> worth it?
>
> If the nameserver is not local, absolutely. A round trip can be over
> 500 ms.

Sure, but you have to put this into context. In this situation, you
already need three roundtrips (UDP query, TCP handshake, TCP query).
The other TCP handshake increases the packet count quite noticeably.

>> >> Then it might be possible that no one will notice the missing TCP
>> >> fallback.
>> >
>> > Really almost no one has noticed it so far, and the places where it
>> > was noticed were buggy (IIRC Google or Cloudflare) nameservers that
>> > were sending an empty response on truncation rather than a properly
>> > truncated response, which seems to have since been fixed. (And in this
>> > case the fallback would have been a major performance hit, so it was
>> > nice that it was caught and fixed instead).
>> 
>> SPF lookups for various domains return other TXT records, which push
>> the size of the response over the limit.  There is no way to fix this
>> on the recursive resolver side because the TXT RRset is itself larger
>> than 512 bytes.
>> 
>> TXT RRsets for DKIM can also approach the limit, but I have not seen
>> them cross it.
>> 
>> This is just one application, receiving mail with some form of
>> authentication, that requires TCP fallback.  I'm sure there are other
>> applications.
>
> Yes. I don't claim there aren't potential cases where it's wanted,
> just that it hasn't come up aside from the buggy NS with empty TC
> response.

I don't quite understand why you keep claiming that.

For this TXT response, it's not a bug:

; <<>> DiG 9.11.5-P4-5.1-Debian <<>> +ignore +noedns ebay.com txt
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43378
;; flags: qr tc rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ebay.com.			IN	TXT

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Apr 20 07:57:10 CEST 2020
;; MSG SIZE  rcvd: 26

Lack of users reporting this could just mean that there are no users
running mail servers that use SPF authentication with musl.
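
The "MSG SIZE  rcvd: 26" in the dig output above is exactly a DNS
header plus the question, with no records at all, which a short
calculation confirms. The helper name question_size is hypothetical.

```python
HEADER_SIZE = 12  # fixed DNS header: ID, flags and four 16-bit counts

def question_size(name):
    """Size of one question entry: encoded name + QTYPE (2) + QCLASS (2)."""
    labels = [label for label in name.split(".") if label]
    # each label costs one length byte plus its characters; +1 for the root
    encoded_name = sum(1 + len(label) for label in labels) + 1
    return encoded_name + 4

total = HEADER_SIZE + question_size("ebay.com.")
print(total)  # 26
```

So the truncated UDP reply carries nothing but the echoed question,
and the TXT RRset itself is only obtainable over TCP or with a larger
EDNS buffer.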

> Anything related to mail is a case where you really really should be
> running a local DNSSEC-validating nameserver, which adds to the appeal
> of just doing TCP to begin with (activated by use-vc) or not at all.

Always using TCP for what is essentially a fringe case (but
unfortunately one that is needed for correctness) seems very wasteful.

With a local DNS server, EDNS with really large buffer size seems much
more attractive.  But for maximum compatibility, you will have to
rewrite the response to strip out the OPT record.

* Re: [musl] TCP support in the stub resolver
  2020-04-20  6:26                                                 ` Florian Weimer
@ 2020-04-20 17:39                                                   ` Rich Felker
  2020-04-21  9:48                                                     ` Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-04-20 17:39 UTC
  To: Florian Weimer; +Cc: musl

On Mon, Apr 20, 2020 at 08:26:45AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Sun, Apr 19, 2020 at 10:12:56AM +0200, Florian Weimer wrote:
> >> * Rich Felker:
> >> 
> >> >> No, you can reuse the connection for the second query (in most cases).
> >> >> However, for maximum robustness, you should not send the second query
> >> >> until the first response has arrived (no pipelining).  You may still
> >> >> need a new connection for the second query if the TCP stream ends
> >> >> without a response, though.
> >> >
> >> > That's why you need one per request -- so you can make them
> >> > concurrently (can't assume pipelining).
> >> 
> >> Since the other query has likely already been cached in the recursive
> >> resolver due to the UDP query (which is already in progress), the
> >> second TCP query only saves one round-trip, I think.  Is that really
> >> worth it?
> >
> > If the nameserver is not local, absolutely. A round trip can be over
> > 500 ms.
> 
> Sure, but you have to put this into context. In this situation, you
> already need three roundtrips (UDP query, TCP handshake, TCP query).
> The other TCP handshake increases the packet count quite noticeably.

Yes, but they happen concurrently. Of course it's possible that you
have bandwidth so low that latency is affected by throughput, but I
think the far more common case nowadays is a moderately fast connection
(3G cellular, possibly rate-limited, or DSL) but saturated with other
much-higher-volume traffic causing the latency.

BTW, am I mistaken or can TCP fastopen make it so you can get a DNS
reply with no additional round-trips? (query in the payload with
fastopen, response sent immediately after SYN-ACK before receiving ACK
from client, and nobody has to wait for connection to be closed) Of
course there are problems with fastopen that lead to it often being
disabled so it's not a full substitute for UDP.
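
To put rough numbers on the round-trip counts being compared, here is
a simplified model in which latency is purely round-trip time times
the number of sequential round trips, for the case where both UDP
replies come back truncated. The constant names and the 500 ms figure
(taken from earlier in the thread) are illustrative, and the fastopen
line assumes it works as hoped in the question above.

```python
def total_latency_ms(rtt_ms, round_trips):
    """Latency under the assumption that only sequential round trips matter."""
    return rtt_ms * round_trips

# Sequential round trips, counting the initial UDP exchange as one.
SEQUENTIAL_REUSE = 1 + 1 + 2  # UDP, one handshake, two queries back to back
CONCURRENT_CONNS = 1 + 1 + 1  # UDP, parallel handshakes, parallel queries
WITH_FASTOPEN = 1 + 1         # UDP, then the query carried in the SYN

RTT_MS = 500
for label, trips in [("one connection, no pipelining", SEQUENTIAL_REUSE),
                     ("one connection per query", CONCURRENT_CONNS),
                     ("fastopen, if available", WITH_FASTOPEN)]:
    print(label, total_latency_ms(RTT_MS, trips), "ms")
```

The model ignores packet loss, bandwidth limits and resolver
processing time, so it only shows the one-round-trip difference
between the first two options.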

> >> >> Then it might be possible that no one will notice the missing TCP
> >> >> fallback.
> >> >
> >> > Really almost no one has noticed it so far, and the places where it
> >> > was noticed were buggy (IIRC Google or Cloudflare) nameservers that
> >> > were sending an empty response on truncation rather than a properly
> >> > truncated response, which seems to have since been fixed. (And in this
> >> > case the fallback would have been a major performance hit, so it was
> >> > nice that it was caught and fixed instead).
> >> 
> >> SPF lookups for various domains return other TXT records, which push
> >> the size of the response over the limit.  There is no way to fix this
> >> on the recursive resolver side because the TXT RRset is itself larger
> >> than 512 bytes.
> >> 
> >> TXT RRsets for DKIM can also approach the limit, but I have not seen
> >> them cross it.
> >> 
> >> This is just one application, receiving mail with some form of
> >> authentication, that requires TCP fallback.  I'm sure there are other
> >> applications.
> >
> > Yes. I don't claim there aren't potential cases where it's wanted,
> > just that it hasn't come up aside from the buggy NS with empty TC
> > response.
> 
> I don't quite understand why you keep claiming that.
> 
> For this TXT response, it's not a bug:
> 
> ; <<>> DiG 9.11.5-P4-5.1-Debian <<>> +ignore +noedns ebay.com txt
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43378
> ;; flags: qr tc rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> 
> ;; QUESTION SECTION:
> ;ebay.com.			IN	TXT
> 
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Mon Apr 20 07:57:10 CEST 2020
> ;; MSG SIZE  rcvd: 26
> 
> Lack of users reporting this could just mean that there are no users
> running mail servers that use SPF authentication with musl.

I agree that's probably what it means. I'm just saying the only user
reports I'm aware of were the empty TC responses.

> > Anything related to mail is a case where you really really should be
> > running a local DNSSEC-validating nameserver, which adds to the appeal
> > of just doing TCP to begin with (activated by use-vc) or not at all.
> 
> Always using TCP for what is essentially a fringe case (but
> unfortunately one that is needed for correctness) seems very wasteful.
> 
> With a local DNS server, EDNS with really large buffer size seems much
> more attractive.  But for maximum compatibility, you will have to
> rewrite the response to strip out the OPT record.

Do you think EDNS support eliminates the need for TCP? I was under the
impression that, for practical purposes, it mostly does, but a certain
contingent of DNS purists will insist on being able to get giant 64k
RRsets despite the creation of such records not being compatible with
a decent portion (including many things other than musl) of
clients/stub resolvers.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-04-20 17:39                                                   ` Rich Felker
@ 2020-04-21  9:48                                                     ` Florian Weimer
  2020-04-21 15:02                                                       ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2020-04-21  9:48 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> On Mon, Apr 20, 2020 at 08:26:45AM +0200, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> > On Sun, Apr 19, 2020 at 10:12:56AM +0200, Florian Weimer wrote:
>> >> * Rich Felker:
>> >> 
>> >> >> No, you can reuse the connection for the second query (in most cases).
>> >> >> However, for maximum robustness, you should not send the second query
>> >> >> until the first response has arrived (no pipelining).  You may still
>> >> >> need a new connection for the second query if the TCP stream ends
>> >> >> without a response, though.
>> >> >
>> >> > That's why you need one per request -- so you can make them
>> >> > concurrently (can't assume pipelining).
>> >> 
>> >> Since the other query has likely already been cached in the recursive
>> >> resolver due to the UDP query (which is already in progress), the
>> >> second TCP query only saves one round-trip, I think.  Is that really
>> >> worth it?
>> >
>> > If the nameserver is not local, absolutely. A round trip can be over
>> > 500 ms.
>> 
>> Sure, but you have to put this into context. In this situation, you
>> already need three roundtrips (UDP query, TCP handshake, TCP query).
>> The other TCP handshake increases the packet count quite noticeably.
>
> Yes but they happen concurrently. Of course it's possible that you
> have bandwidth so low that latency is affected by throughput, but I
> think the far more common case nowadays is moderately fast connection
> (3G cellular, possibly rate-limited, or DSL) but saturated with other
> much-higher-volume traffic causing the latency.

I'm not sure.  It should be possible to measure this.

Generally, once you have to use TCP, performance will not be good in
any case, especially if the recursive resolver is not local.

I'm excited that Fedora plans to add a local caching resolver by
default.  It will help with a lot of these issues.

> BTW, am I mistaken or can TCP fastopen make it so you can get a DNS
> reply with no additional round-trips? (query in the payload with
> fastopen, response sent immediately after SYN-ACK before receiving ACK
> from client, and nobody has to wait for connection to be closed) Of
> course there are problems with fastopen that lead to it often being
> disabled so it's not a full substitute for UDP.

There's no handshake to enable it, so it would have to be an
/etc/resolv.conf setting.  It's also not clear how you would perform
auto-detection that works across arbitrary middleboxen.  I don't think
it's useful for an in-process stub resolver.

> Do you think EDNS support eliminates the need for TCP?

There is a window of packet sizes where it avoids TCP and works (from
512 to something between 1200 and 1500 bytes) *if* the recursive
resolver does EDNS at all.  For decent compatibility, you would have
to have heuristics in the stub resolver to figure out if
FORMERR/NOTIMP and missing responses are due to lack of EDNS support.
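
To make the EDNS part concrete, here is a rough sketch of what a query
looks like with and without the OPT pseudo-record from RFC 6891. The
helper names are hypothetical, and the 1200-byte buffer size is only an
assumption picked from the lower end of the window mentioned above.

```python
import struct

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def opt_record(udp_payload_size):
    """Build an EDNS(0) OPT pseudo-record with no options.

    NAME is the root (one zero byte), TYPE is 41, CLASS carries the
    advertised UDP payload size, TTL is 0 (no extended rcode, version 0,
    no flags) and RDLEN is 0.
    """
    return b"\x00" + struct.pack("!HHIH", 41, udp_payload_size, 0, 0)

def build_query(name, qtype, ident, edns_size=None):
    """Build a one-question query; append an OPT record if edns_size is given."""
    arcount = 0 if edns_size is None else 1
    msg = struct.pack("!6H", ident, 0x0100, 1, 0, 0, arcount)  # RD set
    msg += encode_name(name) + struct.pack("!2H", qtype, 1)    # class IN
    if edns_size is not None:
        msg += opt_record(edns_size)
    return msg

plain = build_query("ebay.com", 16, 1)                  # TXT query, no EDNS
edns = build_query("ebay.com", 16, 1, edns_size=1200)   # same query with OPT
```

The OPT record adds 11 bytes and bumps ARCOUNT to 1; a server that does
not understand it is what produces the FORMERR/NOTIMP or missing
responses the heuristics would have to interpret.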

The other problem with EDNS is that for sizes on the large end
(certainly above the MTU), it depends on fragmentation.  Fragmentation
is completely insecure because in DNS packets, all the randomness is
in one fragment, so packet spoofing only needs to guess the fragment
ID (and the recipient IP stack will provide the UDP port for free).
Some of us have been working on eliminating fragmented DNS responses
for that reason, which unfortunately reduces the reach of EDNS
somewhat.
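
As a back-of-the-envelope illustration of that difference (assuming IPv4,
where the fragment ID is the 16-bit Identification field, and treating
the source port as a full 16 bits, which overstates real ephemeral port
ranges):

```python
TXID_BITS = 16    # DNS transaction ID
PORT_BITS = 16    # upper bound; real ephemeral port ranges are smaller
IP_ID_BITS = 16   # IPv4 Identification field used to match fragments

# Blind spoofing of an unfragmented reply has to hit both the
# transaction ID and the UDP source port.
unfragmented_space = 2 ** (TXID_BITS + PORT_BITS)

# Spoofing a later fragment only has to hit the fragment ID, since the
# transaction ID and ports all travel in the first fragment.
fragmented_space = 2 ** IP_ID_BITS

ratio = unfragmented_space // fragmented_space
```

Under these assumptions the search space shrinks by a factor of 2^16.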

Above 4096 bytes, pretty much all recursive resolvers will send TC
responses even if the client offers a larger buffer size.  This means
for correctness, you cannot do away with TCP support.

Some implementations have used a longer sequence of transports: DNS
over UDP, EDNS over UDP, and finally TCP.  That avoids EDNS
pseudo-negotiation until it is actually needed.  I'm not aware of any
stub resolvers doing that, though.
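
A sketch of that sequence, with the transports passed in as plain
callables so that nothing here touches the network. The names (usable,
resolve) are made up, and treating FORMERR and NOTIMP as "try the next
transport" is just the heuristic mentioned above, not any particular
implementation's behaviour.

```python
import struct

TC = 0x0200
FORMERR, NOTIMP = 1, 4

def usable(reply):
    """A reply is usable if it exists, is not truncated, and is not an
    rcode that suggests the server choked on the request format."""
    if reply is None or len(reply) < 12:
        return False
    flags = struct.unpack("!H", reply[2:4])[0]
    return not (flags & TC) and (flags & 0x000F) not in (FORMERR, NOTIMP)

def resolve(query, transports):
    """Try each (name, send) pair in order and return the first usable reply.

    Each send callable takes the query bytes and returns reply bytes,
    or None on timeout.
    """
    for name, send in transports:
        reply = send(query)
        if usable(reply):
            return name, reply
    return None, None
```

The caller would pass something like [("udp", ...), ("edns", ...),
("tcp", ...)]; each extra step costs at least one more round trip when
the earlier ones fail.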

(Things change if you connect to a local stub resolver, of course.)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-04-21  9:48                                                     ` Florian Weimer
@ 2020-04-21 15:02                                                       ` Rich Felker
  2020-04-21 17:26                                                         ` Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-04-21 15:02 UTC (permalink / raw)
  To: Florian Weimer; +Cc: musl

On Tue, Apr 21, 2020 at 11:48:10AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Mon, Apr 20, 2020 at 08:26:45AM +0200, Florian Weimer wrote:
> >> * Rich Felker:
> >> 
> >> > On Sun, Apr 19, 2020 at 10:12:56AM +0200, Florian Weimer wrote:
> >> >> * Rich Felker:
> >> >> 
> >> >> >> No, you can reuse the connection for the second query (in most cases).
> >> >> >> However, for maximum robustness, you should not send the second query
> >> >> >> until the first response has arrived (no pipelining).  You may still
> >> >> >> need a new connection for the second query if the TCP stream ends
> >> >> >> without a response, though.
> >> >> >
> >> >> > That's why you need one per request -- so you can make them
> >> >> > concurrently (can't assume pipelining).
> >> >> 
> >> >> Since the other query has likely already been cached in the recursive
> >> >> resolver due to the UDP query (which is already in progress), the
> >> >> second TCP query only saves one round-trip, I think.  Is that really
> >> >> worth it?
> >> >
> >> > If the nameserver is not local, absolutely. A round trip can be over
> >> > 500 ms.
> >> 
> >> Sure, but you have to put this into context. In this situation, you
> >> already need three roundtrips (UDP query, TCP handshake, TCP query).
> >> The other TCP handshake increases the packet count quite noticeably.
> >
> > Yes but they happen concurrently. Of course it's possible that you
> > have bandwidth so low that latency is affected by throughput, but I
> > think the far more common case nowadays is moderately fast connection
> > (3G cellular, possibly rate-limited, or DSL) but saturated with other
> > much-higher-volume traffic causing the latency.
> 
> I'm not sure.  It should be possible to measure this.
> 
> Generally, once you have to use TCP, performance will not be good in
> any case, especially if the recursive resolver is not local.
> 
> I'm excited that Fedora plans to add a local caching resolver by
> default.  It will help with a lot of these issues.

That's great news! Will it be DNSSEC-enforcing by default?

> > BTW, am I mistaken or can TCP fastopen make it so you can get a DNS
> > reply with no additional round-trips? (query in the payload with
> > fastopen, response sent immediately after SYN-ACK before receiving ACK
> > from client, and nobody has to wait for connection to be closed) Of
> > course there are problems with fastopen that lead to it often being
> > disabled so it's not a full substitute for UDP.
> 
> There's no handshake to enable it, so it would have to be an
> /etc/resolv.conf setting.  It's also not clear how you would perform
> auto-detection that works across arbitrary middleboxen.  I don't think
> it's useful for an in-process stub resolver.

The kernel automatically does it, and AIUI automatically falls back to
normal TCP (sending the payload as a separate packet) if it's not
supported. It does this by remembering a cookie for the destination
which the destination advertised in an earlier connection.

Unfortunately the cookie system is a tracking vector (that pokes
through the anonymization of NAT/CGN), making it undesirable for
clients to accept any cookie but a zero-length one (which the spec
allows, but which requires separate DoS mitigations like
auto-disabling fastopen under too many concurrent attempts).

> > Do you think EDNS support eliminates the need for TCP?
> 
> There is a window of packet sizes where it avoids TCP and works (from
> 512 to something between 1200 and 1500 bytes) *if* the recursive
> resolver does EDNS at all.  For decent compatibility, you would have
> to have heuristics in the stub resolver to figure out if
> FORMERR/NOTIMP and missing responses are due to lack of EDNS support.

I had in mind just the resolv.conf option, no fallback. Once you do
fallbacks things get slow, and it's an invisible/unreported slowness
until someone does tcpdump or strace and sees why...

> The other problem with EDNS is that for sizes on the large end
> (certainly above the MTU), it depends on fragmentation.  Fragmentation
> is completely insecure because in DNS packets, all the randomness is
> in one fragment, so packet spoofing only needs to guess the fragment
> ID (and the recipient IP stack will provide the UDP port for free).
> Some of us have been working on eliminating fragmented DNS responses
> for that reason, which unfortunately reduces the reach of EDNS
> somewhat.

Well DNS is completely insecure anyway unless you're validating DNSSEC
locally. Yes the fragmentation issue makes it a lot easier to blindly
spoof (as opposed to needing ability to intercept/MITM).

> Above 4096 bytes, pretty much all recursive resolvers will send TC
> responses even if the client offers a larger buffer size.  This means
> for correctness, you cannot do away with TCP support.

In that case doing EDNS at all seems a lot less useful. Fragmentation
is always a possibility above min MTU (essentially same limit as
original UDP DNS) and the large responses are almost surely things you
do want to avoid forgery on, which leads me back around to thinking
that if you want them you really really need to be running a local
DNSSEC validating nameserver and then can just use-vc...

> Some implementations have used a longer sequence of transports: DNS
> over UDP, EDNS over UDP, and finally TCP.  That avoids EDNS
> pseudo-negotiation until it is actually needed.  I'm not aware of any
> stub resolvers doing that, though.

Yeah, each fallback is just going to increase total latency though,
very badly if they're all remote.

Actually, the current musl approach adapted to this would be to just
do them all concurrently: DNS/UDP, EDNS/UDP, and DNS/TCP, and accept
the first answer that's not truncated or broken server
(servfail/formerr/notimp), basically same as we do now but with more
choices. But that's getting heavier on unwanted network traffic...

> (Things change if you connect to a local stub resolver, of course.)

Yes, and that's clearly the future. Which has me looking back towards
designing around the future with opt-in rather than trying to make
these queries for large RRsets work in broken insecure setups.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-04-21 15:02                                                       ` Rich Felker
@ 2020-04-21 17:26                                                         ` Florian Weimer
  2020-05-01 22:02                                                           ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2020-04-21 17:26 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

>> I'm excited that Fedora plans to add a local caching resolver by
>> default.  It will help with a lot of these issues.
>
> That's great news! Will it be DNSSEC-enforcing by default?

No.  It is currently not even DNSSEC-aware, in the sense that you
can't get any DNSSEC data from it.  That's the sad part.

>> > BTW, am I mistaken or can TCP fastopen make it so you can get a DNS
>> > reply with no additional round-trips? (query in the payload with
>> > fastopen, response sent immediately after SYN-ACK before receiving ACK
>> > from client, and nobody has to wait for connection to be closed) Of
>> > course there are problems with fastopen that lead to it often being
>> > disabled so it's not a full substitute for UDP.
>> 
>> There's no handshake to enable it, so it would have to be an
>> /etc/resolv.conf setting.  It's also not clear how you would perform
>> auto-detection that works across arbitrary middleboxen.  I don't think
>> it's useful for an in-process stub resolver.
>
> The kernel automatically does it,

Surely not, it causes too many interoperability issues for that.  It's
also difficult to fit it into the BSD sockets API.  As far as I can
see, you have to use sendmsg or sendto with MSG_FASTOPEN instead of a
connect call to establish the connection.

(When the kernel says that it's enabled by default, it means that you
can use MSG_FASTOPEN with sysctl tweaks.)

>> The other problem with EDNS is that for sizes on the large end
>> (certainly above the MTU), it depends on fragmentation.  Fragmentation
>> is completely insecure because in DNS packets, all the randomness is
>> in one fragment, so packet spoofing only needs to guess the fragment
>> ID (and the recipient IP stack will provide the UDP port for free).
>> Some of us have been working on eliminating fragmented DNS responses
>> for that reason, which unfortunately reduces the reach of EDNS
>> somewhat.
>
> Well DNS is completely insecure anyway unless you're validating DNSSEC
> locally.

It's not, it works quite well actually in the absence of on-path
attackers.

Even DNSSEC still needs that level of security because a resolver
often has to use unsigned data to figure out where to send the next
query.  If DNS were completely insecure, DNSSEC would still break
because it's prone to denial-of-service attacks on the unsigned
routing data.

> Yes the fragmentation issue makes it a lot easier to blindly
> spoof (as opposed to needing ability to intercept/MITM).

And that difference does matter.

>> Above 4096 bytes, pretty much all recursive resolvers will send TC
>> responses even if the client offers a larger buffer size.  This means
>> for correctness, you cannot do away with TCP support.
>
> In that case doing EDNS at all seems a lot less useful. Fragmentation
> is always a possibility above min MTU (essentially same limit as
> original UDP DNS) and the large responses are almost surely things you
> do want to avoid forgery on, which leads me back around to thinking
> that if you want them you really really need to be running a local
> DNSSEC validating nameserver and then can just use-vc...

Why use use-vc at all?  Some software *will* break because it assumes
that certain libc calls do not keep open some random file descriptor.

>> Some implementations have used a longer sequence of transports: DNS
>> over UDP, EDNS over UDP, and finally TCP.  That avoids EDNS
>> pseudo-negotiation until it is actually needed.  I'm not aware of any
>> stub resolvers doing that, though.
>
> Yeah, each fallback is just going to increase total latency though,
> very badly if they're all remote.
>
> Actually, the current musl approach adapted to this would be to just
> do them all concurrently: DNS/UDP, EDNS/UDP, and DNS/TCP, and accept
> the first answer that's not truncated or broken server
> (servfail/formerr/notimp), basically same as we do now but with more
> choices. But that's getting heavier on unwanted network traffic...

Aggressive parallel queries tend to break middleboxes.  Even A/AAAA is
problematic.  Good interoperability and good performance are difficult
to obtain, particularly from short-lived processes.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-04-21 17:26                                                         ` Florian Weimer
@ 2020-05-01 22:02                                                           ` Rich Felker
  2020-05-02 15:28                                                             ` Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-05-01 22:02 UTC (permalink / raw)
  To: Florian Weimer; +Cc: musl

On Tue, Apr 21, 2020 at 07:26:08PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> >> I'm excited that Fedora plans to add a local caching resolver by
> >> default.  It will help with a lot of these issues.
> >
> > That's great news! Will it be DNSSEC-enforcing by default?
> 
> No.  It is currently not even DNSSEC-aware, in the sense that you
> can't get any DNSSEC data from it.  That's the sad part.

That's really disappointing. Why? Both systemd-resolved and dnsmasq,
the two reasonable (well, reasonable for distros using systemd already
in the systemd-resolved case :) options for this, support DNSSEC fully
as I understand it. Is it just being turned off by default because of
risk of breaking things, or is some other implementation that lacks
DNSSEC being used?

> >> > BTW, am I mistaken or can TCP fastopen make it so you can get a DNS
> >> > reply with no additional round-trips? (query in the payload with
> >> > fastopen, response sent immediately after SYN-ACK before receiving ACK
> >> > from client, and nobody has to wait for connection to be closed) Of
> >> > course there are problems with fastopen that lead to it often being
> >> > disabled so it's not a full substitute for UDP.
> >> 
> >> There's no handshake to enable it, so it would have to be an
> >> /etc/resolv.conf setting.  It's also not clear how you would perform
> >> auto-detection that works across arbitrary middleboxen.  I don't think
> >> it's useful for an in-process stub resolver.
> >
> > The kernel automatically does it,
> 
> Surely not, it causes too many interoperability issues for that.  It's
> also difficult to fit it into the BSD sockets API.  As far as I can
> see, you have to use sendmsg or sendto with MSG_FASTOPEN instead of a
> connect call to establish the connection.
> 
> (When the kernel says that it's enabled by default, it means that you
> can use MSG_FASTOPEN with sysctl tweaks.)

What I mean is that, if you use MSG_FASTOPEN on a kernel new enough to
understand it, I think it makes a normal TCP connection and sends the
data if fastopen is not enabled or not supported by the remote host,
but uses fastopen as long as it's enabled and supported. In this sense
it's automatic. But of course we'd have to fall back explicitly anyway
if it's not supported in order to maintain compatibility with older
kernels.
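
Roughly what I have in mind, as a sketch only: send the length-prefixed
query with MSG_FASTOPEN in place of connect, and fall back to an
ordinary connect plus send if that fails for any reason. The function
names are made up, IPv4 is assumed, and catching every OSError as the
"not supported" case is a simplification.

```python
import socket
import struct

def frame(msg):
    """Prefix a DNS message with its 2-byte length, as DNS over TCP requires."""
    return struct.pack("!H", len(msg)) + msg

def send_query_tcp(addr, msg):
    """Open a TCP connection to addr and send one framed query.

    Tries sendto() with MSG_FASTOPEN first; on any failure, retries
    with a plain connect() followed by sendall(). Returns the connected
    socket so the caller can read the reply.
    """
    data = frame(msg)
    tfo = getattr(socket, "MSG_FASTOPEN", None)  # missing on some platforms
    if tfo is not None:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # A DNS query is small, so a partial send is not expected here.
            s.sendto(data, tfo, addr)
            return s
        except OSError:
            s.close()
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect(addr)
        s.sendall(data)
    except OSError:
        s.close()
        raise
    return s
```

The socket is left in blocking mode on purpose; with a non-blocking
socket the first sendto can fail with EINPROGRESS before any data is
queued, which would need separate handling.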

> >> Above 4096 bytes, pretty much all recursive resolvers will send TC
> >> responses even if the client offers a larger buffer size.  This means
> >> for correctness, you cannot do away with TCP support.
> >
> > In that case doing EDNS at all seems a lot less useful. Fragmentation
> > is always a possibility above min MTU (essentially same limit as
> > original UDP DNS) and the large responses are almost surely things you
> > do want to avoid forgery on, which leads me back around to thinking
> > that if you want them you really really need to be running a local
> > DNSSEC validating nameserver and then can just use-vc...
> 
> Why use use-vc at all?  Some software *will* break because it assumes
> that certain libc calls do not keep open some random file descriptor.

Does use-vc do that (keep the fd open) in glibc? It doesn't seem to be
documented that way, just as forcing use of tcp, and my intent was not
to keep any fd open (since you need a separate fd per query anyway to
do them in parallel or in case the server closes the socket after one
reply).

> >> Some implementations have used a longer sequence of transports: DNS
> >> over UDP, EDNS over UDP, and finally TCP.  That avoids EDNS
> >> pseudo-negotiation until it is actually needed.  I'm not aware of any
> >> stub resolvers doing that, though.
> >
> > Yeah, each fallback is just going to increase total latency though,
> > very badly if they're all remote.
> >
> > Actually, the current musl approach adapted to this would be to just
> > do them all concurrently: DNS/UDP, EDNS/UDP, and DNS/TCP, and accept
> > the first answer that's not truncated or broken server
> > (servfail/formerr/notimp), basically same as we do now but with more
> > choices. But that's getting heavier on unwanted network traffic...
> 
> Aggressive parallel queries tend to break middleboxes.  Even A/AAAA is
> problematic.  Good interoperability and good performance are difficult
> to obtain, particularly from short-lived processes.

Yes, and currently we do them anyway and just don't care. It's
possible that there are users who are just working around this by not
configuring IPv6 and only using apps that call gethostbyname (ipv4
only) or use AI_ADDRCONFIG, but the latter was not supported at all in
musl until fairly recently, and it only takes effect if you _fully_
disable IPv6 (including on lo and link-local addrs), so I'd think
someone would complain if it were a real problem.

If sending queries with AD bit set is less of a compatibility issue
than parallel queries, I think we can probably just do it
unconditionally. And if anyone really _really_ wants to run in an
environment with broken nameservers, iptables should be able to
reject, redirect, or rewrite packets as needed to get something the
broken server can handle...
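
For concreteness, setting AD in the query is a one-bit change to the
header flags. A sketch, with hypothetical helper names and example.com
as a placeholder name:

```python
import struct

RD = 0x0100  # recursion desired
AD = 0x0020  # authentic data

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_query(name, qtype, ident, set_ad=True):
    """Build a one-question query with RD set and, optionally, AD set."""
    flags = RD | (AD if set_ad else 0)
    header = struct.pack("!6H", ident, flags, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!2H", qtype, 1)

def reply_is_authenticated(reply):
    """True if the AD bit is set in a reply header."""
    flags = struct.unpack("!H", reply[2:4])[0]
    return bool(flags & AD)

query = build_query("example.com", 1, 0x1234)  # A record query with AD
```

The query is the same size either way; only the flags word changes from
0x0100 to 0x0120.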

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-05-01 22:02                                                           ` Rich Felker
@ 2020-05-02 15:28                                                             ` Florian Weimer
  2020-05-02 15:44                                                               ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2020-05-02 15:28 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> On Tue, Apr 21, 2020 at 07:26:08PM +0200, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> >> I'm excited that Fedora plans to add a local caching resolver by
>> >> default.  It will help with a lot of these issues.
>> >
>> > That's great news! Will it be DNSSEC-enforcing by default?
>> 
>> No.  It is currently not even DNSSEC-aware, in the sense that you
>> can't get any DNSSEC data from it.  That's the sad part.
>
> That's really disappointing. Why? Both systemd-resolved and dnsmasq,
> the two reasonable (well, reasonable for distros using systemd already
> in the systemd-resolved case :) options for this, support DNSSEC fully
> as I understand it. Is it just being turned off by default because of
> risk of breaking things, or is some other implementation that lacks
> DNSSEC being used?

It's systemd-resolved.  As far as I can tell, it does not provide
DNSSEC data on the DNS client interface.

>> >> > BTW, am I mistaken or can TCP fastopen make it so you can get a DNS
>> >> > reply with no additional round-trips? (query in the payload with
>> >> > fastopen, response sent immediately after SYN-ACK before receiving ACK
>> >> > from client, and nobody has to wait for connection to be closed) Of
>> >> > course there are problems with fastopen that lead to it often being
>> >> > disabled so it's not a full substitute for UDP.
>> >> 
>> >> There's no handshake to enable it, so it would have to be an
>> >> /etc/resolv.conf setting.  It's also not clear how you would perform
>> >> auto-detection that works across arbitrary middleboxen.  I don't think
>> >> it's useful for an in-process stub resolver.
>> >
>> > The kernel automatically does it,
>> 
>> Surely not, it causes too many interoperability issues for that.  It's
>> also difficult to fit it into the BSD sockets API.  As far as I can
>> see, you have to use sendmsg or sendto with MSG_FASTOPEN instead of a
>> connect call to establish the connection.
>> 
>> (When the kernel says that it's enabled by default, it means that you
>> can use MSG_FASTOPEN with sysctl tweaks.)
>
> What I mean is that, if you use MSG_FASTOPEN on a kernel new enough to
> understand it, I think it makes a normal TCP connection and sends the
> data if fastopen is not enabled or not supported by the remote host,
> but uses fastopen as long as it's enabled and supported. In this sense
> it's automatic. But of course we'd have to fall back explicitly anyway
> if it's not supported in order to maintain compatibility with older
> kernels.

I found this in the kernel sources.  It's a bit worrying.

/*
 * The following code block is to deal with middle box issues with TFO:
 * Middlebox firewall issues can potentially cause server's data being
 * blackholed after a successful 3WHS using TFO.
 * The proposed solution is to disable active TFO globally under the
 * following circumstances:
 *   1. client side TFO socket receives out of order FIN
 *   2. client side TFO socket receives out of order RST
 *   3. client side TFO socket has timed out three times consecutively during
 *      or after handshake
 * We disable active side TFO globally for 1hr at first. Then if it
 * happens again, we disable it for 2h, then 4h, 8h, ...
 * And we reset the timeout back to 1hr when we see a successful active
 * TFO connection with data exchanges.
 */
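
The schedule described in that comment amounts to a doubling timer that
resets on success. The following is only an illustration of the comment
text, not the kernel's code; the class and method names are made up.

```python
class TfoBlackholeTimer:
    """Model of the disable schedule in the comment: 1h, 2h, 4h, 8h, ...
    with a reset back to 1h after a successful TFO data exchange."""

    BASE_HOURS = 1

    def __init__(self):
        self.strikes = 0

    def on_blackhole(self):
        """Record a blackhole event and return the disable period in hours."""
        self.strikes += 1
        return self.BASE_HOURS * 2 ** (self.strikes - 1)

    def on_success(self):
        """A successful active TFO connection resets the schedule."""
        self.strikes = 0

timer = TfoBlackholeTimer()
periods = [timer.on_blackhole() for _ in range(4)]
timer.on_success()
after_reset = timer.on_blackhole()
```

Since the disable is global, one misbehaving middlebox on any path
switches active TFO off for every destination during that period.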

It's possible that the retransmit with TCP Fast Open happens as part
of the regular TCP state machine.  I can't find an explicit fallback
handler.

>> >> Above 4096 bytes, pretty much all recursive resolvers will send TC
>> >> responses even if the client offers a larger buffer size.  This means
>> >> for correctness, you cannot do away with TCP support.
>> >
>> > In that case doing EDNS at all seems a lot less useful. Fragmentation
>> > is always a possibility above min MTU (essentially same limit as
>> > original UDP DNS) and the large responses are almost surely things you
>> > do want to avoid forgery on, which leads me back around to thinking
>> > that if you want them you really really need to be running a local
>> > DNSSEC validating nameserver and then can just use-vc...
>> 
>> Why use use-vc at all?  Some software *will* break because it assumes
>> that certain libc calls do not keep open some random file descriptor.
>
> Does use-vc do that (keep the fd open) in glibc? It doesn't seem to be
> documented that way, just as forcing use of tcp, and my intent was not
> to keep any fd open (since you need a separate fd per query anyway to
> do them in parallel or in case the server closes the socket after one
> reply).

Sorry, I thought you wanted to keep the connection open to reduce
latency.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-05-02 15:28                                                             ` Florian Weimer
@ 2020-05-02 15:44                                                               ` Rich Felker
  2020-05-02 22:52                                                                 ` Bartosz Brachaczek
  2020-05-03 18:18                                                                 ` Florian Weimer
  0 siblings, 2 replies; 28+ messages in thread
From: Rich Felker @ 2020-05-02 15:44 UTC (permalink / raw)
  To: Florian Weimer; +Cc: musl

On Sat, May 02, 2020 at 05:28:48PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Tue, Apr 21, 2020 at 07:26:08PM +0200, Florian Weimer wrote:
> >> * Rich Felker:
> >> 
> >> >> I'm excited that Fedora plans to add a local caching resolver by
> >> >> default.  It will help with a lot of these issues.
> >> >
> >> > That's great news! Will it be DNSSEC-enforcing by default?
> >> 
> >> No.  It is currently not even DNSSEC-aware, in the sense that you
> >> can't get any DNSSEC data from it.  That's the sad part.
> >
> > That's really disappointing. Why? Both systemd-resolved and dnsmasq,
> > the two reasonable (well, reasonable for distros using systemd already
> > in the systemd-resolved case :) options for this, support DNSSEC fully
> > as I understand it. Is it just being turned off by default because of
> > risk of breaking things, or is some other implementation that lacks
> > DNSSEC being used?
> 
> It's systemd-resolved.  As far as I can tell, it does not provide
> DNSSEC data on the DNS client interface.

According to this it does:

https://wiki.archlinux.org/index.php/Systemd-resolved#DNSSEC

However it's subject to downgrade attacks unless you edit a config
file. Note that the example shows:

    ....
    -- Data is authenticated: yes

so it looks like it's setting the AD bit like it should.

> > What I mean is that, if you use MSG_FASTOPEN on a kernel new enough to
> > understand it, I think it makes a normal TCP connection and sends the
> > data if fastopen is not enabled or not supported by the remote host,
> > but uses fastopen as long as it's enabled and supported. In this sense
> > it's automatic. But of course we'd have to fall back explicitly anyway
> > if it's not supported in order to maintain compatibility with older
> > kernels.
> 
> I found this in the kernel sources.  It's a bit worrying.
> 
> /*
>  * The following code block is to deal with middle box issues with TFO:
>  * Middlebox firewall issues can potentially cause server's data being
>  * blackholed after a successful 3WHS using TFO.
>  * The proposed solution is to disable active TFO globally under the
>  * following circumstances:
>  *   1. client side TFO socket receives out of order FIN
>  *   2. client side TFO socket receives out of order RST
>  *   3. client side TFO socket has timed out three times consecutively during
>  *      or after handshake
>  * We disable active side TFO globally for 1hr at first. Then if it
>  * happens again, we disable it for 2h, then 4h, 8h, ...
>  * And we reset the timeout back to 1hr when we see a successful active
>  * TFO connection with data exchanges.
>  */
> 
> It's possible that the retransmit with TCP Fast Open happens as part
> of the regular TCP state machine.  I can't find an explicit fallback
> handler.

I'm not sure what you're saying. Fastopen is only tried initially if
the kernel previously got a TCP header from the remote host indicating
support for it (and providing a cookie -- the kernel should have an
option to only accept zero-length cookies since anything else is a
tracking-vector/privacy-risk, but I'm not aware of such an option). If
not available for the particular host, or not at all (due to the above
global-disable heuristic or configuration), AIUI it just initially
does normal TCP and puts the payload in the send buffer.

> >> >> Above 4096 bytes, pretty much all recursive resolvers will send TC
> >> >> responses even if the client offers a larger buffer size.  This means
> >> >> for correctness, you cannot do away with TCP support.
> >> >
> >> > In that case doing EDNS at all seems a lot less useful. Fragmentation
> >> > is always a possibility above min MTU (essentially same limit as
> >> > original UDP DNS) and the large responses are almost surely things you
> >> > do want to avoid forgery on, which leads me back around to thinking
> >> > that if you want them you really really need to be running a local
> >> > DNSSEC validating nameserver and then can just use-vc...
> >> 
> >> Why use use-vc at all?  Some software *will* break because it assumes
> >> that certain libc calls do not keep open some random file descriptor.
> >
> > Does use-vc do that (keep the fd open) in glibc? It doesn't seem to be
> > documented that way, just as forcing use of tcp, and my intent was not
> > to keep any fd open (since you need a separate fd per query anyway to
> > do them in parallel or in case the server closes the socket after one
> > reply).
> 
> Sorry, I thought you wanted to keep the connection open to reduce
> latency.

No, the intent is that users only use this with localhost where the
result can be trusted and the latency is trivial and in theory can be
optimized out. (Kernel can in theory do the whole handshake with itself
during the connect syscall or just high-level emulate it if you didn't
care about seeing it on "tcpdump -i lo". Not sure if it does this but
I wouldn't be surprised.)
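
For illustration only (a sketch, not code from musl or glibc): what
forcing TCP changes on the wire is small. RFC 1035 section 4.2.2
prefixes each DNS message sent over TCP with a two-byte big-endian
length. The helper name dns_tcp_frame below is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any libc: wrap a DNS message for
 * TCP transport by prepending the two-byte big-endian length that
 * RFC 1035 section 4.2.2 requires. Returns the framed size, or 0 if
 * the message is empty, too long, or does not fit in out. */
size_t dns_tcp_frame(const unsigned char *msg, size_t len,
                     unsigned char *out, size_t outsize)
{
    if (len == 0 || len > 65535 || outsize < len + 2)
        return 0;
    out[0] = (unsigned char)(len >> 8);   /* high byte of length */
    out[1] = (unsigned char)(len & 0xff); /* low byte of length */
    memcpy(out + 2, msg, len);
    return len + 2;
}
```

Since each query gets its own socket in the scheme described above,
one framed query is written and one framed reply is read per
connection.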

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-05-02 15:44                                                               ` Rich Felker
@ 2020-05-02 22:52                                                                 ` Bartosz Brachaczek
  2020-05-03  8:46                                                                   ` Florian Weimer
  2020-05-03 18:18                                                                 ` Florian Weimer
  1 sibling, 1 reply; 28+ messages in thread
From: Bartosz Brachaczek @ 2020-05-02 22:52 UTC (permalink / raw)
  To: musl


[-- Attachment #1: Type: text/plain, Size: 1538 bytes --]

On Sat, May 2, 2020 at 5:44 PM Rich Felker <dalias@libc.org> wrote:

> On Sat, May 02, 2020 at 05:28:48PM +0200, Florian Weimer wrote:
> > * Rich Felker:
> >
> > > On Tue, Apr 21, 2020 at 07:26:08PM +0200, Florian Weimer wrote:
> > >> * Rich Felker:
> > >>
> > >> >> I'm excited that Fedora plans to add a local caching resolver by
> > >> >> default.  It will help with a lot of these issues.
> > >> >
> > >> > That's great news! Will it be DNSSEC-enforcing by default?
> > >>
> > >> No.  It is currently not even DNSSEC-aware, in the sense that you
> > >> can't get any DNSSEC data from it.  That's the sad part.
> > >
> > > That's really disappointing. Why? Both systemd-resolved and dnsmasq,
> > > the two reasonable (well, reasonable for distros using systemd already
> > > in the systemd-resolved case :) options for this, support DNSSEC fully
> > > as I understand it. Is it just being turned off by default because of
> > > risk of breaking things, or is some other implementation that lacks
> > > DNSSEC being used?
> >
> > It's systemd-resolved.  As far as I can tell, it does not provide
> > DNSSEC data on the DNS client interface.
>
> According to this it does:
>
> https://wiki.archlinux.org/index.php/Systemd-resolved#DNSSEC
>
> However it's subject to downgrade attacks unless you edit a config
> file. Note that the example shows:
>
>     ....
>     -- Data is authenticated: yes
>
> so it looks like it's setting the AD bit like it should.
>

Relevant info:
https://fedoraproject.org/wiki/Changes/systemd-resolved#DNSSEC

[-- Attachment #2: Type: text/html, Size: 2314 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-05-02 22:52                                                                 ` Bartosz Brachaczek
@ 2020-05-03  8:46                                                                   ` Florian Weimer
  2020-05-03 16:51                                                                     ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2020-05-03  8:46 UTC (permalink / raw)
  To: musl

* Bartosz Brachaczek:

> On Sat, May 2, 2020 at 5:44 PM Rich Felker <dalias@libc.org> wrote:
>
>> On Sat, May 02, 2020 at 05:28:48PM +0200, Florian Weimer wrote:
>> > * Rich Felker:
>> >
>> > > On Tue, Apr 21, 2020 at 07:26:08PM +0200, Florian Weimer wrote:
>> > >> * Rich Felker:
>> > >>
>> > >> >> I'm excited that Fedora plans to add a local caching resolver by
>> > >> >> default.  It will help with a lot of these issues.
>> > >> >
>> > >> > That's great news! Will it be DNSSEC-enforcing by default?
>> > >>
>> > >> No.  It is currently not even DNSSEC-aware, in the sense that you
>> > >> can't get any DNSSEC data from it.  That's the sad part.
>> > >
>> > > That's really disappointing. Why? Both systemd-resolved and dnsmasq,
>> > > the two reasonable (well, reasonable for distros using systemd already
>> > > in the systemd-resolved case :) options for this, support DNSSEC fully
>> > > as I understand it. Is it just being turned off by default because of
>> > > risk of breaking things, or is some other implementation that lacks
>> > > DNSSEC being used?
>> >
>> > It's systemd-resolved.  As far as I can tell, it does not provide
>> > DNSSEC data on the DNS client interface.
>>
>> According to this it does:
>>
>> https://wiki.archlinux.org/index.php/Systemd-resolved#DNSSEC
>>
>> However it's subject to downgrade attacks unless you edit a config
>> file. Note that the example shows:
>>
>>     ....
>>     -- Data is authenticated: yes
>>
>> so it looks like it's setting the AD bit like it should.
>>
>
> Relevant info:
> https://fedoraproject.org/wiki/Changes/systemd-resolved#DNSSEC

This section talks about DNSSEC validation.  As far as I can tell,
running systemd-resolved as the stub resolver prevents applications
from accessing DNSSEC data and doing their own validation (or just
looking at DNSSEC record types), independently of how
systemd-resolved is built and configured.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-05-03  8:46                                                                   ` Florian Weimer
@ 2020-05-03 16:51                                                                     ` Rich Felker
  2020-05-03 17:19                                                                       ` Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-05-03 16:51 UTC (permalink / raw)
  To: Florian Weimer; +Cc: musl

On Sun, May 03, 2020 at 10:46:55AM +0200, Florian Weimer wrote:
> * Bartosz Brachaczek:
> 
> > On Sat, May 2, 2020 at 5:44 PM Rich Felker <dalias@libc.org> wrote:
> >
> >> On Sat, May 02, 2020 at 05:28:48PM +0200, Florian Weimer wrote:
> >> > * Rich Felker:
> >> >
> >> > > On Tue, Apr 21, 2020 at 07:26:08PM +0200, Florian Weimer wrote:
> >> > >> * Rich Felker:
> >> > >>
> >> > >> >> I'm excited that Fedora plans to add a local caching resolver by
> >> > >> >> default.  It will help with a lot of these issues.
> >> > >> >
> >> > >> > That's great news! Will it be DNSSEC-enforcing by default?
> >> > >>
> >> > >> No.  It is currently not even DNSSEC-aware, in the sense that you
> >> > >> can't get any DNSSEC data from it.  That's the sad part.
> >> > >
> >> > > That's really disappointing. Why? Both systemd-resolved and dnsmasq,
> >> > > the two reasonable (well, reasonable for distros using systemd already
> >> > > in the systemd-resolved case :) options for this, support DNSSEC fully
> >> > > as I understand it. Is it just being turned off by default because of
> >> > > risk of breaking things, or is some other implementation that lacks
> >> > > DNSSEC being used?
> >> >
> >> > It's systemd-resolved.  As far as I can tell, it does not provide
> >> > DNSSEC data on the DNS client interface.
> >>
> >> According to this it does:
> >>
> >> https://wiki.archlinux.org/index.php/Systemd-resolved#DNSSEC
> >>
> >> However it's subject to downgrade attacks unless you edit a config
> >> file. Note that the example shows:
> >>
> >>     ....
> >>     -- Data is authenticated: yes
> >>
> >> so it looks like it's setting the AD bit like it should.
> >>
> >
> > Relevant info:
> > https://fedoraproject.org/wiki/Changes/systemd-resolved#DNSSEC
> 
> This section talks about DNSSEC validation.  As far as I can tell,
> running systemd-resolved as the stub resolver prevents applications
> from accessing DNSSEC data and doing their own validation (or just
> looking at DNSSEC record types), independently of how
> systemd-resolved is built and configured.

Normally applications don't want to do their own DNSSEC validation,
just get back a valid AD flag indicating that the trusted nameserver
did it, and AIUI it works with systemd-resolved, but indeed with a
non-broken nameserver it should still be possible for the application
to do it. Are you saying that, if you request full DNSSEC data with
EDNS0 DO flag, systemd-resolved refuses to give it? Does dig
+trace/+dnssec fail to work with it?

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-05-03 16:51                                                                     ` Rich Felker
@ 2020-05-03 17:19                                                                       ` Florian Weimer
  0 siblings, 0 replies; 28+ messages in thread
From: Florian Weimer @ 2020-05-03 17:19 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> Normally applications don't want to do their own DNSSEC validation,
> just get back a valid AD flag indicating that the trusted nameserver
> did it, and AIUI it works with systemd-resolved, but indeed with a
> non-broken nameserver it should still be possible for the application
> to do it. Are you saying that, if you request full DNSSEC data with
> EDNS0 DO flag, systemd-resolved refuses to give it? Does dig
> +trace/+dnssec fail to work with it?

The answer to an EDNS0 query does not contain any DNSSEC data even if
the DO bit is set.  The data in the additional section of such
responses seems to be corrupted (dig reports parse errors).  The CD
flag in the query is apparently ignored.

(I still need to investigate the details, sorry.)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-05-02 15:44                                                               ` Rich Felker
  2020-05-02 22:52                                                                 ` Bartosz Brachaczek
@ 2020-05-03 18:18                                                                 ` Florian Weimer
  2020-05-03 19:09                                                                   ` Rich Felker
  1 sibling, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2020-05-03 18:18 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> I'm not sure what you're saying. Fastopen is only tried initially if
> the kernel previously got a TCP header from the remote host indicating
> support for it (and providing a cookie -- the kernel should have an
> option to only accept zero-length cookies since anything else is a
> tracking-vector/privacy-risk, but I'm not aware of such an option). If
> not available for the particular host, or not at all (due to the above
> global-disable heuristic or configuration), AIUI it just initially
> does normal TCP and puts the payload in the send buffer.

I find the global off switch a bit odd.  The implementation doesn't
really seem fully worked out to me.

>> >> Why use use-vc at all?  Some software *will* break because it assumes
>> >> that certain libc calls do not keep open some random file descriptor.
>> >
>> > Does use-vc do that (keep the fd open) in glibc? It doesn't seem to be
>> > documented that way, just as forcing use of tcp, and my intent was not
>> > to keep any fd open (since you need a separate fd per query anyway to
>> > do them in parallel or in case the server closes the socket after one
>> > reply).
>> 
>> Sorry, I thought you wanted to keep the connection open to reduce
>> latency.
>
> No, the intent is that users only use this with localhost where the
> result can be trusted and the latency is trivial and in theory can be
> optimized out.

Can't you do DNS with really large packet sizes on localhost?

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000

That's the one place where TCP does not make much sense, except to get
the last 30 or so bytes in packet size.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-05-03 18:18                                                                 ` Florian Weimer
@ 2020-05-03 19:09                                                                   ` Rich Felker
  2020-05-03 19:34                                                                     ` Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-05-03 19:09 UTC (permalink / raw)
  To: Florian Weimer; +Cc: musl

On Sun, May 03, 2020 at 08:18:42PM +0200, Florian Weimer wrote:
> >> >> Why use use-vc at all?  Some software *will* break because it assumes
> >> >> that certain libc calls do not keep open some random file descriptor.
> >> >
> >> > Does use-vc do that (keep the fd open) in glibc? It doesn't seem to be
> >> > documented that way, just as forcing use of tcp, and my intent was not
> >> > to keep any fd open (since you need a separate fd per query anyway to
> >> > do them in parallel or in case the server closes the socket after one
> >> > reply).
> >> 
> >> Sorry, I thought you wanted to keep the connection open to reduce
> >> latency.
> >
> > No, the intent is that users only use this with localhost where the
> > result can be trusted and the latency is trivial and in theory can be
> > optimized out.
> 
> Can't you do DNS with really large packet sizes on localhost?
> 
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
> 
> That's the one place where TCP does not make much sense, except to get
> the last 30 or so bytes in packet size.

No, the protocol simply does not support it. Normal (non-EDNS) DNS
protocol forbids UDP packets over 512 bytes. A nameserver that replied
with them rather than with TC would be non-conforming and would break
conforming clients that expect to see the TC rather than a short read.
With EDNS0 longer packets can be sent but I think there's still a
limit of 4096 bytes or something. I don't understand this entirely so
I may be wrong and it may be possible to just support EDNS0 and say
"run a server with 64k EDNS0 limit on localhost if you want to
guarantee non-truncated replies".
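
For illustration (a sketch, not code from musl): the truncation signal
discussed here is a single header bit. The 12-byte header layout is
from RFC 1035 section 4.1.1; the helper name dns_reply_truncated is
hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if a DNS reply has the TC (truncated)
 * bit set, 0 if it does not, and -1 if the buffer is too short to
 * hold the 12-byte header. TC is bit 0x02 of header byte 2. */
int dns_reply_truncated(const unsigned char *reply, size_t len)
{
    if (len < 12)
        return -1;
    return (reply[2] & 0x02) != 0;
}
```

A client that sees 1 here would normally retry the query over TCP; a
UDP-only client has to either use the partial answer or fail.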

However, this also runs into the issue that you have to mangle the
query (by adding OPT to it) and get back a response that's not going
to look like what the application requested (since it has extra OPT,
etc. in it) unless you do the work to reverse that and reformat the
response to look like what the application would have received if it
were using normal DNS protocol over TCP. So it's probably *more*
unwanted complexity and bug surface to do this right, even if it is
possible...
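
To make the "mangling" concrete, here is a minimal sketch (not code
from musl) of appending an EDNS0 OPT pseudo-RR to an existing query.
The record layout follows RFC 6891; the helper name dns_append_opt and
its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: append an EDNS0 OPT pseudo-RR (RFC 6891) to a
 * query of qlen bytes held in buf (capacity bufsize), advertising
 * udp_size as the largest UDP reply the client accepts and optionally
 * setting the DO bit. Returns the new length, or 0 on failure. */
size_t dns_append_opt(unsigned char *buf, size_t qlen, size_t bufsize,
                      unsigned udp_size, int want_do)
{
    unsigned arcount;
    unsigned char *p;

    if (qlen < 12 || bufsize < qlen + 11 || udp_size > 65535)
        return 0;
    arcount = (((unsigned)buf[10] << 8) | buf[11]) + 1;
    if (arcount > 65535)
        return 0;
    p = buf + qlen;
    p[0] = 0;                                /* NAME: root */
    p[1] = 0; p[2] = 41;                     /* TYPE: OPT (41) */
    p[3] = (unsigned char)(udp_size >> 8);   /* CLASS: UDP payload size */
    p[4] = (unsigned char)(udp_size & 0xff);
    p[5] = 0;                                /* extended RCODE */
    p[6] = 0;                                /* EDNS version 0 */
    p[7] = want_do ? 0x80 : 0;               /* flags, high byte: DO */
    p[8] = 0;                                /* flags, low byte */
    p[9] = 0; p[10] = 0;                     /* RDLENGTH: no options */
    buf[10] = (unsigned char)(arcount >> 8); /* bump ARCOUNT in header */
    buf[11] = (unsigned char)(arcount & 0xff);
    return qlen + 11;
}
```

The query side is the easy half. The reply then carries its own OPT
record in the additional section, which would have to be located and
removed before handing the buffer to a caller that never asked for
EDNS.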

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-05-03 19:09                                                                   ` Rich Felker
@ 2020-05-03 19:34                                                                     ` Florian Weimer
  2020-05-03 19:45                                                                       ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2020-05-03 19:34 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

>> Can't you do DNS with really large packet sizes on localhost?
>> 
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
>> 
>> That's the one place where TCP does not make much sense, except to get
>> the last 30 or so bytes in packet size.
>
> No, the protocol simply does not support it. Normal (non-EDNS) DNS
> protocol forbids UDP packets over 512 bytes. A nameserver that replied
> with them rather than with TC would be non-conforming and would break
> conforming clients that expect to see the TC rather than a short read.
> With EDNS0 longer packets can be sent but I think there's still a
> limit of 4096 bytes or something. I don't understand this entirely so
> I may be wrong and it may be possible to just support EDNS0 and say
> "run a server with 64k EDNS0 limit on localhost if you want to
> guarantee non-truncated replies".

On localhost, one could just disregard the protocol limit, perhaps
with special configuration of the recursive resolver.  (The stub
resolver would not need configuration, it just has to accept the
packets if they arrive.)

The other option would be to use a UNIX Domain datagram socket instead
of UDP.  Since it is a new transport protocol, it's possible to make
up different rules about packet sizes.

(Even on localhost, TCP has some denial-of-service issues not shared
by datagram transports, so there might be some other benefit to
avoiding TCP, not just a reduction in implementation complexity.)

> However, this also runs into the issue that you have to mangle the
> query (by adding OPT to it) and get back a response that's not going
> to look like what the application requested (since it has extra OPT,
> etc. in it) unless you do the work to reverse that and reformat the
> response to look like what the application would have received if it
> were using normal DNS protocol over TCP. So it's probably *more*
> unwanted complexity and bug surface to do this right, even if it is
> possible...

True, if query mangling is required *and* TCP fallback is needed in
some cases, there is little incentive to do this from a complexity
point of view.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [musl] TCP support in the stub resolver
  2020-05-03 19:34                                                                     ` Florian Weimer
@ 2020-05-03 19:45                                                                       ` Rich Felker
  0 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2020-05-03 19:45 UTC (permalink / raw)
  To: Florian Weimer; +Cc: musl

On Sun, May 03, 2020 at 09:34:31PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> >> Can't you do DNS with really large packet sizes on localhost?
> >> 
> >> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
> >> 
> >> That's the one place where TCP does not make much sense, except to get
> >> the last 30 or so bytes in packet size.
> >
> > No, the protocol simply does not support it. Normal (non-EDNS) DNS
> > protocol forbids UDP packets over 512 bytes. A nameserver that replied
> > with them rather than with TC would be non-conforming and would break
> > conforming clients that expect to see the TC rather than a short read.
> > With EDNS0 longer packets can be sent but I think there's still a
> > limit of 4096 bytes or something. I don't understand this entirely so
> > I may be wrong and it may be possible to just support EDNS0 and say
> > "run a server with 64k EDNS0 limit on localhost if you want to
> > guarantee non-truncated replies".
> 
> On localhost, one could just disregard the protocol limit, perhaps
> with special configuration of the recursive resolver.  (The stub
> resolver would not need configuration, it just has to accept the
> packets if they arrive.)

No, you can't, because it's a permanent public interface contract. You
may have foreign-libc binaries or static-linked binaries from before
that policy change, or from a party who disagrees (rightly so) with
that policy change.

> The other option would be to use a UNIX Domain datagram socket instead
> of UDP.  Since it is a new transport protocol, it's possible to make
> up different rules about packet sizes.

Putting unix domain nameservers in resolv.conf directly would likewise
be incompatible with the above. You could do it in some way that they
don't see/care about, but then it's a matter of inventing new policy
mechanisms which musl explicitly seeks to avoid. (E.g. that's why we
used nscd protocol for alternate passwd/group backends rather than
NIH'ing something.)

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [musl] Re: Outgoing DANE not working
       [not found]                                 ` <20200414215951.GJ41308@straasha.imrryr.org>
@ 2020-05-19  1:37                                   ` Rich Felker
       [not found]                                     ` <20200519023814.GN68966@straasha.imrryr.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-05-19  1:37 UTC (permalink / raw)
  To: postfix-users; +Cc: musl

On Tue, Apr 14, 2020 at 05:59:51PM -0400, Viktor Dukhovni wrote:
> > > That RFC was published in 2013.  That's long enough ago.
> > 
> > We support environments that haven't been touched since 2009 or so,
> > and to a lesser/minimal-support extent ones that haven't been touched
> > since around 2004. Your idea of environments Postfix might be running
> > on musl in is very different from the concept of environments that
> > arbitrary applications binaries linked to musl might be running in.
> 
> Nevertheless, the AD bit is on by default in dig and similar tools, with
> no reports of any issues in a long time.  Do you see dig fail where MUSL
> libc lookups succeed?  I'm asking around the DNS community for any
> evidence of barriers to AD=1, so far nobody knows of any.  I'll try to
> find more compelling evidence, but basically tolerating AD=1 (either
> ignoring or acting on it per the 2013 RFC) is *required* resolver
> behaviour.
> 
> > So if there's any chance of this breaking there almost certainly needs
> > to be a way to turn it off that works even on static binaries.
> 
> Whether and where to place such controls is your call.  If novel
> /etc/resolv.conf options are not a problem for statically linked
> binaries using something other than musl-libc, then you could
> have:
> 
>     options noad ...
> 
> but if that is incompatible with other stub resolver libraries on the
> same machine, you may need a private musl-specific configuration file.
> 
> My money is on this being unnecessary.  I'll let you know what I find
> from dns-operations, and if possible perhaps a RIPE ATLAS probe,
> assuming they support enabling AD=1.
> 
> > > In that case, set the AD bit unconditionally, or provide a documented
> > > mechanism to do so via a suitable configuration file.
> > 
> > Putting it in resolv.conf on an options line is probably the best. The
> > main remaining question is just which default to use, and where to
> > apply it (at res_mkquery or at res_send).
> 
> Your call.
> 
> > > Find me a resolver that fails when the AD bit is set.  Stub resolvers
> > > that always set it have been around for some time now.
> >
> > Do you know if the usual Windows, Android, iOS, etc. ones always set
> > it? If so it's almost surely safe to do so and this might not even
> > need to be an option (which would really be my favorite course of
> > action -- making it unconditional so there's no new configuration to
> > invent).
> 
> Mostly dig, unbound-host, ... Most of the platform C libraries support
> DO=1, which obviates the need for AD=1, so they don't do that, but it is
> nevertheless safe.  AD=1 is much cheaper than DO=1, because you get back
> just the AD bit without the excess RRSIG baggage, which is not needed
> when you're not doing your own validation.

I have a proposed solution expected to go upstream in this release
cycle: res_* set AD bit unconditionally in outgoing queries, but the
[backend for the] netdb.h functions clears it after calling
__res_mkquery.

This ensures that even if there are still some broken
nameservers/networks that can't handle AD in queries, the standard,
widely-used, high-level lookup APIs will still work, and at worst
res_query breaks.

Note that the netdb.h functions have no use for the AD bit and no way
to pass it back to the caller, so there is no reduction in
functionality by having them clear it.
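
As a rough illustration of that split (a sketch, not the actual musl
patch): the AD flag is bit 0x20 of header byte 3, so setting it when a
query is built and clearing it again on the netdb.h path are each a
one-byte operation. The helper names below are hypothetical.

```c
#include <stddef.h>

#define DNS_HDR_LEN 12
#define DNS_AD_BIT  0x20  /* AD flag: bit 0x20 of header byte 3 */

/* Hypothetical: what a res_mkquery-style builder would do to ask the
 * nameserver to report AD. Returns 0 on success, -1 if q is too short
 * to contain a DNS header. */
int dns_query_set_ad(unsigned char *q, size_t len)
{
    if (len < DNS_HDR_LEN)
        return -1;
    q[3] |= DNS_AD_BIT;
    return 0;
}

/* Hypothetical: what a getaddrinfo-style backend would do after the
 * query is built, since it cannot pass AD back to its caller. */
int dns_query_clear_ad(unsigned char *q, size_t len)
{
    if (len < DNS_HDR_LEN)
        return -1;
    q[3] &= (unsigned char)~DNS_AD_BIT;
    return 0;
}
```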

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [musl] Re: Outgoing DANE not working
       [not found]                                     ` <20200519023814.GN68966@straasha.imrryr.org>
@ 2020-05-19  5:44                                       ` Rich Felker
       [not found]                                         ` <20200519090610.GO68966@straasha.imrryr.org>
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-05-19  5:44 UTC (permalink / raw)
  To: postfix-users; +Cc: musl

On Mon, May 18, 2020 at 10:38:14PM -0400, Viktor Dukhovni wrote:
> On Mon, May 18, 2020 at 09:37:36PM -0400, Rich Felker wrote:
> 
> > > Mostly dig, unbound-host, ... Most of the platform C libraries support
> > > DO=1, which obviates the need for AD=1, so they don't do that, but it is
> > > nevertheless safe.  AD=1 is much cheaper than DO=1, because you get back
> > > just the AD bit without the excess RRSIG baggage, which is not needed
> > > when you're not doing your own validation.
> > 
> > I have a proposed solution expected to go upstream in this release
> > cycle: res_* set AD bit unconditionally in outgoing queries, but the
> > [backend for the] netdb.h functions clears it after calling
> > __res_mkquery.
> > 
> > This ensures that even if there are still some broken
> > nameservers/networks that can't handle AD in queries, the standard,
> > widely-used, high-level lookup APIs will still work, and at worst
> > res_query breaks.
> > 
> > Note that the netdb.h functions have no use for the AD bit and no way
> > to pass it back to the caller, so there is no reduction in
> > functionality by having them clear it.
> 
> This sounds reasonable.  Will there be a way for Postfix to detect the
> new library version, so that we don't disable DANE for musl-libc
> versions that do set the AD bit?

I'm really disappointed with the detection, which made things much
worse by producing postfix builds that won't do DANE even after
libc.so is upgraded. It should have just worked after upgrade. The
test is also somewhat broken in that it gets the wrong result if
/bin/sh is static-linked, or if you have postfix built against musl on
a system where /bin/sh is glibc-based, etc. and I don't even know what
happens if you're cross-compiling or if that's even supported at all.

There's not really a "test for versions that do set" by version; I
would expect once the patch is upstream and tested, distros like
Alpine would just apply it to their existing musl package rather than
waiting to upgrade to get it. The only real test is a runtime one,
calling res_mkquery and observing that it's set.

BTW I saw in git master you added an additional musl test of the same
form for the res_n* APIs. A simpler way to detect them is just with
__RES macro in resolv.h, which indicates the supported API version.
AIUI it's provided by all known implementations, though I haven't
actually checked that.
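
A minimal sketch of such a check, under a stated assumption: 19991006
is taken here as the __RES value of headers that declare the res_n*
interfaces, which should be verified against the target platforms
before relying on it. HAVE_RES_N_API and have_res_n_api are
hypothetical names.

```c
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Assumption: headers declaring res_ninit, res_nquery, etc. define
 * __RES as 19991006 or later; older API levels use a smaller value. */
#if defined(__RES) && __RES >= 19991006
#define HAVE_RES_N_API 1
#else
#define HAVE_RES_N_API 0
#endif

/* Report the build-time result. This describes the headers seen at
 * compile time, not the libc that is present at run time. */
int have_res_n_api(void)
{
    return HAVE_RES_N_API;
}
```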

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [musl] Re: Outgoing DANE not working
       [not found]                                         ` <20200519090610.GO68966@straasha.imrryr.org>
@ 2020-05-19 14:00                                           ` Rich Felker
  2020-05-19 14:23                                             ` Wietse Venema
  0 siblings, 1 reply; 28+ messages in thread
From: Rich Felker @ 2020-05-19 14:00 UTC (permalink / raw)
  To: postfix-users; +Cc: musl

On Tue, May 19, 2020 at 05:06:10AM -0400, Viktor Dukhovni wrote:
> On Tue, May 19, 2020 at 01:44:30AM -0400, Rich Felker wrote:
> 
> > > This sounds reasonable.  Will there be a way for Postfix to detect the
> > > new library version, so that we don't disable DANE for musl-libc
> > > versions that do set the AD bit?
> > 
> > I'm really disappointed with the detection, which made things much
> > worse by producing postfix builds that won't do DANE even after
> > libc.so is upgraded. It should have just worked after upgrade.
> 
> We have no choice, we can't ship code that silently fails to honour its
> configuration.  I'm not worried about DANE "working", I'm worried about
> DANE *not* working, and the user being none-the-wiser.
> 
> When remote TLSA RRs are published, and DANE is enabled in Postfix, the
> mail must be delivered securely, or the delivery attempt MUST fail.
> 
> > The test is also somewhat broken in that it gets the wrong result if
> > /bin/sh is static-linked, or if you have postfix built against musl on
> > a system where /bin/sh is glibc-based, etc. and I don't even know what
> > happens if you're cross-compiling or if that's even supported at all.
> 
> A better test would be appreciated.  Glibc has GLIBC_PREREQ macros,
> we haven't found anything similar for MUSL.

There is fundamentally no build-time test possible for this. Even if we
were willing to make flags for each bug (or missing feature) that was
ever fixed indicating the change, that would only tell you whether the
version present at build time had the property, not whether the
version present at runtime does. With a distro, unless the distro
manually makes their package file depend on a particular version of
the distro libc package (which they can do very well, since they know
what patches they applied), it's possible for a user to upgrade
postfix but not libc. And for a user building everything from source
manually, that work's on them.

> > There's not really a "test for versions that do set" by version; I
> > would expect once the patch is upstream and tested, distros like
> > Alpine would just apply it to their existing musl package rather than
> > waiting to upgrade to get it. The only real test is a runtime one,
> > calling res_mkquery and observing that it's set.
> 
> Sorry, no such test is possible.  There is no reliable canary domain to
> query, and DANE should in any case also work in domains disconnected
> from the Internet, with locally configured trust-anchors.

A canary domain is not needed for a runtime test. All that's needed is a
call to res_mkquery with a dummy domain and inspection of buf[3]. (Of
course you can also just set the bit at query time in exactly the same
manner, but that only works for replacing res_query not res_search. I
don't understand why you're using res_search at all in mail software
though.)
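
A sketch of such a runtime check. To keep it self-contained, the
function only inspects a buffer; the caller is assumed to have filled
that buffer with res_mkquery for a dummy name, and no packet is ever
sent. The name libc_query_has_ad is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: given a query built by res_mkquery and the
 * length it returned, report whether the library set the AD flag
 * (bit 0x20 of header byte 3). Returns 1 if set, 0 if clear, and -1
 * if len cannot be a valid query length (res_mkquery returns -1 on
 * failure, and a DNS header alone is 12 bytes). */
int libc_query_has_ad(const unsigned char *buf, int len)
{
    if (len < 12)
        return -1;
    return (buf[3] & 0x20) != 0;
}
```

A caller would use it roughly as follows, with QUERY, C_IN and T_A
taken from <arpa/nameser.h>:

    unsigned char buf[512];
    int len = res_mkquery(QUERY, "example.invalid", C_IN, T_A,
                          0, 0, 0, buf, sizeof buf);
    int ad = libc_query_has_ad(buf, len);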

> > BTW I saw in git master you added an additional musl test of the same
> > form for the res_n* APIs. A simpler way to detect them is just with
> > __RES macro in resolv.h, which indicates the supported API version.
> > AIUI it's provided by all known implementations, though I haven't
> > actually checked that.
> 
> Robust detection of MUSL features at build time would be much
> appreciated.  Precludes any tests that depend on live DNS queries.
> The tests need to *statically* test the features of the platform's
> C library.

This is an area of open cross-implementation effort to provide
something meaningful. See the thread here:
https://www.openwall.com/lists/libc-coord/2020/04/22/1

Note that the "Possibly this should include the semantics for
definition with a value of 0 ("supported for compilation and might or
might not be supported at runtime")..." text applies here since a
static build-time test does not suffice here.

Rich

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [musl] Re: Outgoing DANE not working
  2020-05-19 14:00                                           ` Rich Felker
@ 2020-05-19 14:23                                             ` Wietse Venema
  2020-05-19 14:28                                               ` Rich Felker
  0 siblings, 1 reply; 28+ messages in thread
From: Wietse Venema @ 2020-05-19 14:23 UTC (permalink / raw)
  To: Rich Felker; +Cc: postfix-users, musl

Rich Felker:
> There is fundamentally no build-time test possible for this. Even if we
> were willing to make flags for each bug (or missing feature) that was
> ever fixed indicating the change, that would only tell you whether the
> version present at build time had the property, not whether the
> version present at runtime does. With a distro, unless the distro

If you can provide a libc-musl runtime __version variable, then
Postfix can at run time determine that the library supports the
necessary functionality, and enable/disable DANE accordingly.

	Wietse

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [musl] Re: Outgoing DANE not working
  2020-05-19 14:23                                             ` Wietse Venema
@ 2020-05-19 14:28                                               ` Rich Felker
  0 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2020-05-19 14:28 UTC (permalink / raw)
  To: Wietse Venema; +Cc: postfix-users, musl

On Tue, May 19, 2020 at 10:23:18AM -0400, Wietse Venema wrote:
> Rich Felker:
> > There is fundamentally no build-time test possible for this. Even if we
> > were willing to make flags for each bug (or missing feature) that was
> > ever fixed indicating the change, that would only tell you whether the
> > version present at build time had the property, not whether the
> > version present at runtime does. With a distro, unless the distro
> 
> If you can provide a libc-musl runtime __version variable, then
> Postfix can at run time determine that the library supports the
> necessary functionality, and enable/disable DANE accordingly.

We've been over this countless times from folks requesting version
numbers. A version number does not tell you what you want to know.
Distros will patch the functionality into whatever version they're
shipping. A 1.1.25 (if it ever happens) will likely have the patch
backported (just applied; no conflict). Querying features has to be
done on a per-feature basis, not based on version numbers. See the
proposal on libc-coord.
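
As a rough illustration of what a per-feature runtime probe could look
like (as opposed to a version comparison), here is a minimal sketch. It
assumes, as proposed earlier in this thread, that the feature in
question is the stub resolver setting the AD bit in its outgoing
queries. The name query_requests_ad is hypothetical and is not an
existing musl or Postfix interface; the only facts relied on are the
12-byte DNS header and the position of the QR and AD bits within it.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of any libc): inspect a DNS query
 * message and report whether its header has the AD bit set.
 *
 * Returns 1 if AD is set, 0 if it is clear, and -1 if the buffer is
 * too short to hold a DNS header or is a response rather than a query.
 */
int query_requests_ad(const unsigned char *q, size_t len)
{
    if (!q || len < 12)
        return -1;              /* a DNS header is 12 bytes */
    if (q[2] & 0x80)
        return -1;              /* QR=1: this is a response */
    return (q[3] & 0x20) ? 1 : 0;   /* AD is 0x20 in the 4th byte */
}
```

An application could build a query with the resolver's own query
construction routine (for example res_mkquery(), where available) at
startup and pass the resulting buffer to such a helper. That observes
the behavior of the library actually present at runtime, regardless of
which version number it carries or which patches a distro applied.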


* [musl] Re: Outgoing DANE not working
       [not found] ` <49RN803wcfzJrNv@spike.porcupine.org>
@ 2020-05-19 20:08   ` Rich Felker
  0 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2020-05-19 20:08 UTC (permalink / raw)
  To: Postfix users; +Cc: musl

On Tue, May 19, 2020 at 01:25:52PM -0400, Wietse Venema wrote:
> Rich Felker:
> > On Tue, May 19, 2020 at 11:11:56AM -0400, Wietse Venema wrote:
> > > Rich Felker:
> > > > On Tue, May 19, 2020 at 10:23:18AM -0400, Wietse Venema wrote:
> > > > > Rich Felker:
> > > > > > There is fundamentally no build-time test possible for this. Even if we
> > > > > > were willing to make flags for each bug (or missing feature) that was
> > > > > > ever fixed indicating the change, that would only tell you whether the
> > > > > > version present at build time had the property, not whether the
> > > > > > version present at runtime does. With a distro, unless the distro
> > > > > 
> > > > > If you can provide a libc-musl runtime __version variable, then
> > > > > Postfix can at run time determine that the library supports the
> > > > > necessary functionality, and enable/disable DANE accordingly.
> > > > 
> > > > We've been over this countless times from folks requesting version
> > > > numbers. A version number does not tell you what you want to know.
> > > > Distros will patch the functionality into whatever version they're
> > > > shipping. A 1.1.25 (if it ever happens) will likely have the patch
> > > > backported (just applied; no conflict). Querying features has to be
> > > > done on a per-feature basis, not based on version numbers. See the
> > > > proposal on libc-coord.
> > > 
> > > Do let us know when libc-musl provides an indication whether a DNS
> > > lookup result is authentic (DNSSEC pass).
> >
> > It is now in master. I've also recommended the patch to Alpine.
> 
> A pointer to how one would use the updated code would be welcome,
> perhaps a pointer to the submit message.

https://git.musl-libc.org/cgit/musl/commit/?id=fd7ec068efd590c0393a612599a4fab9bb0a8633
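
For the consuming side, a minimal sketch of how an application might
read the "authentic" indication out of a raw DNS response, such as the
buffer filled in by res_query(). This is an assumption about usage, not
something taken from the commit itself: response_is_authenticated is a
hypothetical name, and the code relies only on the standard DNS header
layout (QR in the third byte, AD in the fourth).

```c
#include <stddef.h>

/*
 * Hypothetical helper: report whether a DNS response carries the AD
 * (Authenticated Data) bit.
 *
 * Returns 1 if AD is set, 0 if it is clear, and -1 if the buffer is
 * too short to hold a DNS header or is not a response.
 */
int response_is_authenticated(const unsigned char *r, size_t len)
{
    if (!r || len < 12)
        return -1;              /* a DNS header is 12 bytes */
    if (!(r[2] & 0x80))
        return -1;              /* QR=0: this is a query */
    return (r[3] & 0x20) ? 1 : 0;   /* AD is 0x20 in the 4th byte */
}
```

The AD bit is only as trustworthy as the path to the resolver that set
it, which is why the intent described at the start of this thread is
for validation to be done by a nameserver running on localhost.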

> I won't comment on distro maintainers who willingly break Postfix's
> security guarantees of DANE, without informing the user.

I'm not encouraging any of them to do that; rather, I've encouraged
them to take measures to both:

(1) ensure that DANE is not silently ignored, by either patching
Postfix to work with old musl (prior to the above commit) or patching
the musl package and adding a dependency from the postfix package on
the updated musl package, and:

(2) not ship Postfix packages with DNSSEC/DANE disabled, because that
would encourage admins to switch DANE off in their config files to
"fix the breakage" after upgrading, then forget to turn it back on
once updated packages are available to make it work.

I haven't been through this with other distros yet, but Alpine folks
were committed to both of these principles.

Rich



Thread overview: 28+ messages
     [not found] <fce05ab0ed102dec10e4163dd4ce5d8095d2ffd7.camel@web.de>
     [not found] ` <20200412211807.GC41308@straasha.imrryr.org>
     [not found]   ` <d64b1b8801cc5350e9d27dd109dd2446e7d4b860.camel@web.de>
     [not found]     ` <20200413024746.GD41308@straasha.imrryr.org>
     [not found]       ` <b38668e94b2781003a14c6dca3d41edf33e347e2.camel@web.de>
     [not found]         ` <A2FE67B5-A9A9-4A0F-A59D-78FF2AB992B7@dukhovni.org>
     [not found]           ` <f79a9f0c369607fc38bef06fec521eaf3ab23d8c.camel@web.de>
     [not found]             ` <6E8A9D4F-18CE-4ADA-A5B4-D14DB30C99E5@dukhovni.org>
     [not found]               ` <25e70f31f0c4629f7a7d3957649d08be06144067.camel@web.de>
     [not found]                 ` <CECAFB36-DA1B-4EFB-ACD1-294E3B121B2E@dukhovni.org>
2020-04-13 18:35                   ` [musl] Re: Outgoing DANE not working Rich Felker
     [not found]                     ` <20200413190412.GF41308@straasha.imrryr.org>
     [not found]                       ` <20200413193505.GY11469@brightrain.aerifal.cx>
     [not found]                         ` <20200413214138.GG41308@straasha.imrryr.org>
     [not found]                           ` <20200414035303.GZ11469@brightrain.aerifal.cx>
     [not found]                             ` <87v9m0hdjk.fsf@mid.deneb.enyo.de>
     [not found]                               ` <20200415180149.GH11469@brightrain.aerifal.cx>
     [not found]                                 ` <87imi0haf7.fsf@mid.deneb.enyo.de>
     [not found]                                   ` <20200417034059.GF11469@brightrain.aerifal.cx>
     [not found]                                     ` <878siucvqd.fsf@mid.deneb.enyo.de>
2020-04-17 16:07                                       ` Rich Felker
2020-04-18 17:14                                         ` [musl] TCP support in the stub resolver (was: Re: Outgoing DANE not working) Florian Weimer
2020-04-19  0:03                                           ` Rich Felker
2020-04-19  8:12                                             ` [musl] TCP support in the stub resolver Florian Weimer
2020-04-20  1:24                                               ` Rich Felker
2020-04-20  6:26                                                 ` Florian Weimer
2020-04-20 17:39                                                   ` Rich Felker
2020-04-21  9:48                                                     ` Florian Weimer
2020-04-21 15:02                                                       ` Rich Felker
2020-04-21 17:26                                                         ` Florian Weimer
2020-05-01 22:02                                                           ` Rich Felker
2020-05-02 15:28                                                             ` Florian Weimer
2020-05-02 15:44                                                               ` Rich Felker
2020-05-02 22:52                                                                 ` Bartosz Brachaczek
2020-05-03  8:46                                                                   ` Florian Weimer
2020-05-03 16:51                                                                     ` Rich Felker
2020-05-03 17:19                                                                       ` Florian Weimer
2020-05-03 18:18                                                                 ` Florian Weimer
2020-05-03 19:09                                                                   ` Rich Felker
2020-05-03 19:34                                                                     ` Florian Weimer
2020-05-03 19:45                                                                       ` Rich Felker
     [not found]                             ` <20200414061620.GI41308@straasha.imrryr.org>
     [not found]                               ` <20200414160641.GC11469@brightrain.aerifal.cx>
     [not found]                                 ` <20200414215951.GJ41308@straasha.imrryr.org>
2020-05-19  1:37                                   ` [musl] Re: Outgoing DANE not working Rich Felker
     [not found]                                     ` <20200519023814.GN68966@straasha.imrryr.org>
2020-05-19  5:44                                       ` Rich Felker
     [not found]                                         ` <20200519090610.GO68966@straasha.imrryr.org>
2020-05-19 14:00                                           ` Rich Felker
2020-05-19 14:23                                             ` Wietse Venema
2020-05-19 14:28                                               ` Rich Felker
     [not found] <20200519154542.GC1079@brightrain.aerifal.cx>
     [not found] ` <49RN803wcfzJrNv@spike.porcupine.org>
2020-05-19 20:08   ` Rich Felker
