Date: Tue, 21 Apr 2020 11:02:41 -0400
From: Rich Felker <dalias@libc.org>
To: Florian Weimer <fw@deneb.enyo.de>
Cc: musl@lists.openwall.com
Message-ID: <20200421150241.GL11469@brightrain.aerifal.cx>
References: <20200417034059.GF11469@brightrain.aerifal.cx>
 <878siucvqd.fsf@mid.deneb.enyo.de>
 <20200417160726.GG11469@brightrain.aerifal.cx>
 <87o8ro67in.fsf_-_@mid.deneb.enyo.de>
 <20200419000347.GU11469@brightrain.aerifal.cx>
 <871roj51x3.fsf@mid.deneb.enyo.de>
 <20200420012441.GW11469@brightrain.aerifal.cx>
 <87a736y8nu.fsf@mid.deneb.enyo.de>
 <20200420173920.GD11469@brightrain.aerifal.cx>
 <87mu75uq3p.fsf@mid.deneb.enyo.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87mu75uq3p.fsf@mid.deneb.enyo.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: [musl] TCP support in the stub resolver

On Tue, Apr 21, 2020 at 11:48:10AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Mon, Apr 20, 2020 at 08:26:45AM +0200, Florian Weimer wrote:
> >> * Rich Felker:
> >> 
> >> > On Sun, Apr 19, 2020 at 10:12:56AM +0200, Florian Weimer wrote:
> >> >> * Rich Felker:
> >> >> 
> >> >> >> No, you can reuse the connection for the second query (in most cases).
> >> >> >> However, for maximum robustness, you should not send the second query
> >> >> >> until the first response has arrived (no pipelining).  You may still
> >> >> >> need a new connection for the second query if the TCP stream ends
> >> >> >> without a response, though.
> >> >> >
> >> >> > That's why you need one per request -- so you can make them
> >> >> > concurrently (can't assume pipelining).
> >> >> 
> >> >> Since the other query has likely already been cached in the recursive
> >> >> resolver due to the UDP query (which is already in progress), the
> >> >> second TCP query only saves one round-trip, I think.  Is that really
> >> >> worth it?
> >> >
> >> > If the nameserver is not local, absolutely. A round trip can be over
> >> > 500 ms.
> >> 
> >> Sure, but you have to put this into context. In this situation, you
> >> already need three roundtrips (UDP query, TCP handshake, TCP query).
> >> The other TCP handshake increases the packet count quite noticeably.
> >
> > Yes, but they happen concurrently. Of course it's possible that you
> > have bandwidth so low that latency is affected by throughput, but I
> > think the far more common case nowadays is a moderately fast connection
> > (3G cellular, possibly rate-limited, or DSL) saturated with other,
> > much-higher-volume traffic that causes the latency.
> 
> I'm not sure.  It should be possible to measure this.
> 
> Generally, once you have to use TCP, performance will not be good in
> any case, especially if the recursive resolver is not local.
> 
> I'm excited that Fedora plans to add a local caching resolver by
> default.  It will help with a lot of these issues.

That's great news! Will it be DNSSEC-enforcing by default?

> > BTW, am I mistaken or can TCP fastopen make it so you can get a DNS
> > reply with no additional round-trips? (query in the payload with
> > fastopen, response sent immediately after SYN-ACK before receiving ACK
> > from client, and nobody has to wait for connection to be closed) Of
> > course there are problems with fastopen that lead to it often being
> > disabled so it's not a full substitute for UDP.
> 
> There's no handshake to enable it, so it would have to be an
> /etc/resolv.conf setting.  It's also not clear how you would perform
> auto-detection that works across arbitrary middleboxen.  I don't think
> it's useful for an in-process stub resolver.

The kernel does it automatically, and AIUI it falls back to normal TCP
(sending the payload as a separate packet) if it's not supported. It
does this by remembering a cookie that the destination advertised in
an earlier connection.

Unfortunately the cookie system is a tracking vector (that pokes
through the anonymization of NAT/CGN), making it undesirable for
clients to accept any cookie but a zero-length one (which the spec
allows, but which requires separate DoS mitigations like
auto-disabling fastopen under too many concurrent attempts).
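
For concreteness, here is a rough sketch of what the client side might
look like on Linux. DNS over TCP prefixes each message with a two-byte
big-endian length (RFC 1035, section 4.2.2), and passing MSG_FASTOPEN to
sendto() on an unconnected TCP socket asks the kernel to carry that data
in the SYN if it holds a cookie for the server; otherwise it sends a
normal SYN and the data follows the handshake. The helper names
frame_tcp_query and send_query_tfo are hypothetical (not musl or kernel
API), and falling back to connect()+send() on EOPNOTSUPP (client-side
fastopen disabled via sysctl) is an assumed policy, not something the
kernel does for you.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000 /* Linux value; assumed if headers lack it */
#endif

/* Hypothetical helper: write the 2-byte big-endian length prefix that
 * DNS over TCP requires, followed by the query itself.
 * Returns the framed length, or 0 if the query does not fit. */
size_t frame_tcp_query(unsigned char *out, size_t outsz,
                       const unsigned char *q, size_t qlen)
{
	if (qlen > 65535 || outsz < qlen + 2) return 0;
	out[0] = qlen >> 8;
	out[1] = qlen & 0xff;
	memcpy(out + 2, q, qlen);
	return qlen + 2;
}

/* Hypothetical helper: open a TCP socket and send an already-framed
 * query, asking the kernel to put it in the SYN (TCP Fast Open).
 * If client-side TFO is disabled (EOPNOTSUPP), fall back to a plain
 * connect() + send(). Returns the connected fd, or -1 on error. */
int send_query_tfo(const struct sockaddr *sa, socklen_t salen,
                   const unsigned char *msg, size_t len)
{
	int fd = socket(sa->sa_family, SOCK_STREAM | SOCK_CLOEXEC, 0);
	if (fd < 0) return -1;
	ssize_t r = sendto(fd, msg, len, MSG_FASTOPEN | MSG_NOSIGNAL, sa, salen);
	if (r < 0 && errno == EOPNOTSUPP) {
		if (connect(fd, sa, salen) == 0)
			r = send(fd, msg, len, MSG_NOSIGNAL);
	}
	if (r != (ssize_t)len) { /* errors and short writes: give up here */
		close(fd);
		return -1;
	}
	return fd;
}
```

Reading the length-prefixed reply, timeouts, and retrying short writes
are left out; the point is only that a query with a cached cookie costs
no extra round trip beyond the SYN/SYN-ACK exchange.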

> > Do you think EDNS support eliminates the need for TCP?
> 
> There is a window of packet sizes where it avoids TCP and works (from
> 512 to something between 1200 and 1500 bytes) *if* the recursive
> resolver does EDNS at all.  For decent compatibility, you would have
> to have heuristics in the stub resolver to figure out if
> FORMERR/NOTIMP and missing responses are due to lack of EDNS support.

I had in mind just a resolv.conf option, with no fallback. Once you do
fallbacks, things get slow, and it's an invisible, unreported slowness
until someone runs tcpdump or strace and sees why...
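
If EDNS were switched on purely by a resolv.conf option with no
fallback, the only change to outgoing queries would be a single OPT
pseudo-record (RFC 6891) appended to the additional section. A minimal
sketch, assuming the buffer already holds a complete header and
question; append_edns0 is a hypothetical name, not existing musl code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append an EDNS(0) OPT pseudo-RR to a DNS query of
 * length qlen in buf (capacity bufsz) and increment ARCOUNT.
 * udp_size is the largest UDP response the client will accept.
 * Returns the new query length, or 0 if the buffer is too small. */
size_t append_edns0(unsigned char *buf, size_t qlen, size_t bufsz,
                    uint16_t udp_size)
{
	enum { OPT_LEN = 11 };
	if (qlen < 12 || bufsz < qlen + OPT_LEN) return 0;
	unsigned char *p = buf + qlen;
	p[0] = 0;                 /* owner name: root */
	p[1] = 0; p[2] = 41;      /* TYPE = OPT (41) */
	p[3] = udp_size >> 8;     /* CLASS = requestor's UDP payload size */
	p[4] = udp_size & 0xff;
	p[5] = 0;                 /* TTL byte 1: extended RCODE */
	p[6] = 0;                 /* TTL byte 2: EDNS version 0 */
	p[7] = 0; p[8] = 0;       /* TTL bytes 3-4: flags, DO bit clear */
	p[9] = 0; p[10] = 0;      /* RDLENGTH = 0: no options */
	unsigned arcount = (buf[10] << 8 | buf[11]) + 1;
	buf[10] = arcount >> 8;   /* ARCOUNT lives at header offset 10-11 */
	buf[11] = arcount & 0xff;
	return qlen + OPT_LEN;
}
```

A caller might use append_edns0(buf, len, sizeof buf, 1232); the 1232
here is only an illustrative payload size that stays inside the 1200 to
1500 byte window mentioned above, so large answers come back with TC set
rather than fragmented.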

> The other problem with EDNS is that for sizes on the large end
> (certainly above the MTU), it depends on fragmentation.  Fragmentation
> is completely insecure because in DNS packets, all the randomness is
> in one fragment, so packet spoofing only needs to guess the fragment
> ID (and the recipient IP stack will provide the UDP port for free).
> Some of us have been working on eliminating fragmented DNS responses
> for that reason, which unfortunately reduces the reach of EDNS
> somewhat.

Well, DNS is completely insecure anyway unless you're validating DNSSEC
locally. Yes, the fragmentation issue makes blind spoofing a lot easier
(as opposed to needing the ability to intercept/MITM).
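
To put rough numbers on "a lot easier", under simplifying assumptions
(IPv4, a 16-bit DNS ID, up to 16 bits of source-port randomization, and
ignoring the UDP checksum the forged fragment must still satisfy):
forging a whole reply means guessing both the ID and the port, while
forging only the second fragment means guessing just the 16-bit IPv4
fragment ID, since the genuine first fragment carries the port and ID.

```latex
\begin{aligned}
N_{\text{whole reply}} &\approx \underbrace{2^{16}}_{\text{DNS ID}} \cdot \underbrace{2^{16}}_{\text{src port}} = 2^{32} \approx 4.3 \times 10^{9} \\
N_{\text{2nd fragment}} &\approx \underbrace{2^{16}}_{\text{IPv4 frag ID}} = 65\,536
\end{aligned}
```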

> Above 4096 bytes, pretty much all recursive resolvers will send TC
> responses even if the client offers a larger buffer size.  This means
> for correctness, you cannot do away with TCP support.

In that case, doing EDNS at all seems a lot less useful. Fragmentation
is always a possibility above the minimum MTU (essentially the same
limit as original UDP DNS), and the large responses are almost surely
things you do want protected from forgery, which leads me back around
to thinking that if you want them, you really, really need to be
running a local DNSSEC-validating nameserver, and then you can just
use-vc...

> Some implementations have used a longer sequence of transports: DNS
> over UDP, EDNS over UDP, and finally TCP.  That avoids EDNS
> pseudo-negotiation until it is actually needed.  I'm not aware of any
> stub resolvers doing that, though.

Yeah, each fallback is just going to increase total latency though,
very badly if they're all remote.

Actually, the current musl approach adapted to this would be to just
do them all concurrently (DNS/UDP, EDNS/UDP, and DNS/TCP) and accept
the first answer that's not truncated and not from a broken server
(servfail/formerr/notimp), basically the same as we do now but with
more choices. But that's getting heavier on unwanted network traffic...
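
The acceptance rule above can be sketched as a check on the fixed
12-byte response header (RFC 1035, section 4.1.1). This is a
hypothetical helper, not musl's actual code; matching the reply's ID and
question to the query is assumed to happen elsewhere, and extended RCODE
bits carried in an EDNS OPT record are ignored.

```c
#include <stddef.h>

/* RCODE values from RFC 1035 that indicate a broken/unhelpful server */
enum { RCODE_FORMERR = 1, RCODE_SERVFAIL = 2, RCODE_NOTIMP = 4 };

/* Hypothetical helper: decide whether a reply to one of several
 * concurrent queries (DNS/UDP, EDNS/UDP, DNS/TCP) can be accepted.
 * Rejects runts, truncated (TC) replies, and FORMERR/SERVFAIL/NOTIMP;
 * anything else, including NXDOMAIN, counts as a real answer. */
int response_usable(const unsigned char *r, size_t len)
{
	if (len < 12) return 0;            /* shorter than a DNS header */
	if (r[2] & 0x02) return 0;         /* TC bit set: truncated */
	switch (r[3] & 0x0f) {             /* low 4 bits of byte 3: RCODE */
	case RCODE_FORMERR:
	case RCODE_SERVFAIL:
	case RCODE_NOTIMP:
		return 0;
	}
	return 1;
}
```

In the concurrent scheme, the first reply for which this returns
nonzero wins and the other queries are abandoned; a truncated UDP reply
simply waits for the TCP answer instead of triggering a new query.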

> (Things change if you connect to a local stub resolver, of course.)

Yes, and that's clearly the future, which has me looking back toward
designing around it with opt-in, rather than trying to make these
queries for large RRsets work in broken, insecure setups.

Rich