On Fri, Mar 8, 2024 at 2:54 PM Rich Felker <dalias@libc.org> wrote:

On Fri, Mar 08, 2024 at 01:55:18PM -0800, David Schinazi wrote:
> On Fri, Mar 8, 2024 at 12:31 PM Rich Felker <dalias@libc.org> wrote:
>
> > On Fri, Mar 08, 2024 at 11:15:52AM -0800, David Schinazi wrote:
> > > On Fri, Mar 8, 2024 at 5:30 AM Rich Felker <dalias@libc.org> wrote:
> > >
> > > > On Thu, Mar 07, 2024 at 08:47:20PM -0800, David Schinazi wrote:
> > > > > Thanks. How would you feel about the following potential configuration
> > > > > design?
> > > > > * Add a new configuration option "send_mdns_unicast"
> > > > > * When true, use the current behavior
> > > > > * When false, send the query on all non-loopback non-p2p interfaces
> > > > > * Have send_mdns_unicast default to false
> > > > >
> > > > > I was thinking through how to pick interfaces, looked up what other mDNS
> > > > > libraries do, and pretty much all of them don't allow configuring
> > > > > interfaces, whereas Avahi exposes allow-interfaces and deny-interfaces. I'm
> > > > > leaning towards not making this configurable to reduce complexity. I think
> > > > > that anyone interested in that level of config is probably using Avahi
> > > > > anyway.
> > > > >
> > > > > Additionally this design has two nice properties: the default behavior is
> > > > > RFC-compliant, and it means that for my use-case I don't need to change the
> > > > > config file, which was a big part of my motivation for doing this inside of
> > > > > musl in the first place :-)
> > > >
> > > > As discussed in this thread, I don't think so. The biggest problems I
> > > > initially brought up were increased information leakage in the default
> > > > configuration and inability to control where the traffic goes when you
> > > > do want it on. The above proposal just reverts to the initial, except
> > > > for providing a way to opt-out.
> > > >
> > > > For the most part, mDNS is very much a "home user, personal device on
> > > > trusted network" thing. Not only do you not want it to default on
> > > > because a lot of systems will be network servers on networks where
> > > > it's not meaningful (and can be a weakness that aids attackers in
> > > > lateral movement), but you also don't want it on when connected to
> > > > public wifi. For example if you have an open browser tab to
> > > > http://mything.local, and migrate to an untrusted network (with your
> > > > laptop, tablet, phone, whatever), now your browser will be leaking
> > > > private data (likely at least session auth tokens, maybe more) to
> > > > whoever answers the mDNS query for mything.local.
> > >
> > > That's not quite right. The security properties of mDNS and DNS are the
> > > same. DNS is inherently insecure, regardless of unicast vs multicast. If
> > > I'm on a coffee shop Wi-Fi, all my DNS queries are sent in the clear to
> > > whatever IP address the DHCP server gave me.
> >
> > That's not the case. Connections to non-mDNS hosts are authenticated
> > by TLS with certificates issued on the basis of ownership of the
> > domain name. That's not possible with mDNS hostnames, so they'll
> > either be no-TLS or self-signed certs. That's why the above attack is
> > possible. It was also possible with normal DNS in the bad old days of
> > http://, but that time is long gone.
>
> Apologies for being pedantic, but that's not true. The ability to get TLS
> certificates for a domain name that you own is a property of the WebPKI,
> not a property of TLS. What you wrote is true, but only in the context of a
> Web browser with an unmodified root certificate store. The features I
> mentioned above don't use the WebPKI, they have a separate root of trust.
> For example, some of those Apple features exchange TLS certificates via an
> out-of-band mechanism such as Apple trusted servers. Another example is the
> Apple Watch: when you first pair a new Apple Watch with an iPhone, they
> exchange ed25519 public keys. Then any time the watch wants to transfer a
> large file to/from the phone, it'll connect to Wi-Fi, use mDNS to find the
> phone, and set up an IKEv2/IPsec tunnel that then protects the exchange.
> It's resilient to any attacks at the mDNS level.
>
> You're absolutely right that the security of Web requests using local
> connectivity is completely broken by the lack of WebPKI certificates for
> those. But sending the DNS query over multicast as opposed to unencrypted
> unicast to an untrusted DNS server doesn't change the security properties.
> In your example above, the open tab to http://mything.local will send that
> query to the recursive resolver - and if that's the one received by DHCP
> then that server can reply with its own address and receive your auth
> tokens. One potential fix here is to configure your resolv.conf to
> localhost and then apply policy in that local resolver. But in practice,
> application developers don't rely on security at that layer, they assume
> that DNS is unsafe and implement encryption in userspace with some out of
> band trust mechanism.

My specific example was http://mything.local in a web browser, which
is the way you access lots of mDNS-enabled things in the absence of a
specific software ecosystem like Apple's. Since we're talking about
musl which would be running on Linux or a Linux-syscall-compatible
environment, without Apple apps, I think that's the main way anyone
would be using hypothetical mDNS support. And indeed this is the way
you access many printers, 3D printers, IP cameras, etc.

I have multiple services at home that I reach over HTTP and mDNS, but they're built knowing that unencrypted HTTP is unsafe. For example, one of my servers doesn't have any authentication - my browser just uses unauthenticated GETs, POSTs and WebSockets. If I leave the tab open and go to a coffee shop, my browser might send that GET to a server I don't trust, but that request won't carry any sensitive information. Another of my servers uses TLS with self-signed certs, so every time I want to communicate with it, I need to click through my browser's "this is unsafe" interstitial to get to the page. If I switch networks, the browser will show me the warning again and I'll know not to click through when I'm not at home. In both of those cases, the security is handled (or not handled at all) at the application layer.

Maybe at some point we'll have a good framework for authenticating
this kind of usage with certificates (probably certificate pinning on
first use, with good UX, is the only easy solution),

Trust on first use works, or even better there are emerging solutions that leverage codes printed on devices and PAKEs so that a device on the untrusted network can't even hijack the first connection without having access to that code. The leading one for home automation is Matter [1]. Coincidentally, it also leverages mDNS for discovery, and doesn't rely on security at the DNS level.

[1] https://csa-iot.org/all-solutions/matter/

but at present,
mDNS devices on the .local zone get accessed with plain http:// all
the time, and this means it's unsafe to do mDNS on
public/untrusted/hostile networks.

The notion of something being "unsafe" (and security in general) is predicated on the existence of a threat model. It's unsafe to use unencrypted HTTP to your bank when your threat model includes someone on the coffee shop Wi-Fi trying to steal your bank credentials. Conversely, it's safe for me to print to this coffee shop printer if my threat model assumes that I'm ok with the owner of the coffee shop seeing my document. Another example is Chromecast which also uses mDNS: from Chrome on a Linux laptop, I can cast YouTube videos to the TV in this coffee shop. That's safe because I trust the network with the YouTube link I'm telling the TV to play. mDNS is not in and of itself safe or unsafe. It converts names into addresses, and what you do with those addresses can potentially be unsafe.

That doesn't mean that every single use of mDNS on untrusted networks is safe. If someone builds a web page that sends valuable secrets over unencrypted HTTP to a .local name, then you have a security problem. But my point is that this security problem needs to be solved at the application layer and not at the DNS layer. That said, I agree that having a way to disable mDNS on a machine is a good idea, because there probably are users out there that are stuck with applications that for some reason decided to rely on DNS being secure.

In terms of the tradeoff between usability and security, my preferred default is to enable mDNS on all interfaces, as Apple and Avahi do. But this tradeoff is between two metrics that can't be quantified one against the other for all possible uses, so I totally understand if your opinion for musl is that the tradeoff there is different than in other situations. You know your users better than I do.

> > > So the stack has to deal with
> > > the fact that any DNS response can be spoofed.
> >
> > That's also not possible with DNSSEC, but only helps if you're
> > validating it.
> >
> > > The most widely used
> > > solution is TLS: a successful DNS hijack can prevent you from accessing a
> > > TLS service, but can't impersonate it. That's true of both mDNS and regular
> > > unicast DNS. As an example, all Apple devices have mDNS enabled on all
> > > interfaces, with no security impact - the features that rely on it
> > > (AirDrop, AirPlay, contact sharing, etc) all use mTLS to ensure they're
> > > talking to the right device regardless of the correctness of DNS. (Printing
> > > remains completely insecure, but that's also independent of DNS - your
> > > coffee shop Wi-Fi access point can attack you at the IP layer too). One
> > > might think that DNSSEC could save us here, but it doesn't. DNSSEC was
> > > unfortunately built with a fundamental design flaw: it requires you to
> > > trust all resolvers on the path, including recursive resolvers. So even if
> > > you ask for DNSSEC validation of the DNS records for www.example.com, your
> > > coffee shop DNS recursive resolver can tell you "I checked, and example.com
> > > does not support DNSSEC, here's the IP address for www.example.com though"
> > > and you have to accept it.
> >
> > This is a completely false but somehow persistent myth about DNSSEC.
> > You cannot lie that a zone does not support DNSSEC. The only way to
> > claim a zone does not support DNSSEC is with a signature chain from
> > the DNS root proving the nonexistence of the DS records for the
> > delegation. Without that, the reply is BOGUS and will be ignored as if
> > there was no reply at all.
>
> I was talking about the case where the recursive resolver does the
> validation, which is what's deployed in practice today. What you wrote is
> only true if the client does the DNSSEC validation itself. Most clients
> don't do that today, because too many domains are just misconfigured and
> broken. Eric Rescorla (the editor of the TLS RFCs) wrote a great blog post
> about this:

The consensus of folks in the stub resolver space (at least glibc+musl
and I would assume the BSDs as well) is that the way you do DNSSEC
validation is by having a validating caching proxy or full recursive
resolver on localhost. Doing validation in the stub resolver is not
viable because it may be static-linked, where it would not be able to
be updated with new algorithms, root-of-trust, etc.

No disagreement there. By "client" I meant the client device as a whole, and by "recursive resolver" I meant "the DNS server you got from DHCP". Running a DNSSEC-validating recursive resolver on the client device falls into what I meant by "if the client does the DNSSEC validation itself". Sorry for being unclear.

This is one of the
reasons our go-to response for new functionality wanted in the stub
resolver is "do it in a nameserver on localhost" -- because you
already need that to do DNSSEC.

That makes sense. I wasn't working with the assumption that DNSSEC was a requirement.

It really did not sound like you were talking about trusting the
recursive, though. You called it a "fundamental design flaw", which it
is not, and said it requires you to "trust all resolvers on the path",
which it does not. It only requires you to trust the immediate
resolver you are interacting with (and not even that if you put the
validation in the stub resolver, but there are good reasons not to do
that, as above). A pure-proxying server that relies on upstream
recursives can do full DNSSEC validation. Dnsmasq is a canonical
example. I believe systemd-resolved also does it.

That's fair, and I apologize for overstating my point. I absolutely agree that if you run a validating recursive resolver locally, then the attack I described isn't possible. When DNSSEC was designed, it was intended to be deployed in the model I described, where the validating recursive resolver is not on-device. And that's how it is still mostly deployed today because almost all general-purpose client devices do not validate locally. My mental model is very focused around consumer devices where folks buy them and use them without ever changing default settings. That might be a portion of musl users, but you clearly also have advanced users that do things differently.

> > > > Regarding untrusted networks, one thing I hadn't considered yet is
> > > > that a network configurator probably needs a way to setup resolv.conf
> > > > such that .local queries temp-fail rather than perma-fail (as they
> > > > would if you just sent the query to public dns) to use during certain
> > > > race windows while switching networks. IOW "send .local queries to
> > > > configured nameservers" and "treat .local specially but with an empty
> > > > list of interfaces to send to" should be distinct configurations.
> > >
> > > Yeah, caching negative results in DNS has been a tricky thing from the
> > > start. You probably could hack something by installing a fake SOA record
> > > for .local. in your recursive resolver running on localhost. But the
> > > RFC-compliant answer is for stub resolvers to treat it specially and know
> > > that those often never get an answer (musl doesn't cache DNS results so in
> > > a way we're avoiding this problem altogether at the stub resolver).
> >
> > The problem here is not about caching, just about clients using a
> > response. You want a task (like a browser with open tabs) trying to
> > contact the site to get a tempfail rather than NxDomain which might
> > make it stop trying. But you probably want NxDomain if mDNS has been
> > disabled entirely, so that every .local lookup doesn't hang 5 seconds
> > or whatever before saying "inconclusive".
>
> I'm assuming that by tempfail you mean EAI_AGAIN. The two browsers that
> I've written code in don't use that (Chrome just treats it the same as a
> resolution failure and will automatically refresh the tab on a network
> change; Safari doesn't use getaddrinfo and instead relies on an
> asynchronous DNS API that adds results as they come in - I wrote that
> algorithm up in RFC 8305). All that said, synchronous blocking APIs like
> getaddrinfo need to eventually return even if no one replies, so EAI_AGAIN
> makes sense in that case - whereas if .local is blocked by policy then
> immediately returning EAI_NONAME is best.

Right. Even if applications don't currently distinguish them well,
returning EAI_AGAIN vs EAI_NONAME is meaningful and enables them to do
the right thing.

Agreed.
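
To make the distinction concrete, here is a minimal caller-side sketch, assuming an application that wants to keep retrying on EAI_AGAIN but stop on EAI_NONAME. The enum and the function name are hypothetical and are not part of musl or of any existing API; only getaddrinfo's EAI_* return codes are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>

/* Hypothetical caller-side policy; these names are illustrative only. */
enum lookup_action {
    LOOKUP_USE_RESULT,   /* resolution succeeded */
    LOOKUP_RETRY_LATER,  /* temporary failure: retry, e.g. after a network change */
    LOOKUP_GIVE_UP       /* name does not exist, or lookups are blocked by policy */
};

/* Map a getaddrinfo() return value to what the caller should do next. */
static enum lookup_action action_for_gai_result(int rc)
{
    switch (rc) {
    case 0:
        return LOOKUP_USE_RESULT;
    case EAI_AGAIN:
        return LOOKUP_RETRY_LATER;
    case EAI_NONAME:
    default:
        /* This sketch treats every other error as final. */
        return LOOKUP_GIVE_UP;
    }
}
```

A caller would pass the return value of getaddrinfo() straight into action_for_gai_result() and only schedule another attempt when it gets LOOKUP_RETRY_LATER.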

Thinking back to our discussion about whether to disable mDNS when the resolver is on localhost: I still agree that, from an ergonomics perspective, using configs to mean multiple things isn't great. But focusing just on the security properties for a second: if resolv.conf is configured to an IP address that is routed over a given non-loopback interface, the current status quo is to send the .local query unsecured over that interface. So if, in that specific scenario, we instead sent the query over multicast, but only on that interface, then we wouldn't measurably change the security properties of the system. In practice there is a slight difference, in that you can now be attacked by any device on the network as opposed to only by the router on that network, but I'd argue that there's no meaningful threat model that distinguishes between those two attacks. So that would be a safe default option. But again, your points about least surprise are still valid, so if you object to that on those grounds I can't disagree.
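
As a rough illustration of that decision only (not of how musl is or would be implemented), the check could look like the sketch below. All three helper names are hypothetical. The sketch assumes the nameserver is given as a textual IPv4 or IPv6 address, as in resolv.conf, and it leaves out the harder part of finding which interface routes to that address.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* True if `name` ends in ".local", with or without a trailing root dot. */
static bool is_dot_local(const char *name)
{
    size_t len = strlen(name);
    if (len > 0 && name[len - 1] == '.')
        len--;                          /* ignore the root dot */
    if (len < 6)
        return false;
    return strncasecmp(name + len - 6, ".local", 6) == 0;
}

/* True if the textual nameserver address is an IPv4 or IPv6 loopback address. */
static bool nameserver_is_loopback(const char *addr)
{
    struct in_addr a4;
    struct in6_addr a6;
    if (inet_pton(AF_INET, addr, &a4) == 1)
        return (ntohl(a4.s_addr) >> 24) == 127;   /* 127.0.0.0/8 */
    if (inet_pton(AF_INET6, addr, &a6) == 1)
        return IN6_IS_ADDR_LOOPBACK(&a6);         /* ::1 */
    return false;
}

/* Assumed policy from the paragraph above: multicast only for .local names,
 * and only when the configured nameserver is not on loopback. */
static bool would_use_multicast(const char *name, const char *nameserver)
{
    return is_dot_local(name) && !nameserver_is_loopback(nameserver);
}
```

With a resolver on localhost, would_use_multicast() is false for every name, so .local queries keep going to the local resolver, which can apply its own policy.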

David