Thanks for the detailed responses, everyone!
I'm sending replies inline.
David

On Wed, Mar 6, 2024 at 8:15 AM Rich Felker <dalias@libc.org> wrote:
On Tue, Mar 05, 2024 at 11:29:03PM -0800, David Schinazi wrote:
> Hi everyone,
>
> I was debugging a network connectivity issue on Alpine and have tracked it
> down to lack of support for mDNS in musl gethostbyname / getaddrinfo [1]. I
> looked through the musl codebase to understand why, and it would be pretty
> straightforward to fix. I'd be interested in writing a patch for this, so I
> was wondering: would you be at all interested in potentially taking such a
> patch?
>
> Some more info on mDNS: all names that end in ".local" are reserved for use
> by mDNS, and instead of sending them to the DNS resolver, they're sent
> locally over multicast - and the machine with that name replies with its IP
> address. It's used today to discover printers and pretty much everything in
> home networks.

Last I checked, .local is not actually reserved by any relevant
specification/authority. It was basically just appropriated by mDNS.
The protocol spoken is also not exactly DNS (for example, it uses raw
UTF-8 rather than IDN/punycode, which would need to be special-cased
once we support the latter).

On Wed, Mar 6, 2024 at 8:45 AM Jeffrey Walton <noloader@gmail.com> wrote:
It looks like IANA reserves it, and cites RFC 6762,
<https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml>.

As Jeffrey points out, when the IETF standardized mDNS, it published the protocol (RFC 6762) at the same time as RFC 6761, which created the Special-Use Domain Names registry and a process for reserving domain names for special purposes. ".local" was one of the initial entries in that registry. The UTF-8 vs punycode question for mDNS and DNS is somewhat of a mess. It is discussed in Section 16 of RFC 6762, but at the end of the day punycode won: even Apple's implementation of getaddrinfo performs punycode conversion for .local instead of sending raw UTF-8. So in practice you wouldn't need to special-case anything here.

There's also very much a policy matter of what "locally over
multicast" means (what the user wants it to mean). Which interfaces
should be queried? Wired and wireless ethernet? VPN links or other
sorts of tunnels? Just one local interface (which one to prioritize)
or all of them? Only if the network is "trusted"? Etc.
 
You're absolutely right. Most mDNS systems try all non-loopback, non-p2p, multicast-capable interfaces, but sending to the default route interface would be a good start; more on that below.
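
To make the "all non-loopback, non-p2p, multicast-capable interfaces" policy concrete, here is a rough sketch using getifaddrs(). The helper names is_mdns_candidate() and print_mdns_candidates() are hypothetical, and the extra IFF_UP check is my own assumption rather than something any existing implementation is confirmed to do. This is not musl code, just an illustration of the filter.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if an interface with these flags would be
 * queried under the "non-loopback, non-p2p, multicast-capable" policy.
 * Requiring IFF_UP is an assumption added for this sketch. */
int is_mdns_candidate(unsigned int flags)
{
    if (!(flags & IFF_UP)) return 0;
    if (flags & IFF_LOOPBACK) return 0;
    if (flags & IFF_POINTOPOINT) return 0;
    return (flags & IFF_MULTICAST) != 0;
}

/* Hypothetical helper: prints one line per matching getifaddrs() entry and
 * returns the number of lines printed, or -1 on error. getifaddrs() returns
 * one entry per address, so an interface can be listed more than once. */
int print_mdns_candidates(FILE *out)
{
    struct ifaddrs *list, *ifa;
    int count = 0;

    assert(out != NULL);
    if (getifaddrs(&list) != 0) return -1;
    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (!is_mdns_candidate(ifa->ifa_flags)) continue;
        fprintf(out, "%s (index %u)\n", ifa->ifa_name,
                if_nametoindex(ifa->ifa_name));
        count++;
    }
    freeifaddrs(list);
    return count;
}
```

Calling print_mdns_candidates(stdout) on a typical laptop would be expected to list the wired and wireless interfaces but not lo or a point-to-point tunnel, which is exactly where the policy questions about VPN links come in.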

My view has always been that the right way to do something like this,
where there's no existing interface or contract/expectations for how
the libc stub resolver does it, is that it belongs in a resolver
speaking dns protocol on localhost. That way policy isn't baked-in to
individual executables (which may be static linked) but kept in a
place that's reasonable to have policy controls and where the user can
customize them.

I agree that providing an option to make these policy decisions in user-space makes a lot of sense. That's what glibc and Apple's libSystem do, but it comes at the cost of an extra level of indirection. For components that don't have that much flexibility, though, it would be nice to be able to send these queries without requiring additional software.

> From looking through musl, both gethostbyname() and getaddrinfo() route
> through __lookup_name(), which eventually calls name_from_dns(). From
> looking at that function, the issue is that it doesn't treat .local
> specifically - instead of sending those queries to multicast, it sends them
> to the regularly configured DNS nameservers.
>
> The fix would be to modify name_from_dns() [2] such that if `name` ends in
> ".local", then pass in a different conf variable to __res_msend_rc(). The
> conf variable contains (amongst other things) the DNS nameservers to send
> the query to. So, when the name ends in .local, instead of passing in the
> regular nameservers, we pass the multicast addresses and ports dedicated to
> mDNS (224.0.0.251:5353 and [ff02::fb]:5353).

When you do that, how do you control which interface(s) it goes over?
I think that's an important missing ingredient.

You're absolutely right. In IPv4, sending to a link-local multicast address like this goes out over the IPv4 default route interface. In IPv6, the interface has to be specified in the destination's scope_id, so we'd need to look it up from the kernel with rtnetlink.
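
As a sketch of what the two destinations would look like as socket addresses: the helpers below (mdns_dest_v4() and mdns_dest_v6(), both hypothetical names) fill in 224.0.0.251:5353 and [ff02::fb]:5353. The IPv6 one takes the interface index as a parameter, on the assumption that the caller has already obtained it some other way (for example via the rtnetlink lookup, which is not shown here). This is not how musl's conf structure is actually populated, just the address setup in isolation.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

#define MDNS_PORT 5353

/* Hypothetical helper: fills sa with 224.0.0.251:5353.
 * Returns 0 on success, -1 on failure. */
int mdns_dest_v4(struct sockaddr_in *sa)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(MDNS_PORT);
    return inet_pton(AF_INET, "224.0.0.251", &sa->sin_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: fills sa with [ff02::fb]:5353 scoped to ifindex.
 * ff02::fb is link-local multicast, so the address is ambiguous without
 * an interface; an index of 0 is rejected here for that reason.
 * Returns 0 on success, -1 on failure. */
int mdns_dest_v6(struct sockaddr_in6 *sa, unsigned int ifindex)
{
    assert(sa != NULL);
    if (ifindex == 0) return -1;
    memset(sa, 0, sizeof *sa);
    sa->sin6_family = AF_INET6;
    sa->sin6_port = htons(MDNS_PORT);
    sa->sin6_scope_id = ifindex;
    return inet_pton(AF_INET6, "ff02::fb", &sa->sin6_addr) == 1 ? 0 : -1;
}
```

The resulting structures can be passed straight to sendto() on a UDP socket of the matching family.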

> And that's it! This implementation is compatible with the "One-Shot
> Multicast DNS Queries" mode of the mDNS RFC [3]. (Other versions of libc
> have a mode to send the query over dbus to avahi so that it can cache mDNS
> results locally. But that's the more complicated "Continuous Multicast DNS
> Querying" mode of the RFC, and we don't need that here.)
>
> So what do you think, would you be interested in support for mDNS? (In case
> it matters, I've made changes in getaddrinfo inside Apple's libc, so I'm
> comfortable in this kind of code even though I have zero prior experience
> with musl)

If at some point there's a consensus on stub resolvers having an
expectation to support this themselves, and on untangling the details
like the above, and on "ownership" of the ".local" TLD, it might make
sense to have a resolv.conf option to do this.

There is at least IETF consensus on these points. Ownership is well-defined in RFC 6761, and support by stub resolvers is covered in RFC 6762, Section 22.1, item 3: "Name resolution APIs and libraries SHOULD recognize these names as special and SHOULD NOT send queries for these names to their configured (unicast) caching DNS server(s)."
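
For what "recognize these names as special" could look like in a stub resolver, here is a minimal sketch of the suffix check. The function name is_mdns_name() is hypothetical, and treating a trailing root dot and mixed case as equivalent is my assumption about what a reasonable check would do; it is not taken from musl or any other libc.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: nonzero if name ends in the ".local" label,
 * compared case-insensitively and ignoring one trailing root dot. */
int is_mdns_name(const char *name)
{
    static const char suffix[] = ".local";
    const size_t slen = sizeof suffix - 1;
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len > 0 && name[len - 1] == '.')
        len--;                      /* "host.local." is the same name */
    if (len < slen)
        return 0;
    /* Comparing with the leading dot keeps "notlocal" from matching. */
    return strncasecmp(name + len - slen, suffix, slen) == 0;
}
```

With this check, is_mdns_name("printer.local") and is_mdns_name("Printer.LOCAL.") are nonzero, while is_mdns_name("example.com") and is_mdns_name("notlocal") are zero.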
 
Unlike general unioning
of sources, which is really problematic, the mDNS stuff seems to be
putting the decision which source to use *before* making any queries,
which is a lot less problematic.

I'm not familiar with what you mean by unioning here. Are you referring to interface selection, DNS name server selection, or something else?

On Wed, Mar 6, 2024 at 8:16 AM Markus Wichmann <nullplan@gmx.net> wrote:
So is there something wrong with the solution presented in the wiki
page? Because that is generally the answer we recommend: If you want any
name resolution other than DNS, write a proxy that does what you want
and point resolv.conf to it. Similarly, if you want any user database
lookup other than local files, write an nscd proxy that does what you
want.

That's certainly an option. Ideally, though, I'd rather avoid adding extra processes that can become failure points when the stub resolver can send these queries itself with a very small modification.
 
Reason for that is that that is the most generic way to support any
other name service besides DNS. It avoids the dependency on dynamic
loading that something like glibc's nsswitch would create, and would
avoid having multiple backends in libc. I really don't think anyone
wants to open that particular door. Once mDNS is in there, someone will
add NetBIOS, just you wait.

I'm definitely sympathetic to the slippery slope argument, but I think there's still a real line between mDNS and NetBIOS. mDNS uses a different transport but lives inside the DNS namespace, whereas NetBIOS is really its own thing: NetBIOS names aren't valid DNS hostnames.

Let me know what you think of the above. If you think of mDNS as its own beast then I can see how including it wouldn't really make sense. But if you see it as an actual part of the DNS, then it might be worth a small code change :-)

Cheers,
David