Hi there,

I am having an issue with Postfix in my Alpine Docker setup. I
activated DANE for my server and ran some tests to check whether
e-mails are handled correctly. In doing so I found that outgoing mails
fail when using DANE.

Investigating the issue with Viktor Dukhovni over at postfix-users, we
figured out that Postfix has trouble recognising the DANE parameters
of the target server I am sending my e-mails to. If you are interested
in the conversation:

https://pastebin.com/1e3sR0Hq

In the tcpdumps we could see that no DNSSEC flags are set in the
requests sent by Postfix, so it does not get the information it needs
to do DANE properly. That explains the failure of DANE, but not why
this is happening. I am no programmer, hence not sure about libc etc.,
but Viktor's last thought was:

"When Postfix is configured with "smtp_dns_support_level = dnssec",
the RES_USE_DNSSEC and RES_USE_EDNS0 flags are set around calls to the
resolver routines. If your C-library (perhaps only inside docker) has
an incompatible resolver API, then you'll need a more compatible
resolver library and/or a different container technology."

In comparison, using dig to check for DNSSEC from the same Alpine-based
container works. However, I do not know whether the request is
constructed the same way.

So the question now is how we can go about figuring out whether there
is an incompatibility.

Kind regards
Christian
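For readers wondering what "DNSSEC flags in the request" means on the
wire, here is a minimal sketch in C. It hand-builds a DNS query that
carries an EDNS(0) OPT record with the DO bit set, which is roughly
what a resolver honouring RES_USE_EDNS0 and RES_USE_DNSSEC would be
expected to send. The function name build_dnssec_query, the helper
put16 and the advertised UDP size of 1232 are assumptions made for
illustration; none of this is taken from Postfix or musl.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store one 16-bit big-endian value at buf[off]; returns the new offset. */
static size_t put16(unsigned char *buf, size_t off, uint16_t v)
{
    buf[off] = (unsigned char)(v >> 8);
    buf[off + 1] = (unsigned char)(v & 0xff);
    return off + 2;
}

/*
 * Build a DNS query for `name` (e.g. "example.com") that includes an
 * EDNS(0) OPT record with the DO ("DNSSEC OK") bit set.
 * Returns the packet length, or 0 on error.
 * Hypothetical helper for illustration only.
 */
size_t build_dnssec_query(const char *name, uint16_t qtype, uint16_t id,
                          unsigned char *buf, size_t buflen)
{
    size_t namelen = strlen(name);
    /* header (12) + encoded name (at most namelen + 2)
       + QTYPE/QCLASS (4) + OPT record (11) */
    if (namelen == 0 || namelen > 253 || buflen < 12 + namelen + 2 + 4 + 11)
        return 0;

    size_t off = 0;
    off = put16(buf, off, id);
    off = put16(buf, off, 0x0100);  /* flags: RD=1, everything else 0 */
    off = put16(buf, off, 1);       /* QDCOUNT */
    off = put16(buf, off, 0);       /* ANCOUNT */
    off = put16(buf, off, 0);       /* NSCOUNT */
    off = put16(buf, off, 1);       /* ARCOUNT: the OPT record below */

    /* Encode the name as length-prefixed labels. */
    const char *p = name;
    while (*p) {
        const char *dot = strchr(p, '.');
        size_t len = dot ? (size_t)(dot - p) : strlen(p);
        if (len == 0 || len > 63)
            return 0;               /* empty or oversized label */
        buf[off++] = (unsigned char)len;
        memcpy(buf + off, p, len);
        off += len;
        p += len;
        if (*p == '.')
            p++;
    }
    buf[off++] = 0;                 /* root label terminates the name */
    off = put16(buf, off, qtype);
    off = put16(buf, off, 1);       /* QCLASS = IN */

    /* OPT pseudo-RR (RFC 6891): root owner name, TYPE 41, CLASS holds
       the UDP payload size, TTL holds ext-rcode/version/flags. */
    buf[off++] = 0;
    off = put16(buf, off, 41);
    off = put16(buf, off, 1232);    /* advertised UDP payload size */
    off = put16(buf, off, 0);       /* extended RCODE 0, version 0 */
    off = put16(buf, off, 0x8000);  /* flags: DO bit set */
    off = put16(buf, off, 0);       /* RDLEN: no options */
    return off;
}
```

Calling build_dnssec_query("example.com", 52, ...) would produce a
TLSA query (type 52). A capture of a plain RFC 1035 query, by contrast,
has ARCOUNT 0 and no OPT record at all, which matches what the
tcpdumps showed.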
Hi again,

So Viktor did some digging:

"The comment on line 25:

https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25

is not encouraging. It suggests that _res is unused. If so, Postfix
DNS does not work correctly with this C library. And not just for
DANE, since Postfix is also unable to control RES_DEFNAMES and
RES_DNSRCH.

Indeed searching the github repo for RES_USE_DNSSEC and RES_USE_EDNS0
finds hits only the header file, and similarly:

https://raw.githubusercontent.com/runtimejs/musl-libc/master/src/network/res_state.c

pretty much rules out support for configurable lookup options."

and

"The musl-libc resolver code also includes [...]:

https://github.com/runtimejs/musl-libc/blob/master/src/network/__dns.c#L67-L69

So not terribly safe if using a remote resolver. Definitely no support
for EDNS(0) or sending the "DO" or "AD" bits in the request.

Always queries all resolvers in parallel without waiting for a short
timeout from the first one (or use connect(2) for prompt notification
of host/port unreachable).

There is no support for truncated responses or TCP failover, so if a
host has enough IP addresses, some may be dropped, and FcRDNS checks
may fail spuriously."

I guess this all shows the incompatibility.

The big question now is: will/can this be resolved?

From what I understand, it makes Postfix with musl pretty much
unusable, or at least less secure (not only because of the library
itself, but also because security features like DANE get crippled
without anyone noticing).

Also, I am not sure why stubs were built to "satisfy broken apps". An
exception would have shown right away that something is not working
and prevented the use.
On Mon, Apr 13, 2020 at 01:20:49PM +0200, Christian wrote:
> Hi again,
>
> So Viktor did some digging:
>
> "The comment on line 25:
>
> https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25
>
> is not encouraging. It suggests that _res is unused.

This is correct. It was probably a really bad idea to offer it at the
header level at all, rather than having a compile-time failure if
software is attempting to use it to configure the resolver. It's
always been intentional that musl's res_* functions are stateless and
thread-safe.

> If so, Postfix DNS does not work correctly with this C library. And
> not just for DANE, since Postfix is also unable to control
> RES_DEFNAMES and RES_DNSRCH.
>
> Indeed searching the github repo for RES_USE_DNSSEC and RES_USE_EDNS0
> finds hits only the header file, and similarly:
>
> https://raw.githubusercontent.com/runtimejs/musl-libc/master/src/network/res_state.c
>
> pretty much rules out support for configurable lookup options."
>
> and
>
> "The musl-libc resolver code also includes [...]:
>
> https://github.com/runtimejs/musl-libc/blob/master/src/network/__dns.c#L67-L69
>
> So not terribly safe if using a remote resolver. Definitely no support
> for EDNS(0) or sending the "DO" or "AD" bits in the request.
>
> Always queries all resolvers in parallel without waiting for a short
> timeout from the first one (or use connect(2) for prompt notification
> of host/port unreachable).
>
> There is no support for truncated responses or TCP failover, so if a
> host has enough IP addresses, some may be dropped, and FcRDNS checks
> may fail spuriously."
>
> I guess this all shows the incompatibility.
>
> Big questions is now: Will/Can this be resolved?

The intended usage model for musl's resolver has always been that the
stub resolver (which gets static linked into programs if you static
link and is therefore non-upgradeable) be minimal and speak only the
most basic original DNS protocol (RFC 1035) that's understood by all
nameservers. The intended usage model for DNSSEC with this is that you
run your DNSSEC-validating nameserver on localhost (or if you want to
YOLO it, somewhere on a trusted LAN where you daringly assume nobody
can root a random box and use it to spoof DNS replies) and thereby get
ServFail if DNSSEC validation fails (which translates into
EAI_AGAIN/temporary failure with the higher-level functions) and a
trusted result if there's any result at all.

So, the intended fix is simply not doing what Postfix is doing, and
just making the queries normally, and getting DNSSEC protection if
it's configured.

However, that's not necessarily immutable policy. Indeed it doesn't
let a program alter its behavior based on whether the records looked
up were DNSSEC-protected, which presumably Postfix wants to do, since
it's required for conforming use of DANE. (Personally I disagree with
this requirement and believe it's useful to honor TLSA records even on
non-signed domains, but don't expect everyone to just do that.)
However I don't know when or how this sort of change would take place.

The easy short-term fix to use Postfix unmodified with fully
conforming policy is to just link it to a copy (probably static so as
not to conflict with system) of BIND's resolver library.

The easy short-term fix if you don't care that (or are happy that)
TLSA records for unsigned domains would be honored is to just disable
(comment-out/#if 0) the code in Postfix that's checking whether the
result was signed, and treat all results as if they were.

> Out of what I can understand it makes Postfix with musl pretty much
> unusable, at least less secure (not only because of the lib itself, but
> as well as security features, like DANE get crippled without noticing).
>
> Also I am not sure why there are stubs build to "satisfy broken apps"?
> An exception would have showed right away that something is not working
> and prevented the use.

Indeed I think that was a bad idea, and may remove them (the
header-level parts, not the ABI symbol _res). If so I'll coordinate
with distros first to make sure they have patches ready for any
software that will fail to build (and this might uncover bugs in the
existing packages that need to be fixed, too).

On the other hand if part of the resulting changes end up being
addition of res_n* APIs that do use a configuration context, then it
can't be removed.

Rich
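As an illustration of the distinction discussed in this message (a
program altering its behavior depending on whether the answer was
DNSSEC-protected, versus simply trusting a validating nameserver on
localhost), here is a minimal sketch in C that classifies a raw DNS
response by its header alone. The enum and the function name
classify_response are hypothetical. The sketch assumes the caller
already has the raw response bytes and that the path to the resolver
is trusted; an AD bit from an untrusted resolver proves nothing.

```c
#include <stddef.h>

enum dnssec_status {
    DNSSEC_MALFORMED = -1, /* too short, or not a response at all */
    DNSSEC_FAILED,         /* RCODE != 0, e.g. SERVFAIL after failed validation */
    DNSSEC_INSECURE,       /* answer OK, AD bit clear (unsigned domain) */
    DNSSEC_SECURE          /* answer OK, AD bit set by the validating resolver */
};

/*
 * Classify a raw DNS response using only the 12-byte header.
 * Byte 2 holds QR/Opcode/AA/TC/RD; byte 3 holds RA/Z/AD/CD/RCODE.
 */
enum dnssec_status classify_response(const unsigned char *msg, size_t len)
{
    if (len < 12)
        return DNSSEC_MALFORMED;
    if (!(msg[2] & 0x80))           /* QR bit clear: this is a query */
        return DNSSEC_MALFORMED;
    if (msg[3] & 0x0f)              /* any non-zero RCODE */
        return DNSSEC_FAILED;
    return (msg[3] & 0x20) ? DNSSEC_SECURE : DNSSEC_INSECURE;
}
```

Under the localhost-validator model described above, only the
DNSSEC_FAILED case is visible to the application (as a temporary
failure); telling DNSSEC_SECURE apart from DNSSEC_INSECURE is the part
that needs the resolver to expose the response flags.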
* Christian:
> So Viktor did some digging:
>
> "The comment on line 25:
>
> https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25
>
> is not encouraging. It suggests that _res is unused. If so, Postfix
> DNS does not work correctly with this C library. And not just for
> DANE, since Postfix is also unable to control RES_DEFNAMES and
> RES_DNSRCH.
Are these changes to the RES_DEFNAMES and RES_DNSRCH flags really
necessary? Why doesn't Postfix use res_query (or perhaps res_send) as
appropriate?
On Mon, Apr 13, 2020 at 05:52:34PM +0200, Florian Weimer wrote:
> * Christian:
>
> > So Viktor did some digging:
> >
> > "The comment on line 25:
> >
> > https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25
> >
> > is not encouraging. It suggests that _res is unused. If so, Postfix
> > DNS does not work correctly with this C library. And not just for
> > DANE, since Postfix is also unable to control RES_DEFNAMES and
> > RES_DNSRCH.
>
> Are these changes to the RES_DEFNAMES and RES_DNSRCH flags really
> necessary? Why doesn't Postfix use res_query (or perhaps res_send) as
> appropriate?
What I'd really like to see Postfix doing is not trying to poke
at/override configuration, and assuming option edns0 is set in
resolv.conf if the user wants it. Then, if it's set and the resolver
supports making edns queries with DNSSEC result flags available, it
can act on them and treat "valid result for signed domain" differently
from "valid result for unsigned domain".
My preferred behavior if not, which is compatible with what's always
been the intended musl stub resolver usage model, is to treat all
DNSSEC behavior as outsourced to the configured nameserver and simply
look up records. (If wanted, the user's local nameserver can then drop
TLSA records for unsigned domains, or report them to be honored as if
they were signed, according to the wishes of whoever set it up.) But
it might be unrealistic to expect Postfix to do this.
Rich
On Mon, Apr 13, 2020 at 05:52:34PM +0200, Florian Weimer wrote:
> * Christian:
>
> > So Viktor did some digging:
> >
> > "The comment on line 25:
> >
> > https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25
> >
> > is not encouraging. It suggests that _res is unused. If so, Postfix
> > DNS does not work correctly with this C library. And not just for
> > DANE, since Postfix is also unable to control RES_DEFNAMES and
> > RES_DNSRCH.
>
> Are these changes to the RES_DEFNAMES and RES_DNSRCH flags really
> necessary? Why doesn't Postfix use res_query (or perhaps res_send) as
> appropriate?
But to actually answer these questions, modifying the flags is
presumably because traditional res_query builds an rfc1035 query or
edns query based on these flags derived from resolv.conf, and
Postfix either assumes or wants to support the case where resolv.conf
is not already configured for edns, perhaps because it was generated
by a dhcp client.
Rich
On Monday, 13.04.2020, at 12:38 -0400, Rich Felker wrote:
> On Mon, Apr 13, 2020 at 05:52:34PM +0200, Florian Weimer wrote:
> > * Christian:
> >
> > > So Viktor did some digging:
> > >
> > > "The comment on line 25:
> > >
> > > https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25
> > >
> > > is not encouraging. It suggests that _res is unused. If so, Postfix
> > > DNS does not work correctly with this C library. And not just for
> > > DANE, since Postfix is also unable to control RES_DEFNAMES and
> > > RES_DNSRCH.
> >
> > Are these changes to the RES_DEFNAMES and RES_DNSRCH flags really
> > necessary? Why doesn't Postfix use res_query (or perhaps res_send) as
> > appropriate?
>
> But to actually answer these questions, modifying the flags is
> presumably because traditional res_query builds an rfc1035 query or
> edns query based on these flags derived from resolv.conf, and
> Postfix either assumes or wants to support the case where resolv.conf
> is not already configured for edns, perhaps because it was generated
> by a dhcp client.
>
> Rich

I can't tell you much about the coding or why it is this way. I am
merely a user who found the incompatibility. If this is of interest,
you might want to get in contact with Viktor, e.g. via the
postfix-users mailing list.

FYI: I just moved my config to a glibc setup in Debian and it is
working without issues, hence confirming Viktor's finding that Postfix
won't work with musl.
On Mon, Apr 13, 2020 at 07:51:23PM +0200, Christian wrote:
> On Monday, 13.04.2020, at 12:38 -0400, Rich Felker wrote:
> > On Mon, Apr 13, 2020 at 05:52:34PM +0200, Florian Weimer wrote:
> >
> > * Christian:
> >
> >
> > So Viktor did some digging:
> >
> > "The comment on line 25:
> >
> >
> https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25
> >
> > is not encouraging. It suggests that _res is unused. If so, Postfix
> > DNS does not work correctly with this C library. And not just for
> > DANE, since Postfix is also unable to control RES_DEFNAMES and
> > RES_DNSRCH.
> >
> > Are these changes to the RES_DEFNAMES and RES_DNSRCH flags really
> > necessary? Why doesn't Postfix use res_query (or perhaps res_send) as
> > appropriate?
> >
> > But to actually answer these questions, modifying the flags is
> > presumably because traditional res_query builds an rfc1035 query or
> > edns query based on these flags derived from resolv.conf, and
> > Postfix either assumes or wants to support the case where resolv.conf
> > is not already configured for edns, perhaps because it was generated
> > by a dhcp client.
> >
> > Rich
> >
> >
>
> I can't tell you much on the coding or why it is this way. I am merely
> a user that found the incompatibility. If this is of interest, you
> might want to get in contact with Viktor, e.g. via the postfix users
> mailing list.
>
> FYI: I just moved my config to a glibc setup in debian and it is
> working without issues, hence confirming Viktors finding, that Postfix
> won't work with musl
Thanks. I'll see if I can reply into the postfix-users thread.
Rich
* Rich Felker:
> On Mon, Apr 13, 2020 at 05:52:34PM +0200, Florian Weimer wrote:
>> * Christian:
>>
>> > So Viktor did some digging:
>> >
>> > "The comment on line 25:
>> >
>> > https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25
>> >
>> > is not encouraging. It suggests that _res is unused. If so, Postfix
>> > DNS does not work correctly with this C library. And not just for
>> > DANE, since Postfix is also unable to control RES_DEFNAMES and
>> > RES_DNSRCH.
>>
>> Are these changes to the RES_DEFNAMES and RES_DNSRCH flags really
>> necessary? Why doesn't Postfix use res_query (or perhaps res_send) as
>> appropriate?
>
> But to actually answer these questions, modifying the flags is
> presumably because traditional res_query builds an rfc1035 query or
> edns query based on these flags derived from resolv.conf, and
> Postfix either assumes or wants to support the case where resolv.conf
> is not already configured for edns, perhaps because it was generated
> by a dhcp client.
In my comment above, I specifically meant RES_DEFNAMES and RES_DNSRCH.
RES_USE_EDNS0 seems different; I would expect applications to use
their own DNS libraries if they need to access DNSSEC data and
non-address record types (where there is no benefit gained from
integrating with /etc/hosts or other data sources).
On Tue, Apr 14, 2020 at 11:57:17AM +0200, Florian Weimer wrote:
> * Rich Felker:
>
> > On Mon, Apr 13, 2020 at 05:52:34PM +0200, Florian Weimer wrote:
> >> * Christian:
> >>
> >> > So Viktor did some digging:
> >> >
> >> > "The comment on line 25:
> >> >
> >> > https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25
> >> >
> >> > is not encouraging. It suggests that _res is unused. If so, Postfix
> >> > DNS does not work correctly with this C library. And not just for
> >> > DANE, since Postfix is also unable to control RES_DEFNAMES and
> >> > RES_DNSRCH.
> >>
> >> Are these changes to the RES_DEFNAMES and RES_DNSRCH flags really
> >> necessary? Why doesn't Postfix use res_query (or perhaps res_send) as
> >> appropriate?
> >
> > But to actually answer these questions, modifying the flags is
> > > presumably because traditional res_query builds an rfc1035 query or
> > > edns query based on these flags derived from resolv.conf, and
> > Postfix either assumes or wants to support the case where resolv.conf
> > is not already configured for edns, perhaps because it was generated
> > by a dhcp client.
>
> In my comment above, I specifically meant RES_DEFNAMES and RES_DNSRCH.
>
> RES_USE_EDNS0 seems different; I would expect applications to use
> their own DNS libraries if they need to access DNSSEC data and
> non-address record types (where there is no benefit gained from
> integrating with /etc/hosts or other data sources).
Oh. For those it seems to be to suppress search domains, so that when
looking up the MX or TLSA for example.com it doesn't get records for
example.com.searchdomain.
I don't know why they poke at flags in _res rather than just appending
a . to the name, and/or comparing the name in the result to ensure
that it matches.
Also res_query is *documented* not to use search domains. You have to
use res_search if you want them. So the flags would only affect A/AAAA
lookups via getaddrinfo etc. anyway. Maybe that's the case they care
about, but appending . would still solve it, and it's not a DANE
integrity issue anyway since if you contacted the wrong server IP the
certificate/key would not match.
Rich
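To make the two suggestions in this message concrete (appending a "."
to the name, and comparing the name in the result), here is a minimal
sketch in C. The function names make_absolute and same_dns_name are
hypothetical and nothing here is taken from Postfix. It assumes the
usual resolver convention that a name ending in a dot is treated as
absolute, so no search domain gets appended.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `name` into `out`, ensuring it ends with exactly one trailing
 * dot so the resolver treats it as absolute.
 * Returns 0 on success, -1 if the name is empty or does not fit.
 */
int make_absolute(const char *name, char *out, size_t outlen)
{
    size_t n = strlen(name);
    int has_dot = n > 0 && name[n - 1] == '.';
    size_t need = n + (has_dot ? 0 : 1) + 1;   /* + dot + NUL */

    if (n == 0 || need > outlen)
        return -1;
    memcpy(out, name, n);
    if (!has_dot)
        out[n++] = '.';
    out[n] = '\0';
    return 0;
}

/*
 * Compare two DNS names case-insensitively, ignoring one trailing dot
 * on either side. Returns 1 if they name the same domain, else 0.
 */
int same_dns_name(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);

    if (la > 0 && a[la - 1] == '.')
        la--;
    if (lb > 0 && b[lb - 1] == '.')
        lb--;
    if (la != lb)
        return 0;
    for (size_t i = 0; i < la; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}
```

A caller would pass the output of make_absolute to the lookup and then
use same_dns_name to reject an answer whose owner name turned out to be
something like example.com.searchdomain.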
* Rich Felker:

> On Tue, Apr 14, 2020 at 11:57:17AM +0200, Florian Weimer wrote:
>> * Rich Felker:
>>
>> > On Mon, Apr 13, 2020 at 05:52:34PM +0200, Florian Weimer wrote:
>> >> * Christian:
>> >>
>> >> > So Viktor did some digging:
>> >> >
>> >> > "The comment on line 25:
>> >> >
>> >> > https://github.com/runtimejs/musl-libc/blob/master/include/resolv.h#L25
>> >> >
>> >> > is not encouraging. It suggests that _res is unused. If so, Postfix
>> >> > DNS does not work correctly with this C library. And not just for
>> >> > DANE, since Postfix is also unable to control RES_DEFNAMES and
>> >> > RES_DNSRCH.
>> >>
>> >> Are these changes to the RES_DEFNAMES and RES_DNSRCH flags really
>> >> necessary? Why doesn't Postfix use res_query (or perhaps res_send) as
>> >> appropriate?
>> >
>> > But to actually answer these questions, modifying the flags is
>> > presumably because traditional res_query builds an rfc1035 query or
>> > edns query based on these flags derived from resolv.conf, and
>> > Postfix either assumes or wants to support the case where resolv.conf
>> > is not already configured for edns, perhaps because it was generated
>> > by a dhcp client.
>>
>> In my comment above, I specifically meant RES_DEFNAMES and RES_DNSRCH.
>>
>> RES_USE_EDNS0 seems different; I would expect applications to use
>> their own DNS libraries if they need to access DNSSEC data and
>> non-address record types (where there is no benefit gained from
>> integrating with /etc/hosts or other data sources).
>
> Oh. For those it seems to be to suppress search domains, so that when
> looking up the MX or TLSA for example.com it doesn't get records for
> example.com.searchdomain.
>
> I don't know why they poke at flags in _res rather than just appending
> a . to the name, and/or comparing the name in the result to ensure
> that it matches.

It doesn't work when the data doesn't come out of DNS.

> Also res_query is *documented* not to use search domains.

Exactly, that's why I don't understand why changing the flags is needed. res_search for searching, res_query for not searching.