* [musl] musl resolver handling of "search ." in /etc/resolv.conf
From: Dalton Hubble @ 2022-08-31 17:33 UTC
To: musl

Hey folks,

I wanted to flag a possible issue with musl handling of DNS "search ." in
/etc/resolv.conf. The easiest way I have to repro and consume musl is
starting an alpine or busybox musl container image.

podman run -it docker.io/alpine:3.16.2 /bin/ash

Edit /etc/resolv.conf to the following (note the "." at the end of search):

```
search default.svc.cluster.local .
nameserver 8.8.8.8
options ndots:5
```

```
wget www.google.com
wget: bad address 'www.google.com'
```

Remove the "." from search and wget will work fine again.

https://github.com/coreos/fedora-coreos-tracker/issues/1287 has some great
details showing DNS packet capture and a malformed packet.

Broader context is that systemd and recently Kubernetes start adding
"search ." to resolv.conf in certain scenarios, which seems to break
musl-based resolvers.
- https://github.com/systemd/systemd/pull/17201
- https://github.com/kubernetes/kubernetes/pull/109441
- https://github.com/kubernetes/kubernetes/issues/112135

--
Dalton Hubble
dghubble@gmail.com

* Re: [musl] musl resolver handling of "search ." in /etc/resolv.conf
From: Rich Felker @ 2022-08-31 23:59 UTC
To: Dalton Hubble; +Cc: musl

On Wed, Aug 31, 2022 at 10:33:05AM -0700, Dalton Hubble wrote:
> Hey folks,
>
> I wanted to flag a possible issue with musl handling of DNS "search ." in
> /etc/resolv.conf. The easiest way I have to repro and consume musl is
> starting an alpine or busybox musl container image.
>
> podman run -it docker.io/alpine:3.16.2 /bin/ash
>
> Edit /etc/resolv.conf to the following (note the "." at the end of search):
>
> ```
> search default.svc.cluster.local .
> nameserver 8.8.8.8
> options ndots:5
> ```
>
> ```
> wget www.google.com
> wget: bad address 'www.google.com'
> ```
>
> Remove the "." from search and wget will work fine again.
>
> https://github.com/coreos/fedora-coreos-tracker/issues/1287 has some great
> details showing DNS packet capture and a malformed packet.
>
> Broader context is that systemd and recently Kubernetes start adding
> "search ." to resolv.conf in certain scenarios, which seems to break
> musl-based resolvers.
> - https://github.com/systemd/systemd/pull/17201
> - https://github.com/kubernetes/kubernetes/pull/109441
> - https://github.com/kubernetes/kubernetes/issues/112135

Ugh. It was not foreseen that . would be put in the search domains
list, and putting it there, especially anywhere but the final position
in the list, recreates a bad behavior that we explicitly tried to
avoid having in musl.

The mechanism of the failure is that malformed DNS queries are sent
with a literal . at the end of the name. This probably also happens if
the domains in the search list end in dot. Since the queries are
malformed, they don't get responses (or ServFail) and then the search
cannot continue.

This can be fixed by properly stripping the final dot in search
entries, and skipping ones that are otherwise malformed. Then we need
to decide what to do with the empty (root) search suffix. There are 3
options I see:

- Actually support it as a search. This is *bad* behavior, but at
  least unlike the version of this behavior musl explicitly does not
  implement, it was explicitly requested by the user. Except that it
  wasn't, because systemd is just putting it in everyone's
  resolv.conf..

- Skip it completely. Never search root; wait for the end of the
  search list and query root as always.

- End search on encountering it and go directly to the post-search
  query at root.

If it weren't for systemd and other things creating searches for .
without the user's intent, I think the first option would clearly be
the most reasonable. It provides a way to explicitly "get back" the
functionality musl omits, on an opt-in basis. And maybe systemd is
only emitting it as "search .", not putting . in the middle of other
search domains?

One of the other options might be a more conservative choice to make
now, to avoid creating a new "feature" without thinking through what
consequences it might have. We could always allow searching root
later after there's been time to think through the consequences,
rather than rushing it in as part of a bugfix.

Anyone care strongly about this one way or another?

Rich

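For illustration only, the three options can be compared by listing the
names each one would query, in order. The sketch below is Python, not musl
code: `candidate_names` and the policy labels "search", "skip" and "stop"
are hypothetical names invented for this example, and it assumes a single
trailing dot is stripped from each search entry as proposed above.
De-duplication of the final root query is deliberately left out.

```python
def candidate_names(name, search, policy):
    """Return the names that would be queried, in order.

    policy is "search" (option 1), "skip" (option 2) or "stop"
    (option 3); these labels are made up for this sketch.
    """
    if policy not in ("search", "skip", "stop"):
        raise ValueError(policy)
    names = []
    for suffix in search:
        if suffix.endswith("."):
            suffix = suffix[:-1]      # strip the final dot of the entry
        if suffix == "":              # the empty (root) search suffix
            if policy == "search":
                names.append(name)    # option 1: query root in list order
            elif policy == "stop":
                break                 # option 3: end the search here
            continue                  # option 2: ignore the entry
        names.append(name + "." + suffix)
    names.append(name)                # the usual post-search query at root
    return names


if __name__ == "__main__":
    search = ["a.example.com", ".", "b.example.com"]
    for policy in ("search", "skip", "stop"):
        print(policy, candidate_names("www", search, policy))
```

With root in the middle of the list, option 1 as sketched queries the bare
name twice, which is the duplicate-lookup concern that comes up later in
the thread.
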
* Re: [musl] musl resolver handling of "search ." in /etc/resolv.conf
From: Jeffrey Walton @ 2022-09-01 1:32 UTC
To: musl

On Wed, Aug 31, 2022 at 7:59 PM Rich Felker <dalias@libc.org> wrote:
>
> On Wed, Aug 31, 2022 at 10:33:05AM -0700, Dalton Hubble wrote:
> > Hey folks,
> >
> > I wanted to flag a possible issue with musl handling of DNS "search ." in
> > /etc/resolv.conf. The easiest way I have to repro and consume musl is
> > starting an alpine or busybox musl container image.
> >
> > podman run -it docker.io/alpine:3.16.2 /bin/ash
> >
> > Edit /etc/resolv.conf to the following (note the "." at the end of search):
> >
> > ```
> > search default.svc.cluster.local .
> > nameserver 8.8.8.8
> > options ndots:5
> > ```
> >
> > ```
> > wget www.google.com
> > wget: bad address 'www.google.com'
> > ```
> >
> > Remove the "." from search and wget will work fine again.
> >
> > https://github.com/coreos/fedora-coreos-tracker/issues/1287 has some great
> > details showing DNS packet capture and a malformed packet.
> >
> > Broader context is that systemd and recently Kubernetes start adding
> > "search ." to resolv.conf in certain scenarios, which seems to break
> > musl-based resolvers.
> > - https://github.com/systemd/systemd/pull/17201
> > - https://github.com/kubernetes/kubernetes/pull/109441
> > - https://github.com/kubernetes/kubernetes/issues/112135
>
> Ugh. It was not foreseen that . would be put in the search domains
> list, and putting it there, especially anywhere but the final position
> in the list, recreates a bad behavior that we explicitly tried to
> avoid having in musl.
>
> The mechanism of the failure is that malformed DNS queries are sent
> with a literal . at the end of the name. This probably also happens if
> the domains in the search list end in dot.
> Since the queries are
> malformed, they don't get responses (or ServFail) and then the search
> cannot continue.
>
> This can be fixed by properly stripping the final dot in search
> entries, and skipping ones that are otherwise malformed. Then we need
> to decide what to do with the empty (root) search suffix. There are 3
> options I see:
>
> - Actually support it as a search. This is *bad* behavior, but at
>   least unlike the version of this behavior musl explicitly does not
>   implement, it was explicitly requested by the user. Except that it
>   wasn't, because systemd is just putting it in everyone's
>   resolv.conf..
>
> - Skip it completely. Never search root; wait for the end of the
>   search list and query root as always.
>
> - End search on encountering it and go directly to the post-search
>   query at root.
>
> If it weren't for systemd and other things creating searches for .
> without the user's intent, I think the first option would clearly be
> the most reasonable. It provides a way to explicitly "get back" the
> functionality musl omits, on an opt-in basis. And maybe systemd is
> only emitting it as "search .", not putting . in the middle of other
> search domains?
>
> One of the other options might be a more conservative choice to make
> now, to avoid creating a new "feature" without thinking through what
> consequences it might have. We could always allow searching root
> later after there's been time to think through the consequences,
> rather than rushing it in as part of a bugfix.
>
> Anyone care strongly about this one way or another?

Forgive my ignorance Rich. What, exactly, does 'search .' mean? Does
that mean look up a single-label hostname in the TLD context?

If so, that's not supposed to happen:
https://www.iab.org/documents/correspondence-reports-documents/2013-2/iab-statement-dotless-domains-considered-harmful/ .
I would reject a 'search .' as malformed.

And be careful of systemd and its networking implementation.
The systemd folks do some shady things, like stripping a trailing dot from
an FQDN when setting a hostname. Stripping that dot means it is no longer
fully qualified.

Jeff

* Re: [musl] musl resolver handling of "search ." in /etc/resolv.conf
From: Rich Felker @ 2022-09-01 12:45 UTC
To: Jeffrey Walton; +Cc: musl

On Wed, Aug 31, 2022 at 09:32:02PM -0400, Jeffrey Walton wrote:
> On Wed, Aug 31, 2022 at 7:59 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Wed, Aug 31, 2022 at 10:33:05AM -0700, Dalton Hubble wrote:
> > > Hey folks,
> > >
> > > I wanted to flag a possible issue with musl handling of DNS "search ." in
> > > /etc/resolv.conf. The easiest way I have to repro and consume musl is
> > > starting an alpine or busybox musl container image.
> > >
> > > podman run -it docker.io/alpine:3.16.2 /bin/ash
> > >
> > > Edit /etc/resolv.conf to the following (note the "." at the end of search):
> > >
> > > ```
> > > search default.svc.cluster.local .
> > > nameserver 8.8.8.8
> > > options ndots:5
> > > ```
> > >
> > > ```
> > > wget www.google.com
> > > wget: bad address 'www.google.com'
> > > ```
> > >
> > > Remove the "." from search and wget will work fine again.
> > >
> > > https://github.com/coreos/fedora-coreos-tracker/issues/1287 has some great
> > > details showing DNS packet capture and a malformed packet.
> > >
> > > Broader context is that systemd and recently Kubernetes start adding
> > > "search ." to resolv.conf in certain scenarios, which seems to break
> > > musl-based resolvers.
> > > - https://github.com/systemd/systemd/pull/17201
> > > - https://github.com/kubernetes/kubernetes/pull/109441
> > > - https://github.com/kubernetes/kubernetes/issues/112135
> >
> > Ugh. It was not foreseen that . would be put in the search domains
> > list, and putting it there, especially anywhere but the final position
> > in the list, recreates a bad behavior that we explicitly tried to
> > avoid having in musl.
> >
> > The mechanism of the failure is that malformed DNS queries are sent
> > with a literal . at the end of the name. This probably also happens if
> > the domains in the search list end in dot. Since the queries are
> > malformed, they don't get responses (or ServFail) and then the search
> > cannot continue.
> >
> > This can be fixed by properly stripping the final dot in search
> > entries, and skipping ones that are otherwise malformed. Then we need
> > to decide what to do with the empty (root) search suffix. There are 3
> > options I see:
> >
> > - Actually support it as a search. This is *bad* behavior, but at
> >   least unlike the version of this behavior musl explicitly does not
> >   implement, it was explicitly requested by the user. Except that it
> >   wasn't, because systemd is just putting it in everyone's
> >   resolv.conf..
> >
> > - Skip it completely. Never search root; wait for the end of the
> >   search list and query root as always.
> >
> > - End search on encountering it and go directly to the post-search
> >   query at root.
> >
> > If it weren't for systemd and other things creating searches for .
> > without the user's intent, I think the first option would clearly be
> > the most reasonable. It provides a way to explicitly "get back" the
> > functionality musl omits, on an opt-in basis. And maybe systemd is
> > only emitting it as "search .", not putting . in the middle of other
> > search domains?
> >
> > One of the other options might be a more conservative choice to make
> > now, to avoid creating a new "feature" without thinking through what
> > consequences it might have. We could always allow searching root
> > later after there's been time to think through the consequences,
> > rather than rushing it in as part of a bugfix.
> >
> > Anyone care strongly about this one way or another?
>
> Forgive my ignorance Rich. What, exactly, does 'search .' mean?

"search ." by itself is semantically a no-op.
It specifies a single search domain that's the DNS root, which is exactly
what gets queried with no search at all. systemd is writing this into
resolv.conf because of a glibc "misbehavior" (to put it lightly) where, in
the absence of any search directive, it defaults to searching the domain
of the system hostname (so hostname=foo.example.com would implicitly
search example.com, which is obviously wrong to do, and systemd is
trying to suppress that). But it would also cause failing lookups to
be performed in duplicate, unless there's logic to suppress the final
non-search lookup when root was already searched explicitly.

Where it gets more complicated (see the email you replied to) is what
happens when it's not just "search ." but something like

    search a.example.com . b.example.com

or even

    search . example.com

> Does
> that mean look up a single-label hostname in the TLD context?
>
> If so, that's not supposed to happen:
> https://www.iab.org/documents/correspondence-reports-documents/2013-2/iab-statement-dotless-domains-considered-harmful/
> . I would reject a 'search .' as malformed.

That's not at all what this is about, but that *is* supposed to
happen, and in fact there are single-label domains that are valid and
have records. Rejecting them is not valid.

Search is only performed when there are fewer than ndots dots in the
queried name, but in the relevant examples, ndots is greater than the
default of 1. In the case where ndots is 1, only single-label name
lookup breaks.

Support for search and especially ndots>1 is harmful functionality
that was only added reluctantly, and with limitations to avoid the
most harmful cases. But it is currently buggy as described, and needs
some sort of fix.

> And be careful of systemd and its networking implementation. The
> systemd folks do some shady things,

In this case it was glibc doing shady things and systemd doing shady
workarounds.

> like stripping a trailing dot from
> an FQDN when setting a hostname.
> Stripping that dot means it is no
> longer fully qualified.

...

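The ndots rule stated in the message above ("search is only performed when
there are fewer than ndots dots in the queried name") can be written as a
one-line check. This is a minimal Python sketch of that sentence alone, not
of musl's code; `search_applies` is a hypothetical name, and any special
handling of names that already end in a dot is not modelled.

```python
def search_applies(name, ndots=1):
    """True if the search list would be consulted for this name.

    Models only the rule quoted from the thread: fewer than `ndots`
    dots in the queried name means the search list is used.
    """
    return name.count(".") < ndots


if __name__ == "__main__":
    # www.google.com has two dots: searched with ndots:5, not with ndots:1.
    print(search_applies("www.google.com", ndots=5))
    print(search_applies("www.google.com", ndots=1))
    # A single-label name has zero dots, so it is searched even with ndots:1.
    print(search_applies("www", ndots=1))
```

This is why the original report, which uses "options ndots:5", breaks for
www.google.com, while with the default of 1 only single-label names would
be affected.
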
* Re: [musl] musl resolver handling of "search ." in /etc/resolv.conf
From: Luca BRUNO @ 2022-09-01 16:03 UTC
To: musl

On Thu, 1 Sep 2022 08:45:12 -0400
Rich Felker <dalias@libc.org> wrote:

> "search ." by itself is semantically a no-op. It specifies a single
> search domain that's the DNS root, which is exactly what gets queried
> with no search at all. systemd is writing this into resolv.conf
> because of a glibc "misbehavior" (to put it lightly) where, in the
> absence of any search directive, it defaults to searching the domain
> of the system hostname (so hostname=foo.example.com would implicitly
> search example.com, which is obviously wrong to do, and systemd is
> trying to suppress that). But it would also cause failing lookups to
> be performed in duplicate, unless there's logic to suppress the final
> non-search lookup when root was already searched explicitly.

While tracking down this musl bug, I empirically observed from
network traces that glibc does apply such de-duplication logic under the
same configuration.
That is, it performs the root-anchored query in the specified order, and
in case of a negative response it does *not* perform the query again as
it would otherwise do for the final fallback case.

> > > There are 3 options I see:
> > >
> > > - Actually support it as a search. This is *bad* behavior, but at
> > >   least unlike the version of this behavior musl explicitly does
> > >   not implement, it was explicitly requested by the user. Except
> > >   that it wasn't, because systemd is just putting it in everyone's
> > >   resolv.conf..
> > >
> > > - Skip it completely. Never search root; wait for the end of the
> > >   search list and query root as always.
> > >
> > > - End search on encountering it and go directly to the post-search
> > >   query at root.
> > >
> > > Anyone care strongly about this one way or another?

From my observations, option 1 is consistent with other libcs' behavior.
But it has the above caveat that it needs additional caching to
avoid duplicate root-queries on negative responses.
If it isn't too invasive to implement, that would be my preferred one.

Option 2 looks somewhat reasonable too. The skewed order would be
a bit surprising, but it can be documented and it's unlikely to affect
many real-world usages.

Ciao, Luca

* Re: [musl] musl resolver handling of "search ." in /etc/resolv.conf
From: Rich Felker @ 2022-09-01 18:01 UTC
To: Luca BRUNO; +Cc: musl

On Thu, Sep 01, 2022 at 04:03:18PM +0000, Luca BRUNO wrote:
> On Thu, 1 Sep 2022 08:45:12 -0400
> Rich Felker <dalias@libc.org> wrote:
>
> > "search ." by itself is semantically a no-op. It specifies a single
> > search domain that's the DNS root, which is exactly what gets queried
> > with no search at all. systemd is writing this into resolv.conf
> > because of a glibc "misbehavior" (to put it lightly) where, in the
> > absence of any search directive, it defaults to searching the domain
> > of the system hostname (so hostname=foo.example.com would implicitly
> > search example.com, which is obviously wrong to do, and systemd is
> > trying to suppress that). But it would also cause failing lookups to
> > be performed in duplicate, unless there's logic to suppress the final
> > non-search lookup when root was already searched explicitly.
>
> While tracking down this musl bug, I empirically observed from
> network traces that glibc does apply such de-duplication logic under the
> same configuration.
> That is, it performs the root-anchored query in the specified order, and
> in case of a negative response it does *not* perform the query again as
> it would otherwise do for the final fallback case.

Thanks! This is good to know.

> > > > There are 3 options I see:
> > > >
> > > > - Actually support it as a search. This is *bad* behavior, but at
> > > >   least unlike the version of this behavior musl explicitly does
> > > >   not implement, it was explicitly requested by the user. Except
> > > >   that it wasn't, because systemd is just putting it in everyone's
> > > >   resolv.conf..
> > > >
> > > > - Skip it completely.
> > > >   Never search root; wait for the end of the
> > > >   search list and query root as always.
> > > >
> > > > - End search on encountering it and go directly to the post-search
> > > >   query at root.
> > > >
> > > > Anyone care strongly about this one way or another?
>
> From my observations, option 1 is consistent with other libcs' behavior.
> But it has the above caveat that it needs additional caching to
> avoid duplicate root-queries on negative responses.
> If it isn't too invasive to implement, that would be my preferred one.

I'm not clear what additional caching you have in mind. AFAICT the
search loop can just set a flag if it searched root already, and the
final root query can be skipped if it's reached and the flag is set.

> Option 2 looks somewhat reasonable too. The skewed order would be
> a bit surprising, but it can be documented and it's unlikely to affect
> many real-world usages.

If we go this route, I think the way to document it would be that
search list entries are strings of one or more labels, and that
malformed ones (including zero-length, over-length, etc.) are ignored.

Rich

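The flag idea from the message above can be sketched as follows. This is
illustrative Python rather than musl's C; `resolve` and `lookup` are
hypothetical names, with `lookup` standing in for a single DNS query that
returns None on a negative answer. It assumes the option 1 behavior, where
a "." entry is queried in list order.

```python
def resolve(name, search, lookup):
    """Walk the search list, then fall back to root unless already tried."""
    searched_root = False
    for suffix in search:
        if suffix == ".":
            searched_root = True      # remember root was queried in list order
            candidate = name
        else:
            candidate = name + "." + suffix
        answer = lookup(candidate)
        if answer is not None:
            return answer             # first positive answer ends the search
    if searched_root:
        return None                   # root already answered negatively; no repeat
    return lookup(name)               # the usual final query at root


if __name__ == "__main__":
    queried = []

    def always_negative(candidate):
        queried.append(candidate)
        return None

    resolve("www", ["a.example.com", ".", "b.example.com"], always_negative)
    print(queried)  # the bare name "www" appears once, not twice
```

A single boolean is enough here because the only query that can repeat is
the one for the bare name.
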
* Re: [musl] musl resolver handling of "search ." in /etc/resolv.conf
From: Luca BRUNO @ 2022-09-02 8:09 UTC
To: musl

On Thu, 1 Sep 2022 14:01:53 -0400
Rich Felker <dalias@libc.org> wrote:

> > From my observations, option 1 is consistent with other libcs'
> > behavior. But it has the above caveat that it needs additional
> > caching to avoid duplicate root-queries on negative responses.
> > If it isn't too invasive to implement, that would be my preferred
> > one.
>
> I'm not clear what additional caching you have in mind. AFAICT the
> search loop can just set a flag if it searched root already, and the
> final root query can be skipped if it's reached and the flag is set.

Yes sorry, poor choice of wording on my side, that was the additional
logic I was hinting at.

For future reference, this bug was observed in the wild due to a
combination of recent systemd (>= v247) and kubernetes (= 1.25.0).
The on-host systemd behavior is on purpose, while the logic on the
kubernetes side was not completely expected.
A bugfix for kubernetes is being assembled right now to avoid
triggering this case, see
https://github.com/kubernetes/kubernetes/pull/112157.

But the same situation may crop up with other non-kubernetes runtimes,
if they try to blindly forward/merge the "search ." from the host
environment.

Ciao, Luca

* Re: [musl] musl resolver handling of "search ." in /etc/resolv.conf
From: Rich Felker @ 2022-09-19 17:18 UTC
To: Luca BRUNO; +Cc: musl

On Thu, Sep 01, 2022 at 02:01:53PM -0400, Rich Felker wrote:
> On Thu, Sep 01, 2022 at 04:03:18PM +0000, Luca BRUNO wrote:
> > On Thu, 1 Sep 2022 08:45:12 -0400
> > Rich Felker <dalias@libc.org> wrote:
> >
> > > "search ." by itself is semantically a no-op. It specifies a single
> > > search domain that's the DNS root, which is exactly what gets queried
> > > with no search at all. systemd is writing this into resolv.conf
> > > because of a glibc "misbehavior" (to put it lightly) where, in the
> > > absence of any search directive, it defaults to searching the domain
> > > of the system hostname (so hostname=foo.example.com would implicitly
> > > search example.com, which is obviously wrong to do, and systemd is
> > > trying to suppress that). But it would also cause failing lookups to
> > > be performed in duplicate, unless there's logic to suppress the final
> > > non-search lookup when root was already searched explicitly.
> >
> > While tracking down this musl bug, I empirically observed from
> > network traces that glibc does apply such de-duplication logic under the
> > same configuration.
> > That is, it performs the root-anchored query in the specified order, and
> > in case of a negative response it does *not* perform the query again as
> > it would otherwise do for the final fallback case.
>
> Thanks! This is good to know.
>
> > > > > There are 3 options I see:
> > > > >
> > > > > - Actually support it as a search. This is *bad* behavior, but at
> > > > >   least unlike the version of this behavior musl explicitly does
> > > > >   not implement, it was explicitly requested by the user.
> > > > >   Except
> > > > >   that it wasn't, because systemd is just putting it in everyone's
> > > > >   resolv.conf..
> > > > >
> > > > > - Skip it completely. Never search root; wait for the end of the
> > > > >   search list and query root as always.
> > > > >
> > > > > - End search on encountering it and go directly to the post-search
> > > > >   query at root.
> > > > >
> > > > > Anyone care strongly about this one way or another?
> >
> > From my observations, option 1 is consistent with other libcs' behavior.
> > But it has the above caveat that it needs additional caching to
> > avoid duplicate root-queries on negative responses.
> > If it isn't too invasive to implement, that would be my preferred one.
>
> I'm not clear what additional caching you have in mind. AFAICT the
> search loop can just set a flag if it searched root already, and the
> final root query can be skipped if it's reached and the flag is set.
>
> > Option 2 looks somewhat reasonable too. The skewed order would be
> > a bit surprising, but it can be documented and it's unlikely to affect
> > many real-world usages.
>
> If we go this route, I think the way to document it would be that
> search list entries are strings of one or more labels, and that
> malformed ones (including zero-length, over-length, etc.) are ignored.

OK, I've looked at it in more detail now and there are actually
multiple layers to this bug, separate from the search logic itself:

1. res_mkquery does not error out on multiple consecutive dots in the
   name, but instead produces a malformed query packet. There are
   likely other error conditions it doesn't handle right, too.

2. name_from_dns wrongly returns EAI_NONAME (inhibiting further
   search) rather than 0 when res_mkquery produces an error. This
   causes entries in the search list that cannot exist as valid domain
   names (due to invalid characters, exceeding max total length or
   label length, etc.) to break the whole search when they should
   conclusively prove nonexistence of the attempted name and let the
   search continue.

With these two fixed, we will "automatically" get the option 2
behavior, since concatenating name+"."+search where search is "." will
produce a malformed name ending in double dot.

If we want to later change this to the option 1 behavior, we can make
the search logic remove the final dot itself.

I'll work on patches for the above two issues.

Rich

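The interaction of the two fixes can be illustrated with a small model.
This is a Python sketch under stated assumptions, not the musl patch:
`encode_name` is a hypothetical stand-in for the name-encoding step of
res_mkquery (it rejects empty labels such as those produced by consecutive
dots), and `queryable_names` is a hypothetical stand-in for a search loop
that treats an unencodable candidate as "cannot exist" and moves on. The
63-byte label and 253-character name limits are the usual DNS limits and
are assumptions here, not taken from the thread.

```python
def encode_name(name):
    """Encode a textual domain name as DNS wire-format labels.

    Raises ValueError for names that cannot exist: an empty label,
    an over-long label, an over-long name, or non-ASCII characters.
    """
    if name.endswith("."):
        name = name[:-1]              # a single trailing dot is dropped
    if len(name) > 253:
        raise ValueError("name too long")
    if name == "":
        return b"\x00"                # the root itself
    out = b""
    for label in name.split("."):
        raw = label.encode("ascii")   # UnicodeEncodeError is a ValueError
        if not 1 <= len(raw) <= 63:
            raise ValueError("bad label in %r" % name)
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def queryable_names(name, search):
    """Candidates that survive encoding, in query order."""
    names = []
    for suffix in search:
        candidate = name + "." + suffix
        try:
            encode_name(candidate)
        except ValueError:
            continue                  # unencodable: skip it, keep searching
        names.append(candidate)
    names.append(name)                # final query at root
    return names


if __name__ == "__main__":
    # The search list from the original report.
    print(queryable_names("www.google.com", ["default.svc.cluster.local", "."]))
```

With a "." entry the candidate becomes "www.google.com..", which still
contains an empty label after one trailing dot is dropped, so it is skipped
and the bare name is only queried at the end. That is the option 2
ordering described above.
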
Thread overview: 8+ messages

2022-08-31 17:33 [musl] musl resolver handling of "search ." in /etc/resolv.conf Dalton Hubble
2022-08-31 23:59 ` Rich Felker
2022-09-01  1:32   ` Jeffrey Walton
2022-09-01 12:45     ` Rich Felker
2022-09-01 16:03       ` Luca BRUNO
2022-09-01 18:01         ` Rich Felker
2022-09-02  8:09           ` Luca BRUNO
2022-09-19 17:18           ` Rich Felker

Code repositories for project(s) associated with this public inbox:
https://git.vuxu.org/mirror/musl/