mailing list of musl libc
* DNS resolution happening only after timeout
From: Srinivasa Raghavan @ 2017-09-28 10:15 UTC (permalink / raw)
  To: musl


Hi,

When using the "Alpine" Docker image, which uses musl libc, we see a delay
in our production environment when running operations such as:
1. ping <name>
2. nslookup <name>
3. traceroute <name>
4. http request from node.js

Name resolution takes about 5 seconds, and only then does the command
return its response. The same problem does not occur in the "debian" Docker
image (which uses GNU libc).

In our case, the output shows a combination of SERVFAIL, "canonical name",
and "Non-authoritative answer".

Some findings from trial and error:
1. If I install the "bind-tools" package in Alpine, "nslookup" completes
without delay.
2. If I set "options timeout:1" in /etc/resolv.conf, the name is
resolved after 1 second (instead of 5 seconds).
3. Other changes to /etc/resolv.conf (such as setting "domain" or "search")
made no difference.
4. The output of the "host"/"nslookup" commands shows "SERVFAIL".
5. The problem does not occur when run from the host machine (outside the
Alpine container).
6. The problem does not occur when run from another container that uses GNU
libc, such as the "Debian" image.
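
One way to measure the delay without going through ping or nslookup is to
time getaddrinfo() directly. A minimal sketch, assuming a POSIX system;
time_lookup is a hypothetical helper name, not part of musl or glibc:

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Resolve `name` for the given address family (AF_UNSPEC, AF_INET or
 * AF_INET6) and return the elapsed wall-clock time in seconds.
 * Returns -1.0 if the lookup itself fails. */
double time_lookup(const char *name, int family)
{
    struct addrinfo hints, *res = NULL;
    struct timespec t0, t1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = getaddrinfo(name, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (rc != 0)
        return -1.0;
    freeaddrinfo(res);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Comparing time_lookup(name, AF_UNSPEC) with time_lookup(name, AF_INET) in
both containers would show whether the delay depends on the requested
address family.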

Sample command outputs are attached for reference.

Could you please help us debug and resolve this?

Kind Regards,
R. Srinivasa Raghavan.

[-- Attachment #2: dns.rtf --]
[-- Type: application/rtf, Size: 3864 bytes --]


* Re: DNS resolution happening only after timeout
From: Szabolcs Nagy @ 2017-09-28 10:28 UTC (permalink / raw)
  To: musl; +Cc: Srinivasa Raghavan

* Srinivasa Raghavan <raghav135@gmail.com> [2017-09-28 15:45:28 +0530]:
> When using "Alpine" docker image which uses musl-libc, we are facing delay
> when we do operations like below in our production environment,
> 1. ping <name>
> 2. nslookup <name>
> 3. traceroute <name>
> 4. http request from node.js
> 

this bug may be related:
https://github.com/rancher/rancher/issues/9961



* Re: DNS resolution happening only after timeout
From: Rich Felker @ 2017-09-28 16:55 UTC (permalink / raw)
  To: musl; +Cc: Srinivasa Raghavan

On Thu, Sep 28, 2017 at 12:28:55PM +0200, Szabolcs Nagy wrote:
> * Srinivasa Raghavan <raghav135@gmail.com> [2017-09-28 15:45:28 +0530]:
> > When using "Alpine" docker image which uses musl-libc, we are facing delay
> > when we do operations like below in our production environment,
> > 1. ping <name>
> > 2. nslookup <name>
> > 3. traceroute <name>
> > 4. http request from node.js
> > 
> 
> this bug may be related:
> https://github.com/rancher/rancher/issues/9961

Yes, I just filed it after reading the discussion on IRC and this bug
report that was linked as describing similar behavior:

https://github.com/rancher/rancher/issues/4177#issuecomment-332571951

This really requires a fix on the rancher-dns side. I'm not sure
exactly what glibc is doing, but it couldn't be giving the behavior
you want without doing something wrong: it's falling back and trying
different search domains when it hasn't been told that the first one
doesn't exist, only that the nameserver is experiencing a problem.
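
The distinction above can be sketched as a small decision function. This is
only an illustration of the principle, not musl's or glibc's actual code;
the name should_try_next_search_domain is made up, while the RCODE values
are the standard ones from RFC 1035:

```c
/* Standard DNS response codes (RFC 1035, section 4.1.1). */
enum dns_rcode {
    DNS_RCODE_NOERROR  = 0,
    DNS_RCODE_SERVFAIL = 2,
    DNS_RCODE_NXDOMAIN = 3
};

/* Return 1 if a resolver walking the "search" list may move on to the
 * next search domain, 0 otherwise.
 *
 * Only NXDOMAIN says that the name does not exist. SERVFAIL only says
 * that the nameserver had a problem, so it gives no grounds for
 * concluding anything about the name and trying a different one. */
int should_try_next_search_domain(int rcode)
{
    return rcode == DNS_RCODE_NXDOMAIN;
}
```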

Rich



* Re: DNS resolution happening only after timeout
From: Srinivasa Raghavan @ 2017-10-04 13:48 UTC (permalink / raw)
  To: musl, dalias

Hi Rich,

Thanks for the reply.

Some updates:
1. Our DNS server is "Infoblox appliance".
2. Whenever we saw the delay, there was an "AAAA" query along with the
"A" query.

I did further debugging with "tcpdump" and was able to narrow down the
difference in behavior between the "debian" and "alpine" images.

In debian:
If IPv6 is disabled (net.ipv6.conf.default.disable_ipv6 = 1),
then "nslookup" (or name resolution in general) does *not* send an "AAAA"
query.

In alpine:
If IPv6 is disabled (net.ipv6.conf.default.disable_ipv6 = 1),
then "nslookup" (or name resolution in general) sends an "AAAA" query along
with the "A" query.

Is this intentional?

Also, is there any way to disable the AAAA query in name resolution?

Kind Regards,
Srinivasa Raghavan.


* Re: DNS resolution happening only after timeout
From: Markus Wichmann @ 2017-10-04 16:46 UTC (permalink / raw)
  To: musl

On Wed, Oct 04, 2017 at 07:18:10PM +0530, Srinivasa Raghavan wrote:
> Hi Rich,
> 
> Thanks for the reply.
> 
> Some updates:
> 1. Our DNS server is "Infoblox appliance".
> 2. When we had a delay, we found that there was a "AAAA" query along with
> "A" query.
> 
> I did further debugging with "tcpdump" and able to narrow down on the
> difference in behavior between "debian" and "alpine" images.
> 
> In debian:
> If ipv6 is disabled (net.ipv6.conf.default.disable_ipv6 = 1)
> Then the "nslookup" (or name resolution) does *not* do a "AAAA" query
> 

That's probably because glibc's DNS resolver only generates AAAA queries
if it can create an IPv6 socket.
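
The kind of check described here can be sketched as a simple socket probe.
This only illustrates the idea and is not taken from glibc; the helper name
ipv6_socket_available is hypothetical. Note also that socket() may still
succeed when only the disable_ipv6 sysctl is set, so whether this exact
probe explains the difference seen on Debian is an assumption:

```c
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if the kernel lets us create an IPv6 datagram socket,
 * 0 otherwise. A resolver could use such a probe to decide whether
 * sending AAAA queries is worthwhile. */
int ipv6_socket_available(void)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}
```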

> In alpine:
> If ipv6 is disabled (net.ipv6.conf.default.disable_ipv6 = 1)
> Then the "nslookup" (or name resolution) does an "AAAA" query along with
> "A" query
> 
> Is this intentional?
> 
> Also, I was wondering if there was any way to disable AAAA query in name
> resolution?
> 

There does not appear to be a way without changing code. In musl, the
function name_from_dns() will always generate both the AAAA and the A
query unless "family" is explicitly set to one of the address families.
No input from resolv.conf or similar is used for this. And "family"
comes directly from the caller, i.e. nslookup. You'd have to change the
nslookup code to only ask for IPv4 addresses.
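
From the caller's side, asking only for IPv4 addresses means setting
ai_family in the hints passed to getaddrinfo(). A minimal sketch;
resolve_ipv4_only is a hypothetical helper, not code from nslookup or musl:

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve `name` requesting IPv4 results only and write the first
 * address as dotted-quad text into `out`.
 * Returns 0 on success or a getaddrinfo() error code (EAI_*). */
int resolve_ipv4_only(const char *name, char *out, size_t outlen)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* explicit family: no AAAA needed */
    hints.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    const struct sockaddr_in *sa = (const struct sockaddr_in *)res->ai_addr;
    const char *p = inet_ntop(AF_INET, &sa->sin_addr, out, (socklen_t)outlen);
    freeaddrinfo(res);

    return p ? 0 : EAI_SYSTEM;
}
```

With AF_UNSPEC instead of AF_INET in the hints, the caller is asking for
both families, which is the case where both queries are generated.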

> Kind Regards,
> Srinivasa Raghavan.

Ciao,
Markus



* Re: DNS resolution happening only after timeout
From: Srinivasa Raghavan @ 2017-10-04 19:28 UTC (permalink / raw)
  To: musl

Hi Markus,

Thanks for the reply.

The problem is not only in nslookup; it also shows up in ping, traceroute,
curl, node.js, wget, etc. :(

I will debug and find the exact C API that is used in each of these
scenarios.

In the meantime, is there any workaround?

A lot of people are facing this issue (slow DNS name resolution in Alpine
Linux with some DNS servers). Could this be the root cause?

Kind Regards,
Rsr



* Re: DNS resolution happening only after timeout
From: Rich Felker @ 2017-10-04 20:18 UTC (permalink / raw)
  To: musl

On Wed, Oct 04, 2017 at 07:28:35PM +0000, Srinivasa Raghavan wrote:
> Hi Markus,
> 
> Thanks for the reply.
> 
> The problem is not only in nslookup, it is there in ping, tracert, curl,
> node.js, wget etc. :(
> 
> I will debug and find the exact c api that is used for each of the
> scenarios.
> 
> I am just wondering if there is any workaround ?
> 
> Lot of folks are facing this issue (slow dns name resolution in alpine
> linux, with some dns servers) , and this may be the root cause?

musl does not have any way to suppress applications' requests for IPv6
lookups. In theory, if an application used the AI_ADDRCONFIG flag to
request "only give IPv6 results if IPv6 is supported", we could do it,
but there are multiple reasons this hasn't been implemented, including
ambiguity as to how exactly it should behave, and I doubt it would
help anyway since most applications don't use this flag.
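
For reference, the standard getaddrinfo() flag for this is spelled
AI_ADDRCONFIG, and an application would request it roughly as follows.
This is a sketch only: addrconfig_hints and lookup_with_addrconfig are
hypothetical helper names, and per the paragraph above musl does not act
on the flag, so the AAAA query would still be sent there:

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Build hints that ask for any address family, but only for families
 * the local system is configured to use (AI_ADDRCONFIG). */
struct addrinfo addrconfig_hints(void)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_ADDRCONFIG;
    return hints;
}

/* Look up `name` with those hints. Returns 0 on success or an EAI_*
 * error code; on success the caller must freeaddrinfo(*res). */
int lookup_with_addrconfig(const char *name, struct addrinfo **res)
{
    struct addrinfo hints = addrconfig_hints();
    return getaddrinfo(name, NULL, &hints, res);
}
```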

From the info you've provided so far, my best guess is that you have a
buggy nameserver that either stalls or replies with a non-conclusive
message like ServFail when it receives an AAAA query. If this is the
case, there are a few possible fixes or workarounds you could try:

1. If the nameserver is on a device under your control, see if there's
   an upgrade/patch to fix the issue.

2. Switch to a different nameserver without the bug like the public
   Google ones at 8.8.8.8 etc.

3. Run your own caching/proxy nameserver on localhost and configure it
   to reply NxDomain (does not exist) for all AAAA lookups.

4. Use iptables to catch DNS query packets for AAAA records and
   redirect them to a dummy server that just always replies with
   NxDomain.
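
For options 3 and 4, the proxy or dummy server has to recognize AAAA
queries. A minimal sketch of that check, assuming an uncompressed question
section as ordinary stub resolvers send it; dns_first_qtype is a
hypothetical helper, and the record type numbers (A = 1, AAAA = 28) are the
standard ones:

```c
#include <stddef.h>

#define DNS_TYPE_A    1
#define DNS_TYPE_AAAA 28

/* Return the QTYPE of the first question in a raw DNS query packet,
 * or -1 if the packet is too short or malformed. */
int dns_first_qtype(const unsigned char *pkt, size_t len)
{
    size_t pos = 12;                      /* fixed header is 12 bytes */

    if (len < 12)
        return -1;
    if (((pkt[4] << 8) | pkt[5]) < 1)     /* QDCOUNT must be >= 1 */
        return -1;

    /* Skip the QNAME: a sequence of length-prefixed labels ending
     * with a zero-length label. */
    for (;;) {
        if (pos >= len)
            return -1;
        unsigned char label_len = pkt[pos];
        if (label_len == 0) {
            pos++;
            break;
        }
        if (label_len > 63)               /* compression or invalid */
            return -1;
        pos += 1 + (size_t)label_len;
    }

    if (pos > len || len - pos < 4)       /* need QTYPE + QCLASS */
        return -1;

    return (pkt[pos] << 8) | pkt[pos + 1];
}
```

A dummy server would answer packets for which this returns DNS_TYPE_AAAA
and pass everything else through to the real nameserver.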

Without knowing more about your environment I can't really guess which
of these options, if any, might be practical for you, but
hopefully at least one is.

Rich




* Re: DNS resolution happening only after timeout
From: Srinivasa Raghavan @ 2017-10-04 20:39 UTC (permalink / raw)
  To: musl

Hi Rich,
Thanks for your time and reply.
I will try to get the DNS server fixed.
Kind Regards,
R. Srinivasa Raghavan.


