mailing list of musl libc
* [musl] dns resolution failure in virtio-net guest
@ 2024-02-17 11:08 g1pi
  2024-02-17 18:45 ` Rich Felker
  0 siblings, 1 reply; 3+ messages in thread
From: g1pi @ 2024-02-17 11:08 UTC (permalink / raw)
  To: musl


Hi all.

I stumbled on a weird instance of DNS resolution failure in a
virtualization scenario involving a musl-based guest.  A little
investigation turned up results that are puzzling, at least to me.

This is the scenario:

Host:
- debian 12 x86_64
- kernel 6.1.0-18-amd64, qemu 7.2
- caching nameserver listening on 127.0.0.1

Guest:
- void linux x86_64
- kvm acceleration
- virtio netdev, configured in (default) user-mode
- kernel 6.1.71_1, musl-1.1.24_20
- /etc/resolv.conf:
    nameserver 10.0.2.2         the caching dns on the host
    nameserver 192.168.1.123    non-existent

In this scenario, "getent hosts example.com" consistently fails.

The problem vanishes when I do any of these:
- strace the command (!)
- replace 10.0.2.2 with another working dns reachable over a physical
  cable/wifi link (e.g. 192.168.1.1)
- remove the non-existent dns
- swap the nameservers in /etc/resolv.conf

I wrote a short test program (see below) that performs the same system
calls as the musl resolver, and it turns out that

- when all sendto() calls are performed in short order, the (unique)
  response packet is never received

    $ ./a.out 10.0.2.2 192.168.1.123
    poll: 0 1 0
    recvfrom() -1
    recvfrom() -1

- if a short delay (16 msec) is inserted between the calls, all is fine

    $ ./a.out 10.0.2.2 delay 192.168.1.123
    poll: 1 1 1
    recvfrom() 45
    <response packet>
    recvfrom() -1

The program's output is the same in several guests with different
kernel/libc combinations (linux/glibc, linux/musl, freebsd, openbsd).
Only when the emulated netdev was switched from virtio to pcnet did
the problem go away.

I guess that, when there is no delay between the sendto() calls, the
second one happens exactly while the kernel is receiving the response
packet, and the latter is silently dropped.  A short delay before
the second sendto(), or a random delay in the response (because the
working dns is "far away"), apparently solves the issue.

I don't know what the UDP standard mandates, and especially what should
happen when a packet is received on a socket at the exact time another
packet is sent out on the same socket.

If the kernel is allowed to drop the packet, then the musl resolver
could be modified to introduce some minimal delay between calls, at
least when retrying.

Otherwise, there could be a race condition in the network layer.
Perhaps in the host linux/kvm/qemu.  Perhaps in virtio-net, since the
problem shows up in guests with different kernels, and only when they
use virtio-net; but it might just be that other emulated devices mask
the issue by adding a little overhead.
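
To make the delay idea above concrete, here is a purely illustrative
sketch of what I have in mind (the helper name and the 10 ms figure
are mine, not taken from the actual resolver code):

    /* illustrative only: pause briefly before the next sendto(), so
       the reply to the previous query is not in flight while the
       kernel is handling the send; 10 ms is an arbitrary figure */
    #include <time.h>

    static void inter_query_pause(void)
    {
        struct timespec ts = { 0, 10 * 1000 * 1000 };   /* 10 ms */
        nanosleep(&ts, NULL);
    }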

Please, CC me in replies.

Best regards,
        g.b.

===== cut here =====

    #include <stdio.h>
    #include <time.h>
    #include <poll.h>
    #include <assert.h>
    #include <string.h>

    #include <arpa/inet.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* print a packet, escaping non-printable bytes as octal */
    static void dump(const char *s, size_t len) {
        while (len--) {
            char t = *s++;
            if (' ' <= t && t <= '~' && t != '\\')
                printf("%c", t);
            else
                printf("\\%o", t & 0xff);
        }
        printf("\n");
    }

    int main(int argc, char *argv[]) {
        int sock, rv, n;
        /* raw DNS query: id 0x82ac (33452), A? example.com. */
        const char req[] =
            "\202\254\1\0\0\1\0\0\0\0\0\0\7example\3com\0\0\1\0\1";
        struct timespec delay_l = { 1, 0 }; /* 1 sec */
        struct pollfd pfs;
        struct sockaddr_in me = { 0 };

        sock = socket(AF_INET, SOCK_DGRAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
                      IPPROTO_IP);
        assert(sock >= 0);

        me.sin_family = AF_INET;
        me.sin_port = 0;
        me.sin_addr.s_addr = inet_addr("0.0.0.0");
        rv = bind(sock, (struct sockaddr *) &me, sizeof me);
        assert(0 == rv);

        /* send one query per address argument; a literal "delay"
           argument sleeps ~16 ms before the next send */
        for (n = 1; n < argc; n++) {
            if (0 == strcmp("delay", argv[n])) {
                struct timespec delay_s = { 0, (1 << 24) }; /* ~ 16 msec */
                nanosleep(&delay_s, NULL);
            } else {
                struct sockaddr_in dst = { 0 };
                dst.sin_family = AF_INET;
                dst.sin_port = htons(53);
                dst.sin_addr.s_addr = inet_addr(argv[n]);
                rv = sendto(sock, req, sizeof req - 1, MSG_NOSIGNAL,
                            (struct sockaddr *) &dst, sizeof dst);
                assert(rv >= 0);
            }
        }

        /* give replies time to arrive, then poll the socket once */
        nanosleep(&delay_l, NULL);
        pfs.fd = sock;
        pfs.events = POLLIN;
        rv = poll(&pfs, 1, 2000);
        printf("poll: %d %d %d\n", rv, pfs.events, pfs.revents);

        /* one recvfrom() per query sent; -1 means no reply (EAGAIN) */
        for (n = 1; n < argc; n++) {
            char resp[4000];
            if (0 == strcmp("delay", argv[n]))
                continue;
            rv = recvfrom(sock, resp, sizeof resp, 0, NULL, NULL);
            printf("recvfrom() %d\n", rv);
            if (rv > 0)
                dump(resp, rv);
        }

        return 0;
    }

===== cut here =====

* Re: [musl] dns resolution failure in virtio-net guest
  2024-02-17 11:08 [musl] dns resolution failure in virtio-net guest g1pi
@ 2024-02-17 18:45 ` Rich Felker
  2024-02-18  7:51   ` g1pi
  0 siblings, 1 reply; 3+ messages in thread
From: Rich Felker @ 2024-02-17 18:45 UTC (permalink / raw)
  To: g1pi; +Cc: musl

On Sat, Feb 17, 2024 at 12:08:12PM +0100, g1pi@libero.it wrote:
> 
> Hi all.
> 
> I stumbled on a weird instance of DNS resolution failure in a
> virtualization scenario involving a musl-based guest.  A little
> investigation turned up results that are puzzling, at least to me.
> 
> This is the scenario:
> 
> Host:
> - debian 12 x86_64
> - kernel 6.1.0-18-amd64, qemu 7.2
> - caching nameserver listening on 127.0.0.1
> 
> Guest:
> - void linux x86_64
> - kvm acceleration
> - virtio netdev, configured in (default) user-mode
> - kernel 6.1.71_1, musl-1.1.24_20
> - /etc/resolv.conf:
>     nameserver 10.0.2.2         the caching dns on the host
>     nameserver 192.168.1.123    non-existent
> 
> In this scenario, "getent hosts example.com" consistently fails.
> 
> The problem vanishes when I do any of these:
> - strace the command (!)
> - replace 10.0.2.2 with another working dns reachable over a physical
>   cable/wifi link (e.g. 192.168.1.1)
> - remove the non-existent dns
> - swap the nameservers in /etc/resolv.conf
> 
> I wrote a short test program (see below) that performs the same system
> calls as the musl resolver, and it turns out that
> 
> - when all sendto() calls are performed in short order, the (unique)
>   response packet is never received
> 
>     $ ./a.out 10.0.2.2 192.168.1.123
>     poll: 0 1 0
>     recvfrom() -1
>     recvfrom() -1
> 
> - if a short delay (16 msec) is inserted between the calls, all is fine
> 
>     $ ./a.out 10.0.2.2 delay 192.168.1.123
>     poll: 1 1 1
>     recvfrom() 45
>     <response packet>
>     recvfrom() -1
> 
> The program's output is the same in several guests with different
> kernel/libc combinations (linux/glibc, linux/musl, freebsd, openbsd).
> Only when the emulated netdev was switched from virtio to pcnet did
> the problem go away.
> 
> I guess that, when there is no delay between the sendto() calls, the
> second one happens exactly while the kernel is receiving the response
> packet, and the latter is silently dropped.  A short delay before
> the second sendto(), or a random delay in the response (because the
> working dns is "far away"), apparently solves the issue.
> 
> I don't know what the UDP standard mandates, and especially what should
> happen when a packet is received on a socket at the exact time another
> packet is sent out on the same socket.
> 
> If the kernel is allowed to drop the packet, then the musl resolver
> could be modified to introduce some minimal delay between calls, at
> least when retrying.

UDP is "allowed" to drop packets any time for any reason, but that
doesn't mean it's okay to do so in the absence of a good reason, or
that musl should work around bugs where that happens, especially when
they're not a fundamental part of Linux but a particular
virtualization configuration.

I suggest you run tcpdump on the host and watch what's happening, and
I suspect you'll find this is qemu's virtio network being... qemu. It
probably does not do any real NAT, but directly rewrites source and
destination addresses so that your local caching DNS sees *two
identical queries* (same source/dest host/port combination, same query
id) and treats the second as a duplicated packet and ignores it. Or it
may be something different, but at least inspecting the actual network
traffic coming out of the qemu process will tell you what's going on.
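
For instance (assuming the caching resolver is bound to loopback),
something along these lines on the host should show exactly what it
receives and what it sends back:

    # host side: DNS traffic to/from the local caching resolver
    tcpdump -n -i lo udp port 53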

Rich

* Re: [musl] dns resolution failure in virtio-net guest
  2024-02-17 18:45 ` Rich Felker
@ 2024-02-18  7:51   ` g1pi
  0 siblings, 0 replies; 3+ messages in thread
From: g1pi @ 2024-02-18  7:51 UTC (permalink / raw)
  To: musl; +Cc: g1pi

On Sat, Feb 17, 2024 at 01:45:34PM -0500, Rich Felker wrote:
> [...]
>
> UDP is "allowed" to drop packets any time for any reason, but that
> doesn't mean it's okay to do so in the absence of a good reason, or
> that musl should work around bugs where that happens, especially when
> they're not a fundamental part of Linux but a particular
> virtualization configuration.

I expected the network to drop a UDP packet anywhere, just not at the
boundary between kernel-space and user-space: it's gratuitously rude.

I agree a workaround is not worth the effort, although I suspect such
a configuration is more common than not.

> 
> I suggest you run tcpdump on the host and watch what's happening, and
> I suspect you'll find this is qemu's virtio network being... qemu. It
> probably does not do any real NAT, but directly rewrites source and
> destination addresses so that your local caching DNS sees *two
> identical queries* (same source/dest host/port combination, same query
> id) and treats the second as a duplicated packet and ignores it. Or it
> may be something different, but at least inspecting the actual network
> traffic coming out of the qemu process will tell you what's going on.
> 

On the host side all is fine: the cache log shows that it receives the
request and replies correctly, and tcpdump agrees.  I had already
checked that.

But tcpdump on the guest side surprised me:

Good case -- 16 msec delay before second sendto()

7:32:44.332 IP 10.0.2.15.43276 > 10.0.2.2.53: 33452+ A? example.com. (29)
7:32:44.333 IP 10.0.2.2.53 > 10.0.2.15.43276: 33452 1/0/0 A 93.184.216.34 (45)
7:32:44.349 IP 10.0.2.15.43276 > 192.168.1.123.53: 33452+ A? example.com. (29)

Bad case -- rushing the sendto()s

7:32:55.358 IP 10.0.2.15.46537 > 10.0.2.2.53: 33452+ A? example.com. (29)
7:32:55.358 IP 10.0.2.15.46537 > 192.168.1.123.53: 33452+ A? example.com. (29)
7:32:55.358 IP *127.0.0.1*.53 > 10.0.2.15.46537: 33452 1/0/0 A 93.184.216.34 (45)

The response packet does arrive, but with the wrong source address, so
the resolver presumably discards it as coming from an unexpected
server.  Same behaviour in linux and bsd guests.

I believe you guessed correctly that this is a bug in qemu, just more
interesting than I initially thought.  Most likely it's in the
virtio-net driver, which was also ported to the BSDs.  Any suggestions
on how to report it?

g.b.
