mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
@ 2020-08-23 21:31 Daniel Neri
  2020-08-24 16:16 ` Rich Felker
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Neri @ 2020-08-23 21:31 UTC (permalink / raw)
  To: musl

  Hello,


Musl’s res_query(3) returns success for a query that results in an NXDOMAIN response, which disagrees with other common implementations.

Simple test case: https://t.rb67.eu/res_query.c

On Debian Linux, OpenBSD and FreeBSD this program prints:

  error: Unknown host

but on Alpine Linux edge (musl v1.2.1):

  success (size 91)



Best regards,
Daniel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-23 21:31 [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN Daniel Neri
@ 2020-08-24 16:16 ` Rich Felker
  2020-08-24 16:43   ` Rich Felker
  0 siblings, 1 reply; 13+ messages in thread
From: Rich Felker @ 2020-08-24 16:16 UTC (permalink / raw)
  To: musl

On Sun, Aug 23, 2020 at 11:31:32PM +0200, Daniel Neri wrote:
>   Hello,
> 
> 
> Musl’s res_query(3) returns success for a query that results in an
> NXDOMAIN response, which disagrees with other common
> implementations.
> 
> Simple test case: https://t.rb67.eu/res_query.c
> 
> On Debian Linux, OpenBSD and FreeBSD this program prints:
> 
>   error: Unknown host
> 
> but on Alpine Linux edge (musl v1.2.1):
> 
>   success (size 91)

I think there's a good argument that this should be changed, but that
would require making res_* set h_errno, which would require making
h_errno thread-local -- which it probably should have been all along,
but the standard interfaces that used it, gethostby* etc., were
non-thread-safe anyway and thus it seemed unnecessary.

FWIW I think the musl behavior is *better* here (more informative to
the application), but usually it's preferable not to break the
existing (even if informal) contracts of the interfaces being
implemented than to be "better".

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-24 16:16 ` Rich Felker
@ 2020-08-24 16:43   ` Rich Felker
  2020-08-24 20:39     ` Daniel Neri
  2020-08-24 21:04     ` Florian Weimer
  0 siblings, 2 replies; 13+ messages in thread
From: Rich Felker @ 2020-08-24 16:43 UTC (permalink / raw)
  To: musl

On Mon, Aug 24, 2020 at 12:16:47PM -0400, Rich Felker wrote:
> On Sun, Aug 23, 2020 at 11:31:32PM +0200, Daniel Neri wrote:
> >   Hello,
> > 
> > 
> > Musl’s res_query(3) returns success for a query that results in an
> > NXDOMAIN response, which disagrees with other common
> > implementations.
> > 
> > Simple test case: https://t.rb67.eu/res_query.c
> > 
> > On Debian Linux, OpenBSD and FreeBSD this program prints:
> > 
> >   error: Unknown host
> > 
> > but on Alpine Linux edge (musl v1.2.1):
> > 
> >   success (size 91)
> 
> I think there's a good argument that this should be changed, but that
> would require making res_* set h_errno, which would require making
> h_errno thread-local -- which it probably should have been all along,
> but the standard interfaces that used it, gethostby* etc., were
> non-thread-safe anyway and thus it seemed unnecessary.
> 
> FWIW I think the musl behavior is *better* here (more informative to
> the application), but usually it's preferable not to break the
> existing (even if informal) contracts of the interfaces being
> implemented than to be "better".

Hmm, I think in this case the "better" might be sufficient that we
want to keep it and pressure other implementations to change too. A
program performing a lookup where the result is NxDomain may very well
want to know whether that's an authenticated (by DNSSEC) NxDomain or
one in an insecure zone. Returning an error to the caller with no
packet contents discards this critical data.

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-24 16:43   ` Rich Felker
@ 2020-08-24 20:39     ` Daniel Neri
  2020-08-24 21:36       ` Rich Felker
  2020-08-24 21:04     ` Florian Weimer
  1 sibling, 1 reply; 13+ messages in thread
From: Daniel Neri @ 2020-08-24 20:39 UTC (permalink / raw)
  To: musl

On 24 Aug 2020, at 18:43, Rich Felker <dalias@libc.org> wrote:
> 
> Hmm, I think in this case the "better" might be sufficient that we
> want to keep it and pressure other implementations to change too. A
> program performing a lookup where the result is NxDomain may very well
> want to know whether that's an authenticated (by DNSSEC) NxDomain or
> one in an insecure zone. Returning an error to the caller with no
> packet contents discards this critical data.

In that case, it’d be better to add a new resolver API, or implement an already existing one that supports this usecase. The other implementations I mentioned also support option flags (in global state) that can change the behaviour.

res_query(3) is almost as old as DNS itself — it doesn’t seem likely that everyone else, both libraries and applications, are going to make incompatible changes at this point.


Regards,
Daniel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-24 16:43   ` Rich Felker
  2020-08-24 20:39     ` Daniel Neri
@ 2020-08-24 21:04     ` Florian Weimer
  2020-08-24 21:32       ` Rich Felker
  1 sibling, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2020-08-24 21:04 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> Hmm, I think in this case the "better" might be sufficient that we
> want to keep it and pressure other implementations to change too. A
> program performing a lookup where the result is NxDomain may very well
> want to know whether that's an authenticated (by DNSSEC) NxDomain or
> one in an insecure zone. Returning an error to the caller with no
> packet contents discards this critical data.

Isn't this the behavior you'd get with res_send?

I think such error translation is precisely the point of the res_query
convenience function (along with the implicit construction of the
query packet).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-24 21:04     ` Florian Weimer
@ 2020-08-24 21:32       ` Rich Felker
  2020-08-24 21:51         ` Daniel Neri
  2020-08-24 22:04         ` Florian Weimer
  0 siblings, 2 replies; 13+ messages in thread
From: Rich Felker @ 2020-08-24 21:32 UTC (permalink / raw)
  To: musl

On Mon, Aug 24, 2020 at 11:04:49PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > Hmm, I think in this case the "better" might be sufficient that we
> > want to keep it and pressure other implementations to change too. A
> > program performing a lookup where the result is NxDomain may very well
> > want to know whether that's an authenticated (by DNSSEC) NxDomain or
> > one in an insecure zone. Returning an error to the caller with no
> > packet contents discards this critical data.
> 
> Isn't this the behavior you'd get with res_send?
> 
> I think such error translation is precisely the point of the res_query
> convenience function (along with the implicit construction of the
> query packet).

Does such a distinction exist? I thought res_query was just equivalent
to res_mkquery+res_send and that calling res_send directly would get
you the same errors. If they are different then I suspect some
applications are doing the wrong thing calling res_query here and
should be using res_mkquery+res_send...

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-24 20:39     ` Daniel Neri
@ 2020-08-24 21:36       ` Rich Felker
  0 siblings, 0 replies; 13+ messages in thread
From: Rich Felker @ 2020-08-24 21:36 UTC (permalink / raw)
  To: musl

On Mon, Aug 24, 2020 at 10:39:30PM +0200, Daniel Neri wrote:
> On 24 Aug 2020, at 18:43, Rich Felker <dalias@libc.org> wrote:
> > 
> > Hmm, I think in this case the "better" might be sufficient that we
> > want to keep it and pressure other implementations to change too. A
> > program performing a lookup where the result is NxDomain may very well
> > want to know whether that's an authenticated (by DNSSEC) NxDomain or
> > one in an insecure zone. Returning an error to the caller with no
> > packet contents discards this critical data.
> 
> In that case, it’d be better to add a new resolver API, or implement
> an already existing one that supports this usecase. The other
> implementations I mentioned also support option flags (in global
> state) that can change the behaviour.
> 
> res_query(3) is almost as old as DNS itself — it doesn’t seem likely
> that everyone else, both libraries and applications, are going to
> make incompatible changes at this point.

If it were really incompatible behavior I would agree, but the
behaviors aren't incompatible. Either is compatible with the same
underspecified documentation, and a reasonable caller will easily deal
with both -- the nxdomain reply looks very similar to nodata, which it
also needs to be able to handle, except for the different error code
(3 vs 0). A caller can't assume just because res_query succeeded that
ancount>0.

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-24 21:32       ` Rich Felker
@ 2020-08-24 21:51         ` Daniel Neri
  2020-08-24 22:09           ` Rich Felker
  2020-08-24 22:04         ` Florian Weimer
  1 sibling, 1 reply; 13+ messages in thread
From: Daniel Neri @ 2020-08-24 21:51 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 532 bytes --]

On 24 Aug 2020, at 23:32, Rich Felker <dalias@libc.org> wrote:
> 
> Does such a distinction exist? I thought res_query was just equivalent
> to res_mkquery+res_send and that calling res_send directly would get
> you the same errors.

I thought so too, but I’ve been reading the musl implementation. ;-)

After looking more at the other implementations, I think Florian is correct though: it’s more like res_mkquery+res_send+setting h_errno and the return value based on the RCODE of the response.

Regards,
Daniel


[-- Attachment #2: Type: text/html, Size: 2938 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-24 21:32       ` Rich Felker
  2020-08-24 21:51         ` Daniel Neri
@ 2020-08-24 22:04         ` Florian Weimer
  2020-08-24 22:13           ` Rich Felker
  1 sibling, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2020-08-24 22:04 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> On Mon, Aug 24, 2020 at 11:04:49PM +0200, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> > Hmm, I think in this case the "better" might be sufficient that we
>> > want to keep it and pressure other implementations to change too. A
>> > program performing a lookup where the result is NxDomain may very well
>> > want to know whether that's an authenticated (by DNSSEC) NxDomain or
>> > one in an insecure zone. Returning an error to the caller with no
>> > packet contents discards this critical data.
>> 
>> Isn't this the behavior you'd get with res_send?
>> 
>> I think such error translation is precisely the point of the res_query
>> convenience function (along with the implicit construction of the
>> query packet).
>
> Does such a distinction exist?

Yes, I think so.  It's the behavior of the BIND 4 era stub resolver
code.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-24 21:51         ` Daniel Neri
@ 2020-08-24 22:09           ` Rich Felker
  2020-08-25  3:26             ` Rich Felker
  0 siblings, 1 reply; 13+ messages in thread
From: Rich Felker @ 2020-08-24 22:09 UTC (permalink / raw)
  To: musl

On Mon, Aug 24, 2020 at 11:51:52PM +0200, Daniel Neri wrote:
> On 24 Aug 2020, at 23:32, Rich Felker <dalias@libc.org> wrote:
> > 
> > Does such a distinction exist? I thought res_query was just equivalent
> > to res_mkquery+res_send and that calling res_send directly would get
> > you the same errors.
> 
> I thought so too, but I’ve been reading the musl implementation. ;-)
> 
> After looking more at the other implementations, I think Florian is
> correct though: it’s more like res_mkquery+res_send+setting h_errno
> and the return value based on the RCODE of the response.

If this is indeed the case and the right behavior can be obtained
reliably by ignoring res_query and using res_mkquery+res_send, I have
no fundamental objection to changing this. However we should have a
plan for moving h_errno to be thread-local and figuring out what
breakage if any there could be for apps built with it global.

Thankfully, it looks like we're already using (*__h_errno_location())
even though it was not thread-local, so existing apps built against
current musl headers should be immediately compatible with changing it
to be thread-local.

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-24 22:04         ` Florian Weimer
@ 2020-08-24 22:13           ` Rich Felker
  0 siblings, 0 replies; 13+ messages in thread
From: Rich Felker @ 2020-08-24 22:13 UTC (permalink / raw)
  To: musl

On Tue, Aug 25, 2020 at 12:04:44AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Mon, Aug 24, 2020 at 11:04:49PM +0200, Florian Weimer wrote:
> >> * Rich Felker:
> >> 
> >> > Hmm, I think in this case the "better" might be sufficient that we
> >> > want to keep it and pressure other implementations to change too. A
> >> > program performing a lookup where the result is NxDomain may very well
> >> > want to know whether that's an authenticated (by DNSSEC) NxDomain or
> >> > one in an insecure zone. Returning an error to the caller with no
> >> > packet contents discards this critical data.
> >> 
> >> Isn't this the behavior you'd get with res_send?
> >> 
> >> I think such error translation is precisely the point of the res_query
> >> convenience function (along with the implicit construction of the
> >> query packet).
> >
> > Does such a distinction exist?
> 
> Yes, I think so.  It's the behavior of the BIND 4 era stub resolver
> code.

OK. The man pages (and glibc docs? not sure) don't seem to document
this, so we should probably try to get them fixed too. The closest I
can find is the text that:

    "In the case of an error return from res_nquery(), res_query(),
    res_nsearch(), res_search(), res_nquerydomain(), or
    res_querydomain(), the global variable h_errno (see
    gethostbyname(3)) can be consulted to determine the cause of the
    error."

which could be interpreted as saying the same errors are specified to
happen (does this mean NODATA shoudl also be an error rather than a
success return?), but it doesn't make clear that nxdomain and noerror
are not errors for res_send etc.

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-24 22:09           ` Rich Felker
@ 2020-08-25  3:26             ` Rich Felker
  2020-08-25 13:56               ` Daniel Neri
  0 siblings, 1 reply; 13+ messages in thread
From: Rich Felker @ 2020-08-25  3:26 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1300 bytes --]

On Mon, Aug 24, 2020 at 06:09:23PM -0400, Rich Felker wrote:
> On Mon, Aug 24, 2020 at 11:51:52PM +0200, Daniel Neri wrote:
> > On 24 Aug 2020, at 23:32, Rich Felker <dalias@libc.org> wrote:
> > > 
> > > Does such a distinction exist? I thought res_query was just equivalent
> > > to res_mkquery+res_send and that calling res_send directly would get
> > > you the same errors.
> > 
> > I thought so too, but I’ve been reading the musl implementation. ;-)
> > 
> > After looking more at the other implementations, I think Florian is
> > correct though: it’s more like res_mkquery+res_send+setting h_errno
> > and the return value based on the RCODE of the response.
> 
> If this is indeed the case and the right behavior can be obtained
> reliably by ignoring res_query and using res_mkquery+res_send, I have
> no fundamental objection to changing this. However we should have a
> plan for moving h_errno to be thread-local and figuring out what
> breakage if any there could be for apps built with it global.
> 
> Thankfully, it looks like we're already using (*__h_errno_location())
> even though it was not thread-local, so existing apps built against
> current musl headers should be immediately compatible with changing it
> to be thread-local.

I have the attached patches queued now.

Rich

[-- Attachment #2: 0001-make-h_errno-thread-local.patch --]
[-- Type: text/plain, Size: 1413 bytes --]

From 9d0b8b92a508c328e7eac774847f001f80dfb5ff Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Mon, 24 Aug 2020 21:38:49 -0400
Subject: [PATCH 1/2] make h_errno thread-local

the framework to do this always existed but it was deemed unnecessary
because the only [ex-]standard functions using h_errno were not
thread-safe anyway. however, some of the nonstandard res_* functions
are also supposed to set h_errno to indicate the cause of error, and
were unable to do so because it was not thread-safe. this change is a
prerequisite for fixing them.
---
 src/internal/pthread_impl.h | 1 +
 src/network/h_errno.c       | 6 ++----
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/internal/pthread_impl.h b/src/internal/pthread_impl.h
index 5742dfc5..5749a336 100644
--- a/src/internal/pthread_impl.h
+++ b/src/internal/pthread_impl.h
@@ -43,6 +43,7 @@ struct pthread {
 		long off;
 		volatile void *volatile pending;
 	} robust_list;
+	int h_errno_val;
 	volatile int timer_id;
 	locale_t locale;
 	volatile int killlock[1];
diff --git a/src/network/h_errno.c b/src/network/h_errno.c
index 4f700cea..8677a92b 100644
--- a/src/network/h_errno.c
+++ b/src/network/h_errno.c
@@ -1,9 +1,7 @@
 #include <netdb.h>
-
-#undef h_errno
-int h_errno;
+#include "pthread_impl.h"
 
 int *__h_errno_location(void)
 {
-	return &h_errno;
+	return &__pthread_self()->h_errno_val;
 }
-- 
2.21.0


[-- Attachment #3: 0002-report-res_query-failures-including-nxdomain-nodata-.patch --]
[-- Type: text/plain, Size: 1554 bytes --]

From 19f8642494b7d27b2ceed5c14d4a0b27cb749afe Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Mon, 24 Aug 2020 21:56:48 -0400
Subject: [PATCH 2/2] report res_query failures, including nxdomain/nodata, via
 h_errno

while it's not clearly documented anywhere, this is the historical
behavior which some applications expect. applications which need to
see the response packet in these cases, for example to distinguish
between nonexistence in a secure vs insecure zone, must already use
res_mkquery with res_send in order to be portable, since most if not
all other implementations of res_query don't provide it.
---
 src/network/res_query.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/network/res_query.c b/src/network/res_query.c
index 2f4da2e2..506dc231 100644
--- a/src/network/res_query.c
+++ b/src/network/res_query.c
@@ -1,3 +1,4 @@
+#define _BSD_SOURCE
 #include <resolv.h>
 #include <netdb.h>
 
@@ -6,7 +7,20 @@ int res_query(const char *name, int class, int type, unsigned char *dest, int le
 	unsigned char q[280];
 	int ql = __res_mkquery(0, name, class, type, 0, 0, 0, q, sizeof q);
 	if (ql < 0) return ql;
-	return __res_send(q, ql, dest, len);
+	int r = __res_send(q, ql, dest, len);
+	if (r<12) {
+		h_errno = TRY_AGAIN;
+		return -1;
+	}
+	if ((dest[3] & 15) == 3) {
+		h_errno = HOST_NOT_FOUND;
+		return -1;
+	}
+	if ((dest[3] & 15) == 0 && !dest[6] && !dest[7]) {
+		h_errno = NO_DATA;
+		return -1;
+	}
+	return r;
 }
 
 weak_alias(res_query, res_search);
-- 
2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN
  2020-08-25  3:26             ` Rich Felker
@ 2020-08-25 13:56               ` Daniel Neri
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Neri @ 2020-08-25 13:56 UTC (permalink / raw)
  To: musl

On 25 Aug 2020, at 05:26, Rich Felker <dalias@libc.org> wrote:
> 
> I have the attached patches queued now.

Great! I've patched my Alpine machine and it seems to work as expected.

Thanks,
Daniel


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-08-25 13:56 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-23 21:31 [musl] Incompatible behaviour of res_query(3) w.r.t. NXDOMAIN Daniel Neri
2020-08-24 16:16 ` Rich Felker
2020-08-24 16:43   ` Rich Felker
2020-08-24 20:39     ` Daniel Neri
2020-08-24 21:36       ` Rich Felker
2020-08-24 21:04     ` Florian Weimer
2020-08-24 21:32       ` Rich Felker
2020-08-24 21:51         ` Daniel Neri
2020-08-24 22:09           ` Rich Felker
2020-08-25  3:26             ` Rich Felker
2020-08-25 13:56               ` Daniel Neri
2020-08-24 22:04         ` Florian Weimer
2020-08-24 22:13           ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).