mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] Proposal to match behaviour of gethostbyname to glibc
@ 2020-03-13 21:46 Wolf
  2020-03-13 22:16 ` Rich Felker
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Wolf @ 2020-03-13 21:46 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1850 bytes --]

Hello,

today I've noticed difference in behavior of gethostbyname in musl and
in glibc. Given /etc/hosts

	127.0.0.1   foo.bar foo
	127.0.0.1   bar.foo foo

and simple test program

	#include <netdb.h>
	#include <stdio.h>

	int main(int argc, char **argv) {
		struct hostent *he = gethostbyname(argv[1]);
		printf("Hostname: %s\n", he->h_name);
	}

, I've run it both under musl (alpine) and glibc (archlinux).

musl:

	/test # ./test foo
	Hostname: bar.foo

glibc:

	[root@foo test]# ./test foo
	Hostname: foo.bar

I don't think there is an actual reason to iterate through all of the
/etc/hosts and first match can be returned instead. Following patch
should in my opinion fix this.



diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
index c93263a9..da8db9d4 100644
--- a/src/network/lookup_name.c
+++ b/src/network/lookup_name.c
@@ -87,7 +87,10 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati
                for (; *p && isspace(*p); p++);
                for (z=p; *z && !isspace(*z); z++);
                *z = 0;
-               if (is_valid_hostname(p)) memcpy(canon, p, z-p+1);
+               if (is_valid_hostname(p)) {
+                       memcpy(canon, p, z-p+1);
+                       break;
+               }
        }
        __fclose_ca(f);
        return cnt ? cnt : badfam;



While this is admittedly edge case that most users will not run into, I
still think it would be nice to behave the same way as glibc does on
this one. And as a bonus, it will be *tiny* bit faster, since there
would not be any need to iterate rest of the /etc/hosts file.



Thank you for considering this,

W.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-03-13 21:46 [musl] Proposal to match behaviour of gethostbyname to glibc Wolf
@ 2020-03-13 22:16 ` Rich Felker
  2020-03-13 23:43   ` Wolf
  2020-03-14  8:24 ` Florian Weimer
  2020-04-17 18:42 ` Rich Felker
  2 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2020-03-13 22:16 UTC (permalink / raw)
  To: musl

On Fri, Mar 13, 2020 at 10:46:48PM +0100, Wolf wrote:
> Hello,
> 
> today I've noticed difference in behavior of gethostbyname in musl and
> in glibc. Given /etc/hosts
> 
> 	127.0.0.1   foo.bar foo
> 	127.0.0.1   bar.foo foo
> 
> and simple test program
> 
> 	#include <netdb.h>
> 	#include <stdio.h>
> 
> 	int main(int argc, char **argv) {
> 		struct hostent *he = gethostbyname(argv[1]);
> 		printf("Hostname: %s\n", he->h_name);
> 	}
> 
> , I've run it both under musl (alpine) and glibc (archlinux).
> 
> musl:
> 
> 	/test # ./test foo
> 	Hostname: bar.foo
> 
> glibc:
> 
> 	[root@foo test]# ./test foo
> 	Hostname: foo.bar
> 
> I don't think there is an actual reason to iterate through all of the
> /etc/hosts and first match can be returned instead. Following patch
> should in my opinion fix this.
> 
> 
> 
> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> index c93263a9..da8db9d4 100644
> --- a/src/network/lookup_name.c
> +++ b/src/network/lookup_name.c
> @@ -87,7 +87,10 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati
>                 for (; *p && isspace(*p); p++);
>                 for (z=p; *z && !isspace(*z); z++);
>                 *z = 0;
> -               if (is_valid_hostname(p)) memcpy(canon, p, z-p+1);
> +               if (is_valid_hostname(p)) {
> +                       memcpy(canon, p, z-p+1);
> +                       break;
> +               }
>         }
>         __fclose_ca(f);
>         return cnt ? cnt : badfam;
> 
> 
> 
> While this is admittedly edge case that most users will not run into, I
> still think it would be nice to behave the same way as glibc does on
> this one. And as a bonus, it will be *tiny* bit faster, since there
> would not be any need to iterate rest of the /etc/hosts file.
> 
> 
> 
> Thank you for considering this,

I don't really see any downsides to doing this, even if "matching
glibc" isn't a terribly useful goal. If nothing else it's faster. Do
you know if there's widespread match for this behavior across other
systems too?

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-03-13 22:16 ` Rich Felker
@ 2020-03-13 23:43   ` Wolf
  2020-04-13 23:18     ` Wolf
  2020-04-15  8:31     ` Natanael Copa
  0 siblings, 2 replies; 12+ messages in thread
From: Wolf @ 2020-03-13 23:43 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 359 bytes --]

On 2020-03-13 18:16:49 -0400, Rich Felker wrote:
 
> Do you know if there's widespread match for this behavior across other
> systems too?

Windows use the glibc behaviour, other than that I do not have systems
to test this on.

W.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-03-13 21:46 [musl] Proposal to match behaviour of gethostbyname to glibc Wolf
  2020-03-13 22:16 ` Rich Felker
@ 2020-03-14  8:24 ` Florian Weimer
  2020-03-14 14:54   ` Rich Felker
  2020-04-17 18:42 ` Rich Felker
  2 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2020-03-14  8:24 UTC (permalink / raw)
  To: Wolf; +Cc: musl

* Wolf:

> While this is admittedly edge case that most users will not run into, I
> still think it would be nice to behave the same way as glibc does on
> this one. And as a bonus, it will be *tiny* bit faster, since there
> would not be any need to iterate rest of the /etc/hosts file.

But something has to scan the entire file in “multi on” mode at least,
to find all relevant addresses.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-03-14  8:24 ` Florian Weimer
@ 2020-03-14 14:54   ` Rich Felker
  2020-03-14 16:31     ` Jeffrey Walton
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2020-03-14 14:54 UTC (permalink / raw)
  To: musl

On Sat, Mar 14, 2020 at 09:24:08AM +0100, Florian Weimer wrote:
> * Wolf:
> 
> > While this is admittedly edge case that most users will not run into, I
> > still think it would be nice to behave the same way as glibc does on
> > this one. And as a bonus, it will be *tiny* bit faster, since there
> > would not be any need to iterate rest of the /etc/hosts file.
> 
> But something has to scan the entire file in “multi on” mode at least,
> to find all relevant addresses.

musl doesn't use/support host.conf, so "multi on" isn't an issue for
us at present.

BTW it's not clear to me how the canonical name would/should be
determined in this example if you allow multiple hits.

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-03-14 14:54   ` Rich Felker
@ 2020-03-14 16:31     ` Jeffrey Walton
  0 siblings, 0 replies; 12+ messages in thread
From: Jeffrey Walton @ 2020-03-14 16:31 UTC (permalink / raw)
  To: musl

On Sat, Mar 14, 2020 at 10:54 AM Rich Felker <dalias@libc.org> wrote:
>
> On Sat, Mar 14, 2020 at 09:24:08AM +0100, Florian Weimer wrote:
> > * Wolf:
> > ...
> > But something has to scan the entire file in “multi on” mode at least,
> > to find all relevant addresses.
>
> musl doesn't use/support host.conf, so "multi on" isn't an issue for
> us at present.
>
> BTW it's not clear to me how the canonical name would/should be
> determined in this example if you allow multiple hits.

Perhaps DHCP domain-search option?

Jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-03-13 23:43   ` Wolf
@ 2020-04-13 23:18     ` Wolf
  2020-04-15  8:31     ` Natanael Copa
  1 sibling, 0 replies; 12+ messages in thread
From: Wolf @ 2020-04-13 23:18 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

Hello,

On 2020-03-14 00:43:04 +0100, Wolf wrote:
> On 2020-03-13 18:16:49 -0400, Rich Felker wrote:
>  
> > Do you know if there's widespread match for this behavior across other
> > systems too?
> 
> Windows use the glibc behaviour, other than that I do not have systems
> to test this on.

Just a polite bump to check if this was just forgotten or there are some
issues I need to resolve in the patch.

W.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-03-13 23:43   ` Wolf
  2020-04-13 23:18     ` Wolf
@ 2020-04-15  8:31     ` Natanael Copa
  2020-04-15 16:03       ` Rich Felker
  1 sibling, 1 reply; 12+ messages in thread
From: Natanael Copa @ 2020-04-15  8:31 UTC (permalink / raw)
  To: Wolf; +Cc: musl

On Sat, 14 Mar 2020 00:43:04 +0100
Wolf <wolf@wolfsden.cz> wrote:

> On 2020-03-13 18:16:49 -0400, Rich Felker wrote:
>  
> > Do you know if there's widespread match for this behavior across other
> > systems too?  
> 
> Windows use the glibc behaviour, other than that I do not have systems
> to test this on.

I tested on openbsd, feebsd and netbsd and they all use the glibc behavior.

-nc

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-04-15  8:31     ` Natanael Copa
@ 2020-04-15 16:03       ` Rich Felker
  0 siblings, 0 replies; 12+ messages in thread
From: Rich Felker @ 2020-04-15 16:03 UTC (permalink / raw)
  To: Natanael Copa; +Cc: Wolf, musl

On Wed, Apr 15, 2020 at 10:31:10AM +0200, Natanael Copa wrote:
> On Sat, 14 Mar 2020 00:43:04 +0100
> Wolf <wolf@wolfsden.cz> wrote:
> 
> > On 2020-03-13 18:16:49 -0400, Rich Felker wrote:
> >  
> > > Do you know if there's widespread match for this behavior across other
> > > systems too?  
> > 
> > Windows use the glibc behaviour, other than that I do not have systems
> > to test this on.
> 
> I tested on openbsd, feebsd and netbsd and they all use the glibc behavior.

Thanks. I think I'll go forward with applying this then. I have to
cleanup my working tree -- presently have a branch for merging x86
math patches which should also go upstream now, and I've not been
attending to it due to focus on mallocng (still in separate tree, to
be integrated with musl next) instead.

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-03-13 21:46 [musl] Proposal to match behaviour of gethostbyname to glibc Wolf
  2020-03-13 22:16 ` Rich Felker
  2020-03-14  8:24 ` Florian Weimer
@ 2020-04-17 18:42 ` Rich Felker
  2020-04-17 18:46   ` Rich Felker
  2 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2020-04-17 18:42 UTC (permalink / raw)
  To: Wolf; +Cc: musl

On Fri, Mar 13, 2020 at 10:46:48PM +0100, Wolf wrote:
> Hello,
> 
> today I've noticed difference in behavior of gethostbyname in musl and
> in glibc. Given /etc/hosts
> 
> 	127.0.0.1   foo.bar foo
> 	127.0.0.1   bar.foo foo
> 
> and simple test program
> 
> 	#include <netdb.h>
> 	#include <stdio.h>
> 
> 	int main(int argc, char **argv) {
> 		struct hostent *he = gethostbyname(argv[1]);
> 		printf("Hostname: %s\n", he->h_name);
> 	}
> 
> , I've run it both under musl (alpine) and glibc (archlinux).
> 
> musl:
> 
> 	/test # ./test foo
> 	Hostname: bar.foo
> 
> glibc:
> 
> 	[root@foo test]# ./test foo
> 	Hostname: foo.bar
> 
> I don't think there is an actual reason to iterate through all of the
> /etc/hosts and first match can be returned instead. Following patch
> should in my opinion fix this.
> 
> 
> 
> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> index c93263a9..da8db9d4 100644
> --- a/src/network/lookup_name.c
> +++ b/src/network/lookup_name.c
> @@ -87,7 +87,10 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati
>                 for (; *p && isspace(*p); p++);
>                 for (z=p; *z && !isspace(*z); z++);
>                 *z = 0;
> -               if (is_valid_hostname(p)) memcpy(canon, p, z-p+1);
> +               if (is_valid_hostname(p)) {
> +                       memcpy(canon, p, z-p+1);
> +                       break;
> +               }
>         }
>         __fclose_ca(f);
>         return cnt ? cnt : badfam;
> 
> 
> 
> While this is admittedly edge case that most users will not run into, I
> still think it would be nice to behave the same way as glibc does on
> this one. And as a bonus, it will be *tiny* bit faster, since there
> would not be any need to iterate rest of the /etc/hosts file.
> 
> 
> 
> Thank you for considering this,

Patch does not apply as submitted (your mail software corrupted it in
the message body; for future reference, use attachment if you don't
have patch-clean mail software) but I'll apply it manually. Thanks.

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-04-17 18:42 ` Rich Felker
@ 2020-04-17 18:46   ` Rich Felker
  2020-04-26  0:15     ` Wolf
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2020-04-17 18:46 UTC (permalink / raw)
  To: Wolf; +Cc: musl

On Fri, Apr 17, 2020 at 02:42:22PM -0400, Rich Felker wrote:
> On Fri, Mar 13, 2020 at 10:46:48PM +0100, Wolf wrote:
> > Hello,
> > 
> > today I've noticed difference in behavior of gethostbyname in musl and
> > in glibc. Given /etc/hosts
> > 
> > 	127.0.0.1   foo.bar foo
> > 	127.0.0.1   bar.foo foo
> > 
> > and simple test program
> > 
> > 	#include <netdb.h>
> > 	#include <stdio.h>
> > 
> > 	int main(int argc, char **argv) {
> > 		struct hostent *he = gethostbyname(argv[1]);
> > 		printf("Hostname: %s\n", he->h_name);
> > 	}
> > 
> > , I've run it both under musl (alpine) and glibc (archlinux).
> > 
> > musl:
> > 
> > 	/test # ./test foo
> > 	Hostname: bar.foo
> > 
> > glibc:
> > 
> > 	[root@foo test]# ./test foo
> > 	Hostname: foo.bar
> > 
> > I don't think there is an actual reason to iterate through all of the
> > /etc/hosts and first match can be returned instead. Following patch
> > should in my opinion fix this.
> > 
> > 
> > 
> > diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> > index c93263a9..da8db9d4 100644
> > --- a/src/network/lookup_name.c
> > +++ b/src/network/lookup_name.c
> > @@ -87,7 +87,10 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati
> >                 for (; *p && isspace(*p); p++);
> >                 for (z=p; *z && !isspace(*z); z++);
> >                 *z = 0;
> > -               if (is_valid_hostname(p)) memcpy(canon, p, z-p+1);
> > +               if (is_valid_hostname(p)) {
> > +                       memcpy(canon, p, z-p+1);
> > +                       break;
> > +               }
> >         }
> >         __fclose_ca(f);
> >         return cnt ? cnt : badfam;
> > 
> > 
> > 
> > While this is admittedly edge case that most users will not run into, I
> > still think it would be nice to behave the same way as glibc does on
> > this one. And as a bonus, it will be *tiny* bit faster, since there
> > would not be any need to iterate rest of the /etc/hosts file.
> > 
> > 
> > 
> > Thank you for considering this,
> 
> Patch does not apply as submitted (your mail software corrupted it in
> the message body; for future reference, use attachment if you don't
> have patch-clean mail software) but I'll apply it manually. Thanks.

Actually now that I'm doing this I'm not sure it's correct. The
existing code reports all matches from the hosts file, not just the
first one. This patch will prevent getting both ipv4 and ipv6 results,
or multiple results for the same address family, by stopping after the
first one.

If you want the canonical name to come from the first result, rather
than suppressing all but the first result, the code instead needs to
be changed to remember that it already found one name and not copy any
others.

So this needs more discussion to clarify what the actual intent is,
and whether that change is okay,  before it can move forward.

Rich

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [musl] Proposal to match behaviour of gethostbyname to glibc
  2020-04-17 18:46   ` Rich Felker
@ 2020-04-26  0:15     ` Wolf
  0 siblings, 0 replies; 12+ messages in thread
From: Wolf @ 2020-04-26  0:15 UTC (permalink / raw)
  To: musl


[-- Attachment #1.1: Type: text/plain, Size: 841 bytes --]

Hello,

On 2020-04-17 14:46:40 -0400, Rich Felker wrote:
> Actually now that I'm doing this I'm not sure it's correct. The
> existing code reports all matches from the hosts file, not just the
> first one. This patch will prevent getting both ipv4 and ipv6 results,
> or multiple results for the same address family, by stopping after the
> first one.

Well, I've completely missed that. Sorry.

> If you want the canonical name to come from the first result, rather
> than suppressing all but the first result, the code instead needs to
> be changed to remember that it already found one name and not copy any
> others.

I've attached patch v2 doing exactly that. Hopefully this time it's
correct.

W.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

[-- Attachment #1.2: patch --]
[-- Type: text/plain, Size: 1009 bytes --]

diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
index c93263a9..00f056e6 100644
--- a/src/network/lookup_name.c
+++ b/src/network/lookup_name.c
@@ -50,7 +50,7 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati
 {
 	char line[512];
 	size_t l = strlen(name);
-	int cnt = 0, badfam = 0;
+	int cnt = 0, badfam = 0, have_canon = 0;
 	unsigned char _buf[1032];
 	FILE _f, *f = __fopen_rb_ca("/etc/hosts", &_f, _buf, sizeof _buf);
 	if (!f) switch (errno) {
@@ -83,11 +83,16 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati
 			continue;
 		}
 
+		if (have_canon) continue;
+
 		/* Extract first name as canonical name */
 		for (; *p && isspace(*p); p++);
 		for (z=p; *z && !isspace(*z); z++);
 		*z = 0;
-		if (is_valid_hostname(p)) memcpy(canon, p, z-p+1);
+		if (is_valid_hostname(p)) {
+			have_canon = 1;
+			memcpy(canon, p, z-p+1);
+		}
 	}
 	__fclose_ca(f);
 	return cnt ? cnt : badfam;

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-04-26  0:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-13 21:46 [musl] Proposal to match behaviour of gethostbyname to glibc Wolf
2020-03-13 22:16 ` Rich Felker
2020-03-13 23:43   ` Wolf
2020-04-13 23:18     ` Wolf
2020-04-15  8:31     ` Natanael Copa
2020-04-15 16:03       ` Rich Felker
2020-03-14  8:24 ` Florian Weimer
2020-03-14 14:54   ` Rich Felker
2020-03-14 16:31     ` Jeffrey Walton
2020-04-17 18:42 ` Rich Felker
2020-04-17 18:46   ` Rich Felker
2020-04-26  0:15     ` Wolf

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).