[-- Attachment #1: Type: text/plain, Size: 1850 bytes --] Hello, today I've noticed difference in behavior of gethostbyname in musl and in glibc. Given /etc/hosts 127.0.0.1 foo.bar foo 127.0.0.1 bar.foo foo and simple test program #include <netdb.h> #include <stdio.h> int main(int argc, char **argv) { struct hostent *he = gethostbyname(argv[1]); printf("Hostname: %s\n", he->h_name); } , I've run it both under musl (alpine) and glibc (archlinux). musl: /test # ./test foo Hostname: bar.foo glibc: [root@foo test]# ./test foo Hostname: foo.bar I don't think there is an actual reason to iterate through all of the /etc/hosts and first match can be returned instead. Following patch should in my opinion fix this. diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c index c93263a9..da8db9d4 100644 --- a/src/network/lookup_name.c +++ b/src/network/lookup_name.c @@ -87,7 +87,10 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati for (; *p && isspace(*p); p++); for (z=p; *z && !isspace(*z); z++); *z = 0; - if (is_valid_hostname(p)) memcpy(canon, p, z-p+1); + if (is_valid_hostname(p)) { + memcpy(canon, p, z-p+1); + break; + } } __fclose_ca(f); return cnt ? cnt : badfam; While this is admittedly edge case that most users will not run into, I still think it would be nice to behave the same way as glibc does on this one. And as a bonus, it will be *tiny* bit faster, since there would not be any need to iterate rest of the /etc/hosts file. Thank you for considering this, W. -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]
On Fri, Mar 13, 2020 at 10:46:48PM +0100, Wolf wrote:
> Hello,
>
> today I've noticed difference in behavior of gethostbyname in musl and
> in glibc. Given /etc/hosts
>
> 127.0.0.1 foo.bar foo
> 127.0.0.1 bar.foo foo
>
> and simple test program
>
> #include <netdb.h>
> #include <stdio.h>
>
> int main(int argc, char **argv) {
> struct hostent *he = gethostbyname(argv[1]);
> printf("Hostname: %s\n", he->h_name);
> }
>
> , I've run it both under musl (alpine) and glibc (archlinux).
>
> musl:
>
> /test # ./test foo
> Hostname: bar.foo
>
> glibc:
>
> [root@foo test]# ./test foo
> Hostname: foo.bar
>
> I don't think there is an actual reason to iterate through all of the
> /etc/hosts and first match can be returned instead. Following patch
> should in my opinion fix this.
>
>
>
> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> index c93263a9..da8db9d4 100644
> --- a/src/network/lookup_name.c
> +++ b/src/network/lookup_name.c
> @@ -87,7 +87,10 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati
> for (; *p && isspace(*p); p++);
> for (z=p; *z && !isspace(*z); z++);
> *z = 0;
> - if (is_valid_hostname(p)) memcpy(canon, p, z-p+1);
> + if (is_valid_hostname(p)) {
> + memcpy(canon, p, z-p+1);
> + break;
> + }
> }
> __fclose_ca(f);
> return cnt ? cnt : badfam;
>
>
>
> While this is admittedly edge case that most users will not run into, I
> still think it would be nice to behave the same way as glibc does on
> this one. And as a bonus, it will be *tiny* bit faster, since there
> would not be any need to iterate rest of the /etc/hosts file.
>
>
>
> Thank you for considering this,
I don't really see any downsides to doing this, even if "matching
glibc" isn't a terribly useful goal. If nothing else it's faster. Do
you know if there's widespread match for this behavior across other
systems too?
Rich
[-- Attachment #1: Type: text/plain, Size: 359 bytes --] On 2020-03-13 18:16:49 -0400, Rich Felker wrote: > Do you know if there's widespread match for this behavior across other > systems too? Windows use the glibc behaviour, other than that I do not have systems to test this on. W. -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]
* Wolf:
> While this is admittedly edge case that most users will not run into, I
> still think it would be nice to behave the same way as glibc does on
> this one. And as a bonus, it will be *tiny* bit faster, since there
> would not be any need to iterate rest of the /etc/hosts file.
But something has to scan the entire file in “multi on” mode at least,
to find all relevant addresses.
On Sat, Mar 14, 2020 at 09:24:08AM +0100, Florian Weimer wrote:
> * Wolf:
>
> > While this is admittedly edge case that most users will not run into, I
> > still think it would be nice to behave the same way as glibc does on
> > this one. And as a bonus, it will be *tiny* bit faster, since there
> > would not be any need to iterate rest of the /etc/hosts file.
>
> But something has to scan the entire file in “multi on” mode at least,
> to find all relevant addresses.
musl doesn't use/support host.conf, so "multi on" isn't an issue for
us at present.
BTW it's not clear to me how the canonical name would/should be
determined in this example if you allow multiple hits.
Rich
On Sat, Mar 14, 2020 at 10:54 AM Rich Felker <dalias@libc.org> wrote:
>
> On Sat, Mar 14, 2020 at 09:24:08AM +0100, Florian Weimer wrote:
> > * Wolf:
> > ...
> > But something has to scan the entire file in “multi on” mode at least,
> > to find all relevant addresses.
>
> musl doesn't use/support host.conf, so "multi on" isn't an issue for
> us at present.
>
> BTW it's not clear to me how the canonical name would/should be
> determined in this example if you allow multiple hits.
Perhaps DHCP domain-search option?
Jeff
[-- Attachment #1: Type: text/plain, Size: 542 bytes --] Hello, On 2020-03-14 00:43:04 +0100, Wolf wrote: > On 2020-03-13 18:16:49 -0400, Rich Felker wrote: > > > Do you know if there's widespread match for this behavior across other > > systems too? > > Windows use the glibc behaviour, other than that I do not have systems > to test this on. Just a polite bump to check if this was just forgotten or there are some issues I need to resolve in the patch. W. -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]
On Sat, 14 Mar 2020 00:43:04 +0100
Wolf <wolf@wolfsden.cz> wrote:
> On 2020-03-13 18:16:49 -0400, Rich Felker wrote:
>
> > Do you know if there's widespread match for this behavior across other
> > systems too?
>
> Windows use the glibc behaviour, other than that I do not have systems
> to test this on.
I tested on openbsd, feebsd and netbsd and they all use the glibc behavior.
-nc
On Wed, Apr 15, 2020 at 10:31:10AM +0200, Natanael Copa wrote:
> On Sat, 14 Mar 2020 00:43:04 +0100
> Wolf <wolf@wolfsden.cz> wrote:
>
> > On 2020-03-13 18:16:49 -0400, Rich Felker wrote:
> >
> > > Do you know if there's widespread match for this behavior across other
> > > systems too?
> >
> > Windows use the glibc behaviour, other than that I do not have systems
> > to test this on.
>
> I tested on openbsd, feebsd and netbsd and they all use the glibc behavior.
Thanks. I think I'll go forward with applying this then. I have to
cleanup my working tree -- presently have a branch for merging x86
math patches which should also go upstream now, and I've not been
attending to it due to focus on mallocng (still in separate tree, to
be integrated with musl next) instead.
Rich
On Fri, Mar 13, 2020 at 10:46:48PM +0100, Wolf wrote:
> Hello,
>
> today I've noticed difference in behavior of gethostbyname in musl and
> in glibc. Given /etc/hosts
>
> 127.0.0.1 foo.bar foo
> 127.0.0.1 bar.foo foo
>
> and simple test program
>
> #include <netdb.h>
> #include <stdio.h>
>
> int main(int argc, char **argv) {
> struct hostent *he = gethostbyname(argv[1]);
> printf("Hostname: %s\n", he->h_name);
> }
>
> , I've run it both under musl (alpine) and glibc (archlinux).
>
> musl:
>
> /test # ./test foo
> Hostname: bar.foo
>
> glibc:
>
> [root@foo test]# ./test foo
> Hostname: foo.bar
>
> I don't think there is an actual reason to iterate through all of the
> /etc/hosts and first match can be returned instead. Following patch
> should in my opinion fix this.
>
>
>
> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> index c93263a9..da8db9d4 100644
> --- a/src/network/lookup_name.c
> +++ b/src/network/lookup_name.c
> @@ -87,7 +87,10 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati
> for (; *p && isspace(*p); p++);
> for (z=p; *z && !isspace(*z); z++);
> *z = 0;
> - if (is_valid_hostname(p)) memcpy(canon, p, z-p+1);
> + if (is_valid_hostname(p)) {
> + memcpy(canon, p, z-p+1);
> + break;
> + }
> }
> __fclose_ca(f);
> return cnt ? cnt : badfam;
>
>
>
> While this is admittedly edge case that most users will not run into, I
> still think it would be nice to behave the same way as glibc does on
> this one. And as a bonus, it will be *tiny* bit faster, since there
> would not be any need to iterate rest of the /etc/hosts file.
>
>
>
> Thank you for considering this,
Patch does not apply as submitted (your mail software corrupted it in
the message body; for future reference, use attachment if you don't
have patch-clean mail software) but I'll apply it manually. Thanks.
Rich
On Fri, Apr 17, 2020 at 02:42:22PM -0400, Rich Felker wrote:
> On Fri, Mar 13, 2020 at 10:46:48PM +0100, Wolf wrote:
> > Hello,
> >
> > today I've noticed difference in behavior of gethostbyname in musl and
> > in glibc. Given /etc/hosts
> >
> > 127.0.0.1 foo.bar foo
> > 127.0.0.1 bar.foo foo
> >
> > and simple test program
> >
> > #include <netdb.h>
> > #include <stdio.h>
> >
> > int main(int argc, char **argv) {
> > struct hostent *he = gethostbyname(argv[1]);
> > printf("Hostname: %s\n", he->h_name);
> > }
> >
> > , I've run it both under musl (alpine) and glibc (archlinux).
> >
> > musl:
> >
> > /test # ./test foo
> > Hostname: bar.foo
> >
> > glibc:
> >
> > [root@foo test]# ./test foo
> > Hostname: foo.bar
> >
> > I don't think there is an actual reason to iterate through all of the
> > /etc/hosts and first match can be returned instead. Following patch
> > should in my opinion fix this.
> >
> >
> >
> > diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> > index c93263a9..da8db9d4 100644
> > --- a/src/network/lookup_name.c
> > +++ b/src/network/lookup_name.c
> > @@ -87,7 +87,10 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati
> > for (; *p && isspace(*p); p++);
> > for (z=p; *z && !isspace(*z); z++);
> > *z = 0;
> > - if (is_valid_hostname(p)) memcpy(canon, p, z-p+1);
> > + if (is_valid_hostname(p)) {
> > + memcpy(canon, p, z-p+1);
> > + break;
> > + }
> > }
> > __fclose_ca(f);
> > return cnt ? cnt : badfam;
> >
> >
> >
> > While this is admittedly edge case that most users will not run into, I
> > still think it would be nice to behave the same way as glibc does on
> > this one. And as a bonus, it will be *tiny* bit faster, since there
> > would not be any need to iterate rest of the /etc/hosts file.
> >
> >
> >
> > Thank you for considering this,
>
> Patch does not apply as submitted (your mail software corrupted it in
> the message body; for future reference, use attachment if you don't
> have patch-clean mail software) but I'll apply it manually. Thanks.
Actually now that I'm doing this I'm not sure it's correct. The
existing code reports all matches from the hosts file, not just the
first one. This patch will prevent getting both ipv4 and ipv6 results,
or multiple results for the same address family, by stopping after the
first one.
If you want the canonical name to come from the first result, rather
than suppressing all but the first result, the code instead needs to
be changed to remember that it already found one name and not copy any
others.
So this needs more discussion to clarify what the actual intent is,
and whether that change is okay, before it can move forward.
Rich
[-- Attachment #1.1: Type: text/plain, Size: 841 bytes --] Hello, On 2020-04-17 14:46:40 -0400, Rich Felker wrote: > Actually now that I'm doing this I'm not sure it's correct. The > existing code reports all matches from the hosts file, not just the > first one. This patch will prevent getting both ipv4 and ipv6 results, > or multiple results for the same address family, by stopping after the > first one. Well, I've completely missed that. Sorry. > If you want the canonical name to come from the first result, rather > than suppressing all but the first result, the code instead needs to > be changed to remember that it already found one name and not copy any > others. I've attached patch v2 doing exactly that. Hopefully this time it's correct. W. -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors. [-- Attachment #1.2: patch --] [-- Type: text/plain, Size: 1009 bytes --] diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c index c93263a9..00f056e6 100644 --- a/src/network/lookup_name.c +++ b/src/network/lookup_name.c @@ -50,7 +50,7 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati { char line[512]; size_t l = strlen(name); - int cnt = 0, badfam = 0; + int cnt = 0, badfam = 0, have_canon = 0; unsigned char _buf[1032]; FILE _f, *f = __fopen_rb_ca("/etc/hosts", &_f, _buf, sizeof _buf); if (!f) switch (errno) { @@ -83,11 +83,16 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[stati continue; } + if (have_canon) continue; + /* Extract first name as canonical name */ for (; *p && isspace(*p); p++); for (z=p; *z && !isspace(*z); z++); *z = 0; - if (is_valid_hostname(p)) memcpy(canon, p, z-p+1); + if (is_valid_hostname(p)) { + have_canon = 1; + memcpy(canon, p, z-p+1); + } } __fclose_ca(f); return cnt ? cnt : badfam; [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]