* [musl] hostname is using a case sensitive search in function name_from_hosts @ 2022-06-09 18:42 Hans Harder 2022-06-09 20:34 ` Rich Felker 0 siblings, 1 reply; 5+ messages in thread From: Hans Harder @ 2022-06-09 18:42 UTC (permalink / raw) To: musl Hi, I discovered that the function name_from_hosts parses the /etc/hosts file and does a case sensitive search for a name. Sometimes I encounter mixed upper and lowercase hostnames in a /etc/hosts file. It would be easier if the function searches for the name in a case insensitive way.... By changing line 68 in src/network/lookup_name.c for(p=line+1; (p=strstr(p, name)) && to: for(p=line+1; (p=strcasestr(p, name)) && That would resolve the problem. Hans ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] hostname is using a case sensitive search in function name_from_hosts 2022-06-09 18:42 [musl] hostname is using a case sensitive search in function name_from_hosts Hans Harder @ 2022-06-09 20:34 ` Rich Felker 2022-06-15 6:30 ` Hans Harder 2022-06-16 16:19 ` NRK 0 siblings, 2 replies; 5+ messages in thread From: Rich Felker @ 2022-06-09 20:34 UTC (permalink / raw) To: Hans Harder; +Cc: musl On Thu, Jun 09, 2022 at 08:42:28PM +0200, Hans Harder wrote: > Hi, > I discovered that the function name_from_hosts parses the /etc/hosts > file and does a case sensitive search for a name. > Sometimes I encounter mixed upper and lowercase hostnames in a /etc/hosts file. > It would be easier if the function searches for the name in a case > insensitive way.... > > By changing line 68 in src/network/lookup_name.c > for(p=line+1; (p=strstr(p, name)) && > to: > for(p=line+1; (p=strcasestr(p, name)) && > > That would resolve the problem. strcasestr isn't a good match here, because it's quadratic time and would be potentially quite slow (depending on file contents). It's also not in a usable namespace, and is something of a junk function we included for questionable reasons. The core problem here is that strstr isn't really the right operation to be using, and was something of a lazy hack. Due to the linear-time implementation it doesn't hurt, but it would make a lot more sense to parse this right looking at separators. Even then though it's some work to make it properly case-insensitive; strcasecmp is insufficient and only handles single-byte characters. So the right thing to do is really picking up review and merge of the draft IDN handling work, which (if I'm remembering right) normalizes case as an inherent part of the process. Rich ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] hostname is using a case sensitive search in function name_from_hosts 2022-06-09 20:34 ` Rich Felker @ 2022-06-15 6:30 ` Hans Harder 2022-06-16 16:19 ` NRK 1 sibling, 0 replies; 5+ messages in thread From: Hans Harder @ 2022-06-15 6:30 UTC (permalink / raw) To: Rich Felker; +Cc: musl I don't have much experience in making patches, but you mean sometime like this patch. It uses strtok and only compares something case insensitive if the token is the same length of name. Also it makes it simpler in the remaining code. Hans diff -u a/src/network/lookup_name.c b/src/network/lookup_name.c --- a/src/network/lookup_name.c 2022-04-07 17:12:40.000000000 +0000 +++ b/src/network/lookup_name.c 2022-06-15 06:15:46.680000000 +0000 @@ -6,6 +6,7 @@ #include <ctype.h> #include <stdlib.h> #include <string.h> +#include <strings.h> #include <fcntl.h> #include <unistd.h> #include <pthread.h> @@ -49,6 +50,7 @@ static int name_from_hosts(struct address buf[static MAXADDRS], char canon[static 256], const char *name, int family) { char line[512]; + char sep[4] = " \t\n"; size_t l = strlen(name); int cnt = 0, badfam = 0, have_canon = 0; unsigned char _buf[1032]; @@ -62,17 +64,22 @@ return EAI_SYSTEM; } while (fgets(line, sizeof line, f) && cnt < MAXADDRS) { - char *p, *z; - + char *p; if ((p=strchr(line, '#'))) *p++='\n', *p=0; - for(p=line+1; (p=strstr(p, name)) && - (!isspace(p[-1]) || !isspace(p[l])); p++); - if (!p) continue; + if (line[0] == 0 || line[0]=='\n') continue; + p = strtok(line, sep); + while( p != NULL ) { + /* only compare case insensitive if length of both are the same */ + if (strlen(p) == l && strcasecmp(p,name)==0) { + p = strtok(line, sep); + break; + } + p = strtok(NULL, sep); + } + if (p == NULL) continue; /* Isolate IP address to parse */ - for (p=line; *p && !isspace(*p); p++); - *p++ = 0; - switch (name_from_numeric(buf+cnt, line, family)) { + switch (name_from_numeric(buf+cnt, p, family)) { case 1: cnt++; break; @@ -86,12 +93,10 @@ if (have_canon) continue; /* Extract first name as canonical name */ - for (; *p && isspace(*p); p++); - for (z=p; *z && !isspace(*z); z++); - *z = 0; - if (is_valid_hostname(p)) { + p = strtok(NULL, sep); + if (p != NULL && is_valid_hostname(p)) { have_canon = 1; - memcpy(canon, p, z-p+1); + strcpy(canon, p); } } __fclose_ca(f); On Thu, Jun 9, 2022 at 10:34 PM Rich Felker <dalias@libc.org> wrote: > > On Thu, Jun 09, 2022 at 08:42:28PM +0200, Hans Harder wrote: > > Hi, > > I discovered that the function name_from_hosts parses the /etc/hosts > > file and does a case sensitive search for a name. > > Sometimes I encounter mixed upper and lowercase hostnames in a /etc/hosts file. > > It would be easier if the function searches for the name in a case > > insensitive way.... > > > > By changing line 68 in src/network/lookup_name.c > > for(p=line+1; (p=strstr(p, name)) && > > to: > > for(p=line+1; (p=strcasestr(p, name)) && > > > > That would resolve the problem. > > strcasestr isn't a good match here, because it's quadratic time and > would be potentially quite slow (depending on file contents). It's > also not in a usable namespace, and is something of a junk function we > included for questionable reasons. > > The core problem here is that strstr isn't really the right operation > to be using, and was something of a lazy hack. Due to the linear-time > implementation it doesn't hurt, but it would make a lot more sense to > parse this right looking at separators. Even then though it's some > work to make it properly case-insensitive; strcasecmp is insufficient > and only handles single-byte characters. So the right thing to do is > really picking up review and merge of the draft IDN handling work, > which (if I'm remembering right) normalizes case as an inherent part > of the process. > > Rich ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] hostname is using a case sensitive search in function name_from_hosts 2022-06-09 20:34 ` Rich Felker 2022-06-15 6:30 ` Hans Harder @ 2022-06-16 16:19 ` NRK 2022-06-16 16:45 ` Rich Felker 1 sibling, 1 reply; 5+ messages in thread From: NRK @ 2022-06-16 16:19 UTC (permalink / raw) To: musl On Thu, Jun 09, 2022 at 04:34:27PM -0400, Rich Felker wrote: > strcasestr isn't a good match here, because it's quadratic time and > would be potentially quite slow (depending on file contents). It's > also not in a usable namespace, and is something of a junk function we > included for questionable reasons. A bit off-topic, but is there any desire to improve the strcasestr performance? If there is, then I can supply the patch as I was playing around with it a couple weeks ago. - NRK ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] hostname is using a case sensitive search in function name_from_hosts 2022-06-16 16:19 ` NRK @ 2022-06-16 16:45 ` Rich Felker 0 siblings, 0 replies; 5+ messages in thread From: Rich Felker @ 2022-06-16 16:45 UTC (permalink / raw) To: NRK; +Cc: musl On Thu, Jun 16, 2022 at 10:19:24PM +0600, NRK wrote: > On Thu, Jun 09, 2022 at 04:34:27PM -0400, Rich Felker wrote: > > strcasestr isn't a good match here, because it's quadratic time and > > would be potentially quite slow (depending on file contents). It's > > also not in a usable namespace, and is something of a junk function we > > included for questionable reasons. > > A bit off-topic, but is there any desire to improve the strcasestr > performance? If there is, then I can supply the patch as I was playing > around with it a couple weeks ago. I'm not sure. It's an interesting research problem whether it can be made linear-time in constant space -- I mean, for a dumb single-byte-only version like we do now, I'm pretty sure it can, but I'm doubtful the same can be done for one that respects full multibyte case mapping. But aside from that, I'm not sure if there's much practical value. The function is not really appropriate for much real-world use, especially as long as it doesn't respect non-ASCII case mappings, and was probably a mistake to include to begin with. If there are simple changes that could be made to make it "less awfully slow", that's probably ok, but anything that involves open-coding a more complex implementation without actually making this a high-quality usable function seems like an unwanted increase in code complexity and potential bug-surface/maintenance-burden without a good motivation. Rich ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-06-16 16:45 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-06-09 18:42 [musl] hostname is using a case sensitive search in function name_from_hosts Hans Harder 2022-06-09 20:34 ` Rich Felker 2022-06-15 6:30 ` Hans Harder 2022-06-16 16:19 ` NRK 2022-06-16 16:45 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).