mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] hostname is using a case sensitive search in function name_from_hosts
@ 2022-06-09 18:42 Hans Harder
  2022-06-09 20:34 ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Hans Harder @ 2022-06-09 18:42 UTC (permalink / raw)
  To: musl

Hi,
I discovered that the function name_from_hosts parses the /etc/hosts
file and does a case sensitive search for a name.
Sometimes I encounter mixed upper and lowercase hostnames in a /etc/hosts file.
It would be easier if the function searches for the name in a case
insensitive way....

By changing line 68  in src/network/lookup_name.c
       for(p=line+1; (p=strstr(p, name)) &&
to:
       for(p=line+1; (p=strcasestr(p, name)) &&

That would resolve the problem.

Hans

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [musl] hostname is using a case sensitive search in function name_from_hosts
  2022-06-09 18:42 [musl] hostname is using a case sensitive search in function name_from_hosts Hans Harder
@ 2022-06-09 20:34 ` Rich Felker
  2022-06-15  6:30   ` Hans Harder
  2022-06-16 16:19   ` NRK
  0 siblings, 2 replies; 5+ messages in thread
From: Rich Felker @ 2022-06-09 20:34 UTC (permalink / raw)
  To: Hans Harder; +Cc: musl

On Thu, Jun 09, 2022 at 08:42:28PM +0200, Hans Harder wrote:
> Hi,
> I discovered that the function name_from_hosts parses the /etc/hosts
> file and does a case sensitive search for a name.
> Sometimes I encounter mixed upper and lowercase hostnames in a /etc/hosts file.
> It would be easier if the function searches for the name in a case
> insensitive way....
> 
> By changing line 68  in src/network/lookup_name.c
>        for(p=line+1; (p=strstr(p, name)) &&
> to:
>        for(p=line+1; (p=strcasestr(p, name)) &&
> 
> That would resolve the problem.

strcasestr isn't a good match here, because it's quadratic time and
would be potentially quite slow (depending on file contents). It's
also not in a usable namespace, and is something of a junk function we
included for questionable reasons.

The core problem here is that strstr isn't really the right operation
to be using, and was something of a lazy hack. Due to the linear-time
implementation it doesn't hurt, but it would make a lot more sense to
parse this right looking at separators. Even then though it's some
work to make it properly case-insensitive; strcasecmp is insufficient
and only handles single-byte characters. So the right thing to do is
really picking up review and merge of the draft IDN handling work,
which (if I'm remembering right) normalizes case as an inherent part
of the process.

Rich

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [musl] hostname is using a case sensitive search in function name_from_hosts
  2022-06-09 20:34 ` Rich Felker
@ 2022-06-15  6:30   ` Hans Harder
  2022-06-16 16:19   ` NRK
  1 sibling, 0 replies; 5+ messages in thread
From: Hans Harder @ 2022-06-15  6:30 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

I don't have much experience in making patches, but you mean sometime
like this patch.

It uses strtok and only compares something case insensitive if the
token is the same length of name.
Also it makes it simpler in the remaining code.

Hans

diff -u a/src/network/lookup_name.c b/src/network/lookup_name.c
--- a/src/network/lookup_name.c 2022-04-07 17:12:40.000000000 +0000
+++ b/src/network/lookup_name.c 2022-06-15 06:15:46.680000000 +0000
@@ -6,6 +6,7 @@
 #include <ctype.h>
 #include <stdlib.h>
 #include <string.h>
+#include <strings.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <pthread.h>
@@ -49,6 +50,7 @@
 static int name_from_hosts(struct address buf[static MAXADDRS], char
canon[static 256], const char *name, int family)
 {
    char line[512];
+   char sep[4] = " \t\n";
    size_t l = strlen(name);
    int cnt = 0, badfam = 0, have_canon = 0;
    unsigned char _buf[1032];
@@ -62,17 +64,22 @@
        return EAI_SYSTEM;
    }
    while (fgets(line, sizeof line, f) && cnt < MAXADDRS) {
-       char *p, *z;
-
+       char *p;
        if ((p=strchr(line, '#'))) *p++='\n', *p=0;
-       for(p=line+1; (p=strstr(p, name)) &&
-           (!isspace(p[-1]) || !isspace(p[l])); p++);
-       if (!p) continue;
+       if (line[0] == 0 || line[0]=='\n') continue;
+       p = strtok(line, sep);
+       while( p != NULL ) {
+           /* only compare case insensitive if length of both are the same */
+           if (strlen(p) == l  && strcasecmp(p,name)==0) {
+               p = strtok(line, sep);
+               break;
+           }
+           p = strtok(NULL, sep);
+       }
+       if (p == NULL) continue;

        /* Isolate IP address to parse */
-       for (p=line; *p && !isspace(*p); p++);
-       *p++ = 0;
-       switch (name_from_numeric(buf+cnt, line, family)) {
+       switch (name_from_numeric(buf+cnt, p, family)) {
        case 1:
            cnt++;
            break;
@@ -86,12 +93,10 @@
        if (have_canon) continue;

        /* Extract first name as canonical name */
-       for (; *p && isspace(*p); p++);
-       for (z=p; *z && !isspace(*z); z++);
-       *z = 0;
-       if (is_valid_hostname(p)) {
+       p = strtok(NULL, sep);
+       if (p != NULL && is_valid_hostname(p)) {
            have_canon = 1;
-           memcpy(canon, p, z-p+1);
+           strcpy(canon, p);
        }
    }
    __fclose_ca(f);



On Thu, Jun 9, 2022 at 10:34 PM Rich Felker <dalias@libc.org> wrote:
>
> On Thu, Jun 09, 2022 at 08:42:28PM +0200, Hans Harder wrote:
> > Hi,
> > I discovered that the function name_from_hosts parses the /etc/hosts
> > file and does a case sensitive search for a name.
> > Sometimes I encounter mixed upper and lowercase hostnames in a /etc/hosts file.
> > It would be easier if the function searches for the name in a case
> > insensitive way....
> >
> > By changing line 68  in src/network/lookup_name.c
> >        for(p=line+1; (p=strstr(p, name)) &&
> > to:
> >        for(p=line+1; (p=strcasestr(p, name)) &&
> >
> > That would resolve the problem.
>
> strcasestr isn't a good match here, because it's quadratic time and
> would be potentially quite slow (depending on file contents). It's
> also not in a usable namespace, and is something of a junk function we
> included for questionable reasons.
>
> The core problem here is that strstr isn't really the right operation
> to be using, and was something of a lazy hack. Due to the linear-time
> implementation it doesn't hurt, but it would make a lot more sense to
> parse this right looking at separators. Even then though it's some
> work to make it properly case-insensitive; strcasecmp is insufficient
> and only handles single-byte characters. So the right thing to do is
> really picking up review and merge of the draft IDN handling work,
> which (if I'm remembering right) normalizes case as an inherent part
> of the process.
>
> Rich

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [musl] hostname is using a case sensitive search in function name_from_hosts
  2022-06-09 20:34 ` Rich Felker
  2022-06-15  6:30   ` Hans Harder
@ 2022-06-16 16:19   ` NRK
  2022-06-16 16:45     ` Rich Felker
  1 sibling, 1 reply; 5+ messages in thread
From: NRK @ 2022-06-16 16:19 UTC (permalink / raw)
  To: musl

On Thu, Jun 09, 2022 at 04:34:27PM -0400, Rich Felker wrote:
> strcasestr isn't a good match here, because it's quadratic time and
> would be potentially quite slow (depending on file contents). It's
> also not in a usable namespace, and is something of a junk function we
> included for questionable reasons.

A bit off-topic, but is there any desire to improve the strcasestr
performance? If there is, then I can supply the patch as I was playing
around with it a couple weeks ago.

- NRK

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [musl] hostname is using a case sensitive search in function name_from_hosts
  2022-06-16 16:19   ` NRK
@ 2022-06-16 16:45     ` Rich Felker
  0 siblings, 0 replies; 5+ messages in thread
From: Rich Felker @ 2022-06-16 16:45 UTC (permalink / raw)
  To: NRK; +Cc: musl

On Thu, Jun 16, 2022 at 10:19:24PM +0600, NRK wrote:
> On Thu, Jun 09, 2022 at 04:34:27PM -0400, Rich Felker wrote:
> > strcasestr isn't a good match here, because it's quadratic time and
> > would be potentially quite slow (depending on file contents). It's
> > also not in a usable namespace, and is something of a junk function we
> > included for questionable reasons.
> 
> A bit off-topic, but is there any desire to improve the strcasestr
> performance? If there is, then I can supply the patch as I was playing
> around with it a couple weeks ago.

I'm not sure. It's an interesting research problem whether it can be
made linear-time in constant space -- I mean, for a dumb
single-byte-only version like we do now, I'm pretty sure it can, but
I'm doubtful the same can be done for one that respects full multibyte
case mapping.

But aside from that, I'm not sure if there's much practical value. The
function is not really appropriate for much real-world use, especially
as long as it doesn't respect non-ASCII case mappings, and was
probably a mistake to include to begin with. If there are simple
changes that could be made to make it "less awfully slow", that's
probably ok, but anything that involves open-coding a more complex
implementation without actually making this a high-quality usable
function seems like an unwanted increase in code complexity and
potential bug-surface/maintenance-burden without a good motivation.

Rich

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-06-16 16:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-09 18:42 [musl] hostname is using a case sensitive search in function name_from_hosts Hans Harder
2022-06-09 20:34 ` Rich Felker
2022-06-15  6:30   ` Hans Harder
2022-06-16 16:19   ` NRK
2022-06-16 16:45     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).