zsh-workers
 help / color / mirror / code / Atom feed
* $watch, log and Cyrillic usernames
@ 2023-10-05 21:29 Максим
  2023-10-05 21:42 ` Bart Schaefer
  2023-10-07  2:05 ` Oliver Kiddle
  0 siblings, 2 replies; 9+ messages in thread
From: Максим @ 2023-10-05 21:29 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 813 bytes --]

Hello again. I found another bug with cyrillic usernames in zsh (again on
Cygwin, but can be reproduced on Linux)

Tested on latest master:

1. Create user "Студент" (for example, via /etc/passwd editing)
2. Create /var/run/utmp file if it is missing (may be necessary on Cygwin)
3. Login into "Студент" (via login or ssh)
4. % zsh-5.9.0.1-dev -f

% whoami
    Студент

% watch=(all); log # (everything is ok)
    root has logged on /proc/10045/fd/2 from .
    Студент has logged on pts/29 from 127.0.0.1.

% watch=(notme); log # ("Студент" record is not filtered)
    root has logged on /proc/10045/fd/2 from .
    Студент has logged on pts/29 from 127.0.0.1.

% watch=(Студент); log # ("Студент" record is missing)

Sent via Gmail

[-- Attachment #2: Type: text/html, Size: 1623 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: $watch, log and Cyrillic usernames
  2023-10-05 21:29 $watch, log and Cyrillic usernames Максим
@ 2023-10-05 21:42 ` Bart Schaefer
  2023-10-07  2:05 ` Oliver Kiddle
  1 sibling, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2023-10-05 21:42 UTC (permalink / raw)
  To: Максим; +Cc: zsh-workers

I'm going to leave this one to Oliver as he most recently fiddled with
the "watch" code (e.g., moved to a module) but I have to ask:  Is
there really such a thing as a time-sharing Cygwin system?  If not, of
what use is "watch"?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: $watch, log and Cyrillic usernames
  2023-10-05 21:29 $watch, log and Cyrillic usernames Максим
  2023-10-05 21:42 ` Bart Schaefer
@ 2023-10-07  2:05 ` Oliver Kiddle
  2023-10-07 16:49   ` Максим
  2023-10-08 21:47   ` metafy() (was Re: $watch, log and Cyrillic usernames) Oliver Kiddle
  1 sibling, 2 replies; 9+ messages in thread
From: Oliver Kiddle @ 2023-10-07  2:05 UTC (permalink / raw)
  To: Максим; +Cc: zsh-workers

Максим wrote:
> Hello again. I found another bug with cyrillic usernames in zsh (again on
> Cygwin, but can be reproduced on Linux)

Reproducing does involve bypassing utilities like useradd which complain
about invalid usernames. But I can imagine such rules will increasingly
be relaxed and there's no reason for zsh to make assumptions.

> % watch=(Студент); log # ("Студент" record is missing)

The value from $watch is metafied and that's what patcompile() and
pattry() need so the fix below uses metafy() on the username from
utmp.

However, in looking closer at the code I observed the existing use of
sizeof(u->ut_name) which is 32 on my system. So I tried creating 32 and
33 character usernames (which, incidentally, useradd was happy with) and
as I suspected u->ut_name is not null-terminated for these. So the patch
uses strnlen() with the sizeof() for n to get the length to pass
to metafy(). We have no existing uses of strnlen() but I don't foresee
portability issues given that it is attributed to the 2008 POSIX
standard and is supported in Solaris 10 which is from a few years prior
to that. If needed, it'd be easy to provide an alternative
implementation.

To match the 33 character username, it does need to be truncated in
$watch. last -w does manage to print the full username, would be good
to know how. For the hostname, our code was using strlen() rather than
sizeof(). I can't see why this would be needed. I would have tried
putting UTF-8 in my hosts file to test that that but I'm only getting IP
addresses in utmp. I guess we could do reverse lookups but it hardly
seems worth it for the amount of use watch/log likely get these days.

The example also uses an uppercase letter. Usernames on Unix are
case-sensitive but it wouldn't surprise me if they aren't on Cygwin.
If so, should we add #ifdefs for that?

Oliver

diff --git a/Src/Modules/watch.c b/Src/Modules/watch.c
index 0de8cbf9a..2ad962fb6 100644
--- a/Src/Modules/watch.c
+++ b/Src/Modules/watch.c
@@ -423,20 +423,22 @@ watchlog2(int inout, WATCH_STRUCT_UTMP *u, char *fmt, int prnt, int fini)
 /* See if the watch entry matches */
 
 static int
-watchlog_match(char *teststr, char *actual, int len)
+watchlog_match(char *teststr, char *actual, size_t buflen)
 {
     int ret = 0;
     Patprog pprog;
     char *str = dupstring(teststr);
+    int len = strnlen(actual, buflen);
+    char *user = metafy(actual, len, META_USEHEAP);
 
     tokenize(str);
 
     if ((pprog = patcompile(str, PAT_STATIC, 0))) {
 	queue_signals();
-	if (pattry(pprog, actual))
+	if (pattry(pprog, user))
 	    ret = 1;
 	unqueue_signals();
-    } else if (!strncmp(actual, teststr, len))
+    } else if (!strcmp(user, teststr))
 	ret = 1;
     return ret;
 }
@@ -488,7 +490,7 @@ watchlog(int inout, WATCH_STRUCT_UTMP *u, char **w, char *fmt)
 		for (vv = ++v; *vv && *vv != '%'; vv++);
 		sav = *vv;
 		*vv = '\0';
-		if (!watchlog_match(v, u->ut_host, strlen(v)))
+		if (!watchlog_match(v, u->ut_host, sizeof(u->ut_host)))
 		    bad = 1;
 		*vv = sav;
 		v = vv;


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: $watch, log and Cyrillic usernames
  2023-10-07  2:05 ` Oliver Kiddle
@ 2023-10-07 16:49   ` Максим
  2023-10-08  0:24     ` Oliver Kiddle
  2023-10-08 21:47   ` metafy() (was Re: $watch, log and Cyrillic usernames) Oliver Kiddle
  1 sibling, 1 reply; 9+ messages in thread
From: Максим @ 2023-10-07 16:49 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 4633 bytes --]

Found out that there is no need in bypassing "adduser", as this command
works just fine:

% adduser --allow-all-names --home /root --shell /usr/bin/zsh Плохой

So it is actually possible to create user with unicode username in a
correct way (at least on Linux).


Now about the patch:

% watch=(all); log
    root has logged on /proc/10045/fd/2 from .
    Студент has logged on pts/29 from 127.0.0.1.

% watch=(notme); log # Broken
    root has logged on /proc/10045/fd/2 from .
    Студент has logged on pts/29 from 127.0.0.1.

% watch=(Студент); log
    Студент has logged on pts/29 from 127.0.0.1.


Also tested "too long" username:

% echo ${#USERNAME}
    33

% watch=(all); log
    root has logged on /proc/10045/fd/2 from .
    oooooooooooooooooooooooooooooooo has logged on pts/29 from 127.0.0.1.

% watch=(notme); log
    root has logged on /proc/10045/fd/2 from .

% watch=(${(l:33::o:):-}); log

% watch=(${(l:32::o:):-}); log
    oooooooooooooooooooooooooooooooo has logged on pts/29 from .


And no, Cygwin usernames are case-sensitive.

Sent via Gmail

сб, 7 окт. 2023 г., 05:05 Oliver Kiddle <opk@zsh.org>:

> Максим wrote:
> > Hello again. I found another bug with cyrillic usernames in zsh (again on
> > Cygwin, but can be reproduced on Linux)
>
> Reproducing does involve bypassing utilities like useradd which complain
> about invalid usernames. But I can imagine such rules will increasingly
> be relaxed and there's no reason for zsh to make assumptions.
>
> > % watch=(Студент); log # ("Студент" record is missing)
>
> The value from $watch is metafied and that's what patcompile() and
> pattry() need so the fix below uses metafy() on the username from
> utmp.
>
> However, in looking closer at the code I observed the existing use of
> sizeof(u->ut_name) which is 32 on my system. So I tried creating 32 and
> 33 character usernames (which, incidentally, useradd was happy with) and
> as I suspected u->ut_name is not null-terminated for these. So the patch
> uses strnlen() with the sizeof() for n to get the length to pass
> to metafy(). We have no existing uses of strnlen() but I don't foresee
> portability issues given that it is attributed to the 2008 POSIX
> standard and is supported in Solaris 10 which is from a few years prior
> to that. If needed, it'd be easy to provide an alternative
> implementation.
>
> To match the 33 character username, it does need to be truncated in
> $watch. last -w does manage to print the full username, would be good
> to know how. For the hostname, our code was using strlen() rather than
> sizeof(). I can't see why this would be needed. I would have tried
> putting UTF-8 in my hosts file to test that that but I'm only getting IP
> addresses in utmp. I guess we could do reverse lookups but it hardly
> seems worth it for the amount of use watch/log likely get these days.
>
> The example also uses an uppercase letter. Usernames on Unix are
> case-sensitive but it wouldn't surprise me if they aren't on Cygwin.
> If so, should we add #ifdefs for that?
>
> Oliver
>
> diff --git a/Src/Modules/watch.c b/Src/Modules/watch.c
> index 0de8cbf9a..2ad962fb6 100644
> --- a/Src/Modules/watch.c
> +++ b/Src/Modules/watch.c
> @@ -423,20 +423,22 @@ watchlog2(int inout, WATCH_STRUCT_UTMP *u, char
> *fmt, int prnt, int fini)
>  /* See if the watch entry matches */
>
>  static int
> -watchlog_match(char *teststr, char *actual, int len)
> +watchlog_match(char *teststr, char *actual, size_t buflen)
>  {
>      int ret = 0;
>      Patprog pprog;
>      char *str = dupstring(teststr);
> +    int len = strnlen(actual, buflen);
> +    char *user = metafy(actual, len, META_USEHEAP);
>
>      tokenize(str);
>
>      if ((pprog = patcompile(str, PAT_STATIC, 0))) {
>         queue_signals();
> -       if (pattry(pprog, actual))
> +       if (pattry(pprog, user))
>             ret = 1;
>         unqueue_signals();
> -    } else if (!strncmp(actual, teststr, len))
> +    } else if (!strcmp(user, teststr))
>         ret = 1;
>      return ret;
>  }
> @@ -488,7 +490,7 @@ watchlog(int inout, WATCH_STRUCT_UTMP *u, char **w,
> char *fmt)
>                 for (vv = ++v; *vv && *vv != '%'; vv++);
>                 sav = *vv;
>                 *vv = '\0';
> -               if (!watchlog_match(v, u->ut_host, strlen(v)))
> +               if (!watchlog_match(v, u->ut_host, sizeof(u->ut_host)))
>                     bad = 1;
>                 *vv = sav;
>                 v = vv;
>

[-- Attachment #2: Type: text/html, Size: 6666 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: $watch, log and Cyrillic usernames
  2023-10-07 16:49   ` Максим
@ 2023-10-08  0:24     ` Oliver Kiddle
  2023-10-08  6:20       ` Максим
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver Kiddle @ 2023-10-08  0:24 UTC (permalink / raw)
  To: Максим; +Cc: zsh-workers

Максим wrote:
> Found out that there is no need in bypassing "adduser", as this command works
> just fine:

These utilities can be quite distribution specific. There is no adduser
command on my system but the old manual approach is always possible. I
also looked into how last is able to print longer usernames and it's
using sqlite and a library for access. The 32 char limit is specific to
the getutxent() interface and could be different on other systems.

> % watch=(notme); log # Broken
>     root has logged on /proc/10045/fd/2 from .
>     Студент has logged on pts/29 from 127.0.0.1.

Thanks for testing. "notme" is handled separately and the patch below
should be applied on top of the previous one.

The w++ line is a minor change from the old behaviour - moving on from
a notme entry. An alternative would be to move the return from just
above to that line.

The code is arguably now wasting time and memory metafying the same
strings multiple times. We could convert the whole utmp structure
up-front. Not sure that's worth the bother, though (?).

Oliver

diff --git a/Src/Modules/watch.c b/Src/Modules/watch.c
index 2ad962fb6..d30f5ff98 100644
--- a/Src/Modules/watch.c
+++ b/Src/Modules/watch.c
@@ -458,10 +458,14 @@ watchlog(int inout, WATCH_STRUCT_UTMP *u, char **w, char *fmt)
 	(void)watchlog2(inout, u, fmt, 1, 0);
 	return;
     }
-    if (*w && !strcmp(*w, "notme") &&
-	strncmp(u->ut_name, get_username(), sizeof(u->ut_name))) {
-	(void)watchlog2(inout, u, fmt, 1, 0);
-	return;
+    if (*w && !strcmp(*w, "notme")) {
+        char *username = metafy(u->ut_name,
+		strnlen(u->ut_name, sizeof(u->ut_name)), META_USEHEAP);
+	if (strcmp(username, get_username())) {
+	    (void)watchlog2(inout, u, fmt, 1, 0);
+	    return;
+	}
+	w++;
     }
     for (; *w; w++) {
 	bad = 0;


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: $watch, log and Cyrillic usernames
  2023-10-08  0:24     ` Oliver Kiddle
@ 2023-10-08  6:20       ` Максим
  0 siblings, 0 replies; 9+ messages in thread
From: Максим @ 2023-10-08  6:20 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 2455 bytes --]

Tested:

% watch=(all); log
    root has logged on /proc/10045/fd/2 from .
    Студент has logged on pts/29 from 127.0.0.1.

% watch=(notme); log
    root has logged on /proc/10045/fd/2 from .

% watch=(Студент); log
    Студент has logged on pts/29 from 127.0.0.1.

Everything works now, thanks

Sent via Gmail

вс, 8 окт. 2023 г., 03:24 Oliver Kiddle <opk@zsh.org>:

> Максим wrote:
> > Found out that there is no need in bypassing "adduser", as this command
> works
> > just fine:
>
> These utilities can be quite distribution specific. There is no adduser
> command on my system but the old manual approach is always possible. I
> also looked into how last is able to print longer usernames and it's
> using sqlite and a library for access. The 32 char limit is specific to
> the getutxent() interface and could be different on other systems.
>
> > % watch=(notme); log # Broken
> >     root has logged on /proc/10045/fd/2 from .
> >     Студент has logged on pts/29 from 127.0.0.1.
>
> Thanks for testing. "notme" is handled separately and the patch below
> should be applied on top of the previous one.
>
> The w++ line is a minor change from the old behaviour - moving on from
> a notme entry. An alternative would be to move the return from just
> above to that line.
>
> The code is arguably now wasting time and memory metafying the same
> strings multiple times. We could convert the whole utmp structure
> up-front. Not sure that's worth the bother, though (?).
>
> Oliver
>
> diff --git a/Src/Modules/watch.c b/Src/Modules/watch.c
> index 2ad962fb6..d30f5ff98 100644
> --- a/Src/Modules/watch.c
> +++ b/Src/Modules/watch.c
> @@ -458,10 +458,14 @@ watchlog(int inout, WATCH_STRUCT_UTMP *u, char **w,
> char *fmt)
>         (void)watchlog2(inout, u, fmt, 1, 0);
>         return;
>      }
> -    if (*w && !strcmp(*w, "notme") &&
> -       strncmp(u->ut_name, get_username(), sizeof(u->ut_name))) {
> -       (void)watchlog2(inout, u, fmt, 1, 0);
> -       return;
> +    if (*w && !strcmp(*w, "notme")) {
> +        char *username = metafy(u->ut_name,
> +               strnlen(u->ut_name, sizeof(u->ut_name)), META_USEHEAP);
> +       if (strcmp(username, get_username())) {
> +           (void)watchlog2(inout, u, fmt, 1, 0);
> +           return;
> +       }
> +       w++;
>      }
>      for (; *w; w++) {
>         bad = 0;
>

[-- Attachment #2: Type: text/html, Size: 3471 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* metafy() (was Re: $watch, log and Cyrillic usernames)
  2023-10-07  2:05 ` Oliver Kiddle
  2023-10-07 16:49   ` Максим
@ 2023-10-08 21:47   ` Oliver Kiddle
  2023-10-09  2:01     ` Bart Schaefer
  1 sibling, 1 reply; 9+ messages in thread
From: Oliver Kiddle @ 2023-10-08 21:47 UTC (permalink / raw)
  To: zsh-workers

I wrote:
> However, in looking closer at the code I observed the existing use of
> sizeof(u->ut_name) which is 32 on my system. So I tried creating 32 and
> 33 character usernames (which, incidentally, useradd was happy with) and

It occurred to be to double check what metafy() does about null
termination with META_USEHEAP. The comment by META_USEHEAP is "get
memory from the heap. This leaves buf unchanged". The main way it
differs from META_HEAPDUP is that if no characters that need metafying
are found, it will return back the original passed buf. However, it does
add a terminating null at the len + 1 position so while the buf pointer is
unchanged, what it points to does get changed.

There aren't especially many calls to metafy with META_USEHEAP and in
most cases, the call uses the result of getkeystring(). I
noticed one case where we do need to add 1 byte to an allocation to
accomodate this null.

I'm not sure what the best approach is for the watch module. Subtracting 1
from n in each call to strnlen() avoids writing a null past the end of
the buffer but is not ideal for 32 character usernames. Using
META_HEAPDUP instead means a lot of heap allocations in the normal case
where there are only short ASCII-only usernames. Any ideas?

Oliver

diff --git a/Src/subst.c b/Src/subst.c
index cdbfc138a..60d850feb 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -1506,7 +1506,7 @@ substevalchar(char *ptr)
     }
 #ifdef MULTIBYTE_SUPPORT
     else if (isset(MULTIBYTE) && ires > 127) {
-	ptr = zhalloc(MB_CUR_MAX);
+	ptr = zhalloc(MB_CUR_MAX+1);
 	len = ucs4tomb((unsigned int)ires & 0xffffffff, ptr);
     }
     if (len <= 0)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: metafy() (was Re: $watch, log and Cyrillic usernames)
  2023-10-08 21:47   ` metafy() (was Re: $watch, log and Cyrillic usernames) Oliver Kiddle
@ 2023-10-09  2:01     ` Bart Schaefer
  2023-10-10 21:46       ` Oliver Kiddle
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2023-10-09  2:01 UTC (permalink / raw)
  To: zsh-workers

On Sun, Oct 8, 2023 at 2:47 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> I'm not sure what the best approach is for the watch module. Subtracting 1
> from n in each call to strnlen() avoids writing a null past the end of
> the buffer but is not ideal for 32 character usernames. Using
> META_HEAPDUP instead means a lot of heap allocations in the normal case
> where there are only short ASCII-only usernames. Any ideas?

Handle the special case explicitly?  If I follow your explanation,
something like this?

        int len = strnlen(u->ut_name, sizeof(u->ut_name));
        char *username = metafy(u->ut_name, len,
                                (len == sizeof(u->ut_name) ?
                                 META_HEAPDUP /* allow for nul terminator */ :
                                 META_USEHEAP));


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: metafy() (was Re: $watch, log and Cyrillic usernames)
  2023-10-09  2:01     ` Bart Schaefer
@ 2023-10-10 21:46       ` Oliver Kiddle
  0 siblings, 0 replies; 9+ messages in thread
From: Oliver Kiddle @ 2023-10-10 21:46 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

Bart Schaefer wrote:
> Handle the special case explicitly?  If I follow your explanation,
> something like this?

Thanks. That works.

Full patch follows.

Oliver

diff --git a/Src/Modules/watch.c b/Src/Modules/watch.c
index 0de8cbf9a..97d4fa608 100644
--- a/Src/Modules/watch.c
+++ b/Src/Modules/watch.c
@@ -423,20 +423,23 @@ watchlog2(int inout, WATCH_STRUCT_UTMP *u, char *fmt, int prnt, int fini)
 /* See if the watch entry matches */
 
 static int
-watchlog_match(char *teststr, char *actual, int len)
+watchlog_match(char *teststr, char *actual, size_t buflen)
 {
     int ret = 0;
     Patprog pprog;
     char *str = dupstring(teststr);
+    int len = strnlen(actual, buflen);
+    char *user = metafy(actual, len,
+	    len == buflen ? META_HEAPDUP : META_USEHEAP);
 
     tokenize(str);
 
     if ((pprog = patcompile(str, PAT_STATIC, 0))) {
 	queue_signals();
-	if (pattry(pprog, actual))
+	if (pattry(pprog, user))
 	    ret = 1;
 	unqueue_signals();
-    } else if (!strncmp(actual, teststr, len))
+    } else if (!strcmp(user, teststr))
 	ret = 1;
     return ret;
 }
@@ -456,10 +459,17 @@ watchlog(int inout, WATCH_STRUCT_UTMP *u, char **w, char *fmt)
 	(void)watchlog2(inout, u, fmt, 1, 0);
 	return;
     }
-    if (*w && !strcmp(*w, "notme") &&
-	strncmp(u->ut_name, get_username(), sizeof(u->ut_name))) {
-	(void)watchlog2(inout, u, fmt, 1, 0);
-	return;
+    if (*w && !strcmp(*w, "notme")) {
+	int len = strnlen(u->ut_name, sizeof(u->ut_name));
+	char *username = metafy(u->ut_name, len,
+				(len == sizeof(u->ut_name) ?
+				 META_HEAPDUP /* allow for nul terminator */ :
+				 META_USEHEAP));
+	if (strcmp(username, get_username())) {
+	    (void)watchlog2(inout, u, fmt, 1, 0);
+	    return;
+	}
+	w++;
     }
     for (; *w; w++) {
 	bad = 0;
@@ -488,7 +498,7 @@ watchlog(int inout, WATCH_STRUCT_UTMP *u, char **w, char *fmt)
 		for (vv = ++v; *vv && *vv != '%'; vv++);
 		sav = *vv;
 		*vv = '\0';
-		if (!watchlog_match(v, u->ut_host, strlen(v)))
+		if (!watchlog_match(v, u->ut_host, sizeof(u->ut_host)))
 		    bad = 1;
 		*vv = sav;
 		v = vv;


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-10-10 21:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-05 21:29 $watch, log and Cyrillic usernames Максим
2023-10-05 21:42 ` Bart Schaefer
2023-10-07  2:05 ` Oliver Kiddle
2023-10-07 16:49   ` Максим
2023-10-08  0:24     ` Oliver Kiddle
2023-10-08  6:20       ` Максим
2023-10-08 21:47   ` metafy() (was Re: $watch, log and Cyrillic usernames) Oliver Kiddle
2023-10-09  2:01     ` Bart Schaefer
2023-10-10 21:46       ` Oliver Kiddle

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).