mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] realpath without procfs -- should be ready for inclusion
@ 2020-11-22 22:56 Rich Felker
  2020-11-23  2:03 ` Alexey Izbyshev
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-11-22 22:56 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1218 bytes --]

I put this aside for a while, but realpath without procfs was one of
the things I wanted to get in this release cycle, and I had it
essentially finished back in September. Here's the latest version.
Compared to the earlier draft, it handles // more consistently (//.
and //.. both resolve to //) and expands getcwd only as a final step
if needed, rather than doing it first. In the case where an absolute
symlink is reached via a relative path, this saves a syscall, avoids
repeated absolute path traversal at each step, and avoids spuriously
erroring out with ENAMETOOLONG when the working directory is deeper
than PATH_MAX but the absolute link target has a valid absolute
pathname.

To recap, the motivation for the rewrite without procfs dependency was
having things work before procfs is mounted or in chroot/namespace
where it's intentionally unavailable.

I originally considered keeping the procfs based version and only
using the new one as a fallback, but I discovered there are cases
(involving chroot, namespaces, etc.) where the answer from procfs is
wrong and validating it requires basically the same procedure as
implementing it manually (walking and performing readlink on each path
component).

Rich

[-- Attachment #2: realpath8.c --]
[-- Type: text/plain, Size: 2392 bytes --]

#define _GNU_SOURCE
#include <stdlib.h>
#include <limits.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>

static inline int at_dotdot(const char *end, size_t len)
{
	if (len<2) return 0;
	if (len>2 && end[-3]!='/') return 0;
	return end[-1]=='.' && end[-2]=='.';
}

char *realpath(const char *restrict filename, char *restrict resolved)
{
	char stack[PATH_MAX];
	char buf[resolved ? 1 : PATH_MAX];
	char *output = resolved ? resolved : buf;
	size_t p, q, l, cnt=0;

	l = strnlen(filename, sizeof stack + 1);
	if (!l) {
		errno = ENOENT;
		return 0;
	}
	if (l >= sizeof stack) goto toolong;
	p = sizeof stack - l - 1;
	q = 0;
	memcpy(stack+p, filename, l+1);

	while (stack[p]) {
		int up = 0;
		if (stack[p] == '/') {
			q=0;
			output[q++] = '/';
			p++;
			/* Initial // is special. */
			if (stack[p] == '/' && stack[p+1] != '/') {
				output[q++] = '/';
			}
			while (stack[p] == '/') p++;
			continue;
		}
		char *z = __strchrnul(stack+p, '/');
		l = z-(stack+p);
		if (l==1 && stack[p]=='.') {
			p += l;
			while (stack[p] == '/') p++;
			continue;
		}
		if (at_dotdot(stack+p+l, l)) {
			if (q && !at_dotdot(output+q, q)) {
				while(q && output[q-1]!='/') q--;
				if (q>1 && (q>2 || output[0]!='/')) q--;
				p += l;
				while (stack[p] == '/') p++;
				continue;
			}
			up = 1;
		}
		if (q && output[q-1] != '/') {
			if (!p) goto toolong;
			stack[--p] = '/';
			l++;
		}
		if (q+l >= PATH_MAX) goto toolong;
		memcpy(output+q, stack+p, l);
		output[q+l] = 0;
		p += l;
		if (up) goto notlink;
		ssize_t k = readlink(output, stack, p);
		if (k==-1) {
			if (errno == EINVAL) {
notlink:
				q += l;
				while (stack[p] == '/') p++;
				continue;
			}
			return 0;
		}
		if (k==p) goto toolong;
		if (++cnt == SYMLOOP_MAX) {
			errno = ELOOP;
			return 0;
		}
		p -= k;
		memmove(stack+p, stack, k);
	}

	output[q] = 0;

	if (output[0] != '/') {
		if (!getcwd(stack, sizeof stack)) return 0;
		l = strlen(stack);
		/* Cancel any initial .. components. */
		p = 0;
		while (q-p>=2 && at_dotdot(output+p+2, p+2)) {
			while(l>1 && stack[l-1]!='/') l--;
			if (l>1) l--;
			p += 2;
			if (p<q) p++;
		}
		if (q-p && stack[l-1]!='/') output[--p] = '/';
		if (l + (q-p) + 1 >= PATH_MAX) goto toolong;
		memmove(output + l, output + p, q - p + 1);
		memcpy(output, stack, l);
	}

	return resolved ? resolved : strdup(output);

toolong:
	errno = ENAMETOOLONG;
	return 0;
}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-22 22:56 [musl] realpath without procfs -- should be ready for inclusion Rich Felker
@ 2020-11-23  2:03 ` Alexey Izbyshev
  2020-11-23  3:17   ` Érico Nogueira
  2020-11-23  3:19   ` Rich Felker
  0 siblings, 2 replies; 20+ messages in thread
From: Alexey Izbyshev @ 2020-11-23  2:03 UTC (permalink / raw)
  To: musl

On 2020-11-23 01:56, Rich Felker wrote:
> I originally considered keeping the procfs based version and only
> using the new one as a fallback, but I discovered there are cases
> (involving chroot, namespaces, etc.) where the answer from procfs is
> wrong and validating it requires basically the same procedure as
> implementing it manually (walking and performing readlink on each path
> component).
> 
Pity that the simple and fast procfs-based implementation goes away. Do 
you have any specific example of a wrong answer from procfs at hand, or 
at least a more specific direction to look at than just 
"chroot/namespaces"?

> #define _GNU_SOURCE
> #include <stdlib.h>
> #include <limits.h>
> #include <errno.h>
> #include <unistd.h>
> #include <string.h>
> 
> static inline int at_dotdot(const char *end, size_t len)
> {
>    if (len<2) return 0;
>    if (len>2 && end[-3]!='/') return 0;
>    return end[-1]=='.' && end[-2]=='.';
> }
> 
> char *realpath(const char *restrict filename, char *restrict resolved)
> {
>    char stack[PATH_MAX];
>    char buf[resolved ? 1 : PATH_MAX];
>    char *output = resolved ? resolved : buf;
>    size_t p, q, l, cnt=0;
> 
>    l = strnlen(filename, sizeof stack + 1);

Why + 1?

>    if (!l) {
>        errno = ENOENT;
>        return 0;
>    }
>    if (l >= sizeof stack) goto toolong;
>    p = sizeof stack - l - 1;
>    q = 0;
>    memcpy(stack+p, filename, l+1);
> 
>    while (stack[p]) {
>        int up = 0;
>        if (stack[p] == '/') {
>            q=0;
>            output[q++] = '/';
>            p++;
>            /* Initial // is special. */
>            if (stack[p] == '/' && stack[p+1] != '/') {
>                output[q++] = '/';
>            }
>            while (stack[p] == '/') p++;
>            continue;
>        }
>        char *z = __strchrnul(stack+p, '/');
>        l = z-(stack+p);
>        if (l==1 && stack[p]=='.') {
>            p += l;
>            while (stack[p] == '/') p++;
>            continue;
>        }
>        if (at_dotdot(stack+p+l, l)) {
>            if (q && !at_dotdot(output+q, q)) {
>                while(q && output[q-1]!='/') q--;
>                if (q>1 && (q>2 || output[0]!='/')) q--;
>                p += l;
>                while (stack[p] == '/') p++;
>                continue;
>            }
>            up = 1;
>        }
>        if (q && output[q-1] != '/') {
>            if (!p) goto toolong;
>            stack[--p] = '/';
>            l++;
>        }
>        if (q+l >= PATH_MAX) goto toolong;
>        memcpy(output+q, stack+p, l);
>        output[q+l] = 0;
>        p += l;
>        if (up) goto notlink;
>        ssize_t k = readlink(output, stack, p);
>        if (k==-1) {
>            if (errno == EINVAL) {
> notlink:
>                q += l;
>                while (stack[p] == '/') p++;
>                continue;
>            }
>            return 0;
>        }
>        if (k==p) goto toolong;
>        if (++cnt == SYMLOOP_MAX) {
>            errno = ELOOP;
>            return 0;
>        }
>        p -= k;
>        memmove(stack+p, stack, k);

This isn't always correct if the symlink resolves to "/" because 
stack[p] might be '/', so two slashes will be preserved in the output. 
For example, "root/home" resolves to "//home" (where "root" -> "/").

>    }
> 
>    output[q] = 0;
> 
>    if (output[0] != '/') {
>        if (!getcwd(stack, sizeof stack)) return 0;
>        l = strlen(stack);
>        /* Cancel any initial .. components. */
>        p = 0;
>        while (q-p>=2 && at_dotdot(output+p+2, p+2)) {

This doesn't check that output+p+2 is the end of a path element, which 
is the prerequisite for at_dotdot(). So, for example, "..." resolves 
incorrectly.

>            while(l>1 && stack[l-1]!='/') l--;
>            if (l>1) l--;
>            p += 2;
>            if (p<q) p++;
>        }
>        if (q-p && stack[l-1]!='/') output[--p] = '/';
>        if (l + (q-p) + 1 >= PATH_MAX) goto toolong;
>        memmove(output + l, output + p, q - p + 1);
>        memcpy(output, stack, l);
>    }
> 
>    return resolved ? resolved : strdup(output);
> 
> toolong:
>    errno = ENAMETOOLONG;
>    return 0;
> }
> 

Alexey


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-23  2:03 ` Alexey Izbyshev
@ 2020-11-23  3:17   ` Érico Nogueira
  2020-11-23  3:34     ` Rich Felker
  2020-11-23  3:19   ` Rich Felker
  1 sibling, 1 reply; 20+ messages in thread
From: Érico Nogueira @ 2020-11-23  3:17 UTC (permalink / raw)
  To: musl

On Sun Nov 22, 2020 at 11:03 PM -03, Alexey Izbyshev wrote:
> On 2020-11-23 01:56, Rich Felker wrote:
> > I originally considered keeping the procfs based version and only
> > using the new one as a fallback, but I discovered there are cases
> > (involving chroot, namespaces, etc.) where the answer from procfs is
> > wrong and validating it requires basically the same procedure as
> > implementing it manually (walking and performing readlink on each path
> > component).
> > 
> Pity that the simple and fast procfs-based implementation goes away. Do
> you have any specific example of a wrong answer from procfs at hand, or
> at least a more specific direction to look at than just
> "chroot/namespaces"?

bubblewrap (when driven by Flatpak) is one such software. Void carries
a patch [1] with NetBSD's realpath impl to work around this. Without it,
launching flatpak applications sometimes didn't work at all.

- [1] https://github.com/void-linux/void-packages/blob/da86d30391e2b3535e8f9dfae452d2b362887e41/srcpkgs/bubblewrap/patches/realpath-workaround.patch

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-23  2:03 ` Alexey Izbyshev
  2020-11-23  3:17   ` Érico Nogueira
@ 2020-11-23  3:19   ` Rich Felker
  2020-11-23 18:56     ` Rich Felker
  2020-11-24  3:41     ` Alexey Izbyshev
  1 sibling, 2 replies; 20+ messages in thread
From: Rich Felker @ 2020-11-23  3:19 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Mon, Nov 23, 2020 at 05:03:25AM +0300, Alexey Izbyshev wrote:
> On 2020-11-23 01:56, Rich Felker wrote:
> >I originally considered keeping the procfs based version and only
> >using the new one as a fallback, but I discovered there are cases
> >(involving chroot, namespaces, etc.) where the answer from procfs is
> >wrong and validating it requires basically the same procedure as
> >implementing it manually (walking and performing readlink on each path
> >component).
> >
> Pity that the simple and fast procfs-based implementation goes away.
> Do you have any specific example of a wrong answer from procfs at
> hand, or at least a more specific direction to look at than just
> "chroot/namespaces"?

Assuming you even have procfs, the name read from readlink on procfs
will be relative to the real root of the mount namespace, not the
chroot. If the mount namespace has bind mounts over top of anything,
it can also give you a pathname that's no longer valid because
something is mounted over it.

There may be other ways this arises too. At the very least, you need
to do fstat/stat to match them up like we do now; otherwise you can
get wildly wrong results. But even if the stat matches up, it's still
possible that the resulting pathname is an absolute pathname outside
the chroot or behind the bind mount or whatever, but is also valid but
involving symlink traversal in when processed from in the current
process context. This means you return a result that does not satisfy
the contract to be symlink-free.

There's also a matter I didn't mention that the current code is wrong
in an unsafe way on per-O_PATH kernels. Other places we mitigate that
by using O_NOFOLLOW and O_NOCTTY to avoid *most* of the possible
unwanted side effects if opening an actual file on a kernel that
doesn't have O_PATH, but on realpath we specifically can't use
O_NOFOLLOW, and this makes it susceptible to tricking root (or any
user with read access) into opening device nodes, in ways that might
have side effects.

So, there are a lot of bad things about the current implementation.
Even the minor mitigations present now for some of them (the stat
check) along with the overhead (open/close) makes it questionable
whether it's faster for lots of inputs. For deep paths the new one is
probably slower, but for typical ones it's not as clear and I didn't
measure.


> 
> >#define _GNU_SOURCE
> >#include <stdlib.h>
> >#include <limits.h>
> >#include <errno.h>
> >#include <unistd.h>
> >#include <string.h>
> >
> >static inline int at_dotdot(const char *end, size_t len)
> >{
> >   if (len<2) return 0;
> >   if (len>2 && end[-3]!='/') return 0;
> >   return end[-1]=='.' && end[-2]=='.';
> >}
> >
> >char *realpath(const char *restrict filename, char *restrict resolved)
> >{
> >   char stack[PATH_MAX];
> >   char buf[resolved ? 1 : PATH_MAX];
> >   char *output = resolved ? resolved : buf;
> >   size_t p, q, l, cnt=0;
> >
> >   l = strnlen(filename, sizeof stack + 1);
> 
> Why + 1?

I was thinking it was to ensure that the largest possible result is
sufficient to detect ENAMETOOLONG condition, but even == PATH_MAX is
sufficient for that since PATH_MAX is a limit including null
termination. So I think the +1 can be removed.

> >   if (!l) {
> >       errno = ENOENT;
> >       return 0;
> >   }
> >   if (l >= sizeof stack) goto toolong;
> >   p = sizeof stack - l - 1;
> >   q = 0;
> >   memcpy(stack+p, filename, l+1);
> >
> >   while (stack[p]) {
> >       int up = 0;
> >       if (stack[p] == '/') {
> >           q=0;
> >           output[q++] = '/';
> >           p++;
> >           /* Initial // is special. */
> >           if (stack[p] == '/' && stack[p+1] != '/') {
> >               output[q++] = '/';
> >           }
> >           while (stack[p] == '/') p++;
> >           continue;
> >       }
> >       char *z = __strchrnul(stack+p, '/');
> >       l = z-(stack+p);
> >       if (l==1 && stack[p]=='.') {
> >           p += l;
> >           while (stack[p] == '/') p++;
> >           continue;
> >       }
> >       if (at_dotdot(stack+p+l, l)) {
> >           if (q && !at_dotdot(output+q, q)) {
> >               while(q && output[q-1]!='/') q--;
> >               if (q>1 && (q>2 || output[0]!='/')) q--;
> >               p += l;
> >               while (stack[p] == '/') p++;
> >               continue;
> >           }
> >           up = 1;
> >       }
> >       if (q && output[q-1] != '/') {
> >           if (!p) goto toolong;
> >           stack[--p] = '/';
> >           l++;
> >       }
> >       if (q+l >= PATH_MAX) goto toolong;
> >       memcpy(output+q, stack+p, l);
> >       output[q+l] = 0;
> >       p += l;
> >       if (up) goto notlink;
> >       ssize_t k = readlink(output, stack, p);
> >       if (k==-1) {
> >           if (errno == EINVAL) {
> >notlink:
> >               q += l;
> >               while (stack[p] == '/') p++;
> >               continue;
> >           }
> >           return 0;
> >       }
> >       if (k==p) goto toolong;
> >       if (++cnt == SYMLOOP_MAX) {
> >           errno = ELOOP;
> >           return 0;
> >       }
> >       p -= k;
> >       memmove(stack+p, stack, k);
> 
> This isn't always correct if the symlink resolves to "/" because
> stack[p] might be '/', so two slashes will be preserved in the
> output. For example, "root/home" resolves to "//home" (where "root"
> -> "/").

Thanks for catching that. I propose:

	if (stack[k-1]=='/') p++;

And that raises the point that k==0 should be handled, even though
Linux doesn't let you create such links, since in theory they could
come from an existing fs or from non-Linux kernels (see
https://lwn.net/Articles/551224/ for coverage of the topic). I propose
erroring out with ENOENT in this case right after the k==p check above
(since k==0 due to p==0 would bee toolong not ENOENT if it can
happen).

> >   output[q] = 0;
> >
> >   if (output[0] != '/') {
> >       if (!getcwd(stack, sizeof stack)) return 0;
> >       l = strlen(stack);
> >       /* Cancel any initial .. components. */
> >       p = 0;
> >       while (q-p>=2 && at_dotdot(output+p+2, p+2)) {
> 
> This doesn't check that output+p+2 is the end of a path element,
> which is the prerequisite for at_dotdot(). So, for example, "..."
> resolves incorrectly.

Thanks again. I don't see a really good way to reuse at_dotdot here,
do you? Probably just [0]=='.' && [1]=='.' && (![2] || [2]=='/') is
the cleanest.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-23  3:17   ` Érico Nogueira
@ 2020-11-23  3:34     ` Rich Felker
  0 siblings, 0 replies; 20+ messages in thread
From: Rich Felker @ 2020-11-23  3:34 UTC (permalink / raw)
  To: musl

On Mon, Nov 23, 2020 at 12:17:27AM -0300, Érico Nogueira wrote:
> On Sun Nov 22, 2020 at 11:03 PM -03, Alexey Izbyshev wrote:
> > On 2020-11-23 01:56, Rich Felker wrote:
> > > I originally considered keeping the procfs based version and only
> > > using the new one as a fallback, but I discovered there are cases
> > > (involving chroot, namespaces, etc.) where the answer from procfs is
> > > wrong and validating it requires basically the same procedure as
> > > implementing it manually (walking and performing readlink on each path
> > > component).
> > > 
> > Pity that the simple and fast procfs-based implementation goes away. Do
> > you have any specific example of a wrong answer from procfs at hand, or
> > at least a more specific direction to look at than just
> > "chroot/namespaces"?
> 
> bubblewrap (when driven by Flatpak) is one such software. Void carries
> a patch [1] with NetBSD's realpath impl to work around this. Without it,
> launching flatpak applications sometimes didn't work at all.
> 
> - [1] https://github.com/void-linux/void-packages/blob/da86d30391e2b3535e8f9dfae452d2b362887e41/srcpkgs/bubblewrap/patches/realpath-workaround.patch

FWIW this seems to be a reason for needing the real implementation
like proposed, but not a reason for getting rid of the proc-based code
path. The reasons for that I mostly covered in my reply.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-23  3:19   ` Rich Felker
@ 2020-11-23 18:56     ` Rich Felker
  2020-11-23 20:53       ` Rich Felker
  2020-11-24  3:41     ` Alexey Izbyshev
  1 sibling, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-11-23 18:56 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 6941 bytes --]

On Sun, Nov 22, 2020 at 10:19:33PM -0500, Rich Felker wrote:
> On Mon, Nov 23, 2020 at 05:03:25AM +0300, Alexey Izbyshev wrote:
> > On 2020-11-23 01:56, Rich Felker wrote:
> > >I originally considered keeping the procfs based version and only
> > >using the new one as a fallback, but I discovered there are cases
> > >(involving chroot, namespaces, etc.) where the answer from procfs is
> > >wrong and validating it requires basically the same procedure as
> > >implementing it manually (walking and performing readlink on each path
> > >component).
> > >
> > Pity that the simple and fast procfs-based implementation goes away.
> > Do you have any specific example of a wrong answer from procfs at
> > hand, or at least a more specific direction to look at than just
> > "chroot/namespaces"?
> 
> Assuming you even have procfs, the name read from readlink on procfs
> will be relative to the real root of the mount namespace, not the
> chroot. If the mount namespace has bind mounts over top of anything,
> it can also give you a pathname that's no longer valid because
> something is mounted over it.
> 
> There may be other ways this arises too. At the very least, you need
> to do fstat/stat to match them up like we do now; otherwise you can
> get wildly wrong results. But even if the stat matches up, it's still
> possible that the resulting pathname is an absolute pathname outside
> the chroot or behind the bind mount or whatever, but is also valid but
> involving symlink traversal in when processed from in the current
> process context. This means you return a result that does not satisfy
> the contract to be symlink-free.
> 
> There's also a matter I didn't mention that the current code is wrong
> in an unsafe way on per-O_PATH kernels. Other places we mitigate that
> by using O_NOFOLLOW and O_NOCTTY to avoid *most* of the possible
> unwanted side effects if opening an actual file on a kernel that
> doesn't have O_PATH, but on realpath we specifically can't use
> O_NOFOLLOW, and this makes it susceptible to tricking root (or any
> user with read access) into opening device nodes, in ways that might
> have side effects.
> 
> So, there are a lot of bad things about the current implementation.
> Even the minor mitigations present now for some of them (the stat
> check) along with the overhead (open/close) makes it questionable
> whether it's faster for lots of inputs. For deep paths the new one is
> probably slower, but for typical ones it's not as clear and I didn't
> measure.
> 
> 
> > 
> > >#define _GNU_SOURCE
> > >#include <stdlib.h>
> > >#include <limits.h>
> > >#include <errno.h>
> > >#include <unistd.h>
> > >#include <string.h>
> > >
> > >static inline int at_dotdot(const char *end, size_t len)
> > >{
> > >   if (len<2) return 0;
> > >   if (len>2 && end[-3]!='/') return 0;
> > >   return end[-1]=='.' && end[-2]=='.';
> > >}
> > >
> > >char *realpath(const char *restrict filename, char *restrict resolved)
> > >{
> > >   char stack[PATH_MAX];
> > >   char buf[resolved ? 1 : PATH_MAX];
> > >   char *output = resolved ? resolved : buf;
> > >   size_t p, q, l, cnt=0;
> > >
> > >   l = strnlen(filename, sizeof stack + 1);
> > 
> > Why + 1?
> 
> I was thinking it was to ensure that the largest possible result is
> sufficient to detect ENAMETOOLONG condition, but even == PATH_MAX is
> sufficient for that since PATH_MAX is a limit including null
> termination. So I think the +1 can be removed.
> 
> > >   if (!l) {
> > >       errno = ENOENT;
> > >       return 0;
> > >   }
> > >   if (l >= sizeof stack) goto toolong;
> > >   p = sizeof stack - l - 1;
> > >   q = 0;
> > >   memcpy(stack+p, filename, l+1);
> > >
> > >   while (stack[p]) {
> > >       int up = 0;
> > >       if (stack[p] == '/') {
> > >           q=0;
> > >           output[q++] = '/';
> > >           p++;
> > >           /* Initial // is special. */
> > >           if (stack[p] == '/' && stack[p+1] != '/') {
> > >               output[q++] = '/';
> > >           }
> > >           while (stack[p] == '/') p++;
> > >           continue;
> > >       }
> > >       char *z = __strchrnul(stack+p, '/');
> > >       l = z-(stack+p);
> > >       if (l==1 && stack[p]=='.') {
> > >           p += l;
> > >           while (stack[p] == '/') p++;
> > >           continue;
> > >       }
> > >       if (at_dotdot(stack+p+l, l)) {
> > >           if (q && !at_dotdot(output+q, q)) {
> > >               while(q && output[q-1]!='/') q--;
> > >               if (q>1 && (q>2 || output[0]!='/')) q--;
> > >               p += l;
> > >               while (stack[p] == '/') p++;
> > >               continue;
> > >           }
> > >           up = 1;
> > >       }
> > >       if (q && output[q-1] != '/') {
> > >           if (!p) goto toolong;
> > >           stack[--p] = '/';
> > >           l++;
> > >       }
> > >       if (q+l >= PATH_MAX) goto toolong;
> > >       memcpy(output+q, stack+p, l);
> > >       output[q+l] = 0;
> > >       p += l;
> > >       if (up) goto notlink;
> > >       ssize_t k = readlink(output, stack, p);
> > >       if (k==-1) {
> > >           if (errno == EINVAL) {
> > >notlink:
> > >               q += l;
> > >               while (stack[p] == '/') p++;
> > >               continue;
> > >           }
> > >           return 0;
> > >       }
> > >       if (k==p) goto toolong;
> > >       if (++cnt == SYMLOOP_MAX) {
> > >           errno = ELOOP;
> > >           return 0;
> > >       }
> > >       p -= k;
> > >       memmove(stack+p, stack, k);
> > 
> > This isn't always correct if the symlink resolves to "/" because
> > stack[p] might be '/', so two slashes will be preserved in the
> > output. For example, "root/home" resolves to "//home" (where "root"
> > -> "/").
> 
> Thanks for catching that. I propose:
> 
> 	if (stack[k-1]=='/') p++;
> 
> And that raises the point that k==0 should be handled, even though
> Linux doesn't let you create such links, since in theory they could
> come from an existing fs or from non-Linux kernels (see
> https://lwn.net/Articles/551224/ for coverage of the topic). I propose
> erroring out with ENOENT in this case right after the k==p check above
> (since k==0 due to p==0 would bee toolong not ENOENT if it can
> happen).
> 
> > >   output[q] = 0;
> > >
> > >   if (output[0] != '/') {
> > >       if (!getcwd(stack, sizeof stack)) return 0;
> > >       l = strlen(stack);
> > >       /* Cancel any initial .. components. */
> > >       p = 0;
> > >       while (q-p>=2 && at_dotdot(output+p+2, p+2)) {
> > 
> > This doesn't check that output+p+2 is the end of a path element,
> > which is the prerequisite for at_dotdot(). So, for example, "..."
> > resolves incorrectly.
> 
> Thanks again. I don't see a really good way to reuse at_dotdot here,
> do you? Probably just [0]=='.' && [1]=='.' && (![2] || [2]=='/') is
> the cleanest.

And here are the proposed fixes as a patch. This should be good now.
Perhaps should add some comments.

Rich

[-- Attachment #2: realpath-fixes.diff --]
[-- Type: text/plain, Size: 887 bytes --]

--- realpath8.c	2020-11-22 17:52:17.586481571 -0500
+++ realpath9.c	2020-11-23 13:55:06.808458893 -0500
@@ -19,7 +19,7 @@
 	char *output = resolved ? resolved : buf;
 	size_t p, q, l, cnt=0;
 
-	l = strnlen(filename, sizeof stack + 1);
+	l = strnlen(filename, sizeof stack);
 	if (!l) {
 		errno = ENOENT;
 		return 0;
@@ -80,11 +80,16 @@
 			return 0;
 		}
 		if (k==p) goto toolong;
+		if (!k) {
+			errno = ENOENT;
+			return 0;
+		}
 		if (++cnt == SYMLOOP_MAX) {
 			errno = ELOOP;
 			return 0;
 		}
 		p -= k;
+		if (stack[k-1]=='/') p++;
 		memmove(stack+p, stack, k);
 	}
 
@@ -95,7 +100,8 @@
 		l = strlen(stack);
 		/* Cancel any initial .. components. */
 		p = 0;
-		while (q-p>=2 && at_dotdot(output+p+2, p+2)) {
+		while (output[p]=='.' && output[p+1]=='.'
+		  && (!output[p+2] || output[p+2]=='/')) {
 			while(l>1 && stack[l-1]!='/') l--;
 			if (l>1) l--;
 			p += 2;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-23 18:56     ` Rich Felker
@ 2020-11-23 20:53       ` Rich Felker
  2020-11-24  3:39         ` Alexey Izbyshev
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-11-23 20:53 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Mon, Nov 23, 2020 at 01:56:33PM -0500, Rich Felker wrote:
> On Sun, Nov 22, 2020 at 10:19:33PM -0500, Rich Felker wrote:
> > On Mon, Nov 23, 2020 at 05:03:25AM +0300, Alexey Izbyshev wrote:
> > > On 2020-11-23 01:56, Rich Felker wrote:
> > > >I originally considered keeping the procfs based version and only
> > > >using the new one as a fallback, but I discovered there are cases
> > > >(involving chroot, namespaces, etc.) where the answer from procfs is
> > > >wrong and validating it requires basically the same procedure as
> > > >implementing it manually (walking and performing readlink on each path
> > > >component).
> > > >
> > > Pity that the simple and fast procfs-based implementation goes away.
> > > Do you have any specific example of a wrong answer from procfs at
> > > hand, or at least a more specific direction to look at than just
> > > "chroot/namespaces"?
> > 
> > Assuming you even have procfs, the name read from readlink on procfs
> > will be relative to the real root of the mount namespace, not the
> > chroot. If the mount namespace has bind mounts over top of anything,
> > it can also give you a pathname that's no longer valid because
> > something is mounted over it.
> > 
> > There may be other ways this arises too. At the very least, you need
> > to do fstat/stat to match them up like we do now; otherwise you can
> > get wildly wrong results. But even if the stat matches up, it's still
> > possible that the resulting pathname is an absolute pathname outside
> > the chroot or behind the bind mount or whatever, but is also valid but
> > involving symlink traversal in when processed from in the current
> > process context. This means you return a result that does not satisfy
> > the contract to be symlink-free.
> > 
> > There's also a matter I didn't mention that the current code is wrong
> > in an unsafe way on per-O_PATH kernels. Other places we mitigate that
> > by using O_NOFOLLOW and O_NOCTTY to avoid *most* of the possible
> > unwanted side effects if opening an actual file on a kernel that
> > doesn't have O_PATH, but on realpath we specifically can't use
> > O_NOFOLLOW, and this makes it susceptible to tricking root (or any
> > user with read access) into opening device nodes, in ways that might
> > have side effects.
> > 
> > So, there are a lot of bad things about the current implementation.
> > Even the minor mitigations present now for some of them (the stat
> > check) along with the overhead (open/close) makes it questionable
> > whether it's faster for lots of inputs. For deep paths the new one is
> > probably slower, but for typical ones it's not as clear and I didn't
> > measure.
> > 
> > 
> > > 
> > > >#define _GNU_SOURCE
> > > >#include <stdlib.h>
> > > >#include <limits.h>
> > > >#include <errno.h>
> > > >#include <unistd.h>
> > > >#include <string.h>
> > > >
> > > >static inline int at_dotdot(const char *end, size_t len)
> > > >{
> > > >   if (len<2) return 0;
> > > >   if (len>2 && end[-3]!='/') return 0;
> > > >   return end[-1]=='.' && end[-2]=='.';
> > > >}
> > > >
> > > >char *realpath(const char *restrict filename, char *restrict resolved)
> > > >{
> > > >   char stack[PATH_MAX];
> > > >   char buf[resolved ? 1 : PATH_MAX];
> > > >   char *output = resolved ? resolved : buf;
> > > >   size_t p, q, l, cnt=0;
> > > >
> > > >   l = strnlen(filename, sizeof stack + 1);
> > > 
> > > Why + 1?
> > 
> > I was thinking it was to ensure that the largest possible result is
> > sufficient to detect ENAMETOOLONG condition, but even == PATH_MAX is
> > sufficient for that since PATH_MAX is a limit including null
> > termination. So I think the +1 can be removed.
> > 
> > > >   if (!l) {
> > > >       errno = ENOENT;
> > > >       return 0;
> > > >   }
> > > >   if (l >= sizeof stack) goto toolong;
> > > >   p = sizeof stack - l - 1;
> > > >   q = 0;
> > > >   memcpy(stack+p, filename, l+1);
> > > >
> > > >   while (stack[p]) {
> > > >       int up = 0;
> > > >       if (stack[p] == '/') {
> > > >           q=0;
> > > >           output[q++] = '/';
> > > >           p++;
> > > >           /* Initial // is special. */
> > > >           if (stack[p] == '/' && stack[p+1] != '/') {
> > > >               output[q++] = '/';
> > > >           }
> > > >           while (stack[p] == '/') p++;
> > > >           continue;
> > > >       }
> > > >       char *z = __strchrnul(stack+p, '/');
> > > >       l = z-(stack+p);
> > > >       if (l==1 && stack[p]=='.') {
> > > >           p += l;
> > > >           while (stack[p] == '/') p++;
> > > >           continue;
> > > >       }
> > > >       if (at_dotdot(stack+p+l, l)) {
> > > >           if (q && !at_dotdot(output+q, q)) {
> > > >               while(q && output[q-1]!='/') q--;
> > > >               if (q>1 && (q>2 || output[0]!='/')) q--;
> > > >               p += l;
> > > >               while (stack[p] == '/') p++;
> > > >               continue;
> > > >           }
> > > >           up = 1;
> > > >       }
> > > >       if (q && output[q-1] != '/') {
> > > >           if (!p) goto toolong;
> > > >           stack[--p] = '/';
> > > >           l++;
> > > >       }
> > > >       if (q+l >= PATH_MAX) goto toolong;
> > > >       memcpy(output+q, stack+p, l);
> > > >       output[q+l] = 0;
> > > >       p += l;
> > > >       if (up) goto notlink;
> > > >       ssize_t k = readlink(output, stack, p);
> > > >       if (k==-1) {
> > > >           if (errno == EINVAL) {
> > > >notlink:
> > > >               q += l;
> > > >               while (stack[p] == '/') p++;
> > > >               continue;
> > > >           }
> > > >           return 0;
> > > >       }
> > > >       if (k==p) goto toolong;
> > > >       if (++cnt == SYMLOOP_MAX) {
> > > >           errno = ELOOP;
> > > >           return 0;
> > > >       }
> > > >       p -= k;
> > > >       memmove(stack+p, stack, k);
> > > 
> > > This isn't always correct if the symlink resolves to "/" because
> > > stack[p] might be '/', so two slashes will be preserved in the
> > > output. For example, "root/home" resolves to "//home" (where "root"
> > > -> "/").
> > 
> > Thanks for catching that. I propose:
> > 
> > 	if (stack[k-1]=='/') p++;
> > 
> > And that raises the point that k==0 should be handled, even though
> > Linux doesn't let you create such links, since in theory they could
> > come from an existing fs or from non-Linux kernels (see
> > https://lwn.net/Articles/551224/ for coverage of the topic). I propose
> > erroring out with ENOENT in this case right after the k==p check above
> > (since k==0 due to p==0 would bee toolong not ENOENT if it can
> > happen).
> > 
> > > >   output[q] = 0;
> > > >
> > > >   if (output[0] != '/') {
> > > >       if (!getcwd(stack, sizeof stack)) return 0;
> > > >       l = strlen(stack);
> > > >       /* Cancel any initial .. components. */
> > > >       p = 0;
> > > >       while (q-p>=2 && at_dotdot(output+p+2, p+2)) {
> > > 
> > > This doesn't check that output+p+2 is the end of a path element,
> > > which is the prerequisite for at_dotdot(). So, for example, "..."
> > > resolves incorrectly.
> > 
> > Thanks again. I don't see a really good way to reuse at_dotdot here,
> > do you? Probably just [0]=='.' && [1]=='.' && (![2] || [2]=='/') is
> > the cleanest.
> 
> And here are the proposed fixes as a patch. This should be good now.
> Perhaps should add some comments.
> 
> Rich

> --- realpath8.c	2020-11-22 17:52:17.586481571 -0500
> +++ realpath9.c	2020-11-23 13:55:06.808458893 -0500
> @@ -19,7 +19,7 @@
>  	char *output = resolved ? resolved : buf;
>  	size_t p, q, l, cnt=0;
>  
> -	l = strnlen(filename, sizeof stack + 1);
> +	l = strnlen(filename, sizeof stack);
>  	if (!l) {
>  		errno = ENOENT;
>  		return 0;
> @@ -80,11 +80,16 @@
>  			return 0;
>  		}
>  		if (k==p) goto toolong;
> +		if (!k) {
> +			errno = ENOENT;
> +			return 0;
> +		}
>  		if (++cnt == SYMLOOP_MAX) {
>  			errno = ELOOP;
>  			return 0;
>  		}
>  		p -= k;
> +		if (stack[k-1]=='/') p++;
>  		memmove(stack+p, stack, k);

This is wrong and needs further consideration.

>  	}
>  
> @@ -95,7 +100,8 @@
>  		l = strlen(stack);
>  		/* Cancel any initial .. components. */
>  		p = 0;
> -		while (q-p>=2 && at_dotdot(output+p+2, p+2)) {
> +		while (output[p]=='.' && output[p+1]=='.'
> +		  && (!output[p+2] || output[p+2]=='/')) {
>  			while(l>1 && stack[l-1]!='/') l--;
>  			if (l>1) l--;
>  			p += 2;

OK, I have a better improvement for this: counting the number of
levels of .. as they're built at the head of output. Then it's just
while (nup--) here, and the condition for canceling .. in the first
loop no longer needs any string inspection; it's just (q>3*nup).

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-23 20:53       ` Rich Felker
@ 2020-11-24  3:39         ` Alexey Izbyshev
  2020-11-24  4:26           ` Rich Felker
  0 siblings, 1 reply; 20+ messages in thread
From: Alexey Izbyshev @ 2020-11-24  3:39 UTC (permalink / raw)
  To: musl

On 2020-11-23 23:53, Rich Felker wrote:
> On Mon, Nov 23, 2020 at 01:56:33PM -0500, Rich Felker wrote:
>> On Sun, Nov 22, 2020 at 10:19:33PM -0500, Rich Felker wrote:
>> --- realpath8.c	2020-11-22 17:52:17.586481571 -0500
>> +++ realpath9.c	2020-11-23 13:55:06.808458893 -0500
>> @@ -19,7 +19,7 @@
>>  	char *output = resolved ? resolved : buf;
>>  	size_t p, q, l, cnt=0;
>> 
>> -	l = strnlen(filename, sizeof stack + 1);
>> +	l = strnlen(filename, sizeof stack);
>>  	if (!l) {
>>  		errno = ENOENT;
>>  		return 0;
>> @@ -80,11 +80,16 @@
>>  			return 0;
>>  		}
>>  		if (k==p) goto toolong;
>> +		if (!k) {
>> +			errno = ENOENT;
>> +			return 0;
>> +		}
>>  		if (++cnt == SYMLOOP_MAX) {
>>  			errno = ELOOP;
>>  			return 0;
>>  		}
>>  		p -= k;
>> +		if (stack[k-1]=='/') p++;
>>  		memmove(stack+p, stack, k);
> 
> This is wrong and needs further consideration.
> 
Yes, now memmove() overwrites NUL if p was at the end and stack[k-1] == 
'/'. Is it true per POSIX that "rr/home" must resolve to "//home" if 
"rr" -> "//"? If so, maybe something like the following instead:

+               while (stack[p] == '/') p++;
+               if (stack[p] && stack[k-1] != '/') p--;
                 p -= k;
-               if (stack[k-1]=='/') p++;

>>  	}
>> 
>> @@ -95,7 +100,8 @@
>>  		l = strlen(stack);
>>  		/* Cancel any initial .. components. */
>>  		p = 0;
>> -		while (q-p>=2 && at_dotdot(output+p+2, p+2)) {
>> +		while (output[p]=='.' && output[p+1]=='.'
>> +		  && (!output[p+2] || output[p+2]=='/')) {
>>  			while(l>1 && stack[l-1]!='/') l--;
>>  			if (l>1) l--;
>>  			p += 2;
> 
> OK, I have a better improvement for this: counting the number of
> levels of .. as they're built at the head of output. Then it's just
> while (nup--) here, and the condition for canceling .. in the first
> loop no longer needs any string inspection; it's just (q>3*nup).
> 
Sounds good.

I've missed the last time that the immediately following code is also 
broken:

>               if (q-p && stack[l-1]!='/') output[--p] = '/';

It will underflow the output in case of a simple relative path that 
doesn't start with "..".

I've also noticed other issues to be fixed, per POSIX:

* ENOENT should be returned if filename is NULL

* ENOTDIR should be returned if the last component is not a directory  
and the path has one or more trailing slashes

Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-23  3:19   ` Rich Felker
  2020-11-23 18:56     ` Rich Felker
@ 2020-11-24  3:41     ` Alexey Izbyshev
  1 sibling, 0 replies; 20+ messages in thread
From: Alexey Izbyshev @ 2020-11-24  3:41 UTC (permalink / raw)
  To: musl

On 2020-11-23 06:19, Rich Felker wrote:
> On Mon, Nov 23, 2020 at 05:03:25AM +0300, Alexey Izbyshev wrote:
>> On 2020-11-23 01:56, Rich Felker wrote:
>> >I originally considered keeping the procfs based version and only
>> >using the new one as a fallback, but I discovered there are cases
>> >(involving chroot, namespaces, etc.) where the answer from procfs is
>> >wrong and validating it requires basically the same procedure as
>> >implementing it manually (walking and performing readlink on each path
>> >component).
>> >
>> Pity that the simple and fast procfs-based implementation goes away.
>> Do you have any specific example of a wrong answer from procfs at
>> hand, or at least a more specific direction to look at than just
>> "chroot/namespaces"?
> 
> Assuming you even have procfs, the name read from readlink on procfs
> will be relative to the real root of the mount namespace, not the
> chroot.

Probably I misunderstand what you mean, but when I tried a 
straightforward test on a 5.4-based kernel (with procfs-based musl, of 
course), I failed:

$ mkdir -p chr/proc
$ cat test.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
   puts(realpath(argv[1], NULL));
}
$ musl-gcc -static -std=gnu99 test.c -o chr/a.out
$ unshare -Urm
# mount --rbind /proc chr/proc
# chroot chr ./a.out ./a.out
/a.out

> If the mount namespace has bind mounts over top of anything,
> it can also give you a pathname that's no longer valid because
> something is mounted over it.
> 
I see how this can be the case if a bind mount was created on top of 
something *after* musl opened the fd in realpath(), but in presence of 
concurrent FS modifications the new implementation can return a stale 
result too.

> There may be other ways this arises too. At the very least, you need
> to do fstat/stat to match them up like we do now; otherwise you can
> get wildly wrong results. But even if the stat matches up, it's still
> possible that the resulting pathname is an absolute pathname outside
> the chroot or behind the bind mount or whatever, but is also valid but
> involving symlink traversal in when processed from in the current
> process context. This means you return a result that does not satisfy
> the contract to be symlink-free.
> 
If a path that is not meaningful in the context of the current process 
can be returned by the kernel in absence of concurrent FS modifications, 
it is disturbing. The kernel code for open() does more-or-less the same 
as the new realpath() implementation, and it's not clear to me what goes 
wrong when the kernel walks the parent chain back to rebuild the path 
(and AFAIR, the kernel does remember the chain of mount points that it 
traversed while resolving a path, and it certainly does account for 
chroot).

> There's also a matter I didn't mention that the current code is wrong
> in an unsafe way on per-O_PATH kernels. Other places we mitigate that
> by using O_NOFOLLOW and O_NOCTTY to avoid *most* of the possible
> unwanted side effects if opening an actual file on a kernel that
> doesn't have O_PATH, but on realpath we specifically can't use
> O_NOFOLLOW, and this makes it susceptible to tricking root (or any
> user with read access) into opening device nodes, in ways that might
> have side effects.
> 
Ugh, yes, pre-O_PATH kernels are a problem. Apart from what you 
described, you can't open a non-readable file in a readable directory 
without O_PATH, and you'll have trouble with Unix domain sockets as 
well. But at least this class of problems will go away with time.

> So, there are a lot of bad things about the current implementation.
> Even the minor mitigations present now for some of them (the stat
> check) along with the overhead (open/close) makes it questionable
> whether it's faster for lots of inputs. For deep paths the new one is
> probably slower, but for typical ones it's not as clear and I didn't
> measure.
> 
Thanks for the detailed response. While I was aware about problems with 
procfs-based path resolution for long-lived fds (i.e. when FS tree 
changes after the file was opened), I'm a bit baffled that, according to 
your description, things can go wrong even in realpath() use case.

Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-24  3:39         ` Alexey Izbyshev
@ 2020-11-24  4:26           ` Rich Felker
  2020-11-24  5:13             ` Alexey Izbyshev
  2020-11-24 20:31             ` Rich Felker
  0 siblings, 2 replies; 20+ messages in thread
From: Rich Felker @ 2020-11-24  4:26 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Tue, Nov 24, 2020 at 06:39:59AM +0300, Alexey Izbyshev wrote:
> On 2020-11-23 23:53, Rich Felker wrote:
> >On Mon, Nov 23, 2020 at 01:56:33PM -0500, Rich Felker wrote:
> >>On Sun, Nov 22, 2020 at 10:19:33PM -0500, Rich Felker wrote:
> >>--- realpath8.c	2020-11-22 17:52:17.586481571 -0500
> >>+++ realpath9.c	2020-11-23 13:55:06.808458893 -0500
> >>@@ -19,7 +19,7 @@
> >> 	char *output = resolved ? resolved : buf;
> >> 	size_t p, q, l, cnt=0;
> >>
> >>-	l = strnlen(filename, sizeof stack + 1);
> >>+	l = strnlen(filename, sizeof stack);
> >> 	if (!l) {
> >> 		errno = ENOENT;
> >> 		return 0;
> >>@@ -80,11 +80,16 @@
> >> 			return 0;
> >> 		}
> >> 		if (k==p) goto toolong;
> >>+		if (!k) {
> >>+			errno = ENOENT;
> >>+			return 0;
> >>+		}
> >> 		if (++cnt == SYMLOOP_MAX) {
> >> 			errno = ELOOP;
> >> 			return 0;
> >> 		}
> >> 		p -= k;
> >>+		if (stack[k-1]=='/') p++;
> >> 		memmove(stack+p, stack, k);
> >
> >This is wrong and needs further consideration.
> >
> Yes, now memmove() overwrites NUL if p was at the end and stack[k-1]
> == '/'. Is it true per POSIX that "rr/home" must resolve to "//home"
> if "rr" -> "//"?

I don't think // is even required be distinct from /, just permitted,
but I think allowing it in userspace and handling it consistently is
the right behavior in case you ever run on a kernel that does make use
of the distinction.

> If so, maybe something like the following instead:
> 
> +               while (stack[p] == '/') p++;
> +               if (stack[p] && stack[k-1] != '/') p--;
>                 p -= k;
> -               if (stack[k-1]=='/') p++;

Rather just:

	/* If link contents end in /, strip any slashes already on
	 * stack to avoid /->// or //->/// or spurious toolong. */
	if (stack[k-1]=='/') while (stack[p]=='/') p++;

should work (before the p-=k;)

> >> 	}
> >>
> >>@@ -95,7 +100,8 @@
> >> 		l = strlen(stack);
> >> 		/* Cancel any initial .. components. */
> >> 		p = 0;
> >>-		while (q-p>=2 && at_dotdot(output+p+2, p+2)) {
> >>+		while (output[p]=='.' && output[p+1]=='.'
> >>+		  && (!output[p+2] || output[p+2]=='/')) {
> >> 			while(l>1 && stack[l-1]!='/') l--;
> >> 			if (l>1) l--;
> >> 			p += 2;
> >
> >OK, I have a better improvement for this: counting the number of
> >levels of .. as they're built at the head of output. Then it's just
> >while (nup--) here, and the condition for canceling .. in the first
> >loop no longer needs any string inspection; it's just (q>3*nup).
> >
> Sounds good.
> 
> I've missed the last time that the immediately following code is
> also broken:
> 
> >              if (q-p && stack[l-1]!='/') output[--p] = '/';
> 
> It will underflow the output in case of a simple relative path that
> doesn't start with "..".

Thanks. This logic just looks wrong; I'll rework it.

> I've also noticed other issues to be fixed, per POSIX:
> 
> * ENOENT should be returned if filename is NULL

Rather it looks like it's:

	[EINVAL] The file_name argument is a null pointer.

ENOENT is only for empty string or ENOENT somewhere in the path
traversal process.

> * ENOTDIR should be returned if the last component is not a
> directory  and the path has one or more trailing slashes

Yes, that's precisely what I've been working on the past couple hours.
I think you missed but .. will also erase a path component that's not
a dir (e.g. /dev/null/.. -> /dev) and these are both instances of a
common problem. I thought use of readlink covered all the ENOTDIR
cases but it doesn't when the next component isn't covered by readlink
or isn't present at all.

It's trivial to fix with a check after each component but that doubles
the number of syscalls and mostly isn't necessary. I have a reworked
draft to fix the problem by advancing over /(/|./|.$)* rather than just
/+ after each component, so that we can lookahead and do an extra
readlink in the cases that need it.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-24  4:26           ` Rich Felker
@ 2020-11-24  5:13             ` Alexey Izbyshev
  2020-11-24  6:30               ` Rich Felker
  2020-11-24 20:31             ` Rich Felker
  1 sibling, 1 reply; 20+ messages in thread
From: Alexey Izbyshev @ 2020-11-24  5:13 UTC (permalink / raw)
  To: musl

On 2020-11-24 07:26, Rich Felker wrote:
> On Tue, Nov 24, 2020 at 06:39:59AM +0300, Alexey Izbyshev wrote:
>> On 2020-11-23 23:53, Rich Felker wrote:
>> >On Mon, Nov 23, 2020 at 01:56:33PM -0500, Rich Felker wrote:
>> >>On Sun, Nov 22, 2020 at 10:19:33PM -0500, Rich Felker wrote:
>> >>--- realpath8.c	2020-11-22 17:52:17.586481571 -0500
>> >>+++ realpath9.c	2020-11-23 13:55:06.808458893 -0500
>> >>@@ -19,7 +19,7 @@
>> >> 	char *output = resolved ? resolved : buf;
>> >> 	size_t p, q, l, cnt=0;
>> >>
>> >>-	l = strnlen(filename, sizeof stack + 1);
>> >>+	l = strnlen(filename, sizeof stack);
>> >> 	if (!l) {
>> >> 		errno = ENOENT;
>> >> 		return 0;
>> >>@@ -80,11 +80,16 @@
>> >> 			return 0;
>> >> 		}
>> >> 		if (k==p) goto toolong;
>> >>+		if (!k) {
>> >>+			errno = ENOENT;
>> >>+			return 0;
>> >>+		}
>> >> 		if (++cnt == SYMLOOP_MAX) {
>> >> 			errno = ELOOP;
>> >> 			return 0;
>> >> 		}
>> >> 		p -= k;
>> >>+		if (stack[k-1]=='/') p++;
>> >> 		memmove(stack+p, stack, k);
>> >
>> >This is wrong and needs further consideration.
>> >
>> Yes, now memmove() overwrites NUL if p was at the end and stack[k-1]
>> == '/'. Is it true per POSIX that "rr/home" must resolve to "//home"
>> if "rr" -> "//"?
> 
> I don't think // is even required be distinct from /, just permitted,
> but I think allowing it in userspace and handling it consistently is
> the right behavior in case you ever run on a kernel that does make use
> of the distinction.
> 
>> If so, maybe something like the following instead:
>> 
>> +               while (stack[p] == '/') p++;
>> +               if (stack[p] && stack[k-1] != '/') p--;
>>                 p -= k;
>> -               if (stack[k-1]=='/') p++;
> 
> Rather just:
> 
> 	/* If link contents end in /, strip any slashes already on
> 	 * stack to avoid /->// or //->/// or spurious toolong. */
> 	if (stack[k-1]=='/') while (stack[p]=='/') p++;
> 
> should work (before the p-=k;)
> 
Yes, that looks good.

>> I've also noticed other issues to be fixed, per POSIX:
>> 
>> * ENOENT should be returned if filename is NULL
> 
> Rather it looks like it's:
> 
> 	[EINVAL] The file_name argument is a null pointer.
> 
> ENOENT is only for empty string or ENOENT somewhere in the path
> traversal process.
> 
Uh, yes, that was bad copy-paste or something.

>> * ENOTDIR should be returned if the last component is not a
>> directory  and the path has one or more trailing slashes
> 
> Yes, that's precisely what I've been working on the past couple hours.
> I think you missed but .. will also erase a path component that's not
> a dir (e.g. /dev/null/.. -> /dev) and these are both instances of a
> common problem. I thought use of readlink covered all the ENOTDIR
> cases but it doesn't when the next component isn't covered by readlink
> or isn't present at all.
> 
Yes, initially I forgot about this whole ENOTDIR issue completely, and 
after noticing the problem with the last component, didn't look further.

Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-24  5:13             ` Alexey Izbyshev
@ 2020-11-24  6:30               ` Rich Felker
  2020-11-24  9:21                 ` Alexey Izbyshev
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-11-24  6:30 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Tue, Nov 24, 2020 at 08:13:56AM +0300, Alexey Izbyshev wrote:
> On 2020-11-24 07:26, Rich Felker wrote:
> >On Tue, Nov 24, 2020 at 06:39:59AM +0300, Alexey Izbyshev wrote:
> >>On 2020-11-23 23:53, Rich Felker wrote:
> >>>On Mon, Nov 23, 2020 at 01:56:33PM -0500, Rich Felker wrote:
> >>>>On Sun, Nov 22, 2020 at 10:19:33PM -0500, Rich Felker wrote:
> >>>>--- realpath8.c	2020-11-22 17:52:17.586481571 -0500
> >>>>+++ realpath9.c	2020-11-23 13:55:06.808458893 -0500
> >>>>@@ -19,7 +19,7 @@
> >>>> 	char *output = resolved ? resolved : buf;
> >>>> 	size_t p, q, l, cnt=0;
> >>>>
> >>>>-	l = strnlen(filename, sizeof stack + 1);
> >>>>+	l = strnlen(filename, sizeof stack);
> >>>> 	if (!l) {
> >>>> 		errno = ENOENT;
> >>>> 		return 0;
> >>>>@@ -80,11 +80,16 @@
> >>>> 			return 0;
> >>>> 		}
> >>>> 		if (k==p) goto toolong;
> >>>>+		if (!k) {
> >>>>+			errno = ENOENT;
> >>>>+			return 0;
> >>>>+		}
> >>>> 		if (++cnt == SYMLOOP_MAX) {
> >>>> 			errno = ELOOP;
> >>>> 			return 0;
> >>>> 		}
> >>>> 		p -= k;
> >>>>+		if (stack[k-1]=='/') p++;
> >>>> 		memmove(stack+p, stack, k);
> >>>
> >>>This is wrong and needs further consideration.
> >>>
> >>Yes, now memmove() overwrites NUL if p was at the end and stack[k-1]
> >>== '/'. Is it true per POSIX that "rr/home" must resolve to "//home"
> >>if "rr" -> "//"?
> >
> >I don't think // is even required be distinct from /, just permitted,
> >but I think allowing it in userspace and handling it consistently is
> >the right behavior in case you ever run on a kernel that does make use
> >of the distinction.
> >
> >>If so, maybe something like the following instead:
> >>
> >>+               while (stack[p] == '/') p++;
> >>+               if (stack[p] && stack[k-1] != '/') p--;
> >>                p -= k;
> >>-               if (stack[k-1]=='/') p++;
> >
> >Rather just:
> >
> >	/* If link contents end in /, strip any slashes already on
> >	 * stack to avoid /->// or //->/// or spurious toolong. */
> >	if (stack[k-1]=='/') while (stack[p]=='/') p++;
> >
> >should work (before the p-=k;)
> >
> Yes, that looks good.
> 
> >>I've also noticed other issues to be fixed, per POSIX:
> >>
> >>* ENOENT should be returned if filename is NULL
> >
> >Rather it looks like it's:
> >
> >	[EINVAL] The file_name argument is a null pointer.
> >
> >ENOENT is only for empty string or ENOENT somewhere in the path
> >traversal process.
> >
> Uh, yes, that was bad copy-paste or something.
> 
> >>* ENOTDIR should be returned if the last component is not a
> >>directory  and the path has one or more trailing slashes
> >
> >Yes, that's precisely what I've been working on the past couple hours.
> >I think you missed but .. will also erase a path component that's not
> >a dir (e.g. /dev/null/.. -> /dev) and these are both instances of a
> >common problem. I thought use of readlink covered all the ENOTDIR
> >cases but it doesn't when the next component isn't covered by readlink
> >or isn't present at all.
> >
> Yes, initially I forgot about this whole ENOTDIR issue completely,
> and after noticing the problem with the last component, didn't look
> further.

I think before this goes upstream we should have a good set of
testcases that can be contributed to libc-test. Do you have ideas for
coverage? Some that come to mind:

- Absolute argument starting with /, //, and ///
- Absolute symlink target starting with /, //, and ///
- Final / after symlink-to-dir, dir, symlink-to-nondir, nondir
- Final / in link contents after [the above]
- Initial .. in argument, cwd root or non-root
- Initial .. in symlink target, symlink in root or non-root
- Initial ...
- .. following symlink-to-dir, dir, symlink-to-nondir, nondir
- More .. than path depth
- Null argument
- Empty string argument
- Empty string link contents (testable only with seccomp hack)
- Argument valid abs path exact length PATH_MAX-1
- Argument valid rel path exact length PATH_MAX-1 to short abs path

Some of these require namespace gymnastics to set up without running
the tests as root.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-24  6:30               ` Rich Felker
@ 2020-11-24  9:21                 ` Alexey Izbyshev
  2020-11-24 14:35                   ` Rich Felker
  2020-11-25 15:02                   ` Rich Felker
  0 siblings, 2 replies; 20+ messages in thread
From: Alexey Izbyshev @ 2020-11-24  9:21 UTC (permalink / raw)
  To: musl

On 2020-11-24 09:30, Rich Felker wrote:
> I think before this goes upstream we should have a good set of
> testcases that can be contributed to libc-test. Do you have ideas for
> coverage? Some that come to mind:
> 
Added some more ideas.

> - Absolute argument starting with /, //, and ///
- Absolute argument equal to one of /, //, and ///
> - Absolute symlink target starting with /, //, and ///
- Absolute symlink target equal to one of /, //, and ///, with the link 
separated from the following component with /, //
> - Final / after symlink-to-dir, dir, symlink-to-nondir, nondir
- Intermediate / after symlink-to-nondir, nondir
> - Final / in link contents after [the above]
- Multiple / after ., .., normal component
> - Initial .. in argument, cwd root or non-root
> - Initial .. in symlink target, symlink in root or non-root
> - Initial ...
> - .. following symlink-to-dir, dir, symlink-to-nondir, nondir
- . following symlink-to-dir, dir, symlink-to-nondir, nondir
> - More .. than path depth
> - Null argument
> - Empty string argument
- Argument consisting of a single ., ..
> - Empty string link contents (testable only with seccomp hack)
> - Argument valid abs path exact length PATH_MAX-1
> - Argument valid rel path exact length PATH_MAX-1 to short abs path
- Argument with PATH_MAX length (ENAMETOOLONG)
- A relative symlink in the argument such that the length of the result 
is PATH_MAX-1 (valid path), PATH_MAX (ENAMETOOLONG)
- An absolute symlink in the argument similar to the above
- A relative argument with the current directory similar to the above
- An argument consisting of a single (relative, absolute) symlink with 
the target having length PATH_MAX-1
- An argument ending with a relative symlink with the target having 
length PATH_MAX-1 (ENAMETOOLONG)
- An argument ending with an absolute symlink with the target having 
length PATH_MAX-1 (valid path)

Hm, the last one doesn't work now. Since p is the position of NUL 
instead of the size of stack, "if (k==p) goto toolong;" forbids symlinks 
with the length of the target == PATH_MAX-1.
> 
> Some of these require namespace gymnastics to set up without running
> the tests as root.
> 
Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-24  9:21                 ` Alexey Izbyshev
@ 2020-11-24 14:35                   ` Rich Felker
  2020-11-24 20:17                     ` Rich Felker
  2020-11-25 15:02                   ` Rich Felker
  1 sibling, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-11-24 14:35 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Tue, Nov 24, 2020 at 12:21:36PM +0300, Alexey Izbyshev wrote:
> On 2020-11-24 09:30, Rich Felker wrote:
> >I think before this goes upstream we should have a good set of
> >testcases that can be contributed to libc-test. Do you have ideas for
> >coverage? Some that come to mind:
> >
> Added some more ideas.
> 
> [...]
> - An argument ending with an absolute symlink with the target having
> length PATH_MAX-1 (valid path)
> 
> Hm, the last one doesn't work now. Since p is the position of NUL
> instead of the size of stack, "if (k==p) goto toolong;" forbids
> symlinks with the length of the target == PATH_MAX-1.

This should be fixable just by increasing size of stack to PATH_MAX+1.
In theory it doesn't need to be null-terminated but then strchrnul
won't work. memchr would work instead but it's slightly less
convenient to use.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-24 14:35                   ` Rich Felker
@ 2020-11-24 20:17                     ` Rich Felker
  0 siblings, 0 replies; 20+ messages in thread
From: Rich Felker @ 2020-11-24 20:17 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Tue, Nov 24, 2020 at 09:35:55AM -0500, Rich Felker wrote:
> On Tue, Nov 24, 2020 at 12:21:36PM +0300, Alexey Izbyshev wrote:
> > On 2020-11-24 09:30, Rich Felker wrote:
> > >I think before this goes upstream we should have a good set of
> > >testcases that can be contributed to libc-test. Do you have ideas for
> > >coverage? Some that come to mind:
> > >
> > Added some more ideas.
> > 
> > [...]
> > - An argument ending with an absolute symlink with the target having
> > length PATH_MAX-1 (valid path)
> > 
> > Hm, the last one doesn't work now. Since p is the position of NUL
> > instead of the size of stack, "if (k==p) goto toolong;" forbids
> > symlinks with the length of the target == PATH_MAX-1.
> 
> This should be fixable just by increasing size of stack to PATH_MAX+1.
> In theory it doesn't need to be null-terminated but then strchrnul
> won't work. memchr would work instead but it's slightly less
> convenient to use.

One thing I noticed while working out test ideas: my very early
observation that we can use the caller's buffer for output is wrong.
The spec allows it to have been clobbered on failure but not on
success; there's nothing allowing write past the resulting string
size.

It's not too bad to have two 4k buffers, but I think we can actually
put them together. PATH_MAX+1 isn't quite enough because, when
expanding a link, we momentatily need space for both the link name and
contents. However PATH_MAX+NAME_MAX+2 should suffice, since at most
a NAME_MAX part is being removed before pushing the link contents onto
the stack. I don't really feel like making this improvement now
though; it's better done as a change later if desired.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-24  4:26           ` Rich Felker
  2020-11-24  5:13             ` Alexey Izbyshev
@ 2020-11-24 20:31             ` Rich Felker
  2020-11-25  5:40               ` Alexey Izbyshev
  1 sibling, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-11-24 20:31 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 2378 bytes --]

On Mon, Nov 23, 2020 at 11:26:46PM -0500, Rich Felker wrote:
> On Tue, Nov 24, 2020 at 06:39:59AM +0300, Alexey Izbyshev wrote:
> > * ENOTDIR should be returned if the last component is not a
> > directory  and the path has one or more trailing slashes
> 
> Yes, that's precisely what I've been working on the past couple hours.
> I think you missed but .. will also erase a path component that's not
> a dir (e.g. /dev/null/.. -> /dev) and these are both instances of a
> common problem. I thought use of readlink covered all the ENOTDIR
> cases but it doesn't when the next component isn't covered by readlink
> or isn't present at all.
> 
> It's trivial to fix with a check after each component but that doubles
> the number of syscalls and mostly isn't necessary. I have a reworked
> draft to fix the problem by advancing over /(/|./|.$)* rather than just
> /+ after each component, so that we can lookahead and do an extra
> readlink in the cases that need it.

While this worked, it ended up being the wrong thing to do, making two
places where readlink is called, one of them with a dummy buffer. The
right way to do it is rework the flow so that the existing readlink is
"naturally" hit where needed. This amounts to:

- Letting .. processing that cancels path components go through the
  same code path as new path components, rather than handling it
  early, and just skipping the actual readlink if we already know we
  have a dir.

- Also treating a zero-length final component as something that goes
  through the readlink code path.

There was a fair amount of reorganizing needed to make this work out,
but the end result is clean and non-redundant and code size is almost
the same as before with the missing-ENOTDIR bugs.

Speaking of code size, on 32-bit archs the proposed explicit realpath
is roughly the same size as stat+fstat+fstatat (a little over 1k on
i386), which were needed to implement the old lazy realpath in terms
of procfs. So for minimal static linking, resulting code size may be
same or smaller. (Of course it's larger if stat is already linked for
other reasons.)

New draft attached. It's possible that there are regressions since I
haven't put together an automated testset. I'm not sure if I'll try to
merge it in this release cycle still or not; that probably depends on
how easy or difficult automating these tests ends up being.

Rich

[-- Attachment #2: realpath17.c --]
[-- Type: text/plain, Size: 3614 bytes --]

#define _GNU_SOURCE
#include <stdlib.h>
#include <limits.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>

static size_t slash_len(const char *s)
{
	const char *s0 = s;
	while (*s == '/') s++;
	return s-s0;
}

char *realpath(const char *restrict filename, char *restrict resolved)
{
	char stack[PATH_MAX+1];
	char output[PATH_MAX];
	size_t p, q, l, l0, cnt=0, nup=0;
	int check_dir=0;

	if (!filename) {
		errno = EINVAL;
		return 0;
	}
	l = strnlen(filename, sizeof stack);
	if (!l) {
		errno = ENOENT;
		return 0;
	}
	if (l >= PATH_MAX) goto toolong;
	p = sizeof stack - l - 1;
	q = 0;
	memcpy(stack+p, filename, l+1);

	/* Main loop. Each iteration pops the next part from stack of
	 * remaining path components and consumes any slashes that follow.
	 * If not a link, it's moved to output; if a link, contents are
	 * pushed to the stack. */
restart:
	for (; ; p+=slash_len(stack+p)) {
		/* If stack starts with /, the whole component is / or //
		 * and the output state must be reset. */
		if (stack[p] == '/') {
			check_dir=0;
			nup=0;
			q=0;
			output[q++] = '/';
			p++;
			/* Initial // is special. */
			if (stack[p] == '/' && stack[p+1] != '/')
				output[q++] = '/';
			continue;
		}

		char *z = __strchrnul(stack+p, '/');
		l0 = l = z-(stack+p);

		if (!l && !check_dir) break;

		/* Skip any . component but preserve check_dir status. */
		if (l==1 && stack[p]=='.') {
			p += l;
			continue;
		}

		/* Copy next component onto output at least temporarily, to
		 * call readlink, but wait to advance output position until
		 * determining it's not a link. */
		if (q && output[q-1] != '/') {
			if (!p) goto toolong;
			stack[--p] = '/';
			l++;
		}
		if (q+l >= PATH_MAX) goto toolong;
		memcpy(output+q, stack+p, l);
		output[q+l] = 0;
		p += l;

		int up = 0;
		if (l0==2 && stack[p-2]=='.' && stack[p-1]=='.') {
			up = 1;
			/* Any non-.. path components we could cancel start
			 * after nup repetitions of the 3-byte string "../";
			 * if there are none, accumulate .. components to
			 * later apply to cwd, if needed. */
			if (q <= 3*nup) {
				nup++;
				q += l;
				continue;
			}
			/* When previous components are already known to be
			 * directories, processing .. can skip readlink. */
			if (!check_dir) goto skip_readlink;
		}
		ssize_t k = readlink(output, stack, p);
		if (k==p) goto toolong;
		if (!k) {
			errno = ENOENT;
			return 0;
		}
		if (k<0) {
			if (errno != EINVAL) return 0;
skip_readlink:
			check_dir = 0;
			if (up) {
				while(q && output[q-1]!='/') q--;
				if (q>1 && (q>2 || output[0]!='/')) q--;
				continue;
			}
			if (l0) q += l;
			check_dir = stack[p];
			continue;
		}
		if (++cnt == SYMLOOP_MAX) {
			errno = ELOOP;
			return 0;
		}

		/* If link contents end in /, strip any slashes already on
		 * stack to avoid /->// or //->/// or spurious toolong. */
		if (stack[k-1]=='/') while (stack[p]=='/') p++;
		p -= k;
		memmove(stack+p, stack, k);

		/* Skip the stack advancement in case we have a new
		 * absolute base path. */
		goto restart;
	}

 	output[q] = 0;

	if (output[0] != '/') {
		if (!getcwd(stack, sizeof stack)) return 0;
		l = strlen(stack);
		/* Cancel any initial .. components. */
		p = 0;
		while (nup--) {
			while(l>1 && stack[l-1]!='/') l--;
			if (l>1) l--;
			p += 2;
			if (p<q) p++;
		}
		if (q-p && stack[l-1]!='/') stack[l++] = '/';
		if (l + (q-p) + 1 >= PATH_MAX) goto toolong;
		memmove(output + l, output + p, q - p + 1);
		memcpy(output, stack, l);
		q = l + q-p;
	}

	if (resolved) return memcpy(resolved, output, q+1);
	else return strdup(output);

toolong:
	errno = ENAMETOOLONG;
	return 0;
}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-24 20:31             ` Rich Felker
@ 2020-11-25  5:40               ` Alexey Izbyshev
  2020-11-25 15:03                 ` Rich Felker
  0 siblings, 1 reply; 20+ messages in thread
From: Alexey Izbyshev @ 2020-11-25  5:40 UTC (permalink / raw)
  To: musl

On 2020-11-24 23:31, Rich Felker wrote:
> On Mon, Nov 23, 2020 at 11:26:46PM -0500, Rich Felker wrote:
>> On Tue, Nov 24, 2020 at 06:39:59AM +0300, Alexey Izbyshev wrote:
>> > * ENOTDIR should be returned if the last component is not a
>> > directory  and the path has one or more trailing slashes
>> 
>> Yes, that's precisely what I've been working on the past couple hours.
>> I think you missed but .. will also erase a path component that's not
>> a dir (e.g. /dev/null/.. -> /dev) and these are both instances of a
>> common problem. I thought use of readlink covered all the ENOTDIR
>> cases but it doesn't when the next component isn't covered by readlink
>> or isn't present at all.
>> 
>> It's trivial to fix with a check after each component but that doubles
>> the number of syscalls and mostly isn't necessary. I have a reworked
>> draft to fix the problem by advancing over /(/|./|.$)* rather than 
>> just
>> /+ after each component, so that we can lookahead and do an extra
>> readlink in the cases that need it.
> 
> While this worked, it ended up being the wrong thing to do, making two
> places where readlink is called, one of them with a dummy buffer. The
> right way to do it is rework the flow so that the existing readlink is
> "naturally" hit where needed. This amounts to:
> 
> - Letting .. processing that cancels path components go through the
>   same code path as new path components, rather than handling it
>   early, and just skipping the actual readlink if we already know we
>   have a dir.
> 
> - Also treating a zero-length final component as something that goes
>   through the readlink code path.
> 
> There was a fair amount of reorganizing needed to make this work out,
> but the end result is clean and non-redundant and code size is almost
> the same as before with the missing-ENOTDIR bugs.
> 
> Speaking of code size, on 32-bit archs the proposed explicit realpath
> is roughly the same size as stat+fstat+fstatat (a little over 1k on
> i386), which were needed to implement the old lazy realpath in terms
> of procfs. So for minimal static linking, resulting code size may be
> same or smaller. (Of course it's larger if stat is already linked for
> other reasons.)
> 
> New draft attached. It's possible that there are regressions since I
> haven't put together an automated testset. I'm not sure if I'll try to
> merge it in this release cycle still or not; that probably depends on
> how easy or difficult automating these tests ends up being.
> 
The new draft looks good to me. I've also done some basic manual testing 
(not covering all proposed cases) and haven't found any issues.

I don't see why the size of stack has to be PATH_MAX+1 though. To 
address the issue with symlink targets of PATH_MAX-1 length, it seems 
sufficient to just do the following:

-               ssize_t k = readlink(output, stack, p);
-               if (k==p) goto toolong;
+               ssize_t k = readlink(output, stack, p+1);
+               if (k==p+1) goto toolong;

Since p is never past the end of the stack, there is no harm in allowing 
k == p.

Alexey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-24  9:21                 ` Alexey Izbyshev
  2020-11-24 14:35                   ` Rich Felker
@ 2020-11-25 15:02                   ` Rich Felker
  2020-11-25 19:40                     ` Alexey Izbyshev
  1 sibling, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-11-25 15:02 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 2839 bytes --]

On Tue, Nov 24, 2020 at 12:21:36PM +0300, Alexey Izbyshev wrote:
> On 2020-11-24 09:30, Rich Felker wrote:
> >I think before this goes upstream we should have a good set of
> >testcases that can be contributed to libc-test. Do you have ideas for
> >coverage? Some that come to mind:
> >
> Added some more ideas.
> 
> >- Absolute argument starting with /, //, and ///
> - Absolute argument equal to one of /, //, and ///
> >- Absolute symlink target starting with /, //, and ///
> - Absolute symlink target equal to one of /, //, and ///, with the
> link separated from the following component with /, //
> >- Final / after symlink-to-dir, dir, symlink-to-nondir, nondir
> - Intermediate / after symlink-to-nondir, nondir
> >- Final / in link contents after [the above]
> - Multiple / after ., .., normal component
> >- Initial .. in argument, cwd root or non-root
> >- Initial .. in symlink target, symlink in root or non-root
> >- Initial ...
> >- .. following symlink-to-dir, dir, symlink-to-nondir, nondir
> - . following symlink-to-dir, dir, symlink-to-nondir, nondir
> >- More .. than path depth
> >- Null argument
> >- Empty string argument
> - Argument consisting of a single ., ..
> >- Empty string link contents (testable only with seccomp hack)
> >- Argument valid abs path exact length PATH_MAX-1
> >- Argument valid rel path exact length PATH_MAX-1 to short abs path
> - Argument with PATH_MAX length (ENAMETOOLONG)
> - A relative symlink in the argument such that the length of the
> result is PATH_MAX-1 (valid path), PATH_MAX (ENAMETOOLONG)
> - An absolute symlink in the argument similar to the above
> - A relative argument with the current directory similar to the above
> - An argument consisting of a single (relative, absolute) symlink
> with the target having length PATH_MAX-1
> - An argument ending with a relative symlink with the target having
> length PATH_MAX-1 (ENAMETOOLONG)
> - An argument ending with an absolute symlink with the target having
> length PATH_MAX-1 (valid path)

OK, here are my WIP testcases. They only cover static cases, not
anything that depends on implementation-defined value of PATH_MAX or
how the working directory combines with tests to reach PATH_MAX.

The way you run the tests is by running ../setup.sh then ../checker.sh
from a scratch subdir of the dir containing the test files (realpath.c
also needs to be compiled and linked against the realpath impl you
want to test)

Format of the testcase file is one test per line, input and expected
output separated by whitespace. . and .. in expected output are
translated based on working dir. Error names are treated as output
since they don't fall in same namespace (no initial /).

All tests are currently passing for me.

If you have ideas for additional static tests that fit the form and
are easy to add, please suggest.

Rich

[-- Attachment #2: setup.sh --]
[-- Type: application/x-sh, Size: 434 bytes --]

[-- Attachment #3: checker.sh --]
[-- Type: application/x-sh, Size: 335 bytes --]

[-- Attachment #4: realpath.c --]
[-- Type: text/plain, Size: 510 bytes --]

#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
#include <errno.h>

#define E(x) { x, #x }

static const struct {
	int ec;
	char *name;
} errs[] = { E(ENOENT), E(ENOTDIR), E(ENAMETOOLONG), E(ELOOP), E(EACCES), E(0) };

int main(int argc, char *argv[])
{
	while (*++argv) {
		char *s = realpath(*argv, (char[PATH_MAX]){""});
		//char *s = realpath(*argv, 0);
		if (!s) {
			int i;
			for (i=0; errs[i].ec && errs[i].ec!=errno; i++);
			s = errs[i].name;
		}
		puts(s);
	}
}

[-- Attachment #5: testcases.txt --]
[-- Type: text/plain, Size: 1158 bytes --]

/	/
//	//
///	/
/dev	/dev
/..	/
//..	//
///..	/
/dev/..	/
/dev/../..	/
/dev/../../..	/
dir	./dir
./dir	./dir
file	./file
./file	./file
noent	ENOENT
./noent	ENOENT
dir/	./dir
file/	ENOTDIR
noent/	ENOENT
link_to_dir/	./dir
link_to_file/	ENOTDIR
link_to_noent/	ENOENT
dir/..	.
file/..	ENOTDIR
noent/..	ENOENT
link_to_dir/..	.
link_to_file/.. ENOTDIR
link_to_noent/..	ENOENT
dlink_to_dir	./dir
dlink_to_file	ENOTDIR
dlink_to_noent	ENOENT
dlink_to_link_to_dir	./dir
dlink_to_link_to_file	ENOTDIR
dlink_to_link_to_noent	ENOENT
dlink_to_dir/	./dir
dlink_to_file/	ENOTDIR
dlink_to_noent/	ENOENT
dlink_to_link_to_dir/	./dir
dlink_to_link_to_file/	ENOTDIR
dlink_to_link_to_noent/	ENOENT
dlink_to_dir/..	.
dlink_to_file/..	ENOTDIR
dlink_to_noent/..	ENOENT
dlink_to_link_to_dir/..	.
dlink_to_link_to_file/..	ENOTDIR
dlink_to_link_to_noent/..	ENOENT
abs1	/
abs1/	/
abs1/dev	/dev
abs1/dev/	/dev
abs2	/
abs2/	/
abs2/dev	/dev
abs2/dev/	/dev
abs3	/
abs3/	/
abs3/dev	/dev
abs3/dev/	/dev
abs4	/
abs4/	/
abs4/dev	/dev
abs4/dev/	/dev
alt1	//
alt1/	//
alt1/.	//
alt2	//
alt2/	//
alt2/.	//
a	./a
a/..	.
...	./...
.../	./...
.../..	.
..	..
../	..
../.	..
a/../..	..
../noent	ENOENT

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-25  5:40               ` Alexey Izbyshev
@ 2020-11-25 15:03                 ` Rich Felker
  0 siblings, 0 replies; 20+ messages in thread
From: Rich Felker @ 2020-11-25 15:03 UTC (permalink / raw)
  To: Alexey Izbyshev; +Cc: musl

On Wed, Nov 25, 2020 at 08:40:02AM +0300, Alexey Izbyshev wrote:
> I don't see why the size of stack has to be PATH_MAX+1 though. To
> address the issue with symlink targets of PATH_MAX-1 length, it
> seems sufficient to just do the following:
> 
> -               ssize_t k = readlink(output, stack, p);
> -               if (k==p) goto toolong;
> +               ssize_t k = readlink(output, stack, p+1);
> +               if (k==p+1) goto toolong;
> 
> Since p is never past the end of the stack, there is no harm in
> allowing k == p.

Indeed. I think I was concerned about it clobbering the first byte of
the live part of the stack, but on error the stack is discarded
anyway, so that seems to be a non-issue.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] realpath without procfs -- should be ready for inclusion
  2020-11-25 15:02                   ` Rich Felker
@ 2020-11-25 19:40                     ` Alexey Izbyshev
  0 siblings, 0 replies; 20+ messages in thread
From: Alexey Izbyshev @ 2020-11-25 19:40 UTC (permalink / raw)
  To: musl

On 2020-11-25 18:02, Rich Felker wrote:
> On Tue, Nov 24, 2020 at 12:21:36PM +0300, Alexey Izbyshev wrote:
>> On 2020-11-24 09:30, Rich Felker wrote:
>> >I think before this goes upstream we should have a good set of
>> >testcases that can be contributed to libc-test. Do you have ideas for
>> >coverage? Some that come to mind:
>> >
>> Added some more ideas.
>> 
>> >- Absolute argument starting with /, //, and ///
>> - Absolute argument equal to one of /, //, and ///
>> >- Absolute symlink target starting with /, //, and ///
>> - Absolute symlink target equal to one of /, //, and ///, with the
>> link separated from the following component with /, //
>> >- Final / after symlink-to-dir, dir, symlink-to-nondir, nondir
>> - Intermediate / after symlink-to-nondir, nondir
>> >- Final / in link contents after [the above]
>> - Multiple / after ., .., normal component
>> >- Initial .. in argument, cwd root or non-root
>> >- Initial .. in symlink target, symlink in root or non-root
>> >- Initial ...
>> >- .. following symlink-to-dir, dir, symlink-to-nondir, nondir
>> - . following symlink-to-dir, dir, symlink-to-nondir, nondir
>> >- More .. than path depth
>> >- Null argument
>> >- Empty string argument
>> - Argument consisting of a single ., ..
>> >- Empty string link contents (testable only with seccomp hack)
>> >- Argument valid abs path exact length PATH_MAX-1
>> >- Argument valid rel path exact length PATH_MAX-1 to short abs path
>> - Argument with PATH_MAX length (ENAMETOOLONG)
>> - A relative symlink in the argument such that the length of the
>> result is PATH_MAX-1 (valid path), PATH_MAX (ENAMETOOLONG)
>> - An absolute symlink in the argument similar to the above
>> - A relative argument with the current directory similar to the above
>> - An argument consisting of a single (relative, absolute) symlink
>> with the target having length PATH_MAX-1
>> - An argument ending with a relative symlink with the target having
>> length PATH_MAX-1 (ENAMETOOLONG)
>> - An argument ending with an absolute symlink with the target having
>> length PATH_MAX-1 (valid path)
> 
> OK, here are my WIP testcases. They only cover static cases, not
> anything that depends on implementation-defined value of PATH_MAX or
> how the working directory combines with tests to reach PATH_MAX.
> 
> The way you run the tests is by running ../setup.sh then ../checker.sh
> from a scratch subdir of the dir containing the test files (realpath.c
> also needs to be compiled and linked against the realpath impl you
> want to test)
> 
> Format of the testcase file is one test per line, input and expected
> output separated by whitespace. . and .. in expected output are
> translated based on working dir. Error names are treated as output
> since they don't fall in same namespace (no initial /).
> 
> All tests are currently passing for me.
> 
Tests pass for me too. For the previous procfs-based impl and for glibc 
realpath(), the set of failing tests is exactly the same (all tests for 
//):

// -> / (expected //)
//.. -> / (expected //)
alt1 -> / (expected //)
alt1/ -> / (expected //)
alt1/. -> / (expected //)
alt2 -> / (expected //)
alt2/ -> / (expected //)
alt2/. -> / (expected //)

> If you have ideas for additional static tests that fit the form and
> are easy to add, please suggest.
> 
A test for ELOOP is easy to add.

Some of the skipped tests of the initially proposed list also fall 
within "static tests" limitations, e.g.:

* trailing . (could be added in the form of <normal, ENOTDIR, ENOENT> 
triples as existing tests for trailing / and ..)

* initial .. (or .) in a symlink target

* multiple / in the middle of the argument before a component that is 
not going to be removed

Alexey





^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-11-25 19:40 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-22 22:56 [musl] realpath without procfs -- should be ready for inclusion Rich Felker
2020-11-23  2:03 ` Alexey Izbyshev
2020-11-23  3:17   ` Érico Nogueira
2020-11-23  3:34     ` Rich Felker
2020-11-23  3:19   ` Rich Felker
2020-11-23 18:56     ` Rich Felker
2020-11-23 20:53       ` Rich Felker
2020-11-24  3:39         ` Alexey Izbyshev
2020-11-24  4:26           ` Rich Felker
2020-11-24  5:13             ` Alexey Izbyshev
2020-11-24  6:30               ` Rich Felker
2020-11-24  9:21                 ` Alexey Izbyshev
2020-11-24 14:35                   ` Rich Felker
2020-11-24 20:17                     ` Rich Felker
2020-11-25 15:02                   ` Rich Felker
2020-11-25 19:40                     ` Alexey Izbyshev
2020-11-24 20:31             ` Rich Felker
2020-11-25  5:40               ` Alexey Izbyshev
2020-11-25 15:03                 ` Rich Felker
2020-11-24  3:41     ` Alexey Izbyshev

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).