mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Markus Wichmann <nullplan@gmx.net>
To: musl@lists.openwall.com
Subject: Re: [musl] realpath without procfs
Date: Tue, 8 Sep 2020 21:27:46 +0200	[thread overview]
Message-ID: <20200908192746.GA7854@voyager> (raw)
In-Reply-To: <20200908171901.GV3265@brightrain.aerifal.cx>

On Tue, Sep 08, 2020 at 01:19:04PM -0400, Rich Felker wrote:
> Since it was raised yet again on #musl, I took some time to research
> justification for, and ease of implementing, realpath without procfs.
>

I do remember dietlibc's implementation of realpath(). But that has
serious side effects that make it not thread-safe. The basic idea they
had was to use chdir() and getcwd() to get the kernel to normalize the
paths without having to read it from procfs. Not needing procfs was one
of the design goals of that project, so that is why they implemented it
that way.

Unfortunately, in some cases chdir() is irreversible (e.g. deleted
working directory), and also, there is only one working directory per
process, so while this is going on, all other threads will have trouble
finding their files. Adding locking to prevent the other threads from
noticing this would be challenging, to say the least, if not outright
impossible. There are just so many places where the working directory
plays a role.

Oh, and one more side effect: While the working directory is switched
elsewhere, another process may unmount the volume containing the
original directory. You could open "." first, to prevent this, but that
adds another two syscalls overhead.

> - ttyname (important to things that use it)
>

I don't see much of an alternative to using procfs for that one. You
could probably search for device and inode of the fd among /dev/tty* and
/dev/pts/* but that seems like a hack. That should probably be at most a
fallback, if the normal way through /proc doesn't work.

> - dynamic linker identifying executable pathname
>

Well, Linux could just pass AT_EXECFN. But if it doesn't, unless they
want to add Solaris' getexecname() syscall, /proc/self/exe is the only
link to the executable file name.

> This is actually a lot less than I expected, and makes it reasonable
> to envision a path to eventually not needing procfs at all.
>
> So, I did the work to figure out what would be needed to write a
> procfs-free realpath, and it turns out that actually writing it was
> not any harder, so I did. Attached is a draft. It needs testing, and
> I'm undecided whether we should keep the old code and just use this as
> a fallback, or just replace it. (The old code has fixed 5-syscall
> overhead and ugly side effects on kernels too old to have O_PATH; new
> code needs one syscall per path component and might (?) have worse or
> different behavior under concurrent changes to the dir tree.)
>
> Some notes:
>
> - Attempts to support all pathnames where no intermediate exceeds
>   PATH_MAX.
>
> - Initial // is treated as special, but //. and //.. resolve to /
>
> - getcwd is expanded initially if pathname is relative. This might be
>   a bad choice since it causes failure whenever pwd is not
>   representable even if the symlink reached via a relative pathname
>   would take us to an absolute path that is representable.

I just checked, and glibc does the same thing. So at least you are in
good company with being unable to handle unreachable working directories
in realpath().

> We could
>   accumulate a relative path, including preserving .. components,
>   until the first absolute-target symlink, and only apply it by
>   prepending (and cancelling ..) at the end if no absolute-target
>   symlink was encountered, but that requires some rework to do.
>
> Thoughts?
>
> Rich

> #define _GNU_SOURCE
> #include <stdlib.h>
> #include <limits.h>
> #include <errno.h>
> #include <unistd.h>
> #include <string.h>
>
> char *realpath(const char *restrict filename, char *restrict resolved)
> {
> 	char output[PATH_MAX], stack[PATH_MAX];
> 	size_t p, q, l, cnt=0;
>
> 	l = strlen(filename);
> 	if (l > sizeof stack) goto toolong;

Shouldn't that be strnlen(), then?

> 	p = sizeof stack - l - 1;
> 	memcpy(stack+p, filename, l+1);
>
> 	if (stack[p] != '/') {
> 		if (getcwd(output, sizeof output) < 0) return 0;
> 		q = strlen(output);
> 	} else {
> 		q = 0;
> 	}
>
> 	while (stack[p]) {
> 		if (stack[p] == '/') {
> 			q=0;
> 			p++;
> 			/* Initial // is special. */
> 			if (stack[p] == '/' && stack[p+1] != '/') {

You already incremented p here. Did you want to test for "///"? The
comment indicated otherwise.

> 				output[q++] = '/';
> 			}
> 			while (stack[p] == '/') p++;
> 		}
> 		char *z = __strchrnul(stack+p, '/');
> 		l = z-(stack+p);
> 		if (l<=2 && stack[p]=='.' && stack[p+l-1]=='.') {
> 			if (l==2) {
> 				while(q>1 && output[q-1]!='/') q--;
> 				if (q>1) q--;
> 			}
> 			p += l;
> 			while (stack[p] == '/') p++;
> 			continue;
> 		}
> 		if (l==1 && stack[p]=='.')
> 		if (l+2 > sizeof output - q) goto toolong;

I believe you forgot to finish the first "if" line here. Also, you have
already handled the "." path at this point.

> 		output[q] = '/';
> 		memcpy(output+q+1, stack+p, l);
> 		output[q+1+l] = 0;
> 		p += l;
> 		ssize_t k = readlink(output, stack, p);
> 		if (k==-1) {
> 			if (errno == EINVAL) {
> 				q += 1+l;
> 				while (stack[p] == '/') p++;
> 				continue;
> 			}
> 			return 0;
> 		}
> 		if (k==p) goto toolong;
> 		if (++cnt == SYMLOOP_MAX) {
> 			errno = ELOOP;
> 			return 0;
> 		}
> 		p -= k;
> 		memmove(stack+p, stack, k);
> 	}
> 	if (!q) output[q++] = '/';
> 	output[q] = 0;
> 	return resolved ? strcpy(resolved, output) : strdup(output);
>
> toolong:
> 	errno = ENAMETOOLONG;
> 	return 0;
> }

Ciao,
Markus

  reply	other threads:[~2020-09-08 19:28 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-08 17:19 Rich Felker
2020-09-08 19:27 ` Markus Wichmann [this message]
2020-09-08 20:17   ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200908192746.GA7854@voyager \
    --to=nullplan@gmx.net \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).