mailing list of musl libc
 help / color / mirror / code / Atom feed
* [PATCH] In glob(), do not require that the target of a symlink exists.
@ 2019-07-23  0:37 James Y Knight
  2019-07-30 16:04 ` James Y Knight
  2019-08-06  2:03 ` Rich Felker
  0 siblings, 2 replies; 5+ messages in thread
From: James Y Knight @ 2019-07-23  0:37 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 611 bytes --]

Previously, when given a trailing path component with no
metacharacters, glob would call stat to check if the provided filename
existed, which would fail on a broken symlink. When expanding a
pattern however, glob would trust the list of files returned by
readdir, and thus would return broken symlinks.

Now, be consistent and allow broken symlinks in both cases, by using
lstat to determine file existence.

If GLOB_MARK is specified, stat must still be used to determine
whether a given name refers to a directory or file. But if that fails,
a symlink is simply considered to be a file for marking purposes.

[-- Attachment #2: 0001-In-glob-do-not-require-that-the-target-of-a-symlink-.patch --]
[-- Type: application/octet-stream, Size: 2736 bytes --]

From 7ef0fabf173f4d29461b01ccfb05c1f19977ba37 Mon Sep 17 00:00:00 2001
From: James Y Knight <jyknight@google.com>
Date: Thu, 11 Jul 2019 16:55:17 -0400
Subject: [PATCH] In glob(), do not require that the target of a symlink
 exists.

Previously, when given a trailing path component with no
metacharacters, glob would call stat to check if the provided filename
existed, which would fail on a broken symlink. When expanding a
pattern however, glob would trust the list of files returned by
readdir, and thus would return broken symlinks.

Now, be consistent and allow broken symlinks in both cases, by using
lstat to determine file existence.

If GLOB_MARK is specified, stat must still be used to determine
whether a given name refers to a directory or file. But if that fails,
a symlink is simply considered to be a file for marking purposes.
---
 src/regex/glob.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/src/regex/glob.c b/src/regex/glob.c
index aa1c6a44..01b178dc 100644
--- a/src/regex/glob.c
+++ b/src/regex/glob.c
@@ -88,18 +88,33 @@ static int do_glob(char *buf, size_t pos, int type, char *pat, int flags, int (*
 	}
 	buf[pos] = 0;
 	if (!*pat) {
-		/* If we consumed any components above, or if GLOB_MARK is
-		 * requested and we don't yet know if the match is a dir,
-		 * we must call stat to confirm the file exists and/or
-		 * determine its type. */
+		/* If we're in at the end of the pattern, check if the file
+		 * exists. And, if GLOB_MARK is requested, determine if it's a
+		 * directory. */
 		struct stat st;
-		if ((flags & GLOB_MARK) && type==DT_LNK) type = 0;
-		if (!type && stat(buf, &st)) {
-			if (errno!=ENOENT && (errfunc(buf, errno) || (flags & GLOB_ERR)))
-				return GLOB_ABORTED;
-			return 0;
+		if (flags & GLOB_MARK) {
+			/* We need to know if the name refers to a directory
+			 * (after resolving symlinks).
+			 *
+			 * However, broken symlinks should be considered to be a
+			 * file, rather than non-existent or an error, so if
+			 * stat fails, we just don't modify type. (And lstat
+			 * will be called below if required.)
+			 */
+			if ((!type || type==DT_LNK) && stat(buf, &st) == 0) {
+				if(S_ISDIR(st.st_mode)) type = DT_DIR;
+				else type = DT_REG;
+			}
+		}
+		if (!type) {
+			/* If we're don't already know that the file exists,
+			 * confirm its presence. */
+			if (lstat(buf, &st)) {
+				if (errno!=ENOENT && (errfunc(buf, errno) || (flags & GLOB_ERR)))
+					return GLOB_ABORTED;
+				return 0;
+			}
 		}
-		if (!type && S_ISDIR(st.st_mode)) type = DT_DIR;
 		if (append(tail, buf, pos, (flags & GLOB_MARK) && type==DT_DIR))
 			return GLOB_NOSPACE;
 		return 0;
-- 
2.22.0.657.g960e92d24f-goog


[-- Attachment #3: 0001-Add-some-basic-tests-for-glob.patch --]
[-- Type: application/octet-stream, Size: 6876 bytes --]

From 4e828b7230d2d8dfebec87aa9d262ee10b8e45a4 Mon Sep 17 00:00:00 2001
From: James Y Knight <jyknight@google.com>
Date: Mon, 22 Jul 2019 20:25:38 -0400
Subject: [PATCH] Add some basic tests for glob.

---
 src/functional/glob.c                         | 135 ++++++++++++++++++
 src/functional/glob.data/basic/dir/file       |   0
 src/functional/glob.data/basic/file           |   0
 src/functional/glob.data/basic/symlink-dir    |   1 +
 src/functional/glob.data/basic/symlink-file   |   1 +
 .../glob.data/broken-links/broken-symlink     |   1 +
 src/functional/glob.data/weird-chars/foo-[a]  |   0
 src/functional/glob.data/weird-chars/foo-a    |   0
 8 files changed, 138 insertions(+)
 create mode 100644 src/functional/glob.c
 create mode 100644 src/functional/glob.data/basic/dir/file
 create mode 100644 src/functional/glob.data/basic/file
 create mode 120000 src/functional/glob.data/basic/symlink-dir
 create mode 120000 src/functional/glob.data/basic/symlink-file
 create mode 120000 src/functional/glob.data/broken-links/broken-symlink
 create mode 100644 src/functional/glob.data/weird-chars/foo-[a]
 create mode 100644 src/functional/glob.data/weird-chars/foo-a

diff --git a/src/functional/glob.c b/src/functional/glob.c
new file mode 100644
index 0000000..69ffea5
--- /dev/null
+++ b/src/functional/glob.c
@@ -0,0 +1,135 @@
+#include <string.h>
+#include <stdlib.h>
+#include <glob.h>
+#include <stdio.h>
+
+#include "test.h"
+
+glob_t globbuf;
+
+int paths_match(int expected_num, const char **expected) {
+	if (globbuf.gl_pathc + globbuf.gl_offs != expected_num)
+		return 0;
+	for (int i = 0; i < expected_num; ++i) {
+		if ((globbuf.gl_pathv[i] == NULL) != (expected[i] == NULL))
+			return 0;
+		if (expected[i] && strcmp(globbuf.gl_pathv[i], expected[i]) != 0)
+			return 0;
+	}
+	return 1;
+}
+
+void print_mismatch(int expected_num, const char **expected) {
+	t_printf("Expected paths (%d):\n", expected_num);
+	for (int i = 0; i < expected_num; ++i)
+		t_printf("  %s\n", expected[i] ? expected[i] : "NULL");
+	t_printf("Got paths (%d):\n", globbuf.gl_pathc + globbuf.gl_offs);
+	for (int i = 0; i < globbuf.gl_pathc + globbuf.gl_offs; ++i)
+		t_printf("  %s\n", globbuf.gl_pathv ? globbuf.gl_pathv[i] : "NULL");
+}
+
+#define TEST_NOFREE(pat, flags, expected_ret, ...) do {			\
+	int tflags = flags|extraflags;					\
+	int ret = glob(pat, tflags, NULL, &globbuf);			\
+	const char *e[] = __VA_ARGS__;					\
+	if (ret != expected_ret)					\
+		t_error("glob(\"%s\", %d) failed (return %d, expected %d)\n", \
+			pat, tflags, ret, expected_ret);		\
+	if (!paths_match(sizeof(e)/sizeof(*e), e)) {			\
+		t_error("glob(\"%s\", %d) returned unexpected paths.\n", \
+			pat, tflags);					\
+		print_mismatch(sizeof(e)/sizeof(*e), e);		\
+	}								\
+} while(0)
+
+#define TEST(pat, flags, expected_ret, ...) do {		\
+	TEST_NOFREE(pat, flags, expected_ret, __VA_ARGS__);	\
+	globfree(&globbuf);					\
+} while(0)
+
+#define TEST_UNMATCH(pat) do {					\
+	if (extraflags & GLOB_NOCHECK)				\
+		TEST(pat, extraflags, 0, {pat});		\
+	else							\
+		TEST(pat, extraflags, GLOB_NOMATCH, {});	\
+} while(0)
+
+#define MARK(str) ((extraflags & GLOB_MARK) ? str "/" : str)
+
+static void do_tests(int extraflags) {
+	// Check basic functioning of paths
+	TEST("basic/file", 0,
+	     0, {"basic/file"});
+	TEST("basic/symlink-file", 0,
+	     0, {"basic/symlink-file"});
+	TEST("basic/dir", 0,
+	     0, {MARK("basic/dir")});
+	TEST("basic/symlink-dir", 0,
+	     0, {MARK("basic/symlink-dir")});
+	// With trailing slash input, always returns a trailing slash
+	TEST("basic/dir/", 0,
+	     0, {"basic/dir/"});
+
+	// And patterns...
+	TEST("basic/*", 0,
+	     0, {MARK("basic/dir"), "basic/file", MARK("basic/symlink-dir"), "basic/symlink-file"});
+	TEST("basic/d?r", 0,
+	     0, {MARK("basic/dir")});
+	TEST("basic/[df]??", 0,
+	     0, {MARK("basic/dir")});
+	TEST("basic/fi[l]e", 0,
+	     0, {"basic/file"});
+	TEST("basic/*/", 0,
+	     0, {"basic/dir/", "basic/symlink-dir/"});
+	TEST("basic/*/*", 0,
+	     0, {"basic/dir/file", "basic/symlink-dir/file"});
+
+	TEST("weird-chars/foo-[a]", 0,
+	     0, {"weird-chars/foo-a"});
+	TEST("weird-chars/foo-\\[a\\]", 0,
+	     0, {"weird-chars/foo-[a]"});
+
+	// Non-matching patterns
+	TEST_UNMATCH("basic/not-there");
+	TEST_UNMATCH("basic/not-there*");
+	TEST_UNMATCH("basic/not-there/*");
+	// A file can't be specified with a trailing slash
+	TEST_UNMATCH("basic/file/");
+
+	// Check GLOB_APPEND, GLOB_DOOFFS, and both together
+	globbuf.gl_offs = 2; // should be ignored without GLOB_DOOFFS
+	TEST_NOFREE("basic/file", 0,
+		    0, {"basic/file"});
+	TEST("basic/dir", GLOB_APPEND,
+	     0, {"basic/file", MARK("basic/dir")});
+
+	globbuf.gl_offs = 2;
+	TEST_NOFREE("basic/file", GLOB_DOOFFS,
+		    0, {NULL, NULL, "basic/file"});
+	TEST("basic/dir", GLOB_APPEND|GLOB_DOOFFS,
+	     0, {NULL, NULL, "basic/file", MARK("basic/dir")});
+
+	// Check proper support for broken symlinks. Both completely-specified,
+	// and pattern matched.
+	TEST("broken-links/broken-symlink", 0,
+	     0, {"broken-links/broken-symlink"});
+
+	TEST("broken-links/*", 0,
+	     0, {"broken-links/broken-symlink"});
+}
+
+
+int main(int argc, char **argv)
+{
+	char buf[512];
+	if (!t_pathrel(buf, sizeof buf, argv[0], "glob.data")) {
+		t_error("failed to obtain relative path to glob data\n");
+		return 1;
+	}
+	chdir(buf);
+
+	do_tests(0);
+	do_tests(GLOB_MARK);
+	do_tests(GLOB_NOCHECK);
+	return t_status;
+}
diff --git a/src/functional/glob.data/basic/dir/file b/src/functional/glob.data/basic/dir/file
new file mode 100644
index 0000000..e69de29
diff --git a/src/functional/glob.data/basic/file b/src/functional/glob.data/basic/file
new file mode 100644
index 0000000..e69de29
diff --git a/src/functional/glob.data/basic/symlink-dir b/src/functional/glob.data/basic/symlink-dir
new file mode 120000
index 0000000..8724519
--- /dev/null
+++ b/src/functional/glob.data/basic/symlink-dir
@@ -0,0 +1 @@
+dir
\ No newline at end of file
diff --git a/src/functional/glob.data/basic/symlink-file b/src/functional/glob.data/basic/symlink-file
new file mode 120000
index 0000000..1a010b1
--- /dev/null
+++ b/src/functional/glob.data/basic/symlink-file
@@ -0,0 +1 @@
+file
\ No newline at end of file
diff --git a/src/functional/glob.data/broken-links/broken-symlink b/src/functional/glob.data/broken-links/broken-symlink
new file mode 120000
index 0000000..456794e
--- /dev/null
+++ b/src/functional/glob.data/broken-links/broken-symlink
@@ -0,0 +1 @@
+no-such-file
\ No newline at end of file
diff --git a/src/functional/glob.data/weird-chars/foo-[a] b/src/functional/glob.data/weird-chars/foo-[a]
new file mode 100644
index 0000000..e69de29
diff --git a/src/functional/glob.data/weird-chars/foo-a b/src/functional/glob.data/weird-chars/foo-a
new file mode 100644
index 0000000..e69de29
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] In glob(), do not require that the target of a symlink exists.
  2019-07-23  0:37 [PATCH] In glob(), do not require that the target of a symlink exists James Y Knight
@ 2019-07-30 16:04 ` James Y Knight
  2019-07-30 16:13   ` Rich Felker
  2019-08-06  2:03 ` Rich Felker
  1 sibling, 1 reply; 5+ messages in thread
From: James Y Knight @ 2019-07-30 16:04 UTC (permalink / raw)
  To: musl

Ping -- is this patch ok?

On Mon, Jul 22, 2019 at 8:37 PM James Y Knight <jyknight@google.com> wrote:
>
> Previously, when given a trailing path component with no
> metacharacters, glob would call stat to check if the provided filename
> existed, which would fail on a broken symlink. When expanding a
> pattern however, glob would trust the list of files returned by
> readdir, and thus would return broken symlinks.
>
> Now, be consistent and allow broken symlinks in both cases, by using
> lstat to determine file existence.
>
> If GLOB_MARK is specified, stat must still be used to determine
> whether a given name refers to a directory or file. But if that fails,
> a symlink is simply considered to be a file for marking purposes.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Re: [PATCH] In glob(), do not require that the target of a symlink exists.
  2019-07-30 16:04 ` James Y Knight
@ 2019-07-30 16:13   ` Rich Felker
  2019-07-31 20:35     ` Stephane Chazelas
  0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2019-07-30 16:13 UTC (permalink / raw)
  To: musl

On Tue, Jul 30, 2019 at 12:04:57PM -0400, James Y Knight wrote:
> Ping -- is this patch ok?

I think the logic is right, but I need to look again. I was confused
by the comment changes of the patch.

Rich


> On Mon, Jul 22, 2019 at 8:37 PM James Y Knight <jyknight@google.com> wrote:
> >
> > Previously, when given a trailing path component with no
> > metacharacters, glob would call stat to check if the provided filename
> > existed, which would fail on a broken symlink. When expanding a
> > pattern however, glob would trust the list of files returned by
> > readdir, and thus would return broken symlinks.
> >
> > Now, be consistent and allow broken symlinks in both cases, by using
> > lstat to determine file existence.
> >
> > If GLOB_MARK is specified, stat must still be used to determine
> > whether a given name refers to a directory or file. But if that fails,
> > a symlink is simply considered to be a file for marking purposes.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Re: [PATCH] In glob(), do not require that the target of a symlink exists.
  2019-07-30 16:13   ` Rich Felker
@ 2019-07-31 20:35     ` Stephane Chazelas
  0 siblings, 0 replies; 5+ messages in thread
From: Stephane Chazelas @ 2019-07-31 20:35 UTC (permalink / raw)
  To: musl

2019-07-30 12:13:36 -0400, Rich Felker:
> On Tue, Jul 30, 2019 at 12:04:57PM -0400, James Y Knight wrote:
[...]
> > > Now, be consistent and allow broken symlinks in both cases, by using
> > > lstat to determine file existence.
[...]

Hello,

Note that there's a related discussion on the POSIX mailing list
where that unusual behaviour of musl libc (of using stat()
instead of lstat()) was noted:

https://www.mail-archive.com/austin-group-l@opengroup.org/msg04585.html

The discussion itself was about what is reported with GLOB_ERR
or when an errfunc is passed where the current POSIX spec is
unclear and implementations differ. The austin-group bug is
http://austingroupbugs.net/view.php?id=1273 where I argue
implementations shouldn't report ENOTDIR errors at least even
with GLOB_ERR.

musl is quite unique in that it does report the errors of
stat() in addition to those of opendir(). While POSIX currently
kind of implies it shouldn't, it seems to me it would make for a
more useful interface. That is still being discussed there.

musl's glob will still report errors for system calls failing
with ENOTDIR if it encounters them. At the moment, in globs like
*/*.c, and on systems and file systems where readdir() returns
the type of entries, it will generally not report ENOTDIR errors
because it wouldn't call opendir() on non-directory files in the
first place. But it would on systems where readdir() doesn't
return the type of entries or if there's a symlink to a
non-directory file in the current directory, which makes for a
not very consistent interface.

IMO, it should also ignore ENOTDIR errors (for both opendir()
and lstat()) as a backup even if that means that no error would
be reported for things like /etc/passwd/*

Most implementations ignore ENOTDIR errors, even dietlibc that
even tries harder than musl to avoid calling opendir() on
non-directory files.

-- 
Stephane


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] In glob(), do not require that the target of a symlink exists.
  2019-07-23  0:37 [PATCH] In glob(), do not require that the target of a symlink exists James Y Knight
  2019-07-30 16:04 ` James Y Knight
@ 2019-08-06  2:03 ` Rich Felker
  1 sibling, 0 replies; 5+ messages in thread
From: Rich Felker @ 2019-08-06  2:03 UTC (permalink / raw)
  To: musl

On Mon, Jul 22, 2019 at 08:37:38PM -0400, James Y Knight wrote:
> Previously, when given a trailing path component with no
> metacharacters, glob would call stat to check if the provided filename
> existed, which would fail on a broken symlink. When expanding a
> pattern however, glob would trust the list of files returned by
> readdir, and thus would return broken symlinks.
> 
> Now, be consistent and allow broken symlinks in both cases, by using
> lstat to determine file existence.
> 
> If GLOB_MARK is specified, stat must still be used to determine
> whether a given name refers to a directory or file. But if that fails,
> a symlink is simply considered to be a file for marking purposes.

> From 7ef0fabf173f4d29461b01ccfb05c1f19977ba37 Mon Sep 17 00:00:00 2001
> From: James Y Knight <jyknight@google.com>
> Date: Thu, 11 Jul 2019 16:55:17 -0400
> Subject: [PATCH] In glob(), do not require that the target of a symlink
>  exists.
> 
> Previously, when given a trailing path component with no
> metacharacters, glob would call stat to check if the provided filename
> existed, which would fail on a broken symlink. When expanding a
> pattern however, glob would trust the list of files returned by
> readdir, and thus would return broken symlinks.
> 
> Now, be consistent and allow broken symlinks in both cases, by using
> lstat to determine file existence.
> 
> If GLOB_MARK is specified, stat must still be used to determine
> whether a given name refers to a directory or file. But if that fails,
> a symlink is simply considered to be a file for marking purposes.
> ---
>  src/regex/glob.c | 35 +++++++++++++++++++++++++----------
>  1 file changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/src/regex/glob.c b/src/regex/glob.c
> index aa1c6a44..01b178dc 100644
> --- a/src/regex/glob.c
> +++ b/src/regex/glob.c
> @@ -88,18 +88,33 @@ static int do_glob(char *buf, size_t pos, int type, char *pat, int flags, int (*
>  	}
>  	buf[pos] = 0;
>  	if (!*pat) {
> -		/* If we consumed any components above, or if GLOB_MARK is
> -		 * requested and we don't yet know if the match is a dir,
> -		 * we must call stat to confirm the file exists and/or
> -		 * determine its type. */
> +		/* If we're in at the end of the pattern, check if the file
> +		 * exists. And, if GLOB_MARK is requested, determine if it's a
> +		 * directory. */

This is one of the comment changes I was confused about; it seems less
informative. The point of the original comment was not describing the
controlling expression just above it, but the conditions on not being
able to use the caller-passed type -- having consumed any new literal
components from the pattern above is what precludes it.

>  		struct stat st;
> -		if ((flags & GLOB_MARK) && type==DT_LNK) type = 0;
> -		if (!type && stat(buf, &st)) {
> -			if (errno!=ENOENT && (errfunc(buf, errno) || (flags & GLOB_ERR)))
> -				return GLOB_ABORTED;
> -			return 0;
> +		if (flags & GLOB_MARK) {
> +			/* We need to know if the name refers to a directory
> +			 * (after resolving symlinks).
> +			 *
> +			 * However, broken symlinks should be considered to be a
> +			 * file, rather than non-existent or an error, so if
> +			 * stat fails, we just don't modify type. (And lstat
> +			 * will be called below if required.)
> +			 */
> +			if ((!type || type==DT_LNK) && stat(buf, &st) == 0) {

!stat

> +				if(S_ISDIR(st.st_mode)) type = DT_DIR;
> +				else type = DT_REG;
> +			}
> +		}
> +		if (!type) {
> +			/* If we're don't already know that the file exists,
> +			 * confirm its presence. */
> +			if (lstat(buf, &st)) {

This could be "!type && lstat(..." to avoid the excessively long lines
below and to better mimic the old structure so that stylistic change
doesn't obscure reading of functional change.

Now that I look at it, the same applies to the above too.

> +				if (errno!=ENOENT && (errfunc(buf, errno) || (flags & GLOB_ERR)))
> +					return GLOB_ABORTED;
> +				return 0;
> +			}
>  		}
> -		if (!type && S_ISDIR(st.st_mode)) type = DT_DIR;
>  		if (append(tail, buf, pos, (flags & GLOB_MARK) && type==DT_DIR))
>  			return GLOB_NOSPACE;
>  		return 0;
> -- 
> 2.22.0.657.g960e92d24f-goog
> 

Otherwise I think this looks ok.

Rich


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-08-06  2:03 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-23  0:37 [PATCH] In glob(), do not require that the target of a symlink exists James Y Knight
2019-07-30 16:04 ` James Y Knight
2019-07-30 16:13   ` Rich Felker
2019-07-31 20:35     ` Stephane Chazelas
2019-08-06  2:03 ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).