From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/14440 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: [PATCH] glob: implement GLOB_TILDE Date: Wed, 24 Jul 2019 09:00:34 -0400 Message-ID: <20190724130034.GC1506@brightrain.aerifal.cx> References: <20190723183354.31633-1-ismael@iodev.co.uk> <20190723200708.GA25787@brightrain.aerifal.cx> <20190724071522.GA24370@pirotess.home> Reply-To: musl@lists.openwall.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="264756"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Mutt/1.5.21 (2010-09-15) To: musl@lists.openwall.com Original-X-From: musl-return-14456-gllmg-musl=m.gmane.org@lists.openwall.com Wed Jul 24 15:00:55 2019 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.89) (envelope-from ) id 1hqGt0-0016OX-DZ for gllmg-musl@m.gmane.org; Wed, 24 Jul 2019 15:00:50 +0200 Original-Received: (qmail 28569 invoked by uid 550); 24 Jul 2019 13:00:47 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 28513 invoked from network); 24 Jul 2019 13:00:46 -0000 Content-Disposition: inline In-Reply-To: <20190724071522.GA24370@pirotess.home> Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:14440 Archived-At: On Wed, Jul 24, 2019 at 09:15:22AM +0200, Ismael Luceno wrote: > On 23/Jul/2019 16:07, Rich Felker wrote: > > On Tue, Jul 23, 2019 at 08:33:54PM +0200, Ismael Luceno wrote: > > > Signed-off-by: Ismael Luceno > > > --- > > > include/glob.h | 1 + > > > src/regex/glob.c | 26 ++++++++++++++++++++++++++ > > > 2 files changed, 27 insertions(+) > > > > > > diff --git a/include/glob.h b/include/glob.h > > > index 76f6c1c68a23..fc8106b20c5b 100644 > > > --- a/include/glob.h > > > +++ b/include/glob.h > > > @@ -30,6 +30,7 @@ void globfree(glob_t *); > > > #define GLOB_APPEND 0x20 > > > #define GLOB_NOESCAPE 0x40 > > > #define GLOB_PERIOD 0x80 > > > +#define GLOB_TILDE 0x100 > > > > > > #define GLOB_NOSPACE 1 > > > #define GLOB_ABORTED 2 > > > diff --git a/src/regex/glob.c b/src/regex/glob.c > > > index aa1c6a4482ee..77b81f707420 100644 > > > --- a/src/regex/glob.c > > > +++ b/src/regex/glob.c > > > @@ -8,6 +8,8 @@ > > > #include > > > #include > > > #include > > > +#include > > > +#include "../passwd/pwf.h" > > > > > > struct match > > > { > > > @@ -35,6 +37,30 @@ static int do_glob(char *buf, size_t pos, int type, char *pat, int flags, int (* > > > /* If GLOB_MARK is unused, we don't care about type. */ > > > if (!type && !(flags & GLOB_MARK)) type = DT_REG; > > > > > > + if ((flags & GLOB_TILDE) && *pat == '~') { > > > + struct passwd pw, *res; > > > + char *buf = NULL; > > > + size_t len = 0; > > > + char *name_end = strchr(++pat, '/'); > > > + if (name_end) > > > + *name_end = 0; > > > + uid_t uid = *pat ? 0 : getuid(); > > > + int err = __getpw_a(pat, uid, &pw, &buf, &len, &res); > > > + if (!err && res) { > > > + while (pos < PATH_MAX - 1 && *pw.pw_dir) > > > + buf[pos++] = *pw.pw_dir++; > > > + } > > > + free(buf); > > > + if (err) > > > + return GLOB_NOSPACE; > > > + if (!res) > > > + return 0; > > > + if (name_end) { > > > + pat = name_end; > > > + *pat = '/'; > > > + } > > > + } > > > > This should almost surely go in the top-level glob function rather > > than the recusive function, where the original strdup of pat happens. > > As you've written it, I think it wrongly applies tilde expansion to > > each path component into which recursion happens, but not to ones > > where it doesn't, whereas it should only apply to the first component. > > Doing this in the recursive function also has the problem of exploding > > stack usage. One thing I don't see is how the code is working at all > > right now, because you've shadowed the argument buf with a locally > > scoped variable by the same name that's the working space buffer > > passed to __getpw_a. The result directory is never copied out into the > > actual buf array. > > Indeed it doesn't work XD. > > I meant it more as a quick try and to see if it was interesting and the > right kind of approach. It is interesting. Adding this functionality is a tradeoff, since it does pull in linking of more stuff when glob is used even without GLOB_TILDE, but since glob already inherently depends on dynamic allocation it's not a huge issue, and this has already been planned functionality for a while (it's on the roadmap). The approach is somewhat right, modulo points in my previous review I think. > > One thing to note at this point: it's generally preferable to use the > > public interfaces (getpwnam_r) rather than the internal ones from a > > different sybsystem of libc (__getpw_a) unless there's a really > > compelling reason not to. Here I think it would actually be easier > > using the public interface -- you could reuse buf for the buffer space > > you pass to getpwnam_r, then memmove from pw_dir (which lies somewhere > > inside buf, you don't know where) to the start of buf. > > The reason I had to use the private interface is to avoid dealing with > the buffer allocation on the side of the glob function. As noted, I think you can just reuse buf[] and avoid any dynamic allocation visible in glob (of course it happens internally in getpwnam_r but that's hidden). The only way it's not going to fit is if the user's homedir doesn't fit (or barely fits) in PATH_MAX, meaning they wouldn't be able to access anything in it with absolute paths to begin with. > It would be better to keep it that way. To clarify, the reason to avoid doing this unless there's a really strong motivation is to avoid cross-component coupling. If for example someone needed to change the signature of __getpw_a as part of changes in the pwd implementation, they'd have to figure out how glob is using it and how to adapt glob to the new change, rather than just focusing on the subsystem they're working on and familiar with. By using public interfaces, you avoid that. > > There's also a matter of intended behavor: my understanding is that > > plain ~ without a name is supposed to expand to $HOME, not lookup the > > caller's passwd entry based on uid (which is broken on setups where > > you have multiple login accounts with the same uid). The man page > > doesn't document this at all. I looked just now in the glibc manual > > and it documents the behavior in terms of getlogin+getpwnam, which is > > utterly broken -- it doesn't work at all in the absence of a login > > session associated with a terminal (except in musl where we wrongly > > just return getenv("LOGNAME"), a known/open bug). and it's also > > inconsistent with the standard shell behavior, which is to use $HOME: > > > > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_01 > > > > So I feel like, if we implement GLOB_TILDE, we should do it the way > > that's consistent with the standard meaning of ~, using $HOME, not the > > glibc implementation. This is also much more user-friendly, in that it > > honors manual override of $HOME. > > I'm not entirely comfortable expanding HOME unconditionally, as it could > easily be exploited into a vulnerability for any privileged (e.g. > setuid) process. I understand and appreciate your concern, but I don't think that can affect any use that's not already horribly insecure for other reasons. If glob is being used with GLOB_TILDE in a suid context, one of the following holds: - The input pattern is provided by the user, in which case the user already fully controls the output, regardless of whether it contains ~ or how ~ expands. - The input pattern is fixed and does not start with plain ~, in which case how ~ is expanded is irrelevant. - The input pattern is fixed and does start with plain ~, in which case the program is explicitly requesting output that's under the control of the invoking user by virtue of controlling the contents of their home directory. Naturally the cases where the suid program is operating on data controlled by the invoking user are usually a bad idea, and if done at all need to be done under extreme care, but I don't see how honoring the standard meaning of ~ makes that any moreso. The outputs of glob are always literal pathname strings, and have to be treated as such, not pasted into command lines or used as format strings or anything like that. Rich