Date: Wed, 25 Nov 2020 08:40:02 +0300
From: Alexey Izbyshev
To: musl@lists.openwall.com
In-Reply-To: <20201124203132.GE534@brightrain.aerifal.cx>
Subject: Re: [musl] realpath without procfs -- should be ready for inclusion

On 2020-11-24 23:31, Rich Felker wrote:
> On Mon, Nov 23, 2020 at 11:26:46PM -0500, Rich Felker wrote:
>> On Tue, Nov 24, 2020 at 06:39:59AM +0300, Alexey Izbyshev wrote:
>> > * ENOTDIR should be returned if the last component is not a
>> > directory and the path has one or more trailing slashes
>>
>> Yes, that's precisely what I've been working on the past couple hours.
>> I think you missed but ..
>> will also erase a path component that's not
>> a dir (e.g. /dev/null/.. -> /dev) and these are both instances of a
>> common problem. I thought use of readlink covered all the ENOTDIR
>> cases but it doesn't when the next component isn't covered by readlink
>> or isn't present at all.
>>
>> It's trivial to fix with a check after each component but that doubles
>> the number of syscalls and mostly isn't necessary. I have a reworked
>> draft to fix the problem by advancing over /(/|./|.$)* rather than
>> just /+ after each component, so that we can lookahead and do an extra
>> readlink in the cases that need it.
>
> While this worked, it ended up being the wrong thing to do, making two
> places where readlink is called, one of them with a dummy buffer. The
> right way to do it is rework the flow so that the existing readlink is
> "naturally" hit where needed. This amounts to:
>
> - Letting .. processing that cancels path components go through the
>   same code path as new path components, rather than handling it
>   early, and just skipping the actual readlink if we already know we
>   have a dir.
>
> - Also treating a zero-length final component as something that goes
>   through the readlink code path.
>
> There was a fair amount of reorganizing needed to make this work out,
> but the end result is clean and non-redundant and code size is almost
> the same as before with the missing-ENOTDIR bugs.
>
> Speaking of code size, on 32-bit archs the proposed explicit realpath
> is roughly the same size as stat+fstat+fstatat (a little over 1k on
> i386), which were needed to implement the old lazy realpath in terms
> of procfs. So for minimal static linking, resulting code size may be
> same or smaller. (Of course it's larger if stat is already linked for
> other reasons.)
>
> New draft attached. It's possible that there are regressions since I
> haven't put together an automated testset.
> I'm not sure if I'll try to
> merge it in this release cycle still or not; that probably depends on
> how easy or difficult automating these tests ends up being.
>

The new draft looks good to me. I've also done some basic manual
testing (not covering all proposed cases) and haven't found any issues.

I don't see why the size of stack has to be PATH_MAX+1, though. To
address the issue with symlink targets of PATH_MAX-1 length, it seems
sufficient to just do the following:

-		ssize_t k = readlink(output, stack, p);
-		if (k==p) goto toolong;
+		ssize_t k = readlink(output, stack, p+1);
+		if (k==p+1) goto toolong;

Since p is never past the end of the stack, a buffer size of p+1 stays
in bounds. readlink writes only k bytes, so stack[p] is overwritten
only when k == p+1, and that case is rejected as too long anyway.
Hence there is no harm in allowing k == p.

Alexey