From: Jacob Moody
Date: Tue, 31 May 2022 09:09:03 -0600
To: 9front@9front.org
Subject: Re: [9front] [PATCH] private /srv attach option
Message-ID: <475b654b-641e-6f08-c5bc-bcf6aab4f51a@posixcafe.org>
References: <847F45EC7225C1D0B69E796B69D6E3ED@eigenstate.org>

I spent the better
half of yesterday thinking about how /srv fits into the current work.

First, let's talk about how we go about building an isolated namespace as is. The general idea is to use only the root folders provided by #/, then selectively bind in the necessities from the real root, then drop any reference to the real root from the namespace. This is roughly how the current tcp80.namespace works, but this solution does not scale too well to programs that require a lot of file system state. An example is generally anything required out of /lib and/or /sys: these are not folders given by #/, and binding in the entirety of them from the real root isn't right either.

So how do we set up the correct state in /? One way is to build a skeleton hierarchy, stashed somewhere, that can be unioned on to / and used as targets for binds from the real root in the namespace file. This works pretty well for third party programs (werc would be an example), but shipping a skeleton for programs in tree sucks.

So what else do we have? Well, there is always mntgen. But mntgen has some issues:

1. There is some data leakage: what one client is using shows through in which folders show up when listing the root. Likely a somewhat easy fix.
2. Mntgen only makes up directories at the root; we would want to extend this to child directories. Also sounds like a doable fix.
3. Mntgen is a userspace program; there is no facility for getting a fresh one in namespace files. You have to have one sitting in /srv somewhere for it to be useful.

This last point is interesting and one I wanted to discuss more. We could just provide a 'mntgen' command to namespace files to get a private mount of mntgen. But you wouldn't want to do this if you are rebuilding the namespace on each connection. A program wishing to do this could set up a mntgen/ramfs with the correct skeleton in a private /srv, to avoid having to share it with the rest of the system.
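
As a rough illustration of the skeleton approach described above, a namespace file might look something like the sketch below. This is untested and is not the real tcp80.namespace: /lib/skel/example and /sys/lib/example are made-up paths, and it assumes the real root is served on #s/boot the way the stock /lib/namespace expects.

```
# untested sketch; the 'example' paths are hypothetical

# mount the real root out of the way, without unioning it onto /
mount -aC #s/boot /root $rootspec

# union an empty skeleton (providing /lib, /sys, ...) after #/
bind -a /root/lib/skel/example /

# bind only the necessities from the real root onto those targets
bind /root/$cputype/bin /bin
bind /root/sys/lib/example /sys/lib/example

# drop the reference to the real root
unmount /root
```

The skeleton only needs to contain empty directories to act as bind targets; everything the program actually reads still comes from the explicit binds.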

Now, is the restriction of not being able to recurse through subsequent private /srv's once the global /srv is blocked an issue? I don't really think it is. I explicitly do not want chdev to proliferate to every program on the system like OpenBSD's pledge. It is designed to be used at the very edge of the process tree, for leaf processes that are handling potentially malicious input, in part because we never allow a process, or its children, to regain access to something that has been lost. This is to say that a program should know where its process tree ends, and what capabilities its children will need. So you either end up with:

	bind -c '#sp' /srv
	chdev -r 's'

in the 'setup' phase of the parent, if you know your children will never need to make a further private /srv, or, if you do need to create a further private /srv, you simply kick that code snippet down the process tree to the last place you need to instantiate the private /srv. If you know the child will want a private /srv but the parent will not, fork the child with RFNAMEG, then do 'chdev -r s' in the parent after the fork call.

This is only an issue if you:

1. Don't know what capabilities your children want. At which point chdev isn't very helpful to begin with.
2. Conditionally make a private /srv after you've begun reading potentially malicious input. I think you kinda get what you ought to here.

I'm just trying to really rack my brain for a scenario where you need to leave the global /srv on the table because you don't know if your children will want another private /srv. If others have examples, please let me know. I am pushing more because I _want_ it to be good enough for devsrv to just keep track of sessions itself. I don't expect devsrv to be the last device we may want private sessions for, and growing a new process group per device is not ideal.
But maybe we can do a bit better than this. I am going to poke around with an in-tree way (ctl file or similar) to see if that shakes out to being a bit nicer. But I will say I do prefer the semantics of an attach option for getting a different 'tree' of /srv; it feels right.

My grand plan is as you laid out: I want to turn namespaces into security boundaries, and for the most part we're there now. Chdev allows namespaces to be security boundaries. The tricky part that we're trying to decide now is what tools we want to give in order to make these types of namespaces easier to build. It is much harder to lay out a plan for this. This mail is an attempt to lay out what I have in my head in this regard. Which programs we choose to grow new features in depends on their implementation quirks, so that our new use case does not impose too much on the existing code. So a lot of it turns into 'see what the low hanging fruits are' and reflecting on how/if these modifications produce a worthwhile tool.

So there's the big thought dump, pick it apart as desired.

Thanks,
moody