On 1 December 2015 at 03:24, arisawa wrote:
> current kernel allows unmount even if after rfork m.
> this feature makes sandboxing difficult.
> can anyone explain this feature is necessary?

For some applications, similar to ftp or http, I've built a name space somewhere and then bound that to /. As you say, that only works if no code runs that could unmount that binding.

The approach to encapsulating arbitrary code is similar to newns, perhaps even using it: build a new name space from scratch (#/) that has just what's needed in it, and prevent new # names from being bound into it using the rfork option (RFNOMNT, rfork m in rc). The application can then still rearrange it using bind and unmount, but can't escape.

One problem with the existing implementation is that the granularity of some name space components is too coarse. For instance, #p for /proc reveals the names and read-only data even if a process can't operate on them; there's a special little hack to restrict "none", but it's not as clean as it could be.

Another problem is that the set of # names allowed after RFNOMNT is built into the kernel. There are several ways that could be improved, either through new interfaces (e.g., Roger Peppe's "attach files") or through new services to manage the existing scheme. I think encapsulation through computable name spaces ought also to be usable recursively, whereas RFNOMNT is either on or off.