On 1 December 2015 at 03:24, arisawa <arisawa@ar.aichi-u.ac.jp> wrote:

> current kernel allows unmount even if after rfork m.
> this feature makes sandboxing difficult.
> can anyone explain this feature is necessary?
>

For some applications, similar to ftp or http, I've built a name space
somewhere and then bound that to /.
As you say, that approach can't be used to run code that could unmount
that binding.

The approach to encapsulating arbitrary code is similar to newns, perhaps
even using it: build a new name space from scratch (#/) that has just
what's needed in it, then prevent new # names from being bound into it
using the rfork option (RFNOMNT). The application can still rearrange the
name space using bind and unmount, but it can't escape.
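
To make the intended behaviour concrete, here is a toy model in Python. It
is not Plan 9 code and not how the kernel does it; NameSpace, seal and
build_sandbox are names made up for this sketch, and the default allowed
set "|decp" is an assumption standing in for whatever the kernel has
built in. It only illustrates the rule: after sealing, bind and unmount
still work on names already in the name space, but new # names are
refused.

```python
class NameSpace:
    """Toy model of a name space: a mount table plus a 'sealed' flag."""

    def __init__(self, allowed_when_sealed="|decp"):
        self.mounts = {}      # mount point -> list of sources, newest first
        self.sealed = False   # models the state after rfork(RFNOMNT)
        # Device letters still attachable once sealed (assumed set).
        self.allowed_when_sealed = set(allowed_when_sealed)

    def bind(self, new, old):
        # Naming a device with '#' is an attach; refuse it once sealed,
        # unless the device letter is in the allowed set.
        if new.startswith("#") and self.sealed:
            if new[1:2] not in self.allowed_when_sealed:
                raise PermissionError("attach refused after seal: " + new)
        self.mounts.setdefault(old, []).insert(0, new)

    def unmount(self, old):
        # Unmounting is still permitted when sealed.
        self.mounts.pop(old, None)

    def seal(self):
        # One-way switch: there is deliberately no way to unseal.
        self.sealed = True


def build_sandbox():
    """Build a name space from scratch with just what is needed, then seal."""
    ns = NameSpace()
    ns.bind("#/", "/")      # start from the bare root device
    ns.bind("#e", "/env")   # environment
    ns.bind("#I", "/net")   # network, granted explicitly before sealing
    ns.seal()
    return ns
```

In this model a sandboxed program that unmounts /net only removes something
from its own view; it has no way to attach #I again, so rearranging the
name space can shrink it but never widen it.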

One problem with the existing implementation is that the granularity of
some name space components is too large. For instance, #p for /proc
reveals the names and read-only data, even if a process can't operate on
them; there's a special little hack to restrict "none", but it's not as
clean as it could be.
Another problem is that the set of # names allowed after RFNOMNT is built
into the kernel.
There are several ways that could be improved, either through new
interfaces (eg, Roger Peppe's "attach files") or new services to manage
the existing scheme.
I think encapsulation through computable name spaces also ought to be
usable recursively, whereas RFNOMNT is either on or off.