MIME-Version: 1.0
References: <9ab217670904161047w56b70b74ke25a0280b0f70cc2@mail.gmail.com>
Date: Thu, 16 Apr 2009 16:10:38 -0400
Message-ID: <9ab217670904161310xc49286dv247689443b6d18e6@mail.gmail.com>
From: "Devon H. O'Dell"
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] security questions

2009/4/16 Venkatesh Srinivas:
> Devlimit / Rlimit is less than ideal - the resource limits aren't
> adaptive to program needs and to resource availability. They would be
> describing resources that user programs have very little visible
> control over (kernel resources), except by changing their syscall mix
> or giving up a segment or so. Or failing outright.

Right, but that's part of the point. User programs have very little
visible control over those resources, but the kernel does need to
impose limitations on them; more on that below.

Either way, I agree: this form of resource limitation sucks. My
reasoning is different, however. The number of places where you need
to add new code, and then verify that every future change respects
these sorts of limitations, makes this much harder to get right than
when the limitation is built into the allocator itself.

> Prohibitions per-user are kinda bad in general - what if you want to
> run a potentially hostile (or more likely buggy) program? You can
> already run it in its own ns, but having it be able to stab you via
> kernel resources, about which you can do nothing, is bad.

This depends on your perspective. If you are an administrator running
a system that provides access to a plethora of users, your viewpoint
is different from that of a programmer writing an application with
expensive needs. A user who wants to run a potentially hostile program
should not be able to affect the system to the point that other users
have their own negative experiences. A user running a buggy program
that hogs a ton of memory is not such a big deal: buy more memory. A
user running a buggy program that runs the kernel out of resources,
causing it to halt, is a big deal.

> The typed allocator is worth looking at for speed reasons - the slab
> allocator and customalloc have shown that it's faster (from the
> perspective of allocation time, fragmentation) to do things that way.
> But I don't really see it addressing the problem? Now the constraints
> are per-resource, but they're still magic constraints.

This is the pitfall of all tunables. Unless someone pulls out a
calculator (or is mathematically brilliant), figuring out exact
numbers for X instances of R resources spread between N users on a
system with Y amount of memory is silly. Fortunately, an administrator
can make educated best guesses based upon these same factors:

1) The expected needs of the users of the system (and in other cases,
   the absolute maximum needs of the users).

2) The limitations of the system itself. My laptop has 2GB of RAM;
   the Plan 9 kernel only sits in 256MB of that.

Now, assuming I have a program that's able to allocate 10,000 64-byte
structures a second, I can panic the system in a matter of minutes.
Does that mean I want to limit my users to under 10,000 of those
structures? Not necessarily, but if I expect 40 users at a given time,
I might want to make sure that number is well under 100,000.
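To put rough numbers on that (just a back-of-envelope calculation
using the figures above):

    40 users * 100,000 structures * 64 bytes = ~256MB

which is roughly all of the 256MB the kernel has to work with on that
laptop, so the per-user cap has to sit well below 100,000 if the
system is to survive everyone hitting it at once.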
While it may not be perfectly ideal, it allows the administrator to
maintain control over the system. Additionally, there's always a way
to turn such limits off (echo 0 > /dev/constraint/resource,
hypothetically speaking). In this respect, I don't see how it doesn't
address the problem...

...Unless you consider that using Pools for granular limitations isn't
the best idea. In that light, perhaps additional pools aren't the
correct answer: while creating an individual pool per limited resource
does implement a hard limit on that resource, each such pool is also
required to have its own maximum amount of memory. So if you ditch
that idea and just make typed allocations, the problem is `solved':
when an allocation takes place, we check various heuristics to
determine whether or not the allocation is valid. Does the user have
over X Fids? Does the process hold open more than Y ports? If these
heuristics fail, the memory is not allocated, and the program takes
whatever direction it takes when it does not have resources (crash,
complain, happily move forward, whatever). If they pass, the program
gets what it needs, and {crashes, complains, happily moves forward}.
(There's a rough sketch of what I mean below, after the quoted text.)

One can indirectly (and more consistently) limit the number of
allocated resources in this fashion (indeed, even the number of open
file descriptors), because the amount of memory consumed by a resource
is proportional to the size of the resource: if I as a user have
64,000 bytes allocated of type Foo, and struct Foo is 64 bytes, then I
hold 1,000 Foos.

The one unfortunate downside is that implementing this as an
allocation limit does not make it `provably safe'. That is to say, if
I create a kernel subsystem that allocates Foos, and I allocate
everything with T_MANAGED (assuming T_MANAGED means that this is a
memory allocation I manage myself), there is no protection on the
number of Foos I allocate. It's therefore not provably safer in terms
of crashing the kernel, and I haven't been able to come up with an
idea that is provably safe (unless type determination is done using
getcallerpc, which would result in an insanely large number of
tunables and would be completely impractical). Extending the API in
this fashion, however, almost ensures that this case will not occur:
since the programmer must specify a type for allocation, they must be
aware of the API and the reasoning (or at least we'd all like to hope
so). If a malicious person is able to load unsafe code into the
kernel, you're screwed anyway. So really, this project is more to
protect the diligent and less to help the lazy. (The lazy are all
going to have for (i in `{ls /dev/constraint}) { echo 0 > $i } in
their {term,cpu}rc anyway.)

> Something that might be interesting would be for each primitive pgroup
> to be born with a maximum percentage 'under pressure' associated with
> each interesting resource. Child pgroups would allocate out of their
> parent's pgroup limits. Until there is pressure, limits could be
> unenforced, leading to higher utilization than magic constants in
> rlimit.
>
> To give a chance for a process to give up some of its resources
> (caches, recomputable data) under pressure, there could be a
> per-process /dev/halp device, which a program could read; reads would
> block until a process was over its limit and something else needed
> some more soup. If the app responds, great. Otherwise, something like
> culling it or swapping it out (assuming that swap still worked :)) or
> even slowing down its allocation rate artificially might be answers...
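Before replying to that: here's the rough sketch of the allocation-time
check I promised above. It's a minimal user-space illustration only;
none of these names (Constraint, typedalloc, Tfid, Tport) exist
anywhere, and per-user lookup, locking, and sensible limit values are
all omitted.

/*
 * Hypothetical sketch of a typed allocation check: the budget lives
 * in the allocator, so a subsystem gets the limit just by picking a
 * type. None of this is real kernel code.
 */
#include <stdlib.h>

enum { Tfid, Tport, Tmax };          /* hypothetical resource types */

typedef struct Constraint Constraint;
struct Constraint {
    unsigned long max;               /* 0 means unconstrained */
    unsigned long inuse;             /* current count for this user */
};

/* one table like this per user; lookup and locking omitted */
static Constraint constraint[Tmax] = {
    [Tfid]  = { 100000, 0 },         /* placeholder numbers */
    [Tport] = { 1000, 0 },
};

void*
typedalloc(unsigned long size, int type)
{
    Constraint *c = &constraint[type];

    if(c->max != 0 && c->inuse >= c->max)
        return NULL;                 /* over budget: fail, caller copes */
    c->inuse++;
    return calloc(1, size);
}

void
typedfree(void *p, int type)
{
    if(p == NULL)
        return;
    constraint[type].inuse--;        /* give the budget back */
    free(p);
}

The point is only that the check and the count live in the allocator,
and freeing returns the budget, so nobody sprinkles limit checks
through their own subsystem.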
Programs consuming user memory aren't an issue, in case that wasn't
clear. The problem is programs that consume kernel memory indirectly
through their own behavior. I think you get this based on some of the
points you raised earlier, but I just wanted to make sure. For
instance, reading from a resource is extremely unlikely to cause
issues inside the kernel -- to read the data, all the structures for
passing it must already be allocated. If the data came over a socket
(as erik pointed out), that memory is pre-allocated or of a fixed
size.

> If people are interested, there is some work going on in a research
> kernel called Viengoos
> (http://www.gnu.org/software/hurd/microkernel/viengoos.html) (gasp!
> the hurd!) trying to address pretty much the same problem...

I read the paper, but it's hard to see how exactly this would apply to
the problem at hand. There's a strong bias there towards providing
programs with better scheduling / more memory / more allowable
resources if the program is well behaved. This is interesting for
improving the user experience of programs, but given the problem I'm
trying to solve, I see two drawbacks:

1) Programmers are required to do more work to guarantee that their
   programs won't be adversely affected by the system.

2) You still have to have hard limits (in this case, arbitrarily based
   on percentages) to avoid a user program running the kernel out of
   resources.

At the end of the day, you must have a limit that is lower than the
maximum amount of kernel memory, minus any overhead from managed
resources. Any solution will overcommit, but a percentage-based
solution seems more difficult to tune. It's also much more complex
(and thus much more prone to error in multiple places). Since the
heuristics for determining the resource limits would be automated,
it's not necessarily provable that someone couldn't find a way to
subvert the limitations. It adds complexity to the scheduler, to the
slab allocator, and to any area of code that would need to check
resources (someone adding a new resource then needs to do much more
work than registering a new memory type and using it for allocation).
Quite frankly, the added complexity scares me a bit.

Perhaps I am missing something, so if you can address those points,
that would be good.

> -- vs

--dho