Message-ID: <8de1fd114be4dceb5aa92f63db859aca@sphericalharmony.com>
Date: Tue, 12 Mar 2013 06:42:27 +0000
From: mycroftiv@sphericalharmony.com
To: 9fans@9fans.net
Subject: [9fans] How to admin and repair venti/fossil?

Problem: How to admin and repair Venti+Fossil
Solution: Manage them from an independent namespace

In response to feedback that I should explain how Advanced Namespace Tools (ANTS) can solve practical problems, I will explain in terms of Venti and Fossil administration and maintenance.

The full Plan 9 architecture of separate Venti, Fossil, and tcp boot cpu servers has many benefits, but it sometimes runs into problems because of the dependencies between machines. In standard Plan 9, the Venti has to boot first, then the Fossil, then the cpu, and if there is any issue, such as a change in IP address, the kernel will panic and reboot if it can't find its root filesystem. Additionally, it is hard to control and administer something (a file or venti server) when your ability to do so depends on it working correctly! In other words, if your whole Plan 9 environment is rooted to an fs which depends on venti->fossil->cpu and something goes wrong, everything freezes up and you can't control the system. You have to reboot everything, and if there are still problems then, the machines will go into a "reboot loop" because they can't find their root filesystem. And without a root filesystem, how can you fix the problems? The usual solution involves using a live cd, or having other systems available to help fix the configurations or repair the filesystems.

On my grid of Plan 9 systems, I use a different approach.
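To make the dependency problem concrete, here is a toy C model of the venti->fossil->cpu chain. It is purely illustrative and assumes nothing about how Plan 9 or ANTS actually checks its services; `struct node`, `usable`, and the node indices are hypothetical names. A node counts as usable only if its own machine is healthy and the node it depends on is usable, so a single venti failure leaves the cpu with no root filesystem.

```c
#include <stdbool.h>

/* Toy model only (hypothetical, not Plan 9 or ANTS code): the standard boot chain. */
enum { VENTI, FOSSIL, CPU, NNODES };

struct node {
    const char *name;
    int dep;      /* index of the node this one needs in order to boot, or -1 for none */
    bool healthy; /* is this machine's own hardware and network working? */
};

/* A node is usable only if it is healthy and the node it depends on is usable. */
static bool usable(const struct node *nodes, int i)
{
    if (!nodes[i].healthy)
        return false;
    return nodes[i].dep < 0 || usable(nodes, nodes[i].dep);
}
```

For example, in a three-node array where only `grid[VENTI].healthy` is false, `usable(grid, CPU)` returns false even though the fossil and cpu machines themselves are fine, which is the situation that ends in a reboot loop.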
I use a special kernel, 9pcram, to create an independent namespace at boot, separate from the user's namespace, which is built on top of the venti to fossil to cpu chain. The "service namespace" on each machine is reachable by cpu on port 17020 and has tools to perform any needed work, such as resetting a fossil or venti. Each node of the grid runs the service namespace underneath the conventional venti/fossil/cpu namespace. So while the userspace is a completely standard Plan 9 environment built from 3 machines, underneath it each machine has its own independent namespace, and there is never a forced reboot because of the failure of another machine. The user environment can "break" if there are issues with venti/fossil/cpu connections, but it can be fixed without rebooting, and you have the tools on each machine to administer it self-sufficiently if the rest of the grid is having issues.

ANTS also has additional tools built on top of this idea of administering Venti+Fossil "from below". Making optimal use of Fossil + Venti means you need to replicate data between ventis and preserve fossil root scores, and it is helpful to save some metadata along with them. Because every machine has its own service namespace, the service namespaces on the grid can make a copy of the user's root available with a separate chain of hardware dependencies. Having the admin/service namespace makes the user environment much more robust, and it can be re-rooted to a new hardware copy of the same data on the fly. The separation of concerns between the user namespace and the administrative namespace makes administration far easier and not subject to disruptions that happen to userspace.

My grid uses venti+fossil+tcp cpus, and before I wrote the ANTS software for myself, I found it hard to deal with any issues that arose, such as hardware failures or losses of network connectivity.
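As a sketch of the kind of metadata that might be saved alongside a fossil root score, the C below validates a score written as `vac:` followed by 40 hex digits (a venti score is a 20-byte SHA-1 hash) and formats a one-line record holding a timestamp, the venti the data was written to, the score, and a free-form label. The `vac:` spelling is an assumption about how the score is written down, and the record layout, the function names, and any dial string you pass in are hypothetical illustrations, not the format ANTS actually uses.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical record format (not ANTS's real one): "time venti score label". */
#define SCORE_HEX 40 /* a venti score is a 20-byte SHA-1, i.e. 40 hex digits */

/* Accept "vac:" followed by exactly 40 hex digits and nothing else. */
static bool valid_root_score(const char *s)
{
    if (strncmp(s, "vac:", 4) != 0)
        return false;
    s += 4;
    /* Stops at the terminating NUL if the string is short, so it never overreads. */
    for (int i = 0; i < SCORE_HEX; i++)
        if (!isxdigit((unsigned char)s[i]))
            return false;
    return s[SCORE_HEX] == '\0';
}

/* Write one record into buf; return its length, or -1 on a bad score or truncation. */
static int format_score_record(char *buf, size_t len, time_t when,
                               const char *venti, const char *score,
                               const char *label)
{
    if (!valid_root_score(score))
        return -1;
    int n = snprintf(buf, len, "%lld %s %s %s",
                     (long long)when, venti, score, label);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Keeping one plain-text line per score means a list of saved roots can be appended to, grepped, and copied between machines along with the venti data it refers to.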
By using the 9pcram kernel and ANTS software, I can keep working with my data even as I reboot different nodes on the grid, which wasn't possible for me before. I no longer have problems with venti/fossil corruption, because ANTS helps me with progressive duplication of data and makes my root filesystem available from multiple machines. This was the initial and main motivation that led me to write the software that I am calling ANTS: finding a way to administer a Venti+Fossil+tcp cpu environment and make it more reliable against disruptions of any kind. The software does a lot more than "just help maintain venti+fossil", so when I talk about it, it is sometimes hard for me to stay focused on its most basic and important practical purposes. Making it easier to administer and fix the dependencies of venti/fossil/cpu was the major practical problem that ANTS was written to solve.

Ben Kidwell "mycroftiv"
(ants tutorials http://antfarm.9gridchan.org/tutorial)
(ants VM images and links http://9gridchan.org)
(ants software and documentation http://ants.9gridchan.org)
(ants software also served via 9p and ftp at ants.9gridchan.org)