9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] How to admin and repair venti/fossil?
@ 2013-03-12  6:42 mycroftiv
From: mycroftiv @ 2013-03-12  6:42 UTC
  To: 9fans

Problem: How to admin and repair Venti+Fossil

Solution: Manage them from an independent namespace

In response to feedback that I should explain how Advanced Namespace
Tools (ANTS) can solve practical problems, I will use Venti and Fossil
administration and maintenance as the example.

The full Plan 9 architecture of separate Venti, Fossil, and tcp boot
cpu servers has many benefits, but sometimes experiences issues due to
the dependencies between machines.  In standard Plan 9, the Venti has
to boot first, then the Fossil, then the cpu.  If anything goes wrong
along the way, such as a change in IP address, a kernel that can't
find its root filesystem will panic and reboot.
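
To make the dependency chain concrete, here is a small Python sketch.
It is only a model for illustration, not part of ANTS or Plan 9; the
table and the function names are made up for the example.  It shows
which machines are left without a root when one link in the chain is
down.

```python
# Hypothetical model of the venti -> fossil -> cpu boot chain.
# Each machine lists what must be up before it can find its root.
DEPENDS_ON = {
    "venti": [],
    "fossil": ["venti"],
    "cpu": ["fossil"],
}

def bootable(machine, up):
    """True if everything `machine` depends on, directly or not, is in `up`."""
    return all(dep in up and bootable(dep, up) for dep in DEPENDS_ON[machine])

def stranded(failed):
    """Return the machines that cannot find their root when `failed` is down."""
    up = set(DEPENDS_ON) - {failed}
    return sorted(m for m in up if not bootable(m, up))

if __name__ == "__main__":
    for machine in DEPENDS_ON:
        print(machine, "down ->", stranded(machine))
```

Losing the venti strands both the fossil and the cpu, while losing
the cpu strands nothing else.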

Additionally, it is hard to control and administer something (a file
or venti server) when your ability to do so depends on it working
right!  In other words, if your whole Plan 9 environment is rooted to
an fs which depends on venti->fossil->cpu, then when something goes
wrong, everything freezes up and you can't control the system.  You
have to reboot everything, and if the problems persist, the machines
will go into a "reboot loop" because they can't find their root
filesystem - and without a root filesystem, how can you fix the
problems?

The solution to these problems usually involves making use of a live
CD, or having other systems available to help fix the configurations
or repair the filesystems.  On my grid of Plan 9 systems, I use a
different approach.

I use a special kernel, 9pcram, to create an independent namespace at
boot, separate from the user's namespace which is built on top of the
venti to fossil to cpu chain.  The "service namespace" on each machine
is available by cpu to port 17020 and has tools to perform any needed
work like resetting a fossil or venti.  Each node of the grid runs the
service namespace underneath the conventional venti/fossil/cpu
namespace.  The userspace is a completely standard Plan 9 environment
built from 3 machines, but underneath it each machine has its own
independent namespace, so there is never a forced reboot because of
the failure of another machine.  The user environment can "break" if
there are issues with venti/fossil/cpu connections, but it can be
fixed without rebooting, and the tools to do so are on each machine,
so it can be administered self-sufficiently even if the rest of the
grid is having issues.
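
As a rough illustration, a check like the following could be run from
any machine with Python to see which nodes still have their service
namespace listening.  This is a sketch and not part of ANTS: only the
port number 17020 comes from the description above, and the host
names in the usage example are placeholders.

```python
import socket

SERVICE_PORT = 17020  # port given above for cpu access to the service namespace

def service_reachable(host, port=SERVICE_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def grid_status(hosts, port=SERVICE_PORT):
    """Map each host name to whether its service port accepts connections."""
    return {host: service_reachable(host, port) for host in hosts}

if __name__ == "__main__":
    # Placeholder host names; substitute the nodes of your own grid.
    for host, ok in grid_status(["venti", "fossil", "cpu"]).items():
        print(host, "service namespace port open" if ok else "unreachable")
```

An open port only shows that something is listening there; it does
not prove that a cpu login to the service namespace will succeed.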

ANTS also has additional tools built on top of this idea of
administering Venti+Fossil "from below".  Making optimal use of Fossil
+ Venti means you need to replicate data between ventis and preserve
fossil root scores, and it is helpful to save some metadata along with
them.  Because every machine has its own service namespace, the grid
can make a copy of the user's root available through a separate chain
of hardware dependencies.
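
The idea of keeping fossil root scores together with some metadata
can be sketched in Python as below.  This is not the format ANTS
uses; the field names and the one-JSON-record-per-line layout are
assumptions made for the example.  The only facts relied on are that
a venti score is a SHA-1 hash written as 40 hex digits and that
fossil reports archive roots as "vac:" followed by the score.

```python
import json
import re
import time

# "vac:" followed by a 40 hex digit SHA-1 venti score.
SCORE_RE = re.compile(r"vac:[0-9a-f]{40}")

def format_record(score, venti_host, note="", when=None):
    """Return one JSON line describing a root score (field names are hypothetical)."""
    if not SCORE_RE.fullmatch(score):
        raise ValueError("not a vac: root score: %r" % (score,))
    record = {
        "score": score,
        "venti": venti_host,
        "time": int(time.time() if when is None else when),
        "note": note,
    }
    return json.dumps(record, sort_keys=True)

def latest_score(lines, venti_host):
    """Return the last score logged for venti_host, or None if there is none."""
    latest = None
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        if record["venti"] == venti_host:
            latest = record["score"]
    return latest
```

Appending each line returned by format_record to a log kept outside
the fossil it describes means older scores are never overwritten, and
latest_score can read the same log back when a root has to be
restored from a particular venti.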

Having the admin/service namespace makes the user environment much
more robust, and it can be re-rooted to a new hardware copy of the
same data on the fly.  The separation of concerns between the user
namespace and the administrative namespace makes administration far
easier and not subject to disruptions that happen to userspace.

My grid uses venti+fossil+tcp cpus, and before I wrote the ANTS
software for myself, I found it hard to deal with any issues that
arose such as hardware failures or losses of network connectivity.  By
using the 9pcram kernel and ANTS software, I can keep working with my
data even as I reboot different nodes on the grid, which wasn't
possible for me before.  I no longer have problems with venti/fossil
corruption, because ANTS helps me with progressive duplication of data
and making my root filesystem available from multiple machines.

This was the initial main motivation that led me to write the software
that I am calling ANTS - finding a way to administer a Venti+Fossil+tcp
cpu environment and make it more reliable against disruptions of any
kind.  The software does a lot more than "just help maintain
venti+fossil" and so when I talk about it, sometimes it is hard for me
to stay focused on the most basic and important practical purposes.

Making it so that you can administer and fix the dependencies of
venti/fossil/cpu more easily was the major practical problem which
ANTS was written to solve.

Ben Kidwell "mycroftiv"
(ants tutorials http://antfarm.9gridchan.org/tutorial)
(ants VM images and links http://9gridchan.org)
(ants software and documentation http://ants.9gridchan.org)
(ants software also served via 9p and ftp at ants.9gridchan.org)


