From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
Date: Fri, 26 Feb 2010 00:21:05 +0200
Message-ID: <8e04b5821002251421g6d02d319l47f716a4d40b8a08@mail.gmail.com>
From: "Ciprian Dorin, Craciun" <ciprian.craciun@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: text/plain; charset=UTF-8
Subject: [9fans] RFC: 9p file system with queue semantics (as in message
	queues)
Topicbox-Message-UUID: db84038c-ead5-11e9-9d60-3106f5b1d025

    Hello all!

    In what follows I would like to ask for your comments regarding a
9p file server that exports a file system with (message) queue
semantics. (My major interest here is in the semantics themselves,
and less in the implementation details. But all comments are
welcome :) .)

    To keep things short I have prepared a short description of my
proposal (for those without much time), and also a longer one
(with more details, for the patient ones).


----------------------------------------
    Short version
----------------------------------------

    [Motivation] I want a message-queue-like IPC mechanism for
(distributed) applications where it is not possible (or not wanted)
to use an existing queueing library / implementation (like those
based on JMS or AMQP), but where using the file system is a trivial
operation. Possible target languages: Bash, Tcl, Lua, even Python or
C.

    Also there could be legacy (or maybe just old) applications that
already use the file system as an IPC mechanism and which could be
only slightly updated to use a queue. Possible applications: anything
related to SMTP; just think about how qmail or Postfix work.

    [Solution] Implementing a 9p file server that exports a file
system with the following structure:
    / (this / is actually relative to the mount point)
    --> queues
        --> <queue-name>
            --> enqueue -- a folder
            --> dequeue -- a folder
            --> commit -- folder :)
            --> rollback -- still a folder

    Possible operations (I'm assuming we use it via shell scripting,
and the commands found on most UNIX-es):
    * queue access (creation if not existing, or just "opening" it):
        mkdir /queues/to-smtp-gateway
    * enqueue operation:
        cp    /.../path-to-my-email-file \
              /queues/to-smtp-gateway/enqueue/email-192832.eml
        # instead of cp I could just create and edit the file in that folder
        touch /queues/to-smtp-gateway/commit/email-192832.eml
    * dequeue operation:
        touch /queues/from-pop-server/dequeue/email-9283828.eml
        # if there is no data in the queue the touch operation fails
        # do something with the file like reading it or copying it
        touch /queues/from-pop-server/commit/email-9283828.eml
    * rollback:
        in any case, just touch the same file name inside the rollback
folder and the entire operation is rolled back
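
    The operations above can be modelled as a small state machine.
What follows is only a minimal Python sketch of that model, assuming
an in-memory queue. The class `QueueModel` and its method names are
hypothetical (this is not the actual implementation), and returning a
rolled-back dequeue to the head of the queue is likewise an
assumption.

```python
from collections import deque


class QueueModel:
    """Hypothetical in-memory model of one queue (not the real 9p server)."""

    def __init__(self):
        self.ready = deque()    # committed messages waiting to be dequeued
        self.enqueuing = {}     # name -> data written but not yet committed
        self.dequeuing = {}     # name -> data handed out but not yet committed

    def enqueue(self, name, data):
        # Corresponds to creating or copying a file inside `enqueue`.
        self.enqueuing[name] = data

    def dequeue(self, name):
        # Corresponds to `touch dequeue/<name>`; fails on an empty queue.
        if not self.ready:
            raise LookupError("queue is empty")
        self.dequeuing[name] = self.ready.popleft()
        return self.dequeuing[name]

    def commit(self, name):
        # Corresponds to `touch commit/<name>`.
        if name in self.enqueuing:
            self.ready.append(self.enqueuing.pop(name))
        elif name in self.dequeuing:
            del self.dequeuing[name]
        else:
            raise KeyError(name)

    def rollback(self, name):
        # Corresponds to `touch rollback/<name>`.
        if name in self.enqueuing:
            del self.enqueuing[name]
        elif name in self.dequeuing:
            # Assumption: the message goes back to the head of the queue.
            self.ready.appendleft(self.dequeuing.pop(name))
        else:
            raise KeyError(name)


queue = QueueModel()
queue.enqueue("email-192832.eml", b"message body")
queue.commit("email-192832.eml")            # now visible to consumers
body = queue.dequeue("email-9283828.eml")   # consumer picks its own name
queue.rollback("email-9283828.eml")         # message is available again
```

    Until `commit` is called, an enqueued message is invisible to
consumers and a dequeued message can still be returned to the queue.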

    End of short version. :)
    Comments? (Or go for the extended version.)


----------------------------------------
    Extended version
----------------------------------------

    [Motivation -- extended] My real motives are in fact somewhat
different: for the moment I'm working at a university, and here we
have a large (by our standards, but I'm betting small by your
standards) cluster of Linux servers. Sometimes I have to run some
independent simulations or jobs on (parts of) this cluster. So my
possible solutions are:
    * Condor, Slurm, or any other tried-and-true queueing system --
the problems with them are that most are big (as in heavy) solutions,
which need a stable environment, are tedious to install, and need a
lot of care; (also they need root access to install and operate...)
    * Globus (either pre-WS or gLite): I don't even want to get
into this :) :) it just scares me... :) :)
    * XCPU -- I'm aware of XCPU, but it needs me to push tasks onto
the worker nodes... (it could be used for the execution of the jobs in
my queue;)
    * SSH + dtach -- my current solution -- I distribute an equal
number of job files to the servers, and then I just run a couple of
processes that try to grab a job file and execute it; (the problem is
that the job assignment is static and if one worker node finishes
early it just idles;)

    What I would like to have:
    * (on the submitter) just copy the job files in a folder on my
workstation (laptop) and that's all;
    * (on the worker nodes) just try to acquire a file from a folder,
execute it and write back the result to another folder;

    [Features] What should the queue file system support:
    * transactional processing of individual enqueues / dequeues: as
seen from my short description I want to be able to obtain the data
file (in case of dequeue), read it (maybe multiple times, as in open /
read / close, again open / read / close, etc.) and only when I'm done
processing it, I want to tell the system to commit the dequeue
operation;
    * transactional processing of multiple related enqueues /
dequeues: just think about an application that acts like a pipeline:
it dequeues a task, executes it and enqueues it for further processing
(to another queue); now the dequeue of the original message and the
enqueue of the processed message should be atomic; (this is of course
extended to multiple enqueues / dequeues from multiple queues);
    * (maybe) tagging messages with some metadata, and allowing me
to dequeue only those messages that are tagged in a certain way (think
of pattern matching in CLIPS or Prolog); (this allows me to match two
related messages from two different queues, wait until I have both of
them and process them as one (like a join in a workflow)); (this
could allow me to implement something like map-reduce, if one process
chooses to dequeue all messages tagged in a certain way);
    * any other ideas? :)
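
    The tagging idea could look roughly like the following Python
sketch. It assumes that tags are plain key / value pairs and that a
pattern matches when all of its pairs are present in a message's
tags. Both the tag format and the function `dequeue_matching` are
hypothetical, since the proposal leaves the matching rules open.

```python
def dequeue_matching(messages, pattern):
    """Remove and return the first (tags, data) pair matching the pattern.

    `messages` is a list of (tags, data) pairs in queue order and
    `pattern` is a dict of tag values that must all be present.
    Returns None when nothing matches.
    """
    for index, (tags, data) in enumerate(messages):
        if all(tags.get(key) == value for key, value in pattern.items()):
            del messages[index]
            return tags, data
    return None


messages = [
    ({"job": "42", "part": "left"}, b"first half"),
    ({"job": "43", "part": "left"}, b"unrelated"),
    ({"job": "42", "part": "right"}, b"second half"),
]
# A join: collect both halves of job 42, skipping the unrelated message.
left = dequeue_matching(messages, {"job": "42", "part": "left"})
right = dequeue_matching(messages, {"job": "42", "part": "right"})
```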

    [Semantics] I've already described the semantics for the first
feature in the short version, so I won't repeat them here.

    For the multi-operation / multi-queue semantics I would propose
something like this:
    / (root)
    --> queues -- the same as before (only one operation is transactional)
    --> transactions
        --> <transaction-id>
            --> queues
                --> <queue-name>
                    --> enqueue
                    --> dequeue
                    --> commit -- only to allow applications to work
                                  unmodified under the new transaction
                                  semantics
                    --> rollback
        --> commit -- overall transactions commit folder
        --> rollback -- likewise

    How these transactions work is simple: just `mkdir` a transaction
folder inside `transactions`. Then `mkdir` those queues that we want
to access. Then, when rolling back or committing, just touch a file
named exactly like the transaction inside the `commit` or `rollback`
folder.
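
    The all-or-nothing behaviour of such a transaction can be
sketched in Python as follows. This is only an in-memory model under
stated assumptions: the class `Transaction` and its methods are
hypothetical names, the model is single-threaded, and a real server
would have to get its atomicity from the underlying store.

```python
from collections import deque


class Transaction:
    """Hypothetical model: buffers operations on several queues until commit."""

    def __init__(self, queues):
        self.queues = queues          # queue name -> deque of committed messages
        self.pending_enqueues = []    # (queue name, data) not yet visible
        self.taken = []               # (queue name, data) removed by dequeue

    def enqueue(self, queue_name, data):
        # Nothing becomes visible to other clients before commit.
        self.pending_enqueues.append((queue_name, data))

    def dequeue(self, queue_name):
        data = self.queues[queue_name].popleft()   # IndexError if empty
        self.taken.append((queue_name, data))
        return data

    def commit(self):
        # Publish all buffered enqueues; dequeued messages are gone for good.
        for queue_name, data in self.pending_enqueues:
            self.queues.setdefault(queue_name, deque()).append(data)
        self._finish()

    def rollback(self):
        # Return dequeued messages to the head of their queues, newest
        # first, so that the original order is restored.
        for queue_name, data in reversed(self.taken):
            self.queues[queue_name].appendleft(data)
        self._finish()

    def _finish(self):
        self.pending_enqueues.clear()
        self.taken.clear()


# A pipeline stage: take a task from one queue, pass the result to another.
queues = {"raw": deque([b"task-1", b"task-2"]), "processed": deque()}
transaction = Transaction(queues)
task = transaction.dequeue("raw")
transaction.enqueue("processed", task.upper())
transaction.commit()
```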

    Now about the names for transactions or enqueue / dequeue files: I
would propose UUIDs (and impose these names), as this would
reduce the likelihood of name clashes.
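
    On the client side, generating such names is a one-liner in most
languages. A minimal Python sketch, assuming random (version 4) UUIDs;
the helper `new_name` and the optional suffix are hypothetical:

```python
import uuid


def new_name(suffix=""):
    """Return a fresh random (version 4) UUID usable as a file name."""
    return str(uuid.uuid4()) + suffix


transaction_id = new_name()
message_name = new_name(".eml")
```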

    Also because we have a central 9p file server that exports this
file system, there are two possible ways to "attach" the file system
to a node:
    * each client that attaches the same file system (`aname`)
obtains a fresh view that is not shared with any other client; (thus
one node can't interfere with another one's transaction);
    * each client attaches the same `aname` and obtains a consolidated
view of all the other operations going on in the cluster; (we could
thus obtain distributed transactions;) (something like what was
obtained inside `/proc` inside a Beowulf cluster -- as I understood
from the XCPU and Beowulf papers;)


----------------------------------------
    Technical (as in implementation) details
----------------------------------------

    I already have the operations implemented in Python (in an OOP
fashion) (both individual transactions and multi-operation /
multi-queue operations, thanks to BerkeleyDB). I've already managed to
export and test a (local) file system based on FUSE (but with a
slightly different way to obtain the semantics, and only for the
individual operation transactions).

    About the 9p protocol, I've already implemented the protocol
(decoding messages from the client -> server, and encoding messages
from the server -> client, thus the server side) in Python, and I've
exposed it to the network with the help of the Twisted framework. (The
message decoder / encoder, the OOP entities that embody / hide the 9p
file system semantics, and the Twisted protocol and factory are all
decoupled and can be reused independently.)
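
    To illustrate what such a decoder / encoder deals with, here is a
minimal Python sketch of the 9P2000 framing: every message starts with
size[4] type[1] tag[2] in little-endian order (the size includes its
own four bytes), and strings are a two-byte length followed by UTF-8
data. The function names are hypothetical and this is not the code
described above; only the `Tversion` body is decoded.

```python
import struct

TVERSION = 100      # message type numbers defined by 9P2000
RVERSION = 101
NOTAG = 0xFFFF      # tag value used by version messages


def encode_string(text):
    """Encode a 9P string: two-byte length followed by UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack("<H", len(raw)) + raw


def encode_message(msg_type, tag, body):
    """Prefix a body with the common size[4] type[1] tag[2] header."""
    size = 4 + 1 + 2 + len(body)
    return struct.pack("<IBH", size, msg_type, tag) + body


def decode_header(data):
    """Split one message into (size, type, tag, body)."""
    size, msg_type, tag = struct.unpack_from("<IBH", data, 0)
    return size, msg_type, tag, data[7:size]


def decode_tversion(body):
    """Decode a Tversion body: msize[4] version[s]."""
    (msize,) = struct.unpack_from("<I", body, 0)
    (length,) = struct.unpack_from("<H", body, 4)
    version = body[6:6 + length].decode("utf-8")
    return msize, version


# Round trip of a Tversion request as a client would send it.
request = encode_message(
    TVERSION, NOTAG, struct.pack("<I", 8192) + encode_string("9P2000"))
size, msg_type, tag, body = decode_header(request)
msize, version = decode_tversion(body)
```

    Keeping the framing in pure functions like these is one way to
keep the codec independent of the network layer.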

    If this works well I'm thinking of moving (if time allows) to
RabbitMQ (thus obtaining distributed queues), and Erlang (better
performance from the network part of the project). Another direction
would be to stick to BerkeleyDB and add support for its key / value
tables (as BTrees or hash tables).

    Any comments or observations about my technical choices?


---------- Finally the end :) :)


    I hope I haven't missed anything. And I also hope that at least
someone has reached this sentence :).

    Thanks to all of you who have devoted time to my email,
    Ciprian Craciun.