From mboxrd@z Thu Jan 1 00:00:00 1970 MIME-Version: 1.0 Date: Fri, 26 Feb 2010 00:21:05 +0200 Message-ID: <8e04b5821002251421g6d02d319l47f716a4d40b8a08@mail.gmail.com> From: "Ciprian Dorin, Craciun" To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net> Content-Type: text/plain; charset=UTF-8 Subject: [9fans] RFC: 9p file system with queue semantics (as in message queues) Topicbox-Message-UUID: db84038c-ead5-11e9-9d60-3106f5b1d025 Hello all! In what follows I would like to ask for your comments regarding a 9p file server that exports a file system with a (message) queue semantics. (My major interest here is more about the actual semantic itself, and less about the implementation details. But all comments are welcomed :) .) To keep things short I shall prepare a small description of my proposal (for those with not too much time), and also a longer one (with more details for the patient ones). ---------------------------------------- Short version ---------------------------------------- [Motivation] I want to obtain a message queue like IPC for (distributed) applications, where it is not possible (or not wanted) to implement / use an existing queueing library / implementation (like JMS or AMQP based) (but where using the file system is a trivial operation). Possible target languages: Bash, Tcl, Lua, even Python or C. Also there could be legacy (or maybe just old) applications that already use the file system like an IPC mechanism and which could be just slightly updated to use a queue. Possible applications: anything related with SMTP, just think about how qmail or Postfix works. [Solution] Implementing a 9p file server that exports a file system with the following structure: / (this / is actually relative to the mount point) --> queues --> --> enqueue -- a folder --> dequeue -- a folder --> commit -- folder :) --> rollback -- still a folder Possible operations (I'm assuming we use it via shell scripting, and the commands found on most UNIX-es): * queue access (creation if not existing, or just "opening" it): mkdir /queues/to-smtp-gateway * enqueue operation: cp /.../path-to-my-email-file /queues/to-smtp-gateway/enqueue/email-192832.eml # instead of cp I could just create and edit the file in that folder touch /queues/to-smtp-gateway/commit/email-192832.eml * dequeue operation: touch /queues/from-pop-server/dequeue/email-9283828.eml # if there is no data in the queue the touch operation fails # do something with the file like reading it or copying it touch /queues/from-pop-server/commit/email-9283828.eml * rollback: in any case just touch the same file name inside the rollback folder and the entire operation is rollbacked End of short version. :) Comments? (Or go for the extended version.) ---------------------------------------- Extended version ---------------------------------------- [Motivation -- extended] My real motives are in fact somehow different: for the moment I'm working at a university, and here we have a large (by our standards, but I'm betting small by your standards) cluster of Linux servers. Sometimes I have to run some independent simulations or jobs on (parts) of this cluster. So my possible solutions are: * Condor, Slurm, or any other true-and-tried queueing system -- the problems with them are that most are big (as in heavy) solutions, which need a stable environment, are tedious to install, and need a lot of care; (also they need root access to install and operate...) * Globus (either pre WS or g-Lite): I don't even want to enter into this :) :) it just scares me... :) :) * XCPU -- I'm aware of XCPU, but it needs me to push tasks onto the worker nodes... (it could be used for the execution of the jobs in my queue;) * SSH + dtach -- my current solution -- I distribute an equal number of job files to the servers, and then I just run a couple of processes that try to grab a job file and execute it; (the problem is that the job assignment is static and if one worker nodes finishes early it just idles;) What I would like to have: * (on the submitter) just copy the job files in a folder on my workstation (laptop) and that's all; * (on the worker nodes) just try to acquire a file from a folder, execute it and write back the result to another folder; [Features] What should the queue file system support: * transactional processing of individual enqueues / dequeues: as seen from my short description I want to be able to obtain the data file (in case of dequeue), read it (maybe multiple times, as in open / read / close, again open / read / close, etc.) and only when I'm done processing it, I want to tell the system to commit the dequeue operation; * transactional processing of multiple related enqueues / dequeues: just think about an application that acts like a pipeline: it dequeues a task, executes it and enqueues it for further processing (to another queue); now the dequeue of the original message and the enqueue of the processed message should be atomic; (this is of course extended to multiple enqueues / dequeues from multiple queues); * (maybe) tagging a messages with some meta-data, and allowing me to dequeue only those messages that are tagged in a certain way (think of pattern matching in Clips or Prolog); (this allows me to match two related messages from two different queues, wait until I have both of them and to process them as one (like a join in a workflow)); (this could allow me to implement something like map-reduce, if one process chooses to dequeue all messages tagged in a certain way); * any other ideas? :) [Semantics] The semantics for the first feature set I've described in the short explanation so I don't repeat them again here. For the multi-operation / multi-queue semantics I would propose something like this: / (root) --> queues -- the same like before (only one operation is transactional) --> transactions --> --> queues --> --> enqueue --> dequeue --> commit -- only to allow applications to work unmodified under the new transaction semantics --> rollback --> commit -- overall transactions commit folder --> rollback -- likewise How these transactions work is simple: just `mkdir` a transaction folder inside `transactions`. Then `mkdir` those queues that we want to access. Then when rolling-back or commiting, just touch a file named exactly like the transaction inside the `commit` or `rollback` folder. Now about the names for transactions or enqueue / dequeue files: I would have proposed UUIDs (and impose these names), as this would reduce the likelihood of name clashes. Also because we have a central 9p file server that exports this file system, there are two possible ways to "attach" the file system to a node: * each client when attaches the same file system (`aname`), it obtains a fresh view that is not shared with any other client; (thus one node can't interfere with another one's transaction); * each client attaches the same `aname` and obtains a consolidated view of all the other operations going on in the cluster; (we could obtain thus distributed transactions;) (something like what was obtained inside `/proc` inside a Beowulf cluster -- as I understood from the XCPU and Beowulf papers;) ---------------------------------------- Technical (as in implementation) details ---------------------------------------- I already have the operations implemented in Python (in an OOP fashion) (both individual transactions and multi-operation / multi-queue operations thanks to BerkeleyDB). I've already managed to export and test a (local) file system based on Fuse (but with a slightly different way to obtain the semantics, and only for the individual operation transactions.) About the 9p protocol, I've already implemented the protocol (decoding messages from the client -> server, and encoding messages from the server -> client, thus the server side) in Python, and I've exposed it to the network with the help of Twisted framework. (The message decoder / encoder, OOP entities that embody / hide the 9p file system semantics, and Twisted protocol and factory are all decoupled and can be reused independently.) If this works nice I'm thinking to moving (if time allows) to RabbitMQ (obtaining now distributed queues), and Erlang (better performance from the network part of the project). Another direction would be to stick to BerkelyDB and add support for it's key / value tables (as BTrees or hash tables). Any comments or observations about my technical choices? ---------- Finally the end :) :) I hope I haven't missed anything. And I also hope that at least someone has reached this phrase :). Thanks all of you that have devoted time to my email, Ciprian Craciun.