supervision - discussion about system services, daemon supervision, init, runlevel management, and tools such as s6 and runit
 help / color / mirror / Atom feed
* new "sv status" flags and exit-tracking patch, and misc.
@ 2005-09-17  4:46 Charles Duffy
  2005-09-17  5:17 ` Mike Bell
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Charles Duffy @ 2005-09-17  4:46 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 4818 bytes --]

First, a brief glance at the results of this patch:

$ ~/runit-1.3.1.new/src/sv status -prf -s -w /local/ccd/runit-test/*
/local/ccd/runit-test/foo 13274 exit:0 exit:2 finish up
/local/ccd/runit-test/bar - signal:15 none:0 down down

$ ~/runit-1.3.1.new/src/runsvstat /local/ccd/runit-test/*
/local/ccd/runit-test/bar: run (pid 16819) 6 seconds
/local/ccd/runit-test/foo: run (pid 16825) 4 seconds, last finish exit 
status 2

$ ~/runit-1.3.1.new/src/sv status /local/ccd/runit-test/*
/local/ccd/runit-test/bar: run (pid 16884) 3 seconds; log: run (pid 
16796) 73 seconds
/local/ccd/runit-test/foo: run (pid 16883) 3 seconds, last finish exit 
status 2

(note that elements of the output format have been unified between "sv 
status" and "runsvstat"; this is because they no longer have two 
different chunks of code to do the same thing).


Now, the bad news:

This is not a small patch. While acquainting myself with runit's 
internals, I found their coding style such as to substantially reduce 
maintainability, and consequently did some refactoring on my way. 
There's still much to be done -- a great many of the items touched on in 
the rant pasted below are still issues, or are issues which have been 
addressed only in small portions of the code -- and frankly, I can 
appreciate why there would be some legitimate hesitation to apply a 
patch of this size, even if it *is* agreed that the changes it makes 
(beyond that for which it was begun) are necessary and proper.

That said, the patch is attached... and the rant is below. Please take 
it with a grain of salt -- while I *am* a bit disappointed in the 
quality of runit's code, it's still an excellent piece of software when 
one is using, as opposed to reading, it.

Oh, and all that said -- this patch needs testing before it goes into 
production. I've given it some attention myself, and will be handing it 
over to my QA department tomorrow (though there's no knowing when 
they'll actually have time to review it), but in any event more testing 
and desk-checking is certainly a Good Thing.

-----------------------------------

   Things This Patch Does:
- Implements the previously-discussed flags to "sv status", tacking 4 
bytes on to the end of the status file and updating some code to 
actually spit out the relevant information where useful.
- Includes additional headers (and changes some types) to clean up warnings
- Moves much duplicated code into a new "svdir.c"; eliminates duplicate 
parsing code, status->string generation code; etc. [Not to say that 
duplicate parsing code is -completely- eliminated; just from the places 
I touched. Need to go back and clean up the rest of it at some point].
- Widely uses "struct svdir" for storing status information, such that 
subprograms don't (need to) go picking around in status array.
- Lets programs register their name to be printed with warnings (such 
that WARNING, ERROR and friends don't need to be redefined for each 
program and library methods can print errors which contain the current 
program's name)

   Things This Patch Doesn't Do:
- Update the documentation
- (?)

-----------------------------------

   The Rant (or, a bunch of questions I came up with while writing this 
thing):

Why does runit use its own env_get() method rather than getenv()? [This 
is just one example; there are *lots* of places where C library calls 
are eschewed in favor of locally-implemented bits].

Why independently reimplement code to parse a status file FIVE SEPARATE 
TIMES (once each in runsv.c, runsvstat.c, sv.c, svwaitdown.c and 
svwaitup.c) rather than simply write a single library method and use that?

Why define the same error methods and macros at the top of each file (as 
with fail/failx/warn/warnx/etc)?

Why do sv.c and svstat.c both independently include code to build a 
string describing a service's status?

Why the huge number of distinct error handling functions and macros with 
large numbers of parameters instead of just one or two functions using 
varargs? Why the structures with members named "x", "y" and "z" with no 
documentation anywhere describing what they are or do?

Why the makefile that never heard of automatic dependency resolution or 
implicit rules?

Why the obscure string constants? The one-letter variable names with no 
comments describing them?

If one expects to get 3rd-party involvement in an open source project, 
it helps to actually have code that's of sufficient quality as to be 
*fun to work on*.


Oh, and finally: Mixing tabs and whitespace is *abominable*. All tabs is 
good (so folks can have the amount of whitespace they want just by 
setting their editor, without needing to change the files). All 
whitespace is acceptable. Mixing tabs and whitespace, on the other hand, 
make an utter and complete *mess*. Bad! Evil!

[-- Attachment #2: runit-1.3.1.new.patch --]
[-- Type: text/x-patch, Size: 39763 bytes --]

diff -rNu runit-1.3.1/src/chkshsgr.c runit-1.3.1.new/src/chkshsgr.c
--- runit-1.3.1/src/chkshsgr.c	2005-08-24 15:14:39.770106475 -0500
+++ runit-1.3.1.new/src/chkshsgr.c	2005-09-16 21:45:20.584476322 -0500
@@ -1,10 +1,12 @@
 /* Public domain. */
 
+#include <sys/types.h>
 #include <unistd.h>
+#include <grp.h>
 
 int main()
 {
-  short x[4];
+  gid_t x[4];
 
   x[0] = x[1] = 0;
   if (getgroups(1,x) == 0) if (setgroups(1,x) == -1) _exit(1);
diff -rNu runit-1.3.1/src/chpst.c runit-1.3.1.new/src/chpst.c
--- runit-1.3.1/src/chpst.c	2005-08-24 15:14:39.353778987 -0500
+++ runit-1.3.1.new/src/chpst.c	2005-09-16 21:47:24.932648009 -0500
@@ -1,8 +1,10 @@
 #include <sys/types.h>
-#include <time.h>
 #include <sys/time.h>
 #include <sys/resource.h>
+#include <grp.h>
+#include <time.h>
 #include <unistd.h>
+
 #include "sgetopt.h"
 #include "error.h"
 #include "strerr.h"
@@ -21,8 +23,6 @@
 #include "direntry.h"
 
 #define USAGE_MAIN " [-vP012] [-u user[:group]] [-U user[:group]] [-e dir] [-/ root] [-n nice] [-l|-L lock] [-m n] [-o n] [-p n] [-f n] [-c n] prog"
-#define FATAL "chpst: fatal: "
-#define WARNING "chpst: warning: "
 
 const char *progname;
 static stralloc sa;
@@ -258,6 +258,7 @@
   int i;
   unsigned long ul;
 
+  strerr_setname("chpst");
   progname =argv[0];
   for (i =str_len(progname); i; --i)
     if (progname[i -1] == '/') {
diff -rNu runit-1.3.1/src/conf-cc runit-1.3.1.new/src/conf-cc
--- runit-1.3.1/src/conf-cc	2005-08-24 15:14:39.425467844 -0500
+++ runit-1.3.1.new/src/conf-cc	2005-09-16 23:16:31.660554653 -0500
@@ -1,5 +1,7 @@
 gcc -O2 -Wall
 
+gcc -O2 -Wall -ggdb
+
 gcc -O2 -Wimplicit -Wunused -Wcomment -Wchar-subscripts -Wuninitialized -Wshadow -Wcast-qual -Wcast-align -Wwrite-strings
 
 This will be used to compile .c files.
diff -rNu runit-1.3.1/src/conf-ld runit-1.3.1.new/src/conf-ld
--- runit-1.3.1/src/conf-ld	2005-08-24 15:14:39.821970387 -0500
+++ runit-1.3.1.new/src/conf-ld	2005-09-16 23:16:46.620503222 -0500
@@ -1,3 +1,5 @@
 gcc -s
 
+gcc
+
 This will be used to link .o files into an executable.
diff -rNu runit-1.3.1/src/Makefile runit-1.3.1.new/src/Makefile
--- runit-1.3.1/src/Makefile	2005-08-24 15:14:39.993823320 -0500
+++ runit-1.3.1.new/src/Makefile	2005-09-16 20:10:25.351111903 -0500
@@ -12,20 +12,20 @@
 runit-init: load runit-init.o unix.a byte.a
 	./load runit-init unix.a byte.a -static
 
-runsv: load runsv.o unix.a byte.a time.a
-	./load runsv unix.a byte.a time.a
+runsv: load runsv.o svdir.o unix.a byte.a time.a
+	./load runsv svdir.o unix.a byte.a time.a
 
 runsvdir: load runsvdir.o unix.a byte.a time.a
 	./load runsvdir unix.a byte.a time.a
 
-runsvstat: load runsvstat.o unix.a byte.a time.a
-	./load runsvstat unix.a byte.a time.a
+runsvstat: load runsvstat.o svdir.o unix.a byte.a time.a
+	./load runsvstat svdir.o unix.a byte.a time.a
 
 runsvctrl: load runsvctrl.o unix.a byte.a
 	./load runsvctrl unix.a byte.a
 
-sv: load sv.o unix.a byte.a time.a
-	./load sv unix.a byte.a time.a
+sv: load sv.o svdir.o unix.a byte.a time.a
+	./load sv svdir.o unix.a byte.a time.a
 
 svwaitup: load svwaitup.o unix.a byte.a time.a
 	./load svwaitup unix.a byte.a time.a
@@ -58,6 +58,9 @@
 runsvdir.o: compile sysdeps runsvdir.c
 	./compile runsvdir.c
 
+svdir.o: compile sysdeps svdir.c svdir.h taia.h
+	./compile svdir.c
+
 runsvstat.o: compile sysdeps runsvstat.c
 	./compile runsvstat.c
 
@@ -424,6 +427,9 @@
 taia_pack.o: compile tai.h taia.h taia_pack.c uint64.h
 	./compile taia_pack.c
 
+taia_unpack.o: compile tai.h taia.h taia_unpack.c uint64.h
+	./compile taia_unpack.c
+
 taia_sub.o: compile tai.h taia.h taia_sub.c uint64.h
 	./compile taia_sub.c
 
@@ -432,10 +438,10 @@
 
 time.a: iopause.o makelib tai_now.o tai_pack.o tai_sub.o tai_unpack.o \
 taia_add.o taia_approx.o taia_frac.o taia_less.o taia_now.o \
-taia_pack.o taia_sub.o taia_uint.o
+taia_unpack.o taia_pack.o taia_sub.o taia_uint.o
 	./makelib time.a iopause.o tai_now.o tai_pack.o tai_sub.o \
 	tai_unpack.o taia_add.o taia_approx.o taia_frac.o taia_less.o \
-	taia_now.o taia_pack.o taia_sub.o taia_uint.o
+	taia_now.o taia_unpack.o taia_pack.o taia_sub.o taia_uint.o
 
 uint64.h: choose compile load tryulong64.c uint64.h1 uint64.h2
 	./choose clr tryulong64 uint64.h1 uint64.h2 > uint64.h
diff -rNu runit-1.3.1/src/pathexec_run.c runit-1.3.1.new/src/pathexec_run.c
--- runit-1.3.1/src/pathexec_run.c	2005-08-24 15:14:39.870476461 -0500
+++ runit-1.3.1.new/src/pathexec_run.c	2005-09-16 21:55:26.015270373 -0500
@@ -1,5 +1,6 @@
 /* Public domain. */
 
+#include <unistd.h>
 #include "error.h"
 #include "stralloc.h"
 #include "str.h"
diff -rNu runit-1.3.1/src/prot.c runit-1.3.1.new/src/prot.c
--- runit-1.3.1/src/prot.c	2005-08-24 15:14:39.854196578 -0500
+++ runit-1.3.1.new/src/prot.c	2005-09-16 21:48:42.561996420 -0500
@@ -1,5 +1,9 @@
 /* Public domain. */
 
+#include <sys/types.h>
+#include <unistd.h>
+#include <grp.h>
+
 #include "hasshsgr.h"
 #include "prot.h"
 
diff -rNu runit-1.3.1/src/runit.c runit-1.3.1.new/src/runit.c
--- runit-1.3.1/src/runit.c	2005-08-24 15:14:39.862091897 -0500
+++ runit-1.3.1.new/src/runit.c	2005-09-16 23:09:15.353778987 -0500
@@ -18,10 +18,6 @@
 
 /* #define DEBUG */
 
-#define INFO "- runit: "
-#define WARNING "- runit: warning: "
-#define FATAL "- runit: fatal: "
-
 const char * const stage[3] ={
   "/etc/runit/1",
   "/etc/runit/2",
@@ -54,6 +50,8 @@
   int ttyfd;
   struct stat s;
 
+  strerr_setname("- runit");
+
   if (getpid() != 1) strerr_die2x(111, FATAL, "must be run as process no 1.");
   setsid();
 
diff -rNu runit-1.3.1/src/runit-init.c runit-1.3.1.new/src/runit-init.c
--- runit-1.3.1/src/runit-init.c	2005-08-24 15:14:39.871740354 -0500
+++ runit-1.3.1.new/src/runit-init.c	2005-09-16 21:41:08.923269132 -0500
@@ -9,8 +9,6 @@
 #include "error.h"
 
 #define USAGE " 0|6"
-#define FATAL "init: fatal: "
-/* #define WARNING "init: warning: " */
 
 const char *progname;
 
@@ -44,6 +42,7 @@
 int main (int argc, const char * const *argv, char * const *envp) {
   const char *prog[2];
 
+  strerr_setname("init");
   progname =*argv++;
 
   if (getpid() == 1) {
diff -rNu runit-1.3.1/src/runsv.c runit-1.3.1.new/src/runsv.c
--- runit-1.3.1/src/runsv.c	2005-08-24 15:14:39.585119477 -0500
+++ runit-1.3.1.new/src/runsv.c	2005-09-16 23:00:24.038381057 -0500
@@ -1,4 +1,5 @@
 #include <sys/types.h>
+#include <sys/wait.h>
 #include <sys/stat.h>
 #include <unistd.h>
 #include <stdio.h>
@@ -20,6 +21,8 @@
 #include "fmt.h"
 #include "byte.h"
 
+#include "svdir.h"
+
 #define USAGE " dir"
 
 #define VERSION "$Id: runsv.c,v 1.24 2005/08/23 22:36:28 pape Exp $"
@@ -27,30 +30,6 @@
 char *progname;
 int selfpipe[2];
 
-/* state */
-#define S_DOWN 0
-#define S_RUN 1
-#define S_FINISH 2
-/* ctrl */
-#define C_NOOP 0
-#define C_TERM 1
-#define C_PAUSE 2
-/* want */
-#define W_UP 0
-#define W_DOWN 1
-#define W_EXIT 2
-
-struct svdir {
-  int pid;
-  int state;
-  int ctrl;
-  int want;
-  struct taia start;
-  int fdlock;
-  int fdcontrol;
-  int fdcontrolwrite;
-  int islog;
-};
 struct svdir svd[2];
 
 int sigterm =0;
@@ -86,7 +65,7 @@
 void update_status(struct svdir *s) {
   unsigned long l;
   int fd;
-  char status[20];
+  char status[SVDIRSIZE_FULL];
   char bspace[64];
   buffer b;
   char spid[FMT_ULONG];
@@ -161,25 +140,8 @@
   }
 
   /* supervise compatibility */
-  taia_pack(status, &s->start);
-  l =(unsigned long)s->pid;
-  status[12] =l; l >>=8;
-  status[13] =l; l >>=8;
-  status[14] =l; l >>=8;
-  status[15] =l;
-  if (s->ctrl & C_PAUSE)
-    status[16] =1;
-  else
-    status[16] =0;
-  if (s->want == W_UP)
-    status[17] ='u';
-  else
-    status[17] ='d';
-  if (s->ctrl & C_TERM)
-    status[18] =1;
-  else
-    status[18] =0;
-  status[19] =s->state;
+  build_status(s, status);
+
   if ((fd =open_trunc("supervise/status.new")) == -1) {
     warn("unable to open supervise/status.new");
     return;
@@ -399,6 +361,10 @@
   svd[0].ctrl =C_NOOP;
   svd[0].want =W_UP;
   svd[0].islog =0;
+  svd[0].runexit_type =0;
+  svd[0].runexit_val =0;
+  svd[0].finexit_type =0;
+  svd[0].finexit_val =0;
   svd[1].pid =0;
   taia_now(&svd[0].start);
   if (stat("down", &s) != -1) svd[0].want =W_DOWN;
@@ -416,6 +382,10 @@
       svd[1].ctrl =C_NOOP;
       svd[1].want =W_UP;
       svd[1].islog =1;
+      svd[1].runexit_type =0;
+      svd[1].runexit_val =0;
+      svd[1].finexit_type =0;
+      svd[1].finexit_val =0;
       taia_now(&svd[1].start);
       if (stat("log/down", &s) != -1)
         svd[1].want =W_DOWN;
@@ -540,13 +510,39 @@
         svd[0].pid =0;
         pidchanged =1;
         svd[0].ctrl &=~C_TERM;
-        if (svd[0].state != S_FINISH)
+        if (svd[0].state != S_FINISH) {
+          if (WIFEXITED(wstat)) {
+            svd[0].runexit_type = ET_NORMAL;
+            svd[0].runexit_val = WEXITSTATUS(wstat);
+          } else if (WIFSIGNALED(wstat)) {
+            svd[0].runexit_type = ET_SIGNAL;
+            svd[0].runexit_val = WTERMSIG(wstat);
+          } else if(WIFSTOPPED(wstat)) {
+            /* do nothing */
+          } else {
+            svd[0].runexit_type = ET_UNKNOWN;
+            svd[0].runexit_val = 0;
+          }
           if ((fd =open_read("finish")) != -1) {
             close(fd);
             svd[0].state =S_FINISH;
             update_status(&svd[0]);
             continue;
           }
+        } else {
+          if (WIFEXITED(wstat)) {
+            svd[0].finexit_type = ET_NORMAL;
+            svd[0].finexit_val = WEXITSTATUS(wstat);
+          } else if (WIFSIGNALED(wstat)) {
+            svd[0].finexit_type = ET_SIGNAL;
+            svd[0].finexit_val = WTERMSIG(wstat);
+          } else if(WIFSTOPPED(wstat)) {
+            /* do nothing */
+          } else {
+            svd[0].finexit_type = ET_UNKNOWN;
+            svd[0].finexit_val = 0;
+          }
+        }
         svd[0].state =S_DOWN;
         taia_uint(&deadline, 1);
         taia_add(&deadline, &svd[0].start, &deadline);
diff -rNu runit-1.3.1/src/runsvctrl.c runit-1.3.1.new/src/runsvctrl.c
--- runit-1.3.1/src/runsvctrl.c	2005-08-24 15:14:39.257944026 -0500
+++ runit-1.3.1.new/src/runsvctrl.c	2005-09-16 21:36:07.687127490 -0500
@@ -7,9 +7,6 @@
 
 #define VERSION "$Id: runsvctrl.c,v 1.9 2005/07/11 10:13:53 pape Exp $"
 
-#define FATAL "runsvctrl: fatal: "
-#define WARNING "runsvctrl: warning: "
-
 char *progname;
 unsigned int rc =0;
 
@@ -47,6 +44,7 @@
   int curdir;
   char c;
 
+  strerr_setname("runsvctrl");
   progname =*argv++;
 
   if (! argv || ! *argv) usage();
diff -rNu runit-1.3.1/src/runsvstat.c runit-1.3.1.new/src/runsvstat.c
--- runit-1.3.1/src/runsvstat.c	2005-08-24 15:14:39.653043632 -0500
+++ runit-1.3.1.new/src/runsvstat.c	2005-09-16 22:21:47.375425141 -0500
@@ -8,14 +8,12 @@
 #include "buffer.h"
 #include "tai.h"
 #include "fmt.h"
+#include "svdir.h"
 
 #define USAGE " [ -l ] service ..."
 
 #define VERSION "$Id: runsvstat.c,v 1.8 2005/07/11 10:13:53 pape Exp $"
 
-#define FATAL "runsvstat: fatal: "
-#define WARNING "runsvstat: warning: "
-
 const char *progname;
 unsigned int rc =0;
 struct stat s;
@@ -23,85 +21,27 @@
 
 void usage() { strerr_die4x(1, "usage: ", progname, USAGE, "\n"); }
 
-void fatal(char *m1) { strerr_die3sys(111, FATAL, m1, ": "); }
-void warn(char *m1, char *m2) {
+static inline void fatal(char *m1) { strerr_die3sys(111, FATAL, m1, ": "); }
+static inline void warn(char *m1, char *m2) {
   rc++;
   strerr_warn5(WARNING, m1, ": ", m2, ": ", &strerr_sys);
 }
-void warnx(char *m1, char *m2) {
+static inline void warnx(char *m1, char *m2) {
   rc++;
   strerr_warn4(WARNING, m1, ": ", m2, 0);
 }
 
 int show_status(char *name) {
-  char status[20];
-  int pid;
-  int fd;
-  int normallyup =0;
-  char sulong[FMT_ULONG];
-  struct tai when;
-  struct tai now;
-
-  if (stat("down", &s) == -1) {
-    if (errno != error_noent) {
-      warn(name, "unable to stat down");
-      return(-1);
-    }
-    normallyup = 1;
-  }
-  if ((fd =open_write("supervise/ok")) == -1) {
-    if (errno == error_nodevice)
-      warnx(name, "runsv not running.");
-    else
-      warn(name, "unable to open supervise/ok");
-    return(-1);
-  }
-  close(fd);
-  if ((fd =open_read("supervise/status")) == -1) {
-    warn(name, "unable to open supervise/status");
+  struct svdir svd;
+
+  if (read_status(name, &svd) <= 0) {
+    rc++;
     return(-1);
   }
-  switch(read(fd, status, 20)) {
-  case 20: break;
-  case -1:
-    warn(name, "unable to read supervise/status");
-    return(-1);
-  default:
-    warnx(name, "unable to read supervise/status: bad format.");
+  if (describe_status(name, &svd, buffer_1) == -1) {
+    rc++;
     return(-1);
   }
-  pid =(unsigned char) status[15];
-  pid <<=8; pid +=(unsigned char)status[14];
-  pid <<=8; pid +=(unsigned char)status[13];
-  pid <<=8; pid +=(unsigned char)status[12];
-
-  tai_unpack(status,&when);
-  tai_now(&now);
-  if (tai_less(&now,&when)) when =now;
-  tai_sub(&when,&now,&when);
-
-  buffer_puts(buffer_1, name);
-  buffer_puts(buffer_1, ": ");
-  if (pid) {
-    switch (status[19]) {
-    case 1: buffer_puts(buffer_1, "run "); break;
-    case 2: buffer_puts(buffer_1, "finish "); break;
-    }
-    buffer_puts(buffer_1, "(pid ");
-    buffer_put(buffer_1, sulong, fmt_ulong(sulong, pid));
-    buffer_puts(buffer_1, ") ");
-  }
-  else
-    buffer_puts(buffer_1, "down ");
-  buffer_put(buffer_1, sulong, fmt_ulong(sulong, tai_approx(&when)));
-  buffer_puts(buffer_1, " seconds");
-  if (pid && !normallyup) buffer_puts(buffer_1,", normally down");
-  if (!pid && normallyup) buffer_puts(buffer_1,", normally up");
-  if (pid && status[16]) buffer_puts(buffer_1,", paused");
-  if (!pid && (status[17] == 'u')) buffer_puts(buffer_1,", want up");
-  if (pid && (status[17] == 'd')) buffer_puts(buffer_1,", want down");
-  if (pid && status[18]) buffer_puts(buffer_1, ", got TERM");
-  /* buffer_putsflush(buffer_1, "\n"); */
   return(1);
 }
 
@@ -110,6 +50,7 @@
   int curdir;
   char **dir;
 
+  strerr_setname("runsvstat");
   progname =*argv;
 
   while ((opt =getopt(argc, (const char * const *)argv, "lV")) != opteof) {
diff -rNu runit-1.3.1/src/seek_set.c runit-1.3.1.new/src/seek_set.c
--- runit-1.3.1/src/seek_set.c	2005-08-24 15:14:39.983609511 -0500
+++ runit-1.3.1.new/src/seek_set.c	2005-09-16 21:49:13.423311095 -0500
@@ -1,6 +1,7 @@
 /* Public domain. */
 
 #include <sys/types.h>
+#include <unistd.h>
 #include "seek.h"
 
 #define SET 0 /* sigh */
diff -rNu runit-1.3.1/src/strerr_die.c runit-1.3.1.new/src/strerr_die.c
--- runit-1.3.1/src/strerr_die.c	2005-08-24 15:14:39.142788882 -0500
+++ runit-1.3.1.new/src/strerr_die.c	2005-09-16 15:14:15.401037721 -0500
@@ -7,6 +7,11 @@
 void strerr_warn(const char *x1,const char *x2,const char *x3,const char *x4,const char *x5,const char *x6,const struct strerr *se)
 {
   strerr_sysinit();
+
+  if(strerr_progname != NULL) {
+    buffer_puts(buffer_2, strerr_progname);
+    buffer_puts(buffer_2, ": ");
+  }
  
   if (x1) buffer_puts(buffer_2,x1);
   if (x2) buffer_puts(buffer_2,x2);
diff -rNu runit-1.3.1/src/strerr.h runit-1.3.1.new/src/strerr.h
--- runit-1.3.1/src/strerr.h	2005-08-24 15:14:39.250106243 -0500
+++ runit-1.3.1.new/src/strerr.h	2005-09-16 22:23:51.639258511 -0500
@@ -3,6 +3,14 @@
 #ifndef STRERR_H
 #define STRERR_H
 
+#define PAUSE "pausing: "
+#define INFO "info: "
+#define FAIL "fail: "
+#define FATAL "fatal: "
+#define WARNING "warning: "
+
+extern char* strerr_progname;
+
 struct strerr {
   struct strerr *who;
   const char *x;
@@ -12,6 +20,7 @@
 
 extern struct strerr strerr_sys;
 extern void strerr_sysinit(void);
+extern void strerr_setname(const char* progname);
 
 extern const char *strerr(const struct strerr *);
 extern void strerr_warn(const char *,const char *,const char *,const char *,const char *,const char *,const struct strerr *);
diff -rNu runit-1.3.1/src/strerr_sys.c runit-1.3.1.new/src/strerr_sys.c
--- runit-1.3.1/src/strerr_sys.c	2005-08-24 15:14:39.678678463 -0500
+++ runit-1.3.1.new/src/strerr_sys.c	2005-09-16 15:24:17.112585701 -0500
@@ -12,3 +12,9 @@
   strerr_sys.y = "";
   strerr_sys.z = "";
 }
+
+char* strerr_progname = 0;
+
+void strerr_setname(const char* progname) {
+  strerr_progname = (char*)progname;
+}
diff -rNu runit-1.3.1/src/sv.c runit-1.3.1.new/src/sv.c
--- runit-1.3.1/src/sv.c	2005-08-24 15:14:39.030052823 -0500
+++ runit-1.3.1.new/src/sv.c	2005-09-16 23:12:53.751212657 -0500
@@ -1,6 +1,7 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <unistd.h>
+#include <string.h>
 #include "str.h"
 #include "strerr.h"
 #include "error.h"
@@ -13,6 +14,7 @@
 #include "tai.h"
 #include "taia.h"
 #include "wait.h"
+#include "svdir.h"
 
 #define USAGE " [-v] [-w sec] action service ..."
 #define USAGELSB " [-w sec] action"
@@ -49,7 +51,6 @@
 
 int curdir, fd, r;
 char svstatus[20];
-char sulong[FMT_ULONG];
 
 void usage() {
   if (!lsb) strerr_die4x(100, "usage: ", progname, USAGE, "\n");
@@ -63,6 +64,10 @@
   strerr_warn4(FATAL, m1, m2, ": ", &strerr_sys);
   done(lsb ? 151 : 100);
 }
+void fatal2x(char *m1, char *m2) {
+  strerr_warn3(FATAL, m1, m2, 0);
+  done(lsb ? 151 : 100);
+}
 void out(char *p, char *m1) {
   buffer_puts(buffer_1, p);
   buffer_puts(buffer_1, *service);
@@ -86,87 +91,158 @@
 void outs2(const char *s) { buffer_puts(buffer_2, s); }
 void flush2(const char *s) { outs2(s); buffer_flush(buffer_2); }
 
-int svstatus_get() {
-  if ((fd =open_write("supervise/ok")) == -1) {
-    if (errno == error_nodevice) {
-      *acts == 'x' ? ok("runsv not running") : failx("runsv not running");
-      return(0);
-    }
-    warn("unable to open supervise/ok");
-    return(-1);
-  }
-  close(fd);
-  if ((fd =open_read("supervise/status")) == -1) {
-    warn("unable to open supervise/status");
-    return(-1);
-  }
-  r =read(fd, svstatus, 20);
-  close(fd);
-  switch(r) {
-  case 20: break;
-  case -1: warn("unable to read supervise/status"); return(-1);
-  default: warnx("unable to read supervise/status: bad format"); return(-1);
-  }
-  return(1);
+typedef int (*flagprint_fn)(struct svdir*);
+typedef struct statusflag {
+  char flagchar;
+  flagprint_fn printfunc;
+} statusflag_t;
+
+#define MAX_FLAGPRINTFUNCS 16
+flagprint_fn flagprintfuncs[MAX_FLAGPRINTFUNCS] = { 0, };
+int flagprintfunccount = 0;  /* number of printfuncs currently requested */
+
+int printULong(unsigned long ulong) {
+  char sulong[FMT_ULONG];
+  memset(sulong, 0, FMT_ULONG);
+  fmt_ulong(sulong, ulong);
+  outs(sulong);
+  return 1;
 }
-unsigned int svstatus_print(char *m) {
-  int pid;
-  int normallyup =0;
-  struct stat s;
- 
-  if (stat("down", &s) == -1) {
-    if (errno != error_noent) {
-      outs2(WARN); outs2("unable to stat "); outs2(*service); outs2("/down: ");
-      outs2(error_str(errno)); flush2("\n");
-      return(0);
-    }
-    normallyup =1;
+
+int printExitData(int exit_type, int exit_val) {
+  switch(exit_type) {
+    case ET_NOEXIT:
+      outs("none:");
+      break;
+    case ET_NORMAL:
+      outs("exit:");
+      break;
+    case ET_SIGNAL:
+      outs("signal:");
+      break;
+    default:
+      outs("unknown:");
   }
-  pid =(unsigned char) svstatus[15];
-  pid <<=8; pid +=(unsigned char)svstatus[14];
-  pid <<=8; pid +=(unsigned char)svstatus[13];
-  pid <<=8; pid +=(unsigned char)svstatus[12];
-  tai_unpack(svstatus, &tstatus);
-  if (pid) {
-    switch (svstatus[19]) {
-    case 1: outs(RUN); break;
-    case 2: outs(FINISH); break;
-    }
-    outs(m); outs(": (pid "); sulong[fmt_ulong(sulong, pid)] =0;
-    outs(sulong); outs(") ");
+  printULong(exit_val);
+  return 1;
+}
+
+int printRunExit(struct svdir *service) {
+  return printExitData(service->runexit_type, service->runexit_val);
+}
+
+int printFinExit(struct svdir *service) {
+  return printExitData(service->finexit_type, service->finexit_val);
+}
+
+int printPID(struct svdir* service) {
+  if(service->pid)
+    printULong(service->pid);
+  else
+    outs("-");
+  return 1;
+}
+
+int printCurrentState(struct svdir* service) {
+  switch(service->state) {
+    case S_DOWN:
+      outs("down");
+      break;
+    case S_RUN:
+      outs("run");
+      break;
+    case S_FINISH:
+      outs("finish");
+      break;
+    default:
+      outs("error");
+      return -1;
   }
-  else {
-    outs(DOWN); outs(m); outs(": ");
+  return 1;
+}
+
+int printWantState(struct svdir* service) {
+  switch(service->want) {
+    case W_UP:
+      outs("up");
+      break;
+    case W_DOWN:
+      outs("down");
+      break;
+    case W_EXIT:
+      outs("exit");
+      break;
+    default:
+      outs("error");
+      return -1;
   }
-  buffer_put(buffer_1, sulong,
-    fmt_ulong(sulong, tnow.sec.x < tstatus.x ? 0 : tnow.sec.x -tstatus.x));
-  outs("s");
-  if (pid && !normallyup) outs(", normally down");
-  if (!pid && normallyup) outs(", normally up");
-  if (pid && svstatus[16]) outs(", paused");
-  if (!pid && (svstatus[17] == 'u')) outs(", want up");
-  if (pid && (svstatus[17] == 'd')) outs(", want down");
-  if (pid && svstatus[18]) outs(", got TERM");
-  return(pid ? 1 : 2);
+  return 1;
+}
+
+int printUptime(struct svdir* service) {
+  printULong(seconds_uptime(service));
+  return 1;
 }
+
+const statusflag_t statusflags[] = {
+  {'r', printRunExit},
+  {'f', printFinExit},
+  {'p', printPID},
+  {'s', printCurrentState},
+  {'w', printWantState},
+  {'u', printUptime},
+  {0, 0},
+};
+
+/* TODO: LSB-compatible return values */
 int status(char *unused) {
-  r =svstatus_get();
-  switch(r) { case -1: if (lsb) done(4); case 0: return(0); }
-  r =svstatus_print(*service);
-  if (chdir("log") == -1) {
-    if (errno != error_noent) {
-      outs("; log: "); outs(WARN);
-      outs("unable to change to log service directory: ");
-      outs(error_str(errno));
-    }
+  struct svdir svd;
+  int retval;
+
+  retval = read_status(*service, &svd);
+  if(retval == 0) {
+    /* runsv not running */
+    return(0);
+  } else if(retval == -1) {
+    /* unable to read status file */
+    if(lsb) done(4);
+    return(0);
   }
-  else
-    if (svstatus_get()) {
-      outs("; "); svstatus_print("log");
+
+  if(flagprintfunccount == 0) {
+    retval = describe_status(*service, &svd, buffer_1);
+    if(retval == -1) {
+      /* something bizarre happened */
+      if(lsb) done(4);
+      return(-1);
     }
+    if(chdir("log") == -1) {
+      if(errno != error_noent) {
+        outs("; log: "); outs(WARN);
+        outs("unable to change to log service directory: ");
+        outs(error_str(errno));
+      }
+    } else {
+      if(read_status(*service, &svd)) {
+        describe_status("; log", &svd, buffer_1);
+      }
+    }
+  } else {
+    int i;
+    outs(*service);
+    outs(" ");
+    for(i = 0; i < flagprintfunccount; i++) {
+      flagprintfuncs[i](&svd);
+      outs(" ");
+    }
+  }
   flush("\n");
-  if (lsb) switch(r) { case 1: done(0); case 2: done(3); case 0: done(4); }
-  return(r);
+
+  if(lsb) {
+    if(svd.pid) done(0);
+    else done(3);
+  }
+  return(1);
 }
 
 int checkscript() {
@@ -205,35 +281,50 @@
 }
 
 int check(char *a) {
-  unsigned int pid;
+  struct svdir svd;
+  int retval;
+
+  retval = read_status(*service, &svd);
+  if(retval == -1) {
+    return -1;
+  } else if(retval == 0) {
+    if(*a == 'x')
+      return(1);
+    return(-1);
+  }
 
-  if ((r =svstatus_get()) == -1) return(-1);
-  if (r == 0) { if (*a == 'x') return(1); return(-1); }
-  pid =(unsigned char)svstatus[15];
-  pid <<=8; pid +=(unsigned char)svstatus[14];
-  pid <<=8; pid +=(unsigned char)svstatus[13];
-  pid <<=8; pid +=(unsigned char)svstatus[12];
   switch (*a) {
   case 'x': return(0);
-  case 'u': if (!pid) return(0); if (!checkscript()) return(0); break;
-  case 'd': if (pid) return(0); break;
+  case 'u': if (!svd.pid) return(0); if (!checkscript()) return(0); break;
+  case 'd': if (svd.pid) return(0); break;
   case 't':
-    if (!pid && svstatus[17] == 'd') break;
-    tai_unpack(svstatus, &tstatus);
-    if ((tstart.sec.x > tstatus.x) || !pid || svstatus[18] || !checkscript())
+    if (!svd.pid && svd.want == W_DOWN) break;
+    if ((tstart.sec.x > svd.start.sec.x) || 
+        !svd.pid ||
+        ((svd.ctrl & C_TERM) != 0) ||
+        !checkscript())
       return(0);
     break;
   case 'o':
-    tai_unpack(svstatus, &tstatus);
-    if ((!pid && tstart.sec.x > tstatus.x) || (pid && svstatus[17] != 'd'))
+    if ((!svd.pid && tstart.sec.x > svd.start.sec.x) ||
+        (svd.pid && svd.want != W_DOWN))
       return(0);
   }
-  outs(OK); svstatus_print(*service); flush("\n");
+  outs(OK);
+  describe_status(*service, &svd, buffer_1);
+  flush("\n");
   return(1);
 }
 int control(char *a) {
-  if (svstatus_get() <= 0) return(-1);
-  if (svstatus[17] == *a) return(0);
+  struct svdir svd;
+
+  if(read_status(*service, &svd) < 0)
+    return(-1);
+  if(((*a == 'u') && (svd.want == W_UP)) ||
+     ((*a == 'd') && (svd.want == W_DOWN)) ||
+     ((*a == 'x') && (svd.want == W_EXIT)))
+    return(0);
+
   if ((fd =open_write("supervise/control")) == -1) {
     if (errno != error_nodevice)
       warn("unable to open supervise/control");
@@ -319,7 +410,30 @@
 
   servicex =service;
   for (i =0; i < services; ++i) {
-    if ((**service != '/') && (**service != '.')) {
+    if ((act == &status) && (**service == '-')) {
+      char *flag = (*service)+1;
+      while(*flag != 0) {
+        /* search for a function associated with this flag */
+        int fnidx, foundFlag = 0;
+        for(fnidx = 0; statusflags[fnidx].flagchar != 0; fnidx++) {
+          if(flagprintfunccount == MAX_FLAGPRINTFUNCS) {
+            fatal("Excessive number of flag-based parameters provided");
+          }
+          if(statusflags[fnidx].flagchar == *flag) {
+            flagprintfuncs[flagprintfunccount++] = statusflags[fnidx].printfunc;
+            foundFlag = 1;
+            break;
+          }
+        }
+        if(!foundFlag) {
+          char badFlag[3] = {'-', 'X', 0};
+          badFlag[1] = *flag;
+          fatal2x("Invalid status flag provided: ", badFlag);
+        }
+        flag++;
+      }
+      *service = 0;
+    } else if ((**service != '/') && (**service != '.')) {
       if ((chdir(varservice) == -1) || (chdir(*service) == -1)) {
         fail("unable to change to service directory");
         *service =0;
@@ -354,8 +468,12 @@
           }
         if (*service) { if (cbk(acts) != 0) *service =0; else done =0; }
         if (*service && taia_approx(&tdiff) > wait) {
+          struct svdir svd;
           kll ? outs(KILL) : outs(TIMEOUT);
-          if (svstatus_get() > 0) { svstatus_print(*service); ++rc; }
+          if (read_status(*service, &svd) > 0) {
+            describe_status(*service, &svd, buffer_1);
+            ++rc;
+          }
           flush("\n");
           if (kll) control("k");
           *service =0;
diff -rNu runit-1.3.1/src/svdir.c runit-1.3.1.new/src/svdir.c
--- runit-1.3.1/src/svdir.c	1969-12-31 18:00:00.000000000 -0600
+++ runit-1.3.1.new/src/svdir.c	2005-09-16 22:51:32.089624955 -0500
@@ -0,0 +1,174 @@
+#include <netinet/in.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include "open.h"
+#include "buffer.h"
+#include "error.h"
+#include "strerr.h"
+#include "fmt.h"
+
+#include "svdir.h"
+#include "taia.h"
+
+int parse_status(unsigned char* in, int in_len, struct svdir *out) {
+  if(in_len < SVDIRSIZE_MIN) return -1;
+
+  /* parse non-optional contents */
+  taia_unpack(in, &(out->start));
+#ifdef USE_LIBC_BYTEORDER_FNS
+  out->pid = ntohl(*((uint32_t*)(&in[12])));
+#else
+  out->pid = (in[12] |
+              (((unsigned long)in[13]) << 8)  |
+              (((unsigned long)in[14]) << 16) |
+              (((unsigned long)in[15]) << 24));
+#endif
+  out->ctrl = ( (in[16] == 0 ? 0 : C_PAUSE)
+              | (in[18] == 0 ? 0 : C_TERM) );
+  out->want = ((in[17]) == 'u') ? W_UP : (((in[17]) == 'd') ? W_DOWN : W_EXIT);
+  out->state = in[19];
+
+  /* parse optional contents only if present */
+  if(in_len < 24) {
+    out->runexit_type = 0;
+    out->runexit_val  = 0;
+    out->finexit_type = 0;
+    out->finexit_val  = 0;
+  } else {
+    out->runexit_type = in[20];
+    out->runexit_val  = in[21];
+    out->finexit_type = in[22];
+    out->finexit_val  = in[23];
+  }
+  return 1;
+}
+
+void build_status(struct svdir *in, unsigned char* out) {
+  taia_pack(out, &(in->start));
+#ifdef USE_LIBC_BYTEORDER_FNS
+  *((long*)(&out[12])) = htonl(in->pid);
+#else
+  {
+    unsigned long l = in->pid;
+    out[12] = l & 255; l >>= 8;
+    out[13] = l & 255; l >>= 8;
+    out[14] = l & 255; l >>= 8;
+    out[15] = l & 255;
+  }
+#endif
+  out[16] = ((in->ctrl & C_PAUSE) == 0) ? 0 : 1;
+  out[17] = (in->want == W_UP) ? 'u' : ((in->want == W_DOWN) ? 'd' : 'x');
+  out[18] = ((in->ctrl & C_TERM) == 0)  ? 0 : 1;
+  out[19] = in->state;
+  out[20] = in->runexit_type;
+  out[21] = in->runexit_val;
+  out[22] = in->finexit_type;
+  out[23] = in->finexit_val;
+}
+
+/* read status files corresponding to the current working directory */
+int read_status(char *name, struct svdir *out) {
+  struct stat s;
+  char status[SVDIRSIZE_FULL];
+  int fd, bytesread;
+
+  if (stat("down", &s) == -1) {
+    if (errno != error_noent) {
+      strerr_warn5(FAIL, name, ": ", "unable to stat down", ": ", &strerr_sys);
+      return(-1);
+    }
+    if(out) out->normallyup = 1;
+  } else {
+    if(out) out->normallyup = 0;
+  }
+  if ((fd =open_write("supervise/ok")) == -1) {
+    if (errno == error_nodevice) {
+      strerr_warn4(FAIL, name, ": ", "runsv not running.", 0);
+      return(0);
+    } else {
+      strerr_warn5(FAIL, name, ": ", "unable to open supervise/ok", ": ", &strerr_sys);
+      return(-1);
+    }
+  }
+  close(fd);
+  if ((fd =open_read("supervise/status")) == -1) {
+    strerr_warn5(FAIL, name, ": ", "unable to open supervise/status", ": ", &strerr_sys);
+    return(-1);
+  }
+  
+  if ((bytesread = read(fd, status, SVDIRSIZE_FULL)) == -1) {
+    strerr_warn5(FAIL, name, ": ", "unable to read supervise/status", ": ", &strerr_sys);
+    return(-1);
+  }
+  if (out && (parse_status(status, bytesread, out) == -1)) {
+    strerr_warn3(FAIL, name, "unable to read supervise/status: bad format.", 0);
+    return(-1);
+  }
+  return 1;
+}
+
+int seconds_uptime(const struct svdir *in) {
+  struct tai svtai;		/* from structure */
+  struct tai now;		/* current time*/
+
+  svtai = in->start.sec;
+  tai_now(&now);
+  if (tai_less(&now,&svtai)) svtai = now;
+  tai_sub(&svtai,&now,&svtai);
+  return tai_approx(&svtai);
+}
+
+int describe_status(const char* name,
+                    const struct svdir *in,
+                    buffer* out) {
+
+  char sulong[FMT_ULONG];	/* buffer for formatted numeric value */
+
+  if(name) {
+    buffer_puts(out, name);
+    buffer_puts(out, ": ");
+  }
+  if (in->pid) {
+    switch (in->state) {
+    case S_RUN:    buffer_puts(out, "run "); break;
+    case S_FINISH: buffer_puts(out, "finish "); break;
+    }
+    buffer_puts(out, "(pid ");
+    buffer_put(out, sulong, fmt_ulong(sulong, in->pid));
+    buffer_puts(out, ") ");
+  }
+  else
+    buffer_puts(out, "down ");
+  buffer_put(out, sulong, fmt_ulong(sulong, seconds_uptime(in)));
+  buffer_puts(out, " seconds");
+  if (in->pid && !in->normallyup) buffer_puts(out,", normally down");
+  if (!in->pid && in->normallyup) buffer_puts(out,", normally up");
+  if (in->pid && (in->ctrl & C_PAUSE)) buffer_puts(out,", paused");
+  if (!in->pid && (in->want == W_UP)) buffer_puts(out,", want up");
+  if (in->pid && (in->want == W_DOWN)) buffer_puts(out,", want down");
+  if (in->pid && (in->ctrl & C_TERM)) buffer_puts(out, ", got TERM");
+  /* buffer_putsflush(out, "\n"); */
+
+  if ((in->runexit_type != ET_NOEXIT) &&
+      (!(in->runexit_type == ET_NORMAL && in->runexit_val == 0)) &&
+      ((in->pid && in->state == S_FINISH) || !in->pid)) {
+    buffer_puts(out, ", last run exit ");
+    if(in->runexit_type == ET_NORMAL)
+      buffer_puts(out, "status ");
+    else if(in->runexit_type == ET_SIGNAL)
+      buffer_puts(out, "signal ");
+    buffer_put(out, sulong, fmt_ulong(sulong, in->runexit_val));
+  }
+  if ((in->finexit_type != ET_NOEXIT) &&
+      (!(in->finexit_type == ET_NORMAL && in->finexit_val == 0)) &&
+      ((in->pid && in->state == S_RUN) || !in->pid)) {
+    buffer_puts(out, ", last finish exit ");
+    if(in->finexit_type == ET_NORMAL)
+      buffer_puts(out, "status ");
+    else if(in->finexit_type == ET_SIGNAL)
+      buffer_puts(out, "signal ");
+    buffer_put(out, sulong, fmt_ulong(sulong, in->finexit_val));
+  }
+  return 1;
+}
diff -rNu runit-1.3.1/src/svdir.h runit-1.3.1.new/src/svdir.h
--- runit-1.3.1/src/svdir.h	1969-12-31 18:00:00.000000000 -0600
+++ runit-1.3.1.new/src/svdir.h	2005-09-16 21:28:23.010242943 -0500
@@ -0,0 +1,94 @@
+#ifndef SVDIR_H
+#define SVDIR_H
+
+#include "taia.h"
+
+/* state */
+#define S_DOWN 0
+#define S_RUN 1
+#define S_FINISH 2
+/* ctrl */
+#define C_NOOP 0
+#define C_TERM 1
+#define C_PAUSE 2
+/* want */
+#define W_UP 0
+#define W_DOWN 1
+#define W_EXIT 2
+/* exit type */
+#define ET_NOEXIT 0
+#define ET_NORMAL 1
+#define ET_SIGNAL 2
+#define ET_UNKNOWN 127
+
+/* in-memory format */
+struct svdir {
+
+  /* persisted to disk in status*/
+  struct taia start;
+  int pid;
+  int state;
+  int ctrl;
+  int want;
+  unsigned char runexit_type;
+  unsigned char runexit_val;
+  unsigned char finexit_type;
+  unsigned char finexit_val;
+
+  /* persisted to disk elsewhere */
+  int normallyup;
+  
+  /* non-persisted */
+  int fdlock;
+  int fdcontrol;
+  int fdcontrolwrite;
+  int islog;
+};
+
+/* on-disk format:
+	off	len	desc
+	0	12	time (taia); [12 bytes]
+	12	4	pid; [4 bytes; LSB]
+	16	1	(ctrl & C_PAUSE) ? 1 : 0
+	17	1	(want == W_UP) ? 'u' : 'd'
+	18	1	(ctrl & C_TERM) ? 1 : 0
+	19	1	state (S_DOWN, S_RUN, S_FINISH)
+	---------------- ITEMS BELOW HERE ARE OPTIONAL
+	20	1	run exit type
+	21	1	run exit status
+	22	1	finish exit type
+	23	1	finish exit status
+ */
+
+/* x86 - host is MSB; network is LSB */
+
+/* size of the desired on-disk format */
+#define SVDIRSIZE_MIN 20
+#define SVDIRSIZE_FULL 24
+
+/* parse a string no smaller than SVDIRSIZE_MIN into a svdir object.
+ */
+int parse_status(unsigned char* in, int in_len, struct svdir *out);
+
+/* read a status structure all the way from disk
+ * return 1 for success, 0 for runsv not running, -1 for other errors
+ * if out is NULL, only checks whether runsv is up and files are valid
+ */
+int read_status(char *name, struct svdir *out);
+
+/* build the on-disk status format in the char array out (which must
+ * be at least length SVDIRSIZE_FULL).
+ */
+void build_status(struct svdir *in, unsigned char* out);
+
+/* build a string describing current status.
+ * argument is a buffer of size BUFFER_OUTSIZE.
+ * return value is 0 on success.
+ */
+int describe_status(const char* name, const struct svdir *in, buffer* out);
+
+/* retrieve number of seconds of uptime. While approximate, this is a
+ * good value to present to the user. */
+int seconds_uptime(const struct svdir *in);
+
+#endif
diff -rNu runit-1.3.1/src/svlogd.c runit-1.3.1.new/src/svlogd.c
--- runit-1.3.1/src/svlogd.c	2005-08-24 15:14:39.244780354 -0500
+++ runit-1.3.1.new/src/svlogd.c	2005-09-16 21:37:34.992800370 -0500
@@ -36,11 +36,6 @@
 #define USAGE " [-ttv] [-r c] [-R abc] [-l len] [-b buflen] dir ..."
 #define VERSION "$Id: svlogd.c,v 1.16 2005/07/11 11:53:46 pape Exp $"
 
-#define FATAL "svlogd: fatal: "
-#define WARNING "svlogd: warning: "
-#define PAUSE "svlogd: pausing: "
-#define INFO "svlogd: info: "
-
 const char *progname;
 
 unsigned int verbose =0;
@@ -652,6 +647,7 @@
   int i;
   int opt;
 
+  strerr_setname("svlogd");
   progname =*argv;
 
   while ((opt =getopt(argc, argv, "R:r:l:b:tvV")) != opteof) {
diff -rNu runit-1.3.1/src/svwaitdown.c runit-1.3.1.new/src/svwaitdown.c
--- runit-1.3.1/src/svwaitdown.c	2005-08-24 15:14:39.015270373 -0500
+++ runit-1.3.1.new/src/svwaitdown.c	2005-09-16 21:40:22.870476461 -0500
@@ -7,9 +7,6 @@
 #include "tai.h"
 #include "buffer.h"
 
-#define FATAL "svwaitdown: fatal: "
-#define WARN "svwaitdown: warning: "
-#define INFO "svwaitdown: "
 #define USAGE " [-v] [-t 1..6000] service ..."
 
 #define VERSION "$Id: svwaitdown.c,v 1.13 2005/04/03 09:21:47 pape Exp $"
@@ -21,7 +18,7 @@
 void fatal(const char *m) { strerr_die3sys(111, FATAL, m, ": "); }
 void warn(const char *s1, const char *s2, struct strerr *e) {
   dir++; rc++;
-  strerr_warn3(WARN, s1, s2, e);
+  strerr_warn3(WARNING, s1, s2, e);
 }
 void usage() { strerr_die4x(1, "usage: ", progname, USAGE, "\n"); }
 
@@ -39,6 +36,7 @@
   struct tai start;
   struct tai now;
   
+  strerr_setname("svwaitdown");
   progname =*argv;
   
   while ((opt =getopt(argc, argv, "t:xkvV")) != opteof) {
@@ -170,7 +168,7 @@
     sleep(1);
   }
   if (fchdir(wdir) == -1) 
-    strerr_warn2(WARN, "unable to switch to starting directory: ", &strerr_sys);
+    strerr_warn2(WARNING, "unable to switch to starting directory: ", &strerr_sys);
   close(wdir);
   if (rc > 100) rc =100;
   _exit(rc);
diff -rNu runit-1.3.1/src/svwaitup.c runit-1.3.1.new/src/svwaitup.c
--- runit-1.3.1/src/svwaitup.c	2005-08-24 15:14:39.634420831 -0500
+++ runit-1.3.1.new/src/svwaitup.c	2005-09-16 21:54:10.594804619 -0500
@@ -8,9 +8,6 @@
 #include "buffer.h"
 #include "fmt.h"
 
-#define FATAL "svwaitup: fatal: "
-#define WARN "svwaitup: warning: "
-#define INFO "svwaitup: "
 #define USAGE " [-v] [-s 1..600] service ..."
 
 const char *progname;
@@ -21,7 +18,7 @@
 void fatal(const char *m) { strerr_die3sys(111, FATAL, m, ": "); }
 void warn(const char *s1, const char *s2, struct strerr *e) {
   dir++; rc++;
-  strerr_warn3(WARN, s1, s2, e);
+  strerr_warn3(WARNING, s1, s2, e);
 }
 void usage() { strerr_die4x(1, "usage: ", progname, USAGE, "\n"); }
 
@@ -38,6 +35,7 @@
   struct tai now;
   char sulong[FMT_ULONG];
 
+  strerr_setname("svwaitup");
   progname =*argv;
   
   while ((opt =getopt(argc, argv, "s:vV")) != opteof) {
@@ -119,7 +117,7 @@
     sleep(sec -is);
   }
   if (fchdir(wdir) == -1) 
-    strerr_warn2(WARN, "unable to switch to starting directory: ", &strerr_sys);
+    strerr_warn2(WARNING, "unable to switch to starting directory: ", &strerr_sys);
   close(wdir);
   if (rc > 100) rc =100;
   _exit(rc);
diff -rNu runit-1.3.1/src/taia_unpack.c runit-1.3.1.new/src/taia_unpack.c
--- runit-1.3.1/src/taia_unpack.c	1969-12-31 18:00:00.000000000 -0600
+++ runit-1.3.1.new/src/taia_unpack.c	2005-09-16 13:40:55.996983129 -0500
@@ -0,0 +1,19 @@
+/* Public domain. */
+
+#include <netinet/in.h>
+#include "taia.h"
+
+void taia_unpack(const char* s, struct taia *t)
+{
+  tai_unpack(s, &t->sec);
+  s += 8;
+
+  t->nano = (s[0] |
+             (((unsigned long)s[1]) >> 8) |
+             (((unsigned long)s[2]) >> 16) |
+             (((unsigned long)s[3]) >> 24));
+  t->atto = (s[4] |
+             (((unsigned long)s[5]) >> 8) |
+             (((unsigned long)s[6]) >> 16) |
+             (((unsigned long)s[7]) >> 24));
+}
diff -rNu runit-1.3.1/src/TARGETS runit-1.3.1.new/src/TARGETS
--- runit-1.3.1/src/TARGETS	2005-08-24 15:14:39.529956491 -0500
+++ runit-1.3.1.new/src/TARGETS	2005-09-15 15:55:00.038381057 -0500
@@ -12,6 +12,7 @@
 runsvctrl.o
 sv
 sv.o
+svdir.o
 svwaitdown
 svwaitdown.o
 svwaitup
@@ -123,6 +124,7 @@
 taia_pack.o
 taia_sub.o
 taia_uint.o
+taia_unpack.o
 time.a
 uint64.h
 unix.a
diff -rNu runit-1.3.1/src/utmpset.c runit-1.3.1.new/src/utmpset.c
--- runit-1.3.1/src/utmpset.c	2005-08-24 15:14:39.104228214 -0500
+++ runit-1.3.1.new/src/utmpset.c	2005-09-16 21:39:13.774279519 -0500
@@ -14,8 +14,6 @@
 #include "lock.h"
 
 #define USAGE " [-w] line"
-#define FATAL "utmpset: fatal: "
-#define WARNING "utmpset: warning: "
 
 const char *progname;
 
@@ -85,6 +83,7 @@
   int opt;
   int wtmp =0;
 
+  strerr_setname("utmpset");
   progname =*argv;
 
   while ((opt =getopt(argc, argv, "wV")) != opteof) {

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-09-17  4:46 new "sv status" flags and exit-tracking patch, and misc Charles Duffy
@ 2005-09-17  5:17 ` Mike Bell
  2005-09-19  8:35   ` Gerrit Pape
  2005-09-17  8:53 ` Laurent Bercot
  2005-09-19  8:31 ` Gerrit Pape
  2 siblings, 1 reply; 18+ messages in thread
From: Mike Bell @ 2005-09-17  5:17 UTC (permalink / raw)
  Cc: supervision

On Fri, Sep 16, 2005 at 11:46:24PM -0500, Charles Duffy wrote:
> Oh, and finally: Mixing tabs and whitespace is *abominable*. All tabs is 
> good (so folks can have the amount of whitespace they want just by 
> setting their editor, without needing to change the files). All 
> whitespace is acceptable. Mixing tabs and whitespace, on the other hand, 
> make an utter and complete *mess*. Bad! Evil!

With all due respect to Gerrit, whose taste /can't/ be that bad given the
excellent project he maintains and some of the very informed decisions
he makes...

Even "better" is when the indentation level suddenly decreases because
the code block was too heavily nested. Faced with indentation levels
that make it impossible to fit even simple statements on one line, he
chooses to suddenly drop the indentation level rather than break-up the
function. :)

I ran runit through indent to fix it up once, and it wasn't recognizable
any longer (nor were the over-nested sections legible).

There's nothing wrong with a block of code into a subroutine, even if
it's only used a single time. Declare it static (or better yet static
inline if you use C99) and the compiler has all the information it needs
to inline the function and eliminate the overhead (which was only a
single subroutine worth anyway, so not going to show up on any profile -
ever - unless it's called tens of thousands of times in a loop).

As an added bonus, factor the right bits out and name them sensibly and
your original function becomes easier to read, in the same way that
the mind can easily substitute in what the printf() library call does,
and the result is much more legible than seeing that giant mess of
code inline.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-09-17  4:46 new "sv status" flags and exit-tracking patch, and misc Charles Duffy
  2005-09-17  5:17 ` Mike Bell
@ 2005-09-17  8:53 ` Laurent Bercot
  2005-09-19  8:31 ` Gerrit Pape
  2 siblings, 0 replies; 18+ messages in thread
From: Laurent Bercot @ 2005-09-17  8:53 UTC (permalink / raw)


 I can't address all the points you're raising, but some bits I can answer:


> Why does runit use its own env_get() method rather than getenv()? [This 
> is just one example; there are *lots* of places where C library calls 
> are eschewed in favor of locally-implemented bits].
>
> (...)
> 
> Why define the same error methods and macros at the top of each file (as 
> with fail/failx/warn/warnx/etc)?
> 
> (...)
> 
> Why the huge number of distinct error handling functions and macros with 
> large numbers of parameters instead of just one or two functions using 
> varargs?

 This, my friend, is DJB-style coding. And strange as it may seem to the
profane, it's intentional. :)
 That coding style has its own reasons. All of which are of course
questionable, but at least, asking yourself questions is better than
blindly following a poor standard. If you like runit, then you're already
questioning System V or BSD init ; delving into runit's code is the next
step on the Path to Enlightenment, questioning more of your UNIX habits.
 Take a deep breath, coder, for what thou art about to see may change
your way of thinking forever.

 No, I'm not exaggerating. Well... maybe just a little bit. :)
 Anyway.

 The main points are:

  - when the standard is mediocre, don't follow it. Create a better
standard.

  - libc standards are often mediocre. UNIX is old, and bears the marks
of age (read: compatibility), but back then, good interface design wasn't
a science yet. Example of a deep-rooted legacy interface: stdio.

  - always wonder how something is implemented. Never forget library
overhead; having a nifty does-it-all function call available does not
mean it's a good idea to call it, because it may do way much more than
you need, and be a pain security-wise or performance-wise.

  - libc portability is a myth. When you want portable code, write
portable code, instead of relying on so-called portable functions that
will obscurely break on some architecture in situations you cannot guess
or control. Always play it safe.

 The strerr functions you're talking about are precisely _designed_
to avoid using varargs. The question is not "why not use varargs ?" -
the question is "why would I use varargs when I can do without them ?" :)

 If you want to dive in this "alternatively written software", take a
look at DJB's software, where it all comes from; especially daemontools
(which was the original inspiration for runit) and ucspi-tcp.

 That way of thinking has led to qmail instead of sendmail; djbdns instead
of BIND; tcprules instead of tcp_wrappers; runit instead of System V init;
runwhen instead of cron; ucspi-ssl instead of stunnel; execline instead
of sh (shameless and counterproductive plug here).
 And if nobody has yet come to write a working DJB-style news server to
replace that INN horror, it's because it's a huge amount of painful work
and we all have a life (sort of).

 That was my rant of the year. Time to go back to sleep.

-- 
 Laurent


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-09-17  4:46 new "sv status" flags and exit-tracking patch, and misc Charles Duffy
  2005-09-17  5:17 ` Mike Bell
  2005-09-17  8:53 ` Laurent Bercot
@ 2005-09-19  8:31 ` Gerrit Pape
  2005-09-19 16:04   ` Charles Duffy
  2 siblings, 1 reply; 18+ messages in thread
From: Gerrit Pape @ 2005-09-19  8:31 UTC (permalink / raw)


On Fri, Sep 16, 2005 at 11:46:24PM -0500, Charles Duffy wrote:
> This is not a small patch. While acquainting myself with runit's 
> internals, I found their coding style such as to substantially reduce 
> maintainability, and consequently did some refactoring on my way. 
> There's still much to be done -- a great many of the items touched on in 
> the rant pasted below are still issues, or are issues which have been 
> addressed only in small portions of the code -- and frankly, I can 
> appreciate why there would be some legitimate hesitation to apply a 
> patch of this size, even if it *is* agreed that the changes it makes 
> (beyond that for which it was begun) are necessary and proper.

Yes, I won't apply this patch.  And I think the additional changes are
neither necessary, nor proper.  I don't see the 'issues' you are trying
to address, other than your personal opinion on how the code should look
like.  Where are the bugs?

$ grep ^--- runit-1.3.1.new.patch 
--- runit-1.3.1/src/chkshsgr.c  2005-08-24 15:14:39.770106475 -0500
--- runit-1.3.1/src/chpst.c     2005-08-24 15:14:39.353778987 -0500
--- runit-1.3.1/src/conf-cc     2005-08-24 15:14:39.425467844 -0500
--- runit-1.3.1/src/conf-ld     2005-08-24 15:14:39.821970387 -0500
--- runit-1.3.1/src/Makefile    2005-08-24 15:14:39.993823320 -0500
--- runit-1.3.1/src/pathexec_run.c      2005-08-24 15:14:39.870476461 -0500
--- runit-1.3.1/src/prot.c      2005-08-24 15:14:39.854196578 -0500
--- runit-1.3.1/src/runit.c     2005-08-24 15:14:39.862091897 -0500
--- runit-1.3.1/src/runit-init.c        2005-08-24 15:14:39.871740354 -0500
--- runit-1.3.1/src/runsv.c     2005-08-24 15:14:39.585119477 -0500
--- runit-1.3.1/src/runsvctrl.c 2005-08-24 15:14:39.257944026 -0500
--- runit-1.3.1/src/runsvstat.c 2005-08-24 15:14:39.653043632 -0500
--- runit-1.3.1/src/seek_set.c  2005-08-24 15:14:39.983609511 -0500
--- runit-1.3.1/src/strerr_die.c        2005-08-24 15:14:39.142788882 -0500
--- runit-1.3.1/src/strerr.h    2005-08-24 15:14:39.250106243 -0500
--- runit-1.3.1/src/strerr_sys.c        2005-08-24 15:14:39.678678463 -0500
--- runit-1.3.1/src/sv.c        2005-08-24 15:14:39.030052823 -0500
--- runit-1.3.1/src/svdir.c     1969-12-31 18:00:00.000000000 -0600
--- runit-1.3.1/src/svdir.h     1969-12-31 18:00:00.000000000 -0600
--- runit-1.3.1/src/svlogd.c    2005-08-24 15:14:39.244780354 -0500
--- runit-1.3.1/src/svwaitdown.c        2005-08-24 15:14:39.015270373 -0500
--- runit-1.3.1/src/svwaitup.c  2005-08-24 15:14:39.634420831 -0500
--- runit-1.3.1/src/taia_unpack.c       1969-12-31 18:00:00.000000000 -0600
--- runit-1.3.1/src/TARGETS     2005-08-24 15:14:39.529956491 -0500
--- runit-1.3.1/src/utmpset.c   2005-08-24 15:14:39.104228214 -0500

The plan was to adapt runsv.c to put additional information into the
supervise/status file, and sv.c to display this information through
additional command line switches.  These two files only need to be
changed.

On Tue, Aug 30, 2005 at 11:06:46AM +0000, Gerrit Pape wrote:
> I would prefer not to change the current output of 'sv status'.

On Mon, Aug 29, 2005 at 08:12:39AM +0000, Gerrit Pape wrote:
> To query this information, I would prefer to extend the sv program, not
> runsvstat.  runsvctrl, runsvstat, svwaitdown, svwaitup, most probably
> will get obsolete some time, as the functionality is integrated into
> sv.

There's no point in sharing code in runsvstat & co and sv.  There's also
no point is changing the library code as contribution, which is taken
literally from the daemontools and djbdns packages, out of the public
domain.  Most of the library code is pretty well documented
 http://cr.yp.to/software.html

Regards, Gerrit.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-09-17  5:17 ` Mike Bell
@ 2005-09-19  8:35   ` Gerrit Pape
  0 siblings, 0 replies; 18+ messages in thread
From: Gerrit Pape @ 2005-09-19  8:35 UTC (permalink / raw)


On Fri, Sep 16, 2005 at 10:17:21PM -0700, Mike Bell wrote:
> Even "better" is when the indentation level suddenly decreases because
> the code block was too heavily nested. Faced with indentation levels
> that make it impossible to fit even simple statements on one line, he
> chooses to suddenly drop the indentation level rather than break-up the
> function. :)

You must be kidding, or reading different code.

> I ran runit through indent to fix it up once, and it wasn't recognizable
> any longer (nor were the over-nested sections legible).

sed -e 's/\t/        /g'

> There's nothing wrong with a block of code into a subroutine, even if
> it's only used a single time. Declare it static (or better yet static
> inline if you use C99) and the compiler has all the information it needs
> to inline the function and eliminate the overhead (which was only a
> single subroutine worth anyway, so not going to show up on any profile -
> ever - unless it's called tens of thousands of times in a loop).

So what?

Regards, Gerrit.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-09-19  8:31 ` Gerrit Pape
@ 2005-09-19 16:04   ` Charles Duffy
  2005-09-19 19:13     ` Charles Duffy
  0 siblings, 1 reply; 18+ messages in thread
From: Charles Duffy @ 2005-09-19 16:04 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 2885 bytes --]

Gerrit Pape wrote:
> Yes, I won't apply this patch.

Attached is a patch which attempts to stick to the items we agreed on.


======================================================================
Everywhere below here I'm strictly debating philosophy: You're the 
maintainer; you win these discussions by default, and I'm not about to 
start my own fork. Feel free to ignore all of it if you prefer.
======================================================================


> The plan was to adapt runsv.c to put additional information into the
> supervise/status file, and sv.c to display this information through
> additional command line switches.  These two files only need to be
> changed.

Agreed that no other files need to be changed to implement the new 
functionality. The other files needed to be changed so I could 
understand what I was doing. Backporting my completed patch to your 
coding style, on the other hand, was not particularly difficult.

> On Mon, Aug 29, 2005 at 08:12:39AM +0000, Gerrit Pape wrote:
> 
>>To query this information, I would prefer to extend the sv program, not
>>runsvstat.  runsvctrl, runsvstat, svwaitdown, svwaitup, most probably
>>will get obsolete some time, as the functionality is integrated into
>>sv.

I'm aware that you expressed that preference, and was aware of it at the 
time when I decided to go the route that I did. Indeed, I interpreted 
your preference as being related to the amount of work and/or effort 
involved in maintaining the old programs -- but if almost all of their 
code is shared with the new programs, then maintaining them involves far 
less workload than if they include their own implementations of 
everything down to PID maintenance.

> There's no point in sharing code in runsvstat & co and sv.

I disagree.

Duplicate code is always bad. It means that two different routines which 
are intended to do the same things may have different bugs which come to 
be fixed in one but not the other. It means that user expectations that 
functionality accessed one way will have similar behaviour to the same 
functionality accessed via a similar subcommand in the same package may 
be broken. It makes maintenance harder (as changes which impact said 
code need to be made in multiple places). It makes the code more verbose 
(as tasks which could be hidden away in a library and implemented only a 
single time, such as decoding PIDs, are instead implemented a number of 
different times in a number of different files).

It's just Bad Juju.

> There's also
> no point is changing the library code as contribution, which is taken
> literally from the daemontools and djbdns packages, out of the public
> domain.  Most of the library code is pretty well documented
>  http://cr.yp.to/software.html

I'm happy to keep my modified versions of code origionally in the public 
domain in the public domain as well.

[-- Attachment #2: runit-1.3.1.new.patch --]
[-- Type: text/x-patch, Size: 11267 bytes --]

diff -ru runit-1.3.1/src/runsv.c runit-1.3.1.new/src/runsv.c
--- runit-1.3.1/src/runsv.c	2005-08-24 15:14:39.799235290 -0500
+++ runit-1.3.1.new/src/runsv.c	2005-09-19 10:00:44.845810403 -0500
@@ -1,4 +1,5 @@
 #include <sys/types.h>
+#include <sys/wait.h>
 #include <sys/stat.h>
 #include <unistd.h>
 #include <stdio.h>
@@ -39,6 +40,11 @@
 #define W_UP 0
 #define W_DOWN 1
 #define W_EXIT 2
+/* exit type */
+#define ET_NOEXIT 0
+#define ET_NORMAL 1
+#define ET_SIGNAL 2
+#define ET_UNKNOWN 127
 
 struct svdir {
   int pid;
@@ -50,6 +56,10 @@
   int fdcontrol;
   int fdcontrolwrite;
   int islog;
+  unsigned char runexit_type;
+  unsigned char runexit_val;
+  unsigned char finexit_type;
+  unsigned char finexit_val;
 };
 struct svdir svd[2];
 
@@ -86,7 +96,7 @@
 void update_status(struct svdir *s) {
   unsigned long l;
   int fd;
-  char status[20];
+  char status[24];
   char bspace[64];
   buffer b;
   char spid[FMT_ULONG];
@@ -180,6 +190,10 @@
   else
     status[18] =0;
   status[19] =s->state;
+  status[20] =s->runexit_type;
+  status[21] =s->runexit_val;
+  status[22] =s->finexit_type;
+  status[23] =s->finexit_val;
   if ((fd =open_trunc("supervise/status.new")) == -1) {
     warn("unable to open supervise/status.new");
     return;
@@ -399,6 +413,10 @@
   svd[0].ctrl =C_NOOP;
   svd[0].want =W_UP;
   svd[0].islog =0;
+  svd[0].runexit_type =0;
+  svd[0].runexit_val =0;
+  svd[0].finexit_type =0;
+  svd[0].finexit_val =0;
   svd[1].pid =0;
   taia_now(&svd[0].start);
   if (stat("down", &s) != -1) svd[0].want =W_DOWN;
@@ -416,6 +434,10 @@
       svd[1].ctrl =C_NOOP;
       svd[1].want =W_UP;
       svd[1].islog =1;
+      svd[1].runexit_type =0;
+      svd[1].runexit_val =0;
+      svd[1].finexit_type =0;
+      svd[1].finexit_val =0;
       taia_now(&svd[1].start);
       if (stat("log/down", &s) != -1)
         svd[1].want =W_DOWN;
@@ -540,13 +562,39 @@
         svd[0].pid =0;
         pidchanged =1;
         svd[0].ctrl &=~C_TERM;
-        if (svd[0].state != S_FINISH)
+        if (svd[0].state != S_FINISH) {
+          if (WIFEXITED(wstat)) {
+            svd[0].runexit_type = ET_NORMAL;
+            svd[0].runexit_val = WEXITSTATUS(wstat);
+          } else if (WIFSIGNALED(wstat)) {
+            svd[0].runexit_type = ET_SIGNAL;
+            svd[0].runexit_val = WTERMSIG(wstat);
+          } else if(WIFSTOPPED(wstat)) {
+            /* do nothing */
+          } else {
+            svd[0].runexit_type = ET_UNKNOWN;
+            svd[0].runexit_val = 0;
+          }
           if ((fd =open_read("finish")) != -1) {
             close(fd);
             svd[0].state =S_FINISH;
             update_status(&svd[0]);
             continue;
           }
+        } else {
+          if (WIFEXITED(wstat)) {
+            svd[0].finexit_type = ET_NORMAL;
+            svd[0].finexit_val = WEXITSTATUS(wstat);
+          } else if (WIFSIGNALED(wstat)) {
+            svd[0].finexit_type = ET_SIGNAL;
+            svd[0].finexit_val = WTERMSIG(wstat);
+          } else if(WIFSTOPPED(wstat)) {
+            /* do nothing */
+          } else {
+            svd[0].finexit_type = ET_UNKNOWN;
+            svd[0].finexit_val = 0;
+          }
+        }
         svd[0].state =S_DOWN;
         taia_uint(&deadline, 1);
         taia_add(&deadline, &svd[0].start, &deadline);
diff -ru runit-1.3.1/src/sv.c runit-1.3.1.new/src/sv.c
--- runit-1.3.1/src/sv.c	2005-08-24 15:14:39.204445484 -0500
+++ runit-1.3.1.new/src/sv.c	2005-09-19 10:48:21.712404702 -0500
@@ -1,5 +1,6 @@
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <string.h>
 #include <unistd.h>
 #include "str.h"
 #include "strerr.h"
@@ -48,7 +49,7 @@
 int (*cbk)(char*) =0;
 
 int curdir, fd, r;
-char svstatus[20];
+char svstatus[24];
 char sulong[FMT_ULONG];
 
 void usage() {
@@ -63,6 +64,10 @@
   strerr_warn4(FATAL, m1, m2, ": ", &strerr_sys);
   done(lsb ? 151 : 100);
 }
+void fatal2x(char *m1, char *m2) {
+  strerr_warn3(FATAL, m1, m2, 0);
+  done(lsb ? 151 : 100);
+}
 void out(char *p, char *m1) {
   buffer_puts(buffer_1, p);
   buffer_puts(buffer_1, *service);
@@ -86,6 +91,124 @@
 void outs2(const char *s) { buffer_puts(buffer_2, s); }
 void flush2(const char *s) { outs2(s); buffer_flush(buffer_2); }
 
+typedef int (*flagprint_fn)();
+typedef struct statusflag {
+  char flagchar;
+  flagprint_fn printfunc;
+} statusflag_t;
+
+#define MAX_FLAGPRINTFUNCS 16
+flagprint_fn flagprintfuncs[MAX_FLAGPRINTFUNCS] = { 0, };
+int flagprintfunccount = 0;  /* number of printfuncs currently requested */
+
+static inline int getPID() {
+  int pid =(unsigned char) svstatus[15];
+  pid <<=8; pid +=(unsigned char)svstatus[14];
+  pid <<=8; pid +=(unsigned char)svstatus[13];
+  pid <<=8; pid +=(unsigned char)svstatus[12];
+  return pid;
+}
+
+static int printULong(unsigned long ulong) {
+  char sulong[FMT_ULONG];
+  memset(sulong, 0, FMT_ULONG);
+  fmt_ulong(sulong, ulong);
+  outs(sulong);
+  return 1;
+}
+
+static inline int printExitData(int exit_type, int exit_val) {
+  switch(exit_type) {
+    case 0:
+      outs("none:");
+      break;
+    case 1:
+      outs("exit:");
+      break;
+    case 2:
+      outs("signal:");
+      break;
+    default:
+      outs("unknown:");
+  }
+  printULong(exit_val);
+  return 1;
+}
+
+static int printRunExit() {
+  return printExitData(svstatus[20], svstatus[21]);
+}
+
+static int printFinExit() {
+  return printExitData(svstatus[22], svstatus[23]);
+}
+
+static int printPID() {
+  int pid = getPID(svstatus);
+  if(pid) printULong(pid);
+  else outs("-");
+  return 1;
+}
+
+static int printCurrentState() {
+  switch(svstatus[19]) {
+    case 0:
+      outs("down");
+      break;
+    case 1:
+      outs("run");
+      break;
+    case 2:
+      outs("finish");
+      break;
+    default:
+      outs("error");
+      return -1;
+  }
+  return 1;
+}
+
+static int printWantState() {
+  switch(svstatus[17]) {
+    case 'u':
+      outs("up");
+      break;
+    case 'd':
+      outs("down");
+      break;
+    case 'x':
+      outs("exit");
+      break;
+    default:
+      outs("error");
+      return -1;
+  }
+  return 1;
+}
+
+static int printUptime() {
+  struct tai svtai;		/* from structure */
+  struct tai now;		/* current time*/
+
+  tai_now(&now);
+  tai_unpack(svstatus, &svtai);
+  if (tai_less(&now,&svtai)) svtai = now;
+  tai_sub(&svtai,&now,&svtai);
+  printULong(tai_approx(&svtai));
+  return 1;
+}
+
+const statusflag_t statusflags[] = {
+  {'r', printRunExit},
+  {'f', printFinExit},
+  {'p', printPID},
+  {'s', printCurrentState},
+  {'w', printWantState},
+  {'u', printUptime},
+  {0, 0},
+};
+
+
 int svstatus_get() {
   if ((fd =open_write("supervise/ok")) == -1) {
     if (errno == error_nodevice) {
@@ -100,10 +223,12 @@
     warn("unable to open supervise/status");
     return(-1);
   }
-  r =read(fd, svstatus, 20);
+  memset(svstatus, 0, 24);
+  r =read(fd, svstatus, 24);
   close(fd);
   switch(r) {
   case 20: break;
+  case 24: break;
   case -1: warn("unable to read supervise/status"); return(-1);
   default: warnx("unable to read supervise/status: bad format"); return(-1);
   }
@@ -122,10 +247,7 @@
     }
     normallyup =1;
   }
-  pid =(unsigned char) svstatus[15];
-  pid <<=8; pid +=(unsigned char)svstatus[14];
-  pid <<=8; pid +=(unsigned char)svstatus[13];
-  pid <<=8; pid +=(unsigned char)svstatus[12];
+  pid =getPID();
   tai_unpack(svstatus, &tstatus);
   if (pid) {
     switch (svstatus[19]) {
@@ -147,23 +269,55 @@
   if (!pid && (svstatus[17] == 'u')) outs(", want up");
   if (pid && (svstatus[17] == 'd')) outs(", want down");
   if (pid && svstatus[18]) outs(", got TERM");
+
+  if ((svstatus[20] != 0) &&
+      (!(svstatus[20] == 1 && svstatus[21] == 0)) &&
+      ((pid && svstatus[19] == 2) || !pid)) {
+    buffer_puts(buffer_1, ", last run exit ");
+    if(svstatus[20] == 1)
+      buffer_puts(buffer_1, "status ");
+    else if(svstatus[20] == 2)
+      buffer_puts(buffer_1, "signal ");
+    buffer_put(buffer_1, sulong, fmt_ulong(sulong, svstatus[21]));
+  }
+  if ((svstatus[22] != 0) &&
+      (!(svstatus[22] == 1 && svstatus[23] == 0)) &&
+      ((pid && svstatus[19] == 1) || !pid)) {
+    buffer_puts(buffer_1, ", last finish exit ");
+    if(svstatus[22] == 1)
+      buffer_puts(buffer_1, "status ");
+    else if(svstatus[22] == 2)
+      buffer_puts(buffer_1, "signal ");
+    buffer_put(buffer_1, sulong, fmt_ulong(sulong, svstatus[23]));
+  }
+
   return(pid ? 1 : 2);
 }
 int status(char *unused) {
   r =svstatus_get();
   switch(r) { case -1: if (lsb) done(4); case 0: return(0); }
-  r =svstatus_print(*service);
-  if (chdir("log") == -1) {
-    if (errno != error_noent) {
-      outs("; log: "); outs(WARN);
-      outs("unable to change to log service directory: ");
-      outs(error_str(errno));
+
+  if(flagprintfunccount == 0) {
+    r =svstatus_print(*service);
+    if (chdir("log") == -1) {
+      if (errno != error_noent) {
+        outs("; log: "); outs(WARN);
+        outs("unable to change to log service directory: ");
+        outs(error_str(errno));
+      }
     }
-  }
-  else
-    if (svstatus_get()) {
-      outs("; "); svstatus_print("log");
+    else
+      if (svstatus_get()) {
+        outs("; "); svstatus_print("log");
+      }
+  } else {
+    int i;
+    outs(*service);
+    for(i = 0; i < flagprintfunccount; i++) {
+      outs(" ");
+      flagprintfuncs[i]();
     }
+  }
   flush("\n");
   if (lsb) switch(r) { case 1: done(0); case 2: done(3); case 0: done(4); }
   return(r);
@@ -319,17 +473,42 @@
 
   servicex =service;
   for (i =0; i < services; ++i) {
-    if ((**service != '/') && (**service != '.')) {
-      if ((chdir(varservice) == -1) || (chdir(*service) == -1)) {
-        fail("unable to change to service directory");
-        *service =0;
+    if ((act == &status) && (**service == '-')) {
+      char *flag = (*service)+1;
+      while(*flag != 0) {
+        /* search for a function associated with this flag */
+        int fnidx, foundFlag = 0;
+        for(fnidx = 0; statusflags[fnidx].flagchar != 0; fnidx++) {
+          if(flagprintfunccount == MAX_FLAGPRINTFUNCS) {
+            fatal("Excessive number of flag-based parameters provided");
+          }
+          if(statusflags[fnidx].flagchar == *flag) {
+            flagprintfuncs[flagprintfunccount++] = statusflags[fnidx].printfunc;
+            foundFlag = 1;
+            break;
+          }
+        }
+        if(!foundFlag) {
+          char badFlag[3] = {'-', 'X', 0};
+          badFlag[1] = *flag;
+          fatal2x("Invalid status flag provided: ", badFlag);
+        }
+        flag++;
       }
+      *service = 0;
     }
     else
-      if (chdir(*service) == -1) {
-        fail("unable to change to service directory");
-        *service =0;
+      if ((**service != '/') && (**service != '.')) {
+        if ((chdir(varservice) == -1) || (chdir(*service) == -1)) {
+          fail("unable to change to service directory");
+          *service =0;
+        }
       }
+      else
+        if (chdir(*service) == -1) {
+          fail("unable to change to service directory");
+          *service =0;
+        }
     if (*service) if (act(acts) == -1) *service =0;
     if (fchdir(curdir) == -1) fatal("unable to change to original directory");
     service++;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-09-19 16:04   ` Charles Duffy
@ 2005-09-19 19:13     ` Charles Duffy
  2005-09-21 21:50       ` new "sv status" flags and exit-tracking patch, yet again Charles Duffy
  2005-09-26 10:12       ` new "sv status" flags and exit-tracking patch, and misc Gerrit Pape
  0 siblings, 2 replies; 18+ messages in thread
From: Charles Duffy @ 2005-09-19 19:13 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 477 bytes --]

Charles Duffy wrote:
> Attached is a patch which attempts to stick to the items we agreed on.

Erk -- actually, that one was a little too overreaching as well, since 
it changed "sv status" to provide the last exit status of finish (if 
curently in run) or run (if currently in finish), if that program exited 
with a status other than 0. Since the requested spec was that the output 
of "sv status" (when called without flags) not be altered... well, 
here's another attempt.

[-- Attachment #2: runit-1.3.1.new-smaller.patch --]
[-- Type: text/x-patch, Size: 10284 bytes --]

diff -ru runit-1.3.1/src/runsv.c runit-1.3.1.new/src/runsv.c
--- runit-1.3.1/src/runsv.c	2005-08-24 15:14:39.427019666 -0500
+++ runit-1.3.1.new/src/runsv.c	2005-09-19 10:00:44.490650789 -0500
@@ -1,4 +1,5 @@
 #include <sys/types.h>
+#include <sys/wait.h>
 #include <sys/stat.h>
 #include <unistd.h>
 #include <stdio.h>
@@ -39,6 +40,11 @@
 #define W_UP 0
 #define W_DOWN 1
 #define W_EXIT 2
+/* exit type */
+#define ET_NOEXIT 0
+#define ET_NORMAL 1
+#define ET_SIGNAL 2
+#define ET_UNKNOWN 127
 
 struct svdir {
   int pid;
@@ -50,6 +56,10 @@
   int fdcontrol;
   int fdcontrolwrite;
   int islog;
+  unsigned char runexit_type;
+  unsigned char runexit_val;
+  unsigned char finexit_type;
+  unsigned char finexit_val;
 };
 struct svdir svd[2];
 
@@ -86,7 +96,7 @@
 void update_status(struct svdir *s) {
   unsigned long l;
   int fd;
-  char status[20];
+  char status[24];
   char bspace[64];
   buffer b;
   char spid[FMT_ULONG];
@@ -180,6 +190,10 @@
   else
     status[18] =0;
   status[19] =s->state;
+  status[20] =s->runexit_type;
+  status[21] =s->runexit_val;
+  status[22] =s->finexit_type;
+  status[23] =s->finexit_val;
   if ((fd =open_trunc("supervise/status.new")) == -1) {
     warn("unable to open supervise/status.new");
     return;
@@ -399,6 +413,10 @@
   svd[0].ctrl =C_NOOP;
   svd[0].want =W_UP;
   svd[0].islog =0;
+  svd[0].runexit_type =0;
+  svd[0].runexit_val =0;
+  svd[0].finexit_type =0;
+  svd[0].finexit_val =0;
   svd[1].pid =0;
   taia_now(&svd[0].start);
   if (stat("down", &s) != -1) svd[0].want =W_DOWN;
@@ -416,6 +434,10 @@
       svd[1].ctrl =C_NOOP;
       svd[1].want =W_UP;
       svd[1].islog =1;
+      svd[1].runexit_type =0;
+      svd[1].runexit_val =0;
+      svd[1].finexit_type =0;
+      svd[1].finexit_val =0;
       taia_now(&svd[1].start);
       if (stat("log/down", &s) != -1)
         svd[1].want =W_DOWN;
@@ -540,13 +562,39 @@
         svd[0].pid =0;
         pidchanged =1;
         svd[0].ctrl &=~C_TERM;
-        if (svd[0].state != S_FINISH)
+        if (svd[0].state != S_FINISH) {
+          if (WIFEXITED(wstat)) {
+            svd[0].runexit_type = ET_NORMAL;
+            svd[0].runexit_val = WEXITSTATUS(wstat);
+          } else if (WIFSIGNALED(wstat)) {
+            svd[0].runexit_type = ET_SIGNAL;
+            svd[0].runexit_val = WTERMSIG(wstat);
+          } else if(WIFSTOPPED(wstat)) {
+            /* do nothing */
+          } else {
+            svd[0].runexit_type = ET_UNKNOWN;
+            svd[0].runexit_val = 0;
+          }
           if ((fd =open_read("finish")) != -1) {
             close(fd);
             svd[0].state =S_FINISH;
             update_status(&svd[0]);
             continue;
           }
+        } else {
+          if (WIFEXITED(wstat)) {
+            svd[0].finexit_type = ET_NORMAL;
+            svd[0].finexit_val = WEXITSTATUS(wstat);
+          } else if (WIFSIGNALED(wstat)) {
+            svd[0].finexit_type = ET_SIGNAL;
+            svd[0].finexit_val = WTERMSIG(wstat);
+          } else if(WIFSTOPPED(wstat)) {
+            /* do nothing */
+          } else {
+            svd[0].finexit_type = ET_UNKNOWN;
+            svd[0].finexit_val = 0;
+          }
+        }
         svd[0].state =S_DOWN;
         taia_uint(&deadline, 1);
         taia_add(&deadline, &svd[0].start, &deadline);
diff -ru runit-1.3.1/src/sv.c runit-1.3.1.new/src/sv.c
--- runit-1.3.1/src/sv.c	2005-08-24 15:14:39.901422814 -0500
+++ runit-1.3.1.new/src/sv.c	2005-09-19 14:08:48.592763295 -0500
@@ -1,5 +1,6 @@
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <string.h>
 #include <unistd.h>
 #include "str.h"
 #include "strerr.h"
@@ -48,7 +49,7 @@
 int (*cbk)(char*) =0;
 
 int curdir, fd, r;
-char svstatus[20];
+char svstatus[24];
 char sulong[FMT_ULONG];
 
 void usage() {
@@ -63,6 +64,10 @@
   strerr_warn4(FATAL, m1, m2, ": ", &strerr_sys);
   done(lsb ? 151 : 100);
 }
+void fatal2x(char *m1, char *m2) {
+  strerr_warn3(FATAL, m1, m2, 0);
+  done(lsb ? 151 : 100);
+}
 void out(char *p, char *m1) {
   buffer_puts(buffer_1, p);
   buffer_puts(buffer_1, *service);
@@ -86,6 +91,124 @@
 void outs2(const char *s) { buffer_puts(buffer_2, s); }
 void flush2(const char *s) { outs2(s); buffer_flush(buffer_2); }
 
+typedef int (*flagprint_fn)();
+typedef struct statusflag {
+  char flagchar;
+  flagprint_fn printfunc;
+} statusflag_t;
+
+#define MAX_FLAGPRINTFUNCS 16
+flagprint_fn flagprintfuncs[MAX_FLAGPRINTFUNCS] = { 0, };
+int flagprintfunccount = 0;  /* number of printfuncs currently requested */
+
+static inline int getPID() {
+  int pid =(unsigned char) svstatus[15];
+  pid <<=8; pid +=(unsigned char)svstatus[14];
+  pid <<=8; pid +=(unsigned char)svstatus[13];
+  pid <<=8; pid +=(unsigned char)svstatus[12];
+  return pid;
+}
+
+static int printULong(unsigned long ulong) {
+  char sulong[FMT_ULONG];
+  memset(sulong, 0, FMT_ULONG);
+  fmt_ulong(sulong, ulong);
+  outs(sulong);
+  return 1;
+}
+
+static inline int printExitData(int exit_type, int exit_val) {
+  switch(exit_type) {
+    case 0:
+      outs("none:");
+      break;
+    case 1:
+      outs("exit:");
+      break;
+    case 2:
+      outs("signal:");
+      break;
+    default:
+      outs("unknown:");
+  }
+  printULong(exit_val);
+  return 1;
+}
+
+static int printRunExit() {
+  return printExitData(svstatus[20], svstatus[21]);
+}
+
+static int printFinExit() {
+  return printExitData(svstatus[22], svstatus[23]);
+}
+
+static int printPID() {
+  int pid = getPID(svstatus);
+  if(pid) printULong(pid);
+  else outs("-");
+  return 1;
+}
+
+static int printCurrentState() {
+  switch(svstatus[19]) {
+    case 0:
+      outs("down");
+      break;
+    case 1:
+      outs("run");
+      break;
+    case 2:
+      outs("finish");
+      break;
+    default:
+      outs("error");
+      return -1;
+  }
+  return 1;
+}
+
+static int printWantState() {
+  switch(svstatus[17]) {
+    case 'u':
+      outs("up");
+      break;
+    case 'd':
+      outs("down");
+      break;
+    case 'x':
+      outs("exit");
+      break;
+    default:
+      outs("error");
+      return -1;
+  }
+  return 1;
+}
+
+static int printUptime() {
+  struct tai svtai;		/* from structure */
+  struct tai now;		/* current time*/
+
+  tai_now(&now);
+  tai_unpack(svstatus, &svtai);
+  if (tai_less(&now,&svtai)) svtai = now;
+  tai_sub(&svtai,&now,&svtai);
+  printULong(tai_approx(&svtai));
+  return 1;
+}
+
+const statusflag_t statusflags[] = {
+  {'r', printRunExit},
+  {'f', printFinExit},
+  {'p', printPID},
+  {'s', printCurrentState},
+  {'w', printWantState},
+  {'u', printUptime},
+  {0, 0},
+};
+
+
 int svstatus_get() {
   if ((fd =open_write("supervise/ok")) == -1) {
     if (errno == error_nodevice) {
@@ -100,10 +223,12 @@
     warn("unable to open supervise/status");
     return(-1);
   }
-  r =read(fd, svstatus, 20);
+  memset(svstatus, 0, 24);
+  r =read(fd, svstatus, 24);
   close(fd);
   switch(r) {
   case 20: break;
+  case 24: break;
   case -1: warn("unable to read supervise/status"); return(-1);
   default: warnx("unable to read supervise/status: bad format"); return(-1);
   }
@@ -122,10 +247,7 @@
     }
     normallyup =1;
   }
-  pid =(unsigned char) svstatus[15];
-  pid <<=8; pid +=(unsigned char)svstatus[14];
-  pid <<=8; pid +=(unsigned char)svstatus[13];
-  pid <<=8; pid +=(unsigned char)svstatus[12];
+  pid =getPID();
   tai_unpack(svstatus, &tstatus);
   if (pid) {
     switch (svstatus[19]) {
@@ -152,18 +274,28 @@
 int status(char *unused) {
   r =svstatus_get();
   switch(r) { case -1: if (lsb) done(4); case 0: return(0); }
-  r =svstatus_print(*service);
-  if (chdir("log") == -1) {
-    if (errno != error_noent) {
-      outs("; log: "); outs(WARN);
-      outs("unable to change to log service directory: ");
-      outs(error_str(errno));
+
+  if(flagprintfunccount == 0) {
+    r =svstatus_print(*service);
+    if (chdir("log") == -1) {
+      if (errno != error_noent) {
+        outs("; log: "); outs(WARN);
+        outs("unable to change to log service directory: ");
+        outs(error_str(errno));
+      }
     }
-  }
-  else
-    if (svstatus_get()) {
-      outs("; "); svstatus_print("log");
+    else
+      if (svstatus_get()) {
+        outs("; "); svstatus_print("log");
+      }
+  } else {
+    int i;
+    outs(*service);
+    for(i = 0; i < flagprintfunccount; i++) {
+      outs(" ");
+      flagprintfuncs[i]();
     }
+  }
   flush("\n");
   if (lsb) switch(r) { case 1: done(0); case 2: done(3); case 0: done(4); }
   return(r);
@@ -319,17 +451,42 @@
 
   servicex =service;
   for (i =0; i < services; ++i) {
-    if ((**service != '/') && (**service != '.')) {
-      if ((chdir(varservice) == -1) || (chdir(*service) == -1)) {
-        fail("unable to change to service directory");
-        *service =0;
+    if ((act == &status) && (**service == '-')) {
+      char *flag = (*service)+1;
+      while(*flag != 0) {
+        /* search for a function associated with this flag */
+        int fnidx, foundFlag = 0;
+        for(fnidx = 0; statusflags[fnidx].flagchar != 0; fnidx++) {
+          if(flagprintfunccount == MAX_FLAGPRINTFUNCS) {
+            fatal("Excessive number of flag-based parameters provided");
+          }
+          if(statusflags[fnidx].flagchar == *flag) {
+            flagprintfuncs[flagprintfunccount++] = statusflags[fnidx].printfunc;
+            foundFlag = 1;
+            break;
+          }
+        }
+        if(!foundFlag) {
+          char badFlag[3] = {'-', 'X', 0};
+          badFlag[1] = *flag;
+          fatal2x("Invalid status flag provided: ", badFlag);
+        }
+        flag++;
       }
+      *service = 0;
     }
     else
-      if (chdir(*service) == -1) {
-        fail("unable to change to service directory");
-        *service =0;
+      if ((**service != '/') && (**service != '.')) {
+        if ((chdir(varservice) == -1) || (chdir(*service) == -1)) {
+          fail("unable to change to service directory");
+          *service =0;
+        }
       }
+      else
+        if (chdir(*service) == -1) {
+          fail("unable to change to service directory");
+          *service =0;
+        }
     if (*service) if (act(acts) == -1) *service =0;
     if (fchdir(curdir) == -1) fatal("unable to change to original directory");
     service++;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, yet again
  2005-09-19 19:13     ` Charles Duffy
@ 2005-09-21 21:50       ` Charles Duffy
  2005-09-26 10:12       ` new "sv status" flags and exit-tracking patch, and misc Gerrit Pape
  1 sibling, 0 replies; 18+ messages in thread
From: Charles Duffy @ 2005-09-21 21:50 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 150 bytes --]

This version uses negative numbers for representing cases where exit was 
on account of a signal being received, as per Gerrit's previous suggestion.

[-- Attachment #2: runit-1.3.1-newer.patch --]
[-- Type: text/x-patch, Size: 10130 bytes --]

diff -ru runit-1.3.1/src/runsv.c runit-1.3.1.new/src/runsv.c
--- runit-1.3.1/src/runsv.c	2005-08-24 15:14:39.680226345 -0500
+++ runit-1.3.1.new/src/runsv.c	2005-09-19 10:00:44.751145982 -0500
@@ -1,4 +1,5 @@
 #include <sys/types.h>
+#include <sys/wait.h>
 #include <sys/stat.h>
 #include <unistd.h>
 #include <stdio.h>
@@ -39,6 +40,11 @@
 #define W_UP 0
 #define W_DOWN 1
 #define W_EXIT 2
+/* exit type */
+#define ET_NOEXIT 0
+#define ET_NORMAL 1
+#define ET_SIGNAL 2
+#define ET_UNKNOWN 127
 
 struct svdir {
   int pid;
@@ -50,6 +56,10 @@
   int fdcontrol;
   int fdcontrolwrite;
   int islog;
+  unsigned char runexit_type;
+  unsigned char runexit_val;
+  unsigned char finexit_type;
+  unsigned char finexit_val;
 };
 struct svdir svd[2];
 
@@ -86,7 +96,7 @@
 void update_status(struct svdir *s) {
   unsigned long l;
   int fd;
-  char status[20];
+  char status[24];
   char bspace[64];
   buffer b;
   char spid[FMT_ULONG];
@@ -180,6 +190,10 @@
   else
     status[18] =0;
   status[19] =s->state;
+  status[20] =s->runexit_type;
+  status[21] =s->runexit_val;
+  status[22] =s->finexit_type;
+  status[23] =s->finexit_val;
   if ((fd =open_trunc("supervise/status.new")) == -1) {
     warn("unable to open supervise/status.new");
     return;
@@ -399,6 +413,10 @@
   svd[0].ctrl =C_NOOP;
   svd[0].want =W_UP;
   svd[0].islog =0;
+  svd[0].runexit_type =0;
+  svd[0].runexit_val =0;
+  svd[0].finexit_type =0;
+  svd[0].finexit_val =0;
   svd[1].pid =0;
   taia_now(&svd[0].start);
   if (stat("down", &s) != -1) svd[0].want =W_DOWN;
@@ -416,6 +434,10 @@
       svd[1].ctrl =C_NOOP;
       svd[1].want =W_UP;
       svd[1].islog =1;
+      svd[1].runexit_type =0;
+      svd[1].runexit_val =0;
+      svd[1].finexit_type =0;
+      svd[1].finexit_val =0;
       taia_now(&svd[1].start);
       if (stat("log/down", &s) != -1)
         svd[1].want =W_DOWN;
@@ -540,13 +562,39 @@
         svd[0].pid =0;
         pidchanged =1;
         svd[0].ctrl &=~C_TERM;
-        if (svd[0].state != S_FINISH)
+        if (svd[0].state != S_FINISH) {
+          if (WIFEXITED(wstat)) {
+            svd[0].runexit_type = ET_NORMAL;
+            svd[0].runexit_val = WEXITSTATUS(wstat);
+          } else if (WIFSIGNALED(wstat)) {
+            svd[0].runexit_type = ET_SIGNAL;
+            svd[0].runexit_val = WTERMSIG(wstat);
+          } else if(WIFSTOPPED(wstat)) {
+            /* do nothing */
+          } else {
+            svd[0].runexit_type = ET_UNKNOWN;
+            svd[0].runexit_val = 0;
+          }
           if ((fd =open_read("finish")) != -1) {
             close(fd);
             svd[0].state =S_FINISH;
             update_status(&svd[0]);
             continue;
           }
+        } else {
+          if (WIFEXITED(wstat)) {
+            svd[0].finexit_type = ET_NORMAL;
+            svd[0].finexit_val = WEXITSTATUS(wstat);
+          } else if (WIFSIGNALED(wstat)) {
+            svd[0].finexit_type = ET_SIGNAL;
+            svd[0].finexit_val = WTERMSIG(wstat);
+          } else if(WIFSTOPPED(wstat)) {
+            /* do nothing */
+          } else {
+            svd[0].finexit_type = ET_UNKNOWN;
+            svd[0].finexit_val = 0;
+          }
+        }
         svd[0].state =S_DOWN;
         taia_uint(&deadline, 1);
         taia_add(&deadline, &svd[0].start, &deadline);
diff -ru runit-1.3.1/src/sv.c runit-1.3.1.new/src/sv.c
--- runit-1.3.1/src/sv.c	2005-08-24 15:14:39.801889117 -0500
+++ runit-1.3.1.new/src/sv.c	2005-09-21 16:42:48.347484541 -0500
@@ -1,5 +1,6 @@
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <string.h>
 #include <unistd.h>
 #include "str.h"
 #include "strerr.h"
@@ -48,7 +49,7 @@
 int (*cbk)(char*) =0;
 
 int curdir, fd, r;
-char svstatus[20];
+char svstatus[24];
 char sulong[FMT_ULONG];
 
 void usage() {
@@ -63,6 +64,10 @@
   strerr_warn4(FATAL, m1, m2, ": ", &strerr_sys);
   done(lsb ? 151 : 100);
 }
+void fatal2x(char *m1, char *m2) {
+  strerr_warn3(FATAL, m1, m2, 0);
+  done(lsb ? 151 : 100);
+}
 void out(char *p, char *m1) {
   buffer_puts(buffer_1, p);
   buffer_puts(buffer_1, *service);
@@ -86,6 +91,114 @@
 void outs2(const char *s) { buffer_puts(buffer_2, s); }
 void flush2(const char *s) { outs2(s); buffer_flush(buffer_2); }
 
+typedef int (*flagprint_fn)();
+typedef struct statusflag {
+  char flagchar;
+  flagprint_fn printfunc;
+} statusflag_t;
+
+#define MAX_FLAGPRINTFUNCS 16
+flagprint_fn flagprintfuncs[MAX_FLAGPRINTFUNCS] = { 0, };
+int flagprintfunccount = 0;  /* number of printfuncs currently requested */
+
+static inline int getPID() {
+  int pid =(unsigned char) svstatus[15];
+  pid <<=8; pid +=(unsigned char)svstatus[14];
+  pid <<=8; pid +=(unsigned char)svstatus[13];
+  pid <<=8; pid +=(unsigned char)svstatus[12];
+  return pid;
+}
+
+static int printULong(unsigned long ulong) {
+  char sulong[FMT_ULONG];
+  memset(sulong, 0, FMT_ULONG);
+  fmt_ulong(sulong, ulong);
+  outs(sulong);
+  return 1;
+}
+
+static inline int printExitData(int exit_type, int exit_val) {
+  if(exit_type != 1)
+    outs("-");
+  if(exit_type != 0)
+    printULong(exit_val);
+  return 1;
+}
+
+static int printRunExit() {
+  return printExitData(svstatus[20], svstatus[21]);
+}
+
+static int printFinExit() {
+  return printExitData(svstatus[22], svstatus[23]);
+}
+
+static int printPID() {
+  int pid = getPID(svstatus);
+  if(pid) printULong(pid);
+  else outs("-");
+  return 1;
+}
+
+static int printCurrentState() {
+  switch(svstatus[19]) {
+    case 0:
+      outs("down");
+      break;
+    case 1:
+      outs("run");
+      break;
+    case 2:
+      outs("finish");
+      break;
+    default:
+      outs("error");
+      return -1;
+  }
+  return 1;
+}
+
+static int printWantState() {
+  switch(svstatus[17]) {
+    case 'u':
+      outs("up");
+      break;
+    case 'd':
+      outs("down");
+      break;
+    case 'x':
+      outs("exit");
+      break;
+    default:
+      outs("error");
+      return -1;
+  }
+  return 1;
+}
+
+static int printUptime() {
+  struct tai svtai;		/* from structure */
+  struct tai now;		/* current time*/
+
+  tai_now(&now);
+  tai_unpack(svstatus, &svtai);
+  if (tai_less(&now,&svtai)) svtai = now;
+  tai_sub(&svtai,&now,&svtai);
+  printULong(tai_approx(&svtai));
+  return 1;
+}
+
+const statusflag_t statusflags[] = {
+  {'r', printRunExit},
+  {'f', printFinExit},
+  {'p', printPID},
+  {'s', printCurrentState},
+  {'w', printWantState},
+  {'u', printUptime},
+  {0, 0},
+};
+
+
 int svstatus_get() {
   if ((fd =open_write("supervise/ok")) == -1) {
     if (errno == error_nodevice) {
@@ -100,10 +213,12 @@
     warn("unable to open supervise/status");
     return(-1);
   }
-  r =read(fd, svstatus, 20);
+  memset(svstatus, 0, 24);
+  r =read(fd, svstatus, 24);
   close(fd);
   switch(r) {
   case 20: break;
+  case 24: break;
   case -1: warn("unable to read supervise/status"); return(-1);
   default: warnx("unable to read supervise/status: bad format"); return(-1);
   }
@@ -122,10 +237,7 @@
     }
     normallyup =1;
   }
-  pid =(unsigned char) svstatus[15];
-  pid <<=8; pid +=(unsigned char)svstatus[14];
-  pid <<=8; pid +=(unsigned char)svstatus[13];
-  pid <<=8; pid +=(unsigned char)svstatus[12];
+  pid =getPID();
   tai_unpack(svstatus, &tstatus);
   if (pid) {
     switch (svstatus[19]) {
@@ -152,18 +264,28 @@
 int status(char *unused) {
   r =svstatus_get();
   switch(r) { case -1: if (lsb) done(4); case 0: return(0); }
-  r =svstatus_print(*service);
-  if (chdir("log") == -1) {
-    if (errno != error_noent) {
-      outs("; log: "); outs(WARN);
-      outs("unable to change to log service directory: ");
-      outs(error_str(errno));
+
+  if(flagprintfunccount == 0) {
+    r =svstatus_print(*service);
+    if (chdir("log") == -1) {
+      if (errno != error_noent) {
+        outs("; log: "); outs(WARN);
+        outs("unable to change to log service directory: ");
+        outs(error_str(errno));
+      }
     }
-  }
-  else
-    if (svstatus_get()) {
-      outs("; "); svstatus_print("log");
+    else
+      if (svstatus_get()) {
+        outs("; "); svstatus_print("log");
+      }
+  } else {
+    int i;
+    outs(*service);
+    for(i = 0; i < flagprintfunccount; i++) {
+      outs(" ");
+      flagprintfuncs[i]();
     }
+  }
   flush("\n");
   if (lsb) switch(r) { case 1: done(0); case 2: done(3); case 0: done(4); }
   return(r);
@@ -319,17 +441,42 @@
 
   servicex =service;
   for (i =0; i < services; ++i) {
-    if ((**service != '/') && (**service != '.')) {
-      if ((chdir(varservice) == -1) || (chdir(*service) == -1)) {
-        fail("unable to change to service directory");
-        *service =0;
+    if ((act == &status) && (**service == '-')) {
+      char *flag = (*service)+1;
+      while(*flag != 0) {
+        /* search for a function associated with this flag */
+        int fnidx, foundFlag = 0;
+        for(fnidx = 0; statusflags[fnidx].flagchar != 0; fnidx++) {
+          if(flagprintfunccount == MAX_FLAGPRINTFUNCS) {
+            fatal("Excessive number of flag-based parameters provided");
+          }
+          if(statusflags[fnidx].flagchar == *flag) {
+            flagprintfuncs[flagprintfunccount++] = statusflags[fnidx].printfunc;
+            foundFlag = 1;
+            break;
+          }
+        }
+        if(!foundFlag) {
+          char badFlag[3] = {'-', 'X', 0};
+          badFlag[1] = *flag;
+          fatal2x("Invalid status flag provided: ", badFlag);
+        }
+        flag++;
       }
+      *service = 0;
     }
     else
-      if (chdir(*service) == -1) {
-        fail("unable to change to service directory");
-        *service =0;
+      if ((**service != '/') && (**service != '.')) {
+        if ((chdir(varservice) == -1) || (chdir(*service) == -1)) {
+          fail("unable to change to service directory");
+          *service =0;
+        }
       }
+      else
+        if (chdir(*service) == -1) {
+          fail("unable to change to service directory");
+          *service =0;
+        }
     if (*service) if (act(acts) == -1) *service =0;
     if (fchdir(curdir) == -1) fatal("unable to change to original directory");
     service++;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-09-19 19:13     ` Charles Duffy
  2005-09-21 21:50       ` new "sv status" flags and exit-tracking patch, yet again Charles Duffy
@ 2005-09-26 10:12       ` Gerrit Pape
  2005-09-26 15:31         ` duplicate processes Jussi Ramo
  2005-12-08 11:08         ` new "sv status" flags and exit-tracking patch, and misc Gerrit Pape
  1 sibling, 2 replies; 18+ messages in thread
From: Gerrit Pape @ 2005-09-26 10:12 UTC (permalink / raw)


On Mon, Sep 19, 2005 at 02:13:30PM -0500, Charles Duffy wrote:
> Charles Duffy wrote:
> >Attached is a patch which attempts to stick to the items we agreed on.
> 
> Erk -- actually, that one was a little too overreaching as well, since 
> it changed "sv status" to provide the last exit status of finish (if 
> curently in run) or run (if currently in finish), if that program exited 
> with a status other than 0. Since the requested spec was that the output 
> of "sv status" (when called without flags) not be altered... well, 
> here's another attempt.

Hi Charles, thanks a lot for the patch.  This looks much better to me to
work with.  I'll try to find some time soon, to review it, and integrate
it into the next runit test version.  Thanks also for arranging with the
coding style of the runit project, it may well be that's a strange style
I use, but it's one I personally can read well and work good with.

Regards, Gerrit.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* duplicate processes
  2005-09-26 10:12       ` new "sv status" flags and exit-tracking patch, and misc Gerrit Pape
@ 2005-09-26 15:31         ` Jussi Ramo
  2005-09-26 15:42           ` Charlie Brady
  2005-12-08 11:08         ` new "sv status" flags and exit-tracking patch, and misc Gerrit Pape
  1 sibling, 1 reply; 18+ messages in thread
From: Jussi Ramo @ 2005-09-26 15:31 UTC (permalink / raw)
  Cc: supervision

Hello,

I am new user for runit (sv part) and it would be highly appreciated if 
someone could give me
a hint for troubleshooting the following problem:

Some processes are still started first directly by init even if I try to 
use runits supervision only. So duplicate process will be generated and 
system becomes unstable. Both of those processes (the one started by init 
and the one started by runsv) react on sv command.

For example below: aisexec works properly but ndb_mgmd causes problems. 
There are no
major diffrencies in run scripts but file paths or so. Both of the run 
scrips are quite simple.

root         1     0  1 14:40 ?        00:00:05 init [3]
root      1946     1  0 14:40 ?        00:00:00 runsvdir -P /var/services 
log: ...................
root      1950  1946  0 14:40 ?        00:00:00 runsv aisexec
root      1951  1946  0 14:40 ?        00:00:00 runsv ndb_mgmd
ais       1955  1950  0 14:40 ?        00:00:00 /opt/SGC/bin/aisexec
ais       1967     1  0 14:40 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
/opt/SGC/etc/ndbconfig.ini
ais       3181  1951  3 14:46 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
/opt/SGC/etc/ndbconfig.ini

The other ndb_mgmd is restarted frequently by runsv because of the same 
process is started directly by init for some reason.

I have double checked that:
-There is nothing more added in inittab but runsvdir
-There are no files under init.d that might start ndb_mgmd

Kind regards,

Jussi




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: duplicate processes
  2005-09-26 15:31         ` duplicate processes Jussi Ramo
@ 2005-09-26 15:42           ` Charlie Brady
  0 siblings, 0 replies; 18+ messages in thread
From: Charlie Brady @ 2005-09-26 15:42 UTC (permalink / raw)
  Cc: supervision


On Mon, 26 Sep 2005, Jussi Ramo wrote:

> Some processes are still started first directly by init even if I try to use 
> runits supervision only.

What evidence do you have that the processes are started directly by init? 
Remember that a process will be inherited by init if its direct parent 
dies.

> So duplicate process will be generated and system 
> becomes unstable. Both of those processes (the one started by init and the 
> one started by runsv) react on sv command.

React in what ways?

> For example below: aisexec works properly but ndb_mgmd causes problems.

Google tells me that ndb_mgmd is part of mysql. mysql is known to be badly 
designed wrt co-operating with runit/daemontools.

> There are no major diffrencies in run scripts but file paths or so. Both 
> of the run scrips are quite simple.

What are they?

> The other ndb_mgmd is restarted frequently by runsv because of the same 
> process is started directly by init for some reason.

Again, what evidence do you have that there is any process started 
directly by init? Even if so, why would runsv restart the process it is 
managing? I expect that the process runsv is monitoring is exiting, and 
that is why runsv is starting a new process.

---
Charlie


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-09-26 10:12       ` new "sv status" flags and exit-tracking patch, and misc Gerrit Pape
  2005-09-26 15:31         ` duplicate processes Jussi Ramo
@ 2005-12-08 11:08         ` Gerrit Pape
  2005-12-08 19:37           ` Charles Duffy
  1 sibling, 1 reply; 18+ messages in thread
From: Gerrit Pape @ 2005-12-08 11:08 UTC (permalink / raw)


On Mon, Sep 26, 2005 at 10:12:20AM +0000, Gerrit Pape wrote:
> On Mon, Sep 19, 2005 at 02:13:30PM -0500, Charles Duffy wrote:
> > Charles Duffy wrote:
> > >Attached is a patch which attempts to stick to the items we agreed on.
> > 
> > Erk -- actually, that one was a little too overreaching as well, since 
> > it changed "sv status" to provide the last exit status of finish (if 
> > curently in run) or run (if currently in finish), if that program exited 
> > with a status other than 0. Since the requested spec was that the output 
> > of "sv status" (when called without flags) not be altered... well, 
> > here's another attempt.
> 
> Hi Charles, thanks a lot for the patch.  This looks much better to me to
> work with.  I'll try to find some time soon, to review it, and integrate
> it into the next runit test version.

Hi, I haven't done this yet, sorry.  Somehow I'm reluctant to make these
changes to the command line option of the sv program, I'm afraid it gets
too complex.

I guess ./run and ./finish almost always will be scripts, more precisely
shell scripts.  runsv as of now maintains two files supervise/stat and
supervise/pid.  How about removing these files, and add a single one
containing the informations discussed here?:

./supervise/info:
<state> <want> <pid> <uptime> <runrc> <finishrc>

Shell script then can do

 set `cat ./supervise/info` ""
 state=$1
 want=$2
 pid=$3
 uptime=$4
 rrc=$5
 frc=$6

or so.

Regards, Gerrit.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-12-08 11:08         ` new "sv status" flags and exit-tracking patch, and misc Gerrit Pape
@ 2005-12-08 19:37           ` Charles Duffy
  2005-12-09 14:29             ` Gerrit Pape
  0 siblings, 1 reply; 18+ messages in thread
From: Charles Duffy @ 2005-12-08 19:37 UTC (permalink / raw)


Gerrit Pape wrote:
> On Mon, Sep 26, 2005 at 10:12:20AM +0000, Gerrit Pape wrote:
>>Hi Charles, thanks a lot for the patch.  This looks much better to me to
>>work with.  I'll try to find some time soon, to review it, and integrate
>>it into the next runit test version.
> 
> Hi, I haven't done this yet, sorry.  Somehow I'm reluctant to make these
> changes to the command line option of the sv program, I'm afraid it gets
> too complex.
> 
> I guess ./run and ./finish almost always will be scripts, more precisely
> shell scripts.  runsv as of now maintains two files supervise/stat and
> supervise/pid.  How about removing these files, and add a single one
> containing the informations discussed here?:
> 
> ./supervise/info:
> <state> <want> <pid> <uptime> <runrc> <finishrc>

Sounds reasonable. Are you willing to impliment this yourself, or do you 
want me to?



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-12-08 19:37           ` Charles Duffy
@ 2005-12-09 14:29             ` Gerrit Pape
  2005-12-09 14:50               ` Rafal Bisingier
  0 siblings, 1 reply; 18+ messages in thread
From: Gerrit Pape @ 2005-12-09 14:29 UTC (permalink / raw)


On Thu, Dec 08, 2005 at 01:37:51PM -0600, Charles Duffy wrote:
> Gerrit Pape wrote:
> >I guess ./run and ./finish almost always will be scripts, more precisely
> >shell scripts.  runsv as of now maintains two files supervise/stat and
> >supervise/pid.  How about removing these files, and add a single one
> >containing the informations discussed here?:
> >
> >./supervise/info:
> ><state> <want> <pid> <uptime> <runrc> <finishrc>
> 
> Sounds reasonable. Are you willing to impliment this yourself, or do you 
> want me to?

Contribution alwys is nice, and I'm rather busy these days.  So I would
be happy if you can put work into it.  Thanks!

<uptime> will not work though in a file that's only written when some
status changes, this should be <starttime> instead I guess.  And you
suggested to differentiate the run/finish return codes by 'exited
cleanly', and 'killed by signal'.  To make numeric tests (`test "$runrc"
-lt 0`) work, how about 0-255 for 'exited cleanly', and -SIGNO for
'killed by signal'?  What if ./run or ./finish did not exit yet?

Thanks, Gerrit.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: new "sv status" flags and exit-tracking patch, and misc.
  2005-12-09 14:29             ` Gerrit Pape
@ 2005-12-09 14:50               ` Rafal Bisingier
  0 siblings, 0 replies; 18+ messages in thread
From: Rafal Bisingier @ 2005-12-09 14:50 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 1599 bytes --]

On Fri, Dec 09, 2005 at 03:29:36PM +0100, Gerrit Pape wrote:
> On Thu, Dec 08, 2005 at 01:37:51PM -0600, Charles Duffy wrote:
> > Gerrit Pape wrote:
> > >I guess ./run and ./finish almost always will be scripts, more precisely
> > >shell scripts.  runsv as of now maintains two files supervise/stat and
> > >supervise/pid.  How about removing these files, and add a single one
> > >containing the informations discussed here?:
> > >
> > >./supervise/info:
> > ><state> <want> <pid> <uptime> <runrc> <finishrc>
> > 
> > Sounds reasonable. Are you willing to impliment this yourself, or do you 
> > want me to?
> 
> Contribution alwys is nice, and I'm rather busy these days.  So I would
> be happy if you can put work into it.  Thanks!
> 
> <uptime> will not work though in a file that's only written when some
> status changes, this should be <starttime> instead I guess.  And you
> suggested to differentiate the run/finish return codes by 'exited
> cleanly', and 'killed by signal'.  To make numeric tests (`test "$runrc"
> -lt 0`) work, how about 0-255 for 'exited cleanly', and -SIGNO for
> 'killed by signal'?  What if ./run or ./finish did not exit yet?

If <state> will be different for ./run and ./finish time, then one can
simply check:
- if <state> is run, then runrc is for the previous execution of ./run
  and analogical for finish
- if runrc/finishrc is '-' then the appropriate script didn't exit yet

That's why <state> should differentiate executing ./run from executing
./finish, and runrc/finishrc fields shouldn't be empty

-- 
Rafal Bisingier

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: duplicate processes
  2005-09-27 18:25 ` Charlie Brady
@ 2005-09-28  8:25   ` Jussi Ramo
  0 siblings, 0 replies; 18+ messages in thread
From: Jussi Ramo @ 2005-09-28  8:25 UTC (permalink / raw)
  Cc: supervision

Thanks Charlie!

Charlie Brady wrote:
> 
> No, I didn't mean to suggest that runsv dies. I expect that ndb_mgmd has 
> forked, and the parent died. Perhaps it was designed to do that, to 
> "daemonise" the child. You will have to find some way to prevent that 
> from happening.
> 

That was it!! There is an option --nodeamon, and when using that 
everything works fine. (same goes for snmpdm -d)

 >mysql is known to be badly designed wrt co-operating with runit/daemontools.

Is it just because of this daemonizing issue, or is there other reasons 
not to use runit with mysql? If not, I could send those runs scripts to 
run script collection page when they are mature. I am using SUSE 9.1 and SLES.

Regards,
Jussi





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: duplicate processes
  2005-09-27  8:36 duplicate processes Jussi Ramo
@ 2005-09-27 18:25 ` Charlie Brady
  2005-09-28  8:25   ` Jussi Ramo
  0 siblings, 1 reply; 18+ messages in thread
From: Charlie Brady @ 2005-09-27 18:25 UTC (permalink / raw)
  Cc: supervision


On Tue, 27 Sep 2005, Jussi Ramo wrote:

>> What evidence do you have that the processes are started directly by 
> init? Remember that a process will be inherited by init if its direct 
>> parent dies.
>
> No evidence. Just looked at the parent process.

I'm pretty sure that's misleading.

> So you suggest that the "runsv ndb_mgmd" dies and the ndb_mgmd is 
> inherited by init. Then "runsv ndb_mgmd" is respawned by runsvdir (?) 
> and that starts another ndb_mgmd. This makes sense to me but now the 
> question is why runsv first dies once for certain processes.

No, I didn't mean to suggest that runsv dies. I expect that ndb_mgmd has 
forked, and the parent died. Perhaps it was designed to do that, to 
"daemonise" the child. You will have to find some way to prevent that from 
happening.

>>> So duplicate process will be generated and system becomes unstable. 
> Both of those processes (the one started by init and the one started by 
> runsv) react on sv command.
>> React in what ways?
>
> Do not know if this brings any extra information but if I have first the 
> following ndb_mgmd processes (one of them badly as child of init) :
>
> root      1950  1945  0 07:04 ?        00:00:00 runsv ndb_mgmd
> ais       1963     1  0 07:04 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
> /opt/SGC/etc/ndbconfig.ini
> ais       2276  1950  2 07:05 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
> /opt/SGC/etc/ndbconfig.ini
>
> Then I do like:
>
> blade_0_7:~ # /opt/SGC/bin/sv down /var/services/ndb_mgmd/
>
> and the other "right" ndb_mgmd disappears:
>
> root      1950  1945  0 07:04 ?        00:00:00 runsv ndb_mgmd
> ais       1963     1  0 07:04 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
> /opt/SGC/etc/ndbconfig.ini

OK.

> I then kill the "wrong" ndb_mgmd
>
> blade_0_7:~ # kill -9 1963
> root      1950  1945  0 07:04 ?        00:00:00 runsv ndb_mgmd

OK, but you shouldn't be using -9 unless there is no alternative.

> And when ndb_mgmd is put "up" there are again those two processes:
>
> blade_0_7:~ # /opt/SGC/bin/sv up /var/services/ndb_mgmd/
>
> root      1950  1945  0 07:04 ?        00:00:00 runsv ndb_mgmd
> ais       2805     1  0 07:08 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
> /opt/SGC/etc/ndbconfig.ini
> ais       2837  1950  4 07:08 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
> /opt/SGC/etc/ndbconfig.ini

OK. Assuming that you have pids allocated sequentially, look how many 
intervening processes there are between the two invocations.

>>> The other ndb_mgmd is restarted frequently by runsv because of the 
> same process is started directly by init for some reason.
>> Again, what evidence do you have that there is any process started 
> directly by init? Even if so, why would runsv restart the process it is 
> managing? I expect that the process runsv is monitoring is exiting, and that 
> is why runsv is starting a new process.
>
> Right. The process runsv is monitoring is exiting because the port it tries 
> to use is reserved by the extra process (now hopefully correct phrasing:) 
> whose parent is init.

Not quite. The process runsv is monitoring exits when told to by runsv. 
You showed us that. The other ndb_mgmd process however doesn't exit, and 
runsv will therefore not be able to start a new process which is able to 
keep running.

Does ndb_mgmd create a pid file? If so, what is its content?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: duplicate processes
@ 2005-09-27  8:36 Jussi Ramo
  2005-09-27 18:25 ` Charlie Brady
  0 siblings, 1 reply; 18+ messages in thread
From: Jussi Ramo @ 2005-09-27  8:36 UTC (permalink / raw)
  Cc: supervision


 >What evidence do you have that the processes are started directly by 
init? Remember that a process will be inherited by init if its direct 
 >parent dies.

No evidence. Just looked at the parent process. So you suggest that the 
"runsv ndb_mgmd" dies and the ndb_mgmd is inherited by init. Then "runsv 
ndb_mgmd" is respawned by runsvdir (?) and that starts another ndb_mgmd. 
This makes sense to me but now the question is why runsv first dies once 
for certain processes.




 >> So duplicate process will be generated and system becomes unstable. 
Both of those processes (the one started by init and the one started by 
runsv) react on sv command.
 >React in what ways?

Do not know if this brings any extra information but if I have first the 
following ndb_mgmd processes (one of them badly as child of init) :

root      1950  1945  0 07:04 ?        00:00:00 runsv ndb_mgmd
ais       1963     1  0 07:04 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
/opt/SGC/etc/ndbconfig.ini
ais       2276  1950  2 07:05 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
/opt/SGC/etc/ndbconfig.ini

Then I do like:

blade_0_7:~ # /opt/SGC/bin/sv down /var/services/ndb_mgmd/

and the other "right" ndb_mgmd disappears:

root      1950  1945  0 07:04 ?        00:00:00 runsv ndb_mgmd
ais       1963     1  0 07:04 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
/opt/SGC/etc/ndbconfig.ini

I then kill the "wrong" ndb_mgmd

blade_0_7:~ # kill -9 1963
root      1950  1945  0 07:04 ?        00:00:00 runsv ndb_mgmd

And when ndb_mgmd is put "up" there are again those two processes:

blade_0_7:~ # /opt/SGC/bin/sv up /var/services/ndb_mgmd/

root      1950  1945  0 07:04 ?        00:00:00 runsv ndb_mgmd
ais       2805     1  0 07:08 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
/opt/SGC/etc/ndbconfig.ini
ais       2837  1950  4 07:08 ?        00:00:00 /opt/SGC/bin/ndb_mgmd -f 
/opt/SGC/etc/ndbconfig.ini


 >> For example below: aisexec works properly but ndb_mgmd causes problems.
 >Google tells me that ndb_mgmd is part of mysql. mysql is known to be 
badly designed wrt co-operating with runit/daemontools.

This is a good piece of information. Other 3pp product that causes similar 
troubles as mysql to me is EMANATE snmp agent, when using with runit.


 >> There are no major diffrencies in run scripts but file paths or so. 
Both of the run scrips are quite simple.
 >What are they?

#!/bin/sh
# Source environment settings
. /opt/SGC/etc/sgcenv
exec >> $LOG_FILE 2>&1
aisexec_BIN=/opt/SGC/bin/aisexec
exec $aisexec_BIN


#!/bin/sh
# Source environment settings
. /opt/SGC/etc/sgcenv
exec >> $LOG_FILE 2>&1
ndb_mgmd_BIN=/opt/SGC/bin/ndb_mgmd
exec $ndb_mgmd_BIN -f /opt/SGC/etc/ndbconfig.ini



 >> The other ndb_mgmd is restarted frequently by runsv because of the 
same process is started directly by init for some reason.
 >Again, what evidence do you have that there is any process started 
directly by init? Even if so, why would runsv restart the process it is 
managing? I expect that the process runsv is monitoring is exiting, and 
that is why runsv is starting a new process.

Right. The process runsv is monitoring is exiting because the port it 
tries to use is reserved by the extra process (now hopefully correct 
phrasing:) whose parent is init.

Thanks and regards,
Jussi

P.S. I now tried to create a new message for a mailing thread, first time 
I just replied on an existing thread changing the subject. Sorry for that!









^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2005-12-09 14:50 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-17  4:46 new "sv status" flags and exit-tracking patch, and misc Charles Duffy
2005-09-17  5:17 ` Mike Bell
2005-09-19  8:35   ` Gerrit Pape
2005-09-17  8:53 ` Laurent Bercot
2005-09-19  8:31 ` Gerrit Pape
2005-09-19 16:04   ` Charles Duffy
2005-09-19 19:13     ` Charles Duffy
2005-09-21 21:50       ` new "sv status" flags and exit-tracking patch, yet again Charles Duffy
2005-09-26 10:12       ` new "sv status" flags and exit-tracking patch, and misc Gerrit Pape
2005-09-26 15:31         ` duplicate processes Jussi Ramo
2005-09-26 15:42           ` Charlie Brady
2005-12-08 11:08         ` new "sv status" flags and exit-tracking patch, and misc Gerrit Pape
2005-12-08 19:37           ` Charles Duffy
2005-12-09 14:29             ` Gerrit Pape
2005-12-09 14:50               ` Rafal Bisingier
2005-09-27  8:36 duplicate processes Jussi Ramo
2005-09-27 18:25 ` Charlie Brady
2005-09-28  8:25   ` Jussi Ramo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).