* graceful restart under runit
From: Dražen Kačar @ 2006-11-15 11:47 UTC

Say I have a TCP server which listens on incoming connections on some TCP
port. Occasionally I'd like to install and run a new version of the server
executable. Server source is under my control, for all intents and
purposes.

Normally I'd use SIGUSR1 to make the server close the socket on which it
listens, finish processing current client sessions (depending on the
protocol, that might take seconds, minutes or hours) and exit. Right after
sending SIGUSR1 I'd start the new server version, which would just work for
all new client connections.

If the server is managed by runit, things get complicated because runit
won't start the new server until the old one exits, so I either have to
abort existing client connections or suffer some time without service.

Is there a way to get around this?

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr
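The SIGUSR1 behaviour described above (stop listening, keep serving the
sessions that are already open) can be sketched roughly as follows. This is
only an illustration in Python, not the poster's server: the class name
DrainingListener and its methods are hypothetical, and a real server would
also track its open sessions and exit once they finish.

```python
import os
import signal
import socket


class DrainingListener:
    """Hypothetical helper: a listening socket that can be told to drain."""

    def __init__(self, host="127.0.0.1", port=0):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.sock.bind((host, port))
        self.sock.listen(16)
        self.address = self.sock.getsockname()
        self.draining = False

    def install_handler(self, signum=signal.SIGUSR1):
        # The handler only sets a flag; the main loop does the real work.
        signal.signal(signum, lambda _sig, _frame: self.request_drain())

    def request_drain(self):
        self.draining = True

    def poll(self):
        """Call from the main loop; closes the listener once draining.

        Returns the listening socket, or None after it has been closed.
        Sessions that were accepted earlier are not touched.
        """
        if self.draining and self.sock is not None:
            self.sock.close()
            self.sock = None
        return self.sock
```

Once poll() has closed the listener, the port is free for a new server
process, while the old process keeps its accepted sessions open.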
* Re: graceful restart under runit
From: Alex Efros @ 2006-11-15 16:08 UTC
Cc: Dražen Kačar

Hi!

On Wed, Nov 15, 2006 at 12:47:54PM +0100, Dražen Kačar wrote:
> Say I have a TCP server which listens on incoming connections on some TCP
> port. Occasionally I'd like to install and run a new version of the server
> executable. Server source is under my control, for all intents and
> purposes.
[...]
> Is there a way to get around this?

Probably you can just fork() after receiving SIGUSR1 and exit from the
parent, leaving the child to process existing connections.

-- 
			WBR, Alex.
* Re: graceful restart under runit
From: Dražen Kačar @ 2006-11-16 15:24 UTC

Alex Efros wrote:
> On Wed, Nov 15, 2006 at 12:47:54PM +0100, Dražen Kačar wrote:
> > Say I have a TCP server which listens on incoming connections on some TCP
> > port. Occasionally I'd like to install and run a new version of the server
> > executable. Server source is under my control, for all intents and
> > purposes.
> [...]
> > Is there a way to get around this?
>
> Probably you can just fork() after receiving SIGUSR1 and exit from the
> parent, leaving the child to process existing connections.

Servers which use a process per connection do something like that already
(the parent process signals the children, exits and leaves them to finish
sessions and then they exit too).

However, there are multithreaded monsters which can't do that. fork()
replicates just the calling thread[1], so it's not an option and exit()
will terminate all threads (i.e. all sessions).

[1] It's possible to replicate all threads on Solaris, but that's too
    unportable for my purposes. Besides, calling fork() from an MT
    process usually uncovers bugs in various libraries which aren't
    prepared to deal with that.

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr
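The point that fork() carries over only the calling thread can be observed
with a small experiment. The sketch below is an illustration in Python on a
POSIX system, not code from the thread; the function name is hypothetical,
and the number it reports is Python's own thread bookkeeping in the child,
which reflects the fact that the worker threads do not exist there.

```python
import os
import threading


def threads_in_child_after_fork(extra_threads=3):
    """Fork from a multithreaded process; return the child's thread count."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait) for _ in range(extra_threads)]
    for worker in workers:
        worker.start()

    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() was replicated, so the
        # workers (and any sessions they were serving) are gone here.
        os.write(write_end, str(threading.active_count()).encode())
        os._exit(0)

    # Parent: read the child's report, then clean up the workers.
    os.close(write_end)
    count = int(os.read(read_end, 16))
    os.close(read_end)
    os.waitpid(pid, 0)
    stop.set()
    for worker in workers:
        worker.join()
    return count
```

Recent Python versions may print a DeprecationWarning when fork() is called
with other threads running, which is in line with the footnote above about
libraries not being prepared for it.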
* Re: graceful restart under runit
From: Alex Efros @ 2006-11-17 0:15 UTC
Cc: Dražen Kačar

Hi!

On Thu, Nov 16, 2006 at 04:24:46PM +0100, Dražen Kačar wrote:
> However, there are multithreaded monsters which can't do that. fork()

:-/

Another option: you can ask runsv to 'x' (Exit) instead of 't' (Term).
In this case runsv will send SIGTERM to your process, which can handle it
by just closing the listening socket, waiting until existing connections
finish and then exiting.
After a few (up to 5) seconds runsv will be started again by runsvdir, and
so will start a second process of that server (which will open the
listening socket again).

You can probably even convert 't' to 'x' using the file ./control/t, so
that you can use 't' instead of 'x' to restart this service just as for
any other service.

P.S. Of course, the better solution is not to develop multithreaded
monsters :) or to split that monster into two processes: one for accepting
connections and a second for processing these connections. (That
architecture also scales much better, because you can run multiple "second"
processes, each multithreaded and processing many connections; this is
proven to perform better than a single multithreaded process or many
single-threaded processes.)

-- 
			WBR, Alex.
* Re: graceful restart under runit
From: Paul Jarc @ 2006-11-17 0:48 UTC
Cc: Dražen Kačar

Alex Efros <powerman@powerman.asdfGroup.com> wrote:
> Another option: you can ask runsv to 'x' (Exit) instead of 't' (Term).
> In this case runsv will send SIGTERM to your process, which can handle it
> by just closing the listening socket, waiting until existing connections
> finish and then exiting.
> After a few (up to 5) seconds runsv will be started again by runsvdir, and
> so will start a second process of that server (which will open the
> listening socket again).

This seems worse than t. In either case, new connections are refused
while the old process cleans up its current connections, but with x,
new connections are also refused for up to 5 seconds more.


paul
* Re: graceful restart under runit
From: Alex Efros @ 2006-11-17 13:34 UTC

Hi!

On Thu, Nov 16, 2006 at 07:48:55PM -0500, Paul Jarc wrote:
> > Another option: you can ask runsv to 'x' (Exit) instead of 't' (Term).
> > In this case runsv will send SIGTERM to your process, which can handle it
> > by just closing the listening socket, waiting until existing connections
> > finish and then exiting.
> > After a few (up to 5) seconds runsv will be started again by runsvdir, and
> > so will start a second process of that server (which will open the
> > listening socket again).
> This seems worse than t. In either case, new connections are refused
> while the old process cleans up its current connections, but with x,
> new connections are also refused for up to 5 seconds more.

If the old server continues accepting new connections for 5 seconds after
receiving SIGTERM, this solves the 'connection refused' issue. (If the new
server is started after 1 second, for example, then for the next 4 seconds
both servers will have an open listening socket, and some connections will
be accepted by the first server and some by the second, AFAIK. I don't see
anything really wrong with this.)

-- 
			WBR, Alex.
* Re: graceful restart under runit
From: Charlie Brady @ 2006-11-17 14:53 UTC
Cc: supervision

On Fri, 17 Nov 2006, Alex Efros wrote:

> On Thu, Nov 16, 2006 at 07:48:55PM -0500, Paul Jarc wrote:
>>> Another option: you can ask runsv to 'x' (Exit) instead of 't' (Term).
>>> In this case runsv will send SIGTERM to your process, which can handle it
>>> by just closing the listening socket, waiting until existing connections
>>> finish and then exiting.
>>> After a few (up to 5) seconds runsv will be started again by runsvdir, and
>>> so will start a second process of that server (which will open the
>>> listening socket again).
>> This seems worse than t. In either case, new connections are refused
>> while the old process cleans up its current connections, but with x,
>> new connections are also refused for up to 5 seconds more.
>
> If the old server continues accepting new connections for 5 seconds after
> receiving SIGTERM, this solves the 'connection refused' issue. (If the new
> server is started after 1 second, for example, then for the next 4 seconds
> both servers will have an open listening socket, and some connections will
> be accepted by the first server and some by the second, AFAIK. I don't see
> anything really wrong with this.)

The new server will get an "Address in use" error when it attempts to open
the socket, if it is still in use by the old server. It will likely then
die, and you will have to wait again for runsv to start a new one. You
will still have a period of time when connections will not be accepted.

Gerrit, tcpsvd man page doesn't mention how tcpsvd responds to signals,
but I would guess it doesn't go into the background and die without
terminating its children, in response to SIGUSR1. Would you consider
adding that behaviour?
* Re: graceful restart under runit
From: Gerrit Pape @ 2006-11-17 15:39 UTC

On Fri, Nov 17, 2006 at 09:53:28AM -0500, Charlie Brady wrote:
> Gerrit, tcpsvd man page doesn't mention how tcpsvd responds to signals,
> but I would guess it doesn't go into the background and die without
> terminating its children, in response to SIGUSR1. Would you consider
> adding that behaviour?

tcpsvd forks for each connection. If it receives a TERM (or USR1) signal,
it terminates and leaves its children, which handle the existing
connections, running.

Regards, Gerrit.
* Re: graceful restart under runit
From: Alex Efros @ 2006-11-18 0:22 UTC

Hi!

On Fri, Nov 17, 2006 at 09:53:28AM -0500, Charlie Brady wrote:
> The new server will get an "Address in use" error when it attempts to open
> the socket, if it is still in use by the old server. It will likely then
> die, and you will have to wait again for runsv to start a new one. You
> will still have a period of time when connections will not be accepted.

All servers usually use setsockopt(SO_REUSEADDR) to work around this.

-- 
			WBR, Alex.
* Re: graceful restart under runit
From: Charlie Brady @ 2006-11-18 1:34 UTC
Cc: supervision

On Sat, 18 Nov 2006, Alex Efros wrote:

> On Fri, Nov 17, 2006 at 09:53:28AM -0500, Charlie Brady wrote:
>> The new server will get an "Address in use" error when it attempts to open
>> the socket, if it is still in use by the old server. It will likely then
>> die, and you will have to wait again for runsv to start a new one. You
>> will still have a period of time when connections will not be accepted.
>
> All servers usually use setsockopt(SO_REUSEADDR) to work around this.

Not as I understand it. SO_REUSEADDR will allow the socket to be reused
when in TIME_WAIT state. It won't allow multiple processes to bind to the
socket and listen to connections.

-bash-3.00$ tcpsvd localhost 5000 echo foo &
[1] 7520
-bash-3.00$ tcpsvd localhost 5000 echo foo
tcpsvd: fatal: unable to bind socket: address already used
-bash-3.00$
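The same experiment as the tcpsvd transcript above can be repeated without
tcpsvd. The following Python sketch (the function name is hypothetical, and
the result is what I would expect on Linux and similar systems; other
platforms, Windows in particular, treat SO_REUSEADDR differently) sets
SO_REUSEADDR on two sockets and tries to bind both to the same address
while the first one is listening.

```python
import errno
import socket


def second_bind_errno(host="127.0.0.1"):
    """Bind two SO_REUSEADDR sockets to one address; return the 2nd errno."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    first.bind((host, 0))          # port 0: let the kernel pick a free port
    first.listen(1)
    port = first.getsockname()[1]

    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        second.bind((host, port))  # same address while `first` is listening
        return None                # bind unexpectedly succeeded
    except OSError as exc:
        return exc.errno
    finally:
        second.close()
        first.close()


if second_bind_errno() == errno.EADDRINUSE:
    print("second bind refused: address already in use")
```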
* Re: graceful restart under runit
From: Alex Efros @ 2006-11-18 12:31 UTC

Hi!

On Fri, Nov 17, 2006 at 08:34:18PM -0500, Charlie Brady wrote:
> Not as I understand it. SO_REUSEADDR will allow the socket to be reused
> when in TIME_WAIT state. It won't allow multiple processes to bind to the
> socket and listen to connections.

Yep, looks like you're right. Looks like I confused this case with the
case where a process with a listening socket forks, which results in two
listening sockets for the same ip/port in two different processes.

-- 
			WBR, Alex.
* Re: graceful restart under runit
From: Paul Jarc @ 2006-11-18 19:30 UTC

Alex Efros <powerman@powerman.asdfGroup.com> wrote:
> Yep, looks like you're right. Looks like I confused this case with the
> case where a process with a listening socket forks, which results in two
> listening sockets for the same ip/port in two different processes.

Now that you mention this, I think there is a way to hand off to a new
server process with no unavailability at all. But if you can tolerate
a small outage during the switchover, you may be better off with the
simpler method of sending a signal to make the old process close its
listening socket, then forgetting the old process using Gerrit's patch
and starting a new one, which will just open a listening socket as
usual.

So, for zero-unavailability: there is a metaserver which listens on a
filesystem socket and takes care of opening other listening sockets
for other servers. Instead of opening a listening socket directly,
another server would connect to this filesystem socket and ask the
metaserver to open it. If there is not yet any socket open for the
requested address, the metaserver opens one, and passes the descriptor
to the requestor over the filesystem socket connection with sendmsg().
(The metaserver also keeps the listening socket open for itself, but
never calls accept() on it.) Both sides keep the filesystem socket
connection open for as long as the requestor is accepting connections.

If the metaserver receives a request for a socket that is already
open, it notifies the previous requestor, which still holds the
listening socket, over the filesystem connection. The old process can
then close its listening socket (new connections will only be delayed
at this point, not refused, since the metaserver still has the
listening socket open as well), close the filesystem connection,
finish servicing its current connections, and exit. Once the
metaserver sees that the old requestor's filesystem connection has
been closed, it sends the listening socket to the new requestor.

While a server is handling connections, it would have to use
select()/poll() to notice activity on either the listening socket or
the filesystem connection; it couldn't just block on accept() in a
loop. (Well, it could, but that would mean that when a switchover
starts, it wouldn't be completed until the next client connected.)

If the requestor exits, and no other requestors are around to pass the
listening socket to, the metaserver could close it immediately, or
could keep it open for a few seconds to see if a new requestor shows
up. So quick, non-overlapping restarts would be transparent to the
end clients.

To trigger the switchover, you wouldn't need any signals - just make
runsv forget about the old process using Gerrit's patch. When the new
process starts up and connects to the filesystem socket, that will
trigger everything else.


paul
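The descriptor passing that this scheme relies on (sendmsg() with an
SCM_RIGHTS control message over a Unix domain socket) looks roughly like
this in Python. The two helpers follow the pattern shown in the Python
socket module documentation; they are a sketch of the mechanism only, not
of the metaserver protocol, and everything around them (who asks for what,
when leases are revoked) is left out.

```python
import array
import socket


def send_fds(sock, msg, fds):
    """Send `msg` plus the open descriptors in `fds` over a Unix socket."""
    return sock.sendmsg(
        [msg],
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))],
    )


def recv_fds(sock, msglen, maxfds):
    """Receive up to `maxfds` descriptors; return (message, [fds])."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(
        msglen, socket.CMSG_LEN(maxfds * fds.itemsize)
    )
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore a truncated trailing descriptor, if any.
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    return msg, list(fds)


def adopt_listener(fd):
    """Wrap a received descriptor in a socket object (family auto-detected)."""
    return socket.socket(fileno=fd)
```

The receiver gets its own descriptor for the same listening socket, so the
sender can close its copy afterwards without the port ever being closed;
connections that arrive in between simply wait in the backlog.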
* Re: graceful restart under runit
From: Dražen Kačar @ 2006-11-20 18:27 UTC

Paul Jarc wrote:

> So, for zero-unavailability: there is a metaserver which listens on a

And then it's just a small matter of implementing the metaserver? :-)

It seems a bit complex to me. You'd have to implement a protocol for
starting a new metaserver version (which boils down to passing all
those file descriptors to the new metaserver).

Then you'd need to implement something to take care of metaserver crashes.
Probably a way for servers to pass listening sockets back to the
new metaserver.

Then servers would need a way to wait a bit if they want to restart while
the metaserver is being restarted.

Maybe a few more things as well. I suppose it's doable, but it seems like
a can of worms and races.

> While a server is handling connections, it would have to use
> select()/poll() to notice activity on either the listening socket or
> the filesystem connection;

And that isn't very nice.

> If the requestor exits, and no other requestors are around to pass the
> listening socket to, the metaserver could close it immediately, or
> could keep it open for a few seconds to see if a new requestor shows
> up. So quick, non-overlapping restarts would be transparent to the
> end clients.

How is it supposed to know that the requestor exited?

> To trigger the switchover, you wouldn't need any signals - just make
> runsv forget about the old process using Gerrit's patch. When the new
> process starts up and connects to the filesystem socket, that will
> trigger everything else.

I meant to implement (when the time comes) something simpler. Either a
FIFO or a Unix domain socket[1] is used as a communications channel for
passing the listening socket, but without additional daemons.

The new server starts, acquires all resources necessary to run except the
listening socket and the PID file, then tries to connect to the file
system channel. If there's no writer, it binds to the network socket,
writes the PID file, becomes the writer on the file system channel and
starts doing its job.

If there is a writer, it's supposed to be an already running server
instance. Then the new server reads the PID file, signals the running
instance and blocks in read on the file system channel. The running
instance receives the signal, passes the listening socket, performs
whatever cleanup needs to be done[2] and then either exits or waits for
the current sessions to finish and then exits. The new server reads the
file descriptor, becomes the writer on the file system channel, writes
the new PID file and starts doing its job.

[1] FIFOs are nasty because O_RDONLY opens block if there are no writers
    and O_WRONLY opens block if there are no readers. O_NONBLOCK allows a
    reader to attach without the writer, but it doesn't allow a writer to
    attach when there is no reader. I'm not sure if the required
    mumbo-jumbo can be portably done with FIFOs (also, some OSs have bugs
    in this area IIRC). But if a FIFO isn't good enough, a Unix domain
    socket should suffice.

[2] At least it needs to close the listening socket and the writing part
    of the file system channel to enable the new server to become a
    writer there.

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr
* Re: graceful restart under runit
From: Paul Jarc @ 2006-11-20 19:32 UTC
Cc: supervision

Dražen Kačar <dave@fly.srk.fer.hr> wrote:
> And then it's just a small matter of implementing the metaserver? :-)

Right. :)

> It seems a bit complex to me. You'd have to implement a protocol for
> starting a new metaserver version (which boils down to passing all
> those file descriptors to the new metaserver)

Well, it depends how much downtime you can tolerate. Restarting the
metaserver would probably be pretty infrequent - less frequent than
restarting the servers that use it - so you might accept some downtime
in that event for the sake of making the metaserver simpler.

But actually, it could be fairly simple if you're willing to restart
all other services when you restart the metaserver. A new instance of
the metaserver could request listening sockets from the old one using
the same method that other servers use. First it would connect to the
old metaserver through the filesystem socket, then listen on a new
filesystem socket, and rename() that to atomically replace the old
one. Then, since the old metaserver has passed the listening sockets
to a new process, it will revoke its leases to the old servers. They
will all exit and be automatically restarted, re-requesting their
sockets from the new metaserver. But the listening sockets will never
be completely closed through all this, so connections will not be
rejected. The only new kind of conversation needed over the
filesystem connection is for the new metaserver to ask for all open
connections, instead of individual connections that may or may not
already be open.

There is a race condition here if two new metaservers start at the
same time when there is no old metaserver already running, but it only
results in an extra process hanging around doing nothing, which isn't
harmful.

> Then you'd need to implement something to take care of metaserver crashes.
> Probably a way for servers to pass listening sockets back to the
> new metaserver.

I think that's beyond the point of diminishing returns. The problem
can never be completely solved, since the metaserver and other servers
could crash at the same time, or you could lose power, etc. You have
to give up at some point.

> Then servers would need a way to wait a bit if they want to restart while
> the metaserver is being restarted.

They could just exit and let supervise/runsv restart them.

>> While a server is handling connections, it would have to use
>> select()/poll() to notice activity on either the listening socket or
>> the filesystem connection;
>
> And that isn't very nice.

Well, I'd probably do that anyway, if I wanted to handle signals,
since I'd use the self-pipe technique to notice when signals arrived.

>> If the requestor exits, and no other requestors are around to pass the
>> listening socket to, the metaserver could close it immediately
>
> How is it supposed to know that the requestor exited?

The connection over the filesystem socket would be closed. That could
happen without the server exiting, but the metaserver can treat both
cases the same way. If a server closes the filesystem connection and
still expects to accept new connections, it's misbehaving.

> I meant to implement (when the time comes) something simpler. Either a
> FIFO or a Unix domain socket[1] is used as a communications channel for
> passing the listening socket, but without additional daemons.

That was my first thought too, but I couldn't come up with any
satisfying way to handle the race conditions gracefully.

Open file descriptors can only be passed over sockets, not pipes.
Also, using a socket means you have two-way communication, so you
don't need signals or PID files, which are subject to race conditions.
But without signals, you'll still have to use select()/poll() even
with all the functionality contained in one program, or else when you
start a new server to replace the old one, the old one will wait
indefinitely for one more client connection before waking up and
noticing that it should hand the listening sockets over to the new
server.

One problem with filesystem sockets is that you have to unlink the
socket before listening on it, so the operation "listen on this
socket, which may or may not already exist" isn't atomic. If two
processes start at the same time, one of them can delete the other's
socket without knowing that anything was listening on it. So it may
be useful to atomically acquire some other dummy resource as a
mutual-exclusion checkpoint before listening on the filesystem socket.

Another benefit of making the metaserver a separate program: you can
also write a library for LD_PRELOAD that masks the listen() function
to make existing programs use the metaserver instead of opening their
listening sockets directly.


paul
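One way to realise the "dummy resource as a mutual-exclusion checkpoint"
idea is an advisory lock on a separate lock file, taken before the socket
path is unlinked and rebound. The sketch below is an assumption of mine,
not something proposed in the thread: the function name, the ".lock"
suffix and the choice of flock() are all illustrative, and flock() has its
own caveats (on some network filesystems, for example).

```python
import fcntl
import os
import socket


def listen_exclusively(path, backlog=8):
    """Take an exclusive lock, then replace and listen on the socket `path`.

    Returns (listening_socket, lock_fd). The lock is held for as long as
    lock_fd stays open, so the caller keeps it for the process lifetime.
    Raises BlockingIOError if another process already holds the lock.
    """
    lock_fd = os.open(path + ".lock", os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking: fail at once instead of waiting for the holder.
        fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(lock_fd)
        raise

    # Only the lock holder gets here, so removing a stale socket file
    # cannot delete a socket that a live peer is listening on.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(backlog)
    return sock, lock_fd
```

A second process that starts at the same time fails at the flock() step
and never reaches the unlink(), which is the step that made the unlocked
version racy.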
* Re: graceful restart under runit
From: Paul Jarc @ 2006-11-20 19:43 UTC
Cc: supervision

I wrote:
> Another benefit of making the metaserver a separate program: you can
> also write a library for LD_PRELOAD that masks the listen() function
> to make existing programs use the metaserver instead of opening their
> listening sockets directly.

Oops, that won't work, since the server has to notice when its lease
on the listening socket has expired, and clean up accordingly.


paul
* Re: graceful restart under runit
From: Dražen Kačar @ 2006-11-22 19:25 UTC

Paul Jarc wrote:
> Dražen Kačar <dave@fly.srk.fer.hr> wrote:

> Well, it depends how much downtime you can tolerate.

[...]

> > Then you'd need to implement something to take care of metaserver crashes.
> > Probably a way for servers to pass listening sockets back to the
> > new metaserver.
>
> I think that's beyond the point of diminishing returns. The problem
> can never be completely solved, since the metaserver and other servers
> could crash at the same time, or you could lose power, etc. You have
> to give up at some point.

Well, I'm thinking about systems with 99.999% availability (for fun you
can calculate how many seconds per year that is :-). Clustered systems can
do that. There's a heartbeat and if one component fails then the
processing waits until the living ones reach the recovery point, but after
that processing continues and the client only sees a brief (or not) pause.

But that's for distributed systems where you just don't have a single
point on which you can rely to work properly. On one machine there's the
kernel. If it crashes, all processes will also crash and burn, so that
would be the point at which I'd give up. :-)

> >> While a server is handling connections, it would have to use
> >> select()/poll() to notice activity on either the listening socket or
> >> the filesystem connection;
> >
> > And that isn't very nice.
>
> Well, I'd probably do that anyway, if I wanted to handle signals,
> since I'd use the self-pipe technique to notice when signals arrived.

I'm trying to use a sig_atomic_t flag in signal handlers whenever I can.
Things are a bit simpler that way, at least to me. For threaded code
there'd be a signal handling thread, so that's allegedly a non-issue. Just
a small matter of inter-thread synchronization (yuck).

> > I meant to implement (when the time comes) something simpler. Either a
> > FIFO or a Unix domain socket[1] is used as a communications channel for
> > passing the listening socket, but without additional daemons.
>
> That was my first thought too, but I couldn't come up with any
> satisfying way to handle the race conditions gracefully.
>
> Open file descriptors can only be passed over sockets, not pipes.

Right. Passing them through a FIFO is a SYSV feature. I forgot it was not
portable.

> Also, using a socket means you have two-way communication, so you
> don't need signals or PID files, which are subject to race conditions.

Files maybe are, but signals? You can end up losing some signals if they
are sent in rapid succession, but for this purpose you just need one to
trigger an action and shortly after receiving it the server is supposed
to exit, so the possibility of losing other signals (of the same kind)
doesn't matter.

> But without signals, you'll still have to use select()/poll() even
> with all the functionality contained in one program, or else when you
> start a new server to replace the old one, the old one will wait
> indefinitely for one more client connection before waking up and
> noticing that it should hand the listening sockets over to the new
> server.

I'd use a signal to get out of accept(). :-)

> One problem with filesystem sockets is that you have to unlink the
> socket before listening on it, so the operation "listen on this
> socket, which may or may not already exist" isn't atomic. If two
> processes start at the same time, one of them can delete the other's
> socket without knowing that anything was listening on it. So it may
> be useful to atomically acquire some other dummy resource as a
> mutual-exclusion checkpoint before listening on the filesystem socket.

A server binds to the file system socket after it got the network socket,
either by a direct bind() or by a passover from an existing server. That
should be enough, I think. It's not atomic, but it's a locking protocol.

My description had:

    "If there's no writer [on a file system channel], it binds to the
    network socket, writes the PID file [...]"

If the bind to the network socket fails because something else is
listening, then it can try again on the file system and bail out with an
error if there's no writer. After all, that's not supposed to happen.

> Another benefit of making the metaserver a separate program: you can
> also write a library for LD_PRELOAD that masks the listen() function
> to make existing programs use the metaserver instead of opening their
> listening sockets directly.

That's a good one. But shouldn't that mask the bind() function?

As for the lease problem, couldn't metaserver just SIGTERM the existing
server? It needs to know the PID, but that can be passed to it when the
server connects to the file system socket and before it gets the network
socket from the metaserver.

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr
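The startup order described in the message above (bind the network socket
first; if the port is taken, fall back to the file system socket; bail out
if nobody is listening there either) might be sketched like this. It is an
illustration in Python under my own assumptions: the function name, the
return values and the optional retry parameters are hypothetical, and the
actual handoff of the listening socket over the control connection is not
shown.

```python
import errno
import socket
import time


def obtain_listener(addr, control_path, attempts=1, delay=0.1):
    """Return ("bound", listening_socket) or ("handoff", control_connection).

    With attempts=1 this bails out as described above; a larger value
    retries, in case the running server has bound the port but is not yet
    listening on the control socket.
    """
    for attempt in range(attempts):
        net = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        net.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            net.bind(addr)         # atomic: at most one process wins this
            net.listen(16)
            return "bound", net
        except OSError as exc:
            net.close()
            if exc.errno != errno.EADDRINUSE:
                raise

        # The port is taken, so a running instance should own the control
        # socket; connect to it and ask for the listening socket there.
        ctl = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            ctl.connect(control_path)
            return "handoff", ctl
        except (FileNotFoundError, ConnectionRefusedError):
            ctl.close()

        if attempt + 1 < attempts:
            time.sleep(delay)

    raise RuntimeError("port is taken and nobody listens on the control socket")
```

Under a supervisor such as runsv, bailing out is usually enough, because
the service is simply started again a moment later.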
* Re: graceful restart under runit 2006-11-22 19:25 ` Dražen Kačar @ 2006-11-22 19:51 ` Paul Jarc 2006-11-23 12:25 ` Dražen Kačar 0 siblings, 1 reply; 20+ messages in thread From: Paul Jarc @ 2006-11-22 19:51 UTC (permalink / raw) Cc: supervision Dražen Kačar <dave@fly.srk.fer.hr> wrote: > Paul Jarc wrote: >> Also, using a socket means you have two-way communication, so you >> don't need signals or PID files, which are subject to race conditions. > > Files maybe are, but signals? Unless you're sending signals to your own child process, there's a chance that the process you're signaling has already died, and its PID has been reused for a new process. This is true no matter how you obtain the PID to send signals to; PID files are just one case of that. The only exception is for the parent, which knows that the child's PID hasn't been recycled because, even if the child has exited, the parent hasn't wait()ed for the child yet. > A server binds to the file system socket after it got the network socket, > either by a direct bind() or by a passover from an existing server. That > should be enough, I think. It's not atomic, but it's a locking protocol. Actually, obtaining the network socket can be atomic enough - bind() is atomic, and each server process can limit itself to passing the network socket to at most one other server, so there's no chance of one server getting a network socket while another is in the middle of receiving it in a handoff. That's all the atomicity we need. It can get messy if two servers start at the same time - A will get the network socket, so B will try to connect to the filesystem socket, but A might not be listening there yet. B could handle this by looping, waiting for A to either die and free up the network socket, or start listening on the filesystem socket. That might work, but looping doesn't feel right. I'm not sure if there are any problems lurking there. 

> If the bind to the network socket fails because something else is
> listening, then it can try again on the file system and bail out with an
> error if there's no writer. After all, that's not supposed to happen.

That could work too. Looping has the advantage that if A dies just
after binding the network socket, B will go back and try again, so the
service will come up as long as B survives. But with supervision, it
will get restarted anyway, so it's not really a big difference.

So I guess there's no big advantage either way between the metaserver
vs. keeping all the handoff functionality in one program.

>> Another benefit of making the metaserver a separate program: you can
>> also write a library for LD_PRELOAD that masks the listen() function
>> to make existing programs use the metaserver instead of opening their
>> listening sockets directly.
>
> That's a good one. But shouldn't that mask the bind() function?

Probably, but it won't actually work either way, since the server
needs to notice traffic on the filesystem connection.

> As for the lease problem, couldn't metaserver just SIGTERM the existing
> server?

That suffers from the PID-recycling problem above.


paul
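The handoff discussed in this message needs one process to give its
listening socket to another over the file system socket. This part of the
thread does not name the mechanism; the sketch below assumes the usual
SCM_RIGHTS ancillary-data technique on a Unix-domain socket. send_fd() and
recv_fd() are hypothetical helper names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over the connected Unix-domain socket chan.
 * One dummy data byte carries the ancillary message. Returns 0 or -1. */
int send_fd(int chan, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from chan. Returns the new descriptor or -1. */
int recv_fd(int chan)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(chan, &msg, 0) != 1) return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

An old server would call send_fd() at most once and stop accepting right
after, which matches the "at most one other server" limit above. Because
the new server learns of success through the same connection, no PID or
signal is involved.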
* Re: graceful restart under runit
  2006-11-22 19:51 ` Paul Jarc
@ 2006-11-23 12:25 ` Dražen Kačar
  2006-11-24 21:22 ` Paul Jarc
  0 siblings, 1 reply; 20+ messages in thread
From: Dražen Kačar @ 2006-11-23 12:25 UTC

Paul Jarc wrote:
> Dražen Kačar <dave@fly.srk.fer.hr> wrote:
> > Paul Jarc wrote:
> >> Also, using a socket means you have two-way communication, so you
> >> don't need signals or PID files, which are subject to race conditions.
> >
> > Files maybe are, but signals?
>
> Unless you're sending signals to your own child process, there's a
> chance that the process you're signaling has already died, and its PID
> has been reused for a new process. This is true no matter how you
> obtain the PID to send signals to; PID files are just one case of
> that. The only exception is for the parent, which knows that the
> child's PID hasn't been recycled because, even if the child has
> exited, the parent hasn't wait()ed for the child yet.

Ah, that. Well, you'd just have to rely on the usually-not-documented OS
feature. PIDs are not recycled fast in practice, so that would have to be
good enough.

Somewhat unportable guarantee could be obtained via /proc. You know the
PID, so you stop the process via /proc or ptrace() or whatever is
available for debuggers (something will be available), check that the PID
is associated with the correct executable via /proc, send your signal (now
the process won't go away) and then detach from the process.

Checking whether the PID corresponds to the correct executable is the
messy part and I don't know if it can be handled in a reasonable way for
this purpose. I'd just live with the race condition and rely on the OS not
to reuse PIDs too fast.

> > A server binds to the file system socket after it got the network socket,
> > either by a direct bind() or by a passover from an existing server. That
> > should be enough, I think. It's not atomic, but it's a locking protocol.
>
> Actually, obtaining the network socket can be atomic enough - bind()

I meant for the whole thing: obtain the network socket and then obtain the
file system socket. Mandating that the network socket must be obtained
first and file system socket second is a locking protocol. And the first
lock in the locking protocol must be atomic.

> So I guess there's no big advantage either way between the metaserver
> vs. keeping all the handoff functionality in one program.

Unless LD_PRELOAD method can work. Then the metaserver has a distinct
advantage for those who need it.

> > That's a good one. But shouldn't that mask the bind() function?
>
> Probably, but it won't actually work either way, since the server
> needs to notice traffic on the filesystem connection.
>
> > As for the lease problem, couldn't metaserver just SIGTERM the existing
> > server?
>
> That suffers from the PID-recycling problem above.

Well, I'd just ignore that problem. Or go through unportable interface for
the debuggers when feeling paranoid.

-- 
 .-.   .-.    Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
     |
     |        dave@fly.srk.fer.hr
* Re: graceful restart under runit
  2006-11-23 12:25 ` Dražen Kačar
@ 2006-11-24 21:22 ` Paul Jarc
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Jarc @ 2006-11-24 21:22 UTC
  Cc: supervision

Dražen Kačar <dave@fly.srk.fer.hr> wrote:
> PIDs are not recycled fast in practice, so that would have to be
> good enough.

They certainly are recycled fast in practice, although maybe not
often. For example, OpenBSD can assign PIDs in random order instead
of sequentially, so a PID has a chance of being reused for the very
next process after it exits. The same problem can hit any OS if it
spawns short-lived processes at a high rate. For me, at least, it's
well worth using poll()/select() to avoid this risk. It's a one-time
task for the programmer, but PID recycling is a constant danger for
every user.

> Somewhat unportable guarantee could be obtained via /proc. You know the
> PID, so you stop the process via /proc or ptrace() or whatever is
> available for debuggers (something will be available), check that the PID
> is associated with the correct executable via /proc,

Even if it's the right program, that doesn't guarantee it's the right
process. This seems like more work than poll()/select(), with worse
results.


paul
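The poll()/select() approach preferred here replaces the signal with
traffic on the file system socket: the running server watches that socket
alongside its network listener and reacts when a successor connects. A
minimal sketch, assuming two descriptors and the hypothetical names
wait_event() and enum event:

```c
#include <poll.h>

/* Hypothetical event codes for this sketch. */
enum event { EV_ERROR = -1, EV_TIMEOUT = 0, EV_CLIENT, EV_HANDOFF };

/* Block until a client is pending on listen_fd or a successor shows up on
 * handoff_fd. timeout_ms follows poll(): -1 waits forever, 0 returns at once.
 * A caller should retry when EV_ERROR comes back with errno == EINTR. */
enum event wait_event(int listen_fd, int handoff_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = listen_fd,  .events = POLLIN },
        { .fd = handoff_fd, .events = POLLIN },
    };
    int n = poll(fds, 2, timeout_ms);

    if (n < 0) return EV_ERROR;
    if (n == 0) return EV_TIMEOUT;
    /* Check the handoff side first so a waiting successor is not starved
     * by a steady stream of new clients. */
    if (fds[1].revents & POLLIN) return EV_HANDOFF;
    if (fds[0].revents & POLLIN) return EV_CLIENT;
    return EV_ERROR;   /* POLLERR, POLLHUP or POLLNVAL without readable data */
}
```

On EV_HANDOFF the server would accept() the successor, pass the listening
socket over, and stop accepting clients; on EV_CLIENT it accepts as usual.
A multithreaded server can run this in its accept thread only.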
* Re: graceful restart under runit
  2006-11-16 15:24 ` Dražen Kačar
  2006-11-17  0:15 ` Alex Efros
@ 2006-11-17 13:14 ` Gerrit Pape
  1 sibling, 0 replies; 20+ messages in thread
From: Gerrit Pape @ 2006-11-17 13:14 UTC
  Cc: Dražen Kačar

On Thu, Nov 16, 2006 at 04:24:46PM +0100, Dražen Kačar wrote:
> Alex Efros wrote:
> > On Wed, Nov 15, 2006 at 12:47:54PM +0100, Dražen Kačar wrote:
> > > Say I have a TCP server which listens on incoming connections on some TCP
> > > port. Occasionally I'd like to install and run a new version of the server
> > > executable. Server source is under my control, for all intents and
> > > purposes.
> > [...]
> > > Is there a way to get around this?
> >
> > Probably you can just fork() after receiving SIGUSR1 and exit from parent
> > leaving child to process existing connection.

Yes.

> Servers which use process per connection do something like that already
> (the parent process signals the children, exits and leaves them to finish
> sessions and then they exit too).
>
> However, there are multithreaded monsters which can't do that. fork()
> replicates just the calling thread[1], so it's not an option and exit()
> will terminate all threads (i.e. all sessions).

Hm, even though I too dislike "multithreaded monsters", we could add
some detach support to runsv, e.g. the patch below. You can test this
with

 $ printf f >./supervise/control

After this, runsv forgets about the child, and considers the service to
be terminated; custom/f, if it exists, will be run before detaching.

Regards, Gerrit.


Index: src/runsv.c
===================================================================
RCS file: /cvs/runit/src/runsv.c,v
retrieving revision 1.26
diff -u -r1.26 runsv.c
--- src/runsv.c 24 Jul 2006 21:01:37 -0000 1.26
+++ src/runsv.c 17 Nov 2006 12:58:34 -0000
@@ -359,6 +359,15 @@
     update_status(s);
     if (! s->pid) startservice(s);
     break;
+  case 'f': /* forget, detach */
+    if (! s->pid) break;
+    custom(s, c);
+    s->pid =0;
+    s->state =S_DOWN;
+    s->ctrl =C_NOOP;
+    pidchanged =1;
+    update_status(s);
+    break;
   case 'a': /* sig alarm */
     if (s->pid && ! custom(s, c)) kill(s->pid, SIGALRM);
     break;