On Fri, Jul 26, 2024 at 2:41 PM Toomas Soome via illumos-developer
<developer@lists.illumos.org> wrote:

> On 26. Jul 2024, at 15:44, Peter Tribble wrote:
>
> On Fri, Jul 26, 2024 at 9:21 AM Andy Fiddaman wrote:
>
>> Please can you review the following change?
>>
>> 15665 svc:/network/loopback exits successfully even if it fails
>> https://www.illumos.org/issues/15665
>> https://code.illumos.org/c/illumos-gate/+/3610
>
> When this first came up I expressed my belief that making this change
> is the wrong thing to do, and I'll express it again.
>
> If this service fails, I think the best thing to do is drive on so
> that the system can come up as far as possible to maximise the chance
> that the system comes up far enough for an administrator to be able to
> get in and fix it. Not putting the service into maintenance is a
> feature, not a bug.
>
> (If it fails, then there's something deeper wrong with the system, and
> those fundamental causes should be dealt with.)
>
> I think generally it would be wrong for a single voice to veto any
> change, which means I would generally be uncomfortable sticking a -1
> on it, but if this does get into the gate it will be reverted in
> Tribblix.
>
> hm, ok, you mean that service startup error in case of
> network/loopback will block too many other services? but then again,
> if there is an error, those depending services are also not
> functional, aren't they? Otherwise those depending services should not
> depend on network/loopback ;)

Well yes, the whole thing gets completely blocked. If it's a cloud or
remote server, it won't boot, and if you don't have console access
(which isn't universal) then the system is a total loss.

One of the issues is that SMF doesn't have a very rich vocabulary for
service states or dependencies - it's very much black and white.
And this is a case where you would normally want loopback to be up (I
think many users and applications make unstated assumptions here,
although actually quite a lot is completely unaffected if the loopback
isn't up), and you definitely want it up *before* the other
applications - and generally for loopback it's the ordering that's
important. So when everything is working the dependency is correct.

When something goes wrong, though, in many cases what you want is to
carry on, so that as much as possible works and you get a chance to fix
it.

So what I see missing here is some way of exposing that the service has
an error, without bluntly dropping it into maintenance and killing all
the dependent services with it.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/