On Fri, Jul 26, 2024 at 2:41 PM Toomas Soome via illumos-developer <developer@lists.illumos.org> wrote:


On 26. Jul 2024, at 15:44, Peter Tribble <peter.tribble@gmail.com> wrote:

On Fri, Jul 26, 2024 at 9:21 AM Andy Fiddaman <illumos@fiddaman.net> wrote:
Please can you review the following change?

    15665 svc:/network/loopback exits successfully even if it fails
    https://www.illumos.org/issues/15665
    https://code.illumos.org/c/illumos-gate/+/3610

When this first came up I expressed my belief that making this change is the wrong
thing to do, and I'll express it again.

If this service fails, I think the best thing to do is drive on, so that the system comes
up as far as possible and maximises the chance that an administrator can get in and
fix it. Not putting the service into maintenance is a feature, not a bug.

(If it fails, then there's something deeper wrong with the system, and those fundamental
causes should be dealt with.)

I think generally it would be wrong for a single voice to veto any change, which means I
would generally be uncomfortable sticking a -1 on it, but if this does get into the gate
it will be reverted in Tribblix.


Hm, ok, you mean that a startup error in network/loopback will block too many other services? But then again, if there is an error, those dependent services are not functional either, are they? Otherwise they should not depend on network/loopback ;)

Well yes, the whole thing gets completely blocked. If it's a cloud or remote server,
it won't boot, and if you don't have console access (which isn't universal) then the
system is a total loss.

One of the issues is that SMF doesn't have a very rich vocabulary for service states
or dependencies - it's very much black and white. This is a case where you would
normally want loopback to be up (I think many users and applications make unstated
assumptions here, although quite a lot is actually unaffected if the loopback isn't up),
and you definitely want it up *before* the other applications. For loopback it's generally
the ordering that matters, so when everything is working the dependency is correct.

When something goes wrong, though, in many cases what you want is to carry on, so
that as much as possible works and you get a chance to fix it. So what I see missing
here is some way of exposing that the service has an error, without bluntly dropping it into
maintenance and killing all the dependent services with it.
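
To make the difference concrete, here is a rough toy model (plain Python, nothing to do
with how svc.startd is actually implemented). The service names other than
network/loopback, the dependency graph, and the "online_with_error" state are all
made up for illustration; the last one stands in for the missing "has an error but
carry on" state described above.

```python
from enum import Enum


class State(Enum):
    ONLINE = "online"
    MAINTENANCE = "maintenance"
    OFFLINE = "offline"  # waiting on an unsatisfied dependency
    ONLINE_WITH_ERROR = "online_with_error"  # hypothetical state, for illustration only


# Hypothetical dependency graph: service -> services it requires.
DEPS = {
    "network/loopback": [],
    "example/netsvc": ["network/loopback"],
    "example/remote-login": ["example/netsvc"],
}


def boot(deps, failing, soft_failure):
    """Start every service after its dependencies and return the final states.

    failing      -- set of services whose start method fails
    soft_failure -- if True, a failed service is flagged but still satisfies
                    its dependents; if False, it goes to maintenance
    """
    states = {}

    def start(name):
        if name in states:
            return states[name]
        dep_states = [start(dep) for dep in deps[name]]
        satisfied = all(
            s in (State.ONLINE, State.ONLINE_WITH_ERROR) for s in dep_states
        )
        if not satisfied:
            states[name] = State.OFFLINE
        elif name in failing:
            states[name] = (
                State.ONLINE_WITH_ERROR if soft_failure else State.MAINTENANCE
            )
        else:
            states[name] = State.ONLINE
        return states[name]

    for name in deps:
        start(name)
    return states


# Loopback fails in both runs; only the handling of the failure differs.
strict = boot(DEPS, {"network/loopback"}, soft_failure=False)
soft = boot(DEPS, {"network/loopback"}, soft_failure=True)

for name in DEPS:
    print(f"{name:22} strict={strict[name].value:12} soft={soft[name].value}")
```

In the strict run everything downstream of loopback stays offline, including the
(made-up) remote login service, which is the "total loss" case. In the soft run the
ordering is still respected and the error is still visible on loopback, but the
dependents come up.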

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/