On Tue, Jul 30, 2024 at 11:46 PM Gordon Ross wrote:

> Optional dependency does that in SMF, right?
>

Well no, that's a rather different case. That is the "I don't care if it's
enabled or not, but if it is I'll have a hard dependency on it". What we're
after here is the "I do care that it's enabled, and must run after it, but
I'm prepared to live with errors".

> On Tue, Jul 30, 2024 at 12:56 PM Jorge Schrauwen via illumos-developer <
> developer@lists.illumos.org> wrote:
>
>> This last reply from Peter made me think of the difference between
>> requires vs after in systemd speak.
>>
>> Although that is probably a lot of work as one would need those features
>> and somehow fix all manifests that express a dependency on loopback.
>>
>> Admittedly I sometimes miss a more soft dependency in smf in general.
>>
>> ~ sjorge
>>
>> On 26 Jul 2024, at 17:16, Peter Tribble wrote:
>>
>> On Fri, Jul 26, 2024 at 2:50 PM Andy Fiddaman wrote:
>>
>>>
>>> On Fri, 26 Jul 2024, Peter Tribble wrote:
>>>
>>> > On Fri, Jul 26, 2024 at 9:21 AM Andy Fiddaman wrote:
>>> >
>>> > > Please can you review the following change?
>>> > >
>>> > > 15665 svc:/network/loopback exits successfully even if it fails
>>> > > https://www.illumos.org/issues/15665
>>> > > https://code.illumos.org/c/illumos-gate/+/3610
>>> > >
>>> >
>>> > When this first came up I expressed my belief that making this change
>>> > is the wrong thing to do, and I'll express it again.
>>>
>>> Apologies Peter. I had recalled that your objection to the original
>>> change was mostly around the addition of the extra dependency to the
>>> service, which I've removed in this new patch set (that is
>>> https://www.illumos.org/issues/15664 which remains open).
>>>
>>> > If this service fails, I think the best thing to do is drive on so
>>> > that the system can come up as far as possible to maximise the chance
>>> > that the system comes up far enough for an administrator to be able to
>>> > get in and fix it. Not putting the service into maintenance is a
>>> > feature, not a bug.
>>>
>>> The impetus for this change is that over the past couple of years we've
>>> had a number of occasions where we've had to debug networking problems
>>> that have had their root in the fact that the loopback interfaces were
>>> not created for one reason or another. It happened again yesterday in a
>>> non-global zone. In all of these, it would have been really useful and
>>> expedited diagnosis if the service had gone into maintenance. I
>>> understand the perspective of allowing the system to come up as far as
>>> possible - to the point of remote access even - but it still seems wrong
>>> for a service to report success where it has not actually achieved its
>>> goal. Is there some middle ground here?
>>>
>>> > I think generally it would be wrong for a single voice to veto any
>>> > change, which means I would generally be uncomfortable sticking a -1
>>> > on it, but if this does get into the gate it will be reverted in
>>> > Tribblix.
>>>
>>> Understood. This definitely warrants further discussion.
>>>
>>
>> As I mentioned in my other reply, it seems that what we're after is some
>> way to mark a service as having generated an error without bringing the
>> system down by going into maintenance. Some sort of degraded mode.
>>
>> We have a couple of SMF exit codes that look interesting -
>> SMF_EXIT_MON_DEGRADE and SMF_EXIT_MON_OFFLINE, but I'm sure they were
>> never implemented. There's even an issue in this area -
>> https://www.illumos.org/issues/7711 (which refers back to 8891 which is
>> another case of something dropping into maintenance breaking the entire
>> system).
>>
>> Interestingly, looking at the ssh method script for S11
>>
>> https://github.com/oracle/solaris-userland/blob/master/components/openssh/sources/sshd.sh#L132
>> you see the following:
>>
>>     # Put the service into degraded mode in case some of previous
>>     # configuration tasks failed.
>>     # We do not let the service enter maintenance mode, since
>>     # we want to keep the system as much operating as feasible.
>>     #
>>     if [ $ret1 -ne 0 ]; then
>>         smf_method_exit $SMF_EXIT_DEGRADED "hostkey_configuration" \
>>             "Failed to generate missing host keys."
>>     fi
>>
>> So the equivalent of SMF_EXIT_DEGRADED might be what we're looking for?
>>
>> --
>> -Peter Tribble
>> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
>>

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/