From: Peter Tribble
Date: Fri, 26 Jul 2024 16:14:14 +0100
Subject: Re: [developer] Review - 15665 svc:/network/loopback exits successfully even if it fails
To: illumos-developer

On Fri, Jul 26, 2024 at 2:50 PM Andy Fiddaman wrote:

>
> On Fri, 26 Jul 2024, Peter Tribble wrote:
>
> > On Fri, Jul 26, 2024 at 9:21 AM Andy Fiddaman wrote:
> >
> > > Please can you review the following change?
> > >
> > >     15665 svc:/network/loopback exits successfully even if it fails
> > >     https://www.illumos.org/issues/15665
> > >     https://code.illumos.org/c/illumos-gate/+/3610
> > >
> >
> > When this first came up I expressed my belief that making this change is
> > the wrong thing to do, and I'll express it again.
>
> Apologies Peter. I had recalled that your objection to the original change
> was mostly around the addition of the extra dependency to the service,
> which I've removed in this new patch set (that is
> https://www.illumos.org/issues/15664 which remains open).
>
> > If this service fails, I think the best thing to do is drive on so that
> > the system can come up as far as possible to maximise the chance that the
> > system comes up far enough for an administrator to be able to get in and
> > fix it. Not putting the service into maintenance is a feature, not a bug.
>
> The impetus for this change is that over the past couple of years we've had
> a number of occasions where we've had to debug networking problems that
> have had their root in the fact that the loopback interfaces were not
> created for one reason or another. It happened again yesterday in a
> non-global zone. In all of these, it would have been really useful and
> expedited diagnosis if the service had gone into maintenance. I understand
> the perspective of allowing the system to come up as far as possible - to
> the point of remote access even - but it still seems wrong for a service
> to report success where it has not actually achieved its goal. Is there
> some middle ground here?
>
> > I think generally it would be wrong for a single voice to veto any
> > change, which means I would generally be uncomfortable sticking a -1 on
> > it, but if this does get into the gate it will be reverted in Tribblix.
>
> Understood. This definitely warrants further discussion.
>

As I mentioned in my other reply, it seems that what we're after is some
way to mark a service as having generated an error without bringing the
system down by going into maintenance. Some sort of degraded mode.

We have a couple of SMF exit codes that look interesting -
SMF_EXIT_MON_DEGRADE and SMF_EXIT_MON_OFFLINE, but I'm sure they were never
implemented. There's even an issue in this area -
https://www.illumos.org/issues/7711 (which refers back to 8891, which is
another case of something dropping into maintenance breaking the entire
system).
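A minimal sketch of the three outcomes being debated, assuming a simplified method script. The `choose_exit_code` function and its `continue`, `maintenance` and `degrade` policies are hypothetical illustrations, not code from the loopback method or from the change under review. The script sources `/lib/svc/share/smf_include.sh` when it exists; otherwise it falls back to values assumed to match that file (`SMF_EXIT_ERR_FATAL=95`, `SMF_EXIT_MON_DEGRADE=97`), which should be checked against the gate. Given the doubt above about whether `SMF_EXIT_MON_DEGRADE` was ever implemented, the `degrade` branch only shows what a method would return, not how svc.startd would react.

```shell
#!/bin/sh
#
# Hypothetical sketch, not the real svc:/network/loopback method.
# It only decides which exit code a method would use after a failed step.

# Pull in the standard SMF definitions when running on illumos.
if [ -f /lib/svc/share/smf_include.sh ]; then
    . /lib/svc/share/smf_include.sh
fi

# Assumed fallback values (believed to match illumos smf_include.sh).
: "${SMF_EXIT_OK:=0}"
: "${SMF_EXIT_ERR_FATAL:=95}"
: "${SMF_EXIT_MON_DEGRADE:=97}"

# choose_exit_code POLICY STATUS
#   POLICY: continue    - report success anyway (the current behaviour)
#           maintenance - fail hard (what the change under review proposes)
#           degrade     - hypothetical middle ground
#   STATUS: 0 if the loopback interfaces were created, non-zero otherwise.
# Prints the chosen exit code on stdout; diagnostics go to stderr.
choose_exit_code() {
    policy=$1
    status=$2

    if [ "$status" -eq 0 ]; then
        echo "$SMF_EXIT_OK"
        return
    fi

    case $policy in
    continue)
        echo "loopback setup failed; continuing anyway" >&2
        echo "$SMF_EXIT_OK"
        ;;
    maintenance)
        echo "loopback setup failed" >&2
        echo "$SMF_EXIT_ERR_FATAL"
        ;;
    degrade)
        echo "loopback setup failed; requesting degraded state" >&2
        echo "$SMF_EXIT_MON_DEGRADE"
        ;;
    *)
        echo "unknown policy: $policy" >&2
        echo "$SMF_EXIT_ERR_FATAL"
        ;;
    esac
}
```

In a real method the result would be handed straight to `exit`, for example `exit "$(choose_exit_code degrade "$status")"`. The messages go to stderr, which SMF captures in the service's log file, so even the `continue` policy leaves something for an administrator to find.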
Interestingly, looking at the ssh method script for S11
https://github.com/oracle/solaris-userland/blob/master/components/openssh/sources/sshd.sh#L132
you see the following:

    # Put the service into degraded mode in case some of previous
    # configuration tasks failed.
    # We do not let the service enter maintenance mode, since
    # we want to keep the system as much operating as feasible.
    #
    if [ $ret1 -ne 0 ]; then
        smf_method_exit $SMF_EXIT_DEGRADED "hostkey_configuration" \
            "Failed to generate missing host keys."
    fi

So the equivalent of SMF_EXIT_DEGRADED might be what we're looking for?

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
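A minimal sketch, assuming one wanted method scripts written against the Solaris 11 convention quoted above to run on illumos. illumos is not assumed to provide `smf_method_exit` or `SMF_EXIT_DEGRADED`, so both are defined here as hypothetical stand-ins; the argument order (code, short reason, long reason) is inferred only from the `sshd.sh` excerpt, and the placeholder value `degraded` is invented rather than taken from any real header. Until a genuine degraded state exists, the shim logs the reason and reports success, keeping the "drive on" behaviour while still leaving a trace in the service log.

```shell
#!/bin/sh
#
# Hypothetical illumos shim for the Solaris 11 style call
#     smf_method_exit $SMF_EXIT_DEGRADED "short_reason" "Long reason."
# Neither the function nor SMF_EXIT_DEGRADED is assumed to exist on illumos.

: "${SMF_EXIT_OK:=0}"
# Invented placeholder; not a real illumos or Solaris constant.
: "${SMF_EXIT_DEGRADED:=degraded}"

# smf_method_exit CODE SHORT_REASON LONG_REASON
smf_method_exit() {
    code=$1
    short=$2
    long=$3

    # Under SMF, a method's stderr ends up in the service log.
    echo "$short: $long" >&2

    if [ "$code" = "$SMF_EXIT_DEGRADED" ]; then
        # No degraded state to request, so report success and rely on
        # the log message above to record the problem.
        exit "$SMF_EXIT_OK"
    fi
    exit "$code"
}
```

If svc.startd ever gained a real degraded state, only the `exit "$SMF_EXIT_OK"` branch would need to change; callers written in the S11 style would stay as they are.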