From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7483
Path: news.gmane.org!not-for-mail
From: Raphael Cohn <raphael.cohn@stormmq.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: setenv if value=NULL, what say standard? Bug?
Date: Thu, 23 Apr 2015 10:52:10 +0100
Message-ID: <CACCP0Gqd60JLtbD=iy1a2Vbcry87NgyzdEJ+E=Are3nzh4Vjbw@mail.gmail.com>
References: <553837F1.5080808@safe.ca>
	<55383E43.8010505@skarnet.org>
	<55384A61.5020001@safe.ca>
	<20150423021507.GG6817@brightrain.aerifal.cx>
	<5538740E.1030306@safe.ca>
	<5538BA11.90402@skarnet.org>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7b5d33daca6f8d0514613be5
X-Trace: ger.gmane.org 1429782747 21601 80.91.229.3 (23 Apr 2015 09:52:27 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 23 Apr 2015 09:52:27 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-7496-gllmg-musl=m.gmane.org@lists.openwall.com Thu Apr 23 11:52:26 2015
Return-path: <musl-return-7496-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-7496-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1YlDo1-0007aW-T0
	for gllmg-musl@m.gmane.org; Thu, 23 Apr 2015 11:52:26 +0200
Original-Received: (qmail 30545 invoked by uid 550); 23 Apr 2015 09:52:24 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 30489 invoked from network); 23 Apr 2015 09:52:22 -0000
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20130820;
        h=x-gm-message-state:mime-version:in-reply-to:references:date
         :message-id:subject:from:to:content-type;
        bh=kWHZDYif01niNcfUaOC16BAlHbO+wcdxkHQB2xtEPkw=;
        b=DJXKm/c0a/ZF371HbJA5XW79vUG5+EFEFRU1+ySRN27GfS+cGyxQjloOZYdjAmEMDX
         cpQykNQgrtliB8dFbYJ5mjP6Pok6UHuaru5E354/zLx6jAlRKdZp7rCKoO2W51KxHmH2
         RPP19PRmigtEo8A0xbSWV4QZ8DgSSq1oHVyWkVqaOxWwPbwTdRIXBcdFA6ziauShGYu4
         Q/HO+Fg8F31EYNRTF7iaIDVykd4PTMhFsajTZR8PhK7NruOW4d8UGHHNFe+N562K3EoC
         p4jTv+1xHMBBYHJZnzJ43ViPg3cL5sM6bnWHNZBm/GX6QV4ZvXBm+hYV4ixyGvzZ60iT
         hplw==
X-Gm-Message-State: ALoCoQks/ai5swwobPrVMWqbR5e1Y/vfgzt/jkm1hQU8goC+VoRdEREQe4sH5YmQ/swuzFI3GEiV
X-Received: by 10.60.124.69 with SMTP id mg5mr1670047oeb.76.1429782731053;
 Thu, 23 Apr 2015 02:52:11 -0700 (PDT)
X-Originating-IP: [2001:8b0:862:b944:f972:1347:91ee:161e]
In-Reply-To: <5538BA11.90402@skarnet.org>
Xref: news.gmane.org gmane.linux.lib.musl.general:7483
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/7483>

--047d7b5d33daca6f8d0514613be5
Content-Type: text/plain; charset=UTF-8

On 23 April 2015 at 10:23, Laurent Bercot <ska-dietlibc@skarnet.org> wrote:

> On 23/04/2015 06:24, Jean-Marc Pigeon wrote:
>
>> Think about this, you write an application working perfectly right,
>> but 1 in 1000000 you reach something not trapped by low level and
>> once in while the application (in production for month) just stop
>> to work because "unexpected" within musl...
>>
>
>  And why do you think the problem exists in the first place ?
> Because other libcs were defensive and failed to fail early, so the
> bug was never discovered until now. Your application is not working
> perfectly right - it is buggy, and it *should* fail. musl is giving
> developers a gift that other libcs do not: it helps them debug.
>
>
>  (so someone will propose to set a cron to automatically restart this
>> unreliable daemon, hmmm...)
>>
>
>  You want to be defensive, well, yeah, this is the place to be
> defensive. Until the bug is found and fixed, at least the daemon is
> kind of providing service.
>
>  Raphael says this behaviour is wrong for the same reason that
> silently failing is wrong, but I disagree. First, restarting crashing
> daemons is not silent at all, a crash is always a loud warning and
> can hardly be ignored; and second, restarting a process is not
> continuing it. A process can always be restarted from a clean state
> and work in a predictable way until it trips the bug again, whereas
> silently ignoring UB makes the process unpredictable for the rest of
> its lifetime.


Yes and no. Crashes are loud and noisy, and should immediately trigger
alerts, but without intimate knowledge of the application and the cause of
the fault, auto-restarting is risky. In my operational experience, it has
usually been a hack employed by incompetent sysadmins (no names, no pack
drill, but one large government dept comes to mind). If you know your
daemon processes well, then auto-restarting is reasonable when:
- you know they are idempotent or do not have persistent state (e.g. DNS
caches)
- they're essential system services (definitions might vary, but I'd
include ssh on geographically remote boxes here)

That said, anything with complex state really shouldn't be restarted
without *investigation*: message brokers, relational database titans,
cluster HA setups, etc. The worst outage of my career was a Terracotta
cluster that had suffered a split brain. Restarting it naively caused
it to _delete_ the only remaining good state. Is this what your 'clean
state' caveat above is meant to cover?
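
To make the distinction concrete, here is a rough sketch of the decision
I have in mind. It is only an illustration: the enum, its values and the
function name are made up for this mail, and a real supervisor would take
the classification from its own configuration. The crash should raise an
alert in every case; only the restart decision differs.

```c
#include <stdbool.h>

/* Hypothetical classification of a daemon, invented for this example. */
enum service_kind {
    SERVICE_STATELESS,  /* idempotent or no persistent state, e.g. a DNS cache */
    SERVICE_ESSENTIAL,  /* must stay reachable, e.g. ssh on a remote box */
    SERVICE_STATEFUL    /* message brokers, databases, HA clusters */
};

/* Decide whether a crashed service may be restarted without a human. */
bool should_auto_restart(enum service_kind kind)
{
    switch (kind) {
    case SERVICE_STATELESS:
    case SERVICE_ESSENTIAL:
        return true;
    case SERVICE_STATEFUL:
        return false;   /* stay down until someone has investigated */
    }
    return false;       /* unknown kind: be conservative */
}
```

The conservative default is deliberate: anything not known to be safe to
restart is treated as if it carried state.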

>
>
>  Far better to return "trouble" status, then it is to the application
>> to decide what must be done in context, as ignore, override, bypass,
>> crash, etc.
>>
>
>  What "trouble" status do you return when a function dereferences a
> NULL pointer ? This is exactly what's happening here. Passing NULL
> to setenv is as incorrect as dereferencing NULL, and should result
> in the same behaviour.
>
>
>  A sensible policy in case of UB would be for such low level code to
>> swallow the problem, (protect the hardware and keep the program
>> running as much as possible).
>>
>
>  The language you want is Javascript, not C.
>
>
>  As reported, the crashing application is hwclock, (util-linux-2.26),
>> this a kind of code in the field for a very  very long time, so the
>> library (glibc and old libc) used for linux over the years defined an
>> expected behavior to this "UB".
>>
>
>  And this is why musl is so much better. If glibc and uclibc devs
> hadn't been so complacent, the bug wouldn't have lived for so long.
>
>
>  Crashing is not an option for code pertaining to musl/libc layer.
>>
>
>  It definitely is. You don't want your program to crash ? Don't
> invoke UB.
>  If you want to be "safe", you can ignore SIGSEGV at the start of
> all your applications - it will be the exact same thing as what you
> are asking. Your daemons will live longer, I guarantee it.
>
>
>  (:-} why bother to return an error, just crash for all
>> problems in open, close, write, etc. just bringing the crashing
>> concept to the extreme :-}).
>>
>
>  Straw man. You know as well as we do the difference between a
> programming error and a run-time error.
>
>
>  My experience (for a long time now) about writing complex daemon
>> running for months/year, it is not that straightforward (may
>> be for a simple application it is)
>>
>
>  And mine is that it is. We're evens, now please let's stop bringing
> up anecdotal evidence.
>
> --
>  Laurent
>
>

--047d7b5d33daca6f8d0514613be5--