From mboxrd@z Thu Jan 1 00:00:00 1970
From: Raphael Cohn
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: setenv if value=NULL, what say standard? Bug?
Date: Thu, 23 Apr 2015 10:52:10 +0100
References: <553837F1.5080808@safe.ca> <55383E43.8010505@skarnet.org> <55384A61.5020001@safe.ca> <20150423021507.GG6817@brightrain.aerifal.cx> <5538740E.1030306@safe.ca> <5538BA11.90402@skarnet.org>
In-Reply-To: <5538BA11.90402@skarnet.org>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7b5d33daca6f8d0514613be5
Xref: news.gmane.org gmane.linux.lib.musl.general:7483

--047d7b5d33daca6f8d0514613be5
Content-Type: text/plain; charset=UTF-8

On 23 April 2015 at 10:23, Laurent Bercot wrote:

> On 23/04/2015 06:24, Jean-Marc Pigeon wrote:
>
>> Think about this, you write an application working perfectly right,
>> but 1 in 1000000 you reach something not trapped by low level and
>> once in while the application (in production for month) just stop
>> to work because "unexpected" within musl...
>
> And why do you think the problem exists in the first place ?
> Because other libcs were defensive and failed to fail early, so the
> bug was never discovered until now. Your application is not working
> perfectly right - it is buggy, and it *should* fail. musl is giving
> developers a gift that other libcs do not: it helps them debug.
>
>> (so someone will propose to set a cron to automatically restart this
>> unreliable daemon, hmmm...)
>
> You want to be defensive, well, yeah, this is the place to be
> defensive. Until the bug is found and fixed, at least the daemon is
> kind of providing service.
>
> Raphael says this behaviour is wrong for the same reason that
> silently failing is wrong, but I disagree. First, restarting crashing
> daemons is not silent at all, a crash is always a loud warning and
> can hardly be ignored; and second, restarting a process is not
> continuing it. A process can always be restarted from a clean state
> and work in a predictable way until it trips the bug again, whereas
> silently ignoring UB makes the process unpredictable for the rest of
> its lifetime.

Yes and no. Crashes are loud and noisy, and should immediately trigger
alerts, but without intimate knowledge of the application and the cause
of the fault, auto-restarting is risky. In my operational experience,
it's usually been a hack employed by incompetent sysadmins (no names, no
pack drill, but one large government dept comes to mind). If you know
your daemon processes well, then you could auto-restart them if:

- you know they are idempotent or do not have persistent state (e.g. DNS
  caches)
- they're essential system services (definitions might vary, but I'd
  have ssh for geographically remote boxes here)

That said, stuff that has complex state really shouldn't be restarted
without *investigation* - message brokers, relational database titans,
cluster HA setups, etc. The worst outage of my career was a Terracotta
cluster that had suffered from a split brain. Restarting it naively
caused it to _delete_ the only remaining good state. Is this your
'clean state' caveat above?

>> Far better to return "trouble" status, then it is to the application
>> to decide what must be done in context, as ignore, override, bypass,
>> crash, etc.
>
> What "trouble" status do you return when a function dereferences a
> NULL pointer ? This is exactly what's happening here. Passing NULL
> to setenv is as incorrect as dereferencing NULL, and should result
> in the same behaviour.
>
>> A sensible policy in case of UB would be for such low level code to
>> swallow the problem, (protect the hardware and keep the program
>> running as much as possible).
>
> The language you want is Javascript, not C.
>
>> As reported, the crashing application is hwclock, (util-linux-2.26),
>> this a kind of code in the field for a very very long time, so the
>> library (glibc and old libc) used for linux over the years defined an
>> expected behavior to this "UB".
>
> And this is why musl is so much better. If glibc and uclibc devs
> hadn't been so complacent, the bug wouldn't have lived for so long.
>
>> Crashing is not an option for code pertaining to musl/libc layer.
>
> It definitely is. You don't want your program to crash ? Don't
> invoke UB.
> If you want to be "safe", you can ignore SIGSEGV at the start of
> all your applications - it will be the exact same thing as what you
> are asking. Your daemons will live longer, I guarantee it.
>
>> (:-} why bother to return an error, just crash for all
>> problems in open, close, write, etc. just bringing the crashing
>> concept to the extreme :-}).
>
> Straw man. You know as well as we do the difference between a
> programming error and a run-time error.
>
>> My experience (for a long time now) about writing complex daemon
>> running for months/year, it is not that straightforward (may
>> be for a simple application it is)
>
> And mine is that it is. We're evens, now please let's stop bringing
> up anecdotal evidence.
>
> --
> Laurent

--047d7b5d33daca6f8d0514613be5--
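
For reference, the case argued over in the quoted text (a NULL value
handed to setenv) can be caught in the calling application before libc
ever sees it. What follows is only a minimal sketch of that idea, not
code from musl, glibc or util-linux: the helper name setenv_checked is
hypothetical, and reporting the problem as EINVAL is an assumption made
for illustration, since the thread treats the NULL case as undefined
rather than as a defined error.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical caller-side guard (not a libc function).
 * Rejects NULL arguments with EINVAL instead of passing them on to
 * setenv(), so the application decides how to handle its own bug.
 * Returns 0 on success and -1 on failure, like setenv() itself.
 */
int setenv_checked(const char *name, const char *value, int overwrite)
{
    if (name == NULL || value == NULL) {
        errno = EINVAL; /* assumption: reuse EINVAL for the NULL case */
        return -1;
    }
    return setenv(name, value, overwrite);
}
```

A caller would use it exactly like setenv and check the return value.
Whether to log and carry on or to abort at that point stays the
application's choice, which leaves libc free to fail early on the
unchecked call.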