zsh-workers
* misleading message for SIGFPE
@ 2024-09-24 12:36 Vincent Lefevre
  2024-09-24 13:09 ` Andreas Kähäri
  2024-09-24 18:05 ` Bart Schaefer
  0 siblings, 2 replies; 9+ messages in thread
From: Vincent Lefevre @ 2024-09-24 12:36 UTC (permalink / raw)
  To: zsh-workers

When a command is terminated by SIGFPE, I get a message saying
"floating point exception" (this comes from Src/signames.c):

qaa% sh -c 'kill -FPE $$'
zsh: floating point exception (core dumped)  sh -c 'kill -FPE $$'

However, a SIGFPE may also be generated by integer operations
(such as 1 / 0).

ISO C and POSIX use the term "erroneous arithmetic operation".
The GNU C Library manual says "fatal arithmetic error".

BTW, in addition to the signal description, I would suggest also
printing the signal name, e.g.

SIGFPE - erroneous arithmetic operation (core dumped)
SIGSEGV - segmentation fault (core dumped)
SIGKILL - killed

Otherwise it is not clear that the command termination is due to
a signal (even though the exit status can be checked / reported).
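
For illustration, here is a minimal sketch in C of how a parent could
build a message in the proposed "SIGNAME - description" form from a wait
status. This is not zsh's code: the table sig_texts[] and the helpers
describe_termination() and run_child_killed_by() are hypothetical names,
and the descriptions are only the examples given above.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical table; the texts are the examples from the proposal,
 * not the contents of zsh's sig_msg[]. */
struct sig_text { int signo; const char *name; const char *desc; };

static const struct sig_text sig_texts[] = {
    { SIGFPE,  "SIGFPE",  "erroneous arithmetic operation" },
    { SIGSEGV, "SIGSEGV", "segmentation fault" },
    { SIGKILL, "SIGKILL", "killed" },
};

/* Format a wait status as "SIGNAME - description[ (core dumped)]".
 * Returns 0 on success, -1 if the child was not killed by a signal. */
static int describe_termination(int status, char *buf, size_t len)
{
    if (!WIFSIGNALED(status))
        return -1;
    int sig = WTERMSIG(status);
    const char *core = "";
#ifdef WCOREDUMP            /* common extension, not in every libc */
    if (WCOREDUMP(status))
        core = " (core dumped)";
#endif
    for (size_t i = 0; i < sizeof sig_texts / sizeof sig_texts[0]; i++) {
        if (sig_texts[i].signo == sig) {
            snprintf(buf, len, "%s - %s%s",
                     sig_texts[i].name, sig_texts[i].desc, core);
            return 0;
        }
    }
    snprintf(buf, len, "signal %d%s", sig, core);
    return 0;
}

/* Fork a child that kills itself with `sig` and store its wait status.
 * Returns 0 on success, -1 on error. */
static int run_child_killed_by(int sig, int *status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(sig, SIG_DFL);  /* undo an inherited "ignore"; fails harmlessly for SIGKILL */
        raise(sig);
        _exit(0);              /* only reached if the signal did not kill us */
    }
    return waitpid(pid, status, 0) < 0 ? -1 : 0;
}
```

Calling run_child_killed_by(SIGKILL, &status) and then
describe_termination(status, buf, sizeof buf) leaves "SIGKILL - killed"
in buf; whether " (core dumped)" is appended for other signals depends
on the system's core-dump settings.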

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



* Re: misleading message for SIGFPE
  2024-09-24 12:36 misleading message for SIGFPE Vincent Lefevre
@ 2024-09-24 13:09 ` Andreas Kähäri
  2024-09-25 12:26   ` Vincent Lefevre
  2024-09-24 18:05 ` Bart Schaefer
  1 sibling, 1 reply; 9+ messages in thread
From: Andreas Kähäri @ 2024-09-24 13:09 UTC (permalink / raw)
  To: zsh-workers

On Tue, Sep 24, 2024 at 02:36:52PM +0200, Vincent Lefevre wrote:
> When a command is terminated by SIGFPE, I get a message saying
> "floating point exception" (this comes from Src/signames.c):
> 
> qaa% sh -c 'kill -FPE $$'
> zsh: floating point exception (core dumped)  sh -c 'kill -FPE $$'
> 
> However, a SIGFPE may also be generated by integer operations
> (such as 1 / 0).
> 
> ISO C and POSIX use the term "erroneous arithmetic operation".
> The GNU C Library manual says "fatal arithmetic error".
> 
> BTW, in addition to the signal description, I would suggest also
> printing the signal name, e.g.
> 
> SIGFPE - erroneous arithmetic operation (core dumped)
> SIGSEGV - segmentation fault (core dumped)
> SIGKILL - killed
> 
> Otherwise it is not clear that the command termination is due to
> a signal (even though the exit status can be checked / reported).

Isn't it already clear from the value of $? what the signal was?

	$ sh -c 'kill -FPE $$'
	zsh: floating point exception  sh -c 'kill -FPE $$'
	$ echo $?
	136

	$ kill -l 136
	FPE
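
The same decoding can be sketched in C, assuming the 128 + N convention
visible in the transcript above (136 = 128 + 8). The function names and
the three-entry name table are hypothetical and only cover the signals
mentioned in this thread.

```c
#include <signal.h>
#include <stddef.h>

/* A shell using the 128 + N convention reports a command killed by
 * signal N as status 128 + N. Returns N, or 0 if the status is not
 * above 128. A command that itself calls exit(136) looks the same,
 * so the result is only a hint. */
static int signal_from_shell_status(int shell_status)
{
    return shell_status > 128 ? shell_status - 128 : 0;
}

/* Short name as printed by `kill -l`; illustrative subset only. */
static const char *short_signal_name(int sig)
{
    switch (sig) {
    case SIGFPE:  return "FPE";
    case SIGSEGV: return "SEGV";
    case SIGKILL: return "KILL";
    default:      return NULL;
    }
}
```

With these, short_signal_name(signal_from_shell_status(136)) yields
"FPE" on systems where SIGFPE is signal 8.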


-- 
Andreas (Kusalananda) Kähäri
Uppsala, Sweden

.



* Re: misleading message for SIGFPE
  2024-09-24 12:36 misleading message for SIGFPE Vincent Lefevre
  2024-09-24 13:09 ` Andreas Kähäri
@ 2024-09-24 18:05 ` Bart Schaefer
  2024-09-24 20:03   ` Bart Schaefer
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2024-09-24 18:05 UTC (permalink / raw)
  To: zsh-workers

On Tue, Sep 24, 2024 at 5:37 AM Vincent Lefevre <vincent@vinc17.net> wrote:
>
> When a command is terminated by SIGFPE, I get a message saying
> "floating point exception" (this comes from Src/signames.c):

The "right way" to handle this would be to use the SIGFPE si_codes
from /usr/include/siginfo.h to break down the base signal into the
more-specific cases that cause it, but I'm not very familiar with
usage of siginfo or whether the parent process is able to obtain it
about the exiting child.  Anyone?

There are several other signals (e.g., SEGV, POLL) that have added
details available via siginfo.



* Re: misleading message for SIGFPE
  2024-09-24 18:05 ` Bart Schaefer
@ 2024-09-24 20:03   ` Bart Schaefer
  2024-09-25 12:33     ` Vincent Lefevre
                       ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Bart Schaefer @ 2024-09-24 20:03 UTC (permalink / raw)
  To: zsh-workers

On Tue, Sep 24, 2024 at 11:05 AM Bart Schaefer
<schaefer@brasslantern.com> wrote:
>
> I'm not very familiar with
> usage of siginfo or whether the parent process is able to obtain it

Looks as if this is possible (man 5 siginfo).  It appears we'd have to
switch to using waitid(2) for child reaping.  However, there doesn't
appear to be a strsignal(3) equivalent of psiginfo(3) to grab the
error message rather than spew it on stderr.

OTOH if there's a reason we're not using strsignal() when it's
available, I've forgotten it.



* Re: misleading message for SIGFPE
  2024-09-24 13:09 ` Andreas Kähäri
@ 2024-09-25 12:26   ` Vincent Lefevre
  0 siblings, 0 replies; 9+ messages in thread
From: Vincent Lefevre @ 2024-09-25 12:26 UTC (permalink / raw)
  To: zsh-workers

On 2024-09-24 15:09:37 +0200, Andreas Kähäri wrote:
> On Tue, Sep 24, 2024 at 02:36:52PM +0200, Vincent Lefevre wrote:
> > When a command is terminated by SIGFPE, I get a message saying
> > "floating point exception" (this comes from Src/signames.c):
> > 
> > qaa% sh -c 'kill -FPE $$'
> > zsh: floating point exception (core dumped)  sh -c 'kill -FPE $$'
> > 
> > However, a SIGFPE may also be generated by integer operations
> > (such as 1 / 0).
> > 
> > ISO C and POSIX use the term "erroneous arithmetic operation".
> > The GNU C Library manual says "fatal arithmetic error".
> > 
> > BTW, in addition to the signal description, I would suggest also
> > printing the signal name, e.g.
> > 
> > SIGFPE - erroneous arithmetic operation (core dumped)
> > SIGSEGV - segmentation fault (core dumped)
> > SIGKILL - killed
> > 
> > Otherwise it is not clear that the command termination is due to
> > a signal (even though the exit status can be checked / reported).
> 
> Isn't it already clear from the value of $? what the signal was?
> 
> 	$ sh -c 'kill -FPE $$'
> 	zsh: floating point exception  sh -c 'kill -FPE $$'
> 	$ echo $?
> 	136
> 
> 	$ kill -l 136
> 	FPE

This requires another step. The idea would be to have it in the
error message. This could be useful in bug reports, like here:

https://github.com/pytorch/pytorch/issues/89817

You cannot go back in time to get the $? value.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



* Re: misleading message for SIGFPE
  2024-09-24 20:03   ` Bart Schaefer
@ 2024-09-25 12:33     ` Vincent Lefevre
  2024-09-25 18:35     ` Bart Schaefer
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Vincent Lefevre @ 2024-09-25 12:33 UTC (permalink / raw)
  To: zsh-workers

On 2024-09-24 13:03:27 -0700, Bart Schaefer wrote:
> On Tue, Sep 24, 2024 at 11:05 AM Bart Schaefer
> <schaefer@brasslantern.com> wrote:
> >
> > I'm not very familiar with
> > usage of siginfo or whether the parent process is able to obtain it
> 
> Looks as if this is possible (man 5 siginfo).  It appears we'd have to
> switch to using waitid(2) for child reaping.

I don't think that would bring anything useful. You would get the
si_code for the parent's SIGCHLD[*], not the one that corresponds
to the child's SIGFPE.

[*] So, as the waitid(2) man page says:

  si_code
    Set to one of: CLD_EXITED (child called _exit(2)); CLD_KILLED
    (child killed by signal); CLD_DUMPED (child killed by signal, and
    dumped core); CLD_STOPPED (child stopped by signal); CLD_TRAPPED
    (traced child has trapped); or CLD_CONTINUED (child continued by
    SIGCONT).

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



* Re: misleading message for SIGFPE
  2024-09-24 20:03   ` Bart Schaefer
  2024-09-25 12:33     ` Vincent Lefevre
@ 2024-09-25 18:35     ` Bart Schaefer
  2024-09-26  7:58     ` Stephane Chazelas
  2024-10-02 18:59     ` zeurkous
  3 siblings, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2024-09-25 18:35 UTC (permalink / raw)
  To: zsh-workers

On Tue, Sep 24, 2024 at 1:03 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> OTOH if there's a reason we're not using strsignal() when it's
> available, I've forgotten it.

I compared the output of zsh using sig_msg[] from signames.c to the
output of calling strsignal(), on Ubuntu 20.04.  For most signals
(including FPE) the difference is only whether the first character of
the error message is capitalized.  The messages that differed are
below (zsh first, strsignal following).  There is a USE_SUSPENDED
macro that determines the difference in the SIGT* signals.  Perhaps we
avoid strsignal() just for output compatibility with old zsh from
before that was available?

illegal hardware instruction (core dumped)
Illegal instruction (core dumped)

trace trap (core dumped)
Trace/breakpoint trap (core dumped)

abort (core dumped)
Aborted (core dumped)

alarm
Alarm clock

SIGSTKFLT
Stack fault

suspended (signal)
Stopped (signal)

suspended
Stopped

suspended (tty input)
Stopped (tty input)

suspended (tty output)
Stopped (tty output)

cpu limit exceeded (core dumped)
CPU time limit exceeded (core dumped)

virtual time alarm
Virtual timer expired

profile signal
Profiling timer expired

pollable event occurred
I/O possible

power fail
Power failure

invalid system call (core dumped)
Bad system call (core dumped)



* Re: misleading message for SIGFPE
  2024-09-24 20:03   ` Bart Schaefer
  2024-09-25 12:33     ` Vincent Lefevre
  2024-09-25 18:35     ` Bart Schaefer
@ 2024-09-26  7:58     ` Stephane Chazelas
  2024-10-02 18:59     ` zeurkous
  3 siblings, 0 replies; 9+ messages in thread
From: Stephane Chazelas @ 2024-09-26  7:58 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

FYI, bosh, a POSIXified fork of the Bourne shell by the late
Jörg Schilling, has $/, ${.sh.code}, ${.sh.codename} and a few
more special parameters to complement $?. See also rc's $status,
which holds a text representation of the exit status.

$ bosh -o fullexitcode -c 'sh -c "kill -s SEGV \$\$"; printf "%s\n" "?=$?" "/=$/" "code=${.sh.code}" "codename=${.sh.codename}" "status=${.sh.status}" "termsig=${.sh.termsig}" "signo=${.sh.signo}" "signame=${.sh.signame}"'
Segmentation fault - core dumped
?=139
/=SEGV
code=3
codename=DUMPED
status=11
termsig=SEGV
signo=17
signame=CHLD

$ rc -c 'sh -c '\''kill -s SEGV $$'\''; echo $status'
segmentation violation--core dumped
sigsegv+core


See the bosh man page at https://codeberg.org/schilytools/schilytools/src/commit/e835e64f0d84a614b3c8d619ac646060ea6922a5/sh/sh.1#L1809

> ?       The  decimal  value returned by the last synchronously executed
>         command or a decimal number derived from the signal number that
>         killed the process.
>
>         Only the low 8 bits of the exit code from the command are visi‐
>         ble   unless   exit   code   masking   is   switched   off   by
>         ``set -o fullexitcode''.   The  ability to see all 32 bits from
>         the exit code requires a modern UNIX compliant operating system
>         with working support for waitid(2).
>
>         If the executable file could not be found, the  returned  value
>         is  127.  If the file exists but could not be executed, the re‐
>         turned value is 126.
>
>         If bosh has been compiled with DO_EXIT_MODFIX (which is not the
>         default and not recommended by POSIX) and if a  command's  exit
>         code  modulo  256 is zero and ``set -o fullexitcode'' is not in
>         effect, the returned value is 128, except  when  the  operating
>         system  does  not  support  waitid(2), as the exit code then is
>         masked by the kernel.
>
>         If the command was killed by a signal, the  returned  value  is
>         128  + the signal number.  As a result, apparent exit code val‐
>         ues in the range 129..200 may also have been caused by  a  sig‐
>         nal.
>
>         If  the  shell  itself  or  a  sub shell catches a signal while
>         preparing a job, the exit code is 2000, or (when exit codes are
>         masked to only the low 8 bits) 208.
>
> /       A decimal number or text indicating the exit status returned by
>         the last synchronously executed command.
>
>         If $/ returns a decimal number, this is (on a POSIX system) the
>         32 bit exit code from the last command that did normally  exit.
>         Older  non-POSIX systems like Linux or UNIX systems from before
>         SVr4 return only the low 8 bits from the  exit  code.   In  any
>         case, the number was a result from a normal program exit.
>
>         If $/ returns text, this is either a signal name with the lead‐
>         ing  ``SIG''  stripped  off, like ``INT'' (see kill -l) for the
>         signal that terminated  the  program  or  one  of  the  strings
>         ``NOEXEC''  or  ``NOTFOUND'',  in case the program could not be
>         run at all.  The strings ``NOEXEC'' and  ``NOTFOUND''  are  re‐
>         turned  reliably from vfork(2) childs or when the related state
>         is already known by the cache.  This is  true  for  all  simple
>         commands.
>
>         Note  that  unless ``set -o fullexitcode'' is in effect, $/ may
>         have a non-zero value where value mod 256 == 0 and the shell in
>         such a case evaluates conditional execution as if the exit code
>         was zero.  This is the default behavior required by  POSIX  for
>         compatibility with historic shells.
[...]
> .sh.code
>         The  numerical  reason  waitid(2) returned for the child status
>         change. It matches the CLD_* definitions from  signal.h.   Note
>         that  the numbers are usually in the range 1..6 but this is not
>         guaranteed.  Use ${.sh.codename} for portability.
>
> .sh.codename
>         The reason waitid(2) returned for the child  status  change  as
>         text  that  is generated by stripping off CLD_ from the related
>         definitions from signal.h.  Possible values are:
>
>         EXITED      The  program  had  a  normal  termination  and  the
>                     exit(2) code is in ${.sh.status}.
>
>         KILLED      The program was killed by a signal, the signal num‐
>                     ber  is  in  ${.sh.status}  the  signal  name is in
>                     ${.sh.termsig}.
>
>         DUMPED      The program was killed  by  a  signal,  similar  to
>                     KILLED above, but the program in addition created a
>                     core dump.
>
>         TRAPPED     A traced child has trapped.
>
>         STOPPED     The  program  was  stopped  by a signal, the signal
>                     number is in ${.sh.status} the signal  name  is  in
>                     ${.sh.termsig}.
>
>         CONTINUED   A stopped child was continued.
>
>         NOEXEC      An  existing  file  could not be executed. This can
>                     happen when e.g. either the type of the file is not
>                     plain file or when the file does not  have  execute
>                     permission, or when the argument list is too long.
>
>                     This  is  not  a result from waitid(2) but from ex‐
>                     ecve(2).
>
>         NOTFOUND    A file was not found and thus  could  not  be  exe‐
>                     cuted.
>
>                     This  is  not  a result from waitid(2) but from ex‐
>                     ecve(2).
>
>         The child codes NOEXEC and  NOTFOUND  in  ${.sh.codename}  need
>         shared  memory (e.g. from vfork(2)) to allow a reliable report‐
>         ing.
[...]
> .sh.status
>         The  decimal  value returned by the last synchronously executed
>         command.  The value is unaltered and contains the full int from
>         the exit(2) call in the child in case the shell  is  run  on  a
>         modern os.

"modern os" here means one where waitid() returns the full
value (not truncated to 8 bits), which is not the case on Linux (Jörg
had a bit of a grudge against GNU/Linux).

>
> .sh.termsig
>         The  signal  name related to the numerical ${.sh.status} value.
>         The translation to  signal  names  takes  place  regardless  of
>         whether the child was terminated by a signal or terminated nor‐
>         mally.
[...]

And also, only remotely related:

> .sh.signame
>         The name of the causing signal.  If the status is related to  a
>         set  of waitid(2) return values, this is CHLD or CLD, depending
>         on the os.  When a trap(1) command is executed,  ${.sh.signame}
>         holds the signal that caused the trap.
>
> .sh.signo
>         The signal number related to ${.sh.signame}.

-- 
Stephane



* RE: Re: misleading message for SIGFPE
  2024-09-24 20:03   ` Bart Schaefer
                       ` (2 preceding siblings ...)
  2024-09-26  7:58     ` Stephane Chazelas
@ 2024-10-02 18:59     ` zeurkous
  3 siblings, 0 replies; 9+ messages in thread
From: zeurkous @ 2024-10-02 18:59 UTC (permalink / raw)
  To: Bart Schaefer, zsh-workers

On Tue, 24 Sep 2024 13:03:27 -0700, Bart Schaefer <schaefer@brasslantern.com> wrote:
> OTOH if there's a reason we're not using strsignal() when it's
> available, I've forgotten it.

AOL!

        --zeurkous.

-- 
Friggin' Machines!


