zsh-workers
* [BUG] Crash due to malloc call in signal handler
@ 2019-12-12 18:28 ` Antoine C.
  2019-12-13  9:40   ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: Antoine C. @ 2019-12-12 18:28 UTC
  To: Zsh Workers List

Hello,

I finally found the cause of the frequent crashes I reported one
year ago ( https://www.zsh.org/mla/workers/2019/msg00059.html ).

This is due to malloc calls from a signal handler, for instance:

#0  tcache_get (tc_idx=17) at malloc.c:2943
#1  __GI___libc_malloc (bytes=296) at malloc.c:3050
#2  0x000055c2217b27b5 in malloc (size=8) at ./main.c:255
#3  0x000055c2218166f9 in zalloc (size=8) at mem.c:966
#4  0x000055c221806da2 in addbgstatus (pid=11959, status=0) at jobs.c:2192
#5  0x000055c2218478e7 in wait_for_processes () at signals.c:583
#6  0x000055c221847cdc in zhandler (sig=17) at signals.c:648
#7  <signal handler called>
#8  0x00007f8895b69209 in __GI___sigsuspend (set=0x7ffe759b7160) at ../sysdeps/unix/sysv/linux/sigsuspend.c:26
#9  0x000055c221847376 in signal_suspend (sig=17, wait_cmd=1) at signals.c:393
#10 0x000055c2218054e8 in waitforpid (pid=11953, wait_cmd=1) at jobs.c:1551
#11 0x000055c221807a10 in bin_fg (name=0x7f8896af4798 "wait", argv=0x7f8896af4830, ops=0x7ffe759b75c0, func=4) at jobs.c:2371

Not every backtrace I get shows a signal, and I see a variety of
errors occurring in either a malloc or a free. However, I have been
debugging this problem with mcheck() enabled, and in that case all
the crashes occur within freehook(). When I trace back the associated
malloc(), I find that it always happened during two interleaved
malloc() calls, one from the main context and one from the signal
context.

I can provide more info to reproduce the problem.

Antoine


* Re: [BUG] Crash due to malloc call in signal handler
  2019-12-12 18:28 ` [BUG] Crash due to malloc call in signal handler Antoine C.
@ 2019-12-13  9:40   ` Peter Stephenson
  2019-12-13  9:45     ` Roman Perepelitsa
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Stephenson @ 2019-12-13  9:40 UTC
  To: zsh-workers

On Thu, 2019-12-12 at 19:28 +0100, Antoine C. wrote:
> Hello,
> 
> I finally found the cause of the frequent crashes I reported one
> year ago ( https://www.zsh.org/mla/workers/2019/msg00059.html ).
> 
> This is due to malloc calls from a signal handler, for instance:
> 
> #0  tcache_get (tc_idx=17) at malloc.c:2943
> #1  __GI___libc_malloc (bytes=296) at malloc.c:3050
> #2  0x000055c2217b27b5 in malloc (size=8) at ./main.c:255
> #3  0x000055c2218166f9 in zalloc (size=8) at mem.c:966
> #4  0x000055c221806da2 in addbgstatus (pid=11959, status=0) at jobs.c:2192
> #5  0x000055c2218478e7 in wait_for_processes () at signals.c:583
> #6  0x000055c221847cdc in zhandler (sig=17) at signals.c:648
> #7  <signal handler called>
> #8  0x00007f8895b69209 in __GI___sigsuspend (set=0x7ffe759b7160) at ../sysdeps/unix/sysv/linux/sigsuspend.c:26
> #9  0x000055c221847376 in signal_suspend (sig=17, wait_cmd=1) at signals.c:393
> #10 0x000055c2218054e8 in waitforpid (pid=11953, wait_cmd=1) at jobs.c:1551
> #11 0x000055c221807a10 in bin_fg (name=0x7f8896af4798 "wait", argv=0x7f8896af4830, ops=0x7ffe759b75c0, func=4) at jobs.c:2371

The main shell is suspended, waiting for a child to finish, so the fact
it's in the signal handler isn't saying anything.

From the look of it, some memory corruption must already have occurred
at this point to get the malloc to fail.

pws



* Re: [BUG] Crash due to malloc call in signal handler
  2019-12-13  9:40   ` Peter Stephenson
@ 2019-12-13  9:45     ` Roman Perepelitsa
  2019-12-13 10:16       ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: Roman Perepelitsa @ 2019-12-13  9:45 UTC
  To: Peter Stephenson; +Cc: Zsh hackers list

On Fri, Dec 13, 2019 at 10:40 AM Peter Stephenson
<p.stephenson@samsung.com> wrote:
> The main shell is suspended, waiting for a child to finish, so the fact
> it's in the signal handler isn't saying anything.
>
> From the look of it, some memory corruption must already have occurred
> at this point to get the malloc to fail.

malloc is not async signal safe. It's illegal to call it from a signal
handler. I'm not saying this is what's causing a crash.

Roman.


* Re: [BUG] Crash due to malloc call in signal handler
  2019-12-13  9:45     ` Roman Perepelitsa
@ 2019-12-13 10:16       ` Peter Stephenson
  2019-12-13 10:19         ` Roman Perepelitsa
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Stephenson @ 2019-12-13 10:16 UTC
  To: Zsh hackers list

On Fri, 2019-12-13 at 10:45 +0100, Roman Perepelitsa wrote:
> On Fri, Dec 13, 2019 at 10:40 AM Peter Stephenson
> <p.stephenson@samsung.com> wrote:
> > 
> > The main shell is suspended, waiting for a child to finish, so the fact
> > it's in the signal handler isn't saying anything.
> > 
> > From the look of it, some memory corruption must already have occurred
> > at this point to get the malloc to fail.
> malloc is not async signal safe. It's illegal to call it from a signal
> handler. I'm not saying this is what's causing a crash.

In zsh, this is handled by queuing interrupts and only allowing them to
run in a few places in the code.  Obviously, waiting for a child to
exit is one of those places.

pws



* Re: [BUG] Crash due to malloc call in signal handler
  2019-12-13 10:16       ` Peter Stephenson
@ 2019-12-13 10:19         ` Roman Perepelitsa
  2019-12-13 10:31           ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: Roman Perepelitsa @ 2019-12-13 10:19 UTC
  To: Peter Stephenson; +Cc: Zsh hackers list

On Fri, Dec 13, 2019 at 11:17 AM Peter Stephenson
<p.stephenson@samsung.com> wrote:
>
> On Fri, 2019-12-13 at 10:45 +0100, Roman Perepelitsa wrote:
> > On Fri, Dec 13, 2019 at 10:40 AM Peter Stephenson
> > <p.stephenson@samsung.com> wrote:
> > >
> > > The main shell is suspended, waiting for a child to finish, so the fact
> > > it's in the signal handler isn't saying anything.
> > >
> > > From the look of it, some memory corruption must already have occurred
> > > at this point to get the malloc to fail.
> > malloc is not async signal safe. It's illegal to call it from a signal
> > handler. I'm not saying this is what's causing a crash.
>
> In zsh, this is handled by queuing interrupts and only allowing them to
> run in a few places in the code.  Obviously, waiting for a child to
> exit is one of those places.

The stack trace shows malloc being called from zhandler. zhandler is a
signal handler. What am I missing?

Roman.


* Re: [BUG] Crash due to malloc call in signal handler
  2019-12-13 10:19         ` Roman Perepelitsa
@ 2019-12-13 10:31           ` Peter Stephenson
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Stephenson @ 2019-12-13 10:31 UTC
  To: Zsh hackers list

On Fri, 2019-12-13 at 11:19 +0100, Roman Perepelitsa wrote:
> On Fri, Dec 13, 2019 at 11:17 AM Peter Stephenson
> <p.stephenson@samsung.com> wrote:
> > 
> > 
> > On Fri, 2019-12-13 at 10:45 +0100, Roman Perepelitsa wrote:
> > > 
> > > On Fri, Dec 13, 2019 at 10:40 AM Peter Stephenson
> > > <p.stephenson@samsung.com> wrote:
> > > > 
> > > > 
> > > > The main shell is suspended, waiting for a child to finish, so the fact
> > > > it's in the signal handler isn't saying anything.
> > > > 
> > > > From the look of it, some memory corruption must already have occurred
> > > > at this point to get the malloc to fail.
> > > malloc is not async signal safe. It's illegal to call it from a signal
> > > handler. I'm not saying this is what's causing a crash.
> > In zsh, this is handled by queuing interrupts and only allowing them to
> > run in a few places in the code.  Obviously, waiting for a child to
> > exit is one of those places.
> The stack trace shows malloc being called from zhandler. zhandler is a
> signal handler. What am I missing?

You're not missing anything there, that's how it works.

Interrupts are queued so they don't normally go off.

In certain places they are allowed to take place; one of these is when
we are sitting waiting for a child to exit.

At this point the signal handler will then run.

Thus the signal handler is supposed not to be running when any memory
management is taking place underneath.  So it's not asynchronous
with respect to code actually running in the main shell (despite being
run from a signal handler which can formally occur anywhere, but
we make sure it doesn't).

Of course, there's the possibility of bugs in this, but the stack in
this case doesn't show evidence of that at the point in question.

You'll find long discussions of this in the mail archive going back
some years.

pws


