zsh-workers
* [BUG] abort due to malloc/free problem
@ 2019-01-28 14:58 Antoine C.
  2019-01-29 19:10 ` Peter Stephenson
From: Antoine C. @ 2019-01-28 14:58 UTC
  To: zsh-workers

Hello,

I have a set of scripts which crash after some time (it can be 10 minutes or more than 1 hour).
These scripts continually launch subshells and commands in the background, then wait for them to finish.

It occurs with zsh 5.4.2 as shipped in Ubuntu 18.04.1, and also with the latest version from git, 5.7.

I compiled with:
./configure --enable-zsh-debug --enable-multibyte

I tried a few combinations and noticed that --enable-zsh-mem prevented the abort.

I get the following messages:

   free(): invalid size
   zsh: abort (core dumped)

or:

   double free or corruption (out)
   zsh: abort (core dumped)

or just:

   zsh: segmentation fault (core dumped)

Backtraces from the core files give:

#0  _int_malloc (av=av@entry=0x7f21c9231c40 <main_arena>, bytes=bytes@entry=448) at malloc.c:3735
#1  0x00007f21c8edd0fc in __GI___libc_malloc (bytes=448) at malloc.c:3057
#2  0x000055dbbb7d2206 in zalloc (size=448) at mem.c:966
#3  0x000055dbbb7c19d5 in clearjobtab (monitor=1) at jobs.c:1705
#4  0x000055dbbb790a85 in entersubsh (flags=3, retp=0x7ffc8db68bd8) at exec.c:1138
#5  0x000055dbbb795db3 in execcmd_fork (state=0x7ffc8db6c070, how=4, type=8, varspc=0x0, filelistp=0x7ffc8db68d60, text=0x55dbbba51ea0 <jbuf> "( $WGET ${(e)URL} -O $filename -a $LOG.$task; rc=$? ; print  &>> $LOG.$task; )", oautocont=-1, close_if_forked=-1) at exec.c:2748
#6  0x000055dbbb79627c in execcmd_exec (state=0x7ffc8db6c070, eparams=0x7ffc8db690a0, input=0, output=0, how=4, last1=2, close_if_forked=-1) at exec.c:2868
#7  0x000055dbbb7937e6 in execpline2 (state=0x7ffc8db6c070, pcode=11459, how=4, input=0, output=0, last1=0) at exec.c:1927
#8  0x000055dbbb79239c in execpline (state=0x7ffc8db6c070, slcode=35842, how=4, last1=0) at exec.c:1658
#9  0x000055dbbb79163f in execlist (state=0x7ffc8db6c070, dont_change_job=1, exiting=0) at exec.c:1413
#10 0x000055dbbb7cb7a2 in execif (state=0x7ffc8db6c070, do_exec=0) at loop.c:576
#11 0x000055dbbb79965d in execcmd_exec (state=0x7ffc8db6c070, eparams=0x7ffc8db69910, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3910
#12 0x000055dbbb7937e6 in execpline2 (state=0x7ffc8db6c070, pcode=10563, how=18, input=0, output=0, last1=0) at exec.c:1927
#13 0x000055dbbb79239c in execpline (state=0x7ffc8db6c070, slcode=150530, how=18, last1=0) at exec.c:1658
#14 0x000055dbbb79163f in execlist (state=0x7ffc8db6c070, dont_change_job=1, exiting=0) at exec.c:1413
#15 0x000055dbbb7ca7ee in execfor (state=0x7ffc8db6c070, do_exec=0) at loop.c:175
#16 0x000055dbbb79965d in execcmd_exec (state=0x7ffc8db6c070, eparams=0x7ffc8db6a1f0, input=0, output=0, how=2, last1=2, close_if_forked=-1) at exec.c:3910
#17 0x000055dbbb7937e6 in execpline2 (state=0x7ffc8db6c070, pcode=10307, how=2, input=0, output=0, last1=0) at exec.c:1927
#18 0x000055dbbb79239c in execpline (state=0x7ffc8db6c070, slcode=171010, how=2, last1=0) at exec.c:1658
#19 0x000055dbbb79163f in execlist (state=0x7ffc8db6c070, dont_change_job=1, exiting=0) at exec.c:1413
#20 0x000055dbbb7ca7ee in execfor (state=0x7ffc8db6c070, do_exec=0) at loop.c:175
#21 0x000055dbbb79965d in execcmd_exec (state=0x7ffc8db6c070, eparams=0x7ffc8db6aad0, input=0, output=0, how=2, last1=2, close_if_forked=-1) at exec.c:3910
#22 0x000055dbbb7937e6 in execpline2 (state=0x7ffc8db6c070, pcode=9923, how=2, input=0, output=0, last1=0) at exec.c:1927
#23 0x000055dbbb79239c in execpline (state=0x7ffc8db6c070, slcode=521218, how=2, last1=0) at exec.c:1658
#24 0x000055dbbb79163f in execlist (state=0x7ffc8db6c070, dont_change_job=1, exiting=0) at exec.c:1413
#25 0x000055dbbb7ca7ee in execfor (state=0x7ffc8db6c070, do_exec=0) at loop.c:175
#26 0x000055dbbb79965d in execcmd_exec (state=0x7ffc8db6c070, eparams=0x7ffc8db6b3b0, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3910
#27 0x000055dbbb7937e6 in execpline2 (state=0x7ffc8db6c070, pcode=9667, how=18, input=0, output=0, last1=0) at exec.c:1927
#28 0x000055dbbb79239c in execpline (state=0x7ffc8db6c070, slcode=620546, how=18, last1=0) at exec.c:1658
#29 0x000055dbbb79163f in execlist (state=0x7ffc8db6c070, dont_change_job=1, exiting=0) at exec.c:1413
#30 0x000055dbbb7ca7ee in execfor (state=0x7ffc8db6c070, do_exec=0) at loop.c:175
#31 0x000055dbbb79965d in execcmd_exec (state=0x7ffc8db6c070, eparams=0x7ffc8db6bc90, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3910
#32 0x000055dbbb7937e6 in execpline2 (state=0x7ffc8db6c070, pcode=7811, how=18, input=0, output=0, last1=0) at exec.c:1927
#33 0x000055dbbb79239c in execpline (state=0x7ffc8db6c070, slcode=750594, how=18, last1=0) at exec.c:1658
#34 0x000055dbbb79163f in execlist (state=0x7ffc8db6c070, dont_change_job=0, exiting=0) at exec.c:1413
#35 0x000055dbbb790c7a in execode (p=0x7f21c9e243b0, dont_change_job=0, exiting=0, context=0x55dbbb82bdb6 "toplevel") at exec.c:1192
#36 0x000055dbbb7b7c53 in loop (toplevel=1, justonce=0) at init.c:209
#37 0x000055dbbb7bbd10 in zsh_main (argc=2, argv=0x7ffc8db6c358) at init.c:1758
#38 0x000055dbbb76edfa in main (argc=2, argv=0x7ffc8db6c358) at ./main.c:93

or:

#0  tcache_get (tc_idx=0) at malloc.c:2943
#1  __GI___libc_malloc (bytes=16) at malloc.c:3050
#2  0x000055970fbe2206 in zalloc (size=16) at mem.c:966
#3  0x000055970fbe0282 in pushheap () at mem.c:304
#4  0x000055970fbda580 in execfor (state=0x7fff0c041120, do_exec=0) at loop.c:118
#5  0x000055970fba965d in execcmd_exec (state=0x7fff0c041120, eparams=0x7fff0c03f2a0, input=0, output=0, how=2, last1=2, close_if_forked=-1) at exec.c:3910
#6  0x000055970fba37e6 in execpline2 (state=0x7fff0c041120, pcode=10307, how=2, input=0, output=0, last1=0) at exec.c:1927
#7  0x000055970fba239c in execpline (state=0x7fff0c041120, slcode=171010, how=2, last1=0) at exec.c:1658
#8  0x000055970fba163f in execlist (state=0x7fff0c041120, dont_change_job=1, exiting=0) at exec.c:1413
#9  0x000055970fbda7ee in execfor (state=0x7fff0c041120, do_exec=0) at loop.c:175
#10 0x000055970fba965d in execcmd_exec (state=0x7fff0c041120, eparams=0x7fff0c03fb80, input=0, output=0, how=2, last1=2, close_if_forked=-1) at exec.c:3910
#11 0x000055970fba37e6 in execpline2 (state=0x7fff0c041120, pcode=9923, how=2, input=0, output=0, last1=0) at exec.c:1927
#12 0x000055970fba239c in execpline (state=0x7fff0c041120, slcode=521218, how=2, last1=0) at exec.c:1658
#13 0x000055970fba163f in execlist (state=0x7fff0c041120, dont_change_job=1, exiting=0) at exec.c:1413
#14 0x000055970fbda7ee in execfor (state=0x7fff0c041120, do_exec=0) at loop.c:175
#15 0x000055970fba965d in execcmd_exec (state=0x7fff0c041120, eparams=0x7fff0c040460, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3910
#16 0x000055970fba37e6 in execpline2 (state=0x7fff0c041120, pcode=9667, how=18, input=0, output=0, last1=0) at exec.c:1927
#17 0x000055970fba239c in execpline (state=0x7fff0c041120, slcode=620546, how=18, last1=0) at exec.c:1658
#18 0x000055970fba163f in execlist (state=0x7fff0c041120, dont_change_job=1, exiting=0) at exec.c:1413
#19 0x000055970fbda7ee in execfor (state=0x7fff0c041120, do_exec=0) at loop.c:175
#20 0x000055970fba965d in execcmd_exec (state=0x7fff0c041120, eparams=0x7fff0c040d40, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3910
#21 0x000055970fba37e6 in execpline2 (state=0x7fff0c041120, pcode=7811, how=18, input=0, output=0, last1=0) at exec.c:1927
#22 0x000055970fba239c in execpline (state=0x7fff0c041120, slcode=750594, how=18, last1=0) at exec.c:1658
#23 0x000055970fba163f in execlist (state=0x7fff0c041120, dont_change_job=0, exiting=0) at exec.c:1413
#24 0x000055970fba0c7a in execode (p=0x7febab4c83b0, dont_change_job=0, exiting=0, context=0x55970fc3bdb6 "toplevel") at exec.c:1192
#25 0x000055970fbc7c53 in loop (toplevel=1, justonce=0) at init.c:209
#26 0x000055970fbcbd10 in zsh_main (argc=2, argv=0x7fff0c041408) at init.c:1758
#27 0x000055970fb7edfa in main (argc=2, argv=0x7fff0c041408) at ./main.c:93

I also tried with MALLOC_CHECK_=3, but it seems that instead of crashing, the script just freezes.

Also, when running under valgrind, the output gets flooded with traces, so I am not sure whether they are really relevant.
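
To make the valgrind output easier to sift, one option is to send it to one log file per process rather than to the terminal. The helper below is only a sketch: vg_cmd is a hypothetical name and the log directory is an assumption. --trace-children and --log-file (where %p expands to the process ID) are standard valgrind options. The helper only prints the command line, so it can be inspected before being run.

```shell
# vg_cmd LOGDIR CMD [ARGS...]
# Print a valgrind command line that follows child processes and writes
# one log file per PID, instead of mixing traces into the script output.
# (vg_cmd and LOGDIR are illustrative names, not part of zsh or valgrind.)
vg_cmd() {
    logdir=$1
    shift
    # %p is left literal here; valgrind replaces it with the PID.
    printf '%s\n' "valgrind --trace-children=yes --log-file=$logdir/vg.%p.log $*"
}
```

For example, "vg_cmd /tmp/vglogs zsh ./myscript.zsh" prints the command to run; the script name is a placeholder.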

If you need other specific tests to be run, I would be glad to help.

Antoine


* Re: [BUG] abort due to malloc/free problem
From: Peter Stephenson @ 2019-01-29 19:10 UTC
  To: Antoine C., zsh-workers

On Mon, 2019-01-28 at 15:58 +0100, Antoine C. wrote:
> I have a set of scripts which crash after some time (can be 10 minutes or
> more than 1 hour).  These scripts are continually launching subshells
> and commands in background then waiting for them to finish.

From the crashes you've sent it looks like memory corruption that has
already happened some time before.

Given what you are doing, this could well be a relative of problems
we've seen before: races between the signal handler for child
processes (SIGCHLD) and the rest of the code.  It's been quite a while
since one of these was reported, but they are very sensitive to
conditions.

It's quite likely that sensitivity to timing is why enabling zsh's
memory allocator hides the problem.  That should be a perfectly good
workaround if it happens to work, though it does mean we can't simply
instrument the allocators to debug it.

It's unlikely to be possible to get any further without some concrete
reproduction case.
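
A reproduction case could start from a skeleton like the one below.  This
is only a sketch under assumptions, not the original scripts: fake_fetch,
run_batch and stress are hypothetical names, fake_fetch stands in for the
real $WGET call visible in the backtrace, and the batch size is arbitrary.
It is written in portable sh so that it should also run under zsh.

```shell
# Hypothetical stand-in for the real download command; it does no network
# I/O and simply returns 0 for even task ids and 1 for odd ones.
fake_fetch() {
    return $(( $1 % 2 ))
}

# run_batch LOGPREFIX TASK...
# Launch one background subshell per task, record its exit status in
# LOGPREFIX.TASK, then wait for the whole batch to finish.
run_batch() {
    log=$1
    shift
    for task in "$@"; do
        ( fake_fetch "$task"; rc=$?; echo "task=$task rc=$rc" >> "$log.$task" ) &
    done
    wait
}

# stress ROUNDS LOGPREFIX
# Repeat batches so that the parent shell forks and reaps children
# continuously, which is the pattern described in the report.
stress() {
    rounds=$1
    log=$2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        run_batch "$log" 1 2 3 4
        i=$((i + 1))
    done
}
```

The idea would be to run something like "stress 100000 /tmp/repro" under
the affected build and leave it going; whether this actually triggers the
abort is not known.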

pws

