From: "Antoine C."
To: Zsh Workers List
Date: Mon, 16 Dec 2019 18:04:12 +0100 (CET)
Subject: Re: [BUG] Crash due to malloc call in signal handler
Message-ID: <896178036.1008341150.1576515852191.JavaMail.root@zimbra62-e11.priv.proxad.net>

I have tried this patch:
https://www.zsh.org/mla/workers/2019/msg01058.html

The good news (actually a bad one, as I will explain) is that I could not reproduce the crash with my specific test script.
With the original code, the crash usually occurs within seconds. With the patch, I never got a crash, even after hours of execution and after varying some of the delays.

However, I also tried my "real life" scripts, and unfortunately I got a crash twice today. There is one difference from the previous version: the crash used to occur within minutes or a few hours, and now it only happens after many hours.

Here is a backtrace:

#0  malloc_consolidate (av=av@entry=0x7f9b26b33c40 ) at malloc.c:4439
#1  0x00007f9b267dce05 in _int_malloc (av=av@entry=0x7f9b26b33c40 , bytes=bytes@entry=448) at malloc.c:4112
#2  0x00007f9b267df0fc in __GI___libc_malloc (bytes=448) at malloc.c:3057
#3  0x00005571eade1a22 in zalloc (size=448) at mem.c:966
#4  0x00005571eadd11b6 in clearjobtab (monitor=1) at jobs.c:1705
#5  0x00005571ead9fec1 in entersubsh (flags=3, retp=0x7ffd830ec0e8) at exec.c:1140
#6  0x00005571eada521d in execcmd_fork (state=0x7ffd830ef580, how=4, type=8, varspc=0x0, filelistp=0x7ffd830ec270, text=0x5571eb061ee0 "( $WGET ${(e)URL} -O $filename -a $LOG.$task; rc=$? ; print &>> $LOG.$task; )", oautocont=-1, close_if_forked=-1) at exec.c:2751
#7  0x00005571eada56fb in execcmd_exec (state=0x7ffd830ef580, eparams=0x7ffd830ec5b0, input=0, output=0, how=4, last1=2, close_if_forked=-1) at exec.c:2874
#8  0x00005571eada2c35 in execpline2 (state=0x7ffd830ef580, pcode=16835, how=4, input=0, output=0, last1=0) at exec.c:1930
#9  0x00005571eada17d8 in execpline (state=0x7ffd830ef580, slcode=35842, how=4, last1=0) at exec.c:1660
#10 0x00005571eada0a7b in execlist (state=0x7ffd830ef580, dont_change_job=1, exiting=0) at exec.c:1415
#11 0x00005571eaddafbe in execif (state=0x7ffd830ef580, do_exec=0) at loop.c:580
#12 0x00005571eada8adc in execcmd_exec (state=0x7ffd830ef580, eparams=0x7ffd830ece20, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3916
#13 0x00005571eada2c35 in execpline2 (state=0x7ffd830ef580, pcode=15683, how=18, input=0, output=0, last1=0) at exec.c:1930
#14 0x00005571eada17d8 in execpline (state=0x7ffd830ef580, slcode=155650, how=18, last1=0) at exec.c:1660
#15 0x00005571eada0a7b in execlist (state=0x7ffd830ef580, dont_change_job=1, exiting=0) at exec.c:1415
#16 0x00005571eadd9fec in execfor (state=0x7ffd830ef580, do_exec=0) at loop.c:175
#17 0x00005571eada8adc in execcmd_exec (state=0x7ffd830ef580, eparams=0x7ffd830ed700, input=0, output=0, how=2, last1=2, close_if_forked=-1) at exec.c:3916
#18 0x00005571eada2c35 in execpline2 (state=0x7ffd830ef580, pcode=15363, how=2, input=0, output=0, last1=0) at exec.c:1930
#19 0x00005571eada17d8 in execpline (state=0x7ffd830ef580, slcode=176130, how=2, last1=0) at exec.c:1660
#20 0x00005571eada0a7b in execlist (state=0x7ffd830ef580, dont_change_job=1, exiting=0) at exec.c:1415
#21 0x00005571eadd9fec in execfor (state=0x7ffd830ef580, do_exec=0) at loop.c:175
#22 0x00005571eada8adc in execcmd_exec (state=0x7ffd830ef580, eparams=0x7ffd830edfe0, input=0, output=0, how=2, last1=2, close_if_forked=-1) at exec.c:3916
#23 0x00005571eada2c35 in execpline2 (state=0x7ffd830ef580, pcode=14979, how=2, input=0, output=0, last1=0) at exec.c:1930
#24 0x00005571eada17d8 in execpline (state=0x7ffd830ef580, slcode=579586, how=2, last1=0) at exec.c:1660
#25 0x00005571eada0a7b in execlist (state=0x7ffd830ef580, dont_change_job=1, exiting=0) at exec.c:1415
#26 0x00005571eadd9fec in execfor (state=0x7ffd830ef580, do_exec=0) at loop.c:175
#27 0x00005571eada8adc in execcmd_exec (state=0x7ffd830ef580, eparams=0x7ffd830ee8c0, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3916
#28 0x00005571eada2c35 in execpline2 (state=0x7ffd830ef580, pcode=14723, how=18, input=0, output=0, last1=0) at exec.c:1930
#29 0x00005571eada17d8 in execpline (state=0x7ffd830ef580, slcode=678914, how=18, last1=0) at exec.c:1660
#30 0x00005571eada0a7b in execlist (state=0x7ffd830ef580, dont_change_job=1, exiting=0) at exec.c:1415
#31 0x00005571eadd9fec in execfor (state=0x7ffd830ef580, do_exec=0) at loop.c:175
#32 0x00005571eada8adc in execcmd_exec (state=0x7ffd830ef580, eparams=0x7ffd830ef1a0, input=0, output=0, how=18, last1=2, close_if_forked=-1) at exec.c:3916
#33 0x00005571eada2c35 in execpline2 (state=0x7ffd830ef580, pcode=12867, how=18, input=0, output=0, last1=0) at exec.c:1930
#34 0x00005571eada17d8 in execpline (state=0x7ffd830ef580, slcode=813058, how=18, last1=0) at exec.c:1660
#35 0x00005571eada0a7b in execlist (state=0x7ffd830ef580, dont_change_job=0, exiting=0) at exec.c:1415
#36 0x00005571eada00b6 in execode (p=0x7f9b27726c20, dont_change_job=0, exiting=0, context=0x5571eae3b7de "toplevel") at exec.c:1194
#37 0x00005571eadc73f4 in loop (toplevel=1, justonce=0) at init.c:212
#38 0x00005571eadcb4f1 in zsh_main (argc=2, argv=0x7ffd830ef868) at init.c:1770
#39 0x00005571ead7de0a in main (argc=2, argv=0x7ffd830ef868) at ./main.c:93

But, as in the first case, I suspect that the heap corruption happened during an earlier call to malloc/free and is only being detected here, in malloc_consolidate. Because I cannot reproduce the crash quickly with the test script, the investigation will be more difficult.
I will keep testing the patch, though, and I hope to provide a more informative backtrace later.

Antoine