* TRAPINT doesn't work reliably
From: Dennis Schwartz @ 2019-09-17 16:47 UTC
To: zsh-users

Hi,

I have a function on TRAPINT in my .zshrc as described in the Zsh manual [1].

  TRAPINT() {
    echo "trap: $1"
    return $(( 128 + $1 ))
  }

This works unreliably. Usually it works the first few times, but after a while it stops working and throws the following error:

  TRAPINT:1: command not found: \M-^A^A
  TRAPINT:2: command not found: F^\V

This function used to work flawlessly in Zsh 5.3.1 (Debian stretch). I only encounter this issue in 5.7.1. Is this a regression that might have been introduced, or is there maybe something else wrong in my (other) configuration?

Thanks,
Dennis

[1] http://zsh.sourceforge.net/Doc/Release/Functions.html#index-trapping-signals
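The `return $(( 128 + $1 ))` line relies on zsh passing the signal number to the trap function as `$1`. According to the zsh manual, a non-zero return from a trap function makes the shell behave as if interrupted while keeping the trap's return status. 128 plus the signal number is the conventional status shells report for a command killed by that signal. The following Python sketch only illustrates that arithmetic with the standard `signal` module; the helper name `trap_return_status` is hypothetical.

```python
import signal


def trap_return_status(signum: int) -> int:
    """Status produced by the manual's TRAPINT idiom: 128 plus the signal number."""
    return 128 + int(signum)


# SIGINT is signal 2 on Linux, so `return $(( 128 + $1 ))` inside TRAPINT
# returns 130, the status a shell reports for a command killed by SIGINT.
print(signal.SIGINT.name, int(signal.SIGINT), trap_return_status(signal.SIGINT))
```

On Linux this prints `SIGINT 2 130`.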
* Re: TRAPINT doesn't work reliably
From: Peter Stephenson @ 2019-09-24  8:44 UTC
To: zsh-users; +Cc: Dennis Schwartz

On Tue, 2019-09-17 at 16:47 +0000, Dennis Schwartz wrote:
> Hi,
>
> I have a function on TRAPINT in my .zshrc as described in the Zsh manual [1].
>
> TRAPINT() {
>   echo "trap: $1"
>   return $(( 128 + $1 ))
> }
>
>
> This works unreliably. Usually it works the first few times, but after a while it stops working and throws the following error.
>
> TRAPINT:1: command not found: \M-^A^A
> TRAPINT:2: command not found: F^\V

This certainly isn't likely to be anything you've done wrong, at least based on what you've told us.

It smells of memory management problems, but it's hard to see where the corruption would be.

What do you see if you run

  functions TRAPINT

after the problem has turned up?

pws
* Re: TRAPINT doesn't work reliably
From: Dennis Schwartz @ 2019-09-25 13:02 UTC
To: Peter Stephenson; +Cc: zsh-users

On Tuesday, September 24, 2019 10:44 AM, Peter Stephenson <p.stephenson@samsung.com> wrote:

> On Tue, 2019-09-17 at 16:47 +0000, Dennis Schwartz wrote:
>
> > I have a function on TRAPINT in my .zshrc as described in the Zsh manual [1].
> >
> > TRAPINT() {
> >   echo "trap: $1"
> >   return $(( 128 + $1 ))
> > }
> >
> >
> > This works unreliably. Usually it works the first few times, but after a while it stops working and throws the following error.
> >
> > TRAPINT:1: command not found: \M-^A^A
> > TRAPINT:2: command not found: F^\V
>
> This certainly isn't likely to be anything you've done wrong, at least based on
> what you've told us.
>
> It smells of memory management problems, but it's hard to see where the corruption
> would be.
>
> What do you see if you run
>
>     functions TRAPINT
>
> after the problem has turned up?

It almost looks like the function gets replaced with random memory. `functions TRAPINT` just shows random bytes, for example:

  $ xxd <(functions TRAPINT)
  00000000: 5452 4150 494e 5420 2829 207b 0a09 0701  TRAPINT () {....
  00000010: 200a 0950 200a 7d0a                      ..P .}.

I am now more convinced it's a bug in Zsh. Any advice on how to debug this? And where can I best submit a bug report?

Dennis
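To see exactly which bytes `functions TRAPINT` printed, the hex column of the `xxd` dump above can be turned back into raw bytes. This is a small Python sketch; the helpers `decode_xxd` and `nonprintable` are hypothetical and assume xxd's default layout (offset, colon, hex groups, two spaces, ASCII column). The decoded text shows that the `TRAPINT () {` header and the closing `}` are intact, while the two body lines now contain `\x07\x01` and a lone `P`. That fits the suspicion that the stored body of the function was overwritten rather than the definition being lost.

```python
# Hex dump copied from the `xxd <(functions TRAPINT)` output in the message above.
XXD_DUMP = (
    "00000000: 5452 4150 494e 5420 2829 207b 0a09 0701  TRAPINT () {....\n"
    "00000010: 200a 0950 200a 7d0a                      ..P .}.\n"
)


def decode_xxd(dump: str) -> bytes:
    """Rebuild raw bytes from xxd's default 'offset: hex-groups  ascii' lines."""
    data = bytearray()
    for line in dump.splitlines():
        if not line.strip():
            continue
        _offset, rest = line.split(":", 1)
        # The hex column ends at the first double space, before the ASCII column.
        hex_part = rest.strip().split("  ", 1)[0]
        data += bytes.fromhex(hex_part.replace(" ", ""))
    return bytes(data)


def nonprintable(data: bytes) -> list:
    """(offset, value) pairs for bytes that are not printable ASCII, tab or newline."""
    return [(i, b) for i, b in enumerate(data) if not (32 <= b < 127 or b in (9, 10))]
```

For example, `print(decode_xxd(XXD_DUMP))` prints `b'TRAPINT () {\n\t\x07\x01 \n\tP \n}\n'`, and `nonprintable(...)` on that result points at offsets 14 and 15.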
* Re: TRAPINT doesn't work reliably
From: Peter Stephenson @ 2019-09-25 14:01 UTC
To: Dennis Schwartz; +Cc: zsh-users

> On 25 September 2019 at 14:02 Dennis Schwartz <dennis.schwartz@protonmail.com> wrote:
> On Tuesday, September 24, 2019 10:44 AM, Peter Stephenson <p.stephenson@samsung.com> wrote:
>> On Tue, 2019-09-17 at 16:47 +0000, Dennis Schwartz wrote:
>>> I have a function on TRAPINT in my .zshrc as described in the Zsh manual [1].
>>>
>>> TRAPINT() {
>>>   echo "trap: $1"
>>>   return $(( 128 + $1 ))
>>> }
>>>
>>>
>>> This works unreliably. Usually it works the first few times, but
>>> after a while it stops working and throws the following
>>> error.
>>>
>>> TRAPINT:1: command not found: \M-^A^A
>>> TRAPINT:2: command not found: F^\V
>
> It almost looks like the function gets replaced with random memory.
> `functions TRAPINT` just shows random bytes, for example:
>
> $ xxd <(functions TRAPINT)
> 00000000: 5452 4150 494e 5420 2829 207b 0a09 0701  TRAPINT () {....
> 00000010: 200a 0950 200a 7d0a                      ..P .}.
>
> I am now more convinced it's a bug in Zsh. Any advice on how to debug this?
> And where can I best submit a bug report?

You don't need to submit a further separate bug report.

Memory errors are tricky, and often hard to reproduce since allocation is heavily OS specific, but probably your best bet is to run with

  valgrind --leak-check=full zsh

which should produce sensible results --- the shell shouldn't leak memory and anything that looks anomalous is probably a real bug ("still reachable" memory is OK).

I'd also suggest trying the latest sources from git or sourceforge, since there have been some memory fixes (and a release is probably overdue).

cheers
pws
* Re: TRAPINT doesn't work reliably
From: Dennis Schwartz @ 2019-09-25 16:25 UTC
To: Peter Stephenson; +Cc: zsh-users

On Wednesday, September 25, 2019 4:01 PM, Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:

> > On 25 September 2019 at 14:02 Dennis Schwartz dennis.schwartz@protonmail.com wrote:
> >
> > I am now more convinced it's a bug in Zsh. Any advice on how to debug this?
> > And where can I best submit a bug report?
>
> You don't need to submit a further separate bug report.

Okay, thanks.

> Memory errors are tricky, and often hard to reproduce since allocation
> is heavily OS specific, but probably your best bet is to run with
>
>     valgrind --leak-check=full zsh
>
> which should produce sensible results --- the shell shouldn't leak
> memory and anything that looks anomalous is probably a real bug ("still
> reachable" memory is OK).

I ran valgrind on zsh and captured the error. Unfortunately, I am inexperienced with C programming, so I do not know how to interpret the output. I've copied the part of the output that I believe is relevant below. Please let me know if I can help debug it further.

==1896== Invalid read of size 1
==1896==    at 0x483BC62: strlen (vg_replace_strmem.c:460)
==1896==    by 0x19755E: dupstring (in /usr/bin/zsh)
==1896==    by 0x138F3B: ??? (in /usr/bin/zsh)
==1896==    by 0x144663: ??? (in /usr/bin/zsh)
==1896==    by 0x141A72: execlist (in /usr/bin/zsh)
==1896==    by 0x141D83: execode (in /usr/bin/zsh)
==1896==    by 0x142C8B: runshfunc (in /usr/bin/zsh)
==1896==    by 0x1431C8: doshfunc (in /usr/bin/zsh)
==1896==    by 0x1963C2: ??? (in /usr/bin/zsh)
==1896==    by 0x19413B: dotrap (in /usr/bin/zsh)
==1896==    by 0x194247: ??? (in /usr/bin/zsh)
==1896==    by 0x194661: zhandler (in /usr/bin/zsh)
==1896==  Address 0x5fc8488 is 264 bytes inside a block of size 328 free'd
==1896==    at 0x48399AB: free (vg_replace_malloc.c:530)
==1896==    by 0x136C8E: zcontext_restore_partial (in /usr/bin/zsh)
==1896==    by 0x1656C3: parse_subscript (in /usr/bin/zsh)
==1896==    by 0x17A446: getindex (in /usr/bin/zsh)
==1896==    by 0x17ABCF: fetchvalue (in /usr/bin/zsh)
==1896==    by 0x19BDB0: ??? (in /usr/bin/zsh)
==1896==    by 0x1A0C87: prefork (in /usr/bin/zsh)
==1896==    by 0x13ABE6: execsubst (in /usr/bin/zsh)
==1896==    by 0x1674CB: execfor (in /usr/bin/zsh)
==1896==    by 0x13E44C: ??? (in /usr/bin/zsh)
==1896==    by 0x13FB6E: ??? (in /usr/bin/zsh)
==1896==    by 0x13FF11: ??? (in /usr/bin/zsh)
==1896==  Block was alloc'd at
==1896==    at 0x483877F: malloc (vg_replace_malloc.c:299)
==1896==    by 0x136A13: zcontext_save_partial (in /usr/bin/zsh)
==1896==    by 0x165622: parse_subscript (in /usr/bin/zsh)
==1896==    by 0x17A446: getindex (in /usr/bin/zsh)
==1896==    by 0x17ABCF: fetchvalue (in /usr/bin/zsh)
==1896==    by 0x19BDB0: ??? (in /usr/bin/zsh)
==1896==    by 0x1A0C87: prefork (in /usr/bin/zsh)
==1896==    by 0x13ABE6: execsubst (in /usr/bin/zsh)
==1896==    by 0x1674CB: execfor (in /usr/bin/zsh)
==1896==    by 0x13E44C: ??? (in /usr/bin/zsh)
==1896==    by 0x13FB6E: ??? (in /usr/bin/zsh)
==1896==    by 0x13FF11: ??? (in /usr/bin/zsh)
==1896==
==1896== Invalid read of size 1
[... repetition 4 more times ...]
==2144==
==2144== HEAP SUMMARY:
==2144==     in use at exit: 1,583,276 bytes in 35,151 blocks
==2144==   total heap usage: 78,390 allocs, 43,239 frees, 12,292,102 bytes allocated
==2144==
==2144== LEAK SUMMARY:
==2144==    definitely lost: 0 bytes in 0 blocks
==2144==    indirectly lost: 0 bytes in 0 blocks
==2144==      possibly lost: 0 bytes in 0 blocks
==2144==    still reachable: 1,583,276 bytes in 35,151 blocks
==2144==         suppressed: 0 bytes in 0 blocks
==2144== Reachable blocks (those to which a pointer was found) are not shown.
==2144== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2144==
==2144== For counts of detected and suppressed errors, rerun with: -v
==2144== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)

> I'd also suggest trying the latest sources from git or sourceforge,
> since there have been some memory fixes (and a release is probably
> overdue).

I haven't tried compiling from the latest source code yet. If this is desired, I could try it at a later point in time.

- Dennis
* Re: TRAPINT doesn't work reliably
From: Peter Stephenson @ 2019-09-25 17:04 UTC
To: zsh-users; +Cc: Dennis Schwartz

On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
> On Wednesday, September 25, 2019 4:01 PM, Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> I ran valgrind on zsh and captured the error. Unfortunately, I am
> inexperienced with C programming, so I do not know how to interpret the
> output. I've copied the part of the output that I believe is relevant
> below. Please let me know if I can help debug it further.

Unfortunately, you haven't got debug symbols in the installed zsh, so it's not showing much of interest --- though it does certainly suggest something is up.

> > I'd also suggest trying the latest sources from git or sourceforge,
> > since there have been some memory fixes (and a release is probably
> > overdue).
> I haven't tried compiling from the latest source code yet. If this is
> desired, I could try it at a later point in time.

I suspect that's going to have to be the next step, if you get the chance. In the top-level directory, run configure as

  ./configure --enable-zsh-debug

and then "make", and that should give you an installable executable that gives useful debug information ("sudo make install" will put zsh in /usr/local/bin; you can remove anything it installs in /usr/local later).

Thanks for the assistance
pws
* Re: TRAPINT doesn't work reliably
From: Daniel Shahaf @ 2019-09-25 18:46 UTC
To: Peter Stephenson; +Cc: zsh-users, Dennis Schwartz

Peter Stephenson wrote on Wed, Sep 25, 2019 at 18:04:45 +0100:
> On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
> > I haven't tried compiling from the latest source code yet. If this is
> > desired, I could try it at a later point in time.
>
> I suspect that's going to have to be the next step, if you get the
> chance. In the top-level directory, run configure as
>
>   ./configure --enable-zsh-debug
>

Should Dennis use any of these flags as well?

[[[
% ./configure --help=short | vipe
  --enable-zsh-mem          compile with zsh memory allocation routines
  --enable-zsh-mem-debug    debug zsh memory allocation routines
  --enable-zsh-mem-warning  print warnings for errors in memory allocation
  --enable-zsh-secure-free  turn on error checking for free()
  --enable-zsh-heap-debug   turn on error checking for heap allocation
  --enable-zsh-valgrind     turn on support for valgrind debugging of heap memory
  --enable-zsh-hash-debug   turn on debugging of internal hash tables
  --enable-stack-allocation allocate stack memory e.g. with `alloca'
]]]
* Re: TRAPINT doesn't work reliably
From: Peter Stephenson @ 2019-09-26 15:27 UTC
To: Daniel Shahaf; +Cc: zsh-users, Dennis Schwartz

On Wed, 2019-09-25 at 18:46 +0000, Daniel Shahaf wrote:
> Peter Stephenson wrote on Wed, Sep 25, 2019 at 18:04:45 +0100:
> >
> > On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
> > >
> > > I haven't tried compiling from the latest source code yet. If this is
> > > desired, I could try it at a later point in time.
> > I suspect that's going to have to be the next step, if you get the
> > chance. In the top-level directory, run configure as
> >
> >   ./configure --enable-zsh-debug
> >
> Should Dennis use any of these flags as well?

It's not clear anything else is going to help debugging, certainly if the build that's showing the problem was made (as almost all distro builds are made) using the system allocators. In that case none of the zsh memory specials apply, and if we turn on zsh memory management we are in a different world --- which might show the problem but could well perturb it somewhere completely different.

cheers
pws
* Re: TRAPINT doesn't work reliably
From: Daniel Shahaf @ 2019-09-27 13:43 UTC
To: Peter Stephenson; +Cc: zsh-users, Dennis Schwartz

Peter Stephenson wrote on Thu, 26 Sep 2019 15:27 +00:00:
> On Wed, 2019-09-25 at 18:46 +0000, Daniel Shahaf wrote:
> > Peter Stephenson wrote on Wed, Sep 25, 2019 at 18:04:45 +0100:
> > >
> > > On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
> > > >
> > > > I haven't tried compiling from the latest source code yet. If this is
> > > > desired, I could try it at a later point in time.
> > > I suspect that's going to have to be the next step, if you get the
> > > chance. In the top-level directory, run configure as
> > >
> > >   ./configure --enable-zsh-debug
> > >
> > Should Dennis use any of these flags as well?
>
> It's not clear anything else is going to help debugging, certainly
> if the build that's showing the problem was made (as almost all
> distro builds are made) using the system allocators. In that
> case none of the zsh memory specials apply,

Even without any special configure flags, there's still zhalloc(). The source of zhalloc() contains some blocks conditional on --enable-zsh-valgrind. I assume passing that configure flag will let valgrind detect use-after-freeheap() bugs.

Also, I thought --enable-zsh-secure-free and --enable-zsh-heap-debug were independent of --enable-zsh-mem*.

> and if we turn on zsh memory management we are in a different world
> --- which might show the problem but could well perturb it somewhere
> completely different.

Sure, any change could make the symptoms disappear, particularly switching to a different allocator.

Cheers,
Daniel
* Re: TRAPINT doesn't work reliably
From: Peter Stephenson @ 2019-09-25 17:56 UTC
To: zsh-users; +Cc: dennis.schwartz

On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
> ==1896== Block was alloc'd at
> ==1896==    at 0x483877F: malloc (vg_replace_malloc.c:299)
> ==1896==    by 0x136A13: zcontext_save_partial (in /usr/bin/zsh)
> ==1896==    by 0x165622: parse_subscript (in /usr/bin/zsh)
> ==1896==    by 0x17A446: getindex (in /usr/bin/zsh)
> ==1896==    by 0x17ABCF: fetchvalue (in /usr/bin/zsh)
> ==1896==    by 0x19BDB0: ??? (in /usr/bin/zsh)
> ==1896==    by 0x1A0C87: prefork (in /usr/bin/zsh)
> ==1896==    by 0x13ABE6: execsubst (in /usr/bin/zsh)
> ==1896==    by 0x1674CB: execfor (in /usr/bin/zsh)
> ==1896==    by 0x13E44C: ??? (in /usr/bin/zsh)
> ==1896==    by 0x13FB6E: ??? (in /usr/bin/zsh)
> ==1896==    by 0x13FF11: ??? (in /usr/bin/zsh)
> ==1896==
> ==1896== Invalid read of size 1

One kind of interesting thing here is that there's some suggestion that the original allocation was not simply in top-level code, but in some kind of block (at least a for block) --- assuming, of course, this is relevant. Is there some structure around the point where you're setting up the trap? If so, does changing it (making it simpler) have any effect on the problem?

Cheers
pws
* Re: TRAPINT doesn't work reliably
From: Dennis Schwartz @ 2019-09-26 14:48 UTC
To: Peter Stephenson; +Cc: zsh-users

On Wednesday, September 25, 2019 7:56 PM, Peter Stephenson <p.stephenson@samsung.com> wrote:

> On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
>
> > ==1896== Block was alloc'd at
> > ==1896==    at 0x483877F: malloc (vg_replace_malloc.c:299)
> > ==1896==    by 0x136A13: zcontext_save_partial (in /usr/bin/zsh)
> > ==1896==    by 0x165622: parse_subscript (in /usr/bin/zsh)
> > ==1896==    by 0x17A446: getindex (in /usr/bin/zsh)
> > ==1896==    by 0x17ABCF: fetchvalue (in /usr/bin/zsh)
> > ==1896==    by 0x19BDB0: ??? (in /usr/bin/zsh)
> > ==1896==    by 0x1A0C87: prefork (in /usr/bin/zsh)
> > ==1896==    by 0x13ABE6: execsubst (in /usr/bin/zsh)
> > ==1896==    by 0x1674CB: execfor (in /usr/bin/zsh)
> > ==1896==    by 0x13E44C: ??? (in /usr/bin/zsh)
> > ==1896==    by 0x13FB6E: ??? (in /usr/bin/zsh)
> > ==1896==    by 0x13FF11: ??? (in /usr/bin/zsh)
> > ==1896==
> > ==1896== Invalid read of size 1
>
> One kind of interesting thing here is that there's some suggestion that the
> original allocation was not simply in top-level code, but in some kind
> of block (at least a for block) --- assuming, of course, this is
> relevant. Is there some structure around the point where you're setting
> up the trap? If so, does changing it (making it simpler) have any
> effect on the problem?

Okay, after quite some time debugging (I will spare you the details of everything I've tried), I can now reliably reproduce the bug. However, I lack the knowledge of zsh to understand what is causing it.

To trigger the bug, I just open a fresh new shell (e.g. run `zsh`), type `ls` and hit TAB to trigger the autocompletion function.
However, I can only reproduce the bug if I have the following code in my `~/.zshrc`:

  # Antigen zsh plugins
  if [ -f "/usr/share/zsh-antigen/antigen.zsh" ]; then
      source "/usr/share/zsh-antigen/antigen.zsh"

      # load some plugins here, but they are not relevant to trigger
      # the bug
  fi

So, I conditionally `source` another file. Apparently, this is causing *super weird* behavior. Unbelievably, if I open the file `.zshrc` (e.g., in vim/gedit) and _save_ it, I cannot trigger the bug. However, if I open the file but _do not save_ it, I always trigger the bug.

To complicate it further, I can trigger the bug when I compile `zsh-5.7.1` from source, but I cannot trigger it anymore once VERSION in `Config/version.mk` is updated (i.e., in the next commit).

These findings leave me totally confused. Not sure if it's relevant, but I mount my home folder with the `noatime` option.

Any ideas?

Thanks for your help!

Cheers,
Dennis
* Re: TRAPINT doesn't work reliably
From: Peter Stephenson @ 2019-09-26 15:25 UTC
To: Dennis Schwartz; +Cc: zsh-users

On Thu, 2019-09-26 at 14:48 +0000, Dennis Schwartz wrote:
> However, I can only reproduce the bug if I have the following code in my
> `~/.zshrc`:
>
>   # Antigen zsh plugins
>   if [ -f "/usr/share/zsh-antigen/antigen.zsh" ]; then
>       source "/usr/share/zsh-antigen/antigen.zsh"
>
>       # load some plugins here, but they are not relevant to trigger
>       # the bug
>   fi
>
> So, I conditionally `source` another file. Apparently, this is causing
> *super weird* behavior. Unbelievably, if I open the file `.zshrc` (e.g.,
> in vim/gedit) and _save_ it, I cannot trigger the bug. However, if I
> open the file but _do not save_ it, I always trigger the bug.

This is very much the sort of weirdness you get with memory errors, unfortunately. They're extremely sensitive to what was allocated and deallocated where --- some piece of memory allocated for one purpose is presumably being erroneously freed and reused, and as far as the structure of your zsh code is concerned there's no actual logical relationship between the places --- they are just getting mixed up in the bowels of the allocation functions.

It suggests it's going to be quite hard to reproduce elsewhere, though I'd still be interested in the logic where you're defining TRAPINT, since clearly that's the memory that's getting mishandled.

It's also still suggesting trying to get valgrind to give a bit more detail is the best way forward.

cheers
pws
* Re: TRAPINT doesn't work reliably
From: Dennis Schwartz @ 2019-09-26 17:10 UTC
To: Peter Stephenson; +Cc: zsh-users

On Thursday, September 26, 2019 5:25 PM, Peter Stephenson <p.stephenson@samsung.com> wrote:

> It suggests it's going to be quite hard to reproduce elsewhere, though
> I'd still be interested in the logic where you're defining TRAPINT, since
> clearly that's the memory that's getting mishandled.

I don't fully understand what you mean by "the logic where you're defining TRAPINT," but I have the following code in my `.zshrc`:

  function TRAPINT {
      VIMODE="$VIINS"
      print $1 # for debug only
      return $(( 128 + $1 ))
  }

(I use zsh with vi keybindings, and VIMODE indicates on my prompt which mode I'm in. When I interrupt, I start again in insert mode, and I want that to be properly indicated.)

> It's also still suggesting trying to get valgrind to give a bit more
> detail is the best way forward.

Ah, of course. I forgot about valgrind since I could only reproduce this bug if I checked out the tag `zsh-5.7.1`. I did manage to capture the bug with valgrind on `master` using the following sequence of commands (output tidied):

  $ git checkout master
  $ git checkout zsh-5.7.1 -- Config/version.mk
  $ ./configure --enable-zsh-debug && make && sudo make install
  $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
  /usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
  $ ll
  TRAPINT:1: not an identifier:

Here, `ll [TAB]` was executed in the new shell. I don't get the error message "/usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'" when I start zsh without valgrind, so I guess that can be ignored.
What valgrind captured:

Invalid read of size 1
   at 0x4838CC2: __strlen_sse2 (vg_replace_strmem.c:462)
   by 0x1B0792: dupstring (string.c:39)
   by 0x19BC70: ecgetstr (parse.c:2809)
   by 0x144095: addvars (exec.c:2429)
   by 0x1404DB: execsimple (exec.c:1237)
   by 0x140A85: execlist (exec.c:1378)
   by 0x14038F: execode (exec.c:1194)
   by 0x14DCB0: runshfunc (exec.c:5980)
   by 0x14D2E8: doshfunc (exec.c:5830)
   by 0x1AF4D1: dotrapargs (signals.c:1371)
   by 0x1AFA8F: dotrap (signals.c:1487)
   by 0x1AF18C: handletrap (signals.c:1202)
 Address 0x566b948 is 0 bytes after a block of size 328 free'd
   at 0x48369AB: free (vg_replace_malloc.c:530)
   by 0x13D8F3: zcontext_restore_partial (context.c:108)
   by 0x13DA56: zcontext_restore (context.c:119)
   by 0x175A04: parse_subscript (lex.c:1697)
   by 0x18B7F1: getindex (params.c:1858)
   by 0x18C132: fetchvalue (params.c:2106)
   by 0x1B6304: paramsubst (subst.c:2516)
   by 0x1B1DB9: stringsubst (subst.c:322)
   by 0x1B1108: prefork (subst.c:142)
   by 0x14486C: execsubst (exec.c:2570)
   by 0x1772E9: execfor (loop.c:98)
   by 0x148469: execcmd_exec (exec.c:3913)
 Block was alloc'd at
   at 0x483577F: malloc (vg_replace_malloc.c:299)
   by 0x13D5D6: zcontext_save_partial (context.c:58)
   by 0x13D7E9: zcontext_save (context.c:82)
   by 0x1758A7: parse_subscript (lex.c:1661)
   by 0x18B7F1: getindex (params.c:1858)
   by 0x18C132: fetchvalue (params.c:2106)
   by 0x1B6304: paramsubst (subst.c:2516)
   by 0x1B1DB9: stringsubst (subst.c:322)
   by 0x1B1108: prefork (subst.c:142)
   by 0x14486C: execsubst (exec.c:2570)
   by 0x1772E9: execfor (loop.c:98)
   by 0x148469: execcmd_exec (exec.c:3913)

I hope this helps. Thank you for your time and for developing zsh!

Cheers,
Dennis
* Re: TRAPINT doesn't work reliably
From: Daniel Shahaf @ 2019-09-27 13:46 UTC
To: Dennis Schwartz, Peter Stephenson; +Cc: zsh-users

Dennis Schwartz wrote on Thu, 26 Sep 2019 17:10 +00:00:
> $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
> /usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
> $ ll

What's the output of `dpkg -l zsh-antigen`? (I'm looking for the version number.)

> TRAPINT:1: not an identifier:
>
> Here, `ll [TAB]` was executed in the new shell. I don't get the error
> message "/usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'"
> when I start zsh without valgrind, so I guess that can be ignored.
> What valgrind captured:

Thanks! I think there's a clue in there somewhere :)
* Re: TRAPINT doesn't work reliably
From: Dennis Schwartz @ 2019-09-28 11:16 UTC
To: Daniel Shahaf; +Cc: Peter Stephenson, zsh-users

On Thursday, September 26, 2019 2:48 PM, Dennis Schwartz <dennis.schwartz@protonmail.com> wrote:

> However, I can only reproduce the bug if I have the following code in my
> `~/.zshrc`:
>
>   # Antigen zsh plugins
>   if [ -f "/usr/share/zsh-antigen/antigen.zsh" ]; then
>       source "/usr/share/zsh-antigen/antigen.zsh"
>
>       # load some plugins here, but they are not relevant to trigger
>       # the bug
>   fi
>
> So, I conditionally `source` another file. Apparently, this is causing
> super weird behavior. Unbelievably, if I open the file `.zshrc` (e.g.,
> in vim/gedit) and save it, I cannot trigger the bug. However, if I
> open the file but do not save it, I always trigger the bug.

Okay, so of course that didn't make any sense. Now I know that I can trigger the bug if (at least) the following conditions have been met:

* On my system (Debian 10), I need to compile zsh with the version number from my default Debian installation. So I always do `git checkout zsh-5.7.1 -- Config/version.mk` before I compile.
* `.zshrc` needs to contain several function definitions, aliases, keybindings, or other configurations.
* `.zshrc` needs to contain a trap on interrupt.
* I suspect that `.zshrc` also needs to contain `source "/usr/share/zsh-antigen/antigen.zsh"` (I'm using 2.2.3-2 from Debian 10).
* `zsh` needs to be started twice.
  * The first time, the bug cannot be triggered.
  * The second time, the bug can be triggered by typing a character and then hitting TAB to autocomplete. Now hit Ctrl+C to interrupt. The bug is triggered.

I suspect that `.zshrc` is read and either zsh or antigen generates some files based on the loaded configuration. That would explain why the bug is only triggered after zsh has been executed at least once.

Unfortunately, I cannot easily generate a minimal `.zshrc` that triggers the bug. If I remove a function definition from my `.zshrc` and replace it with a bogus function, I can trigger the bug depending on the function definition. I haven't found a clear pattern, though.

However, I found that I could cause zsh to segfault using the following `.zshrc`, generated with Python 3:

>>> open('/home/USERNAME/.zshrc', 'w').write('function fun() { echo "' + 'a' * (1 << 24) + '" }\nTRAPINT() { print $1; return $(( 128 + $1 )) }\nsource "/usr/share/zsh-antigen/antigen.zsh"')

WARNING: This causes zsh to crash even if you replace your `.zshrc` with a 'normal' file again. You have to run `zsh -f` first; afterwards, you can start `zsh` again normally. I guess this again has to do with some file being automatically generated by zsh or antigen which needs to be regenerated. Which file could this be? How can I easily see which files get loaded on start-up? The file `~/.zcompdump` remains the same, regardless of whether the bug can be triggered.

I have run

  $ ./configure --enable-zsh-debug --enable-zsh-mem && make && sudo make install
  $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh

to capture the segfault. I cannot be sure that this is the same bug as the one I experience with the TRAPINT function. The log file (the memory addresses shift by 0x10 bytes if I compile without `--enable-zsh-mem`):

> Memcheck, a memory error detector
> Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
> Command: /usr/local/bin/zsh
> Parent PID: 10371
>
> Invalid read of size 1
>    at 0x4839565: __strncmp_sse42 (vg_replace_strmem.c:651)
>    by 0x14BD28: execfuncdef (exec.c:5286)
>    by 0x140669: execsimple (exec.c:1248)
>    by 0x140A75: execlist (exec.c:1378)
>    by 0x14037F: execode (exec.c:1194)
>    by 0x168151: source (init.c:1460)
>    by 0x168649: sourcehome (init.c:1536)
>    by 0x167D01: run_init_scripts (init.c:1340)
>    by 0x169224: zsh_main (init.c:1754)
>    by 0x11FD44: main (main.c:93)
>  Address 0x584f110 is not stack'd, malloc'd or (recently) free'd
>
>
> Process terminating with default action of signal 11 (SIGSEGV)
>  Access not within mapped region at address 0x584F110
>    at 0x4839565: __strncmp_sse42 (vg_replace_strmem.c:651)
>    by 0x14BD28: execfuncdef (exec.c:5286)
>    by 0x140669: execsimple (exec.c:1248)
>    by 0x140A75: execlist (exec.c:1378)
>    by 0x14037F: execode (exec.c:1194)
>    by 0x168151: source (init.c:1460)
>    by 0x168649: sourcehome (init.c:1536)
>    by 0x167D01: run_init_scripts (init.c:1340)
>    by 0x169224: zsh_main (init.c:1754)
>    by 0x11FD44: main (main.c:93)
>  If you believe this happened as a result of a stack
>  overflow in your program's main thread (unlikely but
>  possible), you can try to increase the size of the
>  main thread stack using the --main-stacksize= flag.
>  The main thread stack size used in this run was 8388608.
>
> HEAP SUMMARY:
>     in use at exit: 62,886 bytes in 919 blocks
>   total heap usage: 1,052 allocs, 133 frees, 105,358 bytes allocated
>
> 1 bytes in 1 blocks are definitely lost in loss record 4 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x16724E: setupvals (init.c:1062)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 2 bytes in 1 blocks are definitely lost in loss record 11 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x166AC3: init_term (init.c:805)
>    by 0x19564C: term_reinit_from_pm (params.c:4892)
>    by 0x1956A4: termsetfn (params.c:4912)
>    by 0x18ED1C: assignstrvalue (params.c:2532)
>    by 0x190C74: assignsparam (params.c:3144)
>    by 0x18A805: createparamtable (params.c:867)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 4 bytes in 1 blocks are definitely lost in loss record 22 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1CA909: metafy (utils.c:4769)
>    by 0x1CAABE: ztrdup_metafy (utils.c:4826)
>    by 0x18A6E6: createparamtable (params.c:834)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 5 bytes in 1 blocks are definitely lost in loss record 26 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x166F14: setupvals (init.c:973)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 8 bytes in 1 blocks are definitely lost in loss record 62 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x195CA7: mkenvstr (params.c:5244)
>    by 0x18A862: createparamtable (params.c:871)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 9 bytes in 1 blocks are definitely lost in loss record 77 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x166F2E: setupvals (init.c:974)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 9 bytes in 1 blocks are definitely lost in loss record 78 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x166F48: setupvals (init.c:975)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 10 bytes in 1 blocks are definitely lost in loss record 95 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1CA909: metafy (utils.c:4769)
>    by 0x1672C5: setupvals (init.c:1075)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 15 bytes in 1 blocks are definitely lost in loss record 128 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x166F62: setupvals (init.c:976)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 16 bytes in 1 blocks are definitely lost in loss record 134 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x17CC01: pushheap (mem.c:304)
>    by 0x18A6FA: createparamtable (params.c:848)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 68 (56 direct, 12 indirect) bytes in 1 blocks are definitely lost in loss record 263 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x17EA5E: zshcalloc (mem.c:979)
>    by 0x183E4A: load_module (module.c:2219)
>    by 0x167C3B: run_init_scripts (init.c:1318)
>    by 0x169224: zsh_main (init.c:1754)
>    by 0x11FD44: main (main.c:93)
>
> 81 bytes in 2 blocks are definitely lost in loss record 290 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x18A87E: createparamtable (params.c:874)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 112 bytes in 4 blocks are definitely lost in loss record 299 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1CA909: metafy (utils.c:4769)
>    by 0x18A7EB: createparamtable (params.c:867)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 256 bytes in 1 blocks are definitely lost in loss record 330 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x18A675: createparamtable (params.c:829)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> LEAK SUMMARY:
>    definitely lost: 584 bytes in 18 blocks
>    indirectly lost: 12 bytes in 1 blocks
>      possibly lost: 0 bytes in 0 blocks
>    still reachable: 62,290 bytes in 900 blocks
>         suppressed: 0 bytes in 0 blocks
> Reachable blocks (those to which a pointer was found) are not shown.
> To see them, rerun with: --leak-check=full --show-leak-kinds=all
>
> For counts of detected and suppressed errors, rerun with: -v
> ERROR SUMMARY: 15 errors from 15 contexts (suppressed: 0 from 0)

On Friday, September 27, 2019 1:46 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:

> Dennis Schwartz wrote on Thu, 26 Sep 2019 17:10 +00:00:
>
> > $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
> > /usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
> > $ ll
>
> What's the output of `dpkg -l zsh-antigen`? (I'm looking for the version number.)

Good point. Debian 10 (buster) ships 2.2.3-2, which I'm running. I believe the bug is triggered in zsh by using this newer version (Debian 9 ships 1.3.4-1).
If I compile and run zsh 5.3.1 (shipped with Debian 9, where I did not encounter this issue) together with `zsh-antigen` from Debian 10, I can also trigger the bug.

On Friday, September 27, 2019 7:05 PM, Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:

> Thanks, this is exactly what I was asking for.

Thanks for explaining so extensively what's going on!

> Does removing that assignment make a difference?

No, the bug triggers for any TRAPINT function I've tried so far.

I have the feeling we are getting closer to the root cause of the bug.

Cheers,

Dennis
* Re: TRAPINT doesn't work reliably
From: Daniel Shahaf @ 2019-09-28 14:29 UTC
To: Dennis Schwartz; +Cc: Peter Stephenson, zsh-users

Dennis Schwartz wrote on Sat, Sep 28, 2019 at 11:16:08 +0000:
> I have run
>
> $ ./configure --enable-zsh-debug --enable-zsh-mem && make && sudo make install

Please run «make check» in there as well, on general principles.

> On Friday, September 27, 2019 1:46 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
>
> > Dennis Schwartz wrote on Thu, 26 Sep 2019 17:10 +00:00:
> >
> > > $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
> > > /usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
> > > $ ll
> >
> > What's the output of `dpkg -l zsh-antigen`? (I'm looking for the version number.)
>
> Good point. Debian 10 (buster) ships 2.2.3-2, which I'm running.
> I believe the bug is triggered in zsh by using this newer version
> (Debian 9 ships 1.3.4-1). If I compile and run zsh 5.3.1 (shipped with
> Debian 9, where I did not encounter this issue) together with `zsh-antigen`
> from Debian 10, I can also trigger the bug.

Okay, so we have another angle on this: we could try to bisect antigen, either temporally (between 1.3.4-1 in Debian stretch and 2.2.3-2 in Debian buster) or spatially (taking the 2.2.3-2 version and deleting half of its antigen.zsh file at a time, lather, rinse, repeat).

Would it be worthwhile to try and find a minimal example for the parse error? I don't know whether it's likely to be related to the memory bug.

> > Does removing that assignment make a difference?
>
> No, the bug triggers for any TRAPINT function I've tried so far.

Have you tried an empty function, «TRAPINT () {}»?

Is there any reason to also try a «trap ':' INT»?
Cheers,

Daniel
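Daniel's "spatial" bisection of antigen.zsh can be automated. Below is a minimal POSIX sh sketch, not part of zsh or antigen; the helper name `bisect_prefix` and its test-command convention are assumptions made for illustration. It binary-searches for the shortest prefix of a file that makes a test command fail, which assumes the failure is monotonic (once the culprit line is included, every longer prefix also fails). Cutting antigen.zsh at an arbitrary line will often leave it syntactically incomplete, so a real test command must tell the TRAPINT corruption apart from a plain parse error.

```shell
#!/bin/sh
# Hypothetical helper: find the smallest N such that the first N lines of
# FILE make TEST_CMD fail. TEST_CMD is called with the truncated copy as "$1"
# and must exit non-zero when the bug is reproduced.
# Assumption: failure is monotonic in N (a longer prefix never "un-breaks").
bisect_prefix() {
  file=$1
  test_cmd=$2
  tmp=$(mktemp) || return 2
  # Sanity check: the whole file must reproduce the failure.
  if "$test_cmd" "$file"; then
    rm -f "$tmp"
    echo "full file does not fail" >&2
    return 1
  fi
  lo=0                                 # longest prefix known to pass (empty file)
  hi=$(wc -l < "$file" | tr -d ' ')    # shortest prefix known to fail
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    head -n "$mid" "$file" > "$tmp"
    if "$test_cmd" "$tmp"; then
      lo=$mid                          # prefix passes: culprit comes later
    else
      hi=$mid                          # prefix fails: culprit is at or before mid
    fi
  done
  rm -f "$tmp"
  echo "$hi"                           # line number of the first culprit line
}
```

With a toy test that "fails" whenever a line reads `BAD`, a six-line file whose fourth line is `BAD` yields 4. Each step halves the search range, so a file of a few thousand lines needs only about a dozen test runs.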
* Re: TRAPINT doesn't work reliably
From: Dennis Schwartz @ 2019-09-28 18:21 UTC
To: Daniel Shahaf, zsh-users

On Saturday, September 28, 2019 2:29 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:

> Okay, so we have another angle on this: we could try to bisect antigen,
> either temporally (between 1.3.4-1 in Debian stretch and 2.2.3-2 in
> Debian buster) or spatially (taking the 2.2.3-2 version and deleting
> half of its antigen.zsh file at a time, lather, rinse, repeat).

Hmm, it seems I'm having difficulty determining the exact conditions under which this bug occurs. The commands

    $ rm -Rf ~/.antigen && git checkout v1.3.4 && make

also trigger the bug using /usr/bin/zsh. I also updated my .zshrc to source the correct `bin/antigen.zsh` and to run `antigen apply`.

> > > Does removing that assignment make a difference?
> >
> > No, the bug triggers for any TRAPINT function I've tried so far.
>
> Have you tried an empty function, «TRAPINT () {}»?
>
> Is there any reason to also try a «trap ':' INT»?

Both do nothing. It seems like TRAPINT needs to contain at least one command or a return statement.

On Saturday, September 28, 2019 4:00 PM, Bart Schaefer <schaefer@brasslantern.com> wrote:

> On Sat, Sep 28, 2019 at 4:17 AM Dennis Schwartz
> dennis.schwartz@protonmail.com wrote:
>
> > - On my system (Debian 10), I need to compile zsh with the version
> > number from my default Debian installation. So I always do
> > `git checkout zsh-5.7.1 -- Config/version.mk` before I compile.
> >
>
> So, you should definitely STOP doing that. It's only creating confusion.

Okay, I see why I should avoid doing this. The consequence is that I can only debug version 5.7.1, either compiled myself (so optionally with debugging flags set) or the version shipped by Debian.
> I would also suggest that you go back to the configuration where you
> first observed the problem (i.e., do NOT use a custom-compiled binary)
> and start zsh with
>   zsh -o sourcetrace
> which will show you where all the configuration files are being found.
> You can then compare that to "zsh -o sourcetrace" from your newly
> compiled binary to determine which files are the same and which are
> different in the event that the bug behavior changes with the new
> build.

Thanks! That was exactly the command I was looking for. If I return to my initial setup (i.e., using Debian's shipped zsh and antigen + my original .zshrc) I get:

    $ whence zsh
    /usr/bin/zsh
    $ touch .zshrc
    $ cp .zcompdump zcompdum-pre
    $ cp -R .antigen antigen-pre
    $ zsh -o sourcetrace
    +/etc/zsh/zshenv:1> <sourcetrace>
    +/etc/zsh/zshrc:1> <sourcetrace>
    +/home/USERNAME/.zshrc:1> <sourcetrace>
    +/home/USERNAME/.zcompdump:1> <sourcetrace>
    +/usr/share/zsh-antigen/antigen.zsh:1> <sourcetrace>
    +/home/USERNAME/.antigen/.zcompdump:1> <sourcetrace>
    +/home/USERNAME/.antigen/init.zsh:1> <sourcetrace>
    +/home/USERNAME/.antigen/.zcompdump:1> <sourcetrace>
    $ ll
    2    <---- indicating the correct behavior of TRAPINT
    $ exit
    $ cp .zcompdump zcompdum-mid
    $ cp -R .antigen antigen-mid
    $ zsh -o sourcetrace
    +/etc/zsh/zshenv:1> <sourcetrace>
    +/etc/zsh/zshrc:1> <sourcetrace>
    +/home/USERNAME/.zshrc:1> <sourcetrace>
    +/home/USERNAME/.zcompdump:1> <sourcetrace>
    +/usr/share/zsh-antigen/antigen.zsh:1> <sourcetrace>
    +/home/USERNAME/.antigen/init.zsh:1> <sourcetrace>
    +/home/USERNAME/.antigen/.zcompdump:1> <sourcetrace>
    $ ll
    TRAPINT:1: not an identifier: F\M-|N)U
    $ exit
    $ cp .zcompdump zcompdum-post
    $ cp -R .antigen antigen-post
    $ sha1sum antigen-{pre,mid,post}/**/*
    sha1sum: antigen-pre/bundles: Is a directory
    ef142d6575f491caf15f643c90abec9809138eff  antigen-pre/debug.log
    dbb5aba046583b7e5bd18ff482022e7eb57db6d1  antigen-pre/init.zsh
    6a554ac7275ad58b87a484e9be26961ba7bc3bb6  antigen-pre/init.zsh.zwc
    sha1sum: antigen-mid/bundles: Is a directory
    ab8687454b49cc6d3c55e5d925596cddcbadc342  antigen-mid/debug.log
    7f66a86e7d6d7b6847ccd8df0852db90ea40a5af  antigen-mid/init.zsh
    6a554ac7275ad58b87a484e9be26961ba7bc3bb6  antigen-mid/init.zsh.zwc
    sha1sum: antigen-post/bundles: Is a directory
    ab8687454b49cc6d3c55e5d925596cddcbadc342  antigen-post/debug.log
    7f66a86e7d6d7b6847ccd8df0852db90ea40a5af  antigen-post/init.zsh
    6a554ac7275ad58b87a484e9be26961ba7bc3bb6  antigen-post/init.zsh.zwc
    $ sha1sum zcompdum-{pre,mid,post}
    2e658e3f3c3c21bec98fedea8390cffd8fdab15e  zcompdum-pre
    2e658e3f3c3c21bec98fedea8390cffd8fdab15e  zcompdum-mid
    2e658e3f3c3c21bec98fedea8390cffd8fdab15e  zcompdum-post

The only difference between the `debug.log` files is that `pre` contains fewer entries. For `init.zsh`, the only difference is a timestamp in a comment. I'm a bit puzzled by these results.

Using the original setup, I can also reproduce the segfault quite easily:

    $ python3
    Python 3.7.3 (default, Apr 3 2019, 05:39:12)
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> open('/home/USERNAME/.zshrc', 'w').write('function fun() { echo "' + 'a' * (1 << 24) + '" }\nsource "/usr/share/zsh-antigen/antigen.zsh"\nantigen apply')
    16777300
    >>>
    $ rm -Rf .antigen; rm .zcompdump
    $ zsh -o sourcetrace
    +/etc/zsh/zshenv:1> <sourcetrace>
    +/etc/zsh/zshrc:1> <sourcetrace>
    +/home/USERNAME/.zshrc:1> <sourcetrace>
    +/usr/share/zsh-antigen/antigen.zsh:1> <sourcetrace>
    +/home/USERNAME/.antigen/init.zsh:1> <sourcetrace>
    +/home/USERNAME/.antigen/.zcompdump:1> <sourcetrace>
    % exit
    $ zsh -o sourcetrace
    +/etc/zsh/zshenv:1> <sourcetrace>
    +/etc/zsh/zshrc:1> <sourcetrace>
    zsh: segmentation fault  zsh -o sourcetrace

Running `cp ~/zshrc-good ~/.zshrc` fixed it again (no need for `zsh -f`).

I spent several hours trying to debug this issue again today. This of course pales into insignificance compared to the time you spend developing zsh (thanks again!), but I hope you understand that I can only spend more time on it next weekend.
Feel free to ask me to try anything that helps with debugging, though!

Cheers,

Dennis
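For readers without Python at hand, the same reproducer file can be produced with standard shell tools. This sketch writes to a path passed as an argument instead of directly to ~/.zshrc (back up the real one first, as with `~/zshrc-good` above); `make_repro_zshrc` is a made-up name, and `head -c` is assumed to be available (GNU coreutils and the BSDs provide it, but POSIX does not require it). The size should equal the 16777300 that Python's `write()` returned: a 23-byte prefix, 16777216 (1 << 24) copies of `a`, then 4 + 44 + 13 bytes for the remaining three pieces.

```shell
#!/bin/sh
# Hypothetical helper: write the large-function reproducer .zshrc to "$1".
# Byte for byte the same text as the Python one-liner in the session above.
make_repro_zshrc() {
  out=$1
  {
    printf '%s' 'function fun() { echo "'                     # 23 bytes
    head -c 16777216 /dev/zero | tr '\0' 'a'                  # 1 << 24 bytes of 'a'
    printf '%s\n' '" }'                                       # 4 bytes
    printf '%s\n' 'source "/usr/share/zsh-antigen/antigen.zsh"' # 44 bytes
    printf '%s' 'antigen apply'                               # 13 bytes, no newline
  } > "$out"
}
```

Usage: `make_repro_zshrc /tmp/zshrc-repro && wc -c < /tmp/zshrc-repro`, then copy it over ~/.zshrc only when you are ready to reproduce the crash.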
* Re: TRAPINT doesn't work reliably
From: Dennis Schwartz @ 2019-09-28 18:58 UTC
To: Daniel Shahaf, zsh-users

On Saturday, September 28, 2019 6:21 PM, Dennis Schwartz <dennis.schwartz@protonmail.com> wrote:

> On Saturday, September 28, 2019 2:29 PM, Daniel Shahaf d.s@daniel.shahaf.name wrote:
>
> > > > Does removing that assignment make a difference?
> > >
> > > No, the bug triggers for any TRAPINT function I've tried so far.
> >
> > Have you tried an empty function, «TRAPINT () {}»?
> > Is there any reason to also try a «trap ':' INT»?
>
> Both do nothing. It seems like TRAPINT needs to contain at least one
> command or a return statement.

Sorry, I only now realize that `trap ':' INT` actually overcomes the problem. I can now set

    trap 'VIMODE="$VIINS"; return 130' INT

and that actually doesn't trigger the bug. Neither does

    mytrap () {
      VIMODE="$VIINS"
      return 130
    }
    trap mytrap INT

but that neither changes the VIMODE variable in my prompt nor returns 130. (In both cases, $1 is always empty.)

Cheers,

Dennis
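The `return 130` above and the `return $(( 128 + $1 ))` in the original TRAPINT follow the common shell convention that a command killed by signal N reports exit status 128 + N; SIGINT is signal 2, so 128 + 2 = 130. The POSIX sh sketch below (helper names are made up; nothing in it is zsh-specific) computes that value and checks the convention empirically. It uses SIGTERM (15) for the check rather than SIGINT, because SIGINT may be ignored when a script runs as a background job.

```shell
#!/bin/sh
# Exit status a shell reports for a command terminated by signal number $1.
signal_status() {
  echo $((128 + $1))
}

# Empirical check: a child shell sends SIGTERM (15) to itself, is terminated
# by the default action, and the parent observes 128 + 15 = 143 in $?.
observed_term_status() {
  sh -c 'kill -s TERM $$'
  echo $?
}
```

So `signal_status 2` gives the 130 used in the traps above, matching what `$(( 128 + $1 ))` evaluates to when TRAPINT receives signal number 2.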
* Re: TRAPINT doesn't work reliably
From: Bart Schaefer @ 2019-09-28 16:00 UTC
To: Dennis Schwartz; +Cc: Daniel Shahaf, Peter Stephenson, zsh-users

On Sat, Sep 28, 2019 at 4:17 AM Dennis Schwartz
<dennis.schwartz@protonmail.com> wrote:
>
> * On my system (Debian 10), I need to compile zsh with the version
> number from my default Debian installation. So I always do
> `git checkout zsh-5.7.1 -- Config/version.mk` before I compile.

So, you should definitely STOP doing that. It's only creating confusion.

The version number from version.mk determines three things:
1. The function load path
2. The compiled module load path
3. The format of "compiled" function definitions from .zwc files

and as a corollary to #3, whether zsh will load the .zwc file at all, because it compares a version number embedded in the file to the version number of the compiled zsh.

If you compile version X.Y.Z of zsh with the version.mk from version P.D.Q, particularly on a host where P.D.Q was previously (or is currently) installed, you are extremely likely to either be linking with an incompatible shared object file, or loading a .zwc file whose bytecode is garbage to the internals of your newly compiled binary.

Either of those things could be causing the crashes you are seeing, or cause valgrind to generate results that have no real relationship to the original problem.

This part --

> * `zsh` needs to be started twice.
> * The first time the bug cannot be triggered.
> * The second time the bug can be triggered by typing a character and
> then hitting TAB to autocomplete. Now hit Ctrl+C to interrupt.
-- suggests very strongly that this is related to loading an incorrect version of a compiled function as a result of the .zcompdump file having been updated, or some similar automatic configuration update, probably (as you suggest) being performed by antigen.

To get anywhere with this, we need a zsh that is compiled entirely consistently, not with bits and pieces of different versions. Either check out the entire git revision matching your OS version, not just the version number file, or run the entire test with the most recent version, including the correct version.mk for that build.

I would also suggest that you go back to the configuration where you first observed the problem (i.e., do NOT use a custom-compiled binary) and start zsh with

  zsh -o sourcetrace

which will show you where all the configuration files are being found. You can then compare that to "zsh -o sourcetrace" from your newly compiled binary to determine which files are the same and which are different in the event that the bug behavior changes with the new build.

If after correcting the build process you STILL observe that zsh must be started twice, comparing sourcetrace from the first and second runs may also be informative.
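Comparing `zsh -o sourcetrace` output between runs, as suggested here, is easy to mechanise once the traces are saved to files. The sketch assumes trace lines of the form shown in Dennis's reply (`+FILE:LINE> <sourcetrace>`); `sourced_files` and `compare_traces` are hypothetical helpers. Applied to Dennis's first (working) and second (failing) runs from that reply, the only difference it reports is that the first run sourced `~/.antigen/.zcompdump` one extra time, before `init.zsh`.

```shell
#!/bin/sh
# Reduce `zsh -o sourcetrace` output (read on stdin) to one sourced file per
# line, preserving order. Assumes lines like "+/path/to/file:1> <sourcetrace>".
sourced_files() {
  sed -n 's/^+\(.*\):[0-9][0-9]*> <sourcetrace>$/\1/p'
}

# Diff the ordered sourced-file lists of two saved traces.
# Returns diff's status: 0 = same files in same order, 1 = they differ.
compare_traces() {
  a=$(mktemp) && b=$(mktemp) || return 2
  sourced_files < "$1" > "$a"
  sourced_files < "$2" > "$b"
  diff "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```

Because the helper returns diff's exit status, it can also serve as a yes/no check in a script, e.g. `compare_traces run1.txt run2.txt >/dev/null || echo "startup files differ"`.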
* Re: TRAPINT doesn't work reliably
From: Peter Stephenson @ 2019-09-29 16:54 UTC
To: zsh-users

On Sat, 2019-09-28 at 09:00 -0700, Bart Schaefer wrote:
> On Sat, Sep 28, 2019 at 4:17 AM Dennis Schwartz
> <dennis.schwartz@protonmail.com> wrote:
> >
> > * On my system (Debian 10), I need to compile zsh with the version
> > number from my default Debian installation. So I always do
> > `git checkout zsh-5.7.1 -- Config/version.mk` before I compile.
>
> So, you should definitely STOP doing that. It's only creating confusion.

This is an important point I missed --- if you're using wordcode compiled files this will be a disaster, as Bart notes.

Also, if you're using wordcode compiled files, that could be a reason why using the trap command behaves differently from defining a TRAPINT function --- although the actual behaviour depends on the utilities you're using, it's much more typical to save functions that way than traps defined the other way.

pws
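One way to avoid the mixed build described in the last two messages is to refuse to build when `Config/version.mk` no longer matches the commit that is checked out, which is exactly the state `git checkout zsh-5.7.1 -- Config/version.mk` on `master` produces. A hedged sketch follows; `check_version_mk` is a hypothetical helper that relies only on standard git (`git diff --quiet HEAD -- PATH` exits 0 when the file matches HEAD, 1 when it differs, and something else on error). It does not detect stale `.zwc` files or modules left over from an older install, which still have to be dealt with separately.

```shell
#!/bin/sh
# Hypothetical pre-build guard for a zsh git checkout: fail if
# Config/version.mk differs from the version recorded in HEAD.
# Usage: check_version_mk [REPO_DIR]
check_version_mk() {
  repo=${1:-.}
  git -C "$repo" diff --quiet HEAD -- Config/version.mk
  case $? in
    0) echo "ok: Config/version.mk matches HEAD" ;;
    1) echo "Config/version.mk differs from HEAD; build from one consistent revision" >&2
       return 1 ;;
    *) echo "git diff failed (is $repo a git checkout?)" >&2
       return 2 ;;
  esac
}
```

Running it as the first step of a local build script (before `./configure`) makes the inconsistency visible before any binary or `.zwc` file is produced.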
* Re: TRAPINT doesn't work reliably
From: Peter Stephenson @ 2019-09-27 19:05 UTC
To: zsh-users; +Cc: Dennis Schwartz

On Thu, 2019-09-26 at 17:10 +0000, Dennis Schwartz wrote:
> I don't fully understand what you mean with "the logic where you're
> defining TRAPINT," but I have the following code in my `.zshrc`:
>
> function TRAPINT {
>   VIMODE="$VIINS"
>   print $1 # for debug only
>   return $(( 128 + $1 ))
> }

I was just wondering if there's more structure than that around, but I think I was reading too much into what I suspect (see below) is actually irrelevant information.

> I did manage to capture the bug with valgrind on `master` using the
> following sequence of commands (output tidied):
>
> $ git checkout master
> $ git checkout zsh-5.7.1 -- Config/version.mk
> $ ./configure --enable-zsh-debug && make && sudo make install
> $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
> /usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
> $ ll
> TRAPINT:1: not an identifier:

Thanks, this is exactly what I was asking for. Obviously TRAPINT is getting screwed up somehow. Unfortunately, I think the damage may have been done too early for this to tell us where.

> Invalid read of size 1
>    at 0x4838CC2: __strlen_sse2 (vg_replace_strmem.c:462)
>    by 0x1B0792: dupstring (string.c:39)
>    by 0x19BC70: ecgetstr (parse.c:2809)
>    by 0x144095: addvars (exec.c:2429)
>    by 0x1404DB: execsimple (exec.c:1237)
>    by 0x140A85: execlist (exec.c:1378)
>    by 0x14038F: execode (exec.c:1194)
>    by 0x14DCB0: runshfunc (exec.c:5980)
>    by 0x14D2E8: doshfunc (exec.c:5830)
>    by 0x1AF4D1: dotrapargs (signals.c:1371)
>    by 0x1AFA8F: dotrap (signals.c:1487)
>    by 0x1AF18C: handletrap (signals.c:1202)

This is saying it's trying to execute your trap.
It's getting into trouble when it's trying to read in the variable assignment from the trap. Either that's the VIMODE="$VIINS" chunk that's been messed up, or it's already got confused and is guessing what's going on.

I would suspect that actually the main function structure is still there, since it's otherwise quite unlikely to negotiate the exec hierarchy down to addvars(). However, it's possible it's also been erroneously freed but malloc has only grabbed the assignment part of it for reuse so far.

Does removing that assignment make a difference? That's just for testing, obviously. But given the shell obviously is trying to do an assignment and that's gone AWOL, it might tell us something. (If, for example, the error now occurs somewhere a bit later it might indicate that indeed the entire function has been freed and malloc() is repurposing the memory piecemeal.)

> Address 0x566b948 is 0 bytes after a block of size 328 free'd
>    at 0x48369AB: free (vg_replace_malloc.c:530)
>    by 0x13D8F3: zcontext_restore_partial (context.c:108)
>    by 0x13DA56: zcontext_restore (context.c:119)
>    by 0x175A04: parse_subscript (lex.c:1697)
>    by 0x18B7F1: getindex (params.c:1858)
>    by 0x18C132: fetchvalue (params.c:2106)
>    by 0x1B6304: paramsubst (subst.c:2516)
>    by 0x1B1DB9: stringsubst (subst.c:322)
>    by 0x1B1108: prefork (subst.c:142)
>    by 0x14486C: execsubst (exec.c:2570)
>    by 0x1772E9: execfor (loop.c:98)
>    by 0x148469: execcmd_exec (exec.c:3913)
> Block was alloc'd at
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x13D5D6: zcontext_save_partial (context.c:58)
>    by 0x13D7E9: zcontext_save (context.c:82)
>    by 0x1758A7: parse_subscript (lex.c:1661)
>    by 0x18B7F1: getindex (params.c:1858)
>    by 0x18C132: fetchvalue (params.c:2106)
>    by 0x1B6304: paramsubst (subst.c:2516)
>    by 0x1B1DB9: stringsubst (subst.c:322)
>    by 0x1B1108: prefork (subst.c:142)
>    by 0x14486C: execsubst (exec.c:2570)
>    by 0x1772E9: execfor (loop.c:98)
>    by 0x148469: execcmd_exec (exec.c:3913)

So this stuff is saying, when we performed a substitution we had to save and restore some memory and we used the chunk that valgrind reported the error on. In other words, it had apparently been freed somewhere else already, so malloc() just grabbed it. So I don't think the code being executed here is actually relevant to the original problem, it's just the unlucky victim that got a chunk that shouldn't have been freed in the first place.

Unfortunately this doesn't tell us where that happened. But it does look like it was actually freed, i.e. the problem isn't that something is stomping on memory owned by something else, it's that the memory was erroneously given back to the system. (At least, that's the simple interpretation.)

Not sure quite where to go from here --- but at least we have something that's reproducible, which is quite good by the standards of memory errors. I think we'll need to add something to the code you're using that marks the memory in the TRAPINT somehow. I'll need to think what seems propitious...

First simple step might be to see if the shell is indeed freeing the TRAPINT() function code at some point. That shouldn't be so hard to find out but it'll need a bit of confection.

cheers
pws
end of thread

Thread overview: 21+ messages
2019-09-17 16:47 TRAPINT doesn't work reliably Dennis Schwartz
2019-09-24  8:44 Peter Stephenson
2019-09-25 13:02 Dennis Schwartz
2019-09-25 14:01 Peter Stephenson
2019-09-25 16:25 Dennis Schwartz
2019-09-25 17:04 Peter Stephenson
2019-09-25 18:46 Daniel Shahaf
2019-09-26 15:27 Peter Stephenson
2019-09-27 13:43 Daniel Shahaf
2019-09-25 17:56 Peter Stephenson
2019-09-26 14:48 Dennis Schwartz
2019-09-26 15:25 Peter Stephenson
2019-09-26 17:10 Dennis Schwartz
2019-09-27 13:46 Daniel Shahaf
2019-09-28 11:16 Dennis Schwartz
2019-09-28 14:29 Daniel Shahaf
2019-09-28 18:21 Dennis Schwartz
2019-09-28 18:58 Dennis Schwartz
2019-09-28 16:00 Bart Schaefer
2019-09-29 16:54 Peter Stephenson
2019-09-27 19:05 Peter Stephenson