zsh-users
* TRAPINT doesn't work reliably
@ 2019-09-17 16:47 ` Dennis Schwartz
  2019-09-24  8:44   ` Peter Stephenson
  0 siblings, 1 reply; 21+ messages in thread
From: Dennis Schwartz @ 2019-09-17 16:47 UTC (permalink / raw)
  To: zsh-users

Hi,

I have a TRAPINT function in my .zshrc, as described in the Zsh manual [1].

    TRAPINT() {
        echo "trap: $1"
        return $(( 128 + $1 ))
    }


This works unreliably. It usually works the first few times, but after a while it stops working and throws the following error:

TRAPINT:1: command not found: \M-^A^A
TRAPINT:2: command not found: F^\V


This function used to work flawlessly in Zsh 5.3.1 (Debian stretch). I only encounter this issue in 5.7.1.

Is this a regression that might have been introduced, or is there maybe something else wrong in my (other) configuration?


Thanks,


Dennis

[1] http://zsh.sourceforge.net/Doc/Release/Functions.html#index-trapping-signals

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: TRAPINT doesn't work reliably
  2019-09-17 16:47 ` TRAPINT doesn't work reliably Dennis Schwartz
@ 2019-09-24  8:44   ` Peter Stephenson
  2019-09-25 13:02     ` Dennis Schwartz
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Stephenson @ 2019-09-24  8:44 UTC (permalink / raw)
  To: zsh-users; +Cc: Dennis Schwartz

On Tue, 2019-09-17 at 16:47 +0000, Dennis Schwartz wrote:
> Hi,
> 
> I have a TRAPINT function in my .zshrc, as described in the Zsh manual [1].
> 
>     TRAPINT() {
>         echo "trap: $1"
>         return $(( 128 + $1 ))
>     }
> 
> 
> This works unreliably. It usually works the first few times, but after a while it stops working and throws the following error:
> 
> TRAPINT:1: command not found: \M-^A^A
> TRAPINT:2: command not found: F^\V

This certainly isn't likely to be anything you've done wrong, at least based on
what you've told us.

It smells of memory management problems, but it's hard to see where the corruption
would be.

What do you see if you run

functions TRAPINT

after the problem has turned up?

pws

* Re: TRAPINT doesn't work reliably
  2019-09-24  8:44   ` Peter Stephenson
@ 2019-09-25 13:02     ` Dennis Schwartz
  2019-09-25 14:01       ` Peter Stephenson
  0 siblings, 1 reply; 21+ messages in thread
From: Dennis Schwartz @ 2019-09-25 13:02 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-users

On Tuesday, September 24, 2019 10:44 AM, Peter Stephenson <p.stephenson@samsung.com> wrote:

> On Tue, 2019-09-17 at 16:47 +0000, Dennis Schwartz wrote:
>
> > I have a TRAPINT function in my .zshrc, as described in the Zsh manual [1].
> >  
> >      TRAPINT() {
> >          echo "trap: $1"
> >          return $(( 128 + $1 ))
> >      }
> >  
> >  
> > This works unreliably. It usually works the first few times, but after a while it stops working and throws the following error:
> >  
> > TRAPINT:1: command not found: \M-^A^A
> > TRAPINT:2: command not found: F^\V
>
> This certainly isn't likely to be anything you've done wrong, at least based on
> what you've told us.
>
> It smells of memory management problems, but it's hard to see where the corruption
> would be.
>
> What do you see if you run
>
> functions TRAPINT
>
> after the problem has turned up?

It almost looks like the function gets replaced with random memory.
`functions TRAPINT` just shows random bytes, for example:

$ xxd <(functions TRAPINT)
00000000: 5452 4150 494e 5420 2829 207b 0a09 0701  TRAPINT () {....
00000010: 200a 0950 200a 7d0a                       ..P .}.


I am now more convinced it's a bug in Zsh. Any advice on how to debug this?
And where is the best place to submit a bug report?


Dennis

* Re: TRAPINT doesn't work reliably
  2019-09-25 13:02     ` Dennis Schwartz
@ 2019-09-25 14:01       ` Peter Stephenson
  2019-09-25 16:25         ` Dennis Schwartz
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Stephenson @ 2019-09-25 14:01 UTC (permalink / raw)
  To: Dennis Schwartz; +Cc: zsh-users


> On 25 September 2019 at 14:02 Dennis Schwartz <dennis.schwartz@protonmail.com> wrote:
> On Tuesday, September 24, 2019 10:44 AM, Peter Stephenson <p.stephenson@samsung.com> wrote:
>> On Tue, 2019-09-17 at 16:47 +0000, Dennis Schwartz wrote:
>>> I have a TRAPINT function in my .zshrc, as described in the Zsh manual [1].
>>>  
>>>      TRAPINT() {
>>>          echo "trap: $1"
>>>          return $(( 128 + $1 ))
>>>      }
>>>
>>>
>>> This works unreliably. It usually works the first few times, but
>>> after a while it stops working and throws the following
>>> error:
>>>
>>> TRAPINT:1: command not found: \M-^A^A
>>> TRAPINT:2: command not found: F^\V
>
> It almost looks like the function gets replaced with random memory.
> `functions TRAPINT` just shows random bytes, for example:
> 
> $ xxd <(functions TRAPINT)
> 00000000: 5452 4150 494e 5420 2829 207b 0a09 0701  TRAPINT () {....
> 00000010: 200a 0950 200a 7d0a                       ..P .}.
>
> I am now more convinced it's a bug in Zsh. Any advice on how to debug this?
> And where is the best place to submit a bug report?

You don't need to submit a further separate bug report.

Memory errors are tricky, and often hard to reproduce since allocation
is heavily OS specific, but probably your best bet is to run with

valgrind --leak-check=full zsh

which should produce sensible results --- the shell shouldn't leak
memory and anything that looks anomalous is probably a real bug ("still
reachable" memory is OK).

I'd also suggest trying the latest source from git or sourceforge,
since there have been some memory fixes (and a release is probably
overdue).

cheers
pws

* Re: TRAPINT doesn't work reliably
  2019-09-25 14:01       ` Peter Stephenson
@ 2019-09-25 16:25         ` Dennis Schwartz
  2019-09-25 17:04           ` Peter Stephenson
  2019-09-25 17:56           ` Peter Stephenson
  0 siblings, 2 replies; 21+ messages in thread
From: Dennis Schwartz @ 2019-09-25 16:25 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-users

On Wednesday, September 25, 2019 4:01 PM, Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:

> > On 25 September 2019 at 14:02 Dennis Schwartz dennis.schwartz@protonmail.com wrote:
> >
> > I am now more convinced it's a bug in Zsh. Any advice on how to debug this?
> > And where is the best place to submit a bug report?
>
> You don't need to submit a further separate bug report.

Okay, thanks.

> Memory errors are tricky, and often hard to reproduce since allocation
> is heavily OS specific, but probably your best bet is to run with
>
> valgrind --leak-check=full zsh
>
> which should produce sensible results --- the shell shouldn't leak
> memory and anything that looks anomalous is probably a real bug ("still
> reachable" memory is OK).

I ran valgrind on zsh and captured the error. Unfortunately, I am
inexperienced with C programming, so I do not know how to interpret the
output. I've copied the part of the output that I believe is relevant
below. Please let me know if I can help debug it further.


==1896== Invalid read of size 1
==1896==    at 0x483BC62: strlen (vg_replace_strmem.c:460)
==1896==    by 0x19755E: dupstring (in /usr/bin/zsh)
==1896==    by 0x138F3B: ??? (in /usr/bin/zsh)
==1896==    by 0x144663: ??? (in /usr/bin/zsh)
==1896==    by 0x141A72: execlist (in /usr/bin/zsh)
==1896==    by 0x141D83: execode (in /usr/bin/zsh)
==1896==    by 0x142C8B: runshfunc (in /usr/bin/zsh)
==1896==    by 0x1431C8: doshfunc (in /usr/bin/zsh)
==1896==    by 0x1963C2: ??? (in /usr/bin/zsh)
==1896==    by 0x19413B: dotrap (in /usr/bin/zsh)
==1896==    by 0x194247: ??? (in /usr/bin/zsh)
==1896==    by 0x194661: zhandler (in /usr/bin/zsh)
==1896==  Address 0x5fc8488 is 264 bytes inside a block of size 328 free'd
==1896==    at 0x48399AB: free (vg_replace_malloc.c:530)
==1896==    by 0x136C8E: zcontext_restore_partial (in /usr/bin/zsh)
==1896==    by 0x1656C3: parse_subscript (in /usr/bin/zsh)
==1896==    by 0x17A446: getindex (in /usr/bin/zsh)
==1896==    by 0x17ABCF: fetchvalue (in /usr/bin/zsh)
==1896==    by 0x19BDB0: ??? (in /usr/bin/zsh)
==1896==    by 0x1A0C87: prefork (in /usr/bin/zsh)
==1896==    by 0x13ABE6: execsubst (in /usr/bin/zsh)
==1896==    by 0x1674CB: execfor (in /usr/bin/zsh)
==1896==    by 0x13E44C: ??? (in /usr/bin/zsh)
==1896==    by 0x13FB6E: ??? (in /usr/bin/zsh)
==1896==    by 0x13FF11: ??? (in /usr/bin/zsh)
==1896==  Block was alloc'd at
==1896==    at 0x483877F: malloc (vg_replace_malloc.c:299)
==1896==    by 0x136A13: zcontext_save_partial (in /usr/bin/zsh)
==1896==    by 0x165622: parse_subscript (in /usr/bin/zsh)
==1896==    by 0x17A446: getindex (in /usr/bin/zsh)
==1896==    by 0x17ABCF: fetchvalue (in /usr/bin/zsh)
==1896==    by 0x19BDB0: ??? (in /usr/bin/zsh)
==1896==    by 0x1A0C87: prefork (in /usr/bin/zsh)
==1896==    by 0x13ABE6: execsubst (in /usr/bin/zsh)
==1896==    by 0x1674CB: execfor (in /usr/bin/zsh)
==1896==    by 0x13E44C: ??? (in /usr/bin/zsh)
==1896==    by 0x13FB6E: ??? (in /usr/bin/zsh)
==1896==    by 0x13FF11: ??? (in /usr/bin/zsh)
==1896==
==1896== Invalid read of size 1

[... repeated 4 more times ...]

==2144==
==2144== HEAP SUMMARY:
==2144==     in use at exit: 1,583,276 bytes in 35,151 blocks
==2144==   total heap usage: 78,390 allocs, 43,239 frees, 12,292,102 bytes allocated
==2144==
==2144== LEAK SUMMARY:
==2144==    definitely lost: 0 bytes in 0 blocks
==2144==    indirectly lost: 0 bytes in 0 blocks
==2144==      possibly lost: 0 bytes in 0 blocks
==2144==    still reachable: 1,583,276 bytes in 35,151 blocks
==2144==         suppressed: 0 bytes in 0 blocks
==2144== Reachable blocks (those to which a pointer was found) are not shown.
==2144== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2144==
==2144== For counts of detected and suppressed errors, rerun with: -v
==2144== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)

> I'd also suggest trying the latest source from git or sourceforge,
> since there have been some memory fixes (and a release is probably
> overdue).

I haven't tried compiling from the latest source code yet. If this is
desired I could try this again at a later point in time.

- Dennis

* Re: TRAPINT doesn't work reliably
  2019-09-25 16:25         ` Dennis Schwartz
@ 2019-09-25 17:04           ` Peter Stephenson
  2019-09-25 18:46             ` Daniel Shahaf
  2019-09-25 17:56           ` Peter Stephenson
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Stephenson @ 2019-09-25 17:04 UTC (permalink / raw)
  To: zsh-users; +Cc: Dennis Schwartz

On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
> On Wednesday, September 25, 2019 4:01 PM, Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> I ran valgrind on zsh and captured the error. Unfortunately, I am
> inexperienced with C programming, so I do not know how to interpret the
> output. I've copied the part of the output that I believe is relevant
> below. Please let me know if I can help debug it further.

Unfortunately, you haven't got debug symbols in the installed zsh, so
it's not showing much of interest --- though it does certainly suggest
something is up.

> > I'd also suggest trying the latest source from git or sourceforge,
> > since there have been some memory fixes (and a release is probably
> > overdue).
> I haven't tried compiling from the latest source code yet. If this is
> desired I could try this again at a later point in time.

I suspect that's going to have to be the next step, if you get the
chance.  In the top-level directory, run configure as

./configure --enable-zsh-debug

and then "make", and that should give you an installable executable that
gives useful debug information ("sudo make install" will put zsh in
/usr/local/bin; you can remove anything it installs in /usr/local later).

Thanks for the assistance
pws


* Re: TRAPINT doesn't work reliably
  2019-09-25 16:25         ` Dennis Schwartz
  2019-09-25 17:04           ` Peter Stephenson
@ 2019-09-25 17:56           ` Peter Stephenson
  2019-09-26 14:48             ` Dennis Schwartz
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Stephenson @ 2019-09-25 17:56 UTC (permalink / raw)
  To: zsh-users; +Cc: dennis.schwartz

On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
> ==1896==  Block was alloc'd at
> ==1896==    at 0x483877F: malloc (vg_replace_malloc.c:299)
> ==1896==    by 0x136A13: zcontext_save_partial (in /usr/bin/zsh)
> ==1896==    by 0x165622: parse_subscript (in /usr/bin/zsh)
> ==1896==    by 0x17A446: getindex (in /usr/bin/zsh)
> ==1896==    by 0x17ABCF: fetchvalue (in /usr/bin/zsh)
> ==1896==    by 0x19BDB0: ??? (in /usr/bin/zsh)
> ==1896==    by 0x1A0C87: prefork (in /usr/bin/zsh)
> ==1896==    by 0x13ABE6: execsubst (in /usr/bin/zsh)
> ==1896==    by 0x1674CB: execfor (in /usr/bin/zsh)
> ==1896==    by 0x13E44C: ??? (in /usr/bin/zsh)
> ==1896==    by 0x13FB6E: ??? (in /usr/bin/zsh)
> ==1896==    by 0x13FF11: ??? (in /usr/bin/zsh)
> ==1896==
> ==1896== Invalid read of size 1

One kind of interesting thing here is there's some suggestion that the
original allocation was not simply in top level code, but in some kind
of block (at least a for block) --- assuming, of course, this is
relevant.  Is there some structure about the point where you're setting
up the trap?  If so, does changing it (making it simpler) have any
effect on the problem?

Cheers
pws


* Re: TRAPINT doesn't work reliably
  2019-09-25 17:04           ` Peter Stephenson
@ 2019-09-25 18:46             ` Daniel Shahaf
  2019-09-26 15:27               ` Peter Stephenson
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel Shahaf @ 2019-09-25 18:46 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-users, Dennis Schwartz

Peter Stephenson wrote on Wed, Sep 25, 2019 at 18:04:45 +0100:
> On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
> > I haven't tried compiling from the latest source code yet. If this is
> > desired I could try this again at a later point in time.
> 
> I suspect that's going to have to be the next step, if you get the
> chance.  In the top-level directory, run configure as
> 
> ./configure --enable-zsh-debug
> 

Should Dennis use any of these flags as well? —

[[[
% ./configure --help=short | vipe
  --enable-zsh-mem        compile with zsh memory allocation routines
  --enable-zsh-mem-debug  debug zsh memory allocation routines
  --enable-zsh-mem-warning
                          print warnings for errors in memory allocation
  --enable-zsh-secure-free
                          turn on error checking for free()
  --enable-zsh-heap-debug turn on error checking for heap allocation
  --enable-zsh-valgrind   turn on support for valgrind debugging of heap
                          memory
  --enable-zsh-hash-debug turn on debugging of internal hash tables
  --enable-stack-allocation
                          allocate stack memory e.g. with `alloca'
]]]

* Re: TRAPINT doesn't work reliably
  2019-09-25 17:56           ` Peter Stephenson
@ 2019-09-26 14:48             ` Dennis Schwartz
  2019-09-26 15:25               ` Peter Stephenson
  0 siblings, 1 reply; 21+ messages in thread
From: Dennis Schwartz @ 2019-09-26 14:48 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-users

On Wednesday, September 25, 2019 7:56 PM, Peter Stephenson <p.stephenson@samsung.com> wrote:

> On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
>
> > ==1896==  Block was alloc'd at
> > ==1896==    at 0x483877F: malloc (vg_replace_malloc.c:299)
> > ==1896==    by 0x136A13: zcontext_save_partial (in /usr/bin/zsh)
> > ==1896==    by 0x165622: parse_subscript (in /usr/bin/zsh)
> > ==1896==    by 0x17A446: getindex (in /usr/bin/zsh)
> > ==1896==    by 0x17ABCF: fetchvalue (in /usr/bin/zsh)
> > ==1896==    by 0x19BDB0: ??? (in /usr/bin/zsh)
> > ==1896==    by 0x1A0C87: prefork (in /usr/bin/zsh)
> > ==1896==    by 0x13ABE6: execsubst (in /usr/bin/zsh)
> > ==1896==    by 0x1674CB: execfor (in /usr/bin/zsh)
> > ==1896==    by 0x13E44C: ??? (in /usr/bin/zsh)
> > ==1896==    by 0x13FB6E: ??? (in /usr/bin/zsh)
> > ==1896==    by 0x13FF11: ??? (in /usr/bin/zsh)
> > ==1896==
> > ==1896== Invalid read of size 1
>
> One kind of interesting thing here is there's some suggestion that the
> original allocation was not simply in top level code, but in some kind
> of block (at least a for block) --- assuming, of course, this is
> relevant.  Is there some structure about the point where you're setting
> up the trap?  If so, does changing it (making it simpler) have any
> effect on the problem?

Okay, after quite some time debugging (I will spare you the details of
everything I've tried), I can now reliably reproduce the bug. However, I
lack the knowledge of zsh to understand what is causing it.

To trigger the bug, I just open a fresh shell (e.g. run `zsh`), type
`ls`, and hit TAB to trigger autocompletion.

However, I can only reproduce the bug if I have the following code in my
`~/.zshrc`:

    # Antigen zsh plugins
    if [ -f "/usr/share/zsh-antigen/antigen.zsh" ]; then
        source "/usr/share/zsh-antigen/antigen.zsh"

        # load some plugins here, but they are not relevant to trigger
        # the bug
    fi

So, I conditionally `source` another file. Apparently, this is causing
*super weird* behavior. Unbelievably, if I open `.zshrc` in an editor
(e.g., vim/gedit) and _save_ it, I cannot trigger the bug. However, if I
open the file but _do not save_ it, I always trigger the bug.

To complicate it further, I can trigger the bug when I compile
`zsh-5.7.1` from source, but I can no longer trigger it once VERSION in
`Config/version.mk` is updated (i.e., in the next commit).

These findings leave me totally confused. Not sure if it's relevant but
I mount my home folder with the `noatime` option.

Any ideas? Thanks for your help!


Cheers,
Dennis

* Re: TRAPINT doesn't work reliably
  2019-09-26 14:48             ` Dennis Schwartz
@ 2019-09-26 15:25               ` Peter Stephenson
  2019-09-26 17:10                 ` Dennis Schwartz
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Stephenson @ 2019-09-26 15:25 UTC (permalink / raw)
  To: Dennis Schwartz; +Cc: zsh-users

On Thu, 2019-09-26 at 14:48 +0000, Dennis Schwartz wrote:
> However, I can only reproduce the bug if I have the following code in my
> `~/.zshrc`:
> 
>     # Antigen zsh plugins
>     if [ -f "/usr/share/zsh-antigen/antigen.zsh" ]; then
>         source "/usr/share/zsh-antigen/antigen.zsh"
> 
>         # load some plugins here, but they are not relevant to trigger
>         # the bug
>     fi
> 
> So, I conditionally `source` another file. Apparently, this is causing
> *super weird* behavior. Unbelievably, if I open `.zshrc` in an editor
> (e.g., vim/gedit) and _save_ it, I cannot trigger the bug. However, if I
> open the file but _do not save_ it, I always trigger the bug.

This is very much the sort of weirdness you get with memory errors,
unfortunately.  They're extremely sensitive to what was allocated and
deallocated where --- some piece of memory allocated for one purpose is
presumably being erroneously freed and reused, and as far as the structure of
your zsh code is concerned there's no actual logical relationship between
the places --- they are just getting mixed up in the bowels of the
allocation functions.

It suggests it's going to be quite hard to reproduce elsewhere, though
I'd still be interested in the logic where you're defining TRAPINT, since
clearly that's the memory that's getting mishandled.

It also still suggests that getting valgrind to give a bit more
detail is the best way forward.

cheers
pws


* Re: TRAPINT doesn't work reliably
  2019-09-25 18:46             ` Daniel Shahaf
@ 2019-09-26 15:27               ` Peter Stephenson
  2019-09-27 13:43                 ` Daniel Shahaf
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Stephenson @ 2019-09-26 15:27 UTC (permalink / raw)
  To: Daniel Shahaf; +Cc: zsh-users, Dennis Schwartz

On Wed, 2019-09-25 at 18:46 +0000, Daniel Shahaf wrote:
> Peter Stephenson wrote on Wed, Sep 25, 2019 at 18:04:45 +0100:
> > 
> > On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
> > > 
> > > I haven't tried compiling from the latest source code yet. If this is
> > > desired I could try this again at a later point in time.
> > I suspect that's going to have to be the next step, if you get the
> > chance.  In the top-level directory, run configure as
> > 
> > ./configure --enable-zsh-debug
> > 
> Should Dennis use any of these flags as well? —

It's not clear anything else is going to help debugging, certainly
if the build that's showing the problem was made (as almost all
distro builds are made) using the system allocators.  In that
case none of the zsh memory specials apply, and if we turn
on zsh memory management we are in a different world --- which
might show the problem but could well perturb it somewhere
completely different.

cheers
pws


* Re: TRAPINT doesn't work reliably
  2019-09-26 15:25               ` Peter Stephenson
@ 2019-09-26 17:10                 ` Dennis Schwartz
  2019-09-27 13:46                   ` Daniel Shahaf
  2019-09-27 19:05                   ` Peter Stephenson
  0 siblings, 2 replies; 21+ messages in thread
From: Dennis Schwartz @ 2019-09-26 17:10 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-users

On Thursday, September 26, 2019 5:25 PM, Peter Stephenson <p.stephenson@samsung.com> wrote:

> It suggests it's going to be quite hard to reproduce elsewhere, though
> I'd still be interested in the logic where you're defining TRAPINT, since
> clearly that's the memory that's getting mishandled.

I don't fully understand what you mean by "the logic where you're
defining TRAPINT," but I have the following code in my `.zshrc`:

    function TRAPINT {
        VIMODE="$VIINS"
        print $1  # for debug only
        return $(( 128 + $1 ))
    }

(I use zsh with vi keybindings, and VIMODE shows on my prompt which
mode I'm in. When I interrupt, I start again in insert mode, and I want
that to be indicated properly.)


> It also still suggests that getting valgrind to give a bit more
> detail is the best way forward.

Ah, of course. I forgot about valgrind since I could only reproduce this
bug if I checked out the tag `zsh-5.7.1`.

I did manage to capture the bug with valgrind on `master` using the
following sequence of commands (output tidied):

$ git checkout master
$ git checkout zsh-5.7.1 -- Config/version.mk
$ ./configure --enable-zsh-debug && make && sudo make install
$ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
/usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
$ ll
TRAPINT:1: not an identifier:

Here, `ll [TAB]` was executed in the new shell. I don't get the error
message "/usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'"
when I start zsh without valgrind, so I guess that can be ignored.


What valgrind captured:

Invalid read of size 1
   at 0x4838CC2: __strlen_sse2 (vg_replace_strmem.c:462)
   by 0x1B0792: dupstring (string.c:39)
   by 0x19BC70: ecgetstr (parse.c:2809)
   by 0x144095: addvars (exec.c:2429)
   by 0x1404DB: execsimple (exec.c:1237)
   by 0x140A85: execlist (exec.c:1378)
   by 0x14038F: execode (exec.c:1194)
   by 0x14DCB0: runshfunc (exec.c:5980)
   by 0x14D2E8: doshfunc (exec.c:5830)
   by 0x1AF4D1: dotrapargs (signals.c:1371)
   by 0x1AFA8F: dotrap (signals.c:1487)
   by 0x1AF18C: handletrap (signals.c:1202)
 Address 0x566b948 is 0 bytes after a block of size 328 free'd
   at 0x48369AB: free (vg_replace_malloc.c:530)
   by 0x13D8F3: zcontext_restore_partial (context.c:108)
   by 0x13DA56: zcontext_restore (context.c:119)
   by 0x175A04: parse_subscript (lex.c:1697)
   by 0x18B7F1: getindex (params.c:1858)
   by 0x18C132: fetchvalue (params.c:2106)
   by 0x1B6304: paramsubst (subst.c:2516)
   by 0x1B1DB9: stringsubst (subst.c:322)
   by 0x1B1108: prefork (subst.c:142)
   by 0x14486C: execsubst (exec.c:2570)
   by 0x1772E9: execfor (loop.c:98)
   by 0x148469: execcmd_exec (exec.c:3913)
 Block was alloc'd at
   at 0x483577F: malloc (vg_replace_malloc.c:299)
   by 0x13D5D6: zcontext_save_partial (context.c:58)
   by 0x13D7E9: zcontext_save (context.c:82)
   by 0x1758A7: parse_subscript (lex.c:1661)
   by 0x18B7F1: getindex (params.c:1858)
   by 0x18C132: fetchvalue (params.c:2106)
   by 0x1B6304: paramsubst (subst.c:2516)
   by 0x1B1DB9: stringsubst (subst.c:322)
   by 0x1B1108: prefork (subst.c:142)
   by 0x14486C: execsubst (exec.c:2570)
   by 0x1772E9: execfor (loop.c:98)
   by 0x148469: execcmd_exec (exec.c:3913)


I hope this helps. Thank you for your time and developing zsh!

Cheers,
Dennis

* Re: TRAPINT doesn't work reliably
  2019-09-26 15:27               ` Peter Stephenson
@ 2019-09-27 13:43                 ` Daniel Shahaf
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Shahaf @ 2019-09-27 13:43 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-users, Dennis Schwartz

Peter Stephenson wrote on Thu, 26 Sep 2019 15:27 +00:00:
> On Wed, 2019-09-25 at 18:46 +0000, Daniel Shahaf wrote:
> > Peter Stephenson wrote on Wed, Sep 25, 2019 at 18:04:45 +0100:
> > > 
> > > On Wed, 2019-09-25 at 16:25 +0000, Dennis Schwartz wrote:
> > > > 
> > > > I haven't tried compiling from the latest source code yet. If this is
> > > > desired I could try this again at a later point in time.
> > > I suspect that's going to have to be the next step, if you get the
> > > chance.  In the top-level directory, run configure as
> > > 
> > > ./configure --enable-zsh-debug
> > > 
> > Should Dennis use any of these flags as well? —
> 
> It's not clear anything else is going to help debugging, certainly
> if the build that's showing the problem was made (as almost all
> distro builds are made) using the system allocators.  In that
> case none of the zsh memory specials apply,

Even without any special configure flags, there's still zhalloc().  The source
of zhalloc() contains some blocks conditional on --enable-zsh-valgrind.
I assume passing that configure flag will let valgrind detect use-after-freeheap()
bugs.

Also, I thought --enable-zsh-secure-free and --enable-zsh-heap-debug were
independent of --enable-zsh-mem*.

> and if we turn on zsh memory management we are in a different world
> --- which might show the problem but could well perturb it somewhere
> completely different.

Sure, any change could make the symptoms disappear, particularly
switching to a different allocator.

Cheers,

Daniel

* Re: TRAPINT doesn't work reliably
  2019-09-26 17:10                 ` Dennis Schwartz
@ 2019-09-27 13:46                   ` Daniel Shahaf
  2019-09-28 11:16                     ` Dennis Schwartz
  2019-09-27 19:05                   ` Peter Stephenson
  1 sibling, 1 reply; 21+ messages in thread
From: Daniel Shahaf @ 2019-09-27 13:46 UTC (permalink / raw)
  To: Dennis Schwartz, Peter Stephenson; +Cc: zsh-users

Dennis Schwartz wrote on Thu, 26 Sep 2019 17:10 +00:00:
> $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
> /usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
> $ ll

What's the output of `dpkg -l zsh-antigen`?  (I'm looking for the version number.)

> TRAPINT:1: not an identifier:
> 
> Here, `ll [TAB]` was executed in the new shell. I don't get the error
> message "/usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'"
> when I start zsh without valgrind, so I guess that can be ignored.

> What valgrind captured:

Thanks!  I think there's a clue in there somewhere :)

* Re: TRAPINT doesn't work reliably
  2019-09-26 17:10                 ` Dennis Schwartz
  2019-09-27 13:46                   ` Daniel Shahaf
@ 2019-09-27 19:05                   ` Peter Stephenson
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Stephenson @ 2019-09-27 19:05 UTC (permalink / raw)
  To: zsh-users; +Cc: Dennis Schwartz

On Thu, 2019-09-26 at 17:10 +0000, Dennis Schwartz wrote:
> I don't fully understand what you mean by "the logic where you're
> defining TRAPINT," but I have the following code in my `.zshrc`:
> 
>     function TRAPINT {
>         VIMODE="$VIINS"
>         print $1  # for debug only
>         return $(( 128 + $1 ))
>     }

I was just wondering if there's more structure than that around,
but I think I was reading too much into what I suspect (see below) is
actually irrelevant information.

> I did manage to capture the bug with valgrind on `master` using the
> following sequence of commands (output tidied):
> 
> $ git checkout master
> $ git checkout zsh-5.7.1 -- Config/version.mk
> $ ./configure --enable-zsh-debug && make && sudo make install
> $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
> /usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
> $ ll
> TRAPINT:1: not an identifier:

Thanks, this is exactly what I was asking for.

Obviously TRAPINT is getting screwed up somehow.  Unfortunately, I
think the damage may have been done too early for this to tell us where.

> Invalid read of size 1
>    at 0x4838CC2: __strlen_sse2 (vg_replace_strmem.c:462)
>    by 0x1B0792: dupstring (string.c:39)
>    by 0x19BC70: ecgetstr (parse.c:2809)
>    by 0x144095: addvars (exec.c:2429)
>    by 0x1404DB: execsimple (exec.c:1237)
>    by 0x140A85: execlist (exec.c:1378)
>    by 0x14038F: execode (exec.c:1194)
>    by 0x14DCB0: runshfunc (exec.c:5980)
>    by 0x14D2E8: doshfunc (exec.c:5830)
>    by 0x1AF4D1: dotrapargs (signals.c:1371)
>    by 0x1AFA8F: dotrap (signals.c:1487)
>    by 0x1AF18C: handletrap (signals.c:1202)

This is saying it's trying to execute your trap.  It's getting into
trouble when it's trying to read in the variable assignment from the
trap.  Either that's the VIMODE="$VIINS" chunk that's been messed up,
or it's already got confused and is guessing what's going on.
I would suspect that actually the main function structure is still
there, since it's otherwise quite unlikely to negotiate the exec
hierarchy down to addvars().  However, it's possible it's also been
erroneously freed but malloc has only grabbed the assignment part of
it for reuse so far.

Does removing that assignment make a difference?  That's just for
testing, obviously.  But given the shell obviously is trying to do an
assignment and that's gone AWOL, it might tell us something.  (If, for
example, the error now occurs somewhere a bit later, it might indicate
that indeed the entire function has been freed and malloc() is
repurposing the memory piecemeal.)

>  Address 0x566b948 is 0 bytes after a block of size 328 free'd
>    at 0x48369AB: free (vg_replace_malloc.c:530)
>    by 0x13D8F3: zcontext_restore_partial (context.c:108)
>    by 0x13DA56: zcontext_restore (context.c:119)
>    by 0x175A04: parse_subscript (lex.c:1697)
>    by 0x18B7F1: getindex (params.c:1858)
>    by 0x18C132: fetchvalue (params.c:2106)
>    by 0x1B6304: paramsubst (subst.c:2516)
>    by 0x1B1DB9: stringsubst (subst.c:322)
>    by 0x1B1108: prefork (subst.c:142)
>    by 0x14486C: execsubst (exec.c:2570)
>    by 0x1772E9: execfor (loop.c:98)
>    by 0x148469: execcmd_exec (exec.c:3913)
>  Block was alloc'd at
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x13D5D6: zcontext_save_partial (context.c:58)
>    by 0x13D7E9: zcontext_save (context.c:82)
>    by 0x1758A7: parse_subscript (lex.c:1661)
>    by 0x18B7F1: getindex (params.c:1858)
>    by 0x18C132: fetchvalue (params.c:2106)
>    by 0x1B6304: paramsubst (subst.c:2516)
>    by 0x1B1DB9: stringsubst (subst.c:322)
>    by 0x1B1108: prefork (subst.c:142)
>    by 0x14486C: execsubst (exec.c:2570)
>    by 0x1772E9: execfor (loop.c:98)
>    by 0x148469: execcmd_exec (exec.c:3913)

So this stuff is saying, when we performed a substitution we had to save
and restore some memory and we used the chunk that valgrind reported the
error on.  In other words, it had apparently been freed somewhere else
already, so malloc() just grabbed it.  So I don't think the code being
executed here is actually relevant to the original problem, it's just
the unlucky victim that got a chunk that shouldn't have been freed in
the first place.

Unfortunately this doesn't tell us where that happened.  But it does
look like it was actually freed, i.e. the problem isn't that something is
stomping on memory owned by something else; it's that the memory was
erroneously given back to the system.  (At least, that's the simple
interpretation.)

Not sure quite where to go from here --- but at least we have something
that's reproducible, which is quite good by the standards of memory
errors.  I think we'll need to add something to the code you're using
that marks the memory in the TRAPINT somehow.  I'll need to think what
seems propitious...

First simple step might be to see if the shell is indeed freeing the
TRAPINT() function code at some point.  That shouldn't be so hard to
find out but it'll need a bit of confection.

cheers
pws



* Re: TRAPINT doesn't work reliably
  2019-09-27 13:46                   ` Daniel Shahaf
@ 2019-09-28 11:16                     ` Dennis Schwartz
  2019-09-28 14:29                       ` Daniel Shahaf
  2019-09-28 16:00                       ` Bart Schaefer
  0 siblings, 2 replies; 21+ messages in thread
From: Dennis Schwartz @ 2019-09-28 11:16 UTC (permalink / raw)
  To: Daniel Shahaf; +Cc: Peter Stephenson, zsh-users

On Thursday, September 26, 2019 2:48 PM, Dennis Schwartz <dennis.schwartz@protonmail.com> wrote:

> However, I can only reproduce the bug if I have the following code in my
> `~/.zshrc`:
>
> # Antigen zsh plugins
> if [ -f "/usr/share/zsh-antigen/antigen.zsh" ]; then
>     source "/usr/share/zsh-antigen/antigen.zsh"
>
>     # load some plugins here, but they are not relevant to trigger
>     # the bug
> fi
>
> So, I conditionally `source` another file. Apparently, this is causing
> super weird behavior. Unbelievably, if I open the file `.zshrc` (e.g.,
> vim/gedit) and save the file, I cannot trigger the bug. However, if I
> open the file, but do not save the file, I always trigger the bug.

Okay, so of course that didn't make any sense. Now I know that I can
trigger the bug if (at least) the following conditions have been met:

* On my system (Debian 10), I need to compile zsh with the version
  number from my default Debian installation. So I always do
  `git checkout zsh-5.7.1 -- Config/version.mk` before I compile.
* `.zshrc` needs to contain several function definitions, aliases,
  keybindings, or other configurations.
* `.zshrc` needs to contain a trap on interrupt.
* I suspect that `.zshrc` also needs to contain
  `source "/usr/share/zsh-antigen/antigen.zsh"` (I'm using 2.2.3-2 from
  Debian 10)
* `zsh` needs to be started twice.
  * The first time the bug cannot be triggered.
  * The second time the bug can be triggered by typing a character and
    then hitting TAB to autocomplete. Now hit Ctrl+C to interrupt. The
    bug is triggered.

I suspect that `.zshrc` is read and either zsh or antigen generates some
files based on the loaded configuration. That would explain why the bug
is only triggered after zsh has been executed at least once.

Unfortunately, I cannot easily generate a minimal `.zshrc` that triggers
the bug. If I replace a function definition in my `.zshrc` with a bogus
function, whether I can trigger the bug depends on that function's
definition, but I haven't found a clear pattern. However, I found that
I could make zsh segfault with the following `.zshrc`, generated with
Python 3:

>>> open('/home/USERNAME/.zshrc', 'w').write('function fun() { echo "' + 'a' * (1 << 24) + '" }\nTRAPINT() { print $1; return $(( 128 + $1 )) }\nsource "/usr/share/zsh-antigen/antigen.zsh"')

WARNING: This causes zsh to crash even if you replace your `.zshrc` with
a 'normal' file again. You first have to run `zsh -f`; afterwards, you
can start `zsh` normally again. I guess this again has to do with some
file being automatically generated by zsh or antigen which needs to be
regenerated. Which file could this be? How can I easily see which files
get loaded on start-up? The file `~/.zcompdump` remains the same,
regardless of whether the bug can be triggered.
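This thread did not try the following approach. The idea is to run the same experiment without overwriting the real `~/.zshrc`. You write the generated file into a scratch directory and start zsh with `ZDOTDIR` pointing at it, because zsh looks for `.zshrc` in `$ZDOTDIR` instead of `$HOME` when that variable is set. The helper name `write_reproducer` is hypothetical. Two parts are still shared: `/etc/zsh/zshrc` is still read, and antigen's cache under `~/.antigen` is not isolated by this.

```python
from pathlib import Path

# Path taken from the thread (zsh-antigen 2.2.3-2 on Debian 10).
ANTIGEN = "/usr/share/zsh-antigen/antigen.zsh"


def write_reproducer(zdotdir: Path, body_size: int = 1 << 24) -> Path:
    """Write the generated .zshrc into zdotdir instead of $HOME.

    The content mirrors the Python one-liner above: one huge function,
    the TRAPINT trap, and sourcing antigen.  Returns the path written.
    """
    zdotdir.mkdir(parents=True, exist_ok=True)
    zshrc = zdotdir / ".zshrc"
    zshrc.write_text(
        'function fun() { echo "' + "a" * body_size + '" }\n'
        "TRAPINT() { print $1; return $(( 128 + $1 )) }\n"
        'source "' + ANTIGEN + '"\n'
    )
    return zshrc
```

For example, call `write_reproducer(Path("/tmp/zsh-repro"))` and then run `ZDOTDIR=/tmp/zsh-repro zsh`. Starting zsh without that variable should presumably give you your normal configuration back, apart from whatever antigen has written under `~/.antigen`.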


I have run

$ ./configure --enable-zsh-debug --enable-zsh-mem && make && sudo make install
$ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh

to capture the segfault. I cannot be sure that this is the same bug as
the one I experience with the TRAPINT function.

The log file (the memory addresses shift 0x10 bytes if I compile without
`--enable-zsh-mem`):

> Memcheck, a memory error detector
> Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
> Command: /usr/local/bin/zsh
> Parent PID: 10371
>
> Invalid read of size 1
>    at 0x4839565: __strncmp_sse42 (vg_replace_strmem.c:651)
>    by 0x14BD28: execfuncdef (exec.c:5286)
>    by 0x140669: execsimple (exec.c:1248)
>    by 0x140A75: execlist (exec.c:1378)
>    by 0x14037F: execode (exec.c:1194)
>    by 0x168151: source (init.c:1460)
>    by 0x168649: sourcehome (init.c:1536)
>    by 0x167D01: run_init_scripts (init.c:1340)
>    by 0x169224: zsh_main (init.c:1754)
>    by 0x11FD44: main (main.c:93)
>  Address 0x584f110 is not stack'd, malloc'd or (recently) free'd
>
>
> Process terminating with default action of signal 11 (SIGSEGV)
>  Access not within mapped region at address 0x584F110
>    at 0x4839565: __strncmp_sse42 (vg_replace_strmem.c:651)
>    by 0x14BD28: execfuncdef (exec.c:5286)
>    by 0x140669: execsimple (exec.c:1248)
>    by 0x140A75: execlist (exec.c:1378)
>    by 0x14037F: execode (exec.c:1194)
>    by 0x168151: source (init.c:1460)
>    by 0x168649: sourcehome (init.c:1536)
>    by 0x167D01: run_init_scripts (init.c:1340)
>    by 0x169224: zsh_main (init.c:1754)
>    by 0x11FD44: main (main.c:93)
>  If you believe this happened as a result of a stack
>  overflow in your program's main thread (unlikely but
>  possible), you can try to increase the size of the
>  main thread stack using the --main-stacksize= flag.
>  The main thread stack size used in this run was 8388608.
>
> HEAP SUMMARY:
>     in use at exit: 62,886 bytes in 919 blocks
>   total heap usage: 1,052 allocs, 133 frees, 105,358 bytes allocated
>
> 1 bytes in 1 blocks are definitely lost in loss record 4 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x16724E: setupvals (init.c:1062)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 2 bytes in 1 blocks are definitely lost in loss record 11 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x166AC3: init_term (init.c:805)
>    by 0x19564C: term_reinit_from_pm (params.c:4892)
>    by 0x1956A4: termsetfn (params.c:4912)
>    by 0x18ED1C: assignstrvalue (params.c:2532)
>    by 0x190C74: assignsparam (params.c:3144)
>    by 0x18A805: createparamtable (params.c:867)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 4 bytes in 1 blocks are definitely lost in loss record 22 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1CA909: metafy (utils.c:4769)
>    by 0x1CAABE: ztrdup_metafy (utils.c:4826)
>    by 0x18A6E6: createparamtable (params.c:834)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 5 bytes in 1 blocks are definitely lost in loss record 26 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x166F14: setupvals (init.c:973)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 8 bytes in 1 blocks are definitely lost in loss record 62 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x195CA7: mkenvstr (params.c:5244)
>    by 0x18A862: createparamtable (params.c:871)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 9 bytes in 1 blocks are definitely lost in loss record 77 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x166F2E: setupvals (init.c:974)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 9 bytes in 1 blocks are definitely lost in loss record 78 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x166F48: setupvals (init.c:975)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 10 bytes in 1 blocks are definitely lost in loss record 95 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1CA909: metafy (utils.c:4769)
>    by 0x1672C5: setupvals (init.c:1075)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 15 bytes in 1 blocks are definitely lost in loss record 128 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x166F62: setupvals (init.c:976)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 16 bytes in 1 blocks are definitely lost in loss record 134 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x17CC01: pushheap (mem.c:304)
>    by 0x18A6FA: createparamtable (params.c:848)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 68 (56 direct, 12 indirect) bytes in 1 blocks are definitely lost in loss record 263 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x17EA5E: zshcalloc (mem.c:979)
>    by 0x183E4A: load_module (module.c:2219)
>    by 0x167C3B: run_init_scripts (init.c:1318)
>    by 0x169224: zsh_main (init.c:1754)
>    by 0x11FD44: main (main.c:93)
>
> 81 bytes in 2 blocks are definitely lost in loss record 290 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1B232A: ztrdup (string.c:83)
>    by 0x18A87E: createparamtable (params.c:874)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 112 bytes in 4 blocks are definitely lost in loss record 299 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x1CA909: metafy (utils.c:4769)
>    by 0x18A7EB: createparamtable (params.c:867)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> 256 bytes in 1 blocks are definitely lost in loss record 330 of 360
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x17E8C8: zalloc (mem.c:966)
>    by 0x18A675: createparamtable (params.c:829)
>    by 0x167446: setupvals (init.c:1116)
>    by 0x169210: zsh_main (init.c:1749)
>    by 0x11FD44: main (main.c:93)
>
> LEAK SUMMARY:
>    definitely lost: 584 bytes in 18 blocks
>    indirectly lost: 12 bytes in 1 blocks
>      possibly lost: 0 bytes in 0 blocks
>    still reachable: 62,290 bytes in 900 blocks
>         suppressed: 0 bytes in 0 blocks
> Reachable blocks (those to which a pointer was found) are not shown.
> To see them, rerun with: --leak-check=full --show-leak-kinds=all
>
> For counts of detected and suppressed errors, rerun with: -v
> ERROR SUMMARY: 15 errors from 15 contexts (suppressed: 0 from 0)



On Friday, September 27, 2019 1:46 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:

> Dennis Schwartz wrote on Thu, 26 Sep 2019 17:10 +00:00:
>
> > $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
> > /usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
> > $ ll
>
> What's the output of `dpkg -l zsh-antigen`? (I'm looking for the version number.)

Good point. Debian 10 (buster) ships 2.2.3-2, which I'm running.
I believe the bug is triggered in zsh by using this newer version
(Debian 9 ships 1.3.4-1). If I compile and run zsh 5.3.1 (shipped with
Debian 9, where I did not encounter this issue) with `zsh-antigen`
from Debian 10, I can also trigger the bug.


On Friday, September 27, 2019 7:05 PM, Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:

> Thanks, this is exactly what I was asking for.

Thanks for quite extensively explaining what's going on!

> Does removing that assignment make a difference?

No, the bug triggers for any TRAPINT function I've tried so far.


I have the feeling we're getting closer to the root cause of the bug.

Cheers,
Dennis


* Re: TRAPINT doesn't work reliably
  2019-09-28 11:16                     ` Dennis Schwartz
@ 2019-09-28 14:29                       ` Daniel Shahaf
  2019-09-28 18:21                         ` Dennis Schwartz
  2019-09-28 16:00                       ` Bart Schaefer
  1 sibling, 1 reply; 21+ messages in thread
From: Daniel Shahaf @ 2019-09-28 14:29 UTC (permalink / raw)
  To: Dennis Schwartz; +Cc: Peter Stephenson, zsh-users

Dennis Schwartz wrote on Sat, Sep 28, 2019 at 11:16:08 +0000:
> I have run
> 
> $ ./configure --enable-zsh-debug --enable-zsh-mem && make && sudo make install

Please run «make check» in there as well, on general principles.

> On Friday, September 27, 2019 1:46 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> 
> > Dennis Schwartz wrote on Thu, 26 Sep 2019 17:10 +00:00:
> >
> > > $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
> > > /usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
> > > $ ll
> >
> > What's the output of `dpkg -l zsh-antigen`? (I'm looking for the version number.)
> 
> Good point. Debian 10 (buster) ships 2.2.3-2, which I'm running.
> I believe the bug is triggered in zsh by using this newer version
> (Debian 9 ships 1.3.4-1). If I compile and run zsh 5.3.1 (shipped with
> Debian 9, where I did not encounter this issue) with `zsh-antigen`
> from Debian 10, I can also trigger the bug.

Okay, so we have another angle on this: we could try to bisect antigen,
either temporally (between 1.3.4-1 in Debian stretch and 2.2.3-2 in
Debian buster) or spatially (taking the 2.2.3-2 version and deleting
half of its antigen.zsh file at a time, lather, rinse, repeat).
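A loop like the following Python sketch could drive the spatial variant. `bisect_lines` and the `reproduces` callback are hypothetical. In practice, the callback would write the candidate file, start zsh twice, press Ctrl+C and report whether TRAPINT misbehaved. Deleting half of a shell script by line count will often leave it syntactically broken. The callback should therefore treat a parse error as "not reproduced" rather than as the bug.

```python
from typing import Callable, List

Lines = List[str]


def bisect_lines(lines: Lines, reproduces: Callable[[Lines], bool]) -> Lines:
    """Repeatedly keep whichever half of `lines` still reproduces the bug.

    Stops when neither half reproduces on its own, which usually means the
    trigger needs lines from both halves; the remaining range is returned.
    """
    if not reproduces(lines):
        raise ValueError("full input does not reproduce the bug")
    while len(lines) > 1:
        mid = len(lines) // 2
        first, second = lines[:mid], lines[mid:]
        if reproduces(first):
            lines = first
        elif reproduces(second):
            lines = second
        else:
            break  # the trigger straddles the split point
    return lines
```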

Would it be worthwhile to try and find a minimal example for the parse
error?  I don't know whether it's likely to be related to the memory
bug.

> > Does removing that assignment make a difference?
> 
> No, the bug triggers for any TRAPINT function I've tried so far.

Have you tried an empty function, «TRAPINT () {}»?

Is there any reason to also try a «trap ':' INT»?

Cheers,

Daniel


* Re: TRAPINT doesn't work reliably
  2019-09-28 11:16                     ` Dennis Schwartz
  2019-09-28 14:29                       ` Daniel Shahaf
@ 2019-09-28 16:00                       ` Bart Schaefer
  2019-09-29 16:54                         ` Peter Stephenson
  1 sibling, 1 reply; 21+ messages in thread
From: Bart Schaefer @ 2019-09-28 16:00 UTC (permalink / raw)
  To: Dennis Schwartz; +Cc: Daniel Shahaf, Peter Stephenson, zsh-users

On Sat, Sep 28, 2019 at 4:17 AM Dennis Schwartz
<dennis.schwartz@protonmail.com> wrote:
>
> * On my system (Debian 10), I need to compile zsh with the version
>   number from my default Debian installation. So I always do
>   `git checkout zsh-5.7.1 -- Config/version.mk` before I compile.

So, you should definitely STOP doing that.  It's only creating confusion.

The version number from version.mk determines three things:
1. The function load path
2. The compiled module load path
3. The format of "compiled" function definitions from .zwc files

and as a corollary to #3, whether zsh will load the .zwc file at all,
because it compares a version number embedded in the file to the
version number of the compiled zsh.

If you compile version X.Y.Z of zsh with the version.mk from version
P.D.Q, particularly on a host where P.D.Q was previously (or is
currently) installed, you are extremely likely to either be linking
with an incompatible shared object file, or loading a .zwc file whose
bytecode is garbage to the internals of your newly compiled binary.
Either of those things could be causing the crashes you are seeing, or
cause valgrind to generate results that have no real relationship to
the original problem.
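The following sketch may help check which zsh version a given `.zwc` file was compiled for, for example `~/.antigen/init.zsh.zwc`. It is based on a reading of the header layout in zsh's `Src/parse.c`: a magic word, then a flag word, then a NUL-terminated version string inside a 12-word preamble. Treat the offsets and magic values as assumptions, and check them against the source of the zsh you are running. The function names are hypothetical.

```python
import struct
from pathlib import Path

# Assumed from zsh's parse.c: FD_MAGIC in the writer's byte order, which
# reads back as FD_OMAGIC on a machine of the other byte order.
FD_MAGIC = 0x04050607
FD_OMAGIC = 0x07060504
FD_PRELEN_BYTES = 12 * 4  # 12 32-bit words of preamble


def zwc_version(data: bytes) -> str:
    """Return the zsh version string embedded in a .zwc header."""
    if len(data) < FD_PRELEN_BYTES:
        raise ValueError("too short to be a .zwc file")
    (magic,) = struct.unpack("<I", data[:4])
    if magic not in (FD_MAGIC, FD_OMAGIC):
        raise ValueError("bad .zwc magic: %#x" % magic)
    # Version string starts after the magic word and the flag word.
    return data[8:FD_PRELEN_BYTES].split(b"\0", 1)[0].decode("ascii")


def zwc_file_version(path: Path) -> str:
    with open(path, "rb") as f:
        return zwc_version(f.read(FD_PRELEN_BYTES))
```

A version string that matches `$ZSH_VERSION` only means zsh will accept the file. With a borrowed `version.mk`, it can match even though a different build produced the wordcode, which is the situation described above.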

This part --

> * `zsh` needs to be started twice.
>   * The first time the bug cannot be triggered.
>   * The second time the bug can be triggered by typing a character and
>     then hitting TAB to autocomplete. Now hit Ctrl+C to interrupt.

-- suggests very strongly that this is related to loading an incorrect
version of a compiled function as a result of the .zcompdump file
having been updated, or some similar automatic configuration update,
probably (as you suggest) being performed by antigen.

To get anywhere with this, we need a zsh that is compiled entirely
consistently, not with bits and pieces of different versions.  Either
check out the entire git revision matching your OS version, not just
the version number file, or run the entire test with the most recent
version, including the correct version.mk for that build.

I would also suggest that you go back to the configuration where you
first observed the problem (i.e., do NOT use a custom-compiled binary)
and start zsh with
  zsh -o sourcetrace
which will show you where all the configuration files are being found.
You can then compare that to "zsh -o sourcetrace" from your newly
compiled binary to determine which files are the same and which are
different in the event that the bug behavior changes with the new
build.

If after correcting the build process you STILL observe that zsh must
be started twice, comparing sourcetrace from the first and second runs
may also be informative.
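Comparing two runs by eye gets tedious. The following Python sketch extracts the sourced files from `zsh -o sourcetrace` output and diffs them. It assumes the trace lines look like `+/etc/zsh/zshrc:1> <sourcetrace>`, the default PS4 format. It also assumes each run's trace was saved to a file, e.g. with `zsh -o sourcetrace -i -c exit 2> trace-a.txt`, on the understanding that the trace goes to stderr like xtrace output. The function names are hypothetical.

```python
import difflib
import re
from typing import List

# Default PS4 is "+%N:%i> ", so sourcetrace lines look like
# "+/etc/zsh/zshrc:1> <sourcetrace>".
TRACE_LINE = re.compile(r"^\+(?P<file>.+):\d+> <sourcetrace>$")


def sourced_files(trace: str) -> List[str]:
    """Files in the order zsh reported sourcing them (repeats kept)."""
    files = []
    for line in trace.splitlines():
        m = TRACE_LINE.match(line.strip())
        if m:
            files.append(m.group("file"))
    return files


def diff_traces(a: str, b: str) -> List[str]:
    """Unified diff between the sourced-file sequences of two runs."""
    return list(difflib.unified_diff(
        sourced_files(a), sourced_files(b), "run-a", "run-b", lineterm=""))
```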


* Re: TRAPINT doesn't work reliably
  2019-09-28 14:29                       ` Daniel Shahaf
@ 2019-09-28 18:21                         ` Dennis Schwartz
  2019-09-28 18:58                           ` Dennis Schwartz
  0 siblings, 1 reply; 21+ messages in thread
From: Dennis Schwartz @ 2019-09-28 18:21 UTC (permalink / raw)
  To: Daniel Shahaf, zsh-users

On Saturday, September 28, 2019 2:29 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:

> Okay, so we have another angle on this: we could try to bisect antigen,
> either temporally (between 1.3.4-1 in Debian stretch and 2.2.3-2 in
> Debian buster) or spatially (taking the 2.2.3-2 version and deleting
> half of its antigen.zsh file at a time, lather, rinse, repeat).

Hmm, it seems I'm having difficulty determining the exact conditions
under which this bug occurs. The commands

$ rm -Rf ~/.antigen && git checkout v1.3.4 && make

also triggers the bug using /usr/bin/zsh. I also updated my .zshrc to
source the correct `bin/antigen.zsh` and also run `antigen apply`.

> > > Does removing that assignment make a difference?
> >
> > No, the bug triggers for any TRAPINT function I've tried so far.
>
> Have you tried an empty function, «TRAPINT () {}»?
>
> Is there any reason to also try a «trap ':' INT»?

Both do nothing. It seems like TRAPINT needs to contain at least one
command or a return statement.


On Saturday, September 28, 2019 4:00 PM, Bart Schaefer <schaefer@brasslantern.com> wrote:

> On Sat, Sep 28, 2019 at 4:17 AM Dennis Schwartz
> dennis.schwartz@protonmail.com wrote:
>
> > -   On my system (Debian 10), I need to compile zsh with the version
> >     number from my default Debian installation. So I always do
> >     `git checkout zsh-5.7.1 -- Config/version.mk` before I compile.
> >
>
> So, you should definitely STOP doing that. It's only creating confusion.

Okay, I see why I should avoid doing this. The consequence is that I can
only debug version 5.7.1, either compiled myself (so optionally with
debugging flags set) or using the Debian shipped version.

> I would also suggest that you go back to the configuration where you
> first observed the problem (i.e., do NOT use a custom-compiled binary)
> and start zsh with
>  zsh -o sourcetrace
> which will show you where all the configuration files are being found.
> You can then compare that to "zsh -o sourcetrace" from your newly
> compiled binary to determine which files are the same and which are
> different in the event that the bug behavior changes with the new
> build.

Thanks! That was exactly the command I was looking for. If I return to
my initial setup (i.e., using Debian's shipped zsh and antigen + my
original .zshrc) I get:

$ whence zsh
/usr/bin/zsh
$ touch .zshrc
$ cp .zcompdump zcompdum-pre
$ cp -R .antigen antigen-pre
$ zsh -o sourcetrace
+/etc/zsh/zshenv:1> <sourcetrace>
+/etc/zsh/zshrc:1> <sourcetrace>
+/home/USERNAME/.zshrc:1> <sourcetrace>
+/home/USERNAME/.zcompdump:1> <sourcetrace>
+/usr/share/zsh-antigen/antigen.zsh:1> <sourcetrace>
+/home/USERNAME/.antigen/.zcompdump:1> <sourcetrace>
+/home/USERNAME/.antigen/init.zsh:1> <sourcetrace>
+/home/USERNAME/.antigen/.zcompdump:1> <sourcetrace>
$ ll 2      <---- indicating the correct behavior of TRAPINT

$ exit
$ cp .zcompdump zcompdum-mid
$ cp -R .antigen antigen-mid
$ zsh -o sourcetrace
+/etc/zsh/zshenv:1> <sourcetrace>
+/etc/zsh/zshrc:1> <sourcetrace>
+/home/USERNAME/.zshrc:1> <sourcetrace>
+/home/USERNAME/.zcompdump:1> <sourcetrace>
+/usr/share/zsh-antigen/antigen.zsh:1> <sourcetrace>
+/home/USERNAME/.antigen/init.zsh:1> <sourcetrace>
+/home/USERNAME/.antigen/.zcompdump:1> <sourcetrace>
$ ll
TRAPINT:1: not an identifier: F\M-|N)U
$ exit
$ cp .zcompdump zcompdum-post
$ cp -R .antigen antigen-post
$ sha1sum antigen-{pre,mid,post}/**/*
sha1sum: antigen-pre/bundles: Is a directory
ef142d6575f491caf15f643c90abec9809138eff  antigen-pre/debug.log
dbb5aba046583b7e5bd18ff482022e7eb57db6d1  antigen-pre/init.zsh
6a554ac7275ad58b87a484e9be26961ba7bc3bb6  antigen-pre/init.zsh.zwc
sha1sum: antigen-mid/bundles: Is a directory
ab8687454b49cc6d3c55e5d925596cddcbadc342  antigen-mid/debug.log
7f66a86e7d6d7b6847ccd8df0852db90ea40a5af  antigen-mid/init.zsh
6a554ac7275ad58b87a484e9be26961ba7bc3bb6  antigen-mid/init.zsh.zwc
sha1sum: antigen-post/bundles: Is a directory
ab8687454b49cc6d3c55e5d925596cddcbadc342  antigen-post/debug.log
7f66a86e7d6d7b6847ccd8df0852db90ea40a5af  antigen-post/init.zsh
6a554ac7275ad58b87a484e9be26961ba7bc3bb6  antigen-post/init.zsh.zwc
$ sha1sum zcompdum-{pre,mid,post}
2e658e3f3c3c21bec98fedea8390cffd8fdab15e  zcompdum-pre
2e658e3f3c3c21bec98fedea8390cffd8fdab15e  zcompdum-mid
2e658e3f3c3c21bec98fedea8390cffd8fdab15e  zcompdum-post


The only difference between the `debug.log` files is that `pre` contains
fewer entries. For `init.zsh`, the only difference is a timestamp in a
comment. I'm a bit lost by these results.
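For anyone repeating the comparison, here is the same idea in Python: hash every regular file under each snapshot and list the relative paths that differ. One caveat about the `sha1sum` run above: zsh's `**/*` skips names starting with a dot unless GLOB_DOTS is set. That means `.antigen/.zcompdump`, which the sourcetrace output shows being read, was not covered. `pathlib`'s `rglob("*")` does include dot files. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each regular file under root (as a relative path) to its SHA-1."""
    return {
        str(p.relative_to(root)): hashlib.sha1(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: Path, after: Path) -> List[str]:
    """Relative paths added, removed, or modified between two snapshots."""
    a, b = snapshot(before), snapshot(after)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Usage would be along the lines of `changed_files(Path("antigen-pre"), Path("antigen-mid"))`.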


Using the original setup, I can also reproduce the segfault quite
easily:

$ python3
Python 3.7.3 (default, Apr  3 2019, 05:39:12)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> open('/home/USERNAME/.zshrc', 'w').write('function fun() { echo "' + 'a' * (1 << 24) + '" }\nsource "/usr/share/zsh-antigen/antigen.zsh"\nantigen apply')
16777300
>>>
$ rm -Rf .antigen; rm .zcompdump
$ zsh -o sourcetrace
+/etc/zsh/zshenv:1> <sourcetrace>
+/etc/zsh/zshrc:1> <sourcetrace>
+/home/USERNAME/.zshrc:1> <sourcetrace>
+/usr/share/zsh-antigen/antigen.zsh:1> <sourcetrace>
+/home/USERNAME/.antigen/init.zsh:1> <sourcetrace>
+/home/USERNAME/.antigen/.zcompdump:1> <sourcetrace>
% exit
$ zsh -o sourcetrace
+/etc/zsh/zshenv:1> <sourcetrace>
+/etc/zsh/zshrc:1> <sourcetrace>
zsh: segmentation fault  zsh -o sourcetrace

Running `cp ~/zshrc-good ~/.zshrc` fixed it again (no need for
`zsh -f`).


I spent several hours trying to debug this issue again today. This of
course pales into insignificance compared to your time developing zsh
(thanks again!), but I hope you understand that I can only spend more
time on it next weekend. Feel free to ask me to try something to
help debugging though!

Cheers,
Dennis


* Re: TRAPINT doesn't work reliably
  2019-09-28 18:21                         ` Dennis Schwartz
@ 2019-09-28 18:58                           ` Dennis Schwartz
  0 siblings, 0 replies; 21+ messages in thread
From: Dennis Schwartz @ 2019-09-28 18:58 UTC (permalink / raw)
  To: Daniel Shahaf, zsh-users

On Saturday, September 28, 2019 6:21 PM, Dennis Schwartz <dennis.schwartz@protonmail.com> wrote:

> On Saturday, September 28, 2019 2:29 PM, Daniel Shahaf d.s@daniel.shahaf.name wrote:
>
> > > > Does removing that assignment make a difference?
> > >
> > > No, the bug triggers for any TRAPINT function I've tried so far.
> >
> > Have you tried an empty function, «TRAPINT () {}»?
> > Is there any reason to also try a «trap ':' INT»?
>
> Both do nothing. It seems like TRAPINT needs to contain at least one
> command or a return statement.

Sorry, I only now realize that `trap ':' INT` actually overcomes the
problem.

I can now set

    trap 'VIMODE="$VIINS"; return 130' INT

and that actually doesn't trigger the bug. Neither does

    mytrap () {
        VIMODE="$VIINS"
        return 130
    }
    trap mytrap INT

but neither of these changes the VIMODE variable on my prompt or
returns 130. (In both cases, $1 is always empty.)

Cheers,
Dennis


* Re: TRAPINT doesn't work reliably
  2019-09-28 16:00                       ` Bart Schaefer
@ 2019-09-29 16:54                         ` Peter Stephenson
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Stephenson @ 2019-09-29 16:54 UTC (permalink / raw)
  To: zsh-users

On Sat, 2019-09-28 at 09:00 -0700, Bart Schaefer wrote:
> On Sat, Sep 28, 2019 at 4:17 AM Dennis Schwartz
> <dennis.schwartz@protonmail.com> wrote:
> > 
> > * On my system (Debian 10), I need to compile zsh with the version
> >   number from my default Debian installation. So I always do
> >   `git checkout zsh-5.7.1 -- Config/version.mk` before I compile.
> 
> So, you should definitely STOP doing that.  It's only creating confusion.

This is an important point I missed --- if you're using wordcode
compiled files this will be a disaster, as Bart notes.

Also, if you're using wordcode compiled files, that could be a reason
why using the trap command behaves differently from defining a TRAPINT
function --- although the actual behaviour depends on the utilities
you're using, it's much more typical to save functions that way than
traps defined the other way.

pws

