Complicated segfault regression

zsh-workers
 help / color / mirror / code / Atom feed

* Complicated segfault regression
@ 2015-07-02 21:36 Martijn Dekker
  2015-07-03  3:28 ` Bart Schaefer
  2015-07-03 11:35 ` Peter Stephenson
  0 siblings, 2 replies; 10+ messages in thread
From: Martijn Dekker @ 2015-07-02 21:36 UTC (permalink / raw)
  To: zsh-workers

Just for the hell of it I decided to make zsh 5.0.8 run a GNU
'configure' script (in sh mode) to see how far it would come with that
incredibly convoluted code. The result was a segfault. So I tried to
find the code that triggers it.

The following code block makes zsh 5.0.8 and current git zsh segfault
reliably. It's now completely non-operational because I've tried to
reduce it to the minimum necessary to make zsh crash.

The strange thing is, almost *any* change in the crashing code block
below will make it fail to produce a segfault. The 'if' is necessary.
All three clauses in the case statement are necessary. All the extension
tests in the first clause are necessary. The : with the parameter in
that form, with the variable, is necessary (this used to be a 'rm').
Even the { : ; } at the end is necessary. Change any of those and it
fails to segfault.

It doesn't seem to make a difference whether 'emulate sh' is used or
not, though.

Oh, and zsh 4.2.7 and 4.3.11 are fine.

I can't seem to narrow this down any further, but I figured it was worth
reporting anyway.

- M.

#! /usr/bin/env zsh
emulate sh
set -x

# --- begin crashing code block ---

if true; then
for ac_file in somestuff; do
  case $ac_file in
    *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb \
    | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj ) ;;
    *.* ) break;;
    * ) break;;
  esac
  break
done
fi
: conftest$ac_cv_exeext
{ : ; }

# --- end crashing code block ---

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Complicated segfault regression
  2015-07-02 21:36 Complicated segfault regression Martijn Dekker
@ 2015-07-03  3:28 ` Bart Schaefer
  2015-07-03 11:35 ` Peter Stephenson
  1 sibling, 0 replies; 10+ messages in thread
From: Bart Schaefer @ 2015-07-03  3:28 UTC (permalink / raw)
  To: zsh-workers

On Jul 2, 11:36pm, Martijn Dekker wrote:
}
} The following code block makes zsh 5.0.8 and current git zsh segfault
} reliably. It's now completely non-operational because I've tried to
} reduce it to the minimum necessary to make zsh crash.

So I tried executing this code from the shell prompt instead of as as
script, with debugging output enabled.  Following the "fi" I get:

Src/text.c:970: unknown word code in gettext2()

If I'm in GDB, I then immediately get a segfault on the following ":"
command.  I don't expect this stack trace is much help because the
problem has already happened during parsing the preceding block, but:

#0  0x008f228b in strlen () from /lib/tls/libc.so.6
#1  0x080c5df4 in taddstr (s=0xd5f29f7a <Address 0xd5f29f7a out of bounds>)
    at ../../zsh-5.0/Src/text.c:125
#2  0x080c5ff4 in taddassign (code=1600348549, state=0xbfe831d0, typeset=0)
    at ../../zsh-5.0/Src/text.c:181
#3  0x080c69a6 in gettext2 (state=0xbfe831d0) at ../../zsh-5.0/Src/text.c:484
#4  0x080c63d4 in getjobtext (prog=0xb7d945d8, c=0xb7d9460c)
    at ../../zsh-5.0/Src/text.c:314

If NOT in GDB, I don't get a crash, instead I get an infinite loop printing

Src/text.c:49: attempting to decrement tindent below zero

I can remove the "if" test and have only the "for ... case" and still
get this:

Src/text.c:970: unknown word code in gettext2()

And if I add the opening parens to the case patterns, I get an infinite
loop in gettext2() [never breaks out of the while(1) loop at line 396].

(s = tstack) is always true at 398; s->pop is always false at 400;
wc_code(code) is always WC_END passing through line 967 which sets
stack = 1 and we repeat at line 397 forever.

So this has to have something to do with the updated parsing of (x|y)
expressions in cond statements.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Complicated segfault regression
  2015-07-02 21:36 Complicated segfault regression Martijn Dekker
  2015-07-03  3:28 ` Bart Schaefer
@ 2015-07-03 11:35 ` Peter Stephenson
  2015-07-03 12:29   ` Martijn Dekker
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2015-07-03 11:35 UTC (permalink / raw)
  To: zsh-workers

On Thu, 2 Jul 2015 23:36:56 +0200
Martijn Dekker <martijn@inlv.org> wrote:
> Just for the hell of it I decided to make zsh 5.0.8 run a GNU
> 'configure' script (in sh mode) to see how far it would come with that
> incredibly convoluted code. The result was a segfault. So I tried to
> find the code that triggers it.

Thanks for trying to find this.  Based on Bart's clue that the text
representation is going wrong at some point to do with "case" , I've
narrowed it down a bit further... The following file, crash4.zsh

# --- begin crashing code block ---
fn() {
  case $ac_file in
    *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb \
    | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj ) ;;
    *.* ) break;;
    * ) break;;
  esac
}
# --- end crashing code block ---

when sourced produces the following output for "which fn"
(I've used "-x 2", only recently made available):

(N.B. there's a long line I haven't attempted to wrap myself...)

fn () {
  case $ac_file in
    (*.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj)  ;;
    (*.*) break
}

which has evidently gone haywiere (though this version wasn't
particularly crash prone), while the slightly further simplified
crash5.zsh:

# --- begin crashing code block ---
fn() {
  case $ac_file in
    *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb ) ;;
    *.* ) break;;
    * ) break;;
  esac
}
# --- end crashing code block ---

doesn't show that effect:

fn () {
  case $ac_file in
    (*.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb)  ;;
    (*.*) break ;;
    (*) break ;;
  esac
}

So the best guess so far is it's do with memory management of the long
case with the continuation line.

pws


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Complicated segfault regression
  2015-07-03 11:35 ` Peter Stephenson
@ 2015-07-03 12:29   ` Martijn Dekker
  2015-07-03 13:44     ` Peter Stephenson
  0 siblings, 1 reply; 10+ messages in thread
From: Martijn Dekker @ 2015-07-03 12:29 UTC (permalink / raw)
  To: zsh-workers

Peter Stephenson schreef op 03-07-15 om 13:35:
> So the best guess so far is it's do with memory management of the long
> case with the continuation line.

FYI, the original crashing code didn't have the continuation line. I
added that to prevent spurious wrapping by email programs. It seemed to
make no difference at all to the symptoms.

- M.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Complicated segfault regression
  2015-07-03 12:29   ` Martijn Dekker
@ 2015-07-03 13:44     ` Peter Stephenson
  2015-07-03 17:52       ` Martijn Dekker
  2015-07-03 22:11       ` Peter Stephenson
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Stephenson @ 2015-07-03 13:44 UTC (permalink / raw)
  To: zsh-workers

On Fri, 3 Jul 2015 14:29:28 +0200
Martijn Dekker <martijn@inlv.org> wrote:
> Peter Stephenson schreef op 03-07-15 om 13:35:
> > So the best guess so far is it's do with memory management of the long
> > case with the continuation line.
> 
> FYI, the original crashing code didn't have the continuation line. I
> added that to prevent spurious wrapping by email programs. It seemed to
> make no difference at all to the symptoms.

Yes, it's not the wrap, it's the length that's contributing (but not
decisive on its own).  However, I can't see any sign of problems in
the parsing code, and running the function doesn't fail in the same
castastrophic way.  In fact, nobbling gettext2() entirely, the following

# --- begin crashing code block ---
fn() {
  ac_file="the else branch"
  case $ac_file in
    *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj ) ;;
    *.* ) break;;
    *)
    ;;
  esac
  print Stuff here
}
# --- end crashing code block ---

runs to conclusion, printing the message, while the "which" output is
still gibberish.  So there's something at least different about the text
handling compared with execution that needs tracking down.

Unfortunately I wasn't able to get a "case" block both which looks
funny and which has commands after the point where the output is bad, so
this isn't conclusive.

pws

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Complicated segfault regression
  2015-07-03 13:44     ` Peter Stephenson
@ 2015-07-03 17:52       ` Martijn Dekker
  2015-07-03 22:11       ` Peter Stephenson
  1 sibling, 0 replies; 10+ messages in thread
From: Martijn Dekker @ 2015-07-03 17:52 UTC (permalink / raw)
  To: zsh-workers

Peter Stephenson schreef op 03-07-15 om 15:44:
> runs to conclusion, printing the message, while the "which" output is
> still gibberish.
[...]

After defining that function, running 'which fn' twice results either in
a segfault or in a lockup that needs SIGKILL to get out of it. It seems
to be unpredictable which happens.

- M.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Complicated segfault regression
  2015-07-03 13:44     ` Peter Stephenson
  2015-07-03 17:52       ` Martijn Dekker
@ 2015-07-03 22:11       ` Peter Stephenson
  2015-07-04 10:03         ` Peter Stephenson
  2015-07-05 11:06         ` Peter Stephenson
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Stephenson @ 2015-07-03 22:11 UTC (permalink / raw)
  To: zsh-workers

On Fri, 03 Jul 2015 14:44:48 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> Yes, it's not the wrap, it's the length that's contributing (but not
> decisive on its own).  However, I can't see any sign of problems in
> the parsing code, and running the function doesn't fail in the same
> castastrophic way.  In fact, nobbling gettext2() entirely, the following
> 
> # --- begin crashing code block ---
> fn() {
>   ac_file="the else branch"
>   case $ac_file in
>     *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.dSYM | *.o | *.obj ) ;;
>     *.* ) break;;
>     *)
>     ;;
>   esac
>   print Stuff here
> }
> # --- end crashing code block ---
> 
> runs to conclusion, printing the message, while the "which" output is
> still gibberish.  So there's something at least different about the text
> handling compared with execution that needs tracking down.

I'm horribly out of my depth here, but how about this...

n->pop has no dcoumentation whatsoever, but the calculation associated
with it in 5.0.7 is to do with how much state->pc gets incremented for
each case pattern set, not coutning the string.  Why?  No idea.  The
calculation has changed, and now looks like the following.

pws

diff --git a/Src/text.c b/Src/text.c
index 3287c54..d63141b 100644
--- a/Src/text.c
+++ b/Src/text.c
@@ -681,7 +681,7 @@ gettext2(Estate state)
 	case WC_CASE:
 	    if (!s) {
 		Wordcode end = state->pc + WC_CASE_SKIP(code);
-		wordcode nalts;
+		wordcode nalts, ialts;
 
 		taddstr("case ");
 		taddstr(ecgetstr(state, EC_NODUP, NULL));
@@ -702,21 +702,22 @@ gettext2(Estate state)
 			taddchr(' ');
 		    taddstr("(");
 		    code = *state->pc++;
-		    nalts = *state->pc++;
-		    while (nalts--) {
+		    nalts = ialts = *state->pc++;
+		    while (ialts--) {
 			taddstr(ecgetstr(state, EC_NODUP, NULL));
 			state->pc++;
-			if (nalts)
+			if (ialts)
 			    taddstr(" | ");
 		    }
 		    taddstr(") ");
 		    tindent++;
 		    n = tpush(code, 0);
 		    n->u._case.end = end;
-		    n->pop = (state->pc - 2 + WC_CASE_SKIP(code) >= end);
+		    n->pop = (state->pc - 2 - nalts + WC_CASE_SKIP(code)
+			      >= end);
 		}
 	    } else if (state->pc < s->u._case.end) {
-		wordcode nalts;
+		wordcode nalts, ialts;
 		dec_tindent();
 		switch (WC_CASE_TYPE(code)) {
 		case WC_CASE_OR:
@@ -737,17 +738,17 @@ gettext2(Estate state)
 		    taddchr(' ');
 		taddstr("(");
 		code = *state->pc++;
-		nalts = *state->pc++;
-		while (nalts--) {
+		nalts = ialts = *state->pc++;
+		while (ialts--) {
 		    taddstr(ecgetstr(state, EC_NODUP, NULL));
 		    state->pc++;
-		    if (nalts)
+		    if (ialts)
 			taddstr(" | ");
 		}
 		taddstr(") ");
 		tindent++;
 		s->code = code;
-		s->pop = ((state->pc - 2 + WC_CASE_SKIP(code)) >=
+		s->pop = ((state->pc - 2 - nalts + WC_CASE_SKIP(code)) >=
 			  s->u._case.end);
 	    } else {
 		dec_tindent();


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Complicated segfault regression
  2015-07-03 22:11       ` Peter Stephenson
@ 2015-07-04 10:03         ` Peter Stephenson
  2015-07-05 11:06         ` Peter Stephenson
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Stephenson @ 2015-07-04 10:03 UTC (permalink / raw)
  To: zsh-workers

On Fri, 3 Jul 2015 23:11:59 +0100
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> I'm horribly out of my depth here, but how about this...

I've committed this with a test.  I'll try to see if I can work out what it
means and add comments, assuming no one else has any ideas.

pws


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Complicated segfault regression
  2015-07-03 22:11       ` Peter Stephenson
  2015-07-04 10:03         ` Peter Stephenson
@ 2015-07-05 11:06         ` Peter Stephenson
  2015-07-05 11:15           ` Peter Stephenson
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2015-07-05 11:06 UTC (permalink / raw)
  To: zsh-workers

On Fri, 3 Jul 2015 23:11:59 +0100
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> n->pop has no dcoumentation whatsoever, but the calculation associated
> with it in 5.0.7 is to do with how much state->pc gets incremented for
> each case pattern set, not coutning the string.  Why?  No idea.  The
> calculation has changed, and now looks like the following.

I've come to the conclusion this has always been wrong, or at least
confusingly inaccurate.  I think it sort-of worked because the
comparison landed in the right target area, just not at the right point.

Anyway, if the code is supposed to read like the following, which still
passes the new test, I now understand it better.  (And if it had read
like this before, which should always have worked, I wouldn't have had
to fiddle with it at all...)

Martijn's test is probably the best overall for this.

n->pop isn't really anything to do with the structure of the case block,
it's just saying something like "we've got to the end of this chunk, so
it's time to go back up the parse stack to find the next piece of code".
So this just uses the standard logic for "the end of this chunk" instead
of recalculating it wrongly from the point we happen to have reached.

pws

diff --git a/Src/text.c b/Src/text.c
index d63141b..cf73004 100644
--- a/Src/text.c
+++ b/Src/text.c
@@ -681,7 +681,7 @@ gettext2(Estate state)
 	case WC_CASE:
 	    if (!s) {
 		Wordcode end = state->pc + WC_CASE_SKIP(code);
-		wordcode nalts, ialts;
+		wordcode ialts;
 
 		taddstr("case ");
 		taddstr(ecgetstr(state, EC_NODUP, NULL));
@@ -695,6 +695,7 @@ gettext2(Estate state)
 		    taddstr("esac");
 		    stack = 1;
 		} else {
+		    Wordcode prev_pc;
 		    tindent++;
 		    if (tnewlins)
 			taddnl(0);
@@ -702,7 +703,8 @@ gettext2(Estate state)
 			taddchr(' ');
 		    taddstr("(");
 		    code = *state->pc++;
-		    nalts = ialts = *state->pc++;
+		    prev_pc = state->pc++;
+		    ialts = *prev_pc;
 		    while (ialts--) {
 			taddstr(ecgetstr(state, EC_NODUP, NULL));
 			state->pc++;
@@ -713,11 +715,11 @@ gettext2(Estate state)
 		    tindent++;
 		    n = tpush(code, 0);
 		    n->u._case.end = end;
-		    n->pop = (state->pc - 2 - nalts + WC_CASE_SKIP(code)
-			      >= end);
+		    n->pop = (prev_pc + WC_CASE_SKIP(code) >= end);
 		}
 	    } else if (state->pc < s->u._case.end) {
-		wordcode nalts, ialts;
+		Wordcode prev_pc;
+		wordcode ialts;
 		dec_tindent();
 		switch (WC_CASE_TYPE(code)) {
 		case WC_CASE_OR:
@@ -738,7 +740,8 @@ gettext2(Estate state)
 		    taddchr(' ');
 		taddstr("(");
 		code = *state->pc++;
-		nalts = ialts = *state->pc++;
+		prev_pc = state->pc++;
+		ialts = *prev_pc;
 		while (ialts--) {
 		    taddstr(ecgetstr(state, EC_NODUP, NULL));
 		    state->pc++;
@@ -748,7 +751,7 @@ gettext2(Estate state)
 		taddstr(") ");
 		tindent++;
 		s->code = code;
-		s->pop = ((state->pc - 2 - nalts + WC_CASE_SKIP(code)) >=
+		s->pop = (prev_pc + WC_CASE_SKIP(code) >=
 			  s->u._case.end);
 	    } else {
 		dec_tindent();


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Complicated segfault regression
  2015-07-05 11:06         ` Peter Stephenson
@ 2015-07-05 11:15           ` Peter Stephenson
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Stephenson @ 2015-07-05 11:15 UTC (permalink / raw)
  To: zsh-workers

On Sun, 5 Jul 2015 12:06:16 +0100
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> On Fri, 3 Jul 2015 23:11:59 +0100
> Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> > n->pop has no dcoumentation whatsoever, but the calculation associated
> > with it in 5.0.7 is to do with how much state->pc gets incremented for
> > each case pattern set, not coutning the string.  Why?  No idea.  The
> > calculation has changed, and now looks like the following.
> 
> I've come to the conclusion this has always been wrong, or at least
> confusingly inaccurate.  I think it sort-of worked because the
> comparison landed in the right target area, just not at the right point.

Having said that: the penny has now dropped.  I think it was right, just
incredibly obscure (the equivalent execcase() code was equally obscure
before I tidied it up).

I was wondering why it didn't count 1 for the string expression in the
case (there was only one back then).  But it did... it then subtracted
it off again, because the skip starts at the *next* value of pc after
the one that contained the skip count.  (I've had to subtract that
explicitly --- actually, I simiply remember the next PC as the most
logical way of avoiding funny offsets.)  Then the skip count in the old
case agrees with what the expression was doing, and using the skip count
instead is equivalent in both cases.

So I think everything now adds up.

pws

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-07-05 11:16 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-02 21:36 Complicated segfault regression Martijn Dekker
2015-07-03  3:28 ` Bart Schaefer
2015-07-03 11:35 ` Peter Stephenson
2015-07-03 12:29   ` Martijn Dekker
2015-07-03 13:44     ` Peter Stephenson
2015-07-03 17:52       ` Martijn Dekker
2015-07-03 22:11       ` Peter Stephenson
2015-07-04 10:03         ` Peter Stephenson
2015-07-05 11:06         ` Peter Stephenson
2015-07-05 11:15           ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).