* [BUG] SIGSEGV under certain circumstances @ 2017-03-01 15:38 Chi-Hsuan Yen 2017-03-04 23:11 ` Bart Schaefer 0 siblings, 1 reply; 16+ messages in thread From: Chi-Hsuan Yen @ 2017-03-01 15:38 UTC (permalink / raw) To: Zsh hackers list Hello zsh experts, Yesterday I got SIGSEGV in zsh under certain circumstances. I'm on Arch Linux x86_64. Steps to reproduce are: 1. Install the mpv media player. Seems the bug is related to its completion script _mpv. On Arch Linux it's stored in /usr/share/zsh/site-functions/_mpv. I have uploaded a copy to [1] 2. Use .zshrc at [2] 3. Create an empty file called 突然好想你-3565536.mp3 in $HOME Seems the filename affects how SIGSEGV occurs or not. Its content is irrelevant. 4. Open a new terminal and run `mpv 突然好想你-3565536.mp3`. Then exit the terminal to make sure that the mpv command is written to ~/.zsh_history 5. Open another new terminal, hit arrow up one or more times to the previous mpv command. 6. Hit tab, then zsh crashes with SIGSEGV. The log under gdb can be found at [3] My zsh version is d00931de5c2b7aa846daf137865dd05ac8d4be8a. I replaced the value of `source` to git-master and rebuild the Arch Linux zsh package. [4] I can reproduce this bug with both QTerminal (a fork of Konsole) and xfce4-terminal. So it's not a terminal-specific issue. By the way, after those steps ~/.zsh_history seems corrupted: $ head -n 2 ~/.zsh_history | tail -n 1 | xxd 00000000: 6d70 7620 e7aa 81e7 83a4 b6e5 a5bd e683 mpv ............ 00000010: a3b3 e4bd 8380 2d33 3536 3535 3336 2e6d ......-3565536.m 00000020: 7033 0a p3. $ echo -n 突然好想你 | xxd 00000000: e7aa 81e7 84b6 e5a5 bde6 83b3 e4bd a0 ............... Chinese characters 突然好想你 map to: e7aa81 e784b6 e5a5bd e683b3 e4bda0 in utf-8 (15 bytes, 3 bytes for each character). However, in ~/.zsh_history, the saved content is: (I reformatted it for easier comparision with the correct version) e7aa81 e783a4b6 e5a5bd e683a3b3 e4bd8380 Apparently the 2nd, 4th and 5th characters are corrupted. I'm not sure whether it's related to the crash, though. I know this bug is not easy to reproduce. Thanks for the patience in reading this long letter! Best, Yen, Chi-Hsuan [1] https://gist.github.com/yan12125/014c7a7510d1d9bd9ac8edf142a0c65d#file-_mpv [2] https://gist.github.com/yan12125/014c7a7510d1d9bd9ac8edf142a0c65d#file-zshrc [3] https://gist.github.com/yan12125/014c7a7510d1d9bd9ac8edf142a0c65d#file-backtrace [4] https://git.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/zsh ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-01 15:38 [BUG] SIGSEGV under certain circumstances Chi-Hsuan Yen @ 2017-03-04 23:11 ` Bart Schaefer 2017-03-05 12:55 ` Chi-Hsuan Yen 2017-03-05 13:09 ` Chi-Hsuan Yen 0 siblings, 2 replies; 16+ messages in thread From: Bart Schaefer @ 2017-03-04 23:11 UTC (permalink / raw) To: Chi-Hsuan Yen, Zsh hackers list On Mar 1, 11:38pm, Chi-Hsuan Yen wrote: } } My zsh version is d00931de5c2b7aa846daf137865dd05ac8d4be8a. } } By the way, after those steps ~/.zsh_history seems corrupted: I'm not sure the history is actually corrupted, because the history file is stored with metafied characters which means you can't easily examine it independently of reading it into zsh. There was a small C program posted here some months ago that would unmetafy the file. However based on your steps to reproduce there likely is a problem related to reading the history file back in again. The stack trace you provided is not helpful because it reveals the point at which the corrupted memory is accessed but not the point at which it became corrupted. A pass through valgrind might give better information. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-04 23:11 ` Bart Schaefer @ 2017-03-05 12:55 ` Chi-Hsuan Yen 2017-03-05 13:09 ` Chi-Hsuan Yen 1 sibling, 0 replies; 16+ messages in thread From: Chi-Hsuan Yen @ 2017-03-05 12:55 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list On 5 March 2017 at 07:11, Bart Schaefer <schaefer@brasslantern.com> wrote: > On Mar 1, 11:38pm, Chi-Hsuan Yen wrote: > } > } My zsh version is d00931de5c2b7aa846daf137865dd05ac8d4be8a. > } > } By the way, after those steps ~/.zsh_history seems corrupted: > > I'm not sure the history is actually corrupted, because the history > file is stored with metafied characters which means you can't easily > examine it independently of reading it into zsh. There was a small > C program posted here some months ago that would unmetafy the file. > > However based on your steps to reproduce there likely is a problem > related to reading the history file back in again. > > The stack trace you provided is not helpful because it reveals the > point at which the corrupted memory is accessed but not the point > at which it became corrupted. A pass through valgrind might give > better information. Thanks for the tip. I re-compiled with commit 8522e996ecc88697344dcc4814367ec7e32e7deb and run zsh under valgrind. The log can be found in [1]. (The GDB backtrace is also updated to the new build for consistency). [1] https://gist.github.com/yan12125/014c7a7510d1d9bd9ac8edf142a0c65d#file-valgrind-log ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-04 23:11 ` Bart Schaefer 2017-03-05 12:55 ` Chi-Hsuan Yen @ 2017-03-05 13:09 ` Chi-Hsuan Yen 2017-03-05 16:00 ` Bart Schaefer 1 sibling, 1 reply; 16+ messages in thread From: Chi-Hsuan Yen @ 2017-03-05 13:09 UTC (permalink / raw) To: Bart Schaefer; +Cc: Zsh hackers list On 5 March 2017 at 07:11, Bart Schaefer <schaefer@brasslantern.com> wrote: > On Mar 1, 11:38pm, Chi-Hsuan Yen wrote: > } > } My zsh version is d00931de5c2b7aa846daf137865dd05ac8d4be8a. > } > } By the way, after those steps ~/.zsh_history seems corrupted: > > I'm not sure the history is actually corrupted, because the history > file is stored with metafied characters which means you can't easily > examine it independently of reading it into zsh. There was a small > C program posted here some months ago that would unmetafy the file. > > However based on your steps to reproduce there likely is a problem > related to reading the history file back in again. > > The stack trace you provided is not helpful because it reveals the > point at which the corrupted memory is accessed but not the point > at which it became corrupted. A pass through valgrind might give > better information. (I forgot that gmail servers are blocked, so I re-post my logs) Thanks for the tip. I re-compiled with commit 8522e996ecc88697344dcc4814367ec7e32e7deb and run zsh under valgrind. The log can be found in [1]. (The GDB backtrace is also updated to the new build for consistency). [2] and [3] contains logs of vgdb, which I heard that can give more information than either valgrind or gdb. [1] https://gist.github.com/yan12125/014c7a7510d1d9bd9ac8edf142a0c65d#file-valgrind-log [2] https://gist.github.com/yan12125/014c7a7510d1d9bd9ac8edf142a0c65d#file-valgrind-log-with-vgdb [3] https://gist.github.com/yan12125/014c7a7510d1d9bd9ac8edf142a0c65d#file-gdb-backtrace-from-vgdb ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 13:09 ` Chi-Hsuan Yen @ 2017-03-05 16:00 ` Bart Schaefer 2017-03-05 16:17 ` Peter Stephenson 0 siblings, 1 reply; 16+ messages in thread From: Bart Schaefer @ 2017-03-05 16:00 UTC (permalink / raw) To: Chi-Hsuan Yen; +Cc: Zsh hackers list On Mar 5, 9:09pm, Chi-Hsuan Yen wrote: } } Thanks for the tip. I re-compiled with commit } 8522e996ecc88697344dcc4814367ec7e32e7deb and run zsh under valgrind. Yes, this is much better, thank you. The bad pointer dereference is in filename completion, not in the history as I first suspected. In computil.c:cfp_matcher_pats there is a loop that walks the string from the command line, in this case the file name recalled from the history, Comparing each character to the matcher pattern. If it gets a match it adjusts some counters that are initialized from strlen() of the candidate string, exiting the loop when the counters reach the end of the string. It also adjusts pointers into string, and one of those pointers is running past the end. The pattern is m:{a-zA-Z}={A-Za-z} m:{a-zA-Z}={A-Za-z} but I don't think that matters, it's the candidate string that's causing the confusion. You can see the string in Yen's third backtrace - I won't attempt to paste the string here because my mail client will probably mangle it: https://gist.github.com/yan12125/014c7a7510d1d9bd9ac8edf142a0c65d#file-gdb-backtrace-from-vgdb It does not appear to be metafied but it contains some bytes that were not in the original file name in his reproducing example. So there seem to be two problems, one that the history is either not saving or not reloading the Chinese characters correctly, and two that the loop in cfp_matcher_pats is not counting correctly when it examines that garbage string recalled from history. I'm not in a good position multilingual-environment-wise to debug this much further. Peter, you were the last person to touch the computil.c code in this area, though it was a long time ago -- commit 7f470ebcb09972d46e947dddf0c16197dd3a312f Author: Peter Stephenson <pws@users.sourceforge.net> Date: Sat Oct 18 19:16:24 2008 +0000 25912: fix another metafication bug in completion and remove lies about matchers -- do you have any thoughts on what's going on? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 16:00 ` Bart Schaefer @ 2017-03-05 16:17 ` Peter Stephenson 2017-03-05 18:42 ` Bart Schaefer 0 siblings, 1 reply; 16+ messages in thread From: Peter Stephenson @ 2017-03-05 16:17 UTC (permalink / raw) To: Zsh hackers list; +Cc: Chi-Hsuan Yen On Sun, 5 Mar 2017 08:00:54 -0800 Bart Schaefer <schaefer@brasslantern.com> wrote: > In computil.c:cfp_matcher_pats there is a loop that walks the string > from the command line, in this case the file name recalled from the > history, Comparing each character to the matcher pattern. If it gets > a match it adjusts some counters that are initialized from strlen() of > the candidate string, exiting the loop when the counters reach the > end of the string. It also adjusts pointers into string, and one of > those pointers is running past the end. > > The pattern is m:{a-zA-Z}={A-Za-z} m:{a-zA-Z}={A-Za-z} but I don't > think that matters, it's the candidate string that's causing the > confusion. >... > So there seem to be two problems, one that the history is either not > saving or not reloading the Chinese characters correctly, and two > that the loop in cfp_matcher_pats is not counting correctly when it > examines that garbage string recalled from history. The matcher code doesn't handle non-ASCII characters, and probably never wlll --- I spent ages looking at this some years ago until I realised I was getting nowhere. The most we can hope is it's safe about pointing off the end of the string. That's complicated by the fact that the rest of completion does handle multibyte characters. There's a good chance this is another problem with the handling of Meta characters. We know that broken history parsing can get these wrong --- we've had problems like that not so long ago. If it gets past that stage, we make assumptions about what the presence of a Meta in the string means that, if it fails, can lead to any number of problems. In particular, we assume that any Meta anywhere in the string has the standard meaning. We might be able to debug the second half of this by testing for incorrect metafication in the matcher code, but I'm not sure how far that gets us. We're not going to be able to be safe at every point in the code. I suspect tracking down the problem at the input stage is the only good bet. You might have thought that would be much more mechanical and hence easy... pws ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 16:17 ` Peter Stephenson @ 2017-03-05 18:42 ` Bart Schaefer 2017-03-05 18:52 ` Peter Stephenson 0 siblings, 1 reply; 16+ messages in thread From: Bart Schaefer @ 2017-03-05 18:42 UTC (permalink / raw) To: Zsh hackers list On Mar 5, 4:17pm, Peter Stephenson wrote: } } The matcher code doesn't handle non-ASCII characters, and probably never } wlll --- I spent ages looking at this some years ago until I realised I } was getting nowhere. Maybe there's a way to have cfp_matcher_pats() bail out with failure as soon as it encounters any non-ascii character? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 18:42 ` Bart Schaefer @ 2017-03-05 18:52 ` Peter Stephenson 2017-03-05 21:45 ` Bart Schaefer 0 siblings, 1 reply; 16+ messages in thread From: Peter Stephenson @ 2017-03-05 18:52 UTC (permalink / raw) To: zsh-workers On 5 March 2017 18:42:39 GMT+00:00, Bart Schaefer <schaefer@brasslantern.com> wrote: >On Mar 5, 4:17pm, Peter Stephenson wrote: >} >} The matcher code doesn't handle non-ASCII characters, and probably >never >} wlll --- I spent ages looking at this some years ago until I realised >I >} was getting nowhere. > >Maybe there's a way to have cfp_matcher_pats() bail out with failure as >soon as it encounters any non-ascii character? It needs at least to be able to do trivial matching, character for character, or it becomes unusable. pws -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 18:52 ` Peter Stephenson @ 2017-03-05 21:45 ` Bart Schaefer 2017-03-05 22:31 ` Bart Schaefer ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Bart Schaefer @ 2017-03-05 21:45 UTC (permalink / raw) To: zsh-workers On Mar 5, 6:52pm, Peter Stephenson wrote: } } It needs at least to be able to do trivial matching, } character for character, or it becomes unusable. Point taken, though arguably it's *already* unusable if it crashes. Let's go back to the history angle ... from Yen's original message: } Chinese characters map to: } } e7aa81 e784b6 e5a5bd e683b3 e4bda0 } } in utf-8 (15 bytes, 3 bytes for each character). However, in } ~/.zsh_history, the saved content is: (I reformatted it for easier } comparision with the correct version) } } e7aa81 e783a4b6 e5a5bd e683a3b3 e4bd8380 Adding brackets to show metafied bytes: e7aa81 e7[83a4]b6 e5a5bd e6[83a3]b3 e4bd[8380] That is the same string that is passed to cfp_matcher_pats() according to the backtrace. So history is reading/writing correctly, it appears. However, note that it's not the first byte of the three in any of these examples that is metafied, and if we look at compmatch.c:pattern_match() we find this: /* * Now the line character. */ #ifdef MULTIBYTE_SUPPORT len = mb_metacharlenconv_r(s, &c, &lstate); #else /* We have the character itself. */ if (*s == Meta) { c = STOUC(s[1]) ^ 32; len = 2; } else { c = STOUC(*s); len = 1; } #endif This will fail in this case, I would think? We have to unmetafy first, possibly several bytes ahead, and THEN attempt multibyte conversion; not do either one or the other? So we're likely to get a false hit on pattern_match(), and in fact the bad dereference is of *mp in the branch where pattern_match() succeeds. [Somebody will have to explain to me why (mp += cl) makes sense, as that seems to be using a byte count to increment a (Cmatcher*), but it's done in several places.] Am I going astray here? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 21:45 ` Bart Schaefer @ 2017-03-05 22:31 ` Bart Schaefer 2017-03-05 22:41 ` Daniel Shahaf 2017-03-06 9:47 ` Peter Stephenson 2 siblings, 0 replies; 16+ messages in thread From: Bart Schaefer @ 2017-03-05 22:31 UTC (permalink / raw) To: zsh-workers; +Cc: Chi-Hsuan Yen On Mar 5, 1:45pm, Bart Schaefer wrote: } } This will fail in this case, I would think? We have to unmetafy first, } possibly several bytes ahead, and THEN attempt multibyte conversion; } not do either one or the other? So the following is probably deeply in need of optimization (needs a version of unmeta() that stops after the widest possible mutltibyte character), but just to check my hypothesis: diff --git a/Src/Zle/compmatch.c b/Src/Zle/compmatch.c index aedf463..b0252c2 100644 --- a/Src/Zle/compmatch.c +++ b/Src/Zle/compmatch.c @@ -1548,7 +1548,7 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) { convchar_t c, wc; convchar_t ind, wind; - int len = 0, wlen, mt, wmt; + int len = 0, wlen = 0, mt, wmt; #ifdef MULTIBYTE_SUPPORT mbstate_t lstate, wstate; @@ -1559,7 +1559,13 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) while (p && wp && *s && *ws) { /* First test the word character */ #ifdef MULTIBYTE_SUPPORT - wlen = mb_metacharlenconv_r(ws, &wc, &wstate); + int ulen = mb_metacharlenconv_r(unmeta(ws), &wc, &wstate); + while (ulen-- > 0) { + if (ws[wlen] == Meta) + wlen += 2; + else + wlen += 1; + } #else if (*ws == Meta) { wc = STOUC(ws[1]) ^ 32; @@ -1577,7 +1583,13 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) * Now the line character. */ #ifdef MULTIBYTE_SUPPORT - len = mb_metacharlenconv_r(s, &c, &lstate); + ulen = mb_metacharlenconv_r(unmeta(s), &c, &lstate); + while (ulen-- > 0) { + if (s[len] == Meta) + len += 2; + else + len += 1; + } #else /* We have the character itself. */ if (*s == Meta) { @@ -1624,11 +1636,20 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) ws += wlen; p = p->next; wp = wp->next; +#ifdef MULTIBYTE_SUPPORT + len = wlen = 0; +#endif } while (p && *s) { #ifdef MULTIBYTE_SUPPORT - len = mb_metacharlenconv_r(s, &c, &lstate); + int ulen = mb_metacharlenconv_r(unmeta(s), &c, &lstate); + while (ulen-- > 0) { + if (s[len] == Meta) + len += 2; + else + len += 1; + } #else if (*s == Meta) { c = STOUC(s[1]) ^ 32; @@ -1642,11 +1663,20 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) return 0; p = p->next; s += len; +#ifdef MULTIBYTE_SUPPORT + len = 0; +#endif } while (wp && *ws) { #ifdef MULTIBYTE_SUPPORT - wlen = mb_metacharlenconv_r(ws, &wc, &wstate); + int ulen = mb_metacharlenconv_r(unmeta(ws), &wc, &wstate); + while (ulen-- > 0) { + if (ws[wlen] == Meta) + wlen += 2; + else + wlen += 1; + } #else if (*ws == Meta) { wc = STOUC(ws[1]) ^ 32; @@ -1660,6 +1690,9 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) return 0; wp = wp->next; ws += wlen; +#ifdef MULTIBYTE_SUPPORT + wlen = 0; +#endif } return 1; ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 21:45 ` Bart Schaefer 2017-03-05 22:31 ` Bart Schaefer @ 2017-03-05 22:41 ` Daniel Shahaf 2017-03-05 22:51 ` Bart Schaefer 2017-03-06 9:47 ` Peter Stephenson 2 siblings, 1 reply; 16+ messages in thread From: Daniel Shahaf @ 2017-03-05 22:41 UTC (permalink / raw) To: zsh-workers Bart Schaefer wrote on Sun, Mar 05, 2017 at 13:45:13 -0800: > [Somebody will have to explain to me why (mp += cl) makes sense, as that > seems to be using a byte count to increment a (Cmatcher*), but it's done > in several places.] [ Context: this is about cfp_matcher_pats(). ] Using a byte count to modify a pointer, in itself, is fine. It's just pointer arithmetic that makes the pointer 'mp' point at a different element of the array 'ms'. The part that confuses me is why the line says 'mp += cl' rather than simply 'mp++'. The inner loop executes precisely 'zl' times (because the loop variable, 'tl', is initialized to 'al' and skips Meta bytes), and 'mp' points into an array of size 'zl'. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 22:41 ` Daniel Shahaf @ 2017-03-05 22:51 ` Bart Schaefer 2017-03-05 23:07 ` Bart Schaefer 0 siblings, 1 reply; 16+ messages in thread From: Bart Schaefer @ 2017-03-05 22:51 UTC (permalink / raw) To: Daniel Shahaf; +Cc: zsh-workers On Sun, Mar 5, 2017 at 2:41 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote: > > The part that confuses me is why the line says 'mp += cl' rather than > simply 'mp++'. The inner loop executes precisely 'zl' times (because > the loop variable, 'tl', is initialized to 'al' and skips Meta bytes), > and 'mp' points into an array of size 'zl'. Yes, that's the part that confuses me as well, i.e., why is mb incremented once for every byte rather than once for every character? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 22:51 ` Bart Schaefer @ 2017-03-05 23:07 ` Bart Schaefer 2017-03-06 0:23 ` Bart Schaefer 0 siblings, 1 reply; 16+ messages in thread From: Bart Schaefer @ 2017-03-05 23:07 UTC (permalink / raw) To: zsh-workers On Sun, Mar 5, 2017 at 2:51 PM, Bart Schaefer <schaefer@brasslantern.com> wrote: > On Sun, Mar 5, 2017 at 2:41 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote: >> >> The part that confuses me is why the line says 'mp += cl' rather than >> simply 'mp++'. The inner loop executes precisely 'zl' times (because >> the loop variable, 'tl', is initialized to 'al' and skips Meta bytes), >> and 'mp' points into an array of size 'zl'. > > Yes, that's the part that confuses me as well, i.e., why is mb > incremented once for every byte rather than once for every character? Also, it seems to me cl needs to be the number of bytes compared by pattern_match(), which is not always just 1 or 2 with wide characters. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 23:07 ` Bart Schaefer @ 2017-03-06 0:23 ` Bart Schaefer 0 siblings, 0 replies; 16+ messages in thread From: Bart Schaefer @ 2017-03-06 0:23 UTC (permalink / raw) To: zsh-workers On Mar 5, 3:07pm, Bart Schaefer wrote: } } Also, it seems to me cl needs to be the number of bytes compared by } pattern_match(), which is not always just 1 or 2 with wide characters. Let's try this? Hopefully zero'ing the mbstate_t is both correct and not too time consuming in unmeta_one(). It'd be nice if pattern_match() could return the size but there are too many possibilities (including a successful match of zero length, it seems). This is against master, NOT on top of my previous patch. diff --git a/Src/Zle/compmatch.c b/Src/Zle/compmatch.c index aedf463..1cdbb8a 100644 --- a/Src/Zle/compmatch.c +++ b/Src/Zle/compmatch.c @@ -1548,27 +1548,11 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) { convchar_t c, wc; convchar_t ind, wind; - int len = 0, wlen, mt, wmt; -#ifdef MULTIBYTE_SUPPORT - mbstate_t lstate, wstate; - - memset(&lstate, 0, sizeof(lstate)); - memset(&wstate, 0, sizeof(wstate)); -#endif + int len = 0, wlen = 0, mt, wmt; while (p && wp && *s && *ws) { /* First test the word character */ -#ifdef MULTIBYTE_SUPPORT - wlen = mb_metacharlenconv_r(ws, &wc, &wstate); -#else - if (*ws == Meta) { - wc = STOUC(ws[1]) ^ 32; - wlen = 2; - } else { - wc = STOUC(*ws); - wlen = 1; - } -#endif + wc = unmeta_one(ws, &wlen); wind = pattern_match1(wp, wc, &wmt); if (!wind) return 0; @@ -1576,18 +1560,7 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) /* * Now the line character. */ -#ifdef MULTIBYTE_SUPPORT - len = mb_metacharlenconv_r(s, &c, &lstate); -#else - /* We have the character itself. */ - if (*s == Meta) { - c = STOUC(s[1]) ^ 32; - len = 2; - } else { - c = STOUC(*s); - len = 1; - } -#endif + c = unmeta_one(s, &len); /* * If either is "?", they match each other; no further tests. * Apply this even if the character wasn't convertable; @@ -1627,17 +1600,7 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) } while (p && *s) { -#ifdef MULTIBYTE_SUPPORT - len = mb_metacharlenconv_r(s, &c, &lstate); -#else - if (*s == Meta) { - c = STOUC(s[1]) ^ 32; - len = 2; - } else { - c = STOUC(*s); - len = 1; - } -#endif + c = unmeta_one(s, &len); if (!pattern_match1(p, c, &mt)) return 0; p = p->next; @@ -1645,17 +1608,7 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) } while (wp && *ws) { -#ifdef MULTIBYTE_SUPPORT - wlen = mb_metacharlenconv_r(ws, &wc, &wstate); -#else - if (*ws == Meta) { - wc = STOUC(ws[1]) ^ 32; - wlen = 2; - } else { - wc = STOUC(*ws); - wlen = 1; - } -#endif + wc = unmeta_one(ws, &wlen); if (!pattern_match1(wp, wc, &wmt)) return 0; wp = wp->next; diff --git a/Src/Zle/computil.c b/Src/Zle/computil.c index 325da6d..4ea409d 100644 --- a/Src/Zle/computil.c +++ b/Src/Zle/computil.c @@ -4485,10 +4485,10 @@ cfp_matcher_pats(char *matcher, char *add) } else *mp = m; } - cl = (*tmp == Meta) ? 2 : 1; + (void) unmeta_one(tmp, &cl); tl -= cl; tmp += cl; - mp += cl; + mp++; } } else { stopp = m->line; @@ -4505,10 +4505,10 @@ cfp_matcher_pats(char *matcher, char *add) } else *mp = m; } - cl = (*tmp == Meta) ? 2 : 1; + (void) unmeta_one(tmp, &cl); tl -= cl; tmp += cl; - mp += cl; + mp++; } } else if (m->llen) { stopp = m->line; @@ -4531,7 +4531,7 @@ cfp_matcher_pats(char *matcher, char *add) al = tmp - add; break; } - cl = (*tmp == Meta) ? 2 : 1; + (void) unmeta_one(tmp, &cl); tl -= cl; tmp += cl; } diff --git a/Src/utils.c b/Src/utils.c index 7f3ddad..58983d3 100644 --- a/Src/utils.c +++ b/Src/utils.c @@ -4788,6 +4788,56 @@ unmeta(const char *file_name) } /* + * Unmetafy just one character and store the number of bytes it occupied. + */ +/**/ +mod_export convchar_t +unmeta_one(const char *in, int *sz) +{ +#ifdef MB_CUR_MAX + VARARR(char,out,MB_CUR_MAX+1); +#else +#define MB_CUR_MAX 2 + char out[MB_CUR_MAX+1]; +#endif + convchar_t wc; + char *p; + const char *t; + int newsz; +#ifdef MULTIBYTE_SUPPORT + int ulen; + mbstate_t wstate; +#endif + + if (!sz) + sz = &newsz; + *sz = 0; + + if (!in) + return 0; + + for (t = in, p = out; *t && (p - out <= MB_CUR_MAX); p++) + if ((*p = *t++) == Meta && *t) + *p = *t++ ^ 32; + *p = '\0'; /* Unnecessary? */ + +#ifdef MULTIBYTE_SUPPORT + memset(&wstate, 0, sizeof(wstate)); + ulen = mb_metacharlenconv_r(out, &wc, &wstate); + while (ulen-- > 0) { + if (in[*sz] == Meta) + *sz += 2; + else + *sz += 1; + } +#else + *sz = p - out; + wc = STOUC(out[0]); +#endif + return wc; +} + +/* * Unmetafy and compare two strings, comparing unsigned character values. * "a\0" sorts after "a". * ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-05 21:45 ` Bart Schaefer 2017-03-05 22:31 ` Bart Schaefer 2017-03-05 22:41 ` Daniel Shahaf @ 2017-03-06 9:47 ` Peter Stephenson 2017-03-06 17:10 ` Bart Schaefer 2 siblings, 1 reply; 16+ messages in thread From: Peter Stephenson @ 2017-03-06 9:47 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh-workers On Sun, 5 Mar 2017 13:45:13 -0800 Bart Schaefer <schaefer@brasslantern.com> wrote: > [Somebody will have to explain to me why (mp += cl) makes sense, as that > seems to be using a byte count to increment a (Cmatcher*), but it's done > in several places.] We're getting down to the point where the matcher hasn't been fully converted to multibyte chracters (though I can't swear this is it), and indeed right at the bottom I seem to remember it uses arrays to such an extent it would be more natural to use wide characters. But that turned out to be easier said than done too --- partly I think it was a question of being able to store pattern characters. > Am I going astray here? I don't understand your point since the functions being called here are all adapted for metafied multibyte characters, but I'm obviously missing something. pws ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] SIGSEGV under certain circumstances 2017-03-06 9:47 ` Peter Stephenson @ 2017-03-06 17:10 ` Bart Schaefer 0 siblings, 0 replies; 16+ messages in thread From: Bart Schaefer @ 2017-03-06 17:10 UTC (permalink / raw) To: zsh-workers On Mar 6, 9:47am, Peter Stephenson wrote: } Subject: Re: [BUG] SIGSEGV under certain circumstances } } Bart Schaefer <schaefer@brasslantern.com> wrote: } > Am I going astray here? } } I don't understand your point since the functions being called here are } all adapted for metafied multibyte characters, but I'm obviously missing } something. No, it's me that's missing something. The patch I sent in 40754 is more complicated than necessary, because I didn't bother to check whether the word "meta" in the name "mb_metacharlenconv" actually meant that it was handling zsh metafied pairs. At the same time, this still means that pattern_match() was using the unmetafied wide characters to do comparisons, but cfp_matcher_pats() was counting bytes as if there were at most one metafied pair making up the whole of each character. There's also the question of whether there is one entry in the "ms" array of Cmatcher for every byte in the input string (including the Meta bytes), or one entry for every character, or (the worst case) one entry for every byte-or-metafied-pair of which there may be more than one of each per wide character. I think the answer is that since the ms array is being filled here to be passed down to cfp_matcher_range(), it should be one entry for each character. cfp_matcher_range() does mp++ at the end of its loop. So incrementing mp by one each time in cfp_matcher_pats() must also be the right thing. Here's a corrected patch with unmeta_one() simplified. Again this is against the master, not on top of previous patches. I'm reasonably confident in this now, so inclined to commit and wait for explosions. Y02compmatch passes, so I don't think anything basic is broken. diff --git a/Src/Zle/compmatch.c b/Src/Zle/compmatch.c index aedf463..1cdbb8a 100644 --- a/Src/Zle/compmatch.c +++ b/Src/Zle/compmatch.c @@ -1548,27 +1548,11 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) { convchar_t c, wc; convchar_t ind, wind; - int len = 0, wlen, mt, wmt; -#ifdef MULTIBYTE_SUPPORT - mbstate_t lstate, wstate; - - memset(&lstate, 0, sizeof(lstate)); - memset(&wstate, 0, sizeof(wstate)); -#endif + int len = 0, wlen = 0, mt, wmt; while (p && wp && *s && *ws) { /* First test the word character */ -#ifdef MULTIBYTE_SUPPORT - wlen = mb_metacharlenconv_r(ws, &wc, &wstate); -#else - if (*ws == Meta) { - wc = STOUC(ws[1]) ^ 32; - wlen = 2; - } else { - wc = STOUC(*ws); - wlen = 1; - } -#endif + wc = unmeta_one(ws, &wlen); wind = pattern_match1(wp, wc, &wmt); if (!wind) return 0; @@ -1576,18 +1560,7 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) /* * Now the line character. */ -#ifdef MULTIBYTE_SUPPORT - len = mb_metacharlenconv_r(s, &c, &lstate); -#else - /* We have the character itself. */ - if (*s == Meta) { - c = STOUC(s[1]) ^ 32; - len = 2; - } else { - c = STOUC(*s); - len = 1; - } -#endif + c = unmeta_one(s, &len); /* * If either is "?", they match each other; no further tests. * Apply this even if the character wasn't convertable; @@ -1627,17 +1600,7 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) } while (p && *s) { -#ifdef MULTIBYTE_SUPPORT - len = mb_metacharlenconv_r(s, &c, &lstate); -#else - if (*s == Meta) { - c = STOUC(s[1]) ^ 32; - len = 2; - } else { - c = STOUC(*s); - len = 1; - } -#endif + c = unmeta_one(s, &len); if (!pattern_match1(p, c, &mt)) return 0; p = p->next; @@ -1645,17 +1608,7 @@ pattern_match(Cpattern p, char *s, Cpattern wp, char *ws) } while (wp && *ws) { -#ifdef MULTIBYTE_SUPPORT - wlen = mb_metacharlenconv_r(ws, &wc, &wstate); -#else - if (*ws == Meta) { - wc = STOUC(ws[1]) ^ 32; - wlen = 2; - } else { - wc = STOUC(*ws); - wlen = 1; - } -#endif + wc = unmeta_one(ws, &wlen); if (!pattern_match1(wp, wc, &wmt)) return 0; wp = wp->next; diff --git a/Src/Zle/computil.c b/Src/Zle/computil.c index 325da6d..e704f9f 100644 --- a/Src/Zle/computil.c +++ b/Src/Zle/computil.c @@ -4465,17 +4465,24 @@ cfp_matcher_pats(char *matcher, char *add) if (m && m != pcm_err) { char *tmp; int al = strlen(add), zl = ztrlen(add), tl, cl; - VARARR(Cmatcher, ms, zl); + VARARR(Cmatcher, ms, zl); /* One Cmatcher per character */ Cmatcher *mp; Cpattern stopp; int stopl = 0; + /* zl >= (number of wide characters) is guaranteed */ memset(ms, 0, zl * sizeof(Cmatcher)); for (; m && *add; m = m->next) { stopp = NULL; if (!(m->flags & (CMF_LEFT|CMF_RIGHT))) { if (m->llen == 1 && m->wlen == 1) { + /* + * In this loop and similar loops below we step + * through tmp one (possibly wide) character at a + * time. pattern_match() compares only the first + * character using unmeta_one() so keep in step. + */ for (tmp = add, tl = al, mp = ms; tl; ) { if (pattern_match(m->line, tmp, NULL, NULL)) { if (*mp) { @@ -4485,10 +4492,10 @@ cfp_matcher_pats(char *matcher, char *add) } else *mp = m; } - cl = (*tmp == Meta) ? 2 : 1; + (void) unmeta_one(tmp, &cl); tl -= cl; tmp += cl; - mp += cl; + mp++; } } else { stopp = m->line; @@ -4505,10 +4512,10 @@ cfp_matcher_pats(char *matcher, char *add) } else *mp = m; } - cl = (*tmp == Meta) ? 2 : 1; + (void) unmeta_one(tmp, &cl); tl -= cl; tmp += cl; - mp += cl; + mp++; } } else if (m->llen) { stopp = m->line; @@ -4531,7 +4538,7 @@ cfp_matcher_pats(char *matcher, char *add) al = tmp - add; break; } - cl = (*tmp == Meta) ? 2 : 1; + (void) unmeta_one(tmp, &cl); tl -= cl; tmp += cl; } diff --git a/Src/utils.c b/Src/utils.c index 7f3ddad..839575b 100644 --- a/Src/utils.c +++ b/Src/utils.c @@ -4788,6 +4788,48 @@ unmeta(const char *file_name) } /* + * Unmetafy just one character and store the number of bytes it occupied. + */ +/**/ +mod_export convchar_t +unmeta_one(const char *in, int *sz) +{ + convchar_t wc; + int newsz; +#ifdef MULTIBYTE_SUPPORT + int ulen; + mbstate_t wstate; +#endif + + if (!sz) + sz = &newsz; + *sz = 0; + + if (!in || !*in) + return 0; + +#ifdef MULTIBYTE_SUPPORT + memset(&wstate, 0, sizeof(wstate)); + ulen = mb_metacharlenconv_r(in, &wc, &wstate); + while (ulen-- > 0) { + if (in[*sz] == Meta) + *sz += 2; + else + *sz += 1; + } +#else + if (in[0] == Meta) { + *sz = 2; + wc = STOUC(in[1] ^ 32); + } else { + *sz = 1; + wc = STOUC(in[0]); + } +#endif + return wc; +} + +/* * Unmetafy and compare two strings, comparing unsigned character values. * "a\0" sorts after "a". * ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-03-06 20:01 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-03-01 15:38 [BUG] SIGSEGV under certain circumstances Chi-Hsuan Yen 2017-03-04 23:11 ` Bart Schaefer 2017-03-05 12:55 ` Chi-Hsuan Yen 2017-03-05 13:09 ` Chi-Hsuan Yen 2017-03-05 16:00 ` Bart Schaefer 2017-03-05 16:17 ` Peter Stephenson 2017-03-05 18:42 ` Bart Schaefer 2017-03-05 18:52 ` Peter Stephenson 2017-03-05 21:45 ` Bart Schaefer 2017-03-05 22:31 ` Bart Schaefer 2017-03-05 22:41 ` Daniel Shahaf 2017-03-05 22:51 ` Bart Schaefer 2017-03-05 23:07 ` Bart Schaefer 2017-03-06 0:23 ` Bart Schaefer 2017-03-06 9:47 ` Peter Stephenson 2017-03-06 17:10 ` Bart Schaefer
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).