bug with camel case and delete-whole-word-match function

* bug with camel case and delete-whole-word-match function
       [not found] <588168129.3340906.1467709726474.JavaMail.yahoo.ref@mail.yahoo.com>
@ 2016-07-05  9:08 ` Oliver Kiddle
  2016-07-05 10:19   ` Peter Stephenson
  2016-07-26 15:51   ` Peter Stephenson
  0 siblings, 2 replies; 12+ messages in thread
From: Oliver Kiddle @ 2016-07-05  9:08 UTC
  To: zsh-workers

I took a look at the word style widgets in the contributed functions with a view to perhaps adding a vim style text object based on them.

delete-whole-word-match appears to be the most useful example because it operates on a whole word rather than just forward or backward. Unfortunately, it doesn't seem to work too well for the subword (camel case) word style.

The presence of white space before the cursor position is used as an indication that the cursor is on the first character of the word, but with camel case there isn't necessarily any whitespace.

Perhaps I'm missing something in the configuration but I have:
  autoload -U delete-whole-word-match
  zle -N delete-whole-word-match
  zstyle ':zle:*' word-style normal-subword
  bindkey '^K' delete-whole-word-match

On the first character of a camel case word, both the current and previous word are deleted. On the last letter, it deletes up to the next real whitespace.

Oliver


* Re: bug with camel case and delete-whole-word-match function
  2016-07-05  9:08 ` bug with camel case and delete-whole-word-match function Oliver Kiddle
@ 2016-07-05 10:19   ` Peter Stephenson
  2016-07-05 16:12     ` Bart Schaefer
  2016-07-23 23:06     ` Oliver Kiddle
  2016-07-26 15:51   ` Peter Stephenson
  1 sibling, 2 replies; 12+ messages in thread
From: Peter Stephenson @ 2016-07-05 10:19 UTC
  To: zsh-workers

On Tue, 05 Jul 2016 09:08:46 +0000 (UTC)
Oliver Kiddle <okiddle@yahoo.co.uk> wrote:
> I took a look at the word style widgets in the contributed functions
> with a view to perhaps adding a vim style text object based on them.
> 
> delete-whole-word-match appears to be the most useful example because
> it operates on a whole word rather than just forward or
> backward. Unfortunately, it doesn't seem to work too well for the
> subword (camel case) word style.
> 
> The presence of white space before the cursor position is used as an
> indication that the cursor is on the first character of the word but
> with camel case, there isn't necessarily any whitespace.
>
> On the first character of a camel case word, both the current and
> previous word are deleted.
>
> On the last letter, it deletes up to the next real whitespace.

These are related, but the first one's much harder to fix.

If you have "ThisIsSomeWords" and you're on the "e" then the relevant
bits of the split are "Som" (word before cursor) and "e" (word after
cursor), with whitespace before and after cursor empty.  The function
didn't recognise that "me" should be considered a word segment because
you had started the word at "So"; the reason it only happened on the
last character was because it previously assumed that must be the start
of a new word and this didn't match because "Words" followed
immediately.  That's easy enough to fix or at least improve: detect when
what we've been told is the start of a word is a character that wouldn't
normally start one in subword mode, and there's no white space around.

However, if you're on the "S", you get "Is" before and "Some" after.
Again there's no white space, so there's nothing to indicate to the
calling function that these are two separate words rather than bits of
the same word.  So I think we'd need to add some extra signalling from
match-words-by-style to indicate "I'm at a word start" whether or not
there's white space, which needs some thinking about.
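
To make the two cases above concrete, here is a minimal zsh sketch of the split. It is not the code in match-words-by-style: the function name subword_split is hypothetical, and it assumes the simplified rule that every upper case letter starts a subword, ignoring white space and runs of capitals.

```shell
# Hypothetical helper, not part of zsh.  Prints "<before>|<after>" for
# string $1 with the cursor on the character at 0-based offset $2.
# Assumption: every upper case letter begins a new subword.
subword_split() {
  local str=$1
  local pos=$2
  local len=${#str}
  local left=$pos
  local right=$((pos + 1))
  # Walk left to the nearest upper case letter before the cursor.
  while (( left > 0 )); do
    left=$((left - 1))
    [[ ${str:$left:1} == [[:upper:]] ]] && break
  done
  # Walk right until the next upper case letter or the end of the string.
  while (( right < len )) && [[ ${str:$right:1} != [[:upper:]] ]]; do
    right=$((right + 1))
  done
  echo "${str:$left:$((pos - left))}|${str:$pos:$((right - pos))}"
}

subword_split ThisIsSomeWords 9   # cursor on "e", prints Som|e
subword_split ThisIsSomeWords 6   # cursor on "S", prints Is|Some
```

Both results have the same shape, a piece before and a piece after with no white space in between, which is why the caller cannot tell them apart without extra information.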

diff --git a/Functions/Zle/match-words-by-style b/Functions/Zle/match-words-by-style
index 54e019d..1a3e78c 100644
--- a/Functions/Zle/match-words-by-style
+++ b/Functions/Zle/match-words-by-style
@@ -202,7 +202,7 @@ if [[ $wordstyle = *subword* ]]; then
   # followed by a lower case letter, or an upper case letter at
   # the start of a group of upper case letters.  To make
   # it easier to be consistent, we just use anything that
-  # isn't an upper case characer instead of a lower case
+  # isn't an upper case character instead of a lower case
   # character.
   # Here the initial "*" will match greedily, so we get the
   # last such match, as we want.
@@ -237,6 +237,12 @@ if [[ $wordstyle = *subword* ]]; then
 	  -n $match[2] ]]; then
     # Yes, so the last one is new word boundary.
     (( epos = ${#match[1]} - 1 ))
+    # Otherwise, are we in the middle of a word?
+    # In other, er, words, we've got something on the left with no
+    # white space following and something that doesn't start a word here.
+  elif [[ -n $word1 && -z $ws1 && -z $ws2 && \
+    $word2 = (#b)([^${~subwordrange}]##)* ]]; then
+    (( epos = ${#match[1]} ))
     # Otherwise, do we have upper followed by non-upper not
     # at the start?  Ignore the initial character, we already
     # know it's a word boundary so it can be an upper case character



* Re: bug with camel case and delete-whole-word-match function
  2016-07-05 10:19   ` Peter Stephenson
@ 2016-07-05 16:12     ` Bart Schaefer
  2016-07-05 16:28       ` Peter Stephenson
  2016-07-23 23:06     ` Oliver Kiddle
  1 sibling, 1 reply; 12+ messages in thread
From: Bart Schaefer @ 2016-07-05 16:12 UTC
  To: zsh-workers

On Jul 5, 11:19am, Peter Stephenson wrote:
} Subject: Re: bug with camel case and delete-whole-word-match function
}
} However, if you're on the "S", you get "Is" before and "Some" after.
} Again there's no white space, so there's nothing to indicate to the
} calling function that these are two separate words rather than bits of
} the same word.

Maybe I'm missing something, but shouldn't every capital letter be
treated as the start of a word in this situation, even if it's under
the cursor?


* Re: bug with camel case and delete-whole-word-match function
  2016-07-05 16:12     ` Bart Schaefer
@ 2016-07-05 16:28       ` Peter Stephenson
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Stephenson @ 2016-07-05 16:28 UTC
  To: zsh-workers

On Tue, 05 Jul 2016 09:12:01 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:

> On Jul 5, 11:19am, Peter Stephenson wrote:
> } Subject: Re: bug with camel case and delete-whole-word-match function
> }
> } However, if you're on the "S", you get "Is" before and "Some" after.
> } Again there's no white space, so there's nothing to indicate to the
> } calling function that these are two separate words rather than bits of
> } the same word.
> 
> Maybe I'm missing something, but shouldn't every capital letter be
> treated as the start of a word in this situation, even if it's under
> the cursor?

Yes, that's what happens.  But the caller doesn't know the difference
between this and getting "Som" and "e" where there's *no* start-of-word
under the cursor, just parts of the word before and on/after.  With
standard word matching it can tell by looking at white space, here it
could only tell by checking again if it's *really* a start of word.
That additional check is the issue.

Because, in normal cases, (word-bit-before '' '' word-bit-after) always
indicates two parts of the same word, the caller will naturally assume
that here unless it has the extra test.  Hence it's the "Is" "Some" that
behaves incorrectly, rather than the "Som" "e".
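
As a rough illustration of the extra test described above, the following zsh sketch shows a caller-side check. The function cursor_role is hypothetical, and the rule that an upper case letter under the cursor means a word start is an assumption made for the example, not what the contributed functions do.

```shell
# Hypothetical caller-side helper, not part of the contributed functions.
# Arguments: ws-before-cursor, ws-after-cursor, word-after-cursor.
cursor_role() {
  local ws_before=$1 ws_after=$2 word_after=$3
  if [[ -n $ws_before || -n $ws_after ]]; then
    # Standard word matching: white space next to the cursor settles it.
    echo start
  elif [[ $word_after == [[:upper:]]* ]]; then
    # Subword matching: no white space, so check the text itself.
    # Assumed rule: an upper case letter under the cursor starts a subword.
    echo start
  else
    echo middle
  fi
}

# Both calls pass empty white space fields; only the extra test differs.
cursor_role '' '' Some   # the "Is" "Some" case, prints start
cursor_role '' '' e      # the "Som" "e" case, prints middle
```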

pws


* Re: bug with camel case and delete-whole-word-match function
  2016-07-05 10:19   ` Peter Stephenson
  2016-07-05 16:12     ` Bart Schaefer
@ 2016-07-23 23:06     ` Oliver Kiddle
  2016-07-24 19:44       ` Peter Stephenson
  2016-07-26 13:52       ` Peter Stephenson
  1 sibling, 2 replies; 12+ messages in thread
From: Oliver Kiddle @ 2016-07-23 23:06 UTC
  To: zsh-workers

On 5 Jul, Peter wrote:
> If you have "ThisIsSomeWords"

> However, if you're on the "S", you get "Is" before and "Some" after.
> Again there's no white space, so there's nothing to indicate to the
> calling function that these are two separate words rather than bits of
> the same word.  So I think we'd need to add some extra signalling from
> match-words-by-style to indicate "I'm at a word start" whether or not
> there's white space, which needs some thinking about.

Do we need to keep the existing seven elements of matched_words
unchanged for backwards compatibility? Not that I can think of a
particularly obvious way to augment it for this case. Maybe just
a 1/0 indicator for start of word is the simplest. It also seems
to lack a <whitespace before the word> field.

I've attached a patch for a select-word-match function which uses the
mechanism for a vim style text object. In the process, I've found a
couple of other issues with match-words-by-style:

One is that if the cursor is in the middle of a block of whitespace
at the end of the line, the 4th element (whitespace after cursor)
is empty while element 7 contains the whitespace.
A similar issue occurs at the start of the line - element 1 contains
whitespace while element 3 doesn't.

The other issue is that with the shell word style, it'll put whitespace
at the end of element 5 instead of in element 6.

Oliver

diff --git a/Doc/Zsh/contrib.yo b/Doc/Zsh/contrib.yo
index 1d2b7ca..c3dec34 100644
--- a/Doc/Zsh/contrib.yo
+++ b/Doc/Zsh/contrib.yo
@@ -1940,6 +1940,8 @@ tindex(transpose-words-match)
 tindex(capitalize-word-match)
 tindex(up-case-word-match)
 tindex(down-case-word-match)
+tindex(delete-whole-word-match)
+tindex(select-word-match)
 tindex(select-word-style)
 tindex(match-word-context)
 tindex(match-words-by-style)
@@ -1947,12 +1949,14 @@ xitem(tt(forward-word-match), tt(backward-word-match))
 xitem(tt(kill-word-match), tt(backward-kill-word-match))
 xitem(tt(transpose-words-match), tt(capitalize-word-match))
 xitem(tt(up-case-word-match), tt(down-case-word-match))
+xitem(tt(delete-whole-word-match), tt(select-word-match))
 item(tt(select-word-style), tt(match-word-context), tt(match-words-by-style))(
-The eight `tt(-match)' functions are drop-in replacements for the
+The first eight `tt(-match)' functions are drop-in replacements for the
 builtin widgets without the suffix.  By default they behave in a similar
 way.  However, by the use of styles and the function tt(select-word-style),
-the way words are matched can be altered.  For comparison, the widgets
-described in ifzman(zmanref(zshzle) under Text Objects)\
+the way words are matched can be altered. tt(select-word-match) is intended
+to be used as a text object in vi mode but with custom word styles. For
+comparison, the widgets described in ifzman(zmanref(zshzle) under Text Objects)\
 ifnzman(noderef(Text Objects)) use fixed definitions of words, compatible
 with the tt(vim) editor.
 
@@ -1960,7 +1964,7 @@ The simplest way of configuring the functions is to use
 tt(select-word-style), which can either be called as a normal function with
 the appropriate argument, or invoked as a user-defined widget that will
 prompt for the first character of the word style to be used.  The first
-time it is invoked, the eight tt(-match) functions will automatically
+time it is invoked, the first eight tt(-match) functions will automatically
 replace the builtin versions, so they do not need to be loaded explicitly.
 
 The word styles available are as follows.  Only the first character
diff --git a/Functions/Zle/select-word-match b/Functions/Zle/select-word-match
new file mode 100644
index 0000000..24620c9
--- /dev/null
+++ b/Functions/Zle/select-word-match
@@ -0,0 +1,121 @@
+# Select the entire word around the cursor. Intended for use as
+# a vim-style text object in vi mode but with customisable
+# word boundaries.
+#
+# For example:
+#   autoload -U select-word-match
+#   zle -N select-in-camel select-word-match
+#   bindkey -M viopp ic select-in-camel
+#   zstyle ':zle:*-camel' word-style normal-subword
+
+emulate -L zsh
+setopt extendedglob
+
+local curcontext=:zle:$WIDGET
+local -a matched_words
+# Start and end of range of characters
+integer pos1 pos2 num=${NUMERIC:-1}
+local style word
+
+# choose between inner word or a word style of widget
+for style in $1 ${${WIDGET#*-}[1]} $KEYS[1] "i"; do
+  [[ $style = [ai] ]] && break
+done
+
+autoload -Uz match-words-by-style
+
+while (( num-- )); do
+  if (( MARK > CURSOR )); then
+    # if cursor is at the start of the selection, just move back a word
+    match-words-by-style
+    if [[ $style = i && -n $matched_words[3] ]]; then
+      word=$matched_words[3]
+    else
+      word=$matched_words[2]$matched_words[3]
+    fi
+    if [[ -n $word ]]; then
+      (( CURSOR -= ${#word} ))
+    else
+      return 1
+    fi
+  elif (( MARK >= 0 && MARK < CURSOR )); then
+    # cursor at the end, move forward a word
+    (( CURSOR+1 == $#BUFFER )) && return 1
+    (( CURSOR++ ))
+    match-words-by-style
+    if [[ -n $matched_words[4] ]]; then
+      if [[ $style = i ]]; then
+	# just skip the whitespace
+	word=$matched_words[4]
+      else
+	# skip the whitespace plus word
+	word=$matched_words[4]$matched_words[5]
+      fi
+    else
+      if [[ $style = i ]]; then
+	# skip the word
+	word=$matched_words[5]
+      else
+	# skip word and following whitespace
+	word=$matched_words[5]$matched_words[6]
+      fi
+    fi
+    (( CURSOR += ${#word} - 1 ))
+  else
+    match-words-by-style
+
+    if [[ -n "${matched_words[3]}" ]]; then
+      # There's whitespace before the cursor, so the word we are selecting
+      # starts at the cursor position.
+      pos1=$CURSOR
+    else
+      # No whitespace before us, so select any wordcharacters there.
+      pos1="${#matched_words[1]}"
+    fi
+
+    if [[ -n "${matched_words[4]}" ]]; then
+      if [[ -n "${matched_words[3]}" ]] || (( CURSOR == 0 )); then
+        # whitespace either side, select it
+	(( pos1 = CURSOR - ${#matched_words[3]} ))
+	(( pos2 = CURSOR + ${#matched_words[4]} ))
+      else
+	# There's whitespace at the cursor position, so only select
+	# up to the cursor position.
+	(( pos2 = CURSOR + 1 ))
+      fi
+    else
+      # No whitespace at the cursor position, so select the
+      # current character and any following wordcharacters.
+      (( pos2 = CURSOR + ${#matched_words[5]} ))
+    fi
+
+    if [[ $style = a ]]; then
+      if [[ -n "${matched_words[4]}"  && ( -n "${matched_words[3]}" || CURSOR -eq 0 ) ]]; then
+	# in the middle of whitespace so grab a word
+	if [[ -n "${matched_words[5]}" ]]; then
+	  (( pos2 += ${#matched_words[5]} )) # preferably the one after
+	else
+	  (( pos1 -= ${#matched_words[2]} )) # otherwise the one before
+	fi
+      elif [[ -n "${matched_words[6]}" ]]; then
+	(( pos2 += ${#matched_words[6]} ))
+      elif [[ -n "${matched_words[3]}" ]]; then
+	# couldn't grab whitespace forwards so try backwards
+	(( pos1 -= ${#matched_words[3]} ))
+      elif (( pos1 > 0 )); then
+	# There might have been whitespace before the word
+	(( CURSOR = pos1 ))
+	match-words-by-style
+	if [[ -n "${matched_words[3]}" ]]; then
+	  (( pos1 -= ${#matched_words[3]} ))
+	fi
+      fi
+    fi
+
+    (( MARK = pos1, CURSOR = pos2-1 ))
+  fi
+done
+
+if [[ $KEYMAP == vicmd ]] && (( !REGION_ACTIVE )); then
+  (( CURSOR++ )) # Need to include cursor position for operators
+fi



* Re: bug with camel case and delete-whole-word-match function
  2016-07-23 23:06     ` Oliver Kiddle
@ 2016-07-24 19:44       ` Peter Stephenson
  2016-07-26 13:52       ` Peter Stephenson
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Stephenson @ 2016-07-24 19:44 UTC
  To: zsh-workers

On Sun, 24 Jul 2016 01:06:02 +0200
Oliver Kiddle <okiddle@yahoo.co.uk> wrote:
> On 5 Jul, Peter wrote:
> > If you have "ThisIsSomeWords"
> 
> > However, if you're on the "S", you get "Is" before and "Some" after.
> > Again there's no white space, so there's nothing to indicate to the
> > calling function that these are two separate words rather than bits of
> > the same word.  So I think we'd need to add some extra signalling from
> > match-words-by-style to indicate "I'm at a word start" whether or not
> > there's white space, which needs some thinking about.
> 
> Do we need to keep the existing seven elements of matched_words
> unchanged for backwards compatibility? Not that I can think of a
> particularly obvious way to augment it for this case. Maybe just
> a 1/0 indicator for start of word is the simplest. It also seems
> to lack a <whitespace before the word> field.

Yes, something like that.  I was wondering if it was time to keep the
current way for backward compatibility but switch to a keyword-based
(associative array?) system for future enhancements.
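
A minimal sketch of what a keyword-based result could look like from the caller's side. The array name mw and the key names used here (word-before-cursor, word-after-cursor, is-word-start) are placeholders chosen for illustration, as is the function describe_words; the point is only that a new field can be added without renumbering the existing ones.

```shell
# Illustrative only: a keyword-based result, filled in by hand here.
# The key names are placeholders for this sketch.
typeset -A mw
mw[word-before-cursor]=Is
mw[word-after-cursor]=Some
mw[is-word-start]=1

# Hypothetical consumer: look fields up by name, not by position.
describe_words() {
  if (( ${mw[is-word-start]:-0} )); then
    echo "new word: ${mw[word-after-cursor]}"
  else
    echo "same word: ${mw[word-before-cursor]}${mw[word-after-cursor]}"
  fi
}

describe_words   # prints: new word: Some
```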

> One is that if the cursor is in the middle of a block of whitespace
> at the end of the line, the 4th element (whitespace after cursor)
> is empty while element 7 contains the whitespace.
> A similar issue occurs at the start of the line - element 1 contains
> whitespace while element 3 doesn't.

Might simply not be using an inclusive enough type of white space?

pws



* Re: bug with camel case and delete-whole-word-match function
  2016-07-23 23:06     ` Oliver Kiddle
  2016-07-24 19:44       ` Peter Stephenson
@ 2016-07-26 13:52       ` Peter Stephenson
  2016-07-26 18:22         ` Oliver Kiddle
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Stephenson @ 2016-07-26 13:52 UTC
  To: zsh-workers

On Sun, 24 Jul 2016 01:06:02 +0200
Oliver Kiddle <okiddle@yahoo.co.uk> wrote:
> One is that if the cursor is in the middle of a block of whitespace
> at the end of the line, the 4th element (whitespace after cursor)
> is empty while element 7 contains the whitespace.
> A similar issue occurs at the start of the line - element 1 contains
> whitespace while element 3 doesn't.

I'm not sure what you're testing.  I've put a test function below and
ran it with

mwbs-test -w normal-subword $'one two ThreeFour ' $' \nFiveSix seven'

and I get

      start: 'one two Three'
wd-before-c: 'Four'
ws-before-c: ' '
 ws-after-c: ' 
'
 wd-after-c: 'Five'
 ws-after-w: ''
        end: 'Six seven'

which is what I expect.  Similarly at the start of the next line.  Do
you get something different, or isn't it testing for the problem at all?

> The other issue is that with the shell word style, it'll put whitespace
> at the end of element 5 instead of in element 6.

Again, I get:

mwbs-test -w shell $'one two ThreeFour \n ' $' FiveSix seven'

      start: 'one two '
wd-before-c: 'ThreeFour'
ws-before-c: ' 
 '
 ws-after-c: ' '
 wd-after-c: 'FiveSix'
 ws-after-w: ' '
        end: 'seven'

pws


# mwbs-test
autoload -Uz match-words-by-style

local wordstyle=normal-subword
local opt
while getopts "w:" opt; do
  case $opt in
    (w)
    wordstyle=$OPTARG
    ;;
    (*)
    return 1
    ;;
  esac
done
shift $(( OPTIND - 1 ))

if (( $# != 2 )); then
  print "Usage: mwbs-test LBUFFER RBUFFER" >&2
  return 1
fi

local -a matched_words

local LBUFFER=$1 RBUFFER=$2

match-words-by-style -w $wordstyle || return

print -r "\
      start: '$matched_words[1]'
wd-before-c: '$matched_words[2]'
ws-before-c: '$matched_words[3]'
 ws-after-c: '$matched_words[4]'
 wd-after-c: '$matched_words[5]'
 ws-after-w: '$matched_words[6]'
        end: '$matched_words[7]'
"



* Re: bug with camel case and delete-whole-word-match function
  2016-07-05  9:08 ` bug with camel case and delete-whole-word-match function Oliver Kiddle
  2016-07-05 10:19   ` Peter Stephenson
@ 2016-07-26 15:51   ` Peter Stephenson
  2016-07-26 16:00     ` Peter Stephenson
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Stephenson @ 2016-07-26 15:51 UTC
  To: zsh-workers

On Tue, 05 Jul 2016 09:08:46 +0000 (UTC)
Oliver Kiddle <okiddle@yahoo.co.uk> wrote:
> On the first character of a camel case word, both the current and
> previous word are deleted.

This should fix this in a way that makes it easy to add new features.

pws

diff --git a/Doc/Zsh/contrib.yo b/Doc/Zsh/contrib.yo
index c3dec34..5a7fc13 100644
--- a/Doc/Zsh/contrib.yo
+++ b/Doc/Zsh/contrib.yo
@@ -2132,6 +2132,15 @@ non-word characters following that word (7) the remainder of the line.  Any
 of the elements may be an empty string; the calling function should test
 for this to decide whether it can perform its function.
 
+If the option -A is given to tt(match-words-by-style), then
+tt(matched_words) is an associative array and the seven values
+given above should be retrieved from it as elements named tt(start),
+tt(word-before-cursor), tt(ws-before-cursor), tt(ws-after-cursor),
+tt(word-after-cursor), tt(ws-after-word), and tt(end).  In addition
+the element tt(is-word-start) is 1 if the cursor is on the start
+of a word or subword, and 0 otherwise.  This form is recommended
+for future compatibility.
+
 It is possible to pass options with arguments to tt(match-words-by-style)
 to override the use of styles.  The options are:
 startsitem()
diff --git a/Functions/Zle/delete-whole-word-match b/Functions/Zle/delete-whole-word-match
index aece860..a07f236 100644
--- a/Functions/Zle/delete-whole-word-match
+++ b/Functions/Zle/delete-whole-word-match
@@ -12,30 +12,29 @@ emulate -L zsh
 setopt extendedglob
 
 local curcontext=:zle:$WIDGET
-local -a matched_words
+local -A matched_words
 # Start and end of range of characters to remove.
 integer pos1 pos2
 
 autoload -Uz match-words-by-style
-match-words-by-style
+match-words-by-style -A
 
-if [[ -n "${matched_words[3]}" ]]; then
-    # There's whitespace before the cursor, so the word we are deleting
-    # starts at the cursor position.
+if (( ${matched_words[is-word-start]} )); then
+    # The word we are deleting starts at the cursor position.
     pos1=$CURSOR
 else
-    # No whitespace before us, so delete any wordcharacters there.
-    pos1="${#matched_words[1]}"
+    # Not, so delete any wordcharacters before, too
+    pos1="${#matched_words[start]}"
 fi
 
-if [[ -n "${matched_words[4]}" ]]; then
+if [[ -n "${matched_words[ws-after-cursor]}" ]]; then
     # There's whitespace at the cursor position, so only delete
     # up to the cursor position.
     (( pos2 = CURSOR + 1 ))
 else
     # No whitespace at the cursor position, so delete the
     # current character and any following wordcharacters.
-    (( pos2 = CURSOR + ${#matched_words[5]} + 1 ))
+    (( pos2 = CURSOR + ${#matched_words[word-after-cursor]} + 1 ))
 fi
 
 # Move the cursor then delete the block in one go for the
diff --git a/Functions/Zle/match-words-by-style b/Functions/Zle/match-words-by-style
index 6cdec75..1110f76 100644
--- a/Functions/Zle/match-words-by-style
+++ b/Functions/Zle/match-words-by-style
@@ -5,8 +5,16 @@
 #    <whitespace-after-cursor> <word-after-cursor> <whitespace-after-word>
 #    <stuff-at-end>
 # where the cursor position is always after the third item and `after'
-# is to be interpreted as `after or on'.  Some
-# of the array elements will be empty; this depends on the style.
+# is to be interpreted as `after or on'.
+#
+# With the option -A, matched_words is an associative array; the
+# values above are now given by the elements named start, word-before-cursor,
+# ws-before-cursor, ws-after-cursor, word-after-cursor, ws-after-word,
+# end.  In addition, the element is-word-start is 1 if the cursor
+# is on the start of a word; this is non-trivial in the case of subword
+# (camel case) matching as there may be no white space to test.
+#
+# Some of the array elements will be empty; this depends on the style.
 # For example
 #    foo bar  rod stick
 #            ^
@@ -70,14 +78,19 @@ setopt extendedglob
 local wordstyle spacepat wordpat1 wordpat2 opt charskip wordchars wordclass
 local match mbegin mend pat1 pat2 word1 word2 ws1 ws2 ws3 skip
 local nwords MATCH MBEGIN MEND subwordrange
+integer use_assoc
 
 local curcontext=${curcontext:-:zle:match-words-by-style}
 
 autoload -Uz match-word-context
 match-word-context
 
-while getopts "w:s:c:C:r:" opt; do
+while getopts "Aw:s:c:C:r:" opt; do
   case $opt in
+    (A)
+    use_assoc=1
+    ;;
+
     (w)
     wordstyle=$OPTARG
     ;;
@@ -229,6 +242,8 @@ ws2=$match[1]
 word2=$match[2]
 ws3=$match[3]
 
+integer wordstart
+[[ -n $ws1 || -n $ws2 ]] && wordstart=1
 if [[ $wordstyle = *subword* ]]; then
   # Do we have a group of upper case characters at the start
   # of word2 (that don't form the entire word)?
@@ -249,6 +264,7 @@ if [[ $wordstyle = *subword* ]]; then
     # if it wants.
   elif [[ $word2 = (#b)(?[^${~subwordrange}]##)[${~subwordrange}]* ]]; then
     (( epos = ${#match[1]} ))
+    (( wordstart = 1 ))
   else
     (( epos = 0 ))
   fi
@@ -262,4 +278,21 @@ if [[ $wordstyle = *subword* ]]; then
   fi
 fi
 
-matched_words=("$pat1" "$word1" "$ws1" "$ws2" "$word2" "$ws3" "$pat2")
+# matched_words should be local to caller.
+# Just fix type here.
+if (( use_assoc )); then
+  typeset -gA matched_words
+  matched_words=(
+    start              "$pat1"
+    word-before-cursor "$word1"
+    ws-before-cursor   "$ws1"
+    ws-after-cursor    "$ws2"
+    word-after-cursor  "$word2"
+    ws-after-word      "$ws3"
+    end                "$pat2"
+    is-word-start      $wordstart
+  )
+else
+  typeset -ga matched_words
+  matched_words=("$pat1" "$word1" "$ws1" "$ws2" "$word2" "$ws3" "$pat2")
+fi



* Re: bug with camel case and delete-whole-word-match function
  2016-07-26 15:51   ` Peter Stephenson
@ 2016-07-26 16:00     ` Peter Stephenson
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Stephenson @ 2016-07-26 16:00 UTC
  To: zsh-workers

On Tue, 26 Jul 2016 16:51:14 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> ...In addition
> +the element tt(is-word-start) is 1 if the cursor is on the start
> +of a word or subword, and 0 otherwise.

Actually, if it's on a word start or after the end of the previous word,
i.e. there could be intervening white space.  This is OK if you want to
operate on the next word by default, but could at least be documented
better.  Or it could be 0 = in middle of word, 1 = right at start of
word, 2 = before start of word, or something.
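
For illustration, a caller could derive the three-way value from the two pieces of information already available, the is-word-start flag and the white space after the cursor. This is only a sketch: cursor_state is a hypothetical name and the function takes the two values as plain arguments rather than reading matched_words.

```shell
# Hypothetical helper.  $1 is the is-word-start value (0 or 1),
# $2 is the ws-after-cursor string.
# Prints 0 = in middle of word, 1 = right at start of word,
#        2 = on white space before the start of a word.
cursor_state() {
  local is_word_start=${1:-0} ws_after=$2
  if (( ! is_word_start )); then
    echo 0
  elif [[ -z $ws_after ]]; then
    echo 1
  else
    echo 2
  fi
}

cursor_state 0 ''    # prints 0
cursor_state 1 ''    # prints 1
cursor_state 1 '  '  # prints 2
```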

pws


* Re: bug with camel case and delete-whole-word-match function
  2016-07-26 13:52       ` Peter Stephenson
@ 2016-07-26 18:22         ` Oliver Kiddle
  2016-07-27  8:54           ` Peter Stephenson
  0 siblings, 1 reply; 12+ messages in thread
From: Oliver Kiddle @ 2016-07-26 18:22 UTC
  To: zsh-workers

Peter wrote:
> > A similar issue occurs at the start of the line - element 1 contains
> > whitespace while element 3 doesn't.
>
> I'm not sure what you're testing.  I've put a test function below and
> ran it with
>
> mwbs-test -w normal-subword $'one two ThreeFour ' $' \nFiveSix seven'

I probably should have said start/end of the buffer rather than of the
line.
It is the output of the following two:
  mwbs-test '   ' '  word'
  mwbs-test 'word  ' '   '

In the latter case, this is:

      start: ''
wd-before-c: 'word'
ws-before-c: '  '
 ws-after-c: ''
 wd-after-c: ''
 ws-after-w: ''
        end: '   '

So the spaces go in end rather than ws-after-c.
Whenever the cursor is between actual words, ws-before-c and ws-after-c will
cover the full area of whitespace surrounding the cursor. I don't see
why it should be different when you've got the end/start of the buffer.
For comparison, try: mwbs-test 'word  ' '   x'

In vi word selection will grab a whole block of whitespace in these
cases.
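
To spell out the behaviour I'd expect, here is a small sketch that takes the text to the right of the cursor and returns its leading white space, whether or not a word follows. The function name ws_after_cursor is made up for the example and this is not how match-words-by-style computes the field.

```shell
# Hypothetical helper: leading white space of the text right of the cursor.
ws_after_cursor() {
  local rbuffer=$1
  # Strip everything from the first non-space character onward.  If there
  # is no such character nothing is stripped, so an all-blank string is
  # returned whole.  The brackets only make the white space visible.
  echo "[${rbuffer%%[^[:space:]]*}]"
}

ws_after_cursor '   x'   # word follows, prints [   ]
ws_after_cursor '   '    # end of buffer, prints [   ]
```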

> This should fix this in a way that makes it easy to add new features.

Thanks. Looks good to me.

> +If the option -A is given to tt(match-words-by-style), then

Given that it is the calling functions' responsibility to declare
matched_words, it could just use ${(t)matched_words} but I'm not
especially bothered.

Oliver


* Re: bug with camel case and delete-whole-word-match function
  2016-07-26 18:22         ` Oliver Kiddle
@ 2016-07-27  8:54           ` Peter Stephenson
  2016-07-27 23:05             ` Oliver Kiddle
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Stephenson @ 2016-07-27  8:54 UTC
  To: zsh-workers

On Tue, 26 Jul 2016 20:22:05 +0200
Oliver Kiddle <okiddle@yahoo.co.uk> wrote:
> It is the output of the following two:
>   mwbs-test '   ' '  word'
>   mwbs-test 'word  ' '   '
> 
> In the latter case, this is:
> 
>       start: ''
> wd-before-c: 'word'
> ws-before-c: '  '
>  ws-after-c: ''
>  wd-after-c: ''
>  ws-after-w: ''
>         end: '   '
> 
> So the spaces go in end rather than ws-after-c.

OK, we need to check that there actually is a word after the cursor, as
we should also do before setting is-word-start.

> > +If the option -A is given to tt(match-words-by-style), then
> 
> Given that it is the calling functions' responsibility to declare
> matched_words, it could just use ${(t)matched_words} but I'm not
> especially bothered.

Yes, that would probably be neater.

pws

diff --git a/Doc/Zsh/contrib.yo b/Doc/Zsh/contrib.yo
index c3dec34..8db7395 100644
--- a/Doc/Zsh/contrib.yo
+++ b/Doc/Zsh/contrib.yo
@@ -2132,6 +2132,17 @@ non-word characters following that word (7) the remainder of the line.  Any
 of the elements may be an empty string; the calling function should test
 for this to decide whether it can perform its function.
 
+If the variable tt(matched_words) is defined by the caller to
+tt(match-words-by-style) as an associative array (tt(local -A
+matched_words)), then the seven values given above should be retrieved
+from it as elements named tt(start), tt(word-before-cursor),
+tt(ws-before-cursor), tt(ws-after-cursor), tt(word-after-cursor),
+tt(ws-after-word), and tt(end).  In addition the element
+tt(is-word-start) is 1 if the cursor is on the start of a word or
+subword, or on white space before it (the cases can be distinguished by
+testing the tt(ws-after-cursor) element) and 0 otherwise.  This form is
+recommended for future compatibility.
+
 It is possible to pass options with arguments to tt(match-words-by-style)
 to override the use of styles.  The options are:
 startsitem()
diff --git a/Functions/Zle/delete-whole-word-match b/Functions/Zle/delete-whole-word-match
index aece860..3d52dd3 100644
--- a/Functions/Zle/delete-whole-word-match
+++ b/Functions/Zle/delete-whole-word-match
@@ -12,30 +12,29 @@ emulate -L zsh
 setopt extendedglob
 
 local curcontext=:zle:$WIDGET
-local -a matched_words
+local -A matched_words
 # Start and end of range of characters to remove.
 integer pos1 pos2
 
 autoload -Uz match-words-by-style
 match-words-by-style
 
-if [[ -n "${matched_words[3]}" ]]; then
-    # There's whitespace before the cursor, so the word we are deleting
-    # starts at the cursor position.
+if (( ${matched_words[is-word-start]} )); then
+    # The word we are deleting starts at the cursor position.
     pos1=$CURSOR
 else
-    # No whitespace before us, so delete any wordcharacters there.
-    pos1="${#matched_words[1]}"
+    # Not, so delete any wordcharacters before, too
+    pos1="${#matched_words[start]}"
 fi
 
-if [[ -n "${matched_words[4]}" ]]; then
+if [[ -n "${matched_words[ws-after-cursor]}" ]]; then
     # There's whitespace at the cursor position, so only delete
     # up to the cursor position.
     (( pos2 = CURSOR + 1 ))
 else
     # No whitespace at the cursor position, so delete the
     # current character and any following wordcharacters.
-    (( pos2 = CURSOR + ${#matched_words[5]} + 1 ))
+    (( pos2 = CURSOR + ${#matched_words[word-after-cursor]} + 1 ))
 fi
 
 # Move the cursor then delete the block in one go for the
diff --git a/Functions/Zle/match-words-by-style b/Functions/Zle/match-words-by-style
index 6cdec75..fc59c27 100644
--- a/Functions/Zle/match-words-by-style
+++ b/Functions/Zle/match-words-by-style
@@ -5,8 +5,16 @@
 #    <whitespace-after-cursor> <word-after-cursor> <whitespace-after-word>
 #    <stuff-at-end>
 # where the cursor position is always after the third item and `after'
-# is to be interpreted as `after or on'.  Some
-# of the array elements will be empty; this depends on the style.
+# is to be interpreted as `after or on'.
+#
+# matched_words may be an associative array, in which case the
+# values above are now given by the elements named start, word-before-cursor,
+# ws-before-cursor, ws-after-cursor, word-after-cursor, ws-after-word,
+# end.  In addition, the element is-word-start is 1 if the cursor
+# is on the start of a word; this is non-trivial in the case of subword
+# (camel case) matching as there may be no white space to test.
+#
+# Some of the array elements will be empty; this depends on the style.
 # For example
 #    foo bar  rod stick
 #            ^
@@ -224,11 +232,18 @@ charskip=${(l:skip::?:)}
 
 eval pat2='${RBUFFER##(#b)('${charskip}${spacepat}')('\
 ${wordpat2}')('${spacepat}')}'
+if [[ -n $match[2] ]]; then
+  ws2=$match[1]
+  word2=$match[2]
+  ws3=$match[3]
+else
+  # No more words, so anything left is white space after cursor.
+  ws2=$RBUFFER
+  pat2=
+fi
 
-ws2=$match[1]
-word2=$match[2]
-ws3=$match[3]
-
+integer wordstart
+[[ ( -n $ws1 || -n $ws2 ) && -n $word2 ]] && wordstart=1
 if [[ $wordstyle = *subword* ]]; then
   # Do we have a group of upper case characters at the start
   # of word2 (that don't form the entire word)?
@@ -249,6 +264,7 @@ if [[ $wordstyle = *subword* ]]; then
     # if it wants.
   elif [[ $word2 = (#b)(?[^${~subwordrange}]##)[${~subwordrange}]* ]]; then
     (( epos = ${#match[1]} ))
+    (( wordstart = 1 ))
   else
     (( epos = 0 ))
   fi
@@ -262,4 +278,19 @@ if [[ $wordstyle = *subword* ]]; then
   fi
 fi
 
-matched_words=("$pat1" "$word1" "$ws1" "$ws2" "$word2" "$ws3" "$pat2")
+# matched_words should be local to caller.
+# Just fix type here.
+if [[ ${(t)matched_words} = *association* ]]; then
+  matched_words=(
+    start              "$pat1"
+    word-before-cursor "$word1"
+    ws-before-cursor   "$ws1"
+    ws-after-cursor    "$ws2"
+    word-after-cursor  "$word2"
+    ws-after-word      "$ws3"
+    end                "$pat2"
+    is-word-start      $wordstart
+  )
+else
+  matched_words=("$pat1" "$word1" "$ws1" "$ws2" "$word2" "$ws3" "$pat2")
+fi
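
For anyone reading along, the range arithmetic in the patched delete-whole-word-match above can be sketched outside zle roughly as follows. The helper name word_delete_range and the sample field values are made up for illustration; they are not part of zsh and are not real output from match-words-by-style. In an actual widget the values would come from the matched_words association after calling match-words-by-style. The sketch uses portable shell syntax, so it should behave the same under zsh or a POSIX shell.

```shell
# Hypothetical helper (not part of zsh): mirrors the pos1/pos2 logic of the
# patched delete-whole-word-match, with the matched_words fields passed as
# plain arguments so it can be run outside the line editor.
#   $1  cursor position (CURSOR)
#   $2  matched_words[start]
#   $3  matched_words[is-word-start]   (0 or 1)
#   $4  matched_words[ws-after-cursor]
#   $5  matched_words[word-after-cursor]
# The body runs in a subshell, so its variables do not leak to the caller.
word_delete_range() (
  cursor=$1 start=$2 is_word_start=$3 ws_after=$4 word_after=$5

  if [ "$is_word_start" -ne 0 ]; then
    # The word to delete starts at the cursor position.
    pos1=$cursor
  else
    # Otherwise the range begins right after the leading "start" text.
    pos1=${#start}
  fi

  if [ -n "$ws_after" ]; then
    # White space at the cursor: delete only up to the cursor position.
    pos2=$(( cursor + 1 ))
  else
    # Delete the current character plus the following word characters.
    pos2=$(( cursor + ${#word_after} + 1 ))
  fi

  printf '%s %s\n' "$pos1" "$pos2"
)

# Made-up field values, chosen only to exercise both branches.
word_delete_range 3 "" 1 "" "ar"
word_delete_range 5 "foo" 0 "" "r"
```

The point of the new is-word-start element is visible in the first branch: the caller no longer has to infer a word boundary from white space before the cursor, which is what went wrong for camel case.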



* Re: bug with camel case and delete-whole-word-match function
  2016-07-27  8:54           ` Peter Stephenson
@ 2016-07-27 23:05             ` Oliver Kiddle
  0 siblings, 0 replies; 12+ messages in thread
From: Oliver Kiddle @ 2016-07-27 23:05 UTC (permalink / raw)
  To: zsh-workers

Peter wrote:
> OK, we need to check that there actually is a word after the cursor, as
> we should also do before setting is-word-start.

Thanks. That now works nicely.

Attached patch adjusts select-word-match to take advantage of this.
I've also updated zstyle completion slightly for the word-style and
word-class styles. The documentation seems a bit confusing in the way it
mixes "normal" and "standard" and there might be an error there. I have
adjusted the word-context example for what looks like an error to me.

I've also now had a chance to experiment with select-word-match. I
had previously used select-bracketed for directory components and
lists using i/ a/ i, and a, etc. For things that are essentially
list separators, select-word-match works better because it can grab
the first and last components of a list and the a forms such as a/
will grab just one surrounding / rather than both. Unfortunately,
for the last component of a path, it'll grab the following whitespace
rather than the preceding slash but we can perhaps add a style for
patterns specifying preferred whitespace. That might have other
uses like preferring preceding whitespace over a following newline.

Oliver

diff --git a/Completion/Zsh/Command/_zstyle b/Completion/Zsh/Command/_zstyle
index 9a6d618..20ff47f 100644
--- a/Completion/Zsh/Command/_zstyle
+++ b/Completion/Zsh/Command/_zstyle
@@ -173,6 +173,7 @@ styles=(
   url-seps               e:
   whence                 e:
   word-chars             e:
+  word-class             e:
   word-style             e:word-style
   word-context           e:
 
@@ -241,11 +242,13 @@ while (( $#state )); do
   case "$state[1]" in
     (contexts)
       if [[ ! -prefix :*: ]]; then
-	_wanted contexts expl context compadd -P : -qS : completion vcs_info zftp
+	_wanted contexts expl context compadd -P : -qS : chpwd completion vcs_info zftp zle
       elif compset -P :completion:; then
         contexts=( functions _completers cmdorcont argument tag )
       elif compset -P :vcs_info:; then
         contexts=( vcs-string user-context repo-root-name )
+      elif compset -P :zle:; then
+	_wanted widgets expl widget _widgets -qS :
       fi
       if (( $#contexts )); then
         for ostate in $contexts; do
@@ -521,7 +524,7 @@ while (( $#state )); do
       ;;
 
     (word-style)
-      _wanted word-styles expl 'word style' compadd normal shell space
+      _wanted word-styles expl 'word style' compadd {normal,specified,unspecified,shell,whitespace}-subword
       ;;
 
     (vcs-string)
diff --git a/Doc/Zsh/contrib.yo b/Doc/Zsh/contrib.yo
index 8db7395..00ed080 100644
--- a/Doc/Zsh/contrib.yo
+++ b/Doc/Zsh/contrib.yo
@@ -2105,7 +2105,7 @@ Here are some examples of use of the tt(word-context) style to extend
 the context.
 
 example(zstyle ':zle:*' word-context \ 
-       "*/*" file "[[:space:]]" whitespace
+       "*/*" filename "[[:space:]]" whitespace
 zstyle ':zle:transpose-words:whitespace' word-style shell
 zstyle ':zle:transpose-words:filename' word-style normal
 zstyle ':zle:transpose-words:filename' word-chars '')
diff --git a/Functions/Zle/select-word-match b/Functions/Zle/select-word-match
index 24620c9..8440852 100644
--- a/Functions/Zle/select-word-match
+++ b/Functions/Zle/select-word-match
@@ -12,7 +12,7 @@ emulate -L zsh
 setopt extendedglob
 
 local curcontext=:zle:$WIDGET
-local -a matched_words
+local -A matched_words
 # Start and end of range of characters
 integer pos1 pos2 num=${NUMERIC:-1}
 local style word
@@ -28,10 +28,10 @@ while (( num-- )); do
   if (( MARK > CURSOR )); then
     # if cursor is at the start of the selection, just move back a word
     match-words-by-style
-    if [[ $style = i && -n $matched_words[3] ]]; then
-      word=$matched_words[3]
+    if [[ $style = i && -n $matched_words[ws-before-cursor] ]]; then
+      word=$matched_words[ws-before-cursor]
     else
-      word=$matched_words[2]$matched_words[3]
+      word=$matched_words[word-before-cursor]$matched_words[ws-before-cursor]
     fi
     if [[ -n $word ]]; then
       (( CURSOR -= ${#word} ))
@@ -43,41 +43,40 @@ while (( num-- )); do
     (( CURSOR+1 == $#BUFFER )) && return 1
     (( CURSOR++ ))
     match-words-by-style
-    if [[ -n $matched_words[4] ]]; then
+    if [[ -n $matched_words[ws-after-cursor] ]]; then
       if [[ $style = i ]]; then
 	# just skip the whitespace
-	word=$matched_words[4]
+	word=$matched_words[ws-after-cursor]
       else
 	# skip the whitespace plus word
-	word=$matched_words[4]$matched_words[5]
+	word=$matched_words[ws-after-cursor]$matched_words[word-after-cursor]
       fi
     else
       if [[ $style = i ]]; then
 	# skip the word
-	word=$matched_words[5]
+	word=$matched_words[word-after-cursor]
       else
 	# skip word and following whitespace
-	word=$matched_words[5]$matched_words[6]
+	word=$matched_words[word-after-cursor]$matched_words[ws-after-word]
       fi
     fi
     (( CURSOR += ${#word} - 1 ))
   else
     match-words-by-style
 
-    if [[ -n "${matched_words[3]}" ]]; then
-      # There's whitespace before the cursor, so the word we are selecting
-      # starts at the cursor position.
+    if (( ${matched_words[is-word-start]} )); then
+      # The word we are selecting starts at the cursor position.
       pos1=$CURSOR
     else
       # No whitespace before us, so select any wordcharacters there.
-      pos1="${#matched_words[1]}"
+      pos1="${#matched_words[start]}"
     fi
 
-    if [[ -n "${matched_words[4]}" ]]; then
-      if [[ -n "${matched_words[3]}" ]] || (( CURSOR == 0 )); then
+    if [[ -n "${matched_words[ws-after-cursor]}" ]]; then
+      if [[ -n "${matched_words[ws-before-cursor]}" ]] || (( CURSOR == 0 )); then
         # whitespace either side, select it
-	(( pos1 = CURSOR - ${#matched_words[3]} ))
-	(( pos2 = CURSOR + ${#matched_words[4]} ))
+	(( pos1 = CURSOR - ${#matched_words[ws-before-cursor]} ))
+	(( pos2 = CURSOR + ${#matched_words[ws-after-cursor]} ))
       else
 	# There's whitespace at the cursor position, so only select
 	# up to the cursor position.
@@ -86,28 +85,28 @@ while (( num-- )); do
     else
       # No whitespace at the cursor position, so select the
       # current character and any following wordcharacters.
-      (( pos2 = CURSOR + ${#matched_words[5]} ))
+      (( pos2 = CURSOR + ${#matched_words[word-after-cursor]} ))
     fi
 
     if [[ $style = a ]]; then
-      if [[ -n "${matched_words[4]}"  && ( -n "${matched_words[3]}" || CURSOR -eq 0 ) ]]; then
+      if [[ -n "${matched_words[ws-after-cursor]}"  && ( -n "${matched_words[ws-before-cursor]}" || CURSOR -eq 0 ) ]]; then
 	# in the middle of whitespace so grab a word
-	if [[ -n "${matched_words[5]}" ]]; then
-	  (( pos2 += ${#matched_words[5]} )) # preferably the one after
+	if [[ -n "${matched_words[word-after-cursor]}" ]]; then
+	  (( pos2 += ${#matched_words[word-after-cursor]} )) # preferably the one after
 	else
-	  (( pos1 -= ${#matched_words[2]} )) # otherwise the one before
+	  (( pos1 -= ${#matched_words[word-before-cursor]} )) # otherwise the one before
 	fi
-      elif [[ -n "${matched_words[6]}" ]]; then
-	(( pos2 += ${#matched_words[6]} ))
-      elif [[ -n "${matched_words[3]}" ]]; then
+      elif [[ -n "${matched_words[ws-after-word]}" ]]; then
+	(( pos2 += ${#matched_words[ws-after-word]} ))
+      elif [[ -n "${matched_words[ws-before-cursor]}" ]]; then
 	# couldn't grab whitespace forwards so try backwards
-	(( pos1 -= ${#matched_words[3]} ))
+	(( pos1 -= ${#matched_words[ws-before-cursor]} ))
       elif (( pos1 > 0 )); then
 	# There might have been whitespace before the word
 	(( CURSOR = pos1 ))
 	match-words-by-style
-	if [[ -n "${matched_words[3]}" ]]; then
-	  (( pos1 -= ${#matched_words[3]} ))
+	if [[ -n "${matched_words[ws-before-cursor]}" ]]; then
+	  (( pos1 -= ${#matched_words[ws-before-cursor]} ))
 	fi
       fi
     fi



Thread overview: 12+ messages
     [not found] <588168129.3340906.1467709726474.JavaMail.yahoo.ref@mail.yahoo.com>
2016-07-05  9:08 ` bug with camel case and delete-whole-word-match function Oliver Kiddle
2016-07-05 10:19   ` Peter Stephenson
2016-07-05 16:12     ` Bart Schaefer
2016-07-05 16:28       ` Peter Stephenson
2016-07-23 23:06     ` Oliver Kiddle
2016-07-24 19:44       ` Peter Stephenson
2016-07-26 13:52       ` Peter Stephenson
2016-07-26 18:22         ` Oliver Kiddle
2016-07-27  8:54           ` Peter Stephenson
2016-07-27 23:05             ` Oliver Kiddle
2016-07-26 15:51   ` Peter Stephenson
2016-07-26 16:00     ` Peter Stephenson
