zsh-workers
* [PATCH] Index of element after width of characters
@ 2015-10-27 17:18 Sebastian Gniazdowski
  2015-10-27 19:41 ` Mikael Magnusson
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-27 17:18 UTC
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 2114 bytes --]

Hello,
I had to implement horizontal scrolling of text and noticed that Asian
characters can have a width of two. Scrolling by string index made
normal and wide characters scroll at different speeds.

I then noticed the useful (m) flag, which returns widths. Scrolling can
be done that way, but it basically requires outputting characters one by
one. So I thought about a new flag. It should be handy in various
multibyte scenarios, e.g. in prompts. From the submitted documentation:

y:width:
Index-post-width: for strings, substitutes the index of the first character
located fully after the first `width' columns of characters (multibyte
characters may have widths of e.g. 2, see the `m' flag). For arrays,
substitutes the index of the first element that follows elements whose summed
width is at least `width'. -1 is substituted if there is no such character or
array element.
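
The string rule described above can be sketched in isolation. This is only
an illustration, not the code from the patch: the function name
index_post_width() is hypothetical, and the per-character widths are
assumed to have been computed already (for example by wcwidth()), so the
sketch does not depend on any locale.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch.  widths[i] is the display
 * width of the i-th character.  Returns the 1-based index of the first
 * character lying fully after `width' columns, or -1 if there is none.
 */
long
index_post_width(const int *widths, size_t n, long width)
{
    long sum = 0;
    size_t i;

    assert(widths != NULL || n == 0);

    /* Width 0: the first character already qualifies, if there is one. */
    if (width == 0)
        return n > 0 ? 1 : -1;

    for (i = 0; i < n; i++) {
        /* Non-printable characters (width <= 0) add nothing. */
        if (widths[i] > 0)
            sum += widths[i];
        /* i is 0-based, so the following character is i + 2 (1-based). */
        if (sum >= width)
            return (i + 1 < n) ? (long)(i + 2) : -1;
    }
    return -1;
}
```

With widths {2, 2, 1, 2}, which is what the string "測試a句" from the
submitted test should yield, widths 0 through 7 give 1, 2, 2, 3, 3, 4, -1, -1.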


Also a comment from code:
            /* Reached the width?
             * If single-width char 'd' reaches width say 4 it'll look like:
             *     abcd|ef
             * `e' is the returned (by index) character.
             *
             * If double-width char (de) reaches the width:
             *     abc(d|e)f
             * `f' is the returned (by index) character
             * */
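
For arrays the same boundary rule applies to whole elements rather than to
characters. A minimal sketch, assuming a NULL-terminated array and a
caller-supplied measuring function: the names array_index_post_width(),
width_fn and ascii_width() are hypothetical, and ascii_width() is only an
ASCII stand-in for a real display-width routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: returns the display width of one element. */
typedef size_t (*width_fn)(const char *);

/* ASCII-only stand-in: every byte is one column wide. */
size_t
ascii_width(const char *s)
{
    return strlen(s);
}

/*
 * Returns the 1-based index of the first element that follows elements
 * whose summed width is at least `width', or -1 if there is none.
 */
long
array_index_post_width(const char *const *elems, width_fn measure, long width)
{
    long count = 0, sum = 0;

    assert(elems != NULL && measure != NULL);

    /* Width 0: any first element satisfies it. */
    if (width == 0)
        return elems[0] ? 1 : -1;

    for (; elems[count]; count++) {
        sum += (long)measure(elems[count]);
        /* count is 0-based, so the following element is count + 2. */
        if (sum >= width)
            return elems[count + 1] ? count + 2 : -1;
    }
    return -1;
}
```

For elements of widths 7, 2, 1 and 4, asking for widths 0, 1, 7, 8, 9, 10,
11 and 14 gives 1, 2, 2, 3, 3, 4, -1, -1.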

The code works, but not in the test that I am committing. I wonder why
it works in scripts and at the prompt but not in the test. The output of
the test is:

*** 1,8 ****
!  1
!  2
!  2
!  3
!  3
!  4
!  -1
!  -1
--- 1,8 ----
! 1
! 8
! -1
! -1
! -1
! -1
! -1
! -1

The code should fit in well, because it works like the # flag (get
length) and sits in the same place in subst.c. I hope that because of
this everything is in order. The function in Src/utils.c is a copied
and modified version of the mature function metacharlenconv(). Once
things are clarified I'll probably submit another flag, 'x', that will
substitute the last index that yields a width less than a given one
(nickname index-pre-width?). That gives a nice mnemonic: y is after,
x is before (a width).
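
One possible reading of the proposed 'x' flag, as a rough sketch only. Its
semantics are not settled in this message, so everything here is an
assumption: the name index_pre_width() is hypothetical, -1 for "no such
character" simply mirrors the y flag, and the per-character widths are
again taken as already computed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper.  Returns the largest 1-based index i such that
 * characters 1..i have a summed width strictly less than `width',
 * or -1 if even the first character reaches the width.
 */
long
index_pre_width(const int *widths, size_t n, long width)
{
    long sum = 0, last = -1;
    size_t i;

    assert(widths != NULL || n == 0);

    for (i = 0; i < n; i++) {
        /* Non-printable characters (width <= 0) add nothing. */
        if (widths[i] > 0)
            sum += widths[i];
        if (sum >= width)
            break;
        /* Characters 1..i+1 still fit strictly below the width. */
        last = (long)(i + 1);
    }
    return last;
}
```

With widths {2, 2, 1, 2}, a width of 3 gives 1, a width of 5 gives 2, a
width of 6 gives 3, and a width of 8 gives 4.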

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: index-post-width.patch --]
[-- Type: application/octet-stream, Size: 6737 bytes --]

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 5ea8610..34d4e86 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -1335,6 +1335,13 @@ will remove the same matches as for `tt(#)', but in reverse order, and the
 form using `tt(%%)' will remove the same matches as for `tt(##)' in reverse
 order.
 )
+item(tt(y:)var(width)tt(:))(
+Index-post-width: for strings, substitutes the index of the first character
+located fully after the first var(width) columns of characters (multibyte
+characters may have widths of e.g. 2, see the tt(m) flag). For arrays,
+substitutes the index of the first element that follows elements whose summed
+width is at least var(width). -1 is substituted if there is none.
+)
 item(tt(B))(
 Include the index of the beginning of the match in the result.
 )
diff --git a/Src/subst.c b/Src/subst.c
index 021d234..764bd17 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -1590,6 +1590,8 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags)
     int flags = 0;
     /* Value from (I) flag, used for ditto. */
     int flnum = 0;
+    /* Value from (y) flag. */
+    int ynum = -1;
     /*
      * sortit is to be passed to strmetasort().
      * indord is the (a) flag, which for consistency doesn't get
@@ -1901,6 +1903,14 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags)
 		    whichlen = 3;
 		    break;
 
+		case 'y':
+		    s++;
+		    ynum = get_intarg(&s, &dellen);
+		    if (ynum < 0)
+			goto flagerr;
+		    s--;
+		    break;
+
 		case 'f':
 		    spsep = "\n";
 		    break;
@@ -3342,6 +3352,51 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags)
 	val = dupstring(buf);
 	isarr = 0;
     }
+
+    /* Index-post-width */
+    if(ynum >= 0) {
+        char buf[14];
+
+	if (isarr) {
+            long index;
+
+            // Width 0 is handled immediately: any
+            // first array element satisfies it
+            if ( ynum == 0 ) {
+                if( *aval )
+                    index = 1;
+                else
+                    index = -1;
+            } else {
+                char **ctr = aval;
+                long count = 0, current_width = 0;
+                index = -1;
+
+                while( *ctr ) {
+                    count ++;
+                    current_width += MB_METASTRLEN2(*ctr, 1);
+
+                    if( current_width >= ynum ) {
+                        // Element after the required width
+                        if( *(ctr+1) )
+                            index = count+1;
+                        else
+                            index = -1;
+                        break;
+                    }
+
+                    ctr ++;
+                }
+            }
+
+            sprintf(buf, "%ld", index);
+        } else {
+            sprintf(buf, "%d", mb_index_post_width(val, ynum));
+        }
+        val = dupstring(buf);
+        isarr = 0;
+    }
+
     /* At this point we make sure that our arrayness has affected the
      * arrayness of the linked list.  Then, we can turn our value into
      * a scalar for convenience sake without affecting the arrayness
diff --git a/Src/utils.c b/Src/utils.c
index 0afa8c9..310b0db 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5154,6 +5154,104 @@ mb_charlenconv(const char *s, int slen, wint_t *wcp)
     return mb_charlenconv_r(s, slen, wcp, &mb_shiftstate);
 }
 
+/*
+ * Returns index of character that is located first after width `width`
+ */
+/**/
+mod_export int
+mb_index_post_width(char *ptr, int width)
+{
+    char inchar, *laststart;
+    size_t ret;
+    wchar_t wc;
+    int num, num_in_char, index;
+
+    // All characters treated as width 1
+    if (!isset(MULTIBYTE)) {
+        size_t len = ztrlen(ptr);
+        if( len > width ) {
+            return width + 1;
+        } else {
+            return -1;
+        }
+    }
+
+    // Width 0 is handled immediately
+    if( width == 0 ) {
+        if( *ptr )
+            return 1;
+        else
+            return -1;
+    }
+
+    laststart = ptr;
+    ret = MB_INVALID;
+    num = num_in_char = index = 0;
+
+    memset(&mb_shiftstate, 0, sizeof(mb_shiftstate));
+    while (*ptr) {
+	if (*ptr == Meta)
+	    inchar = *++ptr ^ 32;
+	else
+	    inchar = *ptr;
+	ptr++;
+	ret = mbrtowc(&wc, &inchar, 1, &mb_shiftstate);
+
+	if (ret == MB_INCOMPLETE) {
+	    num_in_char++;
+	} else {
+            /* Count complete chars, which basically should be indexes in
+             * wchar_t array, right? */
+            index++;
+
+	    if (ret == MB_INVALID) {
+		/* Reset, treat as single character, and single width */
+		memset(&mb_shiftstate, 0, sizeof(mb_shiftstate));
+		ptr = laststart + (*laststart == Meta) + 1;
+		num++;
+	    } else {
+		/*
+		 * Returns -1 if not a printable character.  We
+		 * turn this into 0. That makes quite sense as
+                 * we are interested in width of characters,
+                 * and an unprintable one doesn't have it
+		 */
+		int wcw = WCWIDTH(wc);
+		if (wcw > 0)
+                    num += wcw;
+	    }
+
+            /* Reached the width?
+             * If single-width char 'd' reaches width say 4 it'll look like:
+             *     abcd|ef
+             * `e' is the returned (by index) character.
+             *
+             * If double-width char (de) reaches the width:
+             *     abc(d|e)f
+             * `f' is the returned (by index) character
+             * */
+            if (num >= width) {
+                /* We now need to return the next index,
+                 * after we check it exists */
+                if( *ptr )
+                    return index+1;
+                else
+                    return -1;
+            }
+
+	    laststart = ptr;
+	    num_in_char = 0;
+	}
+    }
+
+    /* We choose to point to incomplete character if it's post the width */
+    if (num + (num_in_char > 0 ? 1 : 0) > width) {
+        return index+1;
+    }
+
+    return -1;
+}
+
 /**/
 #else
 
diff --git a/Test/D10width.ztst b/Test/D10width.ztst
new file mode 100644
index 0000000..97e6c11
--- /dev/null
+++ b/Test/D10width.ztst
@@ -0,0 +1,43 @@
+# Test the (y:width:) flag
+
+%test
+
+  setopt multibyte
+  a="測試a句"
+  print ${(y:0:)a}
+  print ${(y:1:)a}
+  print ${(y:2:)a}
+  print ${(y:3:)a}
+  print ${(y:4:)a}
+  print ${(y:5:)a}
+  print ${(y:6:)a}
+  print ${(y:7:)a}
+0:index-post-width for strings
+> 1
+> 2
+> 2
+> 3
+> 3
+> 4
+> -1
+> -1
+
+  setopt multibyte
+  a=( "測試a句" "測" "a" "句句" )
+  print ${(y:0:)a}
+  print ${(y:1:)a}
+  print ${(y:7:)a}
+  print ${(y:8:)a}
+  print ${(y:9:)a}
+  print ${(y:10:)a}
+  print ${(y:11:)a}
+  print ${(y:14:)a}
+0:index-post-width for arrays
+> 1
+> 2
+> 2
+> 3
+> 3
+> 4
+> -1
+> -1


end of thread, other threads:[~2015-10-29  9:42 UTC | newest]

Thread overview: 13+ messages
2015-10-27 17:18 [PATCH] Index of element after width of characters Sebastian Gniazdowski
2015-10-27 19:41 ` Mikael Magnusson
2015-10-27 19:55 ` Sebastian Gniazdowski
2015-10-27 21:20 ` Bart Schaefer
2015-10-28  7:46   ` Sebastian Gniazdowski
2015-10-28  7:54     ` Sebastian Gniazdowski
2015-10-28 11:38       ` Sebastian Gniazdowski
2015-10-28 13:31         ` Mikael Magnusson
2015-10-28 15:46           ` Sebastian Gniazdowski
2015-10-28 15:59             ` Peter Stephenson
2015-10-28 16:37               ` Sebastian Gniazdowski
2015-10-28 23:07               ` Bart Schaefer
2015-10-29  9:42                 ` Peter Stephenson
