[PATCH] Index of element after width of characters

* [PATCH] Index of element after width of characters
@ 2015-10-27 17:18 Sebastian Gniazdowski
  2015-10-27 19:41 ` Mikael Magnusson
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-27 17:18 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 2114 bytes --]

Hello,
I had to implement horizontal scrolling of text and noticed that Asian
characters can have a width of two. Scrolling by string index therefore
moved at different speeds over normal and wide characters.

I then noticed the useful (m) flag, which returns widths. Scrolling can
be done that way, but it basically requires outputting characters one by
one. So I thought about a new flag. It should be handy in various
multibyte scenarios, e.g. in prompts. From the submitted documentation:

y:width:
Index-post-width: for strings substitutes index of first character that is
located fully after width `width' of characters (multibyte characters may
have widths of e.g. 2, see the `m' flag). For arrays it substitutes index of
first element that appears after elements whose summed width is at least
`width'. -1 is substituted if there is no such character or array element.


Also a comment from code:
            /* Reached the width?
             * If single-width char 'd' reaches width say 4 it'll look like:
             *     abcd|ef
             * `e' is the returned (by index) character.
             *
             * If double-width char (de) reaches the width:
             *     abc(d|e)f
             * `f' is the returned (by index) character
             * */
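
The rule described in the documentation and the comment above can be
sketched in isolation. This is a minimal illustration, not the patch's
code: index_post_width() is a hypothetical helper that takes an array of
precomputed display widths instead of a metafied string, so it does not
depend on the locale or on WCWIDTH().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration, not part of zsh).
 * widths[i] is the display width of character i+1.
 * Returns the 1-based index of the first character located fully after
 * `width` columns, or -1 if there is no such character.
 */
int
index_post_width(const int *widths, size_t n, int width)
{
    int sum = 0;
    size_t i;

    /* The patch rejects negative widths before getting this far */
    assert(width >= 0);

    /* Width 0 is satisfied by the first character, if there is one */
    if (width == 0)
        return n > 0 ? 1 : -1;

    for (i = 0; i < n; i++) {
        sum += widths[i];
        /* Character i+1 reached the width: report the next one */
        if (sum >= width)
            return i + 1 < n ? (int)(i + 2) : -1;
    }
    return -1;
}
```

Assuming the test string "測試a句" has widths {2, 2, 1, 2}, the sketch
gives 1, 2, 2, 3, 3, 4, -1, -1 for widths 0 through 7, which is the
output the test file expects.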

The code works, but not in the test that I am committing. I wonder why
it works in scripts and at the prompt but not in the test. The output of
the test is:

*** 1,8 ****
!  1
!  2
!  2
!  3
!  3
!  4
!  -1
!  -1
--- 1,8 ----
! 1
! 8
! -1
! -1
! -1
! -1
! -1
! -1

The code should fit in well, because it works like the # flag (get
length) and sits in the same place in subst.c. I hope that because of
this everything is in order. The function in Src/utils.c is a copied
and modified version of the mature function metacharlenconv(). Once
things are clarified I'll probably submit one more flag, 'x', that will
substitute the last index that yields a width less than a given one
(nickname index-pre-width?). That makes a nice mnemonic: y for after,
x for before (a width).

Best regards,
Sebastian Gniazdowski

[-- Attachment #2: index-post-width.patch --]
[-- Type: application/octet-stream, Size: 6737 bytes --]

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 5ea8610..34d4e86 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -1335,6 +1335,13 @@ will remove the same matches as for `tt(#)', but in reverse order, and the
 form using `tt(%%)' will remove the same matches as for `tt(##)' in reverse
 order.
 )
+item(tt(x:)var(width)tt(:))(
+Index-post-width: for strings substitutes index of first character that is
+located fully after width var(width) of characters (multibyte characters may
+have widths of e.g. 2, see the tt(m) flag). For arrays it substitutes index of
+first element that appears after elements whose summed width is at least
+var(width). -1 is substituted if there is no such character or array element.
+)
 item(tt(B))(
 Include the index of the beginning of the match in the result.
 )
diff --git a/Src/subst.c b/Src/subst.c
index 021d234..764bd17 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -1590,6 +1590,8 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags)
     int flags = 0;
     /* Value from (I) flag, used for ditto. */
     int flnum = 0;
+    /* Value from (y) flag. */
+    int ynum = -1;
     /*
      * sortit is to be passed to strmetasort().
      * indord is the (a) flag, which for consistency doesn't get
@@ -1901,6 +1903,14 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags)
 		    whichlen = 3;
 		    break;
 
+		case 'y':
+		    s++;
+		    ynum = get_intarg(&s, &dellen);
+		    if (ynum < 0)
+			goto flagerr;
+		    s--;
+		    break;
+
 		case 'f':
 		    spsep = "\n";
 		    break;
@@ -3342,6 +3352,51 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags)
 	val = dupstring(buf);
 	isarr = 0;
     }
+
+    /* Index-post-width */
+    if(ynum >= 0) {
+        char buf[14];
+
+	if (isarr) {
+            long index;
+
+            // Width 0 is treated immedietally – any
+            // first array element fulfils
+            if ( ynum == 0 ) {
+                if( *aval )
+                    index = 1;
+                else
+                    index = -1;
+            } else {
+                char **ctr = aval;
+                long count = 0, current_width = 0;
+                index = -1;
+
+                while( *ctr ) {
+                    count ++;
+                    current_width += MB_METASTRLEN2(*ctr, 1);
+
+                    if( current_width >= ynum ) {
+                        // Element after the required width
+                        if( *(ctr+1) )
+                            index = count+1;
+                        else
+                            index = -1;
+                        break;
+                    }
+
+                    ctr ++;
+                }
+            }
+
+            sprintf(buf, "%ld", index);
+        } else {
+            sprintf(buf, "%d", mb_index_post_width(val, ynum));
+        }
+        val = dupstring(buf);
+        isarr = 0;
+    }
+
     /* At this point we make sure that our arrayness has affected the
      * arrayness of the linked list.  Then, we can turn our value into
      * a scalar for convenience sake without affecting the arrayness
diff --git a/Src/utils.c b/Src/utils.c
index 0afa8c9..310b0db 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5154,6 +5154,104 @@ mb_charlenconv(const char *s, int slen, wint_t *wcp)
     return mb_charlenconv_r(s, slen, wcp, &mb_shiftstate);
 }
 
+/*
+ * Returns index of character that is located first after width `width`
+ */
+/**/
+mod_export int
+mb_index_post_width(char *ptr, int width)
+{
+    char inchar, *laststart;
+    size_t ret;
+    wchar_t wc;
+    int num, num_in_char, index;
+
+    // All characters treated as width 1
+    if (!isset(MULTIBYTE)) {
+        size_t len = ztrlen(ptr);
+        if( len > width ) {
+            return width + 1;
+        } else {
+            return -1;
+        }
+    }
+
+    // Width 0 is treated immedietally
+    if( width == 0 ) {
+        if( *ptr )
+            return 1;
+        else
+            return -1;
+    }
+
+    laststart = ptr;
+    ret = MB_INVALID;
+    num = num_in_char = index = 0;
+
+    memset(&mb_shiftstate, 0, sizeof(mb_shiftstate));
+    while (*ptr) {
+	if (*ptr == Meta)
+	    inchar = *++ptr ^ 32;
+	else
+	    inchar = *ptr;
+	ptr++;
+	ret = mbrtowc(&wc, &inchar, 1, &mb_shiftstate);
+
+	if (ret == MB_INCOMPLETE) {
+	    num_in_char++;
+	} else {
+            /* Count complete chars, which basically should be indexes in
+             * wchar_t array, right? */
+            index++;
+
+	    if (ret == MB_INVALID) {
+		/* Reset, treat as single character, and single width */
+		memset(&mb_shiftstate, 0, sizeof(mb_shiftstate));
+		ptr = laststart + (*laststart == Meta) + 1;
+		num++;
+	    } else {
+		/*
+		 * Returns -1 if not a printable character.  We
+		 * turn this into 0. That makes quite sense as
+                 * we are interested in width of characters,
+                 * and an unprintable one doesn't have it
+		 */
+		int wcw = WCWIDTH(wc);
+		if (wcw > 0)
+                    num += wcw;
+	    }
+
+            /* Reached the width?
+             * If single-width char 'd' reaches width say 4 it'll look like:
+             *     abcd|ef
+             * `e' is the returned (by index) character.
+             *
+             * If double-width char (de) reaches the width:
+             *     abc(d|e)f
+             * `f' is the returned (by index) character
+             * */
+            if (num >= width) {
+                /* We now need to return the next index,
+                 * after we check it exists */
+                if( *ptr )
+                    return index+1;
+                else
+                    return -1;
+            }
+
+	    laststart = ptr;
+	    num_in_char = 0;
+	}
+    }
+
+    /* We choose to point to incomplete character if it's post the width */
+    if (num + (num_in_char > 0 ? 1 : 0) > width) {
+        return index+1;
+    }
+
+    return -1;
+}
+
 /**/
 #else
 
diff --git a/Test/D10width.ztst b/Test/D10width.ztst
new file mode 100644
index 0000000..97e6c11
--- /dev/null
+++ b/Test/D10width.ztst
@@ -0,0 +1,43 @@
+# Test the (y:width:) flag
+
+%test
+
+  setopt multibyte
+  a="測試a句"
+  print ${(y:0:)a}
+  print ${(y:1:)a}
+  print ${(y:2:)a}
+  print ${(y:3:)a}
+  print ${(y:4:)a}
+  print ${(y:5:)a}
+  print ${(y:6:)a}
+  print ${(y:7:)a}
+0:index-post-width for strings
+> 1
+> 2
+> 2
+> 3
+> 3
+> 4
+> -1
+> -1
+
+  setopt multibyte
+  a=( "測試a句" "測" "a" "句句" )
+  print ${(y:0:)a}
+  print ${(y:1:)a}
+  print ${(y:7:)a}
+  print ${(y:8:)a}
+  print ${(y:9:)a}
+  print ${(y:10:)a}
+  print ${(y:11:)a}
+  print ${(y:14:)a}
+0:index-post-width for arrays
+> 1
+> 2
+> 2
+> 3
+> 3
+> 4
+> -1
+> -1


* Re: [PATCH] Index of element after width of characters
  2015-10-27 17:18 [PATCH] Index of element after width of characters Sebastian Gniazdowski
@ 2015-10-27 19:41 ` Mikael Magnusson
  2015-10-27 19:55 ` Sebastian Gniazdowski
  2015-10-27 21:20 ` Bart Schaefer
  2 siblings, 0 replies; 13+ messages in thread
From: Mikael Magnusson @ 2015-10-27 19:41 UTC (permalink / raw)
  To: Sebastian Gniazdowski; +Cc: zsh workers

On Tue, Oct 27, 2015 at 6:18 PM, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> Hello,
> I had to implement horizontal scrolling of text and noticed that Asian
> characters can have width of two. Scrolling via indexes of strings was
> giving different speeds to normal and to the wide characters.

+    if(ynum >= 0) {

put a space between "if ("

+            // Width 0 is treated immedietally – any

don't use c++ comments, avoid unicode for basic things like hyphens

-- 
Mikael Magnusson



* Re: [PATCH] Index of element after width of characters
  2015-10-27 17:18 [PATCH] Index of element after width of characters Sebastian Gniazdowski
  2015-10-27 19:41 ` Mikael Magnusson
@ 2015-10-27 19:55 ` Sebastian Gniazdowski
  2015-10-27 21:20 ` Bart Schaefer
  2 siblings, 0 replies; 13+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-27 19:55 UTC (permalink / raw)
  To: zsh-workers

On 27 October 2015 at 18:18, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> I then noticed useful (m) flag that returns widths. Scrolling can be
> done this way but basically it requires to output characters one by
> one.

To be exact: suppose you want to display a $COLUMNS-wide run of
characters, skipping N characters from the beginning of the string. This
can be done with the (m) flag by iterating over every character and
detecting when to start displaying and when to stop. With the (x) and
(y) flags this will be much simpler.
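
The per-character iteration described above can be sketched as follows.
This is only an illustration under stated assumptions: chars_fitting()
is a hypothetical helper, and the display widths are assumed to have
been computed already (for example with something like the (m) flag),
so nothing here is taken from the zsh sources.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration, not part of zsh).
 * After skipping the first `skip` characters, count how many of the
 * following characters fit completely within `columns` display columns.
 */
size_t
chars_fitting(const int *widths, size_t n, size_t skip, int columns)
{
    size_t count = 0, i;
    int used = 0;

    assert(columns >= 0);

    for (i = skip; i < n; i++) {
        /* Stop before a character that would cross the right edge */
        if (used + widths[i] > columns)
            break;
        used += widths[i];
        count++;
    }
    return count;
}
```

With widths {2, 2, 1, 2}, skipping one character and allowing 3 columns
gives 2 characters; allowing only 1 column gives 0, because the next
character is double-width and does not fit.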

Best regards,
Sebastian Gniazdowski



* Re: [PATCH] Index of element after width of characters
  2015-10-27 17:18 [PATCH] Index of element after width of characters Sebastian Gniazdowski
  2015-10-27 19:41 ` Mikael Magnusson
  2015-10-27 19:55 ` Sebastian Gniazdowski
@ 2015-10-27 21:20 ` Bart Schaefer
  2015-10-28  7:46   ` Sebastian Gniazdowski
  2 siblings, 1 reply; 13+ messages in thread
From: Bart Schaefer @ 2015-10-27 21:20 UTC (permalink / raw)
  To: Sebastian Gniazdowski; +Cc: Zsh hackers list

On Tue, Oct 27, 2015 at 10:18 AM, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> y:width:
> Index-post-width: for strings substitutes index of first character that is

In the patch file you attached, the doc refers to x:...: rather than
y, whereas the code uses 'y'.  I do find this to be a pretty highly
specialized operation, though, and I'm not sure I like the idea of
using up more of our limited number of remaining single-character flag
letters for it.

> The code works, however not in the test that I commit. Wonder why it
> works in scripts and at prompt but not in the test?

Did you make sure that the LANG and/or LC_* etc. environment variables
are set correctly in the test script?  Also this test should probably
be appended to D07multibyte rather than have its own new file.  (That
would ensure that LANG etc. are properly set, too.)  I don't find a
test of the (m) flag anywhere in Test/*.ztst either, for that matter
-- hard to decide if this belongs in D04 or D07 but I tend to think
the latter.



* Re: [PATCH] Index of element after width of characters
  2015-10-27 21:20 ` Bart Schaefer
@ 2015-10-28  7:46   ` Sebastian Gniazdowski
  2015-10-28  7:54     ` Sebastian Gniazdowski
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-28  7:46 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Zsh hackers list

On 27 October 2015 at 22:20, Bart Schaefer <schaefer@brasslantern.com> wrote:
> I do find this to be a pretty highly
> specialized operation, though, and I'm not sure I like the idea of
> using up more of our limited number of remaining single-character flag
> letters for it.

I think CJK users need the patch, because with characters that vary in
width things are not what they seem. For example, $COLUMNS is not what
it seems: any calculation like "I will now skip one char to avoid
wrapping of text" is pointless, as a char can have width 1, 2, and maybe
more. It's strange to me that no one has complained; I wonder how Asian
users write scripts that do pagination or other text formatting. The
patch might seem specialized, but on the other hand it's fundamental.

I've counted the actual flags; there are 42 of them, so there seems to
be room for 10 more (2*26 letters of the English alphabet), or 8 after
my two patches. That might not be much. Counting symbols there is more
room, so maybe this helps. I could use "-" and "," for the patches,
which is quite mnemonic: it would look like ${(-:10:)a}, ${(,:9:)a},
where the number is a limit, "-" is before the limit and "," is after
the limit. I thought about "," and ".", but "." feels more like "at
limit", while "," feels like "after limit". Minus feels like "before
limit", so "-" and "," are quite fine, even better than "x" and "y".

>> The code works, however not in the test that I commit. Wonder why it
>> works in scripts and at prompt but not in the test?
>
> Did you make sure that the LANG and or LC_* etc. environment variables
> are set correctly in the test script?  Also this test should probably
> be appended to D07multibyte rather than have its own new file.  (That
> would ensure that LANG etc. are properly set, too.)  I don't find a
> test of the (m) flag anywhere in Test/*.ztst either, for that matter
> -- hard to decide if this belongs in D04 or D07 but I tend to think
> the latter.

Thanks for the feedback, I'll dig into this today.

Best regards,
Sebastian Gniazdowski


* Re: [PATCH] Index of element after width of characters
  2015-10-28  7:46   ` Sebastian Gniazdowski
@ 2015-10-28  7:54     ` Sebastian Gniazdowski
  2015-10-28 11:38       ` Sebastian Gniazdowski
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-28  7:54 UTC (permalink / raw)
  To: zsh-workers

On 28 October 2015 at 08:46, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> I've counted actual flags, there are 42 of them, so there seems to be
> room for 10 of them (2*26 letters of English alphabet), 8 with my two
> patches. That might be not much. Plus symbols it's more, maybe this
> helps. I could use "-" and "," for the patches, it's quite mnemonic –
> it would look like ${(-:10:)a}, ${(,:9:)a} - the number is a limit,
> "-" is before limit, "," is after limit. Thought about "," and ".",
> but "." feels more like "at limit", while "," as "after limit". Minus
> feels like "before limit" so "-", "," it's quite fine, even better
> than "x", "y".

I could also code '.' as "at limit". It would be empty when the limit
doesn't divide any char in half, and return that char (as an index)
otherwise. This would give full support for any string operation where
width counts. I'm straining the already hard-to-grasp intuition of the
flags, hoping for a better end result.

Best regards,
Sebastian Gniazdowski



* Re: [PATCH] Index of element after width of characters
  2015-10-28  7:54     ` Sebastian Gniazdowski
@ 2015-10-28 11:38       ` Sebastian Gniazdowski
  2015-10-28 13:31         ` Mikael Magnusson
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-28 11:38 UTC (permalink / raw)
  To: Zsh hackers list

After thinking this through I would like to use "<", ",", ">". Could I
allocate those symbols? They are very mnemonic and nice; the use would
be ${(<:10:)a}, ${(,:10:)a}, ${(>:10:)a} for index before width, index
at width, index after width. I checked that they work with the parser.

PS. The code works in D07multibyte.ztst, so the test issue is solved.

Best regards,
Sebastian Gniazdowski


* Re: [PATCH] Index of element after width of characters
  2015-10-28 11:38       ` Sebastian Gniazdowski
@ 2015-10-28 13:31         ` Mikael Magnusson
  2015-10-28 15:46           ` Sebastian Gniazdowski
  0 siblings, 1 reply; 13+ messages in thread
From: Mikael Magnusson @ 2015-10-28 13:31 UTC (permalink / raw)
  To: Sebastian Gniazdowski; +Cc: Zsh hackers list

On Wed, Oct 28, 2015 at 12:38 PM, Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
> After thinking this through I would want to use "<", ",", ">". Could I
> allocate those symbols? Very mnemonic and nice, the use would be
> ${(<:10:)a}, ${(,:10:)a}, ${(>:10:)a} for index before width, index at
> width, index after width. I checked that they work with parser.
>
> PS. The code works in D07multibyte.ztst, so the tests issue is solved.

Wouldn't it make more sense to use syntax like $a[(y)10,-1] and
$a[1,(y)10] ? That's also less likely to trouble Bart since we only
have about 9 flags in that namespace. For example the (w) flag works
this way, making the number refer to words rather than characters in
the string.

-- 
Mikael Magnusson



* Re: [PATCH] Index of element after width of characters
  2015-10-28 13:31         ` Mikael Magnusson
@ 2015-10-28 15:46           ` Sebastian Gniazdowski
  2015-10-28 15:59             ` Peter Stephenson
  0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-28 15:46 UTC (permalink / raw)
  To: Zsh hackers list

I can give up the "," and stay with "<" and ">" if Bart agrees.
The comma would be more complicated than the rest, having 3 possible
values: index, -1, empty string. With only "<" and ">" it's still easy
to detect whether the limit divides any char in half:

a="測句a"
idx1=${(<:2:)a}
    idx1=1 # 測
idx2=${(>:2:)a}
    idx2=2 # 句
idx3=${(,:2:)a}
    idx3=

a="測句a"
idx1=${(<:3:)a}
    idx1=1 # 測
idx2=${(>:3:)a}
    idx2=3 # a
idx3=${(,:3:)a}
    idx3=2 # 句

As can be seen ;) when a character is divided in half by the width
limit, the indexes returned by < and > differ by 2. So a simple test
detects this and computes the index of the "at width" character: idx1+1.
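
The "differ by 2" test can be sketched over an array of display widths.
Everything here is hypothetical and only illustrates the proposal: the
three function names are made up, returning 0 for "no such index" is an
assumption (the thread suggests an empty string for the "at" case), and
"before" is read as "cumulative width does not exceed the limit", which
is what the two examples above imply. The sketch also treats a missing
"after" index as one past the end, so that a split final character is
still detected; the thread does not discuss that case.

```c
#include <assert.h>
#include <stddef.h>

/* Last 1-based index whose cumulative width stays within `width`;
 * 0 if not even the first character fits (an assumption). */
int
index_before_width(const int *widths, size_t n, int width)
{
    int sum = 0;
    size_t i;

    assert(width >= 0);
    for (i = 0; i < n; i++) {
        sum += widths[i];
        if (sum > width)
            break;
    }
    return (int)i;
}

/* First 1-based index whose character starts at or after column
 * `width`; -1 if there is none. */
int
index_after_width(const int *widths, size_t n, int width)
{
    int start = 0;
    size_t i;

    assert(width >= 0);
    for (i = 0; i < n; i++) {
        if (start >= width)
            return (int)(i + 1);
        start += widths[i];
    }
    return -1;
}

/* 1-based index of the character cut in half by `width`, or 0 if the
 * limit falls on a character boundary. */
int
index_at_width(const int *widths, size_t n, int width)
{
    int before = index_before_width(widths, n, width);
    int after = index_after_width(widths, n, width);

    /* No character after the limit: treat it as one past the end */
    if (after == -1)
        after = (int)n + 1;
    return (after - before == 2) ? before + 1 : 0;
}
```

For widths {2, 2, 1}, assumed to correspond to "測句a", a limit of 2
gives before=1, after=2 and no split character, while a limit of 3
gives before=1, after=3 and split character 2, as in the examples.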

Your syntax is very complicated; we should go for < and >, as this is
important functionality that makes it possible to use $COLUMNS,
construct prompts, and format text. The characters < and > are related,
so it's good that they will be allocated to the same feature.

Best regards,
Sebastian Gniazdowski

On 28 October 2015 at 14:31, Mikael Magnusson <mikachu@gmail.com> wrote:
> On Wed, Oct 28, 2015 at 12:38 PM, Sebastian Gniazdowski
> <sgniazdowski@gmail.com> wrote:
>> After thinking this through I would want to use "<", ",", ">". Could I
>> allocate those symbols? Very mnemonic and nice, the use would be
>> ${(<:10:)a}, ${(,:10:)a}, ${(>:10:)a} for index before width, index at
>> width, index after width. I checked that they work with parser.
>>
>> PS. The code works in D07multibyte.ztst, so the tests issue is solved.
>
> Wouldn't it make more sense to use syntax like $a[(y)10,-1] and
> $a[1,(y)10] ? That's also less likely to trouble Bart since we only
> have about 9 flags in that namespace. For example the (w) flag works
> this way, making the number refer to words rather than characters in
> the string.
>
> --
> Mikael Magnusson


* Re: [PATCH] Index of element after width of characters
  2015-10-28 15:46           ` Sebastian Gniazdowski
@ 2015-10-28 15:59             ` Peter Stephenson
  2015-10-28 16:37               ` Sebastian Gniazdowski
  2015-10-28 23:07               ` Bart Schaefer
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Stephenson @ 2015-10-28 15:59 UTC (permalink / raw)
  To: Zsh hackers list

On Wed, 28 Oct 2015 16:46:00 +0100
Sebastian Gniazdowski <sgniazdowski@gmail.com> wrote:
> I can resign from the "," and stay with "<" and ">" if Bart agrees.
> The comma would be more complicated than the rest, having 3 possible
> values: index, -1, empty string. Having only "<" and ">" it's easy to
> detect that the limit doesn't divide any char in half:

Mikael's right, actually --- this is an indexing problem and more
consistently done with subscripts.  It should actually be simpler than
the existing code for space-delimited words in scalars.

pws



* Re: [PATCH] Index of element after width of characters
  2015-10-28 15:59             ` Peter Stephenson
@ 2015-10-28 16:37               ` Sebastian Gniazdowski
  2015-10-28 23:07               ` Bart Schaefer
  1 sibling, 0 replies; 13+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-28 16:37 UTC (permalink / raw)
  To: Zsh hackers list

On 28 October 2015 at 16:59, Peter Stephenson <p.stephenson@samsung.com> wrote:
> On Wed, 28 Oct 2015 16:46:00 +0100
> Sebastian Gniazdowski <sgniazdowski@gmail.com> wrote:
>> I can resign from the "," and stay with "<" and ">" if Bart agrees.
>> The comma would be more complicated than the rest, having 3 possible
>> values: index, -1, empty string. Having only "<" and ">" it's easy to
>> detect that the limit doesn't divide any char in half:
>
> Mikael's right, actually --- this is an indexing problem and more
> consistently done with subscripts.  It should actually be simpler than
> the existing code for space-delimited words in scalars.

Well, indexes are substituted there, e.g. in "i  -- lowest index of
value matched by subscript", and strings are also supported, so
apparently I should move my code there.

Best regards,
Sebastian Gniazdowski



* Re: [PATCH] Index of element after width of characters
  2015-10-28 15:59             ` Peter Stephenson
  2015-10-28 16:37               ` Sebastian Gniazdowski
@ 2015-10-28 23:07               ` Bart Schaefer
  2015-10-29  9:42                 ` Peter Stephenson
  1 sibling, 1 reply; 13+ messages in thread
From: Bart Schaefer @ 2015-10-28 23:07 UTC (permalink / raw)
  To: Zsh hackers list

On Oct 28,  3:59pm, Peter Stephenson wrote:
}
} Mikael's right, actually --- this is an indexing problem and more
} consistently done with subscripts.  It should actually be simpler than
} the existing code for space-delimited words in scalars.

Doesn't subscripting already treat multi-byte characters as single
positions in strings?  The one strangeness may be characters that
have zero display width.

So I take it that what we need is subscripting that counts display
widths rather than character widths.


* Re: [PATCH] Index of element after width of characters
  2015-10-28 23:07               ` Bart Schaefer
@ 2015-10-29  9:42                 ` Peter Stephenson
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Stephenson @ 2015-10-29  9:42 UTC (permalink / raw)
  To: Zsh hackers list

On Wed, 28 Oct 2015 16:07:16 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 28,  3:59pm, Peter Stephenson wrote:
> }
> } Mikael's right, actually --- this is an indexing problem and more
> } consistently done with subscripts.  It should actually be simpler than
> } the existing code for space-delimited words in scalars.
> 
> Doesn't subscripting already treat multi-byte characters as single
> positions in strings?  The one strangeness may be characters that
> have zero display width.
> 
> So I take it that what we need is subscripting that counts display
> widths rather than character widths.

Yes, in MULTIBYTE mode it counts characters.  It's slightly more
complicated when you take account of the width since there isn't a
one-to-one match from width-based indices to characters in the string.
In particular, if you ask for foo[1,(?)2] (with whatever character is
used for the syntax instead of "?") when the first character is width 2,
you either get nothing or a complete character taking you to character
position 2.

I suppose ${#foo[1,(?)2]} is an easy way to tell you how many characters
were actually included.

pws
