zsh-workers
* Re: Phil's prompt is not working when LANG is set to UTF-8
       [not found]   ` <200802141300.m1ED0qOo017425@news01.csr.com>
@ 2008-02-15 19:48     ` Andrey Borzenkov
  2008-02-15 19:55       ` Andrey Borzenkov
  2008-02-15 20:10       ` Wael Nasreddine
  0 siblings, 2 replies; 5+ messages in thread
From: Andrey Borzenkov @ 2008-02-15 19:48 UTC (permalink / raw)
  To: zsh-workers; +Cc: Wael Nasreddine


[-- Attachment #1.1: Type: text/plain, Size: 1277 bytes --]

On Thursday 14 February 2008, Peter Stephenson wrote:
> 
> Wael Nasreddine wrote:
> > Peter I couldn't install Fedora because it doesn't work with LVM over
> > DM-Crypt, have you tried my environment ??
> 
> No, it seems unlikely I'm going to have time for that sort of
> time-consuming procedure, which is in any case speculative.  It seems like
> the next step is understanding the implications of Andrei's findings
> since he's already narrowed it down.  I don't currently know anything
> about the system he's talking about.
> 


I took the liberty of moving this to workers.

In case it rings a bell for anyone, here are the prompt lengths computed by
zsh for Phil's prompt in the ru_RU.UTF-8 locale (the results were the same
for en_US.UTF-8, so at least the proper UTF-8 part is computed
correctly :) )

(gdb) p rpromptw
$1 = 12
(gdb) p lpromptw
$2 = 9
(gdb) p lprompth
$3 = 2
(gdb) p rprompth
$4 = 1

That's absolutely wrong. The actual prompt lengths are (see screenshot):

lpromptw = 13
rpromptw = 16 (it has one space in it)

This corresponds exactly to something (zsh?) ignoring invalid characters
with the high bit set. In both the left and right prompts there are exactly 4
ACS chars.

I attach both the left and right prompts as well.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: PROMPT --]
[-- Type: text/plain; charset="utf-8"; name="PROMPT", Size: 386 bytes --]

^[[1m^[[36m^[[11mÚ^[[1m^[[34mÄ^[[10m(^[[1m^[[36mbor^[[1m^[[33m@^[[1m^[[32mcooker^[[1m^[[33m:^[[1m^[[34m1^[[1m^[[34m)^[[11mÄ^[[1m^[[36mÄ                                                                                                       ^[[1m^[[34mÄ^[[10m(^[[1m^[[35m~^[[1m^[[34m)^[[11mÄ^[[1m^[[36m¿^[[10m
^[[1m^[[36m^[[11mÀ^[[1m^[[34mÄ^[[10m(^[[1m^[[33m22:51^[[34m:^[[1m^[[37m%^[[1m^[[34m)^[[11mÄ^[[10m^[[1m^[[36m^[[11mÄ^[[10m^[[0;10m 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.3: RPROMPT --]
[-- Type: text/plain; charset="utf-8"; name="RPROMPT", Size: 94 bytes --]

 ^[[1m^[[36m^[[11mÄ^[[1m^[[34mÄ^[[10m(^[[1m^[[33mЧтв,Фев14^[[1m^[[34m)^[[11mÄ^[[1m^[[36mÙ^[[10m^[[0;10m

[-- Attachment #1.4: prompt.png --]
[-- Type: image/png, Size: 9419 bytes --]

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]


* Re: Phil's prompt is not working when LANG is set to UTF-8
  2008-02-15 19:48     ` Phil's prompt is not working when LANG is set to UTF-8 Andrey Borzenkov
@ 2008-02-15 19:55       ` Andrey Borzenkov
  2008-02-15 20:10       ` Wael Nasreddine
  1 sibling, 0 replies; 5+ messages in thread
From: Andrey Borzenkov @ 2008-02-15 19:55 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1938 bytes --]

On Friday 15 February 2008, Andrey Borzenkov wrote:
> On Thursday 14 February 2008, Peter Stephenson wrote:
> > 
> > Wael Nasreddine wrote:
> > > Peter I couldn't install Fedora because it doesn't work with LVM over
> > > DM-Crypt, have you tried my environment ??
> > 
> > No, it seems unlikely I'm going to have time for that sort of
> > time-consuming procedure, which is in any case speculative.  It seems like
> > the next step is understanding the implications of Andrei's findings
> > since he's already narrowed it down.  I don't currently know anything
> > about the system he's talking about.
> > 
> 
> 
> I took the liberty of moving this to workers.
> 
> In case it rings a bell for anyone, here are the prompt lengths computed by
> zsh for Phil's prompt in the ru_RU.UTF-8 locale (the results were the same
> for en_US.UTF-8, so at least the proper UTF-8 part is computed
> correctly :) )
> 
> (gdb) p rpromptw
> $1 = 12
> (gdb) p lpromptw
> $2 = 9
> (gdb) p lprompth
> $3 = 2
> (gdb) p rprompth
> $4 = 1
> 
> That's absolutely wrong. The actual prompt lengths are (see screenshot):
> 
> lpromptw = 13
> rpromptw = 16 (it has one space in it)
> 
> This corresponds exactly to something (zsh?) ignoring invalid characters
> with the high bit set.

For sure.

Src/prompt.c:countprompt()

            case MB_INVALID:
                memset(&mbs, 0, sizeof mbs);
                /* FALL THROUGH */
            case 0:
                /* Invalid character or null: assume no output. */
                multi = 0;
                break;

Oops.

I do not actually see how we can fix it except by introducing prompt
expansion syntax for ACS (or maybe for any terminfo sequence in general)
and simply assuming that the characters in any of them are of width 1.
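
To make the arithmetic concrete, here is a minimal standalone sketch of the
effect. It is not the code from Src/prompt.c: the names sketch_utf8_len,
sketch_width and sketch_rprompt are made up for illustration, the UTF-8
check is simplified, and every valid character is assumed to be one column
wide. The sample bytes are the visible part of the right prompt with the
escape sequences removed, assuming the four ACS characters arrive as the
single bytes 0xC4 and 0xD9.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length in bytes of the UTF-8 sequence at s (n bytes available), or 0
 * if the bytes are not a complete, well-formed sequence.  Simplified:
 * overlong 3/4-byte forms and surrogates are not rejected.
 */
static size_t sketch_utf8_len(const unsigned char *s, size_t n)
{
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;
    if (s[0] >= 0xC2 && s[0] <= 0xDF)
        len = 2;
    else if (s[0] >= 0xE0 && s[0] <= 0xEF)
        len = 3;
    else if (s[0] >= 0xF0 && s[0] <= 0xF4)
        len = 4;
    else
        return 0;               /* stray continuation or bad lead byte */
    if (len > n)
        return 0;               /* truncated: treated as invalid here */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;
    return len;
}

/*
 * Count columns, charging invalid_width (0 or 1) for each invalid byte
 * and one column for every valid character (an assumption).
 */
size_t sketch_width(const unsigned char *s, size_t n, int invalid_width)
{
    size_t w = 0, i = 0;

    assert(invalid_width == 0 || invalid_width == 1);
    while (i < n) {
        size_t len = sketch_utf8_len(s + i, n - i);

        if (len == 0) {
            w += (size_t)invalid_width;   /* skip one bad byte */
            i++;
        } else {
            w++;
            i += len;
        }
    }
    return w;
}

/* " ", 2 ACS bytes, "(", Cyrillic date in UTF-8, "14)", 2 ACS bytes */
const unsigned char sketch_rprompt[] =
    " \xC4\xC4(\xD0\xA7\xD1\x82\xD0\xB2,\xD0\xA4\xD0\xB5\xD0\xB2"
    "14)\xC4\xD9";
const size_t sketch_rprompt_len = sizeof sketch_rprompt - 1;
```

With invalid_width set to 0 the sample comes out at 12 columns, the value
gdb shows for rpromptw; with invalid_width set to 1 it comes out at 16, the
width seen on screen.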

> In both the left and right prompts there are exactly 4
> ACS chars.
> 
> I attach both the left and right prompts as well.
> 



[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]


* Re: Phil's prompt is not working when LANG is set to UTF-8
  2008-02-15 19:48     ` Phil's prompt is not working when LANG is set to UTF-8 Andrey Borzenkov
  2008-02-15 19:55       ` Andrey Borzenkov
@ 2008-02-15 20:10       ` Wael Nasreddine
  1 sibling, 0 replies; 5+ messages in thread
From: Wael Nasreddine @ 2008-02-15 20:10 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 2245 bytes --]

This One Time, at Band Camp, Andrey Borzenkov <arvidjaar@newmail.ru> said, On Fri, Feb 15, 2008 at 10:48:55PM +0300:
> On Thursday 14 February 2008, Peter Stephenson wrote:

> > Wael Nasreddine wrote:
> > > Peter I couldn't install Fedora because it doesn't work with LVM over
> > > DM-Crypt, have you tried my environment ??

> > No, it seems unlikely I'm going to have time for that sort of
> > time-consuming procedure, which is in any case speculative.  It seems like
> > the next step is understanding the implications of Andrei's findings
> > since he's already narrowed it down.  I don't currently know anything
> > about the system he's talking about.



> I took the liberty of moving this to workers.

> In case it rings a bell for anyone, here are the prompt lengths computed by
> zsh for Phil's prompt in the ru_RU.UTF-8 locale (the results were the same
> for en_US.UTF-8, so at least the proper UTF-8 part is computed
> correctly :) )

> (gdb) p rpromptw
> $1 = 12
> (gdb) p lpromptw
> $2 = 9
> (gdb) p lprompth
> $3 = 2
> (gdb) p rprompth
> $4 = 1

> That's absolutely wrong. The actual prompt lengths are (see screenshot):

> lpromptw = 13
> rpromptw = 16 (it has one space in it)

> This corresponds exactly to something (zsh?) ignoring invalid characters
> with the high bit set. In both the left and right prompts there are exactly 4
> ACS chars.

> I attach both the left and right prompts as well.

> ^[[1m^[[36m^[[11m?^[[1m^[[34m?^[[10m(^[[1m^[[36mbor^[[1m^[[33m@^[[1m^[[32mcooker^[[1m^[[33m:^[[1m^[[34m1^[[1m^[[34m)^[[11m?^[[1m^[[36m?                                                                                                       ^[[1m^[[34m?^[[10m(^[[1m^[[35m~^[[1m^[[34m)^[[11m?^[[1m^[[36m?^[[10m
> ^[[1m^[[36m^[[11m?^[[1m^[[34m?^[[10m(^[[1m^[[33m22:51^[[34m:^[[1m^[[37m%^[[1m^[[34m)^[[11m?^[[10m^[[1m^[[36m^[[11m?^[[10m^[[0;10m 
>  ^[[1m^[[36m^[[11m?^[[1m^[[34m?^[[10m(^[[1m^[[33m??????,??????14^[[1m^[[34m)^[[11m?^[[1m^[[36m?^[[10m^[[0;10m

So it does seem to be a zsh problem after all; file a bug, perhaps?

-- 
Wael Nasreddine
http://wael.nasreddine.com
PGP: 1024D/C8DD18A2 06F6 1622 4BC8 4CEB D724  DE12 5565 3945 C8DD 18A2

.: An infinite number of monkeys typing into GNU emacs,
   would never make a good program. (L. Torvalds 1995) :.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: Phil's prompt is not working when LANG is set to UTF-8
  2008-02-17 16:43   ` Peter Stephenson
@ 2008-02-17 17:40     ` Peter Stephenson
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Stephenson @ 2008-02-17 17:40 UTC (permalink / raw)
  To: Zsh Hackers' List

Peter Stephenson wrote:
> > Does it matter where %G appears?  %{...%7G...%} ?
> 
> Because they're independent, it doesn't matter at all

Slight overstatement, since if we truncate the prompt the position of
the %G becomes important if it's not associated with the right
%{...%}, but the answer to the spirit of your original question is still
no, it doesn't matter where it appears within a %{...%}.

However, this, er, prompted me to look and see how we handle truncation
and there is code missing for this case.  The right thing to do must be
to copy the entire group, since we don't know how to divide it up.
Consequently it's a good idea to divide the groups as far as possible
into single characters.  This seems worth documenting.  This is an
updated patch.
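
The advice about single-character groups can be seen in a small standalone
model. This is only a sketch of the idea, not the truncation code in
Src/prompt.c: SK_INPAR, SK_OUTPAR and SK_NULARG are made-up stand-ins for
the internal Inpar, Outpar and Nularg markers, sketch_truncate is a
hypothetical name, and every character outside a group is assumed to be one
column wide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for zsh's internal marker bytes. */
#define SK_INPAR  '\x01'   /* start of a %{...%} group          */
#define SK_OUTPAR '\x02'   /* end of the group                  */
#define SK_NULARG '\x03'   /* one column claimed via %G         */

/*
 * Copy src to dst until at least maxwidth columns have been used.
 * A group is always copied whole, because nothing is known about
 * where it could safely be split; each SK_NULARG inside it counts
 * as one column.  dst must have room for strlen(src) + 1 bytes.
 * Returns the number of columns actually used.
 */
int sketch_truncate(const char *src, char *dst, int maxwidth)
{
    int used = 0;

    assert(src != NULL && dst != NULL);
    while (*src && used < maxwidth) {
        if (*src == SK_INPAR) {
            for (;;) {
                char c = *src;

                if (c == '\0')
                    break;          /* unterminated group */
                *dst++ = c;
                src++;
                if (c == SK_NULARG)
                    used++;
                if (c == SK_OUTPAR)
                    break;
            }
        } else {
            *dst++ = *src++;        /* ordinary one-column character */
            used++;
        }
    }
    *dst = '\0';
    return used;
}
```

Asked for 3 columns, the input "ab" followed by one group holding two
glitch characters uses 4 columns, since the group cannot be split. With the
same two characters in separate groups, the copy stops after the first
group at exactly 3 columns.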

Index: Doc/Zsh/prompt.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/prompt.yo,v
retrieving revision 1.10
diff -u -r1.10 prompt.yo
--- Doc/Zsh/prompt.yo	15 Feb 2008 23:59:09 -0000	1.10
+++ Doc/Zsh/prompt.yo	17 Feb 2008 17:39:39 -0000
@@ -187,6 +187,9 @@
 Include a string as a literal escape sequence.
 The string within the braces should not change the cursor
 position.  Brace pairs can nest.
+
+A positive numeric argument between the tt(%) and the %%({) is treated as
+described for tt(%G) below.
 )
 item(tt(%G))(
 Within a tt(%{)...tt(%}) sequence, include a `glitch': that is, assume
@@ -199,6 +202,13 @@
 indicates a character width other than one.  Hence tt(%{)var(seq)tt(%2G%})
 outputs var(seq) and assumes it takes up the width of two standard
 characters.
+
+Multiple uses of tt(%G) accumulate in the obvious fashion; the position
+of the tt(%G) is unimportant.  Negative integers are not handled.
+
+Note that when prompt truncation is in use it is advisable to divide up
+output into single characters within each tt(%{)...tt(%}) group so that
+the correct truncation point can be found.
 )
 enditem()
 
Index: Src/prompt.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/prompt.c,v
retrieving revision 1.45
diff -u -r1.45 prompt.c
--- Src/prompt.c	15 Feb 2008 23:59:09 -0000	1.45
+++ Src/prompt.c	17 Feb 2008 17:39:41 -0000
@@ -472,7 +472,10 @@
 		    addbufspc(1);
 		    *bp++ = Inpar;
 		}
-		break;
+		if (arg <= 0)
+		    break;
+		/* else */
+		/* FALLTHROUGH */
 	    case 'G':
 		if (arg > 0) {
 		    addbufspc(arg);
@@ -948,9 +951,11 @@
 		break;
 	    case MB_INVALID:
 		memset(&mbs, 0, sizeof mbs);
-		/* FALL THROUGH */
+		/* Invalid character: assume single width. */
+		multi = 0;
+		w++;
+		break;
 	    case 0:
-		/* Invalid character or null: assume no output. */
 		multi = 0;
 		break;
 	    default:
@@ -1124,14 +1129,19 @@
 			    /*
 			     * Text marked as invisible: copy
 			     * regardless, since we don't know what
-			     * this does but it shouldn't affect
-			     * the width.
+			     * this does.  It only affects the width
+			     * if there are Nularg's present.
+			     * However, even in that case we
+			     * can't break the sequence down, so
+			     * we still loop over the entire group.
 			     */
 			    for (;;) {
 				*ptr++ = *fulltextptr;
 				if (*fulltextptr == Outpar ||
 				    *fulltextptr == '\0')
 				    break;
+				if (*fulltextptr == Nularg)
+				    remw--;
 				fulltextptr++;
 			    }
 			} else {
@@ -1206,8 +1216,15 @@
 
 		    while (maxwidth > 0 && *skiptext) {
 			if (*skiptext == Inpar) {
-			    for (; *skiptext != Outpar && *skiptext;
-				 skiptext++);
+			    /* see comment on left truncation above */
+			    for (;;) {
+				if (*skiptext == Outpar ||
+				    *skiptext == '\0')
+				    break;
+				if (*skiptext == Nularg)
+				    maxwidth--;
+				skiptext++;
+			    }
 			} else {
 #ifdef MULTIBYTE_SUPPORT
 			    char inchar;


-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/



* Re: Phil's prompt is not working when LANG is set to UTF-8
  2008-02-16 19:13 ` Bart Schaefer
@ 2008-02-17 16:43   ` Peter Stephenson
  2008-02-17 17:40     ` Peter Stephenson
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Stephenson @ 2008-02-17 16:43 UTC (permalink / raw)
  To: Zsh Hackers' List

On Sat, 16 Feb 2008 11:13:35 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Feb 15, 11:52pm, Peter Stephenson wrote:
> }
> } - Hence it falls foul of the multibyte tests.  In principle it
> }   might clash with a UTF-8 character anyway and have the wrong
> }   width, so assuming a width 1 for an unknown character is not
> }   necessarily better than assuming width 0.
> 
> I agree with "not necessarily," but I suspect it'll be right more
> often than wrong to assume 1.  *Most* characters are not going to be
> "non-printing," and if the mb library doesn't recognize them, then
> they're also unlikely to be simultaneously multibyte and handled
> correctly by the terminal.

OK, I'll change that (though needless to say the %G method is
still recommended as being less hit and miss).

> I like this, but I'm wondering if it might not be better to have %{
> accept a count, e.g., instead of %{...%6G%} just write %6{...%}.

That was my original thought, but then I realised that as far as the
code is concerned they're logically independent.  From the point of view
of the user it's a rather different matter, however.  It's trivial to
support both.

> Does it matter where %G appears?  %{...%7G...%} ?

Because they're independent, it doesn't matter at all, and multiple
uses accumulate.  In fact, they don't actually need to appear within
the %{...%}, although there's no obvious reason to put them elsewhere.

> Ooh, and what about negative numbers?  Can one say %-4G to mean that
> the sequence actually moved the cursor to the LEFT four positions?
> *That* could be really useful.

Yes, but I picked the present change because it was already supported by
the backend code.  Additions like the above are another story.
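
For anyone who wants to check the accounting by hand, the rules above can
be modelled in a few lines. This is a rough sketch, not the prompt
expansion code: sketch_prompt_width is a hypothetical name, only %{, %},
%G and %% are understood, text outside a group is assumed to be one column
per byte, and a bare %G is taken to mean one column, as the documentation
for %G describes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimate the visible width of a prompt string.  Text inside
 * %{...%} contributes nothing by itself; %NG and %N{ each add N
 * columns, wherever they appear.  Negative counts are not handled.
 */
int sketch_prompt_width(const char *p)
{
    int width = 0, depth = 0;

    assert(p != NULL);
    while (*p) {
        if (*p == '%' && p[1]) {
            const char *q = p + 1;
            int arg = 0;

            while (*q >= '0' && *q <= '9')
                arg = arg * 10 + (*q++ - '0');
            if (*q == '{') {            /* %{ or %N{ */
                depth++;
                width += arg;
                p = q + 1;
                continue;
            }
            if (*q == '}') {            /* %} closes a group */
                if (depth > 0)
                    depth--;
                p = q + 1;
                continue;
            }
            if (*q == 'G') {            /* %G or %NG, uses accumulate */
                width += arg > 0 ? arg : 1;
                p = q + 1;
                continue;
            }
            if (*q == '%') {            /* %% is a literal percent */
                if (depth == 0)
                    width++;
                p = q + 1;
                continue;
            }
            /* Other escapes are not modelled: count '%' literally. */
        }
        if (depth == 0)
            width++;
        p++;
    }
    return width;
}
```

Under these assumptions "%{x%2G%}", "%2{x%}" and "%{%1Gx%1G%}" all come
out at width 2, which is the sense in which the two syntaxes are
interchangeable and the position of the %G does not matter.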

Index: Doc/Zsh/prompt.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/prompt.yo,v
retrieving revision 1.10
diff -u -r1.10 prompt.yo
--- Doc/Zsh/prompt.yo	15 Feb 2008 23:59:09 -0000	1.10
+++ Doc/Zsh/prompt.yo	17 Feb 2008 16:42:43 -0000
@@ -187,6 +187,9 @@
 Include a string as a literal escape sequence.
 The string within the braces should not change the cursor
 position.  Brace pairs can nest.
+
+A positive numeric argument between the tt(%) and the %%({) is treated as
+described for tt(%G) below.
 )
 item(tt(%G))(
 Within a tt(%{)...tt(%}) sequence, include a `glitch': that is, assume
@@ -199,6 +202,9 @@
 indicates a character width other than one.  Hence tt(%{)var(seq)tt(%2G%})
 outputs var(seq) and assumes it takes up the width of two standard
 characters.
+
+Multiple uses of tt(%G) accumulate in the obvious fashion; the position
+of the tt(%G) is unimportant.  Negative integers are not handled.
 )
 enditem()
 
Index: Src/prompt.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/prompt.c,v
retrieving revision 1.45
diff -u -r1.45 prompt.c
--- Src/prompt.c	15 Feb 2008 23:59:09 -0000	1.45
+++ Src/prompt.c	17 Feb 2008 16:42:43 -0000
@@ -472,7 +472,10 @@
 		    addbufspc(1);
 		    *bp++ = Inpar;
 		}
-		break;
+		if (arg <= 0)
+		    break;
+		/* else */
+		/* FALLTHROUGH */
 	    case 'G':
 		if (arg > 0) {
 		    addbufspc(arg);
@@ -948,9 +951,11 @@
 		break;
 	    case MB_INVALID:
 		memset(&mbs, 0, sizeof mbs);
-		/* FALL THROUGH */
+		/* Invalid character: assume single width. */
+		multi = 0;
+		w++;
+		break;
 	    case 0:
-		/* Invalid character or null: assume no output. */
 		multi = 0;
 		break;
 	    default:

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/



end of thread, other threads:[~2008-02-17 17:42 UTC | newest]

Thread overview: 5+ messages
     [not found] <20080211033116.GD19613@phoenix.nasreddine.info>
     [not found] ` <20080214123757.GA2943@phoenix.nasreddine.info>
     [not found]   ` <200802141300.m1ED0qOo017425@news01.csr.com>
2008-02-15 19:48     ` Phil's prompt is not working when LANG is set to UTF-8 Andrey Borzenkov
2008-02-15 19:55       ` Andrey Borzenkov
2008-02-15 20:10       ` Wael Nasreddine
2008-02-15 23:52 Fw: " Peter Stephenson
2008-02-16 19:13 ` Bart Schaefer
2008-02-17 16:43   ` Peter Stephenson
2008-02-17 17:40     ` Peter Stephenson
