Better handling of wide glyphs (ask the terminal, not wcwidth)

* Better handling of wide glyphs (ask the terminal, not wcwidth)
@ 2016-11-05 22:04 Daniel Hahler
  2016-11-05 22:37 ` Bart Schaefer
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Hahler @ 2016-11-05 22:04 UTC (permalink / raw)
  To: Zsh Hackers' List

I am working on a patch for rxvt-unicode, which basically asks the (Xft) font about the glyph's width, instead of using wcwidth(3).
This works around wcwidth(3) not yet being updated for Unicode 9 on Linux. It is needed in any case because of the private use area, where the width depends on the font you are using for those glyphs (e.g. FontAwesome).

The method is then provided as a shared object, which can be loaded via LD_PRELOAD (overriding wcwidth and wcswidth).
In this case Zsh will use the same method, and everything is fine!

But this shows that there is a problem between Zsh and the terminal, since the display gets out of sync, i.e.

1. I insert "🐍", it gets two cells in the terminal:
% 🐍
2. I add "a" after it.  "🐍a" gets displayed (3 cells), but the offset gets shifted to the right:
%  🐍a
This already happens when merely moving the cursor to the left after inserting the snake glyph.

So I wondered if Zsh could be smarter even without the custom wcwidth(3) in LD_PRELOAD: there is CSI 6 n ('\e[6n'), which can be used to ask the terminal about the current position.
This could be used for a certain range of characters, where zle (?) would query the position before and after displaying it to get the "real" width (as seen by the terminal).
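
A minimal sketch of the before/after measurement described above, in Python for brevity rather than zsh's C. The names `parse_cpr`, `measure_width`, `write` and `read_reply` are hypothetical and only serve the illustration; the sketch assumes the terminal answers CSI 6 n with the usual `ESC [ row ; col R` report and that the text does not wrap onto the next line.

```python
import re

# The terminal answers "ESC [ 6 n" with a cursor position report,
# "ESC [ row ; col R", where row and col are 1-based.
CPR_QUERY = "\x1b[6n"
CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply):
    """Return (row, col) from a cursor position report."""
    match = CPR_REPLY.search(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def measure_width(write, read_reply, text):
    """Return how many cells `text` occupies, as seen by the terminal.

    `write` sends a string to the terminal and `read_reply` returns the
    terminal's next reply; both are hypothetical callables supplied by
    the caller.  Line wrapping is not handled.
    """
    write(CPR_QUERY)
    _, before = parse_cpr(read_reply())
    write(text)
    write(CPR_QUERY)
    _, after = parse_cpr(read_reply())
    return after - before
```

Passing the I/O in as callables keeps the sketch independent of how the terminal is actually read (raw mode, timeouts and so on), which is where most of the real difficulty would be.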

What do you think?

While looking into the source regarding this, I've found that there is a useful function already to get some info ("whatcursorposition", bound to "ga" in Vim mode).
Regardless of the idea above, I think it makes sense to include info from "CSI 6 n" there.

I assume there are methods already that query the terminal through ANSI control codes?
I've found Functions/Misc/promptnl, but that uses shell code.

Cheers,
Daniel.

* Re: Better handling of wide glyphs (ask the terminal, not wcwidth)
  2016-11-05 22:04 Better handling of wide glyphs (ask the terminal, not wcwidth) Daniel Hahler
@ 2016-11-05 22:37 ` Bart Schaefer
  2016-11-07  1:09   ` Daniel Hahler
  0 siblings, 1 reply; 4+ messages in thread
From: Bart Schaefer @ 2016-11-05 22:37 UTC (permalink / raw)
  To: Zsh Hackers' List

On Nov 5, 11:04pm, Daniel Hahler wrote:
}
} This method gets provided as a shared object then, which allows to
} LD_PRELOAD it (overwriting wcwidth and wcswidth). In this case Zsh
} will use the same method, and everything is fine!
}
} But this shows that there is a problem between Zsh and the terminal,

Just to clarify, you mean the problem "between Zsh and the terminal" is
present even *with* this LD_PRELOAD?

} So I wondered if Zsh could be smarter even without the custom
} wcwidth(3) in LD_PRELOAD: there is CSI 6 n ('\e[6n'), which can be
} used to ask the terminal about the current position.

This is error-prone (network inefficiency/inconsistency may cause it to
fail) and in most cases zsh internals will be asking for the width of
a character that isn't on the screen yet at all, or at least is not in
the position where the cursor is located.

So for this to work we'd have to move the cursor to an innocuous spot
(already difficult enough with terminal variances), print CSI 6 n, read
the position, print the character we care about, print CSI 6 n again,
read again, and finally erase what we just did (with no way to put
back what was overwritten), all while hoping that the network didn't
glitch on us in the meantime.  That's a lot of round trips to the
terminal for what might be inside a loop over a long string.
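
To make the cost concrete, here is a rough Python sketch of the sequence described above for a single character. It assumes a terminal that supports DECSC/DECRC (`ESC 7` / `ESC 8`), CUP and EL; the function names and the choice of row 1, column 1 as the scratch cell are hypothetical.

```python
ESC = "\x1b"


def probe_steps(char, row=1, col=1):
    """Return (output, waits_for_reply) pairs for probing one character.

    The scratch cell at (row, col) is an arbitrary choice; whatever was
    displayed there is lost, since it cannot be read back.
    """
    scratch = f"{ESC}[{row};{col}H"       # CUP: move to the scratch cell
    return [
        (ESC + "7", False),               # DECSC: save the cursor position
        (scratch, False),
        (ESC + "[6n", True),              # round trip 1: position before
        (char, False),
        (ESC + "[6n", True),              # round trip 2: position after
        (scratch + ESC + "[K", False),    # EL: erase from scratch cell to end of line
        (ESC + "8", False),               # DECRC: restore the cursor position
    ]


def round_trips(text):
    """Count terminal round trips if every character is probed separately."""
    return sum(waits for char in text for _, waits in probe_steps(char))
```

With this scheme `round_trips` grows by two for every character, which is the loop cost the paragraph above warns about.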

} What do you think?

I think unicode glyphs have been allowed to go entirely overboard.  I
blame Sirius Cyberne -- er, I mean, Apple.

A zsh module that reads glyph widths from a config file might be a way
to approach this, plus a utility to generate such a configuration from
the terminal -- sort of a termcap library for glyphs.
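
As a sketch of what reading such a config file could look like (in Python, not as an actual zsh module): the `U+XXXX width` line format, the `#` comments and both function names are assumptions made up for this example, not an existing format.

```python
def load_glyph_widths(lines):
    """Parse lines such as 'U+1F40D 2' into a {code point: width} dict.

    Blank lines and '#' comments are ignored.  The file format is
    hypothetical.
    """
    widths = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        codepoint, width = line.split()
        if not codepoint.startswith("U+"):
            raise ValueError(f"expected U+XXXX, got {codepoint!r}")
        widths[int(codepoint[2:], 16)] = int(width)
    return widths


def glyph_width(char, widths, fallback=1):
    """Look up a character's width, using `fallback` when it is not listed."""
    return widths.get(ord(char), fallback)
```

In a real implementation the fallback would presumably be the result of wcwidth(3) rather than a constant.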

* Re: Better handling of wide glyphs (ask the terminal, not wcwidth)
  2016-11-05 22:37 ` Bart Schaefer
@ 2016-11-07  1:09   ` Daniel Hahler
  2016-11-07  2:54     ` Bart Schaefer
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Hahler @ 2016-11-07  1:09 UTC (permalink / raw)
  To: zsh-workers@zsh.org >> Zsh Hackers' List

On 05.11.2016 23:37, Bart Schaefer wrote:
> On Nov 5, 11:04pm, Daniel Hahler wrote:
> }
> } This method gets provided as a shared object then, which allows to
> } LD_PRELOAD it (overwriting wcwidth and wcswidth). In this case Zsh
> } will use the same method, and everything is fine!
> }
> } But this shows that there is a problem between Zsh and the terminal,
> 
> Just to clarify, you mean the problem "between Zsh and the terminal" is
> present even *with* this LD_PRELOAD?

No.  With the LD_PRELOAD it works as expected.  Vim, however, already
copes better when it and the terminal disagree (Vim does not use
wcwidth(3) by default).

But I thought that this LD_PRELOAD hack might not be necessary after all.

> } So I wondered if Zsh could be smarter even without the custom
> } wcwidth(3) in LD_PRELOAD: there is CSI 6 n ('\e[6n'), which can be
> } used to ask the terminal about the current position.
>
> This is error-prone (network inefficiency/inconsistency may cause it to
> fail) and in most cases zsh internals will be asking for the width of
> a character that isn't on the screen yet at all, or at least is not in
> the position where the cursor is located.
>
> So for this to work we'd have to move the cursor to an innocuous spot
> (already difficult enough with terminal variances), print CSI 6 n, read
> the position, print the character we care about, print CSI 6 n again,
> read again, and finally erase what we just did (with no way to put
> back what was overwritten), all while hoping that the network didn't
> glitch on us in the meantime.  That's a lot of round trips to the
> terminal for what might be inside a loop over a long string.

I see, thanks for your explanation.  I was not taking network traffic
into account at all.

However, this would only be necessary for certain special characters,
and the results could then be cached internally, although the terminal
might change its answer, e.g. when the font gets changed, of course.
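
A small Python sketch of such a cache, assuming some `measure` callable that performs the expensive query; the class and method names are hypothetical, and how a font change would be detected is left open.

```python
class WidthCache:
    """Remember measured widths so each character is queried only once."""

    def __init__(self, measure):
        self._measure = measure   # hypothetical callable: char -> width
        self._widths = {}

    def width(self, char):
        if char not in self._widths:
            self._widths[char] = self._measure(char)
        return self._widths[char]

    def invalidate(self):
        """Forget everything, e.g. after the terminal's font has changed."""
        self._widths.clear()
```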

Where would I have to look / poke to do this for the prompt and ZLE only?
There it should be mostly about chars that are about to be displayed,
and in this case the "painting in an innocuous spot" is not required at
all (given that those chars are displayed one by one).

> } What do you think?
> 
> I think unicode glyphs have been allowed to go entirely overboard.  I
> blame Sirius Cyberne -- er, I mean, Apple.

It's also a lot about Powerline, FontAwesome and its variants.  I agree
however that there are two worlds colliding and that it is difficult to
solve this using fixed tables of character widths, especially for
codepoints in the private use area.

I was using a hack with rxvt-unicode before already, which basically
required you to add spaces after wide glyphs.
A new approach is the one described here, which handles them as wide
chars internally, based on the result from the Xft font.  (The code is
at https://github.com/exg/rxvt-unicode/compare/master...blueyed:wcwidth-hack).

> A zsh module that reads glyph widths from a config file might be a way
> to approach this, plus a utility to generate such a configuration from
> the terminal -- sort of a termcap library for glyphs.

One of my initial ideas was also to generate just a custom wcwidth.so to
be used with LD_PRELOAD then, but it depends on the actual font being
used after all.
Since a terminal's font is typically not changed often that would be
feasible, but still requires you to use LD_PRELOAD (and programs picking
that up), so there is not much gained after all (compared to the
wcwidth(3) callback to the terminal).

> a utility to generate such a configuration from the terminal

How would that work then?  Based on the method described above?
Then it would be a pre-generated cache basically?!
It might be hard to predict what glyphs are being used in the future
though, and it is probably rather big.  It's also basically a custom
wcwidth(3) implementation then, isn't it?

Thanks,
Daniel.

* Re: Better handling of wide glyphs (ask the terminal, not wcwidth)
  2016-11-07  1:09   ` Daniel Hahler
@ 2016-11-07  2:54     ` Bart Schaefer
  0 siblings, 0 replies; 4+ messages in thread
From: Bart Schaefer @ 2016-11-07  2:54 UTC (permalink / raw)
  To: zsh-workers

On Nov 7,  2:09am, Daniel Hahler wrote:
}
} Where would I have to look / poke to do this for the prompt and ZLE only?

Mostly in Src/prompt.c, surprisingly enough.  There are a couple of
routines in Src/utils.c and the code in Src/Zle/complist.c has its
own variants for output with color controls.

} There it should be mostly about chars that are about to be displayed,
} and in this case the "painting in an innocuous spot" is not required at
} all (given that those chars are displayed one by one).

The problem there is that the prompt formatting code needs to be able
to calculate the width before it begins printing, so that it can stop
before printing something that won't fit.  Otherwise you run into all
sorts of auto-margin issues that are already bad enough.
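
To illustrate why the width has to be known up front, here is a Python sketch of the "stop before printing something that won't fit" step. It is not the code in Src/prompt.c; `fit_to_columns` and the `width_of` callable are hypothetical.

```python
def fit_to_columns(text, columns, width_of):
    """Return the longest prefix of `text` that fits in `columns` cells.

    `width_of` maps a character to its width; it must be answerable
    without printing anything, which is the constraint described above.
    """
    used = 0
    for index, char in enumerate(text):
        char_width = width_of(char)
        if used + char_width > columns:
            return text[:index]
        used += char_width
    return text
```

If `width_of` had to print the character and ask the terminal, the decision to stop would come only after the character was already on the screen.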

And then there's the code used by e.g. zsh-syntax-highlighting which
also needs to compute widths in advance in order to populate the array
of positions where terminal attributes change.

} > a utility to generate such a configuration from the terminal
} 
} How would that work then?  Based on the method described above?

Yes, basically the way zkbd works except it wouldn't need to be
interactive.

} Then it would be a pre-generated cache basically?!

Yes.

} It might be hard to predict what glyphs are being used in the future
} though, and it is probably rather big.

Yes; if you tried to store them all it'd probably be about the size of
the terminfo database multiplied by the number of fonts.  But it could
omit any glyphs that are only 1 column wide, and you'd just need to
generate the tables for the presumably manageable combination of fonts
and terminals that a particular user is commonly using.
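
A sketch of the generating side, in Python: measure a range of code points and keep only those that are not 1 column wide. The `measure` callable stands in for whatever actually asks the terminal, and the `U+XXXX width` output format is an assumption for this example.

```python
def build_width_table(codepoints, measure):
    """Measure each code point and keep only those not 1 column wide."""
    table = {}
    for codepoint in codepoints:
        width = measure(chr(codepoint))
        if width != 1:
            table[codepoint] = width
    return table


def dump_table(table):
    """Format the table as 'U+XXXX width' lines, sorted by code point."""
    return [f"U+{codepoint:04X} {width}"
            for codepoint, width in sorted(table.items())]
```

One such table would be generated per font and terminal combination, so the file name or a header would have to record which combination it belongs to.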

} It's also basically a custom wcwidth(3) implementation then, isn't it?

Yes again; but if you can supply the table of glyphs separately then
you don't need to update the linked library every time new glyphs or
fonts are introduced.  (I would not be surprised to find a standards
body working on something like this already.)

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
