zsh-workers
* zsh generates invalid UTF-8 encoding in the history
@ 2016-10-05 11:48 Vincent Lefevre
  2016-10-05 12:41 ` Mikael Magnusson
  2016-10-05 17:25 ` Bart Schaefer
  0 siblings, 2 replies; 7+ messages in thread
From: Vincent Lefevre @ 2016-10-05 11:48 UTC
  To: zsh-workers

With Debian's zsh 5.2-5 plus some patches, when I execute commands
containing certain Unicode characters, their UTF-8 sequences are
written incorrectly to the history. For instance:

cventin:~> unicode ─
U+2500 BOX DRAWINGS LIGHT HORIZONTAL
UTF-8: e2 94 80 UTF-16BE: 2500 Decimal: &#9472; Octal: \022400
─
Category: So (Symbol, Other)
Unicode block: 2500..257F; Box Drawing
Bidi: ON (Other Neutrals)
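The byte values reported above for U+2500 can be checked by hand. Below is a minimal Python sketch of the three-byte UTF-8 layout. It is not part of zsh or of the `unicode` tool, `utf8_encode_bmp` is a hypothetical helper, and it assumes a code point in U+0800..U+FFFF outside the surrogate range:

```python
def utf8_encode_bmp(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF (no surrogates) as 3 UTF-8 bytes.

    Layout: 1110xxxx 10yyyyyy 10zzzzzz carries the 16 bits of cp.
    """
    if not (0x800 <= cp <= 0xFFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("this sketch only handles 3-byte sequences")
    return bytes([
        0xE0 | (cp >> 12),          # lead byte: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # continuation: middle 6 bits
        0x80 | (cp & 0x3F),         # continuation: low 6 bits
    ])

cp = 0x2500  # BOX DRAWINGS LIGHT HORIZONTAL
print(utf8_encode_bmp(cp).hex(" "))  # e2 94 80
print(cp, f"\\{cp:06o}")             # 9472 \022400
# Cross-check against Python's own UTF-8 codec.
assert utf8_encode_bmp(cp) == chr(cp).encode("utf-8")
```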

But in the history, instead of e2 94 80, I get e2 83 b4 80.
Concerning "e2 83 b4 80":

cventin:~> unicode --fromcp utf-8 -x e283b4
U+20F4  - No such unicode character name in database
UTF-8: e2 83 b4 UTF-16BE: 20f4 Decimal: &#8436; Octal: \020364
⃴ (⃴)
Uppercase: 20F4
Category: Cn (Other, Not Assigned)
Unicode block: 20D0..20FF; Combining Diacritical Marks for Symbols

and the trailing 80 on its own is not a valid UTF-8 sequence (it is
a continuation byte without a lead byte).
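Decoding the expected and the stored bytes side by side makes this comparison explicit. A minimal Python sketch; `describe_utf8` is a hypothetical helper that relies on Python's `surrogateescape` error handler so that undecodable bytes stay visible instead of raising an exception:

```python
from __future__ import annotations


def describe_utf8(seq: bytes) -> list[str]:
    """Label each decoded code point as U+XXXX and each undecodable byte as 'invalid XX'."""
    # surrogateescape maps every byte that cannot be decoded to U+DC80..U+DCFF
    text = seq.decode("utf-8", errors="surrogateescape")
    labels = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            labels.append(f"invalid {cp - 0xDC00:02x}")
        else:
            labels.append(f"U+{cp:04X}")
    return labels


expected = bytes.fromhex("e29480")    # what was typed on the command line
observed = bytes.fromhex("e283b480")  # what ended up in the history file
print(describe_utf8(expected))  # ['U+2500']
print(describe_utf8(observed))  # ['U+20F4', 'invalid 80']
# The box-drawing character's bytes do not occur in the stored line,
# so a search for it in the history finds nothing.
print("\u2500".encode("utf-8") in observed)  # False
```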

This breaks various tools that process the history (grep, lesspipe,
etc.): first, because the expected character is no longer present;
second, because the stray 80 byte is invalid UTF-8 and is not regarded
as part of any character. For instance:

cventin:~> grep -av '^.*$' .zhistory | tail -n 1 | hd
00000000  3a 20 31 34 37 35 36 36  36 34 31 38 3a 30 3b 75  |: 1475666418:0;u|
00000010  6e 69 63 6f 64 65 20 e2  83 b4 80 0a              |nicode .....|
0000001c
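The `grep -av '^.*$'` check above depends on running in a UTF-8 locale. Roughly the same check can be sketched in Python without relying on the locale. This assumes the history file is `~/.zhistory` as in the report (adjust for your own $HISTFILE), and `invalid_utf8_lines` is a hypothetical helper:

```python
from __future__ import annotations

from pathlib import Path


def invalid_utf8_lines(data: bytes) -> list[tuple[int, bytes]]:
    """Return (line number, raw bytes) for each line that is not valid UTF-8.

    Roughly what `grep -av '^.*$'` reports in a UTF-8 locale, where '.'
    does not match bytes that are not part of a valid character.
    """
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((lineno, line))
    return bad


if __name__ == "__main__":
    # Assumption: same file name as in the report; adjust to your $HISTFILE.
    hist = Path.home() / ".zhistory"
    if hist.exists():
        for lineno, line in invalid_utf8_lines(hist.read_bytes()):
            # Hex output similar to the hd dump above.
            print(lineno, line.hex(" "))
```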

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Thread overview: 7+ messages
2016-10-05 11:48 zsh generates invalid UTF-8 encoding in the history Vincent Lefevre
2016-10-05 12:41 ` Mikael Magnusson
2016-10-06 18:31   ` Bart Schaefer
2016-10-07  8:57     ` Vincent Lefevre
2016-10-07 17:01       ` Bart Schaefer
2017-11-29 15:46         ` Vincent Lefevre
2016-10-05 17:25 ` Bart Schaefer
