zsh-workers
From: Vincent Lefevre <vincent@vinc17.net>
To: zsh-workers@zsh.org
Subject: zsh generates invalid UTF-8 encoding in the history
Date: Wed, 5 Oct 2016 13:48:48 +0200
Message-ID: <20161005114848.GA1125@cventin.lip.ens-lyon.fr>

With Debian's zsh 5.2-5 + some patches, when I execute commands with
some particular Unicode characters, the UTF-8 sequences are rewritten
incorrectly in the history. For instance:

cventin:~> unicode ─
U+2500 BOX DRAWINGS LIGHT HORIZONTAL
UTF-8: e2 94 80 UTF-16BE: 2500 Decimal: &#9472; Octal: \022400
─
Category: So (Symbol, Other)
Unicode block: 2500..257F; Box Drawing
Bidi: ON (Other Neutrals)

But in the history, instead of getting e2 94 80, I get: e2 83 b4 80.
Concerning "e2 83 b4 80":

cventin:~> unicode --fromcp utf-8 -x e283b4
U+20F4  - No such unicode character name in database
UTF-8: e2 83 b4 UTF-16BE: 20f4 Decimal: &#8436; Octal: \020364
⃴ (⃴)
Uppercase: 20F4
Category: Cn (Other, Not Assigned)
Unicode block: 20D0..20FF; Combining Diacritical Marks for Symbols

and the 80 on its own is not a valid UTF-8 sequence.
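
The byte-level claim above can be checked with a short Python sketch. It is an illustration only: the two byte strings are copied from this report, and `decode_report` is a hypothetical helper name, not part of zsh or of the `unicode` tool. The final assertion records an observation about these particular bytes (the altered pair is 83 followed by 94 XOR 20); the report itself does not establish why zsh writes them that way.

```python
# Byte sequences quoted in the report.
expected = bytes.fromhex("e29480")    # UTF-8 for U+2500, as typed
observed = bytes.fromhex("e283b480")  # what ended up in the history file


def decode_report(data: bytes) -> str:
    """Describe how far `data` decodes as strict UTF-8."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Everything before err.start is valid UTF-8.
        good = data[:err.start].decode("utf-8")
        points = " ".join(f"U+{ord(c):04X}" for c in good)
        bad = data[err.start]
        return f"invalid at offset {err.start} (byte {bad:02x}); before it: {points}"
    return "valid: " + " ".join(f"U+{ord(c):04X}" for c in text)


print(decode_report(expected))
print(decode_report(observed))

# Observation only, not an explanation confirmed by this message:
# the second byte 0x94 was replaced by 0x83 followed by 0x94 ^ 0x20.
assert observed[1] == 0x83 and observed[2] == expected[1] ^ 0x20
```

The first call reports a single valid code point; the second stops at the trailing 80, which cannot start a UTF-8 sequence.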

This breaks various tools that process the history (grep, lesspipe,
etc.): first because the expected character is no longer present,
and second because the stray byte is invalid UTF-8 and is therefore
not regarded as a character. For instance:

cventin:~> grep -av '^.*$' .zhistory | tail -n 1 | hd
00000000  3a 20 31 34 37 35 36 36  36 34 31 38 3a 30 3b 75  |: 1475666418:0;u|
00000010  6e 69 63 6f 64 65 20 e2  83 b4 80 0a              |nicode .....|
0000001c
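
A rough Python equivalent of the `grep -av '^.*$'` check, for locating history lines that are not valid UTF-8, might look like the sketch below. `invalid_utf8_lines` is a hypothetical helper; the second line of `sample` is the line from the hex dump above, while the first line is made up so that the sample contains one valid entry.

```python
from typing import Iterator, Tuple


def invalid_utf8_lines(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (line number, raw line) for each line that is not strict UTF-8."""
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield lineno, line


# First entry is invented for illustration; second is from the hex dump.
sample = (
    b": 1475666417:0;ls\n"
    b": 1475666418:0;unicode \xe2\x83\xb4\x80\n"
)

for lineno, line in invalid_utf8_lines(sample):
    print(lineno, line.hex(" "))
```

To scan a real file, the same function can be fed `pathlib.Path(".zhistory").read_bytes()`; the file must be read as bytes, since opening it in text mode would fail or silently replace the invalid byte.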

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


