From: merch-redmine@jeremyevans.net
To: ruby-dev@ruby-lang.org
Subject: [ruby-dev:51070] [Ruby master Bug#12052] String#encode with xml option returns wrong result for totally non-ASCII-compatible encodings
Date: Fri, 25 Jun 2021 17:34:06 +0000 (UTC)
Message-ID: <redmine.journal-92651.20210625173405.4@ruby-lang.org>
In-Reply-To: <redmine.issue-12052.20160205025027.4@ruby-lang.org>
Issue #12052 has been updated by jeremyevans0 (Jeremy Evans).
duerst (Martin Dürst) wrote in #note-2:
> Sorry to @jeremyevans0, but I have to disagree. This is a bug. We can disagree about how important it is to fix this bug, but it's a bug nevertheless. First, xml: :text works correctly in other encodings even if the source and destination encodings match.
> ```Ruby
> "<q&".force_encoding("shift_JIS").encode("shift_JIS", xml: :text)
> => "&lt;q&amp;"
> ```
>
> The bug is that we process UTF-16LE as if it consisted of 1-byte ASCII-based code units. I still have to identify exactly where and when that happens.
Ah. So you are saying that `"<\0>\0".encode("utf-16le", "utf-16le", xml: :text)` needs to have the same result as
`"<\0>\0".encode("utf-8", "utf-16le", xml: :text).encode("utf-16le")`. I agree; that makes more sense, and this is a bug.
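To make that comparison concrete, here is a small sketch that computes the reference result through UTF-8 and checks the direct call against it. The variable names are only for illustration, and the final comparison is expected to print `false` only on Ruby versions affected by this issue:

```ruby
source = "<\0>\0" # the bytes of "<>" in UTF-16LE

# Reference result: escape while converting to UTF-8, then convert back.
expected = source.encode("utf-8", "utf-16le", xml: :text).encode("utf-16le")

# Direct same-encoding call, the case under discussion.
actual = source.encode("utf-16le", "utf-16le", xml: :text)

p expected.bytes
p actual.bytes == expected.bytes # false on affected Ruby versions
```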
It looks like this issue occurs only when both the source and destination encodings are ASCII-incompatible. If either one is ASCII-compatible, the issue doesn't occur:
```ruby
# ASCII-incompatible source, ASCII-compatible destination
"<\0>\0".encode("utf-8", "utf-16le", xml: :text).bytes
=> [38, 108, 116, 59, 38, 103, 116, 59]
# ASCII-compatible source, ASCII-incompatible destination
"<>".encode("utf-16le", "utf-8", xml: :text).bytes
=> [38, 0, 108, 0, 116, 0, 59, 0, 38, 0, 103, 0, 116, 0, 59, 0]
# ASCII-incompatible source and destination (wrong result)
"<\0>\0".encode("utf-16le", "utf-16le", xml: :text).bytes
=> [38, 108, 116, 59, 0, 38, 103, 116, 59, 0]
```
So a possible workaround until this is properly fixed would be to detect the case where both source and destination are ASCII-incompatible, encode to UTF-8 with the `xml:` option first, and then encode that result to the desired destination encoding.
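A minimal sketch of that workaround follows. It assumes the problematic case can be detected by `Encoding#ascii_compatible?` returning false for both encodings; `xml_encode_via_utf8` is a hypothetical helper name, not part of Ruby, and this is not the proposed fix for the bug itself:

```ruby
# Hypothetical helper (not part of Ruby): XML-escape while transcoding,
# detouring through UTF-8 when both encodings are ASCII-incompatible.
def xml_encode_via_utf8(str, dst, src, xml: :text)
  src_enc = Encoding.find(src)
  dst_enc = Encoding.find(dst)

  if src_enc.ascii_compatible? || dst_enc.ascii_compatible?
    # Cases reported as working: call String#encode directly.
    str.encode(dst_enc, src_enc, xml: xml)
  else
    # Escape on the way to UTF-8, then transcode without the xml: option.
    str.encode(Encoding::UTF_8, src_enc, xml: xml).encode(dst_enc)
  end
end

p xml_encode_via_utf8("<\0>\0", "utf-16le", "utf-16le").bytes
# => [38, 0, 108, 0, 116, 0, 59, 0, 38, 0, 103, 0, 116, 0, 59, 0]
```

The detour costs one extra transcoding pass, and it only helps for source text that UTF-8 can represent, which should hold for the Unicode encodings discussed here.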
----------------------------------------
Bug #12052: String#encode with xml option returns wrong result for totally non-ASCII-compatible encodings
https://bugs.ruby-lang.org/issues/12052#change-92651
* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: akr (Akira Tanaka)
* Backport: 2.0.0: REQUIRED, 2.1: REQUIRED, 2.2: REQUIRED, 2.3: REQUIRED
----------------------------------------
Calling `String#encode` with the `xml:` option to convert from an ASCII-incompatible encoding to the same encoding returns a wrong result.
It appears to be converting the string as binary.
```ruby
p "<\0>\0".encode("utf-16le", "utf-16le", xml: :text)
#=> "\u6C26\u3B74\u2600\u7467;"
```
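The odd characters in that output are consistent with the byte sequence reported earlier in this message. The sketch below is only an illustrative model, not the actual transcoder code: it escapes `<` and `>` one byte at a time, then reads the resulting bytes back as UTF-16LE.

```ruby
# Bytes reported earlier for the UTF-16LE to UTF-16LE call.
reported = [38, 108, 116, 59, 0, 38, 103, 116, 59, 0]

# Illustrative model of the bug: escape "<" and ">" byte by byte,
# ignoring that each UTF-16LE code unit is two bytes wide.
bytewise = "<\0>\0".b.gsub("<", "&lt;").gsub(">", "&gt;")
p bytewise.bytes == reported # prints true

# Read as UTF-16LE, those bytes pair up into unrelated characters.
misread = reported.pack("C*").force_encoding("utf-16le")
p misread.encode("utf-8") == "\u6C26\u3B74\u2600\u7467;" # prints true
```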
--
https://bugs.ruby-lang.org/