From: merch-redmine@jeremyevans.net
To: ruby-dev@ruby-lang.org
Subject: [ruby-dev:51068] [Ruby master Bug#12052] String#encode with xml option returns wrong result
Date: Thu, 24 Jun 2021 23:50:31 +0000 (UTC)
Message-ID: <redmine.journal-92642.20210624235030.4@ruby-lang.org>
In-Reply-To: <redmine.issue-12052.20160205025027.4@ruby-lang.org>
Issue #12052 has been updated by jeremyevans0 (Jeremy Evans).
Status changed from Assigned to Rejected
After an extensive session with gdb, I've determined that this isn't an issue with `String#encode`, and it isn't a bug.
`"<\0>\0".encode("utf-16le", "utf-16le", xml: :text)` returns the same string as `"&lt;\0&gt;\0".force_encoding("utf-16le")`. I think that's the correct behavior for `String#encode`, since the source and destination encodings you specify are the same.
`"&lt;\0&gt;\0".force_encoding("utf-16le")` is the same string as `"\u6C26\u3B74\u2600\u7467;".encode("utf-16le")`. The 10 ASCII bytes are the same as the bytes for the 5 codepoints in UTF-16LE encoding.
`String#inspect` processes the string and formats each of the non-ASCII codepoints using the `\u` syntax, and the final codepoint (59) as a regular ASCII character.
As an example:
```ruby
"<\0>\0".encode("utf-16le", "utf-16le", xml: :text) == "&lt;\0&gt;\0".force_encoding("utf-16le")
=> true
"&lt;\0&gt;\0".force_encoding("utf-16le").codepoints
=> [27686, 15220, 9728, 29799, 59]
"&lt;\0&gt;\0".force_encoding("utf-16le").codepoints.map{|x| x >= 128 ? '-u%X'%x : x.chr}.join
=> "-u6C26-u3B74-u2600-u7467;"
```
```
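The byte-level equivalence described above can also be checked without calling `String#encode` with the `xml:` option at all. The following is a minimal sketch, assuming the 10 ASCII bytes in question are those of `&lt;\0&gt;\0` (the XML-escaped form of the input); the variable names are illustrative only. It pairs the bytes into little-endian 16-bit units with `String#unpack` and compares them with the codepoints Ruby reports for the relabeled string.

```ruby
# The 10 ASCII bytes produced by escaping "<" and ">" while leaving "\0" alone.
ascii = "&lt;\0&gt;\0"
bytes = ascii.bytes            # 10 single bytes, all below 128

# Read the same bytes as unsigned 16-bit little-endian integers,
# which is how UTF-16LE groups them into code units.
units = ascii.unpack("v*")     # => [27686, 15220, 9728, 29799, 59]

# Relabel the bytes as UTF-16LE without converting anything.
utf16 = ascii.dup.force_encoding("utf-16le")

# Build the same string from the codepoints instead.
same = "\u6C26\u3B74\u2600\u7467;".encode("utf-16le")

puts bytes.length              # number of bytes in the ASCII view
p units                        # 16-bit units in the UTF-16LE view
p utf16.codepoints == units    # codepoints match the paired bytes
p utf16 == same                # both constructions give the same string
```

None of the five units falls in the surrogate range, so each 16-bit unit is one codepoint, which is why 10 bytes become exactly 5 codepoints.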
----------------------------------------
Bug #12052: String#encode with xml option returns wrong result
https://bugs.ruby-lang.org/issues/12052#change-92642
* Author: nobu (Nobuyoshi Nakada)
* Status: Rejected
* Priority: Normal
* Assignee: akr (Akira Tanaka)
* Backport: 2.0.0: REQUIRED, 2.1: REQUIRED, 2.2: REQUIRED, 2.3: REQUIRED
----------------------------------------
Calling `String#encode` with the `xml:` option to convert from an ASCII-incompatible encoding to the same encoding returns a strange result.
It appears to convert the string as if it were binary.
```ruby
p "<\0>\0".encode("utf-16le", "utf-16le", xml: :text)
#=> "\u6C26\u3B74\u2600\u7467;"
```
--
https://bugs.ruby-lang.org/
Thread overview: 11+ messages
[not found] <redmine.issue-12052.20160205025027.4@ruby-lang.org>
2021-06-24 23:50 ` merch-redmine [this message]
2021-06-25 9:39 ` [ruby-dev:51069] [Ruby master Bug#12052] String#encode with xml option returns wrong result for totally non-ASCII-compatible encodings duerst
2021-06-25 17:34 ` [ruby-dev:51070] " merch-redmine
2021-06-25 20:09 ` [ruby-dev:51071] " merch-redmine
2021-06-26 0:42 ` [ruby-dev:51072] " duerst
2021-07-03 4:49 ` [ruby-dev:51076] " nagachika00
2021-07-03 5:26 ` [ruby-dev:51077] " nagachika00
2021-07-04 2:02 ` [ruby-dev:51078] " duerst
2021-07-04 8:27 ` [ruby-dev:51079] " nagachika00
2021-07-11 22:46 ` [ruby-dev:51081] " nobu
2021-07-18 2:43 ` [ruby-dev:51083] " nagachika00