ruby-dev (Japanese) list archive (unofficial mirror)
From: duerst@it.aoyama.ac.jp
To: ruby-dev@ruby-lang.org
Subject: [ruby-dev:51069] [Ruby master Bug#12052] String#encode with xml option returns wrong result for totally non-ASCII-compatible encodings
Date: Fri, 25 Jun 2021 09:39:55 +0000 (UTC)
Message-ID: <redmine.journal-92645.20210625093954.4@ruby-lang.org>
In-Reply-To: <redmine.issue-12052.20160205025027.4@ruby-lang.org>

Issue #12052 has been updated by duerst (Martin Dürst).

Status changed from Rejected to Open
Subject changed from String#encode with xml option returns wrong result to String#encode with xml option returns wrong result for totally non-ASCII-compatible encodings

Sorry, @jeremyevans0, but I have to disagree. This is a bug. We can disagree about how important it is to fix, but it is a bug nevertheless. For one thing, `xml: :text` works correctly with other encodings, even when the source and destination encodings match:
```Ruby
"<q&".force_encoding("shift_JIS").encode("shift_JIS", xml: :text)
#=> "&lt;q&amp;"
```

The bug is that we process UTF-16LE as if it consisted of 1-byte ASCII-based code units. I still have to identify exactly where and when that happens.
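
The output quoted in the original report below is consistent with this reading. As a minimal sketch (not the actual transcoder code; `ESCAPES` and both helper methods are hypothetical names introduced only for illustration), escaping the raw bytes of the UTF-16LE string one at a time and then relabeling the result as UTF-16LE reproduces the reported string, while a round trip through UTF-8 is one possible workaround:

```ruby
# Hypothetical table: ASCII byte values mapped to their XML text escapes.
ESCAPES = { 0x3C => "&lt;", 0x3E => "&gt;", 0x26 => "&amp;" }.freeze

# Simulate the suspected bug: escape byte by byte, ignoring that each
# UTF-16LE code unit is two bytes wide, then relabel the bytes as the
# original encoding.
def bytewise_xml_escape(str)
  escaped = str.bytes.map { |b| ESCAPES.fetch(b) { b.chr } }.join
  escaped.force_encoding(str.encoding)
end

# Possible workaround (an assumption, not a fix proposed in the issue):
# escape in an ASCII-compatible encoding and transcode back afterwards.
def xml_escape_via_utf8(str)
  str.encode("UTF-8").encode("UTF-8", xml: :text).encode(str.encoding)
end

src = "<\0>\0".force_encoding("UTF-16LE")
p bytewise_xml_escape(src) == "\u6C26\u3B74\u2600\u7467;".encode("UTF-16LE")
p xml_escape_via_utf8(src) == "&lt;&gt;".encode("UTF-16LE")
```

Both comparisons should print `true`: the first shows that byte-wise escaping yields exactly the garbled string from the report, the second that escaping in UTF-8 gives the intended `&lt;&gt;`.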

I have changed the subject to indicate what I understand to be the extent of the problem. By using "totally", I want to distinguish this from encodings such as Shift_JIS, which are also not as ASCII-compatible as, say, UTF-8, but still more so than UTF-16 (in its various variants).
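
Ruby's own `Encoding#ascii_compatible?` draws a similar line. The sketch below is only an illustration of the distinction; the choice of katakana SO (U+30BD) as the Shift_JIS example is an assumption of this note, not something taken from the issue:

```ruby
# UTF-8 and Shift_JIS count as ASCII-compatible in Ruby; UTF-16LE does not.
p Encoding::UTF_8.ascii_compatible?      #=> true
p Encoding::Shift_JIS.ascii_compatible?  #=> true
p Encoding::UTF_16LE.ascii_compatible?   #=> false

# Shift_JIS is still less ASCII-compatible than UTF-8: a trail byte can
# fall in the ASCII range. Katakana SO (U+30BD) ends in 0x5C, which is
# the ASCII backslash.
p "\u30BD".encode("Shift_JIS").bytes     #=> [131, 92]
```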

----------------------------------------
Bug #12052: String#encode with xml option returns wrong result for totally non-ASCII-compatible encodings
https://bugs.ruby-lang.org/issues/12052#change-92645

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: akr (Akira Tanaka)
* Backport: 2.0.0: REQUIRED, 2.1: REQUIRED, 2.2: REQUIRED, 2.3: REQUIRED
----------------------------------------
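Calling `String#encode` with the `xml:` option to convert from an ASCII-incompatible encoding to the same encoding returns a strange result.
It appears to be converting the string as if it were binary.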

```ruby
p "<\0>\0".encode("utf-16le", "utf-16le", xml: :text)
#=> "\u6C26\u3B74\u2600\u7467;"
```



-- 
https://bugs.ruby-lang.org/

Thread overview: 11+ messages
     [not found] <redmine.issue-12052.20160205025027.4@ruby-lang.org>
2021-06-24 23:50 ` [ruby-dev:51068] [Ruby master Bug#12052] String#encode with xml option returns wrong result merch-redmine
2021-06-25  9:39 ` duerst [this message]
2021-06-25 17:34 ` [ruby-dev:51070] [Ruby master Bug#12052] String#encode with xml option returns wrong result for totally non-ASCII-compatible encodings merch-redmine
2021-06-25 20:09 ` [ruby-dev:51071] " merch-redmine
2021-06-26  0:42 ` [ruby-dev:51072] " duerst
2021-07-03  4:49 ` [ruby-dev:51076] " nagachika00
2021-07-03  5:26 ` [ruby-dev:51077] " nagachika00
2021-07-04  2:02 ` [ruby-dev:51078] " duerst
2021-07-04  8:27 ` [ruby-dev:51079] " nagachika00
2021-07-11 22:46 ` [ruby-dev:51081] " nobu
2021-07-18  2:43 ` [ruby-dev:51083] " nagachika00
