From: merch-redmine@jeremyevans.net
Date: Thu, 24 Jun 2021 23:50:31 +0000 (UTC)
To: ruby-dev@ruby-lang.org
Subject: [ruby-dev:51068] [Ruby master Bug#12052] String#encode with xml option returns wrong result

Issue #12052 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Assigned to Rejected

After an extensive session with gdb, I've determined that this isn't an issue with `String#encode`, and it isn't a bug.

`"<\0>\0".encode("utf-16le", "utf-16le", xml: :text)` returns the same string as `"&lt;\0&gt;\0".force_encoding("utf-16le")`. I think that's the correct behavior for `String#encode`, since you are specifying that the source and destination encodings match.

`"&lt;\0&gt;\0".force_encoding("utf-16le")` is the same string as `"\u6C26\u3B74\u2600\u7467;".encode("utf-16le")`. The 10 ASCII bytes are the same as the bytes for the 5 codepoints in UTF-16LE encoding.

`String#inspect` processes the string and formats each of the non-ASCII codepoints using the `\u` syntax, and the final codepoint (59) as a regular ASCII character.

As an example:

```ruby
"<\0>\0".encode("utf-16le", "utf-16le", xml: :text) == "&lt;\0&gt;\0".force_encoding("utf-16le")
# => true

"&lt;\0&gt;\0".force_encoding("utf-16le").codepoints
# => [27686, 15220, 9728, 29799, 59]

"&lt;\0&gt;\0".force_encoding("utf-16le").codepoints.map{|x| x >= 128 ? '-u%X'%x : x.chr}.join
# => "-u6C26-u3B74-u2600-u7467;"
```

----------------------------------------
Bug #12052: String#encode with xml option returns wrong result
https://bugs.ruby-lang.org/issues/12052#change-92642

* Author: nobu (Nobuyoshi Nakada)
* Status: Rejected
* Priority: Normal
* Assignee: akr (Akira Tanaka)
* Backport: 2.0.0: REQUIRED, 2.1: REQUIRED, 2.2: REQUIRED, 2.3: REQUIRED
----------------------------------------
Calling `String#encode` with the `xml:` option, from an ASCII-incompatible encoding to the same encoding, returns a strange result.
It appears to be converting the string as binary.

```ruby
p "<\0>\0".encode("utf-16le", "utf-16le", xml: :text)
#=> "\u6C26\u3B74\u2600\u7467;"
```

-- 
https://bugs.ruby-lang.org/
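
The byte arithmetic behind the comment above can be sketched in a few lines of Ruby. The second half shows one possible way to escape characters rather than bytes, by going through UTF-8. That route is an assumption added for illustration and is not something proposed in this thread, and all variable names are hypothetical.

```ruby
# The 10 ASCII bytes that result from escaping "<\0>\0" byte by byte.
raw = "&lt;\0&gt;\0".b

# Reinterpreted as UTF-16LE, each pair of bytes becomes one codepoint
# (low byte first), e.g. "&" (0x26) followed by "l" (0x6C) is U+6C26.
reinterpreted = raw.dup.force_encoding("UTF-16LE")
codepoints = reinterpreted.codepoints

# Assumption: to escape characters instead of bytes, transcode to an
# ASCII-compatible encoding first, escape there, then transcode back.
utf16 = "<>".encode("UTF-16LE")
escaped = utf16.encode("UTF-8").encode(xml: :text).encode("UTF-16LE")

p codepoints
p escaped.encode("UTF-8")
```

With this approach the escaping runs on the UTF-8 string, where `<` and `>` are single bytes, so the result transcoded back to UTF-16LE holds the entity text as real two-byte characters.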