From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/32005
Path: main.gmane.org!not-for-mail
From: Toby Speight
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Strange `UTF-8'?
Date: 08 Aug 2000 11:01:59 +0100
Organization: Citrix Systems
Sender: owner-ding@hpc.uh.edu
Message-ID: <87g0ogxe94.fsf@delivery.cam.eu.citrix.com>
References:
NNTP-Posting-Host: coloc-standby.netfonds.no
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: main.gmane.org 1035168351 17364 80.91.224.250 (21 Oct 2002 02:45:51 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 21 Oct 2002 02:45:51 +0000 (UTC)
Return-Path:
Original-Received: from fisher.math.uh.edu (fisher.math.uh.edu [129.7.128.35]) by mailhost.sclp.com (Postfix) with ESMTP id 9F88FD051E for ; Tue, 8 Aug 2000 06:02:55 -0400 (EDT)
Original-Received: from sina.hpc.uh.edu (lists@Sina.HPC.UH.EDU [129.7.3.5]) by fisher.math.uh.edu (8.9.1/8.9.1) with ESMTP id FAC12893; Tue, 8 Aug 2000 05:02:42 -0500 (CDT)
Original-Received: by sina.hpc.uh.edu (TLB v0.09a (1.20 tibbs 1996/10/09 22:03:07)); Tue, 08 Aug 2000 05:01:57 -0500 (CDT)
Original-Received: from mailhost.sclp.com (postfix@66-209.196.61.interliant.com [209.196.61.66] (may be forged)) by sina.hpc.uh.edu (8.9.3/8.9.3) with ESMTP id FAA24035 for ; Tue, 8 Aug 2000 05:01:46 -0500 (CDT)
Original-Received: from hqvwall01.citrix.com (hqcon01.citrix.com [206.103.132.2]) by mailhost.sclp.com (Postfix) with SMTP id 92550D051E for ; Tue, 8 Aug 2000 06:02:16 -0400 (EDT)
Original-Received: from 10.9.1.111 by hqvwall01.citrix.com (InterScan E-Mail VirusWall NT); Tue, 08 Aug 2000 06:02:11 -0400 (Eastern Daylight Time)
Original-Received: by HQEXCHCON01 with Internet Mail Service (5.5.2650.21) id ; Tue, 8 Aug 2000 06:01:57 -0400
Original-Received: from delivery.cam.eu.citrix.com ([10.70.128.47]) by hwexch01.ctxuk.citrix.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id QLGA22L3; Tue, 8 Aug 2000 11:02:00 +0100
Original-To: The Gnus Mailing List
Original-Lines: 39
In-Reply-To: Kai.Grossjohann@CS.Uni-Dortmund.DE's message of "Tue, 8 Aug 2000 11:38:38 +0200"
X-Author-Info:
Precedence: list
X-Majordomo: 1.94.jlt7
Xref: main.gmane.org gmane.emacs.gnus.general:32005
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:32005

0> In article ,
0> Kai Großjohann ("Kai") wrote:

Kai> I received a message which was labeled charset=UTF-8, but my name was
Kai> all wrong.  I then used Mule-UCS to create a UTF-8 file with my name
Kai> in it, ran `less' on it, and got this output:
Kai>
Kai> Gro<9F>johann

Decoding C39F gives 3<<6 + 1F == DF == ß
So that looks right to me.

Kai> Running `less' on the message file was different:
Kai>
Kai> Gro<83>Yjohann

C383 => u-00C3 == Ã
And the Y (u-0059) is completely spurious.  (Remember, in UTF-8, all
US-ASCII characters stand for themselves - C0-FF are initial bytes,
and 80-BF are continuation bytes).

Kai> What is happening here?  Do the two encodings mean the same thing (but
Kai> Mule-UCS does not know that), or is the second one bogus?  Why is it
Kai> bogus?

The second one is wrong (but not bogus in the sense of broken - it
just represents something different from the intent).  I'm not sure
how it got like that.  It looks like it's been double encoded: the
leading C3 has been re-encoded as C383, and 9F, not being in Unicode
has gone very random.  Just a guess, though.

Kai> The message was created with:
Kai>
Kai> X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-15mdksecure i686)
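
The byte arithmetic in the reply above can be checked with a short
Python sketch.  This is only an illustration: the helper name
decode_two_byte is made up for this example, it skips overlong-form
checks, and the "double encoding" step assumes the UTF-8 bytes were
misread as ISO-8859-1 and encoded again, which the message offers only
as a guess.  It does not attempt to explain the stray Y.

```python
def decode_two_byte(lead: int, cont: int) -> int:
    """Decode a two-byte UTF-8 sequence into a code point (hypothetical helper)."""
    if not 0xC0 <= lead <= 0xDF:
        raise ValueError("not a two-byte lead byte")
    if not 0x80 <= cont <= 0xBF:
        raise ValueError("not a continuation byte")
    # Low 5 bits of the lead byte, shifted up, plus low 6 bits of the continuation.
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


# C3 9F: 3<<6 + 1F == DF, i.e. U+00DF (ß), the intended character.
good = decode_two_byte(0xC3, 0x9F)

# C3 83: 3<<6 + 03 == C3, i.e. U+00C3 (Ã), not what was intended.
odd = decode_two_byte(0xC3, 0x83)

# Assumed double encoding: encode once, misread the bytes as ISO-8859-1,
# then encode the result as UTF-8 a second time.
once = "ß".encode("utf-8")                         # b'\xc3\x9f'
twice = once.decode("iso-8859-1").encode("utf-8")  # b'\xc3\x83\xc2\x9f'
```

Under that assumption the result begins with C3 83, which matches the
first two bytes discussed in the reply; the bytes that follow here
(C2 9F) are what Python's ISO-8859-1 codec yields and need not match
what the original mailer produced.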