From: Oliver Scholz
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Gnus: UTF-8 and compatibility with other MUAs
Date: Sun, 17 Aug 2003 18:40:17 +0200

Benjamin Riefenstahl writes:
[...]
> Simon Josefsson writes:
>> But it doesn't solve the problem.  'cmp' still says the files are
>> different.  UTF-8 had a similar problem (overlong encodings) but
>> that has been fixed, UTF-16 and UTF-32 can't be.
>
> Actually UTF-8 still has that problem with composed vs. decomposed
> characters.  There is no perfect system AFAIK.

Just to be sure that I understand you correctly: are you referring to
the fact that a character like, say, U+00E9 (LATIN SMALL LETTER E WITH
ACUTE) is equivalent to U+0065 followed by U+0301 (LATIN SMALL LETTER
E followed by COMBINING ACUTE ACCENT)?

>> If normal computers were 16 bit, I could understand the trade-off,
>
> Depends on what you call "normal computers."  MS Windows and Apple's
> Mac OS X both use UTF-16 for APIs and internal implementation.
[...]

I am not sure, but I think that the characters that need to be
accessed via surrogate pairs are meant to be rare, since they are
outside of the BMP.  So AFAIK UTF-16 is meant as a space-efficient
format for East Asian text.  But as I said: this is outside the scope
of things I normally have to deal with.

    Oliver
-- 
30 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!
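
For anyone who wants to check the two points discussed in this message, here is a minimal Python sketch. It is only an illustration, not anything Gnus or Emacs does: the helper names `same_after_nfc` and `utf16_units` are made up for this example, and NFC is just one of the normalization forms that could be used for the comparison. It shows that the composed and decomposed spellings of "é" are different byte sequences in UTF-8 (so a byte-wise tool such as 'cmp' reports a difference) yet compare equal after normalization, and that a BMP character takes one UTF-16 code unit while a character outside the BMP takes a surrogate pair.

```python
import unicodedata

# Two spellings of the same character, as in the message above.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT


def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def utf16_units(s: str) -> int:
    """Hypothetical helper: number of 16-bit code units s needs in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


# The raw UTF-8 bytes differ, which is all a byte-wise comparison sees.
print(composed.encode("utf-8"))     # b'\xc3\xa9'
print(decomposed.encode("utf-8"))   # b'e\xcc\x81'
print(composed == decomposed)       # False
print(same_after_nfc(composed, decomposed))  # True

# A CJK character inside the BMP: one UTF-16 unit, three UTF-8 bytes.
cjk = "\u4e2d"
print(utf16_units(cjk), len(cjk.encode("utf-8")))    # 1 3

# A character outside the BMP (U+1D11E MUSICAL SYMBOL G CLEF):
# two UTF-16 units, i.e. a surrogate pair.
clef = "\U0001d11e"
print(utf16_units(clef))            # 2
```

The last two checks are consistent with the guess above about space efficiency: for text made up mostly of BMP characters from East Asian scripts, UTF-16 needs two bytes per character where UTF-8 needs three.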