From: Oliver Scholz
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Gnus: UTF-8 and compatibility with other MUAs
Date: Fri, 15 Aug 2003 20:10:56 +0200

Jesper Harder writes:

> Oliver Scholz writes:
>
>> The lowest common denominator for most German text is ISO
>> 646-DE.  For most Danish text (I presume) ISO 646-DK.  Virtually
>> nobody uses those coding systems anymore, and IMNSHO nobody should
>> use them.
>
> The RFC does say that ISO-8859 is prefered over ISO 646:
>
>    Note that the ISO 646 character sets have deliberately been omitted
>    in favor of their 8859 replacements, which are the designated
>    character sets for Internet mail.

Hmm. I guess it's time for me to finally read RFC 2046 ...

>> Taken literally nobody should use ISO 8859-15 then, unless the
>> message really contains an € (or one of the other 7
>> characters).
>
> I agree with that.  I don't see _any_ reason to use latin-9 if you
> don't need it.  Some MUA's don't support latin-9 (including older
> versions of Gnus) -- why break those clients for no good reason?

Well, I think, if you want to maximize the chance that your message
is flawlessly readable at the other end, this makes sense as a
pragmatic rule.  As a technical rule, however, which is important for
the question whether a message is fully RFC compliant or not, it does
not make sense.
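
To make the pragmatic rule concrete, here is a minimal Python sketch
of "use the first charset that covers the message".  It is only an
illustration: the names `smallest_charset`, `PREFERENCE` and
`latin9_only_characters` are made up for this example, and the
preference order is an assumption, not something the RFC or Gnus
prescribes.  The second function lists the characters that Latin-9 has
and Latin-1 lacks, that is, the € and the other 7.

```python
# Assumed preference order for illustration only; neither the RFC nor
# Gnus mandates this particular list.
PREFERENCE = ("us-ascii", "iso-8859-1", "iso-8859-15", "utf-8")


def smallest_charset(text, preference=PREFERENCE):
    """Return the first charset in `preference` that can encode `text`."""
    for name in preference:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset lacks a character, try the next one
        return name
    raise ValueError("no charset in the preference list covers the text")


def latin9_only_characters():
    """Characters at the octets where ISO 8859-15 differs from ISO 8859-1."""
    chars = []
    for octet in range(0x100):
        raw = bytes([octet])
        latin9 = raw.decode("iso-8859-15")
        if latin9 != raw.decode("iso-8859-1"):
            chars.append(latin9)
    return chars


if __name__ == "__main__":
    for sample in ("Hello", "Grüße", "5 €"):
        print(repr(sample), "->", smallest_charset(sample))
    print(latin9_only_characters())
```

With this order a message drops down to Latin-9 only when it really
needs one of those characters, which is the behaviour Jesper argues
for.  Whether that order is the right one is exactly the point under
discussion.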
BTW, if the rule were that we should use the smallest, most widely
used coded character set which covers all the necessary characters in
a message, then western European users should use neither Latin-1 nor
Latin-9, but windows-1252.

However, from the section you quoted alone it is not entirely clear
whether it refers to abstract characters, code points in a coded
character set or octets in a character encoding scheme.  The term
“character set” may seem to indicate that they are talking about
coded character sets, but RFC 2046 refers to RFC 2045 for the
definition of the term “character set”.  There it reads:

    NOTE: The term "character set" was originally to describe such
    straightforward schemes as US-ASCII and ISO-8859-1 which have a
    simple one-to-one mapping from single octets to single
    characters.  Multi-octet coded character sets and switching
    techniques make the situation more complex.  For example, some
    communities use the term "character encoding" for what MIME calls
    a "character set", while using the phrase "coded character set"
    to denote an abstract mapping from integers (not octets) to
    characters.

So I'd say “character set” refers to the character encoding scheme.
And in this sense the rule makes sense: if a message contains only
characters from the ASCII repertoire it should be declared as
US-ASCII, not as UTF-8.  But that does not extend to
ISO 8859-[[:digit:]]+, since UTF-8 and Latin-1 are not compatible:
characters outside ASCII are encoded as different octets in the two.

    Oliver
-- 
28 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!