From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/53724
Path: main.gmane.org!not-for-mail
From: Oliver Scholz <alkibiades@gmx.de>
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Gnus: UTF-8 and compatibility with other MUAs
Date: Fri, 15 Aug 2003 20:10:56 +0200
Sender: ding-owner@lists.math.uh.edu
Message-ID: <uhe4in64v.fsf@ID-87814.user.dfncis.de>
References: <plop87brus6y07.fsf@gnu-rox.org> <m3oeyrkfnc.fsf@defun.localdomain>
 <ubruruj1i.fsf@ID-87814.user.dfncis.de> <m3isoy50kt.fsf@defun.localdomain>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: sea.gmane.org 1060971239 5988 80.91.224.253 (15 Aug 2003 18:13:59 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Fri, 15 Aug 2003 18:13:59 +0000 (UTC)
Original-X-From: ding-owner+M2268@lists.math.uh.edu Fri Aug 15 20:13:58 2003
Return-path: <ding-owner+M2268@lists.math.uh.edu>
Original-Received: from malifon.math.uh.edu ([129.7.128.13])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 19nj5J-000838-00
	for <ding-account@gmane.org>; Fri, 15 Aug 2003 20:13:57 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu)
	by malifon.math.uh.edu with smtp (Exim 3.20 #1)
	id 19nj3m-0001mq-00; Fri, 15 Aug 2003 13:12:22 -0500
Original-Received: from sclp3.sclp.com ([64.157.176.121])
	by malifon.math.uh.edu with smtp (Exim 3.20 #1)
	id 19nj3h-0001ml-00
	for ding@lists.math.uh.edu; Fri, 15 Aug 2003 13:12:17 -0500
Original-Received: (qmail 55601 invoked by alias); 15 Aug 2003 18:12:17 -0000
Original-Received: (qmail 55596 invoked from network); 15 Aug 2003 18:12:16 -0000
Original-Received: from main.gmane.org (80.91.224.249)
  by sclp3.sclp.com with SMTP; 15 Aug 2003 18:12:16 -0000
Original-Received: from list by main.gmane.org with local (Exim 3.35 #1 (Debian))
	id 19nj4m-0007YP-00
	for <ding@gnus.org>; Fri, 15 Aug 2003 20:13:24 +0200
X-Injected-Via-Gmane: http://gmane.org/
Original-To: ding@gnus.org
Original-Received: from sea.gmane.org ([80.91.224.252])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 19nj4l-0007YH-00
	for <gmane-emacs-gnus-general@m.gmane.org>; Fri, 15 Aug 2003 20:13:23 +0200
Original-Received: from news by sea.gmane.org with local (Exim 3.35 #1 (Debian))
	id 19nj3d-0001Vz-00
	for <gmane-emacs-gnus-general@m.gmane.org>; Fri, 15 Aug 2003 20:12:13 +0200
Original-Lines: 66
Original-X-Complaints-To: usenet@sea.gmane.org
X-Attribution: os
X-Face: "HgH2sgK|bfH$;PiOJI6|qUCf.ve<51_Od(%ynHr?=>znn#~#oS>",F%B8&\vus),2AsPYb -n>PgddtGEn}s7kH?7kH{P_~vu?]OvVN^qD(L)>G^gDCl(U9n{:d>'DkilN!_K"eNzjrtI4Ya6;Td%
 IZGMbJ{lawG+'J>QXPZD&TwWU@^~A}f^zAb[Ru;CT(UA]c&
User-Agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.3.50 (windows-nt)
Cancel-Lock: sha1:mpoDv8OVe0EmE9pT16wa/w0PMEY=
Precedence: bulk
Xref: main.gmane.org gmane.emacs.gnus.general:53724
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:53724

Jesper Harder <harder@myrealbox.com> writes:

> Oliver Scholz <alkibiades@gmx.de> writes:
>
>> The lowest common denominator for most German text is ISO
>> 646-DE. For most Danish text (I presume) ISO 646-DK. Virtually
>> nobody uses those coding systems anymore, and IMNSHO nobody should
>> use them.
>
> The RFC does say that ISO-8859 is preferred over ISO 646:
>
>    Note that the ISO 646 character sets have deliberately been omitted
>    in favor of their 8859 replacements, which are the designated
>    character sets for Internet mail.
>

Hmm. I guess it's time for me to finally read RFC 2046 ...

>> Taken literally nobody should use ISO 8859-15 then, unless the
>> message really contains an € (or one of the other 7
>> characters). 
>
> I agree with that.  I don't see _any_ reason to use latin-9 if you
> don't need it.  Some MUA's don't support latin-9 (including older
> versions of Gnus) -- why break those clients for no good reason?

Well, I think, if you want to maximize the chance that your message
is flawlessly readable at the other end, this makes sense as a
pragmatic rule.

As a technical rule, however, which is important for the question
whether a message is fully RFC compliant or not, it does not make
sense.

BTW, if the rule were that we should use the smallest, most widely
used coded character set which covers all the necessary characters in
a message, then western European users should use neither Latin-1 nor
Latin-9, but windows-1252.
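
To make the "smallest charset that covers the message" rule concrete,
here is a minimal Python sketch. It is not what Gnus or any other MUA
actually does; the function name `smallest_charset` and the default
candidate order are assumptions made for illustration only.

```python
# Hypothetical helper: return the first charset in `candidates` that can
# represent every character of `text`. The default order is an assumption
# (narrower sets first), not something taken from the RFCs.
DEFAULT_CANDIDATES = ("us-ascii", "iso-8859-1", "iso-8859-15",
                      "windows-1252", "utf-8")


def smallest_charset(text, candidates=DEFAULT_CANDIDATES):
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # some character is missing from this charset
        return name
    raise ValueError("no candidate charset covers the text")


if __name__ == "__main__":
    for sample in ("plain text", "Grüße", "5 €", "Ελληνικά"):
        print(repr(sample), "->", smallest_charset(sample))
```

With the default order, Latin-9 is chosen only when the text contains
one of the characters that Latin-1 lacks. Moving windows-1252 ahead of
the two ISO sets in the tuple gives the outcome described above, where
western European text ends up labelled as windows-1252.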

However, from the section you quoted alone it is not entirely clear
whether it refers to abstract characters, code points in a coded
character set or octets in a character encoding scheme. The term
“character set” may seem to indicate that they are talking about coded
character sets, but RFC 2046 refers to RFC 2045 for the definition of
the term “character set”. There it reads:

   NOTE: The term "character set" was originally to describe such
   straightforward schemes as US-ASCII and ISO-8859-1 which have a
   simple one-to-one mapping from single octets to single characters.
   Multi-octet coded character sets and switching techniques make the
   situation more complex. For example, some communities use the term
   "character encoding" for what MIME calls a "character set", while
   using the phrase "coded character set" to denote an abstract mapping
   from integers (not octets) to characters.

So I'd say “character set” refers to the character encoding
scheme. And in this sense the rule makes sense: if a message contains
only characters from the ASCII repertoire it should be declared as
US-ASCII, not as UTF-8. But that does not extend to ISO
8859-[[:digit:]]+, since UTF-8 and Latin-1 are not compatible.
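
The difference can be checked at the octet level. The following is a
small Python sketch under the assumption that "compatible" means
"produces the same octets for the same text"; the helper name
`same_octets` is hypothetical.

```python
# Hypothetical helper: do two charsets yield identical octets for `text`?
def same_octets(text, charset_a, charset_b):
    return text.encode(charset_a) == text.encode(charset_b)


if __name__ == "__main__":
    # ASCII-only text: the US-ASCII and UTF-8 octets are identical.
    print(same_octets("Liberte", "us-ascii", "utf-8"))

    # Text with a non-ASCII character: Latin-1 uses one octet for it,
    # UTF-8 uses two, so the octet streams differ.
    print(same_octets("Liberté", "iso-8859-1", "utf-8"))

    # Latin-1 octets are in general not even valid UTF-8.
    try:
        "Liberté".encode("iso-8859-1").decode("utf-8")
    except UnicodeDecodeError as err:
        print("not valid UTF-8:", err.reason)
```

So a pure ASCII message labelled as UTF-8 is still readable by an
ASCII-only client, whereas a Latin-1 message cannot simply be
relabelled as UTF-8 or the other way round.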


    Oliver
-- 
28 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!