From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/53720
Path: main.gmane.org!not-for-mail
From: Oliver Scholz <alkibiades@gmx.de>
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Gnus: UTF-8 and compatibility with other MUAs
Date: Fri, 15 Aug 2003 15:50:17 +0200
Sender: ding-owner@lists.math.uh.edu
Message-ID: <ubruruj1i.fsf@ID-87814.user.dfncis.de>
References: <plop87brus6y07.fsf@gnu-rox.org> <m3oeyrkfnc.fsf@defun.localdomain>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: sea.gmane.org 1060961763 22649 80.91.224.253 (15 Aug 2003 15:36:03 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Fri, 15 Aug 2003 15:36:03 +0000 (UTC)
Original-X-From: ding-owner+M2264@lists.math.uh.edu Fri Aug 15 17:36:02 2003
Return-path: <ding-owner+M2264@lists.math.uh.edu>
Original-Received: from malifon.math.uh.edu ([129.7.128.13])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 19ngcU-0001V6-00
	for <ding-account@gmane.org>; Fri, 15 Aug 2003 17:36:02 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu)
	by malifon.math.uh.edu with smtp (Exim 3.20 #1)
	id 19ngb1-000187-00; Fri, 15 Aug 2003 10:34:31 -0500
Original-Received: from sclp3.sclp.com ([64.157.176.121])
	by malifon.math.uh.edu with smtp (Exim 3.20 #1)
	id 19nf6Y-0000wj-00
	for ding@lists.math.uh.edu; Fri, 15 Aug 2003 08:58:58 -0500
Original-Received: (qmail 48598 invoked by alias); 15 Aug 2003 13:58:58 -0000
Original-Received: (qmail 48593 invoked from network); 15 Aug 2003 13:58:57 -0000
Original-Received: from main.gmane.org (80.91.224.249)
  by sclp3.sclp.com with SMTP; 15 Aug 2003 13:58:57 -0000
Original-Received: from root by main.gmane.org with local (Exim 3.35 #1 (Debian))
	id 19nf7e-0005i5-00
	for <ding@gnus.org>; Fri, 15 Aug 2003 16:00:06 +0200
X-Injected-Via-Gmane: http://gmane.org/
Original-To: ding@gnus.org
Original-Received: from sea.gmane.org ([80.91.224.252])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 19nf0t-0005eC-00
	for <gmane-emacs-gnus-general@m.gmane.org>; Fri, 15 Aug 2003 15:53:07 +0200
Original-Received: from news by sea.gmane.org with local (Exim 3.35 #1 (Debian))
	id 19nezk-0002wM-00
	for <gmane-emacs-gnus-general@m.gmane.org>; Fri, 15 Aug 2003 15:51:56 +0200
Original-Lines: 45
Original-X-Complaints-To: usenet@sea.gmane.org
X-Attribution: os
X-Face: "HgH2sgK|bfH$;PiOJI6|qUCf.ve<51_Od(%ynHr?=>znn#~#oS>",F%B8&\vus),2AsPYb -n>PgddtGEn}s7kH?7kH{P_~vu?]OvVN^qD(L)>G^gDCl(U9n{:d>'DkilN!_K"eNzjrtI4Ya6;Td%
 IZGMbJ{lawG+'J>QXPZD&TwWU@^~A}f^zAb[Ru;CT(UA]c&
User-Agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.3.50 (windows-nt)
Cancel-Lock: sha1:R8HEnWuGJ5RpKKKnVX/KCb3zzYE=
Precedence: bulk
Xref: main.gmane.org gmane.emacs.gnus.general:53720
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:53720

Jesper Harder <harder@myrealbox.com> writes:
[...]
> To use UTF-8 by default would also be against RFC 2046:
>
> ,----[ RFC 2046, Section 4.1.2. ]
> |
> |    In general, composition software should always use the "lowest common
> |    denominator" character set possible.  For example, if a body contains
> |    only US-ASCII characters, it SHOULD be marked as being in the US-
> |    ASCII character set, not ISO-8859-1, which, like all the ISO-8859
> |    family of character sets, is a superset of US-ASCII.  More generally,
> |    if a widely-used character set is a subset of another character set,
> |    and a body contains only characters in the widely-used subset, it
> |    should be labelled as being in that subset.  This will increase the
> |    chances that the recipient will be able to view the resulting entity
> |    correctly.
> `----
[...]

That's not how I read the section you quoted. In my reading it
means that you should not declare a message to be in UTF-8 when it
contains only ASCII characters. For characters from the right-hand
half of ISO 8859-1 this is not so simple: Latin-1 (as a coded
character set) may be a subset of UCS, but Latin-1 (as a character
encoding scheme) is _not_ a subset of UTF-8. An ü, for instance, is
one octet in Latin-1 but two octets in UTF-8.
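
To make the distinction between character set and encoding scheme
concrete, here is a minimal Python sketch. It is only an illustration
and not taken from Gnus or Emacs; the helper name `is_valid_utf8` and
the sample strings are assumptions chosen for the example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the octets can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ascii_text = "Hello"
german_text = "Grüße"

# Pure ASCII text yields identical octets in all three encodings,
# so labelling it US-ASCII loses nothing.
ascii_same = (
    ascii_text.encode("us-ascii")
    == ascii_text.encode("iso-8859-1")
    == ascii_text.encode("utf-8")
)

# Characters from the upper half of Latin-1 are in the UCS repertoire,
# but their octet representation differs between the two encodings.
latin1_bytes = german_text.encode("iso-8859-1")
utf8_bytes = german_text.encode("utf-8")

print(ascii_same)
print(latin1_bytes, utf8_bytes)
print(is_valid_utf8(latin1_bytes))
```

The Latin-1 octets are not even well-formed UTF-8, which is why a
recipient cannot treat a Latin-1 body as "UTF-8 that happens to use
few characters".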

The lowest common denominator for most German text is ISO 646-DE; for
most Danish text (I presume) it is ISO 646-DK. Virtually nobody uses
those coding systems anymore, and IMNSHO nobody should use them. (I
have implemented ISO 646-DE for GNU Emacs in such a way that it could
easily be extended to other national variants of ISO 646, in case you
are interested ...)
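
For readers who have never met a national variant of ISO 646, the
following Python sketch shows the idea. It is not the Emacs
implementation mentioned above: the names `ISO646_DE`,
`encode_iso646` and `decode_iso646` are hypothetical, and the table
lists the eight positions in which the German variant differs from
US-ASCII.

```python
# Positions where ISO 646-DE (DIN 66003) differs from US-ASCII.
ISO646_DE = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}


def encode_iso646(text: str, table: dict = ISO646_DE) -> bytes:
    """Encode text into a 7-bit national variant described by `table`."""
    reverse = {char: position for position, char in table.items()}
    out = bytearray()
    for char in text:
        if char in reverse:
            out.append(reverse[char])
        elif ord(char) < 0x80 and ord(char) not in table:
            out.append(ord(char))
        else:
            # Either non-ASCII and not in the variant, or an ASCII
            # character whose position the variant has reassigned.
            raise ValueError(f"{char!r} is not representable")
    return bytes(out)


def decode_iso646(data: bytes, table: dict = ISO646_DE) -> str:
    """Decode 7-bit octets using the national variant in `table`."""
    if any(octet > 0x7F for octet in data):
        raise ValueError("ISO 646 is a 7-bit code")
    return "".join(table.get(octet, chr(octet)) for octet in data)


print(encode_iso646("Grüße"))
print(decode_iso646(b"Gr}~e"))
```

Passing a different table would cover another national variant. The
price is visible in the sketch: characters such as @ or { cannot be
represented at all, which is one reason these codes fell out of use.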

Sure, one could say that the national variants of ISO 646 are excluded
by the phrase “widely-used character set”, but that is a bit too
fuzzy for my taste. Taken literally, nobody should use ISO 8859-15
then, unless the message really contains a € (or one of the other 7
characters in which it differs from ISO 8859-1). Maybe this is what
the section wants to say, but then I dare say that it doesn't make
much sense as a technical rule, and I am glad that it is not stated
in a way that makes it mandatory.
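
The eight positions in question can be listed with the codecs that
ship with Python. This is only a sketch to illustrate the point; the
names `differences`, `latin9_only` and `needs_latin9` are made up for
the example.

```python
all_octets = bytes(range(256))
latin1 = all_octets.decode("iso-8859-1")
latin9 = all_octets.decode("iso-8859-15")

# Map each differing octet to (Latin-1 character, Latin-9 character).
differences = {
    octet: (latin1[octet], latin9[octet])
    for octet in range(256)
    if latin1[octet] != latin9[octet]
}

# Characters that ISO 8859-15 has and ISO 8859-1 lacks.
latin9_only = {new for _old, new in differences.values()}


def needs_latin9(text: str) -> bool:
    """True if text uses a character found only in ISO 8859-15."""
    return any(char in latin9_only for char in text)


for octet, (old, new) in sorted(differences.items()):
    print(f"0x{octet:02X}: {old} -> {new}")
print(needs_latin9("5 €"), needs_latin9("Grüße"))
```

Under the literal reading of the RFC, a composer would label a body
ISO 8859-15 only when `needs_latin9` is true and fall back to
ISO 8859-1 otherwise.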

    Oliver
-- 
28 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!