From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/50699
Path: main.gmane.org!not-for-mail
From: Simon Josefsson <jas@extundo.com>
Newsgroups: gmane.emacs.gnus.general
Subject: Re: charset=macintosh
Date: Sat, 08 Mar 2003 21:09:22 +0100
Sender: owner-ding@hpc.uh.edu
Message-ID: <iluk7f9ipsd.fsf@latte.josefsson.org>
References: <shsmtz3wkn.fsf@tux.gnu.franken.de>
	<ilu1y1j5917.fsf@latte.josefsson.org>
	<sh4r6f3rgn.fsf@tux.gnu.franken.de> <m365qvnepi.fsf@defun.localdomain>
	<shr89i3o6j.fsf@tux.gnu.franken.de> <m3zno6vku2.fsf@defun.localdomain>
	<843clxud7u.fsf@lucy.is.informatik.uni-duisburg.de>
	<86k7f9g8sb.fsf@ieee.org> <iluptp1j1iy.fsf@latte.josefsson.org>
	<m3r89hws9g.fsf@defun.localdomain>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: main.gmane.org 1047154181 16973 80.91.224.249 (8 Mar 2003 20:09:41 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Sat, 8 Mar 2003 20:09:41 +0000 (UTC)
Original-To: ding@gnus.org
Mail-Copies-To: nobody
X-Payment: hashcash 1.2 0:030308:ding@gnus.org:7e5c70df47781c00
X-Hashcash: 0:030308:ding@gnus.org:7e5c70df47781c00
In-Reply-To: <m3r89hws9g.fsf@defun.localdomain> (Jesper Harder's message of
 "Sat, 08 Mar 2003 20:52:11 +0100")
User-Agent: Gnus/5.090016 (Oort Gnus v0.16) Emacs/21.3.50 (gnu/linux)
Precedence: list
X-Majordomo: 1.94.jlt7
Xref: main.gmane.org gmane.emacs.gnus.general:50699

Jesper Harder <harder@myrealbox.com> writes:

> Simon Josefsson <jas@extundo.com> writes:
>
>> For articles without MIME tags, in groups not in g-g-c-a, it would be
>> nice if Gnus could guess better -- like trying to UTF-8 decode it,
>> which typically only fails when data wasn't UTF-8 encoded, and then go
>> on and try other encodings.  Emacs' decoding functions behave a little
>> strangely, but once they are fixed Gnus should be able to do this.
>
> Currently they're not good enough, IMHO.  Here's an example:
>
> (detect-coding-string (encode-coding-string "dk.test.utf8-æøå" 'utf-8))
>
> => (iso-latin-1 iso-latin-1 raw-text japanese-shift-jis 
>     chinese-big5 no-conversion mule-utf-8)
>
> The correct answer is last in the list.

Doesn't that function use the preference order configured by the user?
For me, running Emacs in a UTF-8 locale, it returns mule-utf-8 first.
Released Emacs versions have incomplete UTF-8 support (see PROBLEMS),
and UTF-8 has a very low priority, so any potential bugs aren't
triggered unless they, err, really must be triggered.  This is
reasonable, I think.
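To illustrate why the preference order decides the answer, here is a
minimal Python sketch (not Emacs Lisp, and not how Gnus or
detect-coding-string actually work) of the quoted idea: try each
charset in a user-ranked list and keep the first one that decodes the
bytes strictly.  Strict UTF-8 decoding fails on data that wasn't
UTF-8 encoded, while Latin-1 maps every byte to a character and never
fails.  Python codec names stand in for Emacs coding systems, and
guess_charset is a hypothetical helper.

```python
from __future__ import annotations


def guess_charset(data: bytes, preference: list[str]) -> str | None:
    """Return the first charset in `preference` that strictly decodes `data`.

    Hypothetical helper; Python codec names stand in for Emacs coding
    systems (e.g. "latin-1" ~ iso-latin-1, "utf-8" ~ mule-utf-8).
    """
    for charset in preference:
        try:
            data.decode(charset, errors="strict")
        except UnicodeDecodeError:
            continue  # not valid in this charset; try the next candidate
        return charset
    return None  # no candidate accepted the bytes


raw = "dk.test.utf8-æøå".encode("utf-8")

# Latin-1 accepts any byte sequence, so if it is ranked first it always
# "wins", even for valid UTF-8 input.
print(guess_charset(raw, ["latin-1", "utf-8"]))

# Ranking strict UTF-8 first gives the right answer for UTF-8 data and
# still falls back to Latin-1 for bytes that are not valid UTF-8.
print(guess_charset(raw, ["utf-8", "latin-1"]))
print(guess_charset("æøå".encode("latin-1"), ["utf-8", "latin-1"]))
```

Since Latin-1 never rejects input, it only makes sense as the last
fallback in such a list.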

I've suggested on emacs-devel that Emacs in CVS (both 21.3 and HEAD),
which supposedly has complete Unicode support now that the PROBLEMS
entry has been removed, should prefer UTF-8 more often.  This would be
the best solution IMHO, as Gnus wouldn't have to contain magic charset
prioritizing code.  It also seems like a reasonable solution, assuming
the Unicode stuff actually works.

The simplest solution would probably be for people who like Unicode to
run Emacs in a UTF-8 locale, though; then they would not have any of
these problems.

Your RFC quote was interesting, though.  I think it suggests that Gnus
should downgrade UTF-8 to ISO-8859-X whenever possible, even if the
user uses a UTF-8 locale, since ISO-8859-X is more widely supported.
That would probably be a contentious decision, though: what if a
Japanese user, in a UTF-8 locale, enters text that happens to be
downgradable to ISO-8859-1?  Downgrading in this case is probably
never a good idea.  OTOH, if this situation is purely hypothetical, it
doesn't matter if downgrading happens.
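A rough Python sketch of the "downgrade whenever possible" policy, for
illustration only (not Gnus code): pick the narrowest charset from an
ordered list that can represent the whole outgoing text.  The
candidate list and the name pick_outgoing_charset are assumptions.

```python
from __future__ import annotations

# Hypothetical candidate order: narrowest, most widely supported first.
CANDIDATES: tuple[str, ...] = ("us-ascii", "iso-8859-1", "utf-8")


def pick_outgoing_charset(text: str,
                          candidates: tuple[str, ...] = CANDIDATES) -> str:
    """Return the first charset in `candidates` that can encode all of `text`."""
    for charset in candidates:
        try:
            text.encode(charset)  # strict: raises if any character is unrepresentable
        except UnicodeEncodeError:
            continue  # text needs a wider charset
        return charset
    raise ValueError("no candidate charset can encode the text")


print(pick_outgoing_charset("hello"))             # plain ASCII
print(pick_outgoing_charset("dk.test.utf8-æøå"))  # fits in Latin-1
print(pick_outgoing_charset("日本語"))             # needs UTF-8
```

The sketch looks only at the characters, not at who wrote them or in
which locale, so it would downgrade the Japanese user's Latin-1-only
text too; that is exactly the contentious case described above.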