From: Simon Josefsson
Newsgroups: gmane.emacs.gnus.general
Subject: Re: charset=macintosh
Date: Sat, 08 Mar 2003 21:09:22 +0100

Jesper Harder writes:

> Simon Josefsson writes:
>
>> For articles without MIME tags, in groups not in g-g-c-a, it would be
>> nice if Gnus could guess better -- like trying to UTF-8 decode it,
>> which typically only fails when data wasn't UTF-8 encoded, and then go
>> on and try other encodings.  Emacs' decoding functions behave a little
>> strange, but once fixed Gnus should be able to do this.
>
> Currently they're not good enough, IMHO.  Here's an example:
>
> (detect-coding-string (encode-coding-string "dk.test.utf8-æøå" 'utf-8))
>
> => (iso-latin-1 iso-latin-1 raw-text japanese-shift-jis
>     chinese-big5 no-conversion mule-utf-8)
>
> The correct answer is last in the list.

Doesn't that function use the preference order configured by the user?
For me, who runs emacs in a UTF-8 locale, it returns mule-utf-8 first.

Released emacs versions have incomplete UTF-8 support (see PROBLEMS),
and UTF-8 has a very low priority, so any potential bugs aren't
triggered unless they, err, really must be triggered.  This is
reasonable, I think.

I've asked on emacs-devel that emacs in CVS (both 21.3 and HEAD), which
supposedly has complete Unicode support since the PROBLEMS entry has
been removed, should prefer UTF-8 more often.  This would be the best
solution IMHO, as Gnus wouldn't have to contain magic charset
prioritizing code.  It also seems like a reasonable solution, assuming
the Unicode stuff actually is working.
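
The "try UTF-8 first, then fall back" guessing idea quoted above can be
sketched roughly as follows.  This is only an illustration in Python,
not Emacs Lisp and not what Gnus does; the function name guess_decode
and the fallback list are assumptions made up for the example.

```python
def guess_decode(raw, fallbacks=("iso-8859-1",)):
    """Decode untagged bytes, trying strict UTF-8 before other charsets.

    Hypothetical helper: returns a (text, charset_used) pair.
    """
    for charset in ("utf-8",) + tuple(fallbacks):
        try:
            # Strict decoding raises on byte sequences that are not
            # valid in this charset, which is the signal to move on.
            return raw.decode(charset), charset
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte to a character, so it cannot fail.
    return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    print(guess_decode("dk.test.utf8-æøå".encode("utf-8")))
    print(guess_decode("dk.test.utf8-æøå".encode("iso-8859-1")))
```

The second call falls through to iso-8859-1 because the Latin-1 bytes
for "æøå" do not form a valid UTF-8 sequence.  Plain ASCII input is
reported as utf-8, which is harmless since ASCII is a subset of it.
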
The simplest would probably be for people who like Unicode to run emacs
in a UTF-8 locale, though; then they would not have any of these
problems.

Your RFC quote was interesting though.  I think it suggests that Gnus
should downgrade UTF-8 to ISO-8859-X whenever possible, even if the
user uses a UTF-8 locale, since ISO-8859-X is more widely supported.
That would probably be a contentious decision though: what if a
Japanese user, in a UTF-8 locale, enters text that happens to be
downgradable to ISO-8859-1?  Downgrading in this case is probably never
a good idea.  OTOH, if this situation is purely hypothetical, it
doesn't matter if downgrading happens.
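
To make the downgrading idea concrete, here is a minimal Python sketch
that picks the first charset in a preference list able to encode the
outgoing text.  The function name pick_outgoing_charset and the
preference order are assumptions for illustration only; this is not
Gnus code.

```python
def pick_outgoing_charset(text, preferred=("us-ascii", "iso-8859-1", "utf-8")):
    """Return the first charset in `preferred` that can encode `text`.

    Hypothetical helper: listing the narrower charsets first is what
    produces the "downgrade whenever possible" behaviour.
    """
    for charset in preferred:
        try:
            text.encode(charset)  # strict: raises if any character is unencodable
            return charset
        except UnicodeEncodeError:
            continue
    raise ValueError("no charset in the preference list can encode the text")


if __name__ == "__main__":
    print(pick_outgoing_charset("hello"))    # us-ascii
    print(pick_outgoing_charset("æøå"))      # iso-8859-1
    print(pick_outgoing_charset("日本語"))    # utf-8
```

Note that the choice depends only on the characters in the text, not on
the user's locale, so a message typed in a UTF-8 locale that happens to
fit in ISO-8859-1 would be downgraded.  That is exactly the case
questioned above.
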