From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/65127
Path: news.gmane.org!not-for-mail
From: Florent Rougon
Newsgroups: gmane.emacs.gnus.general
Subject: Non-ASCII characters in 8-bit body corrupted as mail is imported
Date: Sun, 02 Sep 2007 19:47:26 +0200
Message-ID: <87abs5arip.fsf@florent.maison>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: sea.gmane.org 1188755284 11684 80.91.229.12 (2 Sep 2007 17:48:04 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Sun, 2 Sep 2007 17:48:04 +0000 (UTC)
To: ding@gnus.org
Original-X-From: ding-owner+M13640=ding+2Daccount=gmane.org@lists.math.uh.edu Sun Sep 02 19:48:03 2007
Return-path:
Envelope-to: ding-account@gmane.org
Original-Received: from util0.math.uh.edu ([129.7.128.18]) by lo.gmane.org with esmtp (Exim 4.50) id 1IRtYF-0003EE-Uo for ding-account@gmane.org; Sun, 02 Sep 2007 19:48:00 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu) by util0.math.uh.edu with smtp (Exim 4.63) (envelope-from ) id 1IRtYD-0008K0-HA for ding-account@gmane.org; Sun, 02 Sep 2007 12:47:57 -0500
Original-Received: from mx2.math.uh.edu ([129.7.128.33]) by util0.math.uh.edu with esmtps (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1IRtYC-0008Ju-40 for ding@lists.math.uh.edu; Sun, 02 Sep 2007 12:47:56 -0500
Original-Received: from quimby.gnus.org ([80.91.231.51]) by mx2.math.uh.edu with esmtp (Exim 4.67) (envelope-from ) id 1IRtY8-0001ei-Fy for ding@lists.math.uh.edu; Sun, 02 Sep 2007 12:47:55 -0500
Original-Received: from smtp4-g19.free.fr ([212.27.42.30]) by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian)) id 1IRtY4-0001z9-00 for ; Sun, 02 Sep 2007 19:47:48 +0200
Original-Received: from smtp4-g19.free.fr (localhost.localdomain [127.0.0.1]) by smtp4-g19.free.fr (Postfix) with ESMTP id DA0396F252 for ; Sun, 2 Sep 2007 19:47:47 +0200 (CEST)
Original-Received: from frougon.dyndns.org (unknown [81.56.18.128]) by smtp4-g19.free.fr (Postfix) with ESMTP id B6CD76F429 for ; Sun, 2 Sep 2007 19:47:47 +0200 (CEST)
Original-Received: by frougon.dyndns.org (Postfix, from userid 1000) id 7D26B2F106; Sun, 2 Sep 2007 19:47:26 +0200 (CEST)
Mail-Followup-To: ding@gnus.org
User-Agent: Gnus/5.110007 (No Gnus v0.7) Emacs/22.1 (gnu/linux)
X-Spam-Score: -2.5 (--)
List-ID:
Precedence: bulk
Xref: news.gmane.org gmane.emacs.gnus.general:65127
Archived-At:

Hi,

I just migrated to Emacs 22 and have an annoying problem with incoming
mail. I had a similar problem in Emacs 21, but there I found a way to
work around it by using '(standard-display-european 1)', which had the
unwanted side effect of putting Emacs in unibyte mode, so that I
couldn't deal correctly with UTF-8, etc.

So, switching to Emacs 22, I tried to get rid of this semi-obsolete
'(standard-display-european 1)' call and make a clean configuration
that works in multibyte mode. Everything I tested works, except one
thing with Gnus: when I receive a mail that looks like this:

  Content-Type: text/plain; charset=iso-8859-1
  Content-Transfer-Encoding: 8bit

  Test é à

it gets corrupted when Gnus imports it into my nnml backend. I have:

  (setq mail-sources
        '((file :path "/var/mail/flo" :plugged t)
          (file :path "~/mbox" :plugged t)))

I did check carefully: the mail is *not* corrupted when sitting in
/var/mail/flo, but it is after Gnus has read it and stored it in my
nnml folder.

(I tried to route such tests to an nnfolder backend to see whether the
problem was specific to the nnml backend, but failed. Gnus stored the
received mails in ~/Mail/nnml/Tests, or even
~/Mail/nnml/nnfolder+tests:Tests, despite my having created an
nnfolder+tests:Tests group and adapted nnmail-split-methods so that my
test messages go there...)

Of course, since the file in ~/Mail/nnml/ is corrupted, it is displayed
incorrectly afterwards.
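For anyone who wants to reproduce this, a test message like the one above could be generated with a short script. The following is only a minimal sketch in Python, run outside Emacs; the function name `build_test_message` and the Subject line are made up for illustration, and how the resulting bytes get delivered to the spool file is left open. It builds the same "Test é à" body either as raw 8-bit ISO-8859-1 bytes or as quoted-printable, so the two cases can be compared.

```python
import quopri

# The test body from the report: "Test é à" followed by a newline.
BODY = "Test \u00e9 \u00e0\n"


def build_test_message(transfer_encoding="8bit"):
    """Return the raw bytes of a small ISO-8859-1 test message.

    transfer_encoding is either "8bit" (raw non-ASCII bytes in the body)
    or "quoted-printable" (non-ASCII bytes written as =XX escapes).
    """
    payload = BODY.encode("iso-8859-1")
    if transfer_encoding == "quoted-printable":
        payload = quopri.encodestring(payload)
    elif transfer_encoding != "8bit":
        raise ValueError("unsupported encoding: %r" % transfer_encoding)

    headers = (
        "Subject: 8-bit import test\n"  # hypothetical subject line
        "Mime-Version: 1.0\n"
        "Content-Type: text/plain; charset=iso-8859-1\n"
        "Content-Transfer-Encoding: %s\n" % transfer_encoding
    ).encode("ascii")

    # A blank line separates the headers from the body.
    return headers + b"\n" + payload


if __name__ == "__main__":
    for enc in ("8bit", "quoted-printable"):
        print(enc, repr(build_test_message(enc)))
```

Only the 8bit variant carries bytes above 0x7F in the body; the quoted-printable variant is pure ASCII on the wire.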
The corruption is the following: every non-ASCII character is preceded
by an extra character that depends on the language environment and on
whether or not I activate these lines in my .emacs.el:

  (require 'ucs-tables)
  (unify-8859-on-encoding-mode 1)
  (unify-8859-on-decoding-mode 1)
  (prefer-coding-system 'latin-1)

  - if I don't put these lines and the language environment is latin-9,
    the unwanted character is #x8E;

  - if I do activate these lines, the unwanted character is #x81, no
    matter whether my language environment is latin-9 or latin-1.

NB: #x81 is the infamous \201 in octal
    #x8E is the less famous (at least to me) \216

I did many tests, but didn't manage to find a configuration that
doesn't trigger the problem (except switching back to Emacs 21 with
(standard-display-european 1)...). I upgraded to today's Gnus CVS, in
case it was fixed there, but that didn't solve my problem.

I believe the parasite character is part of Emacs' internal
representation of the non-ASCII chars, but it shouldn't end up in the
backend files...

If I receive a mail with the same accented chars but with the body
encoded in quoted-printable, the problem doesn't happen, so it is only
triggered when the raw non-ASCII chars are read directly from the spool
file.
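To check a stored nnml file for this pattern outside Emacs, something like the following Python sketch could be used. It is an illustration only: the function names are invented here, the two byte values are simply the ones reported above, and the test "parasite byte followed by a byte in the range 0xA0-0xFF" is a heuristic assumption, not a statement about how Emacs or Gnus actually write the file. `corrupt` models the observed symptom so that the detector can be tried on a known input.

```python
# Byte values reported above: #x81 (\201) and #x8E (\216).
PARASITES = (0x81, 0x8E)


def corrupt(data, parasite):
    """Model the reported symptom: put `parasite` before each non-ASCII byte."""
    out = bytearray()
    for byte in data:
        if byte >= 0x80:
            out.append(parasite)
        out.append(byte)
    return bytes(out)


def find_parasites(data):
    """Return offsets of parasite bytes directly followed by a byte >= 0xA0."""
    return [
        i
        for i in range(len(data) - 1)
        if data[i] in PARASITES and data[i + 1] >= 0xA0
    ]


def strip_parasites(data):
    """Return `data` with the bytes located by find_parasites removed."""
    offsets = set(find_parasites(data))
    return bytes(b for i, b in enumerate(data) if i not in offsets)


if __name__ == "__main__":
    original = "Test \u00e9 \u00e0\n".encode("iso-8859-1")
    damaged = corrupt(original, 0x81)
    print(repr(damaged))
    print(find_parasites(damaged))
    print(strip_parasites(damaged) == original)
```

Reading the file in binary mode (`open(path, "rb").read()`) and passing the bytes to `find_parasites` avoids any decoding step that could hide the extra bytes. Since 0x81 and 0x8E are C1 control codes in ISO-8859-1, they are unlikely to occur in ordinary text, but a binary attachment could produce false positives.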
My configuration is the following:

  GNU Emacs 22.1.1 (i486-pc-linux-gnu, GTK+ Version 2.8.20) of
  2007-09-02 on florent, modified by Debian

  No Gnus v0.7 (from today's CVS)

The portion of my .emacs.el that is related to encoding issues is:

  (set-language-environment 'latin-9)
  ;; I also tried this, with similar results
  ;; (set-language-environment 'latin-1)
  (set-keyboard-coding-system default-keyboard-coding-system)
  (set-terminal-coding-system default-terminal-coding-system)
  (setq selection-coding-system 'compound-text-with-extensions)

  ;; (activated in some tests, deactivated in other tests, see
  ;; above)
  (require 'ucs-tables)
  (unify-8859-on-encoding-mode 1)
  (unify-8859-on-decoding-mode 1)
  (prefer-coding-system 'latin-1)

  ;; (require 'iso-transl)

The portion of my .gnus.el that is related to encoding issues is:

  (setq gnus-default-charset 'iso-8859-1
        gnus-default-posting-charset 'iso-8859-1
        message-default-charset 'iso-8859-1
        mm-coding-system-priorities '(iso-8859-1 iso-8859-15 utf-8))

  (add-to-list 'mm-body-charset-encoding-alist '(iso-8859-1 . 8bit))
  (add-to-list 'mm-body-charset-encoding-alist '(iso-8859-15 . 8bit))

When I encountered the problems, I uninstalled the Debian package
mule-ucs (from etch), but this didn't solve anything.

Any help would be *much* appreciated.

Thanks!

(In the meantime, I'm stuck with Emacs 21 in unibyte mode if I don't
want to corrupt my incoming mail...)

-- 
Florent