Gnus development mailing list
From: Florent Rougon <flo@via.ecp.fr>
To: ding@gnus.org
Subject: Non-ASCII characters in 8-bit body corrupted as mail is imported
Date: Sun, 02 Sep 2007 19:47:26 +0200
Message-ID: <87abs5arip.fsf@florent.maison>

Hi,

I just migrated to Emacs 22 and have an annoying problem with incoming
mail. I had a similar problem in Emacs 21, but there I found a way to
work around it by using '(standard-display-european 1)', which had the
unwanted side effect of putting Emacs in unibyte mode, so that I
couldn't deal correctly with UTF-8, etc.

So, switching to Emacs 22, I tried to get rid of this semi-obsolete
'(standard-display-european 1)' call and make a clean configuration that
works in multibyte mode.

Everything I tested works, except one thing with Gnus: when I receive a
mail that looks like this:

  <Usual headers snipped>
  Content-Type: text/plain; charset=iso-8859-1
  Content-Transfer-Encoding: 8bit

  Test é à

it gets corrupted when Gnus imports it into my nnml backend.

I have:

  (setq mail-sources
        '((file :path "/var/mail/flo" :plugged t)
          (file :path "~/mbox" :plugged t)))

I did check carefully: the mail is *not* corrupted while sitting in
/var/mail/flo, but it is once Gnus has read it and stored it in my nnml
folder. (I tried to route such tests to an nnfolder backend to see
whether the problem was specific to the nnml backend, but failed. Gnus
stored the received mails in ~/Mail/nnml/Tests, or even
~/Mail/nnml/nnfolder+tests:Tests, despite my having created an
nnfolder+tests:Tests group and adapted nnmail-split-methods so that my
test messages go there...)

Of course, the file in ~/Mail/nnml/ being corrupted, it is badly
displayed afterwards.

The corruption is the following: every non-ASCII character is preceded
by some character that depends on the language environment and on
whether I activate these lines in my .emacs.el or not:

   (require 'ucs-tables)
   (unify-8859-on-encoding-mode 1)
   (unify-8859-on-decoding-mode 1)
   (prefer-coding-system 'latin-1)

  - if I leave these lines out and the language environment is latin-9,
    the unwanted character is #x8E;
  - if I do activate these lines, the unwanted character is #x81,
    whether my language environment is latin-9 or latin-1.

NB: #x81 is the infamous \201 in octal
    #x8E is the less famous (at least to me) \216
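
As an illustration only, the corruption described above can be reproduced and detected at the byte level outside Emacs. The following is a minimal Python sketch, not part of the setup described here: the helper names `corrupt` and `find_stray_prefixes` are hypothetical, and the check assumes that a #x81 or #x8E byte directly followed by a byte in the range #xA0-#xFF is a stray prefix rather than legitimate content.

```python
# Prefix bytes reported in this message: #x81 and #x8E.
PREFIXES = (0x81, 0x8E)


def corrupt(body: bytes, prefix: int) -> bytes:
    """Simulate the reported damage: put `prefix` before every non-ASCII byte."""
    out = bytearray()
    for b in body:
        if b >= 0x80:
            out.append(prefix)
        out.append(b)
    return bytes(out)


def find_stray_prefixes(data: bytes) -> list:
    """Return offsets where a prefix byte is directly followed by a byte >= 0xA0.

    This is a heuristic (an assumption of this sketch), not a proof of damage.
    """
    return [
        i
        for i in range(len(data) - 1)
        if data[i] in PREFIXES and data[i + 1] >= 0xA0
    ]


# The 8-bit test body from the example mail, as raw iso-8859-1 bytes.
original = "Test é à".encode("iso-8859-1")
damaged = corrupt(original, 0x81)

print(damaged)                        # b'Test \x81\xe9 \x81\xe0'
print(find_stray_prefixes(damaged))   # [5, 8]
print(find_stray_prefixes(original))  # []
```

Running `find_stray_prefixes` on the bytes of a file under ~/Mail/nnml/ and on the same message in the spool would show whether the extra bytes appear only after import.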

I did many tests, but didn't manage to find a configuration that doesn't
trigger the problem (except switching again to Emacs 21 with
(standard-display-european 1)...).

I upgraded to today's Gnus CVS, in case it was fixed there, but that
didn't solve my problem.

I believe the spurious character is part of Emacs' internal
representation of the non-ASCII chars, but it shouldn't end up in the
backend files...

If I receive a mail with the same accented chars but with the body
encoded in quoted-printable, the problem doesn't happen, so it is only
triggered when the raw non-ASCII chars are read directly from the spool
file.
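
One possible reading of this observation, sketched in Python under the assumption that only bytes >= #x80 are affected: a quoted-printable body contains nothing but ASCII bytes, so there is no raw 8-bit character for the import step to touch. The snippet uses the standard `quopri` module and the same test text as above.

```python
import quopri

# The same test body, as raw iso-8859-1 bytes (what an 8bit mail carries).
body = "Test é à".encode("iso-8859-1")

# The quoted-printable form of the same body.
qp = quopri.encodestring(body)

print(any(b >= 0x80 for b in body))     # True: the 8bit body has raw non-ASCII bytes
print(all(b < 0x80 for b in qp))        # True: the QP body is pure ASCII
print(quopri.decodestring(qp) == body)  # True: decoding recovers the original bytes
```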

My configuration is the following:

  GNU Emacs 22.1.1 (i486-pc-linux-gnu, GTK+ Version 2.8.20) of 2007-09-02
  on florent, modified by Debian

  No Gnus v0.7 (from today's CVS)

The portion of my .emacs.el that is related to encoding issues is:

  (set-language-environment 'latin-9)
  ;; I also tried this, with similar results
  ;; (set-language-environment 'latin-1)
  (set-keyboard-coding-system default-keyboard-coding-system)
  (set-terminal-coding-system default-terminal-coding-system)

  (setq selection-coding-system 'compound-text-with-extensions)

  ;; <optional> (activated in some tests, deactivated in other tests, see
  ;;             above)
  (require 'ucs-tables)
  (unify-8859-on-encoding-mode 1)
  (unify-8859-on-decoding-mode 1)
  (prefer-coding-system 'latin-1)
  ;; </optional>

  (require 'iso-transl)

The portion of my .gnus.el that is related to encoding issues is:

  (setq gnus-default-charset 'iso-8859-1
        gnus-default-posting-charset 'iso-8859-1
        message-default-charset 'iso-8859-1
        mm-coding-system-priorities '(iso-8859-1 iso-8859-15 utf-8))

  (add-to-list 'mm-body-charset-encoding-alist '(iso-8859-1 . 8bit))
  (add-to-list 'mm-body-charset-encoding-alist '(iso-8859-15 . 8bit))

When I encountered the problem, I uninstalled the Debian package
mule-ucs (from etch), but this didn't solve anything.

Any help would be *much* appreciated. Thanks!

(in the meantime, I'm stuck with Emacs 21 in unibyte mode, if I don't
want to corrupt my incoming mail...)

-- 
Florent


