Gnus development mailing list
* Non-ASCII characters in 8-bit body corrupted as mail is imported
@ 2007-09-02 17:47 Florent Rougon
From: Florent Rougon @ 2007-09-02 17:47 UTC
  To: ding

Hi,

I just migrated to Emacs 22 and have an annoying problem with incoming
mail. I had a similar problem in Emacs 21, but there I found a way to
work around it by using '(standard-display-european 1)', which had the
unwanted side-effect of putting Emacs in unibyte mode, so that I
couldn't deal correctly with UTF-8, etc.

So, switching to Emacs 22, I tried to get rid of this semi-obsolete
'(standard-display-european 1)' call and make a clean configuration that
works in multibyte mode.

Everything I tested works, except one thing with Gnus: when I receive a
mail that looks like this:

  <Usual headers snipped>
  Content-Type: text/plain; charset=iso-8859-1
  Content-Transfer-Encoding: 8bit

  Test é à

it gets corrupted when Gnus imports it into my nnml backend.

I have:

  (setq mail-sources
        '((file :path "/var/mail/flo" :plugged t)
          (file :path "~/mbox" :plugged t)))

I did check carefully: the mail is *not* corrupted when sitting in
/var/mail/flo, but it is after Gnus has read it and stored it in my nnml
folder. (I tried to route such test messages to an nnfolder backend to
see whether the problem was specific to the nnml backend, but failed.
Gnus stored the received mails in ~/Mail/nnml/Tests, or even
~/Mail/nnml/nnfolder+tests:Tests, despite my having created an
nnfolder+tests:Tests group and adapted nnmail-split-methods so that my
test messages go there...)

Of course, the file in ~/Mail/nnml/ being corrupted, it is badly
displayed afterwards.
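
For anyone who wants to repeat the comparison, here is a minimal Python
sketch of the kind of check I mean. The function name and the sample
bytes are made up for illustration; in practice one would read the spool
file and the stored nnml file in binary mode and compare the results.

```python
def non_ascii_lines(data: bytes):
    """Return (line_number, line) pairs for lines containing bytes >= 0x80."""
    hits = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if any(byte >= 0x80 for byte in line):
            hits.append((number, line))
    return hits


# Illustrative sample: the test mail as it sits in the spool
# (ISO-8859-1 body, 8bit transfer encoding).
spool_copy = b"Content-Transfer-Encoding: 8bit\n\nTest \xe9 \xe0\n"

for number, line in non_ascii_lines(spool_copy):
    # Print the raw bytes in hex so that any extra byte is easy to spot.
    print(number, " ".join(f"{byte:02x}" for byte in line))
```

Running the same function on the copy stored by Gnus and comparing the
hex dumps shows exactly which bytes were added.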

The corruption is the following: every non-ASCII character is preceded
by some character that depends on the language environment and on
whether I activate these lines in my .emacs.el or not:

   (require 'ucs-tables)
   (unify-8859-on-encoding-mode 1)
   (unify-8859-on-decoding-mode 1)
   (prefer-coding-system 'latin-1)

  - if I don't activate these lines and the language environment is
    latin-9, the unwanted character is #x8E;
  - if I do activate these lines, the unwanted character is #x81,
    whether my language environment is latin-9 or latin-1.

NB: #x81 is the infamous \201 in octal
    #x8E is the less famous (at least to me) \216
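
To make the pattern concrete, here is a small Python model of what I see
in the stored file. It is only a byte-level sketch, not what Gnus does
internally: the function names are hypothetical, and it assumes the
unwanted byte is inserted in front of every byte >= 0x80 and nowhere
else.

```python
def model_corruption(body: bytes, lead: int) -> bytes:
    """Prefix every non-ASCII byte of body with the unwanted byte lead."""
    out = bytearray()
    for byte in body:
        if byte >= 0x80:
            out.append(lead)
        out.append(byte)
    return bytes(out)


def strip_stray_bytes(data: bytes, lead: int) -> bytes:
    """Undo model_corruption: drop lead when it directly precedes a byte >= 0x80."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == lead and i + 1 < len(data) and data[i + 1] >= 0x80:
            i += 1  # skip the unwanted byte, keep the character byte after it
        out.append(data[i])
        i += 1
    return bytes(out)


# "Test é à" encoded as ISO-8859-1, as in the test mail.
original = "Test \u00e9 \u00e0".encode("iso-8859-1")
corrupted = model_corruption(original, 0x81)  # use 0x8E for the latin-9 case

print(corrupted)
print(strip_stray_bytes(corrupted, 0x81) == original)
```

The second function is only meant for inspecting or repairing test
files by hand; it would wrongly drop a legitimate #x81 or #x8E byte that
happens to precede another 8-bit byte.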

I did many tests, but didn't manage to find a configuration that doesn't
trigger the problem (except switching again to Emacs 21 with
(standard-display-european 1)...).

I upgraded to today's Gnus CVS, in case it was fixed there, but that
didn't solve my problem.

I believe the parasite character is part of Emacs' internal
representation of the non-ASCII chars, but it shouldn't go to the
backend files...

If I receive a mail with the same accented chars but with the body
encoded in quoted-printable, the problem doesn't happen, so it is only
triggered when the raw non-ASCII chars are read directly from the spool
file.
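
This is consistent with the following Python sketch, which uses only the
standard quopri module (nothing from Gnus): the quoted-printable form of
the test body is pure ASCII, so there are no raw 8-bit bytes left that
could be mangled on import.

```python
import quopri

# "Test é à" as raw 8-bit ISO-8859-1 bytes, as in the 8bit test mail.
raw_body = "Test \u00e9 \u00e0".encode("iso-8859-1")
qp_body = quopri.encodestring(raw_body)

print(qp_body)
# Every byte of the quoted-printable form is plain ASCII.
print(all(byte < 0x80 for byte in qp_body))
# Decoding gives back the original 8-bit bytes.
print(quopri.decodestring(qp_body) == raw_body)
```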

My configuration is the following:

  GNU Emacs 22.1.1 (i486-pc-linux-gnu, GTK+ Version 2.8.20) of 2007-09-02
  on florent, modified by Debian

  No Gnus v0.7 (from today's CVS)

The portion of my .emacs.el that is related to encoding issues is:

  (set-language-environment 'latin-9)
  ;; I also tried this, with similar results
  ;; (set-language-environment 'latin-1)
  (set-keyboard-coding-system default-keyboard-coding-system)
  (set-terminal-coding-system default-terminal-coding-system)

  (setq selection-coding-system 'compound-text-with-extensions)

  ;; <optional> (activated in some tests, deactivated in other tests, see
  ;;             above)
  (require 'ucs-tables)
  (unify-8859-on-encoding-mode 1)
  (unify-8859-on-decoding-mode 1)
  (prefer-coding-system 'latin-1)
  ;; </optional>

  (require 'iso-transl)

The portion of my .gnus.el that is related to encoding issues is:

  (setq gnus-default-charset 'iso-8859-1
        gnus-default-posting-charset 'iso-8859-1
        message-default-charset 'iso-8859-1
        mm-coding-system-priorities '(iso-8859-1 iso-8859-15 utf-8))

  (add-to-list 'mm-body-charset-encoding-alist '(iso-8859-1 . 8bit))
  (add-to-list 'mm-body-charset-encoding-alist '(iso-8859-15 . 8bit))

When I encountered the problems, I uninstalled the Debian package
mule-ucs (from etch), but this didn't solve anything.

Any help would be *much* appreciated. Thanks!

(In the meantime, I'm stuck with Emacs 21 in unibyte mode if I don't
want to corrupt my incoming mail...)

-- 
Florent


