Gnus development mailing list
* Garbled display of UTF-8 encoded mails
@ 2010-11-23 20:11 Sven Joachim
  2010-11-23 21:03 ` Sven Joachim
  0 siblings, 1 reply; 12+ messages in thread
From: Sven Joachim @ 2010-11-23 20:11 UTC
  To: ding

Hi,

During the last few weeks I have encountered a problem viewing articles (mainly
mails) with non-ASCII characters, e.g. a German umlaut "Ü" is displayed
as an octal escape, \334.  This happens only if the message is encoded
like this:

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

It does not happen with charset=iso-8859-1 or Content-Transfer-Encoding:
quoted-printable (thus you cannot reproduce it with articles from
Gmane).  Any idea what might have caused this?

The bundled Gnus from the emacs-23 branch does not have this problem.

Cheers,
       Sven




* Re: Garbled display of UTF-8 encoded mails
  2010-11-23 20:11 Garbled display of UTF-8 encoded mails Sven Joachim
@ 2010-11-23 21:03 ` Sven Joachim
  2010-11-24  4:39   ` Katsumi Yamaoka
  2010-11-24 10:19   ` Sven Joachim
  0 siblings, 2 replies; 12+ messages in thread
From: Sven Joachim @ 2010-11-23 21:03 UTC
  To: ding

On 2010-11-23 21:11 +0100, Sven Joachim wrote:

> Hi,
>
> During the last few weeks I have encountered a problem viewing articles (mainly
> mails) with non-ASCII characters, e.g. a German umlaut "Ü" is displayed
> as an octal escape, \334.  This happens only if the message is encoded
> like this:
>
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: 8bit
>
> It does not happen with charset=iso-8859-1 or Content-Transfer-Encoding:
> quoted-printable (thus you cannot reproduce it with articles from
> Gmane).  Any idea what might have caused this?

I bisected the problem and found out that commit bda3e8962a is the
culprit:

commit bda3e8962af0aee90144c3ae8c5360aa4c106d94
Author: Katsumi Yamaoka <yamaoka@jpl.org>
Date:   Fri May 7 06:34:41 2010 +0000

    * binhex.el (binhex-decode-region-internal)
    * dns.el (dns-read-string-name, dns-write, dns-read, dns-read-type)
    (dns-query)
    * nnweb.el (nnweb-gmane-search)
    * pgg-parse.el (pgg-parse-armor)
    * pgg.el (pgg-verify-region)
    * sha1.el (sha1-string-external)
    * uudecode.el (uudecode-decode-region-internal)
    * yenc.el (yenc-decode-region): Don't run set-buffer-multibyte for
    XEmacs.
    
    * gnus-art.el (gnus-article-browse-html-parts)
    * gnus-group.el (gnus-read-ephemeral-gmane-group)
    (gnus-read-ephemeral-bug-grou): Use mm-make-temp-file instead of
    make-temp-file.
    
    * gnus-dired.el (gnus-dired-mode): Bind gnus-dired-mode-hook,
    gnus-dired-mode-on-hook and gnus-dired-mode-off-hook for XEmacs when
    compiling.
    
    * gnus-ml.el (gnus-mailing-list-mode): Bind gnus-mailing-list-mode-hook,
    gnus-mailing-list-mode-on-hook and gnus-mailing-list-mode-off-hook for
    XEmacs when compiling.
    
    * gnus-salt.el (gnus-pick-mode): Bind gnus-pick-mode-on-hook and
    gnus-pick-mode-off-hook for XEmacs when compiling.
    (gnus-binary-mode): Bind gnus-binary-mode-on-hook and
    gnus-binary-mode-off-hook for XEmacs when compiling.
    
    * gnus-sum.el (gnus-summary-limit-strange-charsets-predicate): Return
    nil if char-charset is not available.
    
    * imap.el (imap-disable-multibyte)
    * sieve-manage.el (sieve-manage-disable-multibyte): Redefine them as a
    macro.
    
    * mm-url.el (mm-url-form-encode-xwfu): Use mm-encode-coding-string
    instead of encode-coding-string.
    
    * mm-util.el (mm-enable-multibyte, mm-disable-multibyte): Use (featurep
    'xemacs) instead of mm-emacs-mule to switch function definitions.
    (mm-with-unibyte-current-buffer): Make it a progn macro for XEmacs.
    
    * lpath.el: Fbind delete-overlay and overlay-lists for XEmacs;
    bind temporary-file-directory for XEmacs;
    fbind make-temp-file, set-buffer-multibyte, string-as-multibyte and
    timer-set-function for XEmacs 21.4 and SXEmacs;
    bind timer-list for XEmacs 21.4 and SXEmacs;
    fbind char-charset and find-charset-region for non-Mule XEmacs;
    fbind decode-coding-region, decode-coding-string, detect-coding-region,
    encode-coding-region and encode-coding-string for XEmacs having no
    file-coding feature.


I won't include the whole diff, which is more than 500 lines long and
touches 20 files. :-/  I'll try to find out what exactly broke things
tomorrow, unless somebody (Katsumi?) beats me to it.

Sven




* Re: Garbled display of UTF-8 encoded mails
  2010-11-23 21:03 ` Sven Joachim
@ 2010-11-24  4:39   ` Katsumi Yamaoka
  2010-11-24  6:27     ` Sven Joachim
  2010-11-24 10:19   ` Sven Joachim
  1 sibling, 1 reply; 12+ messages in thread
From: Katsumi Yamaoka @ 2010-11-24  4:39 UTC
  To: ding

Sven Joachim wrote:
>> During the last few weeks I have encountered a problem viewing articles (mainly
>> mails) with non-ASCII characters, e.g. a German umlaut "Ü" is displayed
>> as an octal escape, \334.  This happens only if the message is encoded
>> like this:
>>
>> Content-Type: text/plain; charset=utf-8
>> Content-Transfer-Encoding: 8bit

It seems to be due to a bug in the sender's mailer.

(encode-coding-string "Ü" 'utf-8) => "\303\234"
(encode-coding-string "Ü" 'iso-8859-1) => "\334"

So the Content-Type header should have been labeled with iso-8859-1,
not utf-8.  If you often face such mails, there's a handy workaround.
Try adding something like the following to your ~/.gnus.el file:

(setq gnus-summary-show-article-charset-alist
      '((0 . undecided)
	(1 . iso-8859-1)
	(2 . windows-1252)
	(3 . utf-8)))

Then you can type `1 g' in the summary buffer to force the article to
be decoded as iso-8859-1, no matter what charset the Content-Type
header says.

Cf. (info "(gnus)Paging the Article")

>> It does not happen with charset=iso-8859-1 or Content-Transfer-Encoding:
>> quoted-printable (thus you cannot reproduce it with articles from
>> Gmane).  Any idea what might have caused this?

> I bisected the problem and found out that commit bda3e8962a is the
> culprit:

> commit bda3e8962af0aee90144c3ae8c5360aa4c106d94
> Author: Katsumi Yamaoka <yamaoka@jpl.org>
> Date:   Fri May 7 06:34:41 2010 +0000

I tried an old Gnus from April 2010 but found no difference.




* Re: Garbled display of UTF-8 encoded mails
  2010-11-24  4:39   ` Katsumi Yamaoka
@ 2010-11-24  6:27     ` Sven Joachim
  2010-11-24  7:03       ` Sven Joachim
  0 siblings, 1 reply; 12+ messages in thread
From: Sven Joachim @ 2010-11-24  6:27 UTC
  To: ding; +Cc: Katsumi Yamaoka

On 2010-11-24 05:39 +0100, Katsumi Yamaoka wrote:

> Sven Joachim wrote:
>>> During the last few weeks I have encountered a problem viewing articles (mainly
>>> mails) with non-ASCII characters, e.g. a German umlaut "Ü" is displayed
>>> as an octal escape, \334.  This happens only if the message is encoded
>>> like this:
>>>
>>> Content-Type: text/plain; charset=utf-8
>>> Content-Transfer-Encoding: 8bit
>
> It seems to be due to a bug in the sender's mailer.

Not really.

> (encode-coding-string "Ü" 'utf-8) => "\303\234"
> (encode-coding-string "Ü" 'iso-8859-1) => "\334"

But in the raw article (when pressing C-u C-u g) I _do_ see \303\234, so
the charset seems to be declared correctly.  And when viewing the same
article on Gmane (which changes Content-Transfer-Encoding to
quoted-printable) everything looks fine.
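The byte sequences discussed above can be checked outside Emacs. The following Python sketch is only an illustration (the helper `octal_bytes` is made up for this example): it prints the encodings of "Ü" in Emacs-style octal and shows that \303\234 is valid UTF-8, while a lone \334 byte is not. That is consistent with the body being labelled correctly as utf-8.

```python
# Compare the encodings of "Ü" (U+00DC) under the two charsets from the
# thread, printing each byte as a backslash-octal escape as Emacs does.

def octal_bytes(data: bytes) -> str:
    """Render every byte as \\NNN, e.g. the byte 0xDC becomes \\334."""
    return "".join(f"\\{b:03o}" for b in data)


utf8 = "Ü".encode("utf-8")
latin1 = "Ü".encode("iso-8859-1")
print(octal_bytes(utf8))    # \303\234
print(octal_bytes(latin1))  # \334

# \303\234, as seen in the raw article, is valid UTF-8 for "Ü" ...
print(utf8.decode("utf-8"))  # Ü

# ... whereas a lone \334 is not valid UTF-8, so a Latin-1 body
# mislabelled as utf-8 would fail to decode at that byte.
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8 at byte offset", err.start)  # ... offset 0
```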

>>> It does not happen with charset=iso-8859-1 or Content-Transfer-Encoding:
>>> quoted-printable (thus you cannot reproduce it with articles from
>>> Gmane).  Any idea what might have caused this?
>
>> I bisected the problem and found out that commit bda3e8962a is the
>> culprit:
>
>> commit bda3e8962af0aee90144c3ae8c5360aa4c106d94
>> Author: Katsumi Yamaoka <yamaoka@jpl.org>
>> Date:   Fri May 7 06:34:41 2010 +0000
>
> I tried an old Gnus from April 2010 but found no difference.

Maybe you'll see a difference in this mail.  I'm including characters
that cannot be encoded in iso-8859-1 and CCing you.  Hopefully no one
changes the Content-Transfer-Encoding on the way.

„Schöne Grüße aus Übersee“,
Sven




* Re: Garbled display of UTF-8 encoded mails
  2010-11-24  6:27     ` Sven Joachim
@ 2010-11-24  7:03       ` Sven Joachim
  2010-11-24  7:15         ` Katsumi Yamaoka
  2010-11-24  7:17         ` Sven Joachim
  0 siblings, 2 replies; 12+ messages in thread
From: Sven Joachim @ 2010-11-24  7:03 UTC
  To: ding

On 2010-11-24 07:27 +0100, Sven Joachim wrote:

> Maybe you'll see a difference in this mail.  I'm including characters
> that cannot be encoded in iso-8859-1 and CCing you.  Hopefully no one
> changes the Content-Transfer-Encoding on the way.

Gnah, this displays fine in my GCC'ed copy, although other messages I've
sent appear broken. :-(  I'll try to find an example that shows the bug.

> „Schöne Grüße aus Übersee“,
> Sven




* Re: Garbled display of UTF-8 encoded mails
  2010-11-24  7:03       ` Sven Joachim
@ 2010-11-24  7:15         ` Katsumi Yamaoka
  2010-11-24  7:17         ` Sven Joachim
  1 sibling, 0 replies; 12+ messages in thread
From: Katsumi Yamaoka @ 2010-11-24  7:15 UTC
  To: ding

Sven Joachim <svenjoac@gmx.de> wrote:
> On 2010-11-24 07:27 +0100, Sven Joachim wrote:

>> Maybe you'll see a difference in this mail.  I'm including characters
>> that cannot be encoded in iso-8859-1 and CCing you.  Hopefully no one
>> changes the Content-Transfer-Encoding on the way.

> Gnah, this displays fine in my GCC'ed copy, although other messages I've
> sent appear broken. :-(  I'll try to find an example that shows the bug.

>> „Schöne Grüße aus Übersee“,
>> Sven

What mail server do you use?  M$ Exchange is completely broken!
Though it might be due to the configuration, I very often receive
broken ones in the office.  (Mails to yamaoka@jpl.org are safe.)




* Re: Garbled display of UTF-8 encoded mails
  2010-11-24  7:03       ` Sven Joachim
  2010-11-24  7:15         ` Katsumi Yamaoka
@ 2010-11-24  7:17         ` Sven Joachim
  2010-11-24  7:28           ` Katsumi Yamaoka
  1 sibling, 1 reply; 12+ messages in thread
From: Sven Joachim @ 2010-11-24  7:17 UTC
  To: ding; +Cc: Katsumi Yamaoka

[-- Attachment #1: Type: text/plain, Size: 55 bytes --]

I include an attachment that hopefully shows the bug.


[-- Attachment #2: Sample text with Umlauts --]
[-- Type: text/plain, Size: 29 bytes --]

Schöne Grüße aus Übersee


* Re: Garbled display of UTF-8 encoded mails
  2010-11-24  7:17         ` Sven Joachim
@ 2010-11-24  7:28           ` Katsumi Yamaoka
  2010-11-24  7:58             ` Sven Joachim
  0 siblings, 1 reply; 12+ messages in thread
From: Katsumi Yamaoka @ 2010-11-24  7:28 UTC
  To: ding

<#part type="text/plain" disposition=inline charset=utf-8 encoding=8bit>
Sven Joachim wrote:

> I include an attachment that hopefully shows the bug.
> Schöne Grüße aus Übersee

No problem here.  Hm.  I tried Emacs 23.2 in addition to 24.0.50,
with the latest No Gnus.  The raw contents I received are:

Content-Type: text/plain; charset=utf-8
Content-Disposition: inline; filename=sample
Content-Transfer-Encoding: 8bit
Content-Description: Sample text with Umlauts

Sch\303\266ne Gr\303\274\303\237e aus \303\234bersee

This is really a utf-8 byte stream.  I may need to try 23.2.90...
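As a cross-check of the raw contents above, this Python sketch renders the UTF-8 encoding of the sample line. The helper `raw_view` is hypothetical and only approximates Emacs's raw display, assuming ASCII bytes are shown as-is and every other byte as a \NNN octal escape.

```python
# Approximate the raw (undecoded) view of the attachment body: ASCII
# bytes are shown as-is, every byte >= 0x80 as a \NNN octal escape.

def raw_view(data: bytes) -> str:
    return "".join(chr(b) if b < 0x80 else f"\\{b:03o}" for b in data)


body = "Schöne Grüße aus Übersee".encode("utf-8")
print(raw_view(body))
# Sch\303\266ne Gr\303\274\303\237e aus \303\234bersee
print(len(body))  # 28
```

The output matches the raw contents quoted above byte for byte. The 28-byte body plus an assumed trailing newline accounts for the 29-byte size shown on the attachment.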




* Re: Garbled display of UTF-8 encoded mails
  2010-11-24  7:28           ` Katsumi Yamaoka
@ 2010-11-24  7:58             ` Sven Joachim
  0 siblings, 0 replies; 12+ messages in thread
From: Sven Joachim @ 2010-11-24  7:58 UTC
  To: ding

On 2010-11-24 08:28 +0100, Katsumi Yamaoka wrote:

> <#part type="text/plain" disposition=inline charset=utf-8 encoding=8bit>
> Sven Joachim wrote:
>
>> I include an attachment that hopefully shows the bug.
>> Schöne Grüße aus Übersee
>
> No problem here.  Hm.  I tried Emacs 23.2 in addition to 24.0.50,
> with the latest No Gnus.  The raw contents I received are:
>
> Content-Type: text/plain; charset=utf-8
> Content-Disposition: inline; filename=sample
> Content-Transfer-Encoding: 8bit
> Content-Description: Sample text with Umlauts
>
> Sch\303\266ne Gr\303\274\303\237e aus \303\234bersee

Same here in my GCC'ed and BCC'ed copies.  Yet the article is displayed
garbled. :-(  Maybe something is wrong with my Gnus configuration (I'm
using the nnfolder backend, in case that is relevant).

Sven





* Re: Garbled display of UTF-8 encoded mails
  2010-11-23 21:03 ` Sven Joachim
  2010-11-24  4:39   ` Katsumi Yamaoka
@ 2010-11-24 10:19   ` Sven Joachim
  2010-11-24 12:54     ` Sven Joachim
  1 sibling, 1 reply; 12+ messages in thread
From: Sven Joachim @ 2010-11-24 10:19 UTC
  To: ding

Seems I found a solution. :-)

On 2010-11-23 22:03 +0100, Sven Joachim wrote:

> I bisected the problem and found out that commit bda3e8962a is the
> culprit:
>
> commit bda3e8962af0aee90144c3ae8c5360aa4c106d94
> Author: Katsumi Yamaoka <yamaoka@jpl.org>
> Date:   Fri May 7 06:34:41 2010 +0000
> [...]
>     * mm-util.el (mm-enable-multibyte, mm-disable-multibyte): Use (featurep
>     'xemacs) instead of mm-emacs-mule to switch function definitions.

The diff hunk that corresponds to this sentence is the following:

--8<---------------cut here---------------start------------->8---
diff --git a/lisp/mm-util.el b/lisp/mm-util.el
index c1dc4f5..a288b8b 100644
--- a/lisp/mm-util.el
+++ b/lisp/mm-util.el
@@ -908,20 +908,20 @@ mail with multiple parts is preferred to sending a Unicode one.")
 			     (fboundp 'set-buffer-multibyte))
     "True in Emacs with Mule.")
 
-  (if mm-emacs-mule
-      (defun mm-enable-multibyte ()
-	"Set the multibyte flag of the current buffer.
+  (if (featurep 'xemacs)
+      (defalias 'mm-enable-multibyte 'ignore)
+    (defun mm-enable-multibyte ()
+      "Set the multibyte flag of the current buffer.
 Only do this if the default value of `enable-multibyte-characters' is
 non-nil.  This is a no-op in XEmacs."
-	(set-buffer-multibyte 'to))
-    (defalias 'mm-enable-multibyte 'ignore))
+      (set-buffer-multibyte t)))
 
-  (if mm-emacs-mule
-      (defun mm-disable-multibyte ()
-	"Unset the multibyte flag of in the current buffer.
+  (if (featurep 'xemacs)
+      (defalias 'mm-disable-multibyte 'ignore)
+    (defun mm-disable-multibyte ()
+      "Unset the multibyte flag of in the current buffer.
 This is a no-op in XEmacs."
-	(set-buffer-multibyte nil))
-    (defalias 'mm-disable-multibyte 'ignore)))
+      (set-buffer-multibyte nil))))
 
 (defun mm-preferred-coding-system (charset)
   ;; A typo in some Emacs versions.
--8<---------------cut here---------------end--------------->8---

However, this contains a change in semantics that is not mentioned and
might have been unintended.  The old version of mm-enable-multibyte for
Emacs was (set-buffer-multibyte 'to), whereas the new one is
(set-buffer-multibyte t), and this seems to be what causes all my
problems.  Reverting the change in master:

--8<---------------cut here---------------start------------->8---
diff --git a/lisp/mm-util.el b/lisp/mm-util.el
index 67b41e0..700c1a6 100644
--- a/lisp/mm-util.el
+++ b/lisp/mm-util.el
@@ -903,7 +903,7 @@ mail with multiple parts is preferred to sending a Unicode one.")
       "Set the multibyte flag of the current buffer.
 Only do this if the default value of `enable-multibyte-characters' is
 non-nil.  This is a no-op in XEmacs."
-      (set-buffer-multibyte t)))
+      (set-buffer-multibyte 'to)))
 
   (if (featurep 'xemacs)
       (defalias 'mm-disable-multibyte 'ignore)
--8<---------------cut here---------------end--------------->8---

solves the issue for me.

Cheers,
       Sven
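To illustrate why the t versus 'to argument could matter, here is a deliberately simplified Python model. It is not Emacs's actual code. It assumes that (set-buffer-multibyte t) merges byte runs that form valid UTF-8 into characters, while (set-buffer-multibyte 'to) keeps every byte >= 0x80 as a separate raw eight-bit character. It further assumes that a character below U+0100 reaches the later UTF-8 decoder as the single byte of the same value. All function names are hypothetical. Under those assumptions the model reproduces the symptom from the first message.

```python
# Hypothetical, simplified model of the two unibyte -> multibyte
# conversions, followed by a UTF-8 decoding pass (not Emacs internals).
# Buffer items are (kind, value): (CHAR, str) or (RAW, int).

CHAR, RAW = "char", "raw"


def to_multibyte_t(data: bytes):
    """Model of (set-buffer-multibyte t): byte runs that form valid
    UTF-8 are merged into characters; other bytes stay raw."""
    text = data.decode("utf-8", errors="surrogateescape")
    items = []
    for c in text:
        if 0xDC80 <= ord(c) <= 0xDCFF:  # byte that was not valid UTF-8
            items.append((RAW, ord(c) - 0xDC00))
        else:
            items.append((CHAR, c))
    return items


def to_multibyte_to(data: bytes):
    """Model of (set-buffer-multibyte 'to): every byte >= 0x80 stays a
    separate raw eight-bit item."""
    return [(RAW, b) if b >= 0x80 else (CHAR, chr(b)) for b in data]


def decode_utf8(items) -> str:
    """Model of the later UTF-8 decoding step; undecodable bytes are
    shown as Emacs-style octal escapes."""
    buf = bytearray()
    for kind, value in items:
        if kind == RAW:
            buf.append(value)
        elif ord(value) < 0x100:
            # Assumption: a character below U+0100 reaches the decoder as
            # the single byte with the same value (matches the \334 seen).
            buf.append(ord(value))
        else:
            raise ValueError("character outside this simplified model")
    text = bytes(buf).decode("utf-8", errors="surrogateescape")
    return "".join(
        f"\\{ord(c) - 0xDC00:03o}" if 0xDC80 <= ord(c) <= 0xDCFF else c
        for c in text
    )


raw = b"\303\234bersee"                   # "Übersee" as UTF-8 bytes
print(decode_utf8(to_multibyte_to(raw)))  # Übersee
print(decode_utf8(to_multibyte_t(raw)))   # \334bersee
```

With 'to the decoder still sees \303\234 and produces "Ü". With t it sees only the byte \334, which is not valid UTF-8 on its own and is therefore shown raw.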




* Re: Garbled display of UTF-8 encoded mails
  2010-11-24 10:19   ` Sven Joachim
@ 2010-11-24 12:54     ` Sven Joachim
  2010-11-24 21:07       ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 12+ messages in thread
From: Sven Joachim @ 2010-11-24 12:54 UTC
  To: ding

On 2010-11-24 11:19 +0100, Sven Joachim wrote:

> However, this contains a change in semantics that is not mentioned and
> might have been unintended.  The old version of mm-enable-multibyte for
> Emacs was (set-buffer-multibyte 'to), whereas the new one is
> (set-buffer-multibyte t), and this seems to be what causes all my
> problems.  Reverting the change in master:
>
> diff --git a/lisp/mm-util.el b/lisp/mm-util.el
> index 67b41e0..700c1a6 100644
> --- a/lisp/mm-util.el
> +++ b/lisp/mm-util.el
> @@ -903,7 +903,7 @@ mail with multiple parts is preferred to sending a Unicode one.")
>        "Set the multibyte flag of the current buffer.
>  Only do this if the default value of `enable-multibyte-characters' is
>  non-nil.  This is a no-op in XEmacs."
> -      (set-buffer-multibyte t)))
> +      (set-buffer-multibyte 'to)))
>  
>    (if (featurep 'xemacs)
>        (defalias 'mm-disable-multibyte 'ignore)
>
> solves the issue for me.

BTW, I found an old commit that made the same change, but its ChangeLog
entry is not very informative about what exactly the problems were:

,----
| $ git show 302ab06a28aec7e405145417bdcbf324246ea192
| commit 302ab06a28aec7e405145417bdcbf324246ea192
| Author: Simon Josefsson <jas@extundo.com>
| Date:   Sun Nov 30 16:59:35 2003 +0000
| 
|     (mm-enable-multibyte): Call set-buffer-multibyte with
|     'to argument.  Fixes something or other in Emacs 22, and is
|     backwards compatible.  From Kenichi Handa <handa@m17n.org>.
| 
| diff --git a/lisp/ChangeLog b/lisp/ChangeLog
| index 6e37dbe..8d55cf2 100644
| --- a/lisp/ChangeLog
| +++ b/lisp/ChangeLog
| @@ -1,5 +1,9 @@
|  2003-11-30  Simon Josefsson  <jas@extundo.com>
|  
| +	* mm-util.el (mm-enable-multibyte): Call set-buffer-multibyte with
| +	'to argument.  Fixes something or other in Emacs 22, and is
| +	backwards compatible.  From Kenichi Handa <handa@m17n.org>.
| +
|  	* gnus-agent.el (gnus-agent-expire-unagentized-dirs): Custom fix.
|  
|  2003-11-30  Lars Magne Ingebrigtsen  <larsi@gnus.org>
| diff --git a/lisp/mm-util.el b/lisp/mm-util.el
| index eef5c04..cbd85b1 100644
| --- a/lisp/mm-util.el
| +++ b/lisp/mm-util.el
| @@ -413,7 +413,7 @@ used as the line break code type of the coding system."
|  	"Set the multibyte flag of the current buffer.
|  Only do this if the default value of `enable-multibyte-characters' is
|  non-nil.  This is a no-op in XEmacs."
| -	(set-buffer-multibyte t))
| +	(set-buffer-multibyte 'to))
|      (defalias 'mm-enable-multibyte 'ignore))
|  
|    (if mm-emacs-mule
| 
`----

I haven't been able to find a mailing list thread that describes this
in greater detail.

Sven




* Re: Garbled display of UTF-8 encoded mails
  2010-11-24 12:54     ` Sven Joachim
@ 2010-11-24 21:07       ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 12+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-11-24 21:07 UTC
  To: ding

Sven Joachim <svenjoac@gmx.de> writes:

> | +	(set-buffer-multibyte 'to))

Ok, I've now applied this...  let's see if it breaks anything...

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen





end of thread

Thread overview: 12+ messages
2010-11-23 20:11 Garbled display of UTF-8 encoded mails Sven Joachim
2010-11-23 21:03 ` Sven Joachim
2010-11-24  4:39   ` Katsumi Yamaoka
2010-11-24  6:27     ` Sven Joachim
2010-11-24  7:03       ` Sven Joachim
2010-11-24  7:15         ` Katsumi Yamaoka
2010-11-24  7:17         ` Sven Joachim
2010-11-24  7:28           ` Katsumi Yamaoka
2010-11-24  7:58             ` Sven Joachim
2010-11-24 10:19   ` Sven Joachim
2010-11-24 12:54     ` Sven Joachim
2010-11-24 21:07       ` Lars Magne Ingebrigtsen
