* Garbled display of UTF-8 encoded mails
@ 2010-11-23 20:11 Sven Joachim
2010-11-23 21:03 ` Sven Joachim
0 siblings, 1 reply; 12+ messages in thread
From: Sven Joachim @ 2010-11-23 20:11 UTC (permalink / raw)
To: ding
Hi,
over the last few weeks I have run into a problem viewing articles (mainly
mails) with non-ASCII characters, e.g. a German umlaut "Ü" is displayed
as an octal escape, \334. This happens only if the message is encoded
like this:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
It does not happen with charset=iso-8859-1 or Content-Transfer-Encoding:
quoted-printable (thus you cannot reproduce it with articles from
Gmane). Any idea what might have caused this?
The bundled Gnus from the emacs-23 branch does not have this problem.
Cheers,
Sven
* Re: Garbled display of UTF-8 encoded mails
2010-11-23 20:11 Garbled display of UTF-8 encoded mails Sven Joachim
@ 2010-11-23 21:03 ` Sven Joachim
2010-11-24 4:39 ` Katsumi Yamaoka
2010-11-24 10:19 ` Sven Joachim
0 siblings, 2 replies; 12+ messages in thread
From: Sven Joachim @ 2010-11-23 21:03 UTC (permalink / raw)
To: ding
On 2010-11-23 21:11 +0100, Sven Joachim wrote:
> Hi,
>
> over the last few weeks I have run into a problem viewing articles (mainly
> mails) with non-ASCII characters, e.g. a German umlaut "Ü" is displayed
> as an octal escape, \334. This happens only if the message is encoded
> like this:
>
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: 8bit
>
> It does not happen with charset=iso-8859-1 or Content-Transfer-Encoding:
> quoted-printable (thus you cannot reproduce it with articles from
> Gmane). Any idea what might have caused this?
I bisected the problem and found out that commit bda3e8962a is the
culprit:
commit bda3e8962af0aee90144c3ae8c5360aa4c106d94
Author: Katsumi Yamaoka <yamaoka@jpl.org>
Date: Fri May 7 06:34:41 2010 +0000
* binhex.el (binhex-decode-region-internal)
* dns.el (dns-read-string-name, dns-write, dns-read, dns-read-type)
(dns-query)
* nnweb.el (nnweb-gmane-search)
* pgg-parse.el (pgg-parse-armor)
* pgg.el (pgg-verify-region)
* sha1.el (sha1-string-external)
* uudecode.el (uudecode-decode-region-internal)
* yenc.el (yenc-decode-region): Don't run set-buffer-multibyte for
XEmacs.
* gnus-art.el (gnus-article-browse-html-parts)
* gnus-group.el (gnus-read-ephemeral-gmane-group)
(gnus-read-ephemeral-bug-grou): Use mm-make-temp-file instead of
make-temp-file.
* gnus-dired.el (gnus-dired-mode): Bind gnus-dired-mode-hook,
gnus-dired-mode-on-hook and gnus-dired-mode-off-hook for XEmacs when
compiling.
* gnus-ml.el (gnus-mailing-list-mode): Bind gnus-mailing-list-mode-hook,
gnus-mailing-list-mode-on-hook and gnus-mailing-list-mode-off-hook for
XEmacs when compiling.
* gnus-salt.el (gnus-pick-mode): Bind gnus-pick-mode-on-hook and
gnus-pick-mode-off-hook for XEmacs when compiling.
(gnus-binary-mode): Bind gnus-binary-mode-on-hook and
gnus-binary-mode-off-hook for XEmacs when compiling.
* gnus-sum.el (gnus-summary-limit-strange-charsets-predicate): Return
nil if char-charset is not available.
* imap.el (imap-disable-multibyte)
* sieve-manage.el (sieve-manage-disable-multibyte): Redefine them as a
macro.
* mm-url.el (mm-url-form-encode-xwfu): Use mm-encode-coding-string
instead of encode-coding-string.
* mm-util.el (mm-enable-multibyte, mm-disable-multibyte): Use (featurep
'xemacs) instead of mm-emacs-mule to switch function definitions.
(mm-with-unibyte-current-buffer): Make it a progn macro for XEmacs.
* lpath.el: Fbind delete-overlay and overlay-lists for XEmacs;
bind temporary-file-directory for XEmacs;
fbind make-temp-file, set-buffer-multibyte, string-as-multibyte and
timer-set-function for XEmacs 21.4 and SXEmacs;
bind timer-list for XEmacs 21.4 and SXEmacs;
fbind char-charset and find-charset-region for non-Mule XEmacs;
fbind decode-coding-region, decode-coding-string, detect-coding-region,
encode-coding-region and encode-coding-string for XEmacs having no
file-coding feature.
I refrain from including the whole diff, which is more than 500 lines and
touches 20 files. :-/ I am going to find out what exactly broke things
tomorrow, unless somebody (Katsumi?) beats me to it.
Sven
* Re: Garbled display of UTF-8 encoded mails
2010-11-23 21:03 ` Sven Joachim
@ 2010-11-24 4:39 ` Katsumi Yamaoka
2010-11-24 6:27 ` Sven Joachim
2010-11-24 10:19 ` Sven Joachim
1 sibling, 1 reply; 12+ messages in thread
From: Katsumi Yamaoka @ 2010-11-24 4:39 UTC (permalink / raw)
To: ding
Sven Joachim wrote:
>> over the last few weeks I have run into a problem viewing articles (mainly
>> mails) with non-ASCII characters, e.g. a German umlaut "Ü" is displayed
>> as an octal escape, \334. This happens only if the message is encoded
>> like this:
>>
>> Content-Type: text/plain; charset=utf-8
>> Content-Transfer-Encoding: 8bit
It seems to be due to a bug in the sender's mailer.
(encode-coding-string "Ü" 'utf-8) => "\303\234"
(encode-coding-string "Ü" 'iso-8859-1) => "\334"
So the Content-Type header should have been labeled with iso-8859-1,
not utf-8. If you often face such mails, there's a handy workaround.
Try adding something like the following to your ~/.gnus.el file:
(setq gnus-summary-show-article-charset-alist
      '((0 . undecided)
        (1 . iso-8859-1)
        (2 . windows-1252)
        (3 . utf-8)))
Then you can type `1 g' in the summary buffer to force the article to
be decoded as iso-8859-1, no matter what charset the Content-Type
header declares.
Cf. (info "(gnus)Paging the Article")
>> It does not happen with charset=iso-8859-1 or Content-Transfer-Encoding:
>> quoted-printable (thus you cannot reproduce it with articles from
>> Gmane). Any idea what might have caused this?
> I bisected the problem and found out that commit bda3e8962a is the
> culprit:
> commit bda3e8962af0aee90144c3ae8c5360aa4c106d94
> Author: Katsumi Yamaoka <yamaoka@jpl.org>
> Date: Fri May 7 06:34:41 2010 +0000
I tried an old Gnus from April 2010 but found no difference.
* Re: Garbled display of UTF-8 encoded mails
2010-11-24 4:39 ` Katsumi Yamaoka
@ 2010-11-24 6:27 ` Sven Joachim
2010-11-24 7:03 ` Sven Joachim
0 siblings, 1 reply; 12+ messages in thread
From: Sven Joachim @ 2010-11-24 6:27 UTC (permalink / raw)
To: ding; +Cc: Katsumi Yamaoka
On 2010-11-24 05:39 +0100, Katsumi Yamaoka wrote:
> Sven Joachim wrote:
>>> over the last few weeks I have run into a problem viewing articles (mainly
>>> mails) with non-ASCII characters, e.g. a German umlaut "Ü" is displayed
>>> as an octal escape, \334. This happens only if the message is encoded
>>> like this:
>>>
>>> Content-Type: text/plain; charset=utf-8
>>> Content-Transfer-Encoding: 8bit
>
> It seems to be due to a bug in the sender's mailer.
Not really.
> (encode-coding-string "Ü" 'utf-8) => "\303\234"
> (encode-coding-string "Ü" 'iso-8859-1) => "\334"
But in the raw article (when pressing C-u C-u g) I _do_ see \303\234, so
the charset seems to be declared correctly. And when viewing the same
article on Gmane (which changes Content-Transfer-Encoding to
quoted-printable) everything looks fine.
>>> It does not happen with charset=iso-8859-1 or Content-Transfer-Encoding:
>>> quoted-printable (thus you cannot reproduce it with articles from
>>> Gmane). Any idea what might have caused this?
>
>> I bisected the problem and found out that commit bda3e8962a is the
>> culprit:
>
>> commit bda3e8962af0aee90144c3ae8c5360aa4c106d94
>> Author: Katsumi Yamaoka <yamaoka@jpl.org>
>> Date: Fri May 7 06:34:41 2010 +0000
>
> I tried an old Gnus from April 2010 but found no difference.
Maybe you'll see a difference in this mail. I include characters
that cannot be encoded as iso-8859-1 and CC you. Hopefully no-one
changes Content-Transfer-Encoding on the way.
„Schöne Grüße aus Übersee“,
Sven
* Re: Garbled display of UTF-8 encoded mails
2010-11-24 6:27 ` Sven Joachim
@ 2010-11-24 7:03 ` Sven Joachim
2010-11-24 7:15 ` Katsumi Yamaoka
2010-11-24 7:17 ` Sven Joachim
0 siblings, 2 replies; 12+ messages in thread
From: Sven Joachim @ 2010-11-24 7:03 UTC (permalink / raw)
To: ding
On 2010-11-24 07:27 +0100, Sven Joachim wrote:
> Maybe you'll see a difference in this mail. I include characters
> that cannot be encoded as iso-8859-1 and CC you. Hopefully no-one
> changes Content-Transfer-Encoding on the way.
Gnah, this displays fine in my GCC'ed copy, although other messages I've
sent appear broken. :-( I'll try to find an example that shows the bug.
> „Schöne Grüße aus Übersee“,
> Sven
* Re: Garbled display of UTF-8 encoded mails
2010-11-24 7:03 ` Sven Joachim
@ 2010-11-24 7:15 ` Katsumi Yamaoka
2010-11-24 7:17 ` Sven Joachim
1 sibling, 0 replies; 12+ messages in thread
From: Katsumi Yamaoka @ 2010-11-24 7:15 UTC (permalink / raw)
To: ding
Sven Joachim <svenjoac@gmx.de> wrote:
> On 2010-11-24 07:27 +0100, Sven Joachim wrote:
>> Maybe you'll see a difference in this mail. I include characters
>> that cannot be encoded as iso-8859-1 and CC you. Hopefully no-one
>> changes Content-Transfer-Encoding on the way.
> Gnah, this displays fine in my GCC'ed copy, although other messages I've
> sent appear broken. :-( I'll try to find an example that shows the bug.
>> „Schöne Grüße aus Übersee“,
>> Sven
What mail server do you use? M$ Exchange is completely broken!
Though it might be due to its configuration, I receive broken
mails very often at the office. (Mails to yamaoka@jpl.org
are safe.)
* Re: Garbled display of UTF-8 encoded mails
2010-11-24 7:03 ` Sven Joachim
2010-11-24 7:15 ` Katsumi Yamaoka
@ 2010-11-24 7:17 ` Sven Joachim
2010-11-24 7:28 ` Katsumi Yamaoka
1 sibling, 1 reply; 12+ messages in thread
From: Sven Joachim @ 2010-11-24 7:17 UTC (permalink / raw)
To: ding; +Cc: Katsumi Yamaoka
[-- Attachment #1: Type: text/plain, Size: 55 bytes --]
I include an attachment that hopefully shows the bug.
[-- Attachment #2: Sample text with Umlauts --]
[-- Type: text/plain, Size: 29 bytes --]
Schöne Grüße aus Übersee
* Re: Garbled display of UTF-8 encoded mails
2010-11-24 7:17 ` Sven Joachim
@ 2010-11-24 7:28 ` Katsumi Yamaoka
2010-11-24 7:58 ` Sven Joachim
0 siblings, 1 reply; 12+ messages in thread
From: Katsumi Yamaoka @ 2010-11-24 7:28 UTC (permalink / raw)
To: ding
<#part type="text/plain" disposition=inline charset=utf-8 encoding=8bit>
Sven Joachim wrote:
> I include an attachment that hopefully shows the bug.
> Schöne Grüße aus Übersee
No problem here. Hm. I tried Emacs 23.2 in addition to 24.0.50,
with the latest No Gnus. The raw content I received is:
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline; filename=sample
Content-Transfer-Encoding: 8bit
Content-Description: Sample text with Umlauts
Sch\303\266ne Gr\303\274\303\237e aus \303\234bersee
This is really a utf-8 byte stream. I may need to try 23.2.90...
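For anyone who wants to double-check that such a byte stream is valid
UTF-8, a minimal sketch follows. The helper name `demo-decode-raw-utf8'
is made up for illustration and is not part of Gnus; it relies only on
`decode-coding-string' and `multibyte-string-p'.

```elisp
;; Hypothetical helper for illustration only, not part of Gnus.
(defun demo-decode-raw-utf8 (raw)
  "Return (UNIBYTE-P . TEXT) for the byte string RAW decoded as utf-8.
UNIBYTE-P is non-nil when RAW really is a unibyte (raw byte) string."
  (cons (not (multibyte-string-p raw))
        (decode-coding-string raw 'utf-8)))

;; A string literal made of ASCII plus octal escapes is a unibyte string
;; holding exactly those bytes, like the raw article body quoted above.
(demo-decode-raw-utf8
 "Sch\303\266ne Gr\303\274\303\237e aus \303\234bersee")
```

If the bytes are well-formed UTF-8, the cdr of the result should read
"Schöne Grüße aus Übersee", which would suggest that the message itself
is fine and that the garbling happens on the display side.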
* Re: Garbled display of UTF-8 encoded mails
2010-11-24 7:28 ` Katsumi Yamaoka
@ 2010-11-24 7:58 ` Sven Joachim
0 siblings, 0 replies; 12+ messages in thread
From: Sven Joachim @ 2010-11-24 7:58 UTC (permalink / raw)
To: ding
On 2010-11-24 08:28 +0100, Katsumi Yamaoka wrote:
> <#part type="text/plain" disposition=inline charset=utf-8 encoding=8bit>
> Sven Joachim wrote:
>
>> I include an attachment that hopefully shows the bug.
>> Schöne Grüße aus Übersee
>
> No problem here. Hm. I tried Emacs 23.2 in addition to 24.0.50,
> with the latest No Gnus. The raw content I received is:
>
> Content-Type: text/plain; charset=utf-8
> Content-Disposition: inline; filename=sample
> Content-Transfer-Encoding: 8bit
> Content-Description: Sample text with Umlauts
>
> Sch\303\266ne Gr\303\274\303\237e aus \303\234bersee
Same here in my GCC'ed and BCC'ed copies. Yet the article is displayed
garbled. :-( Maybe something is wrong with my Gnus configuration (I'm
using the nnfolder backend, in case that is relevant).
Sven
* Re: Garbled display of UTF-8 encoded mails
2010-11-23 21:03 ` Sven Joachim
2010-11-24 4:39 ` Katsumi Yamaoka
@ 2010-11-24 10:19 ` Sven Joachim
2010-11-24 12:54 ` Sven Joachim
1 sibling, 1 reply; 12+ messages in thread
From: Sven Joachim @ 2010-11-24 10:19 UTC (permalink / raw)
To: ding
Seems I found a solution. :-)
On 2010-11-23 22:03 +0100, Sven Joachim wrote:
> I bisected the problem and found out that commit bda3e8962a is the
> culprit:
>
> commit bda3e8962af0aee90144c3ae8c5360aa4c106d94
> Author: Katsumi Yamaoka <yamaoka@jpl.org>
> Date: Fri May 7 06:34:41 2010 +0000
> [...]
> * mm-util.el (mm-enable-multibyte, mm-disable-multibyte): Use (featurep
> 'xemacs) instead of mm-emacs-mule to switch function definitions.
The diff hunk that corresponds to this sentence is the following:
--8<---------------cut here---------------start------------->8---
diff --git a/lisp/mm-util.el b/lisp/mm-util.el
index c1dc4f5..a288b8b 100644
--- a/lisp/mm-util.el
+++ b/lisp/mm-util.el
@@ -908,20 +908,20 @@ mail with multiple parts is preferred to sending a Unicode one.")
(fboundp 'set-buffer-multibyte))
"True in Emacs with Mule.")
- (if mm-emacs-mule
- (defun mm-enable-multibyte ()
- "Set the multibyte flag of the current buffer.
+ (if (featurep 'xemacs)
+ (defalias 'mm-enable-multibyte 'ignore)
+ (defun mm-enable-multibyte ()
+ "Set the multibyte flag of the current buffer.
Only do this if the default value of `enable-multibyte-characters' is
non-nil. This is a no-op in XEmacs."
- (set-buffer-multibyte 'to))
- (defalias 'mm-enable-multibyte 'ignore))
+ (set-buffer-multibyte t)))
- (if mm-emacs-mule
- (defun mm-disable-multibyte ()
- "Unset the multibyte flag of in the current buffer.
+ (if (featurep 'xemacs)
+ (defalias 'mm-disable-multibyte 'ignore)
+ (defun mm-disable-multibyte ()
+ "Unset the multibyte flag of in the current buffer.
This is a no-op in XEmacs."
- (set-buffer-multibyte nil))
- (defalias 'mm-disable-multibyte 'ignore)))
+ (set-buffer-multibyte nil))))
(defun mm-preferred-coding-system (charset)
;; A typo in some Emacs versions.
--8<---------------cut here---------------end--------------->8---
However, this contains a change in semantics that is not mentioned and
might have been unintended. The old version of mm-enable-multibyte for
Emacs was (set-buffer-multibyte 'to), whereas the new one is
(set-buffer-multibyte t), and this seems to be what causes all my
problems. Reverting the change in master:
--8<---------------cut here---------------start------------->8---
diff --git a/lisp/mm-util.el b/lisp/mm-util.el
index 67b41e0..700c1a6 100644
--- a/lisp/mm-util.el
+++ b/lisp/mm-util.el
@@ -903,7 +903,7 @@ mail with multiple parts is preferred to sending a Unicode one.")
"Set the multibyte flag of the current buffer.
Only do this if the default value of `enable-multibyte-characters' is
non-nil. This is a no-op in XEmacs."
- (set-buffer-multibyte t)))
+ (set-buffer-multibyte 'to)))
(if (featurep 'xemacs)
(defalias 'mm-disable-multibyte 'ignore)
--8<---------------cut here---------------end--------------->8---
solves the issue for me.
Cheers,
Sven
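The difference between the two flags can be seen outside Gnus with a
small sketch. It assumes Emacs 23 or later, where the internal
multibyte representation is a superset of UTF-8; the function name
`demo-chars-after-multibyte' is hypothetical. According to the
docstring of `set-buffer-multibyte', a flag of t keeps the byte
sequence and merely reinterprets it as characters, whereas `to'
converts every eight-bit byte into a separate eight-bit character.

```elisp
;; Hypothetical helper for illustration only, not part of Gnus.
(defun demo-chars-after-multibyte (flag)
  "Count characters after switching a unibyte buffer to multibyte.
The buffer holds the two UTF-8 bytes of U+00DC; FLAG is passed
unchanged to `set-buffer-multibyte' and should be t or `to'."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; start as a raw byte buffer
    (insert "\303\234")                 ; the bytes seen in the raw article
    (set-buffer-multibyte flag)
    (- (point-max) (point-min))))       ; number of characters now

(demo-chars-after-multibyte 'to)  ; bytes kept as two raw eight-bit chars
(demo-chars-after-multibyte t)    ; bytes reinterpreted as one character
```

With `to' the buffer still holds two undecoded bytes that a later
`decode-coding-region' can turn into "Ü"; with t the bytes have
already been merged into a single character before any decoding runs,
which is presumably why decoding the buffer afterwards goes wrong.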
* Re: Garbled display of UTF-8 encoded mails
2010-11-24 10:19 ` Sven Joachim
@ 2010-11-24 12:54 ` Sven Joachim
2010-11-24 21:07 ` Lars Magne Ingebrigtsen
0 siblings, 1 reply; 12+ messages in thread
From: Sven Joachim @ 2010-11-24 12:54 UTC (permalink / raw)
To: ding
On 2010-11-24 11:19 +0100, Sven Joachim wrote:
> However, this contains a change in semantics that is not mentioned and
> might have been unintended. The old version of mm-enable-multibyte for
> Emacs was (set-buffer-multibyte 'to), whereas the new one is
> (set-buffer-multibyte t), and this seems to be what causes all my
> problems. Reverting the change in master:
>
> diff --git a/lisp/mm-util.el b/lisp/mm-util.el
> index 67b41e0..700c1a6 100644
> --- a/lisp/mm-util.el
> +++ b/lisp/mm-util.el
> @@ -903,7 +903,7 @@ mail with multiple parts is preferred to sending a Unicode one.")
> "Set the multibyte flag of the current buffer.
> Only do this if the default value of `enable-multibyte-characters' is
> non-nil. This is a no-op in XEmacs."
> - (set-buffer-multibyte t)))
> + (set-buffer-multibyte 'to)))
>
> (if (featurep 'xemacs)
> (defalias 'mm-disable-multibyte 'ignore)
>
> solves the issue for me.
BTW, I found an old commit which made the same change, but the ChangeLog
entry is not especially informative about what exactly the problems were:
,----
| $ git show 302ab06a28aec7e405145417bdcbf324246ea192
| commit 302ab06a28aec7e405145417bdcbf324246ea192
| Author: Simon Josefsson <jas@extundo.com>
| Date: Sun Nov 30 16:59:35 2003 +0000
|
| (mm-enable-multibyte): Call set-buffer-multibyte with
| 'to argument. Fixes something or other in Emacs 22, and is
| backwards compatible. From Kenichi Handa <handa@m17n.org>.
|
| diff --git a/lisp/ChangeLog b/lisp/ChangeLog
| index 6e37dbe..8d55cf2 100644
| --- a/lisp/ChangeLog
| +++ b/lisp/ChangeLog
| @@ -1,5 +1,9 @@
| 2003-11-30 Simon Josefsson <jas@extundo.com>
|
| + * mm-util.el (mm-enable-multibyte): Call set-buffer-multibyte with
| + 'to argument. Fixes something or other in Emacs 22, and is
| + backwards compatible. From Kenichi Handa <handa@m17n.org>.
| +
| * gnus-agent.el (gnus-agent-expire-unagentized-dirs): Custom fix.
|
| 2003-11-30 Lars Magne Ingebrigtsen <larsi@gnus.org>
| diff --git a/lisp/mm-util.el b/lisp/mm-util.el
| index eef5c04..cbd85b1 100644
| --- a/lisp/mm-util.el
| +++ b/lisp/mm-util.el
| @@ -413,7 +413,7 @@ used as the line break code type of the coding system."
| "Set the multibyte flag of the current buffer.
| Only do this if the default value of `enable-multibyte-characters' is
| non-nil. This is a no-op in XEmacs."
| - (set-buffer-multibyte t))
| + (set-buffer-multibyte 'to))
| (defalias 'mm-enable-multibyte 'ignore))
|
| (if mm-emacs-mule
|
`----
I haven't been able to find a mailing list thread which describes this in
greater detail.
Sven
* Re: Garbled display of UTF-8 encoded mails
2010-11-24 12:54 ` Sven Joachim
@ 2010-11-24 21:07 ` Lars Magne Ingebrigtsen
0 siblings, 0 replies; 12+ messages in thread
From: Lars Magne Ingebrigtsen @ 2010-11-24 21:07 UTC (permalink / raw)
To: ding
Sven Joachim <svenjoac@gmx.de> writes:
> | + (set-buffer-multibyte 'to))
Ok, I've now applied this... let's see if it breaks anything...
--
(domestic pets only, the antidote for overdose, milk.)
larsi@gnus.org * Lars Magne Ingebrigtsen