* Override charset in <meta> tag for K H
From: Martin Stjernholm @ 2010-03-29 0:42 UTC (permalink / raw)
To: ding
[-- Attachment #1: Type: text/plain, Size: 975 bytes --]
When an html mail is viewed externally with K H
(gnus-article-browse-html-article), gnus-article-browse-html-parts might
sometimes forcefully encode it to utf-8 in the temporary file. However,
if the html already contains a <meta> with a different charset, it won't
be changed (as is the documented behavior for mm-add-meta-html-tag). The
result is that the browser views the utf-8 encoded article with the
original charset.
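To make the failure mode concrete, here is a minimal sketch that is not part of the patch. It uses only the built-in encode-coding-string and decode-coding-string; the helper name my-view-as is hypothetical and merely mimics a browser that trusts a stale <meta> charset.

```elisp
(defun my-view-as (text written-as declared-as)
  "Hypothetical helper, not part of Gnus.
Encode TEXT with coding system WRITTEN-AS, then decode the resulting
bytes with DECLARED-AS, the way a browser would if it trusted a
<meta> tag that names DECLARED-AS."
  (decode-coding-string (encode-coding-string text written-as)
                        declared-as))

;; The file is written as utf-8 but the <meta> tag still says iso-8859-1:
;; each of the three non-ASCII characters turns into two characters.
(length (my-view-as "r\u00e4ksm\u00f6rg\u00e5s" 'utf-8 'iso-8859-1))
;; => 13 (the original string has 10 characters)

;; When the declared charset matches the encoding, the text round-trips.
(string= "r\u00e4ksm\u00f6rg\u00e5s"
         (my-view-as "r\u00e4ksm\u00f6rg\u00e5s" 'utf-8 'utf-8))
;; => t
```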
I haven't dug deep enough in gnus-article-browse-html-parts to
understand why it sometimes changes the charset, but I assume it is with
good reason. Anyway, if the html is recoded to a different charset then
clearly the charset in the <meta> tag should be updated too. The
attached patch fixes this in the code paths where it's forced to utf-8.
Note that there are at least two more paths where the article goes
through mm-encode-coding-string. Since I haven't grasped those, this
patch doesn't touch them. The same problem might exist there too.
[-- Attachment #2: meta-charset-override.patch --]
[-- Type: text/x-diff, Size: 3203 bytes --]
mm-decode.el (mm-add-meta-html-tag): Added option to override the charset.
gnus-art.el (gnus-article-browse-html-parts): Force the correct charset into
the <meta> tag when the article is encoded to utf-8.
diff --git a/lisp/gnus-art.el b/lisp/gnus-art.el
index 1a66404..3dcc30e 100644
--- a/lisp/gnus-art.el
+++ b/lisp/gnus-art.el
@@ -2862,7 +2862,7 @@ message header will be added to the bodies of the \"text/html\" parts."
;; Add a meta html tag to specify charset and a header.
(cond
(header
- (let (title eheader body hcharset coding)
+ (let (title eheader body hcharset coding force-charset)
(with-temp-buffer
(mm-enable-multibyte)
(setq case-fold-search t)
@@ -2886,7 +2886,8 @@ message header will be added to the bodies of the \"text/html\" parts."
title (when title
(mm-encode-coding-string title charset))
body (mm-encode-coding-string (mm-get-part handle)
- charset))
+ charset)
+ force-charset t)
(setq hcharset (mm-find-mime-charset-region (point-min)
(point-max)))
(cond ((= (length hcharset) 1)
@@ -2917,7 +2918,8 @@ message header will be added to the bodies of the \"text/html\" parts."
body (mm-encode-coding-string
(mm-decode-coding-string
(mm-get-part handle) body)
- charset))))
+ charset)
+ force-charset t)))
(setq charset hcharset
eheader (mm-encode-coding-string
(buffer-string) coding)
@@ -2931,7 +2933,7 @@ message header will be added to the bodies of the \"text/html\" parts."
(mm-disable-multibyte)
(insert body)
(when charset
- (mm-add-meta-html-tag handle charset))
+ (mm-add-meta-html-tag handle charset force-charset))
(when title
(goto-char (point-min))
(unless (search-forward "<title>" nil t)
diff --git a/lisp/mm-decode.el b/lisp/mm-decode.el
index a511253..0edc631 100644
--- a/lisp/mm-decode.el
+++ b/lisp/mm-decode.el
@@ -1250,11 +1250,11 @@ PROMPT overrides the default one used to ask user for a file name."
(mm-save-part-to-file handle file)
file))))
-(defun mm-add-meta-html-tag (handle &optional charset)
+(defun mm-add-meta-html-tag (handle &optional charset force-charset)
"Add meta html tag to specify CHARSET of HANDLE in the current buffer.
CHARSET defaults to the one HANDLE specifies. Existing meta tag that
-specifies charset will not be modified. Return t if meta tag is added
-or replaced."
+specifies charset will not be modified unless FORCE-CHARSET is non-nil.
+Return t if meta tag is added or replaced."
(when (equal (mm-handle-media-type handle) "text/html")
(when (or charset
(setq charset (mail-content-type-get (mm-handle-type handle)
@@ -1266,7 +1266,8 @@ or replaced."
(if (re-search-forward "\
<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+?\\)\\)?[\"'][^>]*>" nil t)
- (if (and (match-beginning 2)
+ (if (and (not force-charset)
+ (match-beginning 2)
(string-match "\\`html\\'" (match-string 1)))
;; Don't modify existing meta tag.
nil
* Re: Override charset in <meta> tag for K H
From: Katsumi Yamaoka @ 2010-03-29 9:50 UTC (permalink / raw)
To: ding
>>>>> Martin Stjernholm wrote:
> When an html mail is viewed externally with K H
> (gnus-article-browse-html-article), gnus-article-browse-html-parts might
> sometimes forcefully encode it to utf-8 in the temporary file. However,
> if the html already contains a <meta> with a different charset, it won't
> be changed (as is the documented behavior for mm-add-meta-html-tag). The
> result is that the browser views the utf-8 encoded article with the
> original charset.
Maybe the best approach is to always encode the decoded contents with
the meta charset, provided one is present and Emacs knows it (I
recently made emacs-w3m's shimbun do so). However, that doesn't look
easy to achieve in Gnus.
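The idea of preferring the declared charset could look roughly like this. It is only a sketch: my-encode-for-declared-charset is a hypothetical name, and it maps the charset name to a coding system with the built-ins intern and coding-system-p rather than through Gnus's own charset handling, which is a simplification.

```elisp
(defun my-encode-for-declared-charset (text declared &optional fallback)
  "Hypothetical helper, not part of Gnus.
Encode TEXT with the coding system named by the string DECLARED if
Emacs knows it; otherwise use FALLBACK, which defaults to utf-8.
Return a cons (CODING-SYSTEM . BYTES) so the caller can tell which
charset the <meta> tag has to name."
  (let* ((wanted (intern (downcase declared)))
         ;; Only trust the declared name when it is a coding system
         ;; that this Emacs actually knows.
         (coding (if (coding-system-p wanted)
                     wanted
                   (or fallback 'utf-8))))
    (cons coding (encode-coding-string text coding))))

;; A known charset is used as declared, an unknown one falls back.
(car (my-encode-for-declared-charset "\u00e4" "ISO-8859-1"))
(car (my-encode-for-declared-charset "\u00e4" "no-such-charset"))
```

Returning the coding system alongside the bytes keeps the two in step, which is the property the <meta> tag has to preserve.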
> I haven't dug deep enough in gnus-article-browse-html-parts to
> understand why it sometimes changes the charset, but I assume it is with
> good reason.
I have no particular reason for it, nor do I recall why I left the
existing meta tag alone.
> Anyway, if the html is recoded to a different charset then
> clearly the charset in the <meta> tag should be updated too. The
> attached patch fixes this in the code paths where it's forced to utf-8.
> Note that there are at least two more paths where the article goes
> through mm-encode-coding-string. Since I haven't grasped those, this
> patch doesn't touch them. The same problem might exist there too.
How about making `mm-add-meta-html-tag' always replace the existing
meta tag? I don't imagine it causes any harm.
Regards,
* Re: Override charset in <meta> tag for K H
From: Martin Stjernholm @ 2010-03-29 22:47 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: ding
Katsumi Yamaoka <yamaoka@jpl.org> wrote:
> How about making `mm-add-meta-html-tag' always replace the existing
> meta tag? I don't imagine it causes any harm.
Can't say, really. I opted for the safe road in the patch for
compatibility reasons. The behavior is after all well documented and
obviously intentional. mm-decode also gives the impression of being a
standalone library, so there might be other code besides Gnus out there
that uses those functions. But I don't know, I just try to avoid
brooding. ;)
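The trade-off between "always replace" and an opt-in override can be sketched in isolation. The function below is a simplified stand-in, not the real mm-add-meta-html-tag: the name my-set-meta-charset and its regexps are assumptions made for illustration, and it ignores the HANDLE and media-type checks that the real function performs.

```elisp
(defun my-set-meta-charset (charset &optional force)
  "Hypothetical helper, not part of Gnus.
Make the current buffer's <meta> tag declare CHARSET.
An existing charset declaration is left alone unless FORCE is
non-nil.  Return t if the buffer was changed, nil otherwise."
  (save-excursion
    (goto-char (point-min))
    (let ((case-fold-search t))
      (cond
       ;; An existing declaration: rewrite it only when forced, so
       ;; callers that omit FORCE keep the old behavior.
       ((re-search-forward "<meta[^>]*charset=\\([-_a-z0-9]+\\)" nil t)
        (when force
          (replace-match charset t t nil 1)
          t))
       ;; No declaration yet: add one right after <head>.
       ((re-search-forward "<head[^>]*>" nil t)
        (insert "<meta http-equiv=\"Content-Type\" "
                "content=\"text/html; charset=" charset "\">")
        t)))))

(with-temp-buffer
  (insert "<html><head><meta http-equiv=\"Content-Type\" "
          "content=\"text/html; charset=iso-8859-1\"></head></html>")
  (list (my-set-meta-charset "utf-8")     ; nil: existing tag kept
        (my-set-meta-charset "utf-8" t)   ; t: tag now says utf-8
        (buffer-string)))
```

Because FORCE is optional and defaults to nil, callers that never pass it see no change, which is the compatibility argument made above.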
* Re: Override charset in <meta> tag for K H
From: Katsumi Yamaoka @ 2010-03-30 4:47 UTC (permalink / raw)
To: ding
>>>>> Martin Stjernholm wrote:
> Katsumi Yamaoka <yamaoka@jpl.org> wrote:
>> How about making `mm-add-meta-html-tag' always replace the existing
>> meta tag? I don't imagine it causes any harm.
> Can't say, really. I opted for the safe road in the patch for
> compatibility reasons. The behavior is after all well documented and
> obviously intentional. mm-decode also gives the impression of being a
> standalone library, so there might be other code besides Gnus out there
> that uses those functions. But I don't know, I just try to avoid
> brooding. ;)
Agreed. After taking a fresh look at the code, I see that your way
is safer. Committed to the Gnus trunk and the Emacs trunk. Thanks.