Gnus development mailing list
* Override charset in <meta> tag for K H
@ 2010-03-29  0:42 Martin Stjernholm
  2010-03-29  9:50 ` Katsumi Yamaoka
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Stjernholm @ 2010-03-29  0:42 UTC
  To: ding

[-- Attachment #1: Type: text/plain, Size: 975 bytes --]

When an html mail is viewed externally with K H
(gnus-article-browse-html-article), gnus-article-browse-html-parts might
sometimes forcefully encode it to utf-8 in the temporary file. However,
if the html already contains a <meta> with a different charset, it won't
be changed (as is the documented behavior for mm-add-meta-html-tag). The
result is that the browser decodes the utf-8 encoded article using the
original charset, so non-ASCII text is displayed incorrectly.

I haven't dug deep enough in gnus-article-browse-html-parts to
understand why it sometimes changes the charset, but I assume it is with
good reason. Anyway, if the html is recoded to a different charset then
clearly the charset in the <meta> tag should be updated too. The
attached patch fixes this in the code paths where it's forced to utf-8.

Note that there are at least two more paths where the article goes
through mm-encode-coding-string. Since I haven't grasped those, this
patch doesn't touch them. The same problem might exist there too.
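
To illustrate the intent (this is not the patch itself), here is a
minimal sketch. It assumes a hypothetical helper,
my-demo-force-meta-charset, which is not part of Gnus and works on a
string rather than on the article buffer. It only shows the idea of
rewriting the declared charset so that it matches the encoding the
body was forced to:

```elisp
;; Hypothetical helper for illustration only; not a Gnus function.
;; It rewrites the charset declared in an existing content-type
;; <meta> tag so that it matches the encoding actually used.
(defun my-demo-force-meta-charset (html charset)
  "Return HTML with the charset in its content-type <meta> tag set to CHARSET.
If no such tag declares a charset, return HTML unchanged."
  (let ((case-fold-search t))
    (if (string-match
         "<meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']text/html;\\s-*charset=\\([^\"'>]+\\)"
         html)
        ;; Replace only subexpression 1, i.e. the charset name.
        (replace-match charset t t html 1)
      html)))

;; The body was recoded to utf-8, but the tag still says iso-8859-1:
(my-demo-force-meta-charset
 "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">"
 "utf-8")
;; => "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
```

The actual patch takes a different route: it adds an optional
FORCE-CHARSET argument to mm-add-meta-html-tag and passes it from the
two code paths that force utf-8.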


[-- Attachment #2: meta-charset-override.patch --]
[-- Type: text/x-diff, Size: 3203 bytes --]

mm-decode.el (mm-add-meta-html-tag): Added option to override the charset.
    
gnus-art.el (gnus-article-browse-html-parts): Force the correct charset into
the <meta> tag when the article is encoded to utf-8.

diff --git a/lisp/gnus-art.el b/lisp/gnus-art.el
index 1a66404..3dcc30e 100644
--- a/lisp/gnus-art.el
+++ b/lisp/gnus-art.el
@@ -2862,7 +2862,7 @@ message header will be added to the bodies of the \"text/html\" parts."
 	     ;; Add a meta html tag to specify charset and a header.
 	     (cond
 	      (header
-	       (let (title eheader body hcharset coding)
+	       (let (title eheader body hcharset coding force-charset)
 		 (with-temp-buffer
 		   (mm-enable-multibyte)
 		   (setq case-fold-search t)
@@ -2886,7 +2886,8 @@ message header will be added to the bodies of the \"text/html\" parts."
 			     title (when title
 				     (mm-encode-coding-string title charset))
 			     body (mm-encode-coding-string (mm-get-part handle)
-							   charset))
+							   charset)
+			     force-charset t)
 		     (setq hcharset (mm-find-mime-charset-region (point-min)
 								 (point-max)))
 		     (cond ((= (length hcharset) 1)
@@ -2917,7 +2918,8 @@ message header will be added to the bodies of the \"text/html\" parts."
 				       body (mm-encode-coding-string
 					     (mm-decode-coding-string
 					      (mm-get-part handle) body)
-					     charset))))
+					     charset)
+				       force-charset t)))
 			   (setq charset hcharset
 				 eheader (mm-encode-coding-string
 					  (buffer-string) coding)
@@ -2931,7 +2933,7 @@ message header will be added to the bodies of the \"text/html\" parts."
 		   (mm-disable-multibyte)
 		   (insert body)
 		   (when charset
-		     (mm-add-meta-html-tag handle charset))
+		     (mm-add-meta-html-tag handle charset force-charset))
 		   (when title
 		     (goto-char (point-min))
 		     (unless (search-forward "<title>" nil t)
diff --git a/lisp/mm-decode.el b/lisp/mm-decode.el
index a511253..0edc631 100644
--- a/lisp/mm-decode.el
+++ b/lisp/mm-decode.el
@@ -1250,11 +1250,11 @@ PROMPT overrides the default one used to ask user for a file name."
 	   (mm-save-part-to-file handle file)
 	   file))))
 
-(defun mm-add-meta-html-tag (handle &optional charset)
+(defun mm-add-meta-html-tag (handle &optional charset force-charset)
   "Add meta html tag to specify CHARSET of HANDLE in the current buffer.
 CHARSET defaults to the one HANDLE specifies.  Existing meta tag that
-specifies charset will not be modified.  Return t if meta tag is added
-or replaced."
+specifies charset will not be modified unless FORCE-CHARSET is non-nil.
+Return t if meta tag is added or replaced."
   (when (equal (mm-handle-media-type handle) "text/html")
     (when (or charset
 	      (setq charset (mail-content-type-get (mm-handle-type handle)
@@ -1266,7 +1266,8 @@ or replaced."
 	(if (re-search-forward "\
 <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
 text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+?\\)\\)?[\"'][^>]*>" nil t)
-	    (if (and (match-beginning 2)
+	    (if (and (not force-charset)
+		     (match-beginning 2)
 		     (string-match "\\`html\\'" (match-string 1)))
 		;; Don't modify existing meta tag.
 		nil

* Re: Override charset in <meta> tag for K H
  2010-03-29  0:42 Override charset in <meta> tag for K H Martin Stjernholm
@ 2010-03-29  9:50 ` Katsumi Yamaoka
  2010-03-29 22:47   ` Martin Stjernholm
  0 siblings, 1 reply; 4+ messages in thread
From: Katsumi Yamaoka @ 2010-03-29  9:50 UTC
  To: ding

>>>>> Martin Stjernholm wrote:
> When an html mail is viewed externally with K H
> (gnus-article-browse-html-article), gnus-article-browse-html-parts might
> sometimes forcefully encode it to utf-8 in the temporary file. However,
> if the html already contains a <meta> with a different charset, it won't
> be changed (as is the documented behavior for mm-add-meta-html-tag). The
> result is that the browser decodes the utf-8 encoded article using the
> original charset, so non-ASCII text is displayed incorrectly.

Perhaps the best approach is to always use the meta charset when
encoding the decoded contents, provided it is available and is one
that Emacs knows (I recently made emacs-w3m's shimbun do so).
However, that does not look easy to achieve in Gnus.

> I haven't dug deep enough in gnus-article-browse-html-parts to
> understand why it sometimes changes the charset, but I assume it is with
> good reason.

I have no particular reason for it, nor do I recall why I left the
existing meta tag untouched.

> Anyway, if the html is recoded to a different charset then
> clearly the charset in the <meta> tag should be updated too. The
> attached patch fixes this in the code paths where it's forced to utf-8.

> Note that there are at least two more paths where the article goes
> through mm-encode-coding-string. Since I haven't grasped those, this
> patch doesn't touch them. The same problem might exist there too.

How about making `mm-add-meta-html-tag' always replace the existing
meta tag?  I can't imagine it would cause any harm.

Regards,



* Re: Override charset in <meta> tag for K H
  2010-03-29  9:50 ` Katsumi Yamaoka
@ 2010-03-29 22:47   ` Martin Stjernholm
  2010-03-30  4:47     ` Katsumi Yamaoka
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Stjernholm @ 2010-03-29 22:47 UTC
  To: Katsumi Yamaoka; +Cc: ding

Katsumi Yamaoka <yamaoka@jpl.org> wrote:

> How about making `mm-add-meta-html-tag' always replace the existing
> meta tag?  I don't imagine it causes any harm.

Can't say, really. I opted for the safe road in the patch for
compatibility reasons. The behavior is after all well documented and
obviously intentional. mm-decode also gives the impression of being a
standalone library, so there might be other code besides Gnus out there
that uses those functions. But I don't know, I just try to avoid
brooding. ;)



* Re: Override charset in <meta> tag for K H
  2010-03-29 22:47   ` Martin Stjernholm
@ 2010-03-30  4:47     ` Katsumi Yamaoka
  0 siblings, 0 replies; 4+ messages in thread
From: Katsumi Yamaoka @ 2010-03-30  4:47 UTC
  To: ding

>>>>> Martin Stjernholm wrote:
> Katsumi Yamaoka <yamaoka@jpl.org> wrote:

>> How about making `mm-add-meta-html-tag' always replace the existing
>> meta tag?  I don't imagine it causes any harm.

> Can't say, really. I opted for the safe road in the patch for
> compatibility reasons. The behavior is after all well documented and
> obviously intentional. mm-decode also gives the impression of being a
> standalone library, so there might be other code besides Gnus out there
> that uses those functions. But I don't know, I just try to avoid
> brooding. ;)

Agreed.  After taking a fresh look at the code, I see that your way
is safer.  Committed in Gnus trunk and Emacs trunk.  Thanks.


