Gnus development mailing list
From: Katsumi Yamaoka <yamaoka@jpl.org>
Cc: rms@gnu.org, bsam@ipt.ru, ding@gnus.org
Subject: Re: gnus: incorrect conversion of Subject and From field from utf-8 to koi8-r
Date: Sat, 15 Oct 2005 01:51:00 +0900
Message-ID: <b4my84wf3ez.fsf@jpl.org>
In-Reply-To: <E1EQPSa-0006iC-00@etlken>

I added the ding list to Cc:.

>>>>> In <E1EQPSa-0006iC-00@etlken> Handa-san wrote:

> Ok.  Yamaoka-san, it seems that you were the last to modify
> the relevant part of rfc2047.el, so I have included you in CC:.

I had only improved the rfc2047.el encoder, but that's OK.

>>>>> In <E1EQ6NU-000GJF-2s@bsam.ru>
>>>>>	"Boris B. Samorodov" <bsam@ipt.ru> wrote:

>> The bug appears to be the incorrect concatenation of the
>> =?UTF-8?<foo> =?UTF-8?<bar> parts of the Subject.

> Yes, the bug is in the way those parts are handled.

> The current code does this:

> (1) Remove spaces between encoded words.

> (2) Decode the content-transfer-encoding of <foo> and decode
> the resulting text as utf-8, then decode the
> content-transfer-encoding of <bar> and decode the resulting
> text as utf-8.

> But it doesn't work if <foo> and <bar> are not divided at a
> utf-8 character boundary, which is what happens in the case above.
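To make the failure concrete, here is a minimal sketch using only built-in Emacs coding functions; the function name my-rfc2047-split-char-demo is hypothetical and not part of rfc2047.el. The Cyrillic letter П is the two UTF-8 bytes #xD0 #x9F. If those bytes end up in two different encoded words and each word is decoded on its own, as in step (2), neither half is valid UTF-8; joining the bytes first and decoding once recovers the letter.

```elisp
(defun my-rfc2047-split-char-demo ()
  "Return (PER-WORD JOINED) decodings of one UTF-8 character split in two.
Hypothetical helper for illustration only."
  (let* ((bytes (encode-coding-string "\u041F" 'utf-8)) ; "П" = #xD0 #x9F
         (head (substring bytes 0 1))   ; #xD0, the tail end of <foo>
         (tail (substring bytes 1)))    ; #x9F, the start of <bar>
    (list
     ;; Step (2) as described: decode each word's bytes separately.
     ;; Neither half is a complete UTF-8 sequence, so both remain raw bytes.
     (concat (decode-coding-string head 'utf-8)
             (decode-coding-string tail 'utf-8))
     ;; Joining the bytes first and decoding once gives back "П".
     (decode-coding-string (concat head tail) 'utf-8))))
```

Evaluating (my-rfc2047-split-char-demo) returns a list whose second element is "П" and whose first element is two undecoded raw bytes.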

I see.  The sample that Boris B. Samorodov first reported to the
pretest-bug list gives the following result:

(prin1
 (rfc2047-decode-string
  "=?UTF-8?B?W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ?= =?UTF-8?B?nyDRgtC10YHRgg==?="))
"[ipt.ru #163] АвтоОтвет: МСК: С\320\237 тест"

And FLIM's decoder does the same.
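As a check on this sample, the sketch below base64-decodes both payloads, joins the raw bytes, and only then decodes them as UTF-8. The helper name my-rfc2047-join-then-decode is hypothetical. It assumes the payloads are already correctly padded, which holds for these two; the patch below pads them with rfc2047-pad-base64 before decoding. The first payload ends with the byte #xD0 and the second begins with #x9F, the two halves of П.

```elisp
(defun my-rfc2047-join-then-decode (payloads coding)
  "Base64-decode each string in PAYLOADS, join the bytes, decode with CODING.
Hypothetical helper for illustration; not part of rfc2047.el."
  (decode-coding-string
   ;; base64-decode-string returns unibyte strings, so this is a byte join.
   (apply #'concat (mapcar #'base64-decode-string payloads))
   coding))

(my-rfc2047-join-then-decode
 '("W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ"
   "nyDRgtC10YHRgg==")
 'utf-8)
;; => "[ipt.ru #163] АвтоОтвет: МСК: СП тест"
```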

> So what we should do is:

> (2') Decode the content-transfer-encoding of <foo> and <bar>
> while recording the coding system (utf-8 in this case) of
> each part.  Then decode each run of parts that share the same
> coding system in a single pass.
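Step (2') could look roughly like this in isolation. This is a sketch, not the patch's code: it works on a list of already CTE-decoded (CHARSET . BYTES) pairs rather than on buffer text, the name my-rfc2047-decode-runs is hypothetical, and it assumes each MIME charset name is also an Emacs coding-system name, whereas rfc2047.el maps charsets with mm-charset-to-coding-system.

```elisp
(defun my-rfc2047-decode-runs (parts)
  "Decode PARTS, a list of (CHARSET . BYTES) pairs, one run at a time.
BYTES is the unibyte result of undoing the B or Q encoding.
Consecutive parts with the same CHARSET are joined before decoding,
so a multibyte character split across encoded words survives.
Hypothetical helper; assumes CHARSET names a coding system directly."
  (let ((result '()))
    (while parts
      (let ((charset (downcase (caar parts)))
            (bytes '()))
        ;; Collect the run of consecutive parts that share CHARSET.
        (while (and parts (string= charset (downcase (caar parts))))
          (push (cdar parts) bytes)
          (setq parts (cdr parts)))
        ;; Decode the whole run at once.
        (push (decode-coding-string (apply #'concat (nreverse bytes))
                                    (intern charset))
              result)))
    (apply #'concat (nreverse result))))

(my-rfc2047-decode-runs
 (list (cons "UTF-8" (base64-decode-string "0KHQ"))
       (cons "UTF-8" (base64-decode-string "nyDRgtC10YHRgg=="))))
;; => "СП тест"
```

Grouping by charset, rather than blindly joining every encoded word, keeps a header that mixes charsets (say a UTF-8 word followed by an ISO-8859-1 word) decodable.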

> I'll attach a sample patch that does that.  I modified
> rfc2047-decode-string and added a helper function,
> rfc2047-decode-cte.

> As I don't know the details of RFC 2047, I have not
> installed it yet.  Could you please check the code and install
> it (or a version that does something similar)?

Thank you for the patch, but I'm not sure whether splitting
encoded words in that way is legitimate.  I need time to look
into it.

> *** rfc2047.el	08 Aug 2005 10:13:38 +0900	1.22
> --- rfc2047.el	14 Oct 2005 22:16:03 +0900	
> ***************
> *** 822,827 ****
> --- 822,843 ----
>   ;; and worthwhile (is it more correct or not?), e.g. something like
>   ;; `=?iso-8859-1?q?foo?=@'.

> + (defun rfc2047-decode-cte (charset encoding word)
> +   "Decode content-transfer-encoding of WORD by ENCODING.
> + Put text property `coding' to the decoded word with value a coding system
> + derived from CHARSET."
> +   (cond ((char-equal ?B encoding)
> + 	 (setq word (base64-decode-string (rfc2047-pad-base64 word))))
> + 	((char-equal ?Q encoding)
> + 	 (setq word (quoted-printable-decode-string
> + 		     (mm-subst-char-in-string ?_ ? word t))))
> + 	(t (error "Invalid encoding: %c" encoding)))
> +   (setq word (string-to-multibyte word))
> +   (setq charset (intern (downcase charset)))
> +   (put-text-property 0 (length word) 
> + 		     'coding (mm-charset-to-coding-system charset) word)
> +   word)
> + 
>   (defun rfc2047-decode-region (start end)
>     "Decode MIME-encoded words in region between START and END."
>     (interactive "r")
> ***************
> *** 842,857 ****
>   	;; Decode the encoded words.
>   	(setq b (goto-char (point-min)))
>   	(while (re-search-forward rfc2047-encoded-word-regexp nil t)
>   	  (setq e (match-beginning 0))
> ! 	  (insert (rfc2047-parse-and-decode
> ! 		   (prog1
> ! 		       (match-string 0)
> ! 		     (delete-region e (match-end 0)))))
> ! 	  (while (looking-at rfc2047-encoded-word-regexp)
> ! 	    (insert (rfc2047-parse-and-decode
> ! 		     (prog1
> ! 			 (match-string 0)
> ! 		       (delete-region (point) (match-end 0))))))
>   	  (save-restriction
>   	    (narrow-to-region e (point))
>   	    (goto-char e)
> --- 858,888 ----
>   	;; Decode the encoded words.
>   	(setq b (goto-char (point-min)))
>   	(while (re-search-forward rfc2047-encoded-word-regexp nil t)
> + 	  ;; At first, decode content-transfer-encoding of the
> + 	  ;; succeeding encoded words.
>   	  (setq e (match-beginning 0))
> ! 	  (let ((charset (match-string 1))
> ! 		(encoding (char-after (match-beginning 3)))
> ! 		(word (match-string 4)))
> ! 	    (delete-region e (match-end 0))
> ! 	    (insert (rfc2047-decode-cte charset encoding word))
> ! 	    (while (looking-at rfc2047-encoded-word-regexp)
> ! 	      (setq charset (match-string 1)
> ! 		    encoding (char-after (match-beginning 3))
> ! 		    word (match-string 4))
> ! 	      (delete-region (point) (match-end 0))
> ! 	      (insert (rfc2047-decode-cte charset encoding word))))
> ! 	  ;; Then decode the text encoding.
> ! 	  (save-restriction
> ! 	    (narrow-to-region e (point))
> ! 	    (goto-char e)
> ! 	    (while (not (eobp))
> ! 	      (let ((from (point))
> ! 		    (coding (get-text-property (point) 'coding)))
> ! 		(goto-char (next-single-property-change from 'coding nil
> ! 							(point-max)))
> ! 		(if coding
> ! 		    (decode-coding-region from (point) coding)))))
>   	  (save-restriction
>   	    (narrow-to-region e (point))
>   	    (goto-char e)
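The patch finds the end of each run with next-single-property-change, whose second argument is the name of the property to watch, so it needs the quoted symbol 'coding. Passing the local variable coding there would instead name a property such as utf-8, which never exists, so the search would always run to the limit and treat everything as one run. The sketch below (the function name is hypothetical) shows the difference on a string with two runs.

```elisp
(defun my-coding-run-end-demo ()
  "Show why `next-single-property-change' needs the property name.
Hypothetical helper for illustration only."
  (let ((s (concat (propertize "abc" 'coding 'utf-8)
                   (propertize "de" 'coding 'iso-8859-1))))
    (list
     ;; Correct: PROP is the symbol `coding'; the first run ends at 3.
     (next-single-property-change 0 'coding s (length s))
     ;; Wrong: PROP is the value `utf-8'.  No property of that name
     ;; exists, so the search runs to LIMIT and one "run" spans
     ;; the whole string.
     (next-single-property-change 0 'utf-8 s (length s)))))

(my-coding-run-end-demo)
;; => (3 5)
```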


