From: Katsumi Yamaoka <yamaoka@jpl.org>
Cc: rms@gnu.org, bsam@ipt.ru, ding@gnus.org
Subject: Re: gnus: incorrect conversion of Subject and From field from utf-8 to koi8-r
Date: Sat, 15 Oct 2005 01:51:00 +0900
Message-ID: <b4my84wf3ez.fsf@jpl.org>
In-Reply-To: <E1EQPSa-0006iC-00@etlken>
I added the ding list to Cc:.
>>>>> In <E1EQPSa-0006iC-00@etlken> Handa-san wrote:
> Ok. Yamaoka-san, it seems that you are the last modifier of
> the relevant part of rfc2047.el, so I include you in CC:.
I have only improved the encoder in rfc2047.el, not the decoder, but that's OK.
>>>>> In <E1EQ6NU-000GJF-2s@bsam.ru>
>>>>> "Boris B. Samorodov" <bsam@ipt.ru> wrote:
>> The bug appeared to be in the illegal concatenation of the
>> =?UTF-8?<foo> =?UTF-8?<bar> parts of the Subject.
> Yes, the bug is in the way those parts are handled.
> The current code does this:
> (1) Remove spaces between encoded words.
> (2) Decode content-transfer-encoding of <foo> and decode the
> resulting text as utf-8, then decode
> content-transfer-encoding of <bar> and decode the resulting
> text as utf-8.
> But that doesn't work if <foo> and <bar> are not divided at
> a utf-8 character boundary, which is the case here.
I see. The sample that Boris B. Samorodov first posted to the
pretest-bug list gives the following result:
(prin1
 (rfc2047-decode-string
  "=?UTF-8?B?W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ?= =?UTF-8?B?nyDRgtC10YHRgg==?="))
"[ipt.ru #163] АвтоОтвет: МСК: С\xc3\x90\xc2\x9f тест"
And FLIM's decoder does the same.
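
To make the boundary problem concrete, here is a minimal sketch in Python (used only for illustration; it is not what rfc2047.el or FLIM actually run). It base64-decodes the two payloads from the sample above and then decodes each one as UTF-8 on its own. The names `FOO`, `BAR` and `decode_each_word` are made up for this example.

```python
import base64

# Base64 payloads of the two encoded words in the sample Subject.
FOO = "W2lwdC5ydSAjMTYzXSDQkNCy0YLQvtCe0YLQstC10YI6INCc0KHQmjog0KHQ"
BAR = "nyDRgtC10YHRgg=="

foo_bytes = base64.b64decode(FOO)
bar_bytes = base64.b64decode(BAR)

# <foo> ends with a UTF-8 lead byte and <bar> begins with its
# continuation byte, so one character straddles the two words.
split_char = foo_bytes[-1:] + bar_bytes[:1]  # b"\xd0\x9f"


def decode_each_word(chunks, charset="utf-8"):
    """Decode every chunk on its own, replacing undecodable bytes."""
    return "".join(c.decode(charset, errors="replace") for c in chunks)


# Hypothetical stand-in for "decode <foo>, then decode <bar>".
per_word = decode_each_word([foo_bytes, bar_bytes])
```

In this sketch `per_word` ends up with two U+FFFD replacement characters at the position where the Emacs result above shows escaped bytes, while `split_char` on its own decodes to the single letter that was cut in half.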
> So what we should do is:
> (2') Decode content-transfer-encoding of <foo> and <bar>
> while recording the coding system (utf-8 in this
> case) of each part. Then decode each run of parts that
> share the same coding system in a single pass.
> I'll attach a sample patch for doing that. I modified
> rfc2047-decode-region and added a helper function,
> rfc2047-decode-cte.
> As I don't know the details of RFC 2047, I have not yet
> installed it. Could you please check the code and install
> it (or a version that does a similar thing)?
Thank you for the patch, but I'm not sure whether splitting
encoded words in that way is legitimate. I need time to look into
it.
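
For readers who want to experiment with the (2') idea outside Emacs, the following is a rough sketch in Python, not a port of the patch. It undoes only the B or Q transfer encoding of each word, groups consecutive words by charset, and decodes each group once. The function names and the simplified regexp are assumptions of this example, and it expects its input to be nothing but a run of encoded words separated by whitespace.

```python
import base64
import re
from itertools import groupby

# Simplified pattern: =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def decode_cte(encoding, payload):
    """Undo only the transfer encoding and return raw bytes."""
    if encoding.upper() == "B":
        # Pad to a multiple of four characters before decoding.
        return base64.b64decode(payload + "=" * (-len(payload) % 4))
    # Q encoding: "_" stands for a space, "=XX" is a hex-escaped byte.
    raw = payload.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def decode_encoded_word_run(header):
    """Decode adjacent encoded words, one charset run at a time."""
    parts = [(m.group(1).lower(), decode_cte(m.group(2), m.group(3)))
             for m in ENCODED_WORD.finditer(header)]
    out = []
    # groupby only merges neighbours, so a charset change starts a new run.
    for charset, run in groupby(parts, key=lambda part: part[0]):
        out.append(b"".join(data for _, data in run).decode(charset))
    return "".join(out)
```

Given the sample Subject quoted earlier in this message, `decode_encoded_word_run` joins the bytes of both words before decoding, so the character that spans the word boundary should come out intact. Whether a decoder ought to accept words split like that is a separate question.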
> *** rfc2047.el 08 Aug 2005 10:13:38 +0900 1.22
> --- rfc2047.el 14 Oct 2005 22:16:03 +0900
> ***************
> *** 822,827 ****
> --- 822,843 ----
> ;; and worthwhile (is it more correct or not?), e.g. something like
> ;; `=?iso-8859-1?q?foo?=@'.
> + (defun rfc2047-decode-cte (charset encoding word)
> + "Decode content-transfer-encoding of WORD by ENCODING.
> + Put text property `coding' to the decoded word with value a coding system
> + derived from CHARSET."
> + (cond ((char-equal ?B encoding)
> + (setq word (base64-decode-string (rfc2047-pad-base64 word))))
> + ((char-equal ?Q encoding)
> + (setq word (quoted-printable-decode-string
> + (mm-subst-char-in-string ?_ ? word t))))
> + (t (error "Invalid encoding: %c" encoding)))
> + (setq word (string-to-multibyte word))
> + (setq charset (intern (downcase charset)))
> + (put-text-property 0 (length word)
> + 'coding (mm-charset-to-coding-system charset) word)
> + word)
> +
> (defun rfc2047-decode-region (start end)
> "Decode MIME-encoded words in region between START and END."
> (interactive "r")
> ***************
> *** 842,857 ****
> ;; Decode the encoded words.
> (setq b (goto-char (point-min)))
> (while (re-search-forward rfc2047-encoded-word-regexp nil t)
> (setq e (match-beginning 0))
> ! (insert (rfc2047-parse-and-decode
> ! (prog1
> ! (match-string 0)
> ! (delete-region e (match-end 0)))))
> ! (while (looking-at rfc2047-encoded-word-regexp)
> ! (insert (rfc2047-parse-and-decode
> ! (prog1
> ! (match-string 0)
> ! (delete-region (point) (match-end 0))))))
> (save-restriction
> (narrow-to-region e (point))
> (goto-char e)
> --- 858,888 ----
> ;; Decode the encoded words.
> (setq b (goto-char (point-min)))
> (while (re-search-forward rfc2047-encoded-word-regexp nil t)
> + ;; At first, decode content-transfer-encoding of the
> + ;; succeeding encoded words.
> (setq e (match-beginning 0))
> ! (let ((charset (match-string 1))
> ! (encoding (char-after (match-beginning 3)))
> ! (word (match-string 4)))
> ! (delete-region e (match-end 0))
> ! (insert (rfc2047-decode-cte charset encoding word))
> ! (while (looking-at rfc2047-encoded-word-regexp)
> ! (setq charset (match-string 1)
> ! encoding (char-after (match-beginning 3))
> ! word (match-string 4))
> ! (delete-region (point) (match-end 0))
> ! (insert (rfc2047-decode-cte charset encoding word))))
> ! ;; Then decode the text encoding.
> ! (save-restriction
> ! (narrow-to-region e (point))
> ! (goto-char e)
> ! (while (not (eobp))
> ! (let ((from (point))
> ! (coding (get-text-property (point) 'coding)))
> ! (goto-char (next-single-property-change from 'coding nil
> ! (point-max)))
> ! (if coding
> ! (decode-coding-region from (point) coding)))))
> (save-restriction
> (narrow-to-region e (point))
> (goto-char e)