Gnus development mailing list
* one more bug in latest gnus
@ 2003-05-27 13:45 Vladimir Volovich
  2003-05-27 21:57 ` Simon Josefsson
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Volovich @ 2003-05-27 13:45 UTC


[-- Attachment #1: Type: text/plain, Size: 1240 bytes --]

Hello,

please press C-d on this message, and you'll see the attached
email; its subject contains "йМХЦЮ  йНРЕКЭМХЙНБЮ Х вЕАНРЮЕБЮ".
The subject is encoded as koi8-r, while the correct
charset is windows-1251 (it was an error of user's mail agent).

when i want to see the correctly encoded subject in gnus, i press
  0 g windows-1251 RET
and gnus re-encodes the message (both headers and body) in
windows-1251 encoding.

for this, you need to execute
(codepage-setup 1251)
(define-coding-system-alias 'windows-1251 'cp1251)

now, the bug in gnus is this:

  after doing such reencoding with 0 g windows-1251 RET,
  i see correctly re-encoded subject in the Article buffer:
  Subject: Книга  Котельникова и Чеботаева
  but in the Summary buffer, i see only "Книга";
  the rest of the subject is not shown.

That may be due to the fact that in the original message the subject
consists of several lines:

Subject: =?KOI8-R?Q?=CA=ED=E8=E3=E0?=
 =?KOI8-R?Q?__=CA=EE=F2=E5=EB=FC=ED=E8=EA=EE=E2=E0_=E8?=
 =?KOI8-R?Q?_=D7=E5=E1=EE=F2=E0=E5=E2=E0?=

and the word "Книга" is on the first line; and gnus incorrectly fails
to show the rest of the re-encoded subject in the Summary buffer.
(without re-encoding, it shows the whole subject).

Best,
v.
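
The charset mismatch described above can be reproduced outside gnus. The following is a minimal Python sketch, not gnus code: it unfolds the quoted Subject header, Q-decodes each encoded-word, and decodes the bytes either with the declared charset or with an override, roughly what `0 g windows-1251 RET` asks for. The names `q_decode` and `decode_subject` are made up for this illustration, and the sketch handles only values made up entirely of Q-encoded words.

```python
import re

# The folded Subject value quoted in the report (three encoded-words).
FOLDED_SUBJECT = (
    "=?KOI8-R?Q?=CA=ED=E8=E3=E0?=\n"
    " =?KOI8-R?Q?__=CA=EE=F2=E5=EB=FC=ED=E8=EA=EE=E2=E0_=E8?=\n"
    " =?KOI8-R?Q?_=D7=E5=E1=EE=F2=E0=E5=E2=E0?="
)

# Matches one Q-encoded word: =?charset?Q?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Qq]\?([^?]*)\?=")


def q_decode(payload):
    """Turn a Q-encoded payload into raw bytes ('_' means space)."""
    text = payload.replace("_", " ")
    text = re.sub(r"=([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)
    return text.encode("latin-1")


def decode_subject(value, charset_override=None):
    """Unfold the header value and decode every encoded-word in it.

    Whitespace between adjacent encoded-words is dropped. If
    charset_override is given, it replaces the declared charset.
    """
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", value)
    parts = []
    for match in ENCODED_WORD.finditer(unfolded):
        charset = charset_override or match.group(1)
        parts.append(q_decode(match.group(2)).decode(charset))
    return "".join(parts)


declared = decode_subject(FOLDED_SUBJECT)
corrected = decode_subject(FOLDED_SUBJECT, "windows-1251")
first_line_only = decode_subject(FOLDED_SUBJECT.split("\n")[0], "windows-1251")
print(declared)
print(corrected)
print(first_line_only)
```

Decoding only the first physical line gives just the first word, which matches the symptom reported for the Summary buffer.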

[-- Attachment #2: Type: message/rfc822, Size: 912 bytes --]

From: <CyrTeX-ru@vsu.ru> (mikezmn on newmail.ru)
To: "Cyrillic TeX Users Group" <CyrTeX-ru@vsu.ru>
Subject: йМХЦЮ  йНРЕКЭМХЙНБЮ Х вЕАНРЮЕБЮ
Date: Tue, 27 May 2003 11:30:28 +0400
Message-ID: <list-2759095@vsu.ru>

This is a test message.


* Re: one more bug in latest gnus
  2003-05-27 13:45 one more bug in latest gnus Vladimir Volovich
@ 2003-05-27 21:57 ` Simon Josefsson
  2003-05-28 22:26   ` Dave Love
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Josefsson @ 2003-05-27 21:57 UTC
  Cc: ding, bugs, Dave Love

Vladimir Volovich <vvv@vsu.ru> writes:

> Hello,
>
> please press C-d on this message, and you'll see the attached
> email; its subject contains "йМХЦЮ  йНРЕКЭМХЙНБЮ Х вЕАНРЮЕБЮ".
> The subject is encoded as koi8-r, while the correct
> charset is windows-1251 (it was an error of user's mail agent).
>
> when i want to see the correctly encoded subject in gnus, i press
>   0 g windows-1251 RET
> and gnus re-encodes the message (both headers and body) in
> windows-1251 encoding.
>
> for this, you need to execute
> (codepage-setup 1251)
> (define-coding-system-alias 'windows-1251 'cp1251)
>
> now, the bug in gnus is this:
>
>   after doing such reencoding with 0 g windows-1251 RET,
>   i see correctly re-encoded subject in the Article buffer:
>   Subject: Книга  Котельникова и Чеботаева
>   but in the Summary buffer, i see only "Книга";
>   the rest of the subject is not shown.
>
> That may be due to the fact that in the original message the subject
> consists of several lines:
>
> Subject: =?KOI8-R?Q?=CA=ED=E8=E3=E0?=
>  =?KOI8-R?Q?__=CA=EE=F2=E5=EB=FC=ED=E8=EA=EE=E2=E0_=E8?=
>  =?KOI8-R?Q?_=D7=E5=E1=EE=F2=E0=E5=E2=E0?=
>
> and the word "Книга" is on the first line; and gnus incorrectly fails
> to show the rest of the re-encoded subject in the Summary buffer.
> (without re-encoding, it shows the whole subject).

It seems to work for me with the patch Jesper and I posted earlier
(see <m33cj1ippq.fsf@defun.localdomain> in gnus.gnus-bug).  Dave, as
the one who seems to have worked on rfc2047 recently, can you say
whether the patch is the right thing or not?

--- rfc2047.el.~6.52.~	Thu May 22 22:39:53 2003
+++ rfc2047.el	Tue May 27 23:54:28 2003
@@ -612,7 +612,7 @@
 	    (goto-char e)
 	    (while (re-search-forward "[\n\r]+" nil t)
 	      (replace-match " "))
-	    (goto-char (point-max)))
+	    (setq b (goto-char (point-max))))
 	  (when (and (mm-multibyte-p)
 		     mail-parse-charset
 		     (not (eq mail-parse-charset 'us-ascii))
@@ -697,7 +697,6 @@
 		 mail-parse-charset)
 	(setq cs mail-parse-charset))
       ;; Fixme: What's this for?  The following comment makes no sense. -- fx
-      (mm-with-unibyte-current-buffer
 	;; In Emacs Mule 4, decoding UTF-8 should be in unibyte mode.
 	(mm-decode-coding-string
 	 (cond
@@ -708,7 +707,7 @@
 	   (quoted-printable-decode-string
 	    (mm-replace-chars-in-string string ?_ ? )))
 	  (t (error "Invalid encoding: %s" encoding)))
-	 cs)))))
+       cs))))
 
 (provide 'rfc2047)
 





* Re: one more bug in latest gnus
  2003-05-27 21:57 ` Simon Josefsson
@ 2003-05-28 22:26   ` Dave Love
  2003-05-30  9:44     ` Dave Love
       [not found]     ` <bb7b07$99b$1@quimby.gnus.org>
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Love @ 2003-05-28 22:26 UTC
  Cc: ding, bugs

Simon Josefsson <jas@extundo.com> writes:

> It seems to work for me with the patch Jesper and I posted earlier
> (see <m33cj1ippq.fsf@defun.localdomain> in gnus.gnus-bug).  Dave, as
> the one who seems to have worked on rfc2047 recently, can you say
> whether the patch is the right thing or not?

No better than anyone else, I'm afraid, but it doesn't look right to
me.  `e' is the beginning of the just-decoded word, so the args of
`mm-decode-coding-region' then appear to be wrong.

Inside the loop over encoded words, I assume `mm-decode-coding-region'
is supposed to be decoding any non-ASCII stuff between encoded words.
There's no case for anything before an encoded word, and it might at
least check whether there is an eight-bit character (not a multibyte
one) in what it might try to decode.  That code should probably go
before `rfc2047-parse-and-decode'.




* Re: one more bug in latest gnus
  2003-05-28 22:26   ` Dave Love
@ 2003-05-30  9:44     ` Dave Love
       [not found]     ` <bb7b07$99b$1@quimby.gnus.org>
  1 sibling, 0 replies; 8+ messages in thread
From: Dave Love @ 2003-05-30  9:44 UTC
  Cc: bugs

I wrote:

> No better than anyone else, I'm afraid, but it doesn't look right to
> me.  `e' is the beginning of the just-decoded word, so the args of
> `mm-decode-coding-region' then appear to be wrong.

Sorry, ignore what I said; I misread the code in a hurry.  It's not
very clear without comments.

One thing it shouldn't do is to assume where decode-coding-region
leaves point.  As far as I remember, that's not guaranteed (which I
think is a bug).

I suspect that if mm-decode-coding-region must be used, it
could just go before the loop over words, to decode the whole thing
once for all.  The chance of QP words containing ASCII that could get
decoded inadvertently is small -- it would have to contain iso2022
escape sequences (or utf-7 if that's defined) -- and it can't happen
for B-encoding.  That may be a significant performance improvement,
but I haven't measured it.




* Re: one more bug in latest gnus
       [not found]     ` <bb7b07$99b$1@quimby.gnus.org>
@ 2003-05-30 17:45       ` Vladimir Volovich
  2003-06-02  2:20         ` Jesper Harder
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Volovich @ 2003-05-30 17:45 UTC


"DL" == Dave Love writes:

 >> No better than anyone else, I'm afraid, but it doesn't look right
 >> to me.  `e' is the beginning of the just-decoded word, so the args
 >> of `mm-decode-coding-region' then appear to be wrong.

 DL> Sorry, ignore what I said; I misread the code in a hurry.  It's
 DL> not very clear without comments.

 DL> One thing it shouldn't do is to assume where decode-coding-region
 DL> leaves point.  As far as I remember, that's not guaranteed (which
 DL> I think is a bug).

 DL> I suspect that if mm-decode-coding-region must be used, it could
 DL> just go before the loop over words, to decode the whole thing
 DL> once for all.  The chance of QP words containing ASCII that could
 DL> get decoded inadvertently is small -- it would have to contain
 DL> iso2022 escape sequences (or utf-7 if that's defined) -- and it
 DL> can't happen for B-encoding.  That may be a significant
 DL> performance improvement, but I haven't measured it.

i saw some changes in CVS, and just synced to see if that fixed this
bug. this bug is not yet fixed, i.e.

  after pressing C-d on the first message in this thread, and
  after doing reencoding with 0 g windows-1251 RET,
  i see correctly re-encoded subject in the Article buffer:
  Subject: Книга  Котельникова и Чеботаева
  but in the Summary buffer, i see only "Книга";
  the rest of the subject is not shown.

just to let you know, so that it will not get forgotten :)

Best,
v.




* Re: one more bug in latest gnus
  2003-05-30 17:45       ` Vladimir Volovich
@ 2003-06-02  2:20         ` Jesper Harder
  0 siblings, 0 replies; 8+ messages in thread
From: Jesper Harder @ 2003-06-02  2:20 UTC


Vladimir Volovich <vvv@vsu.ru> writes:

> i saw some changes in CVS, and just synced to see if that fixed this
> bug. this bug is not yet fixed, i.e.
>
>   after pressing C-d on the first message in this thread, and
>   after doing reencoding with 0 g windows-1251 RET,
>   i see correctly re-encoded subject in the Article buffer:
>   Subject: Книга  Котельникова и Чеботаева
>   but in the Summary buffer, i see only "Книга";
>   the rest of the subject is not shown.
>
> just to let you know, so that it will not get forgotten :)

This patch appears to fix it.  But I haven't checked carefully that it
doesn't break anything else.


*** /home/harder/gnus/lisp/gnus-sum.el	Sat May 17 18:14:30 2003
--- /home/harder/cvsgnus/lisp/gnus-sum.el	Mon Jun  2 04:18:16 2003
***************
*** 5713,5718 ****
--- 5713,5719 ----
        ;; Translate all TAB characters into SPACE characters.
        (subst-char-in-region (point-min) (point-max) ?\t ?  t)
        (subst-char-in-region (point-min) (point-max) ?\r ?  t)
+       (ietf-drums-unfold-fws)
        (gnus-run-hooks 'gnus-parse-headers-hook)
        (let ((case-fold-search t)
  	    in-reply-to header p lines chars)





* Re: one more bug in latest gnus
  2003-05-27 14:01 ` Vladimir Volovich
@ 2003-05-27 22:02   ` Simon Josefsson
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Josefsson @ 2003-05-27 22:02 UTC
  Cc: ding

Vladimir Volovich <vvv@vsu.ru> writes:

> Hi,
>
> one more bug:
>
> when you press RET on the message button (after previewing the
> previous message in this thread) you'll see
>
> Subject: йМХЦЮ  йНРЕКЭМХЙНБЮ Х=?KOI8-R?Q?_=D7=E5=E1=EE=F2=E0=E5=E2=E0?=
>
> I.e., gnus did not decode the whole subject - some encoded-words were
> left undecoded.

This also appears to be fixed by Jesper's patch.





* Re: one more bug in latest gnus
       [not found] <bavqsu$46t$1@quimby.gnus.org>
@ 2003-05-27 14:01 ` Vladimir Volovich
  2003-05-27 22:02   ` Simon Josefsson
  0 siblings, 1 reply; 8+ messages in thread
From: Vladimir Volovich @ 2003-05-27 14:01 UTC


Hi,

one more bug:

when you press RET on the message button (after previewing the
previous message in this thread) you'll see

Subject: йМХЦЮ  йНРЕКЭМХЙНБЮ Х=?KOI8-R?Q?_=D7=E5=E1=EE=F2=E0=E5=E2=E0?=

I.e., gnus did not decode the whole subject - some encoded-words were
left undecoded.

It looks like gnus does not decode headers consistently: in some
places it fails to decode multiline headers. It should use a single
set of functions that deal with headers properly, instead of
special-purpose functions in several places, some of which appear to
have bugs.

Best,
v.




Thread overview: 8+ messages
2003-05-27 13:45 one more bug in latest gnus Vladimir Volovich
2003-05-27 21:57 ` Simon Josefsson
2003-05-28 22:26   ` Dave Love
2003-05-30  9:44     ` Dave Love
     [not found]     ` <bb7b07$99b$1@quimby.gnus.org>
2003-05-30 17:45       ` Vladimir Volovich
2003-06-02  2:20         ` Jesper Harder
     [not found] <bavqsu$46t$1@quimby.gnus.org>
2003-05-27 14:01 ` Vladimir Volovich
2003-05-27 22:02   ` Simon Josefsson