* Avoiding double encoding in subject
@ 2006-07-26 11:00 Reiner Steib
2006-07-27 15:55 ` Michael Piotrowski
0 siblings, 1 reply; 4+ messages in thread
From: Reiner Steib @ 2006-07-26 11:00 UTC (permalink / raw)
Hi,
when replying to an article whose subject contains an unknown (or
invalid) encoding...
| Newsgroups: gmane.test
| Archived-At: <http://permalink.gmane.org/gmane.test/3051>
| Message-ID: <v94px7cjcs.fsf@marauder.physik.uni-ulm.de>
| Subject: bogus or unknown charset in =?iso-8859-17?Q?=E4?= subject
-- i.e. the charset is unknown to (X)Emacs[1] -- and not present in
`mm-charset-synonym-alist', Gnus produces a subject like...
| Newsgroups: gmane.test
| Archived-At: <http://permalink.gmane.org/gmane.test/3052>
| Message-ID: <v9y7ujb39h.fsf@marauder.physik.uni-ulm.de>
| Subject: Re: bogus or unknown charset in =?us-ascii?Q?=3D=3Fiso-8859-17=3F?=
| =?us-ascii?Q?Q=3F=3DE4=3F=3D?= subject
Bad.
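To illustrate how the second header arises, here is a minimal Python
sketch (not Gnus code; `q_encode_word` is a hypothetical helper written
for this example). It assumes the undecoded word is treated as ordinary
text and Q-encoded again as us-ascii, so every "=" and "?" of the
original word gets escaped:

```python
# Characters left unescaped inside an encoded word
# (RFC 2047, section 5, rule 3).
SAFE = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!*+-/")


def q_encode_word(text: str, charset: str = "us-ascii") -> str:
    """Wrap text in a single RFC 2047 Q-encoded word (no line folding)."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        elif ch == " ":
            out.append("_")
        else:
            out.append(f"={byte:02X}")
    return f"=?{charset}?Q?{''.join(out)}?="


undecoded = "=?iso-8859-17?Q?=E4?="
double = q_encode_word(undecoded)
print(double)
```

Apart from being folded into two encoded words, the payload is the same
as in the Subject header of the reply quoted above.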
I'm not sure how Gnus should handle this situation. Some
possibilities:
(1) Gnus could (probably, I don't know if it is feasible to implement
this) mark the Subject as "not decoded" and resend it "as is"
without the double[2] us-ascii encoding. Gnus also has to make
sure that this mark survives when the article is saved to the
drafts folder.
Problem: If the given charset is really invalid rather than
unknown (the user usually can't decide), Gnus will also produce an
incorrect article.
(2) Gnus' decoder could replace the unknown/invalid characters with a
replacement character ("?", U+FFFD = REPLACEMENT CHARACTER, ...).
Problem: It's probably not possible to get the number of
replacement characters right.
Other suggestions?
Bye, Reiner.
[1] This may happen when (X)Emacs is too old to support a newly
introduced charset.
--
,,,
(o o)
---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/
* Re: Avoiding double encoding in subject
2006-07-26 11:00 Avoiding double encoding in subject Reiner Steib
@ 2006-07-27 15:55 ` Michael Piotrowski
2006-11-06 20:00 ` Reiner Steib
0 siblings, 1 reply; 4+ messages in thread
From: Michael Piotrowski @ 2006-07-27 15:55 UTC (permalink / raw)
On 2006-07-26 Reiner Steib <reinersteib+gmane@imap.cc> wrote:
> when replying to an article whose subject contains an unknown (or
> invalid) encoding...
[...]
> I'm not sure how Gnus should handle this situation. Some
> possibilities:
[...]
> (2) Gnus' decoder could replace the unknown/invalid characters with a
> replacement character ("?", U+FFFD = REPLACEMENT CHARACTER, ...).
>
> Problem: It's probably not possible to get the number of
> replacement characters right.
Something like option (2) plus user interaction might be the right
approach:
Gnus detects an RFC 2047-encoded string in a header. If the charset is
known, everything is fine and it's decoded. If the charset is
unknown, ask the user, e.g.:
Unknown charset "iso-8859-17" in Subject header, what now? (c, u or C-h):
The user could then either type "c" and specify a charset which should
be used to interpret it or type "u" to accept it as unknown; in this
case I'd simply replace each octet with the replacement character
since there is no way to find out the character size of an unknown
encoding.
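The proposal can be sketched in Python roughly as follows. This is only
an illustration under stated assumptions, not how Gnus does it:
`decode_subject`, `q_decode`, `is_known` and the `choose` callback are
hypothetical names, and `choose` stands in for the interactive prompt
(return a charset name for "c", or None for "u"). The sketch ignores
the RFC 2047 rule about whitespace between adjacent encoded words.

```python
import base64
import codecs
import re

ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def q_decode(text: str) -> bytes:
    """Decode a Q payload: "_" is a space, "=XX" is one octet."""
    raw = text.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def is_known(charset: str) -> bool:
    """True if this Python installation has a codec for charset."""
    try:
        codecs.lookup(charset)
        return True
    except LookupError:
        return False


def decode_subject(subject: str, choose=lambda charset: None) -> str:
    """Decode encoded words; unknown charsets go through choose()."""
    def decode_word(match):
        charset, encoding, text = match.groups()
        octets = base64.b64decode(text) if encoding in "Bb" else q_decode(text)
        if not is_known(charset):
            charset = choose(charset)  # "c": a charset name, "u": None
        if charset is None or not is_known(charset):
            # Character size is unknown, so use one replacement per octet.
            return REPLACEMENT * len(octets)
        return octets.decode(charset, errors="replace")

    return ENCODED_WORD.sub(decode_word, subject)


subject = "bogus or unknown charset in =?iso-8859-17?Q?=E4?= subject"
print(decode_subject(subject))
print(decode_subject(subject, choose=lambda charset: "iso-8859-1"))
```

With the default callback the word becomes a single U+FFFD; with a
callback that answers "iso-8859-1" it becomes "ä".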
Greetings
--
Michael Piotrowski, M.A. <mxp@dynalabs.de>
Public key at <http://www.dynalabs.de/mxp/pubkey.txt>
* Re: Avoiding double encoding in subject
2006-07-27 15:55 ` Michael Piotrowski
@ 2006-11-06 20:00 ` Reiner Steib
2006-11-09 18:13 ` Reiner Steib
0 siblings, 1 reply; 4+ messages in thread
From: Reiner Steib @ 2006-11-06 20:00 UTC (permalink / raw)
[ Quotation added because I'm replying to an old message ]
On Thu, Jul 27 2006, Michael Piotrowski wrote:
> On 2006-07-26 Reiner Steib <reinersteib+gmane@imap.cc> wrote:
>
>> when replying to an article whose subject contains an unknown (or
>> invalid) encoding...
>>
>> | Newsgroups: gmane.test
>> | Archived-At: <http://permalink.gmane.org/gmane.test/3051>
>> | Message-ID: <v94px7cjcs.fsf@marauder.physik.uni-ulm.de>
>> | Subject: bogus or unknown charset in =?iso-8859-17?Q?=E4?= subject
>>
>> -- i.e. the charset is unknown to (X)Emacs[1] -- and not present in
>> `mm-charset-synonym-alist', Gnus produces a subject like...
>>
>> | Newsgroups: gmane.test
>> | Archived-At: <http://permalink.gmane.org/gmane.test/3052>
>> | Message-ID: <v9y7ujb39h.fsf@marauder.physik.uni-ulm.de>
>> | Subject: Re: bogus or unknown charset in =?us-ascii?Q?=3D=3Fiso-8859-17=3F?=
>> | =?us-ascii?Q?Q=3F=3DE4=3F=3D?= subject
>>
>> Bad.
>>
>> I'm not sure how Gnus should handle this situation. Some
>> possibilities:
>
>> (1) Gnus could (probably, I don't know if it is feasible to implement
>> this) mark the Subject as "not decoded" and resend it "as is"
>> without the double[2] us-ascii encoding. Gnus also has to make
>> sure that this mark survives when the article is saved to the
>> drafts folder.
>>
>> Problem: If the given charset is really invalid rather than
>> unknown (the user usually can't decide), Gnus will also produce an
>> incorrect article.
>>
>> (2) Gnus' decoder could replace the unknown/invalid characters with a
>> replacement character ("?", U+FFFD = REPLACEMENT CHARACTER, ...).
>>
>> Problem: It's probably not possible to get the number of
>> replacement characters right.
The fact that we only need to do something when the user replies to a
message makes it simpler. We can tackle the problem when the user
hits `F', like we do when stripping "(was: ...)".
> Something like option (2) plus user interaction might be the right
> approach:
>
> Gnus detects an RFC 2047-encoded string in a header. If the charset is
> known, everything is fine and it's decoded. If the charset is
> unknown, ask the user, e.g.:
>
> Unknown charset "iso-8859-17" in Subject header, what now? (c, u or C-h):
>
> The user could then either type "c" and specify a charset which should
> be used to interpret it or type "u" to accept it as unknown; in this
> case I'd simply replace each octet with the replacement character
> since there is no way to find out the character size of an unknown
> encoding.
I've added the new function `message-strip-subject-encoded-words' to
CVS trunk, but I didn't enable it yet. If nobody finds a problem, I'll
enable it by default tomorrow and will later merge it to v5-10 as
well.
I'd appreciate it if someone could look at the code, and I'd also like
to ask people to test it by adding it to
`message-simplify-subject-functions'[1] now.
Bye, Reiner.
[1]
;; The third argument `t' appends the function to the end of the list.
(add-to-list 'message-simplify-subject-functions
             'message-strip-subject-encoded-words
             t)
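
For readers who don't use Emacs Lisp, the idea of a list of subject
simplifiers that run in order can be sketched in Python. This is not a
port of `message-strip-subject-encoded-words'; the function names below
are hypothetical, and replacing a leftover encoded word with a single
"?" is just one assumed policy for the example:

```python
import re

# An encoded word that is still present after decoding was attempted.
LEFTOVER_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")


def strip_was(subject: str) -> str:
    """Drop a trailing "(was: ...)" remark."""
    return re.sub(r"\s*\(was:.*\)\s*$", "", subject, flags=re.IGNORECASE)


def strip_encoded_words(subject: str) -> str:
    """Replace each leftover encoded word with a single "?"."""
    return LEFTOVER_WORD.sub("?", subject)


simplify_functions = [strip_was]
# Appended at the end, like the `t' argument to add-to-list.
simplify_functions.append(strip_encoded_words)


def simplify_subject(subject: str) -> str:
    """Apply every simplifier in list order."""
    for function in simplify_functions:
        subject = function(subject)
    return subject


print(simplify_subject(
    "bogus or unknown charset in =?iso-8859-17?Q?=E4?= subject (was: old topic)"))
```

Because the simplifiers run in list order, the appended function sees
the subject only after the "(was: ...)" part has been removed.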
--
,,,
(o o)
---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/
* Re: Avoiding double encoding in subject
2006-11-06 20:00 ` Reiner Steib
@ 2006-11-09 18:13 ` Reiner Steib
0 siblings, 0 replies; 4+ messages in thread
From: Reiner Steib @ 2006-11-09 18:13 UTC (permalink / raw)
On Mon, Nov 06 2006, Reiner Steib wrote:
> I've added the new function `message-strip-subject-encoded-words' to
> CVS trunk, but I didn't enable it yet. If nobody finds a problem, I'll
> enable it by default tomorrow and will later merge it to v5-10 as
> well.
Done. I have found and fixed some problems.
See the threads starting with
<news:4adc91846507dfc3bbd2f23c7cc32a39@wachinger.fqdn.th-h.de> in
de.comm.software.newsreader and <news:eiheec$acn$1@online.de> in
de.comp.editoren for some examples in the wild. (If you don't have
access to de.*, I could send interested testers an mbox file with these
threads.)
Or the thread starting with
<news:v94px7cjcs.fsf@marauder.physik.uni-ulm.de> in gmane.test
<http://thread.gmane.org/v94px7cjcs.fsf@marauder.physik.uni-ulm.de>
(BTW: Loom, Gmane's web interface, doesn't show the articles
correctly.)
`message-strip-subject-encoded-words' doesn't correct a similar
problem which was mentioned in
<news:v9irka2xuc.fsf_-_@marauder.physik.uni-ulm.de>:
The following encoding of "wöchentlich?" is wrong:
| Subject: wrong encoding: =?utf-8?Q?w=C3=B6chentlich??=
This is correct:
| Subject: wrong encoding: =?utf-8?Q?w=C3=B6chentlich=3F?=
Gnus doesn't detect the mistake, and the subject in a reply will be
double encoded as well (which is technically correct, but maybe being a
little more "liberal in what you accept" would be okay):
,----
| Subject: Re: wrong encoding: =?us-ascii?Q?=3D=3Futf-8=3FQ=3Fw=3DC3=3DB6che?=
| =?us-ascii?Q?ntlich=3F=3F=3D?=
`----
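
The difference between the two spellings, and what a more liberal
decoder might accept, can be sketched in Python. This is a sketch under
assumptions, not Gnus behaviour: the names are hypothetical, the strict
pattern follows the RFC 2047 rule that the encoded text contains neither
"?" nor whitespace, and the lenient pattern simply treats the last "?="
of the token as the terminator. It assumes the charset is known to
Python.

```python
import re

# RFC 2047: the encoded text may not contain "?" or whitespace.
STRICT_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]+)\?=")
# Lenient variant for Q words: allow a raw "?" inside the payload.
LENIENT_Q_WORD = re.compile(r"=\?([^?\s]+)\?[Qq]\?(\S*)\?=")


def is_strictly_valid(token: str) -> bool:
    """True if the whole token is a well-formed encoded word."""
    return STRICT_WORD.fullmatch(token) is not None


def q_decode(text: str) -> bytes:
    """Decode a Q payload: "_" is a space, "=XX" is one octet."""
    raw = text.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def lenient_q_decode(token: str):
    """Decode a Q word even if it has an unescaped "?"; None if no match."""
    match = LENIENT_Q_WORD.fullmatch(token)
    if match is None:
        return None
    charset, text = match.groups()
    return q_decode(text).decode(charset, errors="replace")


wrong = "=?utf-8?Q?w=C3=B6chentlich??="
right = "=?utf-8?Q?w=C3=B6chentlich=3F?="
print(is_strictly_valid(wrong), is_strictly_valid(right))
print(lenient_q_decode(wrong))
```

The strict check rejects the first spelling and accepts the second,
while the lenient decoder recovers "wöchentlich?" from both.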
Bye, Reiner.
--
,,,
(o o)
---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/