* rfc2047 decoding
@ 2016-01-06 22:01 Adam Sjøgren
2016-01-06 22:30 ` Bjørn Mork
2016-01-07 0:45 ` Katsumi Yamaoka
From: Adam Sjøgren @ 2016-01-06 22:01 UTC (permalink / raw)
To: ding
What is the correct decoding of this header:
Subject: =?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=
 =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='
 =?UTF-8?Q?s=20gr=C3=A6nsekontrol?=
Is it:
"Hackerangreb mod it-leverandør bag app til DSB' s grænsekontrol" - or:
"Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol" ?
Gnus decodes it to the first line. So I'm inclined to think that is
correct.
Perl's Encode::MIME::Header decodes it to the second line. But it also
encoded the text in the first place, so I guess it {s,w}ould roundtrip.
What gives?
Best regards,
Adam, who is ready to weep about rfc2047 apparently being too hard
to get right.
--
"Where the world is going? Adam Sjøgren
Back to where it once was" asjo@koldfront.dk
* Re: rfc2047 decoding
From: Bjørn Mork @ 2016-01-06 22:30 UTC (permalink / raw)
To: Adam Sjøgren; +Cc: ding
asjo@koldfront.dk (Adam Sjøgren) writes:
> What is the correct decoding of this header:
>
> Subject: =?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=
>  =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='
>  =?UTF-8?Q?s=20gr=C3=A6nsekontrol?=
>
> Is it:
>
> "Hackerangreb mod it-leverandør bag app til DSB' s grænsekontrol" - or:
> "Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol" ?
>
> Gnus decodes it to the first line. So I'm inclined to think that is
> correct.
>
> Perl's Encode::MIME::Header decodes it to the second line. But it also
> encoded the text in the first place, so I guess it {s,w}ould roundtrip.
>
> What gives?
The "=?UTF-8?Q?l=20DSB?='" part of the header is invalid:
    Ordinary ASCII text and 'encoded-word's may appear together in the
    same header field. However, an 'encoded-word' that appears in a
    header field defined as '*text' MUST be separated from any adjacent
    'encoded-word' or 'text' by 'linear-white-space'.
So the formally correct decoding would be:
"Hackerangreb mod it-leverandør bag app ti=?UTF-8?Q?l=20DSB?='s grænsekontrol"
But I believe both your examples are more reasonable attempts in the
spirit of rfc1123.
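The separation rule quoted above lends itself to a mechanical check. What follows is a minimal Python sketch, assuming a '*text' field such as Subject; the function name and the regular expression are illustrative and are not taken from Gnus or Encode::MIME::Header. It reports every whitespace-delimited token that contains an encoded-word glued to something else.

```python
import re

# An encoded-word: =?charset?encoding?payload?=  (encoding is Q or B).
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")


def misplaced_encoded_words(value):
    """Return tokens in which an encoded-word touches other text.

    Assumption: the field is unstructured ('*text'), so linear white
    space is the only valid separator around an encoded-word.
    """
    bad = []
    for token in value.split():  # split on any run of whitespace
        if ENCODED_WORD.search(token) and not ENCODED_WORD.fullmatch(token):
            bad.append(token)
    return bad


SUBJECT = (
    "=?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=\n"
    " =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='\n"
    " =?UTF-8?Q?s=20gr=C3=A6nsekontrol?="
)
print(misplaced_encoded_words(SUBJECT))
```

For the header in question, only the third token is reported, because of the trailing apostrophe.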
Bjørn
* Re: rfc2047 decoding
From: Adam Sjøgren @ 2016-01-06 23:02 UTC (permalink / raw)
To: ding
Bjørn writes:
>> Perl's Encode::MIME::Header decodes it to the second line. But it also
>> encoded the text in the first place, so I guess it {s,w}ould roundtrip.
> The "=?UTF-8?Q?l=20DSB?='" part of the header is invalid:
>
> Ordinary ASCII text and 'encoded-word's may appear together in the
> same header field. However, an 'encoded-word' that appears in a
> header field defined as '*text' MUST be separated from any adjacent
> 'encoded-word' or 'text' by 'linear-white-space'.
Great, I shall report that as a bug against Encode::MIME::Header.
I knew this was where to find experts in rfc 2047 :-)
E:M:H also encodes other things weirdly, like adding "\n " at the
beginning of a line of ascii(!)
Thanks!
Adam
--
"With the possible exception of things like box Adam Sjøgren
scores, race results, and stock market tabulations, asjo@koldfront.dk
there is no such thing as Objective Journalism. The
phrase itself is a pompous contradiction in terms."
* Re: rfc2047 decoding
From: Katsumi Yamaoka @ 2016-01-07 0:45 UTC (permalink / raw)
To: ding
On Wed, 06 Jan 2016 23:01:39 +0100, Adam Sjøgren wrote:
> What is the correct decoding of this header:
> Subject: =?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=
>  =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='
>  =?UTF-8?Q?s=20gr=C3=A6nsekontrol?=
To begin with, the header is encoded incorrectly, in violation of RFC 2047:
,----
| 5. Use of encoded-words in message headers
| [...]
| (1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
| [...]
| Ordinary ASCII text and 'encoded-word's may appear together in the
| same header field. However, an 'encoded-word' that appears in a
| header field defined as '*text' MUST be separated from any adjacent
| 'encoded-word' or 'text' by 'linear-white-space'.
`----
I mean that "=?UTF-8?Q?l=20DSB?=" and "'" are concatenated without
SPC. The original text seems to be "DSB's", not "DSB' s", and
the correct encoding would be to encode the whole word as
"=?utf-8?Q?DSB's?=". Gnus does not do so, though, since a word that
contains no non-ASCII letters does not need to be encoded.
(let ((mm-coding-system-priorities '(utf-8)))
  (rfc2047-encode-string "Subject:\
 Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol"))
 ⇒ "Subject: Hackerangreb mod =?utf-8?Q?it-leverand=C3=B8r?= bag app til DSB's
 =?utf-8?Q?gr=C3=A6nsekontrol?="
That's excellent, isn't it? :)
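The word-by-word approach shown above can be approximated outside Emacs. The following is a rough Python sketch, not a port of rfc2047.el: the function names are made up for illustration, it assumes Python 3.7+ for `str.isascii`, and it ignores line folding and the 75-character limit on encoded-words. It also does not handle two adjacent non-ASCII words, where the separating space would have to go inside one of the encoded-words to survive decoding.

```python
# Bytes that may appear literally inside a Q-encoded word in this sketch:
# ASCII letters, digits, and "!*+-/" (the conservative set from RFC 2047).
SAFE = (
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789!*+-/"
)


def q_encode_word(word, charset="utf-8"):
    """Wrap one word in a Q-encoded encoded-word."""
    out = []
    for byte in word.encode(charset):
        if byte in SAFE:
            out.append(chr(byte))
        else:
            out.append("=%02X" % byte)  # hex escape, upper case
    return "=?%s?Q?%s?=" % (charset, "".join(out))


def encode_wordwise(text, charset="utf-8"):
    """Encode only the words that contain non-ASCII characters."""
    return " ".join(
        word if word.isascii() else q_encode_word(word, charset)
        for word in text.split(" ")
    )


print(encode_wordwise(
    "Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol"
))
```

Apart from the missing line fold, this yields the same encoded-words as the Gnus result above, and "DSB's" passes through untouched because it is pure ASCII.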
> Is it:
> "Hackerangreb mod it-leverandør bag app til DSB' s grænsekontrol" - or:
> "Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol" ?
> Gnus decodes it to the first line. So I'm inclined to think that is
> correct.
Probably there is no prescribed way to decode illegally encoded
data, and Gnus's way might not necessarily be the best.
What Gnus does is simple: it concatenates successive decoded encoded
words without SPC[1] and leaves everything else as-is[2].
[1] In reality, rfc2047.el concatenates successive encoded words
without SPC, and then decodes the result.
[2] This is why a SPC appears between "'" and "s".
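The behaviour described above can be imitated in a few lines. This is a hedged Python sketch of that lenient strategy, not a translation of rfc2047.el: all names are illustrative, and as a simplification it decodes each encoded-word separately and then joins the results, rather than joining first and decoding once as footnote [1] describes.

```python
import base64
import binascii
import re

# Lenient pattern: finds an encoded-word even when it touches other text.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def decode_word(charset, encoding, payload):
    """Decode the payload of a single encoded-word to text."""
    data = payload.encode("ascii")
    if encoding.upper() == "B":
        raw = base64.b64decode(data)
    else:
        # header=True makes "_" decode to a space, as Q encoding requires.
        raw = binascii.a2b_qp(data, header=True)
    return raw.decode(charset)


def decode_lenient(value):
    """Decode a header value, tolerating encoded-words glued to text."""
    value = re.sub(r"\r?\n(?=[ \t])", "", value)  # unfold continuation lines
    out = []
    pos = 0
    for match in ENCODED_WORD.finditer(value):
        gap = value[pos:match.start()]
        # A whitespace-only gap between two encoded-words is dropped;
        # any other gap is copied through unchanged.
        if pos == 0 or gap.strip(" \t"):
            out.append(gap)
        out.append(decode_word(*match.groups()))
        pos = match.end()
    out.append(value[pos:])
    return "".join(out)


SUBJECT = (
    "=?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=\n"
    " =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='\n"
    " =?UTF-8?Q?s=20gr=C3=A6nsekontrol?="
)
print(decode_lenient(SUBJECT))
```

On the header from the start of the thread, the gap "' " before the last encoded-word is not whitespace-only, so it is kept verbatim, which is where the space in "DSB' s" comes from.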
* Re: rfc2047 decoding
From: Adam Sjøgren @ 2016-01-26 20:59 UTC (permalink / raw)
To: ding
Adam writes:
> Great, I shall report that as a bug against Encode::MIME::Header.
And just as I was about to contribute failing tests, the author fixed the
problems!
· https://github.com/dankogai/p5-encode/commits/master
Thanks for your comments.
Best regards,
Adam
--
"Och när jag blundar hörs din röst Adam Sjøgren
Jag kan inte se ditt ansikte asjo@koldfront.dk
Det var det jag glömde först"