Gnus development mailing list
* rfc2047 decoding
@ 2016-01-06 22:01 Adam Sjøgren
  2016-01-06 22:30 ` Bjørn Mork
  2016-01-07  0:45 ` Katsumi Yamaoka
  0 siblings, 2 replies; 5+ messages in thread
From: Adam Sjøgren @ 2016-01-06 22:01 UTC
  To: ding

What is the correct decoding of this header:

Subject: =?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=
 =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='
 =?UTF-8?Q?s=20gr=C3=A6nsekontrol?=

Is it:

 "Hackerangreb mod it-leverandør bag app til DSB' s grænsekontrol" - or:
 "Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol" ?

Gnus decodes it to the first line. So I'm inclined to think that is
correct.

Perl's Encode::MIME::Header decodes it to the second line. But it also
encoded the text in the first place, so I guess it {s,w}ould roundtrip.

What gives?


  Best regards,

    Adam, who is ready to weep about rfc2047 apparently being too hard
          to get right.

-- 
 "Where the world is going?                                   Adam Sjøgren
  Back to where it once was"                             asjo@koldfront.dk





* Re: rfc2047 decoding
  2016-01-06 22:01 rfc2047 decoding Adam Sjøgren
@ 2016-01-06 22:30 ` Bjørn Mork
  2016-01-06 23:02   ` Adam Sjøgren
  2016-01-07  0:45 ` Katsumi Yamaoka
  1 sibling, 1 reply; 5+ messages in thread
From: Bjørn Mork @ 2016-01-06 22:30 UTC
  To: Adam Sjøgren; +Cc: ding

asjo@koldfront.dk (Adam Sjøgren) writes:

> What is the correct decoding of this header:
>
> Subject: =?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=
>  =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='
>  =?UTF-8?Q?s=20gr=C3=A6nsekontrol?=
>
> Is it:
>
>  "Hackerangreb mod it-leverandør bag app til DSB' s grænsekontrol" - or:
>  "Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol" ?
>
> Gnus decodes it to the first line. So I'm inclined to think that is
> correct.
>
> Perl's Encode::MIME::Header decodes it to the second line. But it also
> encoded the text in the first place, so I guess it {s,w}ould roundtrip.
>
> What gives?

The "=?UTF-8?Q?l=20DSB?='" part of the header is invalid:

    Ordinary ASCII text and 'encoded-word's may appear together in the
    same header field.  However, an 'encoded-word' that appears in a
    header field defined as '*text' MUST be separated from any adjacent
    'encoded-word' or 'text' by 'linear-white-space'.


So the formally correct decoding would be:

  "Hackerangreb mod it-leverandør bag app ti=?UTF-8?Q?l=20DSB?='s grænsekontrol"

But I believe both your examples are more reasonable attempts in the
spirit of rfc1123.
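
To make the separation rule quoted above concrete, here is a minimal
Python sketch that flags any whitespace-delimited token in which an
encoded-word is glued to other text. It is only an illustration: the
helper name find_unseparated and the regular expression are assumptions
of this sketch (a simplified form of the RFC 2047 grammar, with no
length or charset checks), not code from Gnus or Encode::MIME::Header.

```python
import re

# Simplified encoded-word pattern: =?charset?encoding?encoded-text?=
# (assumption: good enough for illustration, not the full RFC 2047 grammar)
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def find_unseparated(value):
    """Return tokens where an encoded-word touches other text directly,
    i.e. is not separated from it by linear white space."""
    bad = []
    for token in value.split():
        # The token contains an encoded-word but is not *only* an encoded-word.
        if ENCODED_WORD.search(token) and not ENCODED_WORD.fullmatch(token):
            bad.append(token)
    return bad


header = ("=?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=\n"
          " =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='\n"
          " =?UTF-8?Q?s=20gr=C3=A6nsekontrol?=")

print(find_unseparated(header))  # ["=?UTF-8?Q?l=20DSB?='"]
```

Only the third token is reported, because the "'" follows the closing
"?=" with no white space in between.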


Bjørn




* Re: rfc2047 decoding
  2016-01-06 22:30 ` Bjørn Mork
@ 2016-01-06 23:02   ` Adam Sjøgren
  2016-01-26 20:59     ` Adam Sjøgren
  0 siblings, 1 reply; 5+ messages in thread
From: Adam Sjøgren @ 2016-01-06 23:02 UTC
  To: ding

Bjørn writes:

>> Perl's Encode::MIME::Header decodes it to the second line. But it also
>> encoded the text in the first place, so I guess it {s,w}ould roundtrip.

> The "=?UTF-8?Q?l=20DSB?='" part of the header is invalid:
>
>     Ordinary ASCII text and 'encoded-word's may appear together in the
>     same header field.  However, an 'encoded-word' that appears in a
>     header field defined as '*text' MUST be separated from any adjacent
>     'encoded-word' or 'text' by 'linear-white-space'.

Great, I shall report that as a bug against Encode::MIME::Header.

I knew this was where to find experts in rfc 2047 :-)

E:M:H also encodes other things weirdly, like adding "\n " at the
beginning of a line of ASCII(!)


  Thanks!

   Adam

-- 
 "With the possible exception of things like box              Adam Sjøgren
  scores, race results, and stock market tabulations,    asjo@koldfront.dk
  there is no such thing as Objective Journalism. The
  phrase itself is a pompous contradiction in terms."





* Re: rfc2047 decoding
  2016-01-06 22:01 rfc2047 decoding Adam Sjøgren
  2016-01-06 22:30 ` Bjørn Mork
@ 2016-01-07  0:45 ` Katsumi Yamaoka
  1 sibling, 0 replies; 5+ messages in thread
From: Katsumi Yamaoka @ 2016-01-07  0:45 UTC
  To: ding

On Wed, 06 Jan 2016 23:01:39 +0100, Adam Sjøgren wrote:
> What is the correct decoding of this header:

> Subject: =?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=
>  =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='
>  =?UTF-8?Q?s=20gr=C3=A6nsekontrol?=

To begin with, the header is encoded incorrectly, in violation of RFC 2047:
,----
| 5. Use of encoded-words in message headers
| [...]
| (1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
| [...]
|     Ordinary ASCII text and 'encoded-word's may appear together in the
|     same header field.  However, an 'encoded-word' that appears in a
|     header field defined as '*text' MUST be separated from any adjacent
|     'encoded-word' or 'text' by 'linear-white-space'.
`----

I mean "=?UTF-8?Q?l=20DSB?=" and "'" are concatenated without
SPC.  The original text seems to be "DSB's", not "DSB' s", and
the correct encoding would be to encode the whole word as
"=?utf-8?Q?DSB's?=".  Gnus does not do so, though, since a word
that contains no non-ASCII letters does not need to be encoded.

(let ((mm-coding-system-priorities '(utf-8)))
  (rfc2047-encode-string "Subject:\
 Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol"))
"Subject: Hackerangreb mod =?utf-8?Q?it-leverand=C3=B8r?= bag app til DSB's
 =?utf-8?Q?gr=C3=A6nsekontrol?="

That's excellent, isn't it? :)
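
For readers without Gnus at hand, the same word-by-word idea can be
sketched in Python. This is a rough approximation under stated
assumptions, not a port of rfc2047.el: the names q_encode_word and
encode_words are hypothetical, the set of unescaped characters is a
conservative choice, and the sketch ignores line folding, the
75-character limit on encoded-words, and the case of two adjacent
non-ASCII words (where the space between them would have to be encoded
as well).

```python
# Bytes left unescaped inside a Q-encoded word (assumption: the
# conservative set of letters, digits and "!*+-/").
SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
           b"0123456789!*+-/")


def q_encode_word(word, charset="utf-8"):
    """Q-encode a single word as =?charset?Q?...?=."""
    out = []
    for byte in word.encode(charset):
        if byte in SAFE:
            out.append(chr(byte))
        elif byte == 0x20:
            out.append("_")            # "_" stands for a space in Q encoding
        else:
            out.append("=%02X" % byte)  # everything else is hex-escaped
    return "=?%s?Q?%s?=" % (charset, "".join(out))


def encode_words(text):
    """Encode only the words that contain non-ASCII characters."""
    return " ".join(w if w.isascii() else q_encode_word(w)
                    for w in text.split(" "))


print(encode_words(
    "Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol"))
```

With the subject from this thread, "DSB's" stays plain ASCII and only
"it-leverandør" and "grænsekontrol" become encoded-words, each
separated from its neighbours by a space.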

> Is it:

>  "Hackerangreb mod it-leverandør bag app til DSB' s grænsekontrol" - or:
>  "Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol" ?

> Gnus decodes it to the first line. So I'm inclined to think that is
> correct.

Probably there is no prescribed way to decode invalidly encoded
data, and Gnus's way is not necessarily the best.
What Gnus does is simple: concatenate decoded successive encoded
words without SPC[1] and leave the others as-is[2].

[1] In reality, rfc2047.el concatenates successive encoded words
    without SPC, and then decodes it.
[2] This is why a SPC appears between "'" and "s".
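
The behaviour described above can be approximated in a few lines of
Python. This is a sketch of the described strategy, not a port of
rfc2047.el: the names decode_word and decode_lenient and the simplified
regular expression are assumptions of this example, and unlike
footnote [1] it decodes each encoded-word before joining, so a
multibyte character split across two encoded-words would not survive.

```python
import base64
import re

# Simplified encoded-word pattern; it matches anywhere in the text, so an
# encoded-word glued to a following "'" is still recognised.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def decode_word(match):
    """Decode one encoded-word match to a string."""
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # Q encoding: "_" is a space, "=XX" is a hex-escaped byte.
        text = re.sub(r"=([0-9A-Fa-f]{2})",
                      lambda m: chr(int(m.group(1), 16)),
                      payload.replace("_", " "))
        raw = text.encode("latin-1")
    return raw.decode(charset, errors="replace")


def decode_lenient(value):
    """Join successive encoded-words without SPC; leave other text as-is."""
    value = re.sub(r"\r?\n(?=[ \t])", "", value)  # unfold continuation lines
    out, pos, seen_encoded = [], 0, False
    for m in ENCODED_WORD.finditer(value):
        gap = value[pos:m.start()]
        # White space between two encoded-words is dropped; any other
        # text between them is kept untouched.
        if not seen_encoded or gap.strip():
            out.append(gap)
        out.append(decode_word(m))
        pos, seen_encoded = m.end(), True
    out.append(value[pos:])
    return "".join(out)


header = ("=?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=\n"
          " =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='\n"
          " =?UTF-8?Q?s=20gr=C3=A6nsekontrol?=")

print(decode_lenient(header))
# Hackerangreb mod it-leverandør bag app til DSB' s grænsekontrol
```

The gap between "=?UTF-8?Q?l=20DSB?=" and the last encoded-word is
"' ", which is not pure white space, so it is kept verbatim. That is
where the SPC in "DSB' s" comes from.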




* Re: rfc2047 decoding
  2016-01-06 23:02   ` Adam Sjøgren
@ 2016-01-26 20:59     ` Adam Sjøgren
  0 siblings, 0 replies; 5+ messages in thread
From: Adam Sjøgren @ 2016-01-26 20:59 UTC
  To: ding

Adam writes:

> Great, I shall report that as a bug against Encode::MIME::Header.

And just as I was about to contribute failing tests, the author fixed the
problems!

 · https://github.com/dankogai/p5-encode/commits/master

Thanks for your comments.


  Best regards,

    Adam

-- 
 "Och när jag blundar hörs din röst                           Adam Sjøgren
  Jag kan inte se ditt ansikte                           asjo@koldfront.dk
  Det var det jag glömde först"





end of thread, other threads:[~2016-01-26 20:59 UTC | newest]

Thread overview: 5+ messages
2016-01-06 22:01 rfc2047 decoding Adam Sjøgren
2016-01-06 22:30 ` Bjørn Mork
2016-01-06 23:02   ` Adam Sjøgren
2016-01-26 20:59     ` Adam Sjøgren
2016-01-07  0:45 ` Katsumi Yamaoka
