Gnus development mailing list
From: Katsumi Yamaoka <yamaoka@jpl.org>
To: ding@gnus.org
Subject: Re: rfc2047 decoding
Date: Thu, 07 Jan 2016 09:45:35 +0900
Message-ID: <b4mbn8yb4k0.fsf@jpl.org>
In-Reply-To: <8737uawenw.fsf@tullinup.koldfront.dk>

On Wed, 06 Jan 2016 23:01:39 +0100, Adam Sjøgren wrote:
> What is the correct decoding of this header:

> Subject: =?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=
>  =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='
>  =?UTF-8?Q?s=20gr=C3=A6nsekontrol?=

To begin with, the header itself is encoded incorrectly, in
violation of RFC 2047:
,----
| 5. Use of encoded-words in message headers
| [...]
| (1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
| [...]
|     Ordinary ASCII text and 'encoded-word's may appear together in the
|     same header field.  However, an 'encoded-word' that appears in a
|     header field defined as '*text' MUST be separated from any adjacent
|     'encoded-word' or 'text' by 'linear-white-space'.
`----

That is, "=?UTF-8?Q?l=20DSB?=" and "'" are joined without an
intervening SPC.  The original text seems to be "DSB's", not
"DSB' s", so the correct encoding would put the whole word into a
single encoded word, "=?utf-8?Q?DSB's?=".  Gnus would not do that,
though, since a word that contains no non-ASCII characters does
not need to be encoded at all.

(let ((mm-coding-system-priorities '(utf-8)))
  (rfc2047-encode-string "Subject:\
 Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol"))
"Subject: Hackerangreb mod =?utf-8?Q?it-leverand=C3=B8r?= bag app til DSB's
 =?utf-8?Q?gr=C3=A6nsekontrol?="

That's excellent, isn't it? :)

> Is it:

>  "Hackerangreb mod it-leverandør bag app til DSB' s grænsekontrol" - or:
>  "Hackerangreb mod it-leverandør bag app til DSB's grænsekontrol" ?

> Gnus decodes it to the first line. So I'm inclined to think that is
> correct.

There is probably no prescribed way to decode incorrectly encoded
data, and Gnus's way is not necessarily the best.
What Gnus does is simple: it concatenates the decoded text of
successive encoded words without SPC[1] and leaves everything
else as-is[2].

[1] In reality, rfc2047.el concatenates successive encoded words
    without SPC, and then decodes the result.
[2] This is why a SPC appears between "'" and "s".
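
The rule described above can be sketched roughly as follows.  This
is only an illustration and not the code in rfc2047.el: the names
sketch-q-decode and sketch-decode-header are made up for this
example, only UTF-8 with Q encoding is handled, malformed "=XX"
escapes are not checked, and each encoded word is decoded on its
own rather than after concatenation as in [1].

```elisp
(defun sketch-q-decode (text)
  "Decode the Q-encoded payload TEXT, assuming its charset is UTF-8."
  (let ((i 0) (n (length text)) bytes)
    (while (< i n)
      (let ((c (aref text i)))
        (cond
         ;; "=XX" is one byte written as two hex digits.
         ((and (eq c ?=) (<= (+ i 3) n))
          (push (string-to-number (substring text (1+ i) (+ i 3)) 16) bytes)
          (setq i (+ i 3)))
         ;; "_" stands for SPC in Q encoding.
         ((eq c ?_) (push ?\s bytes) (setq i (1+ i)))
         (t (push c bytes) (setq i (1+ i))))))
    (decode-coding-string (apply #'unibyte-string (nreverse bytes)) 'utf-8)))

(defun sketch-decode-header (header)
  "Decode UTF-8 Q encoded words in HEADER.
Whitespace between two successive encoded words is dropped;
all other text is left as-is."
  (let ((case-fold-search t)
        ;; Unfold continuation lines: newline + WSP becomes WSP.
        (header (replace-regexp-in-string "\r?\n\\([ \t]\\)" "\\1" header))
        (pos 0) prev-word parts)
    (while (string-match "=\\?utf-8\\?q\\?\\([^? \t\n]*\\)\\?=" header pos)
      (let ((beg (match-beginning 0))
            (end (match-end 0))
            (payload (match-string 1 header)))
        (let ((gap (substring header pos beg)))
          ;; Keep the gap unless it is pure whitespace between two
          ;; encoded words.
          (unless (and prev-word
                       (string-match-p "\\`[ \t\r\n]+\\'" gap))
            (push gap parts)))
        (push (sketch-q-decode payload) parts)
        (setq pos end prev-word t)))
    ;; Whatever follows the last encoded word is kept unchanged.
    (push (substring header pos) parts)
    (apply #'concat (nreverse parts))))

(sketch-decode-header
 (concat "Subject: =?UTF-8?Q?Hackerangreb=20mod=20it=2Dl?=\n"
         " =?UTF-8?Q?everand=C3=B8r=20bag=20app=20ti?= =?UTF-8?Q?l=20DSB?='\n"
         " =?UTF-8?Q?s=20gr=C3=A6nsekontrol?="))
```

With the header in question, the "' " between the third and fourth
encoded words is not pure whitespace, so it is kept, and the sketch
returns the "DSB' s" form.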


