Gnus development mailing list
From: Katsumi Yamaoka <yamaoka@jpl.org>
To: ding@gnus.org
Subject: Re: More liberal MIME decoding (unencoded question marks in encoded words)
Date: Tue, 27 Nov 2007 18:34:12 +0900
Message-ID: <b4m7ik4jaln.fsf@jpl.org>
In-Reply-To: <v9y7ckoe1t.fsf@marauder.physik.uni-ulm.de>

I've installed the new regexps in the Gnus trunk.  Decoding of
malformed Q encoding is enabled by default.

>>>>> Reiner Steib wrote:
> On Mon, Nov 26 2007, Katsumi Yamaoka wrote:

>> Would we be able to make complete test cases?
>>
>> (rfc2047-decode-string "=?ISO-8859-1?Q??foo?=")
>> "?foo"

[...]

> Do you see the other examples often in the wild?

No, I've never seen any like those, even though I always examine
the raw data when decoding fails.  What I have seen is mainly broken
B encoding (99.9% of Japanese MIME messages use B encoding).

> If not, I'd rather not make the decode too liberal.

I thought it wasn't going too far, since it doesn't support encoded
words folded across two or more lines.  In fact, there is a reason
I didn't make it support newlines in encoded words: the regexp
pattern for Q encoding is ambiguous in a sense, so if it allowed
newlines, re-search might get stuck on an encoded word that is not
terminated with "?=".

FYI:

> +\\(B\\?[+/0-9A-Za-z]*=*\

This pattern is restricted to the characters that B encoding
uses, since the base64 decoder doesn't work on data containing
other characters.
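
As a rough illustration of that restriction, here is a minimal sketch
that applies only the character class quoted above to a candidate
payload.  The helper name `my-b-payload-p' is hypothetical and is not
part of rfc2047.el; the anchors are added just so the whole string is
checked.

```elisp
;; Hypothetical helper, not part of rfc2047.el: check whether TEXT uses
;; only the characters admitted by the B-encoding fragment quoted above.
(defun my-b-payload-p (text)
  "Return t if TEXT consists solely of base64 characters and \"=\" padding."
  (let ((case-fold-search nil))
    (and (string-match "\\`[+/0-9A-Za-z]*=*\\'" text) t)))

(my-b-payload-p "Zm9v")      ; => t, base64 alphabet only
(my-b-payload-p "Zm9vYg==")  ; => t, trailing "=" padding is allowed
(my-b-payload-p "Zm9v?x")    ; => nil, "?" is outside the alphabet
```

A payload rejected here would not be treated as a B-encoded word by
the regexp, so it never reaches the base64 decoder.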

> +\\|Q\\?\\(?:\\?+[ -<>@-~]\\)?\\(?:[ ->@-~]+\\?+[ -<>@-~]\\)*[ ->@-~]*\\?*\
> +\\)\\?="))

This pattern is similar to:

"Q\\?\\(\\?+[^\n=?]\\)?\\([^\n?]+\\?+[^\n=?]\\)*[^\n?]*\\?*"
     <--------1-------><----------2,3----------><--4--><-5->

1. After "Q?", allow "?"s that are followed by a character other
   than "=".
2. Allow "=" after "Q?"; it isn't regarded as the terminator.
3. In the middle of an encoded word, allow "?"s that are followed
   by a character other than "=".
4. Allow any characters other than "?" in the middle of an
   encoded word.
5. At the end, allow "?"s.
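
The five rules can be tried out with a small sketch built on the
simplified pattern above (not the exact one installed in Gnus).  The
names `my-loose-q-tail-regexp' and `my-loose-q-tail-p' are
hypothetical; the "\\?=" terminator and the string anchors are my
additions so that the part of an encoded word from "Q?" through "?="
can be tested on its own.

```elisp
;; Hypothetical names; this is the simplified pattern from the message
;; plus a "?=" terminator and anchors, not the regexp used by Gnus.
(defconst my-loose-q-tail-regexp
  (concat "\\`Q\\?"
          "\\(\\?+[^\n=?]\\)?"          ; 1: "?"s right after "Q?"
          "\\([^\n?]+\\?+[^\n=?]\\)*"   ; 2,3: "=" and inner "?"s
          "[^\n?]*"                     ; 4: anything but "?" or newline
          "\\?*"                        ; 5: trailing "?"s
          "\\?=\\'")                    ; terminator (added for the test)
  "Loose Q-encoding matcher for the tail of an encoded word.")

(defun my-loose-q-tail-p (tail)
  "Return t if TAIL, e.g. \"Q?foo?=\", matches the loose pattern."
  (let ((case-fold-search nil))
    (and (string-match my-loose-q-tail-regexp tail) t)))

(my-loose-q-tail-p "Q?foo?=")       ; => t, regular encoded text
(my-loose-q-tail-p "Q??foo?=")      ; => t, rule 1
(my-loose-q-tail-p "Q?a=b?=")       ; => t, rule 2
(my-loose-q-tail-p "Q?a?b?=")       ; => t, rule 3
(my-loose-q-tail-p "Q?foo??=")      ; => t, rule 5
(my-loose-q-tail-p "Q?foo\nbar?=")  ; => nil, newlines are not allowed
(my-loose-q-tail-p "Q?foo?=bar?=")  ; => nil, "?=" still terminates
```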

> And we probably should have an option to toggle strict/loose
> decoding.

I've introduced the `rfc2047-allow-irregular-q-encoded-words'
option.  I'd like it to be tested widely, so I've set the default
value to t, but it might have to be nil when it is imported into
the stable branch.  There are now two regexps:
`rfc2047-encoded-word-regexp' for strict decoding and
`rfc2047-encoded-word-regexp-loose' for loose decoding.
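
For anyone who wants to compare the two modes on a given header, a
minimal sketch follows.  It assumes an rfc2047.el that already has
this option; the helper `my-decode-both-ways' is hypothetical, and
what each call returns depends on the installed version, so no
expected values are shown.

```elisp
(require 'rfc2047)

;; Hypothetical helper, not part of Gnus: decode STRING once with the
;; option bound to nil (strict) and once with it bound to t (loose).
(defun my-decode-both-ways (string)
  "Return a cons (STRICT . LOOSE) of the two decodings of STRING."
  (cons (let ((rfc2047-allow-irregular-q-encoded-words nil))
          (rfc2047-decode-string string))
        (let ((rfc2047-allow-irregular-q-encoded-words t))
          (rfc2047-decode-string string))))

;; Try it on the irregular subject discussed in this thread.
(my-decode-both-ways "=?ISO-8859-1?Q?foo??=")
```

Binding the option with `let' keeps the experiment local, so the
user's customized value is left untouched.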

> BTW, another problem is that we "double encode"
> (`rfc2047-encode-encoded-words') such subjects:

ELISP> (rfc2047-decode-string "=?ISO-8859-1?Q?foo??=")
> "=?ISO-8859-1?Q?foo??="
ELISP> (rfc2047-encode-string "=?ISO-8859-1?Q?foo??=")
> "=?us-ascii?Q?=3D=3FISO-8859-1=3FQ=3Ffoo=3F=3F=3D?="

> AFAICS, Gnus (`rfc2047-encodable-p'?) simply looks for "=?".

[...]

> ..., i.e. shouldn't we use "=\\?.+\\?[qb]\\?.+\\?=" (or similar)
> instead of "=?"?

I agree with you.  I've made `rfc2047-encodable-p' use
`rfc2047-encoded-word-regexp' instead of "=?".  If this change
causes other trouble, though, it will be hard to track down.
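
To show why a bare "=?" test is too eager, here is a small sketch
comparing it with the pattern Reiner suggested.  Both function names
are hypothetical, and the sketch uses the suggested pattern rather
than `rfc2047-encoded-word-regexp', so it only approximates the
installed check.

```elisp
;; Hypothetical helpers for illustration only.
(defun my-looks-encoded-naive-p (string)
  "Return t if STRING merely contains the two characters \"=?\"."
  (and (string-match (regexp-quote "=?") string) t))

(defun my-looks-encoded-p (string)
  "Return t if STRING contains something shaped like an encoded word."
  (let ((case-fold-search t))           ; so that [qb] also matches Q and B
    (and (string-match "=\\?.+\\?[qb]\\?.+\\?=" string) t)))

(my-looks-encoded-naive-p "What does =? mean")     ; => t
(my-looks-encoded-p "What does =? mean")           ; => nil
(my-looks-encoded-naive-p "=?ISO-8859-1?Q?foo?=")  ; => t
(my-looks-encoded-p "=?ISO-8859-1?Q?foo?=")        ; => t
```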

Regards,



