Gnus development mailing list
* CRLF canonicalization only done for text/plain
@ 2005-09-01 23:35 Hrvoje Niksic
  2005-09-02  7:59 ` Simon Josefsson
  0 siblings, 1 reply; 4+ messages in thread
From: Hrvoje Niksic @ 2005-09-01 23:35 UTC (permalink / raw)
  Cc: hniksic

[ Please Cc responses to me because I'm not on the list. ]

Today I received mail with a Python script attached and, to my
surprise, discovered that Gnus saved it using CRLF for line endings.
Since it was sent from a Unix machine and I'm pretty sure the
source didn't contain CRLF, this puzzled me.

It turns out that the mail was sent by Evolution which prepared the
following attachment:

Content-Disposition: attachment; filename=rnditems
Content-Transfer-Encoding: base64
Content-Type: text/x-python; name=rnditems; charset=ISO-8859-2

IyEvdXNyL2Jpbi9weXRob24NCg0KaW1wb3J0IG9wdHBhcnNlLCBzeXMsIHJhbmRvbSwgc3RyaW5n
DQoNCmRlZiBzaGVsbF9xdW90ZShzdHIpOg0KICAgIHJldHVybiAiJyVzJyIgJSBzdHJpbmcucmVw
bGFjZShzdHIsICInIiwgIidcIidcIiciKQ0KDQpwYXJzZXIgPSBvcHRwYXJzZS5PcHRpb25QYXJz
ZXIodXNhZ2U9IiVwcm9nIFstbiBNSU5dIFstLXF1b3RlXSBJVEVNLi4uIikNCnBhcnNlci5hZGRf
b3B0aW9uKCctbicsICctLW51bWJlcicsIGRlc3Q9J251bWJlcicsIHR5cGU9J2ludCcsIGhlbHA9
J21heCBudW1iZXIgb2YgaXRlbXMgdG8gcmV0dXJuJykNCnBhcnNlci5hZGRfb3B0aW9uKCctcScs
ICctLXF1b3RlJywgZGVzdD0ncXVvdGUnLCBhY3Rpb249J3N0b3JlX3RydWUnLCBoZWxwPSdxdW90
ZSBwcmludGVkIGl0ZW1zJykNCm9wdGlvbiwgYXJncyA9IHBhcnNlci5wYXJzZV9hcmdzKCkNCg0K
aWYgb3B0aW9uLnF1b3RlOg0KICAgIGFyZ3MgPSBbc2hlbGxfcXVvdGUoeCkgZm9yIHggaW4gYXJn
c10NCg0KcmFuZG9tLnNodWZmbGUoYXJncykNCmZvciBpLCBpdGVtIGluIGVudW1lcmF0ZShhcmdz
KToNCiAgICBpZiBvcHRpb24ubnVtYmVyIGlzIG5vdCBOb25lIGFuZCBpID49IG9wdGlvbi5udW1i
ZXI6DQogICAgICAgIGJyZWFrDQogICAgcHJpbnQgaXRlbQ0K

Decoding the base64 shows that the text indeed contains CRLF pairs;
however, since the Content-Type is text/*, I think it is meant to be
the "canonical representation".  After decoding the base64, Gnus
should have converted that representation to the local line coding
convention (i.e. converted CRLF to LF and optionally let Mule handle
the actual conversion).  That Gnus didn't do this came as a surprise
because I was pretty sure that Gnus had the correct code to handle
this exact situation.
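
The CRLF pairs are easy to confirm outside Gnus.  A minimal Python
sketch, assuming only the first 28 base64 characters of the attachment
above; the helper name to_local_newlines is made up for this example
and is not a Gnus or Python library function:

```python
import base64

# First 28 base64 characters of the attachment quoted above.
ENCODED_PREFIX = "IyEvdXNyL2Jpbi9weXRob24NCg0K"


def to_local_newlines(data: bytes) -> bytes:
    """Convert canonical CRLF line breaks to LF (hypothetical helper)."""
    return data.replace(b"\r\n", b"\n")


decoded = base64.b64decode(ENCODED_PREFIX)
print(repr(decoded))                     # raw bytes, CRLF still present
print(repr(to_local_newlines(decoded)))  # after de-canonicalization
```

The first print shows the shebang line followed by two CRLF pairs; the
second shows the same text with plain LF, which is what one would
expect to find in the saved file on Unix.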

A closer inspection of Gnus shows that it does contain CRLF
(de)-canonicalization code, but only for "text/plain" attachments,
whereas the above was "text/x-python":

(defun mm-decode-content-transfer-encoding (encoding &optional type)
  "Decodes buffer encoded with ENCODING, returning success status.
If TYPE is `text/plain' CRLF->LF translation may occur."
  ...
    (when (and
	   (memq encoding '(base64 x-uuencode x-uue x-binhex x-yenc))
	   (equal type "text/plain"))
      (goto-char (point-min))
      (while (search-forward "\r\n" nil t)
	(replace-match "\n" t t)))))

In other words, Evolution seems to think that "canonical
representation" of line feeds pertains to all text/* types, whereas
Gnus thinks that it pertains only to text/plain.  A reading of RFC 2045,
section 6.8, seems to indicate that Evolution is right:

   Care must be taken to use the proper octets for line breaks if
   base64 encoding is applied directly to text material that has not
   been converted to canonical form.  In particular, *text line breaks
   must be converted into CRLF sequences prior to base64 encoding*
   [emphasis mine].  The important thing to note is that this may be
   done directly by the encoder rather than in a prior
   canonicalization step in some implementations.

This talks about "text line breaks" in "text material", which can only
meaningfully refer to all text/* content types, at least unless
explicitly specified otherwise.  Applying it only to text/plain seems
wrong -- a useful feature of the TYPE/SUBTYPE division is that certain
properties can describe a type regardless of the subtypes.
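
The difference between the two interpretations comes down to one
predicate.  A minimal Python sketch, assuming the media type is passed
without its parameters; both function names are hypothetical, and the
lower-casing is an assumption of this sketch based on media type names
being case-insensitive:

```python
def is_text_plain(media_type: str) -> bool:
    """Narrow reading: only text/plain is line-oriented text."""
    return media_type.lower() == "text/plain"


def is_any_text(media_type: str) -> bool:
    """Broad reading: every subtype of text/ is line-oriented text."""
    return media_type.lower().startswith("text/")


for media_type in ("text/plain", "text/x-python", "application/pdf"):
    print(media_type, is_text_plain(media_type), is_any_text(media_type))
```

Under the narrow reading the text/x-python attachment is treated like
opaque binary data; under the broad reading it gets the same line
break handling as text/plain.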

RFC 2049 does single out text/plain in section 4, bullet 2.  However, I
don't think it intends to state that canonicalization (and the implied
decanonicalization) applies only to text/plain.  It says:

          [...] If character set conversion is involved, however, care
          must be taken to understand the semantics of the media type,
          which may have strong implications for any character set
          conversion, e.g. with regard to syntactically meaningful
          characters in a text subtype other than "plain".

That is, different character set conversion rules may apply to text
types other than text/plain -- but nothing is said of CRLF
conversions.  And then:

          For example, in the case of text/plain data, the text must
          be converted to a supported character set and lines must be
          delimited with CRLF delimiters in accordance with RFC 822.
          Note that the restriction on line lengths implied by RFC 822
          is eliminated if the next step employs either
          quoted-printable or base64 encoding.

This uses text/plain data as an *example* of how text data can be
treated, i.e. that it requires both charset conversion and CRLF
canonicalization.  But it doesn't imply that subtypes other than
"plain" don't need to undergo (de)canonicalization.


Hopefully the above should be enough to convince you that Gnus does
not currently do the right thing.  Fortunately the change is simple
enough, implemented by applying the patch below.  Please let me know
if you agree with this change.

2005-09-02  Hrvoje Niksic  <hniksic@xemacs.org>

	* mm-encode.el (mm-encode-content-transfer-encoding): Likewise
	when encoding.

	* mm-bodies.el (mm-decode-content-transfer-encoding):
	De-canonicalize CRLF for all text content types, not just
	text/plain.

--- lisp/mm-bodies.el.orig	2005-09-02 00:46:57.000000000 +0200
+++ lisp/mm-bodies.el	2005-09-02 00:47:14.000000000 +0200
@@ -218,7 +218,7 @@
 	 nil))
     (when (and
 	   (memq encoding '(base64 x-uuencode x-uue x-binhex x-yenc))
-	   (equal type "text/plain"))
+	   (string-match "\\`text/" type))
       (goto-char (point-min))
       (while (search-forward "\r\n" nil t)
 	(replace-match "\n" t t)))))
--- lisp/mm-encode.el.orig	2005-09-02 01:07:17.000000000 +0200
+++ lisp/mm-encode.el	2005-09-02 01:07:20.000000000 +0200
@@ -106,7 +106,7 @@
     ;; Likewise base64 below.
     (quoted-printable-encode-region (point-min) (point-max) t))
    ((eq encoding 'base64)
-    (when (equal type "text/plain")
+    (when (string-match "\\`text/" type)
       (goto-char (point-min))
       (while (search-forward "\n" nil t)
 	(replace-match "\r\n" t t)))
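
The intended behavior after the patch can be sketched in Python as a
round trip.  This is only an illustration of the rule, not a port of
the Gnus code: encode_base64 and decode_base64 are hypothetical names,
and the sketch assumes that locally stored text uses bare LF (text that
already contains CRLF would end up with doubled CRs):

```python
import base64


def is_any_text(media_type: str) -> bool:
    """True for every subtype of text/ (hypothetical helper)."""
    return media_type.lower().startswith("text/")


def encode_base64(body: bytes, media_type: str) -> bytes:
    """Canonicalize LF to CRLF for text types, then base64-encode."""
    if is_any_text(media_type):
        body = body.replace(b"\n", b"\r\n")
    return base64.b64encode(body)


def decode_base64(encoded: bytes, media_type: str) -> bytes:
    """Base64-decode, then turn CRLF back into LF for text types."""
    body = base64.b64decode(encoded)
    if is_any_text(media_type):
        body = body.replace(b"\r\n", b"\n")
    return body


source = b"import sys\nprint sys.argv\n"
wire = encode_base64(source, "text/x-python")
print(repr(base64.b64decode(wire)))               # CRLF on the wire
print(repr(decode_base64(wire, "text/x-python")))  # LF again locally
```

Non-text types pass through both functions untouched, so binary
attachments that happen to contain CR LF byte pairs are not altered.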




* Re: CRLF canonicalization only done for text/plain
  2005-09-01 23:35 CRLF canonicalization only done for text/plain Hrvoje Niksic
@ 2005-09-02  7:59 ` Simon Josefsson
  2005-09-03 11:04   ` Reiner Steib
  0 siblings, 1 reply; 4+ messages in thread
From: Simon Josefsson @ 2005-09-02  7:59 UTC (permalink / raw)
  Cc: ding

Hrvoje Niksic <hniksic@xemacs.org> writes:

> Hopefully the above should be enough to convince you that Gnus does
> not currently do the right thing.  Fortunately the change is simple
> enough, implemented by applying the patch below.  Please let me know
> if you agree with this change.

You convinced me, so I applied your patch.  RFC 2045 section 6.5 and
6.7 (4) also appear to discuss this.  Thanks for the detailed
investigation.




* Re: CRLF canonicalization only done for text/plain
  2005-09-02  7:59 ` Simon Josefsson
@ 2005-09-03 11:04   ` Reiner Steib
  2005-09-03 11:21     ` Simon Josefsson
  0 siblings, 1 reply; 4+ messages in thread
From: Reiner Steib @ 2005-09-03 11:04 UTC (permalink / raw)


On Fri, Sep 02 2005, Simon Josefsson wrote:

> Hrvoje Niksic <hniksic@xemacs.org> writes:
>
>> Hopefully the above should be enough to convince you that Gnus does
>> not currently do the right thing.  Fortunately the change is simple
>> enough, implemented by applying the patch below.  Please let me know
>> if you agree with this change.
>
> You convinced me, so I applied your patch.  RFC 2045 section 6.5 and
> 6.7 (4) also appear to discuss this.  Thanks for the detailed
> investigation.

Shouldn't this go to the v5-10 branch too?

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/





* Re: CRLF canonicalization only done for text/plain
  2005-09-03 11:04   ` Reiner Steib
@ 2005-09-03 11:21     ` Simon Josefsson
  0 siblings, 0 replies; 4+ messages in thread
From: Simon Josefsson @ 2005-09-03 11:21 UTC (permalink / raw)


Reiner Steib <reinersteib+gmane@imap.cc> writes:

> On Fri, Sep 02 2005, Simon Josefsson wrote:
>
>> Hrvoje Niksic <hniksic@xemacs.org> writes:
>>
>>> Hopefully the above should be enough to convince you that Gnus does
>>> not currently do the right thing.  Fortunately the change is simple
>>> enough, implemented by applying the patch below.  Please let me know
>>> if you agree with this change.
>>
>> You convinced me, so I applied your patch.  RFC 2045 section 6.5 and
>> 6.7 (4) also appear to discuss this.  Thanks for the detailed
>> investigation.
>
> Shouldn't this go to the v5-10 branch too?

Yes, installed there too.



