From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/60863
Path: news.gmane.org!not-for-mail
From: Hrvoje Niksic
Newsgroups: gmane.emacs.gnus.general
Subject: CRLF canonicalization only done for text/plain
Date: Fri, 02 Sep 2005 01:35:48 +0200
Message-ID: <87aciws6nf.fsf@xemacs.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
X-Trace: sea.gmane.org 1125619001 6906 80.91.229.2 (1 Sep 2005 23:56:41 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Thu, 1 Sep 2005 23:56:41 +0000 (UTC)
Cc: hniksic@xemacs.org
Original-X-From: ding-owner+m9395=ding+2daccount=gmane.org@lists.math.uh.edu Fri Sep 02 01:56:31 2005
Return-path:
Original-Received: from malifon.math.uh.edu ([129.7.128.13]) by ciao.gmane.org with esmtp (Exim 4.43) id 1EAytm-0005Cx-U1 for ding-account@gmane.org; Fri, 02 Sep 2005 01:55:15 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu ident=lists) by malifon.math.uh.edu with smtp (Exim 3.20 #1) id 1EAytl-00038C-01 for ding-account@gmane.org; Thu, 01 Sep 2005 18:55:13 -0500
Original-Received: from nas01.math.uh.edu ([129.7.128.39]) by malifon.math.uh.edu with esmtp (Exim 3.20 #1) id 1EAyb4-00036Y-00 for ding@lists.math.uh.edu; Thu, 01 Sep 2005 18:35:54 -0500
Original-Received: from quimby.gnus.org ([80.91.224.244]) by nas01.math.uh.edu with esmtp (Exim 4.52) id 1EAyb2-0003Ar-7x for ding@lists.math.uh.edu; Thu, 01 Sep 2005 18:35:54 -0500
Original-Received: from ls405.htnet.hr ([195.29.150.97]) by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian)) id 1EAyb0-00041n-00 for ; Fri, 02 Sep 2005 01:35:50 +0200
Original-Received: from ls422.t-com.hr (ls422.t-com.hr [195.29.150.237]) by ls405.htnet.hr (0.0.0/8.12.10) with ESMTP id j81NZot8017833; Fri, 2 Sep 2005 01:35:50 +0200
Original-Received: from ls422.t-com.hr (localhost.localdomain [127.0.0.1]) by ls422.t-com.hr (Qmlai) with ESMTP id 6D8C0988042; Fri, 2 Sep 2005 01:33:25 +0200 (CEST)
X-Envelope-Sender: hniksic@xemacs.org
X-Envelope-Sender: hniksic@xemacs.org
Original-Received: from ls422.t-com.hr (localhost.localdomain [127.0.0.1]) by ls422.t-com.hr (Qmlai) with ESMTP id 61CF098803F; Fri, 2 Sep 2005 01:33:25 +0200 (CEST)
Original-Received: from localhost.localdomain (83-131-67-249.adsl.net.t-com.hr [83.131.67.249]) by ls422.t-com.hr (Qmlai) with ESMTP id 508508B803B; Fri, 2 Sep 2005 01:33:24 +0200 (CEST)
Original-Received: by localhost.localdomain (Postfix, from userid 1000) id 6F874380004; Fri, 2 Sep 2005 01:35:48 +0200 (CEST)
Original-To: ding@gnus.org
User-Agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.4.17 (Jumbo Shrimp, linux)
X-Spam-Score: 0.1 (/)
Precedence: bulk
Original-Sender: ding-owner@lists.math.uh.edu
Xref: news.gmane.org gmane.emacs.gnus.general:60863
Archived-At:

[ Please Cc responses to me because I'm not on the list. ]

Today I received mail with a Python script attached and, to my
surprise, discovered that Gnus saved it using CRLF for line endings.
Since it was sent from a Unix machine and I'm pretty sure the source
didn't contain CRLF, this puzzled me.

It turns out that the mail was sent by Evolution, which prepared the
following attachment:

    Content-Disposition: attachment; filename=rnditems
    Content-Transfer-Encoding: base64
    Content-Type: text/x-python; name=rnditems; charset=ISO-8859-2

    IyEvdXNyL2Jpbi9weXRob24NCg0KaW1wb3J0IG9wdHBhcnNlLCBzeXMsIHJhbmRvbSwgc3RyaW5n
    DQoNCmRlZiBzaGVsbF9xdW90ZShzdHIpOg0KICAgIHJldHVybiAiJyVzJyIgJSBzdHJpbmcucmVw
    bGFjZShzdHIsICInIiwgIidcIidcIiciKQ0KDQpwYXJzZXIgPSBvcHRwYXJzZS5PcHRpb25QYXJz
    ZXIodXNhZ2U9IiVwcm9nIFstbiBNSU5dIFstLXF1b3RlXSBJVEVNLi4uIikNCnBhcnNlci5hZGRf
    b3B0aW9uKCctbicsICctLW51bWJlcicsIGRlc3Q9J251bWJlcicsIHR5cGU9J2ludCcsIGhlbHA9
    J21heCBudW1iZXIgb2YgaXRlbXMgdG8gcmV0dXJuJykNCnBhcnNlci5hZGRfb3B0aW9uKCctcScs
    ICctLXF1b3RlJywgZGVzdD0ncXVvdGUnLCBhY3Rpb249J3N0b3JlX3RydWUnLCBoZWxwPSdxdW90
    ZSBwcmludGVkIGl0ZW1zJykNCm9wdGlvbiwgYXJncyA9IHBhcnNlci5wYXJzZV9hcmdzKCkNCg0K
    aWYgb3B0aW9uLnF1b3RlOg0KICAgIGFyZ3MgPSBbc2hlbGxfcXVvdGUoeCkgZm9yIHggaW4gYXJn
    c10NCg0KcmFuZG9tLnNodWZmbGUoYXJncykNCmZvciBpLCBpdGVtIGluIGVudW1lcmF0ZShhcmdz
    KToNCiAgICBpZiBvcHRpb24ubnVtYmVyIGlzIG5vdCBOb25lIGFuZCBpID49IG9wdGlvbi5udW1i
    ZXI6DQogICAgICAgIGJyZWFrDQogICAgcHJpbnQgaXRlbQ0K

Decoding the base64 shows that the text indeed contains CRLF pairs;
however, since the Content-Type is text/*, I think it is meant to be
the "canonical representation".  After decoding the base64, Gnus
should have converted that representation to the local line coding
convention (i.e. converted CRLF to LF and optionally let Mule handle
the actual conversion).  That Gnus didn't do this came as a surprise
because I was pretty sure that Gnus had the correct code to handle
this exact situation.

A closer inspection of Gnus shows that it does contain CRLF
(de)-canonicalization code, but only for "text/plain" attachments,
whereas the above was "text/x-python":

    (defun mm-decode-content-transfer-encoding (encoding &optional type)
      "Decodes buffer encoded with ENCODING, returning success status.
    If TYPE is `text/plain' CRLF->LF translation may occur."
      ...
      (when (and (memq encoding '(base64 x-uuencode x-uue x-binhex x-yenc))
                 (equal type "text/plain"))
        (goto-char (point-min))
        (while (search-forward "\r\n" nil t)
          (replace-match "\n" t t)))))

In other words, Evolution seems to think that the "canonical
representation" of line breaks pertains to all text/* types, whereas
Gnus thinks that it pertains only to text/plain.  A reading of
RFC 2045, section 6.8, seems to indicate that Evolution is right:

    Care must be taken to use the proper octets for line breaks if
    base64 encoding is applied directly to text material that has not
    been converted to canonical form.  In particular, *text line
    breaks must be converted into CRLF sequences prior to base64
    encoding* [emphasis mine].  The important thing to note is that
    this may be done directly by the encoder rather than in a prior
    canonicalization step in some implementations.

This speaks of "text line breaks" in "text material", which can only
meaningfully refer to all text/* content types, at least unless
explicitly specified otherwise.  Applying it only to text/plain seems
wrong -- a useful feature of the TYPE/SUBTYPE division is that certain
properties can describe a type regardless of the subtypes.

RFC 2049 does single out text/plain in section 4, bullet 2.  However,
I don't think it intends to state that canonicalization (and the
implied decanonicalization) applies only to text/plain.  It says:

    [...] If character set conversion is involved, however, care must
    be taken to understand the semantics of the media type, which may
    have strong implications for any character set conversion,
    e.g. with regard to syntactically meaningful characters in a text
    subtype other than "plain".

That is, different character set conversion rules may apply to text
types other than text/plain -- but nothing is said of CRLF
conversions.  And then:

    For example, in the case of text/plain data, the text must be
    converted to a supported character set and lines must be
    delimited with CRLF delimiters in accordance with RFC 822.
    Note that the restriction on line lengths implied by RFC 822 is
    eliminated if the next step employs either quoted-printable or
    base64 encoding.

This uses text/plain data as an *example* of how text data can be
treated, i.e. that it requires both charset conversion and CRLF
canonicalization.  But it doesn't imply that subtypes other than
"plain" don't need to undergo (de)canonicalization.

I hope the above is enough to convince you that Gnus does not
currently do the right thing.  Fortunately the change is simple: the
patch below widens the type check from text/plain to text/* in both
the decoder and the encoder.  Please let me know if you agree with
this change.


2005-09-02  Hrvoje Niksic

	* mm-encode.el (mm-encode-content-transfer-encoding): Likewise
	when encoding.

	* mm-bodies.el (mm-decode-content-transfer-encoding):
	De-canonicalize CRLF for all text content types, not just
	text/plain.

--- lisp/mm-bodies.el.orig 2005-09-02 00:46:57.000000000 +0200
+++ lisp/mm-bodies.el 2005-09-02 00:47:14.000000000 +0200
@@ -218,7 +218,7 @@
 	 nil))
       (when (and (memq encoding '(base64 x-uuencode x-uue x-binhex x-yenc))
-		 (equal type "text/plain"))
+		 (string-match "\\`text/" type))
 	(goto-char (point-min))
 	(while (search-forward "\r\n" nil t)
 	  (replace-match "\n" t t)))))
--- lisp/mm-encode.el.orig 2005-09-02 01:07:17.000000000 +0200
+++ lisp/mm-encode.el 2005-09-02 01:07:20.000000000 +0200
@@ -106,7 +106,7 @@
 	;; Likewise base64 below.
 	(quoted-printable-encode-region (point-min) (point-max) t))
        ((eq encoding 'base64)
-	(when (equal type "text/plain")
+	(when (string-match "\\`text/" type)
 	  (goto-char (point-min))
 	  (while (search-forward "\n" nil t)
 	    (replace-match "\r\n" t t)))
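
For anyone who wants to check the intended round trip outside Emacs,
here is a minimal Python sketch of the behaviour the patch aims for.
It is not Gnus code: the helper names canonicalize and decanonicalize
and the sample script are hypothetical, and lower-casing the content
type before matching is my assumption, not something the Lisp above
does.  Like the Lisp, canonicalize assumes the input uses bare LF.

```python
import base64

# Transfer encodings after which the Gnus decoder considers CRLF -> LF
# translation (the same list as in the Lisp `memq' form above).
CRLF_ENCODINGS = {"base64", "x-uuencode", "x-uue", "x-binhex", "x-yenc"}


def is_text(content_type: str) -> bool:
    """True for any text/* type (assumption: compared case-insensitively)."""
    return content_type.lower().startswith("text/")


def canonicalize(raw: bytes, content_type: str) -> bytes:
    """Hypothetical encoder-side step: LF -> CRLF for text/* parts.

    Assumes RAW uses bare LF line endings, as the patched Lisp does.
    """
    return raw.replace(b"\n", b"\r\n") if is_text(content_type) else raw


def decanonicalize(decoded: bytes, encoding: str, content_type: str) -> bytes:
    """Hypothetical decoder-side step: CRLF -> LF for text/* parts."""
    if encoding in CRLF_ENCODINGS and is_text(content_type):
        return decoded.replace(b"\r\n", b"\n")
    return decoded


# Made-up stand-in for the attached script, with Unix line endings.
script = b"#!/usr/bin/python\n\nprint('hello')\n"

# What a sender would put on the wire for a text/x-python part.
wire = base64.b64encode(canonicalize(script, "text/x-python"))

# What a receiver sees after base64 decoding alone, and after the
# de-canonicalization step applied to every text/* subtype.
decoded = base64.b64decode(wire)
restored = decanonicalize(decoded, "base64", "text/x-python")
untouched = decanonicalize(decoded, "base64", "application/octet-stream")
```

With the type check widened to text/*, restored equals the original
script, while a non-text part such as application/octet-stream is left
byte-for-byte as decoded.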