Gnus development mailing list
* Re: bug#8070: gnus damages attached file
       [not found]         ` <87d3mnk1sy.fsf@myhost.localdomain>
@ 2011-02-20  1:15           ` Lars Ingebrigtsen
  2011-02-20  8:49             ` Andreas Schwab
  0 siblings, 1 reply; 18+ messages in thread
From: Lars Ingebrigtsen @ 2011-02-20  1:15 UTC (permalink / raw)
  To: Hobbit, ding; +Cc: bugs

[-- Attachment #1: Type: text/plain, Size: 584 bytes --]

Hobbit <werehobbit@yandex.ru> writes:

> If you look at mml-generate-mime-1 at mml.el.gz, you'll see this code:
>
> (if (and (not raw)
> 		   (member (car (split-string type "/")) '("text" "message")))
> 	      (progn
>
> When the attachment is a text file, all this black magic after `progn'
> starts to work. The question arises: why not just grab the text, encode
> it into base64, and put it into the message?

It's a good question, and I don't know.  I've Cc'd this message to the
Gnus development list -- perhaps someone there knows?

(The issue is that the user inserts the following MIME part:


[-- Attachment #2: thing.txt --]
[-- Type: text/plain, Size: 24 bytes --]

ðóññêèé òåêñò â êï1251

[-- Attachment #3: Type: text/plain, Size: 158 bytes --]



and then the CP<russian> text gets marked as utf-8.)

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen


* Re: bug#8070: gnus damages attached file
  2011-02-20  1:15           ` bug#8070: gnus damages attached file Lars Ingebrigtsen
@ 2011-02-20  8:49             ` Andreas Schwab
  2011-02-20  9:23               ` Hobbit
  2011-02-20  9:31               ` Hobbit
  0 siblings, 2 replies; 18+ messages in thread
From: Andreas Schwab @ 2011-02-20  8:49 UTC (permalink / raw)
  To: bugs; +Cc: Hobbit, ding

[-- Attachment #1: Type: text/plain, Size: 1287 bytes --]

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Hobbit <werehobbit@yandex.ru> writes:
>
>> If you look at mml-generate-mime-1 at mml.el.gz, you'll see this code:
>>
>> (if (and (not raw)
>> 		   (member (car (split-string type "/")) '("text" "message")))
>> 	      (progn
>>
>> When the attachment is a text file, all this black magic after `progn'
>> starts to work. The question arises: why not just grab the text, encode
>> it into base64, and put it into the message?

You'll still have to find out the correct charset.

> It's a good question, and I don't know.  I've Cc'd this message to the
> Gnus development list -- perhaps someone there knows?
>
> (The issue is that the user inserts the following MIME part:
>
> ðóññêèé òåêñò â êï1251
>
>
> and then the CP<russian> text gets marked as utf-8.)

I see this in the raw mail:

--=-=-=
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline; filename=thing.txt
Content-Transfer-Encoding: base64

8PPx8ero6SDy5erx8iDiIOrvMTI1MQ0K
--=-=-=

Which makes sense, since 8-bit encodings cannot be told apart, so the
detection can't do better than using the first charset from the priority
list.  If you need to force a specific charset you have to specify it
manually, as I did here:


[-- Attachment #2: thing.txt --]
[-- Type: text/plain, Size: 24 bytes --]

русский текст в кп1251

[-- Attachment #3: Type: text/plain, Size: 172 bytes --]
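The ambiguity described above can be checked directly on the base64 body quoted from the raw mail. The following is a minimal Python sketch (Python is used only for illustration; it is not what Gnus runs, and the list of candidate charsets is an arbitrary assumption). It shows that the same 24 bytes decode without error under several single-byte charsets, while only UTF-8 rejects them.

```python
import base64

# Base64 body of the thing.txt part, copied from the raw mail quoted above.
payload = "8PPx8ero6SDy5erx8iDiIOrvMTI1MQ0K"
raw = base64.b64decode(payload)

# Assumed candidate list, chosen only for illustration. Each of these
# single-byte charsets maps every byte in this file to some character,
# so every decode succeeds and the bytes alone cannot settle the question.
candidates = ["iso-8859-1", "cp1251", "koi8-r", "cp866"]
readings = {name: raw.decode(name) for name in candidates}

for name, text in readings.items():
    print(f"{name:>10}: {text.rstrip()}")

# UTF-8 is different: it has internal structure, so these bytes are rejected.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print("valid UTF-8:", valid_utf8)
```

Only the cp1251 reading is Russian text; the iso-8859-1 reading is the string shown in the inline attachment earlier in the thread.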


Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: bug#8070: gnus damages attached file
  2011-02-20  8:49             ` Andreas Schwab
@ 2011-02-20  9:23               ` Hobbit
  2011-02-20  9:46                 ` Andreas Schwab
  2011-02-20  9:31               ` Hobbit
  1 sibling, 1 reply; 18+ messages in thread
From: Hobbit @ 2011-02-20  9:23 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: bugs, ding

[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]

Andreas Schwab <schwab@linux-m68k.org> writes:

>>> When the attachment is a text file, all this black magic after
>>> `progn' starts to work. The question arises: why not just grab the
>>> text, encode it into base64, and put it into the message?
>
> You'll still have to find out the correct charset.
>

Is it an RFC requirement (to find out the correct charset) or just a
'useful' ability? Why not just send a text file like any binary file
and let the recipient save it to his hard drive and do with it what he
wants, including figuring out the file's true charset?

>
> I see this in the raw mail:
>
> --=-=-=
> Content-Type: text/plain; charset=iso-8859-1
> Content-Disposition: inline; filename=thing.txt
> Content-Transfer-Encoding: base64
>
> 8PPx8ero6SDy5erx8iDiIOrvMTI1MQ0K
> --=-=-=
>
> Which makes sense, since 8-bit encodings cannot be told apart, so the
> detection can't do better than using the first charset from the
> priority list.  If you need to force a specific charset you have to
> specify it manually, as I did here:
>
> русский текст в кп1251
>
> Andreas.

Actually, my Gnus (Emacs 23.2.1, Gnus 5.13) puts charset=UTF-8 there: it
treats any cp<russian> text file as iso-8859-1, re-encodes it from
iso-8859-1 to UTF-8, encodes the result in base64, and inserts it into
the mail.
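The sequence of steps described above can be reproduced outside Emacs. This is a minimal Python sketch under the assumption that the behaviour is exactly as reported (misread as iso-8859-1, then re-encoded as UTF-8, then base64); it is not Gnus code, and the sample text is the 24-byte file from this thread.

```python
import base64

# The file as it exists on disk: Russian text stored in cp1251.
original = "русский текст в кп1251\r\n".encode("cp1251")

# Step 1: the bytes are read as if they were iso-8859-1 (the wrong guess).
misread = original.decode("iso-8859-1")

# Step 2: the resulting characters are re-encoded as UTF-8, which turns
# every non-ASCII byte of the original into two bytes.
reencoded = misread.encode("utf-8")

# Step 3: the UTF-8 bytes are what ends up base64-encoded in the mail.
body = base64.b64encode(reencoded).decode("ascii")

print(len(original), "bytes on disk,", len(reencoded), "bytes in the mail")
print(body)
```

The recipient who saves the part gets the re-encoded bytes, not the original ones, which is why the received file is larger than the file that was attached.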

But Lars Ingebrigtsen's Gnus thinks that any 8-bit file is iso-8859-1,
so nothing weird happens (and he can't reproduce my bug in its full
power).

Could it be because of some UNIX locale or other settings?

I enclose the output of my `describe-current-coding-system'.


[-- Attachment #2: describe-current-coding-system --]
[-- Type: text/plain, Size: 2983 bytes --]

Coding system for saving this buffer:
  U -- utf-8-emacs

Default coding system (for new files):
  U -- utf-8-unix (alias: mule-utf-8-unix)

Coding system for keyboard input:
  = -- no-conversion (alias: binary)

Coding system for terminal output:
  nil
Coding system for inter-client cut and paste:
  nil
Defaults for subprocess I/O:
  decoding: U -- utf-8-unix (alias: mule-utf-8-unix)

  encoding: U -- utf-8-unix (alias: mule-utf-8-unix)


Priority order for recognizing coding systems when reading files:
  1. utf-8 (alias: mule-utf-8)
  2. iso-2022-7bit 
  3. iso-latin-1 (alias: iso-8859-1 latin-1)
  4. iso-2022-7bit-lock (alias: iso-2022-int-1)
  5. iso-2022-8bit-ss2 
  6. emacs-mule 
  7. raw-text 
  8. iso-2022-jp (alias: junet)
  9. in-is13194-devanagari (alias: devanagari)
  10. chinese-iso-8bit (alias: cn-gb-2312 euc-china euc-cn cn-gb gb2312)
  11. utf-8-auto 
  12. utf-8-with-signature 
  13. utf-16 
  14. utf-16be-with-signature (alias: utf-16-be)
  15. utf-16le-with-signature (alias: utf-16-le)
  16. utf-16be 
  17. utf-16le 
  18. japanese-shift-jis (alias: shift_jis sjis)
  19. chinese-big5 (alias: big5 cn-big5 cp950)
  20. w3m-euc-japan 
  21. undecided 

  Other coding systems cannot be distinguished automatically
  from these, and therefore cannot be recognized automatically
  with the present coding system priorities.

Particular coding systems specified for certain file names:

  OPERATION	TARGET PATTERN		CODING SYSTEM(s)
  ---------	--------------		----------------
  File I/O      "\\.dz\\'"              (no-conversion . no-conversion)
                "\\.xz\\(~\\|\\.~[0-9]+~\\)?\\'"
                                        (no-conversion . no-conversion)
                "\\.g?z\\(~\\|\\.~[0-9]+~\\)?\\'"
                                        (no-conversion . no-conversion)
                "\\.\\(?:tgz\\|svgz\\|sifz\\)\\(~\\|\\.~[0-9]+~\\)?\\'"
                                        (no-conversion . no-conversion)
                "\\.tbz2?\\'"           (no-conversion . no-conversion)
                "\\.bz2\\(~\\|\\.~[0-9]+~\\)?\\'"
                                        (no-conversion . no-conversion)
                "\\.Z\\(~\\|\\.~[0-9]+~\\)?\\'"
                                        (no-conversion . no-conversion)
                "\\.elc\\'"             utf-8-emacs
                "\\.utf\\(-8\\)?\\'"    utf-8
                "\\.xml\\'"             xml-find-file-coding-system
                "\\(\\`\\|/\\)loaddefs.el\\'"
                                        (raw-text . raw-text-unix)
                "\\.tar\\'"             (no-conversion . no-conversion)
                "\\.po[tx]?\\'\\|\\.po\\."
                                        po-find-file-coding-system
                "\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'"
                                        latexenc-find-file-coding-system
                ""                      (undecided)
  Process I/O	nothing specified
  Network I/O	nothing specified


* Re: bug#8070: gnus damages attached file
  2011-02-20  8:49             ` Andreas Schwab
  2011-02-20  9:23               ` Hobbit
@ 2011-02-20  9:31               ` Hobbit
  2011-02-20  9:48                 ` Andreas Schwab
  1 sibling, 1 reply; 18+ messages in thread
From: Hobbit @ 2011-02-20  9:31 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: bugs, ding

Should Gnus find out the correct charset only for inline attachments, or
for Disposition='attachment' attachments too?

I am talking about Disposition='attachment', of course.




* Re: bug#8070: gnus damages attached file
  2011-02-20  9:23               ` Hobbit
@ 2011-02-20  9:46                 ` Andreas Schwab
  2011-02-20 10:46                   ` Hobbit
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Schwab @ 2011-02-20  9:46 UTC (permalink / raw)
  To: Hobbit; +Cc: bugs, ding

Hobbit <werehobbit@yandex.ru> writes:

> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>>>> When the attachment is a text file, all this black magic after
>>>> `progn' starts to work. The question arises: why not just grab the
>>>> text, encode it into base64, and put it into the message?
>>
>> You'll still have to find out the correct charset.
>>
>
> Is it an RFC requirement (to find out the correct charset) or just a
> 'useful' ability? Why not just send a text file like any binary file
> and let the recipient save it to his hard drive and do with it what he
> wants, including figuring out the file's true charset?

It doesn't make sense to send a text without charset information, unless
it is pure ASCII.  Why would you want the recipient to guess wrong?

> Actually, my Gnus (Emacs 23.2.1, Gnus 5.13) puts charset=UTF-8 there: it
> treats any cp<russian> text file as iso-8859-1, re-encodes it from
> iso-8859-1 to UTF-8, encodes the result in base64, and inserts it into
> the mail.

Are you sure it is Gnus that reencodes the attachment?  It could also be
altered by any MTA.

> But Lars Ingebrigtsen's Gnus thinks that any 8-bit file is iso-8859-1,
> so nothing weird happens (and he can't reproduce my bug in its full
> power).

There is not much difference between yours and Lars' interpretation of
the text: both detect Latin-1 as the encoding of the file.

> Priority order for recognizing coding systems when reading files:
>   1. utf-8 (alias: mule-utf-8)
>   2. iso-2022-7bit 
>   3. iso-latin-1 (alias: iso-8859-1 latin-1)

See?  You have Latin-1 at the top of the priority list.  So your file
*is* Latin-1 for all practical purposes, unless you explicitly override
the choice.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: bug#8070: gnus damages attached file
  2011-02-20  9:31               ` Hobbit
@ 2011-02-20  9:48                 ` Andreas Schwab
  0 siblings, 0 replies; 18+ messages in thread
From: Andreas Schwab @ 2011-02-20  9:48 UTC (permalink / raw)
  To: Hobbit; +Cc: bugs, ding

Hobbit <werehobbit@yandex.ru> writes:

> Should Gnus find out the correct charset only for inline attachments, or
> for Disposition='attachment' attachments too?

How does that make any difference?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: bug#8070: gnus damages attached file
  2011-02-20  9:46                 ` Andreas Schwab
@ 2011-02-20 10:46                   ` Hobbit
  2011-02-20 11:40                     ` Andreas Schwab
  0 siblings, 1 reply; 18+ messages in thread
From: Hobbit @ 2011-02-20 10:46 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: bugs, ding

Andreas Schwab <schwab@linux-m68k.org> writes:

> It doesn't make sense to send a text without charset information, unless
> it is pure ASCII.  Why would you want the recipient to guess wrong?

Because almost all of my work recipients use Windows and expect only
CP1251. Especially my boss, who hates receiving files in any other
codepage. And sometimes I have to send files in CP866 (the old DOS
encoding) to colleagues who still use some old DOS programs. I use UTF-8
only when I'm talking with open-source hackers. So in 98% of cases my
recipients know what they are going to view. I even doubt that

Content-Type: text/plain; charset=xxxxxxx

could make a difference, because in most cases they just save the
file to a hard drive and open it in some other program. Introducing a
Gnus feature for the other 2% just creates unnecessary problems.

And some text formats (for example, LaTeX) have something like

\usepackage[cp1251]{inputenc}

which directly states document encoding.

> Are you sure it is Gnus that reencodes the attachment?  It could also be
> altered by any MTA.

At least when I evaluate (setq mm-coding-system-priorities
'(iso-8859-1)) before sending a mail, I receive my files unchanged. Could
an MTA re-encode files when it sees a

Content-Type: text/plain; charset=xxxxxxx

string? I ask because some of my Internet interlocutors also have this
problem (and they use other MTAs).

Andreas Schwab <schwab@linux-m68k.org> writes:

> Hobbit <werehobbit@yandex.ru> writes:
>
>> Should Gnus find out the correct charset only for inline attachments, or
>> for Disposition='attachment' attachments too?
>
> How does that make any difference?
>

When I use an 'inline' attachment, it is part of the message and could be
re-encoded (that's normal). If it's Disposition='attachment', then it's
best for Gnus to leave the file unchanged. It's an 'attachment' ('burn
after reading, please'). Imagine that you ordered a tech book and received
it with another man's handwritten notes as an 'improvement'.




* Re: bug#8070: gnus damages attached file
  2011-02-20 10:46                   ` Hobbit
@ 2011-02-20 11:40                     ` Andreas Schwab
  2011-02-20 13:27                       ` Hobbit
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Schwab @ 2011-02-20 11:40 UTC (permalink / raw)
  To: Hobbit; +Cc: bugs, ding

Hobbit <werehobbit@yandex.ru> writes:

> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>> It doesn't make sense to send a text without charset information, unless
>> it is pure ASCII.  Why would you want the recipient to guess wrong?
>
> Because almost all of my work recipients use Windows and expect only
> CP1251.

How is that an argument against specifying the correct charset
explicitly?

> So in 98% of cases my recipients know what they are going to view.

What about the remaining 2%?

> Introducing a Gnus feature for the other 2% just creates unnecessary
> problems.

Which problems?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: bug#8070: gnus damages attached file
  2011-02-20 11:40                     ` Andreas Schwab
@ 2011-02-20 13:27                       ` Hobbit
  2011-02-20 13:41                         ` Andreas Schwab
  0 siblings, 1 reply; 18+ messages in thread
From: Hobbit @ 2011-02-20 13:27 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: bugs, ding

[-- Attachment #1: Type: text/plain, Size: 1361 bytes --]

Andreas Schwab <schwab@linux-m68k.org> writes:

>>
>> Because almost all of my work recipients use Windows and expect only
>> CP1251.
>
> How is that an argument against specifying the correct charset
> explicitly?
>

I can't know the charset of everything. If somebody asks me to send him
some cryptic text file (for example, one generated by an old local program
with its own unique charset), I don't want to guess what codepage it
has. Asked. Sent. Finished.

Today I had to pack it into a ZIP archive to ensure integrity.

At least it would be better to specify it separately through some
variable, e.g. (setq gnus-default-attachment-charset ...).

>> So in 98% of cases my recipients know what they are going to view.
>
> What about the remaining 2%?
>

Well, in such cases I usually mention the codepage in the message
text. I would be happy to use something like

<menu-bar> <Set Attachment Codepage> 
Codepage: adobe-standard-encoding-dos (or "don't specify" if needed)

to explicitly ask Gnus for that. But in 98% of cases it's just a problem.
And yes, I usually use Linux and prefer UTF-8 for my own Org-mode notes
and other things.

>> Introducing a Gnus feature for the other 2% just creates unnecessary
>> problems.
>
> Which problems?
>
See the news://news.gnus.org/gnus.gnus-bug thread 'bug#8070: gnus damages
attached file'.

I enclose two reports from that thread:


[-- Attachment #2: report #1 --]
[-- Type: text/plain, Size: 1950 bytes --]

From: Werehobbit <werehobbit@yandex.ru>
Subject: bug#8070: gnus damages attached file
Newsgroups: gnus.gnus-bug
Date: Thu, 17 Feb 2011 09:09:42 +0200
Organization: Gnus News User Services

Gnus v5.13
GNU Emacs 23.2.1 (i686-pc-linux-gnu, GTK+ Version 2.20.1)
 of 2010-05-08 on pidsley.hoetzel.info

Gnus damages an attached text file (i.e. if you send a file, your
recipient will receive it in a broken state).

I attached example files: the original text file (in CP1251),
letter.txt.orig, and the received file, letter.txt, both packed in a ZIP
archive.

Sorry for my broken English.

------------------ Environment follows ------------------

(setq gnus-default-nntp-server "")
(setq gnus-select-method
      '(nnml ""))
(setq gnus-summary-mode-hook
      '(gnus-agent-mode))
(setq gnus-exit-gnus-hook
      '(mm-destroy-postponed-undisplay-list))
(setq gnus-setup-news-hook
      '(gnus-agent-queue-setup gnus-fixup-nnimap-unread-after-getting-new-news))
(setq gnus-group-mode-hook
      '(gnus-agent-mode))
;; (makeunbound 'gnus-topic-mode)
;; (makeunbound 'gnus-topic-mode-hook)
;; (makeunbound 'gnus-topic-line-format)
;; (makeunbound 'gnus-topic-indent-level)
;; (makeunbound 'gnus-topic-display-empty-topics)
(setq gnus-server-mode-hook
      '(gnus-agent-mode))
(setq mm-charset-synonym-alist
      '((ibm866 . cp866)
        (unicode . utf-16-le)
        (ks_c_5601-1987 . cp949)
        (windows-31j . cp932)
        (utf8 . utf-8)
        (iso8859-1 . iso-8859-1)
        (iso_8859-1 . iso-8859-1)))
(setq message-send-mail-function 'smtpmail-send-it)
(setq message-mode-hook
      '(#[nil "\302\030\303	!)\207"
              [gnus-article-copy gnus-setup-message-group nil gnus-configure-posting-styles]
              2]
        #[nil "\302 \211\020\211\021\207"
              [message-mailer message-newsreader gnus-extended-version]
              2]))
(setq message-header-setup-hook
      '(gnus-inews-insert-archive-gcc gnus-inews-insert-gcc))

[-- Attachment #3: zip-archive from report1 --]
[-- Type: application/zip, Size: 407 bytes --]

[-- Attachment #4: report #2 --]
[-- Type: text/plain, Size: 2343 bytes --]

From: "Evgeny M. Zubok" <evgeny.zubok@tochka.ru>
Subject: Re: bug#8070: gnus damages attached file
Newsgroups: gnus.gnus-bug
Date: Sun, 20 Feb 2011 13:41:40 +0300
Organization: Gnus News User Services


I can confirm this strange behaviour. Gnus converts attachments from an
8-bit coding system like cp1251 (viewing them as iso-8859-1 text) to
utf-8. It seems to occur only with files that have the MIME supertype
"text". When I send README.doc with plain unibyte text in cp1251
both as "text/plain" (or "text/x-tex", for instance) and as
"application/msword" (or "application/octet-stream"), it is received
differently:

as "text/plain", "text/x-tex", etc.

-rw-r--r-- 1 zubok zubok  9 Фев 20 12:56 README.doc
-rw------- 1 zubok zubok 17 Фев 20 12:58 README1.doc.received
                        ^^^
Raw mail:

Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename=README.doc
Content-Transfer-Encoding: base64


as "application/msword", "application/octet-stream"

-rw-r--r-- 1 zubok zubok 9 Фев 20 12:56 README.doc
-rw------- 1 zubok zubok 9 Фев 20 12:58 README2.doc.received
                        ^^^
Raw mail:

Content-Type: application/msword
Content-Disposition: attachment; filename=README.doc
Content-Transfer-Encoding: base64


Additional information
======================

Emacs 22.2.1, Gnus 5.11. Both from Debian.

The variable mm-coding-system-priorities is nil by default.

Relevant settings from ~/.emacs:

(set-language-environment 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'compound-text)
(setq x-select-request-type '(UTF8_STRING COMPOUND_TEXT TEXT STRING))

(codepage-setup 1251)
(codepage-setup 866)
(define-coding-system-alias 'ibm866 'cp866)
(define-coding-system-alias 'windows-1251 'cp1251)
(define-coding-system-alias 'koi8-u 'koi8-r)
(define-coding-system-alias 'utf8 'utf-8)

Locale settings:

$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=



* Re: bug#8070: gnus damages attached file
  2011-02-20 13:27                       ` Hobbit
@ 2011-02-20 13:41                         ` Andreas Schwab
  2011-02-20 14:03                           ` Hobbit
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Schwab @ 2011-02-20 13:41 UTC (permalink / raw)
  To: Hobbit; +Cc: bugs, ding

Hobbit <werehobbit@yandex.ru> writes:

> I can't know the charset of everything. If somebody asks me to send him
> some cryptic text file (for example, one generated by an old local program
> with its own unique charset), I don't want to guess what codepage it
> has. Asked. Sent. Finished.

If you send an arbitrary byte stream you send it as
application/octet-stream.  Case closed.
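What this means at the MIME level can be sketched with Python's standard `email` package (an illustration only; Gnus builds its parts differently, and the subject line and file name here are placeholders). The part carries no charset claim, so the decoded payload is exactly the bytes that were attached.

```python
from email.message import EmailMessage

# The file as stored on disk: Russian text in cp1251.
raw = "русский текст в кп1251\r\n".encode("cp1251")

msg = EmailMessage()
msg["Subject"] = "Test"          # placeholder subject
msg.set_content("Hello!")

# Attach the bytes as an opaque byte stream; no charset is declared.
msg.add_attachment(raw, maintype="application", subtype="octet-stream",
                   filename="thing.txt")

part = next(msg.iter_attachments())
received = part.get_payload(decode=True)

print(part.get_content_type())
print(received == raw)
```

The trade-off is the one discussed in this thread: the bytes survive untouched, but the recipient's mail reader gets no hint about how to display them.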

> Well, in such cases I usually mention the codepage in the message
> text.

What's wrong with putting it in the charset declaration?

> I enclose two reports about that (from aforementioned thread):

All I can see in the second attachment are two basically equivalent
files, one encoded in UTF-8 and one encoded in Latin-1.

The third attachment is missing vital information, so I cannot say
anything about it.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: bug#8070: gnus damages attached file
  2011-02-20 13:41                         ` Andreas Schwab
@ 2011-02-20 14:03                           ` Hobbit
  2011-02-20 14:17                             ` Andreas Schwab
  0 siblings, 1 reply; 18+ messages in thread
From: Hobbit @ 2011-02-20 14:03 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: bugs, ding

Andreas Schwab <schwab@linux-m68k.org> writes:

> If you send an arbitrary byte stream you send it as
> application/octet-stream.  Case closed.

Okay.

>
> What's wrong with putting it in the charset declaration?
>

Gnus should ask the user what codepage to mention in the header. It
shouldn't be some random guessing (often incorrect).

>> I enclose two reports about that (from aforementioned thread):
>
> All I can see in the second attachment are two basically equivalent
> files, one encoded in UTF-8 and one encoded in Latin-1.
>

localhost$ iconv -f cp1251 letter_before.txt
русский текст в кп1251      <------- normal text

localhost$ iconv -f cp1251 letter_after.txt 
ðóññêèé òåêñò â êï1251   <---- gibberish

localhost$ iconv -f UTF8 letter_after.txt -t iso-8859-1 | 
  iconv -f cp1251
русский текст в кп1251       <------ normal text

The file letter_after.txt isn't equivalent to letter_before.txt, because to
read it, knowing its codepage is not enough: I also need to do some strange
transformations to see normal text. That's not right, and it is what
report #1 describes.
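The two-stage iconv pipeline above can be written as one small function. This is a minimal Python sketch, assuming the damage is exactly "decoded as Latin-1, re-encoded as UTF-8"; the function name `undo_double_encoding` and the sample data are made up for illustration.

```python
def undo_double_encoding(damaged: bytes, real_charset: str = "cp1251") -> str:
    """Reverse 'decoded as Latin-1, re-encoded as UTF-8', then decode properly."""
    # Same idea as: iconv -f UTF8 -t iso-8859-1 | iconv -f cp1251
    original_bytes = damaged.decode("utf-8").encode("iso-8859-1")
    return original_bytes.decode(real_charset)


# Stand-ins for letter_before.txt and letter_after.txt.
before = "русский текст в кп1251\n".encode("cp1251")
after = before.decode("iso-8859-1").encode("utf-8")

print(len(before), len(after))
print(undo_double_encoding(after), end="")
```

The round trip is lossless here only because Latin-1 maps every byte value to a character; the recipient still has to know both that the extra layer exists and what the real charset is.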

> The third attachment is missing vital information, so I cannot say
> anything about it.

What vital information? 

Besides, do you mind answering the third attachment directly
(i.e. in the news://news.gnus.org/gnus.gnus-bug thread 'bug#8070: gnus
damages attached file', the third article in the thread, which I
cited)?




* Re: bug#8070: gnus damages attached file
  2011-02-20 14:03                           ` Hobbit
@ 2011-02-20 14:17                             ` Andreas Schwab
  2011-02-20 14:34                               ` Hobbit
                                                 ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Andreas Schwab @ 2011-02-20 14:17 UTC (permalink / raw)
  To: Hobbit; +Cc: bugs, ding

Hobbit <werehobbit@yandex.ru> writes:

> Gnus should ask the user what codepage to mention in the header. It
> shouldn't be some random guessing (often incorrect).

It's not random.  You told Emacs to decode it as Latin-1, and Emacs
obeyed.

> localhost$ iconv -f cp1251 letter_before.txt

Why cp1251?

> What vital information? 

The contents of the file and the raw mail.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: bug#8070: gnus damages attached file
  2011-02-20 14:17                             ` Andreas Schwab
@ 2011-02-20 14:34                               ` Hobbit
  2011-02-20 15:03                                 ` Andreas Schwab
  2011-02-20 14:38                               ` Hobbit
  2011-02-20 15:25                               ` Hobbit
  2 siblings, 1 reply; 18+ messages in thread
From: Hobbit @ 2011-02-20 14:34 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: bugs, ding

Andreas Schwab <schwab@linux-m68k.org> writes:

>> Gnus should ask the user what codepage to mention in the header. It
>> shouldn't be some random guessing (often incorrect).
>
> It's not random.  You told Emacs to decode it as Latin-1, and Emacs
> obeyed.

Must I change my .emacs settings each time I need to send a text file
in a different codepage? Why not ask when I attach the file?

>> localhost$ iconv -f cp1251 letter_before.txt
>
> Why cp1251?
>

Because my boss asked me to use cp1251 for editing his NOTE.txt file. So
I saved it in cp1251 and tried to send it. Unfortunately, Gnus thought
that it knew my needs better.




* Re: bug#8070: gnus damages attached file
  2011-02-20 14:17                             ` Andreas Schwab
  2011-02-20 14:34                               ` Hobbit
@ 2011-02-20 14:38                               ` Hobbit
  2011-02-20 15:04                                 ` Andreas Schwab
  2011-02-20 15:25                               ` Hobbit
  2 siblings, 1 reply; 18+ messages in thread
From: Hobbit @ 2011-02-20 14:38 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: bugs, ding

Andreas Schwab <schwab@linux-m68k.org> writes:

>
>> localhost$ iconv -f cp1251 letter_before.txt
>
> Why cp1251?
>

I'm not sure you asked for this, but let it be.

ICONV(1P)

iconv -f fromcode [-t tocode] [file ...]

The iconv utility shall convert the encoding of characters in file from
one codeset to another and write the results to standard output.




* Re: bug#8070: gnus damages attached file
  2011-02-20 14:34                               ` Hobbit
@ 2011-02-20 15:03                                 ` Andreas Schwab
  2011-02-20 16:47                                   ` Hobbit
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Schwab @ 2011-02-20 15:03 UTC (permalink / raw)
  To: Hobbit; +Cc: bugs, ding

Hobbit <werehobbit@yandex.ru> writes:

> Must I change my .emacs settings each time I need to send a text file
> in a different codepage? Why not ask when I attach the file?

You can always add a charset tag.
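The effect of an explicit charset declaration on the resulting MIME part can be sketched with Python's standard `email` package. This is only an illustration of the wire format, not of how the tag is written in Gnus; the file name and subject are placeholders. The bytes stay as they are on disk and the header tells the recipient how to read them.

```python
from email.message import EmailMessage

# The file as stored on disk: Russian text in cp1251.
raw = "русский текст в кп1251\r\n".encode("cp1251")

msg = EmailMessage()
msg["Subject"] = "Test"          # placeholder subject
msg.set_content("Hello!")

# Attach the unchanged bytes as text/plain and declare their charset.
msg.add_attachment(raw, maintype="text", subtype="plain",
                   filename="thing.txt",
                   params={"charset": "windows-1251"})

part = next(msg.iter_attachments())
declared = part.get_content_charset()
stored = part.get_payload(decode=True)   # bytes exactly as attached
shown = part.get_content()               # text decoded with the declared charset

print(part["Content-Type"])
print(stored == raw)
print(shown.rstrip())
```

With the declaration in place, a reader that saves the part gets the original bytes, and a reader that displays it inline can decode them correctly.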

>> Why cp1251?
>>
>
> Because my boss asked me to use cp1251 for editing his NOTE.txt file.

How do I know that?  There is no clue in the file.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: bug#8070: gnus damages attached file
  2011-02-20 14:38                               ` Hobbit
@ 2011-02-20 15:04                                 ` Andreas Schwab
  0 siblings, 0 replies; 18+ messages in thread
From: Andreas Schwab @ 2011-02-20 15:04 UTC (permalink / raw)
  To: Hobbit; +Cc: bugs, ding

Hobbit <werehobbit@yandex.ru> writes:

> Andreas Schwab <schwab@linux-m68k.org> writes:
>
>>
>>> localhost$ iconv -f cp1251 letter_before.txt
>>
>> Why cp1251?
>>
>
> I'm not sure you asked for this, but let it be.

I can read manuals, thank you very much.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




* Re: bug#8070: gnus damages attached file
  2011-02-20 14:17                             ` Andreas Schwab
  2011-02-20 14:34                               ` Hobbit
  2011-02-20 14:38                               ` Hobbit
@ 2011-02-20 15:25                               ` Hobbit
  2 siblings, 0 replies; 18+ messages in thread
From: Hobbit @ 2011-02-20 15:25 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: bugs, ding

[-- Attachment #1: Type: text/plain, Size: 242 bytes --]

Andreas Schwab <schwab@linux-m68k.org> writes:

>>>The third attachment is missing vital information, so I cannot say
>>>anything about it.
>>
>> What vital information? 
>
> The contents of the file and the raw mail.
>

I provide as asked:


[-- Attachment #2: contents and raw mail --]
[-- Type: text/plain, Size: 1395 bytes --]

From: "Evgeny M. Zubok" <evgeny.zubok@tochka.ru>
Subject: Re: bug#8070: gnus damages attached file
Newsgroups: gnus.gnus-bug
Date: Sun, 20 Feb 2011 18:03:12 +0300
Organization: Gnus News User Services

Some additional information.

Original README.txt is attached. I've archived it in order not to break
its contents.

Raw mail shows that the same short file is encoded differently when we
use either "text/plain" or "application/msword". This is definitely not
the MTA because the raw mail has been saved from the Gnus archive (Gcc:
Sent), not from the remote mailer.


X-From-Line: nobody Sun Feb 20 17:43:20 2011
To: "Evgeny M. Zubok" <zoubok@mail.ru>
Subject: Test
X-Draft-From: ("nntp+news.gnus.org:gnus.gnus-bug" "")
From: "Evgeny M. Zubok" <evgeny.zubok@tochka.ru>
Date: Sun, 20 Feb 2011 17:43:18 +0300
Message-ID: <87oc66bwc9.fsf@tochka.ru>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Lines: 17
Xref: localhost sent:445
X-Gnus-Article-Number: 445   Sun, 20 Feb 2011 17:43:20 +0300

--=-=-=

Hello!

--=-=-=
Content-Type: application/msword
Content-Disposition: attachment; filename=README.txt
Content-Transfer-Encoding: base64

4OHi4+Tl5ucK
--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename=README.txt
Content-Transfer-Encoding: base64

w6DDocOiw6PDpMOlw6bDpw0K
--=-=-=--
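
The two base64 payloads above can be compared directly. The following is a minimal Python sketch, not part of the original report. It decodes both parts and checks one possible explanation of the difference: that the text/plain part was read as a Latin-1-like charset and re-encoded as UTF-8, with LF turned into CRLF. The choice of "latin-1" and "cp1251" as codec names is an assumption based on the bytes and on the CP1251 text discussed earlier in the thread. It says nothing about where in Gnus the conversion happens.

```python
import base64

# Payloads copied verbatim from the two MIME parts in the raw mail above.
MSWORD_PART = "4OHi4+Tl5ucK"            # Content-Type: application/msword
TEXT_PART = "w6DDocOiw6PDpMOlw6bDpw0K"  # Content-Type: text/plain; charset=utf-8

original = base64.b64decode(MSWORD_PART)
damaged = base64.b64decode(TEXT_PART)

# Show the raw bytes of each part for a side-by-side comparison.
print(original.hex(" "))
print(damaged.hex(" "))

# Hypothesis (an assumption, not confirmed by the thread): each byte was
# treated as a Latin-1 character, the result was encoded as UTF-8, and the
# line ending was converted from LF to CRLF.
reencoded = original.decode("latin-1").encode("utf-8").replace(b"\n", b"\r\n")
print(reencoded == damaged)  # True

# Interpreting the untouched bytes as CP1251 gives Cyrillic text.
print(original.decode("cp1251"))
```

Under these assumptions the application/msword part carries the file bytes unchanged, while the text/plain part carries a re-encoded copy. That is consistent with the report that the attachment is altered only when it is sent as text.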


[-- Attachment #3: Type: text/plain, Size: 33 bytes --]


Attachment from the above message:


[-- Attachment #4: attachment mentioned in message --]
[-- Type: application/octet-stream, Size: 144 bytes --]


* Re: bug#8070: gnus damages attached file
  2011-02-20 15:03                                 ` Andreas Schwab
@ 2011-02-20 16:47                                   ` Hobbit
  0 siblings, 0 replies; 18+ messages in thread
From: Hobbit @ 2011-02-20 16:47 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: bugs, ding

[-- Attachment #1: Type: text/plain, Size: 45 bytes --]

>The contents of the file and the raw mail.


[-- Attachment #2: contents and raw mail --]
[-- Type: text/plain, Size: 1395 bytes --]

From: "Evgeny M. Zubok" <evgeny.zubok@tochka.ru>
Subject: Re: bug#8070: gnus damages attached file
Newsgroups: gnus.gnus-bug
Date: Sun, 20 Feb 2011 18:03:12 +0300
Organization: Gnus News User Services

Some additional information.

The original README.txt is attached. I've archived it so that its
contents are not altered.

The raw mail shows that the same short file is encoded differently
depending on whether we use "text/plain" or "application/msword". This
is definitely not the MTA, because the raw mail has been saved from the
Gnus archive (Gcc: Sent), not from the remote mailer.


X-From-Line: nobody Sun Feb 20 17:43:20 2011
To: "Evgeny M. Zubok" <zoubok@mail.ru>
Subject: Test
X-Draft-From: ("nntp+news.gnus.org:gnus.gnus-bug" "")
From: "Evgeny M. Zubok" <evgeny.zubok@tochka.ru>
Date: Sun, 20 Feb 2011 17:43:18 +0300
Message-ID: <87oc66bwc9.fsf@tochka.ru>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Lines: 17
Xref: localhost sent:445
X-Gnus-Article-Number: 445   Sun, 20 Feb 2011 17:43:20 +0300

--=-=-=

Hello!

--=-=-=
Content-Type: application/msword
Content-Disposition: attachment; filename=README.txt
Content-Transfer-Encoding: base64

4OHi4+Tl5ucK
--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename=README.txt
Content-Transfer-Encoding: base64

w6DDocOiw6PDpMOlw6bDpw0K
--=-=-=--


[-- Attachment #3: Type: text/plain, Size: 37 bytes --]


Attachment from the above message:


[-- Attachment #4: README.tar.gz --]
[-- Type: application/octet-stream, Size: 144 bytes --]


Thread overview: 18+ messages
     [not found] <871v34nwdn.fsf@myhost.localdomain>
     [not found] ` <87aahss3yu.fsf@gnus.org>
     [not found]   ` <87hbbz6fg4.fsf@myhost.localdomain>
     [not found]     ` <87vd0fpvzv.fsf@gnus.org>
     [not found]       ` <874o7z662d.fsf@myhost.localdomain>
     [not found]         ` <87d3mnk1sy.fsf@myhost.localdomain>
2011-02-20  1:15           ` bug#8070: gnus damages attached file Lars Ingebrigtsen
2011-02-20  8:49             ` Andreas Schwab
2011-02-20  9:23               ` Hobbit
2011-02-20  9:46                 ` Andreas Schwab
2011-02-20 10:46                   ` Hobbit
2011-02-20 11:40                     ` Andreas Schwab
2011-02-20 13:27                       ` Hobbit
2011-02-20 13:41                         ` Andreas Schwab
2011-02-20 14:03                           ` Hobbit
2011-02-20 14:17                             ` Andreas Schwab
2011-02-20 14:34                               ` Hobbit
2011-02-20 15:03                                 ` Andreas Schwab
2011-02-20 16:47                                   ` Hobbit
2011-02-20 14:38                               ` Hobbit
2011-02-20 15:04                                 ` Andreas Schwab
2011-02-20 15:25                               ` Hobbit
2011-02-20  9:31               ` Hobbit
2011-02-20  9:48                 ` Andreas Schwab
