Gnus development mailing list
* Re: Problems with 8-bit headers
@ 2000-04-26 11:20 Janusz S. Bien
  0 siblings, 0 replies; 13+ messages in thread
From: Janusz S. Bien @ 2000-04-26 11:20 UTC
  Cc: Vladimir Volovich


I haven't followed this thread from the beginning, so I apologize for
possible repetition.

Vladimir Volovich <vvv@vvv.vsu.ru> writes:

>However, there is a bug somewhere in core Emacs which makes it
>misbehave when i receive messages with 8-bit content-transfer-encoding
>and characters in the range 0x80--0x9f (which often happens when i
>receive messages in UTF-8). I store mails in nnmbox backend, and when
>i receive such a message, i get ENORMOUS time delays when pressing `g'
>in a group buffer, and article numbering gets broken (displayed number
>of articles in a group is overestimated), and such things. I think
>this is because Emacs uses an "intelligent" auto-detection algorithm
>which wants to interpret a `mbox' buffer as if it were in some charset
>(mbox is opened in raw mode, but this does not help to switch off this
>"intelligent" behavior, and certain character combinations choke
>Emacs). This is a long-standing bug which has been there for more than a
>year, but i don't know how to report it... :-(
>
>	Best regards, -- Vladimir.

I have not (yet) installed mule-ucs, so I cannot try to reproduce your
problem. However, I am intrigued by it.

I am not sure the bug is in core Emacs. If you open mbox in raw mode
(I understand you actually mean `raw-text'), then multibyte character
support is switched off. In consequence, the characters in the range
0x80--0x9f are OK (otherwise they would be interpreted as parts of
multibyte character representations). There is probably something
wrong in the way they are processed later.
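
To make the byte range concrete, here is a minimal Python sketch (only an
illustration of the bytes involved, not of anything Emacs or Gnus does; the
sample word is an arbitrary assumption). UTF-8 encodes Cyrillic letters as
two bytes each, and the second byte often lands in 0x80--0x9f:

```python
# Illustration only: find the UTF-8 bytes of a sample Cyrillic word that
# fall into the 0x80--0x9f range discussed in this thread.
sample = "Привет"  # arbitrary sample text (an assumption for this sketch)
encoded = sample.encode("utf-8")

# Keep only the bytes in the range 0x80..0x9f.
c1_bytes = [b for b in encoded if 0x80 <= b <= 0x9F]

print(" ".join(f"{b:02x}" for b in encoded))
print([hex(b) for b in c1_bytes])
```

Three of the twelve bytes of this one short word are in the problematic
range, so mail in UTF-8 Cyrillic would hit it constantly.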

If it turns out that it really is a GNU Emacs bug, you should report it
to bug-gnu-emacs@gnu.org (you can use the menu bar for this: [Help]
[Send Bug Report]).

Best regards

Janusz

---------------------------------------------------------------------
                     ,   
dr hab. Janusz S. Bien, prof. UW
Prof. Janusz S. Bien, Warsaw University
---------------------------------------------------------------------
On this account I read/post mail/news offline - do not expect 
an immediate answer!
The date in the header refers to the moment when I started to write
the letter, not to the moment when I sent it.
---------------------------------------------------------------------




* Re: Problems with 8-bit headers
  2000-04-30 21:21 Janusz S. Bien
  2000-05-01  7:08 ` Florian Weimer
@ 2000-05-01 11:41 ` Kai Großjohann
  1 sibling, 0 replies; 13+ messages in thread
From: Kai Großjohann @ 2000-05-01 11:41 UTC
  Cc: ding, Florian Weimer, Vladimir Volovich

jsbien@mimuw.edu.pl (Janusz S. Bien) writes:

> Could you please elaborate? When you save or write a buffer, Emacs
> can decide whether the buffer content (e.g. input directly from the
> keyboard) can be represented in the default buffer coding system; if
> not, Emacs offers the choice of appropriate coding systems.  

If you use the input method `latin-2-prefix' and then type `" a',
you'll get a letter which looks like ä.  If you then use the input
method `latin-1-prefix' and type `" a', you'll also get a letter which
looks like ä.

But these two characters are not the same, as far as Emacs is
concerned!  I.e., Emacs will not be able to save this file as either
Latin-1 or Latin-2.

It has been decided to use Unicode as the internal character encoding
in Emacs (possibly slightly modified to compensate for Han
unification).  When that happens, the two ä characters will be the
same (unified), but until then, they are different.
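
For comparison, a small Python sketch of what the unified (Unicode) view
looks like. This shows Python's behaviour, not that of any Emacs version;
it is only meant to illustrate the idea of unification:

```python
# Byte 0xE4 stands for "ä" in both ISO-8859-1 and ISO-8859-2.
latin1_a = b"\xe4".decode("iso8859_1")
latin2_a = b"\xe4".decode("iso8859_2")

# With Unicode as the internal representation, both decode to the same
# code point, U+00E4.
same_character = latin1_a == latin2_a == "\u00e4"

# Because the character is unified, it can be written back in either charset.
roundtrip_ok = (latin1_a.encode("iso8859_2") == b"\xe4"
                and latin2_a.encode("iso8859_1") == b"\xe4")

print(same_character, roundtrip_ok)
```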

At least that's my understanding of the whole issue.

kai
-- 
Beware of flying birch trees.




* Re: Problems with 8-bit headers
  2000-04-30 21:21 Janusz S. Bien
@ 2000-05-01  7:08 ` Florian Weimer
  2000-05-01 11:41 ` Kai Großjohann
  1 sibling, 0 replies; 13+ messages in thread
From: Florian Weimer @ 2000-05-01  7:08 UTC


jsbien@mimuw.edu.pl (Janusz S. Bien) writes:

> Could you please elaborate? When you save or write a buffer, Emacs
> can decide whether the buffer content (e.g. input directly from the
> keyboard) can be represented in the default buffer coding system; if
> not, Emacs offers the choice of appropriate coding systems.  

Try the following: Insert an ISO-8859-1 "ä" and an ISO-8859-2 "ä" into
a buffer and save it.  Emacs should allow you to save the buffer in
either charset, but currently, neither is possible.

> Why do you think that an analogous treatment of the preferred MIME
> charset is impossible? What is wrong with such Emacs functions as
> `select-safe-coding-system-function', `detect-coding-with-priority',
> `detect-coding-region', `prefer-coding-system' etc.?

I still think these functions aren't useful for this kind of job at
the moment.




* Re: Problems with 8-bit headers
@ 2000-04-30 21:21 Janusz S. Bien
  2000-05-01  7:08 ` Florian Weimer
  2000-05-01 11:41 ` Kai Großjohann
  0 siblings, 2 replies; 13+ messages in thread
From: Janusz S. Bien @ 2000-04-30 21:21 UTC
  Cc: Florian Weimer, Vladimir Volovich

Florian Weimer <fw@deneb.cygnus.argh.org> writes:

>Vladimir Volovich <vvv@vvv.vsu.ru> writes:
[...]
>> * possibility to set up preferred charsets to encode message bodies on
>>   a per-group basis.
>
>> * possibility to specify that the preferred charset to encode a
>>   message body should be the same as the charset used to encode the
>>   message (or mime part) i'm replying to.
>
>Currently, these two features are very hard to implement, at least
>in the general case, because Emacs unifies only ASCII characters in
>different MULE charsets.  For example, an ISO-8859-1 "ä" and one in
>ISO-8859-2 are completely different characters from Emacs' view.  As a
>result, it is impossible to decide whether the preferred MIME charset
>can be used---without additional mapping tables.  

Could you please elaborate? When you save or write a buffer, Emacs
can decide whether the buffer content (e.g. input directly from the
keyboard) can be represented in the default buffer coding system; if
not, Emacs offers the choice of appropriate coding systems.  

Why do you think that an analogous treatment of the preferred MIME
charset is impossible? What is wrong with such Emacs functions as
`select-safe-coding-system-function', `detect-coding-with-priority',
`detect-coding-region', `prefer-coding-system' etc.?

Regards

Janusz





* Re: Problems with 8-bit headers
  2000-04-24 18:20 ` Vladimir Volovich
  2000-04-25 21:59   ` Florian Weimer
@ 2000-04-30 18:21   ` Florian Weimer
  1 sibling, 0 replies; 13+ messages in thread
From: Florian Weimer @ 2000-04-30 18:21 UTC


Vladimir Volovich <vvv@vvv.vsu.ru> writes:

> * possibility to auto-select the charset for 8-bit chars in headers
>   depending on the charset used for the message body

Yes, I agree that's a nice feature. ;)

> * possibility to set up preferred charsets to encode message bodies on
>   a per-group basis.

> * possibility to specify that the preferred charset to encode a
>   message body should be the same as the charset used to encode the
>   message (or mime part) i'm replying to.

Currently, these two features are very hard to implement, at least
in the general case, because Emacs unifies only ASCII characters in
different MULE charsets.  For example, an ISO-8859-1 "ä" and one in
ISO-8859-2 are completely different characters from Emacs' view.  As a
result, it is impossible to decide whether the preferred MIME charset
can be used---without additional mapping tables.  On the other hand,
such tables do exist for almost all charsets used in the world. They
are known as Unicode. ;)

IMHO, these features have to wait until Emacs switches to Unicode for
internal character representation.
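
With Unicode as the common mapping table, the decision "can the preferred
MIME charset be used?" becomes a simple encode attempt. A minimal Python
sketch under that assumption follows; `pick_charset`, `group_charsets` and
the group names are hypothetical names made up for illustration and are not
part of Gnus:

```python
def pick_charset(text, preferred, fallbacks=("utf-8",)):
    """Return the first charset in (preferred, *fallbacks) that can encode text."""
    for name in (preferred, *fallbacks):
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset lacks some character; try the next one
        return name
    raise ValueError("no candidate charset can encode the text")


# Hypothetical per-group preferences (group names are invented).
group_charsets = {
    "relcom.example": "koi8-r",
    "fido7.example": "windows-1251",
}

print(pick_charset("Привет", group_charsets["relcom.example"]))
print(pick_charset("Привет, François", group_charsets["relcom.example"]))
```

The same function would serve the "reply in the charset of the original"
wish: pass the charset of the message being replied to as `preferred`.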




* Re: Problems with 8-bit headers
  2000-04-26 20:49       ` François Pinard
@ 2000-04-26 23:35         ` Kenichi Handa
  0 siblings, 0 replies; 13+ messages in thread
From: Kenichi Handa @ 2000-04-26 23:35 UTC
  Cc: vvv, ding, handa

François Pinard <pinard@iro.umontreal.ca> writes:
> Vladimir Volovich <vvv@vvv.vsu.ru> writes:
>>  However, there is a bug somewhere in core Emacs which makes it misbehave
>>  when i receive messages with 8-bit content-transfer-encoding and characters
>>  in the range 0x80--0x9f (which often happens when i receive messages
>>  in UTF-8).

> I fear that this is a strong limitation of Mule for Latin characters.
> A kind that may be very hard to repair.

Emacs still doesn't support UTF-8 encoding.  But, I've just
finished the work for handling 0x80..0x9f bytes in a better
way in the development code of Emacs 21.  With that, those
characters can appear even in a multibyte buffer without
being combined with the following 0xa0..0xff bytes.  And,
thus, code conversion is now reversible.

And, although the external package Mule-UCS supports UTF-8
encoding, as you already know, it can handle only such
characters that Emacs supports.  But, we have agreed on
adding a new (Emacs internal) charset
`mule-unicode-0100-24ff' which contains Unicode characters
of the range U+0100..U+24FF (thus covers all Cyrillic
characters).  On reading UTF-8, Mule-UCS first tries to
map Cyrillic characters to cyrillic-iso8859-5, and if that is
impossible, maps them to mule-unicode-0100-24ff.
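
The two-step fallback can be sketched in Python as follows. This is only an
illustration of the ranges involved, not Mule-UCS code: the charset names
are used as plain labels, `target_charset` is a hypothetical helper, and
Python's `iso8859_5` codec stands in for the ISO 8859-5 repertoire:

```python
# Ranges as Python range objects (the upper bound is exclusive).
MULE_UNICODE_RANGE = range(0x0100, 0x2500)  # U+0100..U+24FF
CYRILLIC_BLOCK = range(0x0400, 0x0500)      # Unicode "Cyrillic" block


def in_iso8859_5(char):
    """True if the character exists in ISO 8859-5."""
    try:
        char.encode("iso8859_5")
        return True
    except UnicodeEncodeError:
        return False


def target_charset(char):
    """Pick a label for one Cyrillic character, ISO 8859-5 first."""
    if ord(char) not in CYRILLIC_BLOCK:
        return None  # this sketch only handles the Cyrillic block
    if in_iso8859_5(char):
        return "cyrillic-iso8859-5"
    if ord(char) in MULE_UNICODE_RANGE:
        return "mule-unicode-0100-24ff"
    return None


# The whole Cyrillic block lies inside U+0100..U+24FF.
block_covered = all(cp in MULE_UNICODE_RANGE for cp in CYRILLIC_BLOCK)

print(block_covered, target_charset("Я"), target_charset("ґ"))
```

The letter "ґ" (U+0491) is an example of a Cyrillic character that is absent
from ISO 8859-5 and therefore takes the second branch.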

>>  I store mails in nnmbox backend, and when i receive such a message,
>>  i get ENORMOUS time delays when pressing `g' in a group buffer, and
>>  article numbering gets broken (displayed number of articles in a group
>>  is overestimated), and such things. I think this is because Emacs uses
>>  "intelligent" auto-detection algorithm which wants to interpret a `mbox'
>>  buffer as if it were in some charset (mbox is opened in raw mode, but
>>  this does not help to switch off this "intelligent" behavior, and certain
>>  character combinations choke Emacs). This is a long-standing bug which
>>  is there for more than a year, but i don't know how to report it... :-(

Which mail program are you using?  Gnus?

---
Ken'ichi HANDA
handa@etl.go.jp




* Re: Problems with 8-bit headers
  2000-04-26  6:29     ` Vladimir Volovich
@ 2000-04-26 20:49       ` François Pinard
  2000-04-26 23:35         ` Kenichi Handa
  0 siblings, 1 reply; 13+ messages in thread
From: François Pinard @ 2000-04-26 20:49 UTC
  Cc: Forum of ding/Gnus users, Handa Kenichi

Vladimir Volovich <vvv@vvv.vsu.ru> writes:

> However, there is a bug somewhere in core Emacs which makes it misbehave
> when i receive messages with 8-bit content-transfer-encoding and characters
> in the range 0x80--0x9f (which often happens when i receive messages
> in UTF-8).

I fear that this is a strong limitation of Mule for Latin characters.
A kind that may be very hard to repair.

> I store mails in nnmbox backend, and when i receive such a message,
> i get ENORMOUS time delays when pressing `g' in a group buffer, and
> article numbering gets broken (displayed number of articles in a group
> is overestimated), and such things. I think this is because Emacs uses
> "intelligent" auto-detection algorithm which wants to interpret a `mbox'
> buffer as if it were in some charset (mbox is opened in raw mode, but
> this does not help to switch off this "intelligent" behavior, and certain
> character combinations choke Emacs). This is a long-standing bug which
> is there for more than a year, but i don't know how to report it... :-(

When you have such problems, it is often best to write directly to the real
Mule author, the very guy who endlessly tries to repair the FSF damage, and
often gets the blame for problems he did not create :-).  He is very busy,
so you should expect some delay in replies: Handa Kenichi <handa@etl.go.jp>.
He has often helped me with Mule difficulties; I try not to abuse his help.

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard






* Re: Problems with 8-bit headers
  2000-04-25 21:59   ` Florian Weimer
@ 2000-04-26  6:29     ` Vladimir Volovich
  2000-04-26 20:49       ` François Pinard
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir Volovich @ 2000-04-26  6:29 UTC


"FW" == Florian Weimer writes:

 >> Some features i'd like to see:

 FW> Impressive list.  Why can't all the people of the world simply
 FW> use UTF-8?

Well, it's definitely a good thing (TM), and it already works quite
satisfactorily with Emacs/Gnus/mule-ucs (except that some characters present
in UCS are not supported by Emacs' internal encoding).

Those features are not really that important (besides perhaps
auto-detecting the charset of 8-bit chars in headers).

However, there is a bug somewhere in core Emacs which makes it
misbehave when i receive messages with 8-bit content-transfer-encoding
and characters in the range 0x80--0x9f (which often happens when i
receive messages in UTF-8). I store mails in nnmbox backend, and when
i receive such a message, i get ENORMOUS time delays when pressing `g'
in a group buffer, and article numbering gets broken (displayed number
of articles in a group is overestimated), and such things. I think
this is because Emacs uses an "intelligent" auto-detection algorithm
which wants to interpret a `mbox' buffer as if it were in some charset
(mbox is opened in raw mode, but this does not help to switch off this
"intelligent" behavior, and certain character combinations choke
Emacs). This is a long-standing bug which has been there for more than a
year, but i don't know how to report it... :-(

	Best regards, -- Vladimir.




* Re: Problems with 8-bit headers
  2000-04-24 18:20 ` Vladimir Volovich
@ 2000-04-25 21:59   ` Florian Weimer
  2000-04-26  6:29     ` Vladimir Volovich
  2000-04-30 18:21   ` Florian Weimer
  1 sibling, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2000-04-25 21:59 UTC


Vladimir Volovich <vvv@vvv.vsu.ru> writes:

> "FW" == Florian Weimer writes:
> 
>  FW> In the past, there were some issues with 8-bit headers (they were
>  FW> encoded using the wrong charset or something like this).  Is this
>  FW> problem still present in Gnus 5.8.5?
> 
> No, there is no such problem with current gnus, thanks! (i.e. you
> have fixed a bug). 

Glad to read this, but I think it was probably Christoph Rohland who
fixed it:

2000-03-13 20:23:06  Christoph Rohland  <hans-christoph.rohland@sap.com>

	* rfc2047.el (rfc2047-encode-message-header): Encode no matter
	whether Mule.
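
For readers unfamiliar with what RFC 2047 header encoding produces, here is
a small Python sketch using the standard `email.header` module. It only
illustrates the wire format; it is not the `rfc2047.el' code, and the sample
name is an arbitrary choice:

```python
from email.header import Header, decode_header

name = "François Pinard"  # sample header text with one 8-bit character

# Encode the header text as an RFC 2047 "encoded word" in ISO-8859-1.
encoded = Header(name, charset="iso-8859-1").encode()

# Decode it again: decode_header yields (bytes, charset) pairs.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)

print(encoded)
print(decoded)
```

The encoded form is pure ASCII, which is why it survives transports that
mangle 8-bit header bytes.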

> Some features i'd like to see:

Impressive list.  Why can't all the people of the world simply use
UTF-8?




* Re: Problems with 8-bit headers
  2000-04-21  6:13 Florian Weimer
  2000-04-21  9:52 ` Roman Belenov
  2000-04-21 11:14 ` Simon Josefsson
@ 2000-04-24 18:20 ` Vladimir Volovich
  2000-04-25 21:59   ` Florian Weimer
  2000-04-30 18:21   ` Florian Weimer
  2 siblings, 2 replies; 13+ messages in thread
From: Vladimir Volovich @ 2000-04-24 18:20 UTC


"FW" == Florian Weimer writes:

 FW> In the past, there were some issues with 8-bit headers (they were
 FW> encoded using the wrong charset or something like this).  Is this
 FW> problem still present in Gnus 5.8.5?

No, there is no such problem with current gnus, thanks! (i.e. you
have fixed a bug). I'm able to get 8-bit chars correctly encoded
according to the gnus-group-posting-charset-alist value.

Gnus continues to get more and more powerful features. :-)

Some features i'd like to see:

* possibility to set up preferred charsets to encode message bodies on
  a per-group basis. Currently, i have a global setting

  (put-charset-property 'cyrillic-iso8859-5 'preferred-coding-system 'koi8-r)

  in my .emacs file; the mm-preferred-coding-system in mm-util.el
  extracts the info from charset properties, but this is done
  globally, -- not on a per group basis. I'd like to be able to send
  messages in, say, koi8-r to some groups, in windows-1251 to some
  other groups, in iso-8859-5 to others, etc. (all these are
  Cyrillic charsets based on a single coding system).

* possibility to specify that the preferred charset to encode a
  message body should be the same as the charset used to encode the
  message (or mime part) i'm replying to.

  (these two features could perhaps be configured via the same
  variable)

* possibility to auto-select the charset for 8-bit chars in headers
  depending on the charset used for the message body (or for the first
  text/* mime part of the message). i.e. if the message body is in,
  say, windows-1251, then gnus should be able to treat 8-bit chars in
  headers as if they were in windows-1251 (but not simply get the
  charset from gnus-group-charset-alist on a per-group basis)
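
The last wish can be sketched in Python as below. This is a hypothetical
illustration of the requested behaviour, not existing Gnus code:
`decode_raw_header` and its parameters are invented names, and the koi8-r
group default is an assumption made for the example:

```python
def decode_raw_header(raw, body_charset=None, group_default="koi8-r"):
    """Decode unencoded 8-bit header bytes, trying the body charset first."""
    for name in (body_charset, group_default):
        if name is None:
            continue
        try:
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue  # undecodable bytes or unknown charset name
    # Last resort: latin-1 maps every byte, so this never fails.
    return raw.decode("latin-1")


raw_header = "Владимир".encode("windows-1251")  # bytes as they arrive

print(decode_raw_header(raw_header, body_charset="windows-1251"))
print(decode_raw_header(raw_header))  # group default only
```

With only the per-group default, the same bytes decode without error but
into the wrong letters, which is exactly the case the body charset would fix.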

	Best regards, -- Vladimir.




* Re: Problems with 8-bit headers
  2000-04-21  6:13 Florian Weimer
  2000-04-21  9:52 ` Roman Belenov
@ 2000-04-21 11:14 ` Simon Josefsson
  2000-04-24 18:20 ` Vladimir Volovich
  2 siblings, 0 replies; 13+ messages in thread
From: Simon Josefsson @ 2000-04-21 11:14 UTC
  Cc: ding, roman, vvv

Florian Weimer <fw@deneb.cygnus.argh.org> writes:

> In the past, there were some issues with 8-bit headers (they were
> encoded using the wrong charset or something like this).  Is this
> problem still present in Gnus 5.8.5?

I think there is a problem with non-Mule XEmacsen, where Gnus sends
out characters in headers with the eighth bit set. I believe there is a bug
report about it; I was able to reproduce it, but I'm not sure what the
fix should be.




* Re: Problems with 8-bit headers
  2000-04-21  6:13 Florian Weimer
@ 2000-04-21  9:52 ` Roman Belenov
  2000-04-21 11:14 ` Simon Josefsson
  2000-04-24 18:20 ` Vladimir Volovich
  2 siblings, 0 replies; 13+ messages in thread
From: Roman Belenov @ 2000-04-21  9:52 UTC
  Cc: ding, vvv

Florian Weimer <fw@deneb.cygnus.argh.org> writes:

> In the past, there were some issues with 8-bit headers (they were
> encoded using the wrong charset or something like this).  Is this
> problem still present in Gnus 5.8.5?

Does Gnus 5.8.5 already exist? Maybe you mean the CVS version, which I
don't use?

Anyway, I had no problems in 5.8.3 and 5.8.4. Note that I didn't try
to use unencoded headers (Vladimir reported some problems with them) -
but encoded ones do work correctly.

-- 
 							With regards, Roman.





* Problems with 8-bit headers
@ 2000-04-21  6:13 Florian Weimer
  2000-04-21  9:52 ` Roman Belenov
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Florian Weimer @ 2000-04-21  6:13 UTC
  Cc: roman, vvv

In the past, there were some issues with 8-bit headers (they were
encoded using the wrong charset or something like this).  Is this
problem still present in Gnus 5.8.5?



