more multibyte bugs

Gnus development mailing list

* more multibyte bugs
@ 2000-05-05  8:57 Vladimir Volovich
  2000-05-05 12:34 ` Florian Weimer
  2000-05-05 15:23 ` Shenghuo ZHU
From: Vladimir Volovich @ 2000-05-05  8:57 UTC


Hi Shenghuo,

With current CVS I'm able to send messages in koi8-r with CTE=8bit.
Thanks for the good work! BTW, I wonder why sending via SMTP worked
fine in previous versions and was broken in some recent one (and was
then fixed without touching the SMTP sending code?).

With the current CVS version, there is one more bug that I just
noticed: open any message with 8-bit CTE, press C-u g TWICE, and then
get it again.  I.e. press

g
C-u g
C-u g
g

Then 8-bit text will be broken. Those multibyte problems are really
bad and tricky. :-)

BTW, since you agreed to use unibyte encoding for mbox but think
that it could be a disaster, I suggest creating a CVS "multibyte
cleanup branch" in which you could try to Do The Right Thing (TM);
I'll be happy to debug this branch and report problems. When
everything works fine, we could merge it into the main branch.

Am I right that MULE will _by design_ interpret combinations of 8-bit
octets which form multibyte characters as multibyte characters in a
multibyte buffer, regardless of the coding system settings?

In this case, it seems that ALL gnus buffers which are supposed to
contain arbitrary 8-bit text must be unibyte.

Best,
v.





* Re: more multibyte bugs
  2000-05-05  8:57 more multibyte bugs Vladimir Volovich
@ 2000-05-05 12:34 ` Florian Weimer
  2000-05-05 18:32   ` Vladimir Volovich
  2000-05-05 15:23 ` Shenghuo ZHU
From: Florian Weimer @ 2000-05-05 12:34 UTC

Vladimir Volovich <vvv@vvv.vsu.ru> writes:

> Am I right that MULE will _by design_ interpret combinations of 8-bit
> octets which form multibyte characters as multibyte characters in a
> multibyte buffer, regardless of the coding system settings?

Yes, some Emacs MULE guru confirmed it in an earlier posting to this
list, I think.
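
The effect being asked about can be shown at the byte level. The sketch below is only an analogy: it uses Python, with UTF-8 standing in for Emacs's internal MULE encoding (neither appears in the thread), and the two octets are chosen purely for illustration.

```python
# Analogy only: UTF-8 stands in here for Emacs's internal MULE encoding.
octets = bytes([0xD0, 0x9F])

# Read with a single-byte coding system, each octet is its own character.
as_single_byte = octets.decode("koi8_r")

# Read as a multibyte encoding, the same two octets fuse into one character.
as_multibyte = octets.decode("utf-8")

print(len(as_single_byte), len(as_multibyte))  # prints: 2 1
```

The octets are identical in both cases; only the interpretation changes, which is why a buffer that holds arbitrary 8-bit data is at risk once it is treated as multibyte.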

> In this case, it seems that ALL gnus buffers which are supposed to
> contain arbitrary 8-bit text must be unibyte.

If this can't be done (because making a buffer unibyte is some kind
of global thing), one could add a special coding system designed for
encoding raw bytes in the range 16#80# .. 16#FF#.


* Re: more multibyte bugs
  2000-05-05 12:34 ` Florian Weimer
@ 2000-05-05 18:32   ` Vladimir Volovich
  2000-05-06  7:10     ` Florian Weimer
From: Vladimir Volovich @ 2000-05-05 18:32 UTC

"FW" == Florian Weimer writes:

 >> In this case, it seems that ALL gnus buffers which are supposed to
 >> contain arbitrary 8-bit text must be unibyte.

 FW> If this can't be done (because making a buffer unibyte is some
 FW> kind of global thing),

What do you mean by the word "global"? The set-buffer-multibyte
function can switch between the two modes independently for different
buffers. I think that Gnus should try to work with unibyte buffers in
all cases where the buffer contents are not supposed to contain data in
the internal MULE encoding.

 FW> one could add a special coding system designed for encoding raw
 FW> bytes in the range 16#80# .. 16#FF#.

How could this be done?

	Best regards, -- Vladimir.
-- 
COBOL is for morons.
		-- E.W. Dijkstra


* Re: more multibyte bugs
  2000-05-05 18:32   ` Vladimir Volovich
@ 2000-05-06  7:10     ` Florian Weimer
From: Florian Weimer @ 2000-05-06  7:10 UTC

Vladimir Volovich <vvv@vvv.vsu.ru> writes:

>  FW> If this can't be done (because making a buffer unibyte is some
>  FW> kind of global thing),
> 
> What do you mean by the word "global"?

It affects a whole buffer, not just some region in it.

>  FW> one could add a special coding system designed for encoding raw
>  FW> bytes in the range 16#80# .. 16#FF#.
> 
> How could this be done?

First, you create a MULE charset containing 128 characters.  Then
you write a CCL program which maps characters in the range 16#80# ..
16#FF# to these characters (and vice versa).  The resulting characters
are not special in any way (except that you don't have a font to
display them), and no accidental combining occurs.
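
The mapping idea can be sketched outside Emacs. The following is a minimal illustration in Python, not a CCL program: the names `RAW_BASE`, `decode_raw` and `encode_raw` are hypothetical, and the Unicode private use area merely stands in for a dedicated 128-character MULE charset.

```python
# Hypothetical base code point; the Unicode private use area stands in
# for a dedicated MULE charset of 128 "raw byte" characters.
RAW_BASE = 0xE000


def decode_raw(data: bytes) -> str:
    """Keep ASCII as-is; give each byte 0x80..0xFF its own private character."""
    return "".join(chr(b) if b < 0x80 else chr(RAW_BASE + b) for b in data)


def encode_raw(text: str) -> bytes:
    """Inverse of decode_raw: turn each character back into exactly one byte."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif RAW_BASE + 0x80 <= cp <= RAW_BASE + 0xFF:
            out.append(cp - RAW_BASE)
        else:
            raise ValueError(f"not a raw-byte character: U+{cp:04X}")
    return bytes(out)


every_byte = bytes(range(256))
assert encode_raw(decode_raw(every_byte)) == every_byte
```

Because every high byte becomes exactly one character, two adjacent bytes can never be merged into a single multibyte character, and the round trip is lossless for arbitrary 8-bit data.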


* Re: more multibyte bugs
  2000-05-05  8:57 more multibyte bugs Vladimir Volovich
  2000-05-05 12:34 ` Florian Weimer
@ 2000-05-05 15:23 ` Shenghuo ZHU
From: Shenghuo ZHU @ 2000-05-05 15:23 UTC


>>>>> "VV" == Vladimir Volovich <vvv@vvv.vsu.ru> writes:

VV> Hi Shenghuo,

VV> With current CVS I'm able to send messages in koi8-r with
VV> CTE=8bit.  Thanks for the good work! BTW, I wonder why sending via
VV> SMTP worked fine in previous versions and was broken in some
VV> recent one (and was then fixed without touching the SMTP sending
VV> code?).

You are using koi8-r as your default coding system.  In previous
versions, the 8-bit data was copied, decoded and encoded as if it
consisted of koi8-r characters.  For an 8bit koi8-r message, there is
no problem.  Even for iso-8859-? and some other coding systems, koi8-r
decoding followed by encoding does not hurt the 8-bit data.  But it is
not the Right Thing (TM).  Now, every buffer for prepared messages is
unibyte, which is safer for CTE=8bit.  In the recent fix, I set the
default value of enable-multibyte-characters in
mm-with-unibyte-current-buffer so that all new buffers generated
within its scope will be unibyte, including those generated in
smtpmail.el.
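
The claim that a koi8-r decode followed by an encode leaves 8-bit data intact can be checked at the byte level. This is a small sketch in Python rather than Emacs Lisp; the helper name `survives_round_trip` is hypothetical, and Python's codecs are only a model of the coding systems discussed here.

```python
def survives_round_trip(data: bytes, codec: str) -> bool:
    """Return True if decoding and then re-encoding with codec preserves data."""
    try:
        return data.decode(codec).encode(codec) == data
    except UnicodeError:
        # The codec cannot represent these octets at all.
        return False


every_octet = bytes(range(256))

# A single-byte codec that defines all 256 values is harmless for any data,
# including text that was really written in iso-8859-1.
print(survives_round_trip(every_octet, "koi8_r"))                   # True
print(survives_round_trip("caf\u00e9".encode("latin-1"), "koi8_r"))  # True

# A multibyte codec offers no such guarantee: a lone 0xE9 is not valid UTF-8.
print(survives_round_trip(b"\xe9", "utf-8"))                        # False
```

The round trip works only because koi8-r happens to assign a character to every octet, which is why relying on it is safe in practice but still not the right approach.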

VV> With the current CVS version, there is one more bug that I just
VV> noticed: open any message with 8-bit CTE, press C-u g TWICE, and then
VV> get it again.  I.e. press

VV> g
VV> C-u g
VV> C-u g
VV> g

VV> Then 8-bit text will be broken. Those multibyte problems are
VV> really bad and tricky. :-)

The article buffer is unibyte, right? I just fixed it.  If it doesn't
work, tell me.

VV> BTW, since you agreed to use unibyte encoding for mbox but think
VV> that it could be a disaster, I suggest creating a CVS "multibyte
VV> cleanup branch" in which you could try to Do The Right Thing (TM);
VV> I'll be happy to debug this branch and report problems. When
VV> everything works fine, we could merge it into the main branch.

The problem is caused not only by open-file, but also by operations
such as insert, buffer-string and others.  So we cannot simply create
a buffer as unibyte, because MULE characters and raw 8-bit characters
may exist in a buffer at the same time.  For now, I have fixed some
bugs by dynamically switching between unibyte and multibyte
(mm-with-unibyte-current-buffer).  I hope it works fine.

Yesterday, I created an mbox and copied an 8bit utf-8 message into it
after I fixed a unibyte bug in mm-insert-part.  It works fine.  Could
you test it again?  If that does not fix the problem, please send the
message to me as a base64 attachment.


-- 
Shenghuo


