* more multibyte bugs
From: Vladimir Volovich @ 2000-05-05 8:57 UTC

Hi Shenghuo,

with current cvs i'm able to send messages in koi8-r with CTE=8bit. Thanks for the good work! BTW, i wonder why sending via smtp worked fine in previous versions, and was broken in some recent one (and was fixed without touching the smtp sending code?).

With the current cvs version, there is one more bug that i just noticed: open any message with 8-bit CTE, press C-u g TWICE, and then get it again. I.e. press

    g
    C-u g
    C-u g
    g

Then the 8-bit text will be broken. Those multibyte problems are really bad and tricky. :-)

BTW, as you agreed to use unibyte encoding for mbox, but you think that it could be a disaster, i suggest creating a cvs "multibyte cleanup branch" in which you could try to Do The Right Thing (TM); i'll be happy to debug this branch and report problems. When everything works fine, we could merge it into the main branch.

Am i right that mule will _by design_ interpret combinations of 8-bit octets which form multibyte characters as multibyte characters in a multibyte buffer regardless of the coding system settings? In this case, it seems that ALL gnus buffers which are supposed to contain arbitrary 8-bit text must be unibyte.

Best, v.
* Re: more multibyte bugs
From: Florian Weimer @ 2000-05-05 12:34 UTC

Vladimir Volovich <vvv@vvv.vsu.ru> writes:

> Am i right that mule will _by design_ interpret combinations of 8-bit
> octets which form multibyte characters as multibyte characters in a
> multibyte buffer regardless of the coding system settings?

Yes, some Emacs MULE guru confirmed it in an earlier posting to this list, I think.

> In this case, it seems that ALL gnus buffers which are supposed to
> contain arbitrary 8-bit text must be unibyte.

If this can't be done (because making a buffer unibyte is some kind of global thing), one could add a special coding system designed for encoding raw bytes in the range 16#80# .. 16#FF#.
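The effect confirmed above can be illustrated outside Emacs. The following is only an analogy in Python, not Emacs code: UTF-8 stands in for the internal MULE encoding, Latin-1 stands in for a unibyte view (one character per octet), and `view_lengths` is a hypothetical helper name. It shows two 8-bit octets collapsing into one character once they are read as a multibyte sequence.

```python
def view_lengths(data):
    """Return (length as one-char-per-octet text, length as UTF-8 text).

    Latin-1 is used as a stand-in for a unibyte buffer and UTF-8 as a
    stand-in for a multibyte one; neither is the real MULE encoding.
    """
    unibyte_like = data.decode("latin-1")   # every octet stays a separate character
    multibyte_like = data.decode("utf-8")   # valid sequences merge into one character
    return len(unibyte_like), len(multibyte_like)


# Two octets that happen to form one valid UTF-8 sequence (U+041F).
raw = bytes([0xD0, 0x9F])
print(view_lengths(raw))
```

In the multibyte-like view the two octets are no longer individually addressable, which is the kind of accidental combining the thread is worried about for arbitrary 8-bit data.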
* Re: more multibyte bugs
From: Vladimir Volovich @ 2000-05-05 18:32 UTC

"FW" == Florian Weimer writes:

 >> In this case, it seems that ALL gnus buffers which are supposed to
 >> contain arbitrary 8-bit text must be unibyte.

 FW> If this can't be done (because making a buffer unibyte is some
 FW> kind of global thing),

what do you mean by the word "global"? the set-buffer-multibyte function can switch between both modes independently for different buffers. i think that gnus should try to work with unibyte buffers in all cases where the buffer contents are not supposed to contain data in the internal mule encoding.

 FW> one could add a special coding system designed for encoding raw
 FW> bytes in the range 16#80# .. 16#FF#.

how could this be done?

Best regards, -- Vladimir.
-- 
COBOL is for morons. -- E.W. Dijkstra
* Re: more multibyte bugs
From: Florian Weimer @ 2000-05-06 7:10 UTC

Vladimir Volovich <vvv@vvv.vsu.ru> writes:

> FW> If this can't be done (because making a buffer unibyte is some
> FW> kind of global thing),
>
> what do you mean by word "global"?

It affects a whole buffer, not just some region in it.

> FW> one could add a special coding system designed for encoding raw
> FW> bytes in the range 16#80# .. 16#FF#.
>
> how this could be done?

First, you create a MULE charset containing 128 characters. Then you write a CCL program which maps characters in the range 16#80# .. 16#FF# to these characters (and vice versa). The resulting characters are not special in any way (except that you don't have a font to display them), and no accidental combining occurs.
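The mapping described in this reply can be sketched in Python as an analogy. This is not a MULE charset or a CCL program: the private-use range starting at `RAW_BASE` and the function names `decode_raw` and `encode_raw` are assumptions made purely for illustration. Each octet in 0x80 .. 0xFF gets its own dedicated character, so adjacent octets can never merge into a multibyte character, and the mapping is reversible.

```python
# Hypothetical private-use base standing in for a dedicated 128-character charset.
RAW_BASE = 0xE000


def decode_raw(data):
    """Map ASCII octets to themselves and each octet 0x80..0xFF to its own character."""
    return "".join(chr(b) if b < 0x80 else chr(RAW_BASE + b) for b in data)


def encode_raw(text):
    """Inverse of decode_raw; rejects characters outside the two known ranges."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif RAW_BASE + 0x80 <= cp <= RAW_BASE + 0xFF:
            out.append(cp - RAW_BASE)
        else:
            raise ValueError("character %r is not a raw-byte character" % ch)
    return bytes(out)


# Every possible octet survives the round trip, one character per octet.
all_octets = bytes(range(256))
assert encode_raw(decode_raw(all_octets)) == all_octets
```

As in the description above, the resulting characters are ordinary characters with no display glyph of their own; the only property that matters is that the mapping is one to one.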
* Re: more multibyte bugs
From: Shenghuo ZHU @ 2000-05-05 15:23 UTC

>>>>> "VV" == Vladimir Volovich <vvv@vvv.vsu.ru> writes:

VV> Hi Shenghuo,
VV> with current cvs i'm able to send messages in koi8-r with
VV> CTE=8bit. Thanks for good work! BTW, i wonder why sending via
VV> smtp worked fine in prev. versions, and was broken in some recent
VV> one (and was fixed without touching smtp sending code?).

You are using koi8-r as your default coding system. In the previous versions, the 8-bit data was copied, decoded and encoded as if it were koi8-r characters. For an 8bit koi8-r message, there is no problem. Even for iso-8859-? and some other coding systems, koi8-r decoding followed by encoding does not hurt the 8-bit data. But it is not the Right Thing (TM).

Now every buffer for prepared messages is unibyte, i.e., it is safer for CTE=8bit. In the recent fix, I set the default value of enable-multibyte-characters in mm-with-unibyte-current-buffer so that all new buffers generated within its scope will be unibyte, including those generated in smtpmail.el.

VV> With the current cvs version, there is one more bug that i just
VV> noticed: open any message with 8-bit CTE, press TWICE C-u g, and then
VV> get it again. I.e. press
VV> g
VV> C-u g
VV> C-u g
VV> g
VV> Then 8-bit text will be broken. Those multibyte problems are
VV> really bad and tricky. :-)

The article buffer is unibyte, right? I just fixed it. If it doesn't work, tell me.

VV> BTW, as you agreed to use unibyte encoding for mbox, but you think
VV> that it could be a disaster, i suggest to create a cvs "multibyte
VV> cleanup branch" in which you could try to Do The Right Thing (TM);
VV> i'll be happy to debug this branch and report problems. When
VV> everything works fine, we could merge it into the main branch.
The problem is caused not only by open-file, but also by insert, buffer-string and other operations. So we cannot simply create a buffer as unibyte, because MULE characters and 8-bit raw characters may exist in a buffer at the same time. So far, I have fixed some bugs by dynamically switching between unibyte and multibyte (mm-with-unibyte-current-buffer). I hope it works fine.

Yesterday, I created an mbox and copied an 8bit utf-8 message into it after I fixed a unibyte bug in mm-insert-part. It works fine. Could you test it again? If that does not fix the problem, please send the message to me as a base64 attachment.

-- 
Shenghuo
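The remark in this message that decoding and re-encoding as koi8-r "does not hurt" 8-bit data can be checked with a small Python sketch. It uses Python's own `koi8_r` codec rather than the Emacs coding system, so it is an analogy and not a test of Gnus; `survives_round_trip` is a hypothetical helper name. The point is that koi8-r assigns a character to every octet, so arbitrary bytes come back unchanged, whereas a multibyte encoding such as UTF-8 offers no such guarantee.

```python
def survives_round_trip(data, codec):
    """Return True if decoding then re-encoding with `codec` reproduces `data`."""
    text = data.decode(codec, errors="replace")
    return text.encode(codec, errors="replace") == data


all_octets = bytes(range(256))

# A single-byte codec with all 256 positions assigned is lossless on any input.
print(survives_round_trip(all_octets, "koi8_r"))

# Two koi8-r letters read as UTF-8 are an invalid sequence and get mangled.
print(survives_round_trip(b"\xd0\xd2", "utf-8"))
```

This is why the earlier behaviour worked by accident for a koi8-r default: the round trip was harmless for that one coding system, not correct in general.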