* Re: `.newsrc.eld' saves chinese group name in wrong coding
       [not found] <ufydnay6j.fsf@gmail.com>
@ 2006-10-19 2:54 ` Chong Yidong
  2006-10-19 3:56 ` Katsumi Yamaoka
  0 siblings, 1 reply; 45+ messages in thread
From: Chong Yidong @ 2006-10-19 2:54 UTC (permalink / raw)
  Cc: Zhang Wei, ding, Kenichi Handa

Zhang Wei <id.brep@gmail.com> writes:

> `.newsrc.eld' can't save chinese group name in proper coding. When gnus
> is restarted, all of the articles in groups with chinese name are marked
> unread. But enter that group, you will find all of the articles are old
> articles (marked by an `O'). The file in the attachment is the wrong
> formatted `.newsrc.eld', hope that will be helpful.

The problem seems to be that when a Chinese group name is given, e.g.
"好", `gnus-group-insert-group-line' ends up calling

  (decode-coding-string "好" 'utf-8)

which gives gibberish. Could either the coding systems experts
(i.e. Handa) or Gnus experts tell us why this is the wrong thing to
do?

I think the way to reproduce this is as follows:

1. save an empty file with a Chinese filename:

   C-x C-f 好 RET RET C-x C-s

   (I simply copied the character into the minibuffer from the HELLO
   file.)

2. go to the Gnus group buffer:

   M-x gnus RET

3. Open that file as a Gnus group:

   G f

=> Gnus group line is shown in Gibberish

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding
  2006-10-19 2:54 ` `.newsrc.eld' saves chinese group name in wrong coding Chong Yidong
@ 2006-10-19 3:56 ` Katsumi Yamaoka
  2006-10-19 4:11 ` Katsumi Yamaoka
  0 siblings, 1 reply; 45+ messages in thread
From: Katsumi Yamaoka @ 2006-10-19 3:56 UTC (permalink / raw)
  Cc: emacs-pretest-bug, Zhang Wei, ding, Kenichi Handa

>>>>> In <87mz7tm2wn.fsf@furball.mit.edu>
>>>>> Chong Yidong <cyd@stupidchicken.com> wrote:

> Zhang Wei <id.brep@gmail.com> writes:

>> `.newsrc.eld' can't save chinese group name in proper coding. When gnus
>> is restarted, all of the articles in groups with chinese name are marked
>> unread. But enter that group, you will find all of the articles are old
>> articles (marked by an `O'). The file in the attachment is the wrong
>> formatted `.newsrc.eld', hope that will be helpful.

> The problem seems to be that when a Chinese group name is given, e.g.
> "好", `gnus-group-insert-group-line' ends up calling

> (decode-coding-string "好" 'utf-8)

> which gives gibberish. Could either the coding systems experts
> (i.e. Handa) or Gnus experts tell us why this is the wrong thing to
> do?

Gnus uses utf-8 encoded non-ASCII group names internally, those
encoded names are saved in the .newsrc.eld file, and they are
decoded by utf-8 when displaying. I had no problem when I once
tried nnrss groups with Japanese names. So, I cannot imagine
what is happening with Zhang Wei, sorry.

> I think the way to reproduce this is as follows:

> 1. save an empty file with a Chinese filename:

> C-x C-f 好 RET RET C-x C-s

> (I simply copied the character into the minibuffer from the HELLO
> file.)

> 2. go to the Gnus group buffer:

> M-x gnus RET

> 3. Open that file as a Gnus group:

> G f

> => Gnus group line is shown in Gibberish

It is caused because of the default value of
`gnus-group-name-charset-group-alist'. It can be fixed with the
following:

(push '("\\`nndoc\\(\\+?[^:]+\\)?:")
      gnus-group-name-charset-group-alist)

However, I'm not quite sure making it the new default is generally
good.

Regards,

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding
  2006-10-19 3:56 ` Katsumi Yamaoka
@ 2006-10-19 4:11 ` Katsumi Yamaoka
  2006-10-19 8:33 ` Reiner Steib
  0 siblings, 1 reply; 45+ messages in thread
From: Katsumi Yamaoka @ 2006-10-19 4:11 UTC (permalink / raw)
  Cc: emacs-pretest-bug, Zhang Wei, ding, Kenichi Handa

>>>>> In <b4m4pu1m00i.fsf@jpl.org> Katsumi Yamaoka wrote:

> It can be fixed with the following:

> (push '("\\`nndoc\\(\\+?[^:]+\\)?:")
>       gnus-group-name-charset-group-alist)

I mistyped it. Here's what I wanted to write.

(push '("\\`nndoc\\(?:\\+[^:]+\\)?:")
      gnus-group-name-charset-group-alist)

In addition, just now I noticed it is insufficient to solve the
problem. Maybe we need to do the fix here and there in Gnus to
enable it to work with non-ASCII nndoc group names.

Regards,

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-19 4:11 ` Katsumi Yamaoka @ 2006-10-19 8:33 ` Reiner Steib 2006-10-19 9:03 ` Katsumi Yamaoka 0 siblings, 1 reply; 45+ messages in thread From: Reiner Steib @ 2006-10-19 8:33 UTC (permalink / raw) Cc: Zhang Wei On Thu, Oct 19 2006, Katsumi Yamaoka wrote: > Gnus uses utf-8 encoded non-ASCII group names internally, those > encoded names are saved in the .newsrc.eld file, and they are > decoded by utf-8 when displaying. I had no problem when I once > tried nnrss groups with Japanese names. So, I cannot imagine > what is happening with Zhang Wei, sorry. [...] > (push '("\\`nndoc\\(?:\\+[^:]+\\)?:") > gnus-group-name-charset-group-alist) > > In addition, just now I noticed it is insufficient to solve the > problem. Maybe we need to do the fix here and there in Gnus to > enable it to work with non-ASCII nndoc group names. The default value of `gnus-group-name-charset-group-alist' is ((".*" . utf-8)), so it should cover all groups, IIUC. Or am I misunderstanding the issue? Why is setting it to nil for nndoc necessary? Is nndoc handled differently than other backends? Bye, Reiner. -- ,,, (o o) ---ooO-(_)-Ooo--- | PGP key available | http://rsteib.home.pages.de/ ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-19 8:33 ` Reiner Steib @ 2006-10-19 9:03 ` Katsumi Yamaoka 2006-10-20 3:39 ` Chong Yidong 2006-10-20 6:04 ` Eli Zaretskii 0 siblings, 2 replies; 45+ messages in thread From: Katsumi Yamaoka @ 2006-10-19 9:03 UTC (permalink / raw) Cc: ding, Zhang Wei >>>>> In <v9slhkn1qs.fsf@marauder.physik.uni-ulm.de> >>>>> Reiner Steib wrote: > On Thu, Oct 19 2006, Katsumi Yamaoka wrote: >> Gnus uses utf-8 encoded non-ASCII group names internally, those >> encoded names are saved in the .newsrc.eld file, and they are >> decoded by utf-8 when displaying. I had no problem when I once >> tried nnrss groups with Japanese names. So, I cannot imagine >> what is happening with Zhang Wei, sorry. > [...] >> (push '("\\`nndoc\\(?:\\+[^:]+\\)?:") >> gnus-group-name-charset-group-alist) >> >> In addition, just now I noticed it is insufficient to solve the >> problem. Maybe we need to do the fix here and there in Gnus to >> enable it to work with non-ASCII nndoc group names. > The default value of `gnus-group-name-charset-group-alist' is ((".*" > . utf-8)), so it should cover all groups, IIUC. Or am I > misunderstanding the issue? > Why is setting it to nil for nndoc necessary? Is nndoc handled > differently than other backends? I figured out a moment ago that that was wrong approach. All group names should be utf-8 encoded for the internal use in Gnus, so the value ((".*" . utf-8)) is necessary and sufficient. IIUC, the difference between nnrss and nndoc is that the former encodes a non-ASCII group name first. Regards, ^ permalink raw reply [flat|nested] 45+ messages in thread
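The round trip described in this message (encode once when the name enters Gnus, decode only for display) can be sketched in a few lines. This is an illustration, not Gnus code: it uses only the built-in `encode-coding-string', `decode-coding-string' and `multibyte-string-p', and the variable names are made up for the example.

```elisp
;; Illustration only; the variable names are hypothetical, not Gnus internals.
(let* ((typed "好")                                        ; multibyte text as the user entered it
       (internal (encode-coding-string typed 'utf-8))     ; unibyte bytes, the form kept internally
       (displayed (decode-coding-string internal 'utf-8))) ; decoded once, for display
  (list (multibyte-string-p typed)      ; the typed name is multibyte
        (multibyte-string-p internal)   ; the encoded name is unibyte
        (string= typed displayed)))     ; the round trip restores the name
```

A backend that hands Gnus the multibyte name without encoding it first skips the second step, so the later decode is applied to text that was never encoded. What that decode returns depends on the Emacs version; in the report that opened this thread it produced gibberish.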
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-19 9:03 ` Katsumi Yamaoka @ 2006-10-20 3:39 ` Chong Yidong 2006-10-20 4:06 ` Katsumi Yamaoka 2006-10-20 6:04 ` Eli Zaretskii 1 sibling, 1 reply; 45+ messages in thread From: Chong Yidong @ 2006-10-20 3:39 UTC (permalink / raw) Cc: emacs-pretest-bug Katsumi Yamaoka <yamaoka@jpl.org> writes: > I figured out a moment ago that that was wrong approach. All > group names should be utf-8 encoded for the internal use in > Gnus, so the value ((".*" . utf-8)) is necessary and sufficient. > IIUC, the difference between nnrss and nndoc is that the former > encodes a non-ASCII group name first. So what needs to be changed? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-20 3:39 ` Chong Yidong @ 2006-10-20 4:06 ` Katsumi Yamaoka 2006-10-20 5:18 ` Katsumi Yamaoka 0 siblings, 1 reply; 45+ messages in thread From: Katsumi Yamaoka @ 2006-10-20 4:06 UTC (permalink / raw) Cc: emacs-pretest-bug >>>>> In <87u01zd5ba.fsf@furball.mit.edu> Chong Yidong wrote: > Katsumi Yamaoka <yamaoka@jpl.org> writes: >> I figured out a moment ago that that was wrong approach. All >> group names should be utf-8 encoded for the internal use in >> Gnus, so the value ((".*" . utf-8)) is necessary and sufficient. >> IIUC, the difference between nnrss and nndoc is that the former >> encodes a non-ASCII group name first. > So what needs to be changed? I'm now looking into it. However, I think improving of nndoc might not help Zhang Wei because the problem looked caused by the nntp group. So, I'm not urged by myself so much. >>>>> In <ufydnay6j.fsf@gmail.com> >>>>> Zhang Wei <id.brep@gmail.com> wrote: [...] > (setq gnus-newsrc-alist '(("\301\367\320\30799.\261\276\265\330\262\342\312\324" 3 ((1 . 8)) ((seen (1 . 8))))... ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding
  2006-10-20 4:06 ` Katsumi Yamaoka
@ 2006-10-20 5:18 ` Katsumi Yamaoka
  0 siblings, 0 replies; 45+ messages in thread
From: Katsumi Yamaoka @ 2006-10-20 5:18 UTC (permalink / raw)
  Cc: emacs-pretest-bug

>>>>> In <b4mods7hbr3.fsf@jpl.org> Katsumi Yamaoka wrote:

>>>>>> In <87u01zd5ba.fsf@furball.mit.edu> Chong Yidong wrote:

>> So what needs to be changed?

> I'm now looking into it. However, I think improving of nndoc
> might not help Zhang Wei because the problem looked caused by
> the nntp group. So, I'm not urged by myself so much.

>>>>>> In <ufydnay6j.fsf@gmail.com>
>>>>>> Zhang Wei <id.brep@gmail.com> wrote:
> [...]
>> (setq gnus-newsrc-alist '(("\301\367\320\30799.\261\276\265\330\262\342\312\324" 3 ((1 . 8)) ((seen (1 . 8))))...

The following patch enables Gnus to use non-ASCII names in nndoc
groups. I've tested it with the "~/好" file containing mbox data.
After some other tests, I will install it to the Gnus trunk and
the v5-10 branch.

I don't think it solves Zhang Wei's problem anyway, though. I'm
unable to test with nntp groups of non-ASCII names, but IIRC,
Gnus has been completed to run with those groups a couple of
years ago (even if there might still be trivial difficulties).

--8<---------------cut here---------------start------------->8---
*** gnus-group.el~ Mon Jul 17 21:52:02 2006
--- gnus-group.el Fri Oct 20 05:15:50 2006
***************
*** 2680,2692 ****
                (t (setq err (format "%c unknown. " char))
                   nil))))
          (setq type found)))
!     (let* ((file (expand-file-name file))
!            (name (gnus-generate-new-group-name
!                   (gnus-group-prefixed-name
!                    (file-name-nondirectory file) '(nndoc "")))))
        (gnus-group-make-group
!        (gnus-group-real-name name)
!        (list 'nndoc file
               (list 'nndoc-address file)
               (list 'nndoc-article-type (or type 'guess))))))
--- 2680,2697 ----
                (t (setq err (format "%c unknown. " char))
                   nil))))
          (setq type found)))
!     (setq file (expand-file-name file))
!     (let ((name (gnus-generate-new-group-name
!                  (gnus-group-prefixed-name
!                   (file-name-nondirectory file) '(nndoc ""))))
!           (encodable (mm-coding-system-p 'utf-8)))
        (gnus-group-make-group
!        (if encodable
!            (mm-encode-coding-string (gnus-group-real-name name) 'utf-8)
!          (gnus-group-real-name name))
!        (list 'nndoc (if encodable
!                         (mm-encode-coding-string file 'utf-8)
!                       file)
               (list 'nndoc-address file)
               (list 'nndoc-article-type (or type 'guess))))))
--8<---------------cut here---------------end--------------->8---

^ permalink raw reply [flat|nested] 45+ messages in thread
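The core of the patch is a guarded encode: the group name and the file name in the select method are passed through utf-8 only when that coding system is available. A stand-alone sketch of the same idea follows, assuming the built-in `coding-system-p' and `encode-coding-string' are acceptable stand-ins for the `mm-' wrappers used in the patch. The function name is hypothetical.

```elisp
(defun my-nndoc-encode-name (name)
  "Return NAME encoded as utf-8 when that coding system exists.
Hypothetical helper for illustration; it is not part of Gnus."
  (if (coding-system-p 'utf-8)
      (encode-coding-string name 'utf-8)
    name))

;; Derive a group name from a file path, as `G f' does, then encode it.
(my-nndoc-encode-name (file-name-nondirectory "~/好"))
```

With the name encoded at creation time, the utf-8 decode that Gnus applies when it draws the group line operates on bytes rather than on already-decoded text.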
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-19 9:03 ` Katsumi Yamaoka 2006-10-20 3:39 ` Chong Yidong @ 2006-10-20 6:04 ` Eli Zaretskii 2006-10-20 6:21 ` Katsumi Yamaoka 1 sibling, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2006-10-20 6:04 UTC (permalink / raw) Cc: emacs-pretest-bug, id.brep, ding > Date: Thu, 19 Oct 2006 18:03:55 +0900 > From: Katsumi Yamaoka <yamaoka@jpl.org> > Cc: Zhang Wei <id.brep@gmail.com>, ding@gnus.org > > All group names should be utf-8 encoded for the internal use in Gnus I don't know anything about Gnus, but is this sentence really right? Gnus is part of Emacs, and Emacs normally doesn't use encoded strings internally, it only encodes them when it writes them to a file or sends them to a program. Did you perhaps mean ``all group names should use characters from the mule-unicode-* character set''? That would make sense to me. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-20 6:04 ` Eli Zaretskii @ 2006-10-20 6:21 ` Katsumi Yamaoka 2006-10-20 6:38 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Katsumi Yamaoka @ 2006-10-20 6:21 UTC (permalink / raw) Cc: emacs-pretest-bug, id.brep, ding >>>>> In <ubqo7r08q.fsf@gnu.org> Eli Zaretskii wrote: >> Date: Thu, 19 Oct 2006 18:03:55 +0900 >> From: Katsumi Yamaoka <yamaoka@jpl.org> >> Cc: Zhang Wei <id.brep@gmail.com>, ding@gnus.org >> >> All group names should be utf-8 encoded for the internal use in Gnus > I don't know anything about Gnus, but is this sentence really right? > Gnus is part of Emacs, and Emacs normally doesn't use encoded strings > internally, it only encodes them when it writes them to a file or > sends them to a program. > Did you perhaps mean ``all group names should use characters from the > mule-unicode-* character set''? That would make sense to me. No, Gnus uses `(encode-coding-string "name" 'utf-8)' as a group name internally. IIRC, nntp servers understand utf-8 encoded group names. So, someone might have considered making Gnus use them internally is convenient to communicate with nntp servers. I'm not quite sure it is the best way even if the way was easy to enable Gnus to use non-ASCII group names. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-20 6:21 ` Katsumi Yamaoka @ 2006-10-20 6:38 ` Eli Zaretskii 2006-10-20 8:59 ` Katsumi Yamaoka ` (2 more replies) 0 siblings, 3 replies; 45+ messages in thread From: Eli Zaretskii @ 2006-10-20 6:38 UTC (permalink / raw) Cc: emacs-pretest-bug, id.brep, ding > Date: Fri, 20 Oct 2006 15:21:53 +0900 > From: Katsumi Yamaoka <yamaoka@jpl.org> > Cc: emacs-pretest-bug@gnu.org, id.brep@gmail.com, ding@gnus.org > > IIRC, nntp servers understand utf-8 encoded group names. So, > someone might have considered making Gnus use them internally is > convenient to communicate with nntp servers. I'd say this design decision will certainly cause subtle bugs, such as the one we are discussing in this thread. I suggest to modify the design to not use encoded strings internally. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-20 6:38 ` Eli Zaretskii @ 2006-10-20 8:59 ` Katsumi Yamaoka 2006-10-21 2:03 ` Richard Stallman 2006-10-20 19:19 ` Stefan Monnier 2006-10-21 1:01 ` Kenichi Handa 2 siblings, 1 reply; 45+ messages in thread From: Katsumi Yamaoka @ 2006-10-20 8:59 UTC (permalink / raw) Cc: emacs-pretest-bug, id.brep, ding >>>>> In <u3b9jqyod.fsf@gnu.org> Eli Zaretskii wrote: >> Date: Fri, 20 Oct 2006 15:21:53 +0900 >> From: Katsumi Yamaoka <yamaoka@jpl.org> >> Cc: emacs-pretest-bug@gnu.org, id.brep@gmail.com, ding@gnus.org >> >> IIRC, nntp servers understand utf-8 encoded group names. So, >> someone might have considered making Gnus use them internally is >> convenient to communicate with nntp servers. > I'd say this design decision will certainly cause subtle bugs, such as > the one we are discussing in this thread. I suggest to modify the > design to not use encoded strings internally. I hastened to change the nndoc code so as to use encoded group names but I agree with you. Though to implement it will take efforts and a long time, I think it is a subject to have to be solved in the future anyway. BTW, I realized that I misunderstood Zhang Wei's case. The group name is encoded by gb2312, not utf-8, as Handa-san wrote. It might be the default of the nntp server that Zhang Wei uses, or the news administrator might have done something wrong. If it is utf-8, Gnus should work (in other words, there is currently no way to enable Gnus to handle gb2312 encoded group names). ^ permalink raw reply [flat|nested] 45+ messages in thread
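The group name saved in Zhang Wei's `.newsrc.eld' can be checked against both coding systems. The sketch below copies the octal string quoted earlier in the thread and assumes that the `gb2312' coding system alias is defined in the running Emacs.

```elisp
;; Raw bytes of the group name, copied from the quoted `gnus-newsrc-alist' entry.
(let ((raw "\301\367\320\30799.\261\276\265\330\262\342\312\324"))
  (list (multibyte-string-p raw)             ; nil: the literal is a unibyte string
        (decode-coding-string raw 'gb2312)   ; should decode to Chinese text
        (decode-coding-string raw 'utf-8)))  ; the bytes are not valid utf-8
```

The byte \301 (0xC1) can never start a utf-8 sequence, which is consistent with the conclusion above that the server sent the name in gb2312 and that decoding it as utf-8 cannot recover it.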
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-20 8:59 ` Katsumi Yamaoka @ 2006-10-21 2:03 ` Richard Stallman 2006-10-22 23:28 ` Katsumi Yamaoka 0 siblings, 1 reply; 45+ messages in thread From: Richard Stallman @ 2006-10-21 2:03 UTC (permalink / raw) Cc: emacs-pretest-bug, id.brep, ding > I'd say this design decision will certainly cause subtle bugs, such as > the one we are discussing in this thread. I suggest to modify the > design to not use encoded strings internally. I hastened to change the nndoc code so as to use encoded group names but I agree with you. Though to implement it will take efforts and a long time, I think it is a subject to have to be solved in the future anyway. I don't entirely understand that statement. Are you about to fix this now, or do you think it should be delayed? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-21 2:03 ` Richard Stallman @ 2006-10-22 23:28 ` Katsumi Yamaoka 2006-10-23 11:45 ` Richard Stallman 0 siblings, 1 reply; 45+ messages in thread From: Katsumi Yamaoka @ 2006-10-22 23:28 UTC (permalink / raw) Cc: emacs-pretest-bug, id.brep, ding >>>>> In <E1Gb6Cw-0006wp-6I@fencepost.gnu.org> Richard Stallman wrote: >> I'd say this design decision will certainly cause subtle bugs, such as >> the one we are discussing in this thread. I suggest to modify the >> design to not use encoded strings internally. > I hastened to change the nndoc code so as to use encoded group > names but I agree with you. Though to implement it will take > efforts and a long time, I think it is a subject to have to be > solved in the future anyway. > I don't entirely understand that statement. > Are you about to fix this now, or do you think it should be > delayed? I've already fixed the nndoc code in both the Gnus CVS trunk and the v5-10 branch (it will be merged into the Emacs CVS soon). Although I haven't yet changed the handling of non-ASCII group names (that is, Gnus still represents them in the utf-8 encoded style internally), it won't trouble users. I agree with making Gnus encode non-ASCII group names only when communicating with nntp servers, and I (or someone?) will try it in the future. I think it should be done in the Gnus trunk first, and it will take time for coding, testing, and possibly bug fixing. So, importing it into Emacs will probably be inevitably delayed. At the present time, I don't know whether it is days, weeks or years. Regards, ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-22 23:28 ` Katsumi Yamaoka @ 2006-10-23 11:45 ` Richard Stallman 0 siblings, 0 replies; 45+ messages in thread From: Richard Stallman @ 2006-10-23 11:45 UTC (permalink / raw) Cc: emacs-pretest-bug, id.brep, ding I agree with making Gnus encode non-ASCII group names only when communicating with nntp servers, and I (or someone?) will try it in the future. I think it should be done in the Gnus trunk first, and it will take time for coding, testing, and possibly bug fixing. If the existing code works for the users, I'd prefer that we not install a further redesign before the Emacs 22 release. Thanks. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-20 6:38 ` Eli Zaretskii 2006-10-20 8:59 ` Katsumi Yamaoka @ 2006-10-20 19:19 ` Stefan Monnier 2006-10-20 20:30 ` Eli Zaretskii 2006-10-21 1:01 ` Kenichi Handa 2 siblings, 1 reply; 45+ messages in thread From: Stefan Monnier @ 2006-10-20 19:19 UTC (permalink / raw) Cc: Katsumi Yamaoka, emacs-pretest-bug, id.brep, ding >> IIRC, nntp servers understand utf-8 encoded group names. So, >> someone might have considered making Gnus use them internally is >> convenient to communicate with nntp servers. > I'd say this design decision will certainly cause subtle bugs, such as > the one we are discussing in this thread. I suggest to modify the > design to not use encoded strings internally. It could be, although it would make sense to manipulate group names in "encoded" form, in the sense of "not decoded". I.e. keep the group names obtained from the news server in their raw unibyte form, and only decode for display purposes and only encode when the name comes from another place than the server itself. This way, Gnus should be able to (partly) work with arbitrary encodings rather than mandating utf-8. This may also help with problems linked to utf-8 normalization (or lack thereof). Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-20 19:19 ` Stefan Monnier @ 2006-10-20 20:30 ` Eli Zaretskii 2006-10-20 22:06 ` Stefan Monnier 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2006-10-20 20:30 UTC (permalink / raw) Cc: yamaoka, emacs-pretest-bug, id.brep, ding > Cc: Katsumi Yamaoka <yamaoka@jpl.org>, emacs-pretest-bug@gnu.org, > id.brep@gmail.com, ding@gnus.org > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Fri, 20 Oct 2006 15:19:43 -0400 > > > I'd say this design decision will certainly cause subtle bugs, such as > > the one we are discussing in this thread. I suggest to modify the > > design to not use encoded strings internally. > > It could be, although it would make sense to manipulate group names in > "encoded" form, in the sense of "not decoded". It could ``make sense'', but it's IMO a bad idea, since, as we both know, Emacs is not well suited to handling unibyte strings. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-20 20:30 ` Eli Zaretskii @ 2006-10-20 22:06 ` Stefan Monnier 2006-10-21 9:22 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Stefan Monnier @ 2006-10-20 22:06 UTC (permalink / raw) Cc: yamaoka, emacs-pretest-bug, id.brep, ding >> > I'd say this design decision will certainly cause subtle bugs, such as >> > the one we are discussing in this thread. I suggest to modify the >> > design to not use encoded strings internally. >> >> It could be, although it would make sense to manipulate group names in >> "encoded" form, in the sense of "not decoded". > It could ``make sense'', but it's IMO a bad idea, since, as we both > know, Emacs is not well suited to handling unibyte strings. Huh? Unibyte strings are perfectly well supported as far as I know. You have to be careful to remember which strings are unibyte and which are multibyte, so you don't decode multibyte strings or encode unibyte strings, and especially not implicitly (by inserting a unibyte string in a multibyte buffer or vice versa). So if you mean that it requires discipline, then I agree, but otherwise I don't know what you're referring to. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
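The discipline described in this message amounts to checking which kind of string is in hand before converting it. A minimal sketch with hypothetical helper names, using only `multibyte-string-p', `decode-coding-string' and `encode-coding-string':

```elisp
(defun my-decode-once (string coding)
  "Decode STRING with CODING only if it is still unibyte.
Hypothetical helper; a multibyte STRING is returned unchanged."
  (if (multibyte-string-p string)
      string
    (decode-coding-string string coding)))

(defun my-encode-once (string coding)
  "Encode STRING with CODING only if it is multibyte.
Hypothetical helper; a unibyte STRING is assumed to be encoded already."
  (if (multibyte-string-p string)
      (encode-coding-string string coding)
    string))
```

Pure-ASCII literals are unibyte as well, which is harmless for ASCII-compatible coding systems such as utf-8. These guards cover explicit conversions only; implicit ones, such as inserting a unibyte string into a multibyte buffer, still need the care described above.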
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-20 22:06 ` Stefan Monnier @ 2006-10-21 9:22 ` Eli Zaretskii 2006-10-23 3:55 ` Stefan Monnier 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2006-10-21 9:22 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding > Cc: yamaoka@jpl.org, emacs-pretest-bug@gnu.org, id.brep@gmail.com, > ding@gnus.org > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Fri, 20 Oct 2006 18:06:09 -0400 > > >> It could be, although it would make sense to manipulate group names in > >> "encoded" form, in the sense of "not decoded". > > > It could ``make sense'', but it's IMO a bad idea, since, as we both > > know, Emacs is not well suited to handling unibyte strings. > > Huh? Unibyte strings are perfectly well supported as far as I know. > > You have to be careful to remember which strings are unibyte and which are > multibyte, so you don't decode multibyte strings or encode unibyte strings, > and especially not implicitly (by inserting a unibyte string in a multibyte > buffer or vice versa). So if you mean that it requires discipline, then > I agree, but otherwise I don't know what you're referring to. To me, the second paragraph is precisely the meaning of ``not well suited'' and ``not perfectly supported''. What kind of ``well supported'' is that if I as a programmer need to carry with each string additional information, and make sure I know _exactly_ what primitives are invoked by every function I call, to take care that I don't inadvertently call something that deep inside assumes I passed a multibyte string? That way lies madness. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-21 9:22 ` Eli Zaretskii @ 2006-10-23 3:55 ` Stefan Monnier 2006-10-23 4:16 ` Eli Zaretskii 2006-10-23 11:45 ` Richard Stallman 0 siblings, 2 replies; 45+ messages in thread From: Stefan Monnier @ 2006-10-23 3:55 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding >> >> It could be, although it would make sense to manipulate group names in >> >> "encoded" form, in the sense of "not decoded". >> >> > It could ``make sense'', but it's IMO a bad idea, since, as we both >> > know, Emacs is not well suited to handling unibyte strings. >> >> Huh? Unibyte strings are perfectly well supported as far as I know. >> >> You have to be careful to remember which strings are unibyte and which are >> multibyte, so you don't decode multibyte strings or encode unibyte strings, >> and especially not implicitly (by inserting a unibyte string in a multibyte >> buffer or vice versa). So if you mean that it requires discipline, then >> I agree, but otherwise I don't know what you're referring to. > To me, the second paragraph is precisely the meaning of ``not well > suited'' and ``not perfectly supported''. What kind of ``well > supported'' is that if I as a programmer need to carry with each > string additional information, and make sure I know _exactly_ what > primitives are invoked by every function I call, to take care that I > don't inadvertently call something that deep inside assumes I passed a > multibyte string? > That way lies madness. Agreed, but note that this problem is as much on the unibyte side as it is on the multibyte side, so that seems to imply that you also think that Emacs is not well suited to handling multibyte strings. This said, I agree that Emacs should help more. E.g. by signalling an error when trying to insert multibyte text into a unibyte buffer. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-23 3:55 ` Stefan Monnier @ 2006-10-23 4:16 ` Eli Zaretskii 2006-10-23 19:11 ` Stefan Monnier 2006-10-23 11:45 ` Richard Stallman 1 sibling, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2006-10-23 4:16 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding > Cc: yamaoka@jpl.org, emacs-pretest-bug@gnu.org, id.brep@gmail.com, > ding@gnus.org > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Sun, 22 Oct 2006 23:55:29 -0400 > > Agreed, but note that this problem is as much on the unibyte side as it is > on the multibyte side Not if I never let unibyte strings into my buffers and strings (modulo bugs, of course). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-23 4:16 ` Eli Zaretskii @ 2006-10-23 19:11 ` Stefan Monnier 2006-10-23 20:06 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Stefan Monnier @ 2006-10-23 19:11 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding >> Agreed, but note that this problem is as much on the unibyte side as it is >> on the multibyte side > Not if I never let unibyte strings into my buffers and strings (modulo > bugs, of course). I don't follow. Not that it matters. My point was simply if you stay 100% within multibyte, it all works, and if you stay 100% in unibyte it all works, and it's only when you mix them two that things don't work. So the problem is neither with unibyte nor with multibyte but with their interaction: the problem takes its root in the conflation of the concept of byte and the concept of char. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-23 19:11 ` Stefan Monnier @ 2006-10-23 20:06 ` Eli Zaretskii 2006-10-23 20:49 ` Stefan Monnier 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2006-10-23 20:06 UTC (permalink / raw) Cc: yamaoka, emacs-pretest-bug, id.brep, ding > Cc: yamaoka@jpl.org, emacs-pretest-bug@gnu.org, id.brep@gmail.com, > ding@gnus.org > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Mon, 23 Oct 2006 15:11:09 -0400 > > My point was simply if you stay 100% within multibyte, it all works, and if > you stay 100% in unibyte it all works The former is true, the latter isn't, AFAIK. ``Normal'' Emacs primitives and subroutines always do TRT with multibyte strings, while with unibyte you need to be careful which ones you call. That was my point, and the case that started this thread is my evidence. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-23 20:06 ` Eli Zaretskii @ 2006-10-23 20:49 ` Stefan Monnier 2006-10-24 4:17 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Stefan Monnier @ 2006-10-23 20:49 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding >> My point was simply if you stay 100% within multibyte, it all works, and if >> you stay 100% in unibyte it all works > The former is true, the latter isn't, AFAIK. ``Normal'' Emacs > primitives and subroutines always do TRT with multibyte strings, while > with unibyte you need to be careful which ones you call. Care to give an example of what you're thinking about, where purely unibyte strings and buffers are not properly handled? After all, such cases are probably bugs. > That was my point, and the case that started this thread is my evidence. I must have misunderstood because from what I read in this thread I thought the problem was due to the fact that one part of the code is using unibyte strings (for group names) and it's apparently messed up somewhere because it gets mixed with multibyte data. Sorry I misunderstood and went on with a rant. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-23 20:49 ` Stefan Monnier @ 2006-10-24 4:17 ` Eli Zaretskii 2006-10-24 15:22 ` Stefan Monnier 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2006-10-24 4:17 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding > Cc: yamaoka@jpl.org, emacs-pretest-bug@gnu.org, id.brep@gmail.com, > ding@gnus.org > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Mon, 23 Oct 2006 16:49:59 -0400 > > >> My point was simply if you stay 100% within multibyte, it all works, and if > >> you stay 100% in unibyte it all works > > > The former is true, the latter isn't, AFAIK. ``Normal'' Emacs > > primitives and subroutines always do TRT with multibyte strings, while > > with unibyte you need to be careful which ones you call. > > Care to give an example of what you're thinking about, where purely unibyte > strings and buffers are not properly handled? Are you talking about a unibyte Emacs session? If so, that's not what I had in mind. I'm talking about using unibyte strings in a multibyte session. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-24 4:17 ` Eli Zaretskii @ 2006-10-24 15:22 ` Stefan Monnier 2006-10-24 17:27 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Stefan Monnier @ 2006-10-24 15:22 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding >> >> My point was simply if you stay 100% within multibyte, it all works, >> >> and if you stay 100% in unibyte it all works >> >> > The former is true, the latter isn't, AFAIK. ``Normal'' Emacs >> > primitives and subroutines always do TRT with multibyte strings, while >> > with unibyte you need to be careful which ones you call. >> >> Care to give an example of what you're thinking about, where purely unibyte >> strings and buffers are not properly handled? > Are you talking about a unibyte Emacs session? If so, that's not what > I had in mind. I'm talking about using unibyte strings in a multibyte > session. I'm not quite sure what is a "unibyte session", but I think "stay 100% in unibyte" is fairly clear: only use unibyte buffers and strings in the relevant code (while other unrelated buffers and strings may be multibyte). So I think we're thinking about the same situation. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-24 15:22 ` Stefan Monnier @ 2006-10-24 17:27 ` Eli Zaretskii 2006-10-24 18:03 ` Stefan Monnier 2006-10-25 18:02 ` Richard Stallman 0 siblings, 2 replies; 45+ messages in thread From: Eli Zaretskii @ 2006-10-24 17:27 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding > Cc: yamaoka@jpl.org, emacs-pretest-bug@gnu.org, id.brep@gmail.com, > ding@gnus.org > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Tue, 24 Oct 2006 11:22:51 -0400 > > I'm not quite sure what is a "unibyte session" A.k.a. "emacs --unibyte". > but I think "stay 100% in > unibyte" is fairly clear: only use unibyte buffers and strings in the > relevant code (while other unrelated buffers and strings may be multibyte). I think it's practically impossible to use only unibyte buffers for any serious work, and therefore I don't consider this a feasible solution. If one uses the default multibyte session, using unibyte strings is prone to subtle problems as described in this thread. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-24 17:27 ` Eli Zaretskii @ 2006-10-24 18:03 ` Stefan Monnier 2006-10-25 18:02 ` Richard Stallman 1 sibling, 0 replies; 45+ messages in thread From: Stefan Monnier @ 2006-10-24 18:03 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding >> I'm not quite sure what is a "unibyte session" > A.k.a. "emacs --unibyte". I know that, but I'm not quite sure what it entails. This discussion is within the scope of code such as Gnus's, i.e. code which should work either way. >> but I think "stay 100% in >> unibyte" is fairly clear: only use unibyte buffers and strings in the >> relevant code (while other unrelated buffers and strings may be multibyte). > I think it's practically impossible to use only unibyte buffers for > any serious work, and therefore I don't consider this a feasible > solution. The operative term there is "in the relevant code". E.g. Gnus could easily (as opposed to "practically impossible") use unibyte for all its buffers and strings. It's also very common (and often necessary) to use unibyte buffers and strings to interact with underlying processes or network connections. Typically because the data passed back&forth may use mixes of various encodings. > If one uses the default multibyte session, using unibyte strings is > prone to subtle problems as described in this thread. But those problems are not specific to unibyte, but to the mix of unibyte and multibyte. In most packages such as Gnus it's just as hard/impossible to use only multibyte as it is to use only unibyte. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-24 17:27 ` Eli Zaretskii 2006-10-24 18:03 ` Stefan Monnier @ 2006-10-25 18:02 ` Richard Stallman 2006-10-25 20:22 ` Eli Zaretskii 1 sibling, 1 reply; 45+ messages in thread From: Richard Stallman @ 2006-10-25 18:02 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding If one uses the default multibyte session, using unibyte strings is prone to subtle problems as described in this thread. I was not following the thread. Could you explain the problem that was encountered? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-25 18:02 ` Richard Stallman @ 2006-10-25 20:22 ` Eli Zaretskii 2006-10-26 8:52 ` Richard Stallman 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2006-10-25 20:22 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding > From: Richard Stallman <rms@gnu.org> > CC: monnier@iro.umontreal.ca, emacs-pretest-bug@gnu.org, > yamaoka@jpl.org, id.brep@gmail.com, ding@gnus.org > Date: Wed, 25 Oct 2006 14:02:00 -0400 > > If one uses the default multibyte session, using unibyte strings is > prone to subtle problems as described in this thread. > > I was not following the thread. Could you explain the problem > that was encountered? Gnus stored a name of a news group in encoded form. Manipulating that encoded name as a normal Emacs string caused some weird problem (I no longer remember the details, but I'm hardly surprised that it happened). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-25 20:22 ` Eli Zaretskii @ 2006-10-26 8:52 ` Richard Stallman 2006-10-27 8:05 ` Eli Zaretskii 0 siblings, 1 reply; 45+ messages in thread From: Richard Stallman @ 2006-10-26 8:52 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding Gnus stored a name of a news group in encoded form. There is a big difference between unibyte strings and encoded unibyte strings. The latter indeed requires a lot of special care. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-26 8:52 ` Richard Stallman @ 2006-10-27 8:05 ` Eli Zaretskii 2006-10-27 13:33 ` Richard Stallman 0 siblings, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2006-10-27 8:05 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding > From: Richard Stallman <rms@gnu.org> > CC: monnier@iro.umontreal.ca, emacs-pretest-bug@gnu.org, > yamaoka@jpl.org, id.brep@gmail.com, ding@gnus.org > Date: Thu, 26 Oct 2006 04:52:56 -0400 > > There is a big difference between unibyte strings and encoded unibyte > strings. What is that difference? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-27 8:05 ` Eli Zaretskii @ 2006-10-27 13:33 ` Richard Stallman 2006-10-27 14:27 ` Stefan Monnier 2006-10-28 10:28 ` Eli Zaretskii 0 siblings, 2 replies; 45+ messages in thread From: Richard Stallman @ 2006-10-27 13:33 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding > There is a big difference between unibyte strings and encoded unibyte > strings. What is that difference? You can represent one of Emacs' supported Latin alphabets in (unencoded) unibyte strings, and Emacs will automatically convert to and from multibyte. However, if you store encoded text in unibyte strings, you are responsible for decoding and encoding when necessary. You have to keep track, everywhere, of whether the data is encoded or not. We implemented the ability to do encoding manually because sometimes it is necessary to decode parts of a file in different ways (e.g., mailboxes). ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-27 13:33 ` Richard Stallman @ 2006-10-27 14:27 ` Stefan Monnier 2006-10-28 18:13 ` Richard Stallman 2006-10-28 10:28 ` Eli Zaretskii 1 sibling, 1 reply; 45+ messages in thread From: Stefan Monnier @ 2006-10-27 14:27 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding > You can represent one of Emacs' supported Latin alphabets in > (unencoded) unibyte strings, and Emacs will automatically convert to > and from multibyte. And this use was very convenient for Emacs-20 where we wanted to keep some backward compatibility with code that was not MULE-aware. But nowadays any code which relies on this is simply broken, AFAIC, because it'll only work in environments using an iso-8859 encoding (more or less) and will thus be unusable in Asian environments or in utf-8 (which is very quickly taking over the iso-8859 world). > However, if you store encoded text in unibyte strings, you are > responsible for decoding and encoding when necessary. You have to > keep track, everywhere, of whether the data is encoded or not. It's pretty easy to keep track of it: unibyte == encoded, multibyte == decoded. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
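Editorial sketch (not part of the original message): the convention stated above, unibyte == encoded and multibyte == decoded, can be checked with standard Emacs Lisp primitives. The following is a minimal illustration, assuming a recent Emacs and a UTF-8 encoded group name. The helper `my-decode-group-name' is hypothetical and is not a Gnus function.

```elisp
;; Decoded text is multibyte; its UTF-8 encoding is a unibyte string of bytes.
(let* ((name "好")
       (encoded (encode-coding-string name 'utf-8))
       (decoded (decode-coding-string encoded 'utf-8)))
  (list (multibyte-string-p name)      ; t: decoded text
        (multibyte-string-p encoded)   ; nil: encoded bytes
        (string= name decoded)))       ; t: the round trip is lossless
;; => (t nil t)

;; Hypothetical helper (an assumption for illustration, not Gnus code):
;; decode only what is still encoded, so text is never decoded twice.
(defun my-decode-group-name (name)
  "Return NAME decoded from UTF-8, unless it is already multibyte text."
  (if (multibyte-string-p name)
      name
    (decode-coding-string name 'utf-8)))
```

The guard only helps if every part of a program follows the same convention. Emacs itself does not enforce it, and a multibyte string can still hold undecoded raw bytes.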
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-27 14:27 ` Stefan Monnier @ 2006-10-28 18:13 ` Richard Stallman 0 siblings, 0 replies; 45+ messages in thread From: Richard Stallman @ 2006-10-28 18:13 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding > However, if you store encoded text in unibyte strings, you are > responsible for decoding and encoding when necessary. You have to > keep track, everywhere, of whether the data is encoded or not. It's pretty easy to keep track of it: unibyte == encoded, multibyte == decoded. What you're proposing is a convention which a certain program could use internally. It might be a workable convention for some purposes. But it is not automatic, and not required by Emacs. > You can represent one of Emacs' supported Latin alphabets in > (unencoded) unibyte strings, and Emacs will automatically convert to > and from multibyte. And this use was very convenient for Emacs-20 where we wanted to keep some backward compatibility with code that was not MULE-aware. But nowadays any code which relies on this is simply broken, AFAIC, because it'll only work in environments using an iso-8859 encoding (more or less) I think you're mistaken. The conversion between unibyte and multibyte involves internal Emacs characters. It concerns character sets, not coding systems. However, it is true that the use of unibyte strings is only applicable to alphabets such as could be represented in unibyte strings. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-27 13:33 ` Richard Stallman 2006-10-27 14:27 ` Stefan Monnier @ 2006-10-28 10:28 ` Eli Zaretskii 2006-10-29 18:45 ` Richard Stallman 1 sibling, 1 reply; 45+ messages in thread From: Eli Zaretskii @ 2006-10-28 10:28 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding > From: Richard Stallman <rms@gnu.org> > CC: monnier@iro.umontreal.ca, emacs-pretest-bug@gnu.org, > yamaoka@jpl.org, id.brep@gmail.com, ding@gnus.org > Date: Fri, 27 Oct 2006 09:33:35 -0400 > > > There is a big difference between unibyte strings and encoded unibyte > > strings. > > What is that difference? > > You can represent one of Emacs' supported Latin alphabets in > (unencoded) unibyte strings, and Emacs will automatically convert to > and from multibyte. AFAIK, Latin-N unibyte strings and iso-8859-N text encoded in Latin-N use the same numerical codes for the same characters, so they are indistinguishable. Handa-san, am I right? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-28 10:28 ` Eli Zaretskii @ 2006-10-29 18:45 ` Richard Stallman 0 siblings, 0 replies; 45+ messages in thread From: Richard Stallman @ 2006-10-29 18:45 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding, handa > You can represent one of Emacs' supported Latin alphabets in > (unencoded) unibyte strings, and Emacs will automatically convert to > and from multibyte. AFAIK, Latin-N unibyte strings and iso-8859-N text encoded in Latin-N use the same numerical codes for the same characters, so they are indistinguishable. I think that is true, but if that's what you're doing, you'll understand it better if you think "unibyte representations of these Emacs characters" rather than "encoded in a coding system". ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-23 3:55 ` Stefan Monnier 2006-10-23 4:16 ` Eli Zaretskii @ 2006-10-23 11:45 ` Richard Stallman 2006-10-23 19:16 ` Stefan Monnier 1 sibling, 1 reply; 45+ messages in thread From: Richard Stallman @ 2006-10-23 11:45 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding This said, I agree that Emacs should help more. E.g. by signalling an error when trying to insert multibyte text into a unibyte buffer. This operation converts the string to unibyte. It works correctly, provided the characters in that string can be expressed in the unibyte buffer. If people generally agree it would be better to signal an error, we could do that. However, that would cause trouble trying to use M-y to move past multibyte entries in the kill ring to reach the unibyte entry you really want. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-23 11:45 ` Richard Stallman @ 2006-10-23 19:16 ` Stefan Monnier 2006-10-24 17:43 ` Richard Stallman 0 siblings, 1 reply; 45+ messages in thread From: Stefan Monnier @ 2006-10-23 19:16 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding > This said, I agree that Emacs should help more. E.g. by signalling an > error when trying to insert multibyte text into a unibyte buffer. > This operation converts the string to unibyte. Indeed. Using a default (and poorly specified) encoding method. > It works correctly, provided the characters in that string can be > expressed in the unibyte buffer. But which characters can be expressed is poorly specified. E.g. Tell me which chars can be expressed in a unibyte buffer in a BIG5 locale? > If people generally agree it would be better to signal an error, > we could do that. However, that would cause trouble trying to use > M-y to move past multibyte entries in the kill ring to reach the > unibyte entry you really want. When the insertion is a user-level operation, the elisp code should make sure to manually do the encoding/decoding, using e.g. the default file coding-system. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-23 19:16 ` Stefan Monnier @ 2006-10-24 17:43 ` Richard Stallman 2006-10-24 18:14 ` Stefan Monnier 0 siblings, 1 reply; 45+ messages in thread From: Richard Stallman @ 2006-10-24 17:43 UTC (permalink / raw) Cc: eliz, emacs-pretest-bug, yamaoka, id.brep, ding > It works correctly, provided the characters in that string can be > expressed in the unibyte buffer. But which characters can be expressed is poorly specified. E.g. Tell me which chars can be expressed in a unibyte buffer in a BIG5 locale? Mentioning the locale is somewhat of a red herring, since what controls this conversion is (effectively) nonascii-insert-offset. Mentioning BIG5 is a second red herring. You can't represent Chinese in 8-bit characters, but that is not Emacs' fault. Do you think that we need to document nonascii-insert-offset more prominently? If so, where else should we talk about it? > If people generally agree it would be better to signal an error, > we could do that. However, that would cause trouble trying to use > M-y to move past multibyte entries in the kill ring to reach the > unibyte entry you really want. When the insertion is a user-level operation, the elisp code should make sure to manually do the encoding/decoding, using e.g. the default file coding-system. I don't understand -- could you be more specific? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-24 17:43 ` Richard Stallman @ 2006-10-24 18:14 ` Stefan Monnier 2006-10-25 18:03 ` Richard Stallman 2006-10-25 18:03 ` Richard Stallman 0 siblings, 2 replies; 45+ messages in thread From: Stefan Monnier @ 2006-10-24 18:14 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding >> It works correctly, provided the characters in that string can be >> expressed in the unibyte buffer. > But which characters can be expressed is poorly specified. E.g. Tell me > which chars can be expressed in a unibyte buffer in a BIG5 locale? > Mentioning the locale is somewhat of a red herring, since what controls > this conversion is (effectively) nonascii-insert-offset. The nonascii-insert-offset and nonascii-translation-table are AFAIK initialized differently depending on the locale (and/or language environment) and users typically don't fiddle with that table directly but via their locale setting instead. > Mentioning BIG5 is a second red herring. You can't represent Chinese > in 8-bit characters, but that is not Emacs' fault. Code which implicitly converts text from multibyte to unibyte (and vice versa), using nonascii-*, will presumably be used in all kinds of locales, including BIG5 ones. So knowing what happens in this case is still relevant. > Do you think that we need to document nonascii-insert-offset more > prominently? If so, where else should we talk about it? No, I think we should kill it instead and declare in error any code which tries to use it. It made sense in Emacs-20 when the multibyte support was weaker, but nowadays it just encourages sloppy code which breaks down in different language environments. >> If people generally agree it would be better to signal an error, >> we could do that. However, that would cause trouble trying to use >> M-y to move past multibyte entries in the kill ring to reach the >> unibyte entry you really want. 
> When the insertion is a user-level operation, the elisp code should make > sure to manually do the encoding/decoding, using e.g. the default file > coding-system. > I don't understand -- could you be more specific? C-y/M-y uses `insert' somewhere internally. My suggestion is to make `insert' signal an error when faced with the need to insert a multibyte string in a unibyte buffer. This doesn't mean that C-y/M-y should propagate this error. Stefan ^ permalink raw reply [flat|nested] 45+ messages in thread
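Editorial sketch (not part of the original message): one way to read the suggestion above is that code inserting into a unibyte buffer should encode explicitly instead of relying on the implicit conversion done by `insert'. The helper below is hypothetical, as are its name and its default of utf-8; it is an illustration under those assumptions, not the change proposed for Emacs itself.

```elisp
;; Hypothetical helper (not part of Emacs or Gnus): make the
;; multibyte -> unibyte conversion explicit at the insertion site.
(defun my-insert-explicitly (string &optional coding-system)
  "Insert STRING at point, encoding it first when the buffer is unibyte.
CODING-SYSTEM defaults to utf-8, which is an assumption of this sketch."
  (if (and (multibyte-string-p string)
           (not enable-multibyte-characters))
      ;; Unibyte buffer: insert the encoded bytes, chosen deliberately.
      (insert (encode-coding-string string (or coding-system 'utf-8)))
    ;; Multibyte buffer, or STRING is already unibyte: insert as is.
    (insert string)))

;; Usage: "好" is three bytes in UTF-8, so the unibyte buffer
;; should end up holding three bytes.
(with-temp-buffer
  (set-buffer-multibyte nil)
  (my-insert-explicitly "好" 'utf-8)
  (buffer-size))
```

A user-level command such as C-y could call a wrapper of this kind, so that a stricter `insert' would never need to surface its error to the user.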
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-24 18:14 ` Stefan Monnier @ 2006-10-25 18:03 ` Richard Stallman 2006-10-25 18:03 ` Richard Stallman 1 sibling, 0 replies; 45+ messages in thread From: Richard Stallman @ 2006-10-25 18:03 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding Code which implicitly converts text from multibyte to unibyte (and vice versa), using nonascii-*, will presumably be used in all kinds of locales, including BIG5 ones. So knowing what happens in this case is still relevant. It is not hard to know what happens--that is documented in the Lisp Manual. (Do you think any of it is not clear?) Meanwhile, I think that the presumption of the above text is incorrect. Unibyte text can only handle certain European alphabets. If you use unibyte text, you should make sure to use it only for them. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-24 18:14 ` Stefan Monnier 2006-10-25 18:03 ` Richard Stallman @ 2006-10-25 18:03 ` Richard Stallman 2006-10-27 2:48 ` Kenichi Handa 1 sibling, 1 reply; 45+ messages in thread From: Richard Stallman @ 2006-10-25 18:03 UTC (permalink / raw) Cc: id.brep, emacs-pretest-bug, yamaoka, ding C-y/M-y uses `insert' somewhere internally. My suggestion is to make `insert' signal an error when faced with the need to insert a multibyte string in a unibyte buffer. This doesn't mean that C-y/M-y should propagate this error. That might work. We could try it, after the release. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-25 18:03 ` Richard Stallman @ 2006-10-27 2:48 ` Kenichi Handa 0 siblings, 0 replies; 45+ messages in thread From: Kenichi Handa @ 2006-10-27 2:48 UTC (permalink / raw) Cc: emacs-pretest-bug, id.brep, ding, yamaoka In article <E1Gcn5p-0006bW-Uy@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes: > C-y/M-y uses `insert' somewhere internally. My suggestion is to make > `insert' signal an error when faced with the need to insert a multibyte > string in a unibyte buffer. This doesn't mean that C-y/M-y should propagate > this error. > That might work. We could try it, after the release. Stefan, how about starting to try it in emacs-unicode-2 now? I generally agree with your view about the unibyte<->multibyte problem. You also proposed to change the current automatic unibyte->multibyte conversion from the string-make-multibyte method to the string-to-multibyte method a while ago, didn't you? I think that change is good too. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 45+ messages in thread
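Editorial sketch (not part of the original message): the string-to-multibyte method mentioned above changes only the representation of a unibyte string and keeps each byte as a raw 8-bit character, without interpreting it in any character set. A minimal illustration, assuming a recent Emacs; the sample text is only an example.

```elisp
(let* ((bytes (encode-coding-string "好" 'utf-8))   ; 3 raw bytes, unibyte
       (wrapped (string-to-multibyte bytes)))       ; multibyte, still 3 raw bytes
  (list (multibyte-string-p wrapped)                 ; representation changed
        (length wrapped)                             ; bytes kept one-to-one
        (string= (string-to-unibyte wrapped) bytes)  ; conversion is reversible
        (string= wrapped "好")))                     ; but nothing was decoded
;; => (t 3 t nil)
```

Because no character-set guess is involved, the result does not depend on the locale or language environment, which is the property the thread is after.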
* Re: `.newsrc.eld' saves chinese group name in wrong coding 2006-10-20 6:38 ` Eli Zaretskii 2006-10-20 8:59 ` Katsumi Yamaoka 2006-10-20 19:19 ` Stefan Monnier @ 2006-10-21 1:01 ` Kenichi Handa 2 siblings, 0 replies; 45+ messages in thread From: Kenichi Handa @ 2006-10-21 1:01 UTC (permalink / raw) Cc: emacs-pretest-bug, yamaoka, id.brep, ding In article <u3b9jqyod.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > Date: Fri, 20 Oct 2006 15:21:53 +0900 > > From: Katsumi Yamaoka <yamaoka@jpl.org> > > Cc: emacs-pretest-bug@gnu.org, id.brep@gmail.com, ding@gnus.org > > > > IIRC, nntp servers understand utf-8 encoded group names. So, > > someone might have considered making Gnus use them internally is > > convenient to communicate with nntp servers. > I'd say this design decision will certainly cause subtle bugs, such as > the one we are discussing in this thread. I suggest to modify the > design to not use encoded strings internally. I agree. Keeping around encoded strings quite easily leads to bugs. String/buffer should be encoded only just before writing out. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 45+ messages in thread