* detecting encoding for Japanese @ 2002-08-29 5:48 Hal Snyder 2002-08-29 10:00 ` Kai Großjohann 2002-08-29 12:04 ` Simon Josefsson 0 siblings, 2 replies; 22+ messages in thread From: Hal Snyder @ 2002-08-29 5:48 UTC (permalink / raw) I'm using gnus to read a Japanese mailing list in which messages arrive in various encodings. Messages encoded with iso-2022-jp are displayed properly. Messages in euc-jp or sjis are displayed as backslashed octal. If I save one of these euc-jp or sjis messages with "O f" and visit the file, the encoding is properly recognized. Setting any of the C-x RET encoding functions has no effect on messages displayed by gnus. Variable gnus-group-charset-alist seems only to allow a single character set per matched group. This is with GNU Emacs 21.2.1, Oort gnus from cvs 2002-08-29, nnimap backend. Any ideas on how to enable encoding recognition when a message is pulled in from the (nnimap) backend? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-29 5:48 detecting encoding for Japanese Hal Snyder @ 2002-08-29 10:00 ` Kai Großjohann 2002-08-29 12:04 ` Simon Josefsson 1 sibling, 0 replies; 22+ messages in thread From: Kai Großjohann @ 2002-08-29 10:00 UTC (permalink / raw) Cc: Gnus List Hal Snyder <hal@vailsys.com> writes: > I'm using gnus to read a Japanese mailing list in which messages > arrive in various encodings. Messages encoded with iso-2022-jp are > displayed properly. Messages in euc-jp or sjis are displayed as > backslashed octal. Until you find a real solution, you can use `1 g' to specify the charset for viewing the current article. (I think this is available in Oort only.) kai -- A large number of young women don't trust men with beards. (BFBS Radio) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-29 5:48 detecting encoding for Japanese Hal Snyder 2002-08-29 10:00 ` Kai Großjohann @ 2002-08-29 12:04 ` Simon Josefsson 2002-08-29 14:08 ` Hal Snyder 1 sibling, 1 reply; 22+ messages in thread From: Simon Josefsson @ 2002-08-29 12:04 UTC (permalink / raw) Cc: Gnus List Hal Snyder <hal@vailsys.com> writes: > I'm using gnus to read a Japanese mailing list in which messages > arrive in various encodings. Messages encoded with iso-2022-jp are > displayed properly. Messages in euc-jp or sjis are displayed as > backslashed octal. I can't reproduce this. Sending the following text (euc-jp) works: JIS -- 元気 開発 Could you make a M-x gnus-bug so I can see what configuration you have? > If I save one of these euc-jp or sjis messages with "O f" and visit > the file, the encoding is properly recognized. Setting any of the C-x > RET encoding functions has no effect on messages displayed by gnus. > Variable gnus-group-charset-alist seems only to allow a single > character set per matched group. g-g-c-a is only the default charset, used when the message doesn't contain MIME tags. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-29 12:04 ` Simon Josefsson @ 2002-08-29 14:08 ` Hal Snyder 2002-08-29 16:01 ` Simon Josefsson 0 siblings, 1 reply; 22+ messages in thread From: Hal Snyder @ 2002-08-29 14:08 UTC (permalink / raw) Simon Josefsson <jas@extundo.com> writes: >> I'm using gnus to read a Japanese mailing list in which messages >> arrive in various encodings. Messages encoded with iso-2022-jp are >> displayed properly. Messages in euc-jp or sjis are displayed as >> backslashed octal. > > I can't reproduce this. Sending the following text (euc-jp) works: > > JIS -- 元気 開発 > > Could you make a M-x gnus-bug so I can see what configuration you > have? I'm not at that computer now, but on a somewhat older configuration, gnus displays your euc text properly. However, your message has the header Content-Type: text/plain; charset=euc-jp The messages I'm dealing with are not so well-behaved. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-29 14:08 ` Hal Snyder @ 2002-08-29 16:01 ` Simon Josefsson 2002-08-29 16:24 ` Hal Snyder 0 siblings, 1 reply; 22+ messages in thread From: Simon Josefsson @ 2002-08-29 16:01 UTC (permalink / raw) Cc: Gnus List Hal Snyder <hal@vailsys.com> writes: > However, your message has the header > > Content-Type: text/plain; charset=euc-jp > > The messages I'm dealing with are not so well-behaved. Aha, then you can use g-g-c-a (gnus-group-charset-alist) to set the default for each group. If more than one untagged encoding is used within a single group, you must use 1 g or the menubar to input the desired encoding manually. Or were you asking for a new feature where Gnus would use Emacs' builtin AI to guess the encoding of untagged data? That could perhaps work, but I don't know how to implement it. ^ permalink raw reply [flat|nested] 22+ messages in thread
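A per-group default of the kind Simon describes could look roughly like the sketch below. The group-name regexp is made up for illustration, and the snippet assumes that each entry of `gnus-group-charset-alist' has the form (REGEXP CHARSET) and that the variable is defined in gnus-sum.el. It only sets the fallback charset for articles that carry no MIME charset tag, so it does not help when several untagged encodings are mixed in one group.

```elisp
;; Sketch only: "japanese-list" is a hypothetical group-name regexp.
;; Assumes entries of `gnus-group-charset-alist' are (REGEXP CHARSET)
;; and that the variable lives in gnus-sum.el.
(eval-after-load "gnus-sum"
  '(add-to-list 'gnus-group-charset-alist
                '("japanese-list" euc-jp)))
```

Because `add-to-list' puts the new entry at the front, it should be consulted before the stock entries for any group whose name matches the regexp.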
* Re: detecting encoding for Japanese 2002-08-29 16:01 ` Simon Josefsson @ 2002-08-29 16:24 ` Hal Snyder 2002-08-29 16:59 ` Kai Großjohann 2002-08-30 10:43 ` Simon Josefsson 0 siblings, 2 replies; 22+ messages in thread From: Hal Snyder @ 2002-08-29 16:24 UTC (permalink / raw) Simon Josefsson <jas@extundo.com> writes: >> Content-Type: text/plain; charset=euc-jp >> >> The messages I'm dealing with are not so well-behaved. > > Aha, then you can use g-g-c-a to set the default for each group. > > If more than one untagged encodings is used within a single group, > you must use 1 g or the menubar to input the desired encoding > manually. Yes, multiple untagged encodings appear (sometimes UTF-8, but I guess I have to wait for Emacs 22 for that). All sorts of users show up on the list. > Or where you asking for a new feature where Gnus used Emacs' builtin > AI to guess what encoding untagged data? That could perhaps work, > but I don't know how to implement it. Probably that is indeed what I was asking for. GNU Emacs indeed seems to know how to guess encoding when a file is visited. The 1g command suggested by you (and Kai, thanks) helps, but presumes I know the encoding before switching. As time permits, I'll have a look at the "builtin AI". Thank you. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-29 16:24 ` Hal Snyder @ 2002-08-29 16:59 ` Kai Großjohann 2002-08-30 0:05 ` Katsumi Yamaoka 2002-08-30 10:43 ` Simon Josefsson 1 sibling, 1 reply; 22+ messages in thread From: Kai Großjohann @ 2002-08-29 16:59 UTC (permalink / raw) Cc: Gnus List Hal Snyder <hal@vailsys.com> writes: > Yes, multiple untagged encodings appear (sometimes UTF-8, but I guess > I have to wait for Emacs 22 for that). All sorts of users show up on > the list. As Dave likes to say, the internal encoding and the ability to edit Unicode are orthogonal to each other. Maybe the Unicode support is lacking in the area of CJK, then it might help to install Mule-UCS. kai -- A large number of young women don't trust men with beards. (BFBS Radio) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-29 16:59 ` Kai Großjohann @ 2002-08-30 0:05 ` Katsumi Yamaoka 2002-08-30 12:23 ` Kai Großjohann 0 siblings, 1 reply; 22+ messages in thread From: Katsumi Yamaoka @ 2002-08-30 0:05 UTC (permalink / raw)

Well, Gnus sometimes encodes Japanese messages with the headers

    Content-Type: text/plain; charset=euc-jp
    Content-Transfer-Encoding: base64

or

    Content-Type: text/plain; charset=euc-jisx0213
    Content-Transfer-Encoding: base64

if a user runs Gnus under Emacs 21.x. The latter case occurs if the user employs the jisx0213 module, which is included in the Mule-UCS package. Technically there is no problem, and almost all MUAs, including Gnus, can decode and show such messages correctly, but those encodings are not so common in Japan. So I recommend that Japanese Gnus users customize the option `mm-coding-system-priorities' to prefer popular Japanese charsets, as follows:

    (setq mm-coding-system-priorities
          '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8))

    ;; Visit http://www.jpl.org/elips/Gnus-Tips-ja.html for more
    ;; Japanese Gnus tips (which are written in Japanese, sorry).

By the way, many sendmail MTAs tend to convert base64 or qp message bodies to 8bit on their own. Here's an example:

    Content-Type: text/plain; charset=euc-jisx0213
    Content-Transfer-Encoding: 8bit
    X-MIME-Autoconverted: from quoted-printable to 8bit by mail.foo.com...

I am worried about whether this has any bad effect. Hal, is there anything like that in your mail files?

-- Katsumi Yamaoka <yamaoka@jpl.org> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-30 0:05 ` Katsumi Yamaoka @ 2002-08-30 12:23 ` Kai Großjohann 2002-09-02 12:16 ` Katsumi Yamaoka 0 siblings, 1 reply; 22+ messages in thread From: Kai Großjohann @ 2002-08-30 12:23 UTC (permalink / raw) Cc: ding Katsumi Yamaoka <yamaoka@jpl.org> writes: > So, I recommend Japanese Gnus users customize the option > `mm-coding-system-priorities' to have popular Japanese charsets > as follows: > > (setq mm-coding-system-priorities > '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8)) Why are euc-jp and euc-jisx0213 missing from this list? (Maybe they are different names for an encoding already in the above list?) Also, if the above change is good for Japanese users, why isn't it good for everybody else, too? Do you think that there are non-Japanese out there who prefer the current behavior? Maybe it is a good idea to put your change in Gnus. (Hm. I wonder what happens for Chinese. I think that nothing much will happen, but in case that a Chinese sends their messages in a Japanese encoding, they would be surprised :-) I think that Mule cannot unify Chinese and Japanese characters, so this can never happen. But I'm not so sure about Mule, so I'd better ask.) Another thought: Mule itself also has a priority list of encodings. So I wonder why does Gnus need another priority list? Normally, I'd guess that Japanese would normally configure their Emacs for the right priorities, and then Gnus should do the right thing automatically. There could be two reasons why this is not happening: (1) Japanese use a different encoding in email than in editing files, or (2) the priorities that Emacs sets up normally do not propagate properly to Gnus, or (3) Emacs does not set itself up for the right priorities at all when users setup a Japanese language environment. Yes, I can't count... kai (no email access over the weekend) -- A large number of young women don't trust men with beards. 
(BFBS Radio) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-30 12:23 ` Kai Großjohann @ 2002-09-02 12:16 ` Katsumi Yamaoka 2002-09-02 17:31 ` Kai Großjohann 0 siblings, 1 reply; 22+ messages in thread From: Katsumi Yamaoka @ 2002-09-02 12:16 UTC (permalink / raw) Cc: ding

>>>>> In <vaf1y8g5yck.fsf@INBOX.auto.gnus.tok.lucy.cs.uni-dortmund.de>
>>>>> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) wrote:

>> So, I recommend Japanese Gnus users customize the option
>> `mm-coding-system-priorities' to have popular Japanese charsets
>> as follows:
>>
>> (setq mm-coding-system-priorities
>>       '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8))

Kai> Why are euc-jp and euc-jisx0213 missing from this list? (Maybe
Kai> they are different names for an encoding already in the above
Kai> list?)

Right, euc-jp is not in the list. I should say up front that I'm not an expert on character sets.

Before MIME became widespread, iso-2022-jp and similar 7-bit codes were used in Japan without any designator. That is probably why iso-2022-jp is still generally used today, and very old MUAs cannot read any code other than iso-2022-jp. iso-2022-jp-2 covers extra Japanese characters, including some Korean symbols such as:

♤♡♧♨☎

shift-jis is added for katakana-jisx0201 characters (normally called hankaku-katakana), though it is rarely used. And utf-8 is added for characters not covered by these.

Kai> Also, if the above change is good for Japanese users, why isn't
Kai> it good for everybody else, too? Do you think that there are
Kai> non-Japanese out there who prefer the current behavior?

Kai> Maybe it is a good idea to put your change in Gnus.

With this setting I can certainly write Japanese and (doubtful) English. However, I do not know other languages (including Chinese and Korean), so I cannot judge whether it is appropriate for them.

[...]

Kai> Another thought: Mule itself also has a priority list of
Kai> encodings. So I wonder why does Gnus need another priority list?
Kai> Normally, I'd guess that Japanese would normally configure their
Kai> Emacs for the right priorities, and then Gnus should do the right
Kai> thing automatically. There could be two reasons why this is not
Kai> happening: (1) Japanese use a different encoding in email than in
Kai> editing files, or (2) the priorities that Emacs sets up normally
Kai> do not propagate properly to Gnus, or (3) Emacs does not set
Kai> itself up for the right priorities at all when users setup a
Kai> Japanese language environment. Yes, I can't count...

That's a good point. (1) is the main reason. Though iso-2022-jp is used for mail messages, euc-jp has mainly been used on UNIX, and DOS has used shift_jis. Although it differs with the system-type, Emacs generally gives priority to euc-jp or shift_jis, and that is the right choice for editing files. It would be good if Emacs offered, for a given language environment, a priority list for mail that is separate from the list for files.

-- Katsumi Yamaoka <yamaoka@jpl.org> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-09-02 12:16 ` Katsumi Yamaoka @ 2002-09-02 17:31 ` Kai Großjohann 2002-09-02 22:38 ` Katsumi Yamaoka 2002-09-03 21:43 ` Hal Snyder 0 siblings, 2 replies; 22+ messages in thread From: Kai Großjohann @ 2002-09-02 17:31 UTC (permalink / raw) Cc: ding Katsumi Yamaoka <yamaoka@jpl.org> writes: >>>>>> In <vaf1y8g5yck.fsf@INBOX.auto.gnus.tok.lucy.cs.uni-dortmund.de> >>>>>> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) wrote: > > Kai> Another thought: Mule itself also has a priority list of > Kai> encodings. So I wonder why does Gnus need another priority list? > Kai> Normally, I'd guess that Japanese would normally configure their > Kai> Emacs for the right priorities, and then Gnus should do the right > Kai> thing automatically. There could be two reasons why this is not > Kai> happening: (1) Japanese use a different encoding in email than in > Kai> editing files, or (2) the priorities that Emacs sets up normally > Kai> do not propagate properly to Gnus, or (3) Emacs does not set > Kai> itself up for the right priorities at all when users setup a > Kai> Japanese language environment. Yes, I can't count... > > That's a good consideration. (1) is the main reason. Though > iso-2022-jp is used for mail messages, euc-jp has mainly been > used in UNIX and DOS has used shift_jis. Although it will be > different with the system-type, Emacs gives a priority to euc-jp > or shift_jis in general and it is a right way for editing files. > It seems to be a good way that Emacs offers the priority list > for mails apart from the list for files for the specified > language environment. Okay. So it seems that Japanese users always want what you are saying. What do you think about changing the default value to the following expression? 
(when (string= current-language-environment "Japanese") '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8)) We could add a comment saying that we still need to investigate which values are good for other language environments. But I wonder if there is a right way to do this? The right way, IMHO, would be to use the standard coding system priorities in principle, except that they are slightly modified to prefer iso-2022-jp over euc-jp. Hm. "emacs -q -no-site-file", then setting the Japanese language environment, tells me: /---- | Priority order for recognizing coding systems when reading files: | 1. iso-2022-jp (alias: junet) | 2. japanese-iso-8bit (alias: euc-japan-1990 euc-japan euc-jp) | 3. japanese-shift-jis (alias: shift_jis sjis) | 4. iso-2022-jp-2 | 5. iso-latin-1 (alias: iso-8859-1 latin-1) | 6. iso-2022-7bit | 7. iso-2022-8bit-ss2 | 8. emacs-mule | 9. raw-text | 10. chinese-big5 (alias: big5 cn-big5) | 11. no-conversion | 12. mule-utf-8 (alias: utf-8) \---- So it seems that Emacs already prefers iso-2022-jp over euc-jp. It's not clear to me where the problem comes from. Do you get a different output from M-x describe-coding-system RET RET? kai -- A large number of young women don't trust men with beards. (BFBS Radio) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-09-02 17:31 ` Kai Großjohann @ 2002-09-02 22:38 ` Katsumi Yamaoka 2002-09-03 1:52 ` Katsumi Yamaoka 2002-09-03 21:43 ` Hal Snyder 1 sibling, 1 reply; 22+ messages in thread From: Katsumi Yamaoka @ 2002-09-02 22:38 UTC (permalink / raw) Cc: ding

>>>>> In <vaf8z2k8fj0.fsf@INBOX.auto.gnus.tok.lucy.cs.uni-dortmund.de>
>>>>> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) wrote:

[...]

Kai> Okay. So it seems that Japanese users always want what you are
Kai> saying.

Kai> What do you think about changing the default value to the
Kai> following expression?

Kai> (when (string= current-language-environment "Japanese")
Kai>   '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8))

Kai> We could add a comment saying that we still need to investigate
Kai> which values are good for other language environments.

That's good.

Kai> But I wonder if there is a right way to do this? The right way,
Kai> IMHO, would be to use the standard coding system priorities in
Kai> principle, except that they are slightly modified to prefer
Kai> iso-2022-jp over euc-jp. Hm. "emacs -q -no-site-file", then
Kai> setting the Japanese language environment, tells me:

Kai>| Priority order for recognizing coding systems when reading files:
Kai>| 1. iso-2022-jp (alias: junet)
Kai>| 2. japanese-iso-8bit (alias: euc-japan-1990 euc-japan euc-jp)
Kai>| 3. japanese-shift-jis (alias: shift_jis sjis)
Kai>| 4. iso-2022-jp-2
Kai>| 5. iso-latin-1 (alias: iso-8859-1 latin-1)
[...]

That is surely the result on most systems, but please look at the function `setup-japanese-environment-internal', which does:

    (defun setup-japanese-environment-internal ()
      (cond ((eq system-type 'ms-dos)
             (prefer-coding-system 'japanese-shift-jis))
            ((eq system-type 'usg-unix-v)
             (prefer-coding-system 'japanese-iso-8bit)))
      [...])

This is defined in language/japan-util.el and is called from `(set-language-environment "Japanese")'. Because of this, the coding priority on Solaris will be in this order:

    (mapcar 'symbol-value coding-category-list)
    => (japanese-iso-8bit iso-2022-jp japanese-shift-jis iso-2022-jp-2 iso-latin-1 iso-2022-7bit iso-2022-8bit-ss2 emacs-mule raw-text chinese-big5 nil no-conversion ...)

I do not know whether only `usg-unix-v' is special, though.

-- Katsumi Yamaoka <yamaoka@jpl.org> ^ permalink raw reply [flat|nested] 22+ messages in thread
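For readers on a later Emacs than the 21.x discussed here, the detection order can probably be inspected more directly. The sketch below assumes an Emacs recent enough to provide the function `coding-system-priority-list', which is not the version used in this thread; evaluate it after the language environment has been set up.

```elisp
;; Sketch only: assumes a newer Emacs that provides
;; `coding-system-priority-list' (not the Emacs 21 of this thread).
(coding-system-priority-list)    ; all coding systems, highest priority first
(coding-system-priority-list t)  ; only the highest-priority coding system
```

Comparing the first few entries on different system types would show whether the `prefer-coding-system' calls quoted above have moved euc-jp or shift_jis ahead of iso-2022-jp.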
* Re: detecting encoding for Japanese 2002-09-02 22:38 ` Katsumi Yamaoka @ 2002-09-03 1:52 ` Katsumi Yamaoka 2002-09-03 2:03 ` Katsumi Yamaoka 0 siblings, 1 reply; 22+ messages in thread From: Katsumi Yamaoka @ 2002-09-03 1:52 UTC (permalink / raw) Cc: ding >>>>> In <yotlwuq4585x.fsf@jpl.org> >>>>> Katsumi Yamaoka <yamaoka@jpl.org> wrote: Kai> (when (string= current-language-environment "Japanese") Kai> '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8)) Handa-san gave us good advice on the mule-ja list: Emacs provides the variable `default-sendmail-coding-system', whose value differs according to the locale. In the Japanese environment it is `iso-2022-jp' even if the system-type is `usg-unix-v'. It is defined in the language/*.el files for the command `set-language-environment'. Can it be made the first candidate of `mm-coding-system-priorities'? XEmacs has nothing similar, though. -- Katsumi Yamaoka <yamaoka@jpl.org> ^ permalink raw reply [flat|nested] 22+ messages in thread
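The idea of making `default-sendmail-coding-system' the first candidate might be sketched as follows for GNU Emacs. This is an illustration of the suggestion, not what was eventually committed, and the fallback list is simply the one proposed earlier in the thread.

```elisp
;; Sketch only: put the mail coding system chosen by the language
;; environment first, then fall back to the list from this thread.
;; Not applicable to XEmacs, which lacks the variable.
(setq mm-coding-system-priorities
      (delq nil
            (delete-dups
             (cons default-sendmail-coding-system
                   (list 'iso-2022-jp 'iso-2022-jp-2
                         'japanese-shift-jis 'utf-8)))))
```

`delete-dups' keeps the first occurrence of each symbol, so if the variable is already `iso-2022-jp' the list is unchanged, and `delq' drops the entry if the variable happens to be nil.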
* Re: detecting encoding for Japanese 2002-09-03 1:52 ` Katsumi Yamaoka @ 2002-09-03 2:03 ` Katsumi Yamaoka 2002-09-03 6:19 ` Katsumi Yamaoka 0 siblings, 1 reply; 22+ messages in thread From: Katsumi Yamaoka @ 2002-09-03 2:03 UTC (permalink / raw) Cc: ding >>>>> In <yotlu1l7x2j9.fsf@jpl.org> >>>>> Katsumi Yamaoka <yamaoka@jpl.org> wrote: > Handa-san gave us a good advice in the mule-ja list that Emacs > provides the variable `default-sendmail-coding-system' ... > ... Can it be made the first candidate of > `mm-coding-system-priorities'? Maybe the following value is much better(?): (get-language-info current-language-environment 'coding-priority) -- Katsumi Yamaoka <yamaoka@jpl.org> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-09-03 2:03 ` Katsumi Yamaoka @ 2002-09-03 6:19 ` Katsumi Yamaoka 2002-09-03 6:43 ` Katsumi Yamaoka 0 siblings, 1 reply; 22+ messages in thread From: Katsumi Yamaoka @ 2002-09-03 6:19 UTC (permalink / raw) Cc: ding

I'm sorry to bother you so many times.

>>>>> In <yotlofbfx217.fsf@jpl.org>
>>>>> Katsumi Yamaoka <yamaoka@jpl.org> wrote:

> Maybe the following value is much better(?):

> (get-language-info current-language-environment 'coding-priority)

I noticed that this form may not provide a good value. If a user loads the jisx0213 module, which is included in the Mule-UCS package, priority will be given to iso-2022-jp-3 or euc-jisx0213. This is probably not related to the system-type:

    emacs-21.2 -batch -q -no-site-file

    (load "jisx0213")
    (set-language-environment "Japanese")
    (get-language-info current-language-environment 'coding-priority)
    => (iso-2022-jp-3-compatible utf-8 utf-16-le utf-16-be euc-jisx0213 japanese-shift-jisx0213 iso-2022-jp-2)

    (require 'mm-util)
    (let ((default-enable-multibyte-characters t)
          (mm-coding-system-priorities nil))
      (with-temp-buffer
        (insert "あいうえお")
        (mm-find-mime-charset-region (point-min) (point-max))))
    => (euc-jisx0213)

After all, it is best to adopt the way you wrote.

    (defcustom mm-coding-system-priorities
      (cond ((string= current-language-environment "Japanese")
             '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8)))
      "Preferred coding systems for encoding outgoing mails....")

-- Katsumi Yamaoka <yamaoka@jpl.org> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-09-03 6:19 ` Katsumi Yamaoka @ 2002-09-03 6:43 ` Katsumi Yamaoka 0 siblings, 0 replies; 22+ messages in thread From: Katsumi Yamaoka @ 2002-09-03 6:43 UTC (permalink / raw) Cc: ding >>>>> In <yotlbs7f4mtz.fsf@jpl.org> >>>>> Katsumi Yamaoka <yamaoka@jpl.org> wrote: > I'm sorry to bother you so many times. [...] > (defcustom mm-coding-system-priorities > (cond ((string= current-language-environment "Japanese") > '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8))) I've committed it with a slight modification for non-Mule XEmacs. -- Katsumi Yamaoka <yamaoka@jpl.org> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-09-02 17:31 ` Kai Großjohann 2002-09-02 22:38 ` Katsumi Yamaoka @ 2002-09-03 21:43 ` Hal Snyder 2002-09-03 22:09 ` Kai Großjohann 1 sibling, 1 reply; 22+ messages in thread From: Hal Snyder @ 2002-09-03 21:43 UTC (permalink / raw) Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes: > But I wonder if there is a right way to do this? The right way, > IMHO, would be to use the standard coding system priorities in > principle, except that they are slightly modified to prefer > iso-2022-jp over euc-jp. Hm. "emacs -q -no-site-file", then setting > the Japanese language environment, tells me: > > /---- > | Priority order for recognizing coding systems when reading files: > | 1. iso-2022-jp (alias: junet) > | 2. japanese-iso-8bit (alias: euc-japan-1990 euc-japan euc-jp) > | 3. japanese-shift-jis (alias: shift_jis sjis) > | 4. iso-2022-jp-2 > | 5. iso-latin-1 (alias: iso-8859-1 latin-1) > | 6. iso-2022-7bit > | 7. iso-2022-8bit-ss2 > | 8. emacs-mule > | 9. raw-text > | 10. chinese-big5 (alias: big5 cn-big5) > | 11. no-conversion > | 12. mule-utf-8 (alias: utf-8) > \---- > > So it seems that Emacs already prefers iso-2022-jp over euc-jp. It's > not clear to me where the problem comes from. Do you get a different > output from M-x describe-coding-system RET RET? I don't think it's necessarily an issue of priorities. When there is an incoming message without Content-type: properly set, it seems that only iso-2022-jp is tried. The previously mentioned "AI" used when opening a file, if applied to messages, should find the right encoding - or at least distinguish iso/euc/sjis/utf8, regardless of priorities. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-09-03 21:43 ` Hal Snyder @ 2002-09-03 22:09 ` Kai Großjohann 0 siblings, 0 replies; 22+ messages in thread From: Kai Großjohann @ 2002-09-03 22:09 UTC (permalink / raw) Cc: ding Hal Snyder <hal@vailsys.com> writes: > I don't think it's necessarily an issue of priorities. When there is > an incoming message without Content-type: properly set, it seems that > only iso-2022-jp is tried. The previously mentioned "AI" used when > opening a file, if applied to messages, should find the right encoding > - or at least distinguish iso/euc/sjis/utf8, regardless of priorities. I think we're talking about sending messages, not viewing them. kai -- A large number of young women don't trust men with beards. (BFBS Radio) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-29 16:24 ` Hal Snyder 2002-08-29 16:59 ` Kai Großjohann @ 2002-08-30 10:43 ` Simon Josefsson 2002-08-30 12:25 ` Kai Großjohann ` (2 more replies) 1 sibling, 3 replies; 22+ messages in thread From: Simon Josefsson @ 2002-08-30 10:43 UTC (permalink / raw) Cc: Gnus List Hal Snyder <hal@vailsys.com> writes: >> Aha, then you can use g-g-c-a to set the default for each group. >> >> If more than one untagged encodings is used within a single group, >> you must use 1 g or the menubar to input the desired encoding >> manually. > > Yes, multiple untagged encodings appear (sometimes UTF-8, but I guess > I have to wait for Emacs 22 for that). All sorts of users show up on > the list. Another solution would be to tell them to use a standards compliant MUA. (UTF-8 should work in released Emacs 21's too, I think.) >> Or where you asking for a new feature where Gnus used Emacs' builtin >> AI to guess what encoding untagged data? That could perhaps work, >> but I don't know how to implement it. > > Probably that is indeed what I was asking for. GNU Emacs indeed seems > to know how to guess encoding when a file is visited. The 1g command > suggested by you (and Kai, thanks) helps, but presumes I know the > encoding before switching. As time permits, I'll have a look at the > "builtin AI". Yes, using the builtin AI for untagged messages seems like the best compromise. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-30 10:43 ` Simon Josefsson @ 2002-08-30 12:25 ` Kai Großjohann 2002-08-30 22:58 ` Hal Snyder 2002-09-11 10:40 ` Yoshiki Hayashi 2 siblings, 0 replies; 22+ messages in thread From: Kai Großjohann @ 2002-08-30 12:25 UTC (permalink / raw) Cc: Gnus List Simon Josefsson <jas@extundo.com> writes: > Yes, using the builtin AI for untagged messages seems like the best > compromise. It should be easy, but some moons ago I asked how to tell Emacs to auto-detect the encoding of an incoming message. If there was a solution, I forgot it :-| Is there a way to tell Emacs/Gnus to have a look at the mail and to try to figure out the encoding used in it, and then to decode the message using this encoding? kai -- A large number of young women don't trust men with beards. (BFBS Radio) ^ permalink raw reply [flat|nested] 22+ messages in thread
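One possible starting point for Kai's question is Emacs's own detection primitive. The sketch below is not Gnus code and the function name `my-guess-and-decode' is hypothetical; it only shows that raw, still-undecoded bytes can be handed to `detect-coding-string' and then decoded with whatever coding system Emacs guesses. How to hook this into article display in Gnus is left open here, as it is in the thread.

```elisp
;; Sketch only: `my-guess-and-decode' is a made-up helper, not part of Gnus.
(defun my-guess-and-decode (raw)
  "Return RAW, a string of undecoded bytes, decoded with Emacs's best guess."
  ;; A non-nil second argument asks for only the most likely coding system.
  (let ((guess (detect-coding-string raw t)))
    (decode-coding-string raw guess)))
```

The guess depends on the current coding system priorities, so the result for ambiguous byte sequences (euc-jp versus shift_jis, for instance) may differ between language environments.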
* Re: detecting encoding for Japanese 2002-08-30 10:43 ` Simon Josefsson 2002-08-30 12:25 ` Kai Großjohann @ 2002-08-30 22:58 ` Hal Snyder 2002-09-11 10:40 ` Yoshiki Hayashi 2 siblings, 0 replies; 22+ messages in thread From: Hal Snyder @ 2002-08-30 22:58 UTC (permalink / raw) Simon Josefsson <jas@extundo.com> writes: >> Yes, multiple untagged encodings appear (sometimes UTF-8, but I >> guess I have to wait for Emacs 22 for that). All sorts of users >> show up on the list. > Another solution would be to tell them to use a standards compliant > MUA. It's a diverse (all right, newbie) list where we want persons to participate regardless of MUA clue. > (UTF-8 should work in released Emacs 21's too, I think.) I think UTF8 can't be used to save Japanese text from Gnu Emacs 21.2 without extra code such as Mule-UCS-0.84 (<= which I just learned about from this thread). ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: detecting encoding for Japanese 2002-08-30 10:43 ` Simon Josefsson 2002-08-30 12:25 ` Kai Großjohann 2002-08-30 22:58 ` Hal Snyder @ 2002-09-11 10:40 ` Yoshiki Hayashi 2 siblings, 0 replies; 22+ messages in thread From: Yoshiki Hayashi @ 2002-09-11 10:40 UTC (permalink / raw) Cc: Gnus List Simon Josefsson <jas@extundo.com> writes: >>> Or where you asking for a new feature where Gnus used Emacs' builtin >>> AI to guess what encoding untagged data? That could perhaps work, >>> but I don't know how to implement it. >> >> Probably that is indeed what I was asking for. GNU Emacs indeed seems >> to know how to guess encoding when a file is visited. The 1g command >> suggested by you (and Kai, thanks) helps, but presumes I know the >> encoding before switching. As time permits, I'll have a look at the >> "builtin AI". > > Yes, using the builtin AI for untagged messages seems like the best > compromise. I haven't used Oort so I may be missing something but can't you simply specify 'undecided' in gnus-group-charset-alist instead of iso-2022-jp? -- Yoshiki Hayashi ^ permalink raw reply [flat|nested] 22+ messages in thread