detecting encoding for Japanese

Gnus development mailing list

* detecting encoding for Japanese
@ 2002-08-29  5:48 Hal Snyder
  2002-08-29 10:00 ` Kai Großjohann
  2002-08-29 12:04 ` Simon Josefsson
  0 siblings, 2 replies; 22+ messages in thread
From: Hal Snyder @ 2002-08-29  5:48 UTC


I'm using gnus to read a Japanese mailing list in which messages
arrive in various encodings. Messages encoded with iso-2022-jp are
displayed properly. Messages in euc-jp or sjis are displayed as
backslashed octal.

If I save one of these euc-jp or sjis messages with "O f" and visit
the file, the encoding is properly recognized. Setting any of the C-x
RET encoding functions has no effect on messages displayed by gnus.
Variable gnus-group-charset-alist seems only to allow a single
character set per matched group.

This is with GNU Emacs 21.2.1, Oort gnus from cvs 2002-08-29, nnimap
backend.

Any ideas on how to enable encoding recognition when a message is
pulled in from the (nnimap) backend?

* Re: detecting encoding for Japanese
  2002-08-29  5:48 detecting encoding for Japanese Hal Snyder
@ 2002-08-29 10:00 ` Kai Großjohann
  2002-08-29 12:04 ` Simon Josefsson
  1 sibling, 0 replies; 22+ messages in thread
From: Kai Großjohann @ 2002-08-29 10:00 UTC
  Cc: Gnus List

Hal Snyder <hal@vailsys.com> writes:

> I'm using gnus to read a Japanese mailing list in which messages
> arrive in various encodings. Messages encoded with iso-2022-jp are
> displayed properly. Messages in euc-jp or sjis are displayed as
> backslashed octal.

Until you find a real solution, you can use `1 g' to specify the
charset for viewing the current article.  (I think this is available
in Oort only.)

kai
-- 
A large number of young women don't trust men with beards.  (BFBS Radio)

* Re: detecting encoding for Japanese
  2002-08-29  5:48 detecting encoding for Japanese Hal Snyder
  2002-08-29 10:00 ` Kai Großjohann
@ 2002-08-29 12:04 ` Simon Josefsson
  2002-08-29 14:08   ` Hal Snyder
  1 sibling, 1 reply; 22+ messages in thread
From: Simon Josefsson @ 2002-08-29 12:04 UTC
  Cc: Gnus List

Hal Snyder <hal@vailsys.com> writes:

> I'm using gnus to read a Japanese mailing list in which messages
> arrive in various encodings. Messages encoded with iso-2022-jp are
> displayed properly. Messages in euc-jp or sjis are displayed as
> backslashed octal.

I can't reproduce this.  Sending the following text (euc-jp) works:

	JIS  -- 元気  開発

Could you make a M-x gnus-bug so I can see what configuration you
have?

> If I save one of these euc-jp or sjis messages with "O f" and visit
> the file, the encoding is properly recognized. Setting any of the C-x
> RET encoding functions has no effect on messages displayed by gnus.
> Variable gnus-group-charset-alist seems only to allow a single
> character set per matched group.

g-g-c-a is only the default charset, used when the message doesn't
contain MIME tags.

* Re: detecting encoding for Japanese
  2002-08-29 12:04 ` Simon Josefsson
@ 2002-08-29 14:08   ` Hal Snyder
  2002-08-29 16:01     ` Simon Josefsson
  0 siblings, 1 reply; 22+ messages in thread
From: Hal Snyder @ 2002-08-29 14:08 UTC


Simon Josefsson <jas@extundo.com> writes:

>> I'm using gnus to read a Japanese mailing list in which messages
>> arrive in various encodings. Messages encoded with iso-2022-jp are
>> displayed properly. Messages in euc-jp or sjis are displayed as
>> backslashed octal.
>
> I can't reproduce this.  Sending the following text (euc-jp) works:
>
> 	JIS  -- 元気  開発
>
> Could you make a M-x gnus-bug so I can see what configuration you
> have?

I'm not at that computer now, but on a somewhat older configuration,
gnus displays your euc text properly. However, your message has the
header

Content-Type: text/plain; charset=euc-jp

The messages I'm dealing with are not so well-behaved.

* Re: detecting encoding for Japanese
  2002-08-29 14:08   ` Hal Snyder
@ 2002-08-29 16:01     ` Simon Josefsson
  2002-08-29 16:24       ` Hal Snyder
  0 siblings, 1 reply; 22+ messages in thread
From: Simon Josefsson @ 2002-08-29 16:01 UTC
  Cc: Gnus List

Hal Snyder <hal@vailsys.com> writes:

> However, your message has the header
>
> Content-Type: text/plain; charset=euc-jp
>
> The messages I'm dealing with are not so well-behaved.

Aha, then you can use g-g-c-a to set the default for each group.
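
A minimal sketch of such a per-group default, assuming the list is read
from an nnimap group whose name matches the hypothetical regexp below
and that most of its untagged messages are euc-jp.  Entries in
gnus-group-charset-alist are (REGEXP CHARSET) lists, and the charset is
only used for articles that carry no MIME charset of their own.

```elisp
;; Hypothetical: the group regexp and the choice of euc-jp are
;; assumptions; adjust both to the real nnimap group and list.
(eval-after-load "gnus-sum"
  ;; add-to-list puts the new (REGEXP CHARSET) entry at the front, so it
  ;; is checked before the default entries.
  '(add-to-list 'gnus-group-charset-alist
                '("^nnimap\\+[^:]*:INBOX\\.japanese-list$" euc-jp)))
```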

If more than one untagged encoding is used within a single group, you
must use 1 g or the menubar to input the desired encoding manually.

Or were you asking for a new feature where Gnus used Emacs' builtin
AI to guess the encoding of untagged data?  That could perhaps work,
but I don't know how to implement it.

* Re: detecting encoding for Japanese
  2002-08-29 16:01     ` Simon Josefsson
@ 2002-08-29 16:24       ` Hal Snyder
  2002-08-29 16:59         ` Kai Großjohann
  2002-08-30 10:43         ` Simon Josefsson
  0 siblings, 2 replies; 22+ messages in thread
From: Hal Snyder @ 2002-08-29 16:24 UTC

Simon Josefsson <jas@extundo.com> writes:

>> Content-Type: text/plain; charset=euc-jp
>>
>> The messages I'm dealing with are not so well-behaved.
>
> Aha, then you can use g-g-c-a to set the default for each group.
>
> If more than one untagged encoding is used within a single group,
> you must use 1 g or the menubar to input the desired encoding
> manually.

Yes, multiple untagged encodings appear (sometimes UTF-8, but I guess
I have to wait for Emacs 22 for that). All sorts of users show up on
the list.

> Or were you asking for a new feature where Gnus used Emacs' builtin
> AI to guess the encoding of untagged data? That could perhaps work,
> but I don't know how to implement it.

Probably that is indeed what I was asking for. GNU Emacs indeed seems
to know how to guess encoding when a file is visited. The 1g command
suggested by you (and Kai, thanks) helps, but presumes I know the
encoding before switching. As time permits, I'll have a look at the
"builtin AI".

Thank you.

* Re: detecting encoding for Japanese
  2002-08-29 16:24       ` Hal Snyder
@ 2002-08-29 16:59         ` Kai Großjohann
  2002-08-30  0:05           ` Katsumi Yamaoka
  2002-08-30 10:43         ` Simon Josefsson
  1 sibling, 1 reply; 22+ messages in thread
From: Kai Großjohann @ 2002-08-29 16:59 UTC
  Cc: Gnus List

Hal Snyder <hal@vailsys.com> writes:

> Yes, multiple untagged encodings appear (sometimes UTF-8, but I guess
> I have to wait for Emacs 22 for that). All sorts of users show up on
> the list.

As Dave likes to say, the internal encoding and the ability to edit
Unicode are orthogonal to each other.

If the Unicode support is lacking in the area of CJK, it might help to
install Mule-UCS.

kai
-- 
A large number of young women don't trust men with beards.  (BFBS Radio)

* Re: detecting encoding for Japanese
  2002-08-29 16:59         ` Kai Großjohann
@ 2002-08-30  0:05           ` Katsumi Yamaoka
  2002-08-30 12:23             ` Kai Großjohann
  0 siblings, 1 reply; 22+ messages in thread
From: Katsumi Yamaoka @ 2002-08-30  0:05 UTC


Well, Gnus sometimes encodes Japanese messages with the headers

  Content-Type: text/plain; charset=euc-jp
  Content-Transfer-Encoding: base64
or
  Content-Type: text/plain; charset=euc-jisx0213
  Content-Transfer-Encoding: base64

if the user runs Gnus under Emacs 21.x.  The latter case occurs when
the user employs the jisx0213 module, which is included in the
Mule-UCS package.  Although there is no technical problem, and almost
all MUAs including Gnus can decode and show such messages correctly,
those encodings are not so common in Japan.
So, I recommend Japanese Gnus users customize the option
`mm-coding-system-priorities' to have popular Japanese charsets
as follows:

(setq mm-coding-system-priorities
      '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8))

;; Visit http://www.jpl.org/elips/Gnus-Tips-ja.html for more
;; Japanese Gnus tips (which is written in Japanese, sorry).

By the way, many sendmail MTAs tend to convert base64 or qp
(quoted-printable) message bodies to 8bit on their own.  Here's an
example:

  Content-Type: text/plain; charset=euc-jisx0213
  Content-Transfer-Encoding: 8bit
  X-MIME-Autoconverted: from quoted-printable to 8bit by mail.foo.com...

I am worried about whether this has any bad effects.
Hal, are there messages like this in your mail files?
-- 
Katsumi Yamaoka <yamaoka@jpl.org>

* Re: detecting encoding for Japanese
  2002-08-29 16:24       ` Hal Snyder
  2002-08-29 16:59         ` Kai Großjohann
@ 2002-08-30 10:43         ` Simon Josefsson
  2002-08-30 12:25           ` Kai Großjohann
                             ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Simon Josefsson @ 2002-08-30 10:43 UTC
  Cc: Gnus List

Hal Snyder <hal@vailsys.com> writes:

>> Aha, then you can use g-g-c-a to set the default for each group.
>>
>> If more than one untagged encoding is used within a single group,
>> you must use 1 g or the menubar to input the desired encoding
>> manually.
>
> Yes, multiple untagged encodings appear (sometimes UTF-8, but I guess
> I have to wait for Emacs 22 for that). All sorts of users show up on
> the list.

Another solution would be to tell them to use a standards-compliant
MUA.  (UTF-8 should work in released versions of Emacs 21 too, I think.)

>> Or were you asking for a new feature where Gnus used Emacs' builtin
>> AI to guess the encoding of untagged data? That could perhaps work,
>> but I don't know how to implement it.
>
> Probably that is indeed what I was asking for. GNU Emacs indeed seems
> to know how to guess encoding when a file is visited. The 1g command
> suggested by you (and Kai, thanks) helps, but presumes I know the
> encoding before switching. As time permits, I'll have a look at the
> "builtin AI".

Yes, using the builtin AI for untagged messages seems like the best
compromise.

* Re: detecting encoding for Japanese
  2002-08-30  0:05           ` Katsumi Yamaoka
@ 2002-08-30 12:23             ` Kai Großjohann
  2002-09-02 12:16               ` Katsumi Yamaoka
  0 siblings, 1 reply; 22+ messages in thread
From: Kai Großjohann @ 2002-08-30 12:23 UTC
  Cc: ding

Katsumi Yamaoka <yamaoka@jpl.org> writes:

> So, I recommend Japanese Gnus users customize the option
> `mm-coding-system-priorities' to have popular Japanese charsets
> as follows:
>
> (setq mm-coding-system-priorities
>       '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8))

Why are euc-jp and euc-jisx0213 missing from this list?  (Maybe they
are different names for an encoding already in the above list?)

Also, if the above change is good for Japanese users, why isn't it
good for everybody else, too?  Do you think that there are
non-Japanese out there who prefer the current behavior?

Maybe it is a good idea to put your change in Gnus.

(Hm.  I wonder what happens for Chinese.  I think that nothing much
will happen, but if a Chinese user sends messages in a Japanese
encoding, they might be surprised :-)  I think that Mule cannot unify
Chinese and Japanese characters, so this can never happen.  But I'm
not so sure about Mule, so I'd better ask.)

Another thought: Mule itself also has a priority list of encodings.
So I wonder why Gnus needs another priority list.  I'd guess that
Japanese users would normally configure their Emacs for the right
priorities, and then Gnus should do the right thing automatically.
There could be two reasons why this is not happening: (1) Japanese use
a different encoding in email than in editing files, or (2) the
priorities that Emacs sets up normally do not propagate properly to
Gnus, or (3) Emacs does not set itself up for the right priorities at
all when users set up a Japanese language environment.  Yes, I can't
count...

kai (no email access over the weekend)
-- 
A large number of young women don't trust men with beards.  (BFBS Radio)

* Re: detecting encoding for Japanese
  2002-08-30 10:43         ` Simon Josefsson
@ 2002-08-30 12:25           ` Kai Großjohann
  2002-08-30 22:58           ` Hal Snyder
  2002-09-11 10:40           ` Yoshiki Hayashi
  2 siblings, 0 replies; 22+ messages in thread
From: Kai Großjohann @ 2002-08-30 12:25 UTC
  Cc: Gnus List

Simon Josefsson <jas@extundo.com> writes:

> Yes, using the builtin AI for untagged messages seems like the best
> compromise.

It should be easy, but some moons ago I asked how to tell Emacs to
auto-detect the encoding of an incoming message.  If there was a
solution, I forgot it :-|

Is there a way to tell Emacs/Gnus to have a look at the mail and to
try to figure out the encoding used in it, and then to decode the
message using this encoding?
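
Emacs' file-visiting detector is reachable from Lisp, so a rough
answer to this question is possible.  The following is only a sketch,
assuming the raw (undecoded) bytes of an untagged body are available
as a unibyte string; the function name my-guess-and-decode-string is
hypothetical and nothing in Gnus calls it.  detect-coding-string with
a non-nil second argument returns the single best guess, ranked by the
current coding-system priorities, so the answer depends on the
language environment.

```elisp
;; Hypothetical helper, not part of Gnus: guess the encoding of raw
;; message bytes the way Emacs does for visited files, then decode.
(defun my-guess-and-decode-string (raw)
  "Guess the coding system of unibyte string RAW and decode RAW with it.
Return a cons cell (CODING . TEXT).  The guess is ranked by the
current coding-system priorities, so it depends on the language
environment."
  (let ((coding (detect-coding-string raw t)))  ; t = best guess only
    (cons coding (decode-coding-string raw coding))))

;; Usage: decode euc-jp bytes that arrived without a charset label.
;; Note that this changes the language environment of the session.
(set-language-environment "Japanese")
(my-guess-and-decode-string (encode-coding-string "元気" 'euc-jp))
```

The same bytes can be guessed differently in other language
environments, which is the priority question discussed elsewhere in
this thread.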

kai
-- 
A large number of young women don't trust men with beards.  (BFBS Radio)

* Re: detecting encoding for Japanese
  2002-08-30 10:43         ` Simon Josefsson
  2002-08-30 12:25           ` Kai Großjohann
@ 2002-08-30 22:58           ` Hal Snyder
  2002-09-11 10:40           ` Yoshiki Hayashi
  2 siblings, 0 replies; 22+ messages in thread
From: Hal Snyder @ 2002-08-30 22:58 UTC

Simon Josefsson <jas@extundo.com> writes:

>> Yes, multiple untagged encodings appear (sometimes UTF-8, but I
>> guess I have to wait for Emacs 22 for that). All sorts of users
>> show up on the list.

> Another solution would be to tell them to use a standards-compliant
> MUA.

It's a diverse (all right, newbie) list where we want people to
participate regardless of how much clue they have about MUAs.

> (UTF-8 should work in released versions of Emacs 21 too, I think.)

I think UTF-8 can't be used to save Japanese text from GNU Emacs 21.2
without extra code such as Mule-UCS-0.84 (which I just learned about
from this thread).

* Re: detecting encoding for Japanese
  2002-08-30 12:23             ` Kai Großjohann
@ 2002-09-02 12:16               ` Katsumi Yamaoka
  2002-09-02 17:31                 ` Kai Großjohann
  0 siblings, 1 reply; 22+ messages in thread
From: Katsumi Yamaoka @ 2002-09-02 12:16 UTC
  Cc: ding

[-- Attachment #1: Type: text/plain, Size: 979 bytes --]

>>>>> In <vaf1y8g5yck.fsf@INBOX.auto.gnus.tok.lucy.cs.uni-dortmund.de>
>>>>>	Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) wrote:

>> So, I recommend Japanese Gnus users customize the option
>> `mm-coding-system-priorities' to have popular Japanese charsets
>> as follows:
>>
>> (setq mm-coding-system-priorities
>>       '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8))

Kai> Why are euc-jp and euc-jisx0213 missing from this list?  (Maybe
Kai> they are different names for an encoding already in the above
Kai> list?)

Right, euc-jp is not in the list.  I should say up front that I am
not an expert on character sets.  Before MIME became widespread,
iso-2022-jp and similar 7-bit codes were used in Japan without any
charset designation.  That is probably why iso-2022-jp is still
commonly used even now, and why very old MUAs cannot read any
encoding other than iso-2022-jp.

iso-2022-jp-2 is for extra characters in Japanese text, including
some Korean symbols such as:

[-- Attachment #2: Type: text/plain, Size: 12 bytes --]

♤♡♧♨☎

[-- Attachment #3: Type: text/plain, Size: 1813 bytes --]

japanese-shift-jis is added for katakana-jisx0201 characters (which
are normally called hankaku-katakana); it is rarely used, though.  And
utf-8 is added for characters other than these.

Kai> Also, if the above change is good for Japanese users, why isn't
Kai> it good for everybody else, too?  Do you think that there are
Kai> non-Japanese out there who prefer the current behavior?

Kai> Maybe it is a good idea to put your change in Gnus.

With this setting I can reliably write Japanese and (doubtful)
English.  However, I do not know other languages (including Chinese
and Korean), so I cannot judge whether it is appropriate for them.

[...]

Kai> Another thought: Mule itself also has a priority list of
Kai> encodings.  So I wonder why Gnus needs another priority list.
Kai> I'd guess that Japanese users would normally configure their
Kai> Emacs for the right priorities, and then Gnus should do the right
Kai> thing automatically.  There could be two reasons why this is not
Kai> happening: (1) Japanese use a different encoding in email than in
Kai> editing files, or (2) the priorities that Emacs sets up normally
Kai> do not propagate properly to Gnus, or (3) Emacs does not set
Kai> itself up for the right priorities at all when users set up a
Kai> Japanese language environment.  Yes, I can't count...

That's a good analysis.  (1) is the main reason.  While iso-2022-jp
is used for mail messages, euc-jp has mainly been used on UNIX and
shift_jis on DOS.  Although it differs depending on the system-type,
Emacs generally gives priority to euc-jp or shift_jis, which is the
right choice for editing files.  It would be good if Emacs offered a
priority list for mail, separate from the list for files, for each
language environment.
-- 
Katsumi Yamaoka <yamaoka@jpl.org>

* Re: detecting encoding for Japanese
  2002-09-02 12:16               ` Katsumi Yamaoka
@ 2002-09-02 17:31                 ` Kai Großjohann
  2002-09-02 22:38                   ` Katsumi Yamaoka
  2002-09-03 21:43                   ` Hal Snyder
  0 siblings, 2 replies; 22+ messages in thread
From: Kai Großjohann @ 2002-09-02 17:31 UTC
  Cc: ding

Katsumi Yamaoka <yamaoka@jpl.org> writes:

>>>>>> In <vaf1y8g5yck.fsf@INBOX.auto.gnus.tok.lucy.cs.uni-dortmund.de>
>>>>>>	Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) wrote:
>
> Kai> Another thought: Mule itself also has a priority list of
> Kai> encodings.  So I wonder why Gnus needs another priority list.
> Kai> I'd guess that Japanese users would normally configure their
> Kai> Emacs for the right priorities, and then Gnus should do the right
> Kai> thing automatically.  There could be two reasons why this is not
> Kai> happening: (1) Japanese use a different encoding in email than in
> Kai> editing files, or (2) the priorities that Emacs sets up normally
> Kai> do not propagate properly to Gnus, or (3) Emacs does not set
> Kai> itself up for the right priorities at all when users set up a
> Kai> Japanese language environment.  Yes, I can't count...
>
> That's a good analysis.  (1) is the main reason.  While iso-2022-jp
> is used for mail messages, euc-jp has mainly been used on UNIX and
> shift_jis on DOS.  Although it differs depending on the system-type,
> Emacs generally gives priority to euc-jp or shift_jis, which is the
> right choice for editing files.  It would be good if Emacs offered a
> priority list for mail, separate from the list for files, for each
> language environment.

Okay.  So it seems that Japanese users always want what you are
saying.

What do you think about changing the default value to the following
expression?

    (when (string= current-language-environment "Japanese")
      '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8))

We could add a comment saying that we still need to investigate which
values are good for other language environments.

But I wonder if there is a right way to do this?  The right way,
IMHO, would be to use the standard coding system priorities in
principle, except that they are slightly modified to prefer
iso-2022-jp over euc-jp.  Hm.  "emacs -q -no-site-file", then setting
the Japanese language environment, tells me:

/----
| Priority order for recognizing coding systems when reading files:
|   1. iso-2022-jp (alias: junet)
|   2. japanese-iso-8bit (alias: euc-japan-1990 euc-japan euc-jp)
|   3. japanese-shift-jis (alias: shift_jis sjis)
|   4. iso-2022-jp-2 
|   5. iso-latin-1 (alias: iso-8859-1 latin-1)
|   6. iso-2022-7bit 
|   7. iso-2022-8bit-ss2 
|   8. emacs-mule 
|   9. raw-text 
|   10. chinese-big5 (alias: big5 cn-big5)
|   11. no-conversion 
|   12. mule-utf-8 (alias: utf-8)
\----

So it seems that Emacs already prefers iso-2022-jp over euc-jp.  It's
not clear to me where the problem comes from.  Do you get a different
output from M-x describe-coding-system RET RET?

kai
-- 
A large number of young women don't trust men with beards.  (BFBS Radio)

* Re: detecting encoding for Japanese
  2002-09-02 17:31                 ` Kai Großjohann
@ 2002-09-02 22:38                   ` Katsumi Yamaoka
  2002-09-03  1:52                     ` Katsumi Yamaoka
  2002-09-03 21:43                   ` Hal Snyder
  1 sibling, 1 reply; 22+ messages in thread
From: Katsumi Yamaoka @ 2002-09-02 22:38 UTC
  Cc: ding

>>>>> In <vaf8z2k8fj0.fsf@INBOX.auto.gnus.tok.lucy.cs.uni-dortmund.de>
>>>>>	Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) wrote:

[...]

Kai> Okay.  So it seems that Japanese users always want what you are
Kai> saying.

Kai> What do you think about changing the default value to the
Kai> following expression?

Kai>     (when (string= current-language-environment "Japanese")
Kai>       '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8))

Kai> We could add a comment saying that we still need to investigate
Kai> which values are good for other language environments.

That's good.

Kai> But I wonder if there is a right way to do this?  The right way,
Kai> IMHO, would be to use the standard coding system priorities in
Kai> principle, except that they are slightly modified to prefer
Kai> iso-2022-jp over euc-jp.  Hm.  "emacs -q -no-site-file", then
Kai> setting the Japanese language environment, tells me:

Kai>| Priority order for recognizing coding systems when reading files:
Kai>|   1. iso-2022-jp (alias: junet)
Kai>|   2. japanese-iso-8bit (alias: euc-japan-1990 euc-japan euc-jp)
Kai>|   3. japanese-shift-jis (alias: shift_jis sjis)
Kai>|   4. iso-2022-jp-2
Kai>|   5. iso-latin-1 (alias: iso-8859-1 latin-1)

[...]

Although that is indeed the result on most systems, please look at
what the function `setup-japanese-environment-internal' does:

(defun setup-japanese-environment-internal ()
  (cond ((eq system-type 'ms-dos)
	 (prefer-coding-system 'japanese-shift-jis))
	((eq system-type 'usg-unix-v)
	 (prefer-coding-system 'japanese-iso-8bit)))
  [...])

This is defined in language/japan-util.el and is called from
`(set-language-environment "Japanese")'.  Because of this, the
coding priority on Solaris will be in the order:

(mapcar 'symbol-value coding-category-list)
 => (japanese-iso-8bit iso-2022-jp japanese-shift-jis
     iso-2022-jp-2 iso-latin-1 iso-2022-7bit iso-2022-8bit-ss2
     emacs-mule raw-text chinese-big5 nil no-conversion ...)

I do not know whether only `usg-unix-v' is special, though.
-- 
Katsumi Yamaoka <yamaoka@jpl.org>

* Re: detecting encoding for Japanese
  2002-09-02 22:38                   ` Katsumi Yamaoka
@ 2002-09-03  1:52                     ` Katsumi Yamaoka
  2002-09-03  2:03                       ` Katsumi Yamaoka
  0 siblings, 1 reply; 22+ messages in thread
From: Katsumi Yamaoka @ 2002-09-03  1:52 UTC
  Cc: ding

>>>>> In <yotlwuq4585x.fsf@jpl.org>
>>>>>	Katsumi Yamaoka <yamaoka@jpl.org> wrote:

Kai>     (when (string= current-language-environment "Japanese")
Kai>       '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8))

Handa-san gave us good advice on the mule-ja list: Emacs provides
the variable `default-sendmail-coding-system', whose value depends on
the language environment.  In the Japanese environment, it is
`iso-2022-jp' even if the system-type is `usg-unix-v'.  It is set in
the language/*.el files for the command `set-language-environment'.
Can it be made the first candidate of `mm-coding-system-priorities'?
XEmacs has nothing similar, though.
-- 
Katsumi Yamaoka <yamaoka@jpl.org>

* Re: detecting encoding for Japanese
  2002-09-03  1:52                     ` Katsumi Yamaoka
@ 2002-09-03  2:03                       ` Katsumi Yamaoka
  2002-09-03  6:19                         ` Katsumi Yamaoka
  0 siblings, 1 reply; 22+ messages in thread
From: Katsumi Yamaoka @ 2002-09-03  2:03 UTC
  Cc: ding

>>>>> In <yotlu1l7x2j9.fsf@jpl.org>
>>>>>	Katsumi Yamaoka <yamaoka@jpl.org> wrote:

> Handa-san gave us good advice on the mule-ja list: Emacs
> provides the variable `default-sendmail-coding-system' ...

> ... Can it be made the first candidate of
> `mm-coding-system-priorities'?

Maybe the following value is much better(?):

(get-language-info current-language-environment 'coding-priority)
-- 
Katsumi Yamaoka <yamaoka@jpl.org>

* Re: detecting encoding for Japanese
  2002-09-03  2:03                       ` Katsumi Yamaoka
@ 2002-09-03  6:19                         ` Katsumi Yamaoka
  2002-09-03  6:43                           ` Katsumi Yamaoka
  0 siblings, 1 reply; 22+ messages in thread
From: Katsumi Yamaoka @ 2002-09-03  6:19 UTC
  Cc: ding

I'm sorry to bother you so many times.

>>>>> In <yotlofbfx217.fsf@jpl.org>
>>>>>	Katsumi Yamaoka <yamaoka@jpl.org> wrote:

> Maybe the following value is much better(?):

> (get-language-info current-language-environment 'coding-priority)

I noticed the form may not provide a good value.  If a user loads
the jisx0213 module which is included in the Mule-UCS package,
priority will be given to iso-2022-jp-3 or euc-jisx0213.
Probably it is not related to the system-type:

emacs-21.2 -batch -q -no-site-file

(load "jisx0213")
(set-language-environment "Japanese")
(get-language-info current-language-environment 'coding-priority)
 => (iso-2022-jp-3-compatible utf-8 utf-16-le utf-16-be
     euc-jisx0213 japanese-shift-jisx0213 iso-2022-jp-2)

(require 'mm-util)
(let ((default-enable-multibyte-characters t)
      (mm-coding-system-priorities nil))
  (with-temp-buffer
    (insert "あいうえお")
    (mm-find-mime-charset-region (point-min) (point-max))))
 => (euc-jisx0213)

After all, it is best to adopt the approach you wrote.

(defcustom mm-coding-system-priorities
  (cond ((string= current-language-environment "Japanese")
	 '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8)))
  "Preferred coding systems for encoding outgoing mails....")
-- 
Katsumi Yamaoka <yamaoka@jpl.org>

* Re: detecting encoding for Japanese
  2002-09-03  6:19                         ` Katsumi Yamaoka
@ 2002-09-03  6:43                           ` Katsumi Yamaoka
  0 siblings, 0 replies; 22+ messages in thread
From: Katsumi Yamaoka @ 2002-09-03  6:43 UTC
  Cc: ding

>>>>> In <yotlbs7f4mtz.fsf@jpl.org>
>>>>>	Katsumi Yamaoka <yamaoka@jpl.org> wrote:

> I'm sorry to bother you so many times.

[...]

> (defcustom mm-coding-system-priorities
>   (cond ((string= current-language-environment "Japanese")
> 	 '(iso-2022-jp iso-2022-jp-2 japanese-shift-jis utf-8)))

I've committed it with a slight modification for non-Mule XEmacs.
-- 
Katsumi Yamaoka <yamaoka@jpl.org>

* Re: detecting encoding for Japanese
  2002-09-02 17:31                 ` Kai Großjohann
  2002-09-02 22:38                   ` Katsumi Yamaoka
@ 2002-09-03 21:43                   ` Hal Snyder
  2002-09-03 22:09                     ` Kai Großjohann
  1 sibling, 1 reply; 22+ messages in thread
From: Hal Snyder @ 2002-09-03 21:43 UTC


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> But I wonder if there is a right way to do this? The right way,
> IMHO, would be to use the standard coding system priorities in
> principle, except that they are slightly modified to prefer
> iso-2022-jp over euc-jp. Hm. "emacs -q -no-site-file", then setting
> the Japanese language environment, tells me:
>
> /----
> | Priority order for recognizing coding systems when reading files:
> |   1. iso-2022-jp (alias: junet)
> |   2. japanese-iso-8bit (alias: euc-japan-1990 euc-japan euc-jp)
> |   3. japanese-shift-jis (alias: shift_jis sjis)
> |   4. iso-2022-jp-2 
> |   5. iso-latin-1 (alias: iso-8859-1 latin-1)
> |   6. iso-2022-7bit 
> |   7. iso-2022-8bit-ss2 
> |   8. emacs-mule 
> |   9. raw-text 
> |   10. chinese-big5 (alias: big5 cn-big5)
> |   11. no-conversion 
> |   12. mule-utf-8 (alias: utf-8)
> \----
>
> So it seems that Emacs already prefers iso-2022-jp over euc-jp.  It's
> not clear to me where the problem comes from.  Do you get a different
> output from M-x describe-coding-system RET RET?

I don't think it's necessarily an issue of priorities. When there is
an incoming message without Content-type: properly set, it seems that
only iso-2022-jp is tried. The previously mentioned "AI" used when
opening a file, if applied to messages, should find the right encoding
- or at least distinguish iso/euc/sjis/utf8, regardless of priorities.

* Re: detecting encoding for Japanese
  2002-09-03 21:43                   ` Hal Snyder
@ 2002-09-03 22:09                     ` Kai Großjohann
  0 siblings, 0 replies; 22+ messages in thread
From: Kai Großjohann @ 2002-09-03 22:09 UTC
  Cc: ding

Hal Snyder <hal@vailsys.com> writes:

> I don't think it's necessarily an issue of priorities. When there is
> an incoming message without Content-type: properly set, it seems that
> only iso-2022-jp is tried. The previously mentioned "AI" used when
> opening a file, if applied to messages, should find the right encoding
> - or at least distinguish iso/euc/sjis/utf8, regardless of priorities.

I think we're talking about sending messages, not viewing them.

kai
-- 
A large number of young women don't trust men with beards.  (BFBS Radio)

* Re: detecting encoding for Japanese
  2002-08-30 10:43         ` Simon Josefsson
  2002-08-30 12:25           ` Kai Großjohann
  2002-08-30 22:58           ` Hal Snyder
@ 2002-09-11 10:40           ` Yoshiki Hayashi
  2 siblings, 0 replies; 22+ messages in thread
From: Yoshiki Hayashi @ 2002-09-11 10:40 UTC
  Cc: Gnus List

Simon Josefsson <jas@extundo.com> writes:

>>> Or were you asking for a new feature where Gnus used Emacs' builtin
>>> AI to guess the encoding of untagged data? That could perhaps work,
>>> but I don't know how to implement it.
>>
>> Probably that is indeed what I was asking for. GNU Emacs indeed seems
>> to know how to guess encoding when a file is visited. The 1g command
>> suggested by you (and Kai, thanks) helps, but presumes I know the
>> encoding before switching. As time permits, I'll have a look at the
>> "builtin AI".
>
> Yes, using the builtin AI for untagged messages seems like the best
> compromise.

I haven't used Oort so I may be missing something but can't
you simply specify 'undecided' in gnus-group-charset-alist
instead of iso-2022-jp?
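
The suggestion might look roughly like the following sketch.  It is
untested here: the group regexp is hypothetical, and whether Gnus
passes undecided (Emacs' auto-detecting coding system) through to
decoding, rather than expecting a MIME charset name, is exactly the
assumption being proposed.

```elisp
;; Sketch of the suggestion above; the group regexp is hypothetical.
;; `undecided' asks Emacs to auto-detect the coding system, using the
;; current coding-system priorities, instead of forcing iso-2022-jp on
;; untagged articles.
(eval-after-load "gnus-sum"
  '(add-to-list 'gnus-group-charset-alist
                '("^nnimap\\+[^:]*:INBOX\\.japanese-list$" undecided)))
```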

-- 
Yoshiki Hayashi

Thread overview: 22+ messages
2002-08-29  5:48 detecting encoding for Japanese Hal Snyder
2002-08-29 10:00 ` Kai Großjohann
2002-08-29 12:04 ` Simon Josefsson
2002-08-29 14:08   ` Hal Snyder
2002-08-29 16:01     ` Simon Josefsson
2002-08-29 16:24       ` Hal Snyder
2002-08-29 16:59         ` Kai Großjohann
2002-08-30  0:05           ` Katsumi Yamaoka
2002-08-30 12:23             ` Kai Großjohann
2002-09-02 12:16               ` Katsumi Yamaoka
2002-09-02 17:31                 ` Kai Großjohann
2002-09-02 22:38                   ` Katsumi Yamaoka
2002-09-03  1:52                     ` Katsumi Yamaoka
2002-09-03  2:03                       ` Katsumi Yamaoka
2002-09-03  6:19                         ` Katsumi Yamaoka
2002-09-03  6:43                           ` Katsumi Yamaoka
2002-09-03 21:43                   ` Hal Snyder
2002-09-03 22:09                     ` Kai Großjohann
2002-08-30 10:43         ` Simon Josefsson
2002-08-30 12:25           ` Kai Großjohann
2002-08-30 22:58           ` Hal Snyder
2002-09-11 10:40           ` Yoshiki Hayashi
