gnus should accept UTF8 even if UTF-8 is standard

Gnus development mailing list

* gnus should accept UTF8 even if UTF-8 is standard
@ 2008-10-14 20:30 jidanni
  2008-10-14 20:50 ` Ted Zlatanov
  0 siblings, 1 reply; 72+ messages in thread
From: jidanni @ 2008-10-14 20:30 UTC (permalink / raw)
  To: ding

Gnus cannot deal with
From: =?UTF8?B?5b+g576p5Z+66YeR5pyD?= <cybaby.org@example.net>
From: =?UTF-8?B?5b+g576p5Z+66YeR5pyD?= <cybaby.org@example.net>
One needs  ^a dash here. Mutt has no problem.
Same with utf8 vs. utf-8.

How can one add utf8 and UTF8 as aliases in one's .gnus.el awaiting
your fix to filter down?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-14 20:30 gnus should accept UTF8 even if UTF-8 is standard jidanni
@ 2008-10-14 20:50 ` Ted Zlatanov
  2008-10-15  0:39   ` Kenichi Handa
  0 siblings, 1 reply; 72+ messages in thread
From: Ted Zlatanov @ 2008-10-14 20:50 UTC (permalink / raw)
  Cc: Ding Mailing List, Emacs Development

On Wed, 15 Oct 2008 04:30:10 +0800 jidanni@jidanni.org wrote: 

j> Gnus cannot deal with
j> From: =?UTF8?B?5b+g576p5Z+66YeR5pyD?= <cybaby.org@example.net>
j> From: =?UTF-8?B?5b+g576p5Z+66YeR5pyD?= <cybaby.org@example.net>
j> One needs  ^a dash here. Mutt has no problem.
j> Same with utf8 vs. utf-8.

j> How can one add utf8 and UTF8 as aliases in one's .gnus.el awaiting
j> your fix to filter down?

I think this should work.  Can you test it?

(define-coding-system-alias 'utf8 'utf-8)
(define-coding-system-alias 'UTF8 'utf-8)

I think if this is to be fixed, it's in Emacs
(lisp/international/mule-conf.el).  I don't know if it's been discussed
before (I did a search, no results except for some BBDB issues) so I'm
CCing emacs-devel as well.  The fix seems reasonable to me as a user
convenience.  It's very unlikely it will cause problems.  But I don't
know for sure, so please tell me if I'm wrong.
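
A minimal sketch of how the two aliases above could be added defensively
in an init file.  The `coding-system-p' guard and the final membership
check are illustrative additions, not part of the suggestion itself:

```elisp
;; Define each alias only if the name is not already a coding system,
;; so that re-evaluating the file stays harmless.
(dolist (alias '(utf8 UTF8))
  (unless (coding-system-p alias)
    (define-coding-system-alias alias 'utf-8)))

;; Non-nil once `utf8' is registered as an alias of `utf-8'.
(memq 'utf8 (coding-system-aliases 'utf-8))
```

Note that this changes the set of coding systems for the whole Emacs
session, not only for Gnus.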

Ted

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-14 20:50 ` Ted Zlatanov
@ 2008-10-15  0:39   ` Kenichi Handa
  2008-10-15 16:10     ` Richard M. Stallman
  2008-10-15 17:32     ` Ted Zlatanov
  0 siblings, 2 replies; 72+ messages in thread
From: Kenichi Handa @ 2008-10-15  0:39 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: ding, emacs-devel

In article <86od1mj3z2.fsf@lifelogs.com>, Ted Zlatanov <tzz@lifelogs.com> writes:

> On Wed, 15 Oct 2008 04:30:10 +0800 jidanni@jidanni.org wrote: 
>>> Gnus cannot deal with
>>> From: =?UTF8?B?5b+g576p5Z+66YeR5pyD?= <cybaby.org@example.net>
>>> From: =?UTF-8?B?5b+g576p5Z+66YeR5pyD?= <cybaby.org@example.net>
>>> One needs  ^a dash here. Mutt has no problem.
>>> Same with utf8 vs. utf-8.

>>> How can one add utf8 and UTF8 as aliases in one's .gnus.el awaiting
>>> your fix to filter down?

> I think this should work.  Can you test it?

> (define-coding-system-alias 'utf8 'utf-8)
> (define-coding-system-alias 'UTF8 'utf-8)

> I think if this is to be fixed, it's in Emacs
> (lisp/international/mule-conf.el).  I don't know if it's been discussed
> before (I did a search, no results except for some BBDB issues) so I'm
> CCing emacs-devel as well.  The fix seems reasonable to me as a user
> convenience.  It's very unlikely it will cause problems.  But I don't
> know for sure, so please tell me if I'm wrong.

I agree that defining those aliases causes no problem.

---
Kenichi Handa
handa@ni.aist.go.jp




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15  0:39   ` Kenichi Handa
@ 2008-10-15 16:10     ` Richard M. Stallman
  2008-10-15 17:32     ` Ted Zlatanov
  1 sibling, 0 replies; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-15 16:10 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tzz, ding, emacs-devel

Definitely `utf8' should be a predefined alias.




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15  0:39   ` Kenichi Handa
  2008-10-15 16:10     ` Richard M. Stallman
@ 2008-10-15 17:32     ` Ted Zlatanov
  2008-10-15 19:49       ` Reiner Steib
  1 sibling, 1 reply; 72+ messages in thread
From: Ted Zlatanov @ 2008-10-15 17:32 UTC (permalink / raw)
  To: emacs-devel; +Cc: ding

On Wed, 15 Oct 2008 09:39:41 +0900 Kenichi Handa <handa@m17n.org> wrote: 

KH> In article <86od1mj3z2.fsf@lifelogs.com>, Ted Zlatanov <tzz@lifelogs.com> writes:
>> (define-coding-system-alias 'utf8 'utf-8)
>> (define-coding-system-alias 'UTF8 'utf-8)

KH> I agree that defining those aliases causes no problem.

On Wed, 15 Oct 2008 12:10:13 -0400 "Richard M. Stallman" <rms@gnu.org> wrote: 

RMS> Definitely `utf8' should be a predefined alias.

I've committed this change.

Ted





* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15 19:49       ` Reiner Steib
@ 2008-10-15 19:05         ` Ted Zlatanov
  2008-10-15 22:03           ` Reiner Steib
  2008-10-16  1:12         ` Kenichi Handa
  1 sibling, 1 reply; 72+ messages in thread
From: Ted Zlatanov @ 2008-10-15 19:05 UTC (permalink / raw)
  To: emacs-devel; +Cc: ding

On Wed, 15 Oct 2008 21:49:21 +0200 Reiner Steib <reinersteib+gmane@imap.cc> wrote: 

RS> On Wed, Oct 15 2008, Ted Zlatanov wrote:
>> On Wed, 15 Oct 2008 09:39:41 +0900 Kenichi Handa <handa@m17n.org> wrote: 
KH> Ted Zlatanov <tzz@lifelogs.com> writes:
>>>> (define-coding-system-alias 'utf8 'utf-8)
>>>> (define-coding-system-alias 'UTF8 'utf-8)
>> 
KH> I agree that defining those aliases causes no problem.

RS> [ See <http://thread.gmane.org/gmane.emacs.gnus.general/66962> for a
RS>   previous discussion on this topic. ]

RS> I seem to recall that (define-coding-system-alias 'utf8 'utf-8) might
RS> trigger Gnus into sending articles with "Content-type: text/plain;
RS> charset=utf8" which would be plain wrong [1][2].  But I'm not sure.

If it does we can fix it (maybe it can prefer the original coding system
before trying its aliases).  The general usage, regardless of Gnus and
the RFCs, is that these are valid synonyms for utf-8, so I think this is
the right thing to do.

RS> For the problem at hand (displaying incoming articles), the user can
RS> type `1 g utf-8 RET' (<menu-bar> <Article> <Display> <View as
RS> different encoding> <utf-8>) or add (utf8 . utf-8) to
RS> `mm-charset-synonym-alist'.

RS> If it becomes a more frequent mistake, we might add (utf8 . utf-8) to
RS> the default value `mm-charset-synonym-alist'.  I hesitate to pollute
RS> `mm-charset-synonym-alist' by adding entries for each and every pilot
RS> error.

It's not a pilot error at this point, the mangled names have been in
common usage for years...

Ted

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15 17:32     ` Ted Zlatanov
@ 2008-10-15 19:49       ` Reiner Steib
  2008-10-15 19:05         ` Ted Zlatanov
  2008-10-16  1:12         ` Kenichi Handa
  0 siblings, 2 replies; 72+ messages in thread
From: Reiner Steib @ 2008-10-15 19:49 UTC (permalink / raw)
  To: emacs-devel, ding

On Wed, Oct 15 2008, Ted Zlatanov wrote:

> On Wed, 15 Oct 2008 09:39:41 +0900 Kenichi Handa <handa@m17n.org> wrote: 
> KH> Ted Zlatanov <tzz@lifelogs.com> writes:
>>> (define-coding-system-alias 'utf8 'utf-8)
>>> (define-coding-system-alias 'UTF8 'utf-8)
>
> KH> I agree that defining those aliases causes no problem.

[ See <http://thread.gmane.org/gmane.emacs.gnus.general/66962> for a
  previous discussion on this topic. ]

I seem to recall that (define-coding-system-alias 'utf8 'utf-8) might
trigger Gnus into sending articles with "Content-type: text/plain;
charset=utf8" which would be plain wrong [1][2].  But I'm not sure.

For the problem at hand (displaying incoming articles), the user can
type `1 g utf-8 RET' (<menu-bar> <Article> <Display> <View as
different encoding> <utf-8>) or add (utf8 . utf-8) to
`mm-charset-synonym-alist'.
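
A minimal sketch of the second option, assuming the variable lives in
`mm-util' and holds (SYNONYM . CHARSET) pairs as the entry above
suggests:

```elisp
(require 'mm-util)

;; Treat the unregistered name `utf8' as the registered `utf-8' when
;; decoding articles.
(add-to-list 'mm-charset-synonym-alist '(utf8 . utf-8))
```

This stays local to the MIME code and does not define a new Emacs-wide
coding system.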

If it becomes a more frequent mistake, we might add (utf8 . utf-8) to
the default value `mm-charset-synonym-alist'.  I hesitate to pollute
`mm-charset-synonym-alist' by adding entries for each and every pilot
error.

Bye, Reiner.

[1] <http://www.faqs.org/rfcs/rfc3629.html>
[2] <http://www.iana.org/assignments/character-sets>
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15 22:03           ` Reiner Steib
@ 2008-10-15 21:23             ` Ted Zlatanov
  2008-10-16  0:15               ` Katsumi Yamaoka
                                 ` (2 more replies)
  2008-10-16  2:41             ` Stefan Monnier
  1 sibling, 3 replies; 72+ messages in thread
From: Ted Zlatanov @ 2008-10-15 21:23 UTC (permalink / raw)
  To: ding; +Cc: emacs-devel

On Thu, 16 Oct 2008 00:03:28 +0200 Reiner Steib <reinersteib+gmane@imap.cc> wrote: 

RS> On Wed, Oct 15 2008, Ted Zlatanov wrote:
>> Reiner Steib <reinersteib+gmane@imap.cc> wrote: 
>>>>>> (define-coding-system-alias 'utf8 'utf-8)
>>>>>> (define-coding-system-alias 'UTF8 'utf-8)
KH> I agree that defining those aliases causes no problem.

RS> BTW, is the uppercase variant useful?  AFAICS, it is the only
RS> uppercase variant...
RS> -*- mode: grep; default-directory: ".../emacs/lisp/international/" -*-
RS> Grep started at Wed Oct 15 23:58:55

RS> grep -nH -e define-coding-system-alias.*[A-Z] *.el
RS> mule-conf.el:1279:(define-coding-system-alias 'UTF8 'utf-8)

RS> Grep finished (matches found) at Wed Oct 15 23:58:55

It's a common usage, e.g. in locale names.  'UTF-8 is also common and
should be added.

RS> If it becomes a more frequent mistake, we might add (utf8 . utf-8) to
RS> the default value `mm-charset-synonym-alist'.  I hesitate to pollute
RS> `mm-charset-synonym-alist' by adding entries for each and every pilot
RS> error.
>> 
>> It's not a pilot error at this point, 

RS> How do you know?

Because we're talking about Emacs-wide coding system aliases, not the
specific pilot error jidanni is reporting, or the Evolution bug we
discussed back in May.  The goal is to give the user sensible
conveniences as long as we don't compromise the software, and I think
this is a good solution.  If you disagree this is reasonable, I'll defer
to you and the Emacs maintainers as I don't feel too strongly about it.

>> the mangled names have been in common usage for years...

RS> Not in the context of mail/news.  See the little statistics in the
RS> above mentioned thread.

Again, we're talking Emacs-wide now.  I agree that for mail and news
utf-8 is reasonable.

Ted

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15 19:05         ` Ted Zlatanov
@ 2008-10-15 22:03           ` Reiner Steib
  2008-10-15 21:23             ` Ted Zlatanov
  2008-10-16  2:41             ` Stefan Monnier
  0 siblings, 2 replies; 72+ messages in thread
From: Reiner Steib @ 2008-10-15 22:03 UTC (permalink / raw)
  To: emacs-devel, ding

On Wed, Oct 15 2008, Ted Zlatanov wrote:

> Reiner Steib <reinersteib+gmane@imap.cc> wrote: 
>>>>> (define-coding-system-alias 'utf8 'utf-8)
>>>>> (define-coding-system-alias 'UTF8 'utf-8)
> KH> I agree that defining those aliases causes no problem.

BTW, is the uppercase variant useful?  AFAICS, it is the only
uppercase variant...

--8<---------------cut here---------------start------------->8---
-*- mode: grep; default-directory: ".../emacs/lisp/international/" -*-
Grep started at Wed Oct 15 23:58:55

grep -nH -e define-coding-system-alias.*[A-Z] *.el
mule-conf.el:1279:(define-coding-system-alias 'UTF8 'utf-8)

Grep finished (matches found) at Wed Oct 15 23:58:55
--8<---------------cut here---------------end--------------->8---

> RS> I seem to recall that (define-coding-system-alias 'utf8 'utf-8) might
> RS> trigger Gnus into sending articles with "Content-type: text/plain;
> RS> charset=utf8" which would be plain wrong [1][2].  But I'm not sure.
>
> If it does we can fix it (maybe it can prefer the original coding system
> before trying its aliases).

On second thought, there must be such a check already, since there
are plenty of coding-system aliases which aren't MIME charsets.

> RS> If it becomes a more frequent mistake, we might add (utf8 . utf-8) to
> RS> the default value `mm-charset-synonym-alist'.  I hesitate to pollute
> RS> `mm-charset-synonym-alist' by adding entries for each and every pilot
> RS> error.
>
> It's not a pilot error at this point, 

How do you know?

> the mangled names have been in common usage for years...

Not in the context of mail/news.  See the little statistics in the
above mentioned thread.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/



* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15 21:23             ` Ted Zlatanov
@ 2008-10-16  0:15               ` Katsumi Yamaoka
  2008-10-20 16:23                 ` Reiner Steib
  2008-10-16  4:32               ` Stephen J. Turnbull
  2008-10-16  6:47               ` Eli Zaretskii
  2 siblings, 1 reply; 72+ messages in thread
From: Katsumi Yamaoka @ 2008-10-16  0:15 UTC (permalink / raw)
  To: ding; +Cc: emacs-devel

>>>>> Reiner Steib wrote:
> I seem to recall that (define-coding-system-alias 'utf8 'utf-8) might
> trigger Gnus into sending articles with "Content-type: text/plain;
> charset=utf8" which would be plain wrong [1][2].  But I'm not sure.

I installed the latest Emacs trunk and did some tests with a
message written in Japanese.

1.
(setq mm-coding-system-priorities '(utf8)) is ok.  Messages to
be sent don't seem to be labeled with charset=utf8 .
(setq mm-coding-system-priorities nil) is ok, too.

2.
Type `C-c C-m p text/plain RET' in the message buffer and add
charset=utf8 manually as follows:

< #part type="text/plain" disposition=inline charset=utf8>

In this case the message will be sent with charset=utf8 .
From this result I have come to think the system-wide utf8 alias might
be harmful.  The utf8 alias is needed only for reading wrongly
configured messages, isn't it?  If so, I think it is enough to use
`gnus-summary-show-article-charset-alist'.

cf. (info "(gnus)Paging the Article")
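
A minimal sketch of such a setting, assuming the alist maps a numeric
prefix to a charset as the manual node describes.  The prefix numbers
and the second entry are arbitrary illustrations:

```elisp
;; With this setting, `C-u 1 g' in the summary buffer should redisplay
;; the current article decoded as utf-8.
(setq gnus-summary-show-article-charset-alist
      '((1 . utf-8)
        (2 . iso-2022-jp)))
```

This only affects how an article is displayed on request.  It adds no
alias and does not touch outgoing messages.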

Note that adding (utf8 . utf-8) to `mm-charset-synonym-alist'
causes the same problem (I verified it with Emacs 22.3).

Regards,

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15 19:49       ` Reiner Steib
  2008-10-15 19:05         ` Ted Zlatanov
@ 2008-10-16  1:12         ` Kenichi Handa
  1 sibling, 0 replies; 72+ messages in thread
From: Kenichi Handa @ 2008-10-16  1:12 UTC (permalink / raw)
  To: Reiner Steib; +Cc: ding, emacs-devel

In article <87d4i1iqpq.fsf@marauder.physik.uni-ulm.de>, Reiner Steib <reinersteib+gmane@imap.cc> writes:

> I seem to recall that (define-coding-system-alias 'utf8 'utf-8) might
> trigger Gnus into sending articles with "Content-type: text/plain;
> charset=utf8" which would be plain wrong [1][2].  But I'm not sure.

For mime-charset name, the value of something like this
should be used:
  (coding-system-get CODING-SYSTEM 'mime-charset)
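
A minimal sketch of that lookup.  `my-mime-charset-name' is a
hypothetical helper name, and the keyword form `:mime-charset' is used
here; `coding-system-get' documents that it also accepts the old-style
symbol `mime-charset':

```elisp
(defun my-mime-charset-name (coding-system)
  "Return the MIME charset of CODING-SYSTEM as a string, or nil.
Hypothetical helper for illustration, not an Emacs function."
  (let ((charset (coding-system-get coding-system :mime-charset)))
    (and charset (symbol-name charset))))

;; An alias shares the property list of its base coding system, so
;; the alias `latin-1' reports the charset registered for `iso-latin-1'.
(my-mime-charset-name 'utf-8)
(my-mime-charset-name 'latin-1)
```

Because the header value comes from the property rather than from the
name the user typed, an alias name should not end up in the charset
parameter on this path.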

---
Kenichi Handa
handa@ni.aist.go.jp




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15 22:03           ` Reiner Steib
  2008-10-15 21:23             ` Ted Zlatanov
@ 2008-10-16  2:41             ` Stefan Monnier
  2008-10-16 14:27               ` Richard M. Stallman
  1 sibling, 1 reply; 72+ messages in thread
From: Stefan Monnier @ 2008-10-16  2:41 UTC (permalink / raw)
  To: emacs-devel; +Cc: ding

> BTW, is the uppercase variant useful?  AFAICS, it is the only
> uppercase variant...

I think neither variant is deserved.
Such aliases aren't much use and they become annoying duplicates when
listing coding systems.  I much prefer to add such things punctually
where they're needed.  E.g. in mm-charset-synonym-alist.


        Stefan




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15 21:23             ` Ted Zlatanov
  2008-10-16  0:15               ` Katsumi Yamaoka
@ 2008-10-16  4:32               ` Stephen J. Turnbull
  2008-10-16  6:47               ` Eli Zaretskii
  2 siblings, 0 replies; 72+ messages in thread
From: Stephen J. Turnbull @ 2008-10-16  4:32 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: ding, emacs-devel

Ted Zlatanov writes:

 > RS> Grep finished (matches found) at Wed Oct 15 23:58:55
 > 
 > It's a common usage, e.g. in locale names.  'UTF-8 is also common and
 > should be added.

It's intended to be a user convenience for typing to iconv and the
like.  It was introduced IIRC by glibc, comparing encoding names by
stripping out non-alphanumerics and canonicalizing case.  However,
internally Emacs should use the IANA registered names, and only use
aliases where one encoding has multiple names for some reason (are
there any?) or where one encoding is implemented as an alias to
another (binary and iso-8859-1-unix).

I agree that users should be allowed the convenience; this can be done
with a `read-mime-charset' or the like, which will determine what the
user wants and produce a canonical encoding name.  I see no good
reason for allowing it to programmers, who will probably spell it "u
M-/ 8" in any case.<wink>  If it's being seen in MIME headers, very
likely it's spam.  Responsible MUAs will use the names registered with
the IANA where available, but spammers tend to disregard RFCs the way
they disregard all other civilized usage.  Emacs should not emulate
them.

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-15 21:23             ` Ted Zlatanov
  2008-10-16  0:15               ` Katsumi Yamaoka
  2008-10-16  4:32               ` Stephen J. Turnbull
@ 2008-10-16  6:47               ` Eli Zaretskii
  2008-10-16 13:01                 ` Ted Zlatanov
  2 siblings, 1 reply; 72+ messages in thread
From: Eli Zaretskii @ 2008-10-16  6:47 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: ding, emacs-devel

> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Wed, 15 Oct 2008 16:23:58 -0500
> Cc: ding@gnus.org
> 
> On Thu, 16 Oct 2008 00:03:28 +0200 Reiner Steib <reinersteib+gmane@imap.cc> wrote: 
> 
> RS> On Wed, Oct 15 2008, Ted Zlatanov wrote:
> >> Reiner Steib <reinersteib+gmane@imap.cc> wrote: 
> >>>>>> (define-coding-system-alias 'utf8 'utf-8)
> >>>>>> (define-coding-system-alias 'UTF8 'utf-8)
> KH> I agree that defining those aliases causes no problem.
> 
> RS> BTW, is the uppercase variant useful?  AFAICS, it is the only
> RS> uppercase variant...
> RS> -*- mode: grep; default-directory: ".../emacs/lisp/international/" -*-
> RS> Grep started at Wed Oct 15 23:58:55
> 
> RS> grep -nH -e define-coding-system-alias.*[A-Z] *.el
> RS> mule-conf.el:1279:(define-coding-system-alias 'UTF8 'utf-8)
> 
> RS> Grep finished (matches found) at Wed Oct 15 23:58:55
> 
> It's a common usage, e.g. in locale names.  'UTF-8 is also common and
> should be added.

It's common usage, all right, but it need not be added, because code
that gleans a coding-system from file metadata and email messages
should always downcase the encoding before intern-ing it.  If some
code doesn't downcase, that's a bug that needs to be fixed.
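
A minimal sketch of the downcase-then-intern step, using a hypothetical
helper name.  It covers case differences only, not a missing dash:

```elisp
(defun my-charset-to-coding-system (charset)
  "Return the coding system named by the string CHARSET, or nil.
Case is ignored.  Hypothetical helper for illustration."
  (let ((sym (intern (downcase charset))))
    ;; `coding-system-p' also accepts nil, so return SYM itself: the
    ;; string \"nil\" then yields nil rather than t.
    (and (coding-system-p sym) sym)))

(my-charset-to-coding-system "UTF-8")
(my-charset-to-coding-system "Utf-8")
```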




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-16  6:47               ` Eli Zaretskii
@ 2008-10-16 13:01                 ` Ted Zlatanov
  0 siblings, 0 replies; 72+ messages in thread
From: Ted Zlatanov @ 2008-10-16 13:01 UTC (permalink / raw)
  To: emacs-devel; +Cc: ding

On Thu, 16 Oct 2008 08:47:49 +0200 Eli Zaretskii <eliz@gnu.org> wrote: 

>> From: Ted Zlatanov <tzz@lifelogs.com>
>> It's a common usage, e.g. in locale names.  'UTF-8 is also common and
>> should be added.

EZ> It's common usage, all right, but it need not be added, because code
EZ> that gleans a coding-system from file metadata and email messages
EZ> should always downcase the encoding before intern-ing it.  If some
EZ> code doesn't downcase, that's a bug that needs to be fixed.

On Thu, 16 Oct 2008 13:32:11 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote: 

SJT> It's intended to be a user convenience for typing to iconv and the
SJT> like.  It was introduced IIRC by glibc, comparing encoding names by
SJT> stripping out non-alphanumerics and canonicalizing case.  However,
SJT> internally Emacs should use the IANA registered names, and only use
SJT> aliases where one encoding has multiple names for some reason (are
SJT> there any?) or where one encoding is implemented as an alias to
SJT> another (binary and iso-8859-1-unix).

SJT> I agree that users should be allowed the convenience; this can be done
SJT> with a `read-mime-charset' or the like, which will determine what the
SJT> user wants and produce a canonical encoding name.  I see no good
SJT> reason for allowing it to programmers, who will probably spell it "u
SJT> M-/ 8" in any case.<wink>  If it's being seen in MIME headers, very
SJT> likely it's spam.  Responsible MUAs will use the names registered with
SJT> the IANA where available, but spammers tend to disregard RFCs the way
SJT> they disregard all other civilized usage.  Emacs should not emulate
SJT> them.

On Wed, 15 Oct 2008 22:41:29 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: 

>> BTW, is the uppercase variant useful?  AFAICS, it is the only
>> uppercase variant...

SM> I think neither variant is deserved.
SM> Such aliases aren't much use and they become annoying duplicates when
SM> listing coding systems.  I much prefer to add such things punctually
SM> where they're needed.  E.g. in mm-charset-synonym-alist.

OK, I've removed the aliases.  Thanks, everyone, for the helpful information.

Ted

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-16  2:41             ` Stefan Monnier
@ 2008-10-16 14:27               ` Richard M. Stallman
  2008-10-16 15:41                 ` Stefan Monnier
  2008-10-20 16:00                 ` Reiner Steib
  0 siblings, 2 replies; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-16 14:27 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: ding, emacs-devel

    Such aliases aren't much use and they become annoying duplicates when
    listing coding systems.

That is no reason to omit the aliases people use.
You could add a way to mark aliases so they don't appear in that list.

What's the advantage of giving the user an error when he types `utf8'?
Why not do what he wants?





* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-16 14:27               ` Richard M. Stallman
@ 2008-10-16 15:41                 ` Stefan Monnier
  2008-10-16 17:47                   ` Eli Zaretskii
  2008-10-17 19:59                   ` Richard M. Stallman
  2008-10-20 16:00                 ` Reiner Steib
  1 sibling, 2 replies; 72+ messages in thread
From: Stefan Monnier @ 2008-10-16 15:41 UTC (permalink / raw)
  To: rms; +Cc: ding, emacs-devel

>     Such aliases aren't much use and they become annoying duplicates when
>     listing coding systems.

> That is no reason to omit the aliases people use.
> You could add a way to mark aliases so they don't appear in that list.

> What's the advantage of giving the user an error when he types `utf8'?
> Why not do what he wants?

We can do that without adding a coding-system-alias.


        Stefan




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-16 15:41                 ` Stefan Monnier
@ 2008-10-16 17:47                   ` Eli Zaretskii
  2008-10-17 19:59                   ` Richard M. Stallman
  1 sibling, 0 replies; 72+ messages in thread
From: Eli Zaretskii @ 2008-10-16 17:47 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: rms, ding, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Thu, 16 Oct 2008 11:41:44 -0400
> Cc: ding@gnus.org, emacs-devel@gnu.org
> 
> > What's the advantage of giving the user an error when he types `utf8'?
> > Why not do what he wants?
> 
> We can do that without adding a coding-system-alias.

Right, I agree that if we can do TRT without adding an alias, that is
better.




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-16 15:41                 ` Stefan Monnier
  2008-10-16 17:47                   ` Eli Zaretskii
@ 2008-10-17 19:59                   ` Richard M. Stallman
  2008-10-18 19:01                     ` Stefan Monnier
  1 sibling, 1 reply; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-17 19:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel, ding

    > What's the advantage of giving the user an error when he types `utf8'?
    > Why not do what he wants?

    We can do that without adding a coding-system-alias.

How do you propose to do it?



* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-17 19:59                   ` Richard M. Stallman
@ 2008-10-18 19:01                     ` Stefan Monnier
  2008-10-20  1:14                       ` Richard M. Stallman
  0 siblings, 1 reply; 72+ messages in thread
From: Stefan Monnier @ 2008-10-18 19:01 UTC (permalink / raw)
  To: rms; +Cc: ding, emacs-devel

>> What's the advantage of giving the user an error when he types `utf8'?
>> Why not do what he wants?

>     We can do that without adding a coding-system-alias.

> How do you propose to do it?

Depends on the particular case you're thinking of.  I already proposed
mm-charset-synonym-alist for the case of charsets in MIME messages.


        Stefan




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-18 19:01                     ` Stefan Monnier
@ 2008-10-20  1:14                       ` Richard M. Stallman
  2008-10-20  3:21                         ` Stefan Monnier
  0 siblings, 1 reply; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-20  1:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: ding, emacs-devel

    Depends on the particular case you're thinking of.  I already proposed
    mm-charset-synonym-alist for the case of charsets in MIME messages.

That would work only when the character set is specified in a MIME
message, and only for Gnus.  It would be much better to fix the 
same problem in a general way that applies for all the places
in Emacs that you can specify a coding system.  Why fix just
part of the problem?

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-20  1:14                       ` Richard M. Stallman
@ 2008-10-20  3:21                         ` Stefan Monnier
  2008-10-20  8:42                           ` Eli Zaretskii
  2008-10-20 17:04                           ` Richard M. Stallman
  0 siblings, 2 replies; 72+ messages in thread
From: Stefan Monnier @ 2008-10-20  3:21 UTC (permalink / raw)
  To: rms; +Cc: ding, emacs-devel

>     Depends on the particular case you're thinking of.  I already proposed
>     mm-charset-synonym-alist for the case of charsets in MIME messages.

> That would work only when the character set is specified in a MIME
> message, and only for Gnus.  It would be much better to fix the 
> same problem in a general way that applies for all the places
> in Emacs that you can specify a coding system.  Why fix just
> part of the problem?

Yes, we should probably provide at least a function that we could call
`get-coding-system' and which would take a string and return
a coding system or nil.  It would then take care of ignoring case
differences as well as addition/removal of dash/underscore/etc. chars.
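
A rough sketch of what such a function might look like.  The names
`my-normalize-charset-name' and `my-lookup-coding-system' are
hypothetical, and comparing names with all non-alphanumerics stripped
is only one possible matching rule:

```elisp
(defun my-normalize-charset-name (name)
  "Return NAME downcased, with every non-alphanumeric character removed."
  (replace-regexp-in-string "[^a-z0-9]" "" (downcase name)))

(defun my-lookup-coding-system (name)
  "Return a base coding system matching the string NAME, or nil.
Case and punctuation such as dashes and underscores are ignored.
Hypothetical helper for illustration."
  (let ((sym (intern-soft (downcase name))))
    (if (and sym (coding-system-p sym))
        ;; Exact match (ignoring case): report its base coding system.
        (coding-system-base sym)
      ;; Otherwise compare normalized names of all base coding systems.
      (let ((wanted (my-normalize-charset-name name))
            (found nil))
        (dolist (cs (coding-system-list t) found)
          (when (and (not found)
                     (string= wanted
                              (my-normalize-charset-name (symbol-name cs))))
            (setq found cs)))))))

(my-lookup-coding-system "UTF8")
(my-lookup-coding-system "utf_8")
```

The fallback scans every base coding system on each call, so a real
implementation would probably cache the normalized names.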


        Stefan




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-20  3:21                         ` Stefan Monnier
@ 2008-10-20  8:42                           ` Eli Zaretskii
  2008-10-20 17:04                           ` Richard M. Stallman
  1 sibling, 0 replies; 72+ messages in thread
From: Eli Zaretskii @ 2008-10-20  8:42 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: rms, ding, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 19 Oct 2008 23:21:03 -0400
> Cc: ding@gnus.org, emacs-devel@gnu.org
> 
> >     Depends on the particular case you're thinking of.  I already proposed
> >     mm-charset-synonym-alist for the case of charsets in MIME messages.
> 
> > That would work only when the character set is specified in a MIME
> > message, and only for Gnus.  It would be much better to fix the 
> > same problem in a general way that applies for all the places
> > in Emacs that you can specify a coding system.  Why fix just
> > part of the problem?
> 
> Yes, we should probably provide at least a function that we could call
> `get-coding-system' and which would take a string and return
> a coding system or nil.

I suggest not to use `get' in the name, since that tends to hint we
are working with some property list or alist, similarly to
`get-language-info'.  Maybe `coding-system-for-charset'?



* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-16 14:27               ` Richard M. Stallman
  2008-10-16 15:41                 ` Stefan Monnier
@ 2008-10-20 16:00                 ` Reiner Steib
  2008-10-20 22:03                   ` Richard M. Stallman
  1 sibling, 1 reply; 72+ messages in thread
From: Reiner Steib @ 2008-10-20 16:00 UTC (permalink / raw)
  To: Richard Stallman; +Cc: ding, emacs-devel

On Thu, Oct 16 2008, Richard M. Stallman wrote:

>     Such aliases aren't much use and they become annoying duplicates when
>     listing coding systems.
>
> That is no reason to omit the aliases people use.
> You could add a way to mark aliases so they don't appear in that list.
>
> What's the advantage of giving the user an error when he types `utf8'?
> Why not do what he wants?

Which user interaction are you thinking of?  (If there's a broader
need or expectation for this, we would probably have got some user
request already for Emacs 21 and 22.)

As mentioned before, for the initial problem we already have
`mm-charset-synonym-alist' (for Gnus or maybe other packages).

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/



* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-16  0:15               ` Katsumi Yamaoka
@ 2008-10-20 16:23                 ` Reiner Steib
  2008-10-21  0:01                   ` Katsumi Yamaoka
  0 siblings, 1 reply; 72+ messages in thread
From: Reiner Steib @ 2008-10-20 16:23 UTC (permalink / raw)
  To: ding, emacs-devel

On Thu, Oct 16 2008, Katsumi Yamaoka wrote:

> 2.
> Type `C-c C-m p text/plain RET' in the message buffer and add
> charset=utf8 manually as follows:
>
> < #part type="text/plain" disposition=inline charset=utf8>
>
> In this case the message will be sent with charset=utf8 .
> From this result I have come to think the system-wide utf8 alias might
> be harmful.

Doesn't this already happen when no alias is present (or when using
some bogus charset, say "bogus-8")?

[quoting reordered]
> Note that adding (utf8 . utf-8) to `mm-charset-synonym-alist'
> causes the same problem (I verified it with Emacs 22.3).

I'm surprised that `mm-charset-synonym-alist' has any effect on
outgoing messages.  I thought it would only be used when displaying an
article.  Could you please investigate this?

> The utf8 alias is needed only for reading wrongly configured messages,
> isn't it?  If so, I think it is enough using
> `gnus-summary-show-article-charset-alist'.
> cf. (info "(gnus)Paging the Article")

Maybe we should consider providing some entries in
`gnus-summary-show-article-charset-alist' by default.
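A sketch of what such entries might look like, assuming the alist
format described in (info "(gnus)Paging the Article"), i.e.
(NUMBER . CHARSET) pairs selected by a numeric prefix to `g'.  The
particular numbers and the second charset are arbitrary choices:

```elisp
;; Sketch only: with this, `C-u 1 g' in the summary buffer should
;; redisplay the current article decoded as utf-8, whatever its
;; (possibly misspelled) charset header claims.
(setq gnus-summary-show-article-charset-alist
      '((1 . utf-8)
        (2 . iso-8859-1)))
```

This only affects how an article is displayed on request; it does not
define any alias and so cannot leak into outgoing messages.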

Mail-Followup-To: ding@gnus.org, since this is mostly Gnus-related.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-20  3:21                         ` Stefan Monnier
  2008-10-20  8:42                           ` Eli Zaretskii
@ 2008-10-20 17:04                           ` Richard M. Stallman
  2008-10-21  4:39                             ` Stephen J. Turnbull
  1 sibling, 1 reply; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-20 17:04 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: ding, emacs-devel

    Yes, we should probably provide at least a function that we could call
    `get-coding-system' and which would take a string and return
    a coding-system or nil.  It would then take care of ignoring case
    differences as well as addition/removal of dash/underscore/etc. chars.

That approach sounds good.




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-20 16:00                 ` Reiner Steib
@ 2008-10-20 22:03                   ` Richard M. Stallman
  2008-10-21  2:50                     ` Kenichi Handa
  0 siblings, 1 reply; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-20 22:03 UTC (permalink / raw)
  To: Reiner Steib; +Cc: ding, emacs-devel

    > What's the advantage of giving the user an error when he types `utf8'?
    > Why not do what he wants?

    Which user interaction are you thinking of?

C-x RET c utf8 RET




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-20 16:23                 ` Reiner Steib
@ 2008-10-21  0:01                   ` Katsumi Yamaoka
  2008-12-15 23:35                     ` Katsumi Yamaoka
  0 siblings, 1 reply; 72+ messages in thread
From: Katsumi Yamaoka @ 2008-10-21  0:01 UTC (permalink / raw)
  To: ding

>>>>> Reiner Steib wrote:
> On Thu, Oct 16 2008, Katsumi Yamaoka wrote:
>> Note that adding (utf8 . utf-8) to `mm-charset-synonym-alist'
>> causes the same problem (I verified it with Emacs 22.3).

> I'm surprised that `mm-charset-synonym-alist' has any effect on
> outgoing messages.  I thought it would only be used when displaying an
> article.  Could you please investigate this?

That is due to `mml-generate-mime-1', which puts the charset into the
message exactly as specified (for the actual encoding, though, it uses
the valid coding system derived from that charset according to
`mm-charset-synonym-alist').  It's a bug.  I have two solutions, and
I prefer the second one:

1. Bind `mm-charset-synonym-alist' (and `mm-charset-eval-alist')
   to nil when encoding.  And signal an error if the charset is
   invalid.

2. Replace the charset that is an alias with the valid one that
   Emacs knows.  Although it violates the idea that "the charset
   aliases would be used only when displaying an article", it
   will be convenient for users.

WDYT?  The patches for 1. and 2. are below:

Patch1:
--8<---------------cut here---------------start------------->8---
--- mml.el~	2008-10-03 05:47:11 +0000
+++ mml.el	2008-10-20 23:58:29 +0000
@@ -476,8 +476,11 @@
 				 "application/octet-stream")
 			   "text/plain")))
 	       (charset (cdr (assq 'charset cont)))
-	       (coding (mm-charset-to-coding-system charset))
+	       (coding (let (mm-charset-eval-alist mm-charset-synonym-alist)
+			 (mm-charset-to-coding-system charset)))
 	       encoding flowed coded)
+	  (unless coding
+	    (error "Unknown charset: %s" charset))
 	  (cond ((eq coding 'ascii)
 		 (setq charset nil
 		       coding nil))
--8<---------------cut here---------------end--------------->8---

Patch2:
--8<---------------cut here---------------start------------->8---
--- mml.el~	2008-10-03 05:47:11 +0000
+++ mml.el	2008-10-20 23:58:29 +0000
@@ -482,7 +482,12 @@
 		 (setq charset nil
 		       coding nil))
 		(charset
-		 (setq charset (intern (downcase charset)))))
+		 ;; `charset' might be an alias that `mm-charset-synonym-alist'
+		 ;; provides and might not be in common use, so we prefer
+		 ;; the one that Emacs knows for `coding'.
+		 (setq charset (if coding
+				   (mm-coding-system-to-mime-charset coding)
+				 (intern (downcase charset))))))
 	  (if (and (not raw)
 		   (member (car (split-string type "/")) '("text" "message")))
 	      (progn
--8<---------------cut here---------------end--------------->8---

Regards,




* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-20 22:03                   ` Richard M. Stallman
@ 2008-10-21  2:50                     ` Kenichi Handa
  2008-10-21 16:00                       ` Ted Zlatanov
  0 siblings, 1 reply; 72+ messages in thread
From: Kenichi Handa @ 2008-10-21  2:50 UTC (permalink / raw)
  To: rms; +Cc: ding, Reiner.Steib, emacs-devel

In article <E1Ks2qs-0003ZO-H6@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:

> What's the advantage of giving the user an error when he types `utf8'?
> Why not do what he wants?

>     Which user interaction are you thinking of?

> C-x RET c utf8 RET

I think it's not good to give a user an incorrect impression
that "utf8" is a correct name.  In IANA
(http://www.iana.org/assignments/character-sets), there are no
alias names for "UTF-8".

---
Kenichi Handa
handa@ni.aist.go.jp





* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-20 17:04                           ` Richard M. Stallman
@ 2008-10-21  4:39                             ` Stephen J. Turnbull
  2008-10-21  5:23                               ` Miles Bader
  0 siblings, 1 reply; 72+ messages in thread
From: Stephen J. Turnbull @ 2008-10-21  4:39 UTC (permalink / raw)
  To: rms; +Cc: Stefan Monnier, ding, emacs-devel

Richard M. Stallman writes:
 >     Yes, we should probably provide at least a function that we could call
 >     `get-coding-system' and which would take a string and return
 >     a coding-system or nil.  It would then take care of ignoring case
 >     differences as well as addition/removal of dash/underscore/etc. chars.
 > 
 > That approach sounds good.

`get-coding-system' and `find-coding-system' already exist in XEmacs,
so I would appreciate it if you would avoid those names.  I dislike
them for this purpose, anyway.  In general, get- functions retrieve an
internal object by a canonical name; the get- nomenclature should not
be used for a function which guesses what kind of abuse a luser has
applied to a publicly registered name.

How about `guess-coding-system-from-string' (or an abbreviation, but I
would prefer it be inconvenient to provide the guessing functionality)?


* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  4:39                             ` Stephen J. Turnbull
@ 2008-10-21  5:23                               ` Miles Bader
  2008-10-21  6:25                                 ` tomas
  2008-10-21  8:06                                 ` Stephen J. Turnbull
  0 siblings, 2 replies; 72+ messages in thread
From: Miles Bader @ 2008-10-21  5:23 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: emacs-devel, rms, ding, Stefan Monnier

"Stephen J. Turnbull" <stephen@xemacs.org> writes:
> How about `guess-coding-system-from-string' (or an abbreviation, but I
> would prefer it be inconvenient to provide the guessing functionality)?

That seems reasonable (though "-from-string" seems superfluous, given
the argument is ... a string).

-miles

-- 
The trouble with most people is that they think with their hopes or
fears or wishes rather than with their minds.  -- Will Durant





* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  6:25                                 ` tomas
@ 2008-10-21  6:21                                   ` Miles Bader
  2008-10-21  7:44                                     ` tomas
  2008-10-21  8:15                                     ` Eli Zaretskii
  0 siblings, 2 replies; 72+ messages in thread
From: Miles Bader @ 2008-10-21  6:21 UTC (permalink / raw)
  To: tomas; +Cc: Stephen J. Turnbull, emacs-devel, Stefan Monnier, ding, rms

tomas@tuxteam.de writes:
>> That seems reasonable (though "-from-string" seems superfluous, given
>> the argument is ... a string).
>
> Heh. But this would be ambiguous. You might be feeding the function a
> string encoded in some unknown coding system and expecting it to guess
> what this is (e.g. using some statistical magic). The sister of
> guess-coding-system-from-buffer (as long as I might be baking
> pies-in-the-sky ;-)

Actually that points out a problem with the name --
`guess-coding-system-from-string' implies that it somehow examines the
_contents_ of the string and tries to choose a reasonable coding system,
whereas in fact it simply treats it as a name.

Perhaps something like `canonicalize-coding-system-name' would be good.

-Miles

-- 
Infancy, n. The period of our lives when, according to Wordsworth, 'Heaven
lies about us.' The world begins lying about us pretty soon afterward.





* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  5:23                               ` Miles Bader
@ 2008-10-21  6:25                                 ` tomas
  2008-10-21  6:21                                   ` Miles Bader
  2008-10-21  8:06                                 ` Stephen J. Turnbull
  1 sibling, 1 reply; 72+ messages in thread
From: tomas @ 2008-10-21  6:25 UTC (permalink / raw)
  To: Miles Bader; +Cc: Stephen J. Turnbull, Stefan Monnier, rms, ding, emacs-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Oct 21, 2008 at 02:23:55PM +0900, Miles Bader wrote:
> "Stephen J. Turnbull" <stephen@xemacs.org> writes:
> > How about `guess-coding-system-from-string' (or an abbreviation, but I
> > would prefer it be inconvenient to provide the guessing functionality)?
> 
> That seems reasonable (though "-from-string" seems superfluous, given
> the argument is ... a string).

Heh. But this would be ambiguous. You might be feeding the function a
string encoded in some unknown coding system and expecting it to guess
what this is (e.g. using some statistical magic). The sister of
guess-coding-system-from-buffer (as long as I might be baking
pies-in-the-sky ;-)

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFI/XXGBcgs9XrR2kYRAo1iAJwIMOLgXh7PMe7mHGKsh6P91JgouwCggR2C
9cf67dAWHLoiBOi+xVbxT6A=
=OgJa
-----END PGP SIGNATURE-----





* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  6:21                                   ` Miles Bader
@ 2008-10-21  7:44                                     ` tomas
  2008-10-21  8:15                                     ` Eli Zaretskii
  1 sibling, 0 replies; 72+ messages in thread
From: tomas @ 2008-10-21  7:44 UTC (permalink / raw)
  To: Miles Bader; +Cc: Stephen J. Turnbull, emacs-devel, Stefan Monnier, ding, rms

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Oct 21, 2008 at 03:21:25PM +0900, Miles Bader wrote:
> tomas@tuxteam.de writes:
> >> That seems reasonable (though "-from-string" seems superfluous, given
> >> the argument is ... a string).
> >
> > Heh. But this would be ambiguous [...]

> Actually that points out a problem with the name --
> `guess-coding-system-from-string' implies that it somehow examines the
> _contents_ of the string [...]

Yes, that's what I was trying to say, in a round-about way. Sometimes
I've got foot in mouth. Thanks for expressing it more clearly :-)

> Perhaps something like `canonicalize-coding-system-name' would be good.

That's much clearer.

(Not that a function guessing the coding system -- and the language! --
of a text snippet wouldn't be cool, though.)

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFI/YhPBcgs9XrR2kYRAj7+AJ0RpYwsm7/4IlDFYJDzKyjW88iggACfaZ14
1o5r1qVWQ1HePuxplsuciQc=
=j7BD
-----END PGP SIGNATURE-----





* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  5:23                               ` Miles Bader
  2008-10-21  6:25                                 ` tomas
@ 2008-10-21  8:06                                 ` Stephen J. Turnbull
  1 sibling, 0 replies; 72+ messages in thread
From: Stephen J. Turnbull @ 2008-10-21  8:06 UTC (permalink / raw)
  To: Miles Bader; +Cc: emacs-devel, rms, ding, Stefan Monnier

Miles Bader writes:
 > "Stephen J. Turnbull" <stephen@xemacs.org> writes:
 > > How about `guess-coding-system-from-string' (or an abbreviation, but I
 > > would prefer it be inconvenient to provide the guessing functionality)?
 > 
 > That seems reasonable (though "-from-string" seems superfluous, given
 > the argument is ... a string).

No, the argument is a Lisp object, as always. :-)

The superfluity is intentional, as I mentioned.  Probably it's not
necessary to discourage its use, though, people will only go to the
trouble of using it in `read-coding-system' (interactive Emacs users
are not lusers :-) and in MUAs.






* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  6:21                                   ` Miles Bader
  2008-10-21  7:44                                     ` tomas
@ 2008-10-21  8:15                                     ` Eli Zaretskii
  2008-10-21  9:06                                       ` Stephen J. Turnbull
  2008-10-22  0:32                                       ` Kenichi Handa
  1 sibling, 2 replies; 72+ messages in thread
From: Eli Zaretskii @ 2008-10-21  8:15 UTC (permalink / raw)
  To: Miles Bader; +Cc: rms, ding, emacs-devel, tomas, monnier, stephen

> From: Miles Bader <miles@gnu.org>
> Date: Tue, 21 Oct 2008 15:21:25 +0900
> Cc: "Stephen J. Turnbull" <stephen@xemacs.org>, emacs-devel@gnu.org,
> 	Stefan Monnier <monnier@iro.umontreal.ca>, ding@gnus.org, rms@gnu.org
> 
> Perhaps something like `canonicalize-coding-system-name' would be good.

That implies that the return value would be a string, not the coding
system itself.  I suggest we return the coding system (or nil), not
just the name.

Some time back in this thread I suggested `coding-system-for-charset'
(since the argument strings will be charsets).





* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  8:15                                     ` Eli Zaretskii
@ 2008-10-21  9:06                                       ` Stephen J. Turnbull
  2008-10-21 10:22                                         ` Eli Zaretskii
  2008-10-22  0:32                                       ` Kenichi Handa
  1 sibling, 1 reply; 72+ messages in thread
From: Stephen J. Turnbull @ 2008-10-21  9:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Miles Bader, rms, ding, emacs-devel, tomas, monnier

Eli Zaretskii writes:
 > > From: Miles Bader <miles@gnu.org>
 > > Date: Tue, 21 Oct 2008 15:21:25 +0900
 > > Cc: "Stephen J. Turnbull" <stephen@xemacs.org>, emacs-devel@gnu.org,
 > > 	Stefan Monnier <monnier@iro.umontreal.ca>, ding@gnus.org, rms@gnu.org
 > > 
 > > Perhaps something like `canonicalize-coding-system-name' would be good.
 > 
 > That implies that the return value would be a string, not the coding
 > system itself.  I suggest we return the coding system (or nil), not
 > just the name.

The coding system *is* just a name (AIUI, for Emacs; XEmacs and old
Mule exposed the internal coding system object for reasons I don't
claim to understand).  That name happens to be a symbol, that's all.

I think it would be reasonable for this function to also accept
symbols (and attempt to guess coding systems from their print-names,
if they are not coding system names).  Eg, return 'utf-8 if handed
'utf8 or "UTF8".

 > Some time back in this thread I suggested `coding-system-for-charset'
 > (since the argument strings will be charsets).

Actually, if I recall the thread correctly, normally they won't.
They'll be *MIME* charsets, which correspond to Emacs *coding
systems*.


* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  9:06                                       ` Stephen J. Turnbull
@ 2008-10-21 10:22                                         ` Eli Zaretskii
  2008-10-21 12:06                                           ` Stephen J. Turnbull
  0 siblings, 1 reply; 72+ messages in thread
From: Eli Zaretskii @ 2008-10-21 10:22 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: rms, ding, emacs-devel, tomas, monnier, miles

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: Miles Bader <miles@gnu.org>,
>     rms@gnu.org,
>     ding@gnus.org,
>     emacs-devel@gnu.org,
>     tomas@tuxteam.de,
>     monnier@iro.umontreal.ca
> Date: Tue, 21 Oct 2008 18:06:42 +0900
> 
> Eli Zaretskii writes:
>  > > From: Miles Bader <miles@gnu.org>
>  > > Date: Tue, 21 Oct 2008 15:21:25 +0900
>  > > Cc: "Stephen J. Turnbull" <stephen@xemacs.org>, emacs-devel@gnu.org,
>  > > 	Stefan Monnier <monnier@iro.umontreal.ca>, ding@gnus.org, rms@gnu.org
>  > > 
>  > > Perhaps something like `canonicalize-coding-system-name' would be good.
>  > 
>  > That implies that the return value would be a string, not the coding
>  > system itself.  I suggest we return the coding system (or nil), not
>  > just the name.
> 
> The coding system *is* just a name

Not in GNU Emacs; see the doc string of `define-coding-system' (in
Emacs 23) or `make-coding-system' (in Emacs 22).

>  > Some time back in this thread I suggested `coding-system-for-charset'
>  > (since the argument strings will be charsets).
> 
> Actually, if I recall the thread correctly, normally they won't.
> They'll be *MIME* charsets, which correspond to Emacs *coding
> systems*.

"utf8" is not a MIME charset, AFAIK, and it certainly doesn't
currently correspond to any coding system in Emacs.





* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21 10:22                                         ` Eli Zaretskii
@ 2008-10-21 12:06                                           ` Stephen J. Turnbull
  2008-10-21 12:40                                             ` Eli Zaretskii
  0 siblings, 1 reply; 72+ messages in thread
From: Stephen J. Turnbull @ 2008-10-21 12:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rms, ding, emacs-devel, tomas, monnier, miles

Eli Zaretskii writes:

 > >  > That implies that the return value would be a string, not the coding
 > >  > system itself.  I suggest we return the coding system (or nil), not
 > >  > just the name.
 > > 
 > > The coding system *is* just a name
 > 
 > Not in GNU Emacs; see the doc string of `define-coding-system' (in
 > Emacs 23) or `make-coding-system' (in Emacs 22).

If that's true, it's a shame; AFAICS there is no real utility to
exposing the coding system object to Lisp, since you never want to
muck with one in the middle of en/decoding, and the codecs themselves
aren't defined in Lisp anyway (they're defined either in C or CCL).
Not to mention that I don't see how you can dispense with
`get-coding-system' if you need to actually get a coding system
object.

But I thought that that is something that got fixed in Emacs 21 or 22.
So coding system objects and charset objects are no longer visible to
Lisp, but rather are manipulated by functions whose arguments include
the name of the object.  No?

 > >  > Some time back in this thread I suggested
 > >  > `coding-system-for-charset' (since the argument strings will
 > >  > be charsets).
 > > 
 > > Actually, if I recall the thread correctly, normally they won't.
 > > They'll be *MIME* charsets, which correspond to Emacs *coding
 > > systems*.
 > 
 > "utf8" is not a MIME charset, AFAIK, and it certainly doesn't
 > currently correspond to any coding system in Emacs.

"utf8" is not a MIME charset (and no form of UTF-8 is an Emacs
charset), but "utf-8" is an IANA-registered MIME charset.  AIUI, the
point of the function is to guess what people who don't know what
they're doing are trying to express (and to provide some interactive
convenience to people who do know what they're doing).


* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21 12:06                                           ` Stephen J. Turnbull
@ 2008-10-21 12:40                                             ` Eli Zaretskii
  2008-10-22  2:34                                               ` Stephen J. Turnbull
  0 siblings, 1 reply; 72+ messages in thread
From: Eli Zaretskii @ 2008-10-21 12:40 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: rms, ding, emacs-devel, tomas, monnier, miles

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: rms@gnu.org,
>     ding@gnus.org,
>     emacs-devel@gnu.org,
>     tomas@tuxteam.de,
>     monnier@iro.umontreal.ca,
>     miles@gnu.org
> Date: Tue, 21 Oct 2008 21:06:50 +0900
> 
> Eli Zaretskii writes:
> 
>  > >  > That implies that the return value would be a string, not the coding
>  > >  > system itself.  I suggest we return the coding system (or nil), not
>  > >  > just the name.
>  > > 
>  > > The coding system *is* just a name
>  > 
>  > Not in GNU Emacs; see the doc string of `define-coding-system' (in
>  > Emacs 23) or `make-coding-system' (in Emacs 22).
> 
> If that's true, it's a shame; AFAICS there is no real utility to
> exposing the coding system object to Lisp

Emacs does not expose coding systems to Lisp.  A coding system is
just a symbol with special attributes, as far as Lisp is concerned.

This must be a misunderstanding of some kind, because I'm sure we are
talking about something we both understand.  Let me take a step back
and explain what I meant in my original message:

> > Perhaps something like `canonicalize-coding-system-name' would be good.
> 
> That implies that the return value would be a string, not the coding
> system itself.  I suggest we return the coding system (or nil), not
> just the name.

What I meant is that, instead of returning a _string_, which is the
name of a coding system, it is better to return a _symbol_ of that
coding system.  This will avoid the need to `intern' that string,
which is a gratuitous nuisance, because the function we are discussing
will most probably need to `intern' the string itself, for doing its
job.
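To make the string/symbol distinction concrete, here is a small sketch
using only `intern' and `coding-system-p'.  The variable names are
illustrative and not part of any proposal in this thread:

```elisp
;; Sketch only: the *name* is a string, the coding system is the
;; symbol obtained by interning that name.
(let* ((name "utf-8")          ; what a charset= header gives us
       (cs   (intern name)))   ; what the coding functions accept
  (list (stringp name)         ; the name is a string
        (symbolp cs)           ; the coding system is a symbol
        (coding-system-p cs))) ; and Emacs knows that symbol
```

A caller handed only the string would have to repeat the `intern'
step, which is the nuisance described above.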

I'm sure we both understand the difference between a thing and its
name, so vividly explained by Lewis Carroll ;-)

> "utf8" is not a MIME charset (and no form of UTF-8 is an Emacs
> charset), but "utf-8" is an IANA-registered MIME charset.

Yes.

> AIUI, the point of the function is to guess what people who don't
> know what they're doing are trying to express (and to provide some
> interactive convenience to people who do know what they're doing).

Agreed, but in most cases the argument will be a valid MIME charset.
The case of "UTF8" is an exception.  And even in this exceptional
case, I understand that "UTF8" came from some charset= header.  That
is why I suggested coding-system-for-charset.  I don't mind
coding-system-for-mime-charset, either, if that was your point.  (In
Emacs 23+, the original Mule meaning of "charset" will fade out.)


* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  2:50                     ` Kenichi Handa
@ 2008-10-21 16:00                       ` Ted Zlatanov
  2008-10-22  1:22                         ` Kenichi Handa
                                           ` (3 more replies)
  0 siblings, 4 replies; 72+ messages in thread
From: Ted Zlatanov @ 2008-10-21 16:00 UTC (permalink / raw)
  To: emacs-devel; +Cc: ding

On Tue, 21 Oct 2008 11:50:37 +0900 Kenichi Handa <handa@m17n.org> wrote: 

KH> In article <E1Ks2qs-0003ZO-H6@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:
>> What's the advantage of giving the user an error when he types `utf8'?
>> Why not do what he wants?

>> Which user interaction are you thinking of?

>> C-x RET c utf8 RET

KH> I think it's not good to give a user an incorrect impression
KH> that "utf8" is a correct name.  In IANA
KH> (http://www.iana.org/assignments/character-sets), there are no
KH> alias names for "UTF-8".

So maybe display a message "This is not the real coding system name, use
`utf-8'" and also don't offer utf8 for completion?  Would that be
sufficient?

Ted






* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  8:15                                     ` Eli Zaretskii
  2008-10-21  9:06                                       ` Stephen J. Turnbull
@ 2008-10-22  0:32                                       ` Kenichi Handa
  2008-10-22  4:27                                         ` Eli Zaretskii
  1 sibling, 1 reply; 72+ messages in thread
From: Kenichi Handa @ 2008-10-22  0:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rms, ding, emacs-devel, tomas, monnier, stephen, miles

In article <uprlugy8g.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Miles Bader <miles@gnu.org>
> > Date: Tue, 21 Oct 2008 15:21:25 +0900
> > Cc: "Stephen J. Turnbull" <stephen@xemacs.org>, emacs-devel@gnu.org,
> > 	Stefan Monnier <monnier@iro.umontreal.ca>, ding@gnus.org, rms@gnu.org
> > 
> > Perhaps something like `canonicalize-coding-system-name' would be good.

> That implies that the return value would be a string, not the coding
> system itself.  I suggest we return the coding system (or nil), not
> just the name.

> Some time back in this thread I suggested `coding-system-for-charset'
> (since the argument strings will be charsets).

But, "for-charset" implies that it should be used for
mime-charset.  What is required is to find a coding system
by loose name matching (not necessarily a mime-charset
name), isn't it?

How about `resolve-coding-system-name'?
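A rough sketch of what such loose matching could look like, under the
assumption that "loose" means ignoring case and every non-alphanumeric
character.  The `my-' names are hypothetical and exist nowhere in
Emacs; only `coding-system-list', `coding-system-p', `intern-soft' and
`replace-regexp-in-string' are real functions:

```elisp
;; Sketch only, not a proposed implementation.
(defun my-loose-coding-system-key (name)
  "Return NAME downcased with every non-alphanumeric character removed."
  (replace-regexp-in-string "[^a-z0-9]" "" (downcase name)))

(defun my-resolve-coding-system-name (name)
  "Return a coding system symbol loosely matching NAME, or nil.
NAME may be a string or a symbol."
  (let* ((str (if (symbolp name) (symbol-name name) name))
         (exact (intern-soft (downcase str))))
    (if (and exact (coding-system-p exact))
        ;; The name is already valid apart from case: use it as is.
        exact
      ;; Otherwise compare the stripped name with the stripped name
      ;; of every known coding system and take the first match.
      (let ((key (my-loose-coding-system-key str))
            (found nil))
        (dolist (cs (coding-system-list) found)
          (when (and (not found)
                     (string= key (my-loose-coding-system-key
                                   (symbol-name cs))))
            (setq found cs)))))))
```

The exact lookup comes first so that valid names are never remapped;
the linear scan is only reached for misspelled names, so its cost
should not matter in practice.  Returning nil rather than signalling
an error leaves it to the caller to decide how strict to be.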

---
Kenichi Handa
handa@ni.aist.go.jp







* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21 16:00                       ` Ted Zlatanov
@ 2008-10-22  1:22                         ` Kenichi Handa
  2008-10-22  2:07                         ` Stephen J. Turnbull
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 72+ messages in thread
From: Kenichi Handa @ 2008-10-22  1:22 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: ding, emacs-devel

In article <86wsg2546l.fsf@lifelogs.com>, Ted Zlatanov <tzz@lifelogs.com> writes:

KH> I think it's not good to give a user an incorrect impression
KH> that "utf8" is a correct name.  In IANA
KH> (http://www.iana.org/assignments/character-sets), there are no
KH> alias names for "UTF-8".

> So maybe display a message "This is not the real coding system name, use
> `utf-8'" and also don't offer utf8 for completion?  Would that be
> sufficient?

Do we really need such a tedious warning?  When one types
"utf8 RET" and sees the "no match" message, he can use
completion to learn what the correct name is.  I think that
is enough.

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21 16:00                       ` Ted Zlatanov
  2008-10-22  1:22                         ` Kenichi Handa
@ 2008-10-22  2:07                         ` Stephen J. Turnbull
  2008-10-22  6:21                         ` Richard M. Stallman
  2008-10-22 13:15                         ` Stefan Monnier
  3 siblings, 0 replies; 72+ messages in thread
From: Stephen J. Turnbull @ 2008-10-22  2:07 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: ding, emacs-devel

Ted Zlatanov writes:
 > On Tue, 21 Oct 2008 11:50:37 +0900 Kenichi Handa <handa@m17n.org> wrote: 

 > KH> I think it's not good to give a user an incorrect impression
 > KH> that "utf8" is a correct name.  In IANA
 > KH> (http://www.iana.org/assignments/character-sets), there are no
 > KH> alias names for "UTF-8".

Although I tend to be a standards bigot myself, IMHO this is a bit
harsh for practical purposes, especially since in GNU libc it *is*
an acceptable alias (computed algorithmically).  AFAIK GNU libc is
POSIXLY_CORRECT, because POSIX defines neither any locales (besides
C/POSIX itself) nor a registry for locale names.

 > So maybe display a message "This is not the real coding system name, use
 > `utf-8'" and also don't offer utf8 for completion?  Would that be
 > sufficient?

I think the warning is overkill and too annoying.  For programmers,
having their programs grind to an ignominious halt on the Lisp error
will teach them to spell, while for interactive entry of coding system
names, the damage has long since been done, and we may as well offer
the convenience.  The internal canonicalization function is needed
anyway for MUAs.

Of course "utf8" should not be offered for completion.  Instead, if
you type "utf8" it should be completed to "utf-8" *and* *require*
confirmation (no implicit completion as with, eg, command names).

That should be educational enough.


* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21 12:40                                             ` Eli Zaretskii
@ 2008-10-22  2:34                                               ` Stephen J. Turnbull
  2008-10-22  4:33                                                 ` Eli Zaretskii
  2008-10-22 21:02                                                 ` Richard M. Stallman
  0 siblings, 2 replies; 72+ messages in thread
From: Stephen J. Turnbull @ 2008-10-22  2:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rms, ding, emacs-devel, tomas, monnier, miles

Eli Zaretskii writes:

 > > > Perhaps something like `canonicalize-coding-system-name' would be good.
 > > 
 > > That implies that the return value would be a string, not the coding
 > > system itself.  I suggest we return the coding system (or nil), not
 > > just the name.
 > 
 > What I meant is that, instead of returning a _string_, which is the
 > name of a coding system, it is better to return a _symbol_ of that
 > coding system.

Of course.  My point is that the symbol is the name, and therefore
"canonicalize-coding-system-name" is a reasonable name for this
function.

If it weren't for the conflict with XEmacs, which still needs
`get-coding-system' to return a coding system object, I'd be perfectly
happy using that.

 > > AIUI, the point of the function is to guess what people who don't
 > > know what they're doing are trying to express (and to provide some
 > > interactive convenience to people who do know what they're doing).
 > 
 > Agreed, but in most cases the argument will be a valid MIME charset.

Except when Richard<wink> is typing, and surely we all consider that
an important use case?  Aside from Richard's expressed preference for
a harmless convenience, the presence or absence of one or more hyphens
is something the various standards disagree about:

 > The case of "UTF8" is an exception.

Well, no, I think it is not.  AFAIK only one of "iso-8859-1" and
"iso8859-1" is registered, but Emacs uses the former exclusively, and
X11 only the latter (in XLFDs).  Both are acceptable to iconv.  (And
the ISO standards actually use "ISO 8859/1" which isn't even
acceptable to glibc iconv!)

 > And even in this exceptional case, I understand that "UTF8" came
 > from some charset= header.  That is why I suggested
 > coding-system-for-charset.

Well, the MIME nomenclature is seriously broken.  A substantial
minority of the things it denotes "charsets" are not "character sets"
in any sense.

 > I don't mind coding-system-for-mime-charset, either, if that was
 > your point.

That's the worst of several suggestions, as this mapping is not
limited to MIME charsets, but is useful for coding systems in general,
as the usage of hyphens in their names has no rhyme nor reason.  Is it
"KOI8-R" or "KOI-8R"?  That one confused me, at least, for a while.

 > (In Emacs 23+, the original Mule meaning of "charset" will fade
 > out.)

That would be sad.  While I agree that UTF-8 will fairly quickly
become universal for current text documents, I don't expect the vast
amount of legacy archives to be converted any time soon (some will be
converted at the time of converting to new media, but human beings
being what they are I expect that for a couple centuries some
bureaucrats will just make bit-level copies ;-).  Emacs should be the
premier application for reading those!

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-22  0:32                                       ` Kenichi Handa
@ 2008-10-22  4:27                                         ` Eli Zaretskii
  2009-01-27  4:51                                           ` Kenichi Handa
  0 siblings, 1 reply; 72+ messages in thread
From: Eli Zaretskii @ 2008-10-22  4:27 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: rms, ding, emacs-devel, tomas, monnier, stephen, miles

> From: Kenichi Handa <handa@m17n.org>
> CC: miles@gnu.org, rms@gnu.org, ding@gnus.org, emacs-devel@gnu.org,
>         tomas@tuxteam.de, monnier@iro.umontreal.ca, stephen@xemacs.org
> Date: Wed, 22 Oct 2008 09:32:17 +0900
> 
> In article <uprlugy8g.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > From: Miles Bader <miles@gnu.org>
> > > Date: Tue, 21 Oct 2008 15:21:25 +0900
> > > Cc: "Stephen J. Turnbull" <stephen@xemacs.org>, emacs-devel@gnu.org,
> > > 	Stefan Monnier <monnier@iro.umontreal.ca>, ding@gnus.org, rms@gnu.org
> > > 
> > > Perhaps something like `canonicalize-coding-system-name' would be good.
> 
> > That implies that the return value would be a string, not the coding
> > system itself.  I suggest we return the coding system (or nil), not
> > just the name.
> 
> > Some time back in this thread I suggested `coding-system-for-charset'
> > (since the argument strings will be charsets).
> 
> But, "for-charset" implies that it should be used for
> mime-charset.  What is required is to find a coding system
> by loose name matching (not necessarily a mime-charset
> name), isn't it?
> 
> How about `resolve-coding-system-name'?

Canonicalize is better, IMO.  But again, I think the function should
return a symbol, not its name (which is a string).  There's no need to
request that users intern the string.
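
To make the "return a symbol" idea concrete, here is a minimal sketch
of a loose lookup.  The names `my-loose-coding-system-key' and
`my-find-coding-system-loosely' are hypothetical, not existing Emacs
functions; the sketch assumes only `coding-system-list', `downcase'
and `replace-regexp-in-string'.  It ignores case, hyphens, underscores
and spaces, and it does not look at the "-unix"/"-dos"/"-mac"
subsidiary names.

```elisp
;; Hypothetical helper: reduce a name to a comparison key by
;; downcasing it and dropping hyphens, underscores and spaces.
(defun my-loose-coding-system-key (name)
  "Return NAME downcased with hyphens, underscores and spaces removed."
  (replace-regexp-in-string "[-_ ]" "" (downcase name)))

;; Hypothetical lookup: return a coding system SYMBOL (or nil),
;; never a string, so callers do not have to intern anything.
(defun my-find-coding-system-loosely (name)
  "Return a coding system symbol loosely matching NAME, or nil.
NAME may be a string or a symbol."
  (let ((key (my-loose-coding-system-key
              (if (symbolp name) (symbol-name name) name)))
        (found nil))
    ;; `coding-system-list' omits the end-of-line subsidiaries.
    (dolist (cs (coding-system-list) found)
      (when (and (null found)
                 (string= key (my-loose-coding-system-key
                               (symbol-name cs))))
        (setq found cs)))))
```

With this, (my-find-coding-system-loosely "UTF8") should yield a symbol
naming the utf-8 coding system, and an unknown name yields nil.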




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-22  2:34                                               ` Stephen J. Turnbull
@ 2008-10-22  4:33                                                 ` Eli Zaretskii
  2008-10-22 21:02                                                 ` Richard M. Stallman
  1 sibling, 0 replies; 72+ messages in thread
From: Eli Zaretskii @ 2008-10-22  4:33 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: rms, ding, emacs-devel, tomas, monnier, miles

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: rms@gnu.org,
>     ding@gnus.org,
>     emacs-devel@gnu.org,
>     tomas@tuxteam.de,
>     monnier@iro.umontreal.ca,
>     miles@gnu.org
> Date: Wed, 22 Oct 2008 11:34:17 +0900
> 
> Eli Zaretskii writes:
> 
>  > What I meant is that, instead of returning a _string_, which is the
>  > name of a coding system, it is better to return a _symbol_ of that
>  > coding system.
> 
> Of course.  My point is that the symbol is the name, and therefore
> "canonicalize-coding-system-name" is a reasonable name for this
> function.

"-name" is generally a string, like in "symbol-name".



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21 16:00                       ` Ted Zlatanov
  2008-10-22  1:22                         ` Kenichi Handa
  2008-10-22  2:07                         ` Stephen J. Turnbull
@ 2008-10-22  6:21                         ` Richard M. Stallman
  2008-10-23  2:34                           ` Kenichi Handa
  2008-10-22 13:15                         ` Stefan Monnier
  3 siblings, 1 reply; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-22  6:21 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: emacs-devel, ding

    KH> I think it's not good to give a user an incorrect impression
    KH> that "utf8" is a correct name.

I think that is less important than DTRT, but I'm not against it.

					In IANA
    KH> (http://www.iana.org/assignments/character-sets), there's no
    KH> alias names for "UTF-8".

Why should that be relevant?

    So maybe display a message "This is not the real coding system name, use
    `utf-8'" and also don't offer utf8 for completion?  Would that be
    sufficient?

I have nothing against that.



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21 16:00                       ` Ted Zlatanov
                                           ` (2 preceding siblings ...)
  2008-10-22  6:21                         ` Richard M. Stallman
@ 2008-10-22 13:15                         ` Stefan Monnier
  2008-10-24 17:21                           ` Ted Zlatanov
  3 siblings, 1 reply; 72+ messages in thread
From: Stefan Monnier @ 2008-10-22 13:15 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: ding, emacs-devel

> So maybe display a message "This is not the real coding system name, use
> `utf-8'" and also don't offer utf8 for completion?  Would that be
> sufficient?

I think the message is useless.  If we want to make it clear, then as
Stephen suggests we should make TAB expand "utf8" to "utf-8", and make
RET do the expansion and request confirmation.

But to tell you the truth, that would be more work because it's not
a standard form of completion.  Tho I guess we could come up with
a special completion-style for it.

        Stefan

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-22  2:34                                               ` Stephen J. Turnbull
  2008-10-22  4:33                                                 ` Eli Zaretskii
@ 2008-10-22 21:02                                                 ` Richard M. Stallman
  1 sibling, 0 replies; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-22 21:02 UTC (permalink / raw)
  To: Stephen J. Turnbull; +Cc: eliz, ding, emacs-devel, tomas, monnier, miles

    Well, no, I think it is not.  AFAIK only one of "iso-8859-1" and
    "iso8859-1" is registered, but Emacs uses the former exclusively, and
    X11 only the latter (in XLFDs).

I think it would be better if Emacs accepted either form, if specified.



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-22  6:21                         ` Richard M. Stallman
@ 2008-10-23  2:34                           ` Kenichi Handa
  2008-10-23 21:08                             ` Richard M. Stallman
  0 siblings, 1 reply; 72+ messages in thread
From: Kenichi Handa @ 2008-10-23  2:34 UTC (permalink / raw)
  To: rms; +Cc: tzz, ding, emacs-devel

In article <E1KsX5i-00030Z-VF@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:

> 					In IANA
    KH> (http://www.iana.org/assignments/character-sets), there's no
    KH> alias names for "UTF-8".

> Why should that be relevant?

Because, as far as I know, that is the only authority
registering coding system names comprehensively (even though
they mix them with character set names).

---
Kenichi Handa
handa@ni.aist.go.jp




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-23  2:34                           ` Kenichi Handa
@ 2008-10-23 21:08                             ` Richard M. Stallman
  2008-10-24  0:54                               ` Kenichi Handa
  0 siblings, 1 reply; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-23 21:08 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tzz, ding, emacs-devel

	KH> (http://www.iana.org/assignments/character-sets), there's no
	KH> alias names for "UTF-8".

    > Why should that be relevant?

    Because, as far as I know, that is the only authority
    registering coding system names comprehensively (even though
    they mix them with character set names).

So what?  This is a matter of what is convenient for users of Emacs.
It has nothing to do with anyone's "authority".

In GNU, we do not obey standards.  We take account of them
in order to figure out what is The Right Thing.  The fact that
a certain name has no meaning according to some standard
means the standard offers no obstacle to whatever we want to do.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-23 21:08                             ` Richard M. Stallman
@ 2008-10-24  0:54                               ` Kenichi Handa
  2008-10-24 18:36                                 ` Richard M. Stallman
  0 siblings, 1 reply; 72+ messages in thread
From: Kenichi Handa @ 2008-10-24  0:54 UTC (permalink / raw)
  To: rms; +Cc: tzz, ding, emacs-devel

In article <E1Kt7Q2-0002yL-Nw@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:

>     Because, as far as I know, that is the only authority
>     registering coding system names comprehensively (even though
>     they mix them with character set names).

> So what?  This is a matter of what is convenient for users of Emacs.
> It has nothing to do with anyone's "authority".

For users' convenience, we currently have two choices:
adding an alias "utf8", or improving coding-system completion
(as proposed by "Stephen J. Turnbull" <stephen@xemacs.org>).

To decide which way is better, whether "utf8" is an
authorized name or not can be a good guide.  If it's an
authorized name, adding an alias is good; if not, solving it
with more intelligent completion is good.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-22 13:15                         ` Stefan Monnier
@ 2008-10-24 17:21                           ` Ted Zlatanov
  2008-10-25  2:01                             ` Richard M. Stallman
  0 siblings, 1 reply; 72+ messages in thread
From: Ted Zlatanov @ 2008-10-24 17:21 UTC (permalink / raw)
  To: emacs-devel; +Cc: ding

On Wed, 22 Oct 2008 09:15:30 -0400 Stefan Monnier <monnier@iro.umontreal.ca> wrote: 

>> So maybe display a message "This is not the real coding system name, use
>> `utf-8'" and also don't offer utf8 for completion?  Would that be
>> sufficient?

SM> I think the message is useless.  If we want to make it clear, then as
SM> Stephen suggests we should make TAB expand "utf8" to "utf-8", and make
SM> RET do the expansion and request confirmation.

SM> But to tell you the truth, that would be more work because it's not
SM> a standard form of completion.  Tho I guess we could come up with
SM> a special completion-style for it.

I think completing utf8 to utf-8 is the best solution as you and Stephen
described, so it's worth some code.

Ted

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-24  0:54                               ` Kenichi Handa
@ 2008-10-24 18:36                                 ` Richard M. Stallman
  0 siblings, 0 replies; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-24 18:36 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tzz, ding, emacs-devel

    For users' convenience, we currently have two choices:
    adding an alias "utf8", or improving coding-system completion
    (as proposed by "Stephen J. Turnbull" <stephen@xemacs.org>).

    To decide which way is better, whether "utf8" is an
    authorized name or not can be a good guide.  If it's an
    authorized name, adding an alias is good; if not, solving it
    with more intelligent completion is good.

I think there is a third option: recognize the name
thru a list of second-class aliases.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-24 17:21                           ` Ted Zlatanov
@ 2008-10-25  2:01                             ` Richard M. Stallman
  2008-10-25  2:32                               ` Kenichi Handa
  2008-10-25 18:27                               ` Stefan Monnier
  0 siblings, 2 replies; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-25  2:01 UTC (permalink / raw)
  To: Ted Zlatanov; +Cc: ding, emacs-devel

    I think completing utf8 to utf-8 is the best solution as you and Stephen
    described, so it's worth some code.

It would look strange for completion to make such a change within a
word of the user's input.  It may be ok, but let's not forget the
easy alternative: accept that input without having it visibly change
in the minibuffer.

For instance, completing-read could accept a list of alternatives
that are valid if entered, but should not affect completion.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-25  2:01                             ` Richard M. Stallman
@ 2008-10-25  2:32                               ` Kenichi Handa
  2008-10-26  4:10                                 ` Richard M. Stallman
  2008-10-25 18:27                               ` Stefan Monnier
  1 sibling, 1 reply; 72+ messages in thread
From: Kenichi Handa @ 2008-10-25  2:32 UTC (permalink / raw)
  To: rms; +Cc: tzz, ding, emacs-devel

In article <E1KtYTF-0002jw-IU@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:

>     I think completing utf8 to utf-8 is the best solution as you and Stephen
>     described, so it's worth some code.

> It would look strange for completion to make such a change within a
> word of the user's input.  It may be ok, but let's not forget the
> easy alternative: accept that input without having it visibly change
> in the minibuffer.

> For instance, completing-read could accept a list of alternatives
> that are valid if entered, but should not affect completion.

Such a behavior can be implemented without a list of
alternatives, using a function that finds a coding system
by loose matching.  The only difference from Stephen's
proposal is whether to echo "utf-8" to users or not.

By the way, with such a behavior, when one types "utf8 TAB
TAB", what completion list should we show?  With "utf-8 TAB
TAB", these are shown now.

utf-8 	utf-8-auto
utf-8-auto-dos 	utf-8-auto-mac
utf-8-auto-unix 	utf-8-dos
utf-8-emacs 	utf-8-emacs-dos
utf-8-emacs-mac 	utf-8-emacs-unix
utf-8-mac 	utf-8-unix
utf-8-with-signature 	utf-8-with-signature-dos
utf-8-with-signature-mac 	utf-8-with-signature-unix


---
Kenichi Handa
handa@ni.aist.go.jp




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-25  2:01                             ` Richard M. Stallman
  2008-10-25  2:32                               ` Kenichi Handa
@ 2008-10-25 18:27                               ` Stefan Monnier
  2008-10-26  4:10                                 ` Richard M. Stallman
  1 sibling, 1 reply; 72+ messages in thread
From: Stefan Monnier @ 2008-10-25 18:27 UTC (permalink / raw)
  To: rms; +Cc: Ted Zlatanov, ding, emacs-devel

>     I think completing utf8 to utf-8 is the best solution as you and Stephen
>     described, so it's worth some code.

> It would look strange for completion to make such a change within a
> word of the user's input.  It may be ok, but let's not forget the
> easy alternative: accept that input without having it visibly change
> in the minibuffer.

Agreed.

> For instance, completing-read could accept a list of alternatives
> that are valid if entered, but should not affect completion.

It's just easier to do a "loose match" than to hardcode a list
of alternatives.


        Stefan





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-25 18:27                               ` Stefan Monnier
@ 2008-10-26  4:10                                 ` Richard M. Stallman
  2008-10-31 21:29                                   ` Stefan Monnier
  0 siblings, 1 reply; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-26  4:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: tzz, ding, emacs-devel

    > For instance, completing-read could accept a list of alternatives
    > that are valid if entered, but should not affect completion.

    It's just easier to do a "loose match" than to hardcode a list
    of alternatives.

If it works, it's fine.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-25  2:32                               ` Kenichi Handa
@ 2008-10-26  4:10                                 ` Richard M. Stallman
  2008-10-31  6:33                                   ` Kenichi Handa
  0 siblings, 1 reply; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-26  4:10 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tzz, ding, emacs-devel

    By the way, with such a behavior, when one types "utf8 TAB
    TAB", what completion list should we show?

That is a hard question.  Perhaps it should show the
analogous set of completions:

    utf8 	utf8-auto
    utf8-auto-dos 	utf8-auto-mac
    utf8-auto-unix 	utf8-dos
    utf8-emacs 	utf8-emacs-dos
    utf8-emacs-mac 	utf8-emacs-unix
    utf8-mac 	utf8-unix
    utf8-with-signature 	utf8-with-signature-dos
    utf8-with-signature-mac 	utf8-with-signature-unix

These could be treated somewhat like `completion-ignored-extensions'
for filenames: if a first-class name is a possible completion
then you don't see the second-class ones.  If no first-class name
is possible, then you do see the second-class ones.





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-26  4:10                                 ` Richard M. Stallman
@ 2008-10-31  6:33                                   ` Kenichi Handa
  2008-10-31  7:24                                     ` Miles Bader
  2008-10-31 19:31                                     ` Richard M. Stallman
  0 siblings, 2 replies; 72+ messages in thread
From: Kenichi Handa @ 2008-10-31  6:33 UTC (permalink / raw)
  To: rms; +Cc: tzz, ding, emacs-devel

In article <E1Ktwxk-0006vn-3F@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:

>     By the way, with such a behavior, when one types "utf8 TAB
>     TAB", what completion list should we show?

> That is a hard question.  Perhaps it should show the
> analogous set of completions:

>     utf8 	utf8-auto
>     utf8-auto-dos 	utf8-auto-mac
>     utf8-auto-unix 	utf8-dos
>     utf8-emacs 	utf8-emacs-dos
>     utf8-emacs-mac 	utf8-emacs-unix
>     utf8-mac 	utf8-unix
>     utf8-with-signature 	utf8-with-signature-dos
>     utf8-with-signature-mac 	utf8-with-signature-unix

> These could be treated somewhat like `completion-ignored-extensions'
> for filenames: if a first-class name is a possible completion
> then you don't see the second-class ones.  If no first-class name
> is possible, then you do see the second-class ones.

In the case of filenames, there surely exist the actual file
with those ignored extensions.  But, in the case of coding
systems, such an alias as "utf8" doesn't exist.  Or do you
still propose to make such an alias as a second-class name
in advance?  If so, I strongly oppose it.  If we are
going to allow users to type all names that are accepted by
iconv, we must make so many aliases.

I still think completing "utf8" to "utf-8" is the best
method.  Even for a filename, if there's a file "test-a" and
I type "tes-a TAB", it is expanded to "test-a".  So,
changing "utf8" to "utf-8" doesn't look that strange.

---
Kenichi Handa
handa@ni.aist.go.jp

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-31  6:33                                   ` Kenichi Handa
@ 2008-10-31  7:24                                     ` Miles Bader
  2008-10-31 19:31                                     ` Richard M. Stallman
  1 sibling, 0 replies; 72+ messages in thread
From: Miles Bader @ 2008-10-31  7:24 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tzz, rms, ding, emacs-devel

Kenichi Handa <handa@m17n.org> writes:
> I still think completing "utf8" to "utf-8" is the best
> method.  Even for a filename, if there's a file "test-a" and
> I type "tes-a TAB", it is expanded to "test-a".  So,
> changing "utf8" to "utf-8" doesn't look that strange.

Also if you have e.g. `read-file-name-completion-ignore-case',
completing a filename "corrects" its case to match what's on disk.

This works quite well, and I think is very natural from a user's point
of view.

-Miles

-- 
We have met the enemy, and he is us.  -- Pogo




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-31  6:33                                   ` Kenichi Handa
  2008-10-31  7:24                                     ` Miles Bader
@ 2008-10-31 19:31                                     ` Richard M. Stallman
  2008-11-01  2:17                                       ` Kenichi Handa
  1 sibling, 1 reply; 72+ messages in thread
From: Richard M. Stallman @ 2008-10-31 19:31 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tzz, ding, emacs-devel

    In the case of filenames, there surely exist the actual file
    with those ignored extensions.  But, in the case of coding
    systems, such an alias as "utf8" doesn't exist.  Or do you
    still propose to make such an alias as a second-class name
    in advance?

To define them as second-class extensions would be one method.
Another is this: `read-coding-system' could create the completion
alist, then add to it modified entries made by replacing "utf-8" with
"utf8".  Then it could read the name, using the appropriate kind of
completion.  When it gets back the value from `completing-read', it
could replace "utf8" with "utf-8".

This avoids having a list of second-class "utf8" aliases.  Those
aliases would be constructed automatically from the valid names
that start with "utf-8".
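
As a rough sketch of that second method (assumptions: all `my-...'
names below are hypothetical, and the candidate list comes from
`coding-system-list', which leaves out the "-unix"/"-dos"/"-mac"
subsidiaries), the "utf8" spellings are generated from the real
"utf-8" names and mapped back after `completing-read' returns:

```elisp
;; Hypothetical helper: add a "utf8..." spelling for every name
;; that starts with "utf-8", without storing any alias anywhere.
(defun my-utf8-spellings (names)
  "Return NAMES plus a \"utf8...\" spelling for each \"utf-8...\" name."
  (append names
          (delq nil
                (mapcar (lambda (name)
                          (when (string-prefix-p "utf-8" name)
                            (concat "utf8" (substring name 5))))
                        names))))

;; Hypothetical helper: turn a "utf8..." spelling back into the
;; real "utf-8..." name; other names pass through unchanged.
(defun my-utf8-to-real-name (name)
  "Map a \"utf8...\" spelling in NAME back to the real name."
  (if (string-prefix-p "utf8" name)
      (concat "utf-8" (substring name 4))
    name))

;; Hypothetical reader built on the two helpers above.
(defun my-read-coding-system (prompt)
  "Read a coding system name with PROMPT, accepting \"utf8\" spellings.
Return the coding system as a symbol, or nil for empty input."
  (let* ((names (mapcar #'symbol-name (coding-system-list)))
         (input (completing-read prompt (my-utf8-spellings names)
                                 nil t)))
    (unless (string= input "")
      (intern (my-utf8-to-real-name input)))))
```

Note that in this sketch the generated "utf8" spellings do show up in
the completion list; hiding them when a first-class name matches would
need extra work.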

		 If so, I strongly oppose it.

Why, what harm would it do?

      If we are
    going to allow users to type all names that are accepted by
    iconv, we must make so many aliases.

I don't know which names are accepted by iconv, so I don't know
whether I'm in favor of accepting them all.

But suppose that we decide to accept them all, and suppose we decide
to do it by defining each one as a second-class alias.  How many
second-class aliases would that require?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-26  4:10                                 ` Richard M. Stallman
@ 2008-10-31 21:29                                   ` Stefan Monnier
  2008-11-10  5:43                                     ` Kenichi Handa
  0 siblings, 1 reply; 72+ messages in thread
From: Stefan Monnier @ 2008-10-31 21:29 UTC (permalink / raw)
  To: rms; +Cc: tzz, ding, emacs-devel

>> For instance, completing-read could accept a list of alternatives
>> that are valid if entered, but should not affect completion.

>     It's just easier to do a "loose match" than to hardcode a list
>     of alternatives.

> If it works, it's fine.

The following patch (to be applied by hand, or to be read) is a proof
of concept.  It allows the user to do "C-x RET f u8 TAB" and have it
complete to utf-8.


        Stefan


=== modified file 'lisp/international/mule.el'
--- lisp/international/mule.el  2008-07-24 03:10:36 +0000
+++ lisp/international/mule.el  2008-10-31 21:08:00 +0000
@@ -1192,6 +1192,10 @@
                  (widen)
                  (goto-char (point-min))
                  (set-auto-coding buffer-file-name (buffer-size))))))
+          (completion-ignore-case t)
+          (completion-pcm--delim-wild-regex
+           (concat completion-pcm--delim-wild-regex
+                   "\\|\\([[:alpha:]]\\)[[:digit:]]"))
           (cs (completing-read (format "Coding system for saving file (default %s): " auto-cs)
                                (completion-table-in-turn
                                 bcss-table combined-table)

=== modified file 'lisp/minibuffer.el'
--- lisp/minibuffer.el  2008-10-30 00:44:07 +0000
+++ lisp/minibuffer.el  2008-10-31 21:11:08 +0000
@@ -1512,6 +1512,13 @@
            (p0 p))
 
       (while (setq p (string-match completion-pcm--delim-wild-regex string p))
+        ;; Usually, completion-pcm--delim-wild-regex matches a delimiter,
+        ;; meaning that something can be added *before* it, but it can also
+        ;; match a prefix and postfix, in which case something can be added
+        ;; in-between (e.g. match [[:lower:]][[:upper:]]).
+        ;; This is determined by the presence of a submatch-1 which delimits
+        ;; the prefix.
+        (if (match-end 1) (setq p (match-end 1)))
         (push (substring string p0 p) pattern)
         (if (eq (aref string p) ?*)
             (progn





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-31 19:31                                     ` Richard M. Stallman
@ 2008-11-01  2:17                                       ` Kenichi Handa
  2008-11-02  1:53                                         ` Richard M. Stallman
  0 siblings, 1 reply; 72+ messages in thread
From: Kenichi Handa @ 2008-11-01  2:17 UTC (permalink / raw)
  To: rms; +Cc: tzz, ding, emacs-devel

In article <E1KvziI-0005Qf-30@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:

>     In the case of filenames, there surely exist the actual file
>     with those ignored extensions.  But, in the case of coding
>     systems, such an alias as "utf8" doesn't exist.  Or do you
>     still propose to make such an alias as a second-class name
>     in advance?

> To define them as second-class extensions would be one method.
> Another is this: `read-coding-system' could create the completion
> alist, then add to it modified entries made by replacing "utf-8" with
> "utf8".  Then it could read the name, using the appropriate kind of
> completion.  When it gets back the value from `completing-read', it
> could replace "utf8" with "utf-8".

> This avoids having a list of second-class "utf8" aliases.  Those
> aliases would be constructed automatically from the valid names
> that start with "utf-8".

> 		 If so, I strongly oppose it.

> Why, what harm would it do?

With that, people will think that "utf8" is a valid coding system
name, and will write code something like this:
  (decode-coding-string STR 'utf8)
and find that it signals an error because utf8 is not
statically declared as an alias.
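
For illustration only, code that receives a coding system name from
elsewhere can guard against this with `coding-system-p' before
decoding.  The function name `my-checked-decode' is hypothetical; the
sketch assumes nothing beyond `coding-system-p' and
`decode-coding-string'.

```elisp
;; Hypothetical wrapper: refuse names that are not defined coding
;; systems (or aliases) instead of failing inside the decoder.
(defun my-checked-decode (string coding-system)
  "Decode STRING with CODING-SYSTEM, or signal a descriptive error."
  (unless (coding-system-p coding-system)
    (error "Unknown coding system: %s" coding-system))
  (decode-coding-string string coding-system))
```

Whether (my-checked-decode STR 'utf8) succeeds then depends only on
whether utf8 has actually been defined as an alias in the running
Emacs.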

>       If we are
>     going to allow users to type all names that are accepted by
>     iconv, we must make so many aliases.

> I don't know which names are accepted by iconv, so I don't know
> whether I'm in favor of accepting them all.

> But suppose that we decide to accept them all, and suppose we decide
> to do it by defining each one as a second-class alias.  How many
> second-class aliases would that require?

For instance, "% iconv -l" lists these variants for
iso-8859-1:

"ISO-8859-1", "ISO88591", "8859_1", "ISO_8859-1"

In addition, we must add (partial) lowercase versions.
Partial means something like this: Iso ISo isO

And we also have to add "-dos", "-unix", and "-mac"
variants.

So the total number of aliases we'd add is more than 100 just
for iso-8859-1.

And the "iconv" program actually accepts any pattern
matching "iso[^a-zA-Z0-9]8859-1"; e.g. "iso 8859-1",
"iso=8859-1", etc.

Considering them, it is not realistic to define all aliases
statically.

---
Kenichi Handa
handa@ni.aist.go.jp






^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-11-01  2:17                                       ` Kenichi Handa
@ 2008-11-02  1:53                                         ` Richard M. Stallman
  2008-11-07  7:15                                           ` Kenichi Handa
  0 siblings, 1 reply; 72+ messages in thread
From: Richard M. Stallman @ 2008-11-02  1:53 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tzz, ding, emacs-devel

    For instance, "% iconv -l" lists these variants for
    iso-8859-1:

    "ISO-8859-1", "ISO88591", "8859_1", "ISO_8859-1"

The reading of coding system names could treat hyphens as optional,
and could treat underscores as equivalent to hyphens.  Then
with just two names, iso-8859-1 and 8859-1, it would recognize
all of these and more.

    In addition, we must add (partial) lowercase versions.
    Partial means something like this: Iso ISo isO

That's trivial, just ignore case when reading the coding system name.





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-11-02  1:53                                         ` Richard M. Stallman
@ 2008-11-07  7:15                                           ` Kenichi Handa
  2008-11-07 17:04                                             ` Richard M. Stallman
  0 siblings, 1 reply; 72+ messages in thread
From: Kenichi Handa @ 2008-11-07  7:15 UTC (permalink / raw)
  To: rms; +Cc: tzz, ding, emacs-devel

In article <E1KwS9Q-00074H-0v@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:

>     For instance, "% iconv -l" lists these variants for
>     iso-8859-1:

>     "ISO-8859-1", "ISO88591", "8859_1", "ISO_8859-1"

> The reading of coding system names could treat hyphens as optional,
> and could treat underscores as equivalent to hyphens.  Then
> with just two names, iso-8859-1 and 8859-1, it would recognize
> all of these and more.

>     In addition, we must add (partial) lowercase versions.
>     Partial means something like this: Iso ISo isO

> That's trivial, just ignore case when reading the coding system name.

You pay attention only to the case of reading the coding
system name.  But that is not the point I'm arguing.  What
do you think about the case of writing code, as I wrote in
the previous mail?

> With that, people will think that "utf8" is a valid coding system
> name, and will write code something like this:
>   (decode-coding-string STR 'utf8)
> and find that it signals an error because utf8 is not
> statically declared as an alias.

---
Kenichi Handa
handa@ni.aist.go.jp




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-11-07  7:15                                           ` Kenichi Handa
@ 2008-11-07 17:04                                             ` Richard M. Stallman
  0 siblings, 0 replies; 72+ messages in thread
From: Richard M. Stallman @ 2008-11-07 17:04 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tzz, ding, emacs-devel

    You are addressing only the case of reading a coding system
    name.  But that is not the point I'm arguing.  What do you
    think about the case of writing code, as I described in my
    previous mail?

Do you mean this one?

    > With that, people think that "utf8" is a valid coding system
    > name, and will write a code something like this:
    >   (decode-coding-string STR 'utf8)
    > and found that it signals an error because utf8 is not
    > statically declared as an alias.

It isn't a big deal.  The manual will explain, and there
are ways to view the list of defined coding systems.
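
As an illustration of the ways to view the defined coding systems,
the forms below can be evaluated in *scratch* or with M-:.  They use
only standard Emacs functions; interactively, M-x list-coding-systems
and M-x describe-coding-system show similar information.  Return
values are deliberately not shown, since they depend on the Emacs
version and on any aliases the user has defined.

```elisp
;; Is a given symbol a defined coding system (or an alias of one)?
(coding-system-p 'utf-8)
(coding-system-p 'utf8)

;; List the defined coding systems programmatically.
(coding-system-list)      ; all non-subsidiary coding systems
(coding-system-list t)    ; base coding systems only

;; Show the names under which one coding system is known.
(coding-system-aliases 'utf-8)
```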



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-31 21:29                                   ` Stefan Monnier
@ 2008-11-10  5:43                                     ` Kenichi Handa
  2008-11-10 15:01                                       ` Stefan Monnier
  0 siblings, 1 reply; 72+ messages in thread
From: Kenichi Handa @ 2008-11-10  5:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: tzz, rms, ding, emacs-devel

In article <jwvprlgtq06.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:
[...]
> The following patch (to be applied by hand, or to be read) is a proof
> of concept.  It allows the user to do "C-x RET f u8 TAB" and have it
> complete to utf-8.
[...]
> === modified file 'lisp/international/mule.el'
> --- lisp/international/mule.el  2008-07-24 03:10:36 +0000
> +++ lisp/international/mule.el  2008-10-31 21:08:00 +0000
> @@ -1192,6 +1192,10 @@
>                   (widen)
>                   (goto-char (point-min))
>                   (set-auto-coding buffer-file-name (buffer-size))))))
> +          (completion-ignore-case t)
> +          (completion-pcm--delim-wild-regex
> +           (concat completion-pcm--delim-wild-regex
> +                   "\\|\\([[:alpha:]]\\)[[:digit:]]"))
>            (cs (completing-read (format "Coding system for saving file (default %s): " auto-cs)
>                                 (completion-table-in-turn
>                                  bcss-table combined-table)

This hunk can't be applied to the current mule.el.  In which function
should this change be applied?

---
Kenichi Handa
handa@ni.aist.go.jp




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-11-10  5:43                                     ` Kenichi Handa
@ 2008-11-10 15:01                                       ` Stefan Monnier
  0 siblings, 0 replies; 72+ messages in thread
From: Stefan Monnier @ 2008-11-10 15:01 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: tzz, rms, ding, emacs-devel

>> === modified file 'lisp/international/mule.el'
>> --- lisp/international/mule.el  2008-07-24 03:10:36 +0000
>> +++ lisp/international/mule.el  2008-10-31 21:08:00 +0000
>> @@ -1192,6 +1192,10 @@
>>                   (widen)
>>                   (goto-char (point-min))
>>                   (set-auto-coding buffer-file-name (buffer-size))))))
>> +          (completion-ignore-case t)
>> +          (completion-pcm--delim-wild-regex
>> +           (concat completion-pcm--delim-wild-regex
>> +                   "\\|\\([[:alpha:]]\\)[[:digit:]]"))
>>            (cs (completing-read (format "Coding system for saving file (default %s): " auto-cs)
>>                                 (completion-table-in-turn
>>                                  bcss-table combined-table)

> This hunk can't be applied to the current mule.el.  In which function
> should this change be applied?

The above is from my own code; it's not in Emacs.  If you want the
complete code, it's the interactive spec I use for
set-buffer-file-coding-system (appended below), so that completion is
first done only on the applicable coding systems.  But the intent of the
patch was just to show what the code would look like: it would be placed
inside a new `read-coding-system' function.


        Stefan


  ;; FIXME: provide a useful default (e.g. the one that
  ;; select-safe-coding-system would have chosen, or the next best one if
  ;; it's already the current coding system).
  (interactive
   (let* ((bcss (find-coding-systems-region (point-min) (point-max)))
          (bcss-table (append '("dos" "unix" "mac")
                              (unless (equal bcss '(undecided))
                                (mapcar 'symbol-name
                                        (sanitize-coding-system-list bcss)))))
          (css-table
           (unless (equal bcss '(undecided))
             (delq nil (mapcar (lambda (cs)
                                 (if (memq (coding-system-base cs) bcss)
                                     (symbol-name cs)))
                               coding-system-list))))
          (combined-table
           (completion-table-in-turn css-table coding-system-alist))
          (auto-cs
           (unless find-file-literally
             (save-excursion
               (save-restriction
                 (widen)
                 (goto-char (point-min))
                 (set-auto-coding buffer-file-name (buffer-size))))))
          (cs (completing-read (format "Coding system for saving file (default %s): " auto-cs)
                               (completion-table-in-turn
                                bcss-table combined-table)
                               nil t nil 'coding-system-history
                               (if auto-cs (symbol-name auto-cs)))))
     (list (unless (zerop (length cs)) (intern cs))
           current-prefix-arg)))
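
A minimal sketch of what such a reading function could look like,
assuming only the case-insensitive part of the patch.  The name
my-read-coding-system-loosely is hypothetical; the sketch leaves out
the change to completion-pcm--delim-wild-regex, which is an internal
variable of the partial-completion style and only matters when that
style is active.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-read-coding-system-loosely (prompt &optional default)
  "Read a coding system name with case-insensitive completion.
PROMPT is the minibuffer prompt.  DEFAULT, if non-nil, is a coding
system symbol offered as the default.  Return a symbol, or nil if
the user entered nothing."
  (let* ((completion-ignore-case t)   ; let "UTF-8" complete to utf-8
         (name (completing-read prompt coding-system-alist
                                nil t nil 'coding-system-history
                                (and default (symbol-name default)))))
    (unless (zerop (length name))
      (intern name))))
```

It could be called as (my-read-coding-system-loosely "Coding system: "
'utf-8), for example from an interactive spec like the one above.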




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-21  0:01                   ` Katsumi Yamaoka
@ 2008-12-15 23:35                     ` Katsumi Yamaoka
  0 siblings, 0 replies; 72+ messages in thread
From: Katsumi Yamaoka @ 2008-12-15 23:35 UTC (permalink / raw)
  To: ding

>>>>> In <E1LCJN1-0002Yy-00@quimby.gnus.org> (in gmane.emacs.gnus.cvs)
>>>>>	Reiner Steib wrote:
>     Date: Monday, December 15, 2008 @ 20:44:47
>   Author: cvs
>     Path: /usr/local/cvsroot/gnus/lisp

> Modified: ChangeLog mm-util.el

> (mm-charset-synonym-alist): Add bogus names "UTF8" and "ISO_8859-1".

This change could turn Gnus into a shameful newsreader that
sends messages with bogus charsets.  I must deal with that.

>>>>> In <b4mhc7dpf7w.fsf@jpl.org> Katsumi Yamaoka wrote:
> Type `C-c C-m p text/plain RET' in the message buffer and add
> charset=utf8 manually as follows:

> < #part type="text/plain" disposition=inline charset=utf8>

> In this case the message will be sent with charset=utf8 .

>>>>> In <873air44me.fsf@marauder.physik.uni-ulm.de>
>>>>>	Reiner Steib wrote:
>> I'm surprised that `mm-charset-synonym-alist' has any effect on
>> outgoing messages.  I thought it would only be used when displaying an
>> article.  Could you please investigate this?

>>>>> In <b4m1vyaolxy.fsf@jpl.org> Katsumi Yamaoka wrote:
> That is due to `mml-generate-mime-1' that uses the charset specified
> as is (for encoding, it uses the valid coding system derived from
> that charset according to `mm-charset-synonym-alist', though).
> It's a bug, I have two solutions, and I like the second one:

> 1. Bind `mm-charset-synonym-alist' (and `mm-charset-eval-alist')
>    to nil when encoding.  And signal an error if the charset is
>    invalid.

> 2. Replace the charset that is an alias with the valid one that
>    Emacs knows.  Although it violates the idea that "the charset
>    aliases would be used only when displaying an article", it
>    will be convenient for users.

> WDYT?  The patches for 1. and 2. are below:

I've installed 2.  The code and the comment have been slightly
modified.

> Patch1:
> --- mml.el~	2008-10-03 05:47:11 +0000
> +++ mml.el	2008-10-20 23:58:29 +0000
> @@ -476,8 +476,11 @@
>  				 "application/octet-stream")
>  			   "text/plain")))
>  	       (charset (cdr (assq 'charset cont)))
> -	       (coding (mm-charset-to-coding-system charset))
> +	       (coding (let (mm-charset-eval-alist mm-charset-synonym-alist)
> +			 (mm-charset-to-coding-system charset)))
>  	       encoding flowed coded)
> +	  (unless coding
> +	    (error "Unknown charset: %s" charset))
>  	  (cond ((eq coding 'ascii)
>  		 (setq charset nil
>  		       coding nil))

> Patch2:
> --- mml.el~	2008-10-03 05:47:11 +0000
> +++ mml.el	2008-10-20 23:58:29 +0000
> @@ -482,7 +482,12 @@
>  		 (setq charset nil
>  		       coding nil))
>  		(charset
> -		 (setq charset (intern (downcase charset)))))
> +		 ;; `charset' might be an alias that `mm-charset-synonym-alist'
> +		 ;; provides and might not be in common use, so we prefer
> +		 ;; the one that Emacs knows for `coding'.
> +		 (setq charset (if coding
> +				   (mm-coding-system-to-mime-charset coding)
> +				 (intern (downcase charset))))))
>  	  (if (and (not raw)
>  		   (member (car (split-string type "/")) '("text" "message")))
>  	      (progn

Regards,



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: gnus should accept UTF8 even if UTF-8 is standard
  2008-10-22  4:27                                         ` Eli Zaretskii
@ 2009-01-27  4:51                                           ` Kenichi Handa
  0 siblings, 0 replies; 72+ messages in thread
From: Kenichi Handa @ 2009-01-27  4:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: rms, ding, emacs-devel, tomas, monnier, stephen, miles

It seems that this thread was left unresolved.

In article <uabcxfe4v.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > > > Perhaps something like `canonicalize-coding-system-name' would be good.
> > 
> > > That implies that the return value would be a string, not the coding
> > > system itself.  I suggest we return the coding system (or nil), not
> > > just the name.
> > 
> > > Some time back in this thread I suggested `coding-system-for-charset'
> > > (since the argument strings will be charsets).
> > 
> > But, "for-charset" implies that it should be used for
> > mime-charset.  What is required is to find a coding system
> > by loose name matching (not necessarily a mime-charset
> > name), isn't it?
> > 
> > How about `resolve-coding-system-name'?

> Canonicalize is better, IMO.  But again, I think the function should
> return a symbol, not its name (which is a string).  There's no need to
> request that users intern the string.

I've just installed a new function `coding-system-from-name'
and used it to fix the broken rmail-get-coding-system.
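
A short usage sketch of that function, assuming it accepts a string
or a symbol and returns a coding system symbol, or nil when nothing
matches.  The sample bytes are made up for illustration.

```elisp
;; Resolve a loosely spelled name, then use the result for decoding.
(let ((cs (coding-system-from-name "UTF8")))
  (if cs
      ;; "caf\303\251" is a unibyte string holding UTF-8 encoded bytes.
      (decode-coding-string "caf\303\251" cs)
    (error "No coding system matches the given name")))
```

Because the function returns a symbol, callers do not have to intern
a string themselves, which was the concern raised in the quoted
discussion.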

---
Kenichi Handa
handa@m17n.org




^ permalink raw reply	[flat|nested] 72+ messages in thread

Thread overview: 72+ messages
2008-10-14 20:30 gnus should accept UTF8 even if UTF-8 is standard jidanni
2008-10-14 20:50 ` Ted Zlatanov
2008-10-15  0:39   ` Kenichi Handa
2008-10-15 16:10     ` Richard M. Stallman
2008-10-15 17:32     ` Ted Zlatanov
2008-10-15 19:49       ` Reiner Steib
2008-10-15 19:05         ` Ted Zlatanov
2008-10-15 22:03           ` Reiner Steib
2008-10-15 21:23             ` Ted Zlatanov
2008-10-16  0:15               ` Katsumi Yamaoka
2008-10-20 16:23                 ` Reiner Steib
2008-10-21  0:01                   ` Katsumi Yamaoka
2008-12-15 23:35                     ` Katsumi Yamaoka
2008-10-16  4:32               ` Stephen J. Turnbull
2008-10-16  6:47               ` Eli Zaretskii
2008-10-16 13:01                 ` Ted Zlatanov
2008-10-16  2:41             ` Stefan Monnier
2008-10-16 14:27               ` Richard M. Stallman
2008-10-16 15:41                 ` Stefan Monnier
2008-10-16 17:47                   ` Eli Zaretskii
2008-10-17 19:59                   ` Richard M. Stallman
2008-10-18 19:01                     ` Stefan Monnier
2008-10-20  1:14                       ` Richard M. Stallman
2008-10-20  3:21                         ` Stefan Monnier
2008-10-20  8:42                           ` Eli Zaretskii
2008-10-20 17:04                           ` Richard M. Stallman
2008-10-21  4:39                             ` Stephen J. Turnbull
2008-10-21  5:23                               ` Miles Bader
2008-10-21  6:25                                 ` tomas
2008-10-21  6:21                                   ` Miles Bader
2008-10-21  7:44                                     ` tomas
2008-10-21  8:15                                     ` Eli Zaretskii
2008-10-21  9:06                                       ` Stephen J. Turnbull
2008-10-21 10:22                                         ` Eli Zaretskii
2008-10-21 12:06                                           ` Stephen J. Turnbull
2008-10-21 12:40                                             ` Eli Zaretskii
2008-10-22  2:34                                               ` Stephen J. Turnbull
2008-10-22  4:33                                                 ` Eli Zaretskii
2008-10-22 21:02                                                 ` Richard M. Stallman
2008-10-22  0:32                                       ` Kenichi Handa
2008-10-22  4:27                                         ` Eli Zaretskii
2009-01-27  4:51                                           ` Kenichi Handa
2008-10-21  8:06                                 ` Stephen J. Turnbull
2008-10-20 16:00                 ` Reiner Steib
2008-10-20 22:03                   ` Richard M. Stallman
2008-10-21  2:50                     ` Kenichi Handa
2008-10-21 16:00                       ` Ted Zlatanov
2008-10-22  1:22                         ` Kenichi Handa
2008-10-22  2:07                         ` Stephen J. Turnbull
2008-10-22  6:21                         ` Richard M. Stallman
2008-10-23  2:34                           ` Kenichi Handa
2008-10-23 21:08                             ` Richard M. Stallman
2008-10-24  0:54                               ` Kenichi Handa
2008-10-24 18:36                                 ` Richard M. Stallman
2008-10-22 13:15                         ` Stefan Monnier
2008-10-24 17:21                           ` Ted Zlatanov
2008-10-25  2:01                             ` Richard M. Stallman
2008-10-25  2:32                               ` Kenichi Handa
2008-10-26  4:10                                 ` Richard M. Stallman
2008-10-31  6:33                                   ` Kenichi Handa
2008-10-31  7:24                                     ` Miles Bader
2008-10-31 19:31                                     ` Richard M. Stallman
2008-11-01  2:17                                       ` Kenichi Handa
2008-11-02  1:53                                         ` Richard M. Stallman
2008-11-07  7:15                                           ` Kenichi Handa
2008-11-07 17:04                                             ` Richard M. Stallman
2008-10-25 18:27                               ` Stefan Monnier
2008-10-26  4:10                                 ` Richard M. Stallman
2008-10-31 21:29                                   ` Stefan Monnier
2008-11-10  5:43                                     ` Kenichi Handa
2008-11-10 15:01                                       ` Stefan Monnier
2008-10-16  1:12         ` Kenichi Handa
