Gnus development mailing list
* charset ANSI_x3.4-1968, et al
@ 2007-03-18  1:41 Karl Chen
  2007-03-18 19:54 ` Simon Josefsson
  0 siblings, 1 reply; 10+ messages in thread
From: Karl Chen @ 2007-03-18  1:41 UTC
  To: ding


Dear Gnus developers, I've been starting to receive a lot of
emails from cron [1] with this header:

 Content-Type: text/plain; charset=ANSI_X3.4-1968

I get the message: Unknown charset: ANSI_X3.4-1968

According to RFC 1345, "ANSI_X3.4-1968" and several other names
are aliases for US-ASCII; in fact, "ANSI_X3.4-1968" is the
official name (though according to
http://www.iana.org/assignments/character-sets, "US-ASCII" is the
preferred MIME name).

Could you consider something such as the patch below, which works
for me:


--- mm-util.el	17 Mar 2007 18:08:14 -0700	7.62
+++ mm-util.el	17 Mar 2007 18:27:45 -0700	
@@ -641,8 +641,9 @@
    ((and allow-override
 	 (let ((cs (cdr (assq charset mm-charset-override-alist))))
 	   (and cs (mm-coding-system-p cs) cs))))
-   ;; ascii
-   ((eq charset 'us-ascii)
+   ;; ascii (see RFC 1345)
+   ((memq charset '(us-ascii ansi_x3.4-1968 ansi_x3.4-1986 iso-ir-6 iso646-us
+                             iso_646.irv:1991 us ibm367 cp367 csascii))
     'ascii)
    ;; Check to see whether we can handle this charset.  (This depends
    ;; on there being some coding system matching each `mime-charset'


Regards, Karl

[1] This is now happening on multiple systems (which are all
    Debian).  cron appears to be using the default locale name.
    On Linux, you can get the string ANSI_x3.4-1968 if you run
    "LC_CTYPE=C locale charmap".





* Re: charset ANSI_x3.4-1968, et al
  2007-03-18  1:41 charset ANSI_x3.4-1968, et al Karl Chen
@ 2007-03-18 19:54 ` Simon Josefsson
  2007-03-18 22:54   ` Reiner Steib
  2007-03-18 23:57   ` Karl Chen
  0 siblings, 2 replies; 10+ messages in thread
From: Simon Josefsson @ 2007-03-18 19:54 UTC
  To: Karl Chen; +Cc: ding

Karl Chen <quarl@cs.berkeley.edu> writes:

> Dear Gnus developers, I've been starting to receive a lot of
> emails from cron [1] with this header:
>
>  Content-Type: text/plain; charset=ANSI_X3.4-1968
>
> I get the message: Unknown charset: ANSI_X3.4-1968
>
> According to RFC 1345, "ANSI_X3.4-1968", along with other names,
> are aliases for US-ASCII, and actually, "ANSI_X3.4-1968" is the
> official name (though according to
> http://www.iana.org/assignments/character-sets, "US-ASCII" is the
> preferred MIME name).

RFC 1345 is not standards-track, but the IANA registry is used by the
standards-track MIME.

> Could you consider something such as the patch below, which works
> for me:
>
>
> --- mm-util.el	17 Mar 2007 18:08:14 -0700	7.62
> +++ mm-util.el	17 Mar 2007 18:27:45 -0700	
> @@ -641,8 +641,9 @@
>     ((and allow-override
>  	 (let ((cs (cdr (assq charset mm-charset-override-alist))))
>  	   (and cs (mm-coding-system-p cs) cs))))
> -   ;; ascii
> -   ((eq charset 'us-ascii)
> +   ;; ascii (see RFC 1345)
> +   ((memq charset '(us-ascii ansi_x3.4-1968 ansi_x3.4-1986 iso-ir-6 iso646-us
> +                             iso_646.irv:1991 us ibm367 cp367 csascii))
>      'ascii)
>     ;; Check to see whether we can handle this charset.  (This depends
>     ;; on there being some coding system matching each `mime-charset'

I think we should install this.  Any objections?

> [1] This is now happening on multiple systems (which are all
>     Debian).  cron appears to be using the default locale name.
>     On Linux, you can get the string ANSI_x3.4-1968 if you run
>     "LC_CTYPE=C locale charmap".

I think this should be reported as a bug.

/Simon




* Re: charset ANSI_x3.4-1968, et al
  2007-03-18 19:54 ` Simon Josefsson
@ 2007-03-18 22:54   ` Reiner Steib
  2007-03-19  0:02     ` Karl Chen
  2007-03-18 23:57   ` Karl Chen
  1 sibling, 1 reply; 10+ messages in thread
From: Reiner Steib @ 2007-03-18 22:54 UTC
  To: Karl Chen; +Cc: ding

On Sun, Mar 18 2007, Simon Josefsson wrote:

> Karl Chen <quarl@cs.berkeley.edu> writes:
>> I get the message: Unknown charset: ANSI_X3.4-1968

Well, this is only a warning, and you need to set `gnus-verbose' to 7
or higher to see it.

[...]
>> -   ;; ascii
>> -   ((eq charset 'us-ascii)
>> +   ;; ascii (see RFC 1345)
>> +   ((memq charset '(us-ascii ansi_x3.4-1968 ansi_x3.4-1986 iso-ir-6 iso646-us
>> +                             iso_646.irv:1991 us ibm367 cp367 csascii))
>>      'ascii)
[...]
>
> I think we should install this.  Any objections?

If adding such a list is useful, I'd suggest putting it in a variable,
say `mm-charset-ascii-synonym-list'...

-   ((eq charset 'us-ascii)
+   ((memq charset mm-charset-ascii-synonym-list)
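
To make the suggestion concrete, here is a minimal, self-contained
sketch.  The variable name is the one proposed above and its value is
the list from Karl's patch; the helper `my-ascii-charset-p' is
hypothetical and is not part of mm-util.el.  Downcasing the name first
is an assumption on my part, made because the header spells the name
in upper case.

```elisp
;; Sketch only: `my-ascii-charset-p' is a hypothetical helper, not Gnus code.
(defvar mm-charset-ascii-synonym-list
  '(us-ascii ansi_x3.4-1968 ansi_x3.4-1986 iso-ir-6 iso646-us
    iso_646.irv:1991 us ibm367 cp367 csascii)
  "Charset names treated as synonyms for US-ASCII (list from Karl's patch).")

(defun my-ascii-charset-p (charset)
  "Return t if CHARSET, a symbol, is a known synonym for US-ASCII.
CHARSET is downcased before the lookup, so the test ignores case."
  (and (symbolp charset)
       (memq (intern (downcase (symbol-name charset)))
             mm-charset-ascii-synonym-list)
       t))
```

Note that the variable is passed to `memq' unquoted; quoting it would
hand `memq' a symbol instead of the list.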

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/




* Re: charset ANSI_x3.4-1968, et al
  2007-03-18 19:54 ` Simon Josefsson
  2007-03-18 22:54   ` Reiner Steib
@ 2007-03-18 23:57   ` Karl Chen
  2007-03-19  8:45     ` Simon Josefsson
  1 sibling, 1 reply; 10+ messages in thread
From: Karl Chen @ 2007-03-18 23:57 UTC
  To: ding

>>>>> On 2007-03-18 12:54 PDT, Simon Josefsson writes:

    Simon> RFC 1345 is not standards-track, but the IANA registry
    Simon> is used by the standards-track MIME.

Good point about standards-track-ness.  But, regardless of whether
it's officially "okay", I would recommend /accepting/ (not
sending) the ANSI_x3.4-1968 string: "Be liberal in what you
accept, conservative in what you send."  There are already
implementations "in the wild" sending it (perhaps incorrectly, but
still, it's out there) -- Fedora, Debian, and presumably their
derivatives.

    >> [1] This is now happening on multiple systems (which are
    >> all Debian).  cron appears to be using the default locale
    >> name.  On Linux, you can get the string ANSI_x3.4-1968 if
    >> you run "LC_CTYPE=C locale charmap".

    Simon> I think this should be reported as a bug.

Yup, on the "conservative in what you send" side of it, I also
sent them a patch after I emailed ding@.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=415302

-- 
Karl 2007-03-18 16:48





* Re: charset ANSI_x3.4-1968, et al
  2007-03-18 22:54   ` Reiner Steib
@ 2007-03-19  0:02     ` Karl Chen
  2007-03-19  8:43       ` Simon Josefsson
  0 siblings, 1 reply; 10+ messages in thread
From: Karl Chen @ 2007-03-19  0:02 UTC
  To: ding

>>>>> On 2007-03-18 15:54 PDT, Reiner Steib writes:

    Reiner> Well, this is only a warning, and you need to set
    Reiner> `gnus-verbose' to 7 or higher to see it.

Ah, okay.  (The default for `gnus-verbose' seems to be 7 though...)

    Reiner> If adding such a list is useful, I'd suggest putting
    Reiner> it in a variable, say `mm-charset-ascii-synonym-list'...

Good point; I only sent a single list, but if you wanted to go all
the way you could do a whole alist of synonym lists, per the IANA
list.

If it's true that these aliases aren't officially required to be
accepted in MIME then it would also be okay to just have the few
most popular US-ASCII aliases, e.g. ANSI_x3.4-1968 and
ANSI_x3.4-1986.
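
A rough sketch of what such an alist of synonym lists might look like,
restricted to the two popular aliases just mentioned.  Both names
below are hypothetical and only for illustration; nothing like this
exists in mm-util.el as far as this thread shows.

```elisp
;; Sketch only: both names are hypothetical, not part of Gnus.
(defvar my-charset-synonym-lists
  '((us-ascii ansi_x3.4-1968 ansi_x3.4-1986))
  "Alist of (PREFERRED-MIME-NAME SYNONYM...) entries.
Only the two most common US-ASCII synonyms are listed.")

(defun my-charset-canonical-name (charset)
  "Return the preferred MIME name for CHARSET, or CHARSET itself if unknown."
  (catch 'found
    (dolist (entry my-charset-synonym-lists)
      ;; The cdr of each entry is the list of synonyms for its car.
      (when (memq charset (cdr entry))
        (throw 'found (car entry))))
    ;; No entry matched: hand the name back unchanged.
    charset))
```

More entries could be added to the alist later without touching the
lookup function.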

-- 
Karl 2007-03-18 16:57





* Re: charset ANSI_x3.4-1968, et al
  2007-03-19  0:02     ` Karl Chen
@ 2007-03-19  8:43       ` Simon Josefsson
  2007-03-20  7:50         ` Karl Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Josefsson @ 2007-03-19  8:43 UTC
  To: Karl Chen; +Cc: ding

Karl Chen <quarl@cs.berkeley.edu> writes:

>     Reiner> If adding such a list is useful, I'd suggest putting
>     Reiner> it in a variable, say `mm-charset-ascii-synonym-list'...
>
> Good point; I only sent a single list, but if you wanted to go all
> the way you could do a whole alist of synonym lists, per the IANA
> list.

I don't think that is a good idea -- let's add "bugfix" aliases on a
need-to-have basis judging from bug reports.

> If it's true that these aliases aren't officially required to be
> accepted in MIME then it would also be okay to just have the few
> most popular US-ASCII aliases, e.g. ANSI_x3.4-1968 and
> ANSI_x3.4-1986.

Yup.

I looked at a cleaner way to solve this.  Can you test the patch
below?

/Simon

--- mm-util.el	24 Jan 2007 11:48:30 +0100	7.62
+++ mm-util.el	19 Mar 2007 09:42:14 +0100	
@@ -347,7 +347,8 @@
 (defcustom mm-charset-override-alist
   '((iso-8859-1 . windows-1252)
     (iso-8859-8 . windows-1255)
-    (iso-8859-9 . windows-1254))
+    (iso-8859-9 . windows-1254)
+    (ansi_x3.4-1968 . us-ascii))
   "A mapping from undesired charset names to their replacement.
 
 You may add pairs like (iso-8859-1 . windows-1252) here,
@@ -357,6 +358,7 @@
 		    (const (iso-8859-1 . windows-1252))
 		    (const (iso-8859-8 . windows-1255))
 		    (const (iso-8859-9 . windows-1254))
+		    (const (ansi_x3.4-1968 . us-ascii))
 		    (const (undecided  . windows-1252)))
 	       (repeat :inline t
 		       :tag "Other options"




* Re: charset ANSI_x3.4-1968, et al
  2007-03-18 23:57   ` Karl Chen
@ 2007-03-19  8:45     ` Simon Josefsson
  0 siblings, 0 replies; 10+ messages in thread
From: Simon Josefsson @ 2007-03-19  8:45 UTC
  To: Karl Chen; +Cc: ding

Karl Chen <quarl@cs.berkeley.edu> writes:

>>>>>> On 2007-03-18 12:54 PDT, Simon Josefsson writes:
>
>     Simon> RFC 1345 is not standards-track, but the IANA registry
>     Simon> is used by the standards-track MIME.
>
> Good point about standards-track-ness.  But, regardless of whether
> it's officially "okay", I would recommend /accepting/ (not
> sending) the ANSI_x3.4-1968 string: "Be liberal in what you
> accept, conservative in what you send."  There are already
> implementations "in the wild" sending it (perhaps incorrectly, but
> still, it's out there) -- Fedora, Debian, and presumably their
> derivatives.

Yes, I agree.

>     >> [1] This is now happening on multiple systems (which are
>     >> all Debian).  cron appears to be using the default locale
>     >> name.  On Linux, you can get the string ANSI_x3.4-1968 if
>     >> you run "LC_CTYPE=C locale charmap".
>
>     Simon> I think this should be reported as a bug.
>
> Yup, on the "conservative in what you send" side of it, I also
> sent them a patch after I emailed ding@.
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=415302

Thanks!

/Simon




* Re: charset ANSI_x3.4-1968, et al
  2007-03-19  8:43       ` Simon Josefsson
@ 2007-03-20  7:50         ` Karl Chen
  2007-03-20  8:35           ` Simon Josefsson
  0 siblings, 1 reply; 10+ messages in thread
From: Karl Chen @ 2007-03-20  7:50 UTC
  To: ding

>>>>> On 2007-03-19 01:43 PDT, Simon Josefsson writes:

    Simon> I looked at a cleaner way to solve this.  Can you test
    Simon> the patch below?

Thank you, yes, I tried it and it gets rid of the warning message,
but...

    Simon>  (defcustom mm-charset-override-alist
[..]
    Simon> - (iso-8859-9 . windows-1254))
    Simon> + (iso-8859-9 . windows-1254)
    Simon> + (ansi_x3.4-1968 . us-ascii))

I'm not sure it's "cleaner": as far as I can tell
mm-charset-override-alist is intended to be a user-level override
list.  For example, there could be aliases for iso-8859-1, but the
user shouldn't have to manually list all aliases to "override"
into windows-1252.  If the user wants to override us-ascii to
something else he'd have to manually add mappings for all the
us-ascii aliases.  I think it'd be cleaner (in terms of semantics,
maybe not patch size) to keep "override alist" and "alias alist"
separate.
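
To illustrate the separation, a minimal sketch under my own
assumptions: all three names are hypothetical stand-ins (the second
one plays the role of `mm-charset-override-alist'), and this is not
how mm-util.el is actually structured.

```elisp
;; Sketch only: hypothetical names, not the real mm-util.el code.
(defvar my-charset-alias-alist
  '((ansi_x3.4-1968 . us-ascii)
    (ansi_x3.4-1986 . us-ascii))
  "Built-in alist mapping non-MIME names to the MIME charset name.")

(defvar my-charset-override-alist
  '((iso-8859-1 . windows-1252))
  "Stand-in for the user-level override alist.")

(defun my-resolve-charset (charset)
  "Resolve CHARSET through the alias alist, then the override alist."
  (let* ((canonical (or (cdr (assq charset my-charset-alias-alist))
                        charset))
         ;; The override is looked up under the canonical name only.
         (override (cdr (assq canonical my-charset-override-alist))))
    (or override canonical)))
```

With this ordering, a user who overrides us-ascii writes one entry for
us-ascii, and every alias follows it automatically.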


-- 
Karl 2007-03-20 00:43





* Re: charset ANSI_x3.4-1968, et al
  2007-03-20  7:50         ` Karl Chen
@ 2007-03-20  8:35           ` Simon Josefsson
  2007-03-21  4:06             ` Karl Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Josefsson @ 2007-03-20  8:35 UTC
  To: Karl Chen; +Cc: ding

Karl Chen <quarl@cs.berkeley.edu> writes:

>>>>>> On 2007-03-19 01:43 PDT, Simon Josefsson writes:
>
>     Simon> I looked at a cleaner way to solve this.  Can you test
>     Simon> the patch below?
>
> Thank you, yes, I tried it and it gets rid of the warning message,
> but...
>
>     Simon>  (defcustom mm-charset-override-alist
> [..]
>     Simon> - (iso-8859-9 . windows-1254))
>     Simon> + (iso-8859-9 . windows-1254)
>     Simon> + (ansi_x3.4-1968 . us-ascii))
>
> I'm not sure it's "cleaner": as far as I can tell
> mm-charset-override-alist is intended to be a user-level override
> list.  For example, there could be aliases for iso-8859-1, but the
> user shouldn't have to manually list all aliases to "override"
> into windows-1252.  If the user wants to override us-ascii to
> something else he'd have to manually add mappings for all the
> us-ascii aliases.  I think it'd be cleaner (in terms of semantics,
> maybe not patch size) to keep "override alist" and "alias alist"
> separate.

I'm not sure there are two separate lists here.  As far as I
understand, there is only one MIME charset name.  All the other
aliases are not intended for MIME, and should never be used for MIME.
For compatibility, we could add overrides for such aliases if they
are common, but conceptually it is wrong to think of them as
"aliases".  They are "overrides".

However, I do agree that using user variables for this is a bad idea
-- if the user customizes the variable in one Gnus version, and we add
compatibility mappings in a later version, they will miss the new
mappings.

On the other hand, I can imagine users who do not want to map
ansi_x3.4-1968 to us-ascii, for the same reason some users do not want
to map iso-8859-9 to windows-1254, so whatever we come up with, it
should be possible for users to customize it.
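
One possible shape for this, purely as a sketch: keep the built-in
mappings in a plain variable that new Gnus versions can extend, and
let users opt out of individual entries through a separate option.
All three names and the `mail' group choice are hypothetical.

```elisp
;; Sketch only: hypothetical names, not an actual Gnus interface.
(defvar my-charset-compat-defaults
  '((ansi_x3.4-1968 . us-ascii))
  "Built-in compatibility mappings; may grow between releases.")

(defcustom my-charset-compat-disabled nil
  "Charset names whose built-in compatibility mapping is ignored."
  :type '(repeat symbol)
  :group 'mail)

(defun my-charset-compat-lookup (charset)
  "Return the built-in replacement for CHARSET, or nil.
Return nil when CHARSET has no mapping or the user disabled it."
  (unless (memq charset my-charset-compat-disabled)
    (cdr (assq charset my-charset-compat-defaults))))
```

Because the user only stores the names to skip, a customized setup
would still pick up mappings added to the defaults later.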

/Simon




* Re: charset ANSI_x3.4-1968, et al
  2007-03-20  8:35           ` Simon Josefsson
@ 2007-03-21  4:06             ` Karl Chen
  0 siblings, 0 replies; 10+ messages in thread
From: Karl Chen @ 2007-03-21  4:06 UTC
  To: ding

>>>>> On 2007-03-20 01:35 PDT, Simon Josefsson writes:

    Simon> I'm not sure there are two separate lists here.  As far
    Simon> as I understand, there is only one MIME charset name.
    Simon> All the other aliases are not intended for MIME, and
    Simon> should never be used for MIME.  For compatibility, we
    Simon> could add overrides for such aliases, if they are
    Simon> common, though, but conceptually it is wrong to think
    Simon> of them as "aliases".  They are "overrides".

Okay, whichever way works :)  Thanks, 

-- 
Karl 2007-03-20 21:05




