Gnus development mailing list
* wrong charset in spite of proper format
From: Matthias Andree @ 2005-03-09 13:47 UTC


Hi,

A Ukraïnian subscriber recently posted a mail with the structure sketched
below to the fetchmail-friends mailing list. No matter which part of the
mail I look at (with C-d, article as ephemeral group), No Gnus (fresh
from CVS) uses my local character set, ISO-8859-whatever, rather than
Windows-1251 for display. mutt gets this right.

What's up here? Does Gnus lack Windows-1251? If so, why does it not
replace everything with dots, X or question marks? If it supports
Windows-1251, why doesn't it see it? I don't have the time to play with
modified copies of the mail and Gnus now to figure out what's up.
(Emacs 21.3 with rm'd movemail)

Full copy of the message available from
<http://home.pages.de/~mandree/tmp/58def670bf48a5363b4df09b717495b6@apple.mk.ua.txt>.

Outline:

(head)
Mime-Version: 1.0 (Apple Message framework v619.2)
Content-Type: multipart/alternative; boundary=Apple-Mail-1-1053523650

--Apple-Mail-1-1053523650
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
        charset=WINDOWS-1251;
        format=flowed

6 =E1=E5=F0 2005, =EE 0:59, Matthias Andree =ED=E0=EF=E8=F1=E0=E2(=EB=E0):=

...

--Apple-Mail-1-1053523650
Content-Transfer-Encoding: quoted-printable
Content-Type: text/enriched;
        charset=WINDOWS-1251

...

--Apple-Mail-1-1053523650--


-- 
Matthias Andree
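
For reference, the first part's attribution line only becomes readable once both layers are undone: first the Content-Transfer-Encoding (quoted-printable), then the charset parameter (windows-1251). The sketch below assumes a current Emacs, where `quoted-printable-decode-string' (from qp.el) returns raw bytes and the `windows-1251' coding system is predefined; under Emacs 21 the coding system first needs the setup discussed later in the thread. `my-qp-decode-windows-1251' is a hypothetical helper, not part of Gnus, and the trailing `=' soft line break of the original line is omitted. The result is a Ukrainian attribution line ("On 6 March 2005, at 0:59, Matthias Andree wrote:").

```elisp
(require 'qp)  ; provides `quoted-printable-decode-string'

(defun my-qp-decode-windows-1251 (qp-text)
  "Undo quoted-printable encoding in QP-TEXT, then decode it as windows-1251.
Hypothetical helper for illustration; not part of Gnus."
  ;; Step 1: turn =XX escapes back into raw bytes (a unibyte string).
  ;; Step 2: map those bytes to characters using the MIME charset.
  (decode-coding-string (quoted-printable-decode-string qp-text)
                        'windows-1251))

(my-qp-decode-windows-1251
 "6 =E1=E5=F0 2005, =EE 0:59, Matthias Andree =ED=E0=EF=E8=F1=E0=E2(=EB=E0):")
;; => "6 бер 2005, о 0:59, Matthias Andree написав(ла):"
```

Decoding the same bytes with a Latin charset instead yields accented Latin letters, which matches the symptom described above.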


* Re: wrong charset in spite of proper format
From: Reiner Steib @ 2005-03-09 15:24 UTC


On Wed, Mar 09 2005, Matthias Andree wrote:

> a Ukraïnian subscriber recently posted a mail of the structure sketched
> below to the fetchmail-friends mailing list. No matter what part of the
> mail I look at (with C-d, article as ephemeral group), No Gnus (fresh
> from CVS) uses my local character set, ISO-8859-whatever, rather than
> Windows-1251 for display. mutt gets this right.
>
> What's up here? Does Gnus lack Windows-1251?

Gnus doesn't provide any charsets, (X)Emacs[1] does.

> If it supports Windows-1251, why doesn't it see it?

In Emacs 21.[1-4] you need (codepage-setup 1251) in `~/.gnus.el' or
`~/.emacs'.  Or better make that...

(unless (coding-system-p 'windows-1251)
  (codepage-setup 1251))

The upcoming Emacs 22 already has windows-1251 preloaded and has
autoloads for other commonly used charsets (iso-8859-*, windows-125*;
see [2]).

> Full copy of the message available from
> <http://home.pages.de/~mandree/tmp/58def670bf48a5363b4df09b717495b6@apple.mk.ua.txt>.

[ Or news://news.gmane.org/gmane.mail.fetchmail.user/7122
    <news:58def670bf48a5363b4df09b717495b6@apple.mk.ua> ]

Bye, Reiner.

[1] More on (X)Emacs, Gnus and charsets (in German):
    http://theotp1.physik.uni-ulm.de/~ste/comp/emacs/gnus/draft/

[2] http://thread.gmane.org/v9k6orjx0i.fsf@marauder.physik.uni-ulm.de

,----[ emacs/lisp/ChangeLog ]
| 2005-03-04  Reiner Steib  <Reiner.Steib@gmx.de>
| 
| 	* international/code-pages.el (windows-1250, windows-125[2-8])
| 	(iso-8859-10, -13, -16, georgian-ps): Add autoload cookies.
`----
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/


* Re: wrong charset in spite of proper format
From: Matthias Andree @ 2005-03-10  0:35 UTC


Reiner Steib <reinersteib+gmane@imap.cc> writes:

>> What's up here? Does Gnus lack Windows-1251?
>
> Gnus doesn't provide any charsets, (X)Emacs[1] does.

OK. My complaint is that when Emacs 21.SOME_MINOR_RELEASE does not
provide a particular character set, Gnus displays the text in some other
charset rather than falling back to ASCII (where appropriate), masking
the unprintables and adding a status line that reads something like
"windows-1251 not supported by your emacs, displaying ASCII parts".

>> If it supports Windows-1251, why doesn't it see it?
>
> In Emacs 21.[1-4] you need (codepage-setup 1251) in `~/.gnus.el' or
> `~/.emacs'.  Or better make that...
>
> (unless (coding-system-p 'windows-1251)
>   (codepage-setup 1251))

Insufficient. Calling (define-coding-system-alias 'windows-1251 'cp1251)
on top of that does work, however. This is along the lines of what Simon
Josefsson suggested one and a half years ago WRT Windows-1252.

It is a shame that such functionality still isn't enabled in the default
No Gnus after such a long time. :-(

> [1] More on (X)Emacs, Gnus and charsets (in German):
>     http://theotp1.physik.uni-ulm.de/~ste/comp/emacs/gnus/draft/

Currently unavailable.

-- 
Matthias Andree
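
The workaround described above, combined with Reiner's guard, fits into a single `~/.gnus.el' or `~/.emacs' fragment. This is a sketch assuming Emacs 21, where `codepage-setup' (from codepage.el) creates a coding system named `cp1251' but, as reported above, no `windows-1251'. On Emacsen that already define `windows-1251' the outer `unless' makes the whole form a no-op.

```elisp
;; Emacs 21.x: make the MIME charset name windows-1251 usable.
(unless (coding-system-p 'windows-1251)
  ;; Creates the coding system `cp1251' from the codepage tables.
  (codepage-setup 1251)
  ;; MIME headers say windows-1251, so register that name as an alias,
  ;; but only if the setup produced cp1251 and nothing defined the name yet.
  (when (and (coding-system-p 'cp1251)
             (not (coding-system-p 'windows-1251)))
    (define-coding-system-alias 'windows-1251 'cp1251)))
```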


* Re: wrong charset in spite of proper format
From: Reiner Steib @ 2005-03-10 18:53 UTC


[-- Attachment #1: Type: text/plain, Size: 936 bytes --]

On Thu, Mar 10 2005, Matthias Andree wrote:

> OK. My complaint is that as a result of Emacs 21.SOME_MINOR_RELEASE not
> providing a particular character set, Gnus display anything else rather
> than falling back to ASCII (where appropriate), masking the unprintables
> and stuffing a status line that reads something like "windows-1251 not
> supported by your emacs, displaying ASCII parts"
[...]
> This is along the lines Simon Josefsson suggested one and a half
> years ago WRT Windows-1252.
>
> It is a shame that such functionality still isn't enabled in the default
> No Gnus after such a long time. :-(

Could you try the following patch? [*] It should automatically do the
setup for windows-125[0137] (which are available in Emacs 21).  If no
charset (or alias) is found, it will print a message.  (Displaying as
ASCII and replacing unknown chars with `?' are not included.  I'm
not sure how this could be achieved in Gnus.)
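
Assuming the patch below is applied, further entries can be added to `mm-charset-eval-alist' from `~/.gnus.el'. This sketch reuses the `iso-8859-13' entry that the patch itself leaves commented out; it assumes a `code-pages' library defining that coding system is installed, which may not hold for every Emacs 21 setup. `eval-after-load' defers the change until mm-util.el is loaded, because the variable is only defined there.

```elisp
;; Only meaningful with the proposed mm-util.el patch applied.
(eval-after-load "mm-util"
  '(add-to-list 'mm-charset-eval-alist
                ;; (CHARSET . FORM): FORM is evaluated when CHARSET is unknown.
                '(iso-8859-13 . (require 'code-pages))))
```

If `(require 'code-pages)' fails, the patched `mm-charset-to-coding-system' swallows the error via `condition-case' and falls through to the synonym lookup.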


[-- Attachment #2: rs-mm-util-auto-charset.patch --]
[-- Type: text/x-patch, Size: 3365 bytes --]

--- mm-util.el	21 Feb 2005 12:42:41 +0100	7.26
+++ mm-util.el	10 Mar 2005 19:31:33 +0100	
@@ -142,6 +142,34 @@
       ;; Is this branch ever actually useful?
       (car (memq cs (mm-get-coding-system-list))))))
 
+(defun mm-codepage-setup (number)
+  "Create a coding system cpNUMBER and an alias for windows-NUMBER.
+The coding system is created using `codepage-setup'.  The alias
+is added to `mm-charset-synonym-alist'."
+  (interactive
+   (let ((completion-ignore-case t)
+	 (candidates (cp-supported-codepages)))
+     (list (completing-read "Setup DOS Codepage: (default 437) " candidates
+			    nil t nil nil "437"))))
+  (let* ((cp (intern (format "cp%s" number)))
+	 (alias (intern (format "windows-%s" number))))
+    (unless (mm-coding-system-p cp)
+      (when (codepage-setup number)
+	(unless (mm-coding-system-p alias)
+	  (add-to-list 'mm-charset-synonym-alist
+		       (cons alias cp)))))))
+
+(defvar mm-charset-eval-alist
+  '(;; (iso-8859-13 . (require 'code-pages))
+    ;; Emacs 21 offers: 1250 1251 1253 1257
+    (windows-1250 . (mm-codepage-setup 1250))
+    (windows-1251 . (mm-codepage-setup 1251))
+    (windows-1253 . (mm-codepage-setup 1253))
+    (windows-1257 . (mm-codepage-setup 1257)))
+  "An alist of \(charset . form\) pairs.
+If an article is encoded in an unknown CHARSET, FORM is evaluated.
+This allows to load additional libraries providing CHARSETS.")
+
 (defvar mm-charset-synonym-alist
   `(
     ;; Not in XEmacs, but it's not a proper MIME charset anyhow.
@@ -175,7 +203,7 @@
 	    '((ks_c_5601-1987 . cp949))
 	  '((ks_c_5601-1987 . euc-kr))))
     )
-  "A mapping from invalid charset names to the real charset names.")
+  "A mapping from unknown or invalid charset names to the real charset names.")
 
 (defvar mm-binary-coding-system
   (cond
@@ -400,6 +428,10 @@
 	(pop alist))
       out)))
 
+;; FIXME: `gnus-message' must be replaced by `message'.  This is just for
+;; testing.
+(autoload 'gnus-message "gnus-util")
+
 (defun mm-charset-to-coding-system (charset &optional lbt)
   "Return coding-system corresponding to CHARSET.
 CHARSET is a symbol naming a MIME charset.
@@ -428,9 +460,26 @@
 ;;; 	 (eq charset (coding-system-get charset 'mime-charset))
 	 )
     charset)
+   ;; Eval expressions from `mm-charset-eval-alist'
+   ((let* ((el (assq charset mm-charset-eval-alist))
+	   (cs (car el))
+	   (form (cdr el)))
+      (and cs
+	   form
+	   ;; Avoid errors...
+	   (condition-case nil (eval form) (error nil))
+	   ;; (message "Failed to eval `%s'" form))
+	   (mm-coding-system-p cs)
+	   (gnus-message 7 "Added charset `%s' via `mm-charset-eval-alist'" cs)
+	   cs)))
    ;; Translate invalid charsets.
    ((let ((cs (cdr (assq charset mm-charset-synonym-alist))))
-      (and cs (mm-coding-system-p cs) cs)))
+      (and cs
+	   (mm-coding-system-p cs)
+	   (gnus-message 7
+	    "Using synonym `%s' from `mm-charset-synonym-alist' for `%s'"
+	    cs charset)
+	   cs)))
    ;; Last resort: search the coding system list for entries which
    ;; have the right mime-charset in case the canonical name isn't
    ;; defined (though it should be).
@@ -442,6 +491,8 @@
 		 (eq charset (or (coding-system-get c :mime-charset)
 				 (coding-system-get c 'mime-charset))))
 	    (setq cs c)))
+      (unless cs
+	(gnus-message 7 "Unknown charset: %s" charset))
       cs))))
 
 (eval-and-compile

[-- Attachment #3: Type: text/plain, Size: 1088 bytes --]


>> [1] More on (X)Emacs, Gnus and charsets (in German):
>>     http://theotp1.physik.uni-ulm.de/~ste/comp/emacs/gnus/draft/
>
> Currently unavailable.

Up again.  (But probably not up to date WRT Emacs 22.)

Bye, Reiner.

[*] I've posted a series of test postings for windows-125* to
    <news:gmane.test>:
    <news:2005-03-10-gmane-windows-1250@marauder.physik.uni-ulm.de>
    <news:2005-03-10-gmane-windows-1251@marauder.physik.uni-ulm.de>
    <news:2005-03-10-gmane-windows-1252@marauder.physik.uni-ulm.de>
    <news:2005-03-10-gmane-windows-1253@marauder.physik.uni-ulm.de>
    <news:2005-03-10-gmane-windows-1254@marauder.physik.uni-ulm.de>
    <news:2005-03-10-gmane-windows-1255@marauder.physik.uni-ulm.de>
    <news:2005-03-10-gmane-windows-1256@marauder.physik.uni-ulm.de>
    <news:2005-03-10-gmane-windows-1257@marauder.physik.uni-ulm.de>
    <news:2005-03-10-gmane-windows-1258@marauder.physik.uni-ulm.de>
    <news:2005-03-10-gmane-windows-1259@marauder.physik.uni-ulm.de>

* Re: wrong charset in spite of proper format
From: Miles Bader @ 2005-03-10 22:37 UTC


Reiner Steib <reinersteib+gmane@imap.cc> writes:
> [*] I've posted a series of test postings for windows-125* to

I notice that my emacs only displays something meaningful for 1251
and 1252; should it be able to handle the others too?  Or is it a
font issue or something (though I've got lots of fonts installed;
most of emacs HELLO displays properly)?

Thanks,

-Miles
-- 
[|nurgle|]  ddt- demonic? so quake will have an evil kinda setting? one that
            will  make every christian in the world foamm at the mouth?
[iddt]      nurg, that's the goal



* Re: wrong charset in spite of proper format
From: Reiner Steib @ 2005-03-11 10:05 UTC


On Thu, Mar 10 2005, Miles Bader wrote:

> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>> [*] I've posted a series of test postings for windows-125* to
>
> I notice that my emacs only displays something meaningful for 1251
> and 1252; should it be able to handle the others too?

The article "windows-1252" was actually created as windows-1252.  Then
I just replaced "windows-1252" with "windows-125x" (x in {0,...,9}) in
the outgoing file using sed.  Emacs should display _something_ (at
least for A0-FF), although not necessarily the character named in
the "Description (only correct for Latin-1)" column.  The description
should be correct for windows-1252.  E.g. at Hex A3 there is "POUND
SIGN" in windows-1252 but a Cyrillic capital Je (Ј) in windows-1251.
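
The Hex A3 example can be checked directly: the same byte decodes to a different character depending only on the charset label. A sketch assuming Emacs 23 or later (for `unibyte-string') with both coding systems predefined; `my-decode-byte' is a hypothetical helper.

```elisp
(defun my-decode-byte (byte coding-system)
  "Decode the single BYTE (0-255) with CODING-SYSTEM.
Hypothetical helper for illustration."
  ;; `unibyte-string' builds a raw-byte string, so no charset is implied yet.
  (decode-coding-string (unibyte-string byte) coding-system))

(my-decode-byte #xA3 'windows-1252)  ; => "£" (U+00A3 POUND SIGN)
(my-decode-byte #xA3 'windows-1251)  ; => "Ј" (U+0408 CYRILLIC CAPITAL LETTER JE)
```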

As "windows-1259" doesn't exist (AFAIK), this article should display
the "Unknown charset" message (you must have `gnus-verbose' >= 7).

> Or is it a font issue or something (though I've got lots of fonts
> installed; most of emacs HELLO displays properly)?

If you see hollow squares, it is a font issue, I think.  If you see
\200 or similar, the position might be unused in this charset.

Bye, Reiner.


Thread overview: 6+ messages
2005-03-09 13:47 wrong charset in spite of proper format Matthias Andree
2005-03-09 15:24 ` Reiner Steib
2005-03-10  0:35   ` Matthias Andree
2005-03-10 18:53     ` Reiner Steib
2005-03-10 22:37       ` Miles Bader
2005-03-11 10:05         ` Reiner Steib
