From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/59377
Path: main.gmane.org!not-for-mail
From: Aidan Kehoe <kehoea@parhasard.net>
Newsgroups: gmane.emacs.gnus.general
Subject: Re: XEmacs, Gnus and mm-coding-system priorities.
Date: Tue, 07 Dec 2004 00:10:28 +0000
Message-ID: <16820.62708.663703.64580.z25zdq@parhasard.net>
References: <16816.39575.977351.530618.dm0@vm.parhasard.net>
	<b9yeki45fb4.fsf@jpl.org>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Trace: sea.gmane.org 1102435600 11063 80.91.229.6 (7 Dec 2004 16:06:40 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 7 Dec 2004 16:06:40 +0000 (UTC)
Cc: ding@gnus.org
Original-X-From: ding-owner+M7918@lists.math.uh.edu Tue Dec 07 17:06:35 2004
Return-path: <ding-owner+M7918@lists.math.uh.edu>
Original-Received: from malifon.math.uh.edu ([129.7.128.13] ident=mail)
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1CbhRz-0007s5-00
	for <ding-account@gmane.org>; Tue, 07 Dec 2004 16:40:27 +0100
Original-Received: from localhost
	([127.0.0.1] helo=lists.math.uh.edu ident=lists)
	by malifon.math.uh.edu with smtp (Exim 3.20 #1)
	id 1CbhQ6-0003Ua-00; Tue, 07 Dec 2004 09:38:30 -0600
Original-Received: from util2.math.uh.edu ([129.7.128.23])
	by malifon.math.uh.edu with esmtp (Exim 3.20 #1)
	id 1CbSwB-0000wv-00
	for ding@lists.math.uh.edu; Mon, 06 Dec 2004 18:10:39 -0600
Original-Received: from justine.libertine.org ([66.139.78.221] ident=postfix)
	by util2.math.uh.edu with esmtp (Exim 4.30)
	id 1CbSw8-0000R1-8N
	for ding@lists.math.uh.edu; Mon, 06 Dec 2004 18:10:36 -0600
Original-Received: from ns5.nestdesign.com (unknown [69.93.162.170])
	by justine.libertine.org (Postfix) with ESMTP id B9DD13A0035
	for <ding@gnus.org>; Mon,  6 Dec 2004 18:10:35 -0600 (CST)
Original-Received: by ns5.nestdesign.com (Postfix, from userid 508)
	id E499C328002; Tue,  7 Dec 2004 00:10:31 +0000 (GMT)
Original-To: Katsumi Yamaoka <yamaoka@jpl.org>
Original-Newsgroups: gnu.emacs.gnus
In-Reply-To: <b9yeki45fb4.fsf@jpl.org>
X-Mailer: VM 7.14 under 21.4 (patch 13) "Rational FORTRAN" XEmacs Lucid
X-Echelon-distraction: RDI T Branch 701 Merv plutonium ZARK 
User-Agent: Gnus/5.1002 (Gnus v5.10.2) XEmacs/21.4 (Rational FORTRAN, linux)
Cancel-Lock: sha1:MEV38jwTMiAJ9v3TRuWR535P66Q=
Precedence: bulk
Original-Sender: ding-owner@lists.math.uh.edu
Xref: main.gmane.org gmane.emacs.gnus.general:59377
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:59377

--=-=-=
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable


Hi! again,=20

 On the sixth day of December, Katsumi Yamaoka wrote:=20

 > [...] I tried your patch with the old mm-util.el and confirmed it works.
 > However, it doesn't work with non-Latin characters:
 >=20
 > (let ((mm-coding-system-priorities '(shift_jis)))
 >   (rfc2047-encode-string (string (make-char 'japanese-jisx0208 38 66))))

I was wrong; I can work around this. Since latin-unity knows the list of
coding systems it can map into, we can check each entry in
mm-coding-system-priorities for validity before trying to remap with it. So,
for example, if shift_jis is in mm-coding-system-priorities, the latin-unity
code just gives up, and mm-find-mime-charset-region falls back to the code
you had written.
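
As a rough sketch of that check, and not the patch itself: the function
name `my-pick-charset' and its four arguments are hypothetical stand-ins
for the latin-unity lists and for latin-unity-maybe-remap.

```elisp
;; Illustrative only; nothing here is part of latin-unity or Gnus.
(defun my-pick-charset (priorities known universal can-encode-p)
  "Return a one-element list naming the first usable entry, or nil.
PRIORITIES is the preference list, KNOWN the coding systems the
remapper understands, UNIVERSAL those that can encode anything, and
CAN-ENCODE-P a predicate called with one coding system."
  (catch 'done
    (dolist (cs priorities)
      (cond
       ;; A universal coding system can encode the whole region.
       ((memq cs universal) (throw 'done (list cs)))
       ;; Unknown entry: give up and let the caller fall back.
       ((not (memq cs known)) (throw 'done nil))
       ;; Known entry that can represent every character.
       ((funcall can-encode-p cs) (throw 'done (list cs)))))
    nil))

;; shift_jis is neither universal nor known here, so this is the
;; "give up" case and the call returns nil.
(my-pick-charset '(shift_jis) '(iso-8859-1 iso-8859-15) '(utf-8)
                 (lambda (_cs) t))
```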

I've a revised patch and test set (including your tests from the last mail)
attached that addresses this. I'm also going to post this to gnu.emacs.gnus
because ding@gnus evidently doesn't like me :-) .

In terms of usability, I'm starting to feel strongly that
mm-coding-system-priorities should be initialised to '(iso-8859-1
iso-8859-15 iso-8859-2 iso-8859-16 utf-8) for non-East-Asian
locales. Americans don't care, but for non-English-speaking Europeans the
Mule breakage is just another reason not to use an Emacs.=20
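
Until any such default exists, the same list can be set by hand. A
minimal sketch, assuming a non-East-Asian locale and that the form goes
in your Gnus or Emacs init file:

```elisp
;; Assumption: Latin 1 first, less widely supported coding systems
;; later, and utf-8 as the last resort.  Adjust the order to taste.
(setq mm-coding-system-priorities
      '(iso-8859-1 iso-8859-15 iso-8859-2 iso-8859-16 utf-8))
```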

Best regards,=20

        - Aidan
--=20
"As democracy is perfected, the office of president represents, more and
more closely, the inner soul of the people. On some great and glorious day
the plain folks of the land will reach their heart's desire at last and the
White House will be adorned by a downright moron." -- H.L. Mencken=20


--=-=-=
Content-Type: text/x-patch; charset=iso-8859-1
Content-Disposition: attachment;
  filename=mm-xemacs-coding-systems20041207.diff
Content-Description: Further revision of mm-util.el to behave better under XEmacs. 

--- mm-util.el.orig	2004-12-06 23:20:17.000000000 +0000
+++ mm-util.el	2004-12-06 23:54:27.000000000 +0000
@@ -584,6 +584,73 @@
 		(length (memq (coding-system-base b) priorities)))
 	   t))))
 
+(defun mm-xemacs-find-mime-charset (begin end)
+  "Determine which MIME charset to use to send region as message.
+This uses the XEmacs-specific latin-unity package to better handle the case
+where identical characters from diverse ISO-8859-? character sets can be
+encoded using a single one of the corresponding coding systems. 
+
+It treats `mm-coding-system-priorities' as the list of preferred coding
+systems; a useful example setting for this list in Western Europe would be
+'(iso-8859-1 iso-8859-15 utf-8), which would default to the very standard
+Latin 1 coding system, and only move to coding systems that are less
+supported as is necessary to encode the characters that exist in the buffer.
+
+Latin Unity doesn't know about those non-ASCII Roman characters that are
+available in various East Asian character sets.  As such, its behavior if
+you have a JIS 0212 LATIN SMALL LETTER A WITH ACUTE in a buffer and it can
+otherwise be encoded as Latin 1, won't be ideal.  But this is very much a
+corner case, so don't worry about it.  "
+  (let ((systems mm-coding-system-priorities) csets psets curset chars-region)
+
+    ;; Load the Latin Unity library, if available.
+    (when (and (not (featurep 'latin-unity)) (locate-library "latin-unity"))
+      (require 'latin-unity))
+
+    ;; Now, can we use it?
+    (if (featurep 'latin-unity)
+	(progn 
+	  (assert (featurep 'xemacs) nil
+		  (concat "We only expect latin-unity on XEmacs--this code "
+			  "will break on the FSF's Emacs.  "))
+	  (setq csets (latin-unity-representations-feasible-region begin end)
+		psets (latin-unity-representations-present-region begin end)
+		chars-region (delq 'ascii (charsets-in-region begin end)))
+
+	  (catch 'done
+
+	    ;; Pass back the first coding system in the preferred list that
+	    ;; can encode the whole region.
+	    (dolist (curset systems)
+	      (setq curset (latin-unity-massage-name curset 'buffer-default))
+
+	      ;; If the coding system is a universal coding system, then it
+	      ;; can certainly encode all the characters in the region. 
+	      (if (memq curset latin-unity-ucs-list)
+		  (throw 'done (list curset)))
+
+	      ;; If a coding system isn't universal, and isn't in the list
+	      ;; that latin unity knows about, we can't decide whether to
+	      ;; use it here. Leave that to mm-find-mime-charset-region,
+	      ;; whence we have been called.
+	      (unless (memq curset latin-unity-coding-systems)
+		(throw 'done nil))
+
+	      ;; Right, we know about this coding system, and it may
+	      ;; conceivably be able to encode all the characters in the
+	      ;; region.
+	      (if (latin-unity-maybe-remap begin end curset csets psets t)
+		  (throw 'done (list curset))))
+
+	    ;; Can't encode using anything from the
+	    ;; mm-coding-system-priorities list. Leave
+	    ;; mm-find-mime-charset-region to do most of the work.
+	    nil))
+
+      ;; Right, latin unity isn't available; let mm-find-mime-charset-region
+      ;; take its default action, which equally applies to GNU Emacs.
+      nil)))
+
 (defun mm-find-mime-charset-region (b e &optional hack-charsets)
   "Return the MIME charsets needed to encode the region between B and E.
 nil means ASCII, a single-element list represents an appropriate MIME
@@ -625,8 +692,12 @@
 			 (setq systems nil
 			       charsets (list cs))))))
 	       charsets))
-	;; Otherwise we're not multibyte, we're XEmacs, or a single
-	;; coding system won't cover it.
+        ;; If we're XEmacs, and some coding system is appropriate,
+        ;; mm-xemacs-find-mime-charset will return an appropriate list.
+        ;; Otherwise, we'll get nil, and the next setq will get invoked.
+        (setq charsets (mm-xemacs-find-mime-charset b e))
+
+        ;; We're not multibyte, or a single coding system won't cover it.
 	(setq charsets
 	      (mm-delete-duplicates
 	       (mapcar 'mm-mime-charset

--=-=-=
Content-Type: application/emacs-lisp
Content-Disposition: attachment;
  filename=mm-xemacs-coding-test20041207.el
Content-Description: Test Lisp for the mm-util.el revision.


(require 'cl)
(require 'mm-util)
;; Provides rfc2047-encode-string, used in the last two tests.
(require 'rfc2047)

(with-temp-buffer
  (let ((oldres '(iso-8859-15 iso-8859-1 iso-8859-2))
	mm-coding-system-priorities testres buffer-undo-list)

    ;; Insert two characters that are theoretically both in Latin 1, but
    ;; that are in disjoint sets under Mule.
    (insert (format "latin 2 u umlaut: %c, latin 1 e acute: %c"
		    (make-char 'latin-iso8859-2 #xfc)
		    (make-char 'latin-iso8859-1 #xe9)))

    (setq mm-coding-system-priorities '(iso-8859-1))

    (undo-boundary)

    (setq testres (mm-find-mime-charset-region (point-min) (point-max)))

    ;; mm-find-mime-charset-region should have worked out to use iso-8859-1.
    (assert (and (= (length testres) 1) (eq (car testres) 'iso-8859-1)))

    ;; Now, we choose to prefer the recent Latin 9 character set, which is
    ;; basically Latin 1 with the Euro sign. 
    (push 'iso-8859-15 mm-coding-system-priorities)

    ;; Undo, because mm-xemacs-find-mime-charset will have remapped to the
    ;; Latin 1 character set.
    (undo-start)
    (undo-more 1)

    ;; Find a MIME charset again. 
    (setq testres (mm-find-mime-charset-region (point-min) (point-max)))

    ;; This time, it should have worked out that it can use Latin 9. 
    (assert (and (= (length testres) 1) (eq (car testres) 'iso-8859-15)))

    ;; Undo, because mm-xemacs-find-mime-charset will have remapped to 
    ;; Latin 9. 
    (undo-start)
    (undo-more 1)

    ;; Now, reverse the list, preferring Latin 1. 
    (setq mm-coding-system-priorities (nreverse mm-coding-system-priorities)
	  testres (mm-find-mime-charset-region (point-min) (point-max))) 

    ;; It should use Latin 1 here, because that has higher priority.
    (assert (and (= (length testres) 1) (eq (car testres) 'iso-8859-1)))

    ;; Add a Euro sign to the buffer, to make it no longer representable in
    ;; Latin 1.
    (insert (format "\nEuro Sign: %c\n" (make-char 'latin-iso8859-15 #xa4)))

    ;; Preserve the Euro sign next time we do an undo.
    (undo-boundary)

    ;; Find a MIME charset again. 
    (setq testres (mm-find-mime-charset-region (point-min) (point-max)))

    ;; Undo, because mm-xemacs-find-mime-charset will have remapped to 
    ;; Latin 9. 
    (undo-start)
    (undo-more 1)

    ;; It should use Latin 9 here, because that's the first entry on the
    ;; list that can encode the buffer.
    (assert (and (= (length testres) 1) (eq (car testres) 'iso-8859-15)))

    ;; Now, revert mm-coding-system-priorities to nil, and we'll get the
    ;; old behaviour.
    (setq mm-coding-system-priorities nil
	  testres (mm-find-mime-charset-region (point-min) (point-max))) 

    ;; Each entry in the mm-find-mime-charset-region result should now be in
    ;; the original result list. 
    (dolist (singleres testres)
      (assert (memq singleres oldres))
      (setq oldres (delq singleres oldres)))

    ;; Some further tests from Katsumi Yamaoka.
    (let ((mm-coding-system-priorities '(iso-8859-1 iso-8859-15))) 
      (assert (equal "?iso-8859-1?" 
		     (substring (rfc2047-encode-string 
				 (string (make-char 'latin-iso8859-15 95)))
				1 13))))

    (let ((mm-coding-system-priorities '(shift_jis))) 
      (assert (equal "?shift_jis?" 
		     (substring (rfc2047-encode-string 
				 (string (make-char 'japanese-jisx0208 
						    38 66)))
				1 12))))

    (message "All the mm-util tests that were written passed.  ")))

--=-=-=--