From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/59377
Path: main.gmane.org!not-for-mail
From: Aidan Kehoe
Newsgroups: gmane.emacs.gnus.general
Subject: Re: XEmacs, Gnus and mm-coding-system priorities.
Date: Tue, 07 Dec 2004 00:10:28 +0000
Message-ID: <16820.62708.663703.64580.z25zdq@parhasard.net>
References: <16816.39575.977351.530618.dm0@vm.parhasard.net>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Trace: sea.gmane.org 1102435600 11063 80.91.229.6 (7 Dec 2004 16:06:40 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 7 Dec 2004 16:06:40 +0000 (UTC)
Cc: ding@gnus.org
Original-X-From: ding-owner+M7918@lists.math.uh.edu Tue Dec 07 17:06:35 2004
Return-path:
Original-Received: from malifon.math.uh.edu ([129.7.128.13] ident=mail)
    by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
    id 1CbhRz-0007s5-00 for ; Tue, 07 Dec 2004 16:40:27 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu ident=lists)
    by malifon.math.uh.edu with smtp (Exim 3.20 #1)
    id 1CbhQ6-0003Ua-00; Tue, 07 Dec 2004 09:38:30 -0600
Original-Received: from util2.math.uh.edu ([129.7.128.23])
    by malifon.math.uh.edu with esmtp (Exim 3.20 #1)
    id 1CbSwB-0000wv-00 for ding@lists.math.uh.edu; Mon, 06 Dec 2004 18:10:39 -0600
Original-Received: from justine.libertine.org ([66.139.78.221] ident=postfix)
    by util2.math.uh.edu with esmtp (Exim 4.30)
    id 1CbSw8-0000R1-8N for ding@lists.math.uh.edu; Mon, 06 Dec 2004 18:10:36 -0600
Original-Received: from ns5.nestdesign.com (unknown [69.93.162.170])
    by justine.libertine.org (Postfix) with ESMTP
    id B9DD13A0035 for ; Mon, 6 Dec 2004 18:10:35 -0600 (CST)
Original-Received: by ns5.nestdesign.com (Postfix, from userid 508)
    id E499C328002; Tue, 7 Dec 2004 00:10:31 +0000 (GMT)
Original-To: Katsumi Yamaoka
Original-Newsgroups: gnu.emacs.gnus
In-Reply-To:
X-Mailer: VM 7.14 under 21.4 (patch 13) "Rational FORTRAN" XEmacs Lucid
X-Echelon-distraction: RDI T Branch 701 Merv plutonium ZARK
User-Agent: Gnus/5.1002 (Gnus v5.10.2) XEmacs/21.4 (Rational FORTRAN, linux)
Cancel-Lock: sha1:MEV38jwTMiAJ9v3TRuWR535P66Q=
Precedence: bulk
Original-Sender: ding-owner@lists.math.uh.edu
Xref: main.gmane.org gmane.emacs.gnus.general:59377
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:59377

--=-=-=
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Hi again,

On the sixth day of December, Katsumi Yamaoka wrote:

> [...] I tried your patch with the old mm-util.el and confirmed it works.
> However, it doesn't work with non-Latin characters:
>
> (let ((mm-coding-system-priorities '(shift_jis)))
>   (rfc2047-encode-string (string (make-char 'japanese-jisx0208 38 66))))

I was wrong; I can work around this. Since latin-unity knows the list of
coding systems it can map into, we can check each entry in
mm-coding-system-priorities for validity before trying to remap with it.
So, for example, if shift_jis is in mm-coding-system-priorities, the
latin-unity code should just give up, and mm-find-mime-charset-region
falls back to the code you had written.

I've attached a revised patch and test set (including your tests from the
last mail) that address this. I'm also going to post this to
gnu.emacs.gnus because ding@gnus evidently doesn't like me :-) .

In terms of usability, I'm starting to feel strongly that
mm-coding-system-priorities should be initialised to

  '(iso-8859-1 iso-8859-15 iso-8859-2 iso-8859-16 utf-8)

for non-East-Asian locales. Americans don't care, but for non-English
speaking Europeans the Mule breakage is just another reason not to use an
Emacs.

Best regards,

 - Aidan
-- 
"As democracy is perfected, the office of president represents, more and
more closely, the inner soul of the people.
On some great and glorious day the plain folks of the land will reach their
heart's desire at last and the White House will be adorned by a downright
moron." (H.L. Mencken)

--=-=-=
Content-Type: text/x-patch; charset=iso-8859-1
Content-Disposition: attachment; filename=mm-xemacs-coding-systems20041207.diff
Content-Description: Further revision of mm-util.el to behave better under XEmacs.

--- mm-util.el.orig 2004-12-06 23:20:17.000000000 +0000
+++ mm-util.el 2004-12-06 23:54:27.000000000 +0000
@@ -584,6 +584,73 @@
            (length (memq (coding-system-base b) priorities)))
         t))))
 
+(defun mm-xemacs-find-mime-charset (begin end)
+  "Determine which MIME charset to use to send region as message.
+This uses the XEmacs-specific latin-unity package to better handle the case
+where identical characters from diverse ISO-8859-? character sets can be
+encoded using a single one of the corresponding coding systems.
+
+It treats `mm-coding-system-priorities' as the list of preferred coding
+systems; a useful example setting for this list in Western Europe would be
+'(iso-8859-1 iso-8859-15 utf-8), which would default to the very standard
+Latin 1 coding system, and only move to coding systems that are less
+supported as is necessary to encode the characters that exist in the buffer.
+
+Latin Unity doesn't know about those non-ASCII Roman characters that are
+available in various East Asian character sets. As such, its behavior if
+you have a JIS 0212 LATIN SMALL LETTER A WITH ACUTE in a buffer and it can
+otherwise be encoded as Latin 1, won't be ideal. But this is very much a
+corner case, so don't worry about it. "
+  (let ((systems mm-coding-system-priorities) csets psets curset chars-region)
+
+    ;; Load the Latin Unity library, if available.
+    (when (and (not (featurep 'latin-unity)) (locate-library "latin-unity"))
+      (require 'latin-unity))
+
+    ;; Now, can we use it?
+    (if (featurep 'latin-unity)
+        (progn
+          (assert (featurep 'xemacs)
+                  (concat "We only expect latin-unity on XEmacs--this code "
+                          "will break on the FSF's Emacs. "))
+          (setq csets (latin-unity-representations-feasible-region begin end)
+                psets (latin-unity-representations-present-region begin end)
+                chars-region (delq 'ascii (charsets-in-region begin end)))
+
+          (catch 'done
+
+            ;; Pass back the first coding system in the preferred list that
+            ;; can encode the whole region.
+            (dolist (curset systems)
+              (setq curset (latin-unity-massage-name curset 'buffer-default))
+
+              ;; If the coding system is a universal coding system, then it
+              ;; can certainly encode all the characters in the region.
+              (if (memq curset latin-unity-ucs-list)
+                  (throw 'done (list curset)))
+
+              ;; If a coding system isn't universal, and isn't in the list
+              ;; that latin unity knows about, we can't decide whether to
+              ;; use it here. Leave that until later in mm-find-mime-charset
+              ;; region function, whence we have been called.
+              (unless (memq curset latin-unity-coding-systems)
+                (throw 'done nil))
+
+              ;; Right, we know about this coding system, and it may
+              ;; conceivably be able to encode all the characters in the
+              ;; region.
+              (if (latin-unity-maybe-remap begin end curset csets psets t)
+                  (throw 'done (list curset))))
+
+            ;; Can't encode using anything from the
+            ;; mm-coding-system-priorities list. Leave mm-find-mime-charset
+            ;; to do most of the work.
+            nil))
+
+      ;; Right, latin unity isn't available; let mm-find-charset-region
+      ;; take its default action, which equally applies to GNU Emacs.
+      nil)))
+
 (defun mm-find-mime-charset-region (b e &optional hack-charsets)
   "Return the MIME charsets needed to encode the region between B and E.
 nil means ASCII, a single-element list represents an appropriate MIME
@@ -625,8 +692,12 @@
                (setq systems nil
                      charsets (list cs))))))
        charsets))
-      ;; Otherwise we're not multibyte, we're XEmacs, or a single
-      ;; coding system won't cover it.
+      ;; If we're XEmacs, and some coding system is appropriate,
+      ;; mm-xemacs-find-mime-charset will return an appropriate list.
+      ;; Otherwise, we'll get nil, and the next setq will get invoked.
+      (setq charsets (mm-xemacs-find-mime-charset b e))
+
+      ;; We're not multibyte, or a single coding system won't cover it.
       (setq charsets
             (mm-delete-duplicates
              (mapcar 'mm-mime-charset

--=-=-=
Content-Type: application/emacs-lisp
Content-Disposition: attachment; filename=mm-xemacs-coding-test20041207.el
Content-Description: Test Lisp for the mm-util.el revision.

(require 'cl)
(require 'mm-util)

(with-temp-buffer
  (let ((oldres '(iso-8859-15 iso-8859-1 iso-8859-2))
        mm-coding-system-priorities testres buffer-undo-list)

    ;; Insert two characters that are theoretically both in Latin 1, but
    ;; that are in disjoint sets under Mule.
    (insert (format "latin 2 u umlaut: %c, latin 1 e grave: %c"
                    (make-char 'latin-iso8859-2 #xfc)
                    (make-char 'latin-iso8859-1 #xe9)))
    (setq mm-coding-system-priorities '(iso-8859-1))
    (undo-boundary)
    (setq testres (mm-find-mime-charset-region (point-min) (point-max)))

    ;; mm-find-mime-charset-region should have worked out to use iso-8859-1.
    (assert (and (= (length testres) 1) (eq (car testres) 'iso-8859-1)))

    ;; Now, we choose to prefer the recent Latin 9 character set, which is
    ;; basically Latin 1 with the Euro sign.
    (push 'iso-8859-15 mm-coding-system-priorities)

    ;; Undo, because mm-xemacs-find-mime-charset will have remapped to the
    ;; Latin 1 character set.
    (undo-start)
    (undo-more 1)

    ;; Find a MIME charset again.
    (setq testres (mm-find-mime-charset-region (point-min) (point-max)))

    ;; This time, it should have worked out that it can use Latin 9.
    (assert (and (= (length testres) 1) (eq (car testres) 'iso-8859-15)))

    ;; Undo, because mm-xemacs-find-mime-charset will have remapped to
    ;; Latin 9.
    (undo-start)
    (undo-more 1)

    ;; Now, reverse the list, preferring Latin 1.
    (setq mm-coding-system-priorities (nreverse mm-coding-system-priorities)
          testres (mm-find-mime-charset-region (point-min) (point-max)))

    ;; It should use Latin 1 here, because that has higher priority.
    (assert (and (= (length testres) 1) (eq (car testres) 'iso-8859-1)))

    ;; Add a Euro sign to the buffer, to make it no longer representable in
    ;; Latin 1.
    (insert (format "\nEuro Sign: %c\n" (make-char 'latin-iso8859-15 #xa4)))

    ;; Preserve the Euro sign next time we do an undo.
    (undo-boundary)

    ;; Find a MIME charset again.
    (setq testres (mm-find-mime-charset-region (point-min) (point-max)))

    ;; Undo, because mm-xemacs-find-mime-charset will have remapped to
    ;; Latin 9.
    (undo-start)
    (undo-more 1)

    ;; It should use Latin 9 here, because that's the first entry on the
    ;; list that can encode the buffer.
    (assert (and (= (length testres) 1) (eq (car testres) 'iso-8859-15)))

    ;; Now, revert mm-coding-system-priorities to nil, and we'll get to old
    ;; behaviour.
    (setq mm-coding-system-priorities nil
          testres (mm-find-mime-charset-region (point-min) (point-max)))

    ;; Each entry in the mm-find-mime-charset-region result should now be in
    ;; the original result list.
    (dolist (singleres testres)
      (assert (memq singleres oldres))
      (setq oldres (delq singleres oldres)))

    ;; Some further tests from Katsumi Yamaoka.
    (let ((mm-coding-system-priorities '(iso-8859-1 iso-8859-15)))
      (assert (equal "?iso-8859-1?"
                     (substring (rfc2047-encode-string
                                 (string (make-char 'latin-iso8859-15 95)))
                                1 13))))

    (let ((mm-coding-system-priorities '(shift_jis)))
      (assert (equal "?shift_jis?"
                     (substring (rfc2047-encode-string
                                 (string (make-char 'japanese-jisx0208 38 66)))
                                1 12))))

    (message "All the mm-util tests that were written passed. ")))

--=-=-=--