From: Katsumi Yamaoka
Newsgroups: gmane.emacs.gnus.general
Subject: Re: [BUG]What does this mean:"Mention that multibyte characters
Date: Wed, 20 Oct 2004 14:36:48 +0900
Organization: Emacsen advocacy group
Cc: Juri Linkov
User-Agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3.50 (gnu/linux)

>>>>> Katsumi Yamaoka wrote:

> I'll try translation of Kenichi Handa's advice next time.

This is practice in English composition for me. Whatever mistakes
there may be, the responsibility is mine.

>>>>> Katsumi Yamaoka wrote:

> With the following form, Emacs 21.3.50 returns non-nil, and 22.0.0
> returns nil. Could you let me know for reference what occurs there?

> (with-temp-buffer
>   (set-buffer-multibyte t)
>   (insert (string-as-unibyte "\200"))
>   (goto-char (point-min))
>   (search-forward (string-as-multibyte "\200") nil t))

;; Annotation by K.Y.:
;; At that time, I didn't yet know that the possible insertion forms
;; are `(insert ?\200)' and `(insert (format "%c" ?\200))'.

>>>>> Kenichi Handa wrote:

Even with Emacs 21.3.50, the above form will return nil in certain
language environments (e.g., Vietnamese).

You have to understand first that a unibyte string is converted into a
multibyte string by `string-make-multibyte' when inserting a unibyte
string in a multibyte buffer.
Therefore, the `(insert (string-as-unibyte "\200"))' form is identical to
the `(insert (string-make-multibyte (string-as-unibyte "\200")))' form.
How `string-make-multibyte' converts the string depends on the language
environment. In Emacs 21, in the Latin-1 language environment, for
example, the string "\200" is converted into the character which
corresponds to \200 in the eight-bit-control charset, since the primary
charset latin-iso8859-1 doesn't contain \200.

Second, `string-as-multibyte' converts STRING into a multibyte string,
keeping its byte sequence as much as possible. It works ``as much as
possible'', but the result sometimes differs. For example, the string
"\200" is converted into the byte sequence "\236\240", which is a
character contained in the eight-bit-control charset. It is the same as
the character which the above program inserted in the buffer.
Consequently, in the Latin-1 language environment, for example, the
above program returned non-nil in Emacs 21.

On the other hand, in Emacs 22, since iso-8859-1, which is the primary
charset for Latin-1, contains \200, the form
(insert (string-as-unibyte "\200")) inserts the character U+0080 rather
than the character which belongs to eight-bit-control. However,
`string-as-multibyte' always converts \200 into the eight-bit-control
character. This is why the program returns nil.

If you need to search for \200 after inserting it in a buffer, the
following way, for example, works in both Emacs 21 and 22:

(with-temp-buffer
  (set-buffer-multibyte t)
  (insert (string-to-multibyte "\200"))
  (goto-char (point-min))
  (search-forward (string-to-multibyte "\200") nil t))

;; Annotation by K.Y.:
;; I didn't use that way in the `gnus-update-summary-mark-positions'
;; function (which see).

`string-to-multibyte' always converts the non-ASCII bytes of a string
into characters which belong to eight-bit-control or eight-bit-graphic,
so the string it makes will never match an ordinary string.

P.S.
In Emacs 21, the form `(insert (string-to-multibyte "\200"))' does the
same as the form `(insert ?\200)'. This is because no character
corresponds to 128 in a multibyte buffer, so it is treated as a raw
byte which belongs to eight-bit-control. In Emacs 22, however, the two
forms differ: since the character corresponding to 128 exists as U+0080,
`(insert ?\200)' inserts that character.

;; Annotation by K.Y.:
;; I am deeply grateful to Kenichi Handa. This was all the knowledge
;; that I needed.
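
The approach recommended above (apply `string-to-multibyte' to both the
inserted text and the search string) can be wrapped in two small
helpers. This is only a sketch: the names `my-insert-raw-bytes' and
`my-search-forward-raw-bytes' are hypothetical and are not part of
Emacs or Gnus, and it assumes an Emacs in which `string-to-multibyte'
is available.

```elisp
;; Hypothetical helpers for illustration; neither is part of Emacs or Gnus.

(defun my-insert-raw-bytes (bytes)
  "Insert the unibyte string BYTES into the current buffer as raw bytes.
`string-to-multibyte' is applied explicitly, so the implicit
language-environment-dependent conversion done by `insert' is avoided."
  (insert (string-to-multibyte bytes)))

(defun my-search-forward-raw-bytes (bytes &optional bound)
  "Search forward for the unibyte string BYTES as raw bytes.
Return the position after the match, or nil if there is none."
  (search-forward (string-to-multibyte bytes) bound t))

;; Usage: both the insertion and the search go through the same conversion.
(with-temp-buffer
  (set-buffer-multibyte t)
  (my-insert-raw-bytes "\200")
  (goto-char (point-min))
  (my-search-forward-raw-bytes "\200"))
```

Because both sides use the same conversion, the search does not depend
on how `insert' would otherwise convert a unibyte string in the current
language environment.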