From: Katsumi Yamaoka
Newsgroups: gmane.emacs.gnus.general,gmane.emacs.devel
Subject: Re: [Unicode-2] `read' always returns multibyte symbol
Date: Fri, 16 Nov 2007 08:31:49 +0900
Organization: Emacsen advocacy group
To: Kenichi Handa
Cc: ding@gnus.org, emacs-devel@gnu.org
User-Agent: Gnus/5.110007 (No Gnus v0.7) Emacs/23.0.60 (gnu/linux)

>>>>> Kenichi Handa wrote:

> In article ,
> Katsumi Yamaoka writes:

>> What I observed was different.

> That is exactly what string-as-multibyte does.  \206\343 and
> \202\271 are valid multibyte forms in the current Emacs,
> thus are treated as multibyte characters.

Now I understand why such readable characters suddenly appeared.

[...]

> Please try this:

> (string-make-unibyte
>  (string-as-multibyte "\343\203\206\343\202\271\343\203\210"))

> You'll get the above result, ... yes, very weird.

Oh, that surprised me a bit.  But I often run into this sort of
thing while playing with unibyte and multibyte data, and it
always confuses me.

> On the other hand,

> (string-as-unibyte
>  (string-as-multibyte "\343\203\206\343\202\271\343\203\210"))
>   => "\343\203\206\343\202\271\343\203\210"

>>> I long ago proposed a facility that turns on the
>>> multibyteness of a buffer while converting 8-bit bytes to
>>> multibyte characters as what string-to-multibyte does, but
>>> not accepted.

>> But the modern Emacsen does do so, doesn't it?

> No.

Oops.  I had misunderstood why Emacs 22 and 23 don't break 8-bit
data when it is fed into a multibyte buffer from a network
process whose process coding system is binary.

So, for now, the best approach is probably still to use a unibyte
buffer for unibyte data and a multibyte buffer for multibyte
data, and to use a string rather than a buffer to encode or
decode data whose multibyteness will change, like:

(insert (prog1
	    (decode-coding-string (buffer-string) 'coding)
	  (erase-buffer)
	  (set-buffer-multibyte t)))
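
The snippet above could be wrapped in a small function.  What
follows is only a minimal sketch: the name
`my-decode-buffer-via-string' is hypothetical, it assumes the
current buffer starts out unibyte, and utf-8 stands in for the
placeholder 'coding.

```elisp
;; Sketch only: `my-decode-buffer-via-string' is a hypothetical helper,
;; not an existing Emacs or Gnus function.
(defun my-decode-buffer-via-string (coding)
  "Decode the unibyte current buffer with CODING and make it multibyte.
The raw bytes are decoded as a string first, so the buffer's
multibyteness is only switched while the buffer is empty."
  (let ((decoded (decode-coding-string (buffer-string) coding)))
    (erase-buffer)
    (set-buffer-multibyte t)
    (insert decoded)))

;; Usage, assuming the bytes are UTF-8 (they are the ones quoted in
;; this thread):
(defun my-decode-demo ()
  "Return (MULTIBYTE-P TEXT) after decoding sample UTF-8 bytes."
  (with-temp-buffer
    (set-buffer-multibyte nil)          ; hold raw bytes in a unibyte buffer
    (insert "\343\203\206\343\202\271\343\203\210")
    (my-decode-buffer-via-string 'utf-8)
    (list enable-multibyte-characters (buffer-string))))
```

Switching multibyteness on an empty buffer means no buffer
contents get reinterpreted the way string-as-multibyte does.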