From: Florian Weimer
Newsgroups: gmane.emacs.gnus.general
Subject: Re: \201 irritation! :-)
Date: 23 Sep 2000 13:14:54 +0200
Message-ID: <87og1f8imp.fsf@deneb.enyo.de>
References: <00Aug28.151432edt.115218@gateway.intersys.com>
 <00Aug28.173634edt.115213@gateway.intersys.com>
 <200009051429.PAA09826@djlvig.dl.ac.uk>
 <200009082240.XAA16800@djlvig.dl.ac.uk>
 <87n1h86w6a.fsf@deneb.enyo.de>
 <200009181407.PAA02748@djlvig.dl.ac.uk>
 <87wvg7roi1.fsf@deneb.enyo.de>
 <200009211933.UAA08307@djlvig.dl.ac.uk>
Original-To: ding@gnus.org
In-Reply-To: Dave Love's message of "Thu, 21 Sep 2000 20:33:58 +0100"
User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7

Dave Love writes:

> >>>>> "FW" == Florian Weimer writes:
>
> FW> Byte-combination kills the quoted-printable encoder (or used to
> FW> do it in the past, at least).
>
> I don't see why it should, at least for a sane charset.  Do you know
> exactly how it fails?

If there is a byte-combination, "(char-after)" returns values outside
the usual 0 .. 255 range, and "quoted-printable-encode-region" doesn't
handle this.  The best thing probably is to switch the buffer to
uni-byte mode during quoted-printable encoding.  Perhaps adding
"mm-with-unibyte-current-buffer" could do the trick?

> FW> Copying byte-combinations between unibyte and multibyte buffers
> FW> results in some weird effects.  Sometimes, Emacs is not 8-bit
> FW> clean. :-(
>
> I don't know what that means.  Just spurious combination of leading
> bytes stuffed raw into a multibyte buffer or something else?

Take the "Chinese" line from the HELLO file, copy it to a multi-byte
buffer, do "encode-coding-region" on it and specify "utf-8" as
encoding.  Copy the result to a unibyte buffer, and paste it back into
the first (multi-byte) buffer.  Now the \201s are there, you can see
them if you switch the first buffer to uni-byte mode.
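
The octets involved in that experiment can be inspected outside Emacs.
What follows is a minimal sketch in plain Python, not Emacs Lisp.  The
sample string is a hypothetical stand-in for the "Chinese" line of the
HELLO file, and the helper name is made up for illustration.  It rests
on one assumption that should be checked against the Emacs 20 sources:
octets in the range 0x80 .. 0x9F (which includes \201, i.e. 0x81) act
as leading codes in the internal multibyte representation.

```python
# Illustration only: plain Python, not Emacs Lisp.
# SAMPLE is a hypothetical stand-in for the "Chinese" line of HELLO.
SAMPLE = "中文"


def leading_code_candidates(octets):
    """Return the octets that fall in the range 0x80..0x9F.

    Assumption (not taken from the message): in the Emacs 20 internal
    representation, octets in this range act as leading codes of
    multibyte sequences; octal 201 is 0x81.
    """
    return [b for b in octets if 0x80 <= b <= 0x9F]


def hex_dump(octets):
    """Format octets as space-separated two-digit hex values."""
    return " ".join("%02X" % b for b in octets)


utf8 = SAMPLE.encode("utf-8")
gb = SAMPLE.encode("gb2312")

# UTF-8 continuation bytes cover 0x80..0xBF, so some of them can land
# in the assumed leading-code range.
print("utf-8 :", hex_dump(utf8), "->", hex_dump(leading_code_candidates(utf8)))

# GB2312 (an EUC-style encoding) only uses octets 0xA1..0xFE for
# two-byte characters, so none of them can land in that range.
print("gb2312:", hex_dump(gb), "->", hex_dump(leading_code_candidates(gb)))
```

If the assumption holds, this is consistent with the observation that
UTF-8 output is more prone to accidental byte-combination than the
EUC-style encodings.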
The trouble with UTF-8 is that it seems to generate more
byte-combinations than other encodings.

> >> Anyway, the `eight-bit-control' charset in Mule 5.0 should fix that
> >> sort of thing in the future, as well as allowing better auto-detection
> >> of utf-8.
>
> FW> Does "encode-coding-region" produce characters in this encoding?
>
> Do you mean _de_code-coding-region?  `encode-coding-region' would
> produce raw bytes.

And these raw bytes are not properly dealt with in multi-byte buffers.
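
To make the earlier quoted-printable point concrete: the encoding
escapes each octet as "=" followed by exactly two hex digits, so an
encoder that is handed a value above 255 has nothing valid to emit.
The sketch below is plain Python and is not the Gnus encoder;
"qp_encode_octets" is a hypothetical name, and the function is
deliberately simplified (no line folding, no special handling of
whitespace or line ends).  It only shows why feeding octets works
while feeding character codes does not.

```python
# Illustration only: a simplified quoted-printable encoder in Python.
# It is not "quoted-printable-encode-region"; the name is hypothetical.
def qp_encode_octets(values):
    """Encode an iterable of integers as quoted-printable text.

    Simplifications: no soft line breaks, and every value outside the
    printable range 33..126 (plus "=" itself) is escaped.
    """
    out = []
    for v in values:
        if not 0 <= v <= 255:
            # An escape is "=" plus exactly two hex digits, so a value
            # above 255 cannot be represented.
            raise ValueError("not an octet: %r" % (v,))
        if v == ord("=") or v < 33 or v > 126:
            out.append("=%02X" % v)
        else:
            out.append(chr(v))
    return "".join(out)


text = "中"

# Octets (the unibyte view): every value fits into one escape.
print(qp_encode_octets(text.encode("utf-8")))

# Character codes (the multibyte view): the value is far above 255.
try:
    qp_encode_octets([ord(c) for c in text])
except ValueError as err:
    print("fails:", err)
```

Switching the buffer to unibyte for the duration of the encoding, as
suggested above, corresponds to the first call: the encoder then only
ever sees values in 0 .. 255.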