From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/32619
Path: main.gmane.org!not-for-mail
From: Florian Weimer
Newsgroups: gmane.emacs.gnus.general
Subject: Re: \201 irritation! :-)
Date: 28 Sep 2000 14:22:19 +0200
Sender: owner-ding@hpc.uh.edu
Message-ID: <87g0mkg0zo.fsf@deneb.enyo.de>
References: <00Aug28.151432edt.115218@gateway.intersys.com> <00Aug28.173634edt.115213@gateway.intersys.com> <200009051429.PAA09826@djlvig.dl.ac.uk> <200009082240.XAA16800@djlvig.dl.ac.uk> <87n1h86w6a.fsf@deneb.enyo.de> <200009181407.PAA02748@djlvig.dl.ac.uk> <87wvg7roi1.fsf@deneb.enyo.de> <200009211933.UAA08307@djlvig.dl.ac.uk> <87og1f8imp.fsf@deneb.enyo.de> <200009251155.MAA15586@djlvig.dl.ac.uk>
NNTP-Posting-Host: coloc-standby.netfonds.no
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: main.gmane.org 1035168874 20844 80.91.224.250 (21 Oct 2002 02:54:34 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 21 Oct 2002 02:54:34 +0000 (UTC)
Return-Path:
Original-Received: from fisher.math.uh.edu (fisher.math.uh.edu [129.7.128.35]) by mailhost.sclp.com (Postfix) with ESMTP id 13B86D051E for ; Thu, 28 Sep 2000 08:27:11 -0400 (EDT)
Original-Received: from sina.hpc.uh.edu (lists@Sina.HPC.UH.EDU [129.7.3.5]) by fisher.math.uh.edu (8.9.1/8.9.1) with ESMTP id HAC10320; Thu, 28 Sep 2000 07:27:02 -0500 (CDT)
Original-Received: by sina.hpc.uh.edu (TLB v0.09a (1.20 tibbs 1996/10/09 22:03:07)); Thu, 28 Sep 2000 07:26:27 -0500 (CDT)
Original-Received: from mailhost.sclp.com (postfix@66-209.196.61.interliant.com [209.196.61.66] (may be forged)) by sina.hpc.uh.edu (8.9.3/8.9.3) with ESMTP id HAA22179 for ; Thu, 28 Sep 2000 07:26:12 -0500 (CDT)
Original-Received: from mail.netic.de (mail.s.netic.de [212.9.160.11]) by mailhost.sclp.com (Postfix) with ESMTP id 6CD56D051E for ; Thu, 28 Sep 2000 08:26:36 -0400 (EDT)
Original-Received: by mail.netic.de (Smail3.2.0.111/mail.s.netic.de) via LF.net GmbH Internet Services via remoteip 212.9.163.97 via remotehost mail.enyo.de with esmtp for mail.gnus.org id m13eclR-001X4gC; Thu, 28 Sep 2000 14:26:13 +0200 (CEST)
Original-Received: from [192.168.1.2] (helo=deneb.enyo.de) by mail.enyo.de with esmtp (Exim 3.12 #1) id 13ecfA-00016V-00 for ding@gnus.org; Thu, 28 Sep 2000 14:19:44 +0200
Original-Received: from fw by deneb.enyo.de with local (Exim 3.12 #1) id 13echg-00010m-00 for ding@gnus.org; Thu, 28 Sep 2000 14:22:20 +0200
Original-To: ding@gnus.org
In-Reply-To: Dave Love's message of "Mon, 25 Sep 2000 12:55:16 +0100"
Original-Lines: 68
User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7
Precedence: list
X-Majordomo: 1.94.jlt7
Xref: main.gmane.org gmane.emacs.gnus.general:32619
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:32619

Dave Love writes:

> >>>>> "FW" == Florian Weimer writes:
>
> FW> If there is a byte-combination, "(char-after)" returns values
> FW> outside the usual 0 .. 255 range, and
> FW> "quoted-printable-encode-region" doesn't handle this.
>
> Then it's either intrinsically broken or not being used appropriately
> since it doesn't make sense for multibyte characters.  I suppose I'll
> have to check it if no-one else can.

The following code illustrates the problem:

(let ((multi (get-buffer-create "*Multibyte"))
      chars)
  (with-current-buffer multi
    (erase-buffer)
    (set-buffer-multibyte t)
    ;; Insert a single non-ASCII character.
    (insert 41813)
    ;; Replace it in place with its UTF-8 encoding.
    (encode-coding-region (point-min) (point-max) 'utf-8)
    ;; Record the value `char-after' returns at each buffer position.
    (goto-char (point-min))
    (while (not (eobp))
      (setq chars (cons (char-after) chars))
      (forward-char))
    (setq chars (nreverse chars))
    ;; Print the recorded values on a line of their own.
    (insert "\n" (format "%S" chars) "\n"))
  (switch-to-buffer multi))

I'm going to add the "mm-with-unibyte-current-buffer" stuff as soon as
I've left hospital permanently, so that I can deal with the consequences
in a more timely manner. ;-)

> FW> The best thing probably is to switch the buffer to
> FW> uni-byte mode during quoted-printable encoding.
>
> Whatever you do, you need a buffer with encoded contents.
> Making it unibyte isn't right otherwise or you're dealing with the
> emacs-mule charset.

When qp encoding takes place, Gnus has already invoked
"encode-coding-region" at some point, so it should be safe to look at
the raw bytes.

> FW> Now the \201s are there, you can see them if you switch the first
> FW> buffer to uni-byte mode.
>
> Of course you see the internal encoding if you switch it to unibyte
> mode anyway.  You may also have pasted in raw bytes.

The problem is that raw bytes turn into characters of the default
encoding (Latin-1 with my setup) when they are copied around.  It would
be very helpful if Emacs could optionally signal an error in this case,
so that we could track down those \201 characters more easily.

> FW> The trouble with UTF-8 is that it tends to generate more
> FW> byte-combinations than other encodings, as it seems.
>
> >> Do you mean _de_code-coding-region?  `encode-coding-region' would
> >> produce raw bytes.
>
> FW> And these raw bytes are not properly dealt with in multi-byte buffers.
>
> So don't do that.

That's the point.  The problem is the Gnus approach of gradually
converting an article with multiple parts, I think.  It's extremely
fragile.
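A minimal sketch of quoted-printable encoding that only ever looks at raw bytes, and that refuses multibyte input instead of silently mishandling it, might look like this.  It is not Gnus code: `my-qp-encode-bytes' is a hypothetical helper, and it deliberately skips the soft line breaks and 76-column limit that RFC 2045 requires.  Input is expected to come from something like `encode-coding-string', which returns a unibyte string.

```elisp
;; Hypothetical helper, not part of Gnus or Emacs.  Simplified:
;; no soft line breaks, no 76-column limit, spaces always encoded.
(defun my-qp-encode-bytes (bytes)
  "Return BYTES, a unibyte string, quoted-printable encoded.
Signal an error if BYTES is a multibyte string, since then its
elements are characters rather than raw bytes."
  (when (multibyte-string-p bytes)
    (error "Expected raw bytes, got a multibyte string"))
  (mapconcat (lambda (b)
               ;; Printable ASCII other than `=' passes through
               ;; unchanged; every other byte becomes =XX.
               (if (and (>= b 33) (<= b 126) (/= b ?=))
                   (char-to-string b)
                 (format "=%02X" b)))
             bytes ""))
```

For example, `(my-qp-encode-bytes (encode-coding-string "Grüße" 'utf-8))' returns "Gr=C3=BC=C3=9Fe".  Checking `multibyte-string-p' up front, rather than testing each element against 0 .. 255, also catches the case where raw bytes have already turned into Latin-1 characters, whose codes would still fall inside that range.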