From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/32550
Path: main.gmane.org!not-for-mail
From: Florian Weimer <fw@deneb.enyo.de>
Newsgroups: gmane.emacs.gnus.general
Subject: Re: \201 irritation! :-)
Date: 23 Sep 2000 13:14:54 +0200
Sender: owner-ding@hpc.uh.edu
Message-ID: <87og1f8imp.fsf@deneb.enyo.de>
References: <oqaedytybi.fsf@titan.progiciels-bpi.ca>
	<00Aug28.151432edt.115218@gateway.intersys.com>
	<oqog2dqhza.fsf@titan.progiciels-bpi.ca>
	<00Aug28.173634edt.115213@gateway.intersys.com>
	<oqitslatr0.fsf@titan.progiciels-bpi.ca>
	<200009051429.PAA09826@djlvig.dl.ac.uk>
	<oqk8cpr2kk.fsf@titan.progiciels-bpi.ca>
	<200009082240.XAA16800@djlvig.dl.ac.uk> <87n1h86w6a.fsf@deneb.enyo.de>
	<200009181407.PAA02748@djlvig.dl.ac.uk> <87wvg7roi1.fsf@deneb.enyo.de>
	<200009211933.UAA08307@djlvig.dl.ac.uk>
NNTP-Posting-Host: coloc-standby.netfonds.no
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: main.gmane.org 1035168818 20480 80.91.224.250 (21 Oct 2002 02:53:38 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 21 Oct 2002 02:53:38 +0000 (UTC)
Return-Path: <owner-ding@hpc.uh.edu>
Original-Received: from fisher.math.uh.edu (fisher.math.uh.edu [129.7.128.35])
	by mailhost.sclp.com (Postfix) with ESMTP id 23A70D051E
	for <jason@mailhost.sclp.com>; Sat, 23 Sep 2000 07:17:04 -0400 (EDT)
Original-Received: from sina.hpc.uh.edu (lists@Sina.HPC.UH.EDU [129.7.3.5])
	by fisher.math.uh.edu (8.9.1/8.9.1) with ESMTP id GAC07401;
	Sat, 23 Sep 2000 06:16:44 -0500 (CDT)
Original-Received: by sina.hpc.uh.edu (TLB v0.09a (1.20 tibbs 1996/10/09 22:03:07)); Sat, 23 Sep 2000 06:16:05 -0500 (CDT)
Original-Received: from mailhost.sclp.com (postfix@66-209.196.61.interliant.com [209.196.61.66] (may be forged))
	by sina.hpc.uh.edu (8.9.3/8.9.3) with ESMTP id GAA20804
	for <ding@hpc.uh.edu>; Sat, 23 Sep 2000 06:15:50 -0500 (CDT)
Original-Received: from mail.netic.de (mail.s.netic.de [212.9.160.11])
	by mailhost.sclp.com (Postfix) with ESMTP id 62587D051E
	for <ding@gnus.org>; Sat, 23 Sep 2000 07:16:13 -0400 (EDT)
Original-Received: by mail.netic.de (Smail3.2.0.111/mail.s.netic.de)
	via LF.net GmbH Internet Services
	via remoteip 212.9.163.95
	via remotehost mail.enyo.de with esmtp
	for mail.gnus.org
	id m13cnHv-001X58C; Sat, 23 Sep 2000 13:16:11 +0200 (CEST)
Original-Received: from [192.168.1.2] (helo=deneb.enyo.de)
	by mail.enyo.de with esmtp (Exim 3.12 #1)
	id 13cnGD-00003p-00
	for ding@gnus.org; Sat, 23 Sep 2000 13:14:25 +0200
Original-Received: from fw by deneb.enyo.de with local (Exim 3.12 #1)
	id 13cnGg-0006UB-00
	for ding@gnus.org; Sat, 23 Sep 2000 13:14:54 +0200
Original-To: ding@gnus.org
In-Reply-To: Dave Love's message of "Thu, 21 Sep 2000 20:33:58 +0100"
Original-Lines: 42
User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7
Precedence: list
X-Majordomo: 1.94.jlt7
Xref: main.gmane.org gmane.emacs.gnus.general:32550
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:32550

Dave Love <d.love@dl.ac.uk> writes:

> >>>>> "FW" == Florian Weimer <fw@deneb.enyo.de> writes:
> 
>  FW> Byte-combination kills the quoted-printable encoder (or used to
>  FW> do it in the past, at least).
> 
> I don't see why it should, at least for a sane charset.  Do you know
> exactly how it fails?

If there is a byte-combination, "(char-after)" returns values outside
the usual 0..255 range, and "quoted-printable-encode-region" can't
handle them.  Probably the best fix is to switch the buffer to unibyte
mode for the duration of the quoted-printable encoding.  Perhaps
wrapping the encoder in "mm-with-unibyte-current-buffer" would do the
trick?
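Here is a rough Python analogy of why the unibyte view helps.  It is not
the Gnus code: "qp_encode_units" is a hypothetical, simplified encoder
(no soft line breaks, no 76-column limit), and the list of integer units
stands in for what "(char-after)" would return.  Quoted-printable can only
write =XX for values 0..255, so a unit above 255 (like a combined
character) has no encoding, while the same text viewed byte by byte
encodes fine.

```python
def qp_encode_units(units):
    """Encode a sequence of integer code units as quoted-printable text.

    Hypothetical helper, deliberately simplified: no soft line breaks,
    no 76-column limit, and every non-printable unit (including space
    and newline) is written as =XX.  Each unit must be a byte value.
    """
    out = []
    for u in units:
        if not 0 <= u <= 255:
            # A value above 255 has no two-digit hex (=XX) form.
            raise ValueError(f"unit {u:#x} is not a byte; cannot emit =XX")
        if 33 <= u <= 126 and u != ord("="):
            out.append(chr(u))  # printable ASCII other than '=' passes through
        else:
            out.append(f"={u:02X}")  # everything else becomes =XX
    return "".join(out)


text = "\u4e2d"  # one CJK character, standing in for a combined character

# "Multibyte" view: one unit per character; the value is > 255, so it fails.
try:
    qp_encode_units([ord(c) for c in text])
except ValueError as err:
    print("multibyte view:", err)

# "Unibyte" view: one unit per byte; all values are in 0..255, so it works.
print("unibyte view:", qp_encode_units(text.encode("utf-8")))
```

The second call prints "=E4=B8=AD", the UTF-8 bytes of the character.
That is the effect the unibyte switch is meant to achieve.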

>  FW> Copying byte-combinations between unibyte and multibyte buffers
>  FW> results in some weird effects.  Sometimes, Emacs is not 8-bit
>  FW> clean. :-(
> 
> I don't know what that means.  Just spurious combination of leading
> bytes stuffed raw into a multibyte buffer or something else?

Take the "Chinese" line from the HELLO file, copy it to a multibyte
buffer, run "encode-coding-region" on it and specify "utf-8" as the
coding system.  Copy the result to a unibyte buffer, then paste it back
into the first (multibyte) buffer.  The \201s are now there; you can
see them if you switch the first buffer to unibyte mode.
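The invariant this recipe breaks can be illustrated with another rough
Python analogy.  The helper names are hypothetical and Python strings are
not Emacs buffers.  Encoding to UTF-8 turns characters into raw bytes, and
the round trip through a "one byte, one character" view is lossless only
as long as no bytes get merged.  The sketch shows the well-behaved case.
It does not model the Emacs 20 combination that produces the \201s.

```python
def bytes_as_chars(raw: bytes) -> str:
    """One byte -> one character with the same code (Latin-1 mapping).

    Models a well-behaved insertion of raw bytes into a character
    buffer: nothing is merged, so the mapping is reversible.
    """
    return raw.decode("latin-1")


def chars_as_bytes(text: str) -> bytes:
    """Inverse of bytes_as_chars; fails if any character is above 0xFF."""
    return text.encode("latin-1")


original = "\u4e2d\u6587"           # two CJK characters
encoded = original.encode("utf-8")   # raw bytes, as an encoder would produce
pasted = bytes_as_chars(encoded)     # raw bytes "pasted" as characters
restored = chars_as_bytes(pasted)    # back to raw bytes

print(encoded.hex(" "))              # e4 b8 ad e6 96 87
print(len(original), len(pasted))    # 2 6
print(restored.decode("utf-8") == original)  # True
```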

The trouble with UTF-8, it seems, is that it tends to generate more
byte-combinations than other encodings.
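One plausible reason can be checked with a small Python sketch.  It
assumes, as a simplification of Emacs 20's internal representation, that
bytes in the 0x80..0x9F range are the ones that can act as leading codes
once raw bytes land in a multibyte buffer.  UTF-8 continuation bytes span
0x80..0xBF, so many of them fall into that range.  EUC-CN (Python's
"gb2312" codec) uses only 0xA1..0xFE for the bytes of a Chinese
character.  "count_c1_bytes" is a hypothetical helper.

```python
def count_c1_bytes(raw: bytes) -> int:
    """Count bytes in 0x80..0x9F.

    Assumption: these are the byte values that can act as leading
    codes (and so trigger combination) once raw bytes land in an
    Emacs 20 multibyte buffer.
    """
    return sum(1 for b in raw if 0x80 <= b <= 0x9F)


sample = "\u4e2d\u6587"  # two CJK characters
for coding in ("utf-8", "gb2312"):
    raw = sample.encode(coding)
    print(coding, raw.hex(" "), count_c1_bytes(raw))
```

For this sample, the UTF-8 form contains two such bytes (0x96 and 0x87).
The GB2312 form contains none.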

>  >> Anyway, the `eight-bit-control' charset in Mule 5.0 should fix that
>  >> sort of thing in the future, as well as allowing better auto-detection
>  >> of utf-8.
> 
>  FW> Does "encode-coding-region" produce characters in this encoding?
> 
> Do you mean _de_code-coding-region?  `encode-coding-region' would
> produce raw bytes.

And these raw bytes are not handled properly in multibyte buffers.