From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
Date: Tue, 1 Feb 2005 13:26:56 -0500
From: Russ Cox
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] conversion of charsets in upas/fs
In-Reply-To: <58218e63961d40d19c91d9e10f0be666@voidness.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
References: <87cb016c9bd81d2cce16b0219800bf01@voidness.de>
	<58218e63961d40d19c91d9e10f0be666@voidness.de>
Topicbox-Message-UUID: fea3ca4c-eacf-11e9-9d60-3106f5b1d025

Sed operates on UTF, so if you give it non-UTF (aka garbage)
it replaces bad UTF sequences with error runes, which is
what you are seeing.

Probably the best thing to do is write a program that
applies the transformations you want but works a byte
at a time and is character-set-ignorant.

Russ


On Tue, 1 Feb 2005 11:29:32 +0100, Heiko Dudzus wrote:
> > The problem remained when I sent this test mail to the local smtp
> > server. But I found out that all is fine when I move away my pipeto
> > file.
> >
> > It seems as if /mail/lib/pipeto.lib introduces the problem somewhere.
> > I hope to find it.
>
> Ok, I took a mail, made by Russ' smtp dialogue script, and did
> manually what pipeto.lib does with every mail.
>
> % cd /mail/fs/mbox/124
> % cat rawunix | sed '/^$/,$ s/^From / From /' > /tmp/msg
>
> This file is already screwed. I compared to the original rawunix file
> with xd:
>
> term% diff <{xd -c rawunix} <{xd -c /tmp/msg}
> 21,23c21,23
> < 0000140 a n u m l a u t : fc \n 04 04
> < 0000150 \n
> < 0000151
> ---
> > 0000140 a n u m l a u t : c2 80 \n 04
> > 0000150 04 \n
> > 0000152
> term%
>
> 0xfc represents 'ü' in iso-8859-15 but sed replaces it by 0xc2 and
> 0x80. Why? It should only hide bogus 'From ' lines in the mail body.
> Is sed allowed to replace 0xfc by something different here?
>
> Heiko
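
For illustration, the byte-at-a-time, character-set-ignorant filter
suggested above might look roughly like the sketch below. It is written
in Python rather than Plan 9 C, the function name escape_from_lines is
made up for this example, and it assumes lines end in a bare newline
(\n). It mimics the sed command from the quoted message: starting at
the first empty line, any line beginning with "From " gets a leading
space. It only ever handles raw bytes, so a byte such as 0xfc passes
through unchanged.

```python
def escape_from_lines(data: bytes) -> bytes:
    """Prefix body lines that start with b'From ' with a space.

    Works purely on bytes and never decodes them, so bytes that are
    not valid UTF-8 (e.g. 0xfc from ISO-8859-15) are copied verbatim.
    The name and interface are hypothetical, not part of upas.
    """
    out = []
    in_body = False  # becomes True at the first empty line
    for line in data.split(b"\n"):
        if not in_body and line == b"":
            # First empty line: headers end here, body starts.
            in_body = True
        elif in_body and line.startswith(b"From "):
            line = b" " + line
        out.append(line)
    # split/join on b"\n" round-trips the input exactly,
    # including a trailing newline if there was one.
    return b"\n".join(out)
```

To use it as a filter one would read all of standard input as bytes
(sys.stdin.buffer.read()) and write the result to sys.stdout.buffer,
again without any decoding step in between.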