From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
Date: Fri, 19 Oct 2007 14:07:36 -0400
From: "Russ Cox"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] pipeto.lib spool and encoding
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20071018201035.5EC931E8C22@holo.morphisms.net>
Topicbox-Message-UUID: d4cfeda6-ead2-11e9-9d60-3106f5b1d025

> > in /mail/lib/pipeto.lib, the line
> >
> >     sed '/^$/,$ s/^From / From /' >$TMP.msg
> >
> > needs to be replaced with a c program that does this
> > conversion without coercing its input text into utf-8.
> >
> > russ
>
> unfortunately, i think the patch is on the wrong track.
> sed isn't coercing it's input to utf-8. there's no active
> conversion going on. plan 9 programs assume utf-8 input,
> since plan 9 uses utf-8.

i said coerce, not convert. sed is treating its input as utf-8,
like most plan 9 programs, but raw mail messages might be some
other 8-bit ascii-compatible encoding. so the bytes that are not
valid utf-8 sequences are getting mangled by the coercion into a
Rune buffer.

> i think a better solution to this is to convert the incoming
> message to utf-8 first. there are likely more problems similar
> to this one as plan 9 tools make valid assumptions that upas doesn't
> honour.

most plan 9 tools are used on the upas presentation of a mailbox,
which *is* in utf-8. very few tools operate directly on the 8-bit
mail message. pipeto.lib is one of the few, and even there it just
works to get its input into an mbox and then invokes upas/fs.

attempting to perform any conversion of the raw message is a mistake.
you're almost guaranteed to lose some information, and with little
to no benefit (thanks to everything using upas/fs to access mail).

russ