From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Mon, 19 Oct 2009 09:14:41 -0400
To: 9fans@9fans.net
Message-ID: <25526834e4974523e25c09565df13029@brasstown.quanstro.net>
In-Reply-To: <>
References: <>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Subject: Re: [9fans] utf-8 text files from httpd
Topicbox-Message-UUID: 8a6cf4d6-ead5-11e9-9d60-3106f5b1d025

> Is the output of file(1) appropriate for this purpose?
> Shouldn't your sample file also be sent as UTF-8?

it should be.  for example, since

	; echo ☺ | file
	stdin: short UTF text	# sic

one would expect that echo ☺ | file -m would yield
text/plain; charset=utf-8.

> file(1) speaks only mime type but not charset.

file does sometimes return a character set.

	minooka; grep -n charset /sys/src/cmd/file.c | sed 1q
	594:	0xfeff0000,	0xffffffff,	"utf-32be\n",	"text/plain charset=utf-32be",

it doesn't make sense to me for file to be inconsistent.
if file emits character sets, it should always emit character
sets.  i'm not sure why the ';' is dropped.  its absence would
force a client to parse the output.

> it is difficult or impossible to determine charset from a few japanese
> letters.

plan 9 is a utf-8 system.  if we have files in another
character set that's not a proper subset of utf-8, most plan 9
tools will not work properly on them.

also, since it is hard to guess the charset of particular
japanese-encoded files, it would probably be good to force
their encoding with html decoration.

- erik
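
a minimal sketch of the consistency argued for above: always emit a charset, always with the ';', so a client never has to special-case the output. this is not how file(1) is implemented. the function names mime_with_charset and parse_mime, the bom table, and the application/octet-stream fallback are all assumptions made for illustration.

```python
import codecs
from typing import Optional, Tuple

# Byte-order marks checked before falling back to a UTF-8 validity test.
# This table is an illustrative assumption, not a copy of the one in file.c.
# UTF-32 must be tested before UTF-16: the UTF-32LE mark starts with the
# UTF-16LE mark.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]


def mime_with_charset(data: bytes) -> str:
    """Return a MIME type that always carries a charset for text."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return "text/plain; charset=" + name
    try:
        # ASCII is a proper subset of UTF-8, so it passes this test too.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 and no mark: do not guess at a legacy encoding.
        return "application/octet-stream"
    return "text/plain; charset=utf-8"


def parse_mime(line: str) -> Tuple[str, Optional[str]]:
    """Split 'type; charset=x' into (type, charset), tolerating a missing ';'."""
    head, sep, rest = line.strip().partition("charset=")
    mimetype = head.strip().rstrip(";").strip()
    charset = (rest.strip() or None) if sep else None
    return mimetype, charset


if __name__ == "__main__":
    print(mime_with_charset("☺\n".encode("utf-8")))
    print(parse_mime("text/plain charset=utf-32be"))
```

parse_mime shows the extra work a client takes on when the ';' is sometimes dropped: it has to accept both spellings. the sketch validates the whole buffer, so a sample cut off in the middle of a multibyte sequence would be reported as non-text.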