From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: Re: [9fans] utf-8 text files from httpd
Date: Mon, 19 Oct 2009 09:14:41 -0400
Message-ID: <25526834e4974523e25c09565df13029@brasstown.quanstro.net>
In-Reply-To: <fe41879c0910190300l51480646pf9630e90c6f30207@mail.gmail.com>
> Is the output of file(1) appropriate for this purpose?
> Shouldn't your sample file also be sent as UTF-8?
it should be. for example, since
	; echo ☺ | file
	stdin: short UTF text	# sic
one would expect that
	; echo ☺ | file -m
would yield text/plain; charset=utf-8.
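
to make the expectation concrete, here is a minimal sketch in python. it is not how file(1) is implemented; guess_text_type is a hypothetical helper, and the octet-stream fallback is my assumption. it only shows the rule "if the bytes are valid utf-8, always name the charset".

```python
def guess_text_type(data: bytes) -> str:
    """Return a MIME type for data, naming the charset whenever it is text.

    Hypothetical helper for illustration; not the logic of Plan 9 file(1).
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is reported as opaque bytes.
        return "application/octet-stream"
    # ASCII is a subset of UTF-8, so plain ASCII lands here too.
    return "text/plain; charset=utf-8"


if __name__ == "__main__":
    print(guess_text_type("☺\n".encode("utf-8")))
```

under these assumptions the smiley sample comes back as text/plain; charset=utf-8.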
> file(1) speaks only mime type but not charset.
file does sometimes return a character set.
	minooka; grep -n charset /sys/src/cmd/file.c | sed 1q
	594:	0xfeff0000, 0xffffffff, "utf-32be\n", "text/plain charset=utf-32be",
it doesn't make sense to me for file to be
inconsistent. if file emits character sets, it
should always emit character sets.
i'm not sure why the ';' between the mime type and the
charset is dropped. this would force a client to parse
the output rather than pass it through as a content-type.
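
as an illustration of the parsing a client would be stuck with, here is a rough python sketch. normalize_mime is a made-up name, and it assumes the parameters contain no whitespace or quoted strings, which holds for the strings quoted above but not for content-type values in general.

```python
def normalize_mime(s: str) -> str:
    """Rejoin a MIME type and its parameters with '; ' separators.

    Hypothetical helper. Assumes no parameter value contains whitespace,
    semicolons, or quoted strings.
    """
    # Treat both ';' and whitespace as separators, then rejoin uniformly.
    parts = s.replace(";", " ").split()
    return "; ".join(parts)


if __name__ == "__main__":
    print(normalize_mime("text/plain charset=utf-32be"))
```

a string that already has the ';' passes through unchanged, so the client can apply this to everything file emits. it would still be better for file to emit the separator itself.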
> it is difficult or impossible to determine charset from a few japanese
> letters.
plan 9 is a utf-8 system. if we have files in another
character set that's not a proper subset, most plan 9
tools will not work properly on them.
also, since it is hard to guess the charset of particular
japanese-encoded files, it would probably be good to
force their encoding with html decoration.
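
a sketch of what i mean by decoration, in python. wrap_as_html is a hypothetical helper, and it assumes the person publishing the file already knows its encoding. the page declares that encoding itself, so nothing has to be guessed.

```python
import html


def wrap_as_html(body: bytes, charset: str) -> bytes:
    """Wrap text of a known encoding in an HTML page that declares it.

    Hypothetical helper; `charset` must be supplied by whoever knows the
    file's real encoding, since it is not detected here.
    """
    # Decoding first fails early if the declared charset is wrong.
    text = body.decode(charset)
    head = '<meta http-equiv="Content-Type" content="text/html; charset=%s">' % charset
    page = "<html><head>%s</head><body><pre>%s</pre></body></html>\n" % (
        head,
        html.escape(text),
    )
    # Re-encode in the same charset that the meta tag announces.
    return page.encode(charset)


if __name__ == "__main__":
    sample = "日本語".encode("shift_jis")
    print(wrap_as_html(sample, "shift_jis"))
```

the other option on a utf-8 system is to convert such files to utf-8 once and serve them as that.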
- erik