9fans - fans of the OS Plan 9 from Bell Labs
From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: Re: [9fans] utf-8 text files from httpd
Date: Mon, 19 Oct 2009 09:14:41 -0400
Message-ID: <25526834e4974523e25c09565df13029@brasstown.quanstro.net>
In-Reply-To: <<fe41879c0910190300l51480646pf9630e90c6f30207@mail.gmail.com>>

> Is the output of file(1) appropriate for this purpose?
> Shouldn't your sample file also be sent as UTF-8?

it should be.  for example since
	; echo ☺ | file
	stdin: short UTF text	# sic
one would expect that echo ☺ | file -m
would yield text/plain; charset=utf-8.

> file(1) speaks only MIME type but not charset.

file does sometimes return a character set.

minooka;  grep -n charset /sys/src/cmd/file.c | sed 1q
594: 	0xfeff0000,	0xffffffff,	"utf-32be\n",
	"text/plain charset=utf-32be",

it doesn't make sense to me for file to be
inconsistent.  if file emits character sets, it
should always emit character sets.
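
as a rough illustration of "always emit a character set", here is a
minimal sketch in C.  it is not the code in file.c: the names isutf8 and
mimetext are hypothetical, and falling back to application/octet-stream
for input that is not well-formed utf-8 is an assumption made only for
this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from file.c.
 * Returns 1 if buf[0..n) is well-formed UTF-8, 0 otherwise.
 * Rejects overlong forms, surrogates, and values above U+10FFFF.
 */
static int
isutf8(const unsigned char *buf, size_t n)
{
	size_t i, j, len;
	unsigned long cp, min;
	unsigned char c;

	i = 0;
	while(i < n){
		c = buf[i];
		if(c < 0x80){			/* plain ASCII byte */
			i++;
			continue;
		}
		if((c & 0xE0) == 0xC0){
			len = 2; cp = c & 0x1F; min = 0x80;
		}else if((c & 0xF0) == 0xE0){
			len = 3; cp = c & 0x0F; min = 0x800;
		}else if((c & 0xF8) == 0xF0){
			len = 4; cp = c & 0x07; min = 0x10000;
		}else
			return 0;		/* stray continuation or invalid lead byte */
		if(n - i < len)
			return 0;		/* sequence truncated */
		for(j = 1; j < len; j++){
			if((buf[i+j] & 0xC0) != 0x80)
				return 0;
			cp = (cp << 6) | (buf[i+j] & 0x3F);
		}
		if(cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return 0;
		i += len;
	}
	return 1;
}

/*
 * Hypothetical: always pair the type with a charset when the bytes are
 * text we can name.  The octet-stream fallback is an assumption.
 */
const char*
mimetext(const unsigned char *buf, size_t n)
{
	if(isutf8(buf, n))
		return "text/plain; charset=utf-8";
	return "application/octet-stream";
}
```

ascii is a proper subset of utf-8, so under this sketch plain ascii
input is also reported with charset=utf-8.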

i'm not sure why the ';' is dropped.  this would force
a client to parse the output.
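
to make the cost concrete, this is roughly the parsing a client would
need in order to cope with both spellings.  a sketch only: getcharset is
a hypothetical name, and it assumes the parameter is written in lower
case as "charset=" with an unquoted value.

```c
#include <string.h>

/*
 * Hypothetical client-side helper.  Copies the charset value found in
 * a type string into out[0..outsz), whether or not a ';' precedes
 * "charset=".  Returns 1 on success, 0 if no charset is present or
 * the value does not fit in out.
 */
int
getcharset(const char *type, char *out, size_t outsz)
{
	const char *p;
	size_t n;

	p = strstr(type, "charset=");
	if(p == NULL || outsz == 0)
		return 0;
	p += strlen("charset=");
	n = strcspn(p, " \t\r\n;");	/* value ends at whitespace or ';' */
	if(n == 0 || n >= outsz)
		return 0;
	memcpy(out, p, n);
	out[n] = '\0';
	return 1;
}
```

with the ';' present, a client could hand the string to an http header
unchanged and skip this step.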

> it is difficult or impossible to determine charset from a few japanese
> letters.

plan 9 is a utf-8 system.  if we have files in another
character set that's not a proper subset, most plan 9
tools will not work properly on them.

also, since it is hard to guess the charset of particular
japanese-encoded files, it would probably be good to
force their encoding with html decoration.

- erik


