From: davidk@lysator.liu.se (David Kågedal)
Subject: Re: "Coding system"? Eh?
Date: 10 Sep 1998 14:45:06 +0200
Message-ID: <jp4sug6srx.fsf@sandra.lysator.liu.se>
In-Reply-To: =?ISO-8859-1?Q?Fran=E7ois?= Pinard's message of "09 Sep 1998 22:50:30 +-400"
François Pinard <pinard@iro.umontreal.ca> writes:
> davidk@lysator.liu.se (David Kågedal) writes:
>
> > Unicode defines a character set where LATIN-LETTER-A-WITH-UMLAUT has a
> > specific number (228 I believe), but Unicode also defines several
> > character encodings. There is UCS-2 where all characters occupy two
> > bytes. Then there is UTF-8 where most characters can be encoded using
> > one byte, while 'ä' needs at least two. Actually, all characters can
> > be encoded with, say, three bytes in UTF-8.
>
> You mean, all Unicode characters. ISO 10646 might need more than three,
> as UTF-8 is also available for ISO 10646.
True. I was talking about Unicode.
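The byte counts discussed above can be checked with a small sketch. It assumes Python 3 and its built-in "utf-8" codec, which implements the present-day definition of UTF-8; the helper name utf8_len and the sample characters are only illustrative and do not come from this thread.

```python
# Count the UTF-8 bytes needed for a few sample code points.
# Assumption: Python 3's built-in "utf-8" codec (modern UTF-8).

def utf8_len(ch: str) -> int:
    """Return the number of bytes the UTF-8 encoding of ch occupies."""
    return len(ch.encode("utf-8"))

samples = {
    "A": "plain ASCII letter",
    "ä": "code point 228 (U+00E4)",
    "\u20ac": "euro sign, still inside the BMP",
    "\uffff": "last code point of the BMP",
    "\U00010000": "first code point beyond the BMP",
}

for ch, note in samples.items():
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)  ({note})")
```

With this codec, every code point in the BMP fits in at most three bytes, while code points beyond the BMP take more.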
> > Unicode also defines UTF-7 which is so ugly that I won't say anything
> > further about it.
>
> Does Unicode now define UTF-7? It originated from the IETF, and UTF-7
> is specifically for MIME contexts, which Unicode does not address.
I might be wrong about the origin of UTF-7. But it's still ugly.
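For readers who want to judge the ugliness for themselves, here is a minimal sketch. It assumes Python 3, whose standard library ships a "utf-7" codec; the variable names are only illustrative.

```python
# Encode a non-ASCII character with UTF-7 and decode it again.
# Assumption: Python 3's standard "utf-7" codec.

text = "ä"
encoded = text.encode("utf-7")

# The encoded form uses only 7-bit ASCII bytes, which is the point of
# UTF-7 in mail contexts; the non-ASCII character becomes a "+...-"
# style base64 run.
print(encoded)

decoded = encoded.decode("utf-7")
print(decoded == text)
```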
> > Then ISO-10646, which is in principle a superset of Unicode (but does
> > not contain any more defined characters) [...]
>
> Some convergence happened, indeed, but the details are a bit more complex.
>
> > also defines UCS-4, where all characters are encoded using four bytes,
> > and UTF-16, where all characters are encoded using two bytes.
>
> I do not remember that ISO 10646 introduced UTF-16, I thought it was a
> Unicode invention, but once again, I'm no specialist and may easily be
> wrong. ISO 10646 redefined the BMP so there is room for UTF-16 coding,
> so ISO 10646 is aware of and compatible with Unicode on this. By the way,
> UTF-16 encodes characters using either two or four bytes.
The difference between UTF-16 and UCS-2 is that UTF-16 can encode some of
the characters outside the Unicode range (BMP). So I guess Unicode has
no need for UTF-16.
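The two-or-four-byte behaviour of UTF-16 can be illustrated with a short sketch. It assumes Python 3 and its built-in "utf-16-be" codec; the helper name utf16_units is hypothetical and chosen only for this example.

```python
# Split a character's UTF-16 encoding into 16-bit code units.
# Assumption: Python 3's built-in "utf-16-be" codec (big-endian, no BOM).

def utf16_units(ch: str) -> list:
    """Return the 16-bit code units of the UTF-16 encoding of ch."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# A BMP character needs a single 16-bit unit (two bytes), as in UCS-2.
bmp_units = utf16_units("ä")

# A character outside the BMP needs a surrogate pair (four bytes).
supplementary_units = utf16_units("\U00010000")

print([hex(u) for u in bmp_units])
print([hex(u) for u in supplementary_units])
```

For characters inside the BMP the result is the same single unit that UCS-2 would use; only characters outside the BMP produce a pair.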
--
David Kågedal <davidk@lysator.liu.se> http://www.lysator.liu.se/~davidk/