Gnus development mailing list
From: davidk@lysator.liu.se (David Kågedal)
Subject: Re: "Coding system"?  Eh?
Date: 10 Sep 1998 14:45:06 +0200
Message-ID: <jp4sug6srx.fsf@sandra.lysator.liu.se>
In-Reply-To: François Pinard's message of "09 Sep 1998 22:50:30 +-400"

François Pinard <pinard@iro.umontreal.ca> writes:

> davidk@lysator.liu.se (David Kågedal) writes:
> 
> > Unicode defines a character set where LATIN-LETTER-A-WITH-UMLAUT has a
> > specific number (228 I believe), but Unicode also defines several
> > character encodings.  There is UCS-2 where all characters occupy two
> > bytes.  Then there is UTF-8 where most characters can be encoded using
> > one byte, while 'ä' needs at least two.  Actually, all characters can
> > be encoded with, say, three bytes in UTF-8.
> 
> You mean, all Unicode characters.  ISO 10646 might need more than three,
> as UTF-8 is also available for ISO 10646.

True.  I was talking about Unicode.
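
To make the byte counts above concrete, here is a minimal Python sketch.
The helper name utf8_len and the sample characters are only illustrative
assumptions, not anything from the standards.  It shows that 'ä' has
number 228, needs two bytes in UTF-8, and that characters up to U+FFFF
need at most three bytes, while a character beyond that range needs four.

```python
import unicodedata

def utf8_len(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

# Hypothetical sample set: ASCII, Latin-1, another BMP character,
# and one character outside the BMP.
samples = ["a", "ä", "€", "\U00010348"]

for ch in samples:
    # Print code point, official character name, and UTF-8 length.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {utf8_len(ch)} byte(s)")
```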

> > Unicode also defines UTF-7 which is so ugly that I won't say anything
> > further about it.
> 
> Does Unicode now define UTF-7?  It originated from the IETF, and UTF-7
> is specifically for MIME contexts, which Unicode does not address.

I might be wrong about the origin of UTF-7.  But it's still ugly.
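
For the curious, a small Python sketch of what UTF-7 output looks like,
assuming Python's built-in "utf-7" codec is a fair representative.  The
variable names are only illustrative.  Non-ASCII characters are wrapped
in a "+...-" block of modified base64, so the result stays pure 7-bit
ASCII.

```python
text = "ä"

# Encode with Python's built-in UTF-7 codec.
encoded = text.encode("utf-7")
print(encoded)

# Decoding gives back the original string.
decoded = encoded.decode("utf-7")
print(decoded == text)
```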

> > Then ISO-10646, which is in principle a superset of Unicode (but does
> > not contain any more defined characters) [...]
> 
> Some convergence happened, indeed, but the details are a bit more complex.
> 
> > also defines UCS-4, where all characters are encoded using four bytes,
> > and UTF-16, where all characters are encoded using two bytes.
> 
> I do not remember that ISO 10646 introduced UTF-16, I thought it was a
> Unicode invention, but once again, I'm no specialist and may easily be
> wrong.  ISO 10646 redefined the BMP so there is room for UTF-16 coding,
> so ISO 10646 is aware and compatible with Unicode on this.  By the way,
> UTF-16 encodes characters using either two or four bytes.

The difference between UTF-16 and UCS-2 is that UTF-16 can encode some
of the characters outside the Unicode range (BMP).  So I guess Unicode
has no need for UTF-16.
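
As a rough illustration of the two-or-four-byte point, here is a Python
sketch of how UTF-16 maps a code point to 16-bit units.  The function
name utf16_units is hypothetical, and the sketch does no validation (it
does not reject code points in the surrogate range or above U+10FFFF).
Code points in the BMP become one unit, the same as UCS-2; anything
above U+FFFF is split into a surrogate pair.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the 16-bit code units UTF-16 uses for code point cp."""
    if cp < 0x10000:
        # Inside the BMP: a single unit, identical to UCS-2.
        return [cp]
    # Outside the BMP: split the 20-bit offset into a surrogate pair.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]

for cp in (0x00E4, 0x10348):
    units = utf16_units(cp)
    print(f"U+{cp:04X}: {len(units) * 2} bytes,",
          " ".join(f"{u:04X}" for u in units))
```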

-- 
David Kågedal        <davidk@lysator.liu.se> http://www.lysator.liu.se/~davidk/

