Gnus development mailing list
From: davidk@lysator.liu.se (David Kågedal)
Subject: Re: "Coding system"?  Eh?
Date: 07 Sep 1998 17:12:40 +0200
Message-ID: <jpd8982byv.fsf@tinget.lysator.liu.se>
In-Reply-To: Lars Magne Ingebrigtsen's message of "05 Sep 1998 22:07:43 +0200"

Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> Michael Welsh Duggan <md5i@cs.cmu.edu> writes:
> 
> > No, not really.  A character set is merely a set of characters.
> > latin-1, etc, are often called character sets because they use the
> > same number of characters as extended ASCII, etc.  A coding-system is
> > just that: a coding-system.  The characters could be encoded any which
> > way (including encrypted!).  For example, old-jis uses escapes around
> > sequences of 7-bit characters.  This is an encoding, which you can
> > display using a character set, but not a character set in and of
> > itself.
> 
> All texts consist of characters (from some character set) encoded
> (using some coding system).  iso-8859-1, for instance, represents the
> character LATIN-LETTER-A-WITH-UMLAUT ("ä") with one byte that contains
> the number 0xe4.  The same letter encoded in a different charset (say,
> Unicode) would occupy two bytes.  Other character sets use multiple
> bytes to represent characters, like iso-2022-jp.

Now you are mixing things.  The phrase "encoded in a different charset
(say, Unicode)" is a semantic error.

Unicode defines a character set where LATIN-LETTER-A-WITH-UMLAUT has a
specific number (228, i.e. 0xE4), but Unicode also defines several
character encodings.  There is UCS-2, where all characters occupy two
bytes.  Then there is UTF-8, where ASCII characters are encoded using
one byte each, while 'ä' needs two.  No character in the 16-bit range
needs more than three bytes in UTF-8.  Unicode also defines UTF-7,
which is so ugly that I won't say anything further about it.  Then
ISO-10646, which is in principle a superset of Unicode (but does not
contain any more defined characters), also defines UCS-4, where all
characters are encoded using four bytes, and UTF-16, where characters
are encoded using two bytes, or four (a surrogate pair) for anything
beyond the 16-bit range.
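To make the byte counts concrete, here is a small sketch in Python 3
(my choice; nothing in this thread uses it) that encodes U+00E4 with
Python's built-in codecs.  It assumes "utf-32-be" as a stand-in for
UCS-4 and "utf-16-be" for UCS-2, which agree for a character in the
16-bit range; the "-be" variants avoid a byte-order mark.  It needs
Python 3.9 or later.

```python
# Sketch: one abstract character, U+00E4 ("ä"), run through several
# encodings using Python's built-in codecs.  The "-be" codec variants are
# used so that no byte-order mark is prepended to the output.

CHAR = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

ENCODINGS = ["latin-1", "utf-8", "utf-16-be", "utf-32-be"]


def encoded_sizes(char: str) -> dict[str, bytes]:
    """Map each encoding name to the bytes it produces for `char`."""
    return {name: char.encode(name) for name in ENCODINGS}


if __name__ == "__main__":
    print("code point:", ord(CHAR))
    for name, data in encoded_sizes(CHAR).items():
        # bytes.hex(sep) prints the bytes space-separated (Python 3.8+)
        print(f"{name:10} {len(data)} byte(s): {data.hex(' ')}")
```

Run as a script, it reports code point 228, then 1 byte (e4) for
latin-1, 2 bytes (c3 a4) for UTF-8, 2 bytes (00 e4) for UTF-16 and
4 bytes (00 00 00 e4) for UTF-32: same character, four byte sequences.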

But the character set is always the same: every character defined so
far has a number in the range 0 to 65535.
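A second Python 3 sketch, again only an illustration under my own
assumptions, checks that the numbers do not depend on the encoding:
every code point from 0 to 65535 is encoded with UTF-8, UTF-16 and
UTF-32 and decoded again.  The surrogate block U+D800 to U+DFFF is
skipped, since it is reserved for UTF-16 and Python's strict codecs
refuse to encode it by itself.  The helper names bmp_code_points and
check_bmp are made up for this example.

```python
# Sketch: code points are independent of the encoding.  Every code point
# in U+0000..U+FFFF (minus surrogates) is encoded and decoded with several
# encodings; the number must come back unchanged.  We also record the
# longest UTF-8 sequence needed in this range.

ENCODINGS = ("utf-8", "utf-16-be", "utf-32-be")


def bmp_code_points():
    """Yield every 16-bit code point except the surrogate range."""
    for cp in range(0x10000):
        if not 0xD800 <= cp <= 0xDFFF:
            yield cp


def check_bmp() -> int:
    """Return the longest UTF-8 length seen; raise ValueError on a failed round trip."""
    longest = 0
    for cp in bmp_code_points():
        ch = chr(cp)
        for name in ENCODINGS:
            if ord(ch.encode(name).decode(name)) != cp:
                raise ValueError(f"round trip failed for U+{cp:04X} in {name}")
        longest = max(longest, len(ch.encode("utf-8")))
    return longest


if __name__ == "__main__":
    print("longest UTF-8 sequence in the 16-bit range:", check_bmp(), "bytes")
```

check_bmp() should return 3, matching the "at most three bytes" figure
for UTF-8 in the 16-bit range.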

> When one talks about character sets (in, say, MIME) one talks about
> encoded character sets.  Abstract character sets aren't all that
> interesting when fiddling with data.  iso-8859-1, which MULE calls a
> coding system, is something everyone else calls a character set.  The
> same with old-jis and iso-2022-jp.

ISO 8859-1 is both a character set and an encoding (one-to-one from
character to byte), I believe.  But I'm not sure how it is defined.
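A Python 3 sketch can at least show how one common implementation
treats ISO 8859-1: Python's "latin-1" codec decodes each of the 256
byte values to a distinct character whose number equals the byte value,
and encodes it back to the same byte.  This says nothing about how the
ISO standard itself is worded (strictly, ISO 8859-1 leaves the
control-code positions unassigned, while the codec maps all 256 bytes).
The function names are hypothetical.

```python
# Sketch: ISO 8859-1 as seen through Python's "latin-1" codec.
# Each byte value 0..255 decodes to exactly one character, the mapping is
# reversible, and the character's code point equals the byte value.


def latin1_table() -> dict[int, str]:
    """Decode every possible single byte; return byte value -> character."""
    return {b: bytes([b]).decode("latin-1") for b in range(256)}


def is_one_to_one() -> bool:
    """Check the mapping is reversible and that each code point equals its byte."""
    table = latin1_table()
    distinct = len(set(table.values())) == 256
    reversible = all(ch.encode("latin-1") == bytes([b]) for b, ch in table.items())
    same_number = all(ord(ch) == b for b, ch in table.items())
    return distinct and reversible and same_number


if __name__ == "__main__":
    table = latin1_table()
    print("byte 0xe4 ->", repr(table[0xE4]), "code point", ord(table[0xE4]))
    print("one-to-one:", is_one_to_one())
```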

-- 
David Kågedal        <davidk@lysator.liu.se> http://www.lysator.liu.se/~davidk/

