From: davidk@lysator.liu.se (David Kågedal)
Newsgroups: gmane.emacs.gnus.general
Subject: Re: "Coding system"? Eh?
Date: 10 Sep 1998 14:45:06 +0200
Original-To: ding@gnus.org
In-Reply-To: François Pinard's message of "09 Sep 1998 22:50:30 +-400"
X-Mailer: Gnus v5.6.24/Emacs 19.34

François Pinard writes:

> davidk@lysator.liu.se (David Kågedal) writes:
>
> > Unicode defines a character set where LATIN-LETTER-A-WITH-UMLAUT has a
> > specific number (228 I believe), but Unicode also defines several
> > character encodings.  There is UCS-2 where all characters occupy two
> > bytes.  Then there is UTF-8 where most characters can be encoded using
> > one byte, while 'ä' needs at least two.  Actually, all characters can
> > be encoded with, say, three bytes in UTF-8.
>
> You mean, all Unicode characters.  ISO 10646 might need more than three,
> as UTF-8 is also available for ISO 10646.

True.  I was talking about Unicode.

> > Unicode also defines UTF-7 which is so ugly that I won't say anything
> > further about it.
>
> Does Unicode now define UTF-7?  It originated from the IETF, and UTF-7
> is specifically for MIME contexts, which Unicode does not address.

I might be wrong about the origin of UTF-7.  But it's still ugly.

> > Then ISO-10646, which is in principle a superset of Unicode (but does
> > not contain any more defined characters) [...]
>
> Some convergence happened, indeed, but the details are a bit more complex.
>
> > also defines UCS-4, where all characters are encoded using four bytes,
> > and UTF-16, where all characters are encoded using two bytes.
>
> I do not remember that ISO 10646 introduced UTF-16, I thought it was a
> Unicode invention, but once again, I'm no specialist and may easily be
> wrong.  ISO 10646 redefined the BMP so there is room for UTF-16 coding,
> so ISO 10646 is aware of and compatible with Unicode on this.  By the way,
> UTF-16 encodes characters using either two or four bytes.

The difference between UTF-16 and UCS-2 is that UTF-16 can encode some of
the characters outside the Unicode range (BMP).
So I guess Unicode has no need for UTF-16.

-- 
David Kågedal
http://www.lysator.liu.se/~davidk/
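
For anyone who wants to check the byte counts discussed in this thread, here is a minimal Python sketch.  It is only an illustration: the helper name `byte_lengths` and the sample characters are assumptions made for this example, not anything from the messages above.  Python's codecs follow the present-day definitions of UTF-8, UTF-16 and UTF-32, so the numbers reflect today's standards rather than the 1998 state of Unicode and ISO 10646.

```python
def byte_lengths(ch):
    """Return the code point and encoded sizes of one character.

    Hypothetical helper, written only for this illustration.
    """
    return {
        "code_point": ord(ch),
        "utf-8": len(ch.encode("utf-8")),
        # Big-endian variants are used so no byte-order mark is counted.
        "utf-16": len(ch.encode("utf-16-be")),
        # UTF-32 has the same four-byte layout as UCS-4 for a single character.
        "utf-32": len(ch.encode("utf-32-be")),
    }


# 'a' and 'ä' (code point 228), a three-byte UTF-8 character from the BMP,
# and the first code point beyond the BMP.
for ch in ("a", "\u00e4", "\u20ac", "\U00010000"):
    print(repr(ch), byte_lengths(ch))
```

The last sample lies outside the BMP, which is where UTF-16 switches from two bytes to four (a surrogate pair) and where UCS-2 has no representation at all.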