From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/21005
Path: main.gmane.org!not-for-mail
From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.gnus.general
Subject: Re: More charset things
Date: Fri, 5 Feb 1999 09:47:18 +0900 (JST)
Sender: owner-ding@hpc.uh.edu
Message-ID: <14010.16278.215333.623477@tanko.sk.tsukuba.ac.jp>
References: <87d83qkyjf.fsf@pc-hrvoje.srce.hr> <87ognahyoh.fsf@pc-hrvoje.srce.hr>
NNTP-Posting-Host: coloc-standby.netfonds.no
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: main.gmane.org 1035159194 20762 80.91.224.250 (21 Oct 2002 00:13:14 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 21 Oct 2002 00:13:14 +0000 (UTC)
Return-Path:
Original-Received: from karazm.math.uh.edu (karazm.math.uh.edu [129.7.128.1]) by sclp3.sclp.com (8.8.5/8.8.5) with ESMTP id TAA25595 for ; Thu, 4 Feb 1999 19:48:39 -0500 (EST)
Original-Received: from sina.hpc.uh.edu (lists@Sina.HPC.UH.EDU [129.7.3.5]) by karazm.math.uh.edu (8.9.1/8.9.1) with ESMTP id SAB03128; Thu, 4 Feb 1999 18:48:05 -0600 (CST)
Original-Received: by sina.hpc.uh.edu (TLB v0.09a (1.20 tibbs 1996/10/09 22:03:07)); Thu, 04 Feb 1999 18:48:05 -0600 (CST)
Original-Received: from sclp3.sclp.com (root@sclp3.sclp.com [204.252.123.139]) by sina.hpc.uh.edu (8.7.3/8.7.3) with ESMTP id SAA15929 for ; Thu, 4 Feb 1999 18:47:55 -0600 (CST)
Original-Received: from localhost (root@tanko.sk.tsukuba.ac.jp [130.158.99.155]) by sclp3.sclp.com (8.8.5/8.8.5) with ESMTP id TAA25571 for ; Thu, 4 Feb 1999 19:47:45 -0500 (EST)
Original-Received: by localhost id m108ZQU-00013ZC (Debian Smail-3.2.0.102 1998-Aug-2 #2); Fri, 5 Feb 1999 09:47:18 +0900 (JST)
Original-To: ding@gnus.org, xemacs-mule@xemacs.org
In-Reply-To:
X-Mailer: VM 6.64 under 21.0 "Poitou60" XEmacs Lucid (beta60)
Precedence: list
X-Majordomo: 1.94.jlt7
Xref: main.gmane.org gmane.emacs.gnus.general:21005
X-Report-Spam:
http://spam.gmane.org/gmane.emacs.gnus.general:21005

>>>>> "Lars" == Lars Magne Ingebrigtsen writes:

    Lars> Hrvoje Niksic writes:

    >> MULE is little else than a Japanese version of Emacs, and it
    >> appears that the Japanese are not interested in Unicode.  So it

The MULE development group is almost entirely Japanese, including the
people implementing Devanagari (for sure) and Arabic and Ethiopic
(IIRC).  Not surprisingly, the tuning (and tuning is absolutely
necessary; the linguists don't know enough about language for charset
guessing and the like to be more than heuristic) is best for Japanese,
and bugs for non-Japanese languages don't get found and fixed quickly.

But MULE is the only truly multilingual platform there is at the
moment, to the best of my knowledge; Unicode doesn't satisfy the needs
of lots of people, and is not easily extensible without changing the
standard.  MULE is.  MULE is more than a Japanese version of Emacs.

The Japanese are divided on Unicode; some are vehemently opposed,
others are interested.  There don't seem to be any strong advocates,
though.

    >> wasn't implemented.  I'm not sure about FSF, but for XEmacs, I
    >> know of no plans to implement it in the near future.

    Lars> A partial implementation of utf-mumble was posted recently
    Lars> somewhere by someone.  (Could I possible get any more
    Lars> vague?)

So I'm Cc'ing this to the xemacs-mule list.

Morioka-san ported (IIRC) a Lisp-level implementation of UTF-8.  The
attachments were broken on the ML (so Steve never was able to look at
it); I'll restore from the archive the working (I hope) copy I got
from Morioka.  Martin Buchholz believes that since the tables are in
Lisp, the performance impact will be huge.

    Lars> I asked before for a likely book that would introduce me to
    Lars> the basic concepts, and someone (Stephen Turnbull?) told me,
    Lars> but then I forgot.

Prices are vague recollections; the list is in decreasing order of
importance for basic understanding:

Ken Lunde.
Chinese, Japanese, Korean and Vietnamese Information Processing.
O'Reilly & Associates.  Probably the most useful single volume,
although it doesn't cover single-octet encodings.

ISO.  ISO-2022: Extension Techniques for Coded Character Sets.  US$75.

Unicode Consortium.  The Unicode Standard, v2.x.  About US$70 from
Amazon.

ISO.  ISO-10646: Universal Multi-octet Character Set Encoding
Standard.  About US$125.  Don't bother unless you've got extra money;
the Unicode Standard is much more complete and readable.  All that
ISO-10646 adds is 4-octet encoding, which is presently useless, and it
is very likely that any UTF-8 .

I don't know of any textbooks on character set stuff, though there
must be some somewhere.  Lunde's book will have a very extensive
bibliography.

-- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
__________________________________________________________________________
__________________________________________________________________________
What are those two straight lines for?  "Free software rules."