From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
X-Seq: 20439
Cc: Zsh-workers
In-reply-to: <20041001184122.GA9094@fargo>
From: Oliver Kiddle
References: <20041001184122.GA9094@fargo>
To: David Gómez
Subject: Re: UTF-8 support
Date: Fri, 01 Oct 2004 21:46:05 +0200
Message-ID:
<23473.1096659965@trentino.logica.co.uk>

David Gómez wrote:
> So i conclude from your response that nobody is working on it ;).
> I understand the time problem, everybody is short on time, including

Nothing has been done. A few people may have done some work that was
never posted. I got as far as reading up, thinking about what the right
approach would be, and adding support for printing characters given
their Unicode code point, like this:

  echo '\u20ac'

It seemed a good place to start because it'll be useful for testing.
Unfortunately, I'm very short on time for the rest of this year.

> But i need help to know where to start. What parts of zsh would need
> to be worked on, only zle? Is there already, some kind of, although

Most parts of the source will need work, but it is possible to add
support in individual areas. So don't start with completion; find
something simple like the print builtin (in particular the -c and -C
options). Builtins in general are simple because they are relatively
self-contained. If you try to attack zle first, you'll just get fed up
with it being too hard.

Once you've got something simple like print done, another simple task
would be to add a Test/U01 test and make it search for a UTF-8 locale
($langinfo[CODESET] in the langinfo module will help) and use it for
LC_CTYPE.

> minimal, support for utf-8? Also, if you know from some documentation
> about zsh internals, besides from source ;), please point me to it.

The source and comments are the only documentation I know of, but you
can always ask on the list.

Do you know much about Unicode/UTF-8?
For the minimum, read
http://www.joelonsoftware.com/articles/Unicode.html
and then read http://www.cl.cam.ac.uk/~mgk25/unicode.html

In my opinion it would be sensible to support multibyte encodings in
general and not just UTF-8. Doing this isn't much effort beyond handling
UTF-8 if we assume basic ASCII compatibility and don't worry about
stateful encodings. There are a few characters which are defined to
display as double width even in proportional fonts so keep that in mind.

You can detect whether UTF-8 is enabled with the C library's locale
functions but we shouldn't need to: functions such as mbrlen do all the
work for us.

Once we've got a few basic areas working, we might want to think about
whether there are any common constructs we should create general
functions for in utils.c.

Oliver
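
To illustrate the point about mbrlen doing the work: here is a rough
sketch of counting characters rather than bytes without ever checking
for UTF-8 explicitly. The function name mb_count_chars is made up for
this example and is not anything in the zsh source; it also assumes a
non-stateful, ASCII-compatible encoding as discussed above.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of zsh): count the characters, not
 * bytes, in s according to the current LC_CTYPE locale.
 * Returns (size_t)-1 if s holds an invalid or truncated sequence.
 */
size_t mb_count_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t count = 0;

    memset(&state, 0, sizeof state);    /* start in the initial state */

    while (remaining > 0) {
        /* mbrlen reports how many bytes the next character occupies */
        size_t len = mbrlen(s, remaining, &state);

        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (len == 0)
            break;                      /* NUL character: end of string */

        s += len;
        remaining -= len;
        count++;
    }
    return count;
}
```

The caller would be expected to run setlocale(LC_CTYPE, "") once at
startup. In a UTF-8 locale the three bytes "\xe2\x82\xac" (the euro sign
from the echo example) should then count as one character, and the same
code should cope with other multibyte encodings the C library knows
about.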