X-Seq: 24811
Date: Mon, 14 Apr 2008 13:00:56 +0100
From: Peter Stephenson
To: "Zsh Hackers' List"
Subject: Re: PATCH: (large) initial support for combining characters in ZLE.
Message-ID: <20080414130056.48b8e05a@news01>
In-Reply-To: <20080413175442.0e95a241@pws-pc>
References: <20080413175442.0e95a241@pws-pc>
Organization: CSR

Documentation changes, some of them somewhat overdue.

I wonder if it's time to merge the FAQ into the main documentation as
zshfaq.1?

Index: Doc/Zsh/roadmap.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/roadmap.yo,v
retrieving revision 1.10
diff -u -r1.10 roadmap.yo
--- Doc/Zsh/roadmap.yo 1 Feb 2008 19:59:48 -0000 1.10
+++ Doc/Zsh/roadmap.yo 14 Apr 2008 11:50:10 -0000
@@ -44,6 +44,13 @@
 tt(HISTSIZE) and tt(SAVEHIST) in ifzman(zmanref(zshparam))\
 ifnzman(noderef(Parameters Used By The Shell)).
 
+The shell now supports the UTF-8 character set (and also others if
+supported by the operating system). This is (mostly) handled transparently
+by the shell, but the degree of support in terminal emulators is variable.
+There is some discussion of this in the shell FAQ,
+http://zsh.dotsrc.org/FAQ/ . Note in particular that for combining
+characters to be handled the option tt(COMBINING_CHARS) needs to be set.
+
 subsect(Completion)
 
 Completion is a feature present in many shells. It allows the user to
Index: Etc/FAQ.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Etc/FAQ.yo,v
retrieving revision 1.37
diff -u -r1.37 FAQ.yo
--- Etc/FAQ.yo 31 Mar 2008 15:03:11 -0000 1.37
+++ Etc/FAQ.yo 14 Apr 2008 11:50:14 -0000
@@ -126,11 +126,11 @@
 4.5. How do I get started with programmable completion?
 4.6. Suppose I want to complete all files during a special completion?
 
-Chapter 5: Multibyte input
+Chapter 5: Multibyte input and output
 5.1. What is multibyte input?
-5.2. How does zsh handle multibyte input?
-5.3. How do I ensure multibyte input works on my system?
+5.2. How does zsh handle multibyte input and output?
+5.3. How do I ensure multibyte input and output work on my system?
 5.4. How can I input characters that aren't on my keyboard?
 
 Chapter 6: The future of zsh
@@ -1961,7 +1961,7 @@
   such as expansion or approximate completion.
 
 
-chapter(Multibyte input)
+chapter(Multibyte input and output)
 label(c5)
 
 sect(What is multibyte input?)
@@ -2012,7 +2012,7 @@
   in those formats.)
 
 
-sect(How does zsh handle multibyte input?)
+sect(How does zsh handle multibyte input and output?)
 
   Until version 4.3, zsh didn't handle multibyte input properly at all.
   Each octet in a multibyte character would look to the shell like a
@@ -2021,50 +2021,44 @@
   cause all sorts of odd effects. (It was possible to edit in zsh using
   single-byte extensions of ASCII such as the ISO 8859 family, however.)
 
-  From version 4.3, multibyte input is handled in the line editor if zsh
-  has been compiled with the appropriate definitions. This will happen
-  automatically if the compiler defines __STDC_ISO_10646__, which is true
-  for many recent GNU-based systems. On other systems you must configure
-  zsh with the argument --enable-multibyte to configure. Explicit use of
-  --enable-multibyte should work on many other recent UNIX systems; if it
-  works on yours, and that's not mentioned in the shell documentation,
-  please report this to zsh-workers@sunsite.dk, and if it doesn't but you
-  can work out why not we'd also be interested in hearing.
-
-  (The reason for the test for __STDC_ISO_10646__ is that its presence
-  happens to indicate that the required library support is likely to be
-  present, short-circuiting a large number of configuration tests. This
-  isn't strictly guaranteed, since the definition indicates the rather more
-  limited fact that the wide character representation used internally by
-  the shell is Unicode. However, in practice such systems provide the
-  right level of support for zsh to use. It would be better to test
-  individually for the library features the shell needs; unfortunately
-  there are a lot of them.)
-
-  You can test if multibyte handling is compiled into your version of the
-  shell by running:
-  verb(
-    (bindkey -m)
-  )
-  which should output a warning:
-  verb(
-    bindkey: warning: `bindkey -m' disables multibyte support
-  )
-  If it doesn't, you don't have multibyte support in your shell. The
-  parentheses are there to run the command in a subshell, which protects
-  your interactive shell from the effects being warned about.
-
-  Multibyte strings are not yet handled anywhere else in the shell. This
-  means, for example, patterns treat multibyte characters as a set of single
-  octets and the ${#var} syntax counts octets, not characters. There will
-  probably be new syntax to ensure that zsh can work both in its traditional
-  way as well as when interpreting multibyte characters.
+  From version 4.3.4, multibyte input is handled in the line editor if zsh
+  has been compiled with the appropriate definitions, and is automatically
+  activated. This is indicated by the option tt(MULTIBYTE), which is
+  set by default on shells that support multibyte mode. Hence you
+  can test this with a standard option test: `tt([[ -o multibyte ]])'.
+
+  The tt(MULTIBYTE) option affects the entire shell: parameter expansion,
+  pattern matching, etc. count valid multibyte character strings as a
+  single character. You can unset the option locally in a function to
+  revert to single-byte operation.
+
+  Note that if the shell is emulating a Bourne shell the tt(MULTIBYTE)
+  option is unset by default. This allows various POSIX modes to
+  work normally (POSIX does not deal with multibyte characters). If
+  you use a "sh" or "ksh" emulation interactively you should probably
+  set the tt(MULTIBYTE) option.
+
+  The other option that affects multibyte support is tt(COMBINING_CHARS),
+  new in version 4.3.7. When this is set, any zero-length punctuation
+  characters that follow an alphanumeric character (the base character) are
+  assumed to be modifications (accents etc.) to the base character and to
+  be displayed within the same screen area as the base character. As not
+  all terminals handle this, even if they correctly display the base
+  multibyte character, this option is not on by default. The KDE terminal
+  emulator tt(konsole) is known to handle combining characters.
+
+  The tt(COMBINING_CHARS) option only affects output; combining characters
+  may always be input, but when the option is off will be displayed
+  specially. By default this is as a code point (the index of the
+  character in the character set) between angle brackets, usually
+  in inverse video. Highlighting of such special characters can
+  be modified using the new array parameter tt(zle_highlight).
 
 
-sect(How do I ensure multibyte input works on my system?)
+sect(How do I ensure multibyte input and output work on my system?)
 
   Once you have a version of zsh with multibyte support, you need to
-  ensure the envivronment is correct. We'll assume you're using UTF-8.
+  ensure the environment is correct. We'll assume you're using UTF-8.
 
   Many modern systems may come set up correctly already. Try one of
   the editing widgets described in the next section to see.
@@ -2163,6 +2157,9 @@
   however, using UTF-8 massively extends the number of valid characters
   that can be produced.
 
+  See also url(http://www.cl.cam.ac.uk/~mgk25/unicode.html#input)(http://www.cl.cam.ac.uk/~mgk25/unicode.html#input)
+  for general information on entering Unicode characters from a keyboard.
+
 
 
 chapter(The future of zsh)
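
For anyone wanting to try the behaviour the new FAQ text describes, here is a minimal sketch of the two idioms it mentions: the standard option test and unsetting MULTIBYTE locally in a function. It is not part of the patch. The function names (multibyte_status, char_length, byte_length) are made up for illustration, and the sketch assumes zsh 4.3.4 or later built with multibyte support.

```shell
# Hypothetical helper names; an illustrative sketch, not part of the patch.
# Assumes zsh 4.3.4 or later with multibyte support compiled in.

# Report whether multibyte mode is active, using the standard option test.
multibyte_status() {
  if [[ -o multibyte ]]; then
    echo on
  else
    echo off
  fi
}

# Length of the first argument under the caller's current options:
# this counts characters when MULTIBYTE is set.
char_length() {
  echo ${#1}
}

# Length of the first argument in octets. MULTIBYTE is unset only inside
# this function; LOCAL_OPTIONS restores the caller's setting on return.
byte_length() {
  setopt localoptions nomultibyte
  echo ${#1}
}
```

In a UTF-8 locale with MULTIBYTE set, passing a single character that is encoded as two octets should make char_length and byte_length disagree, which is a quick way to check that the option really is in effect outside the line editor.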