From mboxrd@z Thu Jan 1 00:00:00 1970
X-Seq: 20757
To: Zsh hackers list
Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
In-reply-to: <20050129034740.GA21742@scowler.net>
References: <200501261806.j0QI6Q2d021854@news01.csr.com> <20050129034740.GA21742@scowler.net>
Date: Sun, 30 Jan 2005 01:07:53 +0000
From: Peter Stephenson
Message-Id: <20050130010754.6F985863A@pwstephenson.fsnet.co.uk>

Clint Adams wrote:
> > I've left last_isearch since it's not clear what is to become of it
> > yet. Fixing doisearch isn't going to be great fun (240 lines, 2
> > comments). It'll have to wait until we decide about input.
>
> What needs deciding?

At what stage we turn a character from read() into a wide character.

I argued before that key bindings should still use ordinary character
strings to avoid breaking existing bindings. Somewhere before we insert
a character in the line we need to accumulate bytes from multibyte
characters where necessary.

I thought of the following: self-insert could take a single character,
as at present, and then test if it was the initial part of a multibyte
character. If it was, it could read the rest; we might need a timeout
to avoid an infinite hang on systems that didn't do multibyte input
properly, which is potentially quite a lot of them. This would allow
you to bind all 8-bit characters with the top bit set to self-insert
and voila, multibyte character input with the property (as in UTF-8)
that the 7-bit subset is ASCII is now completely handled, but with the
choice of whether to do so or keep old 8-bit bindings left to users.

This leaves other calls to getkey() and other low-level key handling
routines. Some might need the same mechanism; isearch is an example,
because some keys are interpreted while some are inserted into the
search string. A further complication is that when searching the
history we might well want to keep the history lines as multibyte
strings; then the search string remains in that format, too. As this
example indicates I think each case will need considering on its
merits.

In addition to getkey() and friends, there is the related matter of the
variable lastchar. Currently this is a single character; I'm not yet
100% sure whether we can keep this, or promote it to a wchar_t, or
whether we might need both types. I fear it may be the last.
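
For illustration only, here is a rough sketch of the "read the rest,
with a timeout" idea described above for self-insert. It is not zsh
code. It assumes the input encoding is UTF-8 and that input arrives on
a plain file descriptor; the names utf8_seq_len() and collect_char()
and the pushback argument are hypothetical, invented for this example.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: total length in bytes of the UTF-8 sequence that
 * starts with `lead`, or 0 if `lead` cannot start one (it is a
 * continuation byte or an invalid lead byte). */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;                       /* 7-bit ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 0;
}

/* Hypothetical helper: `first` has already been read from `fd`.  Store it
 * in buf (room for 4 bytes) and try to read the remaining bytes of the
 * character, waiting at most timeout_ms for each one.
 *
 * Returns the number of bytes stored.  The caller compares this with
 * utf8_seq_len(first): a shorter result means the character was never
 * completed (timeout, end of input, or a byte that was not a
 * continuation byte).  In the last case the offending byte is handed
 * back through *pushback, which is otherwise set to -1. */
size_t collect_char(int fd, unsigned char first, unsigned char *buf,
                    int timeout_ms, int *pushback)
{
    size_t want = utf8_seq_len(first);
    size_t got = 1;

    assert(buf != NULL && pushback != NULL);
    buf[0] = first;
    *pushback = -1;

    while (got < want) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        unsigned char c;

        if (poll(&pfd, 1, timeout_ms) <= 0)
            break;                      /* timed out, or poll failed */
        if (read(fd, &c, 1) != 1)
            break;                      /* end of input or read error */
        if ((c & 0xC0) != 0x80) {
            *pushback = c;              /* belongs to the next key */
            break;
        }
        buf[got++] = c;
    }
    return got;
}
```

A caller in the position of self-insert would insert the bytes as one
character when the return value equals utf8_seq_len(first), and
otherwise fall back to treating the first byte as an old-style 8-bit
character. The per-byte timeout is what avoids the infinite hang on
terminals that never send the rest of the sequence.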
-- 
Peter Stephenson
Work: pws@csr.com
Web: http://www.pwstephenson.fsnet.co.uk