X-Seq: 20758
From: Bart Schaefer
Message-Id: <1050130063525.ZM24312@candle.brasslantern.com>
Date: Sun, 30 Jan 2005 06:35:25 +0000
In-Reply-To: <20050130010754.6F985863A@pwstephenson.fsnet.co.uk>
Comments: In reply to Peter Stephenson "Re: UTF-8 input [was Re: PATCH: zle_params.c]" (Jan 30, 1:07am)
References: <200501261806.j0QI6Q2d021854@news01.csr.com> <20050129034740.GA21742@scowler.net> <20050130010754.6F985863A@pwstephenson.fsnet.co.uk>
To: Zsh hackers list
Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]

On Jan 30, 1:07am, Peter Stephenson wrote:
} Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
}
} I thought of the following: self-insert could take a single character,
} as at present, and then test if it was the initial part of a multibyte
} character. If it was, it could read the rest; we might need a timeout to
} avoid an infinite hang on systems that didn't do multibyte input
} properly

This would mean what, in terms of binding other functions to wide chars?
That they'd behave like escape sequences do now?

I would think you'd want to decide whether the input is a wide char at a
lower level than that. Otherwise don't you have issues if what the user
really means to bind to self-insert is a single-byte character that
happens to have the high bit set?

It seems to me that some stage of the input process has to be "told" that
the input stream is UTF-8 rather than e.g. iso-8859-something. If it's
the widget level that's going to handle that [*], I think it'd be most
useful to create a self-insert-multibyte which does in fact wait
indefinitely (or at least, longer than the normal escape-sequence key
timeout) for the "rest" of a multibyte character after the first byte is
seen, and feep if it doesn't get something recognizable as the rest.
Then, probably, create a shortcut along the lines of bindkey -m that sets
up self-insert-multibyte on the appropriate prefixes.
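
As a rough illustration of the lead-byte test and "read the rest or feep" behaviour discussed above, here is a minimal Python sketch. It is not zsh code: the names `utf8_expected_length`, `read_multibyte` and the `read_byte` callback are hypothetical, and returning `None` from `read_byte` stands in for a timeout. It assumes the input stream has already been declared to be UTF-8.

```python
def utf8_expected_length(lead: int) -> int:
    """Total sequence length implied by a UTF-8 lead byte.

    Returns 1 for ASCII, 2-4 for a multibyte lead byte, and 0 for a
    byte that cannot start a sequence (continuation or invalid byte).
    """
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def read_multibyte(first: int, read_byte):
    """Hypothetical core of a self-insert-multibyte style widget.

    `read_byte` is an assumed callback returning the next input byte
    as an int, or None if the wait timed out.  Returns the decoded
    character, or None where the widget would feep.
    """
    need = utf8_expected_length(first)
    if need == 0:
        return None  # not a possible first byte: feep
    seq = bytearray([first])
    while len(seq) < need:
        b = read_byte()
        # A real implementation would push a non-continuation byte
        # back onto the input queue; this sketch simply gives up.
        if b is None or (b & 0xC0) != 0x80:
            return None
        seq.append(b)
    try:
        return seq.decode("utf-8")
    except UnicodeDecodeError:
        return None  # e.g. overlong or surrogate encoding: feep
```

Note that `utf8_expected_length(0xE9)` returns 3, although 0xE9 is a complete character in iso-8859-1. The byte alone cannot settle which reading is meant, which is why some stage has to be told the encoding.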

[*] Is there a plan yet for UTF-8 shell scripts, by the way? That can't
be handled at the ZLE level. What about zcompile?

} In addition to getkey() and friends, there is the related matter of the
} variable lastchar. Currently this is a single character; I'm not yet
} 100% sure whether we can keep this, or promote it to a wchar_t, or
} whether we might need both types. I fear it may be the last.

Not just lastchar, but also the KEYS parameter. If wide chars are dealt
with as sequences at the widget binding level, but BUFFER contains the
corresponding wchars instead, then various currently-working tricks that
involve inserting all or part of KEYS into BUFFER will fail. At least,
it becomes harder to emulate self-insert(-multibyte) in widget funcs.
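
The KEYS/BUFFER mismatch can be modelled in a few lines of Python. This is only a sketch under the assumption stated above, namely that a KEYS-like value holds the raw byte sequence while a BUFFER-like value holds wide characters. The function name `insert_keys_into_buffer` is hypothetical and nothing here is zsh API.

```python
def insert_keys_into_buffer(buffer: list, keys: bytes) -> bool:
    """Emulate self-insert by appending key bytes to a wide-char buffer.

    `buffer` models BUFFER as a list of characters; `keys` models all
    or part of KEYS as raw bytes.  Returns False, leaving the buffer
    untouched, when the bytes are not a complete UTF-8 sequence.
    """
    try:
        # Decoding happens before extend(), so a failure changes nothing.
        buffer.extend(keys.decode("utf-8"))
        return True
    except UnicodeDecodeError:
        return False


keys = "\u00e9".encode("utf-8")   # two bytes: 0xC3 0xA9

whole = list("caf")
insert_keys_into_buffer(whole, keys)        # all of KEYS: works

partial = list("caf")
insert_keys_into_buffer(partial, keys[-1:]) # last byte only: fails
```

Inserting the whole sequence succeeds, but the common trick of taking only the last element of KEYS yields a lone continuation byte that has no wide-character equivalent.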