From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 1745 invoked from network); 31 Jan 2005 11:47:39 -0000
Received: from news.dotsrc.org (HELO a.mx.sunsite.dk) (130.225.247.88) by ns1.primenet.com.au with SMTP; 31 Jan 2005 11:47:39 -0000
Received: (qmail 96100 invoked from network); 31 Jan 2005 11:47:32 -0000
Received: from sunsite.dk (130.225.247.90) by a.mx.sunsite.dk with SMTP; 31 Jan 2005 11:47:32 -0000
Received: (qmail 26899 invoked by alias); 31 Jan 2005 11:47:27 -0000
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 20761
Received: (qmail 26885 invoked from network); 31 Jan 2005 11:47:26 -0000
Received: from unknown (HELO a.mx.sunsite.dk) (130.225.247.88) by sunsite.dk with SMTP; 31 Jan 2005 11:47:26 -0000
Received: (qmail 95729 invoked from network); 31 Jan 2005 11:46:50 -0000
Received: from mailhost1.csr.com (HELO MAILSWEEPER01.csr.com) (81.105.217.43) by a.mx.sunsite.dk with SMTP; 31 Jan 2005 11:46:46 -0000
Received: from exchange03.csr.com (unverified [10.100.137.60]) by MAILSWEEPER01.csr.com (Content Technologies SMTPRS 4.3.12) with ESMTP id for ; Mon, 31 Jan 2005 11:45:22 +0000
Received: from news01.csr.com ([10.103.143.38]) by exchange03.csr.com with Microsoft SMTPSVC(5.0.2195.6713); Mon, 31 Jan 2005 11:46:43 +0000
Received: from news01.csr.com (localhost.localdomain [127.0.0.1]) by news01.csr.com (8.13.1/8.12.11) with ESMTP id j0VBkjJw028882 for ; Mon, 31 Jan 2005 11:46:45 GMT
Received: from csr.com (pws@localhost) by news01.csr.com (8.13.1/8.13.1/Submit) with ESMTP id j0VBki1g028832 for ; Mon, 31 Jan 2005 11:46:45 GMT
Message-Id: <200501311146.j0VBki1g028832@news01.csr.com>
X-Authentication-Warning: news01.csr.com: pws owned process doing -bs
To: Zsh hackers list
Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
In-reply-to: <1050130063525.ZM24312@candle.brasslantern.com>
References: <200501261806.j0QI6Q2d021854@news01.csr.com> <20050129034740.GA21742@scowler.net>
    <20050130010754.6F985863A@pwstephenson.fsnet.co.uk> <1050130063525.ZM24312@candle.brasslantern.com>
Date: Mon, 31 Jan 2005 11:46:44 +0000
From: Peter Stephenson
X-OriginalArrivalTime: 31 Jan 2005 11:46:43.0521 (UTC) FILETIME=[88FED310:01C5078A]
X-Spam-Checker-Version: SpamAssassin 3.0.2 on a.mx.sunsite.dk
X-Spam-Level:
X-Spam-Status: No, score=-2.4 required=6.0 tests=AWL,BAYES_00 autolearn=ham version=3.0.2
X-Spam-Hits: -2.4

Bart Schaefer wrote:
> On Jan 30, 1:07am, Peter Stephenson wrote:
> } Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
> }
> } I thought of the following: self-insert could take a single character,
> } as at present, and then test if it was the initial part of a multibyte
> } character. If it was, it could read the rest; we might need a timeout to
> } avoid an infinite hang on systems that didn't do multibyte input
> } properly
>
> This would mean what, in terms of binding other functions to wide chars?
> That they'd behave like escape sequences do now? I would think you'd
> want to decide whether the input is a wide char at a lower level than
> that. Otherwise don't you have issues if what the user really means to
> bind to self-insert is a single-byte character that happens to have the
> high bit set?

Hmmm... you mean that on a system where mbrtowc() reports that a
single-byte character is incomplete, the user might nonetheless want to
insert a single-byte character onto the command line? That's certainly
not something I'd thought of.

However, I'm not sure I see what this is doing. If mbrtowc() etc. are
confused, which in this case they must be (it's the only way the user's
intention can disagree with what the proposed mechanism is doing), how
can we handle the later stages of character processing successfully?
When outputting, do we ignore the fact that wctomb() failed on this
character (as it must), reset the shift counter (for safety) and carry on?
In other words, are you supposing this is some kind of fallback in case
the locale isn't set correctly, e.g. it's set to UTF-8 but on an xterm
with character set ISO-8859-1?

> It seems to me that some stage of the input process has to be "told"
> that the input stream is UTF-8 rather than e.g. iso-8859-something. If
> it's the widget level that's going to handle that [*], I think it'd be
> most useful to create a self-insert-multibyte which does in fact wait
> indefinitely (or at least, longer than the normal escape-sequence key
> timeout) for the "rest" of a multibyte character after the first byte is
> seen, and feep if it doesn't get something recognizable as the rest.

That's perfectly workable, but the question above about self-insert
remains.

-- 
Peter Stephenson                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK            Tel: +44 (0)1223 692070