X-Seq: 20763
Message-Id: <200501311701.j0VH1pRR031376@news01.csr.com>
To: Zsh hackers list
Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
In-reply-to: <1050131161826.ZM31264@candle.brasslantern.com>
References: <200501261806.j0QI6Q2d021854@news01.csr.com> <20050129034740.GA21742@scowler.net>
    <20050130010754.6F985863A@pwstephenson.fsnet.co.uk> <1050130063525.ZM24312@candle.brasslantern.com> <200501311146.j0VBki1g028832@news01.csr.com> <1050131161826.ZM31264@candle.brasslantern.com>
Date: Mon, 31 Jan 2005 17:01:50 +0000
From: Peter Stephenson

Bart Schaefer wrote:
> No. I mean, suppose the user uses the same .zshrc in both a iso-8859-*
> and a UTF-8 locale, and has an explicit bindkey command which is intended
> to work only in the iso-8859-* locale. That bindkey happens to use a
> character for which, in the UTF-8 locale, mbrtowc() reports incomplete.
> This was in part why I added the footnote asking about plans for UTF-8
> in shell scripts; is it even possible to have the same .zshrc in these
> cases?

UTF-8 should work fine to that extent: it gets passed straight through
from the main shell to zle (or anything else) intact by the usual Meta
mechanism. (That's why I'm so keen on retaining the current string
representation in the main shell.) If we keep metafied input strings as
the hash keys for the key binding lookups and they are simply string
arguments to bindkey, then there shouldn't be a problem. I think.

The bit that doesn't work is when you try to examine individual
characters in the main shell; you will get single bytes, possibly with
the 8th bit set. I can't think of a simple case where setting up key
bindings would need this to work, however.

> I'm still worried about the case where that bindkey exists but is for a
> function other than self-insert. If multibyte translation is handled by
> a widget at the same priority as all other widgets, that "stray" bindkey
> can mess up the whole scheme.

You mean if the input is real UTF-8 and a widget grabs the first byte,
leaving garbage?
Yes, that's a real problem. I was expecting that the shell would either
be set up to handle old-style input, or new-style input, not a
combination, based on what the user (or administrator; this should all
be possible to automate relatively easily) knows about the system.

To be explicit, either:

- Input system is not UTF-8 aware; "pass8" or equivalent allows 8-bit
  bindings; any zsh bindings for high-eighth-bit bytes are ordinary
  commands.

or:

- Input system is UTF-8 aware; by hypothesis, any high-eighth-bit
  character sent from the terminal is part of a multibyte character
  (this is beyond our control); any zsh bindings for such bytes reflect
  their use as part of a multibyte character.

The zsh bindings would need to be set by whoever decides which is the
case. I don't see much more we can do within the shell without more
clairvoyance than usual and without breaking someone's setup. Please
enlighten me.

-- 
Peter Stephenson
Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070
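
Editorial illustration (not part of the original message): the clash discussed above can be sketched outside zsh. The snippet below is a minimal sketch in Python, not zsh's implementation; `classify_utf8` is a hypothetical helper, and Python's incremental decoder stands in for the C `mbrtowc()` call, which returns `(size_t)-2` for an incomplete sequence. It shows that a single high-bit byte that is a whole character in ISO-8859-1 is only the start of a character in UTF-8, and that consuming the first byte of a real UTF-8 character leaves bytes that no longer decode.

```python
import codecs


def classify_utf8(data: bytes) -> str:
    """Classify a byte string as 'complete', 'incomplete' or 'invalid' UTF-8.

    Hypothetical helper for illustration only; it plays the role that
    mbrtowc() plays in the discussion above.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False lets the decoder buffer a truncated sequence
        # instead of raising an error for it.
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return "invalid"
    # getstate() returns (buffered_bytes, flags); leftover bytes mean the
    # decoder is still waiting for continuation bytes.
    pending, _ = decoder.getstate()
    return "incomplete" if pending else "complete"


# One byte bound to a key in an ISO-8859-1 setup (e-acute there).
latin1_key = b"\xe9"
print(latin1_key.decode("iso-8859-1"), classify_utf8(latin1_key))

# The same character as real UTF-8 input.
print(classify_utf8("\u00e9".encode("utf-8")))

# A widget bound to the first byte of the euro sign consumes it,
# and the remaining bytes are what the next lookup sees.
euro = "\u20ac".encode("utf-8")
print(classify_utf8(euro[1:]))
```

In the first of the two setups listed above, a binding for `\xe9` is an ordinary command; in the second, the same byte can only be a lead byte, which is why the choice has to be made once for the whole input system rather than per binding.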