X-Seq: 20764
From: Bart Schaefer
Message-Id: <1050131182902.ZM31426@candle.brasslantern.com>
Date: Mon, 31 Jan 2005 18:29:02 +0000
In-Reply-To: <200501311701.j0VH1pRR031376@news01.csr.com>
Comments: In reply to Peter Stephenson
        "Re: UTF-8 input [was Re: PATCH: zle_params.c]" (Jan 31, 5:01pm)
References: <200501261806.j0QI6Q2d021854@news01.csr.com>
        <20050129034740.GA21742@scowler.net>
        <20050130010754.6F985863A@pwstephenson.fsnet.co.uk>
        <1050130063525.ZM24312@candle.brasslantern.com>
        <200501311146.j0VBki1g028832@news01.csr.com>
        <1050131161826.ZM31264@candle.brasslantern.com>
        <200501311701.j0VH1pRR031376@news01.csr.com>
X-Mailer: Z-Mail (5.0.0 30July97)
To: Zsh hackers list
Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Jan 31, 5:01pm, Peter Stephenson wrote:
} Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
}
} Bart Schaefer wrote:
} > No. I mean, suppose the user uses the same .zshrc in both a iso-8859-*
} > and a UTF-8 locale, and has an explicit bindkey command which is intended
} > to work only in the iso-8859-* locale.
}
} UTF-8 should work fine to that extent: it gets passed straight through
} from the main shell to zle (or anything else) intact by the usual Meta
} mechanism.

That doesn't answer the question. When reading the .zshrc (or any other
script) and a byte for which mbrtowc() reports incomplete is found, what
decides whether it's part of a string intended for an iso-8859-* locale
or the introducer of a wide character for a UTF-8 locale?

Is the answer "the file just gets metafied as if it were a binary stream
and individual modules work it out later"?

} > If multibyte translation is handled by a widget at the same priority
} > as all other widgets, that "stray" bindkey can mess up the whole
} > scheme.
}
} You mean if the input is real UTF-8 and a widget grabs the first byte,
} leaving garbage? Yes, that's a real problem.
} I was expecting that the shell would either be set up to handle
} old-style input, or new style input, not a combination

In other words, you assume that nobody will try to use the same .zshrc
in two different locales, or at least not without wrapping bits of it
in tests of the value of LC_CTYPE or the like.

} I don't see much more we can do within the shell without more
} clairvoyance than usual and without breaking someone's setup. Please
} enlighten me.

I don't (yet?) know what else we can do, either; I'm just pointing out
issues to make sure they've been considered.

A question that comes to mind is, how will the shell deal with UTF-8
input when ZLE is not enabled?
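
To make the ambiguity raised above concrete, here is a small sketch of
how one and the same byte can be a whole character under iso-8859-1 but
only the start of one under UTF-8. It is illustrative only: it uses
Python's incremental decoders as a stand-in for what mbrtowc() would
report in each locale, and classify() is a hypothetical helper name,
not anything in the zsh source.

```python
import codecs


def classify(byte_seq: bytes, encoding: str) -> str:
    """Report how byte_seq looks under the given encoding.

    Returns 'complete' if it decodes with nothing left over,
    'incomplete' if the decoder is still waiting for more bytes
    (roughly the case where mbrtowc() returns (size_t)-2), and
    'invalid' if the bytes can never form a character.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        # final=False: a truncated sequence is buffered, not an error.
        decoder.decode(byte_seq, final=False)
    except UnicodeDecodeError:
        return "invalid"
    # getstate() returns (buffered_bytes, flags); leftover bytes mean
    # the decoder needs more input before it can emit a character.
    pending, _ = decoder.getstate()
    return "incomplete" if pending else "complete"


if __name__ == "__main__":
    # 0xE9 is e-acute in iso-8859-1, but a 3-byte lead byte in UTF-8.
    for data, enc in [(b"\xe9", "latin-1"),
                      (b"\xe9", "utf-8"),
                      (b"\xc3\xa9", "utf-8")]:
        print(data, enc, classify(data, enc))
```

The byte alone carries no hint of which reading the author of the
.zshrc intended; only the locale in effect when it is interpreted
settles that.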