Date: Wed, 6 Jan 2010 11:37:23 +0000
From: Peter Stephenson
To: zsh workers
Subject: Re: vared/zle silently discards non-utf8 bytes
Message-ID: <20100106113723.4ecd8568@news01>
In-Reply-To: <237967ef0912230244i2ea13dfav734535262871db7e@mail.gmail.com>
References: <237967ef0912230244i2ea13dfav734535262871db7e@mail.gmail.com>
List-Id: Zsh Workers List
X-Seq: 27571

On Wed, 23 Dec 2009 11:44:51 +0100
Mikael Magnusson wrote:
> Unfortunately it seems vared discards
> anything after an invalid byte. To reproduce, just do
>
> % a=hi$'\374'nothing
> % vared a

This is currently the designed behaviour if multibyte support is compiled
in. In that case the editing line is held as a set of wide characters, so
if the input can't be converted into wide characters the editor is stuck.
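
As a rough illustration of why the conversion fails, the sketch below builds the same value as in the report and checks it against a UTF-8 validator. It assumes `od` and `iconv` are available. `iconv` is only a stand-in for the shell's own multibyte conversion (zle does not call it), and the variable name `valid` is made up for the example.

```sh
# The value from the report: "hi", the single byte 0xFC (octal 374), "nothing".
a=$(printf 'hi\374nothing')

# Dump the bytes so the stray octet is visible.
printf '%s' "$a" | od -An -c

# Assumption: iconv's UTF-8 decoder is used here purely as a stand-in for
# the wide-character conversion; it rejects the input at the 0xFC byte
# because that byte is not followed by any continuation bytes.
if printf '%s' "$a" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
  valid=yes
else
  valid=no
fi
printf 'valid UTF-8: %s\n' "$valid"
```

Everything up to the offending byte converts cleanly, which matches the report that only the text after it is lost.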

Internally, there are two options:

(i) I could simply make it ignore invalid characters, which gets you some
of the line, but is probably even more dangerous.

(ii) You could have a go at rewriting the way characters are stored for
editing to use a marker that a character isn't a valid wide character but
is being stored to represent an octet. This is a big job to get consistent
all the way through (display including width, character tests, conversion
back and forth).

Note that a simpler wrapper

  varedquote() {
    # ignoring vared options for now....
    local var=${argv[-1]}
    local val=${(q)${(P)var}}
    # hmmm... if the user stripped some quoting the following is
    # a bit fraught...
    vared val && eval ${var}=${val}
  }

should work because the (q) flag is already smart about unprintable
characters (except it does rely on the user not removing backslashes in the
variable). This could be made a vared option. It's a little bit hairy
making it default behaviour because it changes the meaning of special
characters in the string you're editing: it's no longer "raw" in other
ways than just $'...' quoting.

-- 
Peter Stephenson                        Software Engineer
Tel: +44 (0)1223 692070                 Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road,
Cambridge, CB4 0WZ, UK