zsh-workers
* vared/zle silently discards non-utf8 bytes
@ 2009-12-23 10:44 Mikael Magnusson
  2010-01-06 11:37 ` Peter Stephenson
  0 siblings, 1 reply; 3+ messages in thread
From: Mikael Magnusson @ 2009-12-23 10:44 UTC (permalink / raw)
  To: zsh workers

I have this function

function name() {
  [[ $#@ -eq 1 ]] || { echo Give exactly one argument ; return 1 }
  test -e "$1" || { echo No such file or directory: "$1" ; return 1 }
  local newname=$1
  if vared -c -p 'rename to: ' newname &&
    [[ -n $newname && $newname != $1 ]]
  then
    command mv -i -- $1 $newname
  else
    echo Some error occurred; return 1
  fi
}

which I use to rename files interactively if they just need small
adjustments. It would also be useful for files in wrong encodings
where there's a ü here or there. Unfortunately it seems vared discards
anything after an invalid byte. To reproduce, just do

% a=hi$'\374'nothing
% vared a

It actually seems main zle does this too (I thought it would do the
<0374> thing, but then realized those are unicode code points so that
wouldn't work), but it's much harder to hit since completion inserts
the $'\374' sequence for you and there's really no simple way to
insert the string. If you start a terminal in latin1 and run a command
with some upper byte, then press up-arrow in a utf8 terminal,
everything after that byte is silently discarded too. I care less
about this case personally.
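
As a side note, a rename helper like the one above could check up front whether a name contains bytes that are invalid in UTF-8 and warn before handing it to vared. The following is only a minimal sketch under stated assumptions: has_invalid_utf8 is a hypothetical helper name (not part of zsh), an iconv command is assumed to be installed, and the terminal encoding is assumed to be UTF-8.

```shell
# has_invalid_utf8 is a hypothetical helper name, not a zsh builtin.
# It returns 0 (true) if its argument contains a byte sequence that
# iconv rejects as UTF-8, and non-zero if the argument converts cleanly.
# Assumption: iconv is on PATH; if it is missing, the pipeline fails
# and the argument is reported as invalid.
has_invalid_utf8() {
  ! printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}
```

With this in place, has_invalid_utf8 hi$'\374'nothing returns status 0, because \374 on its own is not a valid UTF-8 sequence, so the function could print a warning instead of silently losing the tail of the name.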

-- 
Mikael Magnusson



* Re: vared/zle silently discards non-utf8 bytes
  2009-12-23 10:44 vared/zle silently discards non-utf8 bytes Mikael Magnusson
@ 2010-01-06 11:37 ` Peter Stephenson
  2010-01-06 11:42   ` Peter Stephenson
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Stephenson @ 2010-01-06 11:37 UTC (permalink / raw)
  To: zsh workers

On Wed, 23 Dec 2009 11:44:51 +0100
Mikael Magnusson <mikachu@gmail.com> wrote:
> Unfortunately it seems vared discards
> anything after an invalid byte. To reproduce, just do
> 
> % a=hi$'\374'nothing
> % vared a

This is currently the designed behaviour if multibyte support is compiled
in.  In this case the editing line is a set of wide characters.  If it
can't convert the input into wide characters it's stuck.

Internally, there are two options:

(i) I could simply make it ignore invalid characters, which gets you some
of the line, but is probably even more dangerous.

(ii) you could have a go at rewriting the way characters are stored for
editing to use a marker that a character isn't a valid wide character but
is being stored to represent an octet.  This is a big job to get consistent
all the way through (display including width, character tests, conversion
back and forth).

Note that a simpler wrapper

varedquote() {
  # ignoring vared options for now....
  local var=${argv[-1]}
  local val=${(q)${(P)var}}
  # hmmm... if the user stripped some quoting the following is
  # a bit fraught...
  vared val && eval ${var}=${val}
}

should work because the (q) flag is already smart about unprintable
characters (except it does rely on the user not removing backslashes in the
variable).  This could be made a vared option.  It's a little bit hairy
making it default behaviour because it changes the meaning of special
characters in the string you're editing---it's no longer "raw" in other
ways than just $'...' quoting.

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK


Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom



* Re: vared/zle silently discards non-utf8 bytes
  2010-01-06 11:37 ` Peter Stephenson
@ 2010-01-06 11:42   ` Peter Stephenson
  0 siblings, 0 replies; 3+ messages in thread
From: Peter Stephenson @ 2010-01-06 11:42 UTC (permalink / raw)
  Cc: zsh workers

Peter Stephenson wrote:
> varedquote() {
>   # ignoring vared options for now....
>   local var=${argv[-1]}
>   local val=${(q)${(P)var}}
>   # hmmm... if the user stripped some quoting the following is
>   # a bit fraught...
>   vared val && eval ${var}=${val}
> }
> 
> should work because the (q) flag is already smart about unprintable
> characters (except it does rely on the user not removing backslashes in the
> variable).

Much better:

varedquote() {
  # ignoring vared options for now....
  local var=${argv[-1]}
  local val=${(q)${(P)var}}
  vared val && eval ${var}="\${(Q)val}"
}

