zsh-workers
* UTF-8 non-breaking spaces
From: Bart Schaefer @ 2021-03-26 18:15 UTC
  To: Zsh hackers list

> If you're copy-pasting from an edit in browser gmail, for example, it
> has a tendency to insert non-breaking spaces whenever there is more
> than one consecutive space, which the shell interprets as
> non-whitespace and attempts to execute as commands.

Non-breaking space in this case is (bindkey syntax) "\M-B\M- ".  The
error message is equally confusing because you still can't see the
non-breaking spaces when "not found" is reported.
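
As a quick check of that notation (a sketch, not part of the original message): "\M-x" denotes x with the high bit set, so "\M-B\M- " is the byte pair 0xC2 0xA0, which is the UTF-8 encoding of U+00A0. The helper names meta_bytes and contains_nbsp are made up for illustration.

```shell
# U+00A0 (NO-BREAK SPACE) encoded as UTF-8 is the two bytes 0xC2 0xA0
# (octal 302 240); printf with octal escapes works in any POSIX shell.
nbsp=$(printf '\302\240')

# "\M-x" means x with the high (meta) bit set: "B" is 0x42, space is 0x20.
meta_bytes() {
  printf '0x%X 0x%X\n' $(( 0x42 | 0x80 )) $(( 0x20 | 0x80 ))
}

# contains_nbsp STRING: succeed if STRING contains the nbsp byte pair.
contains_nbsp() {
  case $1 in
    *"$nbsp"*) return 0 ;;
    *) return 1 ;;
  esac
}

meta_bytes   # prints: 0xC2 0xA0
```

contains_nbsp can be used to test a string before running it, for example on the text of a command that failed with "not found".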

Handling this is complicated by bracketed-paste, which protects the
non-breaking spaces from (for example) { bindkey -s '\M-B\M- ' ' ' }.

"unsetopt multibyte" does not affect this, but LANG=C results in (for example):

(In gmail editor)
 echo " " "  "
(Pasted at shell prompt)
% echo " " "<c2><a0> "

That's purely a ZLE display thing; the actual nbsp is output when the
command executes, but at least you can see what's going on.

(The non-breaking spaces go back to normal spaces in sent email, I
believe, or at least do so when the message is displayed in gmail;
this is just a "thing" in the browser text editor.)

Similar goofiness can result when copy-pasting from other "smart"
multibyte editors when zsh has a UTF-8 variant in $LANG.

Any good suggestions for how to deal with this in a non-confusing fashion?
Everything I've thought of (short of hacking up the lexer) risks
corrupting parts of the input that aren't intended to be word
separators (the bindkey -s above has that problem, for example, if
bracketed-paste is disabled).



* Re: UTF-8 non-breaking spaces
From: Daniel Shahaf @ 2021-03-26 21:07 UTC
  To: Zsh hackers list

Bart Schaefer wrote on Fri, Mar 26, 2021 at 11:15:47 -0700:
> > If you're copy-pasting from an edit in browser gmail, for example, it
> > has a tendency to insert non-breaking spaces whenever there is more
> > than one consecutive space, which the shell interprets as
> > non-whitespace and attempts to execute as commands.
> 
> Non-breaking space in this case is (bindkey syntax) "\M-B\M- ".  The
> error message is equally confusing because you still can't see the
> non-breaking spaces when "not found" is reported.
> 
> Handling this is complicated by bracketed-paste, which protects the
> non-breaking spaces from (for example) { bindkey -s '\M-B\M- ' ' ' }.
> 
> "unsetopt multibyte" does not affect this, but LANG=C results in (for example):
> 
> (In gmail editor)
>  echo " " "  "
> (Pasted at shell prompt)
> % echo " " "<c2><a0> "
> 
> That's purely a ZLE display thing; the actual nbsp is output when the
> command executes, but at least you can see what's going on.
> 
> (The non-breaking spaces go back to normal spaces in sent email, I
> believe, or at least do so when the message is displayed in gmail;
> this is just a "thing" in the browser text editor.)
> 
> Similar goofiness can result when copy-pasting from other "smart"
> multibyte editors when zsh has a UTF-8 variant in $LANG.
> 
> Any good suggestions for how to deal with this in a non-confusing fashion?

(I presume "Use a non-buggy MUA" isn't the answer you're after.)

With zsh-syntax-highlighting:

    . /path/to/zsh-syntax-highlighting
    ZSH_HIGHLIGHT_HIGHLIGHTERS=( pattern )                     # or += if you already use z-sy-h
    typeset -A ZSH_HIGHLIGHT_PATTERNS=($'\uA0' 'bg=blue,bold')

This'll highlight nbsp's.  Not change them, just highlight them.  To
change them, a custom s/nbsp/space/g widget might be convenient.
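
A minimal sketch of such a widget, assuming a UTF-8 locale. The names nbsp_to_space, replace_nbsp_widget and replace-nbsp, and the Ctrl-X Space binding, are hypothetical choices, not anything zsh ships. It only runs when invoked explicitly, so nothing in the buffer is rewritten unless asked for.

```shell
# UTF-8 encoding of U+00A0; byte escapes keep the definition independent
# of the locale in effect when this file is read.
nbsp=$'\xc2\xa0'

# Pure helper: print the argument with every nbsp replaced by an ASCII space.
nbsp_to_space() {
  printf '%s' "${1//$nbsp/ }"
}

# Widget body: rewrite the whole ZLE edit buffer in place.  This replaces
# every nbsp, including any inside quotes that were meant literally.
replace_nbsp_widget() {
  BUFFER=${BUFFER//$nbsp/ }
}

# Register and bind the widget; only meaningful in an interactive zsh.
if [ -n "${ZSH_VERSION-}" ] && [[ -o interactive ]]; then
  zle -N replace-nbsp replace_nbsp_widget
  bindkey '^X ' replace-nbsp   # Ctrl-X then Space; the key choice is arbitrary
fi
```

After pasting, pressing the bound key before Enter converts the buffer. Combined with the highlighting above, the nbsp's are visible first, so a literal one can be left alone by simply not invoking the widget.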

> Everything I've thought of (short of hacking up the lexer) risks
> corrupting parts of the input that aren't intended to be word
> separators (the bindkey -s above has that problem, for example, if
> bracketed-paste is disabled).
> 

