From: Peter Stephenson <pws@csr.com>
To: Zsh hackers list <zsh-workers@sunsite.dk>
Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
Date: Mon, 31 Jan 2005 11:46:44 +0000
Message-ID: <200501311146.j0VBki1g028832@news01.csr.com>
In-Reply-To: <1050130063525.ZM24312@candle.brasslantern.com>

Bart Schaefer wrote:
> On Jan 30,  1:07am, Peter Stephenson wrote:
> } Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
> }
> } I thought of the following: self-insert could take a single character,
> } as at present, and then test if it was the initial part of a multibyte
> } character.  If it was, it could read the rest; we might need a timeout to
> } avoid an infinite hang on systems that didn't do multibyte input
> } properly
> 
> This would mean what, in terms of binding other functions to wide chars?
> That they'd behave like escape sequences do now?  I would think you'd
> want to decide whether the input is a wide char at a lower level than
> that.  Otherwise don't you have issues if what the user really means to
> bind to self-insert is a single-byte character that happens to have the
> high bit set?

Hmmm... you mean that on a system where mbrtowc() reports that a
single-byte character is incomplete, the user might nonetheless want to
insert a single-byte character onto the command line?  That's certainly
not something I'd thought of.  However, I'm not sure I see what this
would achieve.  If mbrtowc() etc. are confused, which in this case they
must be (it's the only way the user's intention can disagree with what
the proposed mechanism is doing), how can we handle the later stages of
character processing successfully?  When outputting, do we ignore the
fact that wctomb() failed on this character (as it must), reset the
shift state (for safety) and carry on?  In other words, are you
supposing this is some kind of fallback in case the locale isn't set
correctly, e.g. it's set to UTF-8 but the terminal is an xterm using
ISO-8859-1?
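
To make the mismatch concrete, here is a minimal sketch in C of how a
single input byte could be classified with mbrtowc().  The names
classify_first_byte() and try_utf8_locale() are hypothetical and are
not taken from the zsh source; the locale names tried are assumptions
about what the system provides.  In a UTF-8 locale the byte 0xE9, which
a terminal using ISO-8859-1 sends for e-acute, is the lead byte of a
three-byte sequence, so mbrtowc() reports it as incomplete.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

enum byte_class { BYTE_COMPLETE, BYTE_INCOMPLETE, BYTE_INVALID };

/* Hypothetical helper: ask the C library whether one input byte is a
 * whole character, the start of a longer one, or invalid, according
 * to the current LC_CTYPE locale. */
static enum byte_class classify_first_byte(unsigned char byte)
{
    mbstate_t state;
    wchar_t wc;
    char c = (char)byte;
    size_t ret;

    memset(&state, 0, sizeof state);    /* start in the initial state */
    ret = mbrtowc(&wc, &c, 1, &state);
    if (ret == (size_t)-2)
        return BYTE_INCOMPLETE;         /* more bytes are needed */
    if (ret == (size_t)-1)
        return BYTE_INVALID;            /* not valid in this locale */
    return BYTE_COMPLETE;               /* 1, or 0 for the NUL byte */
}

/* Hypothetical helper: try a few common UTF-8 locale names (assumed,
 * not guaranteed to exist); return 1 if one of them was selected. */
static int try_utf8_locale(void)
{
    static const char *const names[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}
```

If try_utf8_locale() succeeds, classify_first_byte(0xE9) should report
BYTE_INCOMPLETE even though the user on an ISO-8859-1 terminal typed
one complete character, which is the disagreement discussed above.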

> It seems to me that some stage of the input process has to be "told"
> that the input stream is UTF-8 rather than e.g. iso-8859-something.  If
> it's the widget level that's going to handle that [*], I think it'd be
> most useful to create a self-insert-multibyte which does in fact wait
> indefinitely (or at least, longer than the normal escape-sequence key
> timeout) for the "rest" of a multibyte character after the first byte is
> seen, and feep if it doesn't get something recognizable as the rest.

That's perfectly workable, but the question above about self-insert
remains.
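
For illustration only, the waiting behaviour of such a widget might look
roughly like the sketch below.  The function read_rest_of_char() is a
hypothetical name and not part of zle; it assumes POSIX poll() and
read() on the input descriptor and that the caller has already set
LC_CTYPE.  A negative timeout waits indefinitely, and a return value of
0 is where the caller would feep.

```c
#include <locale.h>   /* for setlocale() in the caller */
#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <wchar.h>

/* Hypothetical sketch: given the first byte already read from fd, keep
 * reading until mbrtowc() reports a complete character.
 * Returns 1 and stores the character in *out on success.
 * Returns 0 on timeout, end of input, read error or an invalid
 * sequence.  timeout_ms < 0 means wait indefinitely. */
static int read_rest_of_char(int fd, unsigned char first,
                             int timeout_ms, wchar_t *out)
{
    mbstate_t state;
    char c = (char)first;
    size_t ret;

    memset(&state, 0, sizeof state);    /* start in the initial state */
    for (;;) {
        /* Feed one byte; partial input is remembered in state. */
        ret = mbrtowc(out, &c, 1, &state);
        if (ret == (size_t)-1)
            return 0;                   /* invalid sequence */
        if (ret != (size_t)-2)
            return 1;                   /* character is complete */

        /* Incomplete: wait for the next byte to arrive. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, timeout_ms) <= 0)
            return 0;                   /* timed out or poll failed */
        if (read(fd, &c, 1) != 1)
            return 0;                   /* end of input or error */
    }
}
```

Note that this sketch shares the problem raised above: if the locale
and the terminal disagree, a genuine single-byte character with the
high bit set is still treated as the start of a longer sequence.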

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

