zsh-workers
From: Peter Stephenson <pws@csr.com>
Cc: zsh-workers@sunsite.dk
Subject: Re: D07multibyte.ztst failure on HP-UX 11.11
Date: Thu, 7 May 2009 16:38:19 +0100
Message-ID: <20090507163819.1932f06e@news01>
In-Reply-To: <20090506215026.GA5565@otaku>

On Wed, 6 May 2009 21:50:26 +0000
Paul Ackersviller <pda@sdf.lonestar.org> wrote:
> On Wed, May 06, 2009 at 08:22:06PM +0100, Peter Stephenson wrote:
> > On Tue, 5 May 2009 19:39:31 +0000
> > Paul Ackersviller <pda@sdf.lonestar.org> wrote:
> > > I can get read to silently fail on the HP box with
> > > 
> > > env -i LANG=en_US.utf8 ../Src/zsh -fc \
> > > 	"(LC_ALL=C; print \$'\\u00e9') | read || print failure"
> >
> > Taking out the LC_ALL should produce some sensible output if you omit
> > the read.  (Replacing it with xxd or failing that od -x might make it
> > clearer what's going on.)
> 
> Not quite: "zsh:1: cannot do charset conversion (iconv failed)"

It's not clear why it should fail, but the error message is OK and allowed
for by the test.

> > If you're simply taking out the subshell and not replacing it with
> > anything then the LC_ALL=C covers the "read" as well as the "print".
> > So possibly something strange is happening in the read.  Replacing it
> > with xxd might be even more instructive here.
> 
> This gives
> 	0000000 c50a
> Does this mean the 0a should be the second byte, but is perhaps being
> interpreted as newline?

So this comes from

 env -i LANG=en_US.utf8 ../Src/zsh -fc \
   "LC_ALL=C; print \$'\\u00e9' | read || print failure"

I get "character not in range" here.  It looks like your system is
outputting 0xc5, which I wouldn't expect to be a valid character in the C
locale, and I can't work out why it comes from Unicode character 0xe9.  The
UTF-8 would be 0xc3a9, the ISO-8859-1 or -15 would be 0xe9.
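
For reference, the byte sequences being compared can be reproduced with
plain printf and od, independent of any shell's locale handling.  This is
only a minimal sketch: the helper and variable names are made up for
illustration, and octal escapes are used because they are portable across
printf implementations.

```shell
# Dump the bytes on stdin as one lowercase hex string
# (hexdump_bytes is an illustrative helper, not part of zsh).
hexdump_bytes() {
  od -An -tx1 | tr -d ' \n'
}

# U+00E9 encoded as UTF-8: two bytes, 0xc3 0xa9 (octal 303 251).
utf8_hex=$(printf '\303\251' | hexdump_bytes)

# U+00E9 encoded as ISO-8859-1 or ISO-8859-15: one byte, 0xe9 (octal 351).
latin1_hex=$(printf '\351' | hexdump_bytes)

# What the HP-UX box reportedly produced: 0xc5 followed by a newline.
observed_hex=$(printf '\305\n' | hexdump_bytes)

echo "UTF-8:      $utf8_hex"
echo "ISO-8859-1: $latin1_hex"
echo "observed:   $observed_hex"
```

The observed bytes match neither expected encoding, which is the puzzle
described above.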

The 0x0a really is a newline.

In the test you show, read is running with UTF-8.  I can confirm that
on my system (where I happen already to be in the en_GB.UTF-8 locale)

  (unsetopt multibyte; print $'\xc5') | xxd

gives what you're sending to read, and

  (unsetopt multibyte; print $'\xc5') | read

returns status 1 with no output.
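
That failure is consistent with how UTF-8 is structured: 0xc5 has the bit
pattern 110xxxxx, so it announces a two-byte sequence, and the byte after
it must look like 10xxxxxx.  A newline (0x0a) does not.  The following is
only a sketch of that bit check in shell arithmetic, with made-up variable
names; it is not how zsh decodes input, which presumably goes through the
C library's multibyte functions.

```shell
lead=0xc5   # first byte seen by read
next=0x0a   # second byte: the newline appended by print

# 110xxxxx: lead byte of a two-byte UTF-8 sequence.
if [ $(( $lead & 0xe0 )) -eq $(( 0xc0 )) ]; then
  lead_kind="two-byte lead"
else
  lead_kind="other"
fi

# 10xxxxxx: the only form a continuation byte may take.
if [ $(( $next & 0xc0 )) -eq $(( 0x80 )) ]; then
  verdict="valid"
else
  verdict="invalid"
fi

echo "lead byte is a $lead_kind; sequence is $verdict"
```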

So this all tallies, and I think we've found out all we need, but I'm not
sure about the fix; possibly read should output an error on an invalid
character in MULTIBYTE mode (which we could add to the test)?  Does anyone
see a problem with that?

I'm fairly happy this isn't a shell bug, but I'd still like the shell to
have enough facilities to be able to detect the problem.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

