zsh-workers
* Re: Echoing of 8-bit-characters broken after 4.3.2?
@ 2009-02-28 20:31 Wolfgang Hukriede
  2009-02-28 20:49 ` Andrey Borzenkov
  0 siblings, 1 reply; 12+ messages in thread
From: Wolfgang Hukriede @ 2009-02-28 20:31 UTC (permalink / raw)
  To: zsh-workers

Andrey wrote:
> Wolfgang, what happens if you explicitly disable multibyte support
> (--disable-multibyte) during build?

I did not know about that option. I will report asap.

Btw, I have to correct myself with respect to LANG, since I wrote:
> change the export to
> LANG=IS.ISO8859-1 though since otherwise `date' talks in a language
> which is unknown to me ...

True, but setting LANG to "is_IS.ISO8859-1" once and then setting it
to anything else seems to do the trick as well:

  > export LANG=is_IS.ISO8859-1
  > date
  lau 28 feb 2009 21:17:50 CET

  > export LANG=nada
  > date
  Sat Feb 28 21:24:36 CET 2009

Eight-bit-chars still work.

  > unset LANG

Again, eight-bit-chars still work.

Seems dubious to me.



* Re: Echoing of 8-bit-characters broken after 4.3.2?
  2009-02-28 20:31 Echoing of 8-bit-characters broken after 4.3.2? Wolfgang Hukriede
@ 2009-02-28 20:49 ` Andrey Borzenkov
  2009-02-28 23:01   ` Bart Schaefer
  0 siblings, 1 reply; 12+ messages in thread
From: Andrey Borzenkov @ 2009-02-28 20:49 UTC (permalink / raw)
  To: zsh-workers; +Cc: Wolfgang Hukriede

On 28 February 2009 23:31:05 Wolfgang Hukriede wrote:

> True, but setting LANG to "is_IS.ISO8859-1" once and then setting it
> to anything else seems to do the trick as well:
>
>   > export LANG=is_IS.ISO8859-1
>   > date
>   lau 28 feb 2009 21:17:50 CET
>
>   > export LANG=nada
>   > date
>   Sat Feb 28 21:24:36 CET 2009
>
> Eight-bit-chars still work.

setlocale() failed so old value (is_IS.ISO8859-1) remains in effect.

>   > unset LANG
>
> Again, eight-bit-chars still work.
>

Zsh does not do anything special when *unsetting* a locale variable; there
is no unsetlocale() function. So the old value remains in effect.
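
One way to catch a failing setlocale() before exporting a value is to look the name up in the output of `locale -a` first. This is only a sketch: has_locale is a hypothetical helper name, and it assumes that `locale -a` prints one installed locale name per line and that the name is spelled exactly as the system lists it.

```shell
# has_locale NAME: succeed if NAME appears as a whole line on stdin.
# (Hypothetical helper; feed it the output of `locale -a`.)
has_locale() {
    grep -Fxq -- "$1"
}

wanted=is_IS.ISO8859-1
if locale -a 2>/dev/null | has_locale "$wanted"; then
    echo "$wanted is installed"
else
    echo "$wanted is not installed (or locale -a is unavailable)"
fi
```

For a name that is not in the list (such as "nada"), setlocale() would be expected to fail and the previously active locale would stay in effect, as described above.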



* Re: Echoing of 8-bit-characters broken after 4.3.2?
  2009-02-28 20:49 ` Andrey Borzenkov
@ 2009-02-28 23:01   ` Bart Schaefer
  0 siblings, 0 replies; 12+ messages in thread
From: Bart Schaefer @ 2009-02-28 23:01 UTC (permalink / raw)
  To: zsh-workers; +Cc: Wolfgang Hukriede

} > True, but setting LANG to "is_IS.ISO8859-1" once and then setting it
} > to anything else seems to do the trick as well:
} >   > export LANG=is_IS.ISO8859-1
} >   > date
} >
} >   lau 28 feb 2009 21:17:50 CET
} >
} >   > export LANG=nada
} >   > date
} >
} >   Sat Feb 28 21:24:36 CET 2009
} >
} > Eight-bit-chars still work.
} 
} setlocale() failed so old value (is_IS.ISO8859-1) remains in effect.

In more detail, what's happening is that the variable is no longer in
the environment, so it's not inherited by "date", even though zsh is
still using the previous locale.

However, the deeper problem is that is_IS.ISO8859-1 is probably the
wrong value in the first place.  I found it by tab-completing thus:

zsh% LANG=IS<TAB>

but on a closer look there are a whole lot of possible alternatives.
I'm guessing that perhaps one of de_AT.iso88591 or de_DE.iso88591 is
more correct, but you should look through the list yourself.
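
As an alternative to tab completion, the list can be narrowed to Latin-1 locales with a filter over `locale -a`. This is a sketch under assumptions: latin1_locales is a hypothetical helper, and the pattern only covers the spellings ISO8859-1, ISO-8859-1 and iso88591 at the end of a name. Locale names without an explicit charset suffix are not matched.

```shell
# latin1_locales: keep only lines from stdin whose charset suffix is
# ISO 8859-1 (not 8859-15). Hypothetical helper for illustration.
latin1_locales() {
    grep -iE '\.iso-?8859-?1$'
}

locale -a 2>/dev/null | latin1_locales || echo "no Latin-1 locales found"
```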



* Re: Echoing of 8-bit-characters broken after 4.3.2?
  2009-02-28 22:00 Wolfgang Hukriede
@ 2009-03-01  0:12 ` Phil Pennock
  0 siblings, 0 replies; 12+ messages in thread
From: Phil Pennock @ 2009-03-01  0:12 UTC (permalink / raw)
  To: zsh-workers

On 2009-02-28 at 23:00 +0100, Wolfgang Hukriede wrote:
> Andrey wrote:
> > Because this is established standard to define your character set
> > properties. Without it applications should assume C (or POSIX) locale
> > that basically corresponds to standard ASCII.
> 
> Should the character set properties not be set by LC_CTYPE? As far as
> I can tell LANG sets more than that? Do I understand correctly that
> LANG is zsh-specific? (On my box, man 3 setlocale does not have it.)

LANG is not zsh-specific.

LANG sets defaults, the LC_* settings override those and LC_ALL
overrides everything, if memory serves.
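
That precedence can be sketched as a small shell function. effective_locale is a hypothetical helper, not part of zsh or of any system tool; it assumes the POSIX rule that a variable with an empty value counts as unset, and it only reads the environment without checking whether the named locale is installed.

```shell
# effective_locale CATEGORY: print the locale name that applies to
# CATEGORY, following LC_ALL > LC_<category> > LANG > "C".
effective_locale() {
    category=$1
    case $category in
        LC_CTYPE|LC_COLLATE|LC_MESSAGES|LC_MONETARY|LC_NUMERIC|LC_TIME) ;;
        *) echo "unknown category: $category" >&2; return 2 ;;
    esac
    # Fetch the value of the variable whose name is in $category.
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

echo "LC_CTYPE resolves to: $(effective_locale LC_CTYPE)"
```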

On FreeBSD, running "locale" on its own reports all the relevant
variables, including LANG.  "locale charmap" is useful to see the
current charmap.

BTW, are you from Iceland?  is_IS means Icelandic, Iceland variant.
en_GB and en_US for English are common, and de_DE for German.

So I run with en_US.UTF-8 on FreeBSD and LANG=<tab> will show you all
the options.  For LC_CTYPE it doesn't matter so much which language you
choose.

To test an "unset" variable, set it to the value "C", which is the
default and roughly means 7-bit ASCII.

So, I just installed luit on my laptop and ssh'd to my private box,
which is normally UTF-8, with:
  luit -encoding iso8859-1 ssh
luit is a wrapper which lets you translate foreign character sets back
to UTF-8, so you can connect to non-UTF-8 systems from a UTF-8 system.

If I set:
  LC_CTYPE=C
and type <Compose><L><-> to see £ (should be POUND SIGN, Sterling, for
the British currency) then I get <00a3>.

If I set:
  LC_CTYPE=en_US.ISO8859-15
then I see the "£" as I should.
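
To see which character set each of the two settings selects, `locale charmap` can be run under both. charmap_of is a hypothetical helper; the names printed differ between systems, and if a locale is not installed the tool may report the C charmap instead.

```shell
# charmap_of LOCALE: print the charmap the system reports for LOCALE.
# LC_ALL is used so that a pre-existing LC_ALL cannot override the test.
charmap_of() {
    LC_ALL=$1 locale charmap 2>/dev/null
}

for name in C en_US.ISO8859-15; do
    printf '%-20s %s\n' "$name" "$(charmap_of "$name")"
done
```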

These days, you probably want to use the -15 variant instead of the -1
variant, to get Latin 9, which has a few small changes from Latin 1;
most noticeably for many, the international currency symbol is replaced
with the EURO SIGN.  Except that on the FreeBSD (7.0) system, it doesn't
appear to work in ISO8859-15.  Strange.  I'd never noticed before
because I use UTF-8 and frankly I don't care enough to chase down the
cause.

Regards,
-Phil



* Re: Echoing of 8-bit-characters broken after 4.3.2?
@ 2009-02-28 22:00 Wolfgang Hukriede
  2009-03-01  0:12 ` Phil Pennock
  0 siblings, 1 reply; 12+ messages in thread
From: Wolfgang Hukriede @ 2009-02-28 22:00 UTC (permalink / raw)
  To: zsh-workers

Andrey wrote:
> Because this is established standard to define your character set
> properties. Without it applications should assume C (or POSIX) locale
> that basically corresponds to standard ASCII.

Should the character set properties not be set by LC_CTYPE? As far as
I can tell LANG sets more than that? Do I understand correctly that
LANG is zsh-specific? (On my box, man 3 setlocale does not have it.)

> So I would be surprised if
> zsh were the only program that had issues with non-ASCII characters.

At least emacs passes them through without ado. There's only one other
program that I had problems with in that respect. (That's unicode
only.) Looks like more will come...

> FreeBSD could provide some other means to define the locale, though.

Not that I know of.

> Because blindly emitting arbitrary character sequences to the terminal may
> have completely undefined effects and screw up the display to the point that
> you need a hard reset (urban legend also has it that you can cause your
> terminal to echo back any sequence like "rm -rf" as input to the shell ...)

Urban legends aside, this may be. Otoh... I've been using zsh for at
least 10 years almost exclusively and quite intensely and have used
8-bit-characters all the time (all on xterms), but display
distortion never happened to me. This is probably due to the fact that
filenames are mostly under one's own control. I did suffer display
distortion from reading emails, but the shell could not have
done anything about that. Correctness of vt100-control-sequences
cannot be monitored either.

Therefore I think passing-through of eight bit characters should be
configurable. But I still do not understand how I am supposed to do
that (without triggering side effects). Why is PRINT_EIGHT_BIT
restricted to affecting tab-completion only?



* Re: Echoing of 8-bit-characters broken after 4.3.2?
  2009-02-28 19:52 Wolfgang Hukriede
@ 2009-02-28 20:40 ` Andrey Borzenkov
  0 siblings, 0 replies; 12+ messages in thread
From: Andrey Borzenkov @ 2009-02-28 20:40 UTC (permalink / raw)
  To: zsh-workers; +Cc: Wolfgang Hukriede

On 28 February 2009 22:52:33 Wolfgang Hukriede wrote:
> > At the same time, OSX doesn't appear to export a LANG value (or at
> > least it doesn't on my iMac at work).
>
> Freebsd does not export the variable either, but why should it?
>

Because this is the established standard way to define your character set
properties. Without it, applications should assume the C (or POSIX) locale,
which basically corresponds to standard ASCII. So I would be surprised if
zsh were the only program that had issues with non-ASCII characters.

FreeBSD could provide some other means to define the locale, though.

>
> > Wolfgang, if you're reading this, something that I forgot to
> > mention in my reply to you is that sometime during 4.3.x zsh began
> > to pay closer attention to characters that are absent from the
> > declared LANG character set and to either refuse to process them at
> > all, or to render them as digits surrounded by angle brackets.  It
> > no longer blindly passes those characters around unprocessed, so
> > things that "worked" before because xterm dealt with the processing
> > will now appear to "fail" because the shell is trying harder to do
> > the right thing internally.
>
> Yes, I suspected so. But what is the benefit of it?

Because blindly emitting arbitrary character sequences to the terminal may
have completely undefined effects and screw up the display to the point that
you need a hard reset (urban legend also has it that you can cause your
terminal to echo back any sequence like "rm -rf" as input to the shell ...)




* Re: Echoing of 8-bit-characters broken after 4.3.2?
@ 2009-02-28 19:52 Wolfgang Hukriede
  2009-02-28 20:40 ` Andrey Borzenkov
  0 siblings, 1 reply; 12+ messages in thread
From: Wolfgang Hukriede @ 2009-02-28 19:52 UTC (permalink / raw)
  To: zsh-workers

Bart wrote:
> Try
>
> export LANG=is_IS.ISO8859-1
>
> I discovered this by tab-completing values for LANG.

Ok, many thanks, this works so far. I had to change the export to
LANG=IS.ISO8859-1 though since otherwise `date' talks in a language
which is unknown to me:

  > date
  lau 28 feb 2009 19:40:54 CET

No, this isn't OSX, but FreeBSD 6.4 (though with ports that are at most
a few days old).

> The multibyte character handling on OSX appears to be particularly
> sensitive to the LANG setting (see my previous mail to Wolfgang).
> At the same time, OSX doesn't appear to export a LANG value (or at
> least it doesn't on my iMac at work).

FreeBSD does not export the variable either, but why should it?

> I can't precisely reproduce the above; I get things like
>
> schaefer<263> touch x<00c3><00c3><00c3>x
>
> or
>
> schaefer<263> touch xinsert-composed-char:180: character not in range
>
> before I ever get as far as creating the file.  Maybe there's some
> additional character munging happening in transit of the email so
> I'm not using the correct input.

This is not so here. Only the echoing of the character fails unless
LANG is set. Tab completion worked in 4.3.2 and works with LANG set.

> Wolfgang, if you're reading this, something that I forgot to mention in
> my reply to you is that sometime during 4.3.x zsh began to pay closer
> attention to characters that are absent from the declared LANG character
> set and to either refuse to process them at all, or to render them as
> digits surrounded by angle brackets.  It no longer blindly passes those
> characters around unprocessed, so things that "worked" before because
> xterm dealt with the processing will now appear to "fail" because the
> shell is trying harder to do the right thing internally.

Yes, I suspected so. But what is the benefit of it? Perhaps to make
certain the shell can assume unicode as the default? Would an explicit
setopt (to remove the ambiguity) not be a viable/better alternative?

Looking up "man 1 locale" I found the bug section below. Might this be
significant?

  DESCRIPTION
       ...
       -m      Print names of all available charmaps.

  BUGS
       Since FreeBSD does not support charmaps in their POSIX meaning, locale
       emulates the -m option using the CODESETs listing of all available
       locales.



* Re: Echoing of 8-bit-characters broken after 4.3.2?
  2009-02-28 19:19     ` Bart Schaefer
@ 2009-02-28 19:29       ` Andrey Borzenkov
  0 siblings, 0 replies; 12+ messages in thread
From: Andrey Borzenkov @ 2009-02-28 19:29 UTC (permalink / raw)
  To: zsh-workers

On 28 February 2009 22:19:13 Bart Schaefer wrote:
> On Feb 28, 10:07pm, Andrey Borzenkov wrote:
> }
> } Hmm ... quoting original mail:
> }
> } >This is on freebsd 6.4.
> } >Unicode is not used, and I currently do not intend to use it.
>
> Oops, sorry about that.  Don't know how I missed it.
>
> } Wolfgang, what happens if you explicitly disable multibyte support
> } (--disable-multibyte) during build?
>
> BTW I tried "unsetopt multibyte" when fiddling with this on my Mac,
> and "unknown" characters were still displayed as <00c3>.  (Which is
> better than in 4.3.4, where they are both displayed and processed as
> %C3.)

This is keyed not on a runtime option but on a build-time one (under
#ifdef MULTIBYTE_SUPPORT). But see my other mail.


* Re: Echoing of 8-bit-characters broken after 4.3.2?
  2009-02-28 19:07   ` Andrey Borzenkov
@ 2009-02-28 19:19     ` Bart Schaefer
  2009-02-28 19:29       ` Andrey Borzenkov
  0 siblings, 1 reply; 12+ messages in thread
From: Bart Schaefer @ 2009-02-28 19:19 UTC (permalink / raw)
  To: zsh-workers

On Feb 28, 10:07pm, Andrey Borzenkov wrote:
}
} Hmm ... quoting original mail:
} 
} >This is on freebsd 6.4.
} >Unicode is not used, and I currently do not intend to use it.

Oops, sorry about that.  Don't know how I missed it.

} Wolfgang, what happens if you explicitly disable multibyte support
} (--disable-multibyte) during build?

BTW I tried "unsetopt multibyte" when fiddling with this on my Mac, and
"unknown" characters were still displayed as <00c3>.  (Which is better
than in 4.3.4, where they are both displayed and processed as %C3.)



* Re: Echoing of 8-bit-characters broken after 4.3.2?
  2009-02-28 17:53 ` Bart Schaefer
@ 2009-02-28 19:07   ` Andrey Borzenkov
  2009-02-28 19:19     ` Bart Schaefer
  0 siblings, 1 reply; 12+ messages in thread
From: Andrey Borzenkov @ 2009-02-28 19:07 UTC (permalink / raw)
  To: zsh-workers; +Cc: Wolfgang Hukriede

On 28 February 2009 20:53:11 Bart Schaefer wrote:
> On Feb 28,  9:35am, Wolfgang Hukriede wrote:
> }
> } After upgrading from 4.3.2 to 4.3.9 I've currently two zshells open of
> } the respective versions, each in its own xterm-window. Now, while in
> } 4.3.2, when I type any 8-bit-character (from the latin1-set), it is nicely
> } echoed back. But this no longer works in 4.3.9.
>
> This doesn't happen to be on Mac OSX, does it?  (You should always
> include operating system information when reporting a problem.)
>

Hmm ... quoting original mail:

>This is on freebsd 6.4.
>Unicode is not used, and I currently do not intend to use it.


Wolfgang, what happens if you explicitly disable multibyte support
(--disable-multibyte) during build?

Otherwise I also tend to think this is a locale setting issue.


* Re: Echoing of 8-bit-characters broken after 4.3.2?
  2009-02-28  8:35 Wolfgang Hukriede
@ 2009-02-28 17:53 ` Bart Schaefer
  2009-02-28 19:07   ` Andrey Borzenkov
  0 siblings, 1 reply; 12+ messages in thread
From: Bart Schaefer @ 2009-02-28 17:53 UTC (permalink / raw)
  To: Wolfgang Hukriede, zsh-workers

On Feb 28,  9:35am, Wolfgang Hukriede wrote:
}
} After upgrading from 4.3.2 to 4.3.9 I've currently two zshells open of
} the respective versions, each in its own xterm-window. Now, while in
} 4.3.2, when I type any 8-bit-character (from the latin1-set), it is nicely
} echoed back. But this no longer works in 4.3.9.

This doesn't happen to be on Mac OSX, does it?  (You should always
include operating system information when reporting a problem.)

} All zsh-dotfiles in both sessions are identical. No LC-variables are
} set (but export LC_CTYPE=ISO8859-1 does not help either).

I think the trouble there is that this is the wrong value for LC_CTYPE.

Try

    export LANG=is_IS.ISO8859-1

I discovered this by tab-completing values for LANG.



* Echoing of 8-bit-characters broken after 4.3.2?
@ 2009-02-28  8:35 Wolfgang Hukriede
  2009-02-28 17:53 ` Bart Schaefer
  0 siblings, 1 reply; 12+ messages in thread
From: Wolfgang Hukriede @ 2009-02-28  8:35 UTC (permalink / raw)
  To: zsh-workers

After upgrading from 4.3.2 to 4.3.9 I've currently two zshells open of
the respective versions, each in its own xterm-window. Now, while in
4.3.2, when I type any 8-bit-character (from the latin1-set), it is nicely
echoed back. But this no longer works in 4.3.9.

E.g.:

        4.3.9> echo Le dictionnaire fran<00e7>ais-anglais
        Le dictionnaire franc,ais-anglais

where "c," is a "c" with cedilla. As can be seen it appears not to be
a problem with the xterm, since the character comes out quite well,
only the shell refuses to echo it.
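
The <00e7> in the prompt line above is the hexadecimal code of the byte that was typed (0xe7 is c with cedilla in Latin-1). As a rough illustration of that notation only, and not of how zsh implements it, the sketch below prints printable ASCII bytes as they are and every other byte as <00xx>. render_bytes is a hypothetical name.

```shell
# render_bytes STRING: print STRING byte by byte; printable ASCII is
# passed through, any other byte is shown as <00xx> (hex).
render_bytes() {
    printf '%s' "$1" | od -A n -t x1 -v | tr -s ' \t' '\n\n' |
    while IFS= read -r hex; do
        [ -n "$hex" ] || continue          # skip empty lines from tr
        dec=$((0x$hex))
        if [ "$dec" -ge 32 ] && [ "$dec" -lt 127 ]; then
            # Emit the byte itself via an octal escape.
            printf '%b' "\\0$(printf '%03o' "$dec")"
        else
            printf '<00%s>' "$hex"
        fi
    done
    echo
}

render_bytes "$(printf 'fran\347ais')"   # prints fran<00e7>ais
```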

All zsh-dotfiles in both sessions are identical. No LC-variables are
set (but export LC_CTYPE=ISO8859-1 does not help either). Also, I've
tried to setopt PRINT_EIGHT_BIT to no avail. This is on freebsd 6.4.
Unicode is not used, and I currently do not intend to use it.

Is there a work-around or solution except downgrading again?

Greetings and thanks, Wolfgang


