PATCH: assume "enhanced goodness" when --multibyte-enable

zsh-workers

* PATCH: assume "enhanced goodness" when --multibyte-enable
@ 2005-12-14 18:22 Peter Stephenson
  2005-12-14 19:06 ` Wayne Davison
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Stephenson @ 2005-12-14 18:22 UTC
  To: Zsh hackers list

In utils.c we don't enable the full multibyte code for converting
characters unless __STDC_ISO_10646__ is turned on.  However, everywhere
in zle we simply trust that if --multibyte-enable is turned on
everything just works.  That includes wctomb(), which is all we need
for character conversion.

Hence I think we need to make the same assumption in utils.c, too.  This
makes things (in particular insert-{composed,unicode}-char) work better
on Solaris 8.
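A minimal sketch of the conversion step referred to above, using only standard C wctomb(). It assumes the caller has already selected the user's locale with setlocale(LC_CTYPE, "") and that the wchar_t value passed in is meaningful in that locale, which is exactly the assumption debated further down the thread. The helper name wide_to_mb is hypothetical and is not taken from zsh.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stdlib.h>   /* wctomb(), wchar_t */

/*
 * Hypothetical helper: write the multibyte form of one wide character
 * into buf, which must have room for at least MB_LEN_MAX bytes.
 * The bytes produced depend on the LC_CTYPE locale in effect.
 * Returns the number of bytes written, or -1 if the character cannot
 * be represented in the current locale (the case zsh reports as
 * "character not in range").
 */
static int wide_to_mb(wchar_t wc, char *buf)
{
    /* Reset the conversion state; this only matters for stateful
     * (shift-sequence) encodings and is harmless otherwise. */
    (void)wctomb(NULL, 0);
    return wctomb(buf, wc);
}
```

Note that wctomb() is declared to return int, so failure is a plain -1. The hunk below stores the result in a size_t and compares it with (size_t)-1, which also works because converting -1 to size_t yields that same value.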

Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.105
diff -u -r1.105 utils.c
--- Src/utils.c	30 Nov 2005 16:35:33 -0000	1.105
+++ Src/utils.c	14 Dec 2005 18:17:29 -0000
@@ -3918,7 +3918,7 @@
 }
 #endif
 
-# if defined(HAVE_NL_LANGINFO) && defined(CODESET) && !defined(__STDC_ISO_10646__)
+# if defined(HAVE_NL_LANGINFO) && defined(CODESET) && !defined(__STDC_ISO_10646__) && !defined(MULTIBYTE_SUPPORT)
 /* Convert a character from UCS4 encoding to UTF-8 */
 
 /**/
@@ -3984,7 +3984,7 @@
     char svchar = '\0';
     int meta = 0, control = 0;
     int i;
-#if defined(HAVE_WCHAR_H) && defined(HAVE_WCTOMB) && defined(__STDC_ISO_10646__)
+#if defined(HAVE_WCHAR_H) && defined(HAVE_WCTOMB) && (defined(__STDC_ISO_10646__) || defined(MULTIBYTE_SUPPORT))
     wint_t wval;
     size_t count;
 #else
@@ -4093,7 +4093,7 @@
 		    *misc = wval;
 		    return s+1;
 		}
-#if defined(HAVE_WCHAR_H) && defined(HAVE_WCTOMB) && defined(__STDC_ISO_10646__)
+#if defined(HAVE_WCHAR_H) && defined(HAVE_WCTOMB) && (defined(__STDC_ISO_10646__) || defined(MULTIBYTE_SUPPORT))
 		count = wctomb(t, (wchar_t)wval);
 		if (count == (size_t)-1) {
 		    zerr("character not in range", NULL, 0);

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

* Re: PATCH: assume "enhanced goodness" when --multibyte-enable
  2005-12-14 18:22 PATCH: assume "enhanced goodness" when --multibyte-enable Peter Stephenson
@ 2005-12-14 19:06 ` Wayne Davison
  0 siblings, 0 replies; 4+ messages in thread
From: Wayne Davison @ 2005-12-14 19:06 UTC
  To: Peter Stephenson; +Cc: Zsh hackers list

On Wed, Dec 14, 2005 at 06:22:25PM +0000, Peter Stephenson wrote:
> Hence I think we need to make the same assumption in utils.c, too.

That certainly seems right to me.

The first hunk in your diff makes me wonder:  why isn't ucs4toutf8()
declared as static?  It is only used inside utils.c, and it sometimes
doesn't even get defined.
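To illustrate Wayne's point: a function declared static has internal linkage, so it is visible only inside the file that defines it and no other file can call it at all, which is a safe property for a function that is only conditionally compiled. The sketch below is a hypothetical stand-alone UCS-4 to UTF-8 encoder written in that style; it is not zsh's ucs4toutf8(), whose body does not appear in this thread.

```c
#include <stddef.h>   /* size_t */

/*
 * Hypothetical file-local helper: encode one UCS-4 code point as UTF-8.
 * Because it is static, nothing outside this translation unit can
 * depend on it.
 * Returns the number of bytes written to out (1-4), or 0 if cp is not
 * a valid Unicode scalar value.
 */
static size_t ucs4_to_utf8_sketch(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                 /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                    /* UTF-16 surrogates are not characters */
    if (cp < 0x10000) {              /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {            /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                        /* beyond the Unicode range */
}
```

In a conditional build, the definition and all of its callers would sit under the same #if; otherwise compilers such as GCC may warn about an unused static function.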

..wayne..

* Re: PATCH: assume "enhanced goodness" when --multibyte-enable
  2005-12-15 11:52 Oliver Kiddle
@ 2005-12-15 12:09 ` Peter Stephenson
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Stephenson @ 2005-12-15 12:09 UTC
  To: Zsh hackers list

Oliver Kiddle wrote:
> Peter wrote:
> > In utils.c we don't enable the full multibyte code for converting
> > characters unless __STDC_ISO_10646__ is turned on.  However, everywhere
> > in zle we simply trust that if --multibyte-enable is turned on
> > everything just works.  That includes wctomb(), which is all we need
> > for character conversion.
> 
> This doesn't make sense to me. With MULTIBYTE_SUPPORT enabled are you
> just assuming that wchar_t is UCS-4 everywhere?

ish...

> I don't understand how that'll work if you have a system which has
> perfectly good multibyte support but uses some other encoding for
> wchar_t.

It might well not, but up to now I've been assuming we need to know how
to convert it.  --enable-multibyte just says "go ahead and assume this
works".  Unless we can probe for what to do with a wchar_t, I've been
assuming we're kind of stuck.

However, the assumptions we rely on are a bit different in the code for
converting Unicode characters and in the rest of zle, so quite likely
they shouldn't be tied...

In converting \U/\u sequences, as you say, we really need fully paid-up
UCS-4.

In the rest of zle, we need wchar_t to be an integer which overlaps
with ASCII in positions 0 to 127, and we only need that in some places.
(A lot of the time we can work on the pre-converted multibyte string,
since that *must* have ASCII as a subset, and it's probably possible to
do that everywhere by additional conversions.)  I don't think it
necessarily has to be exactly UCS-4, and most of the time it probably
works if it isn't.  So maybe the change is wrong.
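The weaker requirement, that wide characters agree with ASCII in positions 0 to 127, can at least be probed at run time. The following is a sketch assuming the locale has already been set with setlocale(); the function name is hypothetical. It checks only the ASCII overlap and says nothing about whether wchar_t is UCS-4 for other characters, which is the stronger property needed for \U/\u escapes.

```c
#include <wchar.h>   /* btowc(), wint_t, WEOF */

/*
 * Hypothetical probe: returns 1 if every byte 0..127 converts, in the
 * current LC_CTYPE locale, to a wide character with the same numeric
 * value, and 0 otherwise.
 */
static int wchar_overlaps_ascii(void)
{
    int c;

    for (c = 0; c < 128; c++) {
        wint_t wc = btowc(c);

        /* WEOF means the byte is not a valid single-byte character. */
        if (wc == WEOF || wc != (wint_t)c)
            return 0;
    }
    return 1;
}
```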


* Re: PATCH: assume "enhanced goodness" when --multibyte-enable
@ 2005-12-15 11:52 Oliver Kiddle
  2005-12-15 12:09 ` Peter Stephenson
  0 siblings, 1 reply; 4+ messages in thread
From: Oliver Kiddle @ 2005-12-15 11:52 UTC
  To: zsh-workers

Peter wrote:
> In utils.c we don't enable the full multibyte code for converting
> characters unless __STDC_ISO_10646__ is turned on.  However, everywhere
> in zle we simply trust that if --multibyte-enable is turned on
> everything just works.  That includes wctomb(), which is all we need
> for character conversion.

This doesn't make sense to me. With MULTIBYTE_SUPPORT enabled are you
just assuming that wchar_t is UCS-4 everywhere? I don't understand how
that'll work if you have a system which has perfectly good multibyte
support but uses some other encoding for wchar_t. I think FreeBSD is
such a system and older versions of many Unix systems do that. Do you
actually know for sure that Solaris 8 is using UCS-4 for wchar_t? The
fact that it doesn't define __STDC_ISO_10646__ would imply that it does
not. I suspect it is similar but different, which is why it sort of
seems to work.

(The code in utils.c which you have just enabled with MULTIBYTE_SUPPORT
casts a 4-byte integer into a wchar_t: something that can only work
when wchar_t is implemented as UCS-4. __STDC_ISO_10646__ is supposed to
be a reliable way to determine whether wchar_t is UCS-4.)
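A sketch of the guard described here, where the cast from a code point to wchar_t is made only when __STDC_ISO_10646__ promises that wchar_t holds ISO 10646 values. The fallback branch is an assumption for illustration: without that promise it accepts only the ASCII range and converts through btowc(), which relies on the locale's character set being ASCII-compatible. The function name is hypothetical.

```c
#include <wchar.h>   /* wchar_t, wint_t, btowc(), WEOF */

/*
 * Hypothetical helper: turn a code point taken from a \u or \U escape
 * into a wchar_t.  Returns 1 and stores the result in *out on success,
 * or 0 if the value cannot be trusted on this platform.
 */
static int codepoint_to_wchar(unsigned long cp, wchar_t *out)
{
#ifdef __STDC_ISO_10646__
    /* wchar_t values are ISO 10646 code points, so a plain cast is
     * correct for anything in the Unicode range. */
    if (cp > 0x10FFFF)
        return 0;
    *out = (wchar_t)cp;
    return 1;
#else
    /* No promise about the wchar_t encoding: accept only ASCII and
     * let the C library map the byte to its wide form. */
    wint_t wc;

    if (cp > 127)
        return 0;
    wc = btowc((int)cp);
    if (wc == WEOF)
        return 0;
    *out = (wchar_t)wc;
    return 1;
#endif
}
```

When defined, __STDC_ISO_10646__ expands to a yyyymmL constant naming the edition of ISO/IEC 10646 the implementation follows, so code that needs a minimum repertoire can also compare against its value.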

Oliver

Thread overview: 4+ messages
2005-12-14 18:22 PATCH: assume "enhanced goodness" when --multibyte-enable Peter Stephenson
2005-12-14 19:06 ` Wayne Davison
2005-12-15 11:52 Oliver Kiddle
2005-12-15 12:09 ` Peter Stephenson
