zsh-workers
 help / color / mirror / code / Atom feed
* Removing forced casts between signed/unsigned char pointers
@ 2005-11-15  9:27 Wayne Davison
  2005-11-15 10:00 ` DervishD
  0 siblings, 1 reply; 4+ messages in thread
From: Wayne Davison @ 2005-11-15  9:27 UTC (permalink / raw)
  To: zsh-workers

There were a few compiler warnings about pointers and differ in
signedness in the non-multibyte code that I wanted to fix.  I first
looked at the easy fix of adding a few new forced casts, but I didn't
like how many casts to and from "unsigned char *" and "char *" we were
getting in the code.  Instead, I decided to change some of the functions
and global pointers that were taking/returning unsigned char pointers to
make them take/return normal char pointers.  This allowed me to remove a
bunch of forced casts.

I was careful in my changes to preserve any unsigned features of the old
code (for instance, ztrcmp() still compares the character values in an
unsigned manner even though it takes normal "char *" strings).  In two
spots in the code there were unsigned char values being assigned and
restored to/from a "char save" value, so this potentially fixed a
problem for those broken systems that have trouble converting
signed<->unsigned char values without extra casts.

If anyone would like to see all the changes I just checked in, you'll
find a unified diff of them here:

    http://opencoder.net/zsh-less-unsigned-char.patch

..wayne..


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Removing forced casts between signed/unsigned char pointers
  2005-11-15  9:27 Removing forced casts between signed/unsigned char pointers Wayne Davison
@ 2005-11-15 10:00 ` DervishD
  2005-11-15 11:03   ` Wayne Davison
  0 siblings, 1 reply; 4+ messages in thread
From: DervishD @ 2005-11-15 10:00 UTC (permalink / raw)
  To: Wayne Davison; +Cc: zsh-workers

    Hi Wayne :)

 * Wayne Davison <wayned@users.sourceforge.net> dixit:
> I first looked at the easy fix of adding a few new forced casts,
> but I didn't like how many casts to and from "unsigned char *" and
> "char *" we were getting in the code.  Instead, I decided to change
> some of the functions and global pointers that were
> taking/returning unsigned char pointers to make them take/return
> normal char pointers.  This allowed me to remove a bunch of forced
> casts.

    But that's not a good idea. If you go for portability, you
shouldn't assume that a plain char is equivalent to "unsigned char".
If you don't mind me suggesting, you should use "unsigned char"
wherever possible, and cast to plain "char" when needed. In fact,
plain char has been maintained in the C99 standard because making it
dissappear or making it an alias to "unsigned char" will break some
existing code.
 
> I was careful in my changes to preserve any unsigned features of
> the old code (for instance, ztrcmp() still compares the character
> values in an unsigned manner even though it takes normal "char *"
> strings).

    If you really want to make sure you're preserving unsigned
features, avoid the use of "char" and "char *". I know you're using
autoconf and that anyway you can check if "char" is equivalent to
"unsigned char" or "signed char" using SCHAR_MAX, CHAR_MAX and
UCHAR_MAX macros, but I think that using "unsigned char" where
possible is safer.

    I haven't seen any potential problem in your patch, but I don't
think it's a good idea to change some prototypes from using "unsigned
char *" to "char *". I have had a similar problem with one of my
projects and I think that the extra casts are a better solution,
especially if you can afford to modify prototypes. You can run into
trouble by changing from "unsigned char" to "char" (this applies to
pointers to these types, too), because you can potentially restrict
the range, but unless your code relies on "char" being signed at some
point, you can safely go from "char" to "unsigned char".

    Currently I don't know of any system nor C implementation that
won't copy eight bits from any kind of char type to any other kind of
char type (even through pointers) if CHAR_BIT is 8, and the real
danger is probably in comparisons, but anyway, and to be on the safe
side, I would go for "unsigned char" and will do any needed casts.
Given that you can change some prototypes, I think that the number of
casts will be pretty low. In my project, going to "unsigned char" and
"unsigned char *" will need casts only for a couple uses of
<string.h> functions (which should be avoided whenever possible) and
a couple of Xlib functions. The other way around (using "char" &
"char *" instead) will need lots of casts and a careful auditing to
see if we are breaking unsigned features, since the code must be 8bit
clean.

    And thanks a lot for fixing the warnings :) I know that sometimes
GCC warnings are nonsenses or misleading, but this time they're
right: the C99 standard makes pretty clear that, no matter if "char"
is equivalent to "unsigned char" or "signed char", the three types
are distinct, so their pointers are different and non compatible, so
you need explicit casts. You can see a discussion in the GCC
bugzilla, and if you want I can point you to the relevant C99
standard paragraphs.

    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736 | http://www.dervishd.net
http://www.pleyades.net & http://www.gotesdelluna.net
It's my PC and I'll cry if I want to...


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Removing forced casts between signed/unsigned char pointers
  2005-11-15 10:00 ` DervishD
@ 2005-11-15 11:03   ` Wayne Davison
  2005-11-15 12:15     ` DervishD
  0 siblings, 1 reply; 4+ messages in thread
From: Wayne Davison @ 2005-11-15 11:03 UTC (permalink / raw)
  To: zsh-workers

On Tue, Nov 15, 2005 at 11:00:09AM +0100, DervishD wrote:
> If you go for portability, you shouldn't assume that a plain char is
> equivalent to "unsigned char".

I'm not -- I'm assuming that "char" is probably signed while making sure
that the code works irrespective of it being signed or not.  What I want
to avoid is a bunch of switching back and forth between the two pointer
types.  I like to avoid pointer casts where possible because they can
hide bugs.  For instance, it's easy to think that we're casting one
pointer type to another when we're really forcibly casting a character
value or an int into a pointer -- the casts are coded the same way, and
C will not complain about it if it's an explicit cast.  (Yes, we could
use inline functions under a compiler like gcc and code up type-safe
conversions, but I think it's better to pick a single char type and
stick with it than to do a bunch of converting back and forth.)

> If you don't mind me suggesting, you should use "unsigned char"
> wherever possible, and cast to plain "char" when needed.

I certainly wouldn't mind if all of zsh were switched over to using
unsigned chars for its strings, but that's a MUCH larger change than
what I did (my changes mainly affected the ZLE code).  Switching all of
zsh to unsigned chars would also involve some kind of a wrapper scheme
for standard C functions that we use (e.g. strcpy(), strcmp(), strchr(),
etc.), which might be a little annoying, but probably wouldn't be too
bad.

The main thing zsh needs is consistency of type and good coding
practices to avoid any problems with signed characters.  Yes, such
coding practices are slightly easier to get right when using unsigned
chars, but zsh already has several idioms in place that are geared
towards making sure that signed characters are handled correctly.

That's my opinion, at least.

..wayne..


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Removing forced casts between signed/unsigned char pointers
  2005-11-15 11:03   ` Wayne Davison
@ 2005-11-15 12:15     ` DervishD
  0 siblings, 0 replies; 4+ messages in thread
From: DervishD @ 2005-11-15 12:15 UTC (permalink / raw)
  To: Wayne Davison; +Cc: zsh-workers

    Hi Wayne :)

 * Wayne Davison <wayned@users.sourceforge.net> dixit:
> On Tue, Nov 15, 2005 at 11:00:09AM +0100, DervishD wrote:
> > If you go for portability, you shouldn't assume that a plain char is
> > equivalent to "unsigned char".
> 
> I'm not -- I'm assuming that "char" is probably signed while making
> sure that the code works irrespective of it being signed or not.

    That's cool :) (and of course I wasn't saying that you was
assuming that, it was just a suggestion).

> What I want to avoid is a bunch of switching back and forth between
> the two pointer types.  I like to avoid pointer casts where
> possible because they can hide bugs.

    You're completely right, arbitrary casts can hide very nasty
bugs. In that sense you're right when sticking to a particular type
of char. Anyway and IMHO (since I'm not as familiar as you are with
zsh source code) going to "unsigned char" will remove a whole bunch
of casts (especially if you can change some prototypes) and it is
safer. But obviously you know LOTS more than me about zsh source
code. For example, I didn't know that zsh code can deal with signed
chars.

> > If you don't mind me suggesting, you should use "unsigned char"
> > wherever possible, and cast to plain "char" when needed.
> 
> I certainly wouldn't mind if all of zsh were switched over to using
> unsigned chars for its strings, but that's a MUCH larger change
> than what I did (my changes mainly affected the ZLE code).

    OK, I understand. I didn't think about the extents of such
modification.

> Switching all of zsh to unsigned chars would also involve some kind
> of a wrapper scheme for standard C functions that we use (e.g.
> strcpy(), strcmp(), strchr(), etc.), which might be a little
> annoying, but probably wouldn't be too bad.

    But zsh is already using such wrappers, isn't it? This way you
only have to change the wrapper prototypes. As I've already told, I'm
not familiar with the code to know such details. It was just a
suggestion.

> The main thing zsh needs is consistency of type and good coding
> practices to avoid any problems with signed characters.

    This is obvious in any project. Good coding practices can avoid
more bugs that just trying to write "proper" code (for example, using
"unsigned char" instead of plain "char"). Here you're completely
right, and as far as I know, zsh code fits your description.

> Yes, such coding practices are slightly easier to get right when
> using unsigned chars, but zsh already has several idioms in place
> that are geared towards making sure that signed characters are
> handled correctly.

    I didn't know that, thanks for pointing :)

> That's my opinion, at least.

    And you're right. If switching to "unsigned char" can lead to
significant problems and the change is very large (in number of lines
of code affected) I don't think it is worth the effort. Anyway, if the
change can be done progressively (for example, changing the wrappers
first, after that some static variables, etc.) then it maybe worth a
try. Anyway this is just a humble suggestion since I'm not able to do
that change myself and I don't want (nor like) to say others how they
should do their work.

    Thanks for considering and discussing the idea :) And good luck
with the changes, whatever they are ;)

    Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736 | http://www.dervishd.net
http://www.pleyades.net & http://www.gotesdelluna.net
It's my PC and I'll cry if I want to...


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2005-11-15 12:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-11-15  9:27 Removing forced casts between signed/unsigned char pointers Wayne Davison
2005-11-15 10:00 ` DervishD
2005-11-15 11:03   ` Wayne Davison
2005-11-15 12:15     ` DervishD

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).