zsh-workers
* PATCH: math and locale
@ 1999-11-20 20:18 Clint Adams
  1999-11-20 20:53 ` Bart Schaefer
  0 siblings, 1 reply; 10+ messages in thread
From: Clint Adams @ 1999-11-20 20:18 UTC (permalink / raw)
  To: zsh-workers

This alleviates the decimal point problem by making it
locale-independent.  This reverses the previous fix
which introduced new problems.

--- Src/math.c	1999/11/10 19:13:33	1.1.1.19
+++ Src/math.c	1999/11/20 20:07:47
@@ -184,20 +184,12 @@
 static int
 zzlex(void)
 {
-    char decimal = '.', thousands = ',';
-    int cct = 0;
 #ifdef USE_LOCALE
-    struct lconv *lc;
+    char *prev_locale;
 #endif
-
+    int cct = 0;
     yyval.type = MN_INTEGER;
 
-#ifdef USE_LOCALE
-    lc = localeconv();
-    decimal = *(lc->decimal_point);
-    thousands = *(lc->thousands_sep);
-#endif
-
     for (;; cct = 0)
 	switch (*ptr++) {
 	case '+':
@@ -335,9 +327,7 @@
 	case ':':
 	    return COLON;
 	case ',':
-	case '.':
-	    if (*(ptr-1) == thousands) return COMMA;
-	    else break;
+	    return COMMA;
 	case '\0':
 	    ptr--;
 	    return EOI;
@@ -362,15 +352,22 @@
 	    }
 	/* Fall through! */
 	default:
-	    if (idigit(*--ptr) || *ptr == decimal) {
+	    if (idigit(*--ptr) || *ptr == '.') {
 		char *nptr;
 		for (nptr = ptr; idigit(*nptr); nptr++);
 
-		if (*nptr == decimal || *nptr == 'e' || *nptr == 'E') {
+		if (*nptr == '.' || *nptr == 'e' || *nptr == 'E') {
 		    /* it's a float */
 		    yyval.type = MN_FLOAT;
+#ifdef USE_LOCALE
+		    prev_locale = setlocale(LC_NUMERIC, NULL);
+		    setlocale(LC_NUMERIC, "POSIX");
+#endif
 		    yyval.u.d = strtod(ptr, &nptr);
-		    if (ptr == nptr || *nptr == decimal ) {
+#ifdef USE_LOCALE
+		    setlocale(LC_NUMERIC, prev_locale);
+#endif
+		    if (ptr == nptr || *nptr == '.' ) {
 			zerr("bad floating point constant", NULL, 0);
 			return EOI;
 		    }


* Re: PATCH: math and locale
  1999-11-20 20:18 PATCH: math and locale Clint Adams
@ 1999-11-20 20:53 ` Bart Schaefer
  1999-11-21 18:14   ` Clint Adams
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Schaefer @ 1999-11-20 20:53 UTC (permalink / raw)
  To: zsh-workers

On Nov 20,  3:18pm, Clint Adams wrote:
} Subject: PATCH: math and locale
}
} This alleviates the decimal point problem by making it
} locale-independent.  This reverses the previous fix
} which introduced new problems.

Here's the equivalent against 3.1.6-bart-8, for those of you who, like me,
never applied Clint's previous patch and hence don't need to reverse it.

I'm mildly concerned that setting and restoring the locale is an excessive
overhead, especially if it's a no-op (prev_locale is already "POSIX" or "C").
Can anyone reassure me?

I also wonder whether "C" would not be a better choice than "POSIX" here.

Index: Src/math.c
===================================================================
@@ -184,6 +184,9 @@
 static int
 zzlex(void)
 {
+#ifdef USE_LOCALE
+    char *prev_locale;
+#endif
     int cct = 0;
 
     yyval.type = MN_INTEGER;
@@ -356,7 +359,14 @@
 		if (*nptr == '.' || *nptr == 'e' || *nptr == 'E') {
 		    /* it's a float */
 		    yyval.type = MN_FLOAT;
+#ifdef USE_LOCALE
+		    prev_locale = setlocale(LC_NUMERIC, NULL);
+		    setlocale(LC_NUMERIC, "POSIX");
+#endif
 		    yyval.u.d = strtod(ptr, &nptr);
+#ifdef USE_LOCALE
+		    setlocale(LC_NUMERIC, prev_locale);
+#endif
 		    if (ptr == nptr || *nptr == '.') {
 			zerr("bad floating point constant", NULL, 0);
 			return EOI;

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com


* Re: PATCH: math and locale
  1999-11-20 20:53 ` Bart Schaefer
@ 1999-11-21 18:14   ` Clint Adams
  1999-11-22  8:17     ` Bart Schaefer
  0 siblings, 1 reply; 10+ messages in thread
From: Clint Adams @ 1999-11-21 18:14 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

> I'm mildly concerned that setting and restoring the locale is an excessive
> overhead, especially if it's a no-op (prev_locale is already "POSIX" or "C").
> Can anyone reassure me?

I see that the setlocale code is a bit meatier than I would have expected.
On the other hand, a few string comparisons followed by setlocale is
potentially even worse for those using locales other than C/POSIX.

GNU libc seems to have an "extended locale model" allowing strtod to
take a locale argument; however, this is neither portable nor standardized.

I'm also beginning to wonder if prev_locale won't get clobbered somehow.
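Editorial note: the worry is justified. ISO C allows the string returned by setlocale() to be overwritten by a later setlocale() call, so the patch's prev_locale pointer may already be stale when it is passed back. A minimal sketch of a fix, using a hypothetical helper named c_strtod() (not zsh code) and the "C" locale preferred later in the thread, copies the name before switching. Inside zsh the copy would presumably use its own ztrdup()/zsfree() instead of malloc()/free().

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of zsh: parse a floating-point constant
 * with '.' as the radix character regardless of the user's LC_NUMERIC. */
double c_strtod(const char *s, char **endp)
{
    const char *cur = setlocale(LC_NUMERIC, NULL);   /* query only */
    char *saved = NULL;
    double d;

    /* The returned string may be overwritten by the next setlocale()
     * call, so take a private copy before switching. */
    if (cur != NULL) {
        size_t len = strlen(cur) + 1;
        saved = malloc(len);
        if (saved != NULL)
            memcpy(saved, cur, len);
    }
    if (saved == NULL)
        return strtod(s, endp);   /* cannot restore later, so do not switch */

    setlocale(LC_NUMERIC, "C");
    d = strtod(s, endp);
    setlocale(LC_NUMERIC, saved); /* put the caller's locale back */
    free(saved);
    return d;
}
```

As in the patch, the caller still checks that endp moved and that it does not point at a second '.'.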

> I also wonder whether "C" would not be a better choice than "POSIX" here.

On sheer byte count or by some other criterion?


* Re: PATCH: math and locale
  1999-11-21 18:14   ` Clint Adams
@ 1999-11-22  8:17     ` Bart Schaefer
  1999-11-22 14:42       ` Clint Adams
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Schaefer @ 1999-11-22  8:17 UTC (permalink / raw)
  To: zsh-workers

On Nov 21,  1:14pm, Clint Adams wrote:
} Subject: Re: PATCH: math and locale
}
} > I'm mildly concerned that setting and restoring the locale is an excessive
} > overhead, especially if it's a no-op (prev_locale is already "POSIX" or "C").
} > Can anyone reassure me?
} 
} I see that the setlocale code is a bit meatier than I would have expected.

I am mostly concerned about things like re-opening (or worse, reading) of
external locale-definition files.

} I'm also beginning to wonder if prev_locale won't get clobbered somehow.

Oo, ick.

} > I also wonder whether "C" would not be a better choice than "POSIX" here.
} 
} On sheer byte count or by some other criterion?

(1) That it's more likely to be supported correctly, and (2) that it is
after all C-like expression parsing that math.c is attempting to perform.

It further occurs to me that it might be bad even to have output formatting
controlled by LC_NUMERIC, given the tendency of scripts to do things like

	eval $(print ...)

Where *exactly* would LC_NUMERIC make any difference in pws-9?

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com


* Re: PATCH: math and locale
  1999-11-22  8:17     ` Bart Schaefer
@ 1999-11-22 14:42       ` Clint Adams
  1999-11-22 18:23         ` Bart Schaefer
  0 siblings, 1 reply; 10+ messages in thread
From: Clint Adams @ 1999-11-22 14:42 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

> I am mostly concerned about things like re-opening (or worse, reading) of
> external locale-definition files.

So if the locale is "C" or "POSIX", don't make the second setlocale
call.  Or is there a better solution?
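Editorial note: this suggestion can be sketched as a cheap name check in front of the switch. The helpers below are hypothetical, and they only recognise the two names guaranteed to use the standard numeric format; a locale such as en_US that happens to use '.' would still take the slow path.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: nonzero when the locale name is "C" or "POSIX",
 * the two names whose numeric format matches C source syntax. */
int locale_name_is_c(const char *name)
{
    return name != NULL
        && (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Nonzero when the current LC_NUMERIC must be switched around strtod().
 * setlocale(category, NULL) only queries; it never changes the locale. */
int numeric_switch_needed(void)
{
    return !locale_name_is_c(setlocale(LC_NUMERIC, NULL));
}
```

A caller would test numeric_switch_needed() and call strtod() directly when it returns 0, so users in the default locale pay for one setlocale() query instead of a query plus two switches.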

> (1) That it's more likely to be supported correctly, and (2) that it is
> after all C-like expression parsing that math.c is attempting to perform.

Fair enough.

> It further occurs to me that it might be bad even to have output formatting
> controlled by LC_NUMERIC, given the tendency of scripts to do things like
> 
> 	eval $(print ...)

Yes.  I thought we were going to implement options/flags to control
such output.

> Where *exactly* would LC_NUMERIC make any difference in pws-9?

Anywhere LC_ALL would, except for zzlex now.  What are you asking?


* Re: PATCH: math and locale
  1999-11-22 14:42       ` Clint Adams
@ 1999-11-22 18:23         ` Bart Schaefer
  1999-11-22 19:36           ` Clint Adams
  1999-11-22 20:03           ` Zefram
  0 siblings, 2 replies; 10+ messages in thread
From: Bart Schaefer @ 1999-11-22 18:23 UTC (permalink / raw)
  To: zsh-workers

On Nov 22,  9:42am, Clint Adams wrote:
} Subject: Re: PATCH: math and locale
}
} > I am mostly concerned about things like re-opening (or worse, reading) of
} > external locale-definition files.
} 
} So if the locale is "C" or "POSIX", don't make the second setlocale
} call.  Or is there a better solution?

I'm hoping someone more familiar with it will tell us.  Maybe there isn't 
anyone more familiar with it on the mailing list ...  In the meantime,
it's OK as is.

} > (1) That it's more likely to be supported correctly, and (2) that it is
} > after all C-like expression parsing that math.c is attempting to perform.
} 
} Fair enough.

Glancing through /usr/X11R6/lib/X11/locale/locale.alias I see that the
"POSIX" locale is an alias for the "C" locale, so it probably doesn't
really matter either way.  Anyone else have an opinion?  Zefram, perhaps?

} > Where *exactly* would LC_NUMERIC make any difference in pws-9?
} 
} Anywhere LC_ALL would, except for zzlex now.  What are you asking?

I'm trying to come up with a circumstance in which zsh would "print" or
otherwise output a number in a format that zzlex would then fail to read.
I tried setting LC_ALL to a few different things and then doing "echo $x"
where x is a floating-point parameter, but it never changed anything.
Where else might it matter?  Output of process times or cpu percentages?
Can we make an exhaustive list?  Or am I worrying about it too much?

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com


* Re: PATCH: math and locale
  1999-11-22 18:23         ` Bart Schaefer
@ 1999-11-22 19:36           ` Clint Adams
  1999-11-22 20:03           ` Zefram
  1 sibling, 0 replies; 10+ messages in thread
From: Clint Adams @ 1999-11-22 19:36 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

> I'm trying to come up with a circumstance in which zsh would "print" or
> otherwise output a number in a format that zzlex would then fail to read.
> I tried setting LC_ALL to a few different things and then doing "echo $x"
> where x is a floating-point parameter, but it never changed anything.
> Where else might it matter?  Output of process times or cpu percentages?
> Can we make an exhaustive list?  Or am I worrying about it too much?

No, this is a definite problem.

% LC_ALL=pl_PL
% typeset -f g
% ((g=4.4))
% echo $g
4,4000000000
% ((g=$g + 3))
% echo $g
4,0000000000
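Editorial note: the second result follows from the comma being lexed as an operator. After expansion the shell evaluates g=4,4000000000 + 3, and with C-like precedence the assignment binds tighter than the comma, so g receives 4 and the sum is computed and discarded. The same expression behaves identically in C; the function name below is made up for illustration.

```c
/* Illustration only: the expression zsh sees once $g has expanded to
 * "4,4000000000" in a locale whose decimal point is ','. */
double comma_demo(double *g)
{
    /* Parsed as (*g = 4), (4000000000.0 + 3): the comma operator has the
     * lowest precedence, and its value is that of the right operand. */
    return (*g = 4, 4000000000.0 + 3);
}
```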


And also:

% LC_ALL=de_DE
% echo $g
4,0000000000
% ((g=8.3))
% echo $g  
8,3000000000
% printf "%f\n" $g  
printf: 8,3000000000: value not completely converted
8.000000
% export LC_ALL
% printf "%f\n" $g
8,300000


printf is GNU sh-utils.

As for process times:

% time sync
sync  0,00s user 0,00s system 0% cpu 0,125 total


Anything that uses convfloat() will be affected.
I can't find anything else.
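Editorial note: one way to keep such output re-readable, sketched as a hypothetical helper rather than a change to convfloat() itself, is to format as usual and then replace the locale's radix string with '.'. The "%.10f" format is an assumption chosen to match the ten decimal places in the transcripts above.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, not zsh's convfloat(): format d and force '.' as
 * the radix character so a C-locale parser can read the result back. */
void format_float_c(char *buf, size_t size, double d)
{
    const char *dp = localeconv()->decimal_point;
    size_t dplen = dp ? strlen(dp) : 0;
    char *p;

    snprintf(buf, size, "%.10f", d);
    if (dplen == 0 || strcmp(dp, ".") == 0)
        return;                          /* radix is already '.' */

    p = strstr(buf, dp);                 /* "%f" emits the radix at most once */
    if (p != NULL) {
        *p = '.';
        /* Close the gap if the locale's radix string is longer than one byte. */
        memmove(p + 1, p + dplen, strlen(p + dplen) + 1);
    }
}
```

Unlike switching LC_NUMERIC around snprintf(), this needs no setlocale() calls; it relies on "%f" never inserting thousands separators, which holds because no grouping flag is used.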


* Re: PATCH: math and locale
  1999-11-22 18:23         ` Bart Schaefer
  1999-11-22 19:36           ` Clint Adams
@ 1999-11-22 20:03           ` Zefram
  1999-11-23 18:18             ` Bart Schaefer
  1999-11-26 22:08             ` Peter Stephenson
  1 sibling, 2 replies; 10+ messages in thread
From: Zefram @ 1999-11-22 20:03 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: zsh-workers

Bart Schaefer wrote:
>Glancing through /usr/X11R6/lib/X11/locale/locale.alias I see that the
>"POSIX" locale is an alias for the "C" locale, so it probably doesn't
>really matter either way.  Anyone else have an opinion?  Zefram, perhaps?

Well since you asked...

"C" is probably to be preferred, because it will be available in *all*
locale implementations, whereas the "POSIX" locale is a POSIXism.
However, where the locale is already "POSIX", presumably we wouldn't need
to change that, if we're making setting it conditional -- I don't have
the standard to hand, but I'd be very surprised if it defined a different
numeric form.  Data point: setlocale(3) on Linux indicates that "C" and
"POSIX" are canonically equivalent.

>} > Where *exactly* would LC_NUMERIC make any difference in pws-9 ?

Here we run into one of the big conceptual problems with locales.
Are we outputting human text or machine data?  There's a need for both.
Locale-aware programs need to consider which category each output
falls into, and perform conversions appropriately.  Unfortunately, the
C locale system provides no way to switch locale on a per-conversion
basis -- not even separate interfaces to perform conversions in the "C"
and current locales.  This is where it all falls down, and programmers
like me just throw up our hands and give up until we're provided with a
usable interface.

If we do wish zsh itself to use locales, I see several possibilities:

* We can do what POSIX defines for most utilities (not sure what it
  says about sh): guarantee sensible behaviour in the "POSIX" locale,
  and leave it undefined everywhere else.  Strictly speaking, if you
  want the standard form of output from most POSIX utilities, you have
  to set LC_ALL=POSIX for it.

* We can use the selected locale for LC_MESSAGES, which only affects
  things that are definitely for human consumption, and leave everything
  else using the "C" locale.  This would have almost exactly the desired
  effect for zsh, and is trivially easy to implement.

* We can try to do it properly: decide which things should be
  locale-dependent and which shouldn't.  We can fabricate a set of
  functions like glibc's strtod_l() (perform strtod() using locale X)
  to make it easier.  Problem: we'd get ambiguous cases.  strftime()
  is particularly nasty in this respect.  However, following this path,
  we could make the whole locale thing much easier for the users of zsh:
  we could provide interfaces where the user specifies whether to use
  locales or not.  ztrftime() could allow a flag in % sequences to say
  "use the "C" locale for this expansion".

Opinions?

-zefram
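Editorial note: the second option above amounts to a small change at startup. A minimal sketch, assuming a hypothetical init function; LC_MESSAGES is defined by POSIX rather than ISO C, hence the guard.

```c
#include <locale.h>

/* Hypothetical startup sketch of the "LC_MESSAGES only" option: every
 * category stays in the "C" locale except message catalogues, which
 * follow the user's environment (LC_ALL, LC_MESSAGES, LANG). */
void init_locale_messages_only(void)
{
    setlocale(LC_ALL, "C");          /* numbers, collation, ctype, time: C */
#ifdef LC_MESSAGES
    setlocale(LC_MESSAGES, "");      /* human-readable text: user's choice */
#endif
}
```

Peter Stephenson's ctype point in a later message would add one more line, setlocale(LC_CTYPE, ""), after the LC_MESSAGES call.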


* Re: PATCH: math and locale
  1999-11-22 20:03           ` Zefram
@ 1999-11-23 18:18             ` Bart Schaefer
  1999-11-26 22:08             ` Peter Stephenson
  1 sibling, 0 replies; 10+ messages in thread
From: Bart Schaefer @ 1999-11-23 18:18 UTC (permalink / raw)
  To: zsh-workers

On Nov 22,  8:03pm, Zefram wrote:
} Subject: Re: PATCH: math and locale
}
} If we do wish zsh itself to use locales, I see several possibilities:
} 
} * We can do what POSIX defines for most utilities (not sure what it
}   says about sh): guarantee sensible behaviour in the "POSIX" locale,
}   and leave it undefined everywhere else.  Strictly speaking, if you
}   want the standard form of output from most POSIX utilities, you have
}   to set LC_ALL=POSIX for it.
} 
} * We can use the selected locale for LC_MESSAGES, which only affects
}   things that are definitely for human consumption, and leave everything
}   else using the "C" locale.  This would have almost exactly the desired
}   effect for zsh, and is trivially easy to implement.
} 
} * We can try to do it properly: decide which things should be
}   locale-dependent and which shouldn't.

It sounds like we already have the first one, that the second one might be
preferable, and that the third one is going to be hard to implement and
even harder to document clearly.

I suggest that we first find out what POSIX says about shells in this
regard (Zoltan?  Are you out there?), and then choose whichever of the
first two gets us closest to that.

Then we worry about how to get all the way to POSIX (if we aren't) *after*
the next major release (3.2 or whatever it will be, not the next 3.1.x).

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com


* Re: PATCH: math and locale
  1999-11-22 20:03           ` Zefram
  1999-11-23 18:18             ` Bart Schaefer
@ 1999-11-26 22:08             ` Peter Stephenson
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Stephenson @ 1999-11-26 22:08 UTC (permalink / raw)
  To: Zsh hackers list


Zefram wrote:
> * We can use the selected locale for LC_MESSAGES, which only affects
>   things that are definitely for human consumption, and leave everything
>   else using the "C" locale.  This would have almost exactly the desired
>   effect for zsh, and is trivially easy to implement.

Sounds reasonable, since there's definitely no chance of getting things
like alternative decimal points right with locales active.

I don't see a problem with using locales in the ctype macros, so that Ü is
recognized as an uppercase letter, etc. --- is there one?  Relying on
printeightbit for output is about the only alternative, and isn't very
reliable.  Or maybe you simply weren't thinking of that, since it doesn't
determine the output of one character rather than another.

We could consider options to use locales in other cases (e.g. standard glob
sorting), but I suspect a final decision not to use them would be better
here.

-- 
Peter Stephenson <pws@supanet.com>


Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
