9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] strangely typed functions in standard library
@ 2006-05-16  3:03 Matt Stewart
  2006-05-16 11:40 ` Martin Neubauer
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Matt Stewart @ 2006-05-16  3:03 UTC (permalink / raw)
  To: 9fans

The following functions are described as accepting a Rune, but instead
the parameters are of type long.  Why?

int runelen(long);
char *utfrune(char *, long);
char *utfrrune(char *, long);



* Re: [9fans] strangely typed functions in standard library
  2006-05-16  3:03 [9fans] strangely typed functions in standard library Matt Stewart
@ 2006-05-16 11:40 ` Martin Neubauer
  2006-05-16 15:09 ` R
  2006-05-19 22:49 ` Lluís Batlle i Rossell
  2 siblings, 0 replies; 11+ messages in thread
From: Martin Neubauer @ 2006-05-16 11:40 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

* Matt Stewart (rotaerk1@gmail.com) wrote:
> The following functions are described as accepting a Rune, but instead
> the parameters are of type long.  Why?
> 
> int runelen(long);
> char *utfrune(char *, long);
> char *utfrrune(char *, long);
> 

Though I'm far from being an expert on that matter, I would assume
it's for the same reason putchar() takes an int.
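
One common reading of the putchar() analogy is C's default argument
promotions: an argument narrower than int is widened to int when it is
passed without a prototype or through the variable part of a variadic
function. The sketch below only illustrates that rule. Rune16,
sum_promoted and promoted_size are hypothetical names for this example,
not part of the Plan 9 library.

```c
#include <assert.h>
#include <stdarg.h>

typedef unsigned short Rune16;	/* hypothetical stand-in for a 16-bit Rune */

/*
 * Sum n values passed through "...".  Arguments narrower than int
 * undergo the default argument promotions on the way in, so the
 * callee has to read them back as int, not as Rune16.
 */
long
sum_promoted(int n, ...)
{
	va_list ap;
	long total = 0;

	assert(n >= 0);
	va_start(ap, n);
	while (n-- > 0)
		total += va_arg(ap, int);
	va_end(ap);
	return total;
}

/* Size of a Rune16 once it has been promoted (unary plus promotes). */
unsigned long
promoted_size(void)
{
	Rune16 r = 0x263A;

	return (unsigned long)sizeof(+r);
}
```

Calling sum_promoted(2, a, b) with two Rune16 values works because both
arrive as int; promoted_size() reports sizeof(int) rather than
sizeof(Rune16).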



* Re: [9fans] strangely typed functions in standard library
  2006-05-16  3:03 [9fans] strangely typed functions in standard library Matt Stewart
  2006-05-16 11:40 ` Martin Neubauer
@ 2006-05-16 15:09 ` R
  2006-05-19 22:49 ` Lluís Batlle i Rossell
  2 siblings, 0 replies; 11+ messages in thread
From: R @ 2006-05-16 15:09 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 5/16/06, Matt Stewart <rotaerk1@gmail.com> wrote:
> The following functions are described as accepting a Rune, but instead
> the parameters are of type long.  Why?
>
> int runelen(long);
> char *utfrune(char *, long);
> char *utfrrune(char *, long);

full unicode is 32 bit, even if plan9 (afaik)
supports only characters in the BMP.



* Re: [9fans] strangely typed functions in standard library
  2006-05-19 22:49 ` Lluís Batlle i Rossell
@ 2006-05-19 22:43   ` quanstro
  0 siblings, 0 replies; 11+ messages in thread
From: quanstro @ 2006-05-19 22:43 UTC (permalink / raw)
  To: 9fans

while true, this is not the reason for the function signature.  this is for
type promotion reasons.  Rune is still an unsigned short.  plan 9 does not
support ucs-4.

- erik

On Fri May 19 17:51:07 CDT 2006, viriketo@gmail.com wrote:

> Matt Stewart wrote:
> > The following functions are described as accepting a Rune, but instead
> > the parameters are of type long.  Why?
> > 
> > int runelen(long);
> > char *utfrune(char *, long);
> > char *utfrrune(char *, long);
> > 
>  From History in this wikipedia page (http://en.wikipedia.org/wiki/UTF-32):
> 
> UCS-4 is sufficient to represent all of the Unicode code space, which 
> has 1114112 (= 2^20+2^16) code points and therefore requires only up to 
> hexadecimal 10FFFF. Some people consider it wasteful to reserve such a 
> large code space for mapping a relatively small set of code points, so a 
> new encoding form, UTF-32, was proposed.



* Re: [9fans] strangely typed functions in standard library
  2006-05-16  3:03 [9fans] strangely typed functions in standard library Matt Stewart
  2006-05-16 11:40 ` Martin Neubauer
  2006-05-16 15:09 ` R
@ 2006-05-19 22:49 ` Lluís Batlle i Rossell
  2006-05-19 22:43   ` quanstro
  2 siblings, 1 reply; 11+ messages in thread
From: Lluís Batlle i Rossell @ 2006-05-19 22:49 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Matt Stewart wrote:
> The following functions are described as accepting a Rune, but instead
> the parameters are of type long.  Why?
> 
> int runelen(long);
> char *utfrune(char *, long);
> char *utfrrune(char *, long);
> 
From History in this wikipedia page (http://en.wikipedia.org/wiki/UTF-32):

UCS-4 is sufficient to represent all of the Unicode code space, which 
has 1114112 (= 2^20+2^16) code points and therefore requires only up to 
hexadecimal 10FFFF. Some people consider it wasteful to reserve such a 
large code space for mapping a relatively small set of code points, so a 
new encoding form, UTF-32, was proposed.
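
The arithmetic in the quoted passage can be checked mechanically. The
sketch below is only an illustration: the names NPLANES, PLANESIZE,
MAXCODEPOINT, bits_needed and check_code_space are made up for this
example. It assumes the usual description of the code space as 17
planes of 65536 code points each.

```c
#include <assert.h>

#define NPLANES		17UL		/* plane 0 (the BMP) plus 16 supplementary planes */
#define PLANESIZE	0x10000UL	/* 65536 code points per plane */
#define MAXCODEPOINT	0x10FFFFUL

/* Number of bits needed to represent v (0 for v == 0). */
int
bits_needed(unsigned long v)
{
	int n = 0;

	while (v != 0) {
		n++;
		v >>= 1;
	}
	return n;
}

/* Assert the identities quoted above. */
void
check_code_space(void)
{
	unsigned long size = NPLANES * PLANESIZE;

	assert(size == 1114112UL);
	assert(size == (1UL << 20) + (1UL << 16));
	assert(size - 1 == MAXCODEPOINT);
	assert(bits_needed(MAXCODEPOINT) == 21);	/* fits in 21 bits */
	assert(bits_needed(0xFFFFUL) == 16);		/* the BMP fits in 16 */
}
```

So the full code space needs 21 bits, which is more than a 16-bit Rune
can hold but far less than 32.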



* Re: [9fans] strangely typed functions in standard library
  2006-05-19 12:43   ` Joel Salomon
@ 2006-05-19 13:03     ` Victor Nazarov
  0 siblings, 0 replies; 11+ messages in thread
From: Victor Nazarov @ 2006-05-19 13:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Joel Salomon wrote:

> On 5/18/06, Bruce Ellis <bruce.ellis@gmail.com> wrote:
>
>> 32 bit unicode is not Rune friendly
>
>
> The "other" standard, ISO 10646, has promised that 21 bits will always
> be sufficient to represent characters.
>
> Making Rune a 32 bit type allows all characters to be represented and
> leaves room for out-of-band information; for example, the end of a
> utf8 text stream (EOF) can be (Rune32)-1, with no need for a wider
> type.
>
> --Joel

Oh, I think that is reasonable. I thought that ISO 10646 was 32 bit, so
EOF detection and so on would break when expanding Rune to 32 bits...

Composing seems to me a better solution than encoding everything into one
alphabet. I mean for sorting, transformations and so on (conversion to
ASCII at least).
But I am really not competent in this question.

--
Victor




* Re: [9fans] strangely typed functions in standard library
  2006-05-18  9:21 ` Bruce Ellis
  2006-05-19  4:43   ` quanstro
@ 2006-05-19 12:43   ` Joel Salomon
  2006-05-19 13:03     ` Victor Nazarov
  1 sibling, 1 reply; 11+ messages in thread
From: Joel Salomon @ 2006-05-19 12:43 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 5/18/06, Bruce Ellis <bruce.ellis@gmail.com> wrote:
> 32 bit unicode is not Rune friendly

The "other" standard, ISO 10646, has promised that 21 bits will always
be sufficient to represent characters.

Making Rune a 32 bit type allows all characters to be represented and
leaves room for out-of-band information; for example, the end of a
utf8 text stream (EOF) can be (Rune32)-1, with no need for a wider
type.
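
A minimal sketch of that idea, under stated assumptions: Rune32,
RUNE32_EOF, RUNE32_MAX, RuneReader, nextrune and countrunes are
hypothetical names, not anything in Plan 9's libc. Since valid code
points stop at 0x10FFFF, (Rune32)-1 can never be mistaken for a
character. With a 16-bit type the same trick does not work, because
(unsigned short)-1 is 0xFFFF, which lies inside the 16-bit range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Rune32;			/* hypothetical 32-bit rune */

#define RUNE32_EOF	((Rune32)-1)		/* 0xFFFFFFFF, out of band */
#define RUNE32_MAX	((Rune32)0x10FFFF)	/* largest code point */

typedef struct RuneReader RuneReader;
struct RuneReader {
	const Rune32 *p;	/* next rune to return */
	const Rune32 *end;	/* one past the last rune */
};

/* Return the next rune, or RUNE32_EOF once the buffer is exhausted. */
Rune32
nextrune(RuneReader *r)
{
	assert(r != NULL);
	if (r->p >= r->end)
		return RUNE32_EOF;
	return *r->p++;
}

/* Count runes by reading until the in-type EOF marker appears. */
size_t
countrunes(const Rune32 *buf, size_t n)
{
	RuneReader r = { buf, buf + n };
	size_t count = 0;

	while (nextrune(&r) != RUNE32_EOF)
		count++;
	return count;
}
```

The reader returns the same type for data and for end of input, so the
caller needs no wider type to tell them apart.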

--Joel


* Re: [9fans] strangely typed functions in standard library
  2006-05-19  4:43   ` quanstro
@ 2006-05-19  5:03     ` geoff
  0 siblings, 0 replies; 11+ messages in thread
From: geoff @ 2006-05-19  5:03 UTC (permalink / raw)
  To: 9fans

I feel that the Unicode consortium bungled the job.  People warned
them from the outset that 16 bits was a largish space, but still
finite.  Once Han unification was done, the Unicode people felt that
they had all the room in the world, and assigned generous portions of
the space to each alphabet.  Others warned them that they should be
more conservative in their allocations, but were waved off.

It may be that 16 bits are not enough to express all the necessary
characters in the world, but I don't think that the Unicode consortium
have proven that case.




* Re: [9fans] strangely typed functions in standard library
  2006-05-18  9:21 ` Bruce Ellis
@ 2006-05-19  4:43   ` quanstro
  2006-05-19  5:03     ` geoff
  2006-05-19 12:43   ` Joel Salomon
  1 sibling, 1 reply; 11+ messages in thread
From: quanstro @ 2006-05-19  4:43 UTC (permalink / raw)
  To: 9fans

of course, how an array of Runes is stored is a different issue from how a
Rune argument is passed on the stack (or in a register). 

don't consider this advocacy, but what would break if runes were expanded
to 32 bits?  i think that font handling would need some limits changed,
chartorune and runetochar would need to be modified and some enum
constants in libc.h would need to be changed.  this seems to me to be small
potatoes in comparison to dealing with issues lurking within the basic plane
like character composition.
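
To give a rough idea of the size of the runetochar change, here is a
sketch of a UTF-8 encoder that handles code points up to 0x10FFFF. It
is not the libc routine: rune32tochar and its signature are
hypothetical, and returning 0 for surrogates and out-of-range values is
an arbitrary choice made for this example. The only part a 16-bit Rune
never needs is the four-byte case.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode code point c as UTF-8 into s, which must have room for
 * 4 bytes.  Returns the number of bytes written, or 0 if c is a
 * surrogate or lies beyond 0x10FFFF.
 */
int
rune32tochar(unsigned char *s, unsigned long c)
{
	assert(s != NULL);
	if (c < 0x80) {			/* 1 byte: 0xxxxxxx */
		s[0] = (unsigned char)c;
		return 1;
	}
	if (c < 0x800) {		/* 2 bytes: 110xxxxx 10xxxxxx */
		s[0] = (unsigned char)(0xC0 | (c >> 6));
		s[1] = (unsigned char)(0x80 | (c & 0x3F));
		return 2;
	}
	if (c >= 0xD800 && c <= 0xDFFF)	/* surrogates are not characters */
		return 0;
	if (c < 0x10000) {		/* 3 bytes: the rest of the BMP */
		s[0] = (unsigned char)(0xE0 | (c >> 12));
		s[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
		s[2] = (unsigned char)(0x80 | (c & 0x3F));
		return 3;
	}
	if (c <= 0x10FFFF) {		/* 4 bytes: beyond the BMP */
		s[0] = (unsigned char)(0xF0 | (c >> 18));
		s[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
		s[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
		s[3] = (unsigned char)(0x80 | (c & 0x3F));
		return 4;
	}
	return 0;
}
```

The decoder side would need the matching four-byte case, and any
constant that bounds the length of one encoded character would grow
from 3 to 4.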

rob has suggested passing uncomposed characters to libdraw and handling
the problem there.  but there's one problem with that.  how do you stick
a nonspacing horn onto an arbitrary letter?  how do you put a grave accent
on top of that?  (transliterations of cyrillic to the roman alphabet use some 
double- and triple- accented letters which do not exist in precombined form
within unicode.)

i modified p9p libdraw at one point to draw combining characters.  (i.e. compose
a zero-width character on top of the previous character.) the results
were not legible.  a+diaeresis may have been marginal but A + " was mud.

this is really annoying.  characters should not combine.  the character-composition
algorithms of tex+metafont shouldn't come hidden within a character set.  

(well that's my 2¢, anyway.  maybe somebody has an idea on how to manage these
issues.)

- erik

p.s. how would you do this?  

> i'd like to map them to RFat ... something unassigned in
> 0xFF.. space.

the problem is that there's not enough free space in the basic plane. and as usual 
with these things invention is the mother of all necessity.


On Thu May 18 04:22:47 CDT 2006, bruce.ellis@gmail.com wrote:
> 32 bit unicode is not Rune friendly ... i hope Runes don't
> get fatter.  it will break many things.  rob has had something
> to say about this, do a search on the list.
> 
> i'd like to map them to RFat ... something unassigned in
> 0xFF.. space.
> 
> use them at your peril.
> 
> brucee
> 
> On 5/18/06, erik quanstrom <quanstro@quanstro.net> wrote:
> > while this is true, i believe that the real reason for this is that
> > on a >=32-bit machine, an ushort can just be declared
> > to be a long by the compiler whereas the compiler must emit
> > instructions to convert a long to an unsigned short.
> >
> > - erik
> >
> > On Tue May 16 10:10:37 CDT 2006, 0xef967c36@gmail.com wrote:
> > > On 5/16/06, Matt Stewart <rotaerk1@gmail.com> wrote:
> > > > The following functions are described as accepting a Rune, but instead
> > > > the parameters are of type long.  Why?
> > > >
> > > > int runelen(long);
> > > > char *utfrune(char *, long);
> > > > char *utfrrune(char *, long);
> > >
> > > full unicode is 32 bit, even if plan9 (afaik)
> > > supports only characters in the BMP.
> >
> 



* Re: [9fans] strangely typed functions in standard library
  2006-05-18  3:06 erik quanstrom
@ 2006-05-18  9:21 ` Bruce Ellis
  2006-05-19  4:43   ` quanstro
  2006-05-19 12:43   ` Joel Salomon
  0 siblings, 2 replies; 11+ messages in thread
From: Bruce Ellis @ 2006-05-18  9:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

32 bit unicode is not Rune friendly ... i hope Runes don't
get fatter.  it will break many things.  rob has had something
to say about this, do a search on the list.

i'd like to map them to RFat ... something unassigned in
0xFF.. space.

use them at your peril.

brucee

On 5/18/06, erik quanstrom <quanstro@quanstro.net> wrote:
> while this is true, i believe that the real reason for this is that
> on a >=32-bit machine, an ushort can just be declared
> to be a long by the compiler whereas the compiler must emit
> instructions to convert a long to an unsigned short.
>
> - erik
>
> On Tue May 16 10:10:37 CDT 2006, 0xef967c36@gmail.com wrote:
> > On 5/16/06, Matt Stewart <rotaerk1@gmail.com> wrote:
> > > The following functions are described as accepting a Rune, but instead
> > > the parameters are of type long.  Why?
> > >
> > > int runelen(long);
> > > char *utfrune(char *, long);
> > > char *utfrrune(char *, long);
> >
> > full unicode is 32 bit, even if plan9 (afaik)
> > supports only characters in the BMP.
>



* Re: [9fans] strangely typed functions in standard library
@ 2006-05-18  3:06 erik quanstrom
  2006-05-18  9:21 ` Bruce Ellis
  0 siblings, 1 reply; 11+ messages in thread
From: erik quanstrom @ 2006-05-18  3:06 UTC (permalink / raw)
  To: 9fans

while this is true, i believe that the real reason for this is that
on a >=32-bit machine, an ushort can just be declared
to be a long by the compiler whereas the compiler must emit 
instructions to convert a long to an unsigned short.
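
A small sketch of the difference at the function boundary, assuming a
16-bit rune type. Rune16, via_narrow and via_wide are hypothetical
names for this example. It shows only what C guarantees about the
values; whether a given compiler emits extra instructions for the
narrowing case is a separate question that this code does not measure.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t Rune16;	/* hypothetical stand-in for a 16-bit Rune */

/*
 * Narrow parameter: a wider argument is converted to 16 bits
 * (reduced modulo 65536) before the body ever sees it.
 */
long
via_narrow(Rune16 r)
{
	return r;
}

/*
 * Wide parameter: a Rune16 argument is widened without loss, and a
 * long argument arrives unchanged.
 */
long
via_wide(long c)
{
	assert(c >= 0);		/* code points are non-negative */
	return c;
}
```

Passing 0x12345 to via_narrow yields 0x2345, while via_wide returns
0x12345 intact and accepts any Rune16 value as is.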

- erik

On Tue May 16 10:10:37 CDT 2006, 0xef967c36@gmail.com wrote:
> On 5/16/06, Matt Stewart <rotaerk1@gmail.com> wrote:
> > The following functions are described as accepting a Rune, but instead
> > the parameters are of type long.  Why?
> >
> > int runelen(long);
> > char *utfrune(char *, long);
> > char *utfrrune(char *, long);
> 
> full unicode is 32 bit, even if plan9 (afaik)
> supports only characters in the BMP.



Thread overview: 11+ messages
2006-05-16  3:03 [9fans] strangely typed functions in standard library Matt Stewart
2006-05-16 11:40 ` Martin Neubauer
2006-05-16 15:09 ` R
2006-05-19 22:49 ` Lluís Batlle i Rossell
2006-05-19 22:43   ` quanstro
2006-05-18  3:06 erik quanstrom
2006-05-18  9:21 ` Bruce Ellis
2006-05-19  4:43   ` quanstro
2006-05-19  5:03     ` geoff
2006-05-19 12:43   ` Joel Salomon
2006-05-19 13:03     ` Victor Nazarov
