9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] wchar_t in ANSI C (was "Announce: port")
@ 2002-04-29 12:42 rob pike, esq.
  2002-04-29 16:18 ` Douglas A. Gwyn
  0 siblings, 1 reply; 6+ messages in thread
From: rob pike, esq. @ 2002-04-29 12:42 UTC
  To: 9fans

> Sounds like somebody who doesn't use them enough to know.
> wchar_t is closely analogous to rune.
> The real problem is that "char" is inadequate for encoding a character,
> largely a consequence of Dennis chiming in on the sizeof(char)==1 side.

I don't want to debate whether sizeof(char) should be 1, but I do
think you're being too forgiving about wchar_t, at least in the
original standard.  There were too many holes in the standard, such as
no defined format for printing wchar_t strings, no defined conversion
between strings of either type (just of individual characters) and no
defined input method.  In short, no stdio support!  Too much last-minute
committee design, I find.

Footnotes 119 and 122 in the I/O section of the standard (printf,
scanf) both read: "No special provisions are made for multibyte
characters."  Give me a break!  How hard would it have been to define
%lc and %ls, for instance?

The answer is surprisingly subtle, and I get to it in my next paragraph.

The issue that cheeses me most still remains even in the new standard:
the clumsiness of converting in the face of conversion errors such as
malformed UTF-8, which turn up a lot when you're scanning binary data
looking for strings, or just get handed something like Latin-1 when
you're expecting UTF-8.  Most programs (e.g. grep) can do nothing
useful in the face of errors except barge on, but the ANSI C standard
makes the standard character processing loop a real mess.  It also
makes scanf("%ls") or scanf("%lc") impossible to write consistently
with the rest of the standard, since you need to stop if there's a
conversion error, which is almost never what you want.  This issue is
a matter of taste,
but I feel it's done wrong.  The Plan 9 model, with the concept of an
"Error Rune", makes it easy to ignore errors but also easy to handle
them, as you decide.  Plan 9's is a much better model because it was
born of experience rather than design without implementation.
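
To make the Error Rune idea concrete, here is a minimal sketch in
portable C.  It is not Plan 9's chartorune: the names decode_rune,
count_runes and ERROR_RUNE, and the choice of U+FFFD as the error
value, are assumptions made for illustration.  The point is only the
shape of the interface: malformed input yields the error value and
consumes one byte, so the caller's loop never has to stop.

```c
#include <stddef.h>

enum { ERROR_RUNE = 0xFFFD };	/* assumed stand-in for the "Error Rune" */

/*
 * Decode one code point from s[0..n-1] into *r and return the number
 * of bytes consumed (at least 1 when n > 0).  Malformed or truncated
 * input stores ERROR_RUNE and consumes a single byte, so a calling
 * loop always makes progress.
 */
size_t
decode_rune(unsigned long *r, const unsigned char *s, size_t n)
{
	unsigned long c, min;
	size_t i, len;

	if(n == 0){
		*r = ERROR_RUNE;
		return 0;
	}
	c = s[0];
	if(c < 0x80){
		*r = c;
		return 1;
	}
	if((c & 0xE0) == 0xC0){ len = 2; c &= 0x1F; min = 0x80; }
	else if((c & 0xF0) == 0xE0){ len = 3; c &= 0x0F; min = 0x800; }
	else if((c & 0xF8) == 0xF0){ len = 4; c &= 0x07; min = 0x10000; }
	else
		goto bad;	/* stray continuation byte or invalid lead byte */
	if(len > n)
		goto bad;	/* sequence runs off the end of the buffer */
	for(i = 1; i < len; i++){
		if((s[i] & 0xC0) != 0x80)
			goto bad;
		c = (c << 6) | (s[i] & 0x3F);
	}
	/* reject overlong forms, surrogates, and values past U+10FFFF */
	if(c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		goto bad;
	*r = c;
	return len;
bad:
	*r = ERROR_RUNE;
	return 1;
}

/*
 * Count the runes in s[0..n-1] and store the number of undecodable
 * bytes in *nerr.  A width of 1 distinguishes a decoding error from
 * a genuine U+FFFD in the input, which occupies three bytes.
 */
size_t
count_runes(const unsigned char *s, size_t n, size_t *nerr)
{
	unsigned long r;
	size_t i, w, nrune;

	*nerr = 0;
	nrune = 0;
	for(i = 0; i < n; i += w){
		w = decode_rune(&r, s+i, n-i);
		if(r == ERROR_RUNE && w == 1)
			(*nerr)++;
		nrune++;
	}
	return nrune;
}
```

A caller that doesn't care about errors just ignores nerr (or the
width test); one that does care checks it.  Either way the loop is
the same.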

I reiterate that the error handling issue is one of taste, but that
there is no excuse for omitting wchar_t support in stdio.

We wrote about this in our UTF paper
	http://plan9.bell-labs.com/sys/doc/utf.pdf .html .ps
(The .html version has some character set awkwardness!).  If we could
have used ANSI C's design for wide characters, we would have, but it
was inadequate.

-rob



* Re: [9fans] wchar_t in ANSI C (was "Announce: port")
@ 2002-04-29 12:53 rob pike, esq.
  0 siblings, 0 replies; 6+ messages in thread
From: rob pike, esq. @ 2002-04-29 12:53 UTC
  To: 9fans

Here's a simple version of the problem.  Imagine you have a (bio) loop
along the lines of

	int c;

	while((c = Bgetc(&bWin)) != Beof){
		c = process(c);
		Bputc(&out, c);
	}

To make this work with UTF-8, all you do is change 'c' to 'rune'
in the calls:

	int c;

	while((c = Bgetrune(&bWin)) != Beof){
		c = process(c);
		Bputrune(&out, c);
	}

Loops like this are everywhere in the Plan 9 tools.  Bgetrune gets
called much more than Bgetc, I bet, at least for programs operating
on text.

No such charm works in ANSI C.
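
For comparison, here is roughly what it takes in ISO C to barge on
past bad input.  This is a sketch under assumptions: it uses mbrtowc
from the later wide-character additions (not the original 1989
library), the name mbs_to_wcs_lossy and its `bad' argument are
invented for illustration, and skipping exactly one byte after a
failure is one possible recovery policy, not something the standard
prescribes.  What the bytes mean depends on the current locale.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert the multibyte string s[0..n-1] to wide characters in dst
 * (capacity cap), storing `bad` for each byte that fails to convert.
 * Returns the number of wide characters stored.
 */
size_t
mbs_to_wcs_lossy(wchar_t *dst, size_t cap, const char *s, size_t n, wchar_t bad)
{
	mbstate_t st;
	size_t i, k, out;
	wchar_t wc;

	memset(&st, 0, sizeof st);	/* initial conversion state */
	i = 0;
	out = 0;
	while(i < n && out < cap){
		k = mbrtowc(&wc, s+i, n-i, &st);
		if(k == (size_t)-1 || k == (size_t)-2){
			/*
			 * Invalid (-1) or incomplete (-2) sequence.  The
			 * state is no longer usable for our purposes, so
			 * reset it, emit the replacement, skip one byte.
			 */
			memset(&st, 0, sizeof st);
			wc = bad;
			k = 1;
		}else if(k == 0){
			/*
			 * Decoded a null wide character; mbrtowc does not
			 * report its length.  Assume a one-byte null.
			 */
			k = 1;
		}
		dst[out++] = wc;
		i += k;
	}
	return out;
}
```

The bookkeeping around (size_t)-1, (size_t)-2 and the mbstate_t
reset is the part that has no counterpart in the Bgetrune loop.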

-rob



* Re: [9fans] wchar_t in ANSI C (was "Announce: port")
@ 2002-04-30 18:16 David Gordon Hogan
  0 siblings, 0 replies; 6+ messages in thread
From: David Gordon Hogan @ 2002-04-30 18:16 UTC
  To: 9fans

>> Now you're being needlessly pedantic. Plan 9 does not run on any
>> 16-bit platform and I explicitly said it was a Plan 9 example.
>
> Well, okay, I was looking toward what would be needed were the
> example to be extended to support more general encodings, in
> anticipation of a complaint that Standard C also requires
> "int" be changed to an appropriate typedef.  If Plan 9 were
> addressing a wider problem domain it would need the same kind
> of thing done for it too.

Yeah, but C doesn't have parametric polymorphism ;-)


