9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] ... in the Kingdom of Sources
@ 2003-09-05 11:54 David Presotto
  2003-09-05 12:14 ` Lucio De Re
  0 siblings, 1 reply; 42+ messages in thread
From: David Presotto @ 2003-09-05 11:54 UTC (permalink / raw)
  To: 9fans

Now that I'm pointing to the right place, the bleeding edge is
a lot bloodier.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-05 11:54 [9fans] ... in the Kingdom of Sources David Presotto
@ 2003-09-05 12:14 ` Lucio De Re
  2003-09-05 12:33   ` boyd, rounin
  2003-09-06 11:13   ` boyd, rounin
  0 siblings, 2 replies; 42+ messages in thread
From: Lucio De Re @ 2003-09-05 12:14 UTC (permalink / raw)
  To: 9fans

On Fri, Sep 05, 2003 at 07:54:48AM -0400, David Presotto wrote:
>
> Now that I'm pointing to the right place, the bleeding edge is
> a lot bloodier.

We'll see next Friday how well I have scripted my way around the
Bell Labs licence etc. acknowledgement buttons.

Now it's time for me to get out of town for the weekend.

++L


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-05 12:14 ` Lucio De Re
@ 2003-09-05 12:33   ` boyd, rounin
  2003-09-07  9:00     ` Aharon Robbins
  2003-09-06 11:13   ` boyd, rounin
  1 sibling, 1 reply; 42+ messages in thread
From: boyd, rounin @ 2003-09-05 12:33 UTC (permalink / raw)
  To: 9fans

bloodier?  i've just about Bio-fided awk.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-05 12:14 ` Lucio De Re
  2003-09-05 12:33   ` boyd, rounin
@ 2003-09-06 11:13   ` boyd, rounin
  1 sibling, 0 replies; 42+ messages in thread
From: boyd, rounin @ 2003-09-06 11:13 UTC (permalink / raw)
  To: 9fans

as one with the knowledge and magic of the source.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-05 12:33   ` boyd, rounin
@ 2003-09-07  9:00     ` Aharon Robbins
  2003-09-07  9:05       ` boyd, rounin
  2003-09-07 13:48       ` Russ Cox
  0 siblings, 2 replies; 42+ messages in thread
From: Aharon Robbins @ 2003-09-07  9:00 UTC (permalink / raw)
  To: 9fans

In article <05e001c373a9$ea37d260$b9844051@insultant.net> you write:
>bloodier?  i've just about Bio-fided awk.

While you were at it, did you look at BWK's current version?

	http://cm.bell-labs.com/who/bwk/awk.tar.gz

Some things (mostly in dark corners of the language) have been
fixed over the years.

Does anyone know when the Plan 9 awk was forked from BWK's?
Boyd: I have archive copies dating back to 1993, so, *if you're
interested*, I can send you something close to the original
fork time, to help you isolate what's changed.  Just let
me know off-line.

Arnold


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-07  9:00     ` Aharon Robbins
@ 2003-09-07  9:05       ` boyd, rounin
  2003-09-08  9:10         ` Douglas A. Gwyn
  2003-09-07 13:48       ` Russ Cox
  1 sibling, 1 reply; 42+ messages in thread
From: boyd, rounin @ 2003-09-07  9:05 UTC (permalink / raw)
  To: 9fans

> Boyd: I have archive copies dating back to 1993, ...

ta, but i'm _this close_ to getting it to load.  i just have to
deal with manitowok [wchar_t bullshit] so it'll load.

i'm typing/mousing in paris but the work is getting done
on dan's net in ny, with drawterm.

god give me strength: that wchar_t crap is just insane.

yup i got popen/pclose/system all written.

and, one more stitch, this time in my forehead and blood
all over my CWU-27/P.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-07  9:00     ` Aharon Robbins
  2003-09-07  9:05       ` boyd, rounin
@ 2003-09-07 13:48       ` Russ Cox
  2003-09-07 15:45         ` David Presotto
  1 sibling, 1 reply; 42+ messages in thread
From: Russ Cox @ 2003-09-07 13:48 UTC (permalink / raw)
  To: 9fans

Aharon Robbins wrote:

> Does anyone know when the Plan 9 awk was forked from BWK's?


19990602 was the last full import, although I think we've picked up
one or two individual bug fixes since then.





^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-07 13:48       ` Russ Cox
@ 2003-09-07 15:45         ` David Presotto
  0 siblings, 0 replies; 42+ messages in thread
From: David Presotto @ 2003-09-07 15:45 UTC (permalink / raw)
  To: 9fans

BWK said something about fixing up awk quite recently.  I'll ask him
what state it's in when he comes in next week.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-07  9:05       ` boyd, rounin
@ 2003-09-08  9:10         ` Douglas A. Gwyn
  2003-09-08  9:52           ` boyd, rounin
  2003-09-08 17:33           ` rob pike, esq.
  0 siblings, 2 replies; 42+ messages in thread
From: Douglas A. Gwyn @ 2003-09-08  9:10 UTC (permalink / raw)
  To: 9fans

boyd, rounin wrote:
> god give me strength: that wchar_t crap is just insane.

You must not understand it, then.  It's just like char
except wide enough.  Plan 9 "rune" is hardly different
apart from the names.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-08  9:10         ` Douglas A. Gwyn
@ 2003-09-08  9:52           ` boyd, rounin
  2003-09-08 17:33           ` rob pike, esq.
  1 sibling, 0 replies; 42+ messages in thread
From: boyd, rounin @ 2003-09-08  9:52 UTC (permalink / raw)
  To: 9fans

> > god give me strength: that wchar_t crap is just insane.
>
> You must not understand it, then.

that's the problem.  it's too complex.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-08  9:10         ` Douglas A. Gwyn
  2003-09-08  9:52           ` boyd, rounin
@ 2003-09-08 17:33           ` rob pike, esq.
  2003-09-09  8:34             ` Douglas A. Gwyn
  1 sibling, 1 reply; 42+ messages in thread
From: rob pike, esq. @ 2003-09-08 17:33 UTC (permalink / raw)
  To: 9fans

> You must not understand it, then.  It's just like char
> except wide enough.  Plan 9 "rune" is hardly different
> apart from the names.

ken and i wrote a paper about how and why it is more than
'hardly different' (http://plan9.bell-labs.com/sys/doc/utf.pdf).

there are two important differences.  first is that (old) ANSI C
did not provide any formatted i/o for wchar_t, rendering them
essentially useless.  second, more subtle but probably more
central, the way plan 9 handles encoding errors with runetochar
and chartorune, by introducing the 'error rune' 0x80, is vastly
more convenient and effective than the error return values set
by mbtowchar stuff.
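
A minimal sketch of the in-band approach described above, in portable C
rather than Plan 9 C.  The names decode_rune, decode_all and ERROR_RUNE
are hypothetical, and the decoder only handles 1- to 3-byte sequences;
it is modelled loosely on the behaviour attributed to chartorune here,
not copied from the Plan 9 library.  A malformed sequence yields the
error value and consumes exactly one byte, so the caller's loop needs
no error branch.  (By contrast, Standard C's mbtowc reports a bad
sequence by returning -1.)

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the in-band error value quoted in the message above. */
enum { ERROR_RUNE = 0x80 };

/* True if b is a UTF-8 continuation byte (10xxxxxx). */
static int is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/*
 * Decode one UTF-8 sequence from s, where n > 0 bytes are available.
 * Stores the code point in *r and returns the number of bytes consumed.
 * On a malformed or truncated sequence, stores ERROR_RUNE and consumes
 * exactly one byte, so the caller can always keep going.
 */
int decode_rune(unsigned long *r, const unsigned char *s, size_t n)
{
    assert(r != NULL && s != NULL && n > 0);
    unsigned long c = s[0], v;

    if (c < 0x80) {                     /* plain ASCII */
        *r = c;
        return 1;
    }
    if ((c & 0xE0) == 0xC0 && n >= 2 && is_cont(s[1])) {
        v = (c & 0x1F) << 6 | (s[1] & 0x3F);
        if (v >= 0x80) {                /* reject overlong forms */
            *r = v;
            return 2;
        }
    }
    if ((c & 0xF0) == 0xE0 && n >= 3 && is_cont(s[1]) && is_cont(s[2])) {
        v = (c & 0x0F) << 12 | (unsigned long)(s[1] & 0x3F) << 6 | (s[2] & 0x3F);
        if (v >= 0x800) {               /* reject overlong forms */
            *r = v;
            return 3;
        }
    }
    *r = ERROR_RUNE;                    /* bad byte: flag it in band */
    return 1;
}

/* Decode a whole buffer; out must have room for n runes. Returns the count. */
size_t decode_all(unsigned long *out, const unsigned char *s, size_t n)
{
    size_t i = 0, count = 0;

    while (i < n)
        i += (size_t)decode_rune(&out[count++], s + i, n - i);
    return count;
}
```

Fed the Latin-1 bytes 'c' 'a' 'f' 0xE9 '!', decode_all produces five
runes with ERROR_RUNE in the fourth slot; the text on either side of the
bad byte survives.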

-rob



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-08 17:33           ` rob pike, esq.
@ 2003-09-09  8:34             ` Douglas A. Gwyn
  2003-09-09 15:50               ` rob pike, esq.
                                 ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: Douglas A. Gwyn @ 2003-09-09  8:34 UTC (permalink / raw)
  To: 9fans

rob pike, esq. wrote:
> there are two important differences.  first is that (old) ANSI C
> did not provide any formatted i/o for wchar_t, rendering them
> essentially useless.

The target clientele at the time insisted that they did not want
us to specify more than the minimum necessary support, and that
they would happily invoke the multibyte<->wide conversions when
necessary.  I remember Plauger relaying that promise.  Not
surprisingly, as soon as they got the capability they started
pushing for the missing functions, which became part of a
normative amendment to the C standard by 1995 (along with mode
on the stream to control whether automatic conversion is or is
not applied).

> the way plan 9 handles encoding errors with runetochar
> and chartorune, by introducing the 'error rune' 0x80, is vastly
> more convenient and effective than the error return values set
> by mbtowchar stuff.

Since an encoding error should "never" occur, we thought it
better to stop scanning rather than plow ahead and convert some
unknown amount of garbage once synchronization has been lost.
It was also not evident that there would always be a "spare"
wide-character code available for in-band signaling.

I think the main difference driving the designs is that the
Standard C version has to accommodate a huge variety of
multibyte and internal encoding systems, whereas Plan 9 could
just pick one.

Anyway, the C wide-character facilities are quite similar to
the legacy char-based ones, so Boyd shouldn't be finding them
"confusing".


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-09  8:34             ` Douglas A. Gwyn
@ 2003-09-09 15:50               ` rob pike, esq.
  2003-09-09 15:50               ` rob pike, esq.
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 42+ messages in thread
From: rob pike, esq. @ 2003-09-09 15:50 UTC (permalink / raw)
  To: 9fans

> Since an encoding error should "never" occur, we thought it
> better to stop scanning rather than plow ahead and convert some
> unknown amount of garbage once synchronization has been lost.
> It was also not evident that there would always be a "spare"
> wide-character code available for in-band signaling.

experience tells that 'never' happens many times a day.  for most
tools, let's say, oh, a terminal program, cat, grep, web browser, ...,
you should most certainly plow ahead.  for the few programs that
absolutely must not make mistakes, it's still possible to discover an
error has occurred (error rune + only one byte consumed).  the point -
often lost on standards committees - is that convenience is often more
important than universality or rigidity.

i admit that our strategy is tied to utf-8.

-rob



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-09  8:34             ` Douglas A. Gwyn
  2003-09-09 15:50               ` rob pike, esq.
  2003-09-09 15:50               ` rob pike, esq.
@ 2003-09-10 10:17               ` Bruce Ellis
  2003-09-11  9:07                 ` Douglas A. Gwyn
  2003-09-12  3:01               ` Dan Cross
  3 siblings, 1 reply; 42+ messages in thread
From: Bruce Ellis @ 2003-09-10 10:17 UTC (permalink / raw)
  To: 9fans

I guess I somehow missed what is hard about UTF-8 synchronization.
By design there is no problem.
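
A rough illustration of why resynchronization is cheap in UTF-8
specifically: continuation bytes always have the form 10xxxxxx, so a
reader that lands mid-sequence only has to skip bytes of that form to
reach the next possible start of a character.  The helper name
utf8_resync is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the offset of the first byte at or after pos that can begin a
 * UTF-8 sequence, i.e. that is not a continuation byte (10xxxxxx).
 * Returns n if no such byte remains in the buffer.
 */
size_t utf8_resync(const unsigned char *s, size_t n, size_t pos)
{
    assert(s != NULL && pos <= n);
    while (pos < n && (s[pos] & 0xC0) == 0x80)
        pos++;
    return pos;
}
```

At most two bytes are skipped for the 3-byte encodings of the time (three
for 4-byte sequences), which is the bounded damage the design aims for.
This says nothing about other multibyte encodings, where no such
property is guaranteed.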

> Since an encoding error should "never" occur, we thought it
> better to stop scanning rather than plow ahead and convert some
> unknown amount of garbage once synchronization has been lost.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-10 10:17               ` Bruce Ellis
@ 2003-09-11  9:07                 ` Douglas A. Gwyn
  2003-09-11 13:06                   ` rog
                                     ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Douglas A. Gwyn @ 2003-09-11  9:07 UTC (permalink / raw)
  To: 9fans

Bruce Ellis wrote:
> I guess I somehow missed what is hard about UTF-8 synchronization.

We weren't talking about a supposed problem with UTF-8,
but about a supposed problem with Standard C wide-character
facilities.  The Standard C facility has to deal with a
much wider variety of multibyte encodings.  In any event,
flagging a conversion error only by a specific value
embedded in the converted data means that the error will
be missed unless the application scans the data looking
for it.  Seems to me that encourages use of erroneous data.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11  9:07                 ` Douglas A. Gwyn
@ 2003-09-11 13:06                   ` rog
  2003-09-11 14:01                     ` Lucio De Re
                                       ` (2 more replies)
  2003-09-11 15:42                   ` rob pike, esq.
  2003-09-12  3:53                   ` boyd, rounin
  2 siblings, 3 replies; 42+ messages in thread
From: rog @ 2003-09-11 13:06 UTC (permalink / raw)
  To: 9fans

> flagging a conversion error only by a specific value
> embedded in the converted data means that the error will
> be missed unless the application scans the data looking
> for it.  Seems to me that encourages use of erroneous data.

that argument strikes me as similar to the reasoning used by
politicians when they oppose reducing the harm associated with illicit
drugs on the grounds that it will encourage their use.

surely it's not a matter of encouraging or discouraging the use of
erroneous data, but simply of making as best a job of dealing with the
situations we encounter.

i'm certainly glad that when i accidentally browse a piece of latin-1
text in plan 9, i get almost all the text, with an occasional error
character thrown in.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 13:06                   ` rog
@ 2003-09-11 14:01                     ` Lucio De Re
  2003-09-11 14:25                       ` rog
  2003-09-11 15:48                     ` rob pike, esq.
  2003-09-12  0:56                     ` Bruce Ellis
  2 siblings, 1 reply; 42+ messages in thread
From: Lucio De Re @ 2003-09-11 14:01 UTC (permalink / raw)
  To: 9fans

On Thu, Sep 11, 2003 at 02:06:48PM +0100, rog@vitanuova.com wrote:
>
> i'm certainly glad that when i accidentally browse a piece of latin-1
> text in plan 9, i get almost all the text, with an occasional error
> character thrown in.

That's because you have a good image processor inside your skull.
And because you haven't had to solve a complex problem created by
the occasional error character.

I would personally rather solve the problems caused by the application
of discipline than those caused by its relaxation.

The question does not even arise in a purely technological context,
it is sociology (will the horse bolt if I don't close the stable
door?) that makes it difficult to decide whether the enforcement
of strict discipline is necessary or counterproductive.

++L


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 14:01                     ` Lucio De Re
@ 2003-09-11 14:25                       ` rog
  2003-09-11 15:17                         ` Lucio De Re
  2003-09-12  9:18                         ` Douglas A. Gwyn
  0 siblings, 2 replies; 42+ messages in thread
From: rog @ 2003-09-11 14:25 UTC (permalink / raw)
  To: 9fans

> That's because you have a good image processor inside your skull.
> And because you haven't had to solve a complex problem created by
> the occasional error character.

is truncating a 1MB file to 512K due to a mid-stream dud character
really a better solution than just flagging the dud character itself?

as you suggest, the answer is subjective, but surely the old maxim is
useful:

	be liberal in what you accept and strict in what you generate.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 14:25                       ` rog
@ 2003-09-11 15:17                         ` Lucio De Re
  2003-09-12  9:18                         ` Douglas A. Gwyn
  1 sibling, 0 replies; 42+ messages in thread
From: Lucio De Re @ 2003-09-11 15:17 UTC (permalink / raw)
  To: 9fans

On Thu, Sep 11, 2003 at 03:25:00PM +0100, rog@vitanuova.com wrote:
>
> as you suggest, the answer is subjective, but surely the old maxim is
> useful:
>
> 	be liberal in what you accept and strict in what you generate.

Iff this norm is followed by everyone.  Think SPAM.

In my book, the deciding factor is whether the principle is
enforceable.  I tried programming a dialogue with a PABX a while
back, where the assumption is that the operator is a human.  I
sympathise with Doug altogether.

++L


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11  9:07                 ` Douglas A. Gwyn
  2003-09-11 13:06                   ` rog
@ 2003-09-11 15:42                   ` rob pike, esq.
  2003-09-12  1:18                     ` okamoto
  2003-09-12  9:18                     ` Ralph Corderoy
  2003-09-12  3:53                   ` boyd, rounin
  2 siblings, 2 replies; 42+ messages in thread
From: rob pike, esq. @ 2003-09-11 15:42 UTC (permalink / raw)
  To: 9fans

doug, do you think the C committee ever made a mistake?
your response to every comment here is that the C committee
got it right and any deviation, even hypothetical, is mistaken.

-rob



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 13:06                   ` rog
  2003-09-11 14:01                     ` Lucio De Re
@ 2003-09-11 15:48                     ` rob pike, esq.
  2003-09-11 17:04                       ` Lucio De Re
  2003-09-12  0:56                     ` Bruce Ellis
  2 siblings, 1 reply; 42+ messages in thread
From: rob pike, esq. @ 2003-09-11 15:48 UTC (permalink / raw)
  To: 9fans

> i'm certainly glad that when i accidentally browse a piece of latin-1
> text in plan 9, i get almost all the text, with an occasional error
> character thrown in.

precisely.  the approach taken by the C standard implies you're
supposed to error check the conversion and presumably yell about it,
maybe exit.  but the great majority of text processing code is
interactive - editors, browsers, terminal programs - and must soldier
on.  the `error rune' solution not only solves the problem of
soldiering on, it provides a way to yell and takes no code at all in
the application.  (if you really want to stop processing on errors,
it's still a simple check; it's just that you almost never have to
bother to check and i can't think of a single program that does.)

-rob



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 15:48                     ` rob pike, esq.
@ 2003-09-11 17:04                       ` Lucio De Re
  2003-09-11 17:40                         ` chris
  2003-09-12  9:18                         ` Douglas A. Gwyn
  0 siblings, 2 replies; 42+ messages in thread
From: Lucio De Re @ 2003-09-11 17:04 UTC (permalink / raw)
  To: 9fans

On Thu, Sep 11, 2003 at 08:48:43AM -0700, rob pike, esq. wrote:
>
> precisely.  the approach taken by the C standard implies you're
> supposed to error check the conversion and presumably yell about it,
> maybe exit.  but the great majority of text processing code is
> interactive - editors, browsers, terminal programs - and must soldier
> on.  the `error rune' solution not only solves the problem of
> soldiering on, it provides a way to yell and takes no code at all in
> the application.  (if you really want to stop processing on errors,
> it's still a simple check; it's just that you almost never have to
> bother to check and i can't think of a single program that does.)
>
I'm willing to concede rob's point.  In an interactive environment
error checking is counterproductive if a user response would be
preferable.  But you can't design a standard around "the great
majority of text processing code" without sacrificing the ability
to deal with the exceptions.

I've done a good deal of programming using Don Libes' Expect and
it's quite a lesson in the cost of relying on real intelligence to
deal with contingencies.  MS Windows programming really brings the
point home: there aren't many operations I would feel comfortable
automating unsupervised.  Whereas forgetting the "close" function
in a pop-up error message isn't too serious, being unable to detect
an out-of-memory or out-of-disk-space condition in a background
task can be catastrophic.

I had a print spool completion pop-up appearing at the same time
as the screen saver Ctl-Alt-Del prompt pop-up: I didn't have to
identify myself before acknowledging the print completion, for
example.  Is something like that worth worrying about?  Not really,
but is it not perhaps symptomatic of a programming style where all
contingencies are passed on to the user except when the user isn't
available, in which case they are merely glossed over until disaster
strikes, at which point the RESET button comes to the rescue?

Whereas in Plan 9 (and Unix) it is easy to distinguish between
interactive tools and command line utilities, in Windows the
distinction is very thoroughly and almost certainly intentionally
blurred.  In my experience very few Windows programmers actually
code with unattended, unsupervised operation in mind.

I think Doug's concerns, like mine, are in laying down rules that
make it possible to deal with the unexpected programmatically.
The ANSI-C committee, like the ITU-T, seemingly strive to remove
all possible ambiguity instead of relying on the human brain to
make a choice.  It is a sane approach to dealing with computers,
even though the compromises made in this quest may be atrocious
and, I'm sure, occasionally mistaken.  As compromises have to be,
when trying to address conflicting requirements.

What may help considerably is the acknowledgement that different
needs can be addressed by different solutions.  For example, Perl
is a prime example of a language designed to concoct one-off
solutions.  Some of us have to do this on a regular basis (I know
of one instance where data conversion from continually changing
formats is a daily occurrence) whereas others can indulge in
monumental works to be used for decades.  Writing monumental works
in Perl should be discouraged, I'm sure, while Limbo may be quite
unsuitable as an instrument to write short sysadmin scripts.

Another point worth raising (I think it was Geoff Collyer who first
pointed it out) is that graphic programming is still at the level
of assembly code.  A lot of the decision making in interactive
programming is similarly at a very low level and therefore seems
extremely cumbersome.  Somewhere, somehow - I'd expect rob to be
the likely genius to come up with something - somebody will have
the flash of inspiration that leads to second and third generation
graphic languages.  Until then, I think we have to accept that we
don't have sufficient grounds on which to construct "standards"
that address the real problem, we can only base our recommendations
on traditional principles that are undoubtedly obsolete.

++L


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 17:04                       ` Lucio De Re
@ 2003-09-11 17:40                         ` chris
  2003-09-12  4:13                           ` boyd, rounin
  2003-09-12  9:18                           ` Douglas A. Gwyn
  2003-09-12  9:18                         ` Douglas A. Gwyn
  1 sibling, 2 replies; 42+ messages in thread
From: chris @ 2003-09-11 17:40 UTC (permalink / raw)
  To: 9fans

lucio@proxima.alt.za wrote:
>
> I'm willing to concede rob's point.  In an interactive environment
> error checking is counterproductive if a user response would be
> preferable.  But you can't design a standard around "the great
> majority of text processing code" without sacrificing the ability
> to deal with the exceptions.
>

But it is very easy to write a conversion function that does do the check.
If you have an exceptional requirement, then write the code for it.
The general purpose library should do the most convenient thing for the
majority of applications.
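
As a sketch of what such a checking function might look like, assuming
a permissive decoder that returns an in-band error value and consumes
one byte on bad input (the "error rune + only one byte consumed" test
mentioned earlier in the thread).  All names here (decode_rune,
check_utf8, ERROR_RUNE) are hypothetical, the decoder handles only 1- to
3-byte sequences, and surrogate code points are not rejected.

```c
#include <assert.h>
#include <stddef.h>

enum { ERROR_RUNE = 0x80 };   /* assumed in-band error value */

static int is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Permissive decoder: never fails, flags bad input with ERROR_RUNE. */
int decode_rune(unsigned long *r, const unsigned char *s, size_t n)
{
    unsigned long c = s[0], v;

    if (c < 0x80) {
        *r = c;
        return 1;
    }
    if ((c & 0xE0) == 0xC0 && n >= 2 && is_cont(s[1])) {
        v = (c & 0x1F) << 6 | (s[1] & 0x3F);
        if (v >= 0x80) {
            *r = v;
            return 2;
        }
    }
    if ((c & 0xF0) == 0xE0 && n >= 3 && is_cont(s[1]) && is_cont(s[2])) {
        v = (c & 0x0F) << 12 | (unsigned long)(s[1] & 0x3F) << 6 | (s[2] & 0x3F);
        if (v >= 0x800) {
            *r = v;
            return 3;
        }
    }
    *r = ERROR_RUNE;
    return 1;
}

/*
 * Strict layer for the exceptional caller.  Returns 0 if the whole
 * buffer is well formed.  Otherwise returns -1 and stores the offset of
 * the first malformed byte in *bad.  A genuine U+0080 is encoded in two
 * bytes, so "error value and one byte consumed" is unambiguous.
 */
int check_utf8(const unsigned char *s, size_t n, size_t *bad)
{
    assert(s != NULL && bad != NULL);
    size_t i = 0;

    while (i < n) {
        unsigned long r;
        int w = decode_rune(&r, s + i, n - i);

        if (r == ERROR_RUNE && w == 1) {
            *bad = i;
            return -1;
        }
        i += (size_t)w;
    }
    return 0;
}
```

A program that must refuse bad input calls check_utf8 first and stops on
-1; everything else keeps using the permissive decoder unchanged.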



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 13:06                   ` rog
  2003-09-11 14:01                     ` Lucio De Re
  2003-09-11 15:48                     ` rob pike, esq.
@ 2003-09-12  0:56                     ` Bruce Ellis
  2 siblings, 0 replies; 42+ messages in thread
From: Bruce Ellis @ 2003-09-12  0:56 UTC (permalink / raw)
  To: 9fans

i totally agree.  give me a peter face rather than an IO error.
tho there may be clinical examples of clever uses of ANSI-C
wide characters i see a whole useable system based on the
pragmatic use of Runes.

> surely it's not a matter of encouraging or discouraging the use of
> erroneous data, but simply of making as best a job of dealing with the
> situations we encounter.
>
> i'm certainly glad that when i accidentally browse a piece of latin-1
> text in plan 9, i get almost all the text, with an occasional error
> character thrown in.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 15:42                   ` rob pike, esq.
@ 2003-09-12  1:18                     ` okamoto
  2003-09-12  9:18                     ` Ralph Corderoy
  1 sibling, 0 replies; 42+ messages in thread
From: okamoto @ 2003-09-12  1:18 UTC (permalink / raw)
  To: 9fans

> doug, do you think the C committee ever made a mistake?

In Japan, we are taught Japanese government officials never
make mistakes.   Yeah, then, we have very happy economic
problem now.   ☺

Kenji



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-09  8:34             ` Douglas A. Gwyn
                                 ` (2 preceding siblings ...)
  2003-09-10 10:17               ` Bruce Ellis
@ 2003-09-12  3:01               ` Dan Cross
  3 siblings, 0 replies; 42+ messages in thread
From: Dan Cross @ 2003-09-12  3:01 UTC (permalink / raw)
  To: 9fans

"Douglas A. Gwyn" <DAGwyn@null.net> writes:
>
> rob pike, esq. wrote:
> > there are two important differences.  first is that (old) ANSI C
> > did not provide any formatted i/o for wchar_t, rendering them
> > essentially useless.
>
> The target clientele at the time insisted that they did not want
> us to specify more than the minimum necessary support, and that
> they would happily invoke the multibyte<->wide conversions when
> necessary.  I remember Plauger relaying that promise.  Not
> surprisingly, as soon as they got the capability they started
> pushing for the missing functions, which became part of a
> normative amendment to the C standard by 1995 (along with mode
> on the stream to control whether automatic conversion is or is
> not applied).

So you're admitting it's a hack?

	- Dan C.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11  9:07                 ` Douglas A. Gwyn
  2003-09-11 13:06                   ` rog
  2003-09-11 15:42                   ` rob pike, esq.
@ 2003-09-12  3:53                   ` boyd, rounin
  2 siblings, 0 replies; 42+ messages in thread
From: boyd, rounin @ 2003-09-12  3:53 UTC (permalink / raw)
  To: 9fans

> We weren't talking about a supposed problem with UTF-8,
> but about a supposed problem with Standard C wide-character
> facilities.  The Standard C facility has to deal with a
> much wider variety of multibyte encodings.

i don't even understand the doc, it's straight outta µSloth:

    - it does X
    - it also does random things
    - it does not X

runes are simpler,  better designed and i can carry the doc
around in my head.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 17:40                         ` chris
@ 2003-09-12  4:13                           ` boyd, rounin
  2003-09-12  9:18                           ` Douglas A. Gwyn
  1 sibling, 0 replies; 42+ messages in thread
From: boyd, rounin @ 2003-09-12  4:13 UTC (permalink / raw)
  To: 9fans

> But it is very easy to write a conversion function that does do the check.
> If you have an exceptional requirement, then write the code for it.
> The general purpose library should do the most convenient thing for the
> majority of applications.

except that it's not very useful.  iirc the original rune code did error
returns, but it was realised to be pointless, so they return the
rune error char.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 15:42                   ` rob pike, esq.
  2003-09-12  1:18                     ` okamoto
@ 2003-09-12  9:18                     ` Ralph Corderoy
  2003-09-12  9:57                       ` Bruce Ellis
  1 sibling, 1 reply; 42+ messages in thread
From: Ralph Corderoy @ 2003-09-12  9:18 UTC (permalink / raw)
  To: 9fans

Hi rob,

> doug, do you think the C committee ever made a mistake?  your response
> to every comment here is that the C committee got it right and any
> deviation, even hypothetical, is mistaken.

No it isn't.  Doug's replies generally give the reasoning behind the C
committee's decision, and point out the restrictions they had to take
into account.  That's not the same thing.

Cheers,


Ralph.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 14:25                       ` rog
  2003-09-11 15:17                         ` Lucio De Re
@ 2003-09-12  9:18                         ` Douglas A. Gwyn
  1 sibling, 0 replies; 42+ messages in thread
From: Douglas A. Gwyn @ 2003-09-12  9:18 UTC (permalink / raw)
  To: 9fans

rog@vitanuova.com wrote:
> is truncating a 1MB file to 512K due to a mid-stream dud character
> really a better solution than just flagging the dud character itself?
> as you suggest, the answer is subjective, but surely the old maxim is
> useful:
> 	be liberal in what you accept and strict in what you generate.

The trouble is, most programs are data transformers, and
garbage in means garbage out *unless* the garbage is detected
and an appropriate strategy used to deal with it.  What is
appropriate depends on the intended application.

An impossible character encoding in input text data indicates
that the preceding process did not do *its* job.  Naturally
you want to be able to handle such situations.  But merely
ignoring the fact that something is wrong and proceeding with
the computation just propagates the error; it doesn't "fix"
the situation.  It probably doesn't matter much if you're just
hacking, but if something important depends on having correct
data, a better strategy is called for.  I would hate to think
that the codes sent to a respirator or pneumatic brake hadn't
been correctly generated.

I admit that lazy programming is more fun.  So have fun, but
don't think that an approach that has different goals is
therefore inherently deficient.


* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 17:04                       ` Lucio De Re
  2003-09-11 17:40                         ` chris
@ 2003-09-12  9:18                         ` Douglas A. Gwyn
  1 sibling, 0 replies; 42+ messages in thread
From: Douglas A. Gwyn @ 2003-09-12  9:18 UTC (permalink / raw)
  To: 9fans

Lucio De Re wrote:
> I think Doug's concerns, like mine, are in laying down rules that
> make it possible to deal with the unexpected programmatically.

Yes; in fact I think handling encoding errors is a fine example
of a situation where a really good exception-handling facility
would be useful.

> What may help considerably is the acknowledgement that different
> needs can be addressed by different solutions.

Indeed, and as C is used for embedded systems that control
almost everything we come in contact with these days, the
natural inclination when providing new facilities is to
make error handling an important component rather than an
afterthought.  It is easy to use {functions that provide
error returns} in a lackadaisical manner; it is hard or
impossible to use {functions that are overly permissive} in
a safe manner.  (Sorry for the braces but there would be
ambiguous parsing without them.)
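
A minimal sketch of the asymmetry described above, assuming a hypothetical strict decoder `decode_strict` (not a libc or Plan 9 function; the names and the choice of U+FFFD as the substitute value are illustrative assumptions).  A permissive interface can be layered on an error-returning one in a few lines, whereas the reverse is not possible once the error information has been discarded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strict decoder: decodes one UTF-8 sequence from s[0..n)
 * into *out.  Returns the number of bytes consumed (1..4), or -1 if the
 * bytes do not begin with a well-formed sequence. */
int
decode_strict(const unsigned char *s, size_t n, unsigned long *out)
{
	unsigned long c, min;
	int len, i;

	assert(s != NULL && out != NULL);
	if(n == 0)
		return -1;
	c = s[0];
	if(c < 0x80){
		*out = c;
		return 1;
	}
	if((c & 0xE0) == 0xC0){ len = 2; c &= 0x1F; min = 0x80; }
	else if((c & 0xF0) == 0xE0){ len = 3; c &= 0x0F; min = 0x800; }
	else if((c & 0xF8) == 0xF0){ len = 4; c &= 0x07; min = 0x10000; }
	else
		return -1;	/* stray continuation byte or invalid lead byte */
	if(n < (size_t)len)
		return -1;	/* sequence truncated by end of input */
	for(i = 1; i < len; i++){
		if((s[i] & 0xC0) != 0x80)
			return -1;	/* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}
	if(c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return -1;	/* overlong form, out of range, or surrogate */
	*out = c;
	return len;
}

/* Permissive wrapper built on the strict decoder: on error it stores
 * U+FFFD and consumes exactly one byte.  The caller must pass n > 0. */
int
decode_lenient(const unsigned char *s, size_t n, unsigned long *out)
{
	int r;

	assert(n > 0);
	r = decode_strict(s, n, out);
	if(r < 0){
		*out = 0xFFFD;
		return 1;
	}
	return r;
}
```

A caller that wants the lenient behaviour pays one extra function call; a caller that must reject bad input checks the strict decoder's return value for -1.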


* Re: [9fans] ... in the Kingdom of Sources
  2003-09-11 17:40                         ` chris
  2003-09-12  4:13                           ` boyd, rounin
@ 2003-09-12  9:18                           ` Douglas A. Gwyn
  2003-09-12 15:18                             ` rob pike, esq.
  1 sibling, 1 reply; 42+ messages in thread
From: Douglas A. Gwyn @ 2003-09-12  9:18 UTC (permalink / raw)
  To: 9fans

chris@hollis-locke.com wrote:
> The general purpose library should do the most convenient thing for the
> majority of applications.

We disagree about that, especially for standardized libraries.
The general purpose library needs to be general purpose, not
have some particular preconceived purpose.


* Re: [9fans] ... in the Kingdom of Sources
  2003-09-12  9:18                     ` Ralph Corderoy
@ 2003-09-12  9:57                       ` Bruce Ellis
  2003-09-15  8:27                         ` Douglas A. Gwyn
  0 siblings, 1 reply; 42+ messages in thread
From: Bruce Ellis @ 2003-09-12  9:57 UTC (permalink / raw)
  To: 9fans

at their best ANSI-C wide chars were a half-baked idea.  at the worst
they were damaging to programming.  Runes are neither.


* Re: [9fans] ... in the Kingdom of Sources
  2003-09-12  9:18                           ` Douglas A. Gwyn
@ 2003-09-12 15:18                             ` rob pike, esq.
  2003-09-12 16:39                               ` rog
  0 siblings, 1 reply; 42+ messages in thread
From: rob pike, esq. @ 2003-09-12 15:18 UTC (permalink / raw)
  To: 9fans

> We disagree about that, especially for standardized libraries.
> The general purpose library needs to be general purpose, not
> have some particular preconceived purpose.

nonsense. you can't design a proper interface without some
idea of how it's going to be used in practice.  interfaces
designed with only generality in mind are cumbrous and
ugly.

-rob



* Re: [9fans] ... in the Kingdom of Sources
  2003-09-12 15:18                             ` rob pike, esq.
@ 2003-09-12 16:39                               ` rog
  0 siblings, 0 replies; 42+ messages in thread
From: rog @ 2003-09-12 16:39 UTC (permalink / raw)
  To: 9fans

rob:
> doug:
> > We disagree about that, especially for standardized libraries.
> > The general purpose library needs to be general purpose, not
> > have some particular preconceived purpose.
>
> nonsense. you can't design a proper interface without some
> idea of how it's going to be used in practice.  interfaces
> designed with only generality in mind are cumbrous and
> ugly.

not to mention that both the interfaces in question are as general as
each other; they differ only in the relative implementation effort
required for certain functionalities.

doug:
> The trouble is, most programs are data transformers, and
> garbage in means garbage out *unless* the garbage is detected
> and an appropriate strategy used to deal with it.

"garbage in, garbage out" seems to imply that "garbage" is all or
nothing.  that's not the case here: the input can be littered with
pieces of garbage, and the question is whether we let one piece of
litter trash the whole input or just its locality.

> What is appropriate depends on the intended application.

given that the error has already occurred, what strategies might be
appropriate?  there aren't many possibilities here.

> But merely ignoring the fact that something is wrong and proceeding
> with the computation just propagates the error; it doesn't "fix" the
> situation.

the situation is not fixable at this point.  the error has already
occurred.  if you're parsing the text, there are likely to be many
other possible lexical analysis errors, and the plan 9 approach means
that a dud utf8 sequence can be treated in the same way without any
special code.

> I would hate to think that the codes sent to a respirator or pneumatic
> brake hadn't been correctly generated.

the input has already been incorrectly generated.

i'd hate to think that a code sent to a respirator or a pneumatic
brake was ignored because an earlier one had a formatting error.
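
The two positions in this exchange can be put side by side in a short sketch.  The names `scan`, `STOP_AT_ERROR` and `FLAG_AND_CONTINUE` are hypothetical, and the one-byte resynchronisation step is an assumption about what "flagging the dud character" means here, not a copy of any library's behaviour.

```c
#include <assert.h>
#include <stddef.h>

enum policy { STOP_AT_ERROR, FLAG_AND_CONTINUE };	/* hypothetical names */

/* Length of the well-formed UTF-8 sequence at s[0..n), or 0 if none. */
static size_t
seqlen(const unsigned char *s, size_t n)
{
	unsigned long c, min;
	size_t len, i;

	if(n == 0)
		return 0;
	if(s[0] < 0x80)
		return 1;
	if((s[0] & 0xE0) == 0xC0){ len = 2; c = s[0] & 0x1F; min = 0x80; }
	else if((s[0] & 0xF0) == 0xE0){ len = 3; c = s[0] & 0x0F; min = 0x800; }
	else if((s[0] & 0xF8) == 0xF0){ len = 4; c = s[0] & 0x07; min = 0x10000; }
	else
		return 0;	/* invalid lead byte */
	if(n < len)
		return 0;	/* truncated sequence */
	for(i = 1; i < len; i++){
		if((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}
	if(c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return 0;	/* overlong, out of range, or surrogate */
	return len;
}

/* Scan buf under the given policy.  Returns the number of input bytes
 * accepted as part of well-formed sequences; *nbad counts the bytes
 * flagged as malformed. */
size_t
scan(const unsigned char *buf, size_t n, enum policy p, size_t *nbad)
{
	size_t i = 0, good = 0, len;

	assert(buf != NULL && nbad != NULL);
	*nbad = 0;
	while(i < n){
		len = seqlen(buf + i, n - i);
		if(len == 0){
			(*nbad)++;
			if(p == STOP_AT_ERROR)
				break;	/* everything after the dud byte is lost */
			i++;		/* flag one byte, resynchronise on the next */
			continue;
		}
		good += len;
		i += len;
	}
	return good;
}
```

For the seven input bytes `a b c 0xFF d e f`, `STOP_AT_ERROR` accepts three bytes and `FLAG_AND_CONTINUE` accepts six; both report exactly one malformed byte, so the error is detected either way and only the extent of the damage differs.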



* Re: [9fans] ... in the Kingdom of Sources
  2003-09-12  9:57                       ` Bruce Ellis
@ 2003-09-15  8:27                         ` Douglas A. Gwyn
  0 siblings, 0 replies; 42+ messages in thread
From: Douglas A. Gwyn @ 2003-09-15  8:27 UTC (permalink / raw)
  To: 9fans

Bruce Ellis wrote:
> at their best ANSI-C wide chars were a half-baked idea.  at the worst
> they were damaging to programming.  Runes are neither.

Well reasoned.  Runes rule, dude!


* [9fans] ... in the Kingdom of Sources
@ 2003-09-05 12:04 David Presotto
  0 siblings, 0 replies; 42+ messages in thread
From: David Presotto @ 2003-09-05 12:04 UTC (permalink / raw)
  To: 9fans

The real distribution is also newer...


* Re: [9fans] ... in the Kingdom of Sources
  2003-09-05  6:39 lucio
  2003-09-05  6:31 ` Lucio De Re
  2003-09-05 11:30 ` David Presotto
@ 2003-09-05 11:42 ` David Presotto
  2 siblings, 0 replies; 42+ messages in thread
From: David Presotto @ 2003-09-05 11:42 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 43 bytes --]

Arghh!  The link in the web page was wrong.

[-- Attachment #2: Type: message/rfc822, Size: 2136 bytes --]

From: lucio@proxima.alt.za
To: 9fans@cse.psu.edu
Subject: [9fans] ... in the Kingdom of Sources
Date: Fri, 5 Sep 2003 08:39:00 +0200
Message-ID: <3b9b03f355eeb8c2a5290775af1d9673@proxima.alt.za>

I guess this will keep until Russ gets back, but if he happens to be
available he may want to put some effort into getting the daily
belleding edge image rebuilt.  It is now positively stale :-)

++L

* Re: [9fans] ... in the Kingdom of Sources
  2003-09-05  6:39 lucio
  2003-09-05  6:31 ` Lucio De Re
@ 2003-09-05 11:30 ` David Presotto
  2003-09-05 11:42 ` David Presotto
  2 siblings, 0 replies; 42+ messages in thread
From: David Presotto @ 2003-09-05 11:30 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 103 bytes --]

I don't understand, it gets made every night from sources.  The current one is
from 4:35 this morning.

[-- Attachment #2: Type: message/rfc822, Size: 2136 bytes --]

From: lucio@proxima.alt.za
To: 9fans@cse.psu.edu
Subject: [9fans] ... in the Kingdom of Sources
Date: Fri, 5 Sep 2003 08:39:00 +0200
Message-ID: <3b9b03f355eeb8c2a5290775af1d9673@proxima.alt.za>

I guess this will keep until Russ gets back, but if he happens to be
available he may want to put some effort into getting the daily
belleding edge image rebuilt.  It is now positively stale :-)

++L

* [9fans] ... in the Kingdom of Sources
@ 2003-09-05  6:39 lucio
  2003-09-05  6:31 ` Lucio De Re
                   ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: lucio @ 2003-09-05  6:39 UTC (permalink / raw)
  To: 9fans

I guess this will keep until Russ gets back, but if he happens to be
available he may want to put some effort into getting the daily
belleding edge image rebuilt.  It is now positively stale :-)

++L



* Re: [9fans] ... in the Kingdom of Sources
  2003-09-05  6:39 lucio
@ 2003-09-05  6:31 ` Lucio De Re
  2003-09-05 11:30 ` David Presotto
  2003-09-05 11:42 ` David Presotto
  2 siblings, 0 replies; 42+ messages in thread
From: Lucio De Re @ 2003-09-05  6:31 UTC (permalink / raw)
  To: 9fans

On Fri, Sep 05, 2003 at 08:39:00AM +0200, lucio@proxima.alt.za wrote:
>
> I guess this will keep until Russ gets back, but if he happens to be
> available he may want to put some effort into getting the daily
> belleding edge image rebuilt.  It is now positively stale :-)
  ^^^^^^^^^ bleeding - Freudian slip?  No, finger trouble.

++L


end of thread (last message 2003-09-15  8:27 UTC)

Thread overview: 42+ messages
2003-09-05 11:54 [9fans] ... in the Kingdom of Sources David Presotto
2003-09-05 12:14 ` Lucio De Re
2003-09-05 12:33   ` boyd, rounin
2003-09-07  9:00     ` Aharon Robbins
2003-09-07  9:05       ` boyd, rounin
2003-09-08  9:10         ` Douglas A. Gwyn
2003-09-08  9:52           ` boyd, rounin
2003-09-08 17:33           ` rob pike, esq.
2003-09-09  8:34             ` Douglas A. Gwyn
2003-09-09 15:50               ` rob pike, esq.
2003-09-09 15:50               ` rob pike, esq.
2003-09-10 10:17               ` Bruce Ellis
2003-09-11  9:07                 ` Douglas A. Gwyn
2003-09-11 13:06                   ` rog
2003-09-11 14:01                     ` Lucio De Re
2003-09-11 14:25                       ` rog
2003-09-11 15:17                         ` Lucio De Re
2003-09-12  9:18                         ` Douglas A. Gwyn
2003-09-11 15:48                     ` rob pike, esq.
2003-09-11 17:04                       ` Lucio De Re
2003-09-11 17:40                         ` chris
2003-09-12  4:13                           ` boyd, rounin
2003-09-12  9:18                           ` Douglas A. Gwyn
2003-09-12 15:18                             ` rob pike, esq.
2003-09-12 16:39                               ` rog
2003-09-12  9:18                         ` Douglas A. Gwyn
2003-09-12  0:56                     ` Bruce Ellis
2003-09-11 15:42                   ` rob pike, esq.
2003-09-12  1:18                     ` okamoto
2003-09-12  9:18                     ` Ralph Corderoy
2003-09-12  9:57                       ` Bruce Ellis
2003-09-15  8:27                         ` Douglas A. Gwyn
2003-09-12  3:53                   ` boyd, rounin
2003-09-12  3:01               ` Dan Cross
2003-09-07 13:48       ` Russ Cox
2003-09-07 15:45         ` David Presotto
2003-09-06 11:13   ` boyd, rounin
  -- strict thread matches above, loose matches on Subject: below --
2003-09-05 12:04 David Presotto
2003-09-05  6:39 lucio
2003-09-05  6:31 ` Lucio De Re
2003-09-05 11:30 ` David Presotto
2003-09-05 11:42 ` David Presotto
