From mboxrd@z Thu Jan  1 00:00:00 1970
Message-Id: <200101120031.AAA22053@whitecrow.demon.co.uk>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Typesetting 
In-reply-to: Your message of "Wed, 10 Jan 2001 18:32:55 EST."
             <200101102332.SAA28475@augusta.math.psu.edu> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
From: Steve Kilbane <steve@whitecrow.demon.co.uk>
Date: Fri, 12 Jan 2001 00:31:03 +0000
Topicbox-Message-UUID: 4bf47b7c-eac9-11e9-9e20-41e7f4b1d025

> In article <20010110201239.3ECB619A40@mail.cse.psu.edu> Dan wrote:
> The following facts about the goal of the web are relevant:
> 
> 	+ The web is about information sharing.

Alas, no longer. The web is now about selling something, or
looking "cool". If it was just about sharing information, most
of the current problems wouldn't be nearly so bad. However,
that's a sociological problem, and a technical fix won't help.

> 	+ Most web pages are specific to a given topic area.
> 	+ The data is often meant to be *used* (by that, I mean
> 	  not just telling me where the muffler on my car is,
> 	  but manipulated by me).

In other words, it's got internal structure that's more than just
pixels on the screen, so it should be handled as such.

> So, some things that really don't make sense are:
> 
> 	+ The distribution protocol is based on file transfer, not sharing.

You're linking the use of the data with the retrieval of the data. I don't
think the two are particularly related. The key, I feel, is how you
identify what data is to be fetched next. How you actually get it isn't
really interesting.

> 	+ The markup language doesn't preserve the semantics of the content.
> 	+ The markup language doesn't provide a good way to present the data.

Both true.

> 	+ There is no built in ordering to the data.

That's arguable, so I think you'd better clarify it.

> 	+ The browser model is all wrong; it doesn't integrate cleanly into
> 	  the rest of the environment, and effectively prevents me from
> 	  manipulating the content.

But that's never going to change, at least until Microsoft (or any other
company) achieves its goal of being the universal platform. While there
is heterogeneity, you'll need some way to insulate the data from the
destination platform's weirdness.

> So these are the problems that I think need to be addressed first.  You
> can probably see where I'm going with this, but, here goes:
> 
> 	+ Replace the distribution protocol with a distributed
> 	  filesystem; something similar to AFS. [...]
> 	  This simplifies a lot of stuff.

It does, but unfortunately, it simplifies the stuff that happens to
be simple to begin with. Filesystems are well understood, and so
is data transfer in general. HTTP was a bad start,
and we've only still got it now because it had a solid foothold.
A side point, though: DNS contains much less information than
a web server. Heavily-accessed sites will still need big systems
because the intermediate nodes on the net can only cache so much,
so many accesses will still make it back to the source machine.

> 	  The hierarchical organization of the filesystem namespace
> 	  allows me to easily categorize content.

I'm sorry to say this, but no chance. Absolutely none. For any
arbitrary information storage system, you can't come up with a
hierarchy that makes sense to more than one segment of the user-base
(unless you count /everything). Different users see things in different
ways, and so need a different hierarchy. Worse, it changes depending
on what they're looking for.

As a simple example, the ubiquitous FAQ: a document ideally written
by an expert, for a non-expert reader. The author and the target
reader have different views of the same information, and would probably
like it presented differently. A tutorial is structured differently from
a reference guide, and that's just the tip of the iceberg.
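
To make the point concrete, here is a rough sketch, not a proposal. The
documents, the attribute names and the view() helper are all invented
for illustration. It only shows that the same few items fall into
different trees depending on which attribute the reader cares about
first, so no single filesystem path suits everyone.

```python
from collections import defaultdict

# Hypothetical documents, described by attributes instead of one fixed path.
DOCS = [
    {"title": "Installing troff", "audience": "novice",
     "kind": "tutorial", "topic": "typesetting"},
    {"title": "troff request summary", "audience": "expert",
     "kind": "reference", "topic": "typesetting"},
    {"title": "Mounting a file server", "audience": "novice",
     "kind": "tutorial", "topic": "filesystems"},
]

def view(docs, *keys):
    """Group titles under paths built from the chosen attributes, in order."""
    tree = defaultdict(list)
    for doc in docs:
        path = "/" + "/".join(doc[k] for k in keys)
        tree[path].append(doc["title"])
    return dict(tree)

# Two readers, two hierarchies, one set of documents.
by_topic = view(DOCS, "topic", "kind")
by_reader = view(DOCS, "audience", "topic")

for name, tree in (("by topic", by_topic), ("by reader", by_reader)):
    print(name)
    for path in sorted(tree):
        print("  ", path, tree[path])
```

Even in this toy case the two trees disagree about which documents sit
next to each other, and that is with only three attributes to choose from.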

> 	+ The next major problem is content markup and presentation.
> 	  I haven't figured out too much about that yet.

You and most webmasters. :-)

> Most
> 	  content needs something a little more, umm, attractive
> 	  than plain text to be popular,

Which takes me back to the original point about what the focus of
the web is, nowadays. As it happens, I agree with the masses here,
albeit for different reasons. The commercial sites want pages that
look snazzy, whereas I want pages that get the information into my
brain in the fastest possible way, in a manner I understand. If this
means images, animations, etc, then fine - but only if that's the
best way. It also probably needs an expert in visual aids to pull it
off, and that's a rare talent.

> but it's also important
> 	  to preserve information about content.  For instance,
> 	  ``this is a telephone number.''  XML tries to do this,
> 	  and I think does okay, but it imposes a rigid structure
> 	  on the data.  That's kind of unfortunate, since it doesn't
> 	  integrate well with text processing tools like grep et al.

But then, how do you grep a 3d model? If you want your information
to be more than unstructured text, you need different data manipulation
tools.
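
A small sketch of the difference, using the telephone number example.
The record, its element names and the numbers in it are made up. A
text-level pattern match picks up anything shaped like a number, while a
structure-aware query can ask for the phone number alone. It uses
Python's standard re and xml.etree.ElementTree modules.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are invented for this example.
RECORD = """<contact>
  <name>Example Garage</name>
  <phone>555 0100</phone>
  <fax>555 0199</fax>
  <note>Part 555 0123 is the muffler.</note>
</contact>"""

# Text-level search, grep style: anything shaped like a number matches,
# whether or not it is a telephone number.
pattern = re.compile(r"\d{3} \d{4}")
text_hits = pattern.findall(RECORD)

# Structure-level search: ask only for elements marked up as phone numbers.
root = ET.fromstring(RECORD)
phone_hits = [el.text for el in root.iter("phone")]

print("pattern match:", text_hits)
print("phone elements:", phone_hits)
```

The second query is only possible because the tool understands the
structure, which is exactly what grep doesn't.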

I realise I've been purely negative here, but I don't have any constructive
comments to make. I worked on the sort of thing you're after (a "knowledge
management system" - ick), and I didn't gain much in the way of answers.
Mainly, I got an appreciation of how hard the generic problem is.

steve