From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dan Cross <cross@math.psu.edu>
Message-Id: <200101102332.SAA28475@augusta.math.psu.edu>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Typesetting
In-Reply-To: <20010110201239.3ECB619A40@mail.cse.psu.edu>
Cc: 
Date: Wed, 10 Jan 2001 18:32:55 -0500
Topicbox-Message-UUID: 4b603f34-eac9-11e9-9e20-41e7f4b1d025

In article <20010110201239.3ECB619A40@mail.cse.psu.edu> you write:
>I agree that the web has become ugly. How would you
>change it? I think I would reduce it a bit to something
>more content oriented and less driven to "appeal".

I've used a lot of emails to folks as sounding boards for some random
ideas, but unfortunately, it all sounds like kind of incoherent
rambling, as I'm sure this note does.  :-)

In general, I think that the distribution mechanism is all wrong, as is
the fact that there's no decent way to categorize or present the data.
The following facts about the goals of the web are relevant:

	+ The web is about information sharing.
	+ The web is content driven.
	+ Most web pages are specific to a given topic area.
	+ The data is often meant to be *used* (by that, I mean
	  not just telling me where the muffler on my car is,
	  but manipulated by me).

So, some things that really don't make sense are:

	+ The distribution protocol is based on file transfer, not sharing.
	+ The markup language doesn't preserve the semantics of the content.
	+ The markup language doesn't provide a good way to present the data.
	+ There is no built-in ordering to the data.
	+ The browser model is all wrong; it doesn't integrate cleanly into
	  the rest of the environment, and effectively prevents me from
	  manipulating the content.

So these are the problems that I think need to be addressed first.  You
can probably see where I'm going with this, but, here goes:

	+ Replace the distribution protocol with a distributed
	  filesystem; something similar to AFS.

		- Instead of having web servers, have file servers.
		- I should never have to talk to more than one file
		  server to get anywhere on the web; kinda like DNS.
		- File servers should cache data using a mechanism
		  similar to that of DNS; that is, each file
		  should have as part of its metadata a ``time to
		  live'' detailing how long another server may cache
		  the file.  It could use an LRU mechanism to keep
		  the cache size reasonable.  Whole file caching is
		  fine.
		- File servers should provide a network-enabled ``named
		  pipe'' like mechanism to provide interactive services.

	  This simplifies a lot of stuff.  First of all, the
	  scalability problems of current-generation web servers
	  go away.  I don't need a farm of high powered boxes to
	  serve out content; I just need a few file servers.
	  This is kind of like what Akamai et al. attempt to do.

	  Second, the ``session handling'' problem goes away for
	  interactive services.  A session is active as long as
	  I have one of these ``named pipe'' like files open.  It
	  goes away when I close the file.

	  It provides a mechanism for built-in proxies, since a
	  client only ever talks to a local file server.  If I
	  need a proxy for some reason, I can just interject a
	  file server between my clients and the rest of the ``web.''
	  I don't have to worry about configuring my firewall to
	  allow everyone's desktop machine to access every web
	  server in the world.  Instead, I just have a single
	  caching file server in my DMZ or outside my firewall
	  that the desktops talk to.

	  The hierarchical organization of the filesystem namespace
	  allows me to easily categorize content.  I can also use
	  this to restrict access to information; if I authenticate
	  to the filesystem, then I can say things like, ``if user
	  is not in acl, don't let him/her see /foo....''  This
	  could be useful for blocking the rest of the world from
	  my internal information, or for blocking the kids from
	  things they shouldn't see.
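
To make the caching rule from the file server list above a bit more
concrete (per-file time to live, LRU eviction, whole-file caching),
here is a rough sketch in Python.  Every name in it (FileCache, fetch,
max_files, clock) is made up for illustration, and fetch stands in for
whatever call would really ask the upstream file server for a file and
its time to live.  It's a sketch under those assumptions, not a design.

```python
import time
from collections import OrderedDict


class FileCache:
    """Whole-file cache with a per-file time to live and LRU eviction."""

    def __init__(self, fetch, max_files=1000, clock=time.monotonic):
        # fetch is hypothetical: path -> (data, ttl_seconds), where the
        # time to live comes from the file's metadata on the origin server.
        self.fetch = fetch
        self.max_files = max_files
        self.clock = clock
        self.entries = OrderedDict()  # path -> (data, expiry time)

    def read(self, path):
        now = self.clock()
        hit = self.entries.get(path)
        if hit is not None and hit[1] > now:
            self.entries.move_to_end(path)  # mark as most recently used
            return hit[0]
        # Miss, or the cached copy has outlived its time to live:
        # fetch the whole file again from upstream.
        data, ttl = self.fetch(path)
        self.entries[path] = (data, now + ttl)
        self.entries.move_to_end(path)
        while len(self.entries) > self.max_files:
            self.entries.popitem(last=False)  # evict least recently used
        return data
```

A local file server would hold one of these in front of its upstream
connection, so desktops only ever call read() on the local server.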

	+ The next major problem is content markup and presentation.
	  I haven't figured out too much about that yet.  Most
	  content needs something a little more, umm, attractive
	  than plain text to be popular, but it's also important
	  to preserve information about content.  For instance,
	  ``this is a telephone number.''  XML tries to do this,
	  and I think does okay, but it imposes a rigid structure
	  on the data.  That's kind of unfortunate, since it doesn't
	  integrate well with text processing tools like grep et al.

	  Perhaps structured regular expressions and some kind of
	  metalanguage could help out the content structure part,
	  but the markup part is still unsolved.  Perhaps another
	  metalanguage derived from structured regular expressions
	  could help here.  Either that, or treat everything as an
	  object with a ``render'' method, almost what XML does.
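
As a hedged illustration of the structured regular expression idea
above: the content stays plain text that grep can still chew on, and a
small description layered on top says which spans are telephone
numbers.  The x function below is loosely modeled on the x command in
sam; the record layout, the SAMPLE data, and the phone_numbers helper
are all invented for this example.

```python
import re

# Hypothetical content: blank-line separated records, plain text.
SAMPLE = (
    "Name: Alice\nPhone: 555-0100\n\n"
    "Name: Bob\nFax: 555-0199\n\n"
    "Name: Carol\nPhone: 555-0123\n"
)


def x(pattern, spans, text):
    """For each (start, end) span, yield the span of every match inside it."""
    rx = re.compile(pattern)
    for start, end in spans:
        for m in rx.finditer(text, start, end):
            yield m.span()


def phone_numbers(text):
    """Narrow the text step by step: records, then Phone lines, then digits."""
    whole = [(0, len(text))]
    records = x(r"(?:.+\n)+", whole, text)         # runs of non-blank lines
    lines = x(r"Phone:.*", records, text)          # only the Phone: lines
    numbers = x(r"[0-9]{3}-[0-9]{4}", lines, text)  # the number itself
    return [text[s:e] for s, e in numbers]
```

With the sample above, phone_numbers(SAMPLE) picks out Alice's and
Carol's numbers and skips Bob's fax line, because each step only
searches inside the spans selected by the step before it.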

Well, that's basically it; sorry it's rather rambling.  There are a
lot of open issues, like authentication and privacy (both of which
are afterthoughts on the web, but must be integral from the beginning
of a new system), etc., but I'd rather solve them at the file server
level than at the application level.
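
For what it's worth, here is a minimal sketch of what the ``if user is
not in acl, don't let him/her see /foo'' check might look like when it
lives in the file server.  The ACL table, the "*" wildcard, the user
names, and the may_see function are all assumptions made up for the
example; real authentication is left out entirely.

```python
import posixpath

# Hypothetical ACL table: directory -> users allowed to see what is below it.
ACL = {
    "/": {"*"},                     # "*" is an invented wildcard: anyone
    "/internal": {"dan", "alice"},
}


def may_see(user, path, acl):
    """Return True if the nearest enclosing ACL entry admits the user."""
    # Normalize first so "/public/../internal" can't dodge the check,
    # and force exactly one leading slash.
    path = "/" + posixpath.normpath(path).lstrip("/")
    while True:
        allowed = acl.get(path)
        if allowed is not None:
            return "*" in allowed or user in allowed
        if path == "/":
            return False            # no entry anywhere: deny by default
        path = posixpath.dirname(path)  # walk up toward the root
```

The nearest entry up the tree wins, so one line for /internal covers
everything beneath it without listing each file.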

	- Dan C.