From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <254e6411a483bf4a605691ce52824ab3@vitanuova.com>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] OT: small xml parser found!
From: C H Forsyth <forsyth@vitanuova.com>
In-Reply-To: <1077194211.30893.200.camel@zevon>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Date: Thu, 19 Feb 2004 13:45:23 +0000
Topicbox-Message-UUID: eda0e6ba-eacc-11e9-9e20-41e7f4b1d025

i did some work with an xml fs in Inferno several years ago.
it was intended for (semi-)structured data, with the emphasis on
structure, not a mishmash of textual swill (it would cope with that,
but the result wasn't obviously useful).

the directory structure resembled mail's, or Inferno's dbfs's: an array of n
directories, numbered 0 ... n-1, corresponding to records,
with subdirectories where there was substructure,
and at any level there were files containing data items and metadata, and so on.
the ctl file set the structuring parameters
(eg, ``structure on <family><member><dog>''), amongst other things.
for the experiment, i read in all the XML, then served it, but today i'd use Inferno's xml(2)
to navigate the structure without reading it all in (reading it all in is what usually happens with DOM).
i had a program that could pack an XML file into a more efficiently read data structure on file.
the aim was to centralise the parsing and validation, and allow concurrent access (and update)
of the data.
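
a rough sketch of the record-to-directory mapping described above, in Python rather
than Limbo, and not the original Inferno code.  everything in it is an assumption made
for illustration: the names xml_to_tree and walk, the single record_tag argument (a
simplification, not the ctl syntax), the ".k" suffix for repeated tags, and the choice
to ignore attributes and metadata files.  it reads the whole document in, as the
experiment did, and returns a flat map from path to data instead of serving files.

```python
import xml.etree.ElementTree as ET

def walk(elem, path, files):
    """Record leaf elements under `path` as files, recursing into substructure."""
    seen = {}
    for child in elem:
        k = seen.get(child.tag, 0)
        seen[child.tag] = k + 1
        # hypothetical naming rule: repeated sibling tags get a numeric suffix
        name = child.tag if k == 0 else f"{child.tag}.{k}"
        if len(child) == 0:
            # no children: treat as a data item (a file)
            files[f"{path}/{name}"] = (child.text or "").strip()
        else:
            # has children: treat as a subdirectory
            walk(child, f"{path}/{name}", files)

def xml_to_tree(xml_text, record_tag):
    """Number each <record_tag> element 0 ... n-1 and map its contents to paths."""
    root = ET.fromstring(xml_text)
    files = {}
    for i, record in enumerate(root.iter(record_tag)):
        walk(record, str(i), files)
    return files

SAMPLE = """<family>
  <member><name>ann</name><dog><name>rex</name></dog></member>
  <member><name>bob</name></member>
</family>"""

TREE = xml_to_tree(SAMPLE, "member")
for path in sorted(TREE):
    print(path, TREE[path])
```

with the sample input, each <member> becomes a numbered record directory, <name>
becomes a file inside it, and <dog> becomes a subdirectory because it has substructure.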

i've got a draft paper about it somewhere.  i did some experiments with a stripped-down
prototype xmlfs, which did less than the paper suggested, and worked on XML compression
and compact validating parsers (using a schema in XDuce notation), but i got so fed up
with XML and the hype surrounding it, when as far as i could see it mainly got in the way,
that i stopped to do more urgent things.

i had a rant here which i thoughtfully removed.

	- it would be useful to have a good representation for the interchange of (semi-)structured data.
	- XML isn't it [that's the gist of the rant], but there it is.
	- the file system representation was useful for what i intended, particularly given the desire for concurrent access
	- originally data items were represented by a file containing that subtree in
	XML but that forced all apps to contain the code to read it,
	so i quickly switched to XDuce notation (which i used for the schema), which doesn't take much
	code to parse.
	- XDuce's use of quoted strings removes the whitespace-handling problems caused
	by XML's heritage from mainframe batch text formatting.  i didn't think to use S expressions but
	today i would probably consider them (they cope with a content quoting problem that XML solves badly).
	- although it's too late to kill XML off, there is some nice work in automata and type systems that can
	help handle it more reliably.
	- it is still worthwhile trying to discourage its completely inappropriate use,
	and where it is to be used, it would be helpful to make the representation as uniform as possible.
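
to illustrate the whitespace point in the list above, here is a small sketch in Python.
it first shows that indenting an XML document changes what a parser reports as text
content, then reads a hypothetical S-expression form of the same data in which content
only ever appears inside quotes.  read_sexpr and the (dog (name "rex")) layout are
invented for this example; they are neither XDuce notation nor anything from the xmlfs
prototype.

```python
import xml.etree.ElementTree as ET

# the same data, with and without indentation
COMPACT = "<dog><name>rex</name></dog>"
INDENTED = "<dog>\n  <name>rex</name>\n</dog>"

compact_text = ET.fromstring(COMPACT).text    # no text before the first child
indented_text = ET.fromstring(INDENTED).text  # the indentation shows up as text

def read_sexpr(src):
    """Parse one S expression: parenthesised lists, bare atoms, "quoted strings"."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(src) and src[pos].isspace():
            pos += 1

    def parse():
        nonlocal pos
        skip_ws()
        if pos >= len(src):
            raise ValueError("unexpected end of input")
        ch = src[pos]
        if ch == "(":
            pos += 1
            items = []
            while True:
                skip_ws()
                if pos >= len(src):
                    raise ValueError("missing )")
                if src[pos] == ")":
                    pos += 1
                    return items
                items.append(parse())
        if ch == ")":
            raise ValueError("unexpected )")
        if ch == '"':
            pos += 1
            out = []
            while pos < len(src) and src[pos] != '"':
                if src[pos] == "\\" and pos + 1 < len(src):
                    pos += 1  # backslash quotes the next character
                out.append(src[pos])
                pos += 1
            if pos >= len(src):
                raise ValueError("unterminated string")
            pos += 1
            return "".join(out)
        start = pos
        while pos < len(src) and not src[pos].isspace() and src[pos] not in '()"':
            pos += 1
        return src[start:pos]

    return parse()

FLAT = read_sexpr('(dog (name "rex"))')
PRETTY = read_sexpr('(dog\n  (name "rex")\n)')
```

in the XML case an application has to decide for itself whether the "\n  " is data;
in the quoted form, layout whitespace never reaches the application, and whitespace
that is meant as content survives because it sits inside the quotes.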