From: C H Forsyth <forsyth@vitanuova.com>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] OT: small xml parser found!
Date: Thu, 19 Feb 2004 13:45:23 +0000
Message-ID: <254e6411a483bf4a605691ce52824ab3@vitanuova.com>
In-Reply-To: <1077194211.30893.200.camel@zevon>

i did some work with an xml fs in Inferno several years ago.
it was intended for (semi-)structured data, with the emphasis on
structure, not a mishmash of textual swill (it would cope with that, but the result wasn't
obviously useful).

the directory structure resembled mail's, or Inferno's dbfs's: an array of n
directories, numbered 0 ... n-1, corresponding to records,
with subdirectories where there was substructure,
and at any level there were files containing data items and metadata, and so on.
the ctl file set the structuring parameters
(eg, ``structure on <family><member><dog>''), amongst other things.
as an experiment, i read in all the XML and then served it, but today i'd use Inferno's xml(2)
to navigate the structure without reading all of it in (reading everything in is what usually happens with DOM).
i had a program that could pack an XML file into an on-disk data structure that could be read more efficiently.
the aim was to centralise the parsing and validation, and allow concurrent access (and update)
of the data.
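
a minimal sketch of the layout described above, in Python rather than Limbo and not the original xmlfs code.
it only computes path names; it serves nothing and does not interpret a ctl file.
the names records_as_tree and to_paths, the "data" file name and the "@name" files for attributes
are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def to_paths(elem, prefix, out):
    """Flatten one element into path -> text entries (hypothetical layout)."""
    # Assumption: attributes become metadata files named "@<attribute>".
    for name, value in elem.attrib.items():
        out[f"{prefix}/@{name}"] = value
    children = list(elem)
    if not children:
        # Assumption: a leaf's text goes in a file called "data".
        out[f"{prefix}/data"] = (elem.text or "").strip()
        return
    # Substructure becomes subdirectories; repeated tags are numbered 0 ... n-1.
    # Text mixed in between child elements is ignored in this sketch.
    counts = {}
    for child in children:
        i = counts.get(child.tag, 0)
        counts[child.tag] = i + 1
        to_paths(child, f"{prefix}/{child.tag}/{i}", out)


def records_as_tree(xml_text):
    """Treat each child of the root as a record in directory 0 ... n-1."""
    root = ET.fromstring(xml_text)
    out = {}
    for n, record in enumerate(root):
        to_paths(record, str(n), out)
    return out


SAMPLE = """<family>
  <member name="ann"><dog>rex</dog><dog>fido</dog></member>
  <member name="bob"><dog>spot</dog></member>
</family>"""

tree = records_as_tree(SAMPLE)
for path in sorted(tree):
    print(path, "=", tree[path])
```

a real server would present these paths as files and directories, so that applications need no XML parser of their own.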

i've got a draft paper about it somewhere.  i did some experiments with a stripped-down
prototype xmlfs, which did less than the paper suggested, and worked on XML compression
and compact validating parsers (using a schema in Xduce notation), but i got so fed up
with XML and the hype surrounding it, when as far as i could see it mainly got in the way,
that i stopped to do more urgent things.

i had a rant here which i thoughtfully removed.

	- it would be useful to have a good representation for the interchange of (semi-)structured data.
	- XML isn't it [that's the gist of the rant], but there it is.
	- the file system representation was useful for what i intended, particularly given the desire for concurrent access
	- originally data items were represented by a file containing that subtree in
	XML but that forced all apps to contain the code to read it,
	so i quickly switched to Xduce notation (which i used for the schema), which doesn't take much
	code to parse.
	- Xduce's use of quoted strings removes the whitespace-handling problems caused
	by XML's heritage from mainframe batch text formatting.  i didn't think to use S-expressions, but
	today i probably would consider them (they cope with a content quoting problem that XML solves badly)
	- although it's too late to kill XML off, there is some nice work in automata and type systems that can
	help handle it more reliably
	- although it's too late to kill it off, it is worthwhile trying to discourage its completely inappropriate use,
	and where it is to be used, it would be helpful to make the representation as uniform as possible
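
a minimal sketch of the point about quoted strings, in Python and using S-expressions rather than Xduce notation.
parse_sexp is a hypothetical helper written for this note, not part of any library, and the backslash
escape inside strings is an assumption.
it shows that layout whitespace in XML turns up as text, whereas with quoted strings only what is
inside the quotes is content.

```python
import xml.etree.ElementTree as ET


def parse_sexp(src):
    """Parse one S-expression made of (lists), bare atoms and "quoted strings".

    Atoms and quoted strings both come back as Python str in this sketch.
    """
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(src) and src[pos].isspace():
            pos += 1

    def read():
        nonlocal pos
        skip_ws()
        if pos >= len(src):
            raise ValueError("unexpected end of input")
        ch = src[pos]
        if ch == "(":
            pos += 1
            items = []
            while True:
                skip_ws()
                if pos >= len(src):
                    raise ValueError("missing )")
                if src[pos] == ")":
                    pos += 1
                    return items
                items.append(read())
        if ch == ")":
            raise ValueError("unexpected )")
        if ch == '"':
            pos += 1
            chars = []
            while pos < len(src) and src[pos] != '"':
                # Assumption: backslash quotes the next character.
                if src[pos] == "\\" and pos + 1 < len(src):
                    pos += 1
                chars.append(src[pos])
                pos += 1
            if pos >= len(src):
                raise ValueError("unterminated string")
            pos += 1  # skip the closing quote
            return "".join(chars)
        start = pos
        while pos < len(src) and not src[pos].isspace() and src[pos] not in '()"':
            pos += 1
        return src[start:pos]

    value = read()
    skip_ws()
    if pos != len(src):
        raise ValueError("trailing input")
    return value


# In XML the indentation between tags is part of the element's text.
member = ET.fromstring("<member>\n  <dog> rex </dog>\n</member>")
print(repr(member.text), repr(member[0].text))

# With quoted strings, layout whitespace outside the quotes is never content.
print(parse_sexp('(member\n  (dog " rex "))'))

# Markup characters need no entity escapes inside a quoted string.
print(parse_sexp(r'(note "a < b & \"c\"")'))
```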



