From mboxrd@z Thu Jan  1 00:00:00 1970
Message-Id: <200307190345.h6J3je727417@augusta.math.psu.edu>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] don't shoot me
In-Reply-To: Your message of "Fri, 18 Jul 2003 20:12:24 PDT."
             <d721bb5de9a85ccc79001f9a04c79a3a@collyer.net>
From: Dan Cross <cross@math.psu.edu>
Date: Fri, 18 Jul 2003 23:45:40 -0400
Topicbox-Message-UUID: fcc7b05c-eacb-11e9-9e20-41e7f4b1d025

> > Yes, but with some tree structure and naming too.
>
> I haven't used XML, but tree structure seems like a simple thing to
> provide, for example with indentation as Plan 9 prototype files and
> Python do:
>
> /
> 	sys
> 		man
> 			1
> 				cat
> 		doc
> 			fonts
> 			9.ms
>
> Obviously each line could contain more than a simple file name
> component for general data representation.

I guess the one cool thing about XML that you don't get with something
like this is a way to name the data meaningfully and naturally.  With
XML, you can write something like the following:

<?xml version="1.0"?>
<address>
  <street>123 Main Street</street>
  <city>Cherry Hill</city>
  <state>New Jersey</state>
  <postcode>12345</postcode>
  <country>USA</country>
</address>

Now, looking beyond the hideous syntax for a moment, one thing jumps
out: all the data is clearly labelled.  I know that ``123 Main Street''
is a street address, and I can extract street addresses by name,
instead of relying on it being in some conventional place in the record
(i.e., ``the first text field in each record is the street address'').
Is that useful?  Sometimes yes, more often no.  But when it's needed,
it's really needed and is indispensable.
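
As a rough sketch of what ``extract by name'' looks like in practice,
here is a bit of Python using the standard library's
xml.etree.ElementTree on the record above.  The helper name field() is
my own invention for illustration; only fromstring() and findtext()
come from the library.

```python
import xml.etree.ElementTree as ET

# The address record from the example above.
RECORD = """<?xml version="1.0"?>
<address>
  <street>123 Main Street</street>
  <city>Cherry Hill</city>
  <state>New Jersey</state>
  <postcode>12345</postcode>
  <country>USA</country>
</address>
"""


def field(xml_text, name):
    """Return the text of the child element called `name`, or None.

    Hypothetical helper: it looks the field up by its label, so the
    order of the children inside <address> does not matter.
    """
    root = ET.fromstring(xml_text)
    return root.findtext(name)


if __name__ == "__main__":
    print(field(RECORD, "street"))     # 123 Main Street
    print(field(RECORD, "postcode"))   # 12345
    print(field(RECORD, "apartment"))  # None (no such field)
```

Nothing in the lookup depends on where <street> sits in the record,
which is the whole point.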

I contend the only real advantage of XML over other representations is
that it forces data to be labelled (you can do this with sexp's, but
it's not mandatory).  Of course, 9 times out of 10, the labelling
sucks and tells you nothing.  The corresponding sexp might look something
like the following, btw:

(address
  (street "123 Main Street")
  (city "Cherry Hill")
  (state "New Jersey")
  (postcode "12345")
  (country "USA"))

But the following:

("123 Main Street" ("Cherry Hill" "New Jersey") ("12345" "USA"))

is also valid.  Unfortunately, the latter example doesn't preserve any
metainformation about the data; we as humans can look at this and say,
``oh, that looks like an address; 12345 is probably a zip code.''  But
a computer has no idea, and I have no way to tell it other than by
position.  But what if I decide to add another field between the street
address and City/State tuple?  Say, an apartment number field?  All of
a sudden, my position-based extraction logic fails.  At least with XML,
that isn't a problem (in theory, anyway; like I said, the labelling can
be totally boneheaded and meaningless).
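
To make that failure concrete, here is a small Python sketch, with
nested tuples standing in for the unlabelled sexp and a dict standing
in for the labelled one.  The apartment value and the function names
are made up for illustration.

```python
# Unlabelled record: (street (city state) (postcode country))
OLD = ("123 Main Street", ("Cherry Hill", "New Jersey"), ("12345", "USA"))

# Same record after inserting a hypothetical apartment field between
# the street address and the city/state tuple.
NEW = ("123 Main Street", "Apt 4B",
       ("Cherry Hill", "New Jersey"), ("12345", "USA"))

# Labelled equivalents of the two records above.
LABELLED_OLD = {"street": "123 Main Street", "city": "Cherry Hill",
                "state": "New Jersey", "postcode": "12345",
                "country": "USA"}
LABELLED_NEW = dict(LABELLED_OLD, apartment="Apt 4B")


def city_by_position(rec):
    """Assume the city is the first item of the second field."""
    return rec[1][0]


def city_by_label(rec):
    """Look the city up by name; field order is irrelevant."""
    return rec["city"]


if __name__ == "__main__":
    print(city_by_position(OLD))        # Cherry Hill
    print(city_by_position(NEW))        # A  (first character of "Apt 4B")
    print(city_by_label(LABELLED_OLD))  # Cherry Hill
    print(city_by_label(LABELLED_NEW))  # Cherry Hill
```

Note that the positional version doesn't even fail loudly here; it
quietly hands back garbage, which is arguably worse.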

Scott Schwartz once proposed using LaTeX syntax for describing data
in the same way one uses XML.  It was a good idea, and we'd end up
with something like:

\begin{address}
  \street{123 Main Street}
  \city{Cherry Hill}
  \state{New Jersey}
  \postcode{12345}
  \country{USA}
\end{address}

Of course, that made too much sense and thus never caught on.  It
would have been a lot cleaner and more compact than using XML
syntax, though.

Oh well.  Like anything else, XML has its place, but it's been
shoehorned into 80 billion different places it doesn't belong.

	- Dan C.