9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] OT: small xml parser found!
@ 2004-02-17 13:50 steve-simon
  2004-02-18 18:17 ` Roger Flores
  0 siblings, 1 reply; 32+ messages in thread
From: steve-simon @ 2004-02-17 13:50 UTC (permalink / raw)
  To: 9fans

Hi,

Off Topic, but someone was asking for a small xml
parser a while ago, I mentioned expat which a colleague
had used. I have now discovered a much smaller and neater
one for another project.

if you need such a thing google for "lilxml", if you
like I can supply a modified version with better error
messages and which doesn't balk at xmlns:java="java"
constructs.

-Steve



* Re: [9fans] OT: small xml parser found!
  2004-02-17 13:50 [9fans] OT: small xml parser found! steve-simon
@ 2004-02-18 18:17 ` Roger Flores
  2004-02-18 20:43   ` rog
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Flores @ 2004-02-18 18:17 UTC (permalink / raw)
  To: 9fans

I also have a small xml parser (and writer).  Check it out at SourceForge
http://ali.sourceforge.net/.

Not only is it small, but I think you'll find that you need to write less
code to use it because of its unique API, so it's a double win!

-Roger Flores
roger.flores@pacbell.net

<steve-simon@ntlworld.nospam.com> wrote in message
news:317af9775d5f45567c098141354bcc19@snellwilcox.com...
> Hi,
>
> Off Topic, but someone was asking for a small xml
> parser a while ago, I mentioned expat which a colleague
> had used. I have now discovered a much smaller and neater
> one for another project.
>
> if you need such a thing google for "lilxml", if you
> like I can supply a modified version with better error
> messages and which doesn't balk at xmlns:java="java"
> constructs.
>
> -Steve



* Re: [9fans] OT: small xml parser found!
  2004-02-18 20:43   ` rog
@ 2004-02-18 20:42     ` David Tolpin
  2004-02-19 15:29       ` rog
  2004-02-19 10:28     ` Roger Flores
  1 sibling, 1 reply; 32+ messages in thread
From: David Tolpin @ 2004-02-18 20:42 UTC (permalink / raw)
  To: 9fans

> that said, it would be useful if there was a standard xml library that
> someone had ported to (or written under) plan 9 that met the plan 9
> interface cleanliness standards, for those times when the XML crud
> can't be kicked off the doorstep.

Why isn't Expat suitable?

David



* Re: [9fans] OT: small xml parser found!
  2004-02-18 18:17 ` Roger Flores
@ 2004-02-18 20:43   ` rog
  2004-02-18 20:42     ` David Tolpin
  2004-02-19 10:28     ` Roger Flores
  0 siblings, 2 replies; 32+ messages in thread
From: rog @ 2004-02-18 20:43 UTC (permalink / raw)
  To: 9fans

> Not only is it small, but I think you'll find that you need to write less
> code to use it because of its unique API, so it's a double win!

erm, forgive me for demurring, but 1898 lines doesn't
strike me as "small", especially given that (from the source):

 * The disadvantages are
 *
 * 2. Not fully XML 1.0 compliant.
 *
 * 4. Incorrect handling of XML formatting, including entities.
 *
 * 6. No namespace support.

not to mention that the source #includes the non-existent "host.h" (i
think "ali_config.h" is intended) and non-ANSI header files,
references undefined types, and generally doesn't give the impression
of stability.

or that it doesn't appear to give any means of accessing element
attributes.

that said, it would be useful if there was a standard xml library that
someone had ported to (or written under) plan 9 that met the plan 9
interface cleanliness standards, for those times when the XML crud
can't be kicked off the doorstep.




* Re: [9fans] OT: small xml parser found!
  2004-02-18 20:43   ` rog
  2004-02-18 20:42     ` David Tolpin
@ 2004-02-19 10:28     ` Roger Flores
  1 sibling, 0 replies; 32+ messages in thread
From: Roger Flores @ 2004-02-19 10:28 UTC (permalink / raw)
  To: 9fans

<rog@vitanuova.com> wrote in message
news:1dac8c6094aba121c8f564e3cb3b8142@vitanuova.com...
> not to mention that the source #includes the non-existent "host.h" (i
> think "ali_config.h" is intended)

Oops.  Yes, ali_config.h was intended.  The only other header file, ali.h,
is already included.  Host.h is a file I use to allow Ali to compile in Palm
OS apps.


>non-ANSI header files

Hmmm.  Searching around for a list of ANSI header files I see this:
http://www.cplusplus.com/doc/ansi/hfiles.html
I see two headers are not in that list.  malloc.h apparently should be
stdlib.h, so I'll change that.  Thanks.  The other is stdint.h, which is a
C99 standard.  I used to just define my own types (like int32 with no stupid
_t) but the stdint.h types are "standard" and so usually understood.  I
included #defines for the types for those lacking a stdint.h header file.

> references undefined types

I assume these are the just mentioned stdint.h types?


>and generally doesn't give the impression of stability.

The alpha status on SourceForge is intended to reflect my contentment with
the feature set versus a 1.0 release.  I do not know of any bugs and those
that will be found can be fixed.


> or that it doesn't appear to give any means of accessing element
> attributes.

Sure it does.  Just extend the address book example from the web page where
it reads the "id" attribute from the "person" element.  Say you want to read
<phone type="home">1234567</phone>.  Add a line like this to parse_person():

   if (!doc->error) ali_in(doc, personN, "^e%f", 0, "phone", parse_phone);

And then add a function to parse the phone number element.  A function is
used because the element is complex instead of simple.

static void
parse_phone(ali_doc_info *doc, ali_element_ref phoneN, void *data,
            bool new_element)
{
   my_person_struct *person = last_person(data);

   /* read the "type" attribute of <phone> */
   if (!doc->error)
      ali_in(doc, phoneN, "^a%s", 0, "type", &person->phone_type);
   /* read the element's content, the number itself */
   if (!doc->error)
      ali_in(doc, phoneN, "%d", &person->phone_number);
}

I think some might prefer a notation like "^e^a%s%d" instead of creating a
function.  I've added support in Alo for such syntactical sugar but not in
Ali yet.


> erm, forgive me for demurring, but 1898 lines doesn't
> strike me as "small", especially given that (from the source):

I think Ali is "small" compared to other XML parsers.  Can you find another
that is close?  Remember that nothing about XML is small! :)

If you can live with the restrictions and minimal features then you can save
a lot of program size and probably code writing.  If your XML data can come
from anywhere or you need the missing features or don't care about size then
use something like Expat because it will work better for you.  I just find
that my apps' code + Ali + Alo are still smaller than Expat!


Thanks for the comments and let me know if you find anything else.

-Roger Flores



* Re: [9fans] OT: small xml parser found!
  2004-02-19 15:29       ` rog
@ 2004-02-19 15:27         ` Gorka Guardiola Múzquiz
  2004-02-19 15:31           ` boyd, rounin
  2004-02-19 15:54           ` John Murdie
  2004-02-19 15:27         ` David Tolpin
  1 sibling, 2 replies; 32+ messages in thread
From: Gorka Guardiola Múzquiz @ 2004-02-19 15:27 UTC (permalink / raw)
  To: 9fans

>> Why isn't Expat suitable?
>
> callbacks are horrible.

Reasons?.


					G.




* Re: [9fans] OT: small xml parser found!
  2004-02-19 15:29       ` rog
  2004-02-19 15:27         ` Gorka Guardiola Múzquiz
@ 2004-02-19 15:27         ` David Tolpin
  1 sibling, 0 replies; 32+ messages in thread
From: David Tolpin @ 2004-02-19 15:27 UTC (permalink / raw)
  To: 9fans

> > Why isn't Expat suitable?
> callbacks are horrible.

Horrible for what? Build your own interface on top of Expat,
it's just a few lines. Many systems do just that.



* Re: [9fans] OT: small xml parser found!
  2004-02-18 20:42     ` David Tolpin
@ 2004-02-19 15:29       ` rog
  2004-02-19 15:27         ` Gorka Guardiola Múzquiz
  2004-02-19 15:27         ` David Tolpin
  0 siblings, 2 replies; 32+ messages in thread
From: rog @ 2004-02-19 15:29 UTC (permalink / raw)
  To: 9fans

> Why isn't Expat suitable?

callbacks are horrible.




* Re: [9fans] OT: small xml parser found!
  2004-02-19 15:27         ` Gorka Guardiola Múzquiz
@ 2004-02-19 15:31           ` boyd, rounin
  2004-02-19 16:14             ` Rob Pike
  2004-02-19 16:16             ` Rob Pike
  2004-02-19 15:54           ` John Murdie
  1 sibling, 2 replies; 32+ messages in thread
From: boyd, rounin @ 2004-02-19 15:31 UTC (permalink / raw)
  To: 9fans

> callbacks are horrible.

callbacks are fine.




* Re: [9fans] OT: small xml parser found!
  2004-02-19 15:27         ` Gorka Guardiola Múzquiz
  2004-02-19 15:31           ` boyd, rounin
@ 2004-02-19 15:54           ` John Murdie
  2004-02-19 16:14             ` C H Forsyth
                               ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: John Murdie @ 2004-02-19 15:54 UTC (permalink / raw)
  To: 9fans; +Cc: john

On Thu, 2004-02-19 at 15:27, Gorka Guardiola Múzquiz wrote:
> >> Why isn't Expat suitable? 
> > 
> > callbacks are horrible.
> 
> Reasons?.

Callback-(handler)s are interrupt-(handler)s - remember Dijkstra's
original (1962?) dismissal of the idea of writing programs with
interrupt handlers instead of processes as first-class entities. (From
his paper on The THE Multiprogramming System? I forget the reference.)

I'm fond of showing people a paper by a certain D. W. Jones from SIGPLAN
Notices that I think has some bearing on the matter; systems of
interrupt handlers have to maintain explicit state variables, whereas
equivalent systems coded with processes seem to have far fewer, and
Jones's simple little paper contrasts the shallow control structures of
each:

> %T How (not) to code a finite state machine
> %X The standard advice for those coding a finite state machine is to
> use a while loop, a case statement, and a state variable. This is
> usually bad advice! The reasons for this are explored and better
> advice is formulated. The examples presented are an interesting test
> of software complexity metrics. All have the same deep control
> structure but they have different shallow control structures
> %K programming, finite state machine, while loop, case statement,
> state variable, software complexity metrics, deep control structure,
> shallow control structures
> %O SIGPLAN Not. (USA)
> %J SIGPLAN Notices
> %A Jones, D.W.
> %V 23
> %N 8
> %V A01
> %P 19-22
> %D Aug. 1988

Some seminar notes, "Why threads are a bad idea" (he means processes,
not any particular formulation of threads, if I remember rightly) from
USENIX 1996 by John Ousterhout, offer an opposing view, some of which I
can see the sense of: http://home.pacbell.net/ouster/threads.pdf.

John A. Murdie
Department of Computer Science
University of York




* Re: [9fans] OT: small xml parser found!
  2004-02-19 15:31           ` boyd, rounin
@ 2004-02-19 16:14             ` Rob Pike
  2004-02-19 16:16             ` Rob Pike
  1 sibling, 0 replies; 32+ messages in thread
From: Rob Pike @ 2004-02-19 16:14 UTC (permalink / raw)
  To: 9fans

>> callbacks are horrible.
>
> callbacks are fine.

callbacks are horrible.




* Re: [9fans] OT: small xml parser found!
  2004-02-19 15:54           ` John Murdie
@ 2004-02-19 16:14             ` C H Forsyth
  2004-02-19 16:24               ` John Murdie
  2004-02-19 16:17             ` David Tolpin
  2004-02-19 17:15             ` rog
  2 siblings, 1 reply; 32+ messages in thread
From: C H Forsyth @ 2004-02-19 16:14 UTC (permalink / raw)
  To: 9fans

>>USENIX 1996 by John Ousterhout offers an opposing view [of `threads']

only by ignoring nearly all the work on developing and
reasoning about concurrent systems done since (say) 1970!
but then that's not atypical of quite a bit of `modern' programming




* Re: [9fans] OT: small xml parser found!
  2004-02-19 15:31           ` boyd, rounin
  2004-02-19 16:14             ` Rob Pike
@ 2004-02-19 16:16             ` Rob Pike
  2004-02-19 16:18               ` David Tolpin
  2004-02-19 16:20               ` boyd, rounin
  1 sibling, 2 replies; 32+ messages in thread
From: Rob Pike @ 2004-02-19 16:16 UTC (permalink / raw)
  To: 9fans

>> callbacks are horrible.
>
> callbacks are fine.

ok, now i can say why they're  not fine.  they're pretty close to event
handlers, and i've written what's wrong with that model in several
papers.

callbacks for i/o are particularly egregious.  i've seen callbacks
for certain parsing applications that worked fairly cleanly.  the
difference is that in one case we're avoiding threads and being
asynchronous, while in the other we're just turning case statements
into function calls.

-rob
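
For what it's worth, the "case statements into function calls" kind of
callback can be sketched in a few lines of C.  Everything named below (the
token kinds, struct counts, count_switch, count_table) is made up for
illustration and is not taken from Expat or any other parser mentioned in
this thread.  Both functions do the same synchronous work; the second only
moves each case into a function reached through a table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for a toy parser; not from any real library. */
enum kind { TOK_START, TOK_TEXT, TOK_END, NKIND };

struct counts { int start, text, end; };

/* Version 1: a plain case statement. */
void
count_switch(const enum kind *toks, size_t n, struct counts *c)
{
	size_t i;

	assert(toks != NULL && c != NULL);
	for (i = 0; i < n; i++) {
		switch (toks[i]) {
		case TOK_START:	c->start++; break;
		case TOK_TEXT:	c->text++; break;
		case TOK_END:	c->end++; break;
		default:	break;
		}
	}
}

/* Version 2: the same dispatch, with each case turned into a function. */
typedef void (*handler)(struct counts *);

static void on_start(struct counts *c) { c->start++; }
static void on_text(struct counts *c) { c->text++; }
static void on_end(struct counts *c) { c->end++; }

static const handler table[NKIND] = { on_start, on_text, on_end };

void
count_table(const enum kind *toks, size_t n, struct counts *c)
{
	size_t i;

	assert(toks != NULL && c != NULL);
	for (i = 0; i < n; i++)
		if ((unsigned)toks[i] < NKIND)	/* ignore unknown kinds */
			table[toks[i]](c);
}
```

Calling either function on the same token array fills in identical counts;
the caller's loop still drives everything, so there is no hidden control
flow in the second version.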




* Re: [9fans] OT: small xml parser found!
  2004-02-19 15:54           ` John Murdie
  2004-02-19 16:14             ` C H Forsyth
@ 2004-02-19 16:17             ` David Tolpin
  2004-02-20  2:36               ` boyd, rounin
  2004-02-19 17:15             ` rog
  2 siblings, 1 reply; 32+ messages in thread
From: David Tolpin @ 2004-02-19 16:17 UTC (permalink / raw)
  To: 9fans

> > >> Why isn't Expat suitable?
> > > callbacks are horrible.
> > Reasons?.
>
> Callback-(handler)s are interrupt-(handler)s - remember Dijkstra's
> original (1962?) dismissal of the idea of writing programs with
> interrupt handlers instead of processes as first-class entities. (From
> his paper on The THE Multiprogramming System? I forget the reference.)

This is a misconception. A push API does not demand unequal roles for
the interacting processes.  Define callback handlers to trigger
semaphores instead of calling routines.  This is how it is done
in many systems.

Push APIs can be used equally for either paradigm. And they could
even 42 years ago.

David Tolpin
http://davidashen.net/




* Re: [9fans] OT: small xml parser found!
  2004-02-19 16:16             ` Rob Pike
@ 2004-02-19 16:18               ` David Tolpin
  2004-02-19 16:20               ` boyd, rounin
  1 sibling, 0 replies; 32+ messages in thread
From: David Tolpin @ 2004-02-19 16:18 UTC (permalink / raw)
  To: 9fans

> ok, now i can say why they're  not fine.  they're pretty close to event
> handlers, and i've written what's wrong with that model in several
> papers.

They are as close to event handlers as to semaphores.



* Re: [9fans] OT: small xml parser found!
  2004-02-19 16:16             ` Rob Pike
  2004-02-19 16:18               ` David Tolpin
@ 2004-02-19 16:20               ` boyd, rounin
  1 sibling, 0 replies; 32+ messages in thread
From: boyd, rounin @ 2004-02-19 16:20 UTC (permalink / raw)
  To: 9fans

Rob Pike wrote:
> callbacks for i/o are particularly egregious.  i've seen callbacks
> for certain parsing applications that worked fairly cleanly.  the
> difference is that in one case we're avoiding threads and being
> asynchronous, while in the other we're just turning case statements
> into function calls.

i'm all for the latter.




* Re: [9fans] OT: small xml parser found!
  2004-02-19 16:14             ` C H Forsyth
@ 2004-02-19 16:24               ` John Murdie
  0 siblings, 0 replies; 32+ messages in thread
From: John Murdie @ 2004-02-19 16:24 UTC (permalink / raw)
  To: 9fans; +Cc: john

On Thu, 2004-02-19 at 16:14, C H Forsyth wrote:
> >>USENIX 1996 by John Ousterhout offers an opposing view [of `threads']
>
> only by ignoring nearly all the work on developing and
> reasoning about concurrent systems done since (say) 1970!
> but then that's not atypical of quite a bit of `modern' programming

Yes, indeed - that's what I thought. The very phrase "event-driven
programming" to describe programs written with callbacks annoys me; why
can't people be honest and say "interrupt-handler programming"? Programs
written e.g. with the CSP formalism "handle" events just as much as
those written with interrupt handlers do!

John A. Murdie
Department of Computer Science
University of York




* Re: [9fans] OT: small xml parser found!
  2004-02-19 15:54           ` John Murdie
  2004-02-19 16:14             ` C H Forsyth
  2004-02-19 16:17             ` David Tolpin
@ 2004-02-19 17:15             ` rog
  2004-02-19 17:20               ` David Tolpin
  2 siblings, 1 reply; 32+ messages in thread
From: rog @ 2004-02-19 17:15 UTC (permalink / raw)
  To: 9fans

> >> Why isn't Expat suitable?
> >
> > callbacks are horrible.
>
> Reasons?.
>
> Callback-(handler)s are interrupt-(handler)s
> [...]

can i just add this reference:

http://www.codepedia.com/wiki/display.aspx?WikiID=1&pagename=thunks

as an example of a really nasty way (note the machine dependent bit)
of getting around some of the limitations of a callback-based system.

if i was going to use expat, i'd wrap it up with the plan 9 threads
library, so that constructs would arrive on a channel, then at
least things would be marginally more bearable.

but even then, you'd probably want to put a function call interface
around the channel, as otherwise it's really quite awkward
skipping subtrees you don't want to know about.

i wasn't too dissatisfied with the xml(2) interface in inferno,
which looks something like (limbo syntax, i'm afraid):

	open: fn(fd: ref Sys->FD): ref Parser;
	Parser: adt {
		next: fn(p: self ref Parser): ref Item;
		down: fn(p: self ref Parser);
		up: fn(p: self ref Parser);
	};

next() gives you the next item at the current level (nil if we're
at the end of the enclosing block); down() delves into the most recently
returned element, and up() ascends a level.

one advantage of doing it this way is that potentially the parser
can drive how the parsing takes place; the above interface
takes advantage of that by allowing random access into
the XML (you can go back to a place you've previously marked).
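
A rough C rendering of the same next/down/up shape, as a sketch only.  It is
not the Inferno xml(2) code: it walks an array of items that has already
been tokenized (the real module reads from a file descriptor), it leaves out
marks and random access, and the names struct item, struct parser, xopen,
xnext, xdown and xup are invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item kinds; a real tokenizer would derive them from XML text. */
enum { IT_OPEN, IT_TEXT, IT_CLOSE };

struct item {
	int kind;
	const char *s;	/* tag name for IT_OPEN/IT_CLOSE, characters for IT_TEXT */
};

struct parser {
	const struct item *it;	/* pre-tokenized document (a simplification) */
	size_t n;		/* number of items */
	size_t pos;		/* next unread item */
	size_t last;		/* item most recently returned by xnext */
	int haslast;		/* is last valid? */
};

/* Index just past the element whose IT_OPEN is at index i. */
static size_t
skip(const struct parser *p, size_t i)
{
	int depth = 0;

	for (; i < p->n; i++) {
		if (p->it[i].kind == IT_OPEN)
			depth++;
		else if (p->it[i].kind == IT_CLOSE && --depth == 0)
			return i + 1;
	}
	return p->n;
}

void
xopen(struct parser *p, const struct item *it, size_t n)
{
	assert(p != NULL && it != NULL);
	p->it = it;
	p->n = n;
	p->pos = 0;
	p->last = 0;
	p->haslast = 0;
}

/* Next item at the current level, or NULL at the end of the enclosing block.
 * An element's subtree is stepped over unless xdown is called on it. */
const struct item *
xnext(struct parser *p)
{
	if (p->pos >= p->n || p->it[p->pos].kind == IT_CLOSE)
		return NULL;
	p->last = p->pos;
	p->haslast = 1;
	p->pos = p->it[p->pos].kind == IT_OPEN ? skip(p, p->pos) : p->pos + 1;
	return &p->it[p->last];
}

/* Descend into the element most recently returned by xnext. */
void
xdown(struct parser *p)
{
	if (p->haslast && p->it[p->last].kind == IT_OPEN) {
		p->pos = p->last + 1;
		p->haslast = 0;
	}
}

/* Ascend one level: discard the rest of the current block and its close tag. */
void
xup(struct parser *p)
{
	int depth = 1;

	while (p->pos < p->n && depth > 0) {
		if (p->it[p->pos].kind == IT_OPEN)
			depth++;
		else if (p->it[p->pos].kind == IT_CLOSE)
			depth--;
		p->pos++;
	}
	p->haslast = 0;
}
```

For <a><b>x</b><c/>y</a>, calling xnext and then xdown on a, then xnext
repeatedly, yields b, c and the text y, then NULL; b's subtree is skipped
because xdown was never called on it.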




* Re: [9fans] OT: small xml parser found!
  2004-02-19 17:15             ` rog
@ 2004-02-19 17:20               ` David Tolpin
  2004-02-19 17:31                 ` rog
  0 siblings, 1 reply; 32+ messages in thread
From: David Tolpin @ 2004-02-19 17:20 UTC (permalink / raw)
  To: 9fans

>
> one advantage of doing it this way is that potentially the parser
> can drive how the parsing takes place; the above interface
> takes advantage of that by allowing random access into
> the XML (you can go back to a place you've previously marked).
>

So, do you need DOM on top of Expat?



* Re: [9fans] OT: small xml parser found!
  2004-02-19 17:31                 ` rog
@ 2004-02-19 17:30                   ` David Tolpin
  2004-02-19 17:45                     ` rog
  2004-02-19 17:39                   ` C H Forsyth
  1 sibling, 1 reply; 32+ messages in thread
From: David Tolpin @ 2004-02-19 17:30 UTC (permalink / raw)
  To: 9fans

> > So, do you need DOM on top of Expat?
>
> no.
>
> the main point of my implementation was that it only stores a single
> element at a time in memory, unlike DOM, which i believe stores
> the whole document.

DOM is not an implementation. DOM is an interface. up|down|next is
Document Object Model. How much it stores in memory is
an implementation issue. It's on a different level.

David Tolpin



* Re: [9fans] OT: small xml parser found!
  2004-02-19 17:20               ` David Tolpin
@ 2004-02-19 17:31                 ` rog
  2004-02-19 17:30                   ` David Tolpin
  2004-02-19 17:39                   ` C H Forsyth
  0 siblings, 2 replies; 32+ messages in thread
From: rog @ 2004-02-19 17:31 UTC (permalink / raw)
  To: 9fans

> So, do you need DOM on top of Expat?

no.

the main point of my implementation was that it only stores a single
element at a time in memory, unlike DOM, which i believe stores
the whole document.




* Re: [9fans] OT: small xml parser found!
  2004-02-19 17:31                 ` rog
  2004-02-19 17:30                   ` David Tolpin
@ 2004-02-19 17:39                   ` C H Forsyth
  1 sibling, 0 replies; 32+ messages in thread
From: C H Forsyth @ 2004-02-19 17:39 UTC (permalink / raw)
  To: 9fans

>>element at a time in memory, unlike DOM, which i believe stores
>>the whole document.

not necessarily, but they commonly do.




* Re: [9fans] OT: small xml parser found!
  2004-02-19 17:30                   ` David Tolpin
@ 2004-02-19 17:45                     ` rog
  0 siblings, 0 replies; 32+ messages in thread
From: rog @ 2004-02-19 17:45 UTC (permalink / raw)
  To: 9fans

> DOM is not an implementation. DOM is an interface. up|down|next is
> Document Object Model. How much it stores in memory is
> an implementation issue. It's on a different level.

it's difficult to implement DOM elements like "previousSibling",
without storing the whole document (or at least information
on all elements encountered so far).




* Re: [9fans] OT: small xml parser found!
  2004-02-19 16:17             ` David Tolpin
@ 2004-02-20  2:36               ` boyd, rounin
  0 siblings, 0 replies; 32+ messages in thread
From: boyd, rounin @ 2004-02-20  2:36 UTC (permalink / raw)
  To: 9fans

> Define callback handlers to trigger semaphores instead of calling
> routines.

time to chamber a round ...





* Re: [9fans] OT: small xml parser found!
  2004-02-19 13:45       ` C H Forsyth
@ 2004-02-20  8:22         ` Martin C.Atkins
  0 siblings, 0 replies; 32+ messages in thread
From: Martin C.Atkins @ 2004-02-20  8:22 UTC (permalink / raw)
  To: 9fans

On Thu, 19 Feb 2004 13:45:23 +0000 C H Forsyth <forsyth@vitanuova.com> wrote:
>...
> 	- although it's too late to kill it off, there is some nice work in automata
>	and type systems that can help handle it more reliably

It probably wasn't what you meant, and I haven't looked at it in any great detail
yet, but the new functional/object language, Scala, claims to use extended ML-style
pattern-matching in ways that are useful for XML. The extension is to allow some
forms of regular expressions, it appears.

Refs at: http://scala.epfl.ch/
and in particular, at: http://scala.epfl.ch/intro/regexppat.html

It only runs on .NET or the Java runtime, so isn't immediately useful for Plan 9!

Martin

--
Martin C. Atkins			martin@parvat.com
Parvat Infotech Private Limited		http://www.parvat.com{/,/martin}



* Re: [9fans] OT: small xml parser found!
  2004-02-19 12:36     ` Dave Lukes
@ 2004-02-19 13:45       ` C H Forsyth
  2004-02-20  8:22         ` Martin C.Atkins
  0 siblings, 1 reply; 32+ messages in thread
From: C H Forsyth @ 2004-02-19 13:45 UTC (permalink / raw)
  To: 9fans

i did some work with an xml fs in Inferno several years ago.
it was intended for (semi-)structured data, with the emphasis on
structure, not a mishmash of textual swill (although it would cope, the result wasn't
obviously useful).

the directory structure resembled mail's, or Inferno's dbfs's: an array of n
directories, numbered 0 ... n-1, corresponding to records,
with subdirectories where there was substructure,
and at any level there were files containing data items and metadata, and so on.
the ctl file set the structuring parameters
(eg, ``structure on <family><member><dog>''), amongst other things.
for experiment, i read in all the XML, then served it, but today i'd have used Inferno's xml(2)
to navigate the structure without reading it (all) in (which is usually what happens with DOM).
i had a program that could pack an XML file into a more efficiently read data structure on file.
the aim was to centralise the parsing and validation, and allow concurrent access (and update)
of the data.

i've got a draft paper about it somewhere.  i did some experiments with a stripped-down
prototype xmlfs, which did less than the paper suggested, and worked on XML compression
and compact validating parsers (using a schema in Xduce notation), but i got so fed up
with XML and the hype surrounding it, when as far as i could see it mainly got in the way,
that i stopped to do more urgent things.

i had a rant here which i thoughtfully removed.

	- it would be useful to have a good representation for the interchange of (semi-)structured data.
	- XML isn't it [that's the gist of the rant], but there it is.
	- the file system representation was useful for what i intended, particularly given the desire for concurrent access
	- originally data items were represented by a file containing that subtree in
	XML but that forced all apps to contain the code to read it,
	so i quickly switched to Xduce notation (which i used for the schema), which doesn't take much
	code to parse.
	- Xduce's use of quoted strings removes the whitespace-handling problems caused
	by XML's heritage from mainframe batch text formatting.  i didn't think to use S expressions but
	today i probably would consider it (they cope with a content quoting problem that XML solves badly)
	- although it's too late to kill it off, there is some nice work in automata and type systems that can
	help handle it more reliably
	- although it's too late to kill it off, it is worthwhile trying to discourage its completely inappropriate use,
	and where it is to be used, it would be helpful to make the representation as uniform as possible




* Re: [9fans] OT: small xml parser found!
  2004-02-19 11:41   ` matt
@ 2004-02-19 12:36     ` Dave Lukes
  2004-02-19 13:45       ` C H Forsyth
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Lukes @ 2004-02-19 12:36 UTC (permalink / raw)
  To: 9fans

> i had an xmlfs in my mind

I like the idea, but I can't see how you would cleanly handle either
the representation or the ordering.

e.g. if you have something like:

	<p>Here follows <b>some bold text <i>and this is bold-italic</i></b> and this isn't either.</p>

How do you represent it as a set of files and directories?

The only way I can see is to basically do what the XML parsers
out there do and make it into a sort of tree of lists of trees ...

Now you _can_ do it with directories etc.,
but ordering etc. becomes problematic:
you'd need to provide your own sequence numbering in the filenames
unless you want to hack rc ...
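
To make the sequence-numbering idea concrete, here is a small C sketch that
prints one path per node, prefixing each name with its position among its
siblings.  The struct node layout, the listpaths function and the N.name
convention are all assumptions made up for this illustration; no existing
xmlfs is being described.  Text nodes are called "text" and their contents
are left out.

```c
#include <assert.h>
#include <stdio.h>

struct node {
	const char *tag;		/* element name, or NULL for a text node */
	const struct node *kids;	/* children in document order */
	int nkids;
};

/*
 * Append "dir/seq.name\n" for nd and, recursively, for its children to out.
 * used is the number of bytes already written; the new total is returned.
 * A result >= len means the listing was truncated.
 */
size_t
listpaths(const struct node *nd, int seq, const char *dir,
	char *out, size_t len, size_t used)
{
	char path[256];
	int i, r;

	assert(nd != NULL && dir != NULL && out != NULL);
	snprintf(path, sizeof path, "%s%s%d.%s", dir, dir[0] ? "/" : "",
		seq, nd->tag ? nd->tag : "text");
	if (used < len) {
		r = snprintf(out + used, len - used, "%s\n", path);
		if (r > 0)
			used += (size_t)r;
	}
	for (i = 0; i < nd->nkids; i++)
		used = listpaths(&nd->kids[i], i, path, out, len, used);
	return used;
}
```

For the <p> example above this produces 0.p, 0.p/0.text, 0.p/1.b,
0.p/1.b/0.text, 0.p/1.b/1.i, 0.p/1.b/1.i/0.text and 0.p/2.text, so a plain
sorted listing of each directory recovers document order (at least while
there are fewer than ten siblings; beyond that the numbers would need zero
padding).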

>  with anonymous node IDs

What is an "anonymous node ID"?

If you're relying on the majority of nodes having ID attributes
you'll come unstuck on "real" XML,
where IDs may be a special case but are deffo not mandatory.

I was thinking of something similar,
but naming the directories/files after the tag names somehow,
so you could extract all the paragraphs (using du:-).

>  given invalid names such as id="#51" (so that they could have a directory in the tree using their ids)

> so long as it could host xhtml

Not sure what you mean by "host xhtml".
Since xhtml is an xml application,
if it can parse xml, then it can parse xhtml by definition.

> and then one could implement the DOM using shell scripts

One could also write an xslt-to-rc converter (in rc:-).

Any more ideas on xmlfs, anyone?

Cheers,
	Dave.





* Re: [9fans] OT: small xml parser found!
  2004-02-19 11:04 ` Dave Lukes
@ 2004-02-19 11:41   ` matt
  2004-02-19 12:36     ` Dave Lukes
  0 siblings, 1 reply; 32+ messages in thread
From: matt @ 2004-02-19 11:41 UTC (permalink / raw)
  To: 9fans

i had an xmlfs in my mind with anonymous node IDs given invalid names such as id="#51" (so that they could have a directory in the tree using their ids)


so long as it could host xhtml

and then one could implement the DOM using shell scripts

m




* Re: [9fans] OT: small xml parser found!
  2004-02-19  9:59 plan9fans
  2004-02-19 10:13 ` David Tolpin
@ 2004-02-19 11:04 ` Dave Lukes
  2004-02-19 11:41   ` matt
  1 sibling, 1 reply; 32+ messages in thread
From: Dave Lukes @ 2004-02-19 11:04 UTC (permalink / raw)
  To: 9fans

Hmmm ...

There's the germ of an idea beginning to grow here ...

Does anyone else need XML parsing?  (I said _need_ not _want_).

Since I know XML, and want to get my mental fingers
into some plan9 programming, this might _possibly_ be a useful
intellectual exercise.

Cheers,
	Dave.


On Thu, 2004-02-19 at 09:59, plan9fans@ntlworld.nospam.com wrote:
> > Why isn't Expat suitable?
>
> Corporate insistence on XML means I may have to read it
> in a small embedded system where "size is everything".
>
> -Steve




* Re: [9fans] OT: small xml parser found!
@ 2004-02-19 10:31 plan9fans
  0 siblings, 0 replies; 32+ messages in thread
From: plan9fans @ 2004-02-19 10:31 UTC (permalink / raw)
  To: 9fans

Hi,

Small is an 8051 with 32k RAM - which doesn't run Plan9,
though the code does compile and run in my debug environment,
which is Plan 9.

term% cd expat
term% grep ';' *.c | wc
   3972   16052  180223
term% cd ../lilxml
term% grep ';' *.c | wc
    267    1070    7849

Really, it was just a heads up, not a challenge to anyone
or their software.

-Steve



* Re: [9fans] OT: small xml parser found!
  2004-02-19  9:59 plan9fans
@ 2004-02-19 10:13 ` David Tolpin
  2004-02-19 11:04 ` Dave Lukes
  1 sibling, 0 replies; 32+ messages in thread
From: David Tolpin @ 2004-02-19 10:13 UTC (permalink / raw)
  To: 9fans

> > Why isn't Expat suitable?
>
> Corporate insistence on XML means I may have to read it
> in a small embedded system where "size is everything".

This sounds cryptic to me. How small is small? Expat is
small, I use it on Palm IIIxe (with some modifications).

Why is it important for Plan9? Does it run on systems smaller
than Palm IIIxe?

David Tolpin



* Re: [9fans] OT: small xml parser found!
@ 2004-02-19  9:59 plan9fans
  2004-02-19 10:13 ` David Tolpin
  2004-02-19 11:04 ` Dave Lukes
  0 siblings, 2 replies; 32+ messages in thread
From: plan9fans @ 2004-02-19  9:59 UTC (permalink / raw)
  To: 9fans

> Why isn't Expat suitable?

Corporate insistence on XML means I may have to read it
in a small embedded system where "size is everything".

-Steve




Thread overview: 32+ messages
2004-02-17 13:50 [9fans] OT: small xml parser found! steve-simon
2004-02-18 18:17 ` Roger Flores
2004-02-18 20:43   ` rog
2004-02-18 20:42     ` David Tolpin
2004-02-19 15:29       ` rog
2004-02-19 15:27         ` Gorka Guardiola Múzquiz
2004-02-19 15:31           ` boyd, rounin
2004-02-19 16:14             ` Rob Pike
2004-02-19 16:16             ` Rob Pike
2004-02-19 16:18               ` David Tolpin
2004-02-19 16:20               ` boyd, rounin
2004-02-19 15:54           ` John Murdie
2004-02-19 16:14             ` C H Forsyth
2004-02-19 16:24               ` John Murdie
2004-02-19 16:17             ` David Tolpin
2004-02-20  2:36               ` boyd, rounin
2004-02-19 17:15             ` rog
2004-02-19 17:20               ` David Tolpin
2004-02-19 17:31                 ` rog
2004-02-19 17:30                   ` David Tolpin
2004-02-19 17:45                     ` rog
2004-02-19 17:39                   ` C H Forsyth
2004-02-19 15:27         ` David Tolpin
2004-02-19 10:28     ` Roger Flores
2004-02-19  9:59 plan9fans
2004-02-19 10:13 ` David Tolpin
2004-02-19 11:04 ` Dave Lukes
2004-02-19 11:41   ` matt
2004-02-19 12:36     ` Dave Lukes
2004-02-19 13:45       ` C H Forsyth
2004-02-20  8:22         ` Martin C.Atkins
2004-02-19 10:31 plan9fans
