edbrowse-dev - development list for edbrowse
From: Adam Thompson <arthompson1990@gmail.com>
To: Karl Dahlke <eklhad@comcast.net>
Cc: edbrowse-dev@edbrowse.org
Subject: Re: I don't know shit about xml
Date: Wed, 19 Oct 2022 09:14:03 +0100
Message-ID: <Y0+xywrV6WaHygZw@pinebook-pro>
In-Reply-To: <20220912203237.eklhad@comcast.net>

On Wed, Oct 12, 2022 at 08:32:37PM -0400, Karl Dahlke wrote:
> The scanners have huge overlap, and I expect only minor differences, so
> should keep it as one function. All the tag cracking and attribute cracking
> and &element; cracking and building the tree it's all the same. I suspect
> html came first and xml was a direct generalization, by throwing away the
> semantics. For sure one was very quickly on the heels of the other.

I appreciate I'm a little late to this discussion, but I think (and some
quick research seems to confirm this) that they both derive from SGML: XML
is a restricted subset of it, and HTML up to version 4 was defined as an
SGML application. To be more specific, XML is readable by a generic SGML
parser, whilst some SGML (e.g. some HTML constructs such as unclosed tags
or unquoted attribute values) will generate errors in XML parsers. In
addition, as previously noted, XML has no inherent semantics whereas HTML
most definitely does.
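
As a rough illustration of that difference, here is a small sketch using
only Python's standard library (nothing from edbrowse; the names SNIPPET,
TagLogger, xml_accepts and html_start_tags are made up for the example).
It feeds the same HTML-ish fragment to a strict XML parser and to a
lenient HTML tokenizer:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Acceptable to browsers, but not well-formed XML: an unquoted attribute
# value, a <br> with no close, and a <p> that is never closed.
SNIPPET = "<div class=note><p>one<br>two</div>"


class TagLogger(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def xml_accepts(text):
    """Return True if the strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def html_start_tags(text):
    """Return the start tags found by the HTML tokenizer."""
    parser = TagLogger()
    parser.feed(text)
    parser.close()
    return parser.tags
```

With these helpers, xml_accepts(SNIPPET) is false, while html_start_tags
still reports the div, p and br tags. Rewriting the fragment with quoted
attributes and explicit closes makes it acceptable to the XML parser too.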

To add some more confusion, there was an attempt to apply XML strictness to
HTML, called XHTML. This was, as far as I remember, the thing for a while
until HTML5 came along which (I think) went back to the pure SGML basis of

Also, as previously noted, there's all the non-standard (and probably
incorrect in SGML though I've not bothered to read the generic standard)
garbage which people wrote (and continue to write) and browsers somehow turn
into something sane.

As such, I expect there to be quite a bit of overlap and the current
direction seems to make sense. In fact, there are other parsers which have
XML and HTML modes (and not just those used in browsers).
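
To make the "one scanner, two modes" idea concrete, here is a toy sketch
in Python. It is not how edbrowse or any real parser does it; the scan
function, the xml_mode flag and the VOID set are all hypothetical, and
attributes, text and entities are ignored. The tag cracking is shared,
and only the error handling differs between the modes:

```python
import re

# Assumed subset of HTML void elements, consulted only in HTML mode.
VOID = {"br", "hr", "img", "input", "meta", "link"}

# One tag pattern shared by both modes: optional "/", name, attributes,
# optional trailing "/" for self-closing tags.
TAG = re.compile(r"<(/?)([A-Za-z][\w:-]*)([^<>]*?)(/?)>")


def scan(text, xml_mode):
    """Return (event, name) pairs; XML mode raises ValueError on mismatches."""
    events, stack = [], []
    for m in TAG.finditer(text):
        closing, name, _attrs, selfclose = m.groups()
        if not xml_mode:
            name = name.lower()  # HTML tag names are case-insensitive
        if closing:
            if name not in stack:
                if xml_mode:
                    raise ValueError("stray close tag: " + name)
                continue  # HTML mode: ignore the stray close tag
            if xml_mode and stack[-1] != name:
                raise ValueError("mismatched close tag: " + name)
            # HTML mode may implicitly close elements left open inside.
            while stack:
                top = stack.pop()
                events.append(("close", top))
                if top == name:
                    break
        else:
            events.append(("open", name))
            if selfclose or (not xml_mode and name in VOID):
                events.append(("close", name))
            else:
                stack.append(name)
    if xml_mode and stack:
        raise ValueError("unclosed element: " + stack[-1])
    while stack:  # HTML mode: close whatever is still open at the end
        events.append(("close", stack.pop()))
    return events
```

Calling scan("<P>one<br>two", xml_mode=False) recovers a sensible open
and close sequence, whereas the same input with xml_mode=True raises
ValueError because neither element is closed.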



