Date: Wed, 19 Oct 2022 09:14:03 +0100
From: Adam Thompson
To: Karl Dahlke
Cc: edbrowse-dev@edbrowse.org
Subject: Re: I don't know shit about xml
References: <20220912185105.eklhad@comcast.net> <20220912203237.eklhad@comcast.net>
In-Reply-To: <20220912203237.eklhad@comcast.net>
List-Id: Edbrowse Development List

On Wed, Oct 12, 2022 at 08:32:37PM -0400, Karl Dahlke wrote:
> The scanners have huge overlap, and I expect only minor differences, so
> should keep it as one function. All the tag cracking and attribute cracking
> and &element; cracking and building the tree it's all the same. I suspect
> html came first and xml was a direct generalization, by throwing away the
> semantics. For sure one was very quickly on the heels of the other.

I appreciate I'm a little late to this discussion, but I think (and some
quick research seems to confirm this) that both trace back to SGML: HTML
was originally defined as an application of SGML, and XML is a restricted
subset of it. To be more specific, XML is readable by a generic SGML
parser, whilst some SGML (e.g. some HTML constructs) will generate errors
in XML parsers. In addition, as previously noted, XML has no inherent
semantics whereas HTML most definitely does.

To add some more confusion, an attempt was made to apply XML strictness
to HTML, called XHTML. This was, as far as I remember, the thing for a
while until HTML5 came along, which (I think) went back to the looser,
non-XML syntax, though as far as I know it defines its own parsing rules
rather than relying on a generic SGML parser. Also, as previously noted,
there's all the non-standard garbage (probably incorrect in SGML too,
though I've not bothered to read the generic standard) which people wrote
(and continue to write) and which browsers somehow turn into something
sane.

As such, I expect there to be quite a bit of overlap, and the current
direction seems to make sense. In fact, there are other parsers which
have XML and HTML modes (and not just those used in browsers).

Cheers,
Adam.
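
P.S. To make the "some HTML constructs will generate errors in XML
parsers" point concrete, here is a minimal sketch using only the Python
standard library (not edbrowse code). The snippet and the helper names
(TagCollector, xml_accepts, html_tags) are made up for illustration. The
unclosed <br> is ordinary HTML but is not well-formed XML, so the strict
parser rejects it while the lenient HTML tokenizer carries on.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: <br> has no end tag, which HTML allows and XML does not.
SNIPPET = "<p>one<br>two</p>"


class TagCollector(HTMLParser):
    """Records the start tags reported by the lenient HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def xml_accepts(text):
    """Return True if the strict XML parser accepts the text as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def html_tags(text):
    """Return the start tags the HTML tokenizer sees, in document order."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(xml_accepts(SNIPPET))               # strict XML view of the HTML snippet
    print(html_tags(SNIPPET))                 # lenient HTML view of the same text
    print(xml_accepts("<p>one<br/>two</p>"))  # XHTML-style spelling of the same markup
```

The same input goes through both paths, which is roughly the situation
a shared scanner with an XML mode and an HTML mode has to handle: the
tag cracking is common, and only the tolerance for unclosed elements
differs.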