From: Thangalin via ntg-context
Newsgroups: gmane.comp.tex.context
Subject: Re: ignore not closed tags in XML input
Date: Wed, 18 May 2022 10:14:27 -0700
To: mailing list for ConTeXt users

Hey Pablo,

> One of the not irrelevant tasks for me is finding examples of XML code.

To clarify, XHTML documents *are* XML documents. XHTML happens to use a standardized set of XML element and attribute names. All XHTML examples are also XML examples.
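
As a rough illustration of that point, the sketch below feeds two small strings to the XML parser that ships with the JDK. The class name, method name, and sample markup are made up for this example and are not taken from KeenWrite or ConTeXt. The well-formed XHTML fragment parses like any other XML, while the HTML fragment with an unclosed <br> is rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
final class XhtmlCheck {

    // Returns true when the markup is well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser reports any well-formedness violation as a SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // XHTML: every element is closed, so a plain XML parser accepts it.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><p>one<br/>two</p></body></html>";
        // HTML: the <br> is never closed, which is legal HTML but not XML.
        String html = "<html><body><p>one<br>two</p></body></html>";

        System.out.println(isWellFormed(xhtml)); // prints true
        System.out.println(isWellFormed(html));  // prints false
    }
}
```

This only checks well-formedness. It does not validate against the XHTML schema or DTD, and it does not repair anything.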

> But my worries came from having to sanitize HTML sources (which aren’t

That was discussed in the blog post: finding a source of well-formed XHTML documents. There are a number of tools to sanitize HTML, as mentioned in the thread. KeenWrite uses the Java-based JSoup library https://jsoup.org/ to sanitize HTML and then create an XHTML version.

All the best!