ntg-context - mailing list for ConTeXt users
From: Pablo Rodriguez via ntg-context <ntg-context@ntg.nl>
To: Thangalin via ntg-context <ntg-context@ntg.nl>
Cc: Pablo Rodriguez <oinos@gmx.es>
Subject: Re: ignore not closed tags in XML input
Date: Sat, 21 May 2022 19:01:54 +0200
Message-ID: <4f69282d-a613-286e-3681-82814a56c20d@gmx.es>
In-Reply-To: <CAANrE7ri9OOkM4nMx=Q3XjEKePpboAGkivas8Yyud0m13Stuug@mail.gmail.com>

On 5/18/22 19:14, Thangalin via ntg-context wrote:
> Hey Pablo,
>
>> One of the not irrelevant tasks for me is finding examples of XML code.
>
> To clarify, XHTML documents /are/ XML documents. XHTML happens to use a
> standardized set of XML element and attribute names. All XHTML examples
> are also XML examples.

Hi Dave,

many thanks for the explanation.

>> But my worries came from having to sanitize HTML sources (which aren’t
>
> That was discussed in the blog post: finding a source of well-formed
> XHTML documents. There are a number of tools to sanitize HTML, as
> mentioned in the thread. KeenWrite uses the Java-based JSoup library
> https://jsoup.org/ to sanitize HTML and then create
> an XHTML version.

After dealing with other (X)HTML sources, I have found that quite a few
of them contain sloppily encoded data (as Taco pointed out).

There are even some mismatches that xmllint doesn’t fix automatically
(as Taco also mentioned).

Now I understand that I will also have to curate tidy XML sources
before typesetting them with ConTeXt.
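
As a rough illustration of the kind of check involved, here is a minimal
sketch that tests whether a fragment is well-formed XML before it is handed
to ConTeXt. It assumes Python's standard xml.etree.ElementTree parser, which
is not a tool mentioned in this thread; the function name and the two sample
fragments are hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(source):
    """Return None if `source` parses as XML, else the parser's error message."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return str(err)
    return None


# Hypothetical fragments: the second leaves <br> unclosed, which HTML
# tolerates but XML does not.
closed = "<p>first line<br/>second line</p>"
unclosed = "<p>first line<br>second line</p>"

for name, fragment in (("closed", closed), ("unclosed", unclosed)):
    error = well_formedness_error(fragment)
    print(name, "->", "well-formed" if error is None else error)
```

A check like this only reports the problem; repairing the source is still a
job for a sanitizer or for manual curation.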

Many thanks for your help again,

Pablo
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________
