From: Stephen Gregoratto <dev@sgregoratto.me>
To: tech@mandoc.bsd.lv
Subject: Parsing errors, output regressions with new XML parser
Date: Sat, 30 Mar 2019 11:19:19 +1100
Message-ID: <20190330001919.rrbc2xxrx47upalg@BlackBox>

Ingo,

I see you've been working hard on ripping out libexpat from
docbook2mdoc. While this should simplify development, I do have some
problems with the new parser:

- XML comments aren't ignored. This leads to documents like these[1]
  being formatted as one loooong section under NAME.

- Escaped XML chars aren't converted back into ASCII:

    <programlisting>
    xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'
    </programlisting>

    EXAMPLES
      xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'

- There are regressions in how <author> and <citerefentry> nodes are
  transformed. The example I pointed out previously:

    <author>
      <personname>
        <firstname>Joe</firstname>
        <surname>Bloggs</surname>
      </personname>
      <email>joe@foo.net</email>
    </author>

  Now converts to:

    .Dd $Mdocdate$
    .Dt UNKNOWN 1
    .Os
    .Sh AUTHORS
    .Nm foo
    is maintained by
    .An \&Joe Bloggs ,
    .Aq Mt joe@foo.net
    \&.

  Another regression is that closing delimiters are put on separate
  lines. This leads to SEE ALSO sections like this[2] being formatted
  like so:

    .Sh \&SEE ALSO
    .Xr man 7 ,
    .Xr mdoc 7 ,
    .Xr ms 7 ,
    .Xr me 7 ,
    .Xr mm 7 ,
    .Xr mwww 7 ,
    .Xr troff 1
    \&.

  I noticed in a previous email you've begun working on a regression
  test suite of sorts. I could probably submit a couple of examples of
  my own so these errors don't crop up again.

- Entities are not expanded. Some documents, like xmllint[3], will
  declare an ENTITY in the DTD. A solution here would be to use a tool
  like xmllint to expand the entities into their full versions like so:

    xmllint --noent xmllint.xml | docbook2mdoc > xmllint.1

That should be it for the parser stuff for now.

I've been playing around with the new statistics program and I should
release some data on that soon.
I've been working on a git repo in which projects that use DocBook are
added as submodules. What I'm doing now is that I'll "clean" the files
with xmllint (using options --loaddtd --noent --nocdata --nsclean
--dropdtd --format) and then run statistics over them.

Also, I noticed that cvsweb was down for most of yesterday. Scheduled
maintenance?

[1] https://gitlab.gnome.org/GNOME/gtk/blob/master/docs/reference/gtk/css-overview.xml#L20
[2] https://gitlab.com/esr/doclifter/blob/master/doclifter.xml#L988
[3] https://gitlab.gnome.org/GNOME/libxml2/raw/master/doc/xmllint.xml

-- 
Stephen Gregoratto
PGP: 3FC6 3D0E 2801 C348 1C44 2D34 A80C 0F8E 8BAB EC8B
--
To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv
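
For illustration only, the first two parser items in the message above
(skipping XML comments and turning escaped characters back into ASCII)
can be sketched in a few lines of Python. This is not how docbook2mdoc,
which is written in C, does or should do it; the function names
strip_comments and decode_predefined are hypothetical, and the regular
expression is a naive assumption that ignores CDATA sections and DTD
internals.

```python
import re
from xml.sax.saxutils import unescape

# Matches an XML comment, including ones that span several lines.
# Naive assumption: comments never appear inside CDATA sections.
_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(text):
    """Remove XML comments so they never reach the formatter."""
    return _COMMENT_RE.sub("", text)


def decode_predefined(text):
    """Decode the five predefined XML entities in character data.

    unescape() handles &lt;, &gt; and &amp; itself (with &amp; decoded
    last, so "&amp;lt;" becomes "&lt;" rather than "<"); &quot; and
    &apos; are supplied through the optional mapping.
    """
    return unescape(text, {"&quot;": '"', "&apos;": "'"})


if __name__ == "__main__":
    listing = "xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'"
    print(decode_predefined(strip_comments(listing)))
```

Entities declared in a DTD are deliberately out of scope here; for
those, preprocessing with xmllint --noent as described in the message
remains the simpler route.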