From: Stephen Gregoratto <dev@sgregoratto.me>
To: tech@mandoc.bsd.lv
Subject: Parsing errors, output regressions with new XML parser
Date: Sat, 30 Mar 2019 11:19:19 +1100
Message-ID: <20190330001919.rrbc2xxrx47upalg@BlackBox>
Ingo,

I see you've been working hard on ripping out libexpat from
docbook2mdoc. While this should simplify development, I do have some
problems with the new parser:
- XML comments aren't ignored. This leads to documents like this one[1]
  being formatted as one loooong section under NAME.
- Escaped XML characters aren't converted back into ASCII. Input:

    <programlisting>
    xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'
    </programlisting>

  Output:

    EXAMPLES
    xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'
- There are regressions in how <author> and <citerefentry>
  nodes are transformed. The example I pointed out previously:

    <author>
      <personname>
        <firstname>Joe</firstname>
        <surname>Bloggs</surname>
      </personname>
      <email>joe@foo.net</email>
    </author>

  Now converts to:

    .Dd $Mdocdate$
    .Dt UNKNOWN 1
    .Os
    .Sh AUTHORS
    .Nm foo
    is maintained by
    .An \&Joe Bloggs ,
    .Aq Mt joe@foo.net
    \&.
  Another regression is that closing delimiters are put on separate
  lines. This leads to SEE ALSO sections like this[2] being formatted
  like so:

    .Sh \&SEE ALSO
    .Xr man 7
    ,
    .Xr mdoc 7
    ,
    .Xr ms 7
    ,
    .Xr me 7
    ,
    .Xr mm 7
    ,
    .Xr mwww 7
    ,
    .Xr troff 1
    \&.
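
To make the expected shape concrete: in mdoc, closing punctuation normally rides on the macro line as a trailing argument (`.Xr man 7 ,`). The filter below is only a rough sketch of that folding, applied to already generated output; `join_delims` is a made-up helper name, not part of docbook2mdoc, and the set of delimiters it recognises is an assumption.

```shell
#!/bin/sh
# Sketch: fold a line holding only a closing delimiter (optionally
# protected by a leading \&) onto the preceding macro line.
# join_delims is a hypothetical helper, not part of docbook2mdoc.
join_delims() {
	awk '
	function emit() { if (have) print prev }
	# A lone delimiter line that follows a macro line: append it.
	/^(\\&)?[.,;:?!)]$/ && have && prev ~ /^\./ {
		sub(/^\\&/, "")		# drop the \& guard, it is not needed as an argument
		prev = prev " " $0
		next
	}
	# Any other line: print the buffered line and buffer this one.
	{ emit(); prev = $0; have = 1 }
	END { emit() }
	'
}
```

Run as `join_delims < foo.1`, this would turn `.Xr man 7` followed by `,` into `.Xr man 7 ,`. Delimiter lines that follow plain text lines are left alone.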
I noticed in a previous email that you've begun working on a regression
test suite of sorts. I could probably submit a couple of examples of my
own so these errors don't crop up again.
- Entities are not expanded. Some documents, like the xmllint manual[3],
  declare an ENTITY in the DTD. A workaround here would be to use a tool
  like xmllint to expand the entities before conversion, like so:

    xmllint --noent xmllint.xml | docbook2mdoc > xmllint.1
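
The same preprocessing probably won't cover the escaped characters from the earlier item, since an XML serialiser has to write `<` and `&` in text back out as `&lt;` and `&amp;`. As a stopgap, the five predefined entities could be decoded after conversion. This is a minimal sketch only: `unescape_xml` is a hypothetical helper, and it ignores roff quoting (for example a decoded `'` at the start of a line).

```shell
#!/bin/sh
# Sketch: decode the five predefined XML entities on stdin.
# unescape_xml is a hypothetical helper, not part of docbook2mdoc.
unescape_xml() {
	# &amp; is decoded last so that "&amp;lt;" becomes the literal
	# text "&lt;" rather than being decoded twice into "<".
	sed -e 's/&lt;/</g' \
	    -e 's/&gt;/>/g' \
	    -e 's/&quot;/"/g' \
	    -e "s/&apos;/'/g" \
	    -e 's/&amp;/\&/g'
}
```

It would sit at the end of the pipeline, e.g. `docbook2mdoc < foo.xml | unescape_xml > foo.1`.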
That should be it for the parser stuff for now. I've been playing around
with the new statistics program and I should release some data on that
soon. I've been working on a git repo in which projects that use DocBook
are added as submodules. For now, I "clean" the files with xmllint
(using the options --loaddtd --noent --nocdata --nsclean --dropdtd
--format) and then run statistics over them.
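
The cleaning step can be sketched roughly as below. The xmllint options are the ones listed above; the function names and the flat output layout are assumptions made for illustration, and the statistics run itself is left out.

```shell
#!/bin/sh
# Sketch: normalise one DocBook file with the xmllint options listed above.
clean_xml() {
	xmllint --loaddtd --noent --nocdata --nsclean --dropdtd --format "$1"
}

# Clean every *.xml file under directory $1 into the flat directory $2.
# The flattened file naming is a made-up convention for this sketch.
clean_tree() {
	src=$1 dst=$2
	mkdir -p "$dst" || return 1
	find "$src" -type f -name '*.xml' | while IFS= read -r f; do
		# Turn the path into a single file name: a/b.xml -> a_b.xml
		out="$dst/$(printf '%s' "$f" | tr '/' '_')"
		clean_xml "$f" > "$out" || echo "xmllint failed: $f" >&2
	done
}
```

Something like `clean_tree submodules cleaned` would then leave one normalised file per input under `cleaned/`, ready to be fed to the statistics program.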
Also, I noticed that cvsweb was down for most of yesterday. Scheduled
maintenance?
[1] https://gitlab.gnome.org/GNOME/gtk/blob/master/docs/reference/gtk/css-overview.xml#L20
[2] https://gitlab.com/esr/doclifter/blob/master/doclifter.xml#L988
[3] https://gitlab.gnome.org/GNOME/libxml2/raw/master/doc/xmllint.xml
--
Stephen Gregoratto
PGP: 3FC6 3D0E 2801 C348 1C44 2D34 A80C 0F8E 8BAB EC8B