From: Stephen Gregoratto <>
Subject: Parsing errors, output regressions with new XML parser
Date: Sat, 30 Mar 2019 11:19:19 +1100
Message-ID: <20190330001919.rrbc2xxrx47upalg@BlackBox>


I see you've been working hard on ripping out libexpat from 
docbook2mdoc. While this should simplify development, I do have some 
problems with the new parser:

-  XML comments aren't ignored. This leads to documents like these[1] 
   being formatted as one loooong section under NAME.

-  escaped XML characters aren't converted back into ASCII. This input:

     xdg-email 'Jeremy White &lt;;'

   is passed through unchanged:

     xdg-email 'Jeremy White &lt;;'

-  There are regressions in how <author> and <citerefentry>
   nodes are transformed. The example I pointed out previously:


  Now converts to:

  .Dd $Mdocdate$
  .Nm foo
  is maintained by
  .An \&Joe Bloggs ,
  .Aq Mt

  Another regression is that closing delimiters are put on separate 
  lines. This leads to SEE ALSO sections like this[2] being formatted 
  like so:

  .Sh \&SEE ALSO
  .Xr man 7
  .Xr mdoc 7
  .Xr ms 7
  .Xr me 7
  .Xr mm 7
  .Xr mwww 7
  .Xr troff 1

  I noticed in a previous email that you've begun working on a regression
  test suite of sorts. I could probably submit a couple of examples of my
  own so these errors don't crop up again.

-  entities are not expanded. Some documents, like xmllint[3], declare
  an ENTITY in their DTD. A workaround for now is to expand the entities
  with xmllint itself before converting, like so:

  xmllint --noent xmllint.xml | docbook2mdoc > xmllint.1

That should be it for the parser stuff for now. I've been playing around
with the new statistics program and should have some data to share soon.
I've set up a git repo in which projects that use DocBook are added as
submodules. My current approach is to "clean" the files with xmllint
(using the options --loaddtd --noent --nocdata --nsclean --dropdtd
--format) and then run the statistics program over them.

Also, I noticed that cvsweb was down for most of yesterday. Was that
scheduled?

Stephen Gregoratto
PGP: 3FC6 3D0E 2801 C348 1C44 2D34 A80C 0F8E 8BAB EC8B


Thread overview: 6+ messages
2019-03-30  0:19 Stephen Gregoratto [this message]
2019-04-02 13:16 ` Ingo Schwarze
2019-04-02 16:02 ` Ingo Schwarze
2019-04-02 16:50 ` Ingo Schwarze
2019-04-02 17:20 ` Ingo Schwarze
2019-04-02 17:48 ` Ingo Schwarze
