From: Stephen Gregoratto <dev@sgregoratto.me>
To: tech@mandoc.bsd.lv
Subject: Parsing errors, output regressions with new XML parser
Date: Sat, 30 Mar 2019 11:19:19 +1100
Message-ID: <20190330001919.rrbc2xxrx47upalg@BlackBox>

Ingo,

I see you've been working hard on ripping out libexpat from 
docbook2mdoc. While this should simplify development, I do have some 
problems with the new parser:

-  XML comments aren't ignored. This leads to documents like this one[1]
   being formatted as one long section under NAME.

-  Escaped XML characters aren't converted back into the characters they
   stand for. The input below comes out with the escapes left intact:

  <programlisting>
  xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'
  </programlisting>

  EXAMPLES
     xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'


-  There are regressions in how <author> and <citerefentry>
   nodes are transformed. The example I pointed out previously:

  <author>
    <personname>
      <firstname>Joe</firstname>
      <surname>Bloggs</surname>
    </personname>
    <email>joe@foo.net</email>
  </author>

  Now converts to:

  .Dd $Mdocdate$
  .Dt UNKNOWN 1
  .Os
  .Sh AUTHORS
  .Nm foo
  is maintained by
  .An \&Joe Bloggs ,
  .Aq Mt joe@foo.net
  \&.

  Another regression is that closing delimiters are put on separate 
  lines. This leads to SEE ALSO sections like this[2] being formatted 
  like so:

  .Sh \&SEE ALSO
  .Xr man 7
  ,
  .Xr mdoc 7
  ,
  .Xr ms 7
  ,
  .Xr me 7
  ,
  .Xr mm 7
  ,
  .Xr mwww 7
  ,
  .Xr troff 1
  \&.

  I noticed in a previous email that you've begun working on a
  regression test suite of sorts. I could submit a couple of examples of
  my own so these errors don't crop up again.

- Entities are not expanded. Some documents, like the xmllint manual[3],
  declare an ENTITY in the DTD. A workaround would be to use a tool like
  xmllint to expand the entities before conversion, like so:

  xmllint --noent xmllint.xml | docbook2mdoc > xmllint.1
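
To make the expected result for the escaped characters concrete, here is
a minimal Python sketch. It is not docbook2mdoc code (that is C), and
the helper name decode_predefined is made up for illustration. It uses
unescape() from the standard xml.sax.saxutils module, which handles
&amp;, &lt; and &gt; by itself; &quot; and &apos; are passed in
explicitly. Entities declared in a DTD are not covered, those would
still need something like xmllint --noent.

```python
from xml.sax.saxutils import unescape

# Hypothetical helper, for illustration only; not part of docbook2mdoc.
# unescape() already decodes &amp;, &lt; and &gt;. The two remaining
# predefined XML entities are supplied through the second argument.
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def decode_predefined(text):
    """Replace the five predefined XML entities with their characters."""
    return unescape(text, EXTRA_ENTITIES)


print(decode_predefined("xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'"))
# prints: xdg-email 'Jeremy White <jwhite@example.com>'
```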

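For the closing-delimiter regression, the output I would expect keeps
each delimiter on the macro line it follows (".Xr man 7 ,"). The sketch
below shows that shape as a post-processing pass over output lines. It
is only an illustration in Python with a made-up function name; the
real fix presumably belongs in the formatter itself. The delimiter set
is the list of closing delimiters from mdoc(7).

```python
# Closing delimiters as listed in mdoc(7).
CLOSING = set(".,:;)]?!")


def join_closing_delims(lines):
    """Append delimiter-only lines to the preceding macro line.

    Hypothetical helper for illustration; not docbook2mdoc code.
    """
    out = []
    for line in lines:
        token = line.strip()
        # A lone "." is written as "\&." on a text line so that it is
        # not taken for a control character; drop that prefix here.
        if token.startswith("\\&"):
            token = token[2:]
        if token in CLOSING and out and out[-1].startswith("."):
            out[-1] += " " + token
        else:
            out.append(line)
    return out


see_also = [".Xr man 7", ",", ".Xr mdoc 7", ",", ".Xr troff 1", "\\&."]
print("\n".join(join_closing_delims(see_also)))
```

With the input above this gives three lines: ".Xr man 7 ,",
".Xr mdoc 7 ," and ".Xr troff 1 .".
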
That should be it for the parser stuff for now.

I've been playing around with the new statistics program and should
release some data on that soon. I've been working on a git repo in which
projects that use DocBook are added as submodules. For now, I "clean"
the files with xmllint (using the options --loaddtd --noent --nocdata
--nsclean --dropdtd --format) and then run the statistics program over
them.
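
The cleaning step can be sketched roughly as follows. This is a minimal
Python example under stated assumptions: the function names and the
separate destination path are hypothetical, xmllint is assumed to be on
PATH, and the statistics run itself is left out. The flags are the ones
listed above, plus xmllint's --output to write the result to a file.

```python
import subprocess

# The xmllint options named in the text above.
XMLLINT_FLAGS = [
    "--loaddtd", "--noent", "--nocdata",
    "--nsclean", "--dropdtd", "--format",
]


def xmllint_command(src, dst):
    """Build the argument list that cleans src and writes it to dst."""
    return ["xmllint", *XMLLINT_FLAGS, "--output", dst, src]


def clean(src, dst):
    """Run xmllint; raises CalledProcessError if xmllint fails."""
    subprocess.run(xmllint_command(src, dst), check=True)
```

Calling clean("doc/xmllint.xml", "clean/xmllint.xml") would then write
the cleaned copy that the statistics program gets run over; both paths
here are placeholders.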

Also, I noticed that cvsweb was down for most of yesterday. Scheduled 
maintenance?

[1] https://gitlab.gnome.org/GNOME/gtk/blob/master/docs/reference/gtk/css-overview.xml#L20
[2] https://gitlab.com/esr/doclifter/blob/master/doclifter.xml#L988
[3] https://gitlab.gnome.org/GNOME/libxml2/raw/master/doc/xmllint.xml
-- 
Stephen Gregoratto
PGP: 3FC6 3D0E 2801 C348 1C44 2D34 A80C 0F8E 8BAB EC8B