tech@mandoc.bsd.lv
* Parsing errors, output regressions with new XML parser
@ 2019-03-30  0:19 Stephen Gregoratto
  2019-04-02 13:16 ` Ingo Schwarze
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Stephen Gregoratto @ 2019-03-30  0:19 UTC (permalink / raw)
  To: tech

Ingo,

I see you've been working hard on ripping out libexpat from 
docbook2mdoc. While this should simplify development, I do have some 
problems with the new parser:

-  XML comments aren't ignored. This leads to documents like these[1] 
   being formatted as one loooong section under NAME.

-  Escaped XML characters aren't converted back into ASCII:

  <programlisting>
  xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'
  </programlisting>

  EXAMPLES
     xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'


-  There are regressions in how <author> and <citerefentry>
   nodes are transformed. The example I pointed out previously:

  <author>
    <personname>
      <firstname>Joe</firstname>
      <surname>Bloggs</surname>
    </personname>
    <email>joe@foo.net</email>
  </author>

  Now converts to:

  .Dd $Mdocdate$
  .Dt UNKNOWN 1
  .Os
  .Sh AUTHORS
  .Nm foo
  is maintained by
  .An \&Joe Bloggs ,
  .Aq Mt joe@foo.net
  \&.

  Another regression is that closing delimiters are put on separate 
  lines. This leads to SEE ALSO sections like this[2] being formatted 
  like so:

  .Sh \&SEE ALSO
  .Xr man 7
  ,
  .Xr mdoc 7
  ,
  .Xr ms 7
  ,
  .Xr me 7
  ,
  .Xr mm 7
  ,
  .Xr mwww 7
  ,
  .Xr troff 1
  \&.

  I noticed in a previous email that you've begun working on a
  regression test suite of sorts. I could submit a couple of examples
  of my own so that these errors don't crop up again.

-  Entities are not expanded. Some documents, like xmllint[3], declare
   an ENTITY in the DTD. A solution here would be to use a tool like
   xmllint to expand the entities into their full versions, like so:

  xmllint --noent xmllint.xml | docbook2mdoc > xmllint.1
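
To make the first two points concrete, here is a minimal sketch in
Python of the behaviour I'd expect. It is not how docbook2mdoc (which is
C) does or should implement it; the function names are made up for
illustration, and the regex-based comment stripping assumes there are no
CDATA sections containing "<!--".

```python
import re

# The five entities predefined by the XML specification.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# Hypothetical helpers, not part of docbook2mdoc.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
ENTITY_RE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_][A-Za-z0-9_.-]*);")


def strip_comments(xml):
    """Drop XML comments, including ones that span several lines."""
    return COMMENT_RE.sub("", xml)


def decode_entities(text):
    """Decode predefined and numeric references in a single pass.

    References that are not predefined (for example ones declared in a
    DTD) are left untouched rather than guessed at.
    """
    def repl(match):
        name = match.group(1)
        if name.startswith("#x"):
            return chr(int(name[2:], 16))
        if name.startswith("#"):
            return chr(int(name[1:]))
        return PREDEFINED.get(name, match.group(0))

    return ENTITY_RE.sub(repl, text)


if __name__ == "__main__":
    print(decode_entities("xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'"))
```

Decoding in a single pass matters: "&amp;lt;" has to come out as the
literal text "&lt;", not as "<".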

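For the closing delimiter regression, the following Python sketch shows
the output shape I'd expect, written as a post-processing pass over the
generated lines. This is only an illustration: the real fix belongs in
the formatter, the function name is hypothetical, and the list of
macros that accept a trailing delimiter is limited to the ones that
appear in the examples above.

```python
# Closing delimiters as listed in mdoc(7).
CLOSING = {".", ",", ":", ";", ")", "]", "?", "!"}

# Assumption: only the macros seen in the examples above are handled.
INLINE_MACROS = (".Xr ", ".An ", ".Aq ")


def fold_closing_delimiters(lines):
    """Append a delimiter-only text line to the preceding macro line."""
    out = []
    for line in lines:
        token = line.strip()
        # A leading "\&" only protects the text line from being parsed
        # as a macro; it is not needed once the delimiter is an argument.
        if token.startswith("\\&"):
            token = token[2:]
        if token in CLOSING and out and out[-1].startswith(INLINE_MACROS):
            out[-1] += " " + token
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    sample = [".Xr man 7", ",", ".Xr troff 1", "\\&."]
    print("\n".join(fold_closing_delimiters(sample)))
```

With the SEE ALSO input above, each ".Xr" line ends up carrying its own
delimiter, for example ".Xr man 7 ,".
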
That should be it for the parser for now. I've been playing around with
the new statistics program and should have some data to share soon. I've
set up a git repo in which projects that use DocBook are added as
submodules. I first "clean" the files with xmllint (using the options
--loaddtd --noent --nocdata --nsclean --dropdtd --format) and then run
the statistics program over them.
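
The cleaning step looks roughly like the Python sketch below. The
xmllint options are the ones listed above; the function names, the
"*.xml" glob and the directory layout are assumptions made for the
example, not a description of the actual repo.

```python
import subprocess
from pathlib import Path

# Options taken from the description above.
XMLLINT_OPTS = ["--loaddtd", "--noent", "--nocdata",
                "--nsclean", "--dropdtd", "--format"]


def clean_command(path):
    """Build the xmllint argument vector for one input file."""
    return ["xmllint", *XMLLINT_OPTS, str(path)]


def clean_tree(src_root, dst_root):
    """Clean every *.xml file under src_root into dst_root.

    Returns the list of files xmllint could not process, so they can
    be inspected by hand instead of silently skewing the statistics.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    failed = []
    for src in sorted(src_root.rglob("*.xml")):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(clean_command(src), capture_output=True)
        if result.returncode != 0:
            failed.append(src)
            continue
        dst.write_bytes(result.stdout)
    return failed
```

Writing the cleaned copies to a separate tree keeps the submodules
themselves unmodified.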

Also, I noticed that cvsweb was down for most of yesterday. Scheduled 
maintenance?

[1] https://gitlab.gnome.org/GNOME/gtk/blob/master/docs/reference/gtk/css-overview.xml#L20
[2] https://gitlab.com/esr/doclifter/blob/master/doclifter.xml#L988
[3] https://gitlab.gnome.org/GNOME/libxml2/raw/master/doc/xmllint.xml
-- 
Stephen Gregoratto
PGP: 3FC6 3D0E 2801 C348 1C44 2D34 A80C 0F8E 8BAB EC8B


Thread overview: 6+ messages
2019-03-30  0:19 Parsing errors, output regressions with new XML parser Stephen Gregoratto
2019-04-02 13:16 ` Ingo Schwarze
2019-04-02 16:02 ` Ingo Schwarze
2019-04-02 16:50 ` Ingo Schwarze
2019-04-02 17:20 ` Ingo Schwarze
2019-04-02 17:48 ` Ingo Schwarze
