From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail.sgregoratto.me (mail.sgregoratto.me [149.28.166.45])
	by fantadrom.bsd.lv (OpenSMTPD) with ESMTP id b1699a12
	for <tech@mandoc.bsd.lv>;
	Fri, 29 Mar 2019 19:19:23 -0500 (EST)
Received: from mail.sgregoratto.me (localhost [127.0.0.1])
	by mail.sgregoratto.me (Postfix) with ESMTP id B6D7B3E8D4
	for <tech@mandoc.bsd.lv>; Sat, 30 Mar 2019 11:19:20 +1100 (AEDT)
Authentication-Results: mail.sgregoratto.me (amavisd-new);
	dkim=pass (1024-bit key) reason="pass (just generated, assumed good)"
	header.d=sgregoratto.me
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=sgregoratto.me;
	 h=user-agent:content-disposition:content-type:content-type
	:mime-version:message-id:subject:subject:to:from:from:date:date;
	 s=dkim; t=1553905160; x=1556497161; bh=MLwr/5OTtEhRp/doSCGPqfi/
	TSdhfqn/xZ0ySuFV9xI=; b=OxfyT5+zk1yHIzdkmROHftl0exrt3yZz/WxYzyEx
	pPnuTWL4cuXwFy/WuxznfB/AJB2fIEYRSjiWUTLI2WYjg7IzDyMqTJK5TIjvMvfF
	0WO3Eiaqu0Yb99RIZvJY534vdKZ9CS4RfYErG+W+W07BDrFALLXGUONxGUkw8G2f
	lSY=
X-Virus-Scanned: Debian amavisd-new at mail.sgregoratto.me
Received: from mail.sgregoratto.me ([127.0.0.1])
	by mail.sgregoratto.me (mail.sgregoratto.me [127.0.0.1]) (amavisd-new, port 10026)
	with ESMTP id Jfijgj0bYutE for <tech@mandoc.bsd.lv>;
	Sat, 30 Mar 2019 11:19:20 +1100 (AEDT)
Received: from localhost (172.44.179.58.sta.dodo.net.au [58.179.44.172])
	by mail.sgregoratto.me (Postfix) with ESMTPSA id 1B8173E82E
	for <tech@mandoc.bsd.lv>; Sat, 30 Mar 2019 11:19:20 +1100 (AEDT)
Date: Sat, 30 Mar 2019 11:19:19 +1100
From: Stephen Gregoratto <dev@sgregoratto.me>
To: tech@mandoc.bsd.lv
Subject: Parsing errors, output regressions with new XML parser
Message-ID: <20190330001919.rrbc2xxrx47upalg@BlackBox>
Mail-Followup-To: tech@mandoc.bsd.lv
X-Mailinglist: mandoc-tech
Reply-To: tech@mandoc.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
User-Agent: NeoMutt/20180716

Ingo,

I see you've been working hard on ripping out libexpat from 
docbook2mdoc. While this should simplify development, I do have some 
problems with the new parser:

-  XML comments aren't ignored. This leads to documents like this one[1]
   being formatted as one loooong section under NAME.

-  Escaped XML characters aren't converted back into ASCII:

  <programlisting>
  xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'
  </programlisting>

  EXAMPLES
     xdg-email 'Jeremy White &lt;jwhite@example.com&gt;'


-  There are regressions in how <author> and <citerefentry>
   nodes are transformed. The example I pointed out previously:

  <author>
    <personname>
      <firstname>Joe</firstname>
      <surname>Bloggs</surname>
    </personname>
    <email>joe@foo.net</email>
  </author>

  Now converts to:

  .Dd $Mdocdate$
  .Dt UNKNOWN 1
  .Os
  .Sh AUTHORS
  .Nm foo
  is maintained by
  .An \&Joe Bloggs ,
  .Aq Mt joe@foo.net
  \&.

  Another regression is that closing delimiters are put on separate 
  lines. This leads to SEE ALSO sections like this[2] being formatted 
  like so:

  .Sh \&SEE ALSO
  .Xr man 7
  ,
  .Xr mdoc 7
  ,
  .Xr ms 7
  ,
  .Xr me 7
  ,
  .Xr mm 7
  ,
  .Xr mwww 7
  ,
  .Xr troff 1
  \&.

  I noticed in a previous email that you've begun working on a regression
  test suite of sorts. I could probably submit a couple of examples of my
  own so these errors don't crop up again.

- Entities are not expanded. Some documents, like the xmllint manual[3],
  declare an ENTITY in the DTD. A solution here would be to use a tool
  like xmllint to expand the entities into their full versions like so:

  xmllint --noent xmllint.xml | docbook2mdoc > xmllint.1
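
To make the first three points above concrete, here is a rough Python
sketch of the behaviour I'd expect. It says nothing about how
docbook2mdoc does or should implement this in C: all function names are
made up for illustration, the comment regex is naive (it would also eat
"<!--" inside a CDATA section), and fold_delimiters() only demonstrates
the output shape I'd expect by post-processing lines that have already
been emitted.

```python
import re
from xml.sax.saxutils import unescape

# Naive comment matcher; assumes no "<!--" appears inside CDATA.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Closing delimiters as listed in mdoc(7).
CLOSING = {".", ",", ":", ";", ")", "]", "?", "!"}


def strip_comments(xml_text):
    """Drop XML comments so they never reach the output."""
    return COMMENT.sub("", xml_text)


def decode_predefined(text):
    """Decode the five predefined XML entities.

    unescape() already handles &lt;, &gt; and &amp; (ampersand last),
    so only &quot; and &apos; need to be passed in explicitly.
    """
    return unescape(text, {"&quot;": '"', "&apos;": "'"})


def fold_delimiters(mdoc_lines):
    """Move a lone closing delimiter onto the preceding macro line."""
    out = []
    for line in mdoc_lines:
        token = line.strip()
        if token.startswith("\\&"):
            token = token[2:]  # drop the \& escape used on text lines
        if token in CLOSING and out and out[-1].startswith("."):
            out[-1] += " " + token
        else:
            out.append(line)
    return out
```

With that, the xdg-email line decodes to
xdg-email 'Jeremy White <jwhite@example.com>', and the SEE ALSO lines
fold to ".Xr man 7 ," through ".Xr troff 1 .".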

That should be it for the parser stuff for now. I've been playing around
with the new statistics program and should release some data on that
soon. I've been working on a git repo in which projects that use DocBook
are added as submodules. For now, I "clean" the files with xmllint
(using the options --loaddtd --noent --nocdata --nsclean --dropdtd
--format) and then run the statistics program over them.
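
The cleaning step looks roughly like the following Python sketch. The
function names are hypothetical and this is not the actual script; only
the xmllint options are the ones listed above. It assumes xmllint is
installed and on PATH.

```python
import subprocess

# The xmllint options listed above, in the same order.
XMLLINT_OPTS = ["--loaddtd", "--noent", "--nocdata", "--nsclean",
                "--dropdtd", "--format"]


def xmllint_argv(path):
    """Build the command line that normalises one DocBook file."""
    return ["xmllint", *XMLLINT_OPTS, path]


def clean(path):
    """Run xmllint and return the cleaned XML as bytes.

    Raises subprocess.CalledProcessError if xmllint exits non-zero.
    """
    result = subprocess.run(xmllint_argv(path), check=True,
                            capture_output=True)
    return result.stdout
```

Calling clean() on each DocBook file in a submodule gives the normalised
XML that is then fed to the statistics program.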

Also, I noticed that cvsweb was down for most of yesterday. Scheduled 
maintenance?

[1] https://gitlab.gnome.org/GNOME/gtk/blob/master/docs/reference/gtk/css-overview.xml#L20
[2] https://gitlab.com/esr/doclifter/blob/master/doclifter.xml#L988
[3] https://gitlab.gnome.org/GNOME/libxml2/raw/master/doc/xmllint.xml
-- 
Stephen Gregoratto
PGP: 3FC6 3D0E 2801 C348 1C44 2D34 A80C 0F8E 8BAB EC8B
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv