Date: Sat, 30 Mar 2019 11:19:19 +1100
From: Stephen Gregoratto
To: tech@mandoc.bsd.lv
Subject: Parsing errors, output regressions with new XML parser
Message-ID: <20190330001919.rrbc2xxrx47upalg@BlackBox>

Ingo,

I see you've been working hard on ripping out libexpat from docbook2mdoc.
While this should simplify development, I do have some problems with the
new parser:

- XML comments aren't ignored. This leads to documents like this one[1]
  being formatted as one loooong section under NAME.

- Escaped XML chars aren't converted back into ASCII:

    xdg-email 'Jeremy White <jwhite@example.com>'

    EXAMPLES
    xdg-email 'Jeremy White <jwhite@example.com>'

- There are regressions in how and nodes are transformed. The example I
  pointed out previously:

    Joe Bloggs joe@foo.net

  Now converts to:

    .Dd $Mdocdate$
    .Dt UNKNOWN 1
    .Os
    .Sh AUTHORS
    .Nm foo
    is maintained by
    .An \&Joe Bloggs ,
    .Aq Mt joe@foo.net
    \&.

  Another regression is that closing delimiters are put on separate
  lines. This leads to SEE ALSO sections like this[2] being formatted
  like so:

    .Sh \&SEE ALSO
    .Xr man 7 ,
    .Xr mdoc 7 ,
    .Xr ms 7 ,
    .Xr me 7 ,
    .Xr mm 7 ,
    .Xr mwww 7 ,
    .Xr troff 1
    \&.

  I noticed in a previous email that you've begun working on a
  regression test suite of sorts. I could probably submit a couple of
  examples of my own so these errors don't crop up again.

- Entities are not expanded. Some documents, like xmllint[3], will
  declare an ENTITY in the DTD. A solution here would be to use a tool
  like xmllint to expand the entities into their full versions like so:

    xmllint --noent xmllint.xml | docbook2mdoc > xmllint.1

That should be it for the parser stuff for now.

I've been playing around with the new statistics program and I should
release some data on that soon. I've been working on a git repo in which
projects that use DocBook are added as submodules. What I do now is
"clean" the files with xmllint (using options --loaddtd --noent
--nocdata --nsclean --dropdtd --format) and then run statistics over
them.

Also, I noticed that cvsweb was down for most of yesterday. Scheduled
maintenance?
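
For reference, the cleaning step described above could be scripted
roughly as follows. This is only a sketch, not the script actually used:
the function name clean_tree and the directory names in the usage
comment are made up for illustration. The xmllint options are the ones
listed above.

```shell
#!/bin/sh
# Sketch only: clean_tree and the directory names below are hypothetical.
# The xmllint options are the ones quoted in the message.
XMLLINT_OPTS="--loaddtd --noent --nocdata --nsclean --dropdtd --format"

# clean_tree SRC_DIR OUT_DIR
# Write a cleaned copy of every *.xml file below SRC_DIR into OUT_DIR,
# mirroring the directory layout. Files that xmllint rejects are
# reported on stderr and skipped. Pass SRC_DIR without a trailing slash.
clean_tree() {
    src_dir=$1
    out_dir=$2
    find "$src_dir" -type f -name '*.xml' | while IFS= read -r f; do
        rel=${f#"$src_dir"/}
        mkdir -p "$out_dir/$(dirname "$rel")"
        # XMLLINT_OPTS is left unquoted on purpose so that it splits
        # into separate options.
        if ! xmllint $XMLLINT_OPTS "$f" > "$out_dir/$rel"; then
            echo "skipped: $f" >&2
            rm -f "$out_dir/$rel"
        fi
    done
}

# Example (hypothetical directory names):
#   clean_tree submodules cleaned
```

Each cleaned file can then be fed to docbook2mdoc on standard input, as
in the xmllint.xml pipeline shown earlier, or handed to the statistics
program.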

[1] https://gitlab.gnome.org/GNOME/gtk/blob/master/docs/reference/gtk/css-overview.xml#L20
[2] https://gitlab.com/esr/doclifter/blob/master/doclifter.xml#L988
[3] https://gitlab.gnome.org/GNOME/libxml2/raw/master/doc/xmllint.xml

-- 
Stephen Gregoratto
PGP: 3FC6 3D0E 2801 C348 1C44 2D34 A80C 0F8E 8BAB EC8B