tech@mandoc.bsd.lv
* [PATCH docbook2mdoc] Add NODE_CONTRIB, NODE_PRODUCTNAME
From: Stephen Gregoratto @ 2019-03-27  3:52 UTC
  To: tech

Here are some nodes I found in the wild.

Technically <productname> is used to mark up text as
copyright/trademark/registered, but the Systemd and GTK manuals use it
to set the "source" argument of TH in man(7). We have no need for it.

<contrib> is a bit of an odd case. It classifies an <author>'s contributions
to the program/manual/code etc. Here's an example from GTK:

  <authorgroup>
    <author>
      <contrib>Developer</contrib>
      <firstname>Matthias</firstname>
      <surname>Clasen</surname>
    </author>
  </authorgroup>

I'm setting it to be skipped for now, but if we ever start recording
<author>s we could probably include this.
-- 
Stephen Gregoratto
PGP: 3FC6 3D0E 2801 C348 1C44 2D34 A80C 0F8E 8BAB EC8B


Index: node.h
===================================================================
RCS file: /cvs/docbook2mdoc/node.h,v
retrieving revision 1.4
diff -u -p -r1.4 node.h
--- node.h	26 Mar 2019 22:39:33 -0000	1.4
+++ node.h	27 Mar 2019 03:28:02 -0000
@@ -46,6 +46,7 @@ enum	nodeid {
 	NODE_COLSPEC,
 	NODE_COMMAND,
 	NODE_CONSTANT,
+	NODE_CONTRIB,
 	NODE_COPYRIGHT,
 	NODE_DATE,
 	NODE_EDITOR,
@@ -98,6 +99,7 @@ enum	nodeid {
 	NODE_PARAMETER,
 	NODE_PERSONNAME,
 	NODE_PREFACE,
+	NODE_PRODUCTNAME,
 	NODE_PROGRAMLISTING,
 	NODE_PROMPT,
 	NODE_QUOTE,
Index: parse.c
===================================================================
RCS file: /cvs/docbook2mdoc/parse.c,v
retrieving revision 1.4
diff -u -p -r1.4 parse.c
--- parse.c	26 Mar 2019 22:39:33 -0000	1.4
+++ parse.c	27 Mar 2019 03:28:02 -0000
@@ -64,6 +64,7 @@ static	const struct element elements[] =
 	{ "citetitle",		NODE_CITETITLE },
 	{ "cmdsynopsis",	NODE_CMDSYNOPSIS },
 	{ "code",		NODE_CODE },
+	{ "contrib",		NODE_CONTRIB },
 	{ "colspec",		NODE_COLSPEC },
 	{ "command",		NODE_COMMAND },
 	{ "constant",		NODE_CONSTANT },
@@ -125,6 +126,7 @@ static	const struct element elements[] =
 	{ "phrase",		NODE_IGNORE },
 	{ "preface",		NODE_PREFACE },
 	{ "primary",		NODE_DELETE },
+	{ "productname",	NODE_PRODUCTNAME },
 	{ "programlisting",	NODE_PROGRAMLISTING },
 	{ "prompt",		NODE_PROMPT },
 	{ "quote",		NODE_QUOTE },
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv


* Re: [PATCH docbook2mdoc] Add NODE_CONTRIB, NODE_PRODUCTNAME
From: Ingo Schwarze @ 2019-03-28 17:19 UTC
  To: Stephen Gregoratto; +Cc: tech

Hi Stephen,

Stephen Gregoratto wrote on Wed, Mar 27, 2019 at 02:52:24PM +1100:

> Here are some nodes I found in the wild.

Thank you for the information about the two nodes.  I wasn't aware
of them yet and have added the information to my TODO list.

Regarding the patch, you are no longer up to date.  ;-)

Unless you want to make at least one explicit formatting decision
based on the presence of an element, adding it to "node.h" is no
longer required.  Elements can now be explicitly ignored, in the
sense that they generate no node and that their content appears in
their stead as children of their parent, with a line like

  { "productname", NODE_IGNORE },

in parse.c, elements[].

Yes, i know that "node.h" still contains enum constants for various
nodes that are currently not doing any formatting.  I tend to remove
those that are unlikely to ever require formatting, but i tend to
keep those that are likely to require some handling when the program
becomes better in the future.  But for new nodes, i only add them
when i start actually using them for formatting purposes.

Regarding the additions to parse.c, even those are no longer needed.
Elements with unknown names are now automatically ignored (in the
above sense), except that they emit a warning with -W.  So, a
NODE_IGNORE line in parse.c as shown above is only needed to suppress
the warning, which may be useful in two cases: (1) if it is unlikely
that the element will ever need formatting (examples: <acronym>,
<phrase>, <trademark>) or (2) if the element occurs so frequently
that the warnings would be too noisy, making it hard to find warnings
that actually matter (no examples currently).  It seems to me neither
applies here.
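
The lookup rules described above can be sketched roughly as follows.
This is a simplified illustration, not the actual parse.c code: the
function name lookup_element(), its "warned" out-parameter, and the
reduced table are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Reduced, hypothetical subset of the node identifiers. */
enum nodeid {
	NODE_IGNORE,	/* no node; content is handed to the parent */
	NODE_DELETE,	/* element and content are dropped */
	NODE_AUTHOR,
	NODE_PARA
};

struct element {
	const char	*name;
	enum nodeid	 node;
};

static const struct element elements[] = {
	{ "acronym",	NODE_IGNORE },
	{ "anchor",	NODE_DELETE },
	{ "author",	NODE_AUTHOR },
	{ "para",	NODE_PARA },
	{ "phrase",	NODE_IGNORE },
	{ NULL,		NODE_IGNORE }
};

/*
 * Map an element name to a node identifier.
 * Unknown names fall back to NODE_IGNORE and set *warned to 1,
 * standing in for the warning emitted with -W.
 */
static enum nodeid
lookup_element(const char *name, int *warned)
{
	const struct element	*e;

	*warned = 0;
	for (e = elements; e->name != NULL; e++)
		if (strcmp(e->name, name) == 0)
			return e->node;
	*warned = 1;
	return NODE_IGNORE;
}
```

In this model, "phrase" and an unlisted name such as "productname"
both come back as NODE_IGNORE; the only difference is whether the
warning flag is set.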

> Technically <productname> is used to mark up text as
> copyright/trademark/registered,

In that capacity, NODE_IGNORE is probably right, like for <acronym>.

> but the Systemd and GTK manuals use it to set the "source" argument
> of TH in man(7).  We have no need for it.

So in that capacity, it would be NODE_DELETE.

However, each element name can only have one enum constant attached
to it.  If requirements conflict depending on context, like in this
case, one could make it NODE_PRODUCTNAME and then explicitly treat
it as transparent in normal text (which needs no explicit code at
all, nodes are transparent by default) and skip it in other contexts
(which does need explicit code).  But i'd prefer adding the
NODE_PRODUCTNAME enum constant when the skipping code is implemented,
not earlier: we are not really sure yet such skipping code will
ever be needed, or are we?
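
If such skipping code were ever written, the context check might look
something like the sketch below.  Everything in it is hypothetical:
the function skip_in_context() does not exist in docbook2mdoc, and it
assumes that the metadata use of <productname> occurs directly below
<refentryinfo>.

```c
#include <stdbool.h>

/* Reduced, hypothetical subset of the node identifiers. */
enum nodeid { NODE_PARA, NODE_PRODUCTNAME, NODE_REFENTRYINFO };

/*
 * Hypothetical predicate: decide whether a node's content should be
 * skipped, given the node type of its parent.
 */
static bool
skip_in_context(enum nodeid node, enum nodeid parent)
{
	switch (node) {
	case NODE_PRODUCTNAME:
		/* Metadata use: skip.  Running text: transparent. */
		return parent == NODE_REFENTRYINFO;
	default:
		return false;
	}
}
```

A formatter would consult such a predicate before descending into the
children; when it returns false, the default transparent handling
applies and no further code is needed.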

> <contrib> is a bit of an odd case. It classifies an <author>'s contributions
> to the program/manual/code etc. Here's an example from GTK:
> 
>   <authorgroup>
>     <author>
>       <contrib>Developer</contrib>
>       <firstname>Matthias</firstname>
>       <surname>Clasen</surname>
>     </author>
>   </authorgroup>

Right, handling <author> properly is a challenge that is on my TODO
list.  There are three things to do that i'm aware of: (1) in normal
text, stop skipping children like <affiliation>; (2) below
<refentryinfo>, <articleinfo>, <bookinfo> and the like, move the
information down to the AUTHORS section (a slightly tricky task);
(3) handle <contrib>.

Currently (and likewise with your patch), the above example is formatted as:

  .An Developer Matthias Clasen

Instead, it should be:

  Developer:
  .An Matthias Clasen

That's somewhat tricky (but still straightforward enough to implement
with the recent improvements in the codebase) because NODE_AUTHOR
has to print a NODE_CONTRIB child *before* calling macro_open("An").
It is sufficient to add the NODE_CONTRIB enum constant when that
is implemented.
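
The required ordering can be illustrated with a small string-based
model.  It is only a sketch under stated assumptions: format_author()
is a hypothetical helper that writes into a buffer rather than using
the macro_open() machinery, and joining several roles with commas is
an assumption of the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one author into buf (sz must be > 0).
 * Roles taken from <contrib> are printed as plain text before the
 * .An macro line; the name becomes the macro argument.
 */
static void
format_author(char *buf, size_t sz, const char *const *contribs,
    size_t ncontribs, const char *name)
{
	size_t	 i, len;

	buf[0] = '\0';
	for (i = 0; i < ncontribs; i++) {
		len = strlen(buf);
		snprintf(buf + len, sz - len, "%s%s",
		    i > 0 ? ", " : "", contribs[i]);
	}
	len = strlen(buf);
	snprintf(buf + len, sz - len, "%s.An %s\n",
	    ncontribs > 0 ? ":\n" : "", name);
}
```

Called with the single role "Developer" and the name "Matthias
Clasen", this yields the two lines shown above; with no roles, only
the .An line is produced.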

> I'm setting it to be skipped for now, but if we ever start recording
> <author>s we could probably include this.

Err, right now, <author> elements *are* handled, and <contrib> -
with or without your patch - is *ignored* (in the sense explained
above), not *skipped*; see <anchor> and <indexterm> as examples of
elements that are skipped (more formally, "deleted").
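
The difference between the two terms can be modelled as follows.
This is a hedged sketch, not the real parser: in docbook2mdoc the
decision is made while parsing, whereas the hypothetical function
collect_text() below applies it while walking a toy tree, and the
struct layout is an assumption of the example.

```c
#include <stdio.h>
#include <string.h>

enum nodeid { NODE_TEXT, NODE_IGNORE, NODE_DELETE, NODE_PARA };

struct pnode {
	enum nodeid		 node;
	const char		*text;	/* NODE_TEXT only */
	const struct pnode	*child;	/* first child */
	const struct pnode	*next;	/* next sibling */
};

/*
 * Append the visible text of n and its following siblings to buf,
 * which must already hold a NUL-terminated string.
 * An ignored element contributes its content as if the content
 * belonged directly to the parent; a deleted element contributes
 * nothing at all.
 */
static void
collect_text(const struct pnode *n, char *buf, size_t sz)
{
	size_t	 len;

	for (; n != NULL; n = n->next) {
		if (n->node == NODE_DELETE)
			continue;
		if (n->node == NODE_TEXT) {
			len = strlen(buf);
			snprintf(buf + len, sz - len, "%s%s",
			    len > 0 ? " " : "", n->text);
		} else
			collect_text(n->child, buf, sz);
	}
}
```

For a paragraph containing an ignored element with the text
"Developer", a deleted element with some index text, and the plain
text "Matthias", the collected result is "Developer Matthias".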

Yours,
  Ingo

* Re: [PATCH docbook2mdoc] Add NODE_CONTRIB, NODE_PRODUCTNAME
From: Ingo Schwarze @ 2019-03-28 20:53 UTC
  To: Stephen Gregoratto; +Cc: tech

Hi Stephen,

Ingo Schwarze wrote on Thu, Mar 28, 2019 at 06:19:56PM +0100:
> Stephen Gregoratto wrote on Wed, Mar 27, 2019 at 02:52:24PM +1100:

>>     <author>
>>       <contrib>Developer</contrib>
>>       <firstname>Matthias</firstname>
>>       <surname>Clasen</surname>
>>     </author>

> Right, handling <author> properly is a challenge that is on my TODO
> list.  There are three things to do that i'm aware of: (1) in normal
> text, stop skipping children like <affiliation>;

Done.

> (2) below <refentryinfo>, <articleinfo>, <bookinfo> and the like,
> move the information down to the AUTHORS section
> (a slightly tricky task);

Still open.

> (3) handle <contrib>.

Implemented with the patch below.  Part of the reason for doing it
right away was for your benefit, so that you can see a practical
example, up to date with the latest parser and formatter
infrastructure, of how such formatters are written nowadays.

Here is an example of how it works:

 $ cat regress/author/contrib.xml                               
<part>
initial text
<author>
  <affiliation>OpenBSD</affiliation>
  <firstname>Theo</firstname>
  <surname>de Raadt</surname>
  <contrib>project founder</contrib>
  <contrib>developer</contrib>
</author>
final text
</part>
 $ ./docbook2mdoc regress/author/contrib.xml                    
.Dd $Mdocdate$
.Dt UNKNOWN 1
.Os
initial text project founder, developer:
.An Theo de Raadt ,
OpenBSD final text

Yours,
  Ingo


Log Message:
-----------
Implement a formatter for <author> elements, 
handling <contrib>, <personname>, <firstname>, <othername>, <surname>, 
as well as arbitrary children properly.

This required minor work on the formatting infrastructure:
Improve macro_addnode() such that it also handles text nodes.
Add a companion function print_textnode().
Let print_text() optionally work without ARG_SPACE.

Triggered by a report from Stephen Gregoratto <dev at sgregoratto
dot me> on how GTK documentation uses <contrib>.

Modified Files:
--------------
    docbook2mdoc:
        docbook2mdoc.c
        macro.c
        macro.h
        node.h
        parse.c

Revision Data
-------------
Index: node.h
===================================================================
RCS file: /home/cvs/mdocml/docbook2mdoc/node.h,v
retrieving revision 1.5
retrieving revision 1.6
diff -Lnode.h -Lnode.h -u -p -r1.5 -r1.6
--- node.h
+++ node.h
@@ -46,6 +46,7 @@ enum	nodeid {
 	NODE_COLSPEC,
 	NODE_COMMAND,
 	NODE_CONSTANT,
+	NODE_CONTRIB,
 	NODE_COPYRIGHT,
 	NODE_DATE,
 	NODE_EDITOR,
Index: macro.h
===================================================================
RCS file: /home/cvs/mdocml/docbook2mdoc/macro.h,v
retrieving revision 1.1
retrieving revision 1.2
diff -Lmacro.h -Lmacro.h -u -p -r1.1 -r1.2
--- macro.h
+++ macro.h
@@ -46,3 +46,6 @@ void	 macro_addarg(struct format *, cons
 void	 macro_argline(struct format *, const char *, const char *);
 void	 macro_addnode(struct format *, struct pnode *, int);
 void	 macro_nodeline(struct format *, const char *, struct pnode *, int);
+
+void	 print_text(struct format *, const char *, int);
+void	 print_textnode(struct format *, struct pnode *);
Index: docbook2mdoc.c
===================================================================
RCS file: /home/cvs/mdocml/docbook2mdoc/docbook2mdoc.c,v
retrieving revision 1.77
retrieving revision 1.78
diff -Ldocbook2mdoc.c -Ldocbook2mdoc.c -u -p -r1.77 -r1.78
--- docbook2mdoc.c
+++ docbook2mdoc.c
@@ -32,23 +32,6 @@ static void	 pnode_print(struct format *
 
 
 static void
-print_text(struct format *p, const char *word)
-{
-	switch (p->linestate) {
-	case LINE_NEW:
-		break;
-	case LINE_TEXT:
-		putchar(' ');
-		break;
-	case LINE_MACRO:
-		macro_close(p);
-		break;
-	}
-	fputs(word, stdout);
-	p->linestate = LINE_TEXT;
-}
-
-static void
 pnode_printpara(struct format *p, struct pnode *pn)
 {
 	struct pnode	*pp;
@@ -407,6 +390,66 @@ pnode_printgroup(struct format *p, struc
 }
 
 static void
+pnode_printauthor(struct format *f, struct pnode *n)
+{
+	struct pnode	*nc, *ncn;
+	int		 have_contrib, have_name;
+
+	/*
+	 * Print <contrib> children up front, before the .An scope,
+	 * and figure out whether we have a name of a person.
+	 */
+
+	have_contrib = have_name = 0;
+	TAILQ_FOREACH_SAFE(nc, &n->childq, child, ncn) {
+		switch (nc->node) {
+		case NODE_CONTRIB:
+			if (have_contrib)
+				print_text(f, ",", 0);
+			print_textnode(f, nc);
+			pnode_unlink(nc);
+			have_contrib = 1;
+			break;
+		case NODE_PERSONNAME:
+			have_name = 1;
+			break;
+		default:
+			break;
+		}
+	}
+	if (TAILQ_FIRST(&n->childq) == NULL)
+		return;
+
+	if (have_contrib)
+		print_text(f, ":", 0);
+
+	/*
+         * If we have a name, print it in the .An scope and leave
+         * all other content for child handlers, to print after the
+         * scope.  Otherwise, print everything in the scope.
+	 */
+
+	macro_open(f, "An");
+	TAILQ_FOREACH_SAFE(nc, &n->childq, child, ncn) {
+		if (nc->node == NODE_PERSONNAME || have_name == 0) {
+			macro_addnode(f, nc, ARG_SPACE);
+			pnode_unlink(nc);
+		}
+	}
+
+	/*
+	 * If there are still unprinted children, end the scope
+	 * with a comma.  Otherwise, leave the scope open in case
+	 * a text node follows that starts with closing punctuation.
+	 */
+
+	if (TAILQ_FIRST(&n->childq) != NULL) {
+		macro_addarg(f, ",", ARG_SPACE);
+		macro_close(f);
+	}
+}
+
+static void
 pnode_printprologue(struct format *p, struct ptree *tree)
 {
 	struct pnode	*refmeta;
@@ -428,7 +471,7 @@ pnode_printprologue(struct format *p, st
 
 	if (tree->flags & TREE_EQN) {
 		macro_line(p, "EQ");
-		print_text(p, "delim $$");
+		print_text(p, "delim $$", 0);
 		macro_line(p, "EN");
 	}
 }
@@ -563,7 +606,7 @@ pnode_print(struct format *p, struct pno
 		pnode_printarg(p, pn);
 		break;
 	case NODE_AUTHOR:
-		macro_open(p, "An");
+		pnode_printauthor(p, pn);
 		break;
 	case NODE_AUTHORGROUP:
 		macro_line(p, "An -split");
@@ -587,7 +630,7 @@ pnode_print(struct format *p, struct pno
 		macro_open(p, "Dv");
 		break;
 	case NODE_EDITOR:
-		print_text(p, "editor:");
+		print_text(p, "editor:", ARG_SPACE);
 		macro_open(p, "An");
 		break;
 	case NODE_EMAIL:
Index: macro.c
===================================================================
RCS file: /home/cvs/mdocml/docbook2mdoc/macro.c,v
retrieving revision 1.1
retrieving revision 1.2
diff -Lmacro.c -Lmacro.c -u -p -r1.1 -r1.2
--- macro.c
+++ macro.c
@@ -157,13 +157,13 @@ macro_addnode(struct format *f, struct p
 	assert(f->linestate == LINE_MACRO);
 
 	/*
-	 * If the only child is a text node, just add that text,
-	 * letting macro_addarg() decide about quoting.
+	 * If this node or its only child is a text node, just add
+	 * that text, letting macro_addarg() decide about quoting.
 	 */
 
-	pn = TAILQ_FIRST(&pn->childq);
-	if (pn != NULL && pn->node == NODE_TEXT &&
-	    TAILQ_NEXT(pn, child) == NULL) {
+	if (pn->node == NODE_TEXT ||
+	    ((pn = TAILQ_FIRST(&pn->childq)) != NULL &&
+	     pn->node == NODE_TEXT && TAILQ_NEXT(pn, child) == NULL)) {
 		macro_addarg(f, pn->b, flags);
 		return;
 	}
@@ -193,10 +193,7 @@ macro_addnode(struct format *f, struct p
 	 */
 
 	while (pn != NULL) {
-		if (pn->node == NODE_TEXT)
-			macro_addarg(f, pn->b, flags);
-		else
-			macro_addnode(f, pn, flags);
+		macro_addnode(f, pn, flags);
 		pn = TAILQ_NEXT(pn, child);
 		flags |= ARG_SPACE;
 	}
@@ -210,4 +207,41 @@ macro_nodeline(struct format *f, const c
 	macro_open(f, name);
 	macro_addnode(f, pn, ARG_SPACE | flags);
 	macro_close(f);
+}
+
+
+/*
+ * Print a word on the current text line if one is open, or on a new text
+ * line otherwise.  The flag ARG_SPACE inserts spaces between words.
+ */
+void
+print_text(struct format *f, const char *word, int flags) {
+	switch (f->linestate) {
+	case LINE_NEW:
+		break;
+	case LINE_TEXT:
+		if (flags & ARG_SPACE)
+			putchar(' ');
+		break;
+	case LINE_MACRO:
+		macro_close(f);
+		break;
+	}
+	fputs(word, stdout);
+	f->linestate = LINE_TEXT;
+}
+
+/*
+ * Recursively print the content of a node on a text line.
+ */
+void
+print_textnode(struct format *f, struct pnode *n)
+{
+	struct pnode	*nc;
+
+	if (n->node == NODE_TEXT)
+		print_text(f, n->b, ARG_SPACE);
+	else
+		TAILQ_FOREACH(nc, &n->childq, child)
+			print_textnode(f, nc);
 }
Index: parse.c
===================================================================
RCS file: /home/cvs/mdocml/docbook2mdoc/parse.c,v
retrieving revision 1.6
retrieving revision 1.7
diff -Lparse.c -Lparse.c -u -p -r1.6 -r1.7
--- parse.c
+++ parse.c
@@ -73,6 +73,7 @@ static	const struct element elements[] =
 	{ "colspec",		NODE_COLSPEC },
 	{ "command",		NODE_COMMAND },
 	{ "constant",		NODE_CONSTANT },
+	{ "contrib",		NODE_CONTRIB },
 	{ "copyright",		NODE_COPYRIGHT },
 	{ "date",		NODE_DATE },
 	{ "editor",		NODE_EDITOR },
@@ -82,7 +83,7 @@ static	const struct element elements[] =
 	{ "envar",		NODE_ENVAR },
 	{ "fieldsynopsis",	NODE_FIELDSYNOPSIS },
 	{ "filename",		NODE_FILENAME },
-	{ "firstname",		NODE_IGNORE },
+	{ "firstname",		NODE_PERSONNAME },
 	{ "firstterm",		NODE_FIRSTTERM },
 	{ "footnote",		NODE_FOOTNOTE },
 	{ "funcdef",		NODE_FUNCDEF },
@@ -122,7 +123,7 @@ static	const struct element elements[] =
 	{ "option",		NODE_OPTION },
 	{ "orderedlist",	NODE_ORDEREDLIST },
 	{ "orgname",		NODE_ORGNAME },
-	{ "othername",		NODE_IGNORE },
+	{ "othername",		NODE_PERSONNAME },
 	{ "para",		NODE_PARA },
 	{ "paramdef",		NODE_PARAMDEF },
 	{ "parameter",		NODE_PARAMETER },
@@ -164,7 +165,7 @@ static	const struct element elements[] =
 	{ "spanspec",		NODE_SPANSPEC },
 	{ "structname",		NODE_STRUCTNAME },
 	{ "subtitle",		NODE_SUBTITLE },
-	{ "surname",		NODE_IGNORE },
+	{ "surname",		NODE_PERSONNAME },
 	{ "synopsis",		NODE_SYNOPSIS },
 	{ "table",		NODE_TABLE },
 	{ "tbody",		NODE_TBODY },