tech@mandoc.bsd.lv
* PATCH: correct handling of literal tab characters
@ 2010-05-24 15:18 Ingo Schwarze
  2010-05-24 21:22 ` Kristaps Dzonsons
From: Ingo Schwarze @ 2010-05-24 15:18 UTC
  To: tech

Hi Kristaps and Joerg,

Here is the second non-trivial patch extracted from the OpenBSD tree.
Note that there are two more diffs layered on top of this one.  Both
touch more or less the same lines in both .c files, but their purposes
are largely unrelated, so I'm trying to keep the three diffs separate.

Here is the commit message from OpenBSD mdoc_term.c rev. 1.75,
April 23, 2010.  Note that the old and current patches are not
identical because the surrounding code changed in the meantime.

> Handle literal tab characters both in literal context (.Bd -literal)
> and outside.  In literal context, tab stops are at each eighth column;
> outside, they are at each fifth column.
>
> Use tabwidth = 5 as the default and temporarily switch to 8 in termp_bd_pre().
> This requires moving the term_flushln() of the final line of a display from
> termp_bd_post() to termp_bd_pre(); the former still needs term_newln()
> to handle the final lines of non-literal displays.
>
> Handling inside term_flushln() is tricky because a tab collapses with
> inter-word spacing, but not with another tab.
>
> Missing feature reported independently by jmc@ and deraadt@.

Moving the term_flushln() around is not yet covered in this
diff but will follow in the next one.  That change is too intricately
entangled with the issue of vertical spacing in tables, which
will be fully handled by the next diff anyway.

Also note that the weird tab stop intervals of 5 and 8 columns are
still used by modern groff.
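
To make the tab stop arithmetic concrete, here is a minimal sketch of
the computation the patch performs for a single tab.  The helper name
next_tabstop() is hypothetical and does not exist in the tree; only the
expression itself is taken from the term_flushln() hunk.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: return the first tab
 * stop strictly to the right of visible column vis, with stops
 * placed every tabwidth columns counting from column 0.
 */
static size_t
next_tabstop(size_t vis, size_t tabwidth)
{
	assert(tabwidth > 0);
	/* Integer division rounds down, so this always moves forward. */
	return (vis / tabwidth + 1) * tabwidth;
}
```

With the default width, next_tabstop(3, 5) yields 5 and
next_tabstop(5, 5) yields 10; inside a literal display the same
positions map to next_tabstop(3, 8) == 8 and next_tabstop(8, 8) == 16.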

A test file is attached.

This patch is also available from
  /usr/vhosts/mdocml.bsd.lv/patch/schwarze/05.tabbing.patch

OK?

Yours,
  Ingo


--- term.h	Mon May 17 00:33:00 2010
+++ term.h	Sat May 15 23:09:53 2010
@@ -37,7 +37,8 @@ struct	termp {
 	size_t		  maxrmargin;	/* Max right margin. */
 	size_t		  maxcols;	/* Max size of buf. */
 	size_t		  offset;	/* Margin offest. */
+	size_t		  tabwidth;	/* Distance of tab positions. */
 	size_t		  col;		/* Bytes in buf. */
 	int		  overstep;	/* See termp_flushln(). */
 	int		  flags;
 #define	TERMP_SENTENCE	 (1 << 1)	/* Space before a sentence. */
--- term.c	Tue May 18 02:05:38 2010
+++ term.c	Mon May 24 00:45:01 2010
@@ -84,6 +80,7 @@ term_alloc(enum termenc enc, size_t width)
 		perror(NULL);
 		exit(EXIT_FAILURE);
 	}
+	p->tabwidth = 5;
 	p->enc = enc;
 	/* Enforce some lower boundary. */
 	if (width < 60)
@@ -171,6 +169,17 @@ term_flushln(struct termp *p)
 	while (i < (int)p->col) {
 
 		/*
+		 * Handle literal tab characters.
+		 */
+		for (j = i; j < (int)p->col; j++) {
+			if ('\t' != p->buf[j])
+				break;
+			vend = (vis/p->tabwidth+1)*p->tabwidth;
+			vbl += vend - vis;
+			vis = vend;
+		}
+
+		/*
 		 * Count up visible word characters.  Control sequences
 		 * (starting with the CSI) aren't counted.  A space
 		 * generates a non-printing word, which is valid (the
@@ -178,27 +187,27 @@ term_flushln(struct termp *p)
 		 */
 
 		/* LINTED */
-		for (j = i; j < (int)p->col; j++) {
-			if (j && ' ' == p->buf[j]) 
+		for ( ; j < (int)p->col; j++) {
+			if ((j && ' ' == p->buf[j]) || '\t' == p->buf[j])
 				break;
 			if (8 == p->buf[j])
 				vend--;
 			else
 				vend++;
 		}
 
 		/*
 		 * Find out whether we would exceed the right margin.
 		 * If so, break to the next line.
 		 */
 		if (vend > bp && vis > 0) {
 			vend -= vis;
 			putchar('\n');
 			if (TERMP_NOBREAK & p->flags) {
 				for (j = 0; j < (int)p->rmargin; j++)
 					putchar(' ');
 				vend += p->rmargin - p->offset;
 			} else {
 				vbl = p->offset;
 			}
 
@@ -209,8 +224,16 @@ term_flushln(struct termp *p)
 			p->overstep = 0;
 		}
 
+		/*
+		 * Skip leading tabs, they were handled above.
+		 */
+		while (i < (int)p->col && '\t' == p->buf[i])
+			i++;
+
 		/* Write out the [remaining] word. */
 		for ( ; i < (int)p->col; i++) {
+			if ('\t' == p->buf[i])
+				break;
 			if (' ' == p->buf[i]) {
 				while (' ' == p->buf[i]) {
 					vbl++;
--- mdoc_term.c	Mon May 24 16:35:59 2010
+++ mdoc_term.c	Mon May 24 02:37:01 2010
@@ -274,6 +270,7 @@ terminal_mdoc(void *arg, const struct mdoc *mdoc)
 
 	p->overstep = 0;
 	p->maxrmargin = p->defrmargin;
+	p->tabwidth = 5;
 
 	if (NULL == p->symtab)
 		switch (p->enc) {
@@ -1593,6 +1590,7 @@ termp_fa_pre(DECL_ARGS)
 static int
 termp_bd_pre(DECL_ARGS)
 {
+	size_t			 tabwidth;
 	int	         	 i, type;
 	size_t			 rm, rmax;
 	const struct mdoc_node	*nn;
@@ -1622,6 +1620,8 @@ termp_bd_pre(DECL_ARGS)
 	if (MDOC_Literal != type && MDOC_Unfilled != type)
 		return(1);
 
+	tabwidth = p->tabwidth;
+	p->tabwidth = 8;
 	rm = p->rmargin;
 	rmax = p->maxrmargin;
 	p->rmargin = p->maxrmargin = TERM_MAXMARGIN;
@@ -1629,13 +1629,14 @@ termp_bd_pre(DECL_ARGS)
 	for (nn = n->child; nn; nn = nn->next) {
 		p->flags |= TERMP_NOSPACE;
 		print_mdoc_node(p, pair, m, nn);
 		if (NULL == nn->next)
 			continue;
 		if (nn->prev && nn->prev->line < nn->line)
 			term_flushln(p);
 		else if (NULL == nn->prev)
 			term_flushln(p);
 	}
+	p->tabwidth = tabwidth;
 
 	p->rmargin = rm;
 	p->maxrmargin = rmax;

[-- Attachment #2: tab.in --]
[-- Type: text/plain, Size: 888 bytes --]

.Dd $Mdocdate: April 23 2010 $
.Dt SPACE-TAB 1
.Os
.Sh NAME
.Nm space-tab
.Nd handling of literal space characters
.Sh DESCRIPTION
plain text
.br
1	x
.br
22	x
.br
333	x
.br
4444	x
.br
55555	x
.br
666666	x
.br
7777777	x
.br
88888888	x
.br
999999999	x
.br
aaaaaaaaaa	x
.br
bbbbbbbbbbb	x
.br
cccccccccccc	x
.br
ddddddddddddd	x
.br
tab	 space
.br
tab		tab
.br
space 	tab
.br
	tab
.br
		tab
.br
ragged display
.Bd -ragged -offset 2n
1	x
.br
22	x
.br
333	x
.br
4444	x
.br
55555	x
.br
666666	x
.br
7777777	x
.br
88888888	x
.br
999999999	x
.br
aaaaaaaaaa	x
.br
bbbbbbbbbbb	x
.br
cccccccccccc	x
.br
ddddddddddddd	x
.br
tab	 space
.br
tab		tab
.br
space 	tab
.br
	tab
.br
		tab
.Ed
literal display
.Bd -literal -offset 2n
1	x
22	x
333	x
4444	x
55555	x
666666	x
7777777	x
88888888	x
999999999	x
aaaaaaaaaa	x
bbbbbbbbbbb	x
cccccccccccc	x
ddddddddddddd	x
tab	 space
tab		tab
space 	tab
	tab
		tab
.Ed

* Re: PATCH: correct handling of literal tab characters
  2010-05-24 15:18 PATCH: correct handling of literal tab characters Ingo Schwarze
@ 2010-05-24 21:22 ` Kristaps Dzonsons
From: Kristaps Dzonsons @ 2010-05-24 21:22 UTC
  To: tech

> A test file is attached.
> 
> This patch is also available from
>   /usr/vhosts/mdocml.bsd.lv/patch/schwarze/05.tabbing.patch

This looks fine by me, although I'd like a bit more in-code 
documentation in the critical sections.  Note that I didn't test the 
patch, as it didn't apply.  When you get it checked in, respond to this 
and I'll run it through the usual battery of tests to make sure nothing 
in one of the million manuals I have around pukes.
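
As an illustration of the kind of in-code documentation meant here, the
tab loop from the term_flushln() hunk could be read as the following
stand-alone sketch.  The function name consume_tabs() and its signature
are assumptions made for this example; the patch keeps the loop inline
and uses int indices.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-alone version of the tab loop in term_flushln().
 * Starting at buf[i], consume consecutive tab characters.  Each tab
 * moves the visible column *vis to the next multiple of tabwidth and
 * adds the distance travelled to the pending blank count *vbl, so
 * two tabs in a row advance two stops instead of collapsing into one.
 * Returns the index of the first non-tab byte, or len if none is left.
 */
static size_t
consume_tabs(const char *buf, size_t len, size_t i, size_t tabwidth,
    size_t *vis, size_t *vbl)
{
	size_t	 vend;

	assert(tabwidth > 0);
	while (i < len && '\t' == buf[i]) {
		vend = (*vis / tabwidth + 1) * tabwidth;
		*vbl += vend - *vis;
		*vis = vend;
		i++;
	}
	return i;
}
```

Starting at visible column 3 with tabwidth 5, the input "\t\tx" leaves
the index at 2, the visible column at 10, and 7 pending blanks.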

Commit it!