From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp1.rz.uni-karlsruhe.de (Debian-exim@smtp1.rz.uni-karlsruhe.de [129.13.185.217])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id o4OFI9ZB012084
	for ; Mon, 24 May 2010 09:18:10 -0600 (MDT)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
	by smtp1.rz.uni-karlsruhe.de with esmtp (Exim 4.63 #1)
	id 1OGZPt-0005pB-3j; Mon, 24 May 2010 17:18:09 +0200
Received: from donnerwolke.usta.de ([172.24.96.3])
	by hekate.usta.de with esmtp (Exim 4.71)
	(envelope-from )
	id 1OGZPt-0008Od-2l
	for tech@mdocml.bsd.lv; Mon, 24 May 2010 17:18:09 +0200
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
	by donnerwolke.usta.de with esmtp (Exim 4.69)
	(envelope-from )
	id 1OGZPt-0002MD-27
	for tech@mdocml.bsd.lv; Mon, 24 May 2010 17:18:09 +0200
Received: from schwarze by usta.de with local (Exim 4.71)
	(envelope-from )
	id 1OGZPs-0000Id-QT
	for tech@mdocml.bsd.lv; Mon, 24 May 2010 17:18:08 +0200
Date: Mon, 24 May 2010 17:18:08 +0200
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Subject: PATCH: correct handling of literal tab characters
Message-ID: <20100524151808.GR13544@iris.usta.de>
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="sm4nu43k4a2Rpi4c"
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-06-14)

--sm4nu43k4a2Rpi4c
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi Kristaps and Joerg,

here is the second non-trivial patch extracted from the OpenBSD tree.
Note that there are two more diffs layered on top of this one.  Both
touch more or less the same lines in both .c files, but they are
largely unrelated in purpose, so I'm trying to keep this one separate.

Here is the commit message from OpenBSD mdoc_term.c rev. 1.75,
April 23, 2010.  Note that the old and current patches are not
identical because the surrounding code changed in the meantime.

> Handle literal tab characters both in literal context (.Bd -literal)
> and outside.  In literal context, tab stops are at each eighth column;
> outside, they are at each fifth column.
>
> Use tabwidth = 5 as the default and temporarily switch to 8 in termp_bd_pre().
> This requires moving the term_flushln() of the final line of a display from
> termp_bd_post() to termp_bd_pre(); the former still needs term_newln()
> to handle the final lines of non-literal displays.
>
> Handling inside term_flushln() is tricky because a tab collapses with
> inter-word spacing, but not with another tab.
>
> Missing feature reported independently by jmc@ and deraadt@.

Moving the term_flushln() around is not yet covered in this diff
but will follow in the next one.  That change is too intricately
entangled with the issue of vertical spacing in tables, which will
be fully handled by the next diff anyway.

Also note that the weird tab stop intervals of 5 and 8 columns
are still used by modern groff.

A test file is attached.

This patch is also available from
  /usr/vhosts/mdocml.bsd.lv/patch/schwarze/05.tabbing.patch

OK?

Yours,
  Ingo


--- term.h	Mon May 17 00:33:00 2010
+++ term.h	Sat May 15 23:09:53 2010
@@ -37,7 +37,8 @@ struct termp {
 	size_t		  maxrmargin;	/* Max right margin. */
 	size_t		  maxcols;	/* Max size of buf. */
 	size_t		  offset;	/* Margin offest. */
+	size_t		  tabwidth;	/* Distance of tab positions. */
 	size_t		  col;		/* Bytes in buf. */
 	int		  overstep;	/* See termp_flushln(). */
 	int		  flags;
 #define	TERMP_SENTENCE	 (1 << 1)	/* Space before a sentence. */
--- term.c	Tue May 18 02:05:38 2010
+++ term.c	Mon May 24 00:45:01 2010
@@ -84,6 +80,7 @@ term_alloc(enum termenc enc, size_t width)
 		perror(NULL);
 		exit(EXIT_FAILURE);
 	}
+	p->tabwidth = 5;
 	p->enc = enc;
 	/* Enforce some lower boundary. */
 	if (width < 60)
@@ -171,6 +169,17 @@ term_flushln(struct termp *p)
 	while (i < (int)p->col) {
 
 		/*
+		 * Handle literal tab characters.
+		 */
+		for (j = i; j < (int)p->col; j++) {
+			if ('\t' != p->buf[j])
+				break;
+			vend = (vis/p->tabwidth+1)*p->tabwidth;
+			vbl += vend - vis;
+			vis = vend;
+		}
+
+		/*
 		 * Count up visible word characters. Control sequences
 		 * (starting with the CSI) aren't counted. A space
 		 * generates a non-printing word, which is valid (the
@@ -178,27 +187,27 @@ term_flushln(struct termp *p)
 		 */
 
 		/* LINTED */
-		for (j = i; j < (int)p->col; j++) {
-			if (j && ' ' == p->buf[j])
+		for ( ; j < (int)p->col; j++) {
+			if ((j && ' ' == p->buf[j]) || '\t' == p->buf[j])
 				break;
 			if (8 == p->buf[j])
 				vend--;
 			else
 				vend++;
 		}
 
 		/*
 		 * Find out whether we would exceed the right margin.
 		 * If so, break to the next line.
 		 */
 		if (vend > bp && vis > 0) {
 			vend -= vis;
 			putchar('\n');
 			if (TERMP_NOBREAK & p->flags) {
 				for (j = 0; j < (int)p->rmargin; j++)
 					putchar(' ');
 				vend += p->rmargin - p->offset;
 			} else {
 				vbl = p->offset;
 			}
@@ -209,8 +224,16 @@ term_flushln(struct termp *p)
 			p->overstep = 0;
 		}
 
+		/*
+		 * Skip leading tabs, they were handled above.
+		 */
+		while (i < (int)p->col && '\t' == p->buf[i])
+			i++;
+
 		/* Write out the [remaining] word. */
 		for ( ; i < (int)p->col; i++) {
+			if ('\t' == p->buf[i])
+				break;
 			if (' ' == p->buf[i]) {
 				while (' ' == p->buf[i]) {
 					vbl++;
--- mdoc_term.c	Mon May 24 16:35:59 2010
+++ mdoc_term.c	Mon May 24 02:37:01 2010
@@ -274,6 +270,7 @@ terminal_mdoc(void *arg, const struct mdoc *mdoc)
 
 	p->overstep = 0;
 	p->maxrmargin = p->defrmargin;
+	p->tabwidth = 5;
 
 	if (NULL == p->symtab)
 		switch (p->enc) {
@@ -1593,6 +1590,7 @@ termp_fa_pre(DECL_ARGS)
 static int
 termp_bd_pre(DECL_ARGS)
 {
+	size_t			 tabwidth;
 	int			 i, type;
 	size_t			 rm, rmax;
 	const struct mdoc_node	*nn;
@@ -1622,6 +1620,8 @@ termp_bd_pre(DECL_ARGS)
 	if (MDOC_Literal != type && MDOC_Unfilled != type)
 		return(1);
 
+	tabwidth = p->tabwidth;
+	p->tabwidth = 8;
 	rm = p->rmargin;
 	rmax = p->maxrmargin;
 	p->rmargin = p->maxrmargin = TERM_MAXMARGIN;
@@ -1629,13 +1629,14 @@ termp_bd_pre(DECL_ARGS)
 	for (nn = n->child; nn; nn = nn->next) {
 		p->flags |= TERMP_NOSPACE;
 		print_mdoc_node(p, pair, m, nn);
 		if (NULL == nn->next)
 			continue;
 		if (nn->prev && nn->prev->line < nn->line)
 			term_flushln(p);
 		else if (NULL == nn->prev)
 			term_flushln(p);
 	}
+	p->tabwidth = tabwidth;
 	p->rmargin = rm;
 	p->maxrmargin = rmax;

--sm4nu43k4a2Rpi4c
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="tab.in"

.Dd $Mdocdate: April 23 2010 $
.Dt SPACE-TAB 1
.Os
.Sh NAME
.Nm space-tab
.Nd handling of literal space characters
.Sh DESCRIPTION
plain text
.br
1 x
.br
22 x
.br
333 x
.br
4444 x
.br
55555 x
.br
666666 x
.br
7777777 x
.br
88888888 x
.br
999999999 x
.br
aaaaaaaaaa x
.br
bbbbbbbbbbb x
.br
cccccccccccc x
.br
ddddddddddddd x
.br
tab space
.br
tab tab
.br
space tab
.br
tab
.br
tab
.br
ragged display
.Bd -ragged -offset 2n
1 x
.br
22 x
.br
333 x
.br
4444 x
.br
55555 x
.br
666666 x
.br
7777777 x
.br
88888888 x
.br
999999999 x
.br
aaaaaaaaaa x
.br
bbbbbbbbbbb x
.br
cccccccccccc x
.br
ddddddddddddd x
.br
tab space
.br
tab tab
.br
space tab
.br
tab
.br
tab
.Ed
literal display
.Bd -literal -offset 2n
1 x
22 x
333 x
4444 x
55555 x
666666 x
7777777 x
88888888 x
999999999 x
aaaaaaaaaa x
bbbbbbbbbbb x
cccccccccccc x
ddddddddddddd x
tab space
tab tab
space tab
tab
tab
.Ed

--sm4nu43k4a2Rpi4c--
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv