From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from localhost (fantadrom.bsd.lv [local]) by fantadrom.bsd.lv (OpenSMTPD) with ESMTPA id ddec8671 for ; Thu, 3 Jan 2019 15:00:25 -0500 (EST) Date: Thu, 3 Jan 2019 15:00:25 -0500 (EST) X-Mailinglist: mandoc-source Reply-To: source@mandoc.bsd.lv MIME-Version: 1.0 From: schwarze@mandoc.bsd.lv To: source@mandoc.bsd.lv Subject: mandoc: Rewrite the line filling function for terminal output yet again. X-Mailer: activitymail 1.26, http://search.cpan.org/dist/activitymail/ Content-Type: text/plain; charset=utf-8 Message-ID: <05f93c27c6046b2b@fantadrom.bsd.lv> Log Message: ----------- Rewrite the line filling function for terminal output yet again. This function has always been among the most complicated parts of mandoc, and it repeatedly needed substantial functional enhancements. The present rewrite is required to prepare for the implementation of simultaneous filling and centering of output lines. The previous implementation looked at each word in turn and printed it to the output stream as soon as it was found to still fit on the current output line. Obviously, that approach neither allows centering nor adjustment to the right margin. The new implementation first decides which part of the paragraph to put onto the current output line, also measuring the display width of that part, even if that part consists of multiple words including intervening whitespace. This will allow moving the whole output line to the right as desired before printing it, for example to center it or to adjust it to the right margin. The function is split into three parts, each much shorter, solving a better defined task, much easier to understand and better commented: 1. the steering function term_flushln() looping over output lines; 2. the calculation function term_fill() looping over input characters; 3. and the output function term_field() looping over printed characters. No functional change yet. Modified Files: -------------- mandoc: term.c Revision Data ------------- Index: term.c =================================================================== RCS file: /home/cvs/mandoc/mandoc/term.c,v retrieving revision 1.277 retrieving revision 1.278 diff -Lterm.c -Lterm.c -u -p -r1.277 -r1.278 --- term.c +++ term.c @@ -1,7 +1,7 @@ /* $Id$ */ /* * Copyright (c) 2008, 2009, 2010, 2011 Kristaps Dzonsons - * Copyright (c) 2010-2018 Ingo Schwarze + * Copyright (c) 2010-2019 Ingo Schwarze * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above @@ -21,6 +21,7 @@ #include #include +#include #include #include #include @@ -37,6 +38,10 @@ static void bufferc(struct termp *, ch static void encode(struct termp *, const char *, size_t); static void encode1(struct termp *, int); static void endline(struct termp *); +static void term_field(struct termp *, size_t, size_t, + size_t, size_t); +static void term_fill(struct termp *, size_t *, size_t *, + size_t); void @@ -83,241 +88,310 @@ term_end(struct termp *p) * Flush a chunk of text. By default, break the output line each time * the right margin is reached, and continue output on the next line * at the same offset as the chunk itself. By default, also break the - * output line at the end of the chunk. - * The following flags may be specified: - * - * - TERMP_NOBREAK: Do not break the output line at the right margin, - * but only at the max right margin. Also, do not break the output - * line at the end of the chunk, such that the next call can pad to - * the next column. However, if less than p->trailspace blanks, - * which can be 0, 1, or 2, remain to the right margin, the line - * will be broken. - * - TERMP_BRTRSP: Consider trailing whitespace significant - * when deciding whether the chunk fits or not. - * - TERMP_BRIND: If the chunk does not fit and the output line has - * to be broken, start the next line at the right margin instead - * of at the offset. Used together with TERMP_NOBREAK for the tags - * in various kinds of tagged lists. - * - TERMP_HANG: Do not break the output line at the right margin, - * append the next chunk after it even if this one is too long. - * To be used together with TERMP_NOBREAK. - * - TERMP_NOPAD: Start writing at the current position, - * do not pad with blank characters up to the offset. + * output line at the end of the chunk. There are many flags modifying + * this behaviour, see the comments in the body of the function. */ void term_flushln(struct termp *p) { - size_t vis; /* current visual position on output */ - size_t vbl; /* number of blanks to prepend to output */ - size_t vend; /* end of word visual position on output */ - size_t bp; /* visual right border position */ - size_t dv; /* temporary for visual pos calculations */ - size_t j; /* temporary loop index for p->tcol->buf */ - size_t jhy; /* last hyph before overflow w/r/t j */ - size_t maxvis; /* output position of visible boundary */ - int ntab; /* number of tabs to prepend */ - int breakline; /* after this word */ + size_t vbl; /* Number of blanks to prepend to the output. */ + size_t vbr; /* Actual visual position of the end of field. */ + size_t vfield; /* Desired visual field width. */ + size_t vtarget; /* Desired visual position of the right margin. */ + size_t ic; /* Character position in the input buffer. */ + size_t nbr; /* Number of characters to print in this field. */ + + /* + * Normally, start writing at the left margin, but with the + * NOPAD flag, start writing at the current position instead. + */ vbl = (p->flags & TERMP_NOPAD) || p->tcol->offset < p->viscol ? 0 : p->tcol->offset - p->viscol; if (p->minbl && vbl < p->minbl) vbl = p->minbl; - maxvis = p->tcol->rmargin > p->viscol + vbl ? - p->tcol->rmargin - p->viscol - vbl : 0; - bp = !(p->flags & TERMP_NOBREAK) ? maxvis : - p->maxrmargin > p->viscol + vbl ? - p->maxrmargin - p->viscol - vbl : 0; - vis = vend = 0; if ((p->flags & TERMP_MULTICOL) == 0) p->tcol->col = 0; - while (p->tcol->col < p->tcol->lastcol) { + + /* Loop over output lines. */ + + for (;;) { + vfield = p->tcol->rmargin > p->viscol + vbl ? + p->tcol->rmargin - p->viscol - vbl : 0; /* - * Handle literal tab characters: collapse all - * subsequent tabs into a single huge set of spaces. + * Normally, break the line at the the right margin + * of the field, but with the NOBREAK flag, only + * break it at the max right margin of the screen, + * and with the BRNEVER flag, never break it at all. */ - ntab = 0; - while (p->tcol->col < p->tcol->lastcol && - p->tcol->buf[p->tcol->col] == '\t') { - vend = term_tab_next(vis); - vbl += vend - vis; - vis = vend; - ntab++; - p->tcol->col++; - } + vtarget = p->flags & TERMP_BRNEVER ? SIZE_MAX : + (p->flags & TERMP_NOBREAK) == 0 ? vfield : + p->maxrmargin > p->viscol + vbl ? + p->maxrmargin - p->viscol - vbl : 0; /* - * Count up visible word characters. Control sequences - * (starting with the CSI) aren't counted. A space - * generates a non-printing word, which is valid (the - * space is printed according to regular spacing rules). + * Figure out how much text will fit in the field. + * If there is whitespace only, print nothing. + * Otherwise, print the field content. */ - jhy = 0; - breakline = 0; - for (j = p->tcol->col; j < p->tcol->lastcol; j++) { - if (p->tcol->buf[j] == '\n') { - if ((p->flags & TERMP_BRIND) == 0) - breakline = 1; - continue; - } - if (p->tcol->buf[j] == ' ' || p->tcol->buf[j] == '\t') - break; + term_fill(p, &nbr, &vbr, vtarget); + if (nbr == 0) + break; + + term_field(p, vbl, nbr, vbr, vtarget); - /* Back over the last printed character. */ - if (p->tcol->buf[j] == '\b') { - assert(j); - vend -= (*p->width)(p, p->tcol->buf[j - 1]); + /* + * If there is no text left in the field, exit the loop. + * If the BRTRSP flag is set, consider trailing + * whitespace significant when deciding whether + * the field fits or not. + */ + + for (ic = p->tcol->col; ic < p->tcol->lastcol; ic++) { + switch (p->tcol->buf[ic]) { + case '\t': + if (p->flags & TERMP_BRTRSP) + vbr = term_tab_next(vbr); + continue; + case ' ': + if (p->flags & TERMP_BRTRSP) + vbr += (*p->width)(p, ' '); continue; + case '\n': + case ASCII_BREAK: + continue; + default: + break; } + break; + } + if (ic == p->tcol->lastcol) + break; - /* Regular word. */ - /* Break at the hyphen point if we overrun. */ - if (vend > vis && vend < bp && - (p->tcol->buf[j] == ASCII_HYPH|| - p->tcol->buf[j] == ASCII_BREAK)) - jhy = j; + /* + * At the location of an automtic line break, input + * space characters are consumed by the line break. + */ - /* - * Hyphenation now decided, put back a real - * hyphen such that we get the correct width. - */ - if (p->tcol->buf[j] == ASCII_HYPH) - p->tcol->buf[j] = '-'; + while (p->tcol->col < p->tcol->lastcol && + p->tcol->buf[p->tcol->col] == ' ') + p->tcol->col++; - vend += (*p->width)(p, p->tcol->buf[j]); - } + /* + * In multi-column mode, leave the rest of the text + * in the buffer to be handled by a subsequent + * invocation, such that the other columns of the + * table can be handled first. + * In single-column mode, simply break the line. + */ + + if (p->flags & TERMP_MULTICOL) + return; + + endline(p); + p->viscol = 0; /* - * Find out whether we would exceed the right margin. - * If so, break to the next line. + * Normally, start the next line at the same indentation + * as this one, but with the BRIND flag, start it at the + * right margin instead. This is used together with + * NOBREAK for the tags in various kinds of tagged lists. */ - if (vend > bp && jhy == 0 && vis > 0 && - (p->flags & TERMP_BRNEVER) == 0) { - if (p->flags & TERMP_MULTICOL) - return; + vbl = p->flags & TERMP_BRIND ? + p->tcol->rmargin : p->tcol->offset; + } - endline(p); - vend -= vis; + /* Reset output state in preparation for the next field. */ - /* Use pending tabs on the new line. */ + p->col = p->tcol->col = p->tcol->lastcol = 0; + p->minbl = p->trailspace; + p->flags &= ~(TERMP_BACKAFTER | TERMP_BACKBEFORE | TERMP_NOPAD); - vbl = 0; - while (ntab--) - vbl = term_tab_next(vbl); + if (p->flags & TERMP_MULTICOL) + return; - /* Re-establish indentation. */ + /* + * The HANG flag means that the next field + * always follows on the same line. + * The NOBREAK flag means that the next field + * follows on the same line unless the field was overrun. + * Normally, break the line at the end of each field. + */ - if (p->flags & TERMP_BRIND) - vbl += p->tcol->rmargin; - else - vbl += p->tcol->offset; - maxvis = p->tcol->rmargin > vbl ? - p->tcol->rmargin - vbl : 0; - bp = !(p->flags & TERMP_NOBREAK) ? maxvis : - p->maxrmargin > vbl ? p->maxrmargin - vbl : 0; - } + if ((p->flags & TERMP_HANG) == 0 && + ((p->flags & TERMP_NOBREAK) == 0 || + vbr + term_len(p, p->trailspace) > vfield)) + endline(p); +} - /* - * Write out the rest of the word. - */ +/* + * Store the number of input characters to print in this field in *nbr + * and their total visual width to print in *vbr. + * If there is only whitespace in the field, both remain zero. + * The desired visual width of the field is provided by vtarget. + * If the first word is longer, the field will be overrun. + */ +static void +term_fill(struct termp *p, size_t *nbr, size_t *vbr, size_t vtarget) +{ + size_t ic; /* Character position in the input buffer. */ + size_t vis; /* Visual position of the current character. */ + size_t vn; /* Visual position of the next character. */ + int breakline; /* Break at the end of this word. */ + int graph; /* Last character was non-blank. */ + + *nbr = *vbr = vis = 0; + breakline = graph = 0; + for (ic = p->tcol->col; ic < p->tcol->lastcol; ic++) { + switch (p->tcol->buf[ic]) { + case '\b': /* Escape \o (overstrike) or backspace markup. */ + assert(ic > 0); + vis -= (*p->width)(p, p->tcol->buf[ic - 1]); + continue; - for ( ; p->tcol->col < p->tcol->lastcol; p->tcol->col++) { - if (vend > bp && jhy > 0 && p->tcol->col > jhy) + case '\t': /* Normal ASCII whitespace. */ + case ' ': + case ASCII_BREAK: /* Escape \: (breakpoint). */ + switch (p->tcol->buf[ic]) { + case '\t': + vn = term_tab_next(vis); break; - if (p->tcol->buf[p->tcol->col] == '\n') - continue; - if (p->tcol->buf[p->tcol->col] == '\t') + case ' ': + vn = vis + (*p->width)(p, ' '); break; - if (p->tcol->buf[p->tcol->col] == ' ') { - j = p->tcol->col; - while (p->tcol->col < p->tcol->lastcol && - p->tcol->buf[p->tcol->col] == ' ') - p->tcol->col++; - dv = (p->tcol->col - j) * (*p->width)(p, ' '); - vbl += dv; - vend += dv; + case ASCII_BREAK: + vn = vis; break; } - if (p->tcol->buf[p->tcol->col] == ASCII_NBRSP) { - vbl += (*p->width)(p, ' '); - continue; + /* Can break at the end of a word. */ + if (breakline || vn > vtarget) + break; + if (graph) { + *nbr = ic; + *vbr = vis; + graph = 0; } - if (p->tcol->buf[p->tcol->col] == ASCII_BREAK) - continue; + vis = vn; + continue; + + case '\n': /* Escape \p (break at the end of the word). */ + breakline = 1; + continue; + case ASCII_HYPH: /* Breakable hyphen. */ + graph = 1; /* - * Now we definitely know there will be - * printable characters to output, - * so write preceding white space now. + * We are about to decide whether to break the + * line or not, so we no longer need this hyphen + * to be marked as breakable. Put back a real + * hyphen such that we get the correct width. */ - if (vbl) { - (*p->advance)(p, vbl); - p->viscol += vbl; - vbl = 0; + p->tcol->buf[ic] = '-'; + vis += (*p->width)(p, '-'); + if (vis > vtarget) { + ic++; + break; } - - (*p->letter)(p, p->tcol->buf[p->tcol->col]); - if (p->tcol->buf[p->tcol->col] == '\b') - p->viscol -= (*p->width)(p, - p->tcol->buf[p->tcol->col - 1]); - else - p->viscol += (*p->width)(p, - p->tcol->buf[p->tcol->col]); - } - vis = vend; - - if (breakline == 0) + *nbr = ic + 1; + *vbr = vis; continue; - /* Explicitly requested output line break. */ - - if (p->flags & TERMP_MULTICOL) - return; - - endline(p); - breakline = 0; - vis = vend = 0; - - /* Re-establish indentation. */ - - vbl = p->tcol->offset; - maxvis = p->tcol->rmargin > vbl ? - p->tcol->rmargin - vbl : 0; - bp = !(p->flags & TERMP_NOBREAK) ? maxvis : - p->maxrmargin > vbl ? p->maxrmargin - vbl : 0; + case ASCII_NBRSP: /* Non-breakable space. */ + p->tcol->buf[ic] = ' '; + /* FALLTHROUGH */ + default: /* Printable character. */ + graph = 1; + vis += (*p->width)(p, p->tcol->buf[ic]); + if (vis > vtarget && *nbr > 0) + return; + continue; + } + break; } /* - * If there was trailing white space, it was not printed; - * so reset the cursor position accordingly. + * If the last word extends to the end of the field without any + * trailing whitespace, the loop could not check yet whether it + * can remain on this line. So do the check now. */ - if (vis > vbl) - vis -= vbl; - else - vis = 0; + if (graph && (vis <= vtarget || *nbr == 0)) { + *nbr = ic; + *vbr = vis; + } +} - p->col = p->tcol->col = p->tcol->lastcol = 0; - p->minbl = p->trailspace; - p->flags &= ~(TERMP_BACKAFTER | TERMP_BACKBEFORE | TERMP_NOPAD); +/* + * Print the contents of one field + * with an indentation of vbl visual columns, + * an input string length of nbr characters, + * an output width of vbr visual columns, + * and a desired field width of vtarget visual columns. + */ +static void +term_field(struct termp *p, size_t vbl, size_t nbr, size_t vbr, size_t vtarget) +{ + size_t ic; /* Character position in the input buffer. */ + size_t vis; /* Visual position of the current character. */ + size_t dv; /* Visual width of the current character. */ + size_t vn; /* Visual position of the next character. */ - if (p->flags & TERMP_MULTICOL) - return; + vis = 0; + for (ic = p->tcol->col; ic < nbr; ic++) { - /* Trailing whitespace is significant in some columns. */ + /* + * To avoid the printing of trailing whitespace, + * do not print whitespace right away, only count it. + */ - if (vis && vbl && (TERMP_BRTRSP & p->flags)) - vis += vbl; + switch (p->tcol->buf[ic]) { + case '\n': + case ASCII_BREAK: + continue; + case '\t': + vn = term_tab_next(vis); + vbl += vn - vis; + vis = vn; + continue; + case ' ': + case ASCII_NBRSP: + vbl++; + vis++; + continue; + default: + break; + } - /* If the column was overrun, break the line. */ - if ((p->flags & TERMP_NOBREAK) == 0 || - ((p->flags & TERMP_HANG) == 0 && - vis + p->trailspace * (*p->width)(p, ' ') > maxvis)) - endline(p); + /* + * We found a non-blank character to print, + * so write preceding white space now. + */ + + if (vbl > 0) { + (*p->advance)(p, vbl); + p->viscol += vbl; + vbl = 0; + } + + /* Print the character and adjust the visual position. */ + + (*p->letter)(p, p->tcol->buf[ic]); + if (p->tcol->buf[ic] == '\b') { + dv = (*p->width)(p, p->tcol->buf[ic - 1]); + p->viscol -= dv; + vis -= dv; + } else { + dv = (*p->width)(p, p->tcol->buf[ic]); + p->viscol += dv; + vis += dv; + } + } + p->tcol->col = nbr; } static void -- To unsubscribe send an email to source+unsubscribe@mandoc.bsd.lv