From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from krisdoz.my.domain (kristaps@localhost [127.0.0.1]) by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id p81M9pNp006610 for ; Thu, 1 Sep 2011 18:09:51 -0400 (EDT)
Received: (from kristaps@localhost) by krisdoz.my.domain (8.14.3/8.14.3/Submit) id p81M9o3d011561; Thu, 1 Sep 2011 18:09:50 -0400 (EDT)
Date: Thu, 1 Sep 2011 18:09:50 -0400 (EDT)
Message-Id: <201109012209.p81M9o3d011561@krisdoz.my.domain>
X-Mailinglist: mdocml-source
Reply-To: source@mdocml.bsd.lv
MIME-Version: 1.0
From: kristaps@mdocml.bsd.lv
To: source@mdocml.bsd.lv
Subject: mdocml: Make `-w' mode work much better.
X-Mailer: activitymail 1.26, http://search.cpan.org/dist/activitymail/
Content-Type: text/plain; charset=utf-8

Log Message:
-----------
Make `-w' mode work much better.
This is INCREDIBLY poorly specified in any other deroff manual, and as I
don't think anybody actually uses deroff, I don't feel compelled to
research its behaviour too much and can just do what's logical.

Modified Files:
--------------
    mdocml:
        demandoc.1
        demandoc.c

Revision Data
-------------
Index: demandoc.1
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/demandoc.1,v
retrieving revision 1.2
retrieving revision 1.3
diff -Ldemandoc.1 -Ldemandoc.1 -u -p -r1.2 -r1.3
--- demandoc.1
+++ demandoc.1
@@ -39,9 +39,10 @@ Its arguments are as follows:
 Output a word list.
 This outputs each word of text on its own line.
 A
-.Qq word
-starts with at least two letters and consists of at least three letters
-total.
+.Qq word ,
+in this case, refers to whitespace-delimited terms beginning with at
+least two letters after opening punctuation and not consisting of any
+escape sequences.
 .It Ar
 The input files.
 .El
@@ -51,12 +52,13 @@ If
 is not provided,
 .Nm
 accepts standard input.
+If a document is not well-formed, it is skipped.
 .Pp
 By default,
 .Nm
 parses its input and outputs only text nodes, preserving line column
 position.
-If a document is not well-formed, it is skipped.
+Escape sequences are omitted from the output.
 .Pp
 The
 .Fl i ,
Index: demandoc.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/demandoc.c,v
retrieving revision 1.4
retrieving revision 1.5
diff -Ldemandoc.c -Ldemandoc.c -u -p -r1.4 -r1.5
--- demandoc.c
+++ demandoc.c
@@ -121,7 +121,8 @@ pmandoc(struct mparse *mp, int fd, const
 	else
 		return;
 
-	putchar('\n');
+	if ( ! list)
+		putchar('\n');
 }
 
 /*
@@ -131,12 +132,58 @@ static void
 pstring(const char *p, int col, int *colp, int list)
 {
 	enum mandoc_esc esc;
+	const char *start;
+	int emit;
+
+	/*
+	 * Print as many column spaces til we achieve parity with the
+	 * input document.
+	 */
+
+again:
+	if (list && '\0' != *p) {
+		while (isspace((unsigned char)*p))
+			p++;
+
+		while ('\'' == *p || '(' == *p || '"' == *p)
+			p++;
+
+		emit = isalpha((unsigned char)p[0]) &&
+			isalpha((unsigned char)p[1]);
+
+		for (start = p; '\0' != *p; p++)
+			if ('\\' == *p) {
+				p++;
+				esc = mandoc_escape(&p, NULL, NULL);
+				if (ESCAPE_ERROR == esc)
+					return;
+				emit = 0;
+			} else if (isspace((unsigned char)*p))
+				break;
+
+		if (emit && p - start >= 2) {
+			for ( ; start != p; start++)
+				if (ASCII_HYPH == *start)
+					putchar('-');
+				else
+					putchar((unsigned char)*start);
+			putchar('\n');
+		}
+
+		if (isspace((unsigned char)*p))
+			goto again;
+
+		return;
+	}
 
 	while (*colp < col) {
 		putchar(' ');
 		(*colp)++;
 	}
 
+	/*
+	 * Print the input word, skipping any special characters.
+	 */
 	while ('\0' != *p)
 		if ('\\' == *p) {
 			p++;
@@ -152,6 +199,14 @@ pstring(const char *p, int col, int *col
 static void
 pline(int line, int *linep, int *col, int list)
 {
+
+	if (list)
+		return;
+
+	/*
+	 * Print out as many lines as needed to reach parity with the
+	 * original input.
+	 */
 	while (*linep < line) {
 		putchar('\n');
--
To unsubscribe send an email to source+unsubscribe@mdocml.bsd.lv