From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp1.rz.uni-karlsruhe.de (Debian-exim@smtp1.rz.uni-karlsruhe.de [129.13.185.217])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id o6DNeNKt000692
	for ; Tue, 13 Jul 2010 19:40:23 -0400 (EDT)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
	by smtp1.rz.uni-karlsruhe.de with esmtp (Exim 4.63 #1)
	id 1OYp5I-0002iM-Ez; Wed, 14 Jul 2010 01:40:22 +0200
Received: from donnerwolke.usta.de ([172.24.96.3])
	by hekate.usta.de with esmtp (Exim 4.71) (envelope-from )
	id 1OYp5I-0002Zi-Da for tech@mdocml.bsd.lv; Wed, 14 Jul 2010 01:40:20 +0200
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
	by donnerwolke.usta.de with esmtp (Exim 4.69) (envelope-from )
	id 1OYp5I-0003mR-Cq for tech@mdocml.bsd.lv; Wed, 14 Jul 2010 01:40:20 +0200
Received: from schwarze by usta.de with local (Exim 4.71) (envelope-from )
	id 1OYp5I-0005Ja-2m for tech@mdocml.bsd.lv; Wed, 14 Jul 2010 01:40:20 +0200
Date: Wed, 14 Jul 2010 01:40:19 +0200
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Subject: [PATCH] improve mandoc_eos
Message-ID: <20100713234019.GF31123@iris.usta.de>
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-06-14)

Hi,

the mandoc_eos utility function can be improved a bit.

Currently, it always flags end-of-sentence after a trailing full stop,
exclamation mark or question mark, even if that character is the only
one in the text.

There are two cases where flagging EOS is correct:

 1) Alphanumeric characters precede the punctuation, e.g.

      This is a sentence.
      Next sentence...

    This should render as:

      This is a sentence.  Next sentence...

 2) Even when there are no preceding alphanumeric characters,
    the punctuation is neither followed by closing delimiters
    nor a child of a macro, as in this case:

      Here is a full stop
      .Pq quite lonely .
      Next sentence...

    This should render as:

      Here is a full stop (quite lonely).  Next sentence...

But there are two cases where setting the flag is wrong:

 3) The punctuation is followed by closing delimiters
    and not preceded by alphanumeric characters, e.g.

      There is no full stop (.)
      in this sentence

    This should render as:

      There is no full stop (.) in this sentence

 4) The punctuation is a child of a macro
    and not preceded by alphanumeric characters, e.g.

      There is no full stop
      .Pq \&.
      in this sentence

    This should also render as:

      There is no full stop (.) in this sentence

The last case requires context information, so mandoc_eos needs
another argument telling it whether the text is enclosed in a macro.

The other changes are:

 * When finding a trailing closing delimiter, set the same
   "enclosed" flag.
 * When finding ".!?", don't return(1) at once, just set the
   "found" flag.
 * Make the decision when finding a non-punctuation character.
 * We can now drop the special handling of backslashes, because
   a backslash is not alphanumeric.
 * Avoid indexing p on each loop cycle.

This fixes lots of spacing problems, e.g.

 - chown(8):
     Previous versions of the chown utility used the dot (`.') character
 - csh(1):
     part of the prompt by placing a `!' in the prompt string.
     special case, `!!' refers to the previous command; thus `!!' alone is a
     newline follows immediately as may the trailing `?' in a contextual
     (containing, e.g., *'s, ?'s, and instances of ``[...]'') against
     is ``.'', ``/bin'', ``/usr/bin'', ``/sbin'' and
     Commands within loops, prompted for by `?', are not placed on the
 - ed(1):
     entering a single period (`.') on a line.
     with a bang (`!'), then it is interpreted as a shell command.
 - ls(1):
     List all entries except for `.' and `..'. Always set for
     the character `?'; this is the default when output is to a terminal.

OK?
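
For readers who want to try the four cases above in isolation, here is a
minimal standalone sketch of the decision logic in C. It is an
illustration under stated assumptions, not the committed code: the name
eos_sketch is hypothetical and not part of mandoc, the loop counts down
a length instead of walking a pointer, and the character is cast to
unsigned char before isalnum is called. Both of those choices belong to
this sketch only.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * eos_sketch: hypothetical stand-in for the patched mandoc_eos.
 *   p, sz     text to inspect and its length
 *   enclosed  nonzero if the text is the child of a macro
 * Returns 1 if an end-of-sentence should be flagged, 0 otherwise.
 */
int
eos_sketch(const char *p, size_t sz, int enclosed)
{
	int found = 0;	/* set once trailing '.', '!' or '?' was seen */

	/* Scan backwards over trailing delimiters and punctuation. */
	while (sz > 0) {
		unsigned char c = (unsigned char)p[sz - 1];

		switch (c) {
		case '"':
		case '\'':
		case ']':
		case ')':
			/* A closing delimiter follows the punctuation. */
			if (!found)
				enclosed = 1;
			break;
		case '.':
		case '!':
		case '?':
			found = 1;
			break;
		default:
			/*
			 * First other character: when enclosed, only an
			 * alphanumeric predecessor makes this a sentence end.
			 */
			return found && (!enclosed || isalnum(c));
		}
		sz--;
	}

	/* Nothing but punctuation and delimiters in the text. */
	return found && !enclosed;
}
```

With this sketch, "This is a sentence." with enclosed set to 0 yields 1
(case 1), a lone "." with enclosed 0 yields 1 (case 2), text ending in
"(.)" with enclosed 0 yields 0 (case 3), and "\&." with enclosed 1
yields 0 (case 4).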

Yours,
  Ingo


Index: libmandoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/libmandoc.h,v
retrieving revision 1.6
diff -u -p -r1.6 libmandoc.h
--- libmandoc.h	26 Jun 2010 17:56:43 -0000	1.6
+++ libmandoc.h	13 Jul 2010 22:57:32 -0000
@@ -29,7 +29,7 @@ time_t mandoc_a2time(int, const char *
 #define MTIME_REDUCED (1 << 1)
 #define MTIME_MDOCDATE (1 << 2)
 #define MTIME_ISO_8601 (1 << 3)
-int mandoc_eos(const char *, size_t);
+int mandoc_eos(const char *, size_t, int);
 int mandoc_hyph(const char *, const char *);
 
 __END_DECLS
Index: man.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/man.c,v
retrieving revision 1.36
diff -u -p -r1.36 man.c
--- man.c	13 Jul 2010 01:09:13 -0000	1.36
+++ man.c	13 Jul 2010 22:57:32 -0000
@@ -405,7 +405,7 @@ man_ptext(struct man *m, int line, char
 	 */
 
 	assert(i);
-	if (mandoc_eos(buf, (size_t)i))
+	if (mandoc_eos(buf, (size_t)i, 0))
 		m->last->flags |= MAN_EOS;
 
 descope:
Index: mandoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandoc.c,v
retrieving revision 1.14
diff -u -p -r1.14 mandoc.c
--- mandoc.c	26 Jun 2010 17:56:43 -0000	1.14
+++ mandoc.c	13 Jul 2010 22:57:34 -0000
@@ -324,8 +324,10 @@ mandoc_a2time(int flags, const char *p)
 
 
 int
-mandoc_eos(const char *p, size_t sz)
+mandoc_eos(const char *p, size_t sz, int enclosed)
 {
+	const char *q;
+	int found = 0;
 
 	if (0 == sz)
 		return(0);
@@ -336,8 +338,8 @@ mandoc_eos(const char *p, size_t sz)
 	 * propogate outward.
 	 */
 
-	for ( ; sz; sz--) {
-		switch (p[(int)sz - 1]) {
+	for (q = p + sz - 1; q >= p; q--) {
+		switch (*q) {
 		case ('\"'):
 			/* FALLTHROUGH */
 		case ('\''):
@@ -345,22 +347,22 @@ mandoc_eos(const char *p, size_t sz)
 		case (']'):
 			/* FALLTHROUGH */
 		case (')'):
+			if (0 == found)
+				enclosed = 1;
 			break;
 		case ('.'):
-			/* Escaped periods. */
-			if (sz > 1 && '\\' == p[(int)sz - 2])
-				return(0);
 			/* FALLTHROUGH */
 		case ('!'):
 			/* FALLTHROUGH */
 		case ('?'):
-			return(1);
+			found = 1;
+			break;
 		default:
-			return(0);
+			return(found && (!enclosed || isalnum(*q)));
 		}
 	}
 
-	return(0);
+	return(found && !enclosed);
 }
 
 
Index: mdoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.c,v
retrieving revision 1.61
diff -u -p -r1.61 mdoc.c
--- mdoc.c	13 Jul 2010 01:09:13 -0000	1.61
+++ mdoc.c	13 Jul 2010 22:57:36 -0000
@@ -717,7 +717,7 @@ mdoc_ptext(struct mdoc *m, int line, cha
 
 	assert(buf < end);
 
-	if (mandoc_eos(buf+offs, (size_t)(end-buf-offs)))
+	if (mandoc_eos(buf+offs, (size_t)(end-buf-offs), 0))
 		m->last->flags |= MDOC_EOS;
 
 	return(1);
Index: mdoc_macro.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_macro.c,v
retrieving revision 1.54
diff -u -p -r1.54 mdoc_macro.c
--- mdoc_macro.c	13 Jul 2010 01:09:13 -0000	1.54
+++ mdoc_macro.c	13 Jul 2010 22:57:39 -0000
@@ -606,7 +606,7 @@ append_delims(struct mdoc *m, int line,
 		 * knowing which symbols break this behaviour, for
 		 * example, `. ;' shouldn't propogate the double-space.
 		 */
-		if (mandoc_eos(p, strlen(p)))
+		if (mandoc_eos(p, strlen(p), 0))
 			m->last->flags |= MDOC_EOS;
 	}
 
@@ -1262,7 +1262,7 @@ blk_part_imp(MACRO_PROT_ARGS)
 	 */
 
 	if (n && MDOC_TEXT == n->type && n->string)
-		if (mandoc_eos(n->string, strlen(n->string)))
+		if (mandoc_eos(n->string, strlen(n->string), 1))
 			n->flags |= MDOC_EOS;
 
 	/* Up-propogate the end-of-space flag. */
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv