* [PATCH] improve mandoc_eos
@ 2010-07-13 23:40 Ingo Schwarze
From: Ingo Schwarze @ 2010-07-13 23:40 UTC
To: tech
Hi,
the mandoc_eos utility function can be improved a bit.
Currently, it always flags end-of-sentence after a trailing
full stop, exclamation mark or question mark, even if that
character is the only one in the text.
There are two cases where flagging EOS is correct:
1) Alphanumeric characters preceding the punctuation, e.g.
This is a sentence.
Next sentence...
This should render as:
This is a sentence. Next sentence...
2) Even when there are no preceding alphanumeric characters,
the punctuation is neither followed by closing delimiters
nor the child of a macro, as in this case:
Here is a full stop
.Pq quite lonely .
Next sentence...
This should render as:
Here is a full stop (quite lonely). Next sentence...
But there are two cases where setting the flag is wrong:
3) The punctuation is followed by closing delimiters
and not preceded by alphanumeric characters, e.g.
There is no full stop (.) in this sentence
This should render as:
There is no full stop (.) in this sentence
4) The punctuation is the child of a macro
and not preceded by alphanumeric characters, e.g.
There is no full stop
.Pq \&.
in this sentence
This should also render as:
There is no full stop (.) in this sentence
The last case requires context information, so mandoc_eos needs
another argument, telling whether the text is enclosed in a macro.
The other changes are:
* When finding trailing closing delimiters, set the same "enclosed" flag.
* When finding ".!?", don't return(1) at once; just set the "found" flag.
* Make the decision when finding a non-punctuation character.
* We can now drop the backslash special handling, since a backslash is not alphanumeric.
* Avoid indexing p for each loop cycle.
This fixed lots of spacing problems, e.g.
- chown(8):
Previous versions of the chown utility used the dot (`.') character
- csh(1):
part of the prompt by placing a `!' in the prompt string.
special case, `!!' refers to the previous command; thus `!!' alone is a
newline follows immediately as may the trailing `?' in a contextual
(containing, e.g., *'s, ?'s, and instances of ``[...]'') against
is ``.'', ``/bin'', ``/usr/bin'', ``/sbin'' and
Commands within loops, prompted for by `?', are not placed on the
- ed(1):
entering a single period (`.') on a line.
with a bang (`!'), then it is interpreted as a shell command.
- ls(1):
List all entries except for `.' and `..'. Always set for the
character `?'; this is the default when output is to a terminal.
OK?
Yours,
Ingo
Index: libmandoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/libmandoc.h,v
retrieving revision 1.6
diff -u -p -r1.6 libmandoc.h
--- libmandoc.h 26 Jun 2010 17:56:43 -0000 1.6
+++ libmandoc.h 13 Jul 2010 22:57:32 -0000
@@ -29,7 +29,7 @@ time_t mandoc_a2time(int, const char *
#define MTIME_REDUCED (1 << 1)
#define MTIME_MDOCDATE (1 << 2)
#define MTIME_ISO_8601 (1 << 3)
-int mandoc_eos(const char *, size_t);
+int mandoc_eos(const char *, size_t, int);
int mandoc_hyph(const char *, const char *);
__END_DECLS
Index: man.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/man.c,v
retrieving revision 1.36
diff -u -p -r1.36 man.c
--- man.c 13 Jul 2010 01:09:13 -0000 1.36
+++ man.c 13 Jul 2010 22:57:32 -0000
@@ -405,7 +405,7 @@ man_ptext(struct man *m, int line, char
*/
assert(i);
- if (mandoc_eos(buf, (size_t)i))
+ if (mandoc_eos(buf, (size_t)i, 0))
m->last->flags |= MAN_EOS;
descope:
Index: mandoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandoc.c,v
retrieving revision 1.14
diff -u -p -r1.14 mandoc.c
--- mandoc.c 26 Jun 2010 17:56:43 -0000 1.14
+++ mandoc.c 13 Jul 2010 22:57:34 -0000
@@ -324,8 +324,10 @@ mandoc_a2time(int flags, const char *p)
int
-mandoc_eos(const char *p, size_t sz)
+mandoc_eos(const char *p, size_t sz, int enclosed)
{
+ const char *q;
+ int found = 0;
if (0 == sz)
return(0);
@@ -336,8 +338,8 @@ mandoc_eos(const char *p, size_t sz)
* propogate outward.
*/
- for ( ; sz; sz--) {
- switch (p[(int)sz - 1]) {
+ for (q = p + sz - 1; q >= p; q--) {
+ switch (*q) {
case ('\"'):
/* FALLTHROUGH */
case ('\''):
@@ -345,22 +347,22 @@ mandoc_eos(const char *p, size_t sz)
case (']'):
/* FALLTHROUGH */
case (')'):
+ if (0 == found)
+ enclosed = 1;
break;
case ('.'):
- /* Escaped periods. */
- if (sz > 1 && '\\' == p[(int)sz - 2])
- return(0);
/* FALLTHROUGH */
case ('!'):
/* FALLTHROUGH */
case ('?'):
- return(1);
+ found = 1;
+ break;
default:
- return(0);
+ return(found && (!enclosed || isalnum(*q)));
}
}
- return(0);
+ return(found && !enclosed);
}
Index: mdoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.c,v
retrieving revision 1.61
diff -u -p -r1.61 mdoc.c
--- mdoc.c 13 Jul 2010 01:09:13 -0000 1.61
+++ mdoc.c 13 Jul 2010 22:57:36 -0000
@@ -717,7 +717,7 @@ mdoc_ptext(struct mdoc *m, int line, cha
assert(buf < end);
- if (mandoc_eos(buf+offs, (size_t)(end-buf-offs)))
+ if (mandoc_eos(buf+offs, (size_t)(end-buf-offs), 0))
m->last->flags |= MDOC_EOS;
return(1);
Index: mdoc_macro.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_macro.c,v
retrieving revision 1.54
diff -u -p -r1.54 mdoc_macro.c
--- mdoc_macro.c 13 Jul 2010 01:09:13 -0000 1.54
+++ mdoc_macro.c 13 Jul 2010 22:57:39 -0000
@@ -606,7 +606,7 @@ append_delims(struct mdoc *m, int line,
* knowing which symbols break this behaviour, for
* example, `. ;' shouldn't propogate the double-space.
*/
- if (mandoc_eos(p, strlen(p)))
+ if (mandoc_eos(p, strlen(p), 0))
m->last->flags |= MDOC_EOS;
}
@@ -1262,7 +1262,7 @@ blk_part_imp(MACRO_PROT_ARGS)
*/
if (n && MDOC_TEXT == n->type && n->string)
- if (mandoc_eos(n->string, strlen(n->string)))
+ if (mandoc_eos(n->string, strlen(n->string), 1))
n->flags |= MDOC_EOS;
/* Up-propogate the end-of-space flag. */