tech@mandoc.bsd.lv
* [PATCH] improve mandoc_eos
@ 2010-07-13 23:40 Ingo Schwarze
From: Ingo Schwarze @ 2010-07-13 23:40 UTC
  To: tech

Hi,

the mandoc_eos utility function can be improved a bit.
Currently, it always flags end-of-sentence after a trailing
full stop, exclamation mark or question mark, even if that
character is the only one in the text.
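To make the problem concrete, here is a standalone sketch of the
current logic, rebuilt from the lines that the diff below removes.
The name old_eos is hypothetical, chosen to keep it apart from the
real mandoc_eos.  It skips closing delimiters and reports
end-of-sentence for any ".", "!" or "?" that is not directly
preceded by a backslash, whatever else precedes it.

```c
#include <stddef.h>

/*
 * Sketch of the pre-patch logic, rebuilt from the removed lines.
 * old_eos is a hypothetical name; upstream it is mandoc_eos().
 */
int
old_eos(const char *p, size_t sz)
{
	/* Scan backwards from the last character. */
	for ( ; sz > 0; sz--) {
		switch (p[sz - 1]) {
		case '"':
		case '\'':
		case ']':
		case ')':
			/* Closing delimiters are simply skipped. */
			break;
		case '.':
			/* An escaped period, backslash-dot, is not EOS. */
			if (sz > 1 && '\\' == p[sz - 2])
				return 0;
			/* FALLTHROUGH */
		case '!':
		case '?':
			/* Any other .!? ends the sentence. */
			return 1;
		default:
			return 0;
		}
	}
	return 0;
}
```

For instance, old_eos("(.)", 3) returns 1, which is exactly case 3
below being flagged wrongly.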

There are two cases where flagging EOS is correct:

 1) Alphanumeric characters precede the punctuation, e.g.
      This is a sentence.
      Next sentence...
    This should render as:
      This is a sentence.  Next sentence...

 2) No alphanumeric characters precede the punctuation, but it is
    neither followed by closing delimiters nor the child of a macro,
    as in this case:
      Here is a full stop
      .Pq quite lonely .
      Next sentence...
    This should render as:
      Here is a full stop (quite lonely).  Next sentence...

But there are two cases where setting the flag is wrong:

 3) The punctuation is followed by closing delimiters
    and not preceded by alphanumeric characters, e.g.
      There is no full stop (.) in this sentence
    This should render as:
      There is no full stop (.) in this sentence

 4) The punctuation is the child of a macro
    and not preceded by alphanumeric characters, e.g.
      There is no full stop
      .Pq \&.
      in this sentence
    This should also render as:
      There is no full stop (.) in this sentence
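The four cases reduce to a single rule: flag end-of-sentence when
the punctuation is present and either an alphanumeric character
precedes it or it is not enclosed, where "enclosed" means followed
by closing delimiters or the child of a macro.  As a hedged
illustration, here is that rule as a C predicate; eos_rule and its
three flags are hypothetical names, not part of mandoc.

```c
/*
 * Hypothetical helper restating cases 1) to 4) as one rule.
 *   found:        the text ends in '.', '!' or '?', possibly
 *                 followed by closing delimiters
 *   alnum_before: an alphanumeric character directly precedes it
 *   enclosed:     closing delimiters follow it, or the text is
 *                 the child of a macro
 */
int
eos_rule(int found, int alnum_before, int enclosed)
{
	return found && (alnum_before || !enclosed);
}
```

Cases 3 and 4 both reach eos_rule(1, 0, 1) and yield 0; they
differ only in where the "enclosed" information comes from, which
is why the macro case needs context from the caller.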

The last case requires context information, so mandoc_eos needs a
third argument telling whether the text is enclosed in a macro.
Only blk_part_imp() passes 1; all other callers pass 0.

The other changes are:
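 * When finding a closing delimiter (quote, bracket or parenthesis)
   after the sentence-ending punctuation, set the same "enclosed" flag.
 * When finding ".!?", don't return(1) at once; just set the "found" flag.
 * Make the decision at the first other character, or at the start
   of the string if there is none.
 * We can now drop the special handling of backslashes, because a
   backslash is not alphanumeric.
 * Walk a pointer backwards instead of indexing p in each loop iteration.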
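The list of changes above can be read as the following standalone
sketch of the patched loop.  The name new_eos is hypothetical; the
logic follows the mandoc.c hunk below, with two small departures:
it walks an index downwards instead of decrementing a pointer past
the start of the buffer (strictly undefined behaviour in C), and it
casts to unsigned char before calling isalnum(), which the C
standard requires for characters with negative values.

```c
#include <ctype.h>
#include <stddef.h>

/* Sketch of the patched logic; new_eos is a hypothetical name. */
int
new_eos(const char *p, size_t sz, int enclosed)
{
	int found = 0;

	/* Scan backwards from the last character. */
	while (sz > 0) {
		unsigned char c = (unsigned char)p[--sz];

		switch (c) {
		case '"':
		case '\'':
		case ']':
		case ')':
			/* A delimiter after the punctuation encloses it. */
			if (!found)
				enclosed = 1;
			break;
		case '.':
		case '!':
		case '?':
			/* Remember the punctuation; decide later. */
			found = 1;
			break;
		default:
			/*
			 * First other character: EOS if punctuation was
			 * seen and it is either not enclosed or preceded
			 * by an alphanumeric character.
			 */
			return found && (!enclosed || isalnum(c));
		}
	}

	/* Start of string reached: nothing precedes the punctuation. */
	return found && !enclosed;
}
```

With this sketch, "sentence." and "sentence.)" report
end-of-sentence, while "(.)" and "`.'" do not, nor does "\&."
when the caller passes enclosed = 1.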

This fixes lots of spacing problems; for example, the following
lines were rendered with a spurious double space:

 - chown(8):
   Previous versions of the chown utility used the dot (`.')  character

 - csh(1):
   part of the prompt by placing a `!'  in the prompt string.
   special case, `!!'  refers to the previous command; thus `!!'  alone is a
   newline follows immediately as may the trailing `?'  in a contextual
   (containing, e.g., *'s, ?'s, and instances of ``[...]'')  against
   is ``.'',  ``/bin'', ``/usr/bin'', ``/sbin'' and
   Commands within loops, prompted for by `?',  are not placed on the

 - ed(1):
   entering a single period (`.')  on a line.
   with a bang (`!'),  then it is interpreted as a shell command.

 - ls(1):
   List all entries except for `.'  and `..'.  Always set for the
   character `?';  this is the default when output is to a terminal.

OK?

Yours,
  Ingo


Index: libmandoc.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/libmandoc.h,v
retrieving revision 1.6
diff -u -p -r1.6 libmandoc.h
--- libmandoc.h	26 Jun 2010 17:56:43 -0000	1.6
+++ libmandoc.h	13 Jul 2010 22:57:32 -0000
@@ -29,7 +29,7 @@ time_t		 mandoc_a2time(int, const char *
 #define		 MTIME_REDUCED		(1 << 1)
 #define		 MTIME_MDOCDATE		(1 << 2)
 #define		 MTIME_ISO_8601		(1 << 3)
-int		 mandoc_eos(const char *, size_t);
+int		 mandoc_eos(const char *, size_t, int);
 int		 mandoc_hyph(const char *, const char *);
 
 __END_DECLS
Index: man.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/man.c,v
retrieving revision 1.36
diff -u -p -r1.36 man.c
--- man.c	13 Jul 2010 01:09:13 -0000	1.36
+++ man.c	13 Jul 2010 22:57:32 -0000
@@ -405,7 +405,7 @@ man_ptext(struct man *m, int line, char 
 	 */
 
 	assert(i);
-	if (mandoc_eos(buf, (size_t)i))
+	if (mandoc_eos(buf, (size_t)i, 0))
 		m->last->flags |= MAN_EOS;
 
 descope:
Index: mandoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandoc.c,v
retrieving revision 1.14
diff -u -p -r1.14 mandoc.c
--- mandoc.c	26 Jun 2010 17:56:43 -0000	1.14
+++ mandoc.c	13 Jul 2010 22:57:34 -0000
@@ -324,8 +324,10 @@ mandoc_a2time(int flags, const char *p)
 
 
 int
-mandoc_eos(const char *p, size_t sz)
+mandoc_eos(const char *p, size_t sz, int enclosed)
 {
+	const char *q;
+	int found = 0;
 
 	if (0 == sz)
 		return(0);
@@ -336,8 +338,8 @@ mandoc_eos(const char *p, size_t sz)
 	 * propogate outward.
 	 */
 
-	for ( ; sz; sz--) {
-		switch (p[(int)sz - 1]) {
+	for (q = p + sz - 1; q >= p; q--) {
+		switch (*q) {
 		case ('\"'):
 			/* FALLTHROUGH */
 		case ('\''):
@@ -345,22 +347,22 @@ mandoc_eos(const char *p, size_t sz)
 		case (']'):
 			/* FALLTHROUGH */
 		case (')'):
+			if (0 == found)
+				enclosed = 1;
 			break;
 		case ('.'):
-			/* Escaped periods. */
-			if (sz > 1 && '\\' == p[(int)sz - 2])
-				return(0);
 			/* FALLTHROUGH */
 		case ('!'):
 			/* FALLTHROUGH */
 		case ('?'):
-			return(1);
+			found = 1;
+			break;
 		default:
-			return(0);
+			return(found && (!enclosed || isalnum(*q)));
 		}
 	}
 
-	return(0);
+	return(found && !enclosed);
 }
 
 
Index: mdoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.c,v
retrieving revision 1.61
diff -u -p -r1.61 mdoc.c
--- mdoc.c	13 Jul 2010 01:09:13 -0000	1.61
+++ mdoc.c	13 Jul 2010 22:57:36 -0000
@@ -717,7 +717,7 @@ mdoc_ptext(struct mdoc *m, int line, cha
 
 	assert(buf < end);
 
-	if (mandoc_eos(buf+offs, (size_t)(end-buf-offs)))
+	if (mandoc_eos(buf+offs, (size_t)(end-buf-offs), 0))
 		m->last->flags |= MDOC_EOS;
 
 	return(1);
Index: mdoc_macro.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_macro.c,v
retrieving revision 1.54
diff -u -p -r1.54 mdoc_macro.c
--- mdoc_macro.c	13 Jul 2010 01:09:13 -0000	1.54
+++ mdoc_macro.c	13 Jul 2010 22:57:39 -0000
@@ -606,7 +606,7 @@ append_delims(struct mdoc *m, int line, 
 		 * knowing which symbols break this behaviour, for
 		 * example, `.  ;' shouldn't propogate the double-space.
 		 */
-		if (mandoc_eos(p, strlen(p)))
+		if (mandoc_eos(p, strlen(p), 0))
 			m->last->flags |= MDOC_EOS;
 	}
 
@@ -1262,7 +1262,7 @@ blk_part_imp(MACRO_PROT_ARGS)
 	 */
 
 	if (n && MDOC_TEXT == n->type && n->string)
-		if (mandoc_eos(n->string, strlen(n->string)))
+		if (mandoc_eos(n->string, strlen(n->string), 1))
 			n->flags |= MDOC_EOS;
 
 	/* Up-propogate the end-of-space flag. */