tech@mandoc.bsd.lv
 help / color / mirror / Atom feed
From: Ingo Schwarze <schwarze@usta.de>
To: tech@mdocml.bsd.lv
Cc: jmc@openbsd.org
Subject: Non-ASCII check fails in main.c on OpenBSD
Date: Sun, 8 Aug 2010 14:47:04 +0200	[thread overview]
Message-ID: <20100808124704.GC17816@iris.usta.de> (raw)

Hi,

i just noticed that on OpenBSD, the change main.c bsd.lv-rev. 1.99
pathetically fails.  Non-ASCII characters are just passed through.

Now maybe that's a bug in OpenBSD isgraph(3), maybe it is not and
isgraph(3) behaves like it does on purpose - actually, i don't really
care that much either way, and i doubt that it will be easy to get
OpenBSD isgraph(3) changed.

For maximum portability, i think when we put a comment

  /* 
   * Warn about bogus characters.  If you're using
   * non-ASCII encoding, you're screwing your
   * readers.  Since I'd rather this not happen,
   * I'll be helpful and drop these characters so
   * we don't display gibberish.  Note to manual
   * writers: use special characters.
   */

which i fully agree with, we should also be explicit in the code,
or we risk that on some systems our code behaves in another way than
the comments make you think and than we intend, which is always bad.

So, as we want to drop non-ASCII, lets explicitely use isascii(3).

OK to commit to bsd.lv?

Of course, this will not be merged to OpenBSD until after unlock.

Yours,
  Ingo



Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.43
diff -u -p -r1.43 main.c
--- main.c	25 Jul 2010 18:05:54 -0000	1.43
+++ main.c	8 Aug 2010 12:15:17 -0000
@@ -451,6 +451,7 @@ fdesc(struct curparse *curp)
 	struct buf	 ln, blk;
 	int		 i, pos, lnn, lnn_start, with_mmap, of;
 	enum rofferr	 re;
+	unsigned char	 c;
 	struct man	*man;
 	struct mdoc	*mdoc;
 	struct roff	*roff;
@@ -493,8 +494,8 @@ fdesc(struct curparse *curp)
 			 * writers: use special characters.
 			 */
 
-			if ( ! isgraph((u_char)blk.buf[i]) &&
-					! isblank((u_char)blk.buf[i])) {
+			c = (unsigned char) blk.buf[i];
+			if ( ! (isascii(c) && (isgraph(c) || isblank(c)))) {
 				if ( ! mmsg(MANDOCERR_BADCHAR, curp, 
 						lnn_start, pos, 
 						"ignoring byte"))
Index: man.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/man.c,v
retrieving revision 1.38
diff -u -p -r1.38 man.c
--- man.c	25 Jul 2010 18:05:54 -0000	1.38
+++ man.c	8 Aug 2010 12:15:17 -0000
@@ -477,23 +477,14 @@ man_pmacro(struct man *m, int ln, char *
 
 	ppos = i;
 
-	/* Copy the first word into a nil-terminated buffer. */
-
-	for (j = 0; j < 4; j++, i++) {
-		if ('\0' == (mac[j] = buf[i]))
-			break;
-		else if (' ' == buf[i])
-			break;
-
-		/* Check for invalid characters. */
-
-		if (isgraph((u_char)buf[i]))
-			continue;
-		if ( ! man_pmsg(m, ln, i, MANDOCERR_BADCHAR))
-			return(0);
-		i--;
-	}
+	/*
+	 * Copy the first word into a nil-terminated buffer.
+	 * Stop copying when a tab, space, or eoln is encountered.
+	 */
 
+	j = 0;
+	while (j < 4 && '\0' != buf[i] && ' ' != buf[i] && '\t' != buf[i])
+		mac[j++] = buf[i++];
 	mac[j] = '\0';
 
 	if (j == 4 || j < 1) {
Index: mdoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.c,v
retrieving revision 1.63
diff -u -p -r1.63 mdoc.c
--- mdoc.c	7 Aug 2010 18:06:45 -0000	1.63
+++ mdoc.c	8 Aug 2010 12:15:17 -0000
@@ -773,26 +773,13 @@ mdoc_pmacro(struct mdoc *m, int ln, char
 	sv = i;
 
 	/* 
-	 * Copy the first word into a nil-terminated buffer.  Stop
-	 * copying when a tab, space, or eoln is encountered.
+	 * Copy the first word into a nil-terminated buffer.
+	 * Stop copying when a tab, space, or eoln is encountered.
 	 */
 
-	for (j = 0; j < 4; j++, i++) {
-		if ('\0' == (mac[j] = buf[i]))
-			break;
-		else if (' ' == buf[i] || '\t' == buf[i])
-			break;
-
-		/* Check for invalid characters. */
-		/* TODO: remove me, already done in main.c. */
-
-		if (isgraph((u_char)buf[i]))
-			continue;
-		if ( ! mdoc_pmsg(m, ln, i, MANDOCERR_BADCHAR))
-			return(0);
-		i--;
-	}
-
+	j = 0;
+	while (j < 4 && '\0' != buf[i] && ' ' != buf[i] && '\t' != buf[i])
+		mac[j++] = buf[i++];
 	mac[j] = '\0';
 
 	if (j == 4 || j < 2) {
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

             reply	other threads:[~2010-08-08 12:47 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-08 12:47 Ingo Schwarze [this message]
2010-08-08 14:27 ` Kristaps Dzonsons
2010-08-08 15:39 ` Joerg Sonnenberger
2010-08-08 18:52   ` Ingo Schwarze
2010-08-11 16:34     ` Ulrich Spörlein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100808124704.GC17816@iris.usta.de \
    --to=schwarze@usta.de \
    --cc=jmc@openbsd.org \
    --cc=tech@mdocml.bsd.lv \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).