Date: Sun, 8 Aug 2010 14:47:04 +0200
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Cc: jmc@openbsd.org
Subject: Non-ASCII check fails in main.c on OpenBSD
Message-ID: <20100808124704.GC17816@iris.usta.de>

Hi,

I just noticed that on OpenBSD, the change to main.c in bsd.lv
revision 1.99 fails pathetically: non-ASCII characters are simply
passed through.

Maybe that's a bug in OpenBSD's isgraph(3), or maybe isgraph(3)
behaves this way on purpose.  Actually, I don't care much either
way, and I doubt it will be easy to get OpenBSD's isgraph(3)
changed.  In practice, how isgraph(3) classifies bytes above 0x7f
depends on the C library and the locale, so on its own it is not a
portable ASCII test.

For maximum portability: when we put a comment like

    /*
     * Warn about bogus characters.  If you're using
     * non-ASCII encoding, you're screwing your
     * readers.  Since I'd rather this not happen,
     * I'll be helpful and drop these characters so
     * we don't display gibberish.  Note to manual
     * writers: use special characters.
     */

in the code, which I fully agree with, we should also be explicit in
the code itself.  Otherwise we risk that on some systems the code
behaves differently from what the comment suggests and from what we
intend, which is always bad.  So, as we want to drop non-ASCII bytes,
let's use isascii(3) explicitly.

The patch also removes the per-character isgraph(3) checks from
man_pmacro() and mdoc_pmacro(): main.c already drops bad bytes
before the parsers see them, as the TODO in mdoc.c notes.  Both
functions now copy the macro name with the same loop, stopping at
NUL, space, or tab.

OK to commit to bsd.lv?  Of course, this will not be merged to
OpenBSD until after unlock.

Yours,
  Ingo

Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.43
diff -u -p -r1.43 main.c
--- main.c	25 Jul 2010 18:05:54 -0000	1.43
+++ main.c	8 Aug 2010 12:15:17 -0000
@@ -451,6 +451,7 @@ fdesc(struct curparse *curp)
 	struct buf	 ln, blk;
 	int		 i, pos, lnn, lnn_start, with_mmap, of;
 	enum rofferr	 re;
+	unsigned char	 c;
 	struct man	*man;
 	struct mdoc	*mdoc;
 	struct roff	*roff;
@@ -493,8 +494,8 @@ fdesc(struct curparse *curp)
 		 * writers: use special characters.
 		 */
 
-		if ( ! isgraph((u_char)blk.buf[i]) &&
-				! isblank((u_char)blk.buf[i])) {
+		c = (unsigned char) blk.buf[i];
+		if ( ! (isascii(c) && (isgraph(c) || isblank(c)))) {
 			if ( ! mmsg(MANDOCERR_BADCHAR, curp,
 					lnn_start, pos, "ignoring byte"))
Index: man.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/man.c,v
retrieving revision 1.38
diff -u -p -r1.38 man.c
--- man.c	25 Jul 2010 18:05:54 -0000	1.38
+++ man.c	8 Aug 2010 12:15:17 -0000
@@ -477,23 +477,14 @@ man_pmacro(struct man *m, int ln, char *
 
 	ppos = i;
 
-	/* Copy the first word into a nil-terminated buffer. */
-
-	for (j = 0; j < 4; j++, i++) {
-		if ('\0' == (mac[j] = buf[i]))
-			break;
-		else if (' ' == buf[i])
-			break;
-
-		/* Check for invalid characters. */
-
-		if (isgraph((u_char)buf[i]))
-			continue;
-		if ( ! man_pmsg(m, ln, i, MANDOCERR_BADCHAR))
-			return(0);
-		i--;
-	}
+	/*
+	 * Copy the first word into a nil-terminated buffer.
+	 * Stop copying when a tab, space, or eoln is encountered.
+	 */
 
+	j = 0;
+	while (j < 4 && '\0' != buf[i] && ' ' != buf[i] && '\t' != buf[i])
+		mac[j++] = buf[i++];
 	mac[j] = '\0';
 
 	if (j == 4 || j < 1) {
Index: mdoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.c,v
retrieving revision 1.63
diff -u -p -r1.63 mdoc.c
--- mdoc.c	7 Aug 2010 18:06:45 -0000	1.63
+++ mdoc.c	8 Aug 2010 12:15:17 -0000
@@ -773,26 +773,13 @@ mdoc_pmacro(struct mdoc *m, int ln, char
 	sv = i;
 
 	/*
-	 * Copy the first word into a nil-terminated buffer. Stop
-	 * copying when a tab, space, or eoln is encountered.
+	 * Copy the first word into a nil-terminated buffer.
+	 * Stop copying when a tab, space, or eoln is encountered.
 	 */
 
-	for (j = 0; j < 4; j++, i++) {
-		if ('\0' == (mac[j] = buf[i]))
-			break;
-		else if (' ' == buf[i] || '\t' == buf[i])
-			break;
-
-		/* Check for invalid characters. */
-		/* TODO: remove me, already done in main.c. */
-
-		if (isgraph((u_char)buf[i]))
-			continue;
-		if ( ! mdoc_pmsg(m, ln, i, MANDOCERR_BADCHAR))
-			return(0);
-		i--;
-	}
-
+	j = 0;
+	while (j < 4 && '\0' != buf[i] && ' ' != buf[i] && '\t' != buf[i])
+		mac[j++] = buf[i++];
 	mac[j] = '\0';
 
 	if (j == 4 || j < 2) {

-- 
To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
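The byte test that the patch puts into fdesc() can be tried in isolation. What follows is a minimal sketch, not code from mandoc: `byte_ok()` and `filter_line()` are hypothetical helpers that apply the same condition, `isascii(c) && (isgraph(c) || isblank(c))`, to each byte of a NUL-terminated string and drop the bytes that fail it. isascii(3) is not part of ISO C, so the sketch assumes a system that provides it (as OpenBSD and glibc do) and defines `_XOPEN_SOURCE` so that glibc declares it.

```c
#define _XOPEN_SOURCE 700	/* assumption: makes glibc declare isascii() */
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper applying the same condition as the patched
 * fdesc(): accept a byte only if it is 7-bit ASCII and either a
 * graphic character or a blank (space or tab).  Because isascii()
 * is tested first, bytes above 0x7f are rejected no matter how
 * isgraph() classifies them on this system.
 */
static int
byte_ok(unsigned char c)
{
	return isascii(c) && (isgraph(c) || isblank(c));
}

/*
 * Hypothetical filter: copy src to dst, dropping every byte that
 * byte_ok() rejects, and return the number of bytes dropped.
 * dst must have room for strlen(src) + 1 bytes.
 */
static size_t
filter_line(const char *src, char *dst)
{
	size_t	dropped = 0;

	assert(src != NULL && dst != NULL);
	for (; *src != '\0'; src++) {
		/* Cast first: passing a negative char to <ctype.h> is undefined. */
		if (byte_ok((unsigned char)*src))
			*dst++ = *src;
		else
			dropped++;
	}
	*dst = '\0';
	return dropped;
}
```

For example, on the input "Caf\xc3\xa9" (the UTF-8 encoding of "Café"), filter_line() keeps "Caf" and returns 2, regardless of how the C library's isgraph(3) classifies 0xc3 and 0xa9.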
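The replacement loop shared by man_pmacro() and mdoc_pmacro() can likewise be read as a small function. This is a hypothetical stand-alone sketch (`copy_macro_name()` is not a mandoc function): it copies at most four bytes of the macro name, stopping at NUL, space, or tab, and performs no bad-character check, on the assumption that main.c has already dropped such bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-alone version of the loop the patch puts into
 * man_pmacro() and mdoc_pmacro().  Starting at buf[*pos], copy at
 * most 4 bytes of the macro name into mac, stopping at NUL, space,
 * or tab; NUL-terminate mac, advance *pos past the copied bytes,
 * and return how many bytes were copied.  No bad-character check
 * is done here: main.c is assumed to have dropped such bytes.
 */
static size_t
copy_macro_name(const char *buf, size_t *pos, char mac[5])
{
	size_t	i, j = 0;

	assert(buf != NULL && pos != NULL && mac != NULL);
	i = *pos;
	while (j < 4 && buf[i] != '\0' && buf[i] != ' ' && buf[i] != '\t')
		mac[j++] = buf[i++];
	mac[j] = '\0';
	*pos = i;
	return j;
}
```

Because the loop stops after four bytes, the callers in the diff can treat j == 4 as "name too long" and j < 1 (man.c) or j < 2 (mdoc.c) as "name too short". For ".Sh NAME" with the position just past the dot, the sketch copies "Sh" and returns 2.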