tech@mandoc.bsd.lv
 help / color / mirror / Atom feed
* Non-ASCII check fails in main.c on OpenBSD
@ 2010-08-08 12:47 Ingo Schwarze
  2010-08-08 14:27 ` Kristaps Dzonsons
  2010-08-08 15:39 ` Joerg Sonnenberger
  0 siblings, 2 replies; 5+ messages in thread
From: Ingo Schwarze @ 2010-08-08 12:47 UTC (permalink / raw)
  To: tech; +Cc: jmc

Hi,

i just noticed that on OpenBSD, the change main.c bsd.lv-rev. 1.99
pathetically fails.  Non-ASCII characters are just passed through.

Now maybe that's a bug in OpenBSD isgraph(3), maybe it is not and
isgraph(3) behaves like it does on purpose - actually, i don't really
care that much either way, and i doubt that it will be easy to get
OpenBSD isgraph(3) changed.

For maximum portability, i think when we put a comment

  /* 
   * Warn about bogus characters.  If you're using
   * non-ASCII encoding, you're screwing your
   * readers.  Since I'd rather this not happen,
   * I'll be helpful and drop these characters so
   * we don't display gibberish.  Note to manual
   * writers: use special characters.
   */

which i fully agree with, we should also be explicit in the code,
or we risk that on some systems our code behaves in another way than
the comments make you think and than we intend, which is always bad.

So, as we want to drop non-ASCII, lets explicitely use isascii(3).

OK to commit to bsd.lv?

Of course, this will not be merged to OpenBSD until after unlock.

Yours,
  Ingo



Index: main.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/main.c,v
retrieving revision 1.43
diff -u -p -r1.43 main.c
--- main.c	25 Jul 2010 18:05:54 -0000	1.43
+++ main.c	8 Aug 2010 12:15:17 -0000
@@ -451,6 +451,7 @@ fdesc(struct curparse *curp)
 	struct buf	 ln, blk;
 	int		 i, pos, lnn, lnn_start, with_mmap, of;
 	enum rofferr	 re;
+	unsigned char	 c;
 	struct man	*man;
 	struct mdoc	*mdoc;
 	struct roff	*roff;
@@ -493,8 +494,8 @@ fdesc(struct curparse *curp)
 			 * writers: use special characters.
 			 */
 
-			if ( ! isgraph((u_char)blk.buf[i]) &&
-					! isblank((u_char)blk.buf[i])) {
+			c = (unsigned char) blk.buf[i];
+			if ( ! (isascii(c) && (isgraph(c) || isblank(c)))) {
 				if ( ! mmsg(MANDOCERR_BADCHAR, curp, 
 						lnn_start, pos, 
 						"ignoring byte"))
Index: man.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/man.c,v
retrieving revision 1.38
diff -u -p -r1.38 man.c
--- man.c	25 Jul 2010 18:05:54 -0000	1.38
+++ man.c	8 Aug 2010 12:15:17 -0000
@@ -477,23 +477,14 @@ man_pmacro(struct man *m, int ln, char *
 
 	ppos = i;
 
-	/* Copy the first word into a nil-terminated buffer. */
-
-	for (j = 0; j < 4; j++, i++) {
-		if ('\0' == (mac[j] = buf[i]))
-			break;
-		else if (' ' == buf[i])
-			break;
-
-		/* Check for invalid characters. */
-
-		if (isgraph((u_char)buf[i]))
-			continue;
-		if ( ! man_pmsg(m, ln, i, MANDOCERR_BADCHAR))
-			return(0);
-		i--;
-	}
+	/*
+	 * Copy the first word into a nil-terminated buffer.
+	 * Stop copying when a tab, space, or eoln is encountered.
+	 */
 
+	j = 0;
+	while (j < 4 && '\0' != buf[i] && ' ' != buf[i] && '\t' != buf[i])
+		mac[j++] = buf[i++];
 	mac[j] = '\0';
 
 	if (j == 4 || j < 1) {
Index: mdoc.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc.c,v
retrieving revision 1.63
diff -u -p -r1.63 mdoc.c
--- mdoc.c	7 Aug 2010 18:06:45 -0000	1.63
+++ mdoc.c	8 Aug 2010 12:15:17 -0000
@@ -773,26 +773,13 @@ mdoc_pmacro(struct mdoc *m, int ln, char
 	sv = i;
 
 	/* 
-	 * Copy the first word into a nil-terminated buffer.  Stop
-	 * copying when a tab, space, or eoln is encountered.
+	 * Copy the first word into a nil-terminated buffer.
+	 * Stop copying when a tab, space, or eoln is encountered.
 	 */
 
-	for (j = 0; j < 4; j++, i++) {
-		if ('\0' == (mac[j] = buf[i]))
-			break;
-		else if (' ' == buf[i] || '\t' == buf[i])
-			break;
-
-		/* Check for invalid characters. */
-		/* TODO: remove me, already done in main.c. */
-
-		if (isgraph((u_char)buf[i]))
-			continue;
-		if ( ! mdoc_pmsg(m, ln, i, MANDOCERR_BADCHAR))
-			return(0);
-		i--;
-	}
-
+	j = 0;
+	while (j < 4 && '\0' != buf[i] && ' ' != buf[i] && '\t' != buf[i])
+		mac[j++] = buf[i++];
 	mac[j] = '\0';
 
 	if (j == 4 || j < 2) {
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Non-ASCII check fails in main.c on OpenBSD
  2010-08-08 12:47 Non-ASCII check fails in main.c on OpenBSD Ingo Schwarze
@ 2010-08-08 14:27 ` Kristaps Dzonsons
  2010-08-08 15:39 ` Joerg Sonnenberger
  1 sibling, 0 replies; 5+ messages in thread
From: Kristaps Dzonsons @ 2010-08-08 14:27 UTC (permalink / raw)
  To: tech; +Cc: jmc

> i just noticed that on OpenBSD, the change main.c bsd.lv-rev. 1.99
> pathetically fails.  Non-ASCII characters are just passed through.
> 
> Now maybe that's a bug in OpenBSD isgraph(3), maybe it is not and
> isgraph(3) behaves like it does on purpose - actually, i don't really
> care that much either way, and i doubt that it will be easy to get
> OpenBSD isgraph(3) changed.
> 
> For maximum portability, i think when we put a comment
> 
>   /* 
>    * Warn about bogus characters.  If you're using
>    * non-ASCII encoding, you're screwing your
>    * readers.  Since I'd rather this not happen,
>    * I'll be helpful and drop these characters so
>    * we don't display gibberish.  Note to manual
>    * writers: use special characters.
>    */
> 
> which i fully agree with, we should also be explicit in the code,
> or we risk that on some systems our code behaves in another way than
> the comments make you think and than we intend, which is always bad.
> 
> So, as we want to drop non-ASCII, lets explicitely use isascii(3).
> 
> OK to commit to bsd.lv?
> 
> Of course, this will not be merged to OpenBSD until after unlock.

Ingo, good catch!  But yes, I tested this code on a GNU/Linux machine, 
never thinking that the behaviour would be different on OpenBSD.

Ok kristaps.
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Non-ASCII check fails in main.c on OpenBSD
  2010-08-08 12:47 Non-ASCII check fails in main.c on OpenBSD Ingo Schwarze
  2010-08-08 14:27 ` Kristaps Dzonsons
@ 2010-08-08 15:39 ` Joerg Sonnenberger
  2010-08-08 18:52   ` Ingo Schwarze
  1 sibling, 1 reply; 5+ messages in thread
From: Joerg Sonnenberger @ 2010-08-08 15:39 UTC (permalink / raw)
  To: tech

On Sun, Aug 08, 2010 at 02:47:04PM +0200, Ingo Schwarze wrote:
> Now maybe that's a bug in OpenBSD isgraph(3), maybe it is not and
> isgraph(3) behaves like it does on purpose - actually, i don't really
> care that much either way, and i doubt that it will be easy to get
> OpenBSD isgraph(3) changed.

Didn't OpenBSD have some "default locale == ISO 8859-1" hack or was
that MirBSD? I'm reasonable sure that such behavior violates the
definition of the C locale.

Joerg
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Non-ASCII check fails in main.c on OpenBSD
  2010-08-08 15:39 ` Joerg Sonnenberger
@ 2010-08-08 18:52   ` Ingo Schwarze
  2010-08-11 16:34     ` Ulrich Spörlein
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Schwarze @ 2010-08-08 18:52 UTC (permalink / raw)
  To: tech

Hi Joerg,

thanks for your hint.

Joerg Sonnenberger wrote on Sun, Aug 08, 2010 at 05:39:28PM +0200:
> On Sun, Aug 08, 2010 at 02:47:04PM +0200, Ingo Schwarze wrote:

>> Now maybe that's a bug in OpenBSD isgraph(3), maybe it is not and
>> isgraph(3) behaves like it does on purpose - actually, i don't really
>> care that much either way, and i doubt that it will be easy to get
>> OpenBSD isgraph(3) changed.

> Didn't OpenBSD have some "default locale == ISO 8859-1" hack

Maybe, no idea.
As i said, i don't care about locales and never change the defaults.

Hmmm, but at least i should understand what we are doing in mandoc.
The test program appended below gives me
  1 1  on OpenBSD
  0 1  on Linux

I guess Linux is correct.


> I'm reasonable sure that such behavior violates the
> definition of the C locale.

That statement seems to make sense to me.

So, no matter what we think about OpenBSD's behaviour in this respect,
i think the patch i committed to explicitely call isascii() is correct:
We certainly don't want any of mandoc's behaviour to depend on whatever
the user may have chosen as a locale.

Yours,
  Ingo


#include <ctype.h>
#include <locale.h>
#include <stdio.h>

int
main() {
	setlocale(LC_CTYPE, "C");
	printf("%i\n", !!isgraph(228));
	setlocale(LC_CTYPE, "de_DE.ISO8859-1");
	printf("%i\n", !!isgraph(228));
	return 0;
}
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Non-ASCII check fails in main.c on OpenBSD
  2010-08-08 18:52   ` Ingo Schwarze
@ 2010-08-11 16:34     ` Ulrich Spörlein
  0 siblings, 0 replies; 5+ messages in thread
From: Ulrich Spörlein @ 2010-08-11 16:34 UTC (permalink / raw)
  To: tech

On Sun, 08.08.2010 at 20:52:46 +0200, Ingo Schwarze wrote:
> Joerg Sonnenberger wrote on Sun, Aug 08, 2010 at 05:39:28PM +0200:
> > On Sun, Aug 08, 2010 at 02:47:04PM +0200, Ingo Schwarze wrote:
> >> Now maybe that's a bug in OpenBSD isgraph(3), maybe it is not and
> >> isgraph(3) behaves like it does on purpose - actually, i don't really
> >> care that much either way, and i doubt that it will be easy to get
> >> OpenBSD isgraph(3) changed.
> 
> > Didn't OpenBSD have some "default locale == ISO 8859-1" hack
> 
> Maybe, no idea.
> As i said, i don't care about locales and never change the defaults.
> 
> Hmmm, but at least i should understand what we are doing in mandoc.
> The test program appended below gives me
>   1 1  on OpenBSD
>   0 1  on Linux
> 
> I guess Linux is correct.

On FreeBSD, for FWIW:

% cc char.c && ./a.out
0
1

--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-08-11 16:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-08 12:47 Non-ASCII check fails in main.c on OpenBSD Ingo Schwarze
2010-08-08 14:27 ` Kristaps Dzonsons
2010-08-08 15:39 ` Joerg Sonnenberger
2010-08-08 18:52   ` Ingo Schwarze
2010-08-11 16:34     ` Ulrich Spörlein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).