From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailout.scc.kit.edu (mailout.scc.kit.edu [129.13.185.202])
	by krisdoz.my.domain (8.14.5/8.14.5) with ESMTP id q15AHkvs015531
	for ; Sun, 5 Feb 2012 05:17:47 -0500 (EST)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
	by scc-mailout-02.scc.kit.edu with esmtp (Exim 4.72 #1)
	id 1RtzAG-0000DS-Nh; Sun, 05 Feb 2012 11:17:44 +0100
Received: from donnerwolke.usta.de ([172.24.96.3])
	by hekate.usta.de with esmtp (Exim 4.77)
	(envelope-from )
	id 1RtzAG-000129-Mv
	for discuss@mdocml.bsd.lv; Sun, 05 Feb 2012 11:17:44 +0100
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
	by donnerwolke.usta.de with esmtp (Exim 4.72)
	(envelope-from )
	id 1RtzAG-0005JR-Lr
	for discuss@mdocml.bsd.lv; Sun, 05 Feb 2012 11:17:44 +0100
Received: from schwarze by usta.de with local (Exim 4.77)
	(envelope-from )
	id 1RtzAG-0002VP-K9
	for discuss@mdocml.bsd.lv; Sun, 05 Feb 2012 11:17:44 +0100
Date: Sun, 5 Feb 2012 11:17:44 +0100
From: Ingo Schwarze
To: discuss@mdocml.bsd.lv
Subject: Re: Discarding non-ASCII input
Message-ID: <20120205101744.GA15816@iris.usta.de>
References: <20120205004542.GA17831@britannica.bec.de>
X-Mailinglist: mdocml-discuss
Reply-To: discuss@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120205004542.GA17831@britannica.bec.de>
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi Joerg,

Joerg Sonnenberger wrote on Sun, Feb 05, 2012 at 01:45:42AM +0100:

> at the moment we are discarding any non-ASCII characters. This turns a
> bunch of syntactically documents into complete garbage, e.g. by removing
> the arguments for .SH macros. I think it is more reasonable to replace
> them with "safe" garbage like iconv on most platforms does.
>
> What do you think of the attached patch?

I think i like the idea.
You might also wish to replace the "ignoring byte" by NULL
when changing this.

Thanks,
  Ingo


> Index: read.c
> ===================================================================
> RCS file: /home/joerg/cvsroot/mdocml/read.c,v
> retrieving revision 1.26
> diff -u -p -r1.26 read.c
> --- read.c	7 Nov 2011 01:24:40 -0000	1.26
> +++ read.c	5 Feb 2012 00:31:33 -0000
> @@ -325,9 +325,9 @@ mparse_buf_r(struct mparse *curp, struct
>  			 * Warn about bogus characters. If you're using
>  			 * non-ASCII encoding, you're screwing your
>  			 * readers. Since I'd rather this not happen,
> -			 * I'll be helpful and drop these characters so
> -			 * we don't display gibberish. Note to manual
> -			 * writers: use special characters.
> +			 * I'll be helpful and replace these characters
> +			 * with "?", so we don't display gibberish.
> +			 * Note to manual writers: use special characters.
>  			 */
>  
>  			c = (unsigned char) blk.buf[i];
> @@ -337,6 +337,9 @@ mparse_buf_r(struct mparse *curp, struct
>  				mandoc_msg(MANDOCERR_BADCHAR, curp,
>  				    curp->line, pos, "ignoring byte");
>  				i++;
> +				if (pos >= (int)ln.sz)
> +					resize_buf(&ln, 256);
> +				ln.buf[pos++] = '?';
>  				continue;
>  			}
>  
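
For anyone who wants to try the idea outside the parser, the
substitution proposed in the patch can be sketched as a stand-alone
helper. This is only an illustration under assumptions: the names
sanitize_line() and is_acceptable() are made up for this mail and do
not exist in mandoc, and the acceptance test (printable ASCII, space
and tab) is a guess, since the actual condition in read.c is not part
of the quoted hunks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical predicate, not taken from mandoc: accept printable
 * ASCII, space and tab; everything else counts as a bogus byte.
 */
int
is_acceptable(unsigned char c)
{
	return (c >= 0x20 && c <= 0x7e) || c == '\t';
}

/*
 * Hypothetical helper, not part of mandoc: return a newly allocated,
 * NUL-terminated copy of the len bytes at in, with every rejected
 * byte overwritten by '?' instead of being dropped, so the length
 * and the column positions of the line stay the same.
 * If replaced is not NULL, it receives the number of substitutions.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
char *
sanitize_line(const char *in, size_t len, size_t *replaced)
{
	char	*out;
	size_t	 i, n;

	assert(in != NULL);

	if ((out = malloc(len + 1)) == NULL)
		return NULL;
	memcpy(out, in, len);
	out[len] = '\0';

	n = 0;
	for (i = 0; i < len; i++) {
		if (!is_acceptable((unsigned char)out[i])) {
			out[i] = '?';
			n++;
		}
	}
	if (replaced != NULL)
		*replaced = n;
	return out;
}
```

With this sketch, a header line such as ".SH N\xc3\x84ME" keeps its
macro and its argument, with two '?' standing in for the two-byte
UTF-8 sequence, whereas dropping the bytes would silently change the
argument. Unlike the patch, the sketch copies into a fresh buffer
rather than growing ln via resize_buf(), simply to stay
self-contained.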