From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mo-p00-ob.rzone.de (mo-p00-ob.rzone.de [81.169.146.161])
	by krisdoz.my.domain (8.14.5/8.14.5) with ESMTP id q150jt7Q014698
	for ; Sat, 4 Feb 2012 19:45:56 -0500 (EST)
X-RZG-AUTH: :JiIXek6mfvEEUpFQdo7Fj1/zg48CFjWjQv0cW+St/nW/auYssS93lrNDFR+4U+0=
X-RZG-CLASS-ID: mo00
Received: from britannica.bec.de (ip-2-202-121-218.web.vodafone.de [2.202.121.218])
	by smtp.strato.de (klopstock mo61) (RZmta 27.6 DYNA|AUTH)
	with (DHE-RSA-AES128-SHA encrypted) ESMTPA id o01e5bo14MSH5a
	for ; Sun, 5 Feb 2012 01:45:46 +0100 (MET)
Received: by britannica.bec.de (sSMTP sendmail emulation); Sun, 05 Feb 2012 01:45:42 +0100
Date: Sun, 5 Feb 2012 01:45:42 +0100
From: Joerg Sonnenberger
To: discuss@mdocml.bsd.lv
Subject: Discarding non-ASCII input
Message-ID: <20120205004542.GA17831@britannica.bec.de>
X-Mailinglist: mdocml-discuss
Reply-To: discuss@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="82I3+IH0IqGh5yIs"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)

--82I3+IH0IqGh5yIs
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi all,

at the moment we are discarding any non-ASCII characters. This turns a
bunch of syntactically valid documents into complete garbage, e.g. by
removing the arguments of .SH macros. I think it is more reasonable to
replace them with "safe" garbage, as iconv does on most platforms.

What do you think of the attached patch?

Joerg

--82I3+IH0IqGh5yIs
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="read.c.diff"

Index: read.c
===================================================================
RCS file: /home/joerg/cvsroot/mdocml/read.c,v
retrieving revision 1.26
diff -u -p -r1.26 read.c
--- read.c	7 Nov 2011 01:24:40 -0000	1.26
+++ read.c	5 Feb 2012 00:31:33 -0000
@@ -325,9 +325,9 @@ mparse_buf_r(struct mparse *curp, struct
 			 * Warn about bogus characters.  If you're using
 			 * non-ASCII encoding, you're screwing your
 			 * readers.  Since I'd rather this not happen,
-			 * I'll be helpful and drop these characters so
-			 * we don't display gibberish.  Note to manual
-			 * writers: use special characters.
+			 * I'll be helpful and replace these characters
+			 * with "?", so we don't display gibberish.
+			 * Note to manual writers: use special characters.
 			 */
 
 			c = (unsigned char) blk.buf[i];
@@ -337,6 +337,9 @@ mparse_buf_r(struct mparse *curp, struct
 				mandoc_msg(MANDOCERR_BADCHAR, curp,
 				    curp->line, pos, "ignoring byte");
 				i++;
+				if (pos >= (int)ln.sz)
+					resize_buf(&ln, 256);
+				ln.buf[pos++] = '?';
 				continue;
 			}
 
--82I3+IH0IqGh5yIs--