From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp-1.sys.kth.se (smtp-1.sys.kth.se [130.237.32.175])
    by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id o678nQo1028660
    for ; Wed, 7 Jul 2010 04:49:27 -0400 (EDT)
Received: from smtp-1.sys.kth.se (localhost [127.0.0.1])
    by smtp-1.sys.kth.se (Postfix) with ESMTP id 4B1DE157009;
    Wed, 7 Jul 2010 10:49:20 +0200 (CEST)
X-Virus-Scanned: by amavisd-new at kth.se
Received: from smtp-1.sys.kth.se ([127.0.0.1])
    by smtp-1.sys.kth.se (smtp-1.sys.kth.se [127.0.0.1]) (amavisd-new, port 10024)
    with LMTP id wtUo5PmnV-Ut; Wed, 7 Jul 2010 10:49:18 +0200 (CEST)
X-KTH-Auth: kristaps [85.8.61.208]
X-KTH-mail-from: kristaps@bsd.lv
Received: from lappy.bsd.lv (h85-8-61-208.dynamic.se.alltele.net [85.8.61.208])
    by smtp-1.sys.kth.se (Postfix) with ESMTP id 8982615701D;
    Wed, 7 Jul 2010 10:49:15 +0200 (CEST)
Message-ID: <4C343FA6.6090807@bsd.lv>
Date: Wed, 07 Jul 2010 10:49:42 +0200
From: Kristaps Dzonsons
User-Agent: Thunderbird 2.0.0.16 (X11/20080812)
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
To: Jason McIntyre
CC: "tech@mdocml.bsd.lv"
Subject: Re: roff_getstr() and input characters
References: <4C33B754.2010609@bsd.lv> <20100706231643.GC32413@bramka.kerhand.co.uk>
In-Reply-To: <20100706231643.GC32413@bramka.kerhand.co.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

>> The reason I want to air it with you (I know it works: I've tested it
>> across all manuals) is because it also removes the check for isprint(),
>> using strcspn() instead.  As you can see, the rej filter is only for
>> '\b', which we must prohibit else we boff output encoding; '\t' for
>> non-literals (warning); and '\\' for the specials check.
>>
>> I argue for lifting the ASCII-constraint because (1) there's nothing in
>> mdoc/groff/etc that disallows non-ASCII (e.g., Latin-1) characters and
>> (2) it makes the code much cleaner.
>>
>> Thoughts?
>>
>
> i don't really know what you mean, to be honest.  you'll have to dumb
> down your question a bit, i'm afraid...

Jason,

Right now, mandoc spits out a warning for any character that is not
printable ASCII.  This patch lifts that restriction and instead warns
only about tabs and the "backspace" character.

We'd spoken about this before, but seeing it in action, I'm no longer
sure.  The killer points are that -Tps will throw away all non-ASCII
characters, as it can't calculate their glyph widths, and -Thtml
stipulates UTF-8 encoding, so anything but UTF-8 input will be
gobbledygook.

In effect, once one uses a non-ASCII encoding, the rendered output will
be irregular across output modes and, more importantly, across user
environments (terminals, etc.).  This is, in my opinion, a Bad Thing (tm).

Thoughts?

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
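
For readers without the patch at hand, the strcspn()-based filter
described in the quoted text can be sketched roughly as below.  This is
an illustration only, not the code from the patch: the function name
scan_line(), the enum and its values, and the escape counter are all
hypothetical, and the only details taken from the thread are the reject
set ('\b', '\t', '\\') and the fact that no isprint() test is applied,
so Latin-1 and other non-ASCII bytes pass through untouched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; these names are not taken from mandoc. */
enum scan_result {
	SCAN_OK = 0,	/* nothing objectionable found */
	SCAN_WARN_TAB,	/* tab outside a literal context: warn */
	SCAN_ERR_BS	/* backspace: must be prohibited */
};

/*
 * Scan one NUL-terminated input line.  Runs of ordinary bytes
 * (including non-ASCII ones) are skipped in one step with strcspn();
 * only '\b', '\t' and '\\' stop the scan.  The number of backslash
 * escapes seen is stored in *nesc as a stand-in for the "specials
 * check", which this sketch does not implement.
 */
enum scan_result
scan_line(const char *p, int literal, size_t *nesc)
{
	enum scan_result res = SCAN_OK;

	*nesc = 0;
	for (;;) {
		/* Jump to the next byte in the reject set, or to the NUL. */
		p += strcspn(p, "\b\t\\");

		switch (*p) {
		case '\0':
			return res;
		case '\b':
			return SCAN_ERR_BS;
		case '\t':
			if (!literal)
				res = SCAN_WARN_TAB;
			p++;
			break;
		default:	/* '\\' */
			(*nesc)++;
			p++;
			/* Skip the escaped byte so "\\\\" counts once. */
			if (*p != '\0')
				p++;
			break;
		}
	}
}
```

With this sketch, a Latin-1 line such as "caf\351 au lait" returns
SCAN_OK, whereas an isprint() test in the "C" locale would have flagged
the byte 0xE9.  That is exactly the behaviour change under discussion:
the scan gets simpler, but nothing downstream (-Tps, -Thtml) is made
any more capable of rendering such bytes.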