From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <542C14E1.3040700@bsd.lv>
Date: Wed, 01 Oct 2014 16:51:13 +0200
From: Kristaps Dzonsons
X-Mailinglist: mdocml-discuss
Reply-To: discuss@mdocml.bsd.lv
To: discuss@mdocml.bsd.lv
Subject: Ambiguous grammar: unicode vs. \[uX] escapes

In adding diacriticals to the shiny new MathML output, I stumbled across a curious ambiguity.  Basically, I wanted the following sequence:

	{ a sub b } under

which, in eqn(7), means a_b with a line under all of it.  In the new eqn.c, I have a special "bottom" string that I set to the corresponding under-diacritical.  (The others have a "top" string.)  I was setting this to \[ul], underscore.  However, the character refused to appear.

Mystified, I explored further.  Then I saw that in print_encode() (html.c), the \[ul] was being detected as a Unicode codepoint.  Why?  Because it matches the Unicode escape form \[uxxx] (mandoc.c:88).

Is there any consensus on how we should handle this?  groff_char(7) doesn't say anything, but I'm guessing the Unicode codepoints should be 4-6 hex digits long.  That's an easy fix, but I'm not sure if it's the right approach.

Thoughts?
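
For illustration, the length check floated above might look roughly like the C sketch below.  This is not the code in mandoc.c; is_unicode_escape() is a hypothetical helper, and the rule it encodes ('u' followed by 4 to 6 hex digits) is only the guess from this message, not something groff_char(7) is known to specify.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not mandoc's actual code.
 * "name" is the text between the brackets of a \[...] escape and
 * "len" is its length in bytes.
 * Assumption: a Unicode codepoint escape is 'u' followed by
 * 4 to 6 hexadecimal digits; anything else is a glyph name.
 * Returns 1 for a codepoint escape, 0 otherwise.
 */
static int
is_unicode_escape(const char *name, size_t len)
{
	size_t	 i;

	/* 'u' plus 4..6 digits means a total length of 5..7. */
	if (len < 5 || len > 7 || name[0] != 'u')
		return 0;

	/* Every character after the 'u' must be a hex digit. */
	for (i = 1; i < len; i++)
		if (!isxdigit((unsigned char)name[i]))
			return 0;

	return 1;
}
```

With this rule, "ul" is too short to be a codepoint and would fall through to the ordinary glyph-name lookup, while "u2014" would still be read as a codepoint.  It does not remove the ambiguity entirely: a glyph name that happened to be 'u' plus four to six hex digits would still be taken as Unicode.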