tech@mandoc.bsd.lv
From: Ingo Schwarze <schwarze@usta.de>
To: "Anthony J. Bentley" <anthony@anjbe.name>
Cc: tech@mdocml.bsd.lv
Subject: Re: MathML and <mo>, <mi>, <mn>
Date: Fri, 23 Jun 2017 04:57:12 +0200
Message-ID: <20170623025712.GE77030@athene.usta.de>
In-Reply-To: <45090.1497945869@cathet.us>

Hi Anthony,

Anthony J. Bentley wrote on Tue, Jun 20, 2017 at 02:04:29AM -0600:

> Consider the quadratic formula:
> 
> x={-b +- sqrt{b sup 2 - 4ac}} over 2a
> 
> Wikipedia suggests it should be rendered in MathML like so (leaving
> out invisible operators):
> 
>   <mrow>
>     <mi>x</mi>
>     <mo>=</mo>
>     <mfrac>
>       <mrow>
>         <mo>&#8722;</mo>
>         <mi>b</mi>
>         <mo>&#177;</mo>
>         <msqrt>
>           <msup>
>             <mi>b</mi>
>             <mn>2</mn>
>           </msup>
>           <mo>&#8722;</mo>
>           <mn>4</mn>
>           <mi>a</mi>
>           <mi>c</mi>
>         </msqrt>
>       </mrow>
>       <mrow>
>         <mn>2</mn>
>         <mi>a</mi>
>       </mrow>
>     </mfrac>
>   </mrow>

With the patch appended below committed, mandoc now renders
this as follows:

  <mrow>
    <mi>x</mi>  <!-- new identifier/operator splitting -->
    <mo>=</mo>  <!-- new operator element -->
    <mfrac>
      <mrow>
        <mo>-</mo>  <!-- XXX still no U+2212 -->
        <mi>b</mi>
        <mo>&#177;</mo>
        <msqrt>
          <mrow>  <!-- XXX no detection of needless rows yet -->
            <msup>
              <mi>b</mi>
              <mn>2</mn>  <!-- new number element -->
            </msup>
            <mi>&#8722;</mi>  <!-- XXX no non-ASCII operator detection -->
            <mn>4</mn>
            <mi fontstyle="italic">ac</mi>  <!-- SEE BELOW -->
          </mrow>
        </msqrt>
      </mrow>
      <mn>2</mn>  <!-- XXX oops, do we need a row here? -->
      <mi>a</mi>
    </mfrac>
  </mrow>

The <mi fontstyle="italic">ac</mi> does not seem wrong to me.
If you write "ac", mandoc cannot tell whether you mean
a single two-letter identifier (which is what the markup above
says) or the product of two identifiers.

If you mean the product, you should probably write "a c" (with
a blank) to make it clear that these are two identifiers; that
renders as <mi>a</mi><mi>c</mi>.
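
To illustrate why "ac" gets an explicit fontstyle while "a" does not,
here is a minimal standalone sketch of the font rule in the patch below.
The sk_* names are hypothetical stand-ins for mandoc's enum htmltag and
enum eqn_fontt, and the function returns the attribute as a string
instead of calling print_otag(); it is not the code that was committed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for mandoc's enum htmltag and enum eqn_fontt. */
enum sk_tag  { SK_MI, SK_MN, SK_MO };
enum sk_font { SK_NONE, SK_ROMAN, SK_BOLD, SK_ITALIC };

/*
 * Return the attribute to write on the element, or NULL when the
 * requested font already is the MathML default for that element:
 * italic for a single-character <mi>, upright for everything else.
 */
const char *
sk_font_attr(enum sk_tag tag, enum sk_font font, const char *text)
{
	enum sk_font	 def;

	assert(text != NULL);
	if (tag == SK_MI && text[0] != '\0' && text[1] == '\0')
		def = SK_ITALIC;
	else
		def = SK_ROMAN;

	/* Suppress the attribute if it would only restate the default. */
	if (text[0] != '\0' && font == def)
		font = SK_NONE;

	switch (font) {
	case SK_ROMAN:
		return "fontstyle=\"normal\"";
	case SK_BOLD:
		return "fontweight=\"bold\"";
	case SK_ITALIC:
		return "fontstyle=\"italic\"";
	default:
		return NULL;
	}
}
```

With this rule, an italic "a" needs no attribute, while an italic "ac"
has to say so explicitly, because a multi-letter <mi> would otherwise
be set upright.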

> - mandoc only uses <mi>, not <mo> or <mn>.

Fixed.

> - mandoc will transform a '-' into U+2212, but only when it's not
>   directly adjacent to a digit.

Open.

> - In Firefox, <mi> only seems to italicize single letters.

That is required by the MathML standard; see the description of <mi>:
single-character content defaults to italic, anything longer to normal.

> It looks like adjacent variables, numbers, and operators should be split:
>     - 'x=' should become <mi>x</mi><mo>=</mo>

Done.

>     - '-b' should become <mo>&#8722;</mo><mi>b</mi>

Done, except for U+2212.

>     - '-4ac' should become <mo>&#8722;</mo><mn>4</mn><mi>a</mi><mi>c</mi>

I disagree: '4ac' is fine as it is, and '4a c' does become
what you ask for.

> The MathML standard says (MathML 3.0 2e # 3.2.33) that "sin" is
> appropriately marked up with <mi>. So <mi>sin</mi> should be enough to
> correctly render eqn's mathematical words. It seems that for
> non-mathematical words to be rendered with italics by default, they
> should be rendered with a <mi> per letter?

That would be possible, but it is not required, and it gives
strange results for multi-letter identifiers.
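
To make the "strange results" concrete, here is a hypothetical sketch
of the per-letter approach. It is not something mandoc does; the
function name is made up, and it assumes the word consists of plain
letters that need no HTML escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SK_MI_LEN 10	/* strlen("<mi>x</mi>") */

/*
 * Write one <mi> element per character of word into buf.
 * Return 0 on success or -1 if buf is too small.
 */
int
sk_split_mi(const char *word, char *buf, size_t bufsz)
{
	size_t	 i, len;

	assert(word != NULL && buf != NULL);
	len = strlen(word);
	if (len * SK_MI_LEN + 1 > bufsz)
		return -1;

	buf[0] = '\0';
	for (i = 0; i < len; i++)
		snprintf(buf + i * SK_MI_LEN, SK_MI_LEN + 1,
		    "<mi>%c</mi>", word[i]);
	return 0;
}
```

For "ac" this yields <mi>a</mi><mi>c</mi>, which is fine for a product,
but a word like "sin" would come out as three separate italic
identifiers rather than one upright <mi>sin</mi>.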

Yours,
  Ingo


Log Message:
-----------
Write text boxes as <mi>, <mn>, or <mo> as appropriate,
and write fontstyle or fontweight attributes where required. 
Missing features reported by bentley@.

Modified Files:
--------------
    mdocml:
        eqn_html.c
        html.c
        html.h

Revision Data
-------------
Index: html.h
===================================================================
RCS file: /home/cvs/mdocml/mdocml/html.h,v
retrieving revision 1.85
retrieving revision 1.86
diff -Lhtml.h -Lhtml.h -u -p -r1.85 -r1.86
--- html.h
+++ html.h
@@ -51,6 +51,7 @@ enum	htmltag {
 	TAG_MATH,
 	TAG_MROW,
 	TAG_MI,
+	TAG_MN,
 	TAG_MO,
 	TAG_MSUP,
 	TAG_MSUB,
Index: html.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/html.c,v
retrieving revision 1.214
retrieving revision 1.215
diff -Lhtml.c -Lhtml.c -u -p -r1.214 -r1.215
--- html.c
+++ html.c
@@ -87,6 +87,7 @@ static	const struct htmldata htmltags[TA
 	{"math",	HTML_NLALL | HTML_INDENT},
 	{"mrow",	0},
 	{"mi",		0},
+	{"mn",		0},
 	{"mo",		0},
 	{"msup",	0},
 	{"msub",	0},
Index: eqn_html.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/eqn_html.c,v
retrieving revision 1.12
retrieving revision 1.13
diff -Leqn_html.c -Leqn_html.c -u -p -r1.12 -r1.13
--- eqn_html.c
+++ eqn_html.c
@@ -20,6 +20,7 @@
 #include <sys/types.h>
 
 #include <assert.h>
+#include <ctype.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -33,7 +34,10 @@ eqn_box(struct html *p, const struct eqn
 {
 	struct tag	*post, *row, *cell, *t;
 	const struct eqn_box *child, *parent;
+	const unsigned char *cp;
 	size_t		 i, j, rows;
+	enum htmltag	 tag;
+	enum eqn_fontt	 font;
 
 	if (NULL == bp)
 		return;
@@ -136,9 +140,51 @@ eqn_box(struct html *p, const struct eqn
 		print_otag(p, TAG_MTD, "");
 	}
 
-	if (NULL != bp->text) {
-		assert(NULL == post);
-		post = print_otag(p, TAG_MI, "");
+	if (bp->text != NULL) {
+		assert(post == NULL);
+		tag = TAG_MI;
+		cp = (unsigned char *)bp->text;
+		if (isdigit(cp[0]) || (cp[0] == '.' && isdigit(cp[1]))) {
+			tag = TAG_MN;
+			while (*++cp != '\0') {
+				if (*cp != '.' && !isdigit(*cp)) {
+					tag = TAG_MI;
+					break;
+				}
+			}
+		} else if (*cp != '\0' && isalpha(*cp) == 0) {
+			tag = TAG_MO;
+			while (*++cp != '\0') {
+				if (isalnum(*cp)) {
+					tag = TAG_MI;
+					break;
+				}
+			}
+		}
+		font = bp->font;
+		if (bp->text[0] != '\0' &&
+		    (((tag == TAG_MN || tag == TAG_MO) &&
+		      font == EQNFONT_ROMAN) ||
+		     (tag == TAG_MI && font == (bp->text[1] == '\0' ?
+		      EQNFONT_ITALIC : EQNFONT_ROMAN))))
+			font = EQNFONT_NONE;
+		switch (font) {
+		case EQNFONT_NONE:
+			post = print_otag(p, tag, "");
+			break;
+		case EQNFONT_ROMAN:
+			post = print_otag(p, tag, "?", "fontstyle", "normal");
+			break;
+		case EQNFONT_BOLD:
+		case EQNFONT_FAT:
+			post = print_otag(p, tag, "?", "fontweight", "bold");
+			break;
+		case EQNFONT_ITALIC:
+			post = print_otag(p, tag, "?", "fontstyle", "italic");
+			break;
+		default:
+			abort();
+		}
 		print_text(p, bp->text);
 	} else if (NULL == post) {
 		if (NULL != bp->left || NULL != bp->right)
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
