tech@mandoc.bsd.lv
* MathML and <mo>, <mi>, <mn>
@ 2017-06-20  8:04 Anthony J. Bentley
  2017-06-21 20:59 ` Ingo Schwarze
  2017-06-23  2:57 ` Ingo Schwarze
  0 siblings, 2 replies; 5+ messages in thread
From: Anthony J. Bentley @ 2017-06-20  8:04 UTC (permalink / raw)
  To: tech

Hi,

Consider the quadratic formula:

x={-b +- sqrt{b sup 2 - 4ac}} over 2a

Wikipedia suggests it should be rendered in MathML like so (leaving
out invisible operators):

  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mo>&#8722;</mo>
        <mi>b</mi>
        <mo>&#177;</mo>
        <msqrt>
          <msup>
            <mi>b</mi>
            <mn>2</mn>
          </msup>
          <mo>&#8722;</mo>
          <mn>4</mn>
          <mi>a</mi>
          <mi>c</mi>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>

mandoc -Thtml renders it like so:

  <mrow>
    <mi>x=</mi>
    <mfrac>
      <mrow>
        <mi>-b</mi>
        <mi>&#177;</mi>
        <msqrt>
          <mrow>
            <msup>
              <mi>b</mi>
              <mi>2</mi>
            </msup>
            <mi>&#8722;</mi>
            <mi>4ac</mi>
          </mrow>
        </msqrt>
      </mrow>
      <mi>2a</mi>
    </mfrac>
  </mrow>

A few things are noticeable here:

- mandoc only uses <mi>, not <mo> or <mn>.
- mandoc will transform a '-' into U+2212, but only when it's not
  directly adjacent to a digit.
- In Firefox, <mi> only seems to italicize single letters.

It looks like adjacent variables, numbers, and operators should be split:
    - 'x=' should become <mi>x</mi><mo>=</mo>
    - '-b' should become <mo>&#8722;</mo><mi>b</mi>
    - '-4ac' should become <mo>&#8722;</mo><mn>4</mn><mi>a</mi><mi>c</mi>

The MathML standard says (MathML 3.0 2e # 3.2.3.3) that "sin" is
appropriately marked up with <mi>. So <mi>sin</mi> should be enough to
correctly render eqn's mathematical words. It seems that for
non-mathematical words to be rendered with italics by default, they
should be rendered with a <mi> per letter?

-- 
Anthony J. Bentley
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv


* Re: MathML and <mo>, <mi>, <mn>
From: Ingo Schwarze @ 2017-06-21 20:59 UTC (permalink / raw)
  To: Anthony J. Bentley; +Cc: tech

Hi Anthony,

Anthony J. Bentley wrote on Tue, Jun 20, 2017 at 02:04:29AM -0600:

> - mandoc only uses <mi>, not <mo> or <mn>.
> - mandoc will transform a '-' into U+2212, but only when it's not
>   directly adjacent to a digit.

These are still open.

> - In Firefox, <mi> only seems to italicize single letters.
> 
> It looks like adjacent variables, numbers, and operators should be split:
>     - 'x=' should become <mi>x</mi><mo>=</mo>
>     - '-b' should become <mo>&#8722;</mo><mi>b</mi>
>     - '-4ac' should become <mo>&#8722;</mo><mn>4</mn><mi>a</mi><mi>c</mi>
> 
> The MathML standard says (MathML 3.0 2e # 3.2.3.3) that "sin" is
> appropriately marked up with <mi>. So <mi>sin</mi> should be enough to
> correctly render eqn's mathematical words. It seems that for
> non-mathematical words to be rendered with italics by default, they
> should be rendered with a <mi> per letter?

The following commit implements the parser side parts needed to fix
that.  Some formatter parts are still open.

Thanks for the analysis,
  Ingo


Log Message:
-----------
Outside explicit font context, give every letter its own box.
The formatters need this to correctly select fonts.
Missing feature reported by bentley@.
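
The splitting rule described in this log message can be sketched in
isolation.  This is only an illustration, not the code from eqn.c:
split_letters() and PIECE_MAX are names made up for this sketch, the
pieces are copied into a fixed array instead of being allocated as
boxes, and roff escape sequences (which the real parser skips with
mandoc_escape()) are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>	/* strcmp() is handy when checking the pieces */

#define PIECE_MAX 32	/* hypothetical size limit for this sketch */

/*
 * Split s such that every letter ends up in a piece of its own,
 * while runs of non-letters stay together.
 * Returns the number of pieces written to out.
 */
static size_t
split_letters(const char *s, char out[][PIECE_MAX], size_t maxout)
{
	size_t	 n = 0, len = 0;

	assert(s != NULL);
	for (; *s != '\0'; s++) {
		/* Boundary unless both neighbours are non-letters. */
		if (len > 0 && (isalpha((unsigned char)s[-1]) ||
		    isalpha((unsigned char)*s))) {
			out[n++][len] = '\0';
			len = 0;
		}
		if (n == maxout)
			return n;
		/* Overlong pieces are silently truncated here. */
		if (len < PIECE_MAX - 1)
			out[n][len++] = *s;
	}
	if (len > 0)
		out[n++][len] = '\0';
	return n;
}
```

Under this rule, "x=" yields the pieces "x" and "=", "4ac" yields
"4", "a", and "c", and "+-" stays in one piece because neither
character is a letter.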

Modified Files:
--------------
    mdocml:
        eqn.c

Revision Data
-------------
Index: eqn.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/eqn.c,v
retrieving revision 1.65
retrieving revision 1.66
diff -Leqn.c -Leqn.c -u -p -r1.65 -r1.66
--- eqn.c
+++ eqn.c
@@ -20,6 +20,7 @@
 #include <sys/types.h>
 
 #include <assert.h>
+#include <ctype.h>
 #include <limits.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -718,8 +719,8 @@ static enum rofferr
 eqn_parse(struct eqn_node *ep, struct eqn_box *parent)
 {
 	char		 sym[64];
-	struct eqn_box	*cur;
-	const char	*start;
+	struct eqn_box	*cur, *fontp, *nbox;
+	const char	*cp, *cpn, *start;
 	char		*p;
 	size_t		 sz;
 	enum eqn_tok	 tok, subtok;
@@ -1092,21 +1093,51 @@ this_tok:
 		 */
 		while (parent->args == parent->expectargs)
 			parent = parent->parent;
-		if (tok == EQN_TOK_FUNC) {
-			for (cur = parent; cur != NULL; cur = cur->parent)
-				if (cur->font != EQNFONT_NONE)
-					break;
-			if (cur == NULL || cur->font != EQNFONT_ROMAN) {
-				parent = eqn_box_alloc(ep, parent);
-				parent->type = EQN_LISTONE;
-				parent->font = EQNFONT_ROMAN;
-				parent->expectargs = 1;
-			}
+		/*
+		 * Wrap well-known function names in a roman box,
+		 * unless they already are in roman context.
+		 */
+		for (fontp = parent; fontp != NULL; fontp = fontp->parent)
+			if (fontp->font != EQNFONT_NONE)
+				break;
+		if (tok == EQN_TOK_FUNC &&
+		    (fontp == NULL || fontp->font != EQNFONT_ROMAN)) {
+			parent = fontp = eqn_box_alloc(ep, parent);
+			parent->type = EQN_LISTONE;
+			parent->font = EQNFONT_ROMAN;
+			parent->expectargs = 1;
 		}
 		cur = eqn_box_alloc(ep, parent);
 		cur->type = EQN_TEXT;
 		cur->text = p;
-
+		/*
+		 * If not inside any explicit font context,
+		 * give every letter its own box.
+		 */
+		if (fontp == NULL && *p != '\0') {
+			cp = p;
+			for (;;) {
+				cpn = cp + 1;
+				if (*cp == '\\')
+					mandoc_escape(&cpn, NULL, NULL);
+				if (*cpn == '\0')
+					break;
+				if (isalpha((unsigned char)*cp) == 0 &&
+				    isalpha((unsigned char)*cpn) == 0) {
+					cp = cpn;
+					continue;
+				}
+				nbox = eqn_box_alloc(ep, parent);
+				nbox->type = EQN_TEXT;
+				nbox->text = mandoc_strdup(cpn);
+				p = mandoc_strndup(cur->text,
+				    cpn - cur->text);
+				free(cur->text);
+				cur->text = p;
+				cur = nbox;
+				cp = nbox->text;
+			}
+		}
 		/*
 		 * Post-process list status.
 		 */

* Re: MathML and <mo>, <mi>, <mn>
From: Ingo Schwarze @ 2017-06-23  2:57 UTC (permalink / raw)
  To: Anthony J. Bentley; +Cc: tech

Hi Anthony,

Anthony J. Bentley wrote on Tue, Jun 20, 2017 at 02:04:29AM -0600:

> Consider the quadratic formula:
> 
> x={-b +- sqrt{b sup 2 - 4ac}} over 2a
> 
> Wikipedia suggests it should be rendered in MathML like so (leaving
> out invisible operators):
> 
>   <mrow>
>     <mi>x</mi>
>     <mo>=</mo>
>     <mfrac>
>       <mrow>
>         <mo>&#8722;</mo>
>         <mi>b</mi>
>         <mo>&#177;</mo>
>         <msqrt>
>           <msup>
>             <mi>b</mi>
>             <mn>2</mn>
>           </msup>
>           <mo>&#8722;</mo>
>           <mn>4</mn>
>           <mi>a</mi>
>           <mi>c</mi>
>         </msqrt>
>       </mrow>
>       <mrow>
>         <mn>2</mn>
>         <mi>a</mi>
>       </mrow>
>     </mfrac>
>   </mrow>

After committing the patch appended below, mandoc now renders
as follows:

  <mrow>
    <mi>x</mi>  <!-- new identifier/operator splitting -->
    <mo>=</mo>  <!-- new operator element -->
    <mfrac>
      <mrow>
        <mo>-</mo>  <!-- XXX still no U+2212 -->
        <mi>b</mi>
        <mo>&#177;</mo>
        <msqrt>
          <mrow>  <!-- XXX no detection of needless rows yet -->
            <msup>
              <mi>b</mi>
              <mn>2</mn>  <!-- new number element -->
            </msup>
            <mi>&#8722;</mi>  <!-- XXX no non-ASCII operator detection -->
            <mn>4</mn>
            <mi fontstyle="italic">ac</mi>  <!-- SEE BELOW -->
          </mrow>
        </msqrt>
      </mrow>
      <mn>2</mn>  <!-- XXX oops, do we need a row here? -->
      <mi>a</mi>
    </mfrac>
  </mrow>

The <mi fontstyle="italic">ac</mi> does not seem wrong.
If you write "ac", mandoc cannot be sure whether this is
a two-letter identifier (which is correctly marked up above)
or the product of two identifiers.

In this case, you should probably write "a c" (with a blank)
to make it clear that these are two identifiers, and then it
will render as <mi>a</mi><mi>c</mi>.

> - mandoc only uses <mi>, not <mo> or <mn>.

Fixed.

> - mandoc will transform a '-' into U+2212, but only when it's not
>   directly adjacent to a digit.

Open.

> - In Firefox, <mi> only seems to italicize single letters.

That is required by the MathML standard, see the description of <mi>.
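
In MathML, a single-character <mi> is italic by default and a
multi-character <mi> is upright, so a formatter only has to write an
explicit attribute when the requested font differs from that default.
A minimal sketch of that decision, modelled on the eqn_html.c diff
appended to this message; the enum names and font_attr() are made up
for this sketch, and the EQNFONT_FAT case is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the htmltag and eqn font enums. */
enum mtag  { MTAG_MI, MTAG_MN, MTAG_MO };
enum mfont { MFONT_NONE, MFONT_ROMAN, MFONT_BOLD, MFONT_ITALIC };

/*
 * Return the attribute to write for a text box, or NULL
 * if the renderer's default already gives the requested font.
 */
static const char *
font_attr(enum mtag tag, enum mfont font, const char *text)
{
	enum mfont	 deflt;

	assert(text != NULL);
	/* Only one-letter identifiers default to italic. */
	if (tag == MTAG_MI && strlen(text) == 1)
		deflt = MFONT_ITALIC;
	else
		deflt = MFONT_ROMAN;
	if (text[0] != '\0' && font == deflt)
		return NULL;
	switch (font) {
	case MFONT_ROMAN:
		return "fontstyle=\"normal\"";
	case MFONT_BOLD:
		return "fontweight=\"bold\"";
	case MFONT_ITALIC:
		return "fontstyle=\"italic\"";
	default:
		return NULL;
	}
}
```

With these rules, an italic "x" needs no attribute, an italic "ac"
gets fontstyle="italic", and a roman "sin" again needs none.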

> It looks like adjacent variables, numbers, and operators should be split:
>     - 'x=' should become <mi>x</mi><mo>=</mo>

Done.

>     - '-b' should become <mo>&#8722;</mo><mi>b</mi>

Done except U+2212.

>     - '-4ac' should become <mo>&#8722;</mo><mn>4</mn><mi>a</mi><mi>c</mi>

I disagree: '4ac' is fine as it is, and '4a c' does become
what you ask for.

> The MathML standard says (MathML 3.0 2e # 3.2.3.3) that "sin" is
> appropriately marked up with <mi>. So <mi>sin</mi> should be enough to
> correctly render eqn's mathematical words. It seems that for
> non-mathematical words to be rendered with italics by default, they
> should be rendered with a <mi> per letter?

That would be possible, but it is not required, and it gives
strange results for multi-letter identifiers.

Yours,
  Ingo


Log Message:
-----------
Write text boxes as <mi>, <mn>, or <mo> as appropriate,
and write fontstyle or fontweight attributes where required. 
Missing features reported by bentley@.

Modified Files:
--------------
    mdocml:
        eqn_html.c
        html.c
        html.h

Revision Data
-------------
Index: html.h
===================================================================
RCS file: /home/cvs/mdocml/mdocml/html.h,v
retrieving revision 1.85
retrieving revision 1.86
diff -Lhtml.h -Lhtml.h -u -p -r1.85 -r1.86
--- html.h
+++ html.h
@@ -51,6 +51,7 @@ enum	htmltag {
 	TAG_MATH,
 	TAG_MROW,
 	TAG_MI,
+	TAG_MN,
 	TAG_MO,
 	TAG_MSUP,
 	TAG_MSUB,
Index: html.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/html.c,v
retrieving revision 1.214
retrieving revision 1.215
diff -Lhtml.c -Lhtml.c -u -p -r1.214 -r1.215
--- html.c
+++ html.c
@@ -87,6 +87,7 @@ static	const struct htmldata htmltags[TA
 	{"math",	HTML_NLALL | HTML_INDENT},
 	{"mrow",	0},
 	{"mi",		0},
+	{"mn",		0},
 	{"mo",		0},
 	{"msup",	0},
 	{"msub",	0},
Index: eqn_html.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/eqn_html.c,v
retrieving revision 1.12
retrieving revision 1.13
diff -Leqn_html.c -Leqn_html.c -u -p -r1.12 -r1.13
--- eqn_html.c
+++ eqn_html.c
@@ -20,6 +20,7 @@
 #include <sys/types.h>
 
 #include <assert.h>
+#include <ctype.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -33,7 +34,10 @@ eqn_box(struct html *p, const struct eqn
 {
 	struct tag	*post, *row, *cell, *t;
 	const struct eqn_box *child, *parent;
+	const unsigned char *cp;
 	size_t		 i, j, rows;
+	enum htmltag	 tag;
+	enum eqn_fontt	 font;
 
 	if (NULL == bp)
 		return;
@@ -136,9 +140,51 @@ eqn_box(struct html *p, const struct eqn
 		print_otag(p, TAG_MTD, "");
 	}
 
-	if (NULL != bp->text) {
-		assert(NULL == post);
-		post = print_otag(p, TAG_MI, "");
+	if (bp->text != NULL) {
+		assert(post == NULL);
+		tag = TAG_MI;
+		cp = (unsigned char *)bp->text;
+		if (isdigit(cp[0]) || (cp[0] == '.' && isdigit(cp[1]))) {
+			tag = TAG_MN;
+			while (*++cp != '\0') {
+				if (*cp != '.' && !isdigit(*cp)) {
+					tag = TAG_MI;
+					break;
+				}
+			}
+		} else if (*cp != '\0' && isalpha(*cp) == 0) {
+			tag = TAG_MO;
+			while (*++cp != '\0') {
+				if (isalnum(*cp)) {
+					tag = TAG_MI;
+					break;
+				}
+			}
+		}
+		font = bp->font;
+		if (bp->text[0] != '\0' &&
+		    (((tag == TAG_MN || tag == TAG_MO) &&
+		      font == EQNFONT_ROMAN) ||
+		     (tag == TAG_MI && font == (bp->text[1] == '\0' ?
+		      EQNFONT_ITALIC : EQNFONT_ROMAN))))
+			font = EQNFONT_NONE;
+		switch (font) {
+		case EQNFONT_NONE:
+			post = print_otag(p, tag, "");
+			break;
+		case EQNFONT_ROMAN:
+			post = print_otag(p, tag, "?", "fontstyle", "normal");
+			break;
+		case EQNFONT_BOLD:
+		case EQNFONT_FAT:
+			post = print_otag(p, tag, "?", "fontweight", "bold");
+			break;
+		case EQNFONT_ITALIC:
+			post = print_otag(p, tag, "?", "fontstyle", "italic");
+			break;
+		default:
+			abort();
+		}
 		print_text(p, bp->text);
 	} else if (NULL == post) {
 		if (NULL != bp->left || NULL != bp->right)

* Re: MathML and <mo>, <mi>, <mn>
From: Anthony J. Bentley @ 2017-06-23  3:22 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: tech

Hi Ingo,

Thanks for the improvements!

Ingo Schwarze writes:
> The <mi fontstyle="italic">ac</mi> does not seem wrong.
> If you write "ac", mandoc cannot be sure whether this is
> a two-letter identifier (which is correctly marked up above)
> or the product of two identifiers.
> 
> In this case, you should probably write "a c" (with a blank)
> to make it clear that these are two identifiers, and then it
> will render as <mi>a</mi><mi>c</mi>.

This decision seems fine to me.

-- 
Anthony J. Bentley

* Re: MathML and <mo>, <mi>, <mn>
From: Ingo Schwarze @ 2017-06-23 21:24 UTC (permalink / raw)
  To: Anthony J. Bentley, tech

Hi Anthony,

Ingo Schwarze wrote on Fri, Jun 23, 2017 at 04:57:12AM +0200:
> Anthony J. Bentley wrote on Tue, Jun 20, 2017 at 02:04:29AM -0600:

>> Consider the quadratic formula:
>> x={-b +- sqrt{b sup 2 - 4ac}} over 2a

> After committing the patch appended below, mandoc now renders
> as follows:
> 
>   <mrow>
>     <mi>x</mi>  <!-- new identifier/operator splitting -->
>     <mo>=</mo>  <!-- new operator element -->
>     <mfrac>
>       <mrow>
[...]
>       </mrow>
>       <mn>2</mn>  <!-- XXX oops, do we need a row here? -->
>       <mi>a</mi>
>     </mfrac>
>   </mrow>

I just fixed this with the following commit, to render as:

        </mrow>
        <mrow>
          <mn>2</mn>
          <mi>a</mi>
        </mrow>
      </mfrac>

Yours,
  Ingo


Log Message:
-----------
splitting a text box sometimes requires wrapping it in a list
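
The idea is that a text box sitting in the last argument slot of its
parent (for example the denominator of a fraction) cannot get
siblings, so it has to be moved into a list that takes over the slot.
A rough sketch on a simplified tree follows; struct box, box_alloc(),
and wrap_if_full() are made up for illustration and are not the
mandoc data structures, and setting the args counter of the new list
is a choice of this sketch rather than something shown in the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for struct eqn_box. */
struct box {
	struct box	*parent, *first, *last, *next, *prev;
	size_t		 args;		/* children so far */
	size_t		 expectargs;	/* children allowed */
};

/* Allocate a box and append it to parent, if there is one. */
static struct box *
box_alloc(struct box *parent)
{
	struct box	*b;

	if ((b = calloc(1, sizeof(*b))) == NULL)
		abort();
	b->expectargs = SIZE_MAX;	/* assumption: no limit */
	if (parent == NULL)
		return b;
	b->parent = parent;
	b->prev = parent->last;
	if (parent->last != NULL)
		parent->last->next = b;
	else
		parent->first = b;
	parent->last = b;
	parent->args++;
	return b;
}

/*
 * If cur, the last child of its parent, filled the final argument
 * slot, move it into a new list occupying that slot.
 * Returns the box that further pieces should be appended to.
 */
static struct box *
wrap_if_full(struct box *cur)
{
	struct box	*parent, *list;

	assert(cur != NULL && cur->parent != NULL && cur->next == NULL);
	parent = cur->parent;
	if (parent->args != parent->expectargs)
		return parent;
	/* Remove cur from the tree. */
	if (cur->prev == NULL)
		parent->first = NULL;
	else
		cur->prev->next = NULL;
	parent->last = cur->prev;
	parent->args--;
	/* Put a list into the freed slot and move cur into it. */
	list = box_alloc(parent);
	list->first = list->last = cur;
	list->args = 1;
	cur->parent = list;
	cur->prev = NULL;
	return list;
}
```

For a parent that expects two children and already has both, calling
wrap_if_full() on the second child returns a new list holding that
child, and a subsequent box_alloc() on the returned list appends the
next piece beside it.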

Modified Files:
--------------
    mandoc:
        eqn.c

Revision Data
-------------
Index: eqn.c
===================================================================
RCS file: /home/cvs/mandoc/mandoc/eqn.c,v
retrieving revision 1.68
retrieving revision 1.69
diff -Leqn.c -Leqn.c -u -p -r1.68 -r1.69
--- eqn.c
+++ eqn.c
@@ -1139,7 +1139,25 @@ this_tok:
 					break;
 				if (ccln == ccl)
 					continue;
-				/* Boundary found, add a new box. */
+				/* Boundary found, split the text. */
+				if (parent->args == parent->expectargs) {
+					/* Remove the text from the tree. */
+					if (cur->prev == NULL)
+						parent->first = cur->next;
+					else
+						cur->prev->next = NULL;
+					parent->last = cur->prev;
+					parent->args--;
+					/* Set up a list instead. */
+					nbox = eqn_box_alloc(ep, parent);
+					nbox->type = EQN_LIST;
+					/* Insert the word into the list. */
+					nbox->first = nbox->last = cur;
+					cur->parent = nbox;
+					cur->prev = NULL;
+					parent = nbox;
+				}
+				/* Append a new text box. */
 				nbox = eqn_box_alloc(ep, parent);
 				nbox->type = EQN_TEXT;
 				nbox->text = mandoc_strdup(cpn);