* MathML and <mo>, <mi>, <mn>
@ 2017-06-20 8:04 Anthony J. Bentley
2017-06-21 20:59 ` Ingo Schwarze
2017-06-23 2:57 ` Ingo Schwarze
0 siblings, 2 replies; 5+ messages in thread
From: Anthony J. Bentley @ 2017-06-20 8:04 UTC
To: tech
Hi,
Consider the quadratic formula:
x={-b +- sqrt{b sup 2 - 4ac}} over 2a
Wikipedia suggests it should be rendered in MathML like so (leaving
out invisible operators):
<mrow>
  <mi>x</mi>
  <mo>=</mo>
  <mfrac>
    <mrow>
      <mo>−</mo>
      <mi>b</mi>
      <mo>±</mo>
      <msqrt>
        <msup>
          <mi>b</mi>
          <mn>2</mn>
        </msup>
        <mo>−</mo>
        <mn>4</mn>
        <mi>a</mi>
        <mi>c</mi>
      </msqrt>
    </mrow>
    <mrow>
      <mn>2</mn>
      <mi>a</mi>
    </mrow>
  </mfrac>
</mrow>
mandoc -Thtml renders it like so:
<mrow>
  <mi>x=</mi>
  <mfrac>
    <mrow>
      <mi>-b</mi>
      <mi>±</mi>
      <msqrt>
        <mrow>
          <msup>
            <mi>b</mi>
            <mi>2</mi>
          </msup>
          <mi>−</mi>
          <mi>4ac</mi>
        </mrow>
      </msqrt>
    </mrow>
    <mi>2a</mi>
  </mfrac>
</mrow>
A few things are noticeable here:
- mandoc only uses <mi>, not <mo> or <mn>.
- mandoc will transform a '-' into U+2212, but only when it's not
directly adjacent to a digit.
- In Firefox, <mi> only seems to italicize single letters.
It looks like adjacent variables, numbers, and operators should be split:
- 'x=' should become <mi>x</mi><mo>=</mo>
- '-b' should become <mo>−</mo><mi>b</mi>
- '-4ac' should become <mo>−</mo><mn>4</mn><mi>a</mi><mi>c</mi>
The MathML standard says (MathML 3.0 2e § 3.2.3.3) that "sin" is
appropriately marked up with <mi>. So <mi>sin</mi> should be enough to
correctly render eqn's mathematical words. It seems that for
non-mathematical words to be rendered with italics by default, they
should be rendered with a <mi> per letter?
--
Anthony J. Bentley
--
To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
* Re: MathML and <mo>, <mi>, <mn>
From: Ingo Schwarze @ 2017-06-21 20:59 UTC
To: Anthony J. Bentley; +Cc: tech
Hi Anthony,
Anthony J. Bentley wrote on Tue, Jun 20, 2017 at 02:04:29AM -0600:
> - mandoc only uses <mi>, not <mo> or <mn>.
> - mandoc will transform a '-' into U+2212, but only when it's not
> directly adjacent to a digit.
These are still open.
> - In Firefox, <mi> only seems to italicize single letters.
>
> It looks like adjacent variables, numbers, and operators should be split:
> - 'x=' should become <mi>x</mi><mo>=</mo>
> - '-b' should become <mo>−</mo><mi>b</mi>
> - '-4ac' should become <mo>−</mo><mn>4</mn><mi>a</mi><mi>c</mi>
>
> The MathML standard says (MathML 3.0 2e § 3.2.3.3) that "sin" is
> appropriately marked up with <mi>. So <mi>sin</mi> should be enough to
> correctly render eqn's mathematical words. It seems that for
> non-mathematical words to be rendered with italics by default, they
> should be rendered with a <mi> per letter?
The following commit implements the parser side parts needed to fix
that. Some formatter parts are still open.
Thanks for the analysis,
Ingo
Log Message:
-----------
Outside explicit font context, give every letter its own box.
The formatters need this to correctly select fonts.
Missing feature reported by bentley@.
Modified Files:
--------------
mdocml:
eqn.c
Revision Data
-------------
Index: eqn.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/eqn.c,v
retrieving revision 1.65
retrieving revision 1.66
diff -Leqn.c -Leqn.c -u -p -r1.65 -r1.66
--- eqn.c
+++ eqn.c
@@ -20,6 +20,7 @@
#include <sys/types.h>
#include <assert.h>
+#include <ctype.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
@@ -718,8 +719,8 @@ static enum rofferr
eqn_parse(struct eqn_node *ep, struct eqn_box *parent)
{
char sym[64];
- struct eqn_box *cur;
- const char *start;
+ struct eqn_box *cur, *fontp, *nbox;
+ const char *cp, *cpn, *start;
char *p;
size_t sz;
enum eqn_tok tok, subtok;
@@ -1092,21 +1093,51 @@ this_tok:
*/
while (parent->args == parent->expectargs)
parent = parent->parent;
- if (tok == EQN_TOK_FUNC) {
- for (cur = parent; cur != NULL; cur = cur->parent)
- if (cur->font != EQNFONT_NONE)
- break;
- if (cur == NULL || cur->font != EQNFONT_ROMAN) {
- parent = eqn_box_alloc(ep, parent);
- parent->type = EQN_LISTONE;
- parent->font = EQNFONT_ROMAN;
- parent->expectargs = 1;
- }
+ /*
+ * Wrap well-known function names in a roman box,
+ * unless they already are in roman context.
+ */
+ for (fontp = parent; fontp != NULL; fontp = fontp->parent)
+ if (fontp->font != EQNFONT_NONE)
+ break;
+ if (tok == EQN_TOK_FUNC &&
+ (fontp == NULL || fontp->font != EQNFONT_ROMAN)) {
+ parent = fontp = eqn_box_alloc(ep, parent);
+ parent->type = EQN_LISTONE;
+ parent->font = EQNFONT_ROMAN;
+ parent->expectargs = 1;
}
cur = eqn_box_alloc(ep, parent);
cur->type = EQN_TEXT;
cur->text = p;
-
+ /*
+ * If not inside any explicit font context,
+ * give every letter its own box.
+ */
+ if (fontp == NULL && *p != '\0') {
+ cp = p;
+ for (;;) {
+ cpn = cp + 1;
+ if (*cp == '\\')
+ mandoc_escape(&cpn, NULL, NULL);
+ if (*cpn == '\0')
+ break;
+ if (isalpha((unsigned char)*cp) == 0 &&
+ isalpha((unsigned char)*cpn) == 0) {
+ cp = cpn;
+ continue;
+ }
+ nbox = eqn_box_alloc(ep, parent);
+ nbox->type = EQN_TEXT;
+ nbox->text = mandoc_strdup(cpn);
+ p = mandoc_strndup(cur->text,
+ cpn - cur->text);
+ free(cur->text);
+ cur->text = p;
+ cur = nbox;
+ cp = nbox->text;
+ }
+ }
/*
* Post-process list status.
*/
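To make the splitting rule in this commit concrete, here is a small
standalone sketch. It is not mandoc code: split_letters(), MAX_PIECES
and MAX_LEN are made-up names, the pieces are copied into a fixed array
instead of being allocated as eqn boxes, and roff escape sequences
(which the real loop skips with mandoc_escape()) are ignored. The rule
it illustrates is the one in the hunk above: a boundary goes between
two adjacent characters whenever at least one of them is a letter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_PIECES 16	/* hypothetical limit, not from mandoc */
#define MAX_LEN    32	/* hypothetical limit, not from mandoc */

/*
 * Split word into pieces and return the number of pieces stored.
 * Adjacent non-letters stay together; every letter ends up alone.
 * Pieces beyond max are dropped, overlong pieces are truncated.
 */
static size_t
split_letters(const char *word, char out[][MAX_LEN], size_t max)
{
	const char *start, *cp;
	size_t n = 0, len;

	assert(word != NULL);
	if (*word == '\0')
		return 0;
	start = word;
	for (cp = word; n < max; cp++) {
		/* No boundary between two adjacent non-letters. */
		if (cp[1] != '\0' &&
		    !isalpha((unsigned char)cp[0]) &&
		    !isalpha((unsigned char)cp[1]))
			continue;
		/* Boundary after cp: store the piece start..cp. */
		len = (size_t)(cp - start) + 1;
		if (len >= MAX_LEN)
			len = MAX_LEN - 1;
		memcpy(out[n], start, len);
		out[n++][len] = '\0';
		if (cp[1] == '\0')
			break;
		start = cp + 1;
	}
	return n;
}
```

Under these assumptions, "x=" splits into "x" and "=", "-b" into "-"
and "b", and "4ac" into "4", "a" and "c", while a word without any
letters such as "12+" stays in one piece.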
* Re: MathML and <mo>, <mi>, <mn>
From: Ingo Schwarze @ 2017-06-23 2:57 UTC
To: Anthony J. Bentley; +Cc: tech
Hi Anthony,
Anthony J. Bentley wrote on Tue, Jun 20, 2017 at 02:04:29AM -0600:
> Consider the quadratic formula:
>
> x={-b +- sqrt{b sup 2 - 4ac}} over 2a
>
> Wikipedia suggests it should be rendered in MathML like so (leaving
> out invisible operators):
>
> <mrow>
>   <mi>x</mi>
>   <mo>=</mo>
>   <mfrac>
>     <mrow>
>       <mo>−</mo>
>       <mi>b</mi>
>       <mo>±</mo>
>       <msqrt>
>         <msup>
>           <mi>b</mi>
>           <mn>2</mn>
>         </msup>
>         <mo>−</mo>
>         <mn>4</mn>
>         <mi>a</mi>
>         <mi>c</mi>
>       </msqrt>
>     </mrow>
>     <mrow>
>       <mn>2</mn>
>       <mi>a</mi>
>     </mrow>
>   </mfrac>
> </mrow>
After committing the patch appended below, mandoc now renders
as follows:
<mrow>
  <mi>x</mi> <!-- new identifier/operator splitting -->
  <mo>=</mo> <!-- new operator element -->
  <mfrac>
    <mrow>
      <mo>-</mo> <!-- XXX still no U+2212 -->
      <mi>b</mi>
      <mo>±</mo>
      <msqrt>
        <mrow> <!-- XXX no detection of needless rows yet -->
          <msup>
            <mi>b</mi>
            <mn>2</mn> <!-- new number element -->
          </msup>
          <mi>−</mi> <!-- XXX no non-ASCII operator detection -->
          <mn>4</mn>
          <mi fontstyle="italic">ac</mi> <!-- SEE BELOW -->
        </mrow>
      </msqrt>
    </mrow>
    <mn>2</mn> <!-- XXX oops, do we need a row here? -->
    <mi>a</mi>
  </mfrac>
</mrow>
The <mi fontstyle="italic">ac</mi> does not seem wrong.
If you write "ac", mandoc cannot be sure whether this is
a two-letter identifier (which is correctly marked up above)
or the product of two identifiers.
In this case, you should probably write "a c" (with a blank)
to make it clear that these are two identifiers, and then it
will render as <mi>a</mi><mi>c</mi>.
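The reason an explicit attribute shows up on "ac" but not on "a" can be
sketched as follows. This is a simplified standalone reading of the
font logic in the eqn_html.c patch appended to this message, not the
code itself: font_attr(), enum mtag and enum mfont are hypothetical
names, and the function returns the attribute as a string instead of
calling print_otag(). The idea is that an attribute is only written
when the requested font differs from what a MathML renderer picks by
default (italic for a one-character <mi>, upright otherwise).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum mtag  { TAG_MI, TAG_MN, TAG_MO };			/* hypothetical */
enum mfont { FONT_NONE, FONT_ROMAN, FONT_BOLD, FONT_ITALIC };

/*
 * Return the attribute to write on the element,
 * or NULL if the renderer default already matches.
 */
static const char *
font_attr(enum mtag tag, enum mfont font, const char *text)
{
	assert(text != NULL);
	if (text[0] != '\0') {
		/* Numbers and operators are upright by default. */
		if ((tag == TAG_MN || tag == TAG_MO) && font == FONT_ROMAN)
			font = FONT_NONE;
		/* <mi>: one character is italic, longer text upright. */
		else if (tag == TAG_MI && font ==
		    (strlen(text) == 1 ? FONT_ITALIC : FONT_ROMAN))
			font = FONT_NONE;
	}
	switch (font) {
	case FONT_ROMAN:
		return "fontstyle=\"normal\"";
	case FONT_BOLD:
		return "fontweight=\"bold\"";
	case FONT_ITALIC:
		return "fontstyle=\"italic\"";
	default:
		return NULL;
	}
}
```

With this sketch, an italic "a" needs no attribute, an italic "ac"
gets fontstyle="italic", and a roman "sin" again needs none.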
> - mandoc only uses <mi>, not <mo> or <mn>.
Fixed.
> - mandoc will transform a '-' into U+2212, but only when it's not
> directly adjacent to a digit.
Open.
> - In Firefox, <mi> only seems to italicize single letters.
That is required by the MathML standard: in <mi>, single-character
content defaults to italic and multi-character content to normal.
See the description of <mi>.
> It looks like adjacent variables, numbers, and operators should be split:
> - 'x=' should become <mi>x</mi><mo>=</mo>
Done.
> - '-b' should become <mo>−</mo><mi>b</mi>
Done except U+2212.
> - '-4ac' should become <mo>−</mo><mn>4</mn><mi>a</mi><mi>c</mi>
I disagree: '4ac' is fine as it is, and '4a c' does become
what you ask for.
> The MathML standard says (MathML 3.0 2e § 3.2.3.3) that "sin" is
> appropriately marked up with <mi>. So <mi>sin</mi> should be enough to
> correctly render eqn's mathematical words. It seems that for
> non-mathematical words to be rendered with italics by default, they
> should be rendered with a <mi> per letter?
That would be possible, but it is not required, and it gives
strange results for multi-letter identifiers.
Yours,
Ingo
Log Message:
-----------
Write text boxes as <mi>, <mn>, or <mo> as appropriate,
and write fontstyle or fontweight attributes where required.
Missing features reported by bentley@.
Modified Files:
--------------
mdocml:
eqn_html.c
html.c
html.h
Revision Data
-------------
Index: html.h
===================================================================
RCS file: /home/cvs/mdocml/mdocml/html.h,v
retrieving revision 1.85
retrieving revision 1.86
diff -Lhtml.h -Lhtml.h -u -p -r1.85 -r1.86
--- html.h
+++ html.h
@@ -51,6 +51,7 @@ enum htmltag {
TAG_MATH,
TAG_MROW,
TAG_MI,
+ TAG_MN,
TAG_MO,
TAG_MSUP,
TAG_MSUB,
Index: html.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/html.c,v
retrieving revision 1.214
retrieving revision 1.215
diff -Lhtml.c -Lhtml.c -u -p -r1.214 -r1.215
--- html.c
+++ html.c
@@ -87,6 +87,7 @@ static const struct htmldata htmltags[TA
{"math", HTML_NLALL | HTML_INDENT},
{"mrow", 0},
{"mi", 0},
+ {"mn", 0},
{"mo", 0},
{"msup", 0},
{"msub", 0},
Index: eqn_html.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/eqn_html.c,v
retrieving revision 1.12
retrieving revision 1.13
diff -Leqn_html.c -Leqn_html.c -u -p -r1.12 -r1.13
--- eqn_html.c
+++ eqn_html.c
@@ -20,6 +20,7 @@
#include <sys/types.h>
#include <assert.h>
+#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@@ -33,7 +34,10 @@ eqn_box(struct html *p, const struct eqn
{
struct tag *post, *row, *cell, *t;
const struct eqn_box *child, *parent;
+ const unsigned char *cp;
size_t i, j, rows;
+ enum htmltag tag;
+ enum eqn_fontt font;
if (NULL == bp)
return;
@@ -136,9 +140,51 @@ eqn_box(struct html *p, const struct eqn
print_otag(p, TAG_MTD, "");
}
- if (NULL != bp->text) {
- assert(NULL == post);
- post = print_otag(p, TAG_MI, "");
+ if (bp->text != NULL) {
+ assert(post == NULL);
+ tag = TAG_MI;
+ cp = (unsigned char *)bp->text;
+ if (isdigit(cp[0]) || (cp[0] == '.' && isdigit(cp[1]))) {
+ tag = TAG_MN;
+ while (*++cp != '\0') {
+ if (*cp != '.' && !isdigit(*cp)) {
+ tag = TAG_MI;
+ break;
+ }
+ }
+ } else if (*cp != '\0' && isalpha(*cp) == 0) {
+ tag = TAG_MO;
+ while (*++cp != '\0') {
+ if (isalnum(*cp)) {
+ tag = TAG_MI;
+ break;
+ }
+ }
+ }
+ font = bp->font;
+ if (bp->text[0] != '\0' &&
+ (((tag == TAG_MN || tag == TAG_MO) &&
+ font == EQNFONT_ROMAN) ||
+ (tag == TAG_MI && font == (bp->text[1] == '\0' ?
+ EQNFONT_ITALIC : EQNFONT_ROMAN))))
+ font = EQNFONT_NONE;
+ switch (font) {
+ case EQNFONT_NONE:
+ post = print_otag(p, tag, "");
+ break;
+ case EQNFONT_ROMAN:
+ post = print_otag(p, tag, "?", "fontstyle", "normal");
+ break;
+ case EQNFONT_BOLD:
+ case EQNFONT_FAT:
+ post = print_otag(p, tag, "?", "fontweight", "bold");
+ break;
+ case EQNFONT_ITALIC:
+ post = print_otag(p, tag, "?", "fontstyle", "italic");
+ break;
+ default:
+ abort();
+ }
print_text(p, bp->text);
} else if (NULL == post) {
if (NULL != bp->left || NULL != bp->right)
* Re: MathML and <mo>, <mi>, <mn>
From: Anthony J. Bentley @ 2017-06-23 3:22 UTC
To: Ingo Schwarze; +Cc: tech
Hi Ingo,
Thanks for the improvements!
Ingo Schwarze writes:
> The <mi fontstyle="italic">ac</mi> does not seem wrong.
> If you write "ac", mandoc cannot be sure whether this is
> a two-letter identifier (which is correctly marked up above)
> or the product of two identifiers.
>
> In this case, you should probably write "a c" (with a blank)
> to make it clear that these are two identifiers, and then it
> will render as <mi>a</mi><mi>c</mi>.
This decision seems fine to me.
--
Anthony J. Bentley
* Re: MathML and <mo>, <mi>, <mn>
From: Ingo Schwarze @ 2017-06-23 21:24 UTC
To: Anthony J. Bentley, tech
Hi Anthony,
Ingo Schwarze wrote on Fri, Jun 23, 2017 at 04:57:12AM +0200:
> Anthony J. Bentley wrote on Tue, Jun 20, 2017 at 02:04:29AM -0600:
>> Consider the quadratic formula:
>> x={-b +- sqrt{b sup 2 - 4ac}} over 2a
> After committing the patch appended below, mandoc now renders
> as follows:
>
> <mrow>
>   <mi>x</mi> <!-- new identifier/operator splitting -->
>   <mo>=</mo> <!-- new operator element -->
>   <mfrac>
>     <mrow>
[...]
>     </mrow>
>     <mn>2</mn> <!-- XXX oops, do we need a row here? -->
>     <mi>a</mi>
>   </mfrac>
> </mrow>
I just fixed this with the following commit, to render as:
    </mrow>
    <mrow>
      <mn>2</mn>
      <mi>a</mi>
    </mrow>
  </mfrac>
Yours,
Ingo
Log Message:
-----------
splitting a text box sometimes requires wrapping it in a list
Modified Files:
--------------
mandoc:
eqn.c
Revision Data
-------------
Index: eqn.c
===================================================================
RCS file: /home/cvs/mandoc/mandoc/eqn.c,v
retrieving revision 1.68
retrieving revision 1.69
diff -Leqn.c -Leqn.c -u -p -r1.68 -r1.69
--- eqn.c
+++ eqn.c
@@ -1139,7 +1139,25 @@ this_tok:
break;
if (ccln == ccl)
continue;
- /* Boundary found, add a new box. */
+ /* Boundary found, split the text. */
+ if (parent->args == parent->expectargs) {
+ /* Remove the text from the tree. */
+ if (cur->prev == NULL)
+ parent->first = cur->next;
+ else
+ cur->prev->next = NULL;
+ parent->last = cur->prev;
+ parent->args--;
+ /* Set up a list instead. */
+ nbox = eqn_box_alloc(ep, parent);
+ nbox->type = EQN_LIST;
+ /* Insert the word into the list. */
+ nbox->first = nbox->last = cur;
+ cur->parent = nbox;
+ cur->prev = NULL;
+ parent = nbox;
+ }
+ /* Append a new text box. */
nbox = eqn_box_alloc(ep, parent);
nbox->type = EQN_TEXT;
nbox->text = mandoc_strdup(cpn);
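The tree surgery in this hunk may be easier to follow on a toy tree.
The sketch below is not mandoc code: struct box, box_alloc() and
wrap_last_child() are invented for illustration and only keep the
fields the hunk touches. It assumes, as in the hunk, that the box being
split is the last child of its parent. When the parent already holds
as many children as it expects (for example the two arguments of a
fraction), the last child is moved into a fresh list box, so that
further pieces can be appended without giving the parent a third
argument.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum box_type { BOX_TEXT, BOX_LIST };

struct box {				/* hypothetical, simplified */
	enum box_type type;
	struct box *parent, *first, *last, *prev, *next;
	size_t args;			/* children attached so far */
	size_t expectargs;		/* children allowed */
};

/* Allocate a box and append it as the last child of parent. */
static struct box *
box_alloc(struct box *parent, enum box_type type)
{
	struct box *bp = calloc(1, sizeof(*bp));

	if (bp == NULL)
		abort();
	bp->type = type;
	bp->expectargs = SIZE_MAX;	/* unlimited by default */
	bp->parent = parent;
	if (parent != NULL) {
		bp->prev = parent->last;
		if (parent->last != NULL)
			parent->last->next = bp;
		else
			parent->first = bp;
		parent->last = bp;
		parent->args++;
	}
	return bp;
}

/*
 * If parent is full, move its last child cur into a new list box.
 * Return the box that further pieces should be appended to.
 */
static struct box *
wrap_last_child(struct box *parent, struct box *cur)
{
	struct box *list;

	assert(parent->last == cur);
	if (parent->args < parent->expectargs)
		return parent;
	/* Remove cur from the tree. */
	if (cur->prev == NULL)
		parent->first = NULL;
	else
		cur->prev->next = NULL;
	parent->last = cur->prev;
	parent->args--;
	/* Put a list in its place and move cur into the list. */
	list = box_alloc(parent, BOX_LIST);
	list->first = list->last = cur;
	list->args = 1;		/* bookkeeping added for this sketch */
	cur->parent = list;
	cur->prev = NULL;
	return list;
}
```

For the denominator "2a", the fraction box keeps exactly two children
after the split: the numerator, and a list containing "2" and "a",
which corresponds to the extra <mrow> in the output above.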