tech@mandoc.bsd.lv
 help / color / mirror / Atom feed
* -Tmarkdown: don't wrap mailtos in <>
@ 2017-03-09  1:47 Anthony J. Bentley
  2017-03-09  2:32 ` Ingo Schwarze
  0 siblings, 1 reply; 4+ messages in thread
From: Anthony J. Bentley @ 2017-03-09  1:47 UTC (permalink / raw)
  To: tech

Hi,

Currently -Tmarkdown wraps Mt email addresses in <>. That's not correct;
in manpages the <> are provided by a parent Aq block.

Index: usr.bin/mandoc/mdoc_markdown.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mdoc_markdown.c,v
retrieving revision 1.14
diff -u -p -r1.14 mdoc_markdown.c
--- usr.bin/mandoc/mdoc_markdown.c	8 Mar 2017 19:23:23 -0000	1.14
+++ usr.bin/mandoc/mdoc_markdown.c	9 Mar 2017 01:46:30 -0000
@@ -211,7 +211,7 @@ static	const struct md_act md_acts[MDOC_
 	{ NULL, NULL, md_post_Lb, NULL, NULL }, /* Lb */
 	{ NULL, md_pre_Pp, NULL, NULL, NULL }, /* Lp */
 	{ NULL, md_pre_Lk, NULL, NULL, NULL }, /* Lk */
-	{ NULL, md_pre_raw, md_post_raw, "<", ">" }, /* Mt */
+	{ NULL, md_pre_raw, md_post_raw, NULL, NULL }, /* Mt */
 	{ md_cond_body, md_pre_word, md_post_word, "{", "}" }, /* Brq */
 	{ md_cond_body, md_pre_word, md_post_word, "{", "}" }, /* Bro */
 	{ NULL, NULL, NULL, NULL, NULL }, /* Brc */
Index: regress/usr.bin/mandoc/mdoc/Aq/author.out_markdown
===================================================================
RCS file: /cvs/src/regress/usr.bin/mandoc/mdoc/Aq/author.out_markdown,v
retrieving revision 1.1
diff -u -p -r1.1 author.out_markdown
--- regress/usr.bin/mandoc/mdoc/Aq/author.out_markdown	7 Mar 2017 13:09:08 -0000	1.1
+++ regress/usr.bin/mandoc/mdoc/Aq/author.out_markdown	9 Mar 2017 01:46:30 -0000
@@ -6,11 +6,11 @@ AQ-AUTHOR(1) - General Commands Manual
 
 # DESCRIPTION
 
-Name &lt;<addr>&gt; Name &lt;<addr>&gt;
+Name &lt;addr&gt; Name &lt;addr&gt;
 
 # AUTHORS
 
-Name &lt;<addr>&gt;  
-Name &lt;<addr>&gt;
+Name &lt;addr&gt;  
+Name &lt;addr&gt;
 
 OpenBSD - November 19, 2014
Index: regress/usr.bin/mandoc/mdoc/Mt/simple.out_markdown
===================================================================
RCS file: /cvs/src/regress/usr.bin/mandoc/mdoc/Mt/simple.out_markdown,v
retrieving revision 1.1
diff -u -p -r1.1 simple.out_markdown
--- regress/usr.bin/mandoc/mdoc/Mt/simple.out_markdown	5 Mar 2017 19:59:26 -0000	1.1
+++ regress/usr.bin/mandoc/mdoc/Mt/simple.out_markdown	9 Mar 2017 01:46:30 -0000
@@ -7,9 +7,9 @@ MT-SIMPLE(1) - General Commands Manual
 # DESCRIPTION
 
 Please send mail to
-<schwarze@openbsd.org>.
+schwarze@openbsd.org.
 
 Do not send mail to
-<~>.
+~.
 
 OpenBSD - February 17, 2010
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: -Tmarkdown: don't wrap mailtos in <>
  2017-03-09  1:47 -Tmarkdown: don't wrap mailtos in <> Anthony J. Bentley
@ 2017-03-09  2:32 ` Ingo Schwarze
  2017-03-09  7:27   ` Anthony J. Bentley
  0 siblings, 1 reply; 4+ messages in thread
From: Ingo Schwarze @ 2017-03-09  2:32 UTC (permalink / raw)
  To: Anthony J. Bentley; +Cc: tech

Hi Anthony,

Anthony J. Bentley wrote on Wed, Mar 08, 2017 at 06:47:52PM -0700:

> Currently -Tmarkdown wraps Mt email addresses in <>.

Yes, and that is indeed required.  That is markdown syntax
which will be tranformed into <a href="mailto:...>...</a>
by the markdown compiler.  Contrary to the outward appearance,
the angle brackets are *not* duplicate.

Your diff is very wrong and completely breaks .Mt hyperlinking.

Yours,
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: -Tmarkdown: don't wrap mailtos in <>
  2017-03-09  2:32 ` Ingo Schwarze
@ 2017-03-09  7:27   ` Anthony J. Bentley
  2017-03-09 16:39     ` Ingo Schwarze
  0 siblings, 1 reply; 4+ messages in thread
From: Anthony J. Bentley @ 2017-03-09  7:27 UTC (permalink / raw)
  To: Ingo Schwarze; +Cc: tech

Hi Ingo,

Ingo Schwarze writes:
> Your diff is very wrong and completely breaks .Mt hyperlinking.

Indeed, you're right. In my defense, there are a couple of reasonable(?)
factors that led to my misunderstanding here:

 - Markdown parsers I've encountered (and GitHub in particular) do
   hyperlink email addresses automatically without <>; I neglected to
   check the spec, as it never occurred to me that Markdown parsers
   might exist that don't do this. Mea culpa.

 - The regression test output mdoc/Aq/author.out_markdown gets
   misinterpreted in Markdown parsers, including try.commonmark.org,
   as <addr> gets passed through as an HTML tag.

 - The regression test mdoc/Mt/simple.in behaves differently between
   output formats: in -Thtml, "Mt ." is hyperlinked, and in -Tmarkdown,
   it is not, at least in CommonMark and GitHub.

The second point seems particularly problematic: any Mt whose argument
doesn't contain '@' seems to be passed through common Markdown parsers
as an HTML tag. Like, say,

.Aq Mt pre

or

.Aq Mt "link rel=stylesheet href=https://example.com/malicious.css"

Is this something we should be worried about? Are there other macros a
crafty manual could use to inject arbitrary HTML into Markdown output?

-- 
Anthony J. Bentley
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: -Tmarkdown: don't wrap mailtos in <>
  2017-03-09  7:27   ` Anthony J. Bentley
@ 2017-03-09 16:39     ` Ingo Schwarze
  0 siblings, 0 replies; 4+ messages in thread
From: Ingo Schwarze @ 2017-03-09 16:39 UTC (permalink / raw)
  To: Anthony J. Bentley; +Cc: tech

Hi Anthony,

Anthony J. Bentley wrote on Thu, Mar 09, 2017 at 12:27:00AM -0700:

> In my defense,

No need to apologize, this still leads to some interesting
considerations.

>  - Markdown parsers I've encountered (and GitHub in particular) do
>    hyperlink email addresses automatically without <>;

That's an example of the plague of non-standard extensions, which
makes markdown a highly insecure language.  You never know which
character combinations in your plain text might get interpreted
as markdown extension tokens, generating unexpected and potentially
hostile HTML code.

Unfortunately, i don't see what we can do about that.  Compiling a
list of extensions and escaping everything that might trigger them
is just not feasible, both because of the number of different
extensions, many of which are poorly documented and buggy, and
because escaping is quite difficult in markdown, often not portably
possible at all without loss of content - think of Unicode characters
in literal font mode as a striking example.

So, the output of a markdown compiler must always be treated as
potentially hostile HTML code, even if the input is known to have
been written by a trustworthy person.  I doubt that many people
serving markdown content on the web are aware of that caveat.
But it implies that mandoc(1) doesn't make anything worse by
potentially producing insecure output.  Of course i try to avoid
producing blatantly insecure output, for example always escaping
"<" and ">" when they occur in mdoc(7) input.  But there is no
way to reach actual safety here.

>  - The regression test output mdoc/Aq/author.out_markdown gets
>    misinterpreted in Markdown parsers, including try.commonmark.org,
>    as <addr> gets passed through as an HTML tag.

That's an example of the excessive context-sensitivity of the
markup language.  Whether <...> is an automatic link or an HTML
tag depends on "...", and the rules differ for each and every
markdown compiler.

Maybe we should do the strictest possible validation that any of
the markdown compilers does before emitting <...>, and in case
of failure emit plain text rather than <...>?  I don't really like
that idea.  It complicates the -T markdown code, and pulls the
abhorrent context dependency of markdown into mandoc.

Maybe we should never emit automatic links at all, but always
transform .Lk foo into [foo](foo)?  That is probably safer.

Maybe we should also transform .Mt foo into [foo](mailto:foo)?
That has the side benefit of also avoiding the ridiculous anti-spam
escaping of the mail address.

>  - The regression test mdoc/Mt/simple.in behaves differently between
>    output formats: in -Thtml, "Mt ." is hyperlinked, and in -Tmarkdown,
>    it is not, at least in CommonMark and GitHub.

Emitting [foo](mailto:foo) would cure that, too.

> The second point seems particularly problematic: any Mt whose argument
> doesn't contain '@' seems to be passed through common Markdown parsers
> as an HTML tag. Like, say,
> 
> .Aq Mt pre
> 
> or
> 
> .Aq Mt "link rel=stylesheet href=https://example.com/malicious.css"
> 
> Is this something we should be worried about? Are there other macros a
> crafty manual could use to inject arbitrary HTML into Markdown output?

As explained above, we can never be sure.

But automatic links seem dangerous enough to avoid them altogether,
even though real safety can never be attained.

Do you agree?
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-03-09 16:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-09  1:47 -Tmarkdown: don't wrap mailtos in <> Anthony J. Bentley
2017-03-09  2:32 ` Ingo Schwarze
2017-03-09  7:27   ` Anthony J. Bentley
2017-03-09 16:39     ` Ingo Schwarze

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).