From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from scc-mailout-kit-01.scc.kit.edu (scc-mailout-kit-01.scc.kit.edu [129.13.231.81])
	by fantadrom.bsd.lv (OpenSMTPD) with ESMTP id cdadd146
	for ; Thu, 9 Mar 2017 11:39:08 -0500 (EST)
Received: from asta-nat.asta.uni-karlsruhe.de ([172.22.63.82] helo=hekate.usta.de)
	by scc-mailout-kit-01.scc.kit.edu with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(envelope-from ) id 1cm15m-0004Lc-FK; Thu, 09 Mar 2017 17:39:07 +0100
Received: from donnerwolke.usta.de ([172.24.96.3])
	by hekate.usta.de with esmtp (Exim 4.77)
	(envelope-from ) id 1cm15k-0006fm-NW; Thu, 09 Mar 2017 17:39:04 +0100
Received: from athene.usta.de ([172.24.96.10])
	by donnerwolke.usta.de with esmtp (Exim 4.84_2)
	(envelope-from ) id 1cm15j-0000BT-Ic; Thu, 09 Mar 2017 17:39:04 +0100
Received: from localhost (athene.usta.de [local])
	by athene.usta.de (OpenSMTPD) with ESMTPA id 5ed33aa6;
	Thu, 9 Mar 2017 17:39:03 +0100 (CET)
Date: Thu, 9 Mar 2017 17:39:03 +0100
From: Ingo Schwarze
To: "Anthony J. Bentley"
Cc: tech@mdocml.bsd.lv
Subject: Re: -Tmarkdown: don't wrap mailtos in <>
Message-ID: <20170309163903.GB36097@athene.usta.de>
References: <34636.1489024072@cathet.us> <20170309023235.GA76398@athene.usta.de> <48572.1489044420@cathet.us>
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <48572.1489044420@cathet.us>
User-Agent: Mutt/1.6.2 (2016-07-01)

Hi Anthony,

Anthony J. Bentley wrote on Thu, Mar 09, 2017 at 12:27:00AM -0700:

> In my defense,

No need to apologize, this still leads to some interesting
considerations.

>  - Markdown parsers I've encountered (and GitHub in particular) do
>    hyperlink email addresses automatically without <>;

That's an example of the plague of non-standard extensions,
which makes markdown a highly insecure language.
You never know which character combinations in your plain text might
get interpreted as markdown extension tokens, generating unexpected
and potentially hostile HTML code.

Unfortunately, i don't see what we can do about that.  Compiling a
list of extensions and escaping everything that might trigger them
is just not feasible, both because of the number of different
extensions, many of which are poorly documented and buggy, and
because escaping is quite difficult in markdown, often not portably
possible at all without loss of content - think of Unicode characters
in literal font mode as a striking example.

So, the output of a markdown compiler must always be treated as
potentially hostile HTML code, even if the input is known to have
been written by a trustworthy person.  I doubt that many people
serving markdown content on the web are aware of that caveat.
But it implies that mandoc(1) doesn't make anything worse by
potentially producing insecure output.

Of course i try to avoid producing blatantly insecure output,
for example always escaping "<" and ">" when they occur in mdoc(7)
input.  But there is no way to reach actual safety here.

>  - The regression test output mdoc/Aq/author.out_markdown gets
>    misinterpreted in Markdown parsers, including try.commonmark.org,
>    as gets passed through as an HTML tag.

That's an example of the excessive context-sensitivity of the
markup language.  Whether <...> is an automatic link or an HTML tag
depends on "...", and the rules differ for each and every markdown
compiler.

Maybe we should do the strictest possible validation that any of
the markdown compilers does before emitting <...>, and in case of
failure emit plain text rather than <...>?

I don't really like that idea.  It complicates the -T markdown code,
and pulls the abhorrent context dependency of markdown into mandoc.

Maybe we should never emit automatic links at all, but always
transform .Lk foo into [foo](foo)?  That is probably safer.
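A minimal sketch of that .Lk transformation follows. It is written in Python rather than mandoc's C purely for illustration, and it is not the mandoc implementation. The helper names (escape_text, escape_dest, inline_link) are hypothetical. The escaping policy is an assumption: entities for "<", ">" and "&", backslashes for a small set of markdown specials, and percent-encoding in the destination. As discussed above, no such policy can be complete.

```python
from urllib.parse import quote

# Characters backslash-escaped in link text.
# ASSUMPTION: a small, non-exhaustive set chosen for this sketch.
_TEXT_SPECIALS = "\\[]*_`"


def escape_text(text: str) -> str:
    """Escape link text so it cannot open an HTML tag or nested markup."""
    out = []
    for ch in text:
        if ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ch == "&":
            out.append("&amp;")
        elif ch in _TEXT_SPECIALS:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


def escape_dest(dest: str) -> str:
    """Percent-encode spaces, parentheses, angle brackets and the like,
    so the destination cannot end the (...) part early."""
    # ASSUMPTION: this "safe" set is an illustrative choice, not a standard.
    return quote(dest, safe=":/?#@!$&'*+,;=%")


def inline_link(target: str) -> str:
    """Render an explicit inline link [target](target), never <target>."""
    return "[{}]({})".format(escape_text(target), escape_dest(target))


print(inline_link("https://example.com/"))
print(inline_link("a b<c>"))
```

With this policy, the argument is never emitted inside bare angle brackets, so whether it looks like a URI no longer decides between "automatic link" and "HTML tag".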
Maybe we should also transform .Mt foo into [foo](mailto:foo)?
That has the side benefit of also avoiding the ridiculous anti-spam
escaping of the mail address.

>  - The regression test mdoc/Mt/simple.in behaves differently between
>    output formats: in -Thtml, "Mt ." is hyperlinked, and in -Tmarkdown,
>    it is not, at least in CommonMark and GitHub.

Emitting [foo](mailto:foo) would cure that, too.

> The second point seems particularly problematic: any Mt whose argument
> doesn't contain '@' seems to be passed through common Markdown parsers
> as an HTML tag. Like, say,
>
> .Aq Mt pre
>
> or
>
> .Aq Mt "link rel=stylesheet href=https://example.com/malicious.css"
>
> Is this something we should be worried about? Are there other macros a
> crafty manual could use to inject arbitrary HTML into Markdown output?

As explained above, we can never be sure.  But automatic links seem
dangerous enough to avoid them altogether, even though real safety
can never be attained.

Do you agree?
  Ingo
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv