From: Ingo Schwarze <schwarze@usta.de>
To: "Anthony J. Bentley" <anthony@anjbe.name>
Cc: tech@mdocml.bsd.lv
Subject: Re: -Tmarkdown: don't wrap mailtos in <>
Date: Thu, 9 Mar 2017 17:39:03 +0100
Message-ID: <20170309163903.GB36097@athene.usta.de>
In-Reply-To: <48572.1489044420@cathet.us>

Hi Anthony,

Anthony J. Bentley wrote on Thu, Mar 09, 2017 at 12:27:00AM -0700:

> In my defense,

No need to apologize; this still leads to some interesting
considerations.

>  - Markdown parsers I've encountered (and GitHub in particular) do
>    hyperlink email addresses automatically without <>;

That's an example of the plague of non-standard extensions, which
makes markdown a highly insecure language.  You never know which
character combinations in your plain text might get interpreted
as markdown extension tokens, generating unexpected and potentially
hostile HTML code.

Unfortunately, i don't see what we can do about that.  Compiling a
list of extensions and escaping everything that might trigger them
is just not feasible, both because of the number of different
extensions, many of which are poorly documented and buggy, and
because escaping is quite difficult in markdown, often not portably
possible at all without loss of content - think of Unicode characters
in literal font mode as a striking example.

So, the output of a markdown compiler must always be treated as
potentially hostile HTML code, even if the input is known to have
been written by a trustworthy person.  I doubt that many people
serving markdown content on the web are aware of that caveat.
But it implies that mandoc(1) doesn't make anything worse by
potentially producing insecure output.  Of course i try to avoid
producing blatantly insecure output, for example always escaping
"<" and ">" when they occur in mdoc(7) input.  But there is no
way to reach actual safety here.
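
To illustrate the kind of escaping meant here, a minimal sketch
follows.  It assumes HTML character references as the escape form;
the function name is hypothetical, and this is not the mandoc(1)
code itself, which is written in C.

```python
def escape_angle_brackets(text: str) -> str:
    """Replace "<" and ">" so that a Markdown parser cannot read
    them as the start or end of an HTML tag or automatic link.

    Assumption: character references are used as the escape form.
    Nothing else is escaped, so this alone does not make the
    output safe.
    """
    return text.replace("<", "&lt;").replace(">", "&gt;")


if __name__ == "__main__":
    print(escape_angle_brackets("<addr>"))
```

Even with such escaping in place, extension syntax that does not
involve angle brackets remains untouched, which is why it cannot
provide actual safety.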

>  - The regression test output mdoc/Aq/author.out_markdown gets
>    misinterpreted in Markdown parsers, including try.commonmark.org,
>    as <addr> gets passed through as an HTML tag.

That's an example of the excessive context-sensitivity of the
markup language.  Whether <...> is an automatic link or an HTML
tag depends on "...", and the rules differ for each and every
markdown compiler.

Maybe we should do the strictest possible validation that any of
the markdown compilers does before emitting <...>, and in case
of failure emit plain text rather than <...>?  I don't really like
that idea.  It complicates the -T markdown code, and pulls the
abhorrent context dependency of markdown into mandoc.
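
For illustration only, such a validate-or-fall-back step might look
roughly like the sketch below.  The pattern is a hypothetical,
deliberately narrow one and not the rule of any particular markdown
compiler; the function name is likewise made up.

```python
import re

# Hypothetical conservative pattern: one "@", a dotted domain, and a
# small set of characters.  Real parsers each use their own rules.
_STRICT_ADDR = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def emit_mt(addr: str) -> str:
    """Emit <addr> only if addr passes the strict check; otherwise
    fall back to plain text with angle brackets escaped."""
    if _STRICT_ADDR.fullmatch(addr):
        return f"<{addr}>"
    return addr.replace("<", "&lt;").replace(">", "&gt;")


if __name__ == "__main__":
    print(emit_mt("user@example.com"))
    print(emit_mt("pre"))
```

The drawback is visible even in this small sketch: the formatter has
to carry its own idea of what every parser accepts as an address.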

Maybe we should never emit automatic links at all, but always
transform .Lk foo into [foo](foo)?  That is probably safer.

Maybe we should also transform .Mt foo into [foo](mailto:foo)?
That has the side benefit of also avoiding the ridiculous anti-spam
escaping of the mail address.
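
The two transformations proposed above can be sketched as follows.
The function names are hypothetical, and the sketch deliberately
ignores the escaping of brackets and parentheses that may occur
inside the argument, which a real implementation would have to
handle.

```python
def lk_to_markdown(uri: str) -> str:
    """Render ".Lk uri" as an explicit inline link, not <uri>."""
    return f"[{uri}]({uri})"


def mt_to_markdown(addr: str) -> str:
    """Render ".Mt addr" as an explicit mailto: link, not <addr>."""
    return f"[{addr}](mailto:{addr})"


if __name__ == "__main__":
    print(lk_to_markdown("https://example.com/"))
    print(mt_to_markdown("user@example.com"))
```

Because the output never contains <...>, whether the argument looks
like an address or like an HTML tag name no longer decides how the
markdown compiler treats it.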

>  - The regression test mdoc/Mt/simple.in behaves differently between
>    output formats: in -Thtml, "Mt ." is hyperlinked, and in -Tmarkdown,
>    it is not, at least in CommonMark and GitHub.

Emitting [foo](mailto:foo) would cure that, too.

> The second point seems particularly problematic: any Mt whose argument
> doesn't contain '@' seems to be passed through common Markdown parsers
> as an HTML tag. Like, say,
> 
> .Aq Mt pre
> 
> or
> 
> .Aq Mt "link rel=stylesheet href=https://example.com/malicious.css"
> 
> Is this something we should be worried about? Are there other macros a
> crafty manual could use to inject arbitrary HTML into Markdown output?

As explained above, we can never be sure.

But automatic links seem dangerous enough that we should avoid them
altogether, even though real safety can never be attained.

Do you agree?
  Ingo
