public inbox archive for pandoc-discuss@googlegroups.com
* Add a flag/option to disallow all network access?
From: Michael Weiss @ 2021-06-17 16:18 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I currently use Pandoc for a somewhat strange(?) use case: converting
HTML-only emails to plaintext so that I can read them in Mutt.
I used a text-based web browser for that in the past but recently
switched to Pandoc because it is better maintained, because I trust it
more to securely parse untrusted/arbitrary HTML input [0] (is that
correct, or are there any risks?), and, most importantly, because I
assumed Pandoc wouldn't fetch any links, images, style sheets, etc.,
which would avoid tracking and therefore improve privacy.

So far this has worked very well :)
However, when I tested this setup with Email Privacy Tester [1], I
noticed that Pandoc still leaks my IP address (which also reveals when
I open/read the mail) by fetching an iframe [2].
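
One way to reproduce this kind of fetch locally, as a minimal sketch: generate a test message whose only remote reference is an iframe and point it at a listener you control. The function name, port 8099, and the /probe path are placeholders chosen for illustration; they come from neither Pandoc nor Email Privacy Tester.

```shell
# Print a tiny HTML mail body whose only remote reference is an iframe.
# The default URL is a placeholder; pass the address of a listener you control.
iframe_probe_html() {
  url=${1:-http://127.0.0.1:8099/probe}
  printf '<html><body><p>probe</p><iframe src="%s"></iframe></body></html>\n' "$url"
}
```

Piping the output into `pandoc --from=html --to=plain` while something is listening on that port should show whether the reader requests the URL; repeating the run with `--from=html+raw_html` allows a comparison.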

Knowing this, I'm wondering whether it would make sense to add a
flag/option that disallows all network access (ideally this would be
fairly simple to implement, but I'm not familiar enough with the code
or with Haskell).
Maybe this is already possible via the PandocPure [3] monad?
Either way, it would be nice to have a CLI option/parameter like
--no-network-access (or even something like --sandboxed or --no-io
that disallows other types of IO as well).

What do you think of this feature request?

Kind regards,
Michael

PS: For my use case I've noticed that I can avoid this issue by enabling
the raw_html extension (I found it in src/Text/Pandoc/Readers/HTML.hs).
It's likely not ideal either, although it does at least seem safe for
my use case(?). I.e., I use the following now:
text/html; pandoc --from=html+raw_html --to=plain | less
text/html; pandoc --from=html+raw_html --to=plain; copiousoutput

PPS: And thanks for Pandoc btw! It's such an awesome project, and I've
been using it for years now.

[0]: https://pandoc.org/MANUAL.html#a-note-on-security
[1]: https://www.emailprivacytester.com/
[2]: https://www.emailprivacytester.com/testDescription?test=iframe
[3]: https://pandoc.org/using-the-pandoc-api.html#the-pandocmonad-class



* Re: Add a flag/option to disallow all network access?
From: Joseph Reagle @ 2021-06-17 17:24 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

This doesn't address your feature request, but it could be a useful hack: set a null http proxy (with an instantaneous timeout) with whatever tool you use, whether it's lynx, w3m, links, etc. I don't know if this can be done with pandoc's `--request-header=`.
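
A sketch of the dead-proxy idea applied to pandoc itself, assuming pandoc honors the `http_proxy` environment variable for every fetch (the manual mentions the variable for URL input; whether it covers all requests is not confirmed here). The function name, the `RUNNER` dry-run hook, and the use of 127.0.0.1:9 as an address with nothing listening are illustrative assumptions.

```shell
# Run pandoc with http_proxy pointing at a local port where nothing listens,
# so any proxied fetch should fail quickly instead of reaching the network.
# RUNNER is a hypothetical hook: RUNNER=echo prints the command instead of running it.
html2plain_deadproxy() {
  ${RUNNER:-} env http_proxy=http://127.0.0.1:9 pandoc --from=html --to=plain "$@"
}
```

This only helps if every request actually goes through the proxy, so it is worth checking against a test message before relying on it.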

On 21-06-17 12:18, Michael Weiss wrote:
> I currently use Pandoc for a somewhat strange(?) use-case: Converting
> HTML-only emails to plaintext so that I can read them in Mutt.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/4d640aae-c629-b3e8-e621-1a56e59e3148%40reagle.org.



* Re: Add a flag/option to disallow all network access?
From: John MacFarlane @ 2021-06-17 20:08 UTC (permalink / raw)
  To: Michael Weiss, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Yes, I've been wanting to do something like this.

https://github.com/jgm/pandoc/issues/5045




* Re: Add a flag/option to disallow all network access?
From: Michael Weiss @ 2021-06-17 21:37 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Thu, 17 Jun, 2021 at 13:08:59 -0700, John MacFarlane wrote:
> Yes, I've been wanting to do something like this.
> https://github.com/jgm/pandoc/issues/5045

That's awesome, thanks for the reply! In hindsight I should've searched
for "sandbox" as well. Restricting all IO (apart from the files
specified via CLI parameters) via the PandocPure monad seems like the
best idea, and I also like the "--sandboxed" parameter name. I think
that would be a nice addition (as [0] already states), but unfortunately
the implementation seems to be much more complicated than I thought.

I'll subscribe to the GitHub issue and from my side we can consider this
thread resolved then :)

Joseph wrote:
> This doesn't address your feature request, but it could be a useful hack: set a null http proxy (with an instantaneous timeout) with whatever tool you use, whether it's lynx, w3m, links, etc. I don't know if this can be done with pandoc's `--request-header=`.

That's an interesting idea; I somehow didn't think of that. Using a
network namespace with only the loopback interface would be another
option to guarantee there won't be any network I/O, e.g.:
unshare --user --net pandoc --from=html+raw_html --to=plain

However, both approaches could still leak information via DNS (I'm not
sure about proxy clients, but e.g. nscd can still cause DNS requests
when using network namespaces without additional countermeasures).
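
The unshare invocation above could be wrapped in a small function so that a mailcap entry can call it. This is only a sketch: the function name and the `RUNNER` dry-run hook are made up for illustration, and it assumes util-linux `unshare` with unprivileged user namespaces enabled. It does nothing about the DNS caveat.

```shell
# Run pandoc in a fresh user + network namespace (no interfaces configured).
# RUNNER is a hypothetical hook: RUNNER=echo prints the command instead of running it.
html2plain_offline() {
  ${RUNNER:-} unshare --user --net pandoc --from=html+raw_html --to=plain "$@"
}
```

A dry run with `RUNNER=echo html2plain_offline` shows the exact command line before it is used on real mail.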

If the sandboxing is really important it might be best to use an
existing security sandbox like Firejail or Bubblewrap.
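
For illustration, roughly what that could look like with each tool. This is a hedged sketch rather than a vetted sandbox profile: the function names and the `RUNNER` dry-run hook are invented here, and the option sets are deliberately minimal (a read-only view of the filesystem plus no network for Bubblewrap, no network for Firejail).

```shell
# Bubblewrap: read-only bind of the root filesystem, fresh /dev and /proc,
# and a new network namespace so no outside interface is reachable.
# RUNNER is a hypothetical hook: RUNNER=echo prints the command instead of running it.
html2plain_bwrap() {
  ${RUNNER:-} bwrap --ro-bind / / --dev /dev --proc /proc --unshare-net --die-with-parent \
    pandoc --from=html+raw_html --to=plain "$@"
}

# Firejail: --net=none gives the process a network namespace with only loopback.
html2plain_firejail() {
  ${RUNNER:-} firejail --quiet --net=none pandoc --from=html+raw_html --to=plain "$@"
}
```

Both read HTML on stdin and write plain text to stdout, so either could stand in for the bare pandoc call; whether they also close the DNS gap mentioned above would need to be checked on the system in question.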

But a "--sandboxed" option for Pandoc would still be worthwhile
(e.g. when neither user namespaces nor a suid security sandbox is
available; a Pandoc option would also be much easier to use).

Anyway, thanks for that idea.

[0]: https://github.com/jgm/pandoc/issues/5045#issuecomment-504469702


