* Add a flag/option to disallow all network access?
@ 2021-06-17 16:18 Michael Weiss
       [not found] ` <YMt1w2fD9xNcxSVi-PyQmACp+/18RaqMYiN0sRPp8/MnJGftv@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Weiss @ 2021-06-17 16:18 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I currently use Pandoc for a somewhat strange(?) use-case: Converting
HTML-only emails to plaintext so that I can read them in Mutt.
I've used a text-based web browser for that in the past but recently
switched to Pandoc because it is better maintained, I trust it more to
securely parse untrusted/arbitrary HTML input [0] (is that correct or
are there any risks?), and most importantly I assumed Pandoc wouldn't
fetch any links, images, style sheets, etc., which would avoid any
tracking and therefore improve privacy.

So far this has worked very well :)
However, when I tested this setup via Email Privacy Tester [1] I noticed
that Pandoc still leaks my IP address (obviously also revealing when I
open/read the mail) by fetching an iframe [2].

Knowing this, I'm wondering if it would make sense to add a flag/option
to disallow any network access (ideally this would even be fairly simple
to implement but I'm not familiar enough with the code / Haskell).
Maybe this is even already possible via the PandocPure [3] monad?
Nonetheless it would be nice to have a CLI option/parameter like
--no-network-access (or even something like --sandboxed or --no-io to
disallow other types of IO as well).

What do you think of this feature request?

Kind regards,
Michael

PS: For my use-case I've noticed that I can avoid this issue by enabling
the raw_html extension (found that in src/Text/Pandoc/Readers/HTML.hs;
it's likely not ideal either, although it does at least seem safe for
my use-case(?)). I.e. I use the following mailcap entries now:

    text/html; pandoc --from=html+raw_html --to=plain | less
    text/html; pandoc --from=html+raw_html --to=plain; copiousoutput

PPS: And thanks for Pandoc btw! It's such an awesome project that I've
used for years now.

[0]: https://pandoc.org/MANUAL.html#a-note-on-security
[1]: https://www.emailprivacytester.com/
[2]: https://www.emailprivacytester.com/testDescription?test=iframe
[3]: https://pandoc.org/using-the-pandoc-api.html#the-pandocmonad-class
[parent not found: <YMt1w2fD9xNcxSVi-PyQmACp+/18RaqMYiN0sRPp8/MnJGftv@public.gmane.org>]
* Re: Add a flag/option to disallow all network access?
       [not found] ` <YMt1w2fD9xNcxSVi-PyQmACp+/18RaqMYiN0sRPp8/MnJGftv@public.gmane.org>
@ 2021-06-17 17:24   ` Joseph Reagle
  2021-06-17 20:08   ` John MacFarlane
  1 sibling, 0 replies; 4+ messages in thread
From: Joseph Reagle @ 2021-06-17 17:24 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

This doesn't address your feature request, but it could be a useful
hack: set a null HTTP proxy (with an instantaneous timeout) with
whatever tool you use, whether it's lynx, w3m, links, etc. I don't know
if this can be done with pandoc's `--request-header=`.

On 21-06-17 12:18, Michael Weiss wrote:
> I currently use Pandoc for a somewhat strange(?) use-case: Converting
> HTML-only emails to plaintext so that I can read them in Mutt.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/4d640aae-c629-b3e8-e621-1a56e59e3148%40reagle.org.
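The null-proxy hack described above could be sketched as a small shell
wrapper. This is only an illustration under assumptions: the helper name
with_null_proxy, the variable NULL_PROXY and the choice of 127.0.0.1:9 are
made up here, and whether a given tool (pandoc included) honours the
conventional proxy environment variables is not confirmed anywhere in
this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of pandoc or any other tool).
# Points the conventional proxy variables at a local port where nothing is
# expected to listen, so proxied requests should fail quickly.
# Assumption: the wrapped program actually honours these variables.
NULL_PROXY="http://127.0.0.1:9"

with_null_proxy() {
    # env sets the variables only for the wrapped command.
    env http_proxy="$NULL_PROXY" https_proxy="$NULL_PROXY" \
        HTTP_PROXY="$NULL_PROXY" HTTPS_PROXY="$NULL_PROXY" \
        "$@"
}

# Example use (mail.html is a placeholder file name):
#   with_null_proxy pandoc --from=html --to=plain < mail.html
```

A wrapper like this only changes what the program is told about proxies;
it does not prevent a program that ignores those variables from connecting
directly.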
* Re: Add a flag/option to disallow all network access?
       [not found] ` <YMt1w2fD9xNcxSVi-PyQmACp+/18RaqMYiN0sRPp8/MnJGftv@public.gmane.org>
  2021-06-17 17:24   ` Joseph Reagle
@ 2021-06-17 20:08   ` John MacFarlane
       [not found]     ` <m2h7hw1cf8.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
  1 sibling, 1 reply; 4+ messages in thread
From: John MacFarlane @ 2021-06-17 20:08 UTC
To: Michael Weiss, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Yes, I've been wanting to do something like this.
https://github.com/jgm/pandoc/issues/5045

Michael Weiss <dev.primeos-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I currently use Pandoc for a somewhat strange(?) use-case: Converting
> HTML-only emails to plaintext so that I can read them in Mutt.
> I've used a text-based web browser for that in the past but recently
> switched to Pandoc because it is better maintained, I trust it more to
> securely parse untrusted/arbitrary HTML input [0] (is that correct or
> are there any risks?), and most importantly I assumed Pandoc wouldn't
> fetch any links, images, style sheets, etc. which would avoid any
> tracking and therefore improve privacy.
>
> So far this has worked very well :)
> However, when I tested this setup via Email Privacy Tester [1] I noticed
> that Pandoc still leaks my IP address (obviously also revealing when I
> open/read the mail) by fetching an Iframe [2].
>
> Knowing this I'm wondering if it would make sense to add a flag/option
> to disallow any network access (ideally this would even be fairly simple
> to implement but I'm not familiar enough with the code / Haskell).
> Maybe this is even already possible via the PandocPure [3] monad?
> Nonetheless it would be nice to have a CLI option/parameter like
> --no-network-access (or even something like --sandboxed or --no-io to
> disallow other types of IO as well).
>
> What do you think of this feature request?
>
> Kind regards,
> Michael
>
> PS: For my use-case I've noticed that I can avoid this issue by enabling
> the raw_html extension (found that in src/Text/Pandoc/Readers/HTML.hs
> but it's likely not ideal either although it does at least seem safe for
> my use-case(?)). I.e. I use the following now:
> text/html; pandoc --from=html+raw_html --to=plain | less
> text/html; pandoc --from=html+raw_html --to=plain; copiousoutput
>
> PPS: And thanks for Pandoc btw! It's such an awesome project that I use
> for years now.
>
> [0]: https://pandoc.org/MANUAL.html#a-note-on-security
> [1]: https://www.emailprivacytester.com/
> [2]: https://www.emailprivacytester.com/testDescription?test=iframe
> [3]: https://pandoc.org/using-the-pandoc-api.html#the-pandocmonad-class
>
> --
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/YMt1w2fD9xNcxSVi%40jarvis.primeos.dev.
[parent not found: <m2h7hw1cf8.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>]
* Re: Add a flag/option to disallow all network access?
       [not found]     ` <m2h7hw1cf8.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
@ 2021-06-17 21:37       ` Michael Weiss
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Weiss @ 2021-06-17 21:37 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Thu, 17 Jun, 2021 at 13:08:59 -0700, John MacFarlane wrote:
> Yes, I've been wanting to do something like this.
> https://github.com/jgm/pandoc/issues/5045

That's awesome, thanks for the reply! In hindsight I should've searched
for "sandbox" as well. Restricting any IO (apart from the files
specified via CLI parameters) via the PandocPure monad seems like the
best idea and I also like the "--sandboxed" parameter name.

I think that would be a nice addition (as [0] already states) but it
seems like the implementation is unfortunately much more complicated
than I thought. I'll subscribe to the GitHub issue and from my side we
can consider this thread resolved then :)

Joseph wrote:
> This doesn't address your feature request, but it could be a useful hack: set a null http proxy (with an instantaneous timeout) with whatever tool you use, whether it's lynx, w3m, links, etc. I don't know if this can be done with pandoc's `--request-header=`.

That's an interesting idea, I somehow didn't think of that. Using a
network namespace with only the loopback interface would be another
option to guarantee there won't be any network I/O, e.g.:

    unshare --user --net pandoc --from=html+raw_html --to=plain

However, both approaches could still leak information via DNS (not sure
about proxy clients, but e.g. nscd can still cause DNS requests when
using network namespaces without additional countermeasures).
If the sandboxing is really important it might be best to use an
existing security sandbox like Firejail or Bubblewrap. But a
"--sandboxed" option for Pandoc would seem interesting nonetheless (e.g.
if neither user namespaces nor a suid security sandbox is available;
a Pandoc option would also be much easier to use).

Anyway, thanks for that idea.

[0]: https://github.com/jgm/pandoc/issues/5045#issuecomment-504469702

--
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/YMvAoe5GNqghNAM6%40jarvis.primeos.dev.
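The sandboxing options mentioned in the last message (a bare network
namespace, Bubblewrap, Firejail) could be combined in a wrapper roughly
like the following. This is a minimal sketch, not a vetted sandbox
profile: the function names net_sandbox_prefix, pick_net_sandbox and
html2plain are made up for illustration, the Bubblewrap bind mounts are
one possible choice among many, and none of the three prefixes addresses
the nscd/DNS caveat raised above.

```shell
#!/bin/sh
# Hypothetical helpers; all names below are illustrative only.

# Map a sandbox tool name to a command prefix that drops network access.
net_sandbox_prefix() {
    case "$1" in
        unshare)  echo "unshare --user --net" ;;
        bwrap)    echo "bwrap --ro-bind / / --dev /dev --proc /proc --unshare-net" ;;
        firejail) echo "firejail --quiet --net=none" ;;
        *)        return 1 ;;
    esac
}

# Print the name of the first sandbox tool found on PATH.
pick_net_sandbox() {
    for tool in bwrap firejail unshare; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1
}

# Read HTML on stdin, write plain text on stdout, without network access.
html2plain() {
    tool=$(pick_net_sandbox) || { echo "no sandbox tool found" >&2; return 1; }
    # The unquoted substitution is intentional: the prefix is split into words.
    $(net_sandbox_prefix "$tool") pandoc --from=html+raw_html --to=plain
}
```

In a mailcap entry the wrapper would take the place of the plain pandoc
call. The order bwrap, firejail, unshare simply reflects the remark that
an existing security sandbox is preferable when it is available.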
end of thread, other threads:[~2021-06-17 21:37 UTC | newest]

Thread overview: 4+ messages
2021-06-17 16:18 Add a flag/option to disallow all network access? Michael Weiss
     [not found] ` <YMt1w2fD9xNcxSVi-PyQmACp+/18RaqMYiN0sRPp8/MnJGftv@public.gmane.org>
2021-06-17 17:24   ` Joseph Reagle
2021-06-17 20:08   ` John MacFarlane
     [not found]     ` <m2h7hw1cf8.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
2021-06-17 21:37       ` Michael Weiss