* Converting all PDF in a folder to Markdown quick question
@ 2022-05-10  2:10 D.J.
  [not found] ` <846e351a-a762-4c7d-8026-6fb700893f44n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: D.J. @ 2022-05-10  2:10 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1.1: Type: text/plain, Size: 1101 bytes --]

I've been able to convert all html in a folder to Markdown using:

    find ./ -iname "*.html" -type f -exec sh -c 'pandoc "${0}" -o "${0%.html}.md"' {} \;

I'm now trying to convert all pdf in a folder to Markdown using a similar approach:

    find ./ -iname "*.pdf" -type f -exec sh -c 'pandoc "${0}" -o "${0%.html}.md"' {} \;

All that happens when I trigger the same for pdf conversion is terminal then says:

    dquote>

The PDF isn't converted to Markdown. I'm a terminal/Pandoc newb so maybe it's different for PDF? I have installed Mactex using Brew install librsvg python homebrew/cask/basictex

*Do I have to do a full pdflatex install, or is there a better formula here? Thanks!*

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/846e351a-a762-4c7d-8026-6fb700893f44n%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 1861 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread
* AW: Converting all PDF in a folder to Markdown quick question
  [not found] ` <846e351a-a762-4c7d-8026-6fb700893f44n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-05-10  6:30 ` denis.maier-NSENcxR/0n0
  2022-05-10  6:36 ` 'Saku Laesvuori' via pandoc-discuss
  2022-05-10  8:09 ` Bastien DUMONT
  2 siblings, 0 replies; 5+ messages in thread
From: denis.maier-NSENcxR/0n0 @ 2022-05-10  6:30 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Pandoc converts to PDF, not from PDF. See https://pandoc.org/:

    PDF → via pdflatex, lualatex, xelatex, latexmk, tectonic, wkhtmltopdf, weasyprint, prince, pagedjs-cli, context, or pdfroff.

Denis

________________________________________
From: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org on behalf of D.J. <futurevintage-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Tuesday, 10 May 2022 04:10:02
To: pandoc-discuss
Subject: Converting all PDF in a folder to Markdown quick question

[quoted copy of the original message trimmed]

^ permalink raw reply	[flat|nested] 5+ messages in thread
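Denis's point can be checked on a local install. The following is only a minimal sketch, assuming a pandoc version that supports the `--list-input-formats` option; the helper name `has_format` is made up for illustration and is not part of pandoc.

```shell
#!/bin/sh
# has_format NAME: read format names on stdin (one per line) and
# succeed only if NAME matches a whole line. Hypothetical helper.
has_format() {
    grep -qx -- "$1"
}

# Only query pandoc if it is actually installed.
if command -v pandoc >/dev/null 2>&1; then
    if pandoc --list-input-formats | has_format pdf; then
        echo "this pandoc can read pdf"
    else
        echo "this pandoc cannot read pdf"
    fi
fi
```

If the second message is printed, no LaTeX installation will change that: the LaTeX engines listed above are only used when pandoc writes PDF.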
* Re: Converting all PDF in a folder to Markdown quick question
  [not found] ` <846e351a-a762-4c7d-8026-6fb700893f44n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2022-05-10  6:30 ` AW: " denis.maier-NSENcxR/0n0
@ 2022-05-10  6:36 ` 'Saku Laesvuori' via pandoc-discuss
  2022-05-10  8:09 ` Bastien DUMONT
  2 siblings, 0 replies; 5+ messages in thread
From: 'Saku Laesvuori' via pandoc-discuss @ 2022-05-10  6:36 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]

> I'm now trying to convert all pdf in a folder to Markdown using a similar
> approach:

Pandoc can't convert from pdf (see `pandoc --list-input-formats`).

>
> find ./ -iname "*.pdf" -type f -exec sh -c 'pandoc "${0}" -o
> "${0%.html}.md"' {} \;
>
> All that happens when I trigger the same for pdf conversion is terminal
> then says:
>
> dquote>

This means that the shell is expecting a closing double quote (").

>
> *Do I have to do a full pdflatex install, or is there a better formula
> here? Thanks!*

pdflatex is used to convert *to* pdf, it can't be used to convert from
pdf. You could try to use `pdftotext` from poppler to extract text from
a pdf file, but it converts to plain text, not markdown.

^ permalink raw reply	[flat|nested] 5+ messages in thread
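The `pdftotext` route Saku mentions could be batched in the same spirit as the original `find` command. This is a rough sketch, not a tested recipe: it assumes poppler's `pdftotext` is on the PATH, and the function names `txt_name` and `convert_all` are invented here. It strips the `.pdf` extension rather than `.html`, and the result is plain text, not Markdown.

```shell
#!/bin/sh
# txt_name FILE: replace the last extension of FILE with ".txt",
# e.g. "./docs/report.pdf" becomes "./docs/report.txt".
txt_name() {
    printf '%s.txt\n' "${1%.*}"
}

# convert_all DIR: run pdftotext on every PDF found below DIR.
# Assumption: file names contain no newlines (the loop reads
# one path per line from find).
convert_all() {
    find "$1" -type f -iname '*.pdf' | while IFS= read -r pdf; do
        pdftotext "$pdf" "$(txt_name "$pdf")"
    done
}

# Usage (not run automatically):
#   convert_all .
```

Every expansion is wrapped in a balanced pair of double quotes, so paths with spaces survive. An unbalanced `"` is what leaves the shell waiting at the `dquote>` prompt.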
* Re: Converting all PDF in a folder to Markdown quick question
  [not found] ` <846e351a-a762-4c7d-8026-6fb700893f44n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2022-05-10  6:30 ` AW: " denis.maier-NSENcxR/0n0
  2022-05-10  6:36 ` 'Saku Laesvuori' via pandoc-discuss
@ 2022-05-10  8:09 ` Bastien DUMONT
  2022-05-11  1:37 ` D.J.
  2 siblings, 1 reply; 5+ messages in thread
From: Bastien DUMONT @ 2022-05-10  8:09 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

You may be interested in https://github.com/jzillmann/pdf-to-markdown

On Monday 09 May 2022 at 07:10:02PM, D.J. wrote:
> [quoted copy of the original message trimmed]

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: Converting all PDF in a folder to Markdown quick question
  2022-05-10  8:09 ` Bastien DUMONT
@ 2022-05-11  1:37 ` D.J.
  0 siblings, 0 replies; 5+ messages in thread
From: D.J. @ 2022-05-11  1:37 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1.1: Type: text/plain, Size: 2268 bytes --]

Aha, I missed the part about Pandoc not converting from PDF. I will try one of the other tools. Thank you all for your responses!

On Tuesday, May 10, 2022 at 1:10:15 AM UTC-7 Bastien Dumont wrote:

> You may be interested by https://github.com/jzillmann/pdf-to-markdown
>
> [quoted copy of the original message trimmed]

[-- Attachment #1.2: Type: text/html, Size: 4327 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-05-11  1:37 UTC | newest]

Thread overview: 5+ messages
2022-05-10  2:10 Converting all PDF in a folder to Markdown quick question D.J.
     [not found] ` <846e351a-a762-4c7d-8026-6fb700893f44n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2022-05-10  6:30   ` AW: " denis.maier-NSENcxR/0n0
2022-05-10  6:36   ` 'Saku Laesvuori' via pandoc-discuss
2022-05-10  8:09   ` Bastien DUMONT
2022-05-11  1:37     ` D.J.