* Converting all PDF in a folder to Markdown quick question
From: D.J. @ 2022-05-10 2:10 UTC
To: pandoc-discuss
I've been able to convert all HTML files in a folder to Markdown using:
find ./ -iname "*.html" -type f -exec sh -c 'pandoc "${0}" -o "${0%.html}.md"' {} \;
I'm now trying to convert all PDFs in a folder to Markdown using a similar approach:
find ./ -iname "*.pdf" -type f -exec sh -c 'pandoc "${0}" -o "${0%.html}.md"' {} \;
When I run the PDF version, all the terminal does is show this prompt:
dquote>
The PDF isn't converted to Markdown. I'm a terminal/Pandoc newb, so maybe it's different for PDF? I have installed BasicTeX (the smaller MacTeX distribution) using:
brew install librsvg python homebrew/cask/basictex
*Do I have to do a full pdflatex install, or is there a better formula
here? Thanks!*
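Aside from the `dquote>` prompt, the PDF command has a second problem. In both commands, `find ... -exec sh -c '...' {} \;` starts a small shell for each file with the path in `$0`, and `${0%.html}` removes a trailing `.html` before `.md` is appended. The PDF version copied that pattern, so nothing is stripped from a `.pdf` name. A minimal sketch of the renaming step alone; `out_name` is a hypothetical helper, and `${1%.*}` (drop the last extension, whichever it is) is one choice among several. It only fixes the file name: pandoc still cannot read PDF, as the replies explain.

```shell
#!/bin/sh
# ${var%pattern} removes the shortest match of pattern from the END of $var.
f=./notes/page.html
printf '%s\n' "${f%.html}.md"    # ./notes/page.md, as in the HTML command

f=./reports/q1.pdf
printf '%s\n' "${f%.html}.md"    # ./reports/q1.pdf.md: ".html" does not match, nothing is stripped

# Hypothetical helper: replace the last extension of $1 with $2.
# Assumes the name has an extension, which -iname '*.pdf' guarantees.
out_name() {
  printf '%s\n' "${1%.*}.$2"
}

out_name ./reports/q1.pdf md     # ./reports/q1.md
out_name ./reports/Q2.PDF txt    # ./reports/Q2.txt (-iname also matches upper-case names)
```

Passing a placeholder name, as in `sh -c '...' sh {}`, and reading the path from `$1` is a common convention, because the shell uses `$0` in its own error messages.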
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/846e351a-a762-4c7d-8026-6fb700893f44n%40googlegroups.com.
* Re: Converting all PDF in a folder to Markdown quick question
From: denis.maier-NSENcxR/0n0 @ 2022-05-10 6:30 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Pandoc converts to PDF, not from PDF. See https://pandoc.org/, where PDF appears among the output formats:

PDF → via pdflatex, lualatex, xelatex, latexmk, tectonic, wkhtmltopdf, weasyprint, prince, pagedjs-cli, context, or pdfroff.
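Denis's point in practice: pandoc *writes* PDF through an engine such as pdflatex, which is what the BasicTeX install provides. A minimal sketch for the supported direction, Markdown to PDF, assuming `pandoc` and `pdflatex` are on `PATH`; `md_to_pdf_all` is a hypothetical function name, while `--pdf-engine` is pandoc's own option (pdflatex is already its default engine, so the flag is shown only for clarity). Using `{} +` instead of `{} \;` hands many paths to one inline shell at a time.

```shell
#!/bin/sh
# Hypothetical helper: Markdown -> PDF for every .md file under a directory,
# the direction pandoc supports. Assumes pandoc and pdflatex are on PATH.
md_to_pdf_all() {
  dir=${1:-.}
  for tool in pandoc pdflatex; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 127; }
  done
  # "sh" fills $0 of the inline script; the found paths arrive as "$@".
  find "$dir" -type f -name '*.md' -exec sh -c '
    for f do
      pandoc "$f" --pdf-engine=pdflatex -o "${f%.md}.pdf" || echo "failed: $f" >&2
    done
  ' sh {} +
}
```

For example, `md_to_pdf_all ./notes` would write `./notes/a.pdf` next to `./notes/a.md`.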
Denis
* Re: Converting all PDF in a folder to Markdown quick question
From: 'Saku Laesvuori' via pandoc-discuss @ 2022-05-10 6:36 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
> I'm now trying to convert all pdf in a folder to Markdown using a similar
> approach:
Pandoc can't convert from PDF (see `pandoc --list-input-formats`).
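That check can be scripted. A small sketch, assuming only that `pandoc` is on `PATH`; `can_read` is a hypothetical helper name. `--list-input-formats` prints one reader name per line, so `grep -qx` looks for an exact whole-line match.

```shell
#!/bin/sh
# Hypothetical helper: ask the installed pandoc whether it can READ format $1.
# Exit status: 0 = listed, 1 = not listed, 127 = pandoc not found.
can_read() {
  command -v pandoc >/dev/null 2>&1 || { echo "pandoc not found" >&2; return 127; }
  pandoc --list-input-formats | grep -qx "$1"
}

can_read pdf
case $? in
  0) echo "pdf is an input format" ;;
  1) echo "pdf is not an input format" ;;
  *) echo "could not check (is pandoc installed?)" ;;
esac
```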
>
> find ./ -iname "*.pdf" -type f -exec sh -c 'pandoc "${0}" -o "${0%.html}.md"' {} \;
>
> All that happens when I trigger the same for pdf conversion is terminal
> then says:
>
> dquote>
This means that the shell is expecting a closing double quote (").
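`dquote>` is the continuation prompt of zsh (the default shell on recent macOS): the line ended inside an open double quote, so the shell waits for more input. One way to check a command before running it is to let `sh -n` parse it without executing anything; a non-interactive shell reports an unbalanced quote as a syntax error instead of prompting. `parses_ok` is a hypothetical helper name, and the exact error text differs between shells.

```shell
#!/bin/sh
# Hypothetical helper: parse (but do not run) the command line in $1.
# Exit status 0 means the quotes balance; non-zero means a syntax error.
parses_ok() {
  sh -n -c "$1" 2>/dev/null
}

good='find ./ -iname "*.pdf" -type f -exec sh -c '\''echo "$0"'\'' {} \;'
bad='find ./ -iname "*.pdf -type f'    # closing double quote missing

parses_ok "$good" && echo "good: quotes balanced"
parses_ok "$bad"  || echo "bad: an interactive shell would wait for a closing quote"
```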
>
> *Do I have to do a full pdflatex install, or is there a better formula
> here? Thanks!*
pdflatex is used to convert *to* PDF; it can't be used to convert from
PDF. You could try `pdftotext` from poppler to extract the text of a
PDF file, but it produces plain text, not Markdown.
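A minimal sketch of the batch version with pdftotext, reusing the `find` pattern from the original question. It assumes poppler's `pdftotext` is on `PATH` (Homebrew's `poppler` formula provides it); `pdfs_to_text` is a hypothetical function name. The result is plain text, so any Markdown structure such as headings or lists would still have to be added by hand or by another tool.

```shell
#!/bin/sh
# Hypothetical helper: extract plain text (not Markdown) from every PDF under a
# directory using poppler's pdftotext, writing name.txt next to name.pdf.
pdfs_to_text() {
  dir=${1:-.}
  command -v pdftotext >/dev/null 2>&1 || { echo "pdftotext not found" >&2; return 127; }
  # -iname also matches .PDF; ${f%.*} drops whichever extension the file has.
  find "$dir" -type f -iname '*.pdf' -exec sh -c '
    for f do
      # -layout keeps the physical page layout; omit it for reading-order text.
      pdftotext -layout "$f" "${f%.*}.txt" || echo "failed: $f" >&2
    done
  ' sh {} +
}
```

For example, `pdfs_to_text ./papers` would write `./papers/a.txt` next to `./papers/a.pdf`.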
* Re: Converting all PDF in a folder to Markdown quick question
From: Bastien DUMONT @ 2022-05-10 8:09 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
You may be interested in https://github.com/jzillmann/pdf-to-markdown
* Re: Converting all PDF in a folder to Markdown quick question
From: D.J. @ 2022-05-11 1:37 UTC
To: pandoc-discuss
Aha, I missed the part about pandoc not converting *from* PDF. I'll try one of the other
tools. Thank you all for your responses!