public inbox archive for pandoc-discuss@googlegroups.com
* Converting all PDF in a folder to Markdown quick question
From: D.J. @ 2022-05-10  2:10 UTC
  To: pandoc-discuss



I've been able to convert all html in a folder to Markdown using:

find ./ -iname "*.html" -type f -exec sh -c 'pandoc "${0}" -o "${0%.html}.md"' {} \;

I'm now trying to convert all pdf in a folder to Markdown using a similar 
approach:

find ./ -iname "*.pdf" -type f -exec sh -c 'pandoc "${0}" -o "${0%.html}.md"' {} \;

When I run the same command for PDF conversion, the terminal only shows:

dquote>

The PDF isn't converted to Markdown. I'm new to the terminal and Pandoc, so 
maybe it's different for PDF? I have installed MacTeX using: brew install 
librsvg python homebrew/cask/basictex

*Do I have to do a full pdflatex install, or is there a better formula 
here? Thanks!*

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/846e351a-a762-4c7d-8026-6fb700893f44n%40googlegroups.com.


* Re: Converting all PDF in a folder to Markdown quick question
From: denis.maier-NSENcxR/0n0 @ 2022-05-10  6:30 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Pandoc converts to PDF, not from PDF.

See https://pandoc.org/

PDF

    → via pdflatex, lualatex, xelatex, latexmk, tectonic, wkhtmltopdf, weasyprint, prince, pagedjs-cli, context, or pdfroff.
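
For illustration, here is a minimal sketch of the direction Pandoc does support, Markdown to PDF. It assumes pandoc and pdflatex are installed and that no file name contains a newline. The helper name md_to_pdf is made up for this example and is not part of Pandoc.

```shell
# Sketch: convert every Markdown file under a directory *to* PDF.
# Assumptions: pandoc and pdflatex are on PATH; md_to_pdf is a
# hypothetical helper name, not a pandoc feature.
md_to_pdf() {
  find "$1" -type f -iname '*.md' | while IFS= read -r src; do
    # "${src%.*}" strips the extension, so notes.md becomes notes.pdf
    pandoc "$src" --pdf-engine=pdflatex -o "${src%.*}.pdf"
  done
}
# Usage: md_to_pdf ./notes
```

Any of the engines listed above could be passed to --pdf-engine instead, provided it is installed.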

Denis




* Re: Converting all PDF in a folder to Markdown quick question
From: 'Saku Laesvuori' via pandoc-discuss @ 2022-05-10  6:36 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


> I'm now trying to convert all pdf in a folder to Markdown using a similar 
> approach:

Pandoc can't convert from pdf (see `pandoc --list-input-formats`). 
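
A small sketch of how that check could be scripted. The helper name pandoc_reads is hypothetical; the only Pandoc feature it relies on is the --list-input-formats option mentioned above.

```shell
# Sketch: test whether the installed pandoc accepts a given input format.
# pandoc_reads is a hypothetical helper name.
pandoc_reads() {
  # grep -x matches whole lines only, -q suppresses output
  pandoc --list-input-formats | grep -qx -- "$1"
}
# Usage:
#   pandoc_reads html && echo "html is an input format"
#   pandoc_reads pdf  || echo "pdf is not an input format"
```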

> 
> find ./ -iname "*.pdf" -type f -exec sh -c 'pandoc "${0}" -o 
> "${0%.html}.md"' {} \;
> 
> All that happens when I trigger the same for pdf conversion is terminal 
> then says:  
> 
> dquote>

This means that the shell is expecting a closing double quote (").
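
As quoted here the quotes balance, so the prompt probably came from a quote that was lost when the command was typed or pasted. Separately, "${0%.html}" only strips a ".html" suffix, so a file named report.pdf would have produced report.pdf.md. A minimal sketch of a suffix-independent output name follows; out_name is a hypothetical helper, and it assumes the file name has an extension.

```shell
# Sketch: derive the output name by stripping whatever extension is present.
# out_name is a hypothetical helper; "${1%.*}" removes the shortest
# trailing ".something" from the name.
out_name() {
  printf '%s\n' "${1%.*}.md"
}
# The same expansion inside find, with every quote closed and the file
# passed as $1 (the extra "sh" fills $0):
#   find . -type f -iname '*.html' -exec sh -c 'pandoc "$1" -o "${1%.*}.md"' sh {} \;
```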

> 
> *Do I have to do a full pdflatex install, or is there a better formula 
> here? Thanks!*

pdflatex is used to convert *to* PDF; it can't be used to convert from
PDF. You could try `pdftotext` from Poppler to extract text from a PDF
file, but it produces plain text, not Markdown.
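
A rough sketch of a batch run with pdftotext, assuming Poppler is installed and no file name contains a newline. The helper name pdfs_to_text is made up for this example, and the -layout option (keep the original physical layout) is optional.

```shell
# Sketch: extract plain text from every PDF under a directory.
# Assumptions: pdftotext (Poppler) is on PATH; pdfs_to_text is a
# hypothetical helper name.
pdfs_to_text() {
  find "$1" -type f -iname '*.pdf' | while IFS= read -r pdf; do
    # Writes report.txt next to report.pdf
    pdftotext -layout "$pdf" "${pdf%.*}.txt"
  done
}
# Usage: pdfs_to_text .
```

The result is plain text, so headings, lists, and emphasis would still have to be marked up by hand or by another tool.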


* Re: Converting all PDF in a folder to Markdown quick question
From: Bastien DUMONT @ 2022-05-10  8:09 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

You may be interested in https://github.com/jzillmann/pdf-to-markdown



* Re: Converting all PDF in a folder to Markdown quick question
From: D.J. @ 2022-05-11  1:37 UTC
  To: pandoc-discuss



Aha, I missed the part about Pandoc not converting from PDF. I will try 
one of the other tools. Thank you all for your responses!

