public inbox archive for pandoc-discuss@googlegroups.com
* A way to convert PDF to Markdown or other (Solution!)
From: BP Jonsson @ 2017-09-12 13:14 UTC
  To: pandoc-discuss

This may be old news to some, but I can't remember having
seen it, so I'm posting it for the record.

I just discovered that you can convert a PDF to Markdown (or
any other format Pandoc supports) by uploading it to Google
Drive, opening it in Google Docs and downloading it from
there as DOCX, then converting the DOCX to Markdown with
Pandoc. The result is quite good!

The steps:

0.  Log into <drive.google.com> in a web browser.

1.  Select the menu [My Drive⏷] → [Upload files…] in the top
     bar.

2.  At least on my system a file dialog opens. Browse to the
     PDF file; select it; click [Open].

3.  The file appears in the "Quick access" field just below
     the top bar.

4.  Right-click the file thumbnail; choose [Open with] →
     [Google Docs]. You should now find yourself in the
     Google Docs document view.

5.  In the [File] menu choose [Download as]
     → [Microsoft Word (.docx)].

6.  Save the DOCX file to disk and convert it with Pandoc
     the same as you would any DOCX file.
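
As a rough illustration of step 6, here is a minimal shell sketch. The
helper names md_name and docx2md are made up for this example and are not
part of Pandoc; the pandoc options used (-f, -t, -o, --extract-media) are
standard ones. It assumes the DOCX downloaded from Google Docs is already
on disk.

```shell
# Sketch only: md_name and docx2md are hypothetical helper names.

# Derive the Markdown file name from the DOCX name (report.docx -> report.md).
md_name() {
    printf '%s\n' "${1%.docx}.md"
}

# Convert a DOCX downloaded from Google Docs into Markdown with Pandoc.
docx2md() {
    in=$1
    out=$(md_name "$in")
    # --extract-media writes embedded images out as files below the
    # current directory instead of leaving them inside the DOCX.
    pandoc "$in" -f docx -t markdown --extract-media=. -o "$out"
}
```

Usage would be something like `docx2md report.docx`, which should leave
report.md next to the input. Swap `-t markdown` for any other output format
Pandoc supports.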

I hope this is of use to someone!

/bpj

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/a26eb786-3e48-671b-99ca-dbc3aeb274f5%40gmail.com.
For more options, visit https://groups.google.com/d/optout.



* A way to convert PDF to Markdown or other (Solution!)
From: Kolen Cheung @ 2017-09-13  6:07 UTC
  To: pandoc-discuss


Sounds interesting.

I used a cli tool for Google Drive before (gdrive), and for those who are interested, you probably can chain them together to upload a PDF and download a docx from it and pipe it from there.


* Re: A way to convert PDF to Markdown or other (Solution!)
From: Paulo Ney de Souza @ 2017-09-13 15:20 UTC
  To: pandoc-discuss


I would be interested in hearing how!

Paulo Ney

On Tue, Sep 12, 2017 at 11:07 PM, Kolen Cheung <christian.kolen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
wrote:

> Sounds interesting.
>
> I used a cli tool for Google Drive before (gdrive), and for those who are
> interested, you probably can chain them together to upload a PDF and
> download a docx from it and pipe it from there.

* Re: A way to convert PDF to Markdown or other (Solution!)
From: David Sanson @ 2017-09-20  0:29 UTC
  To: pandoc-discuss



Here is a bash function that does it. It leaves the docx file in your 
working directory and pipes the markdown to STDOUT.

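function pdf2md() {
   # Import the PDF as a Google Doc; the second field of gdrive's output
   # is taken to be the file ID
   key=$(gdrive import "$1" | cut -d' ' -f2)
   # Export the Google Doc as DOCX into the working directory
   gdrive export "$key" --mime \
      application/vnd.openxmlformats-officedocument.wordprocessingml.document
   # Convert the exported DOCX to standalone Markdown on STDOUT
   pandoc "$1.docx" -t markdown -s
}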
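
A possible variant of the function above, offered only as a sketch: it
does the export inside a temporary directory and converts whatever .docx
file appears there, so it does not depend on the exported file being named
"$1.docx". The name pdf2md_tmp is made up for this example. The gdrive calls
are the same ones used above, and their output format is assumed to match
what that function expects. Unlike the original, this version does not
leave the docx file in the working directory.

```shell
# Sketch only: pdf2md_tmp is a hypothetical name; the gdrive calls mirror
# the function above and are assumed to behave the same way.
pdf2md_tmp() {
    [ -n "$1" ] || { echo "usage: pdf2md_tmp FILE.pdf" >&2; return 1; }

    # Make the PDF path absolute, since we change directory below.
    case $1 in
        /*) pdf=$1 ;;
        *)  pdf=$PWD/$1 ;;
    esac

    tmp=$(mktemp -d) || return 1
    (
        cd "$tmp" || exit 1
        # Import the PDF; the second field of the output is taken as the ID.
        key=$(gdrive import "$pdf" | cut -d' ' -f2)
        [ -n "$key" ] || exit 1
        # Send any status output to stderr so only Markdown reaches stdout.
        gdrive export "$key" --mime \
            application/vnd.openxmlformats-officedocument.wordprocessingml.document >&2
        # Pick up the exported file, whatever it was named.
        set -- *.docx
        [ -e "$1" ] || exit 1
        pandoc "$1" -t markdown -s
    )
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Called as `pdf2md_tmp paper.pdf > paper.md`, this should write the Markdown
to paper.md. The imported Google Doc stays in your Drive either way.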

On Wednesday, September 13, 2017 at 10:20:20 AM UTC-5, Paulo Ney de Souza 
wrote:
>
> I would be interested in hearing how!
>
> Paulo Ney
>
> On Tue, Sep 12, 2017 at 11:07 PM, Kolen Cheung wrote:
>
>> Sounds interesting.
>>
>> I used a cli tool for Google Drive before (gdrive), and for those who are 
>> interested, you probably can chain them together to upload a PDF and 
>> download a docx from it and pipe it from there.

* Re: A way to convert PDF to Markdown or other (Solution!)
From: CR @ 2017-09-21 11:17 UTC
  To: pandoc-discuss



I use <https://document.online-convert.com/convert-to-txt> to extract the
text from a PDF, then manually edit the text to turn it into Markdown. It
works pretty well, except that the footnotes in the PDF are sometimes
mis-formatted in the output text. This is one of the better PDF-to-text
converters I've used, and I tested 5 or 6 of the free online converters.

I may have to try your method though.
