public inbox archive for pandoc-discuss@googlegroups.com
* How I copy HTML from browser to markdown by using pandoc
@ 2017-05-04 14:32 support1-ZohPw8X7yHTQT0dZR+AlfA
From: support1-ZohPw8X7yHTQT0dZR+AlfA @ 2017-05-04 14:32 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I wish to share how I copy HTML snippets from web pages and convert
them directly into markdown files on the hard disk.

I often write text on third-party websites, and there is often text
authored by other people that may be freely copied and
distributed.

Yet, who wants all that HTML?

First I searched for a Firefox extension that can copy HTML as
markdown, but all I found was this one:
https://addons.mozilla.org/en-us/firefox/addon/copy-as-markdown/ and
it does not copy all the HTML.

My search on DuckDuckGo
https://duckduckgo.com/html?q=copy+html+as+markdown&t=gnu turned up
this website that converts HTML to markdown:
https://puppypaste.com/

And then I found the answer on
https://unix.stackexchange.com/questions/78395/save-html-from-clipboard-as-markdown-text
which refers directly to pandoc's command:

xclip -o -selection clipboard -t text/html | pandoc -r html -w markdown
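
The command above assumes the clipboard actually holds HTML. Below is a
minimal sketch of a guard around it, assuming xclip's special TARGETS
target, which lists the formats the selection owner offers. The
function names offers_html and clipboard_to_markdown are hypothetical,
and the plain-text fallback is only one possible choice.

```shell
#!/bin/bash
# Sketch only: function names are made up for illustration.

# Succeeds if the target list read from stdin contains text/html.
offers_html() {
    grep -qxF 'text/html'
}

# Print the clipboard as markdown; fall back to the plain text when the
# clipboard owner does not offer a text/html target.
clipboard_to_markdown() {
    if xclip -o -selection clipboard -t TARGETS | offers_html; then
        xclip -o -selection clipboard -t text/html | pandoc -r html -w markdown
    else
        xclip -o -selection clipboard
    fi
}
```

It could be used as: clipboard_to_markdown > snippet.md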

I rely heavily on keyboard input, as I use the tiling window
manager StumpWM https://stumpwm.github.io/ which gives me good control
over windows.

So I bound one key in the window manager to run a command that
converts HTML data in the X clipboard into markdown files (this will
not work on Windows).

The configuration looks like this:

(define-key *root-map* (kbd "M") "exec save-html-as-markdown")

This means that when I press C-t followed by uppercase M, the
program save-html-as-markdown is run in the background.

The small background program is a piece of Lisp code that defines the
directory where such HTML snippets, converted to markdown, are
saved, and runs the pandoc command.

In this example I am using CLISP http://www.clisp.org as the Lisp
implementation, but it can easily be adapted to any programming
language that can save the output of the pandoc command to a file. I
save the files under names derived from the date and time.

#!/home/data1/protected/bin/lisp

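(defun timestamp-filename ()
  "Return the current local time as a string such as 2017-05-04-16:58:27."
  ;; Only the first six values of GET-DECODED-TIME are needed;
  ;; the remaining ones are discarded.
  (multiple-value-bind (second minute hour date month year)
      (get-decoded-time)
    (format nil "~d-~2,'0d-~2,'0d-~2,'0d:~2,'0d:~2,'0d"
            year month date hour minute second)))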

(defparameter *html-to-markdown-dir* "/home/data1/protected/Documents/HTML-Markdown/")

(let* ((filename (concatenate 'string (timestamp-filename) ".md"))
       (markdown (uiop:run-program "xclip -t text/html -selection primary -out | pandoc -r html -w commonmark" :output :string))
       (output (concatenate 'string *html-to-markdown-dir* filename)))
  (alexandria:write-string-into-file markdown output)
  (uiop:run-program (concatenate 'string "emacs-client-x " output)))

I am sure somebody can write a much simpler and shorter script in
Bash, Python, or a similar language to get the same result.

It could be as simple as:

#!/bin/bash
FILE="$(/bin/date -Iseconds).md"
xclip -t text/html -selection primary -out | pandoc -r html -w commonmark > "$FILE"
emacs "$FILE"

In my version, after the program runs, GNU Emacs opens the file
that was saved, for example
/home/data1/protected/Documents/HTML-Markdown/2017-05-04-16:58:27.md
so that I can edit the file and also confirm that it
exists.
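
The short Bash script above writes into the current directory and
names the file after the output of date -Iseconds. Here is a hedged
sketch that instead mimics the Lisp version's directory and file
naming. The names snippet_path, save_selection_as_markdown and
SNIPPET_DIR are hypothetical, and the default directory is simply the
one from my example.

```shell
#!/bin/bash
# Sketch only: names below are made up for illustration.

# Print a path such as DIR/2017-05-04-16:58:27.md for the current time.
# DIR is the first argument, else $SNIPPET_DIR, else the example directory.
snippet_path() {
    local dir="${1:-${SNIPPET_DIR:-/home/data1/protected/Documents/HTML-Markdown}}"
    # Strip one trailing slash so the path has a single separator.
    printf '%s/%s.md\n' "${dir%/}" "$(date +%Y-%m-%d-%H:%M:%S)"
}

# Convert the primary selection to CommonMark, save it, print the path.
save_selection_as_markdown() {
    local file
    file="$(snippet_path "$@")"
    mkdir -p "$(dirname "$file")"
    xclip -t text/html -selection primary -out \
        | pandoc -r html -w commonmark > "$file"
    printf '%s\n' "$file"
}
```

It could be used as: emacs "$(save_selection_as_markdown)"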

This way, whenever I write something on someone's blog or find HTML
content that I wish to reuse, I simply mark it, copy it and press
C-t M, which creates the markdown file on the disk.

Other window managers can do the same if they allow keyboard
customization.

Jean


