public inbox archive for pandoc-discuss@googlegroups.com
From: support1-ZohPw8X7yHTQT0dZR+AlfA@public.gmane.org
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: How I copy HTML from browser to markdown by using pandoc
Date: Thu, 4 May 2017 17:32:46 +0300
Message-ID: <20170504143246.GB23510@protected.rcdrun.com>

I wish to share how I copy HTML snippets from web pages and convert
them directly into markdown files on the hard disk.

I often write some text on third-party websites, and there is often
text authored by other people that is permitted to be copied and
distributed.

Yet, who wants all that HTML?

First I searched for a Firefox extension that can copy HTML as
markdown, but the only one I found was this:
https://addons.mozilla.org/en-us/firefox/addon/copy-as-markdown/ and
it does not copy all of the HTML.

My search on DuckDuckGo
https://duckduckgo.com/html?q=copy+html+as+markdown&t=gnu turned up
this website that converts HTML to markdown:
https://puppypaste.com/

And then I found the answer on
https://unix.stackexchange.com/questions/78395/save-html-from-clipboard-as-markdown-text
which refers directly to pandoc's command:

xclip -o -selection clipboard -t text/html | pandoc -r html -w markdown
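The pipeline above assumes the clipboard really holds HTML. A minimal
sketch of a guard, assuming xclip's TARGETS query is used to see what
the clipboard offers; the function names has_html_target and
clipboard_to_markdown are hypothetical and not part of the original
post:

```shell
#!/bin/bash
# Sketch only: has_html_target and clipboard_to_markdown are illustrative
# names, not part of xclip or pandoc.

# Succeeds when the list of clipboard targets on stdin contains text/html.
has_html_target() {
  grep -qx 'text/html'
}

# Convert the clipboard to markdown when HTML is offered; otherwise pass
# the plain-text clipboard content through unchanged.
clipboard_to_markdown() {
  if xclip -o -selection clipboard -t TARGETS | has_html_target; then
    xclip -o -selection clipboard -t text/html | pandoc -r html -w markdown
  else
    xclip -o -selection clipboard
  fi
}
```

Calling clipboard_to_markdown from a terminal under X prints the
converted text; it needs both xclip and pandoc installed.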

I work mostly from the keyboard, as I am a user of the tiling window
manager StumpWM https://stumpwm.github.io/ which gives me good control
over windows.

So I have bound one key in the window manager to run a command that
converts the HTML data in the X clipboard to a markdown file (this
will not work on Windows).

The configuration looks like this:

(define-key *root-map* (kbd "M") "exec save-html-as-markdown")

This means that when I press the keys C-t followed by uppercase M, the
program save-html-as-markdown is run in the background.

That small background program is a piece of Lisp code that defines the
directory where such snippets of HTML, converted to markdown, are to
be saved, and runs the pandoc command.

In this example I am using CLISP http://www.clisp.org as the Lisp
implementation, but it can easily be adapted to any programming
language that can save the output of the pandoc command to a file. I
save it into files named after the date and time.

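#!/home/data1/protected/bin/lisp

;; Assumes UIOP and Alexandria are already loaded by the launcher above.

;; Return the current local time as a string such as "2017-05-04-16:58:27".
(defun timestamp-filename ()
  (multiple-value-bind
        (second minute hour date month year day-of-week dst-p tz)
      (get-decoded-time)
    (declare (ignore day-of-week dst-p tz))
    (format nil "~d-~2,'0d-~2,'0d-~2,'0d:~2,'0d:~2,'0d"
            year month date hour minute second)))

;; Directory where the converted snippets are saved (note the trailing slash).
(defparameter *html-to-markdown-dir* "/home/data1/protected/Documents/HTML-Markdown/")

;; Convert the HTML in the X primary selection to CommonMark, save it under
;; a timestamped name, then open the saved file in Emacs.
(let* ((filename (concatenate 'string (timestamp-filename) ".md"))
       (markdown (uiop:run-program "xclip -t text/html -selection primary -out | pandoc -r html -w commonmark" :output :string))
       (output (concatenate 'string *html-to-markdown-dir* filename)))
  (alexandria:write-string-into-file markdown output)
  (uiop:run-program (concatenate 'string "emacs-client-x " output)))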

I am sure somebody can write a much shorter and simpler script in
Bash, Python or a similar language to get the same result.

It could be as simple as:

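#!/bin/bash
# Name the file after the current time in ISO 8601 format (GNU date).
FILE="$(/bin/date -Iseconds).md"
# Convert the HTML in the X primary selection to CommonMark and save it.
xclip -t text/html -selection primary -out | pandoc -r html -w commonmark > "$FILE"
emacs "$FILE"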
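The short script above writes into the current directory. A variant
closer to the Lisp version, saving into a fixed directory under the
same date-and-time file name, might look like the sketch below. The
OUT_DIR variable, its default location, and the function names are
assumptions made for illustration only:

```shell
#!/bin/bash
# Sketch only: OUT_DIR and both function names are hypothetical.
OUT_DIR="${OUT_DIR:-$HOME/Documents/HTML-Markdown}"

# Print a file name such as 2017-05-04-16:58:27.md, the same layout the
# Lisp function timestamp-filename produces (plus the .md extension).
timestamp_filename() {
  date '+%Y-%m-%d-%H:%M:%S.md'
}

# Read markdown on stdin, save it under OUT_DIR, and print the file path.
save_markdown() {
  mkdir -p "$OUT_DIR" || return 1
  local file
  file="$OUT_DIR/$(timestamp_filename)"
  cat > "$file" && printf '%s\n' "$file"
}

# Usage under X, with xclip and pandoc installed:
#   xclip -t text/html -selection primary -out \
#     | pandoc -r html -w commonmark | save_markdown
```

Because save_markdown prints the path it wrote, the result can be
handed straight to an editor, for example emacs "$(... | save_markdown)".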

In my version, once the program has run, GNU Emacs opens the saved
file, for example
/home/data1/protected/Documents/HTML-Markdown/2017-05-04-16:58:27.md
so I can edit it and also confirm that the file really exists.

This way, whenever I write something on someone's blog or find HTML
content that I wish to reuse, I simply mark it, copy it and press the
keys C-t M, which creates the markdown file on the disk.

Other window managers can do the same if they allow keyboard
customization.

Jean

