public inbox archive for pandoc-discuss@googlegroups.com
* link-titler.hs: a script for adding titles to links in Markdown files while preserving formatting
@ 2022-04-02 14:52 Gwern Branwen
From: Gwern Branwen @ 2022-04-02 14:52 UTC
  To: pandoc-discuss

https://github.com/gwern/gwern.net/blob/master/build/link-titler.hs
(uses https://github.com/gwern/gwern.net/blob/master/build/Query.hs
https://github.com/gwern/gwern.net/blob/master/build/Utils.hs
https://github.com/gwern/gwern.net/blob/master/build/LinkMetadata.hs )

Bare links in a Markdown file are hard to read or search for,
especially when you have thousands of them spread across ~300 Markdown
files like I do. You aren't going to add titles to each one by hand.
If you have a database of URL/title pairs, then it wouldn't be too
hard to walk the Pandoc AST and add a title wherever there isn't one
already (and I already did this at Hakyll compile-time)... but then
Pandoc will write the file back out in its default formatting, which
can be nasty-looking, particularly with all the arbitrary
line-breaking (which will screw up the very searches you hoped to
improve by making the titles readable inline).

So, I use a more heuristic approach: parse and walk the Pandoc AST...
but only to get a list of URLs missing titles! Then simply do
search-and-replace on "$URL)" in the raw file text, which is how a
link with a missing title will probably be written, and rewrite it to
'$URL "Title")'. (Links written with <a> or with the [foo][] syntax
won't be handled, but oh well.) This turns out to work well enough
that you can just run it automatically on all files daily to keep
things up to date.
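
A minimal sketch of the idea, not the actual link-titler.hs (see the
links above for that): the names untitledUrls, addTitles, and
titleLinks are hypothetical, and the URL/title database is assumed to
be a plain Data.Map. It assumes a recent pandoc where readMarkdown
accepts Text. Titles containing double quotes are not escaped here.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (foldl')
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import Text.Pandoc (Inline (Link), Pandoc, def, pandocExtensions,
                    readMarkdown, readerExtensions, runPure)
import Text.Pandoc.Walk (query)

-- Collect the URL of every link in the AST whose title is empty.
untitledUrls :: Pandoc -> [T.Text]
untitledUrls = query go
  where
    go :: Inline -> [T.Text]
    go (Link _ _ (url, title)) | T.null title = [url]
    go _ = []

-- For each untitled URL that has a known title, rewrite `URL)` to
-- `URL "Title")` in the raw source text. Only the matched spans
-- change, so the author's line breaks and formatting survive.
addTitles :: M.Map T.Text T.Text -> [T.Text] -> T.Text -> T.Text
addTitles db urls src = foldl' step src urls
  where
    step acc url = case M.lookup url db of
      Nothing    -> acc
      Just title -> T.replace (url <> ")")
                              (url <> " \"" <> title <> "\")")
                              acc

-- Parse only to find the untitled URLs, then edit the original text.
-- If parsing fails, return the source unchanged.
titleLinks :: M.Map T.Text T.Text -> T.Text -> T.Text
titleLinks db src =
  case runPure (readMarkdown opts src) of
    Left _    -> src
    Right doc -> addTitles db (untitledUrls doc) src
  where
    opts = def { readerExtensions = pandocExtensions }
```

Calling titleLinks with the database and a file's contents returns the
same text with titles spliced in; nothing is ever re-rendered through
a Pandoc writer, which is the whole point.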

-- 
gwern
https://www.gwern.net

