* link-titler.hs: a script for adding titles to links in Markdown files while preserving formatting
From: Gwern Branwen @ 2022-04-02 14:52 UTC
To: pandoc-discuss

https://github.com/gwern/gwern.net/blob/master/build/link-titler.hs
(uses
https://github.com/gwern/gwern.net/blob/master/build/Query.hs
https://github.com/gwern/gwern.net/blob/master/build/Utils.hs
https://github.com/gwern/gwern.net/blob/master/build/LinkMetadata.hs )

Bare links in a Markdown file are hard to read or search for,
especially when you have thousands of them spread across ~300 Markdown
files like I do. You aren't going to add titles to each one by hand.

If you have a database of URLs/title pairs, then it wouldn't be too
hard to walk the Pandoc AST and add titles when there's not one
already (and I already did this at Hakyll compile-time)... but then
Pandoc will use its default formatting which can be nasty-looking,
particularly with all the arbitrary line-breaking (which will screw up
the very searches you hoped to improve by making the titles readable
inline).

So, I use a more heuristic approach: parse and walk the Pandoc AST...
but only to get a list of URLs missing titles! Then simply do
search-and-replace on "$URL)", which is how missing titles will
probably be written. (Links written with <a> or with the [foo][]
syntax won't be handled, but oh well.)

This turns out to work well enough that you can just run it
automatically on all files daily to keep things up to date.

--
gwern
https://www.gwern.net
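
The heuristic described in the message above can be sketched in a few lines. This is not the linked link-titler.hs (which is Haskell); it is a minimal Python illustration under stated assumptions: the AST is Pandoc's JSON form (what `pandoc -t json` emits) already decoded into dicts and lists, the function names `urls_missing_titles` and `add_titles` are hypothetical, and the title database is a plain dict. Matching the leading `](` in addition to "$URL)" is a small tightening of my own, so that one URL is not rewritten when it happens to be the tail of another.

```python
def urls_missing_titles(node):
    """Collect the URLs of all Link nodes whose title is empty.

    `node` is any fragment of Pandoc's JSON AST. A Link is encoded as
    {"t": "Link", "c": [attr, inlines, [url, title]]}.
    """
    found = []
    if isinstance(node, dict):
        if node.get("t") == "Link":
            url, title = node["c"][2]
            if not title:
                found.append(url)
        # Recurse into every value so nested links and metadata are covered.
        for value in node.values():
            found.extend(urls_missing_titles(value))
    elif isinstance(node, list):
        for child in node:
            found.extend(urls_missing_titles(child))
    return found


def add_titles(markdown, urls, titles):
    """Plain-text search-and-replace; the Markdown is never re-rendered.

    `titles` is a hypothetical URL -> title dict standing in for the
    database mentioned in the message.
    """
    for url in urls:
        title = titles.get(url)
        if title is None:
            continue  # no known title: leave the link untouched
        # Assumption: a double quote inside a title is backslash-escaped.
        escaped = title.replace('"', '\\"')
        markdown = markdown.replace(
            "](" + url + ")",
            "](" + url + ' "' + escaped + '")',
        )
    return markdown
```

Because only the matched `](URL)` substrings change, line breaks, headers and everything else in the file stay byte-for-byte as written. As the message notes, links written with <a> or in the [foo][] reference style are found by the AST walk but never match the text pattern, so they are silently skipped.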

* Re: link-titler.hs: a script for adding titles to links in Markdown files while preserving formatting
From: John MacFarlane @ 2022-04-02 17:02 UTC
To: Gwern Branwen, pandoc-discuss

Gwern Branwen <gwern-v26ZT+9V8bxeoWH0uzbU5w@public.gmane.org> writes:

> Pandoc will use its default formatting which can be nasty-looking,
> particularly with all the arbitrary line-breaking (which will screw up
> the very searches you hoped to improve by making the titles readable
> inline).

Not sure what you're talking about here. What setting
for --wrap are you using?

* Re: link-titler.hs: a script for adding titles to links in Markdown files while preserving formatting
From: Gwern Branwen @ 2022-04-02 17:43 UTC
To: pandoc-discuss

I dunno; I have not looked into fixing linewrapping in detail because
Pandoc will change all the other formatting like headers and rulers,
and you've made it clear in the past that Pandoc will never roundtrip
Markdown formatting, so there's no point in investigating individual
fixes - even if there is a magic option for this or that, there will
still be all the other things and you'll still want to edit at the
text rather than AST level.

--
gwern
https://www.gwern.net

* Re: link-titler.hs: a script for adding titles to links in Markdown files while preserving formatting
From: John MacFarlane @ 2022-04-02 17:55 UTC
To: Gwern Branwen, pandoc-discuss

--wrap=preserve will preserve the line wrapping from the
source. That might be what you want.

Gwern Branwen <gwern-v26ZT+9V8bxeoWH0uzbU5w@public.gmane.org> writes:

> I dunno; I have not looked into fixing linewrapping in detail because
> Pandoc will change all the other formatting like headers and rulers,
> and you've made it clear in the past that Pandoc will never roundtrip
> Markdown formatting, so there's no point in investigating individual
> fixes - even if there is a magic option for this or that, there will
> still be all the other things and you'll still want to edit at the
> text rather than AST level.