public inbox archive for pandoc-discuss@googlegroups.com
* link-titler.hs: a script for adding titles to links in Markdown files while preserving formatting
@ 2022-04-02 14:52 Gwern Branwen
From: Gwern Branwen @ 2022-04-02 14:52 UTC (permalink / raw)
  To: pandoc-discuss

https://github.com/gwern/gwern.net/blob/master/build/link-titler.hs
(uses https://github.com/gwern/gwern.net/blob/master/build/Query.hs
https://github.com/gwern/gwern.net/blob/master/build/Utils.hs
https://github.com/gwern/gwern.net/blob/master/build/LinkMetadata.hs )

Bare links in a Markdown file are hard to read or search for,
especially when you have thousands of them spread across ~300 Markdown
files like I do. You aren't going to add titles to each one by hand.
If you have a database of URL/title pairs, then it wouldn't be too
hard to walk the Pandoc AST and add a title wherever one is missing
(I already did this at Hakyll compile-time)... but then
Pandoc will use its default formatting which can be nasty-looking,
particularly with all the arbitrary line-breaking (which will screw up
the very searches you hoped to improve by making the titles readable
inline).

So, I use a more heuristic approach: parse and walk the Pandoc AST...
but only to get a list of URLs missing titles! Then simply do a
search-and-replace on "$URL)", which is how a link with no title will
probably be written: the URL immediately followed by the closing
parenthesis. (Links written with <a> or with the [foo][]
syntax won't be handled, but oh well.) This turns out to work well
enough that you can just run it automatically on all files daily to
keep things up to date.
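
A minimal sketch of the search-and-replace step only, in Python rather
than the Haskell of link-titler.hs, and not taken from that script. It
assumes the list of untitled URLs has already been extracted (from the
Pandoc AST in the original approach) and that a URL-to-title mapping is
available. The names add_titles, untitled_urls and titles are
hypothetical. It matches "(URL)" rather than just "URL)", a small
tightening so that a URL embedded at the end of a longer URL is not
rewritten, and it swaps double quotes in titles for single quotes to
keep the Markdown title well-formed.

```python
from typing import Dict, Iterable


def add_titles(markdown: str, untitled_urls: Iterable[str], titles: Dict[str, str]) -> str:
    """Append a title to inline links of the form (URL), editing the text directly.

    Hypothetical helper: `untitled_urls` stands in for the list obtained by
    walking the Pandoc AST, and `titles` for the URL/title database.
    """
    for url in untitled_urls:
        title = titles.get(url)
        if not title:
            continue  # no known title for this URL; leave the link alone
        # Avoid breaking the "..." delimiters of the Markdown link title.
        safe_title = title.replace('"', "'")
        # An untitled inline link ends with the URL directly before ")".
        markdown = markdown.replace(f"({url})", f'({url} "{safe_title}")')
    return markdown


if __name__ == "__main__":
    text = "See [the script](https://example.com/link-titler) for details.\n"
    db = {"https://example.com/link-titler": "Example title"}
    print(add_titles(text, db.keys(), db), end="")
```

Because a rewritten link no longer contains "(URL)", running this
repeatedly over the same file leaves already-titled links alone, and
nothing else in the file is reformatted.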

-- 
gwern
https://www.gwern.net



* Re: link-titler.hs: a script for adding titles to links in Markdown files while preserving formatting
@ 2022-04-02 17:02   ` John MacFarlane
From: John MacFarlane @ 2022-04-02 17:02 UTC (permalink / raw)
  To: Gwern Branwen, pandoc-discuss

Gwern Branwen <gwern-v26ZT+9V8bxeoWH0uzbU5w@public.gmane.org> writes:

> Pandoc will use its default formatting which can be nasty-looking,
> particularly with all the arbitrary line-breaking (which will screw up
> the very searches you hoped to improve by making the titles readable
> inline).

Not sure what you're talking about here.  What setting for --wrap
are you using?



* Re: link-titler.hs: a script for adding titles to links in Markdown files while preserving formatting
@ 2022-04-02 17:43       ` Gwern Branwen
From: Gwern Branwen @ 2022-04-02 17:43 UTC (permalink / raw)
  To: pandoc-discuss

I dunno; I have not looked into fixing line wrapping in detail because
Pandoc will change all the other formatting like headers and horizontal rules,
and you've made it clear in the past that Pandoc will never roundtrip
Markdown formatting, so there's no point in investigating individual
fixes - even if there is a magic option for this or that, there will
still be all the other things and you'll still want to edit at the
text rather than AST level.

-- 
gwern
https://www.gwern.net



* Re: link-titler.hs: a script for adding titles to links in Markdown files while preserving formatting
@ 2022-04-02 17:55           ` John MacFarlane
From: John MacFarlane @ 2022-04-02 17:55 UTC (permalink / raw)
  To: Gwern Branwen, pandoc-discuss


--wrap=preserve will preserve the line wrapping from the source.
That might be what you want.
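
For illustration, a small Python sketch of invoking a Markdown-to-Markdown
round trip with that option. The helper names roundtrip_command and
roundtrip are hypothetical; the pandoc flags used (--from, --to, --wrap)
are standard ones. Only line wrapping is preserved this way; other
formatting may still be normalized by the Markdown writer.

```python
import shutil
import subprocess
from typing import List


def roundtrip_command(path: str) -> List[str]:
    """Build the argv for a Markdown round trip that keeps source line breaks."""
    return ["pandoc", "--from=markdown", "--to=markdown", "--wrap=preserve", path]


def roundtrip(path: str) -> str:
    """Run pandoc on `path` and return the rewritten Markdown (requires pandoc on PATH)."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    result = subprocess.run(
        roundtrip_command(path), check=True, capture_output=True, text=True
    )
    return result.stdout
```

Comparing the output of roundtrip("page.md") against the original file
(page.md being a placeholder name) shows which formatting changes remain
once wrapping is held fixed.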



