public inbox archive for pandoc-discuss@googlegroups.com
 help / color / mirror / Atom feed
* Skinny-list detector script: 'list-columns.hs'
@ 2021-05-22 18:18 Gwern Branwen
  0 siblings, 0 replies; only message in thread
From: Gwern Branwen @ 2021-05-22 18:18 UTC (permalink / raw)
  To: pandoc-discuss

On Gwern.net, I use multiple column layouts for some lists like
https://www.gwern.net/DNM-archives#overall-coverage which are very
'skinny': thin + tall. They would take up an inordinate amount of
vertical space if simply left as normal 1-column lists, but there is
also no semantically-sensible way to collapse them.

To detect opportunities for using multiple-column layouts, I have
written a Pandoc API script which looks for lists of a certain minimum
length whose sub-list entries are no more than certain lengths when
rendered as plain text (to avoid penalizing elements like links, which
may be long to write but look much smaller):

    #!/usr/bin/env runhaskell
    {-# LANGUAGE OverloadedStrings #-}
    -- dependencies: libghc-pandoc-dev

    -- usage: 'lost-columns.hs [file]'; reads a Pandoc Markdown file
and looks for 'skinny tall' lists which are better rendered
    -- as multiple columns (supported on gwern.net by special CSS
triggered by '<div class="columns"></div>' wrappers)
    -- A skinny tall list is defined as a list which is at least 8
items long (so you get at least 2×4 columns—a 2×2 square or 2×3
rectangle looks dumb),
    -- and where the individual lines are all <75 characters wide
(>half the width of a gwern.net line at the utmost).

    module Main where

    import Text.Pandoc (def, nullMeta, queryWith, readerExtensions,
readMarkdown, runPure,
                        pandocExtensions, writePlain,
Block(BulletList, OrderedList), Pandoc(Pandoc))
    import qualified Data.Text as T (length, unlines, Text)
    import qualified Data.Text.IO as TIO (readFile, putStrLn)
    import System.Environment (getArgs)
    import Control.Monad (when, unless)

    -- | Map over the filenames
    main :: IO ()
    main = do
      fs <- getArgs
      let printfilenamep = head fs == "--print-filenames"
      let fs' = if printfilenamep then Prelude.drop 1 fs else fs
      mapM_ (printLists printfilenamep) fs'

    printLists :: Bool -> FilePath -> IO ()
    printLists printfilenamep file = do
      input <- TIO.readFile file
      let long = getLongLists input
      unless (null long) $ do
          when printfilenamep $ putStrLn $ file ++ ":"
          TIO.putStrLn $ T.unlines $ map simplified long

    listLengthMax, sublistsLengthMin :: Int
    listLengthMax = 75
    sublistsLengthMin = 8

    getLongLists :: T.Text -> [Block]
    getLongLists txt = let parsedEither = runPure $ readMarkdown
def{readerExtensions = pandocExtensions } txt
                            -- if we don't explicitly enable
footnotes, Pandoc interprets the footnotes as broken links, which
throws many spurious warnings to stdout
                       in case parsedEither of
                                  Left _ -> []
                                  Right pnd -> let lists = extractLists pnd in
                                                 filter (\x ->
listLength x < listLengthMax) lists

    extractLists :: Pandoc -> [Block]
    extractLists = queryWith extractList
     where
       extractList :: Block -> [Block]
       extractList l@(OrderedList _ _) = [l]
       extractList l@(BulletList _) = [l]
       extractList _ = []

    -- > listLength $ BulletList [[Para [Str "test"]],[Para [Str
"test2"],Para [Str "Continuation"]],[Para [Link ("",[],[]) [Str "WP"]
("https://en.wikipedia.org/wiki/Foo","")]],[Para [Str
"Final",Space,Str "line"]]]
    -- → 7
    listLength :: Block -> Int
    listLength (OrderedList _ list) = listLengthAvg list
    listLength (BulletList    list) = listLengthAvg list
    listLength _                    = maxBound
    listLengthAvg :: [[Block]] -> Int
    listLengthAvg list = if length list < sublistsLengthMin then maxBound else
                           let lengths = map listItemLength list in
maximum lengths

    -- > listItemLength $ [Para [Str "Foo", Link nullAttr [Str "bar"]
("https://en.wikipedia.org/wiki/Bar", "Wikipedia link")], Para [Str
"Continued Line"]]
    -- → 15
    -- > listItemLength $ [Para [Str "Foo", Link nullAttr [Str "bar"]
("https://en.wikipedia.org/wiki/Bar", "Wikipedia link")]]
    -- → 7
    -- > listItemLength $ [Para [Str "Continued Line"]]
    -- → 15
    listItemLength :: [Block] -> Int
    listItemLength is = let lengths = map listSubItemLength is in
maximum lengths

    -- > listSubItemLength $ Para [Str "Foo"]
    -- → 4
    -- > listSubItemLength $ Para [Str "Foo", Link nullAttr [Str
"bar"] ("https://en.wikipedia.org/wiki/Bar", "Wikipedia link")]
    -- → 7
    listSubItemLength :: Block -> Int
    listSubItemLength i = T.length $ simplified i

    simplified :: Block -> T.Text
    simplified i = let md = runPure $ writePlain def (Pandoc nullMeta [i]) in
                             case md of
                               Left _ -> error $ "Failed to render: " ++ show md
                               Right md' -> md'

Could perhaps be improved by doing some sort of max/length monoid over
the AST tree, but Pandoc makes it hard to update trees in place or
ascend systematically or get lengths, so I didn't bother. Any list
which needs such tricks is probably not a good candidate for
multiple-columns anyway.

-- 
gwern
https://www.gwern.net

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAMwO0gyaLj3Bdrs-jo%2BLPaquhD2LKogE63T-N0O1nSPfVyLDWQ%40mail.gmail.com.


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2021-05-22 18:18 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-22 18:18 Skinny-list detector script: 'list-columns.hs' Gwern Branwen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).