From: Gwern Branwen
Newsgroups: gmane.text.pandoc
Subject: Skinny-list detector script: 'list-columns.hs'
Date: Sat, 22 May 2021 14:18:01 -0400

On Gwern.net, I use multiple-column layouts for some lists (like https://www.gwern.net/DNM-archives#overall-coverage) which are very 'skinny': thin and tall. Left as normal single-column lists, they would take up an inordinate amount of vertical space, but there is also no semantically sensible way to collapse them.
To detect opportunities for multiple-column layouts, I have written a Pandoc API script which looks for lists with at least a minimum number of items whose entries are all shorter than a maximum length when rendered as plain text. (Measuring the rendered text rather than the Markdown source avoids penalizing elements like links, which may be long to write but look much smaller.)

#!/usr/bin/env runhaskell
{-# LANGUAGE OverloadedStrings #-}

-- dependencies: libghc-pandoc-dev
-- usage: 'list-columns.hs [--print-filenames] [file...]'; reads Pandoc Markdown files and looks for 'skinny tall' lists
-- which are better rendered as multiple columns (supported on gwern.net by special CSS triggered by '' wrappers)
-- A skinny tall list is defined as a list which is at least 8 items long (so you get at least 2×4 columns; a 2×2 square
-- or 2×3 rectangle looks dumb), and where the individual lines are all <75 characters wide (>half the width of a
-- gwern.net line at the utmost).

module Main where

import Text.Pandoc (def, nullMeta, queryWith, readerExtensions, readMarkdown, runPure, pandocExtensions, writePlain, Block(BulletList, OrderedList), Pandoc(Pandoc))
import qualified Data.Text as T (length, unlines, Text)
import qualified Data.Text.IO as TIO (readFile, putStrLn)
import System.Environment (getArgs)
import Control.Monad (when, unless)

-- | Map over the filenames; an optional leading '--print-filenames' prints each file's name before its lists.
main :: IO ()
main = do
  fs <- getArgs
  -- 'take 1' rather than 'head', so that running with no arguments does not crash
  let printfilenamep = take 1 fs == ["--print-filenames"]
  let fs' = if printfilenamep then drop 1 fs else fs
  mapM_ (printLists printfilenamep) fs'

printLists :: Bool -> FilePath -> IO ()
printLists printfilenamep file = do
  input <- TIO.readFile file
  let long = getLongLists input
  unless (null long) $ do
    when printfilenamep $ putStrLn $ file ++ ":"
    TIO.putStrLn $ T.unlines $ map simplified long

-- items must render to fewer than listLengthMax characters; a list needs at least sublistsLengthMin items
listLengthMax, sublistsLengthMin :: Int
listLengthMax = 75
sublistsLengthMin = 8

getLongLists :: T.Text -> [Block]
getLongLists txt =
  -- if we don't explicitly enable footnotes (here via pandocExtensions), Pandoc interprets the footnotes
  -- as broken links, which throws many spurious warnings to stdout
  let parsedEither = runPure $ readMarkdown def{readerExtensions = pandocExtensions} txt
  in case parsedEither of
       Left _ -> []
       Right pnd -> let lists = extractLists pnd
                    in filter (\x -> listLength x < listLengthMax) lists

extractLists :: Pandoc -> [Block]
extractLists = queryWith extractList
  where
    extractList :: Block -> [Block]
    extractList l@(OrderedList _ _) = [l]
    extractList l@(BulletList _) = [l]
    extractList _ = []

-- > listLength $ BulletList [[Para [Str "test"]],[Para [Str "test2"],Para [Str "Continuation"]],[Para [Link ("",[],[]) [Str "WP"] ("https://en.wikipedia.org/wiki/Foo","")]],[Para [Str "Final",Space,Str "line"]]]
-- → maxBound (only 4 items, fewer than sublistsLengthMin)
listLength :: Block -> Int
listLength (OrderedList _ list) = listLengthAvg list
listLength (BulletList list) = listLengthAvg list
listLength _ = maxBound

-- despite the name, this returns the *maximum* item length (or maxBound if the list is too short to qualify)
listLengthAvg :: [[Block]] -> Int
listLengthAvg list = if length list < sublistsLengthMin then maxBound
                     else let lengths = map listItemLength list in maximum lengths

-- > listItemLength $ [Para [Str "Foo", Link nullAttr [Str "bar"] ("https://en.wikipedia.org/wiki/Bar", "Wikipedia link")], Para [Str "Continued Line"]]
-- → 15
-- > listItemLength $ [Para [Str "Foo", Link nullAttr [Str "bar"] ("https://en.wikipedia.org/wiki/Bar", "Wikipedia link")]]
-- → 7
-- > listItemLength $ [Para [Str "Continued Line"]]
-- → 15
listItemLength :: [Block] -> Int
-- '0 :' so that an empty list item does not make 'maximum' crash
listItemLength is = let lengths = map listSubItemLength is in maximum (0 : lengths)

-- lengths include the trailing newline that writePlain emits, hence "Foo" → 4
-- > listSubItemLength $ Para [Str "Foo"]
-- → 4
-- > listSubItemLength $ Para [Str "Foo", Link nullAttr [Str "bar"] ("https://en.wikipedia.org/wiki/Bar", "Wikipedia link")]
-- → 7
listSubItemLength :: Block -> Int
listSubItemLength i = T.length $ simplified i

simplified :: Block -> T.Text
simplified i = let md = runPure $ writePlain def (Pandoc nullMeta [i]) in
                 case md of
                   Left _ -> error $ "Failed to render: " ++ show md
                   Right md' -> md'

This could perhaps be improved by computing the lengths with some sort of max/length monoid over the AST, but Pandoc makes it hard to update trees in place, ascend systematically, or get lengths, so I didn't bother. Any list which needs such tricks is probably not a good candidate for multiple columns anyway.

-- 
gwern
https://www.gwern.net
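For anyone wanting to try the "max/length monoid over the AST" idea mentioned above, here is a minimal sketch. It is not what the script does: it skips writePlain entirely and assumes that counting the characters of `Str` and `Code` nodes, plus one per space, is a close enough proxy for rendered width. It assumes only the pandoc-types package (`Text.Pandoc.Definition`, and `query` from `Text.Pandoc.Walk`). The names `inlineWidth`, `blockWidth`, `itemWidth`, and `listWidth` are hypothetical, chosen for illustration. Because `query` descends into a `Link`'s text but never sees its URL, links are still measured by what the reader sees.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch (assumption: character counts of text nodes approximate rendered width).
-- Folds a Sum monoid over the Pandoc AST with 'query' instead of calling writePlain.
import qualified Data.Text as T
import Data.Monoid (Sum (..))
import Text.Pandoc.Definition
import Text.Pandoc.Walk (query)

-- Visible characters contributed by one inline node. 'query' also visits the
-- children of Link, Emph, etc., so link text is counted but URLs are not
-- (a URL lives in the Link's Target, which is not an Inline).
inlineWidth :: Inline -> Sum Int
inlineWidth (Str t)    = Sum (T.length t)
inlineWidth (Code _ t) = Sum (T.length t)
inlineWidth Space      = Sum 1
inlineWidth SoftBreak  = Sum 1
inlineWidth _          = Sum 0 -- containers count only via their children

-- Approximate width of one block: the sum over every inline inside it.
-- Caveat: a nested list inside the block is summed, not measured per line.
blockWidth :: Block -> Int
blockWidth = getSum . query inlineWidth

-- Width of a list item is its widest block; 0 for an empty item.
itemWidth :: [Block] -> Int
itemWidth = maximum . (0 :) . map blockWidth

-- Hypothetical helper: widest item of a list with at least minItems items,
-- Nothing for shorter lists or for blocks that are not lists.
listWidth :: Int -> Block -> Maybe Int
listWidth minItems blk = case blk of
    BulletList items    -> check items
    OrderedList _ items -> check items
    _                   -> Nothing
  where
    check items
      | length items >= minItems = Just (maximum (0 : map itemWidth items))
      | otherwise                = Nothing
```

Loaded into GHCi, `itemWidth [Para [Str "Continued", Space, Str "Line"]]` gives 14, one less than the writePlain-based count because no trailing newline is included. The trade-off is accuracy: footnote contents are counted inline, math and raw HTML count as zero, and nested lists are summed, so this suits a quick pre-filter better than a final check.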