* Auto-smallcaps filter
@ 2020-02-19 20:14 Gwern Branwen
From: Gwern Branwen @ 2020-02-19 20:14 UTC (permalink / raw)
To: pandoc-discuss
I wrote a plugin for my gwern.net Hakyll script
(https://www.gwern.net/hakyll.hs) which was slightly tricky, and so
might be of interest.
Bringhurst & other typographers recommend using small-caps for
acronyms/initials of 3 or more capital letters because with full
capitals, they look too big and dominate the page (eg Bringhurst 2004,
_Elements_ pg47; cf https://en.wikipedia.org/wiki/Small_caps#Uses
http://theworldsgreatestbook.com/book-design-part-5/
http://webtypography.net/3.2.2 )
This can be done by hand in Pandoc by using the span syntax like
`[ABC]{.smallcaps}`, but quickly grows tedious. It can also be done
reasonably easily with a query-replace regexp eg in Emacs
`(query-replace-regexp "\\([A-Z][A-Z][A-Z]+\\)" "[\\1]{.smallcaps}"
nil begin end)`, but still must be done manually because while almost
all uses in regular text can be smallcaps-fied, a blind regexp will
wreck a ton of things like URLs & tooltips, code blocks, etc.
However, if we walk a Pandoc AST and check for acronyms/initials only
inside a `Str`, where they *can't* be part of a `Link` target or a
`CodeBlock`, then looking over gwern.net ASTs, they seem to always be
safe to substitute with `SmallCaps` elements. Unfortunately, we can't
use the regular `Inline -> Inline` replacement pattern, because
`SmallCaps` takes a `[Inline]` argument and a single `Str` has to be
split into several elements (the text before, the `SmallCaps`, and the
text after), so the rewrite has to be `[Inline] -> [Inline]`.
So we instead walk the Pandoc AST, use a regexp to split on 3+ capital
letters, `SmallCaps` the matched text, and append recursively, and
return the concatenated results.
`bottomUp` is slower than `walk` but appears to be necessary here for
greedy generation; `walk` will do only *some* substitutions, which has
something to do with its tree traversal method, I think? (Regardless,
`smallcapsfy` doesn't seem to add *too* much overhead.)
The final code:
import Text.Pandoc
import Text.Regex.Posix ((=~))
smallcapsfy :: [Inline] -> [Inline]
smallcapsfy [Str []] = []
-- Why `::String` on the regexp pattern? It must be specified, otherwise
-- OverloadedStrings in hakyll.hs makes it ambiguous & a type error.
smallcapsfy xs@(Str a : x) =
  let (before,matched,after) = a =~ ("[A-Z][A-Z][A-Z]+"::String) :: (String,String,String)
  in if matched==""
     then xs -- no acronym in `a`; the tail `x` is not examined here
     else [Str before, SmallCaps [Str matched]] ++ smallcapsfy [Str after] ++ smallcapsfy x
smallcapsfy xs = xs
Regexp examples:
"BigGAN" =~ "[A-Z][A-Z][A-Z]+" :: (String,String,String)
~> ("Big","GAN","")
"BigGANNN BigGAN" =~ "[A-Z][A-Z][A-Z]+" :: (String,String,String)
~> ("Big","GANNN"," BigGAN")
"NSFW BigGAN" =~ "[A-Z][A-Z][A-Z]+" :: (String,String,String)
~> ("","NSFW"," BigGAN")
"BigGANNN BigGAN" =~ "[A-Z][A-Z][A-Z]" :: (String,String,String)
~> ("Big","GAN","NN BigGAN")
"biggan means big" =~ "[A-Z][A-Z][A-Z]" :: (String,String,String)
~> ("biggan means big","","")
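For readers who want to see what the `(before, matched, after)` triple amounts to without the regex-posix dependency, here is a minimal sketch using only `base`. `splitAcronym` is a hypothetical helper name, not part of the code in this message; it assumes ASCII capitals only, like the `[A-Z]` class, and is meant to return the leftmost run of 3 or more capitals.

```haskell
import Data.Char (isAsciiUpper)

-- Hypothetical helper (not from the original post): mimic
--   s =~ "[A-Z][A-Z][A-Z]+" :: (String,String,String)
-- by returning (before, matched, after) for the leftmost run of
-- three or more ASCII capitals, or (s, "", "") when there is none.
splitAcronym :: String -> (String, String, String)
splitAcronym s = go "" s
  where
    -- `acc` holds the already-scanned prefix in reverse order.
    go acc [] = (reverse acc, "", "")
    go acc rest@(c:cs)
      | length run >= 3 = (reverse acc, run, after)
      | null run        = go (c : acc) cs
      | otherwise       = go (reverse run ++ acc) after
      where (run, after) = span isAsciiUpper rest
```

A run of only one or two capitals (the "B" of "BigGAN") is pushed back into the prefix and scanning continues, which is what makes the result agree with the `+` pattern in the examples above.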
Function examples:
smallcapsfy [Str "BigGAN"]
~> [Str "Big",SmallCaps [Str "GAN"]]
smallcapsfy [Str "BigGANNN means big"]
~> [Str "Big",SmallCaps [Str "GANNN"],Str " means big"]
smallcapsfy [Str "biggan means big"]
~> [Str "biggan means big"]
Whole-document examples:
bottomUp smallcapsfy [Str "bigGAN means", Emph [Str "BIG"]]
~> [Str "big",SmallCaps [Str "GAN"],Str " means",Emph [Str "",SmallCaps [Str "BIG"]]]
--
gwern
* Re: Auto-smallcaps filter
@ 2020-02-20 22:58 ` John MacFarlane
From: John MacFarlane @ 2020-02-20 22:58 UTC (permalink / raw)
To: Gwern Branwen, pandoc-discuss
You could use this idiom instead of bottomUp:
walk (concatMap go)
Where 'go' is Inline -> [Inline], 'walk (concatMap go)' is
[Inline] -> [Inline]. This should perform better than
bottomUp.
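A minimal sketch of why this idiom helps, using only `base`. Everything named here is an assumption for illustration: the `Inline` type is a cut-down stand-in for Pandoc's, `walkLists` is a hypothetical stand-in for `walk` specialised to inline lists (it rewrites nested lists first and then applies the function once to each whole list, never to a list tail on its own), and `go` marks whole all-caps words instead of using the regexp. `headOnly` imitates the first version of `smallcapsfy`, which returns the list unchanged as soon as its head does not match.

```haskell
import Data.Char (isAsciiUpper)

-- Cut-down stand-in for Pandoc's Inline type (assumption, not the real one).
data Inline = Str String | Space | Emph [Inline] | SmallCaps [Inline]
  deriving (Eq, Show)

-- Hypothetical stand-in for `walk` on inline lists: rewrite nested
-- lists first, then apply `f` once to each whole list.
walkLists :: ([Inline] -> [Inline]) -> [Inline] -> [Inline]
walkLists f = f . map inner
  where
    inner (Emph xs)      = Emph (walkLists f xs)
    inner (SmallCaps xs) = SmallCaps (walkLists f xs)
    inner x              = x

-- Per-element rewrite: a word of 3+ ASCII capitals becomes SmallCaps.
go :: Inline -> [Inline]
go (Str s) | length s >= 3, all isAsciiUpper s = [SmallCaps [Str s]]
go x = [x]

-- List rewrite that gives up at the first element it does not change,
-- in the style of the first version of `smallcapsfy`.
headOnly :: [Inline] -> [Inline]
headOnly (x : rest) | go x /= [x] = go x ++ headOnly rest
headOnly xs = xs

doc :: [Inline]
doc = [Str "the", Space, Str "GAN", Space, Emph [Str "NSFW"]]
```

With these definitions, `walkLists (concatMap go) doc` rewrites both "GAN" and "NSFW", while `walkLists headOnly doc` rewrites only the "NSFW" inside the `Emph` and leaves "GAN" alone, because the outer list starts with a non-matching element. A generic traversal such as `bottomUp` also visits every tail of a list as a subterm, which would plausibly explain why it performed all the substitutions where `walk` did only some.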
* Re: Auto-smallcaps filter
@ 2020-02-21 4:12 ` Gwern Branwen
From: Gwern Branwen @ 2020-02-21 4:12 UTC (permalink / raw)
To: pandoc-discuss
That seems to work, thanks.
--
gwern
https://www.gwern.net
* Re: Auto-smallcaps filter
@ 2020-04-07 15:12 ` Gwern Branwen
From: Gwern Branwen @ 2020-04-07 15:12 UTC (permalink / raw)
To: pandoc-discuss
To update this: for HTML output, this code is broken, because it
doesn't transform the smallcapsfied phrases into lowercase, and
smallcaps on a capital letter is a null-op.
We *could* just rewrite the capitals to lowercase with `map toLower`
etc, but then that breaks copy-paste: the underlying text for a
'Big[GAN]{.smallcaps}' is now 'Big[gan]{.smallcaps}', which pastes as
'Biggan'. So instead of using native SmallCaps AST elements, we create
a new CSS class *just* for auto-detected all-caps runs,
'smallcaps-auto', separate from the pre-existing standard Pandoc
'smallcaps' class; we wrap the capitals in a Span with that new class
rather than in SmallCaps, and then in CSS, we do `span.smallcaps-auto {
font-feature-settings: 'smcp'; text-transform: lowercase; }`:
smallcaps is enabled for this class, but we also lowercase everything,
thereby forcing the intended smallcaps appearance while ensuring that
copy-paste produces 'BigGAN' (as written) instead of 'Biggan'.
Aside from the new CSS declaration specified above, `smallcapsfy` needs
to emit a Span rather than SmallCaps, as follows:
smallcapsfy :: [Inline] -> [Inline]
smallcapsfy = concatMap go
  where
    go :: Inline -> [Inline]
    go (Str []) = []
    go x@(Str a) =
      let (before,matched,after) = a =~ ("[A-Z][A-Z][A-Z]+"::String) :: (String,String,String)
      in if matched==""
         then [x] -- no acronym anywhere in x
         else [Str before, Span ("", ["smallcaps-auto"], []) [Str matched]] ++ go (Str after)
    go x = [x]
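To make the point about the underlying text concrete, here is a small sketch with `base` only. The `Inline` type is a cut-down stand-in for Pandoc's (the attribute triple is reduced to a class list), `render` and `plain` are hypothetical helpers rather than Pandoc's HTML writer, and `smallcapsAuto` uses `span`/`break` instead of the regexp; all of that is assumed for illustration. It shows that the Span approach leaves the capitals in the text and only adds a class for the CSS to act on.

```haskell
import Data.Char (isAsciiUpper)

-- Cut-down stand-in for Pandoc's Inline (assumption): Span carries only classes.
data Inline = Str String | Span [String] [Inline]
  deriving (Eq, Show)

-- Wrap each run of 3+ ASCII capitals in a "smallcaps-auto" Span.
smallcapsAuto :: String -> [Inline]
smallcapsAuto "" = []
smallcapsAuto s@(c:_)
  | isAsciiUpper c =
      let (run, rest) = span isAsciiUpper s
          piece | length run >= 3 = Span ["smallcaps-auto"] [Str run]
                | otherwise       = Str run
      in piece : smallcapsAuto rest
  | otherwise =
      let (other, rest) = break isAsciiUpper s
      in Str other : smallcapsAuto rest

-- Hypothetical minimal HTML rendering (no escaping; illustration only).
render :: [Inline] -> String
render = concatMap r
  where
    r (Str s)      = s
    r (Span cs xs) = "<span class=\"" ++ unwords cs ++ "\">" ++ render xs ++ "</span>"

-- The underlying text with markup dropped; characters are untouched.
plain :: [Inline] -> String
plain = concatMap p
  where
    p (Str s)     = s
    p (Span _ xs) = plain xs
```

Here `render (smallcapsAuto "BigGAN")` gives `Big<span class="smallcaps-auto">GAN</span>` and `plain (smallcapsAuto "BigGAN")` gives `BigGAN` back, so the lowercasing is left entirely to the stylesheet.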
--
gwern
https://www.gwern.net