2023-06-30 (revised 2023-07-05)
Pandoc filter which injects LaTeX code before and/or after Div and Span elements based on (DOCX) custom-style attributes and configuration in external JSON files.
This is a Pandoc filter which “converts” DOCX custom-styles to LaTeX commands and environments, as specified in the filter or in external JSON files.
A special feature of this filter is that it avoids extra whitespace between the wrapping LaTeX markup and the contents of Divs, since such whitespace may prevent some LaTeX environments from working properly. It avoids such whitespace by converting Divs to LaTeX during filter execution and replacing the Div with a single concatenated latex RawBlock which contains both the converted Div and the wrapping markup as a single string.
custom-style attribute
If you use +styles as a reader extension when converting from DOCX (-f docx+styles) Pandoc will pick up DOCX custom named styles: the docx reader converts paragraphs and character runs with custom named styles into Divs and Spans with a custom-style attribute whose value is the style name. Conversely the docx writer will convert Divs and Spans with a custom-style attribute into paragraphs and character runs with a named style, using the value of the attribute as style name. (These styles have no effect unless you use a reference DOCX file containing styles with these names which differ in their settings from the default paragraph or character style, or open the DOCX file produced by Pandoc with Word (or LibreOffice Writer) and change the settings of those styles.)
The usefulness of this when converting to DOCX is obvious: you can apply DOCX named styles to parts of your documents by including Divs and Spans with appropriate custom-style attributes.
The main usefulness when converting from DOCX is that you can
preserve the style info when converting to Markdown, edit the Markdown
and then convert the Markdown to DOCX using the original DOCX file as
reference-doc, thus “preserving” the original styles.
As for converting from DOCX to other output formats, for HTML you can obviously target the custom-style attribute directly with CSS attribute selectors. For other formats you can use a filter which inspects the custom-style attribute just as it would any other attribute and transforms the Div or Span depending on its value.
This is such a filter: it injects raw LaTeX markup, which you have specified either in a table in the filter itself or in an external JSON file, before and/or after Div and Span elements based on the value of their custom-style attribute.
The filter takes configuration from one or more external JSON files which specify which custom-style attribute values should trigger LaTeX markup injection and which markup to inject. You specify the JSON file(s) to use by setting the value of the metadata field custom-style2latex to a single file path/name (string) or a list with the path/name of each file (a list of strings). You can do this on the command line with the -M custom-style2latex=FILENAME option (which may be repeated for multiple files), or in the document metadata, in a metadata file or in the metadata section of a defaults file with
custom-style2latex: FILENAME
or
custom-style2latex:
  - FILE-1
  - FILE-2
  # ...
Some characters when found at the start of a file name will be replaced with certain common path prefixes:
~ (tilde) will be replaced with the value of the HOME environment variable.
* (asterisk) will be replaced with the path to the user’s Pandoc data directory as and if provided by Pandoc’s Lua API.
A JSON file might look like this:
{
"Div" : {
"Centered" : {
"pre" : "\\begin{center}",
"post" : "\\end{center}",
"join" : "\n"
}
},
"Span" : {
"Red" : {
"pre" : "\\textcolor[HTML]{FF0000}{",
"post" : "}"
},
"Sans" : "textsf"
}
}
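The file-name handling described before the JSON example can be modelled with a minimal Python sketch. This is not the filter's actual Lua code: the function names, the explicit home and data_dir parameters, and the choice to leave a leading * untouched when no data directory is known are all assumptions made for illustration.

```python
import os

def config_files(value):
    """Normalize the metadata value (one string or a list of strings) to a list."""
    return [value] if isinstance(value, str) else list(value)

def expand_prefix(name, home=None, data_dir=None):
    """Expand a leading '~' or '*' in a config file name (illustrative only)."""
    if home is None:
        home = os.environ.get("HOME", "")
    if name.startswith("~"):
        # '~' stands for the value of the HOME environment variable
        return home + name[1:]
    if name.startswith("*") and data_dir is not None:
        # '*' stands for the user's Pandoc data directory, if one is known
        return data_dir + name[1:]
    return name
```

For example, expand_prefix("~/styles.json", home="/home/me") gives "/home/me/styles.json", while a name without a special first character is returned unchanged.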
If
$ pandoc -f docx+styles -t markdown text.docx
would give you
::: {custom-style="Centered"}
This [is]{custom-style="Red"} [centered]{custom-style="Sans"}.
:::
then
$ pandoc -L custom-style2latex.lua -M custom-style2latex=styles.json \
-f docx+styles -t latex text.docx
gives you
\begin{center}
This \textcolor[HTML]{FF0000}{{is}} \textsf{{centered}}.
\end{center}
The example above should be rather self-explanatory, but some remarks are necessary, not least since some “shortcuts” are supported.
Each config file must contain an object with the top level keys Div and/or Span, containing the config for Div and Span elements respectively. This division is necessary since in DOCX you can have one paragraph style and one character style with the same name. The value for each of these keys is an object. In these second level objects each key is a style name (i.e. the value of a custom-style attribute) and each value is the definition for the LaTeX to inject before and/or after Divs/Spans with that style. The definition value can be a string, a boolean or an object.
A string value will be used as an environment name for a Div and as a command name for a Span. This means that
{
"Span" : {
"Sans" : "textsf"
}
}
is equivalent to
{
"Span" : {
"Sans" : {
"pre" : "\\textsf{",
"post" : "}"
}
}
}
and the Centered Div example above could actually be written as
{
"Div" : {
"Centered" : "center"
}
}
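The string shortcut can be pictured as a small expansion step. The following Python sketch only models the behaviour described above; the function name expand_string is hypothetical and the filter's own Lua code may be organized differently.

```python
def expand_string(kind, name):
    """Expand a string definition into an explicit pre/post definition.

    kind is "Div" or "Span"; name is the string given as definition.
    """
    if kind == "Div":
        # A Div is wrapped in an environment with that name
        return {"pre": "\\begin{" + name + "}", "post": "\\end{" + name + "}"}
    # A Span is wrapped in a one-argument command with that name
    return {"pre": "\\" + name + "{", "post": "}"}
```

With this model expand_string("Span", "textsf") yields the same pre and post strings as the longer Sans definition, and expand_string("Div", "center") yields those of the Centered definition.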
The meaning of a boolean as definition depends on whether it is true or false:
A false value causes the default definition for the style, if any, or any definition from a file listed earlier in the custom-style2latex metadata field, to be removed.
A true value means that the command or environment name shall be identical to the style name:
{
"Div" : {
"Centered" : true
},
"Span" : {
"Red" : true
}
}
is equivalent to
{
"Div" : {
"Centered" : "Centered"
},
"Span" : {
"Red" : "Red"
}
}
or
{
"Div" : {
"Centered" : {
"pre" : "\\begin{Centered}",
"post" : "\\end{Centered}"
}
},
"Span" : {
"Red" : {
"pre" : "\\Red{",
"post" : "}"
}
}
}
Note that this shortcut will only work if the style name consists entirely of upper and/or lowercase letters, since these are the only characters which are legal in TeX command names. You will probably need to arrange for a command or environment with the right name to be defined in LaTeX.
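As a rough illustration of the true shortcut and the letters-only restriction, here is a Python sketch. Both function names are hypothetical, and whether the filter itself checks style names in this way is not stated here; the check is shown only to make the restriction concrete.

```python
import re

def usable_as_command_name(style_name):
    """True if the style name consists only of ASCII letters."""
    return re.fullmatch(r"[A-Za-z]+", style_name) is not None

def expand_true(kind, style_name):
    """Model a true definition: the style name doubles as the LaTeX name."""
    if kind == "Div":
        return {"pre": "\\begin{%s}" % style_name,
                "post": "\\end{%s}" % style_name}
    return {"pre": "\\%s{" % style_name, "post": "}"}
```

A style called Red passes the check, while names such as Block Quote or Heading1 do not and would need an explicit object definition instead.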
An object as definition is necessary when the assumptions the filter makes with string or boolean values are incorrect, such as when a Div needs to be wrapped in a command rather than in an environment or the signature of the command or environment is more complicated than the default, as in the example with \textcolor at the start of this section.
In an object definition the filter looks for the following fields:
pre
A string with LaTeX to insert before the body of the Div or Span.
post
A string with LaTeX to insert after the body of the Div or Span.
join
A string, by default "\n".
To avoid excess whitespace between the body of a Div and the pre and/or post markup this filter converts the body of the Div to LaTeX markup, removes any leading or trailing whitespace around that markup and then concatenates the pre markup if any, the body markup and the post markup if any with this string as separator, then replaces the Div with a single raw latex block containing the concatenated LaTeX markup. With the default value (a single newline) the pre and/or post markup will be on separate lines, but without any blank line between them and the body markup. If you want nothing at all between the pre/post markup and the body markup you must explicitly set the separator to an empty string with "join": "".
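The concatenation step for Divs can be modelled as follows. This is a Python sketch of the behaviour described above, not the filter's Lua implementation; wrap_div is a hypothetical name, and body_latex stands for the Div body after it has already been converted to LaTeX.

```python
def wrap_div(body_latex, pre=None, post=None, join="\n"):
    """Join pre, the trimmed body and post with the join string."""
    parts = []
    if pre:
        parts.append(pre)
    # Leading and trailing whitespace around the converted body is removed
    parts.append(body_latex.strip())
    if post:
        parts.append(post)
    return join.join(parts)
```

With the default separator, wrap_div("\nThis is centered.\n\n", "\\begin{center}", "\\end{center}") puts the three parts on three consecutive lines with no blank lines between them; with join="" they are run together on one line.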
Wrapped Spans do not have the same whitespace issues and the value of join is currently unused with Spans.
If you list multiple config files in the custom-style2latex metadata variable any definition for a given style name (for each of the Div and Span namespaces separately) found in a file listed later will override any definition for the same style name found in a file listed earlier.
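The override rule, together with the effect of a false definition, can be sketched in Python. The name merge_configs and the representation of configs as plain dictionaries are assumptions for illustration; the filter's own merging code may differ in detail.

```python
def merge_configs(defaults, *configs):
    """Merge configs in order; later definitions win, False removes."""
    merged = {kind: dict(defaults.get(kind, {})) for kind in ("Div", "Span")}
    for config in configs:
        # Div and Span are separate namespaces and are merged separately
        for kind in ("Div", "Span"):
            for style, definition in config.get(kind, {}).items():
                if definition is False:
                    # false drops any default or earlier definition
                    merged[kind].pop(style, None)
                else:
                    merged[kind][style] = definition
    return merged
```

For instance, if a first file defines the Span style Red as "textbf" and a second file defines it as "emph", the merged result keeps "emph"; a third file with "Red": false would leave Red undefined.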
At the top of the filter code is a variable style2latex whose value is a table of the structure
local style2latex = {
Div = {
-- Centered = {
-- pre = '\\begin{center}',
-- post = '\\end{center}',
-- join = "\n"
-- }
},
Span = {
-- Red = {
-- pre = '\\textcolor[HTML]{FF0000}{',
-- post = '}'
-- }
}
}
which should look familiar after you have read the description of the configuration file structure above. Here you can insert definitions for custom styles which you always or at least often use, so that they will be available as if “builtin”, without the need to always load a configuration file defining them. Note, however, that the “shortcuts” of using a string or boolean as definition do not work here: you always have to use a table with pre and/or post fields! You can still override definitions given here in a configuration file, including removing them entirely by saying "Style Name": false in a configuration file.
Warning: You shouldn’t comment out or remove the entire style2latex variable since the proper functioning of the filter relies on it existing and being a table!
Benct Philip Jonsson bpjonsson+custom-style2latex@gmail.com
This software is Copyright (c) 2023 by Benct Philip Jonsson.
This is free software, licensed under:
The MIT (X11) License
http://www.opensource.org/licenses/mit-license.php