custom-style2latex.lua

Benct Philip Jonsson

2023-06-30 (revised 2023-07-05)

Pandoc filter which injects LaTeX code before and/or after Div and Span elements based on (DOCX) custom-style attributes and configuration in external JSON files.

Description

This is a Pandoc filter which “converts” DOCX custom-styles to LaTeX commands and environments, as specified in the filter or in external JSON files.

A special feature of this filter is that it avoids extra whitespace between the wrapping LaTeX markup and the contents of Divs since such whitespace may prevent some LaTeX environments from working properly. It avoids such whitespace by converting Divs to LaTeX during filter execution and replacing the Div with a single concatenated latex RawBlock which contains both the converted Div and the wrapping markup as a single string.

The custom-style attribute

If you use +styles as a reader extension when converting from DOCX (-f docx+styles), Pandoc will pick up DOCX custom named styles: the docx reader converts paragraphs and character runs with custom named styles into Divs and Spans with a custom-style attribute whose value is the style name. Conversely, the docx writer converts Divs and Spans with a custom-style attribute into paragraphs and character runs with a named style, using the value of the attribute as the style name. (These styles have no effect unless you use a reference DOCX file containing styles with these names whose settings differ from the default paragraph or character style, or open the DOCX file produced by Pandoc in Word (or LibreOffice Writer) and change the settings of those styles.)

The usefulness of this when converting to DOCX is obvious: you can apply DOCX named styles to parts of your documents by including Divs and Spans with appropriate custom-style attributes. The main usefulness when converting from DOCX is that you can preserve the style info when converting to Markdown, edit the Markdown and then convert the Markdown to DOCX using the original DOCX file as reference-doc, thus “preserving” the original styles.

As for converting from DOCX to other output formats, for HTML you can obviously target the custom-style attribute directly with CSS attribute selectors. For other formats you can use a filter which inspects the custom-style attribute just as it would any other attribute and transform the Div or Span depending on its value.

This is such a filter which injects raw LaTeX markup, which you have specified either in a table in the filter itself or in an external JSON file, before and/or after Div and Span elements based on the values of the custom-style attribute of those elements.

Usage

The filter takes configuration from one or more external JSON files which specify which custom-style attribute values should trigger LaTeX markup injection and which markup to inject. You specify the JSON file(s) to use by setting the metadata field custom-style2latex to a single file path/name (a string) or to a list with the path/name of each file (a list of strings). You can do this on the command line with the -M custom-style2latex=FILENAME option (which may be repeated for multiple files), or in the document metadata, in a metadata file or in the metadata section of a defaults file with

custom-style2latex: FILENAME

or

custom-style2latex:
  - FILE-1
  - FILE-2
  # ...
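
Since the metadata field may hold either a single file name or a list of them, a filter has to treat both shapes alike. The following is only a minimal sketch of that idea, not code taken from the filter: the helper name as_file_list is hypothetical, and it works on plain Lua values. Inside an actual filter the metadata values would probably need to be converted to strings first, for example with pandoc.utils.stringify.

```lua
-- Hypothetical helper (not from the filter source): turn the value of
-- the custom-style2latex metadata field into a list of file names.
-- Assumes plain Lua values: nil, a string, or a list of strings.
local function as_file_list(value)
  if value == nil then
    return {}              -- field not set: no config files
  end
  if type(value) == 'string' then
    return { value }       -- single file name: wrap it in a list
  end
  local files = {}
  for _, item in ipairs(value) do
    files[#files + 1] = tostring(item)
  end
  return files
end
```

With this sketch both as_file_list("styles.json") and as_file_list({ "a.json", "b.json" }) yield a list that can be walked with ipairs.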

Some characters when found at the start of a file name will be replaced with certain common path prefixes:

A JSON file might look like this:

{
   "Div" : {
      "Centered" : {
         "pre" : "\\begin{center}",
         "post" : "\\end{center}",
         "join" : "\n"
      }
   },
   "Span" : {
      "Red" : {
         "pre" : "\\textcolor[HTML]{FF0000}{",
         "post" : "}"
      },
      "Sans" : "textsf"
   }
}

If $ pandoc -f docx+styles -t markdown text.docx would give you

::: {custom-style="Centered"}
This [is]{custom-style="Red"} [centered]{custom-style="Sans"}.
:::

then

$ pandoc -L custom-style2latex.lua -M custom-style2latex=styles.json \
-f docx+styles -t latex text.docx

gives you

\begin{center}
This \textcolor[HTML]{FF0000}{{is}} \textsf{{centered}}.
\end{center}

More on JSON configuration file structure

The example above should be rather self-explanatory, but some remarks are necessary, not least since some “shortcuts” are supported.

Each config file must contain an object with the top level keys Div and/or Span, containing the config for Div and Span elements respectively. This division is necessary since in DOCX you can have one paragraph style and one character style with the same name. The value for each of these keys is an object. In these second level objects each key is a style name (i.e. the value of a custom-style attribute) and each value is the definition for the LaTeX to inject before and/or after Divs/Spans with that style. The definition value can be any of

A string

A string value will be used as an environment name for a Div and as a command name for a Span. This means that

{
   "Span" : {
      "Sans" : "textsf"
   }
}

is equivalent to

{
   "Span" : {
      "Sans" : {
         "pre" : "\\textsf{",
         "post" : "}"
      }
   }
}

and the Centered Div example above could actually be written as

{
   "Div" : {
      "Centered" : "center"
   }
}

A boolean

The meaning of a boolean as definition depends on whether it is true or false:

An object

An object as definition is necessary when the assumptions the filter makes with string or boolean values are incorrect, such as when a Div needs to be wrapped in a command rather than in an environment or the signature of the command or environment is more complicated than the default, as in the example with \textcolor at the start of this section.
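
The relationship between the string shortcut and the full object form can be sketched as a small normalizing function. This is an illustration under assumptions, not the filter's actual code: the name normalize_definition is hypothetical, and boolean definitions are not handled in this sketch.

```lua
-- Hypothetical helper (not from the filter source).
-- kind is "Div" or "Span"; def is a string or a table with pre/post/join.
local function normalize_definition(kind, def)
  if type(def) == 'string' then
    if kind == 'Div' then
      -- A string names an environment for a Div.
      return {
        pre = '\\begin{' .. def .. '}',
        post = '\\end{' .. def .. '}',
        join = '\n',
      }
    end
    -- A string names a command for a Span.
    return { pre = '\\' .. def .. '{', post = '}' }
  elseif type(def) == 'table' then
    -- Full form: fill in defaults for any missing field.
    return {
      pre = def.pre or '',
      post = def.post or '',
      join = def.join or '\n',
    }
  end
  return nil -- booleans and other values are outside this sketch
end
```

For example, normalize_definition('Span', 'textsf') gives a table with pre set to \textsf{ and post set to }, matching the equivalence described for the string shortcut.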

In an object definition the filter looks for the following fields:

pre

A string with LaTeX to insert before the body of the Div or Span.

post

A string with LaTeX to insert after the body of the Div or Span.

join

A string, by default "\n".

To avoid excess whitespace between the body of a Div and the pre and/or post markup, this filter converts the body of the Div to LaTeX markup, removes any leading or trailing whitespace around that markup, and then concatenates the pre markup (if any), the body markup and the post markup (if any) with this string as separator. It then replaces the Div with a single raw latex block containing the concatenated LaTeX markup. With the default value (a single newline) the pre and/or post markup will be on separate lines, but without any blank line between them and the body markup. If you want nothing at all between the pre/post markup and the body markup you must explicitly set the separator to an empty string with "join": "".

Wrapped Spans do not have the same whitespace issues and the value of join is currently unused with Spans.
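
The trimming and joining described for Divs can be sketched in plain Lua. This is a sketch under assumptions rather than the filter's own implementation: the function name wrap_body is hypothetical, and body stands for the LaTeX string obtained by converting the contents of the Div. In a real filter the result would then be placed in a raw latex block, presumably with pandoc.RawBlock('latex', ...).

```lua
-- Hypothetical helper (not from the filter source).
-- body: LaTeX string for the Div contents; def: table with pre/post/join.
local function wrap_body(body, def)
  local join = def.join or '\n'          -- default separator: one newline
  -- Strip leading and trailing whitespace from the converted body.
  local trimmed = body:match('^%s*(.-)%s*$')
  local parts = {}
  if def.pre and def.pre ~= '' then
    parts[#parts + 1] = def.pre
  end
  parts[#parts + 1] = trimmed
  if def.post and def.post ~= '' then
    parts[#parts + 1] = def.post
  end
  -- Only the separator ends up between markup and body.
  return table.concat(parts, join)
end

local centered = { pre = '\\begin{center}', post = '\\end{center}' }
print(wrap_body('\nThis is centered.\n\n', centered))
-- prints:
-- \begin{center}
-- This is centered.
-- \end{center}
```

Setting join to an empty string in the definition table would put the three pieces on one line with nothing between them.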

If you list multiple config files in the custom-style2latex metadata variable any definition for a given style name (for each of the Div and Span namespaces separately) found in a file listed later will override any definition for the same style name found in a file listed earlier.
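
The override order can be pictured as a merge in which later tables win, with Div and Span kept as separate namespaces. The sketch below is an assumption about how such a merge could look, not the filter's actual code: merge_configs is a hypothetical name, configs stands for the already decoded JSON files in the order they were listed, and a value of false is treated as removing a definition.

```lua
-- Hypothetical helper (not from the filter source).
-- defaults: table with optional Div/Span subtables (built-in definitions).
-- configs: list of decoded config tables, in the order they were listed.
local function merge_configs(defaults, configs)
  local merged = { Div = {}, Span = {} }
  local function apply(source)
    for _, kind in ipairs({ 'Div', 'Span' }) do
      for name, def in pairs(source[kind] or {}) do
        if def == false then
          merged[kind][name] = nil   -- false removes an earlier definition
        else
          merged[kind][name] = def   -- later definitions override earlier ones
        end
      end
    end
  end
  apply(defaults)
  for _, config in ipairs(configs) do
    apply(config)
  end
  return merged
end
```

Because each file is applied in turn, a style defined in two files ends up with the definition from the file listed last.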

Default definitions

At the top of the filter code is a variable style2latex whose value is a table of the structure

local style2latex = {
  Div = {
    -- Centered = {
    --   pre = '\\begin{center}',
    --   post = '\\end{center}',
    --   join = "\n"
    -- }
  },
  Span = {
    -- Red = {
    --   pre = '\\textcolor[HTML]{FF0000}{',
    --   post = '}'
    -- }
  }
}

which should look familiar after you have read the description of the configuration file structure above. Here you can insert definitions for custom styles which you always, or at least often, use, so that they will be available as if “built-in”, without the need to load a configuration file defining them every time. Note, however, that the “shortcuts” of using a string or boolean as definition do not work here: you always have to use a table with pre and/or post fields! You can still override definitions given here in a configuration file, including removing them entirely by saying "Style Name": false in a configuration file.

Warning: You shouldn’t comment out or remove the entire style2latex variable since the proper functioning of the filter relies on it existing and being a table!

Author

Benct Philip Jonsson bpjonsson+custom-style2latex@gmail.com

This software is Copyright (c) 2023 by Benct Philip Jonsson.

This is free software, licensed under:

The MIT (X11) License

http://www.opensource.org/licenses/mit-license.php