Hi all,

I work in markdown as my primary input format and use pandoc to generate my target output formats. I have some inconveniences with formatting tables in markdown and would appreciate ideas on how to solve them. (For what it's worth, my output is exclusively docx, but I guess that doesn't really matter.)

# Problem definition

## Replacement of individual words in the table

Sometimes it is tempting to pre-process the markdown source with macros to substitute individual words (a customer name, postal addresses, phone numbers). This is very easy to do with Python or Perl scripting (my personal favorites for such things are m4 and sed). It works very nicely in all "normal" parts of the markdown source. However, if such a replacement is made inside a markdown table, it usually destroys the table layout: the replacement rarely has the same width as the placeholder, so the column borders no longer line up and the table is parsed incorrectly.

## "Word" is too long

Sometimes a "word" is simply too long to fit into the table column.

Disclaimer: perhaps some pandoc extension of plain markdown already has syntax that lets you "concatenate" a "word" started on one line with its continuation on another line. If such syntax already exists, please excuse my ignorance. I tried to find it in the docs and on the internet, without success.

Such "words" come up often in my German writing ("Qualitätssicherungsmaßnahmen" alone eats 28 characters, so it has no chance of fitting into the narrow columns of an average markdown table). Another example is fully qualified file names, which are common in technical documentation:

/usr/x86_64-pc-cygwin/lib/ldscripts/i386pe.xbn

I think sooner or later everybody who works with markdown documents will stumble over this problem.

## Complications by cell content growing over time

After a table is created, there is often a need to append or rephrase something. Through all those steps the table is pretty hard to maintain.
This is especially painful when a cell contains non-trivial formatting, like lists.

Of course, one can imagine plug-ins for text editors (my life changed when I learned the "Table Mode" vim plug-in). Another strategy: convert the intermediate markdown version to your favorite office format, extend the table with a WYSIWYG tool of your choice, and convert back to markdown. In all those cases, however, you lose the power of plain markdown editing. As a result, you tend to avoid tables or reduce their usage to a minimum.

By the way, when converting back from an office format to markdown you will sometimes hit the "Word is too long" effect in the generated markdown, which you then need to fix quickly to get on with your main task.

# Proposed solution

The idea is to permit some syntax for defining named segments of the Abstract Syntax Tree (AST) ("Definitions of AST segments") and to provide a way to "inject" those segments into the AST ("Definition of replacement pointers"). Once the segments are injected, both the segment definitions and the pointers have to be removed from the final AST representation, so that the end document (pdf or whatever) has everything substituted and all foreign markup removed.

A proof-of-concept version of this solution is implemented as a Lua script and seems to work. It uses an ugly but easy-to-parse syntax to define AST segments and replacement pointers in the source markdown. Now I am looking for a more elegant definition and would very much appreciate any thoughts or feedback, to compensate for my ignorance of the many existing markdown syntax extensions that may conflict with my syntax.

Specifically, my preference for the moment is inline Math syntax (an identifier enclosed in a pair of \$-signs). Maybe some Math experts out there will immediately tell me that it is a bad idea, who knows.
Or perhaps the whole problem, and specifically the listed use cases, has some easier and more elegant existing solution. Anyway, here we go. The details of the proposed syntax follow below.

## Definition of replacement pointers

A replacement pointer for "AST segments" can be placed in normal markdown content as an inline Math expression. The expression has to contain a single identifier without any blanks. Rules for identifier naming are described in **Definitions of AST segments** / **Rules for identifier naming**. The same identifier has to be used in the segment definition, so that it is known what to "inject" at the position of every replacement pointer.

Example table demonstrating the syntax of replacement pointers:

+------------------------------+---------------+---------------+--------------------+
| Change log                   | Author #1     | Author #2     | Date               |
+==============================+===============+===============+====================+
| Some long text in Cell 2:1,  | Fred          | Wilma         | 3000 Y.b.C.        |
| which does not contain       | Flintstone    | Flintstone    |                    |
| anything interesting         |               |               |                    |
+------------------------------+---------------+---------------+--------------------+
| Almost $ALM_NTH_H1$ nothing  | Yabba-Dabba-D | Wilma F.      | 2999 Y.b.C         |
| happened                     |               |               |                    |
+------------------------------+---------------+---------------+--------------------+
| Here $ALMOST_NTH_H2$, but we | $YABBDBBD_S$  | $YABBDBBD3$   | 2998.5 Y.b.C       |
| had a dedicated revision for |               |               |                    |
| that last time and now we    |               |               |                    |
| have it again                |               |               |                    |
+------------------------------+---------------+---------------+--------------------+

The same table after the AST segments are "injected". (The injection is demonstrated only approximately; the actual AST state is almost, but not exactly, like this. Broken words are continued on the next line using the → symbol, as in the "Yabba-Dabba-Doo, Senior" substitution; otherwise we would need to increase the column width dramatically to fit them.)

+------------------------------+---------------+---------------+--------------------+
| Change log                   | Author #1     | Author #2     | Date               |
+==============================+===============+===============+====================+
| Some long text in Cell 2:1,  | Fred          | Wilma         | 3000 Y.b.C.        |
| which does not contain       | Flintstone    | Flintstone    |                    |
| anything interesting         |               |               |                    |
+------------------------------+---------------+---------------+--------------------+
| Almost nothing happened,     | Yabba-Dabba-D | Wilma F.      | 2999 Y.b.C         |
| except for:                  |               |               |                    |
|                              |               |               |                    |
| - Wilma was bitten by a      |               |               |                    |
|   snake                      |               |               |                    |
| - Fred invented fire         |               |               |                    |
+------------------------------+---------------+---------------+--------------------+
| Here almost nothing          | Yabba-Dabba-D | Yabba-Dabba-D | 2998.5 Y.b.C       |
| happened, but we had a       | →oo, Senior   | →oo, III (and |                    |
| dedicated revision for that  |               | greatest)     |                    |
| last time and now we have it |               |               |                    |
| again                        |               |               |                    |
+------------------------------+---------------+---------------+--------------------+

## Definitions of AST segments

Here are the AST segment definitions for our example table. They will be removed from the final state of the AST and therefore from the final representation of the document.

$ALM_NTH_H1$ = Almost nothing happened, except for:

- Wilma was bitten by a snake
- Fred invented fire

$ALMOST_NTH_H2$ = almost nothing happened $END$. Everything past the \$END\$ segment-finishing identifier until the end of the paragraph is ignored. This text is not part of any segment definition: the definition of \$ALMOST_NTH_H2\$ is explicitly terminated.

$YABBDBBD_S$ = Yabba-Dabba-Doo, Senior

$YABBDBBD3$ = Yabba-Dabba-Doo, III (and greatest) $END$

Every segment definition has to start in a new paragraph and be its first element. A segment definition is introduced by an inline Math expression containing a single identifier without any blanks.

**Rules for identifier naming**

Identifiers follow the rules of Java or C: they start with a non-digit, followed by any number of underscores '\_', letters, or digits. Identifiers are case sensitive.

A segment definition either has to be finalized explicitly by the reserved identifier \$END\$, or it is finalized implicitly when the next segment definition starts. Everything located after the \$END\$ identifier until the end of the current paragraph is completely ignored.

If a segment definition contains only one paragraph, the segment content is inserted in place of the replacement pointer within the source paragraph, which allows replacing individual words or groups of words (as in the \$ALMOST_NTH_H2\$ example).

If a segment definition contains more than one paragraph (as in the \$ALM_NTH_H1\$ example), the complete paragraph that mentions the replacement pointer is replaced. This way we can conveniently put bulleted or numbered lists into a markdown table. We can also "outsource" the content of heavy cells, which could save a lot of time, especially when maintaining existing markdown tables.

With best regards,
Anton
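
P.S. To make the identifier rules a bit more concrete, here is a minimal sketch in Python. It is only an illustration on plain strings and not the Lua proof of concept, which works on the pandoc AST. The names `classify` and `expand_inline` are made up for this sketch, and reading "like in Java or C" as ASCII letters, digits and underscore is an assumption.

```python
import re

# Assumption: an identifier is an ASCII letter or underscore, followed by
# any number of ASCII letters, digits or underscores. Case sensitive.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# A pointer in raw text: $NAME$ where the opening dollar sign is not escaped.
POINTER = re.compile(r"(?<!\\)\$([A-Za-z_][A-Za-z0-9_]*)\$")

RESERVED_END = "END"


def classify(math_text):
    """Classify the content of one inline Math element.

    Returns "end" for the reserved terminator, "pointer" for a single
    identifier without blanks, and "math" for anything else.
    """
    if IDENTIFIER.fullmatch(math_text) is None:
        return "math"
    if math_text == RESERVED_END:
        return "end"
    return "pointer"


def expand_inline(text, segments):
    """Replace $NAME$ pointers with single-paragraph segment text.

    Pointers without a matching segment are left untouched.
    """
    def substitute(match):
        return segments.get(match.group(1), match.group(0))

    return POINTER.sub(substitute, text)


segments = {"ALMOST_NTH_H2": "almost nothing happened"}
print(classify("ALM_NTH_H1"))
print(classify("x + y"))
print(expand_inline("Here $ALMOST_NTH_H2$, but we", segments))
```

Note that under these rules a one-letter formula such as `$x$` is classified as a pointer, which is exactly the kind of clash with real Math usage I am worried about.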