AST segments replacement - Anton Sharonov

public inbox archive for pandoc-discuss@googlegroups.com

From: Anton Sharonov <anton.sharonov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: AST segments replacement
Date: Sun, 9 May 2021 09:23:25 -0700 (PDT)
Message-ID: <d29c90a4-0f89-45c7-ad33-4a59b8ab0c33n@googlegroups.com>


Hi all,

I work with markdown as my primary input and use pandoc to
generate my target output formats. I have some inconveniences
with formatting tables in markdown and would appreciate ideas on
how to solve them. (For what it's worth, the output in my case is
exclusively docx, but I guess that doesn't really matter.)

# Problem definition

## Replacement of individual words in the table

Sometimes it is tempting to use some macro pre-processing of the
source markdown file to substitute individual words (like the
name of a customer, postal addresses or phone numbers). It is
very easy to do with some Python or Perl scripting (my personal
favorites for such things are m4 and sed). It works very nicely
in all "normal" parts of the markdown source. However, if such a
replacement is made inside a markdown table, it usually destroys
the table layout, so that the table is no longer parsed correctly.
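
To make the problem concrete, here is a minimal Python sketch of
what a plain textual substitution does to a grid table. The table
and the `CUSTOMER` macro name are made up for illustration; the
`replace` call stands in for what sed or m4 would do.

```python
# Hypothetical one-row grid table; CUSTOMER is a made-up macro name.
TABLE = """\
+----------+----------+
| Name     | Phone    |
+==========+==========+
| CUSTOMER | 555-0100 |
+----------+----------+"""


def line_widths(text):
    """Return the set of distinct line lengths in text."""
    return {len(line) for line in text.splitlines()}


# Plain textual substitution, as a pre-processor would do it.
expanded = TABLE.replace("CUSTOMER", "Qualitätssicherung GmbH")

print(line_widths(TABLE))     # a single width: rows and borders line up
print(line_widths(expanded))  # two widths: the data row overshoots its borders
```

Once the data row is wider than its border lines, the column
boundaries no longer line up, which is what breaks the parsing.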

## "Word" is too long

Sometimes a "word" is just too long to fit into the table column.

Disclaimer: perhaps some pandoc extension of plain markdown
already has a syntax which permits "concatenating" a "word"
started on one line with the continuation of that word on
another line. If such a syntax already exists, please excuse my
ignorance. I tried to find it in the docs and on the internet,
without success.

Such "words" often happen to me in German writing
("Qualitätssicherungsmaßnahmen", for example, eats 28 chars on
its own; in the narrow columns of an average markdown table this
word has no chance to fit). Another example is fully qualified
file names, which are often used in technical documentation:

/usr/x86_64-pc-cygwin/lib/ldscripts/i386pe.xbn

I think sooner or later everybody who works with markdown
documents will stumble on this problem.

## Complications by cell content growing over time

After a table is created initially, there is often a need to
append or re-phrase something. Through all those steps the table
is pretty hard to maintain. It is an especially big pain if a
cell contains non-trivial formatting, like lists.

Of course one can imagine some plug-ins for text editors (my life
changed when I learned the "Table Mode" vim plug-in). Another
strategy: convert your intermediate markdown version to your
favorite office tool format, extend the table with some WYSIWYG
tool of your choice, and convert back to markdown. In all those
cases, however, you miss the power of plain markdown editing. As
a result, you tend to avoid tables or reduce their usage to the
minimum. By the way, when converting back from an office tool
format to markdown you will sometimes experience the "Word is too
long" effect in the generated markdown, which you need to fix
quickly somehow in order to proceed with your main task at hand.

# Proposed solution

The idea is to permit some syntax for defining named segments of
the Abstract Syntax Tree (AST) ("Definitions of AST segments"),
and to provide some way to "inject" those segments into the AST
("Definition of replacement pointers"). After the "injection" is
done, the segment definitions as well as the pointers have to be
removed from the final AST representation, so that the end
document (pdf or whatever) has everything substituted and all
foreign markup removed.

A proof-of-concept version of that solution is implemented as a
Lua script and seems to work. It uses an ugly but easy-to-parse
syntax to define AST segments and replacement pointers in the
source markdown. Now I am looking for a more elegant definition
and would very much appreciate any thoughts or feedback, to
compensate for my ignorance of the many existing markdown syntax
extensions, which may conflict with my syntax. Specifically, my
preference for the moment is the inline Math syntax (an
identifier enclosed in a pair of \$-signs). Maybe some Math
experts out there will tell me immediately that this is a bad
idea, who knows.

Or perhaps the whole problem, and specifically the listed use
cases, has some easier and more elegant existing solution.

Anyway, here we go. The details of the proposed syntax are below.

## Definition of replacement pointers

A replacement pointer for "AST segments" replacement can be
defined within normal markdown content using an inline Math
expression. The expression has to contain a single identifier,
without any blanks. The rules for identifier naming are described
in **Definitions of AST segments** / **Rules for identifier
naming**. The same identifier has to be used in the segment
definition, so that it is known what to "inject" at the position
of every replacement pointer.

Example table which demonstrates the syntax of replacement pointers:

+------------------------------+---------------+---------------+--------------------+
| Change log                   | Author #1     | Author #2     | Date               |
+==============================+===============+===============+====================+
| Some long text in Cell 2:1,  | Fred          | Wilma         | 3000 Y.b.C.        |
| which do not contain         | Flinstone     | Flinstone     |                    |
| anything interesting         |               |               |                    |
+------------------------------+---------------+---------------+--------------------+
| Almost $ALM_NTH_H1$ nothing  | Yabba-Dabba-D | Wilma F.      | 2999 Y.b.C         |
| happened                     |               |               |                    |
+------------------------------+---------------+---------------+--------------------+
| Here $ALMOST_NTH_H2$, but we | $YABBDBBD_S$  | $YABBDBBD3$   | 2998.5 Y.b.C       |
| had a dedicated revision for |               |               |                    |
| that last time and now we    |               |               |                    |
| have it again                |               |               |                    |
+------------------------------+---------------+---------------+--------------------+

The same table after the AST segments are "injected". The
injection is only shown approximately; the AST state is almost,
but not exactly, like this. Words broken across lines are
continued on the next line with a → symbol, as in the
"Yabba-Dabba-Doo, Senior" substitution; otherwise we would need
to dramatically increase the column width to fit them:

+------------------------------+---------------+---------------+--------------------+
| Change log                   | Author #1     | Author #2     | Date               |
+==============================+===============+===============+====================+
| Some long text in Cell 2:1,  | Fred          | Wilma         | 3000 Y.b.C.        |
| which do not contain         | Flinstone     | Flinstone     |                    |
| anything interesting         |               |               |                    |
+------------------------------+---------------+---------------+--------------------+
| Almost nothing happened,     | Yabba-Dabba-D | Wilma F.      | 2999 Y.b.C         |
| except for:                  |               |               |                    |
|                              |               |               |                    |
| - Wilma was biten by a snake |               |               |                    |
| - Fred invented fire         |               |               |                    |
+------------------------------+---------------+---------------+--------------------+
| Here almost nothing          | Yabba-Dabba-D | Yabba-Dabba-D | 2998.5 Y.b.C       |
| happened, but we had a       | →oo, Senior   | →oo, III (and |                    |
| dedicated revision for that  |               | greatest)     |                    |
| last time and now we have it |               |               |                    |
| again                        |               |               |                    |
+------------------------------+---------------+---------------+--------------------+

## Definitions of AST segments

Here are the AST segment definitions corresponding to our example
table. They will be removed from the final state of the AST and
thereby from the final representation of the document.

$ALM_NTH_H1$ = Almost nothing happened, except for:

- Wilma was biten by a snake
- Fred invented fire

$ALMOST_NTH_H2$ = almost nothing happened $END$. Everything past
the \$END\$ segment-finishing identifier, up to the end of the
paragraph, is ignored.

This text is not part of any segment definition - the definition
of \$ALMOST_NTH_H2\$ is explicitly terminated.

$YABBDBBD_S$ = Yabba-Dabba-Doo, Senior

$YABBDBBD3$ = Yabba-Dabba-Doo, III (and greatest)

$END$

Every segment definition has to start in a new paragraph and be
the first element of it.

A segment definition is introduced by an inline Math expression
containing a single identifier, without any blanks.

**Rules for identifier naming**

The rules for identifier naming are like in Java or C: start with
a non-digit, followed by any number of underscores '\_', letters
or digits. Identifiers are case sensitive.
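
The naming rule can be written as a regular expression. The
Python sketch below is only an illustration, not the
proof-of-concept script; it assumes that "non-digit" means an
ASCII letter or an underscore, as in C (Java also allows further
characters). The name `is_identifier` is made up for this example.

```python
import re

# Assumption: a "non-digit" is an ASCII letter or underscore, as in C.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True if text is a single identifier with no blanks."""
    return IDENTIFIER.fullmatch(text) is not None


print(is_identifier("ALM_NTH_H1"))  # valid: letters, underscores, digits
print(is_identifier("3ABC"))        # invalid: starts with a digit
print(is_identifier("A B"))         # invalid: contains a blank
```

Because matching is case sensitive, `yabbdbbd3` and `YABBDBBD3`
would be two different identifiers.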

A segment definition has to be either finalized explicitly by the
reserved identifier \$END\$, or it is finalized implicitly when
the next segment definition starts.

Everything located after the \$END\$ identifier in a segment
definition, up to the end of the current paragraph, is completely
ignored.
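
The start and end rules for definitions can be sketched in Python
on a simplified model where the document is just a list of
paragraph strings (the real thing would work on the pandoc AST).
The function name `collect_segments` and the `$NAME$ = ...`
pattern follow the example above; everything else is an
assumption made for illustration.

```python
import re

# A definition starts a paragraph with "$NAME$ = ".
DEF_START = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)\$\s*=\s*")
END = "$END$"


def collect_segments(paragraphs):
    """Split paragraphs into segment definitions and ordinary text.

    Returns (segments, remaining): segments maps each name to its list
    of paragraphs, remaining holds paragraphs outside any definition.
    """
    segments, remaining = {}, []
    current = None  # name of the definition being collected, if any
    for para in paragraphs:
        match = DEF_START.match(para)
        if match:
            # A new definition implicitly finalizes the previous one.
            current = match.group(1)
            segments[current] = []
            para = para[match.end():]
        if current is None:
            remaining.append(para)
        elif END in para:
            # Keep the text before $END$, ignore the rest of the paragraph.
            head = para[:para.index(END)].rstrip()
            if head:
                segments[current].append(head)
            current = None
        else:
            segments[current].append(para)
    return segments, remaining


segments, remaining = collect_segments([
    "$ALM_NTH_H1$ = Almost nothing happened, except for:",
    "- Wilma was biten by a snake\n- Fred invented fire",
    "$ALMOST_NTH_H2$ = almost nothing happened $END$. Ignored tail.",
    "Not part of any segment.",
    "$YABBDBBD_S$ = Yabba-Dabba-Doo, Senior",
    "$END$",
])
print(segments)
print(remaining)
```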

If a segment definition contains only one paragraph, the segment
content is inserted into the source paragraph in place of the
replacement pointer, which allows replacing individual words or
groups of words (as in the $ALMOST_NTH_H2$ example).

If a segment definition contains more than one paragraph (as in
the $ALM_NTH_H1$ example), the complete paragraph which mentions
the replacement pointer is replaced. This way we can conveniently
put bulleted or numbered lists into a markdown table. We can also
"outsource" the content of some heavy cells, which could save a
lot of time, especially when maintaining existing markdown tables.
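
The two replacement modes can be sketched in Python on the same
simplified model: a cell is a list of paragraph strings and
`segments` maps each identifier to its list of paragraphs. This
is an illustration under those assumptions, not the
proof-of-concept script; `inject` is a made-up name, and leaving
unknown `$...$` expressions untouched is my own choice so that
real inline math survives.

```python
import re

POINTER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)\$")


def inject(paragraphs, segments):
    """Return paragraphs with replacement pointers resolved."""

    def splice(match):
        content = segments.get(match.group(1))
        # Unknown names (e.g. real inline math) are left untouched.
        return content[0] if content else match.group(0)

    result = []
    for para in paragraphs:
        multi = [name for name in POINTER.findall(para)
                 if len(segments.get(name, [])) > 1]
        if multi:
            # Multi-paragraph segment: replace the complete paragraph.
            result.extend(segments[multi[0]])
        else:
            # Single-paragraph segments: splice the text in place.
            result.append(POINTER.sub(splice, para))
    return result


segments = {
    "ALM_NTH_H1": ["Almost nothing happened, except for:",
                   "- Wilma was biten by a snake\n- Fred invented fire"],
    "ALMOST_NTH_H2": ["almost nothing happened"],
}
print(inject(["Almost $ALM_NTH_H1$ nothing happened"], segments))
print(inject(["Here $ALMOST_NTH_H2$, but we had a revision"], segments))
```

In the first call the surrounding words "Almost" and "nothing
happened" disappear, because the whole paragraph is replaced; in
the second call only the pointer itself is substituted.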

With best regards, Anton

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/d29c90a4-0f89-45c7-ad33-4a59b8ab0c33n%40googlegroups.com.

