Some thoughts for markdown syntax extensions


* Some thoughts for markdown syntax extensions
@ 2010-11-06 13:24 BP Jonsson
       [not found] ` <4CD556FA.6090807-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: BP Jonsson @ 2010-11-06 13:24 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

# Some thoughts for markdown syntax extensions

## Verbatim text

There is a need for some syntax to render text 'verbatim',
i.e. passing it through exactly as-is, without any conversion
of special characters to entities/escapes.  This would be
useful when writing for a templating system or similar which
has its own markup (like `<? ?> <% %> <& &>` and whatnot).
The suggestion is that

* any inline text beginning with at least two doublequotes
   and extending to a matching number of doublequotes

         "" ... "" or """ ... """ or """" ... """"
         or...

* or a block beginning and ending with at least three
   doublequotes on their own line (but optionally followed by
   whitespace -- similar to the `~~~` for delimited code
   blocks),

be just passed through, except that these multi-doublequote
delimiters be removed, and that if the closing delimiter is
followed by a newline, that newline is preserved.

This way *any* kind of extended target markup can be used
without hardcoding specific styles.

The reason for a delimiter consisting of a variable but
matching number of characters is of course that code often
contains an empty pair of doublequotes, and some languages
have a block comment style with `""" ... """`.  Compare the
`C<>` and similar markup in POD which can have any number of
delimiting angles greater than the greatest number of
consecutive `>`s which occur in the enclosed code.

On the other hand double doublequotes don't normally occur
in *text*, so that one wouldn't need to use `\"\"` very
often, if ever.
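
To make the matching rule concrete, here is a minimal sketch in Python of how the inline form might be recognized. It is only an illustration under assumptions: the name `split_verbatim` and the regular expression are made up for this example and are not part of pandoc, and the block form is not handled.

```python
import re

# Inline span: a run of N >= 2 doublequotes, the shortest possible content,
# then a run of exactly N doublequotes. The lookarounds keep a run from
# matching as a piece of a longer run.
VERBATIM_INLINE = re.compile(r'(?<!")("{2,})(?!")(.+?)(?<!")\1(?!")', re.DOTALL)

def split_verbatim(text):
    """Return a list of (is_verbatim, chunk) pairs with the delimiters removed."""
    parts = []
    pos = 0
    for m in VERBATIM_INLINE.finditer(text):
        if m.start() > pos:
            parts.append((False, text[pos:m.start()]))
        parts.append((True, m.group(2)))
        pos = m.end()
    if pos < len(text):
        parts.append((False, text[pos:]))
    return parts
```

A writer would then emit the verbatim chunks untouched and convert the rest as usual. Because the closing run has to be as long as the opening one, `"""print("")"""` comes out as one verbatim chunk containing the empty pair of doublequotes.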

## Dashes

TeX-style conversion of `--` to U+2013 EN DASH and `---` to
U+2014 EM DASH should be supported in `--smart` mode.
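
As a rough sketch of the requested rule (Python, with a made-up function name; it ignores code spans and other contexts where hyphens must stay as they are):

```python
def tex_dashes(text):
    """Apply TeX-style dash conversion to a plain string."""
    # Replace the longer sequence first, so that "---" is not
    # consumed as "--" followed by a stray "-".
    return text.replace("---", "\u2014").replace("--", "\u2013")
```

The only design point is the order of the two replacements.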

## Abbreviations, acronyms and definitions

There should be syntaxes for abbreviations, acronyms and definitions.

### Abbreviations and acronyms

The abbreviation and acronym syntaxes should be reference-style
and cause any instance of their argument in the text to be
surrounded by appropriate markup:

	*[etc.]:	Et cetera - and so on
	**[HTML]:	Hypertext Markup Language

These two should both use the HTML `<abbr>` tag, since `<acronym>`
is deprecated, but the latter should have a class="acronym" so
that one can apply different CSS if desired:

	<abbr title="Et cetera - and so on">etc.</abbr>

	<abbr class="acronym" title="Hypertext Markup Language"
	>HTML</abbr>

	abbr { text-decoration: none; border-bottom: 1pt dotted #000; }

	abbr.acronym {
		text-decoration: none;
		border-bottom: 1pt dotted #000;
		text-transform: lowercase;
		font-variant: small-caps;
	}

Furthermore the definition should optionally contain a URL to
a definition of the abbreviation/acronym:

	**[HTML]:	Hypertext Markup Language
		"http://www.acronyms.net/h.xhtml#html-acronym"

To be rendered in HTML as

	<abbr class="acronym" title="Hypertext Markup Language"
	><a href="http://www.acronyms.net/h.xhtml#html-acronym"
	title="Hypertext Markup Language">HTML</a></abbr>

N.B. the title attribute *needs* to be duplicated, and the
anchor needs to be present because some browsers -- notably
screen readers :-( and not just the usual culprit -- are
dumb.
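
A sketch of how the reference-style definitions could be collected and applied, in Python and under assumptions: all names here are invented for the example, the input is treated as plain text rather than a parsed document, and a real implementation would have to skip code, attributes and already-marked text.

```python
import html
import re

# "*[term]: expansion" or "**[term]: expansion", optionally followed by a quoted URL.
DEF_LINE = re.compile(r'^(\*{1,2})\[([^\]]+)\]:\s*(.+?)(?:\s+"([^"]+)")?\s*$')

def parse_definitions(lines):
    """Map each abbreviation to (is_acronym, expansion, url_or_None)."""
    defs = {}
    for line in lines:
        m = DEF_LINE.match(line.strip())
        if m:
            stars, term, expansion, url = m.groups()
            defs[term] = (stars == "**", expansion, url)
    return defs

def render(term, is_acronym, expansion, url):
    """Build the <abbr> element, duplicating the title on the inner link."""
    title = html.escape(expansion, quote=True)
    cls = ' class="acronym"' if is_acronym else ""
    inner = html.escape(term)
    if url:
        inner = '<a href="%s" title="%s">%s</a>' % (
            html.escape(url, quote=True), title, inner)
    return '<abbr%s title="%s">%s</abbr>' % (cls, title, inner)

def mark_abbreviations(text, defs):
    """Wrap every standalone occurrence of a defined term."""
    if not defs:
        return text
    # Longest terms first, so a longer term wins over its own prefix.
    terms = sorted(defs, key=len, reverse=True)
    pattern = re.compile(r"(?<!\w)(?:%s)(?!\w)" % "|".join(map(re.escape, terms)))
    return pattern.sub(lambda m: render(m.group(0), *defs[m.group(0)]), text)
```

Usage would be along the lines of `mark_abbreviations(body, parse_definitions(definition_lines))`.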

### Definitions

Definitions should be either inline or reference:

     Most ?[Romance languages](The modern languages descended from
     Latin) have palatalized consonants.

	<dfn title="The modern languages descended from Latin"
	>Romance languages</dfn>

     Most ?[Romance languages] have palatalized consonants.

	?[Romance languages]: The modern languages descended from Latin
		"http://en.wikipedia.org/wiki/Romance_languages"

	<dfn title="The modern languages descended from Latin"
	><a href="http://en.wikipedia.org/wiki/Romance_languages"
	title="The modern languages descended from Latin"
	>Romance languages</a></dfn>

Of course only the reference style should allow for a URL
(it has got to be better at something, right?)
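
For the inline form, a small Python sketch of the intended conversion might look like this. The function name is hypothetical, and the reference form is left out.

```python
import html
import re

# ?[term](definition) with no nested brackets or parentheses.
INLINE_DFN = re.compile(r"\?\[([^\]]+)\]\(([^)]+)\)")

def render_inline_definitions(text):
    """Turn ?[term](definition) into <dfn title="definition">term</dfn>."""
    def repl(m):
        term = m.group(1)
        # The definition may be wrapped over several source lines,
        # so collapse all whitespace runs to single spaces.
        definition = " ".join(m.group(2).split())
        return '<dfn title="%s">%s</dfn>' % (
            html.escape(definition, quote=True), html.escape(term))
    return INLINE_DFN.sub(repl, text)
```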

## Small-caps

The thorniest problem with finding a syntax for small-caps
is that

* the markdown should preferably *use* capitals so that it
   doesn't just look like ordinary lowercase/mixed case text
   in funny delimiters -- that HTML/CSS/LaTeX smallcaps err
   on that account is no excuse for markdown --,
* and at the same time it should be possible to include
   big-caps: "Caesar" in small-caps should be (faked with
   Unicode phonetic letters which I hope everyone can see!)
   "Cᴀᴇꜱᴀʀ", not "ᴄᴀᴇꜱᴀʀ".

A possible solution is that the markup for small-caps be
capital letters delimited by double pipe characters, and
that any letter inside which is preceded by a single
unescaped pipe character be rendered as a (big) capital:

     If naturally descended |||IULIU |CAESARE|| would
     perhaps have become _Juil Cierre_ rather than
     _Jules César_ in French.

     <p>If naturally descended <span class="smallcaps">Iuliu
     Caesare</span> would perhaps have become <em>Juil
     Cierre</em> rather than <em>Jules César</em> in French.</p>

Truth to tell I did prefer `^^^IULIU ^CAESARE^^`, because
carets are small things pointing upwards, but one may want
to include superscripts in a small-caps span, and seen side
by side `|||ROMANICE||` looks less messy than `^^^ROMANICE^^`,
or conversely three carets in a row look worse than three
pipes in a row, probably because of the white gap below them.
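
To check that the pipe rule is unambiguous enough to implement, here is a minimal Python sketch. The names are invented, escaped pipes are not handled, and it knows nothing about tables.

```python
import html
import re

# "||", the shortest possible content, then "||" not followed by another pipe.
SMALLCAPS = re.compile(r"\|\|(.+?)\|\|(?!\|)", re.DOTALL)

def convert_inner(inner):
    """Lowercase everything except letters marked with a single pipe."""
    out = []
    i = 0
    while i < len(inner):
        ch = inner[i]
        if ch == "|" and i + 1 < len(inner):
            out.append(inner[i + 1].upper())  # marked letter: a (big) capital
            i += 2
        else:
            out.append(ch.lower())  # everything else is set as small caps
            i += 1
    return "".join(out)

def render_smallcaps(text):
    """Replace ||...|| spans with an HTML span of class "smallcaps"."""
    return SMALLCAPS.sub(
        lambda m: '<span class="smallcaps">%s</span>'
        % html.escape(convert_inner(m.group(1))),
        text,
    )
```

With this reading, the three pipes at the start of `|||IULIU` are the opening delimiter followed by the capital marker for the `I`.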

/bpj

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to pandoc-discuss+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pandoc-discuss?hl=en.


* Re: Some thoughts for markdown syntax extensions
       [not found] ` <4CD556FA.6090807-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2010-11-06 13:44   ` BP Jonsson
  2010-11-06 18:47   ` John MacFarlane
  2010-11-06 21:45   ` Ivan Lazar Miljenovic
  2 siblings, 0 replies; 6+ messages in thread
From: BP Jonsson @ 2010-11-06 13:44 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

2010-11-06 14:24, BP Jonsson wrote:
> * any inline text beginning with at least two doublequotes
>    and extending to a matching number of doublequotes
>
>          "" ... "" or """ ... """ or """" ... """"
>          or...
>
> * or a block beginning and ending with at least three
>    doublequotes on their own line (but optionally followed by
>    whitespace -- similar to the `~~~` for delimited code
>    blocks),
>
> be just passed through, except that these multi-doublequote
> delimiters be removed, and that if the closing delimiter is
> followed by a newline, that newline is preserved.
>
> This way *any* kind of extended target markup can be used
> without hardcoding specific styles.

I forgot to mention that this also could be used to get
inline markdown in a LaTeX command argument converted:

     ""\MyCmd{""Argument with *emphasized* word""}""

though currently I have a rather well-working trick
for that:

     \Foo{MyCmd}Argument with *emphasized* word\ooF

Then a script which replaces \Foo{MyCmd} with \MyCmd{
and \ooF with }.  Voilà!
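
Such a script can be very small. A sketch in Python, assuming it is run over the LaTeX that pandoc produced and that `\Foo{...}` and `\ooF` arrive there unchanged; the function name is made up:

```python
import re

def restore_commands(latex):
    """Turn \\Foo{MyCmd}...\\ooF back into \\MyCmd{...}."""
    # \Foo{MyCmd} opens the real command: \MyCmd{
    latex = re.sub(r"\\Foo\{([A-Za-z]+)\}", r"\\\1{", latex)
    # \ooF closes the argument; the lookahead avoids touching a
    # longer command name that merely starts with "ooF".
    return re.sub(r"\\ooF(?![A-Za-z])", "}", latex)
```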

/bpj


* Re: Some thoughts for markdown syntax extensions
       [not found] ` <4CD556FA.6090807-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2010-11-06 13:44   ` BP Jonsson
@ 2010-11-06 18:47   ` John MacFarlane
       [not found]     ` <20101106184722.GA21524-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-06 21:45   ` Ivan Lazar Miljenovic
  2 siblings, 1 reply; 6+ messages in thread
From: John MacFarlane @ 2010-11-06 18:47 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ BP Jonsson [Nov 06 10 14:24 ]:
> # Some thoughts for markdown syntax extensions
> 
> ## Verbatim text
> 
> There is a need for some syntax to render text 'verbatim',
> i.e. passing it through exactly as-is, without any conversion
> of special characters to entities/escapes.  This would be
> useful when writing for a templating system or similar which
> has its own markup (like `<? ?> <% %> <& &>` and whatnot).
> The suggestion is that
> 
> * any inline text beginning with at least two doublequotes
>   and extending to a matching number of doublequotes
> 
>         "" ... "" or """ ... """ or """" ... """"
>         or...
> 
> * or a block beginning and ending with at least three
>   doublequotes on their own line (but optionally followed by
>   whitespace -- similar to the `~~~` for delimited code
>   blocks),
> 
> be just passed through, except that these multi-doublequote
> delimiters be removed, and that if the closing delimiter is
> followed by a newline, that newline is preserved.
> 
> This way *any* kind of extended target markup can be used
> without hardcoding specific styles.

I'm reluctant to add a feature like this. I think that pandoc should aim
to produce valid X for any output format X. This feature would break that
guarantee.  The verbatim text would only make sense for one particular
output format.

A better solution, I think, would be to change the parser
so that <? ... >, <% .. >, <& ...> and other standard template
tags are recognized as raw HTML.

> ## Dashes
> 
> TeX-style conversion of `--` to U+2013 EN DASH and `---` to
> U+2014 EM DASH should be supported in `--smart` mode.

This has been discussed before on the list.  See the earlier,
inconclusive discussion.

Currently pandoc will convert both `--` and `---` to
an EM DASH.  That is because non-TeXers are likely to use the
symbols this way.  Pandoc will automatically convert a `-`
between digits to an EN DASH.  This sacrifices some
flexibility but promotes the goals of markdown -- you should
be able to write with normal email conventions.
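
Spelled out as code, the rule described above amounts to something like the following Python sketch. This is an illustration of the behaviour as stated in this message, not pandoc's actual implementation, and the function name is invented.

```python
import re

def email_style_dashes(text):
    """Dash handling as described: '-' between digits, and '--' / '---'."""
    # A single hyphen between two digits becomes an en dash.
    text = re.sub(r"(?<=\d)-(?=\d)", "\u2013", text)
    # Both "--" and "---" become an em dash; the quantifier is greedy,
    # so three hyphens are taken together when present.
    return re.sub(r"-{2,3}", "\u2014", text)
```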

> ## Abbreviations, acronyms and definitions
> 
> There should be syntaxes for abbreviations, acronyms and definitions.

It's not clear what these would mean in formats other than HTML, so
I'm reluctant to complicate pandoc for this. (For HTML output, you can just
use raw HTML.)  But maybe I could be persuaded.

> ### Abbreviations and acronyms
> 
> The abbreviation and acronym syntaxes should be reference-style
> and cause any instance of their argument in the text to be
> surrounded by appropriate markup:
> 
> 	*[etc.]:	Et cetera - and so on
> 	**[HTML]:	Hypertext Markup Language
> 
> These two should both use the HTML `<abbr>` tag, since `<acronym>`
> is deprecated, but the latter should have a class="acronym" so
> that one can apply different CSS if desired:
> 
> 	<abbr title="Et cetera - and so on">etc.</abbr>
> 
> 	<abbr class="acronym" title="Hypertext Markup Language"
> 	>HTML</abbr>
> 	
> 	abbr { text-decoration: none; border-bottom: 1pt dotted #000; }
> 
> 	abbr.acronym {
> 		text-decoration: none;
> 		border-bottom: 1pt dotted #000;
> 		text-transform: lowercase;
> 		font-variant: small-caps;
> 	}
> 
> Furthermore the definition should optionally contain a URL to
> a definition of the abbreviation/acronym:
> 
> 	**[HTML]:	Hypertext Markup Language
> 		"http://www.acronyms.net/h.xhtml#html-acronym"
> 
> To be rendered in HTML as
> 
> 	<abbr class="acronym" title="Hypertext Markup Language"
> 	><a href="http://www.acronyms.net/h.xhtml#html-acronym"
> 	title="Hypertext Markup Language">HTML</a></abbr>
> 	
> N.B. the title attribute *needs* to be duplicated, and the
> anchor needs to be present because some browsers -- notably
> screen readers :-( and not just the usual culprit -- are
> dumb.
> 
> ### Definitions

See above on abbreviations/acronyms.

> Definitions should be either inline or reference:
> 
>     Most ?[Romance languages](The modern languages descended from
>     Latin) have palatalized consonants.
> 
> 	<dfn title="The modern languages descended from Latin"
> 	>Romance languages</dfn>
> 
>     Most ?[Romance languages] have palatalized consonants.
> 
> 	?[Romance languages]: The modern languages descended from Latin
> 		"http://en.wikipedia.org/wiki/Romance_languages"
> 
> 	<dfn title="The modern languages descended from Latin"
> 	><a href="http://en.wikipedia.org/wiki/Romance_languages"
> 	title="The modern languages descended from Latin"
> 	>Romance languages</a></dfn>
> 
> Of course only the reference style should allow for a URL
> (it has got to be better at something, right?)
> 
> ## Small-caps
> 
> The thorniest problem with finding a syntax for small-caps
> is that
> 
> * the markdown should preferably *use* capitals so that it
>   doesn't just look like ordinary lowercase/mixed case text
>   in funny delimiters -- that HTML/CSS/LaTeX smallcaps err
>   on that account is no excuse for markdown --,
> * and at the same time it should be possible to include
>   big-caps: "Caesar" in small-caps should be (faked with
>   Unicode phonetic letters which I hope everyone can see!)
>   "Cᴀᴇꜱᴀʀ", not "ᴄᴀᴇꜱᴀʀ".
> 
> A possible solution is that the markup for small-caps be
> capital letters delimited by
> double pipe characters, and that any letter inside which is
> preceded by a single unescaped pipe character be rendered as
> a (big) capital:
> 
>     If naturally descended |||IULIU |CAESARE|| would
>     perhaps have become _Juil Cierre_ rather than
>     _Jules César_ in French.

This is an interesting proposal; I think the || idea looks
fairly natural.  But we might want to use | in an alternative
table syntax.

>     <p>If naturally descended <span class="smallcaps">Iuliu
>     Caesare</span> would perhaps have become <em>Juil
>     Cierre</em> rather than <em>Jules César</em> in French.</p>
> 
> Truth to tell I did prefer `^^^IULIU ^CAESARE^^`, because
> carets are small things pointing upwards, but one may want
> to include superscripts in a small-caps span, and seen side
> by side `|||ROMANICE||` looks less messy than `^^^ROMANICE^^`,
> or conversely three carets in a row look worse than three
> pipes in a row, probably because of the white gap below them.
> 
> /bpj

* Re: Some thoughts for markdown syntax extensions
       [not found] ` <4CD556FA.6090807-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2010-11-06 13:44   ` BP Jonsson
  2010-11-06 18:47   ` John MacFarlane
@ 2010-11-06 21:45   ` Ivan Lazar Miljenovic
  2 siblings, 0 replies; 6+ messages in thread
From: Ivan Lazar Miljenovic @ 2010-11-06 21:45 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

If we're discussing markdown extensions, I'd like to bring back up the
topic of syntax for comments in markdown.  Such an inclusion might
make me actually go write that split pragma I've been procrastinating
about for a while (so that you can specify how to split a single
markdown document into multiple documents; that way you can have an
all-in-one textual help document and have it split into multiple HTML
pages).

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
IvanMiljenovic.wordpress.com


* Re: Some thoughts for markdown syntax extensions
       [not found]     ` <20101106184722.GA21524-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2010-11-07 10:16       ` Tillmann Rendel
  2010-11-07 10:59       ` Nathan Gass
  1 sibling, 0 replies; 6+ messages in thread
From: Tillmann Rendel @ 2010-11-07 10:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

John MacFarlane wrote:
>>  The suggestion is that
>>
>>  * any inline text beginning with at least two doublequotes
>>     and extending to a matching number of doublequotes
>>
>>           "" ... "" or """ ... """ or """" ... """"
>>           or...
>>
>>  be just passed through.
>>
> I'm reluctant to add a feature like this. I think that pandoc should aim
> to produce valid X for any output format X. This feature would break that
> guarantee.  The verbatim text would only make sense for one particular
> output format.
>
> A better solution, I think, would be to change the parser
> so that <? ... >, <% .. >, <& ...> and other standard template
> tags are recognized as raw HTML.

I agree that changing the parser is the better solution here, but what 
about non-standard template tags? What about non-HTML target formats?

A syntax for "raw text" not to be processed by pandoc would offer some 
extra flexibility: A (power-) user could write a little preprocessor to 
wrap non-standard tags into pandoc's "raw text" tags. For example, such a 
preprocessor could convert

   <? ... ?>

into

   """<? ... ?>""".

Now, pandoc can be run as usual and the overall pipeline will behave 
like pandoc with a changed parser. This allows a user to extend pandoc's 
parser without extending its Haskell source code.
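
A preprocessor of that kind could be as short as the following Python sketch. The function name is made up, and the `"""` delimiter is of course only the proposed syntax, which pandoc does not understand today.

```python
import re

# A <? ... ?> tag, possibly spanning several lines.
TEMPLATE_TAG = re.compile(r"<\?.*?\?>", re.DOTALL)

def wrap_template_tags(source):
    """Wrap each <? ... ?> tag in the proposed raw-text delimiters."""
    def wrap(m):
        tag = m.group(0)
        # Use a delimiter longer than any run of doublequotes inside
        # the tag, and at least three doublequotes long.
        longest = max((len(run) for run in re.findall(r'"+', tag)), default=0)
        fence = '"' * max(3, longest + 1)
        return fence + tag + fence
    return TEMPLATE_TAG.sub(wrap, source)
```

The variable-length delimiter from the original proposal is what makes this safe for tags that themselves contain doublequotes.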

By the way, with latex output, pandoc can already be tricked to do this 
as follows:

   \newcommand{\ignoreThis}[1]{#1}

   ...
   \ignoreThis{text to be passed through}
   ...

Pandoc will wrap "text to be passed through" in an \ignoreThis call, but 
at macro expansion time, TeX will expand it away. So in many situations, 
this allows one to use TeX commands with non-standard syntax. For example, 
the pgf/tikz package for drawing pictures supports the following syntax

   \tikz <textual description of a picture> .

So the textual description is terminated with a full-stop. Pandoc's 
parser would usually not realize that the textual description should be 
passed as-is to the Latex output, but with the above hack, one can write

   \ignoreThis{\tikz <textual description of a picture> .}

instead, and it works. Or one can write a preprocessor which searches for 
"\tikz ... ." and wraps it in a \ignoreThis call.

Adding support for "raw text" in pandoc would allow similar tricks in a 
wider range of situations, including other target formats than latex.

> I think that pandoc should aim to produce valid X for any output format X.

I guess there are different "usage scenarios" for pandoc. One scenario 
is that one wants to generate different formats from a single source 
file, and for that scenario, the described property is obviously important.

But another scenario is that one needs, for some reason, to produce a 
specific format, but doesn't want to actually write one's document in 
that format. For example, I need to produce latex documents for 
scientific articles, but rather want to write markdown instead, because 
it fits better into my workflow. (I want to have a smooth transition 
between programming, writing emails about the programs, and condensing 
the emails into an article). In that scenario, it is important that 
pandoc supports all the bells and whistles of the target format one 
happens to need.

Clearly, because of the first scenario, pandoc should not support all 
these bells and whistles natively. Instead, I propose, it should be 
easily extendible to support them. Already, the pandoc API is an 
important tool in that regard, but syntax for "raw text" would be 
another step in a good direction.

   Tillmann


* Re: Some thoughts for markdown syntax extensions
       [not found]     ` <20101106184722.GA21524-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2010-11-07 10:16       ` Tillmann Rendel
@ 2010-11-07 10:59       ` Nathan Gass
  1 sibling, 0 replies; 6+ messages in thread
From: Nathan Gass @ 2010-11-07 10:59 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 06.11.10 19:47, John MacFarlane wrote:
> +++ BP Jonsson [Nov 06 10 14:24 ]:
>> # Some thoughts for markdown syntax extensions
>>
>> ## Verbatim text
>>
>> There is a need for some syntax to render text 'verbatim',
>> i.e. passing it through exactly as-is, without any conversion
>> of special characters to entities/escapes.  This would be
>> useful when writing for a templating system or similar which
>> has its own markup (like `<? ?> <% %> <& &>` and whatnot).
>> The suggestion is that
>>
>> * any inline text beginning with at least two doublequotes
>>    and extending to a matching number of doublequotes
>>
>>          "" ... "" or """ ... """ or """" ... """"
>>          or...
>>
>> * or a block beginning and ending with at least three
>>    doublequotes on their own line (but optionally followed by
>>    whitespace -- similar to the `~~~` for delimited code
>>    blocks),
>>
>> be just passed through, except that these multi-doublequote
>> delimiters be removed, and that if the closing delimiter is
>> followed by a newline, that newline is preserved.
>>
>> This way *any* kind of extended target markup can be used
>> without hardcoding specific styles.
>
> I'm reluctant to add a feature like this. I think that pandoc should aim
> to produce valid X for any output format X. This feature would break that
> guarantee.  The verbatim text would only make sense for one particular
> output format.
>
> A better solution, I think, would be to change the parser
> so that <? ... >, <% .. >, <& ...> and other standard template
> tags are recognized as raw HTML.

I'm not personally interested in this topic, but wanted to toss in an 
idea anyway:

The new syntax just needs a way to select for which writer(s) the 
verbatim block should be written. So something like:

"":latex nonstandard verbatim latex""
"":html,s5 nonstandard verbatim html""

This is longer than the proposed form, but it should be as rarely used 
as possible anyway. Another "solution" could be to give the intended 
output format in metadata, if we add a syntax for metadata. This way it 
is clear that the document only works in one output format and you have 
the added benefit of a shorter command line as you don't need to declare 
your output format.
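
A sketch of what the writer-selecting form could mean, written as a Python filter. The names are invented, only the two-doublequote inline form from the examples above is covered, and whitespace is assumed to separate the writer list from the content.

```python
import re

# "":writer[,writer...] content""
TAGGED = re.compile(r'"":([a-z0-9]+(?:,[a-z0-9]+)*)\s+(.*?)""', re.DOTALL)

def select_verbatim(text, writer):
    """Keep tagged verbatim content for the given writer, drop the rest."""
    def repl(m):
        writers = m.group(1).split(",")
        return m.group(2) if writer in writers else ""
    return TAGGED.sub(repl, text)
```

Content tagged for other writers simply disappears, so the same source can still be run through every writer.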

By the way, I wonder if this syntax is a bit wasted on this feature, as 
it is very easy to type. Of course, I think this because I don't 
personally need this feature. Anyway, it's probably useful to think about 
using a less practical char and keeping this syntax free to use for some 
other feature.


>
>> ## Dashes
>>
>> TeX-style conversion of `--` to U+2013 EN DASH and `---` to
>> U+2014 EM DASH should be supported in `--smart` mode.
>
> This has been discussed before on the list.  See the earlier,
> inconclusive discussion.
>
> Currently pandoc will convert both `--` and `---` to
> an EM DASH.  That is because non-TeXers are likely to use the
> symbols this way.  Pandoc will automatically convert a `-`
> between digits to an EN DASH.  This sacrifices some
> flexibility but promotes the goals of markdown -- you should
> be able to write with normal email conventions.
>
>> ## Abbreviations, acronyms and definitions
>>
>> There should be syntaxes for abbreviations, acronyms and definitions.
>
> It's not clear what these would mean in formats other than HTML, so
> I'm reluctant to complicate pandoc for this. (For HTML output, you can just
> use raw HTML.)  But maybe I could be persuaded.
>
>> ### Abbreviations and acronyms
>>
>> The abbreviation and acronym syntaxes should be reference-style
>> and cause any instance of their argument in the text to be
>> surrounded by appropriate markup:
>>
>> 	*[etc.]:	Et cetera - and so on
>> 	**[HTML]:	Hypertext Markup Language
>>
>> These two should both use the HTML `<abbr>` tag, since `<acronym>`
>> is deprecated, but the latter should have a class="acronym" so
>> that one can apply different CSS if desired:
>>
>> 	<abbr title="Et cetera - and so on">etc.</abbr>
>>
>> 	<abbr class="acronym" title="Hypertext Markup Language"
>> 	>HTML</abbr>
>> 	
>> 	abbr { text-decoration: none; border-bottom: 1pt dotted #000; }
>>
>> 	abbr.acronym {
>> 		text-decoration: none;
>> 		border-bottom: 1pt dotted #000;
>> 		text-transform: lowercase;
>> 		font-variant: small-caps;
>> 	}
>>
>> Furthermore the definition should optionally contain a URL to
>> a definition of the abbreviation/acronym:
>>
>> 	**[HTML]:	Hypertext Markup Language
>> 		"http://www.acronyms.net/h.xhtml#html-acronym"
>>
>> To be rendered in HTML as
>>
>> 	<abbr class="acronym" title="Hypertext Markup Language"
>> 	><a href="http://www.acronyms.net/h.xhtml#html-acronym"
>> 	title="Hypertext Markup Language">HTML</a></abbr>
>> 	
>> N.B. the title attribute *needs* to be duplicated, and the
>> anchor needs to be present because some browsers -- notably
>> screen readers :-( and not just the usual culprit -- are
>> dumb.
>>
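
To make the abbreviation proposal concrete, here is a rough Python sketch of how such reference-style definitions might be collected and applied. It assumes each definition sits on a single line (with the optional quoted URL on that same line) and that the body is plain text; the name `expand_abbreviations` is invented for the example and nothing like it exists in pandoc.

```python
import html
import re

# `*[term]: expansion` is an abbreviation, `**[term]: expansion` an acronym;
# an optional quoted URL may follow the expansion on the same line.
DEF_RE = re.compile(r'^(\*{1,2})\[([^\]]+)\]:\s*(.*?)\s*(?:"([^"]+)")?\s*$')


def expand_abbreviations(source: str) -> str:
    """Strip definition lines and wrap every defined term in <abbr>."""
    defs = {}
    body = []
    for line in source.splitlines():
        m = DEF_RE.match(line)
        if m:
            stars, term, title, url = m.groups()
            defs[term] = (stars == "**", title, url)
        else:
            body.append(line)
    text = "\n".join(body)
    if not defs:
        return text

    def wrap(match):
        term = match.group(0)
        is_acronym, title, url = defs[term]
        safe_title = html.escape(title, quote=True)
        inner = html.escape(term)
        if url:
            # The title is repeated on the anchor, as the proposal asks.
            inner = '<a href="{}" title="{}">{}</a>'.format(
                html.escape(url, quote=True), safe_title, inner)
        cls = ' class="acronym"' if is_acronym else ""
        return '<abbr{} title="{}">{}</abbr>'.format(cls, safe_title, inner)

    # Longest terms first; a term must not be glued to other word characters.
    terms = sorted(defs, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)")
    return pattern.sub(wrap, text)
```

The lookarounds keep `HTML` inside `XHTML` from being wrapped, which is one of the cases a real implementation would have to decide on.
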
>> ### Definitions
>
> See above on abbreviations/acronyms.
>
>> Definitions should be either inline or reference:
>>
>>      Most ?[Romance languages](The modern languages descended from
>>      Latin) have palatalized consonants.
>>
>> 	<dfn title="The modern languages descended from Latin"
>> 	>Romance languages</dfn>
>>
>>      Most ?[Romance languages] have palatalized consonants.
>>
>> 	?[Romance languages]: The modern languages descended from Latin
>> 		"http://en.wikipedia.org/wiki/Romance_languages"
>>
>> 	<dfn title="The modern languages descended from Latin"
>> 	><a href="http://en.wikipedia.org/wiki/Romance_languages"
>> 	title="The modern languages descended from Latin"
>> 	>Romance languages</a></dfn>
>>
>> Of course only the reference style should allow for a URL
>> (it's got to be better at something, right?)
>>
>> ## Small-caps
>>
>> The thorniest problem with finding a syntax for small-caps
>> is that
>>
>> * the markdown should preferably *use* capitals so that it
>>    doesn't just look like ordinary lowercase/mixed case text
>>    in funny delimiters -- that HTML/CSS/LaTeX smallcaps err
>>    on that account is no excuse for markdown --,
>> * and at the same time it should be possible to include
>>    big-caps: "Caesar" in small-caps should be (faked with
>>    Unicode phonetic letters which I hope everyone can see!)
>>    "Cᴀᴇꜱᴀʀ", not "ᴄᴀᴇꜱᴀʀ".
>>
>> A possible solution is that the markup for small-caps be
>> capital letters delimited by
>> double pipe characters, and that any letter inside which is
>> preceded by a single unescaped pipe character be rendered as
>> a (big) capital:
>>
>>      If naturally descended |||IULIU |CAESARE|| would
>>      perhaps have become _Juil Cierre_ rather than
>>      _Jules César_ in French.
>
> This is an interesting proposal; I think the || idea looks
> fairly natural.  But we might want to use | in an alternative
> table syntax.

Imho, |Iuliu Caesare| is more readable to my eyes than |||IULIU
|CAESARE|| and not too far apart from the desired output. So I'd rather
simply have normal text between pipes render in small-caps and leave
big-caps between pipes as big-caps. I don't think the ugly syntax is
worth it just to be able to write everything in big-caps, especially
as big-caps are visually not that close to small-caps.
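
To compare the two readings side by side, here is a small Python sketch of both conversions. It is only an illustration under simplifying assumptions (no escaped pipes, no spans running across lines); the function names and the `smallcaps` class on the span are taken from the example output quoted below or invented for the sketch, and neither syntax exists in pandoc.

```python
import re

SPAN = '<span class="smallcaps">{}</span>'


def bpj_smallcaps(text: str) -> str:
    """`||TEXT||` is small-caps; inside it, `|X` marks X as a big capital."""
    def convert(match):
        inner = match.group(1)
        out = []
        i = 0
        while i < len(inner):
            if inner[i] == "|" and i + 1 < len(inner):
                # A single pipe flags the next letter as a big capital.
                out.append(inner[i + 1].upper())
                i += 2
            else:
                # Everything else is lowered and left to CSS small-caps.
                out.append(inner[i].lower())
                i += 1
        return SPAN.format("".join(out))

    return re.sub(r"\|\|(.+?)\|\|", convert, text)


def simple_smallcaps(text: str) -> str:
    """`|Text|` is small-caps; case is passed through unchanged."""
    return re.sub(r"\|([^|\n]+)\|",
                  lambda m: SPAN.format(m.group(1)), text)
```

Both `bpj_smallcaps("|||IULIU |CAESARE||")` and `simple_smallcaps("|Iuliu Caesare|")` give the same span around `Iuliu Caesare`, so the difference is purely in how the source reads.
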

Just my 2 cents.

Nathan

>
>>      <p>If naturally descended <span class="smallcaps">Iuliu
>>      Caesare</span> would perhaps have become <em>Juil
>>      Cierre</em> rather than <em>Jules César</em> in French.</p>
>>
>> Truth to tell I did prefer `^^^IULIU ^CAESARE^^`, because
>> carets are small things pointing upwards, but one may want
>> to include superscripts in a small-caps span, and seen side
>> by side `|||ROMANICE||` looks less messy than `^^^ROMANICE^^`,
>> or conversely three carets in a row look worse than three
>> pipes in a row, probably because of the white gap below them.
>>
>> /bpj
>>
>

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to pandoc-discuss+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pandoc-discuss?hl=en.




end of thread, other threads:[~2010-11-07 10:59 UTC | newest]

Thread overview: 6+ messages
2010-11-06 13:24 Some thoughts for markdown syntax extensions BP Jonsson
     [not found] ` <4CD556FA.6090807-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2010-11-06 13:44   ` BP Jonsson
2010-11-06 18:47   ` John MacFarlane
     [not found]     ` <20101106184722.GA21524-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2010-11-07 10:16       ` Tillmann Rendel
2010-11-07 10:59       ` Nathan Gass
2010-11-06 21:45   ` Ivan Lazar Miljenovic
