public inbox archive for pandoc-discuss@googlegroups.com
From: BP Jonsson <bpjonsson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Some thoughts for markdown syntax extensions
Date: Sat, 06 Nov 2010 14:24:10 +0100
Message-ID: <4CD556FA.6090807@gmail.com>

# Some thoughts for markdown syntax extensions

## Verbatim text

There is a need for some syntax to render text 'verbatim',
i.e. passing it through exactly as-is without any conversion
of special characters to entities/escapes.  This would be
useful when writing for a templating system or similar which
has its own markup (like `<? ?> <% %> <& &>` and whatnot).
The suggestion is that

* any inline text beginning with at least two doublequotes
   and extending to a matching number of doublequotes

         "" ... "" or """ ... """ or """" ... """"
         or...

* or a block beginning and ending with at least three
   doublequotes on their own line (but optionally followed
   by whitespace -- similar to the `~~~` for delimited code
   blocks),

be just passed through, except that these multi-doublequote
delimiters be removed, and that if the closing
delimiter is followed by a newline that newline is
preserved.

This way *any* kind of extended target markup can be used
without hardcoding specific styles.
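
As a rough illustration of the inline rule only, here is a minimal
Python sketch.  The name `split_verbatim` and the regular expression
are my own assumptions, not part of any existing markdown
implementation, and the block form (delimiters on their own line) is
not handled.

```python
import re

# Hypothetical pattern: a run of two or more doublequotes, the shortest
# possible content, then a run of exactly the same length.  The
# lookarounds keep a delimiter from matching inside a longer run.
VERBATIM = re.compile(r'(?<!")("{2,})(?!")(.+?)(?<!")\1(?!")')

def split_verbatim(text):
    """Split text into (is_verbatim, chunk) pairs, dropping the delimiters."""
    segments = []
    pos = 0
    for m in VERBATIM.finditer(text):
        if m.start() > pos:
            segments.append((False, text[pos:m.start()]))
        # The enclosed text is kept exactly as written.
        segments.append((True, m.group(2)))
        pos = m.end()
    if pos < len(text):
        segments.append((False, text[pos:]))
    return segments
```

A writer would then escape only the chunks flagged `False` and emit
the `True` chunks untouched.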

The reason for a delimiter consisting of a variable but
matching number of characters is of course that code often
contains an empty pair of doublequotes, and some languages
have a block comment style with `""" ... """`.  Compare the
`C<>` and similar markup in POD which can have any number of
delimiting angles greater than the greatest number of
consecutive `>`s which occur in the enclosed code.
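
Going the other way, choosing a delimiter for a given piece of code
could follow the POD rule directly.  This is only a sketch:
`wrap_verbatim` is a hypothetical helper, and it assumes the code
neither starts nor ends with a doublequote (otherwise the delimiter
and the content would run together).

```python
import re

def wrap_verbatim(code):
    """Wrap code in a doublequote delimiter longer than any run inside it."""
    runs = re.findall(r'"+', code)
    longest = max((len(run) for run in runs), default=0)
    # At least two doublequotes, and always one more than the longest run.
    fence = '"' * max(2, longest + 1)
    return fence + code + fence
```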

On the other hand double doublequotes don't normally occur
in *text*, so that one wouldn't need to use `\"\"` very
often, if ever.

## Dashes

TeX-style conversion of `--` to U+2013 EN DASH and `---` to
U+2014 EM DASH should be supported in `--smart` mode.
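
A minimal Python sketch of that conversion, assuming plain text input
(the function name `smart_dashes` is made up, and code spans, which
would have to be left alone, are not considered here):

```python
def smart_dashes(text):
    """Replace TeX-style dashes; the longer sequence must be handled first."""
    # Converting '--' first would turn '---' into an en dash plus a hyphen.
    return text.replace('---', '\u2014').replace('--', '\u2013')
```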

## Abbreviations, acronyms and definitions

There should be syntaxes for abbreviations, acronyms and definitions.

### Abbreviations and acronyms

The abbreviation and acronym syntaxes should be reference-style
and cause any instance of their argument in the text to be
surrounded by appropriate markup:

	*[etc.]:	Et cetera - and so on
	**[HTML]:	Hypertext Markup Language

These two should both use the HTML `<abbr>` tag, since `<acronym>`
is deprecated, but the latter should have a class="acronym" so
that one can apply different CSS if desired:

	<abbr title="Et cetera - and so on">etc.</abbr>

	<abbr class="acronym" title="Hypertext Markup Language"
	>HTML</abbr>
	
	abbr { text-decoration: none; border-bottom: 1pt dotted #000; }

	abbr.acronym {
		text-decoration: none;
		border-bottom: 1pt dotted #000;
		text-transform: lowercase;
		font-variant: small-caps;
	}

Furthermore the definition should optionally contain a URL to
a definition of the abbreviation/acronym:

	**[HTML]:	Hypertext Markup Language
		"http://www.acronyms.net/h.xhtml#html-acronym"

To be rendered in HTML as

	<abbr class="acronym" title="Hypertext Markup Language"
	><a href="http://www.acronyms.net/h.xhtml#html-acronym"
	title="Hypertext Markup Language">HTML</a></abbr>
	
N.B. the title attribute *needs* to be duplicated, and the
anchor needs to be present because some browsers -- notably
screen readers :-( and not just the usual culprit -- are
dumb.
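
To make the proposal concrete, here is a hedged Python sketch of how
such reference lines might be parsed and applied.  All names
(`parse_abbr_def`, `render_abbr`, `mark_up`) are hypothetical, the
definition is assumed to sit on a single line, and the text being
marked up is assumed to be plain text rather than HTML.

```python
import re
from html import escape

# Assumed line shape:  *[term]: title "optional-url"   (** marks an acronym)
ABBR_DEF = re.compile(r'^(\*{1,2})\[([^\]]+)\]:\s*(.*?)(?:\s+"([^"]+)")?\s*$')

def parse_abbr_def(line):
    """Return a dict describing the definition, or None if the line is not one."""
    m = ABBR_DEF.match(line)
    if not m:
        return None
    stars, term, title, url = m.groups()
    return {'term': term, 'title': title,
            'acronym': len(stars) == 2, 'url': url}

def render_abbr(d):
    """Render one instance; the title is duplicated on the anchor."""
    title = escape(d['title'])
    cls = ' class="acronym"' if d['acronym'] else ''
    inner = escape(d['term'])
    if d['url']:
        inner = '<a href="%s" title="%s">%s</a>' % (escape(d['url']), title, inner)
    return '<abbr%s title="%s">%s</abbr>' % (cls, title, inner)

def mark_up(text, d):
    """Wrap every stand-alone instance of the term in the text."""
    pattern = r'(?<!\w)' + re.escape(d['term']) + r'(?!\w)'
    return re.sub(pattern, lambda _m: render_abbr(d), text)
```

The lookarounds in `mark_up` are there so that a definition for HTML
does not also fire inside XHTML.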

### Definitions

Definitions should be either inline or reference-style:

     Most ?[Romance languages](The modern languages descended from
     Latin) have palatalized consonants.

	<dfn title="The modern languages descended from Latin"
	>Romance languages</dfn>

     Most ?[Romance languages] have palatalized consonants.

	?[Romance languages]: The modern languages descended from Latin
		"http://en.wikipedia.org/wiki/Romance_languages"

	<dfn title="The modern languages descended from Latin"
	><a href="http://en.wikipedia.org/wiki/Romance_languages"
	title="The modern languages descended from Latin"
	>Romance languages</a></dfn>

Of course only the reference style should allow for a URL
(it's got to be better at something, right?)
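
For the inline form, a minimal Python sketch might look like this.
The function name `render_inline_dfn` is an assumption, the
definition text is assumed not to contain a closing parenthesis, and
the reference form is left out.

```python
import re
from html import escape

# Assumed inline shape:  ?[term](definition text)
INLINE_DFN = re.compile(r'\?\[([^\]]+)\]\(([^)]+)\)')

def render_inline_dfn(text):
    """Turn every inline definition into a <dfn> element with a title."""
    def repl(m):
        term = m.group(1)
        # A definition wrapped over several lines is joined with single spaces.
        definition = ' '.join(m.group(2).split())
        return '<dfn title="%s">%s</dfn>' % (escape(definition), escape(term))
    return INLINE_DFN.sub(repl, text)
```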

## Small-caps

The thorniest problem with finding a syntax for small-caps
is that

* the markdown should preferably *use* capitals so that it
   doesn't just look like ordinary lowercase/mixed case text
   in funny delimiters -- that HTML/CSS/LaTeX smallcaps err
   on that account is no excuse for markdown --,
* and at the same time it should be possible to include
   big-caps: "Caesar" in small-caps should be (faked with
   Unicode phonetic letters which I hope everyone can see!)
   "Cᴀᴇꜱᴀʀ", not "ᴄᴀᴇꜱᴀʀ".

A possible solution is that the markup for small-caps be
capital letters delimited by double pipe characters, and that
any letter inside which is preceded by a single unescaped
pipe character be rendered as a (big) capital:

     If naturally descended |||IULIU |CAESARE|| would
     perhaps have become _Juil Cierre_ rather than
     _Jules César_ in French.

     <p>If naturally descended <span class="smallcaps">Iuliu
     Caesare</span> would perhaps have become <em>Juil
     Cierre</em> rather than <em>Jules César</em> in French.</p>
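
The pipe rule could be sketched in Python roughly as follows.  The
name `render_smallcaps` is hypothetical, and backslash-escaped pipes
are ignored for brevity, so this is an illustration of the idea
rather than a complete parser.

```python
import re
from html import escape

# Assumed span shape: '||' ... '||', where the closing pair is not
# followed by a third pipe.
SMALLCAPS = re.compile(r'\|\|(.+?)\|\|(?!\|)')

def render_smallcaps(text):
    """Render ||...|| spans; a letter after a single pipe stays a big capital."""
    def fix_case(c):
        # group(1) is the optional single pipe, group(2) the letter itself.
        return c.group(2).upper() if c.group(1) else c.group(2).lower()

    def repl(m):
        inner = re.sub(r'(\|?)(\w)', fix_case, m.group(1))
        return '<span class="smallcaps">%s</span>' % escape(inner)

    return SMALLCAPS.sub(repl, text)
```

The stylesheet is then expected to apply `font-variant: small-caps`
to the `smallcaps` class, which shows the lowercase letters as small
capitals and leaves the marked letters full size.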

Truth to tell I did prefer `^^^IULIU ^CAESARE^^`, because
carets are small things pointing upwards, but one may want
to include superscripts in a small-caps span, and seen side
by side `|||ROMANICE||` looks less messy than `^^^ROMANICE^^`,
or conversely three carets in a row look worse than three
pipes in a row, probably because of the white gap below them.

/bpj
