public inbox archive for pandoc-discuss@googlegroups.com
* Syntax highlighting Pandoc-flavoured Markdown
@ 2015-11-09  7:11 Tom McLean
       [not found] ` <e1da6088-0657-4685-9288-f6cfe7b6c5f3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Tom McLean @ 2015-11-09  7:11 UTC (permalink / raw)
  To: pandoc-discuss


I'm playing around with an application that uses Pandoc-flavoured Markdown
in a graphical widget, and have a slightly vague question.

Syntax highlighting (of the Markdown source, not code blocks within the 
source) would be one of those things that's nice to have. The easiest way 
of doing this would just be a series of regexes (this seems to be the 
solution others have used). But this feels like a possible source of 
abundant error. I'm currently using peg-markdown-highlight
<http://hasseg.org/peg-markdown-highlight/> as a more accurate method,
which works very well for standard Markdown but fails on Pandoc's
extensions.
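
To make the regex approach concrete, here is a minimal sketch in Python. The
rule table and the find_spans name are hypothetical, the patterns are
deliberately naive, and none of this is taken from peg-markdown-highlight or
vim-pandoc-syntax. It only shows the shape of the output a highlighter needs:
(start, end, style) triples that include the markup characters.

```python
import re

# Hypothetical rule table: style name -> deliberately naive pattern.
# Real Pandoc Markdown needs far more context than these patterns have.
RULES = {
    "strong": re.compile(r"\*\*(?!\s)(.+?)(?<!\s)\*\*"),
    "emph": re.compile(r"(?<!\*)\*(?![\s*])(.+?)(?<![\s*])\*(?!\*)"),
    "code": re.compile(r"`([^`\n]+)`"),
}


def find_spans(text):
    """Return sorted (start, end, style) triples, delimiters included."""
    spans = []
    for style, pattern in RULES.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), style))
    return sorted(spans)


if __name__ == "__main__":
    for span in find_spans("some *emph* and **strong** and `code`"):
        print(span)
```

Because each rule runs independently, an input such as `*x*` inside
backticks is reported both as code and as emphasis. That is the kind of
error the paragraph above worries about.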

One possible (if inefficient) method is to use Pandoc as a syntax parser. 
It's easy enough to get an AST from Pandoc, but this strips out, quite 
naturally, the formatting characters (like '*') and redundant whitespace in 
favour of, well, abstraction.  The trouble is, if we want to
highlight syntax, we need to keep the special characters that indicate
the syntax. At a minimum, we need to know the string indices at which
a given format starts and ends, not merely the text that is formatted.
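
As an illustration of what gets lost, the sketch below walks a small
hand-written JSON document shaped like the output of `pandoc -t json` for the
input "*hi* there". The literal is an assumption about the JSON layout used
by recent Pandoc versions (older releases used a different layout), and it is
embedded so the example runs without Pandoc installed. The walk helper is
hypothetical.

```python
import json

# Hand-written stand-in for `pandoc -t json` output on "*hi* there".
# The layout is assumed from recent Pandoc versions, not produced here.
AST_JSON = """
{"pandoc-api-version": [1, 23, 1],
 "meta": {},
 "blocks": [{"t": "Para", "c": [
     {"t": "Emph", "c": [{"t": "Str", "c": "hi"}]},
     {"t": "Space"},
     {"t": "Str", "c": "there"}]}]}
"""


def walk(node, depth=0):
    """Yield (depth, type, text) for every tagged node in the tree."""
    if isinstance(node, dict) and "t" in node:
        content = node.get("c")
        text = content if isinstance(content, str) else None
        yield depth, node["t"], text
        yield from walk(content, depth + 1)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child, depth)


nodes = list(walk(json.loads(AST_JSON)["blocks"]))

if __name__ == "__main__":
    for depth, kind, text in nodes:
        print("  " * depth + kind, repr(text) if text is not None else "")
```

The Emph node carries the text "hi" but neither the asterisks nor any
offset into the source, so the tree alone cannot drive a highlighter.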

I don't know any Haskell, and while I'm happy to try and learn some to 
achieve this, I thought I might first ask whether this is an achievable 
goal. My quick look at the design of Pandoc suggests that such details
are quickly abstracted away to preserve its modular structure, so that a
simple solution like a new filter is impossible. Is it possible to parse
Markdown like this without going beyond what's likely going to be a very 
limited ability to write Haskell? Is there a more sensible solution? Am I 
missing an easy way?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e1da6088-0657-4685-9288-f6cfe7b6c5f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

* Re: Syntax highlighting Pandoc-flavoured Markdown
       [not found] ` <e1da6088-0657-4685-9288-f6cfe7b6c5f3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-11-09 14:17   ` 'Jason Seeley' via pandoc-discuss
       [not found]     ` <2382946c-6dee-433f-95ab-170feda031c4-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2015-11-09 22:08   ` John MACFARLANE
  1 sibling, 1 reply; 6+ messages in thread
From: 'Jason Seeley' via pandoc-discuss @ 2015-11-09 14:17 UTC (permalink / raw)
  To: pandoc-discuss


To be truly accurate, you will probably need a parser. Pandoc is
surprisingly complicated, with all of the extensions and embedded HTML and
LaTeX. Regexes CAN do the job (mostly), but it gets complicated fast.

If you are familiar with Vim syntax highlighting at all, there is 
vim-pandoc-syntax <https://github.com/vim-pandoc/vim-pandoc-syntax>. You 
could use that as a base, for the regexes and such.

* Re: Syntax highlighting Pandoc-flavoured Markdown
       [not found]     ` <2382946c-6dee-433f-95ab-170feda031c4-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-11-09 14:20       ` Beni Cherniavsky-Paskin
  0 siblings, 0 replies; 6+ messages in thread
From: Beni Cherniavsky-Paskin @ 2015-11-09 14:20 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Atom's https://atom.io/packages/language-pfm extension is also a relatively
rich implementation of pandoc syntax highlighting.


2015-11-09 16:17 GMT+02:00 'Jason Seeley' via pandoc-discuss <
pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>:

> To be truly accurate, you will probably need a parser. Pandoc is
> surprisingly complicated, with all of the extensions and embedded HTML and
> LaTeX. Regexes CAN do the job (mostly), but it gets complicated fast.
>
> If you are familiar with Vim syntax highlighting at all, there is
> vim-pandoc-syntax <https://github.com/vim-pandoc/vim-pandoc-syntax>. You
> could use that as a base, for the regexes and such.

* Re: Syntax highlighting Pandoc-flavoured Markdown
       [not found] ` <e1da6088-0657-4685-9288-f6cfe7b6c5f3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2015-11-09 14:17   ` 'Jason Seeley' via pandoc-discuss
@ 2015-11-09 22:08   ` John MACFARLANE
       [not found]     ` <20151109220820.GB14389-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
       [not found]     ` <7DCF1B6B6A90E3484486DAFE@192.168.1.50>
  1 sibling, 2 replies; 6+ messages in thread
From: John MACFARLANE @ 2015-11-09 22:08 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Sorry, there isn't an easy way.  It would have been good to
design pandoc to return some source mapping information as
well as an AST, but I didn't have this in mind at the time.

It's not a trivial problem for Markdown anyway, since
elements can occupy non-contiguous spans of text.  e.g.

    1.  > block quote
        > more block quote

The character on the second line, directly under the `1`,
isn't part of the block quote, even though it occurs after
the position where the block quote begins, and before the
position where the block quote ends.
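
A small Python sketch of the same point. It is hard-coded to this one
two-line input and is not a Markdown parser; blockquote_ranges is a
hypothetical helper that records, per line, the offsets from the `>` marker
to the end of the line.

```python
# The two-line example from above, without the four-space email indent.
SOURCE = "1.  > block quote\n    > more block quote\n"


def blockquote_ranges(source):
    """Return one (start, end) offset pair per line that has a '>' marker.

    Each pair runs from the '>' to the end of that line, newline excluded.
    """
    ranges = []
    offset = 0
    for line in source.splitlines(keepends=True):
        column = line.find(">")
        if column != -1:
            ranges.append((offset + column, offset + len(line.rstrip("\n"))))
        offset += len(line)
    return ranges


ranges = blockquote_ranges(SOURCE)

# Offset of the first character on the second line, directly under the "1".
under_one = SOURCE.index("\n") + 1

if __name__ == "__main__":
    print(ranges)
    print(under_one, any(start <= under_one < end for start, end in ranges))
```

The block quote comes out as two separate ranges, and the offset under the
`1` lies after the first range starts and before the last one ends without
being inside either. A single (start, end) pair per element therefore cannot
describe it.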

+++ Tom McLean [Nov 08 15 23:11 ]:
>   I'm playing around with an application that uses Pandoc-flavoured
>   markdown in a graphical widget, and have a slightly vague question.
>   Syntax highlighting (of the Markdown source, not code blocks within the
>   source) would be one of those things that's nice to have. The easiest
>   way of doing this would just be a series of regexes (this seems to be
>   the solution others have used). But this feels like a possible source
>   of abundant error. I'm currently using peg-markdown-highlight
>   <http://hasseg.org/peg-markdown-highlight/> as a more accurate method,
>   which works very well for standard Markdown but fails on Pandoc's
>   extensions.
>   One possible (if inefficient) method is to use Pandoc as a syntax
>   parser. It's easy enough to get an AST from Pandoc, but this strips
>   out, quite naturally, the formatting characters (like '*') and
>   redundant whitespace in favour of, well, abstraction.  The trouble is,
>   if we're wanting to highlight syntax, we need to include the special
>   characters that indicate the syntax. At a minimum, we need to know the
>   indices in a string of a given format's starts and ends, not merely
>   the text that is formatted so.
>   I don't know any Haskell, and while I'm happy to try and learn some to
>   achieve this, I thought I might first ask whether this is an achievable
>   goal. My quick look at the design of Pandoc seems to be that such
>   details are quickly abstracted away to preserve its modular structure,
>   so that a simple solution like a new filter is impossible. Is it
>   possible to parse Markdown like this without going beyond what's likely
>   going to be a very limited ability to write Haskell? Is there a more
>   sensible solution? Am I missing an easy way?


* Re: Syntax highlighting Pandoc-flavoured Markdown
       [not found]     ` <20151109220820.GB14389-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
@ 2015-11-09 22:34       ` Daniel Staal
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Staal @ 2015-11-09 22:34 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

--As of November 9, 2015 2:08:20 PM -0800, John MACFARLANE is alleged to 
have said:

> Sorry, there isn't an easy way.  It would have been good to
> design pandoc to return some source mapping information as
> well as an AST, but I didn't have this in mind at the time.
>
> It's not a trivial problem for Markdown anyway, since
> elements can occupy non-contiguous spans of text.  e.g.
>
>     1.  > block quote
>         > more block quote
>
> The character on the second line, directly under the `1`,
> isn't part of the block quote, even though it occurs after
> the position where the block quote begins, and before the
> position where the block quote ends.

--As for the rest, it is mine.

As a side thought: If this is a desired feature, could it be done as a 
custom output format?  Basically a 'Markdown wrapped in tags' of some sort, 
where the tags can be read as the mapping info, around the text with the 
original markup?
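
To sketch what a consumer of such output might look like: the Python below
assumes a purely hypothetical tag syntax (lowercase names in angle brackets
wrapped around the untouched markup). As far as this thread indicates, no
such output format exists in Pandoc, so the format and the untag helper are
both inventions for illustration.

```python
import re

# Hypothetical tag syntax: <name> ... </name> around the original markup.
TAG = re.compile(r"</?([a-z]+)>")


def untag(tagged):
    """Return (original_source, sorted [(start, end, name), ...])."""
    pieces = []      # chunks of original source between tags
    length = 0       # length of the original source collected so far
    open_tags = []   # stack of (name, start_offset) for unclosed tags
    spans = []
    position = 0
    for match in TAG.finditer(tagged):
        chunk = tagged[position:match.start()]
        pieces.append(chunk)
        length += len(chunk)
        if match.group(0).startswith("</"):
            name, start = open_tags.pop()
            if name != match.group(1):
                raise ValueError("mismatched closing tag: " + match.group(0))
            spans.append((start, length, name))
        else:
            open_tags.append((match.group(1), length))
        position = match.end()
    pieces.append(tagged[position:])
    return "".join(pieces), sorted(spans)


if __name__ == "__main__":
    source, spans = untag("<emph>*hi*</emph> and <strong>**there**</strong>")
    print(source)
    print(spans)
```

Two caveats apply even to the sketch: literal angle-bracket text in the
source would collide with the tags, and one tag pair per element still runs
into the non-contiguous case quoted above.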

Just a thought.

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------


* Re: Syntax highlighting Pandoc-flavoured Markdown
       [not found]       ` <7DCF1B6B6A90E3484486DAFE-Q0ErXNX1RuZz+/J76PBWHg@public.gmane.org>
@ 2015-11-10 14:26         ` BPJ
  0 siblings, 0 replies; 6+ messages in thread
From: BPJ @ 2015-11-10 14:26 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

As for the OP's immediate needs, the best bet is probably to use vim-pandoc
(the syntax part of which lives in its own repo) and Vim's :TOhtml command,
and then convert the HTML thus generated into the desired format by some
means. A basic description of the easy part can be found on the Vim wiki:

<http://vim.wikia.com/wiki/Pasting_code_with_syntax_coloring_in_emails>

The vim-pandoc syntax highlighting has its issues, particularly with
non-codeblock indented material, but it is far better than nothing!

/bpj

On Monday, 9 November 2015, Daniel Staal <DStaal-Jdbf3xiKgS8@public.gmane.org> wrote:

> --As of November 9, 2015 2:08:20 PM -0800, John MACFARLANE is alleged to
> have said:
>
>> Sorry, there isn't an easy way.  It would have been good to
>> design pandoc to return some source mapping information as
>> well as an AST, but I didn't have this in mind at the time.
>>
>> It's not a trivial problem for Markdown anyway, since
>> elements can occupy non-contiguous spans of text.  e.g.
>>
>>     1.  > block quote
>>         > more block quote
>>
>> The character on the second line, directly under the `1`,
>> isn't part of the block quote, even though it occurs after
>> the position where the block quote begins, and before the
>> position where the block quote ends.
>>
>
> --As for the rest, it is mine.
>
> As a side thought: If this is a desired feature, could it be done as a
> custom output format?  Basically a 'Markdown wrapped in tags' of some sort,
> where the tags can be read as the mapping info, around the text with the
> original markup?
>
> Just a thought.
>
> Daniel T. Staal
