public inbox archive for pandoc-discuss@googlegroups.com
From: BPJ <bpj-J3H7GcXPSITLoDKTGw+V6w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Way to have markdown ==highlighting== show up as highlighting in .docx or .odt files?
Date: Thu, 23 Jun 2022 21:06:45 +0200
Message-ID: <CADAJKhA3F-VC--BMe2mpERZr=LmXZFNE61EwvHmfk0dwYp_ALw@mail.gmail.com>
In-Reply-To: <3316a007-a142-4d3d-a2f8-40befafb4249n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>



It would be possible, but it would be rather fragile and finicky, because
you would have to

1.  traverse lists of inline elements,
2.  locate string elements which contain "==",
3.  split that string into the bits before and after "==",
4.  insert the right raw markup for the output format in place of "==",
5.  collect elements up to the next string element which contains "==",
6.  redo #3 and #4 with that string,
7.  throw an error if #5 fails!
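
As a rough illustration only, the steps above could look something like the
following. This is a sketch in Python rather than Lua, and it works on a
plain list of strings standing in for pandoc's inline elements, so it is not
a usable filter. The names `mark_highlights`, `OPEN` and `CLOSE` are made up
for the example, and the `<mark>` tags are just placeholders for whatever raw
markup the output format needs.

```python
OPEN = "<mark>"    # placeholder for the format-specific opening markup
CLOSE = "</mark>"  # placeholder for the format-specific closing markup


def mark_highlights(inlines):
    """Replace paired "==" delimiters in a flat list of string tokens."""
    out = []
    inside = False  # True while between an opening and a closing "=="
    for tok in inlines:                       # step 1: traverse the inlines
        while "==" in tok:                    # step 2: token has a delimiter
            before, tok = tok.split("==", 1)  # step 3: split around it
            if before:
                out.append(before)
            out.append(CLOSE if inside else OPEN)  # step 4 (and step 6)
            inside = not inside
        if tok:
            out.append(tok)                   # step 5: collect other elements
    if inside:                                # step 7: no closing "==" found
        raise ValueError('unmatched "==" delimiter')
    return out


print(mark_highlights(["some", " ", "==very", " ", "important==", " ", "text"]))
```

Even this toy version shows where the fragility comes from: it treats every
"==" as a delimiter, and a real filter would also have to cope with nested
inline elements such as emphasis and links.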

You are probably better off replacing the `==...==` in your existing files
using the attached Perl script. It is a modification of a script which I
have used to convert `_..._` and the like to spans. It uses regexes, but is
smart enough to leave alone block and inline code and math, as well as "=="
in contexts where it probably isn't a delimiter. Make sure to check out
the -h and -m options for documentation.
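
To give an idea of the general technique, here is a much reduced sketch in
Python. It is not the attached script: it only protects backslash escapes
and backtick code, it uses `\w` where the script uses Unicode property
classes, and the names `PATTERN` and `eq_to_span` are invented for the
example.

```python
import re

# Alternatives are tried in order: chunks to protect come first and are
# copied through unchanged; only the last alternative is a highlight.
PATTERN = re.compile(
    r"""
      (?P<skip> \\. )                            # backslash escape
    | (?P<code> (?P<ticks>`+) .+? (?P=ticks) )   # backtick-delimited code
    | (?<!\w) == (?!\s) (?P<hl> .+? ) (?<!\s) == (?!\w)
    """,
    re.VERBOSE | re.DOTALL,
)


def eq_to_span(text, attributes=".hl"):
    """Rewrite ==text== as a Pandoc span, leaving protected chunks alone."""
    def repl(match):
        if match.group("hl") is None:
            return match.group(0)  # protected chunk, keep as is
        return "[" + match.group("hl") + "]{" + attributes + "}"
    return PATTERN.sub(repl, text)


print(eq_to_span("a ==b c== d"))       # a [b c]{.hl} d
print(eq_to_span("`a ==b== c`"))       # unchanged, it is inline code
print(eq_to_span("if a==b and c==d"))  # unchanged, "==" touches word chars
```

The lookarounds play the same role as the whitespace and word-character
checks in the script: an opening "==" must not follow a word character or
precede whitespace, and the reverse holds for the closing one.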



On Thu, 23 June 2022 at 13:15, Emiliano <gattulli.emiliano-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> BPJ, is it possible to create a Lua filter that does the same thing but
> converts Obsidian's '== ==' syntax into highlighted text? I have tons of
> notes written in Obsidian syntax and it would be an enormous task to modify
> all of them to use the 'new' syntax. By the way, your Lua filter works
> perfectly!
>
> On Wednesday, 22 June 2022 at 19:45:07 UTC+2, BPJ wrote:
>
>> According to the principle that it's better to find out what you can do
>> with the tools you have you can use a span with a class, like `[text]{.hl}`
>> and use a simple filter to convert that to Obsidian's syntax when
>> processing with Obsidian, by choosing `markdown` as output format, or
>> insert the necessary LaTeX markup when producing PDF (or arrange for the
>> necessary CSS to be loaded if producing PDF via HTML.)
>>
>> ``````lua
>> local eq_hl = pandoc.RawInline('markdown', '==')
>>
>> local highlight = {
>>   markdown = { start = eq_hl, stop = eq_hl },
>>   latex = {
>>     start = pandoc.RawInline('latex', '\\colorbox[named]{yellow}{'),
>>     stop = pandoc.RawInline('latex', '}'),
>>   },
>> }
>>
>> local hl = highlight[FORMAT]
>>
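>> function Span (s)
>>   -- Only rewrite spans with class `hl`, and only when we have
>>   -- markup for the current output format.
>>   if hl and s.classes:includes('hl') then
>>     local rv = s.content  -- local, so it does not leak as a global
>>     rv:insert(1, hl.start)
>>     rv:insert(hl.stop)
>>     return rv
>>   end
>>   return nil
>> end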
>> ``````
>>
>> I'm not sure that the default LaTeX template always loads the xcolor
>> package. You may need a modified template.
>>
>> I can imagine you lose some in-editor preview, but you get reasonable
>> output.
>>
>> HTH,
>>
>> /bpj
>>
>> On Wed, 22 June 2022 at 16:11, Emiliano <gattulli...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>
>>> Well, if you export in PDF through Obsidian the highlighted text is
>>> rendered correctly but not if you use Pandoc. I do not export in PDF
>>> through Obsidian because then I would be bound to the style of the active
>>> theme, namely, I would see the PDF file with a black background (I use the
>>> Dark Mode), font size, spacing, margins, etc. of Obsidian's active theme.
>>>
>>> On Tuesday, 21 June 2022 at 18:44:42 UTC+2,
>>> paulschi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>>>
>>>> Good question! Thanks for reminding me of this. But exporting to PDF in
>>>> Obsidian with highlights should work automatically, no?
>>>>
>>>> On Tuesday, June 21, 2022 at 3:21:03 p.m. UTC+2 Emiliano wrote:
>>>>
>>>>> Any news about this feature for Pandoc? I use the highlight
>>>>> syntax ('== ==') a lot in Obsidian and it would be great if I could render my
>>>>> highlighted text in PDF (also in DOCX and ODT).
>>>>>
>>>>> On Sunday, 2 January 2022 at 17:52:44 UTC+1, Alx Nbl wrote:
>>>>>
>>>>>> My use case is different from paulschi's: I am trying to
>>>>>> convert docx into markdown, generating the '== ==' syntax where there is
>>>>>> highlighted text in the docx file.
>>>>>>
>>>>>> On Sunday, January 2, 2022 at 3:09:42 PM UTC+1 Alx Nbl wrote:
>>>>>>
>>>>>>> Hi all. The '== ==' syntax is also used by the Joplin app. I would also
>>>>>>> be very interested in such a feature.
>>>>>>>
>>>>>>> On Thursday, December 9, 2021 at 6:29:51 PM UTC+1 John MacFarlane
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On CriticMarkup, see
>>>>>>>>
>>>>>>>> https://github.com/jgm/pandoc/issues/2873
>>>>>>>> https://github.com/jgm/pandoc/issues/5430
>>>>>>>>
>>>>>>>>
>>>>>>>> Joseph Reagle <josep...-T1oY19WcHSwdnm+yROfE0A@public.gmane.org> writes:
>>>>>>>>
>>>>>>>> > BTW: If CommonMark or pandoc were to support highlight, I would
>>>>>>>> > then wonder why not support all of CriticMarkup, which supports highlight
>>>>>>>> > as `{== ==}` or `{>> <<}`. (It's a shame that we have two different
>>>>>>>> > syntaxes emerging for highlight.)
>>>>>>>> >
>>>>>>>> > On 21-12-09 11:10, John MacFarlane wrote:
>>>>>>>> >>
>>>>>>>> >> If this is a syntax that is becoming common, we could consider
>>>>>>>> >> adding a markdown extension for it. You could open an issue on
>>>>>>>> >> our issue tracker.
>>>>>>>> >>
>>>>>>>> >> Joseph Reagle <josep...-T1oY19WcHSwdnm+yROfE0A@public.gmane.org> writes:
>>>>>>>> >>
>>>>>>>> >>> This is the first time I've encountered [this syntax][1] and it
>>>>>>>> >>> is not natively supported by pandoc. Or am I wrong and you are saying
>>>>>>>> >>> pandoc handles it when using the latex/PDF writer? (Or, are you saying
>>>>>>>> >>> Obsidian can export to PDF, but not Word?)
>>>>>>>> >>>
>>>>>>>> >>> I see there's been some discussion on the [CommonMark
>>>>>>>> >>> forum][2], but it doesn't look like you'd find an immediate solution.
>>>>>>>> >>>
>>>>>>>> >>> Using a filter or hacking something that converts `==foo==` to
>>>>>>>> >>> [foo]{.highlight} that is properly rendered in Word might be options.
>>>>>>>> >>>
>>>>>>>> >>> [1]: https://www.markdownguide.org/extended-syntax/#highlight
>>>>>>>> >>> [2]: https://talk.commonmark.org/t/highlighting-text-with-the-mark-element/840
>>>>>>>> >>>
>>>>>>>> >>> On 21-12-09 08:29, Paul wrote:
>>>>>>>> >>>> I use a lot of highlighting in my markdown editor Obsidian,
>>>>>>>> >>>> but I was wondering if there's a way to have that highlighting show up in
>>>>>>>> >>>> the Word or Libreoffice Writer files?
>>>>>>>> >>>>
>>>>>>>> >>>> Bold and italics work fine, as far as I can tell, and when
>>>>>>>> >>>> converting to a pdf the highlighting transfers great. I gather, however,
>>>>>>>> >>>> that the ==highlighting== is not standard in all markdown so is that the
>>>>>>>> >>>> issue?
>>>>>>>> >>>
>>>>>>>> >>> --
>>>>>>>> >>> You received this message because you are subscribed to the
>>>>>>>> Google Groups "pandoc-discuss" group.
>>>>>>>> >>> To unsubscribe from this group and stop receiving emails from
>>>>>>>> it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>>>>>>> >>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/pandoc-discuss/9995ee8a-295e-1836-5645-9bb5ff76445d%40reagle.org.
>>>>>>>>
>>>>>>>> >>
>>>>>>>> >

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhA3F-VC--BMe2mpERZr%3DLmXZFNE61EwvHmfk0dwYp_ALw%40mail.gmail.com.


[-- Attachment #2: highlight-eq2span.pl --]
[-- Type: text/x-perl, Size: 6371 bytes --]

#!/usr/bin/env perl

use 5.010001;
use utf8;
# use utf8::all;
use strict;
use warnings;
use warnings FATAL => 'utf8';
use autodie;

use open qw[ :utf8 :std ];

use Getopt::Long qw[GetOptions
  :config bundling no_auto_abbrev no_ignore_case];
use Pod::Usage qw[pod2usage];
use Text::Balanced qw[extract_multiple];

my %opt = (
  attributes => '.hl',
  check_word_chars => 1,
  check_whitespace => 1,
  backslash_escapes => 1,
  backticks_code => 1,
  tilde_code_blocks => 1,
  tex_math_dollars => 1,
  tex_math_double_backslash => 0,
  tex_math_single_backslash => 0,
);

my @opts = grep { /_/ } keys %opt;

sub all {
  $opt{$_} = 1 for @opts;
}

sub none {
  $opt{$_} = 0 for @opts;
}

sub neg_opt {
  my($name) = @_;
  $name =~ s/^no_//;
  $opt{$name} = 0;
}

GetOptions(
  \%opt,
  'attributes|a=s',
  'check_whitespace|check-whitespace|s',
  'no_check_whitespace|no-check-whitespace|S' => \&neg_opt,
  'check_word_chars|check-word-chars|w',
  'no_check_word_chars|no-check-word-chars|W' => \&neg_opt,
  'backslash_escapes|backslash-escapes|b',
  'no_backslash_escapes|no-backslash-escapes|B' => \&neg_opt,
  'backticks_code|backticks-code|c',
  'no_backticks_code|no-backticks-code|C' => \&neg_opt,
  'tilde_code_blocks|tilde-code-blocks|t',
  'no_tilde_code_blocks|no-tilde-code-blocks|T' => \&neg_opt,
  'tex_math_dollars|tex-math-dollars|d',
  'no_tex_math_dollars|no-tex-math-dollars|D' => \&neg_opt,
  'tex_math_double_backslash|tex-math-double-backslash|db',
  'no_tex_math_double_backslash|no-tex-math-double-backslash|DB' => \&neg_opt,
  'tex_math_single_backslash|tex-math-single-backslash|sb',
  'no_tex_math_single_backslash|no-tex-math-single-backslash|SB' => \&neg_opt,
  'none|n' => \&none,
  'all|N|A' => \&all,
  'help|h' => sub { pod2usage(1) },
  'man|m' => sub { pod2usage( -verbose => 2) },
);

my $span_start = '[';
my $span_stop  = "]{$opt{attributes}}";

my @extractors;

if ( $opt{tex_math_double_backslash} ) {
  push @extractors, (
    qr{ \\\\ \( .+? \\\\ \) }msx,
    qr{ \\\\ \[ .+? \\\\ \] }msx,
  );
}
if ( $opt{tex_math_single_backslash} ) {
  push @extractors, (
    qr{ \\ \( .+? \\ \) }msx,
    qr{ \\ \[ .+? \\ \] }msx,
  );
}
push @extractors, qr{ \\. }msx if $opt{backslash_escapes};
push @extractors, qr[ ( ( \~{3,} ) .+? \g{-1} ) ]msx if $opt{tilde_code_blocks};
push @extractors, qr[ ( ( \`+ ) .+? \g{-1} ) ]msx if $opt{backticks_code};
if ( $opt{tex_math_dollars} ) {
  push @extractors, (
    qr{ \$\$ (?: [^\n] | (?<! \n ) \n (?! \n ) )+? \$\$ }msx,
    qr{ \$ (?! \s ) .+? (?<! \s ) \$ (?! \d ) }msx,
  );
}

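{
  # The lookarounds below start out disabled: under /x, everything
  # after "#s" or "#w" on a line is just a regex comment.
  my $highlight = qr{
    #w (?<! [\pL\pN\p{Mn}] )
    \=\=
    #s (?! \s )
    ( .+? )
    #s (?<! \s )
    \=\=
    #w (?! [\pL\pN\p{Mn}] )
  }msx;
  # Deleting a marker from the stringified pattern turns its comment
  # back into live regex, which enables the corresponding check.
  if ( $opt{check_whitespace} ) {
    $highlight =~ s/#s//g;
  }
  if ( $opt{check_word_chars} ) {
    $highlight =~ s/#w//g;
  }
  # The hash-ref form makes extract_multiple return matches as blessed
  # references, so they can be told apart from plain text chunks.
  push @extractors, +{ highlight => qr/$highlight/msx };
}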

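# Slurp all input (files named on the command line, or stdin)
my $text = do { local $/; <>; };

# Process the text.
# Plain strings (ordinary text and protected chunks) pass through
# unchanged; references are highlight matches holding the captured
# text between the "==" delimiters.
my @chunks = extract_multiple $text, \@extractors;
for my $chunk ( @chunks ) {
  if ( ref $chunk ) {
    $chunk = $span_start . $$chunk . $span_stop;
  }
}

print join "", @chunks;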
    
__END__

=encoding UTF-8

=head1 NAME

highlight-eq2span.pl -- Replace Obsidian highlight runs with Pandoc spans

=head1 VERSION

This documentation describes version 0.001 of highlight-eq2span.pl

=head1 SYNOPSIS

    perl highlight-eq2span.pl [OPTIONS] <input.md >output.md

=head1 DESCRIPTION

highlight-eq2span.pl replaces C<==HIGHLIGHTED==> as understood
by Obsidian with Pandoc spans like C<[HIGHLIGHTED]{.hl}>.

This script is a regex-based text filter, with far simpler parsing
capabilities than Pandoc.
However, by default it tries to leave alone B<==> sequences which are
unlikely to be highlighting markup. There are some command line
options to control this.

=head1 OPTIONS

=over

=item -a, --attributes STR

Use STR as attributes for Pandoc spans.

Default value: C<.hl>

=item -s, --check-whitespace 

Assume that opening C<==> delimiters are not followed by whitespace,
and that closing C<==> delimiters are not preceded by whitespace.

Default value: true

=item -S --no-check-whitespace

Set the -s option just above to false.

=item -w, --check-word-chars 

Assume that opening C<==> delimiters are not preceded by word-chars,
and that closing C<==> delimiters are not followed by word-chars.

Default value: true

=item -W --no-check-word-chars

Set the -w option just above to false.

=item -b, --backslash-escapes

Skip characters preceded by a backslash.
This notably includes C<\=>.

Default value: true

Note that the B<--db> and B<--sb> options below affect this option!

=item -B --no-backslash-escapes

Set the -b option just above to false.

=item -c, --backticks-code 

Skip chunks of text which look like block or inline
backticks-delimited code.

Default value: true

=item -C --no-backticks-code

Set the -c option just above to false.

=item -t, --tilde-code-blocks

Skip chunks of text which look like tilde-delimited code blocks.

Default value: true

=item -T --no-tilde-code-blocks

Set the -t option just above to false.

=item -d, --tex-math-dollars 

Skip chunks of text which look like block or inline $ delimited math.

Default value: true


=item -D --no-tex-math-dollars

Set the -d option just above to false.

=item --db, --tex-math-double-backslash

Skip chunks of text which look like C<\\(...\\)> or C<\\[...\\]>
delimited math.

Default value: false

=item --DB --no-tex-math-double-backslash

Set the --db option just above to false.

=item --sb, --tex-math-single-backslash

Skip chunks of text which look like C<\(...\)> or C<\[...\]>
delimited math.

Default value: false

=item --SB --no-tex-math-single-backslash

Set the --sb option just above to false.

=item -n, --none

Disable all switches.

=item -A, -N, --all

Enable all switches.

=item -h --help

Print usage help and exit.

=item -m, --man

Print full documentation and exit.

=back

=head1 LICENSE

This software is copyright (c) 2022 by Benct Philip Jonsson.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

http://dev.perl.org/licenses/

=head1 AUTHOR

Benct Philip Jonsson E<lt>bpjonsson@gmail.comE<gt>

=cut

# Vim: set ft=pod et ts=4 sts=4 sw=4 tw=72 cc=72:




