public inbox archive for pandoc-discuss@googlegroups.com
* Replace Str with HTML in Lua Filter
@ 2019-08-09 19:28 Ken Dow
       [not found] ` <abe5ae45-2ad8-419b-a282-5b5e1b4fcda1-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Ken Dow @ 2019-08-09 19:28 UTC (permalink / raw)
  To: pandoc-discuss


My DOCX source document, which is being converted to HTML, uses some Google 
Material fonts. What shows up in the AST are values like

Str "\8211"

I'd like to find and replace those to produce something like the following 
HTML:

<i class="material-icons">face</i>

Is that possible and if so, how? 

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/abe5ae45-2ad8-419b-a282-5b5e1b4fcda1%40googlegroups.com.


* Re: Replace Str with HTML in Lua Filter
       [not found] ` <abe5ae45-2ad8-419b-a282-5b5e1b4fcda1-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2019-08-10 16:02   ` Albert Krewinkel
       [not found]     ` <8736i9qa95.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Albert Krewinkel @ 2019-08-10 16:02 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Ken Dow writes:

> My DOCX source document, which is being converted to HTML, uses some Google
> Material fonts. What shows up in the AST are values like
>
> Str "\8211"
>
> I'd like to find and replace those to produce something like the following
> HTML:
>
> <i class="material-icons">face</i>
>
> Is that possible and if so, how?

The way to go here is via `RawInline` elements, e.g.:

    function Str (s)
      if s.text == '–' then
        return pandoc.RawInline(
          'html',
          '<i class="material-icons">face</i>'
        )
      end
    end

Note that matching on an exact string would fail if the character was
somewhere within a word (a typical example would be em-dashes). One would
have to use the [utf8.codes] function to manually find and replace those
characters in that case.

[utf8.codes](https://www.lua.org/manual/5.3/manual.html#pdf-utf8.codes)
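
A minimal sketch of that find-and-replace approach, assuming the character to replace is U+2013 (decimal 8211, the value from the question) and that the filter runs under Lua 5.3 or later. The names `ICON`, `ICON_HTML` and `split_on_icon` are hypothetical and only serve the illustration:

```lua
-- Illustrative values taken from the question; adjust to the real character.
local ICON = "\u{2013}"  -- U+2013, decimal 8211
local ICON_HTML = '<i class="material-icons">face</i>'

-- Split `text` into pieces: plain strings, and `true` wherever ICON occurred.
local function split_on_icon(text)
  local target = utf8.codepoint(ICON)
  local pieces, buffer = {}, {}
  for _, cp in utf8.codes(text) do
    if cp == target then
      if #buffer > 0 then
        pieces[#pieces + 1] = table.concat(buffer)
        buffer = {}
      end
      pieces[#pieces + 1] = true
    else
      buffer[#buffer + 1] = utf8.char(cp)
    end
  end
  if #buffer > 0 then
    pieces[#pieces + 1] = table.concat(buffer)
  end
  return pieces
end

function Str(s)
  local pieces = split_on_icon(s.text)
  -- Leave the element untouched when there is nothing to replace.
  if #pieces == 0 or (#pieces == 1 and pieces[1] == s.text) then
    return nil
  end
  local inlines = {}
  for _, piece in ipairs(pieces) do
    if piece == true then
      inlines[#inlines + 1] = pandoc.RawInline('html', ICON_HTML)
    else
      inlines[#inlines + 1] = pandoc.Str(piece)
    end
  end
  -- Returning a list replaces the original Str with all of its pieces.
  return inlines
end
```

With this sketch, a Str such as "a", U+2013, "b" in one word would become three inlines: a Str, the raw HTML, and another Str.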

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: Replace Str with HTML in Lua Filter
       [not found]     ` <8736i9qa95.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2019-08-29 15:06       ` Ken Dow
       [not found]         ` <87a12669-ed81-4ce4-aa8e-eb5d3d64bf3d-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Ken Dow @ 2019-08-29 15:06 UTC (permalink / raw)
  To: pandoc-discuss


Thanks for the help (Sorry for the long delay - I didn't get notified of 
your post).

I tried your suggestion and it works perfectly when searching for normal 
text (e.g., s.text == "Widget") but with s.text == "\8211", Pandoc throws 
the following error:

decimal escape too large near '"\5881'

Single quotes (e.g., s.text == '\8211') give the same error. I tried
"\\8211" in case the backslash needs to be escaped; no error but no
replacement occurs.

Finally, I tried the utf8.codes approach, referring to the Material Icons
codepoints doc for the value that should match, like so:

function Str (s)
  if utf8.codes(s.text) == 'e5c3' then
    return pandoc.RawInline(
      'html',
      '<i class="material-icons">apps</i>'
    )
  end
end 

No error but no replacement. 
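
One likely reason the comparison above never matches: `utf8.codes` returns an iterator meant for a `for` loop, so its result is never equal to the string 'e5c3'. A hedged sketch of a numeric comparison instead, assuming the Str in the AST really consists of the single code point U+E5C3 (worth checking with `pandoc -t native`); the helper name `is_single_codepoint` is hypothetical:

```lua
-- 0xE5C3 is the "apps" code point quoted in the message above.
local APPS = 0xE5C3

-- True when `text` is exactly one character whose code point is `cp`.
local function is_single_codepoint(text, cp)
  return utf8.len(text) == 1 and utf8.codepoint(text) == cp
end

function Str(s)
  if is_single_codepoint(s.text, APPS) then
    return pandoc.RawInline(
      'html',
      '<i class="material-icons">apps</i>'
    )
  end
end
```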


* Re: Replace Str with HTML in Lua Filter
       [not found]         ` <87a12669-ed81-4ce4-aa8e-eb5d3d64bf3d-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2019-08-29 17:50           ` John MacFarlane
  2019-08-29 17:51           ` John MacFarlane
  1 sibling, 0 replies; 6+ messages in thread
From: John MacFarlane @ 2019-08-29 17:50 UTC (permalink / raw)
  To: Ken Dow, pandoc-discuss


In Haskell you can use "\5881", but in Lua this won't work: decimal
escapes in Lua string literals only go up to \255.
Try "\u{16F9}" instead (Lua 5.3 or later).
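
A small sketch of the escape forms, assuming Lua 5.3 or later (which provides both `\u{XXX}` and the `utf8` library). The variable names are only illustrative:

```lua
-- Decimal escapes in Lua cover single bytes only (\0 to \255), so "\5881"
-- is rejected. \u{XXX} takes a hexadecimal code point instead.
local from_escape = "\u{16F9}"       -- 0x16F9 == 5881
local from_number = utf8.char(5881)  -- same character, built from the decimal value

assert(from_escape == from_number)
print(utf8.codepoint(from_escape))   --> 5881
print(#from_escape)                  -- length in UTF-8 bytes, not characters
```

`utf8.char` can be handy when the code point is only known as a decimal number, as in the native AST output.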




* Re: Replace Str with HTML in Lua Filter
       [not found]         ` <87a12669-ed81-4ce4-aa8e-eb5d3d64bf3d-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2019-08-29 17:50           ` John MacFarlane
@ 2019-08-29 17:51           ` John MacFarlane
       [not found]             ` <m2muframhm.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: John MacFarlane @ 2019-08-29 17:51 UTC (permalink / raw)
  To: Ken Dow, pandoc-discuss


Or better yet, just use the Unicode character itself (make sure your
Lua filter is UTF-8 encoded):

s.text == '–'


* Re: Replace Str with HTML in Lua Filter
       [not found]             ` <m2muframhm.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2019-08-29 20:24               ` Ken Dow
  0 siblings, 0 replies; 6+ messages in thread
From: Ken Dow @ 2019-08-29 20:24 UTC (permalink / raw)
  To: pandoc-discuss


Thanks John - copying & pasting the Unicode character from the HTML output
into the Lua filter did the trick. Should've thought of that!

