From: Ken Dow
Newsgroups: gmane.text.pandoc
Subject: Re: Replace Str with HTML in Lua Filter
Date: Thu, 29 Aug 2019 08:06:27 -0700 (PDT)
Message-ID: <87a12669-ed81-4ce4-aa8e-eb5d3d64bf3d@googlegroups.com>
References: <8736i9qa95.fsf@zeitkraut.de>
To: pandoc-discuss

Thanks for the help (sorry for the long delay - I didn't get notified of your post).

I tried your suggestion and it works perfectly when searching for normal text (e.g., s.text == "Widget"), but with s.text == "\8211", Pandoc throws the following error:

decimal escape too large near '"\5881'

Single quotes (e.g., s.text == '\8211') give the same error. I tried "\\8211" in case the backslash needs to be escaped; no error, but no replacement occurs.

Finally, I tried the utf8.codes approach, referring to the Material Icons codepoints doc for the value that should match, like so:

    function Str (s)
      if utf8.codes(s.text) == 'e5c3' then
        return pandoc.RawInline(
          'html',
          '<i class="material-icons">apps</i>'
        )
      end
    end

No error but no replacement.
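
A possible explanation for the three results above, sketched under the assumption that pandoc is running its bundled Lua 5.3 interpreter: a decimal escape such as "\8211" reads at most three digits and must fit in one byte (0 to 255), hence the "decimal escape too large" error; "\\8211" is the literal five characters backslash, 8, 2, 1, 1, so it never equals the icon character; and utf8.codes returns an iterator rather than a string, so comparing it with 'e5c3' is always false. The code point can instead be written with a \u{...} escape or built with utf8.char. The filter below is only an illustration of that idea, reusing the e5c3 value and the HTML from the attempt above.

```lua
-- Three equivalent spellings of U+2013 (decimal 8211) in Lua 5.3.
local by_escape = "\u{2013}"        -- Unicode escape, hexadecimal digits
local by_number = utf8.char(8211)   -- built from the decimal code point
local by_bytes  = "\xE2\x80\x93"    -- the raw UTF-8 bytes
assert(by_escape == by_number and by_number == by_bytes)

-- The Material Icons value e5c3 is hexadecimal, so it is written 0xE5C3
-- (58819 in decimal). This matches only when the whole Str is that one
-- character.
function Str (s)
  if s.text == utf8.char(0xE5C3) then
    return pandoc.RawInline(
      'html',
      '<i class="material-icons">apps</i>'
    )
  end
end
```

If the character can also appear inside a longer string, an exact comparison like this will miss it and the text has to be walked with utf8.codes instead.
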
On Saturday, 10 August 2019 12:02:40 UTC-4, Albert Krewinkel wrote:
>
> Ken Dow writes:
>
> > My DOCX source document, which is being converted to HTML, uses some Google
> > Material fonts. What shows up in the AST are values like
> >
> > Str "\8211"
> >
> > I'd like to find and replace those to produce something like the following
> > HTML:
> >
> > <i class="material-icons">face</i>
> >
> > Is that possible and if so, how?
>
> The way to go here is via `RawInline` elements, e.g.:
>
>     function Str (s)
>       if s.text == '–' then
>         return pandoc.RawInline(
>           'html',
>           '<i class="material-icons">face</i>'
>         )
>       end
>     end
>
> Note matching on an exact string would fail if the character was
> somewhere within a word (a typical example would be em-dashes). One would have
> to use the [utf8.codes] module to manually find and replace those
> characters in that case.
>
> [utf8.codes](https://www.lua.org/manual/5.3/manual.html#pdf-utf8.codes)
>
> --
> Albert Krewinkel
> GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124
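
The case mentioned in the quoted note, where the character sits somewhere inside a word, could be handled roughly as follows. This is a sketch rather than a tested filter: it assumes Lua 5.3's utf8 library and relies on pandoc accepting a list of inlines as the return value of an inline filter function. `split_on_codepoint`, `target` and `icon_html` are names made up for this illustration, and the code point and HTML are the ones from this thread.

```lua
-- Code point to look for and the raw HTML to emit in its place
-- (illustrative values taken from the thread).
local target = 0xE5C3
local icon_html = '<i class="material-icons">apps</i>'

-- Hypothetical helper: walks `text` one code point at a time and cuts it
-- at every occurrence of `wanted`. Returns the list of pieces, with `true`
-- standing in for each occurrence, plus a flag telling whether any
-- occurrence was found.
local function split_on_codepoint(text, wanted)
  local pieces, buffer, found = {}, {}, false
  for _, cp in utf8.codes(text) do
    if cp == wanted then
      found = true
      if #buffer > 0 then
        pieces[#pieces + 1] = table.concat(buffer)
        buffer = {}
      end
      pieces[#pieces + 1] = true
    else
      buffer[#buffer + 1] = utf8.char(cp)
    end
  end
  if #buffer > 0 then
    pieces[#pieces + 1] = table.concat(buffer)
  end
  return pieces, found
end

function Str (s)
  local pieces, found = split_on_codepoint(s.text, target)
  if not found then
    return nil  -- no match: leave the element unchanged
  end
  local inlines = {}
  for _, piece in ipairs(pieces) do
    if piece == true then
      inlines[#inlines + 1] = pandoc.RawInline('html', icon_html)
    else
      inlines[#inlines + 1] = pandoc.Str(piece)
    end
  end
  return inlines
end
```

Splitting into separate Str and RawInline elements keeps the surrounding text as ordinary text, so later filters and the HTML writer still escape it normally.
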

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/87a12669-ed81-4ce4-aa8e-eb5d3d64bf3d%40googlegroups.com.