From: Ken Dow
Newsgroups: gmane.text.pandoc
Subject: Re: Replace Str with HTML in Lua Filter
Date: Thu, 29 Aug 2019 13:24:03 -0700 (PDT)
References: <8736i9qa95.fsf@zeitkraut.de> <87a12669-ed81-4ce4-aa8e-eb5d3d64bf3d@googlegroups.com>
To: pandoc-discuss

Thanks John - copying & pasting the Unicode character from the HTML output into the Lua filter did the trick. Should've thought of that!

On Thursday, 29 August 2019 13:51:49 UTC-4, John MacFarlane wrote:

Or better yet just use the Unicode character (make sure your
Lua filter is UTF-8 encoded):

s.text == '–'
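
A small sketch of why pasting the character works, assuming Lua 5.3 or later (which provides the `utf8` library and the `\u{...}` escape). In Lua, a decimal escape such as `\226` denotes a single byte and cannot exceed 255, which is consistent with the "decimal escape too large" error reported for "\8211" in the quoted message below. The variable names are illustrative only.

```lua
-- Three ways to write U+2013 (EN DASH, decimal 8211) in Lua 5.3+ source.
local literal = '–'             -- pasted directly; the file must be saved as UTF-8
local bytes   = '\226\128\147'  -- its UTF-8 bytes E2 80 93 as decimal escapes
local escaped = '\u{2013}'      -- code point escape

-- Compare the three spellings and show the byte length of the literal.
print(literal == bytes, bytes == escaped, #literal)

-- utf8.char builds the string from the decimal code point seen in the AST.
print(utf8.char(8211) == literal)
```

If the filter file is not saved as UTF-8, the pasted literal may differ from the other two spellings, so the escaped forms are the more robust choice in that case.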

Ken Dow <theke...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Thanks for the help (sorry for the long delay - I didn't get notified of
> your post).
>
> I tried your suggestion and it works perfectly when searching for normal
> text (e.g., s.text == "Widget") but with s.text == "\8211", Pandoc throws
> the following error:
>
> decimal escape too large near '"\5881'
>
> Single quotes (e.g., s.text == '\8211') give the same error. I tried
> "\\8211" in case the backslash needs to be escaped; no error but no
> replacement occurs.
>
> Finally, I tried the utf8.codes approach, referring to the Material Icon
> codepoints doc for the value that should match, like so:
>
> function Str (s)
>   if utf8.codes(s.text) == 'e5c3' then
>     return pandoc.RawInline(
>       'html',
>       '<i class="material-icons">apps</i>'
>     )
>   end
> end
>
> No error but no replacement.
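
The attempt above has no effect because `utf8.codes` returns an iterator function, which never compares equal to a string such as 'e5c3'. A minimal sketch of a code point comparison instead, assuming Lua 5.3+, that the icon is U+E5C3 (read from the 'e5c3' above), and that the Str holds only that one character. `APPS_ICON` and the fallback `pandoc` table are hypothetical helpers so the snippet also runs outside pandoc.

```lua
-- Stand-in for pandoc's constructor so the sketch also runs outside pandoc.
local pandoc = pandoc or {
  RawInline = function (format, text)
    return { t = 'RawInline', format = format, text = text }
  end
}

-- Assumed code point of the icon, taken from the 'e5c3' in the quoted attempt.
local APPS_ICON = 0xE5C3

function Str (s)
  -- utf8.len gives the number of characters (nil for invalid UTF-8);
  -- utf8.codepoint returns the numeric code point of the first character.
  if utf8.len(s.text) == 1 and utf8.codepoint(s.text) == APPS_ICON then
    return pandoc.RawInline('html', '<i class="material-icons">apps</i>')
  end
  -- Returning nothing leaves the element unchanged.
end
```

Comparing numbers rather than strings avoids any question of how the character is spelled in the filter source.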
>
> On Saturday, 10 August 2019 12:02:40 UTC-4, Albert Krewinkel wrote:
>>
>> Ken Dow writes:
>>
>> > My DOCX source document, which is being converted to HTML, uses some
>> > Google Material fonts. What shows up in the AST are values like
>> >
>> > Str "\8211"
>> >
>> > I'd like to find and replace those to produce something like the
>> > following HTML:
>> >
>> > <i class="material-icons">face</i>
>> >
>> > Is that possible and if so, how?
>>
>> The way to go here is via `RawInline` elements, e.g.:
>>
>>     function Str (s)
>>       if s.text == '–' then
>>         return pandoc.RawInline(
>>           'html',
>>           '<i class="material-icons">face</i>'
>>         )
>>       end
>>     end
>>
>> Note that matching on an exact string would fail if the character was
>> somewhere within a word (a typical example would be em-dashes). One would
>> have to use the [utf8.codes] function to manually find and replace those
>> characters in that case.
>>
>> [utf8.codes](https://www.lua.org/manual/5.3/manual.html#pdf-utf8.codes)
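
A rough sketch of that find-and-replace for a character embedded in a word, assuming Lua 5.3's `utf8` library and that a filter function may return a list of inlines in place of the single Str. `TARGET`, `ICON_HTML`, and the fallback `pandoc` table are illustrative names, not something from the thread.

```lua
-- Stand-ins for pandoc's constructors so the sketch also runs outside pandoc.
local pandoc = pandoc or {
  Str = function (text) return { t = 'Str', text = text } end,
  RawInline = function (format, text)
    return { t = 'RawInline', format = format, text = text }
  end
}

local TARGET = 0x2013  -- EN DASH, the character matched in the example above
local ICON_HTML = '<i class="material-icons">face</i>'

function Str (s)
  local result, buffer, found = {}, {}, false
  -- utf8.codes iterates over (byte position, code point) pairs.
  for _, code in utf8.codes(s.text) do
    if code == TARGET then
      found = true
      -- Flush the text collected so far, then emit the raw HTML.
      if #buffer > 0 then
        result[#result + 1] = pandoc.Str(table.concat(buffer))
        buffer = {}
      end
      result[#result + 1] = pandoc.RawInline('html', ICON_HTML)
    else
      buffer[#buffer + 1] = utf8.char(code)
    end
  end
  if not found then return nil end  -- no match: leave the element unchanged
  if #buffer > 0 then
    result[#result + 1] = pandoc.Str(table.concat(buffer))
  end
  return result
end
```

With this version a string such as "a–b" would become three inlines: a Str, the raw HTML, and another Str.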
>>
>> --
>> Albert Krewinkel
>> GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124
>>
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/87a12669-ed81-4ce4-aa8e-eb5d3d64bf3d%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/f48093d8-f00b-4287-9b31-abd24912d17d%40googlegroups.com.