From: "r.d.go...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
Newsgroups: gmane.text.pandoc
Subject: Re: Normalizing spaces in italics
Date: Sat, 2 Jul 2022 14:13:17 -0700 (PDT)
To: pandoc-discuss
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

It works perfectly! Thanks, it saved me a lot of manual fixing of files.

On Saturday, July 2, 2022 at 4:50:05 AM UTC-4 BP wrote:

> I use this Lua filter to clean up when I convert from DOCX.
>
> ``````lua
> local function handler (elem)
>   -- Get the length of the content
>   local len = #elem.content
>   -- Check that the content isn't empty
>   if 0 < len then
>     -- Is the last child a space?
>     if 'Space' == elem.content[len].tag then
>       -- Remove the space (last child)
>       elem.content:remove()
>       -- Return a space *after* the element
>       return { elem, pandoc.Space() }
>     end
>   end
>   return nil
> end
>
> return {
>   {
>     Emph      = handler,
>     Strong    = handler,
>     Strikeout = handler,
>     SmallCaps = handler,
>     Underline = handler,
>     Span      = handler,
>     Link      = handler,
>   }
> }
> ``````
>
> On Fri, 1 July 2022 at 18:37, r.d.go...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:
>
>> I am a bit sloppy typing italics in my word processor, and generally only
>> turn off the italics after I hit the space at the end of the word, so I end
>> up with Markdown output that looks like this (when I convert from RTF to
>> Markdown):
>>
>> Strictly speaking the qualities that are imposed by the *logos *of a
>> certain thing are the *activities *of the *logos*
>>
>> This looks ugly when I open it up in Emacs etc. I can fix these with a
>> regex replace in Emacs, but I thought pandoc now normalized by default,
>> which is supposed to fix these kinds of stylistic errors. I tried
>> passing the Markdown through pandoc again, to generate Markdown, but it
>> made no difference.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "pandoc-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/pandoc-discuss/bd84993b-b1cd-4128-aab2-ce1eff2c9768n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/eb95ef1e-8b32-4454-95ca-94794db16961n%40googlegroups.com.
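As a hypothetical extension (not something posted in the thread), the same idea can be applied to a *leading* space, i.e. an emphasized run whose first child is a Space because the formatting was switched on one keystroke early. The sketch below assumes only the pieces of pandoc's Lua filter API that BP's filter already uses (`elem.content`, `pandoc.Space()`), plus pandoc's documented behaviour of collecting top-level functions named after element types when a script returns no filter table. The name `move_spaces_out` is illustrative.

```lua
-- Hypothetical variant of BP's filter (not from the thread): it moves a
-- leading and/or trailing Space child out of an inline element.

-- Move leading/trailing Space children of an inline element outside it.
local function move_spaces_out(elem)
  local content = elem.content
  local space_before, space_after = false, false

  -- Drop a leading space and remember to put one before the element.
  if #content > 0 and content[1].tag == 'Space' then
    table.remove(content, 1)
    space_before = true
  end

  -- Drop a trailing space and remember to put one after the element.
  if #content > 0 and content[#content].tag == 'Space' then
    table.remove(content)
    space_after = true
  end

  if not (space_before or space_after) then
    return nil -- nothing to do; keep the element unchanged
  end

  -- Write the trimmed list back explicitly (defensive; see note below).
  elem.content = content

  -- Replace the element with [Space?, elem, Space?].
  local result = { elem }
  if space_before then table.insert(result, 1, pandoc.Space()) end
  if space_after then table.insert(result, pandoc.Space()) end
  return result
end

-- The script returns no filter table, so pandoc picks up these top-level
-- functions, whose names match inline element types.
Emph      = move_spaces_out
Strong    = move_spaces_out
Strikeout = move_spaces_out
SmallCaps = move_spaces_out
Underline = move_spaces_out
Span      = move_spaces_out
Link      = move_spaces_out
```

To try it, save it under any name, e.g. `move-spaces.lua` (hypothetical), and run something like `pandoc input.rtf -f rtf -t markdown --lua-filter=move-spaces.lua -o output.md`. Assigning `elem.content` back is only a precaution; BP's in-place `elem.content:remove()` evidently works as well, as the reply above reports.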