From: "christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
Newsgroups: gmane.text.pandoc
Subject: Re: Normalize adjacent Emph separated by Space?
Date: Thu, 16 Dec 2021 20:32:11 -0800 (PST)
Message-ID: <73a470ac-2da5-445c-ac89-f423310206ccn@googlegroups.com>
To: pandoc-discuss

Thanks. I implemented a similar idea in Python in https://github.com/ickc/pandoc-amsthm/commit/8f806f5c33c220a3c1007343dbd193a285a378cc

Reproduced below.

Expressing that idea in Python (and panflute) seems quite verbose. Maybe there’s a smarter way; the version below is quite procedural:

from __future__ import annotations

import panflute as pf
from panflute.elements import Element, Doc


def merge_emph(elem: Element, doc: Doc) -> list[Element] | None:
    """Merge neighboring Emph with optionally Space between them."""
    if isinstance(elem, pf.Block):
        content = elem.content
        n = len(content)

        mutated = False
        # walk in reverse direction to avoid mutating current location i
        # also start with the 2nd last entry because we're matching 2 or more elements
        for i in range(n - 2, -1, -1):
            elem_cur = content[i]
            # remember that we are mutating content and therefore len(content) changes too
            elem_next = None if i + 1 >= len(content) else content[i + 1]
            elem_next_next = None if i + 2 >= len(content) else content[i + 2]
            if isinstance(elem_cur, pf.Emph):
                if isinstance(elem_next, pf.Emph):
                    merged = list(elem_cur.content) + list(elem_next.content)
                    content = list(content[:i]) + [pf.Emph(*merged)] + list(content[i + 2 :])
                    mutated = True
                elif isinstance(elem_next, pf.Space):
                    if isinstance(elem_next_next, pf.Emph):
                        # pf.Space() must be an instance; the bare class pf.Space is not an element
                        merged = list(elem_cur.content) + [pf.Space()] + list(elem_next_next.content)
                        content = list(content[:i]) + [pf.Emph(*merged)] + list(content[i + 3 :])
                        mutated = True
        if mutated:
            elem.content = content
    return None
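
The filter above covers the Emph / Space / Emph case only. The other case raised in the original question, an Emph whose last Str ends in a literal space (Str "text "), is not handled by it. Below is a minimal sketch of one way to pull that trailing space out as a separate Space element. It deliberately works on plain dicts in the shape of pandoc's JSON AST rather than on panflute objects, so that it runs with the standard library alone; the name hoist_trailing_space is made up for this illustration and is not part of pandoc or panflute.

```python
from typing import Any, Dict, List

# A pandoc JSON inline, e.g. {"t": "Str", "c": "abc"} or {"t": "Space"}.
Inline = Dict[str, Any]


def hoist_trailing_space(inlines: List[Inline]) -> List[Inline]:
    """Move trailing spaces of an Emph's last Str out into a following Space.

    Hypothetical helper: Emph [Str "text "] becomes Emph [Str "text"], Space.
    Several trailing spaces collapse into a single Space element.
    """
    out: List[Inline] = []
    for node in inlines:
        children = node.get("c") if node.get("t") == "Emph" else None
        last = children[-1] if children else None
        if last is not None and last.get("t") == "Str" and last["c"].endswith(" "):
            stripped = last["c"].rstrip(" ")
            # Drop the Str entirely if it consisted only of spaces.
            kept = children[:-1] + ([{"t": "Str", "c": stripped}] if stripped else [])
            if kept:
                out.append({"t": "Emph", "c": kept})
            out.append({"t": "Space"})
        else:
            out.append(node)
    return out


example = [
    {"t": "Emph", "c": [{"t": "Str", "c": "text "}]},
    {"t": "Emph", "c": [{"t": "Str", "c": "abc"}]},
]
print(hoist_trailing_space(example))
```

Running this pass before the merge step would turn the second example from the question into the Emph, Space, Emph shape that the merge then handles. Leading spaces and other whitespace characters are left alone in this sketch.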

On Thursday, December 16, 2021 at 6:49:52 PM UTC-8 John MacFarlane wrote:

Yes, normalization is handled automatically -- but what we
mean by normalization doesn't include your case, where the
Space is *outside* the Emph and would be brought inside it.

You could write a filter that does this sort of normalization.
It would look in [Inline] lists for the pattern

Emph xs , Space , Emph ys

and convert this to

Emph (xs ++ Space:ys)
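
The rewrite described above can be sketched in a few lines as a single forward pass. This is only an illustration under stated assumptions: it operates on plain dicts in the shape of pandoc's JSON AST ({"t": "Emph", "c": [...]}, {"t": "Space"}, {"t": "Str", "c": "..."}) instead of Haskell values, the helper names is_type and merge_emph are invented here, and it also merges directly adjacent Emph elements, which goes slightly beyond the pattern as written.

```python
from typing import Any, Dict, List, Optional

# A pandoc JSON inline, e.g. {"t": "Str", "c": "abc"} or {"t": "Space"}.
Inline = Dict[str, Any]


def is_type(node: Optional[Inline], tag: str) -> bool:
    """True if node is an inline with the given constructor tag."""
    return node is not None and node.get("t") == tag


def merge_emph(inlines: List[Inline]) -> List[Inline]:
    """Collapse Emph xs, Space, Emph ys (and Emph xs, Emph ys) into one Emph."""
    out: List[Inline] = []
    for node in inlines:
        if is_type(node, "Emph"):
            if out and is_type(out[-1], "Emph"):
                # Emph xs , Emph ys  ->  Emph (xs ++ ys)
                out[-1] = {"t": "Emph", "c": out[-1]["c"] + node["c"]}
                continue
            if len(out) >= 2 and is_type(out[-1], "Space") and is_type(out[-2], "Emph"):
                # Emph xs , Space , Emph ys  ->  Emph (xs ++ Space:ys)
                space = out.pop()
                out[-1] = {"t": "Emph", "c": out[-1]["c"] + [space] + node["c"]}
                continue
        out.append(node)
    return out


example = [
    {"t": "Emph", "c": [{"t": "Str", "c": "text"}]},
    {"t": "Space"},
    {"t": "Emph", "c": [{"t": "Str", "c": "abc"}]},
]
print(merge_emph(example))
```

Because each merged Emph stays at the end of the output list, a longer run such as Emph, Space, Emph, Space, Emph collapses into a single Emph in the same pass.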


"christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org" writes:

> Hi,
>
> I recently had a situation where I want adjacent Emph separated by
> Space to be “normalized”, i.e. combined into a single Emph. (And variants
> like this.)
>
> --normalize is removed in
> https://github.com/jgm/pandoc/commit/8165014df679338d5bf228d84efc742c5ac39d2
> and I’m not sure if it is related.
>
> Example:
>
> $ echo "*text* *abc*" | pandoc -f markdown -t native
> [ Para [ Emph [ Str "text" ] , Space , Emph [ Str "abc" ] ]
> ]
>
> Is there some way to make it
>
> [ Para [ Emph [ Str "text" , Space , Str "abc" ] ]
> ]
>
> Instead?
>
> Another example is (perhaps generated in a filter)
>
> [ Para [ Emph [ Str "text " ] , Emph [ Str "abc" ] ]]
>
> Is there some way to normalize it to
>
> [ Para [ Emph [ Str "text" ] , Space, Emph [ Str "abc" ] ]]
>
> When I say “some way”, I would prefer a pandoc flag (for example, a
> round trip from native to markdown and back to native is not reliable for
> the space problem above and won’t merge adjacent Emph), but even if we’re
> talking about doing it in a filter, how can things like this be done reliably?
>
> Thanks.
>
> P.S.
>
> Even for
>
> [ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ]]
>
> won’t be normalized:
>
> $ echo '[ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ]]' | pandoc -f native -t native
> [ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ] ]
>
> But the text in the commit message seems to suggest it should:
>
> normalization is handled automatically by the Builder monoid instance.
>
> (But the commit is old, so maybe there’s more to it in the later commits.)

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/73a470ac-2da5-445c-ac89-f423310206ccn%40googlegroups.com.