Normalize adjacent Emph separated by Space?

public inbox archive for pandoc-discuss@googlegroups.com

* Normalize adjacent Emph separated by Space?
@ 2021-12-17  1:25 christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
       [not found] ` <c887615c-9a1e-40a5-b6b0-a7bf22c87350n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2021-12-17  1:25 UTC (permalink / raw)
  To: pandoc-discuss





Hi,

I have a situation where I want adjacent Emph elements separated by a 
Space to be “normalized”, i.e. combined into a single Emph (and variants 
of this).

--normalize was removed in 
https://github.com/jgm/pandoc/commit/8165014df679338d5bf228d84efc742c5ac39d2 
and I’m not sure whether that is related.

Example:

$ echo "*text* *abc*" | pandoc -f markdown -t native
[ Para [ Emph [ Str "text" ] , Space , Emph [ Str "abc" ] ]
]

Is there some way to make it

[ Para [ Emph [ Str "text" , Space , Str "abc" ] ] ]

instead?

Another example is (perhaps generated in a filter)

[ Para [ Emph [ Str "text " ] , Emph [ Str "abc" ] ]]

Is there some way to normalize it to

[ Para [ Emph [ Str "text" ] , Space, Emph [ Str "abc" ] ]]

By “some way” I mean, preferably, a pandoc flag. (A round trip from native 
to markdown and back to native is not reliable for the space problem above 
and won’t merge adjacent Emph.) But even if this has to be done in a 
filter, how can it be done reliably?

Thanks.

P.S.

Even

[ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ]]

is not normalized:

$ echo '[ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ]]' | pandoc -f native -t native
[ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ] ]

But the commit message seems to suggest it should be:

normalization is handled automatically by the Builder monoid instance.

(But the commit is old, so maybe there’s more to it in later commits.)


-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/c887615c-9a1e-40a5-b6b0-a7bf22c87350n%40googlegroups.com.


* Re: Normalize adjacent Emph separated by Space?
       [not found] ` <c887615c-9a1e-40a5-b6b0-a7bf22c87350n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-12-17  2:49   ` John MacFarlane
       [not found]     ` <m2o85g6j3y.fsf-d8241O7hbXoP5tpWdHSM3tPlBySK3R6THiGdP5j34PU@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: John MacFarlane @ 2021-12-17  2:49 UTC (permalink / raw)
  To: christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, pandoc-discuss


Yes, normalization is handled automatically -- but what we
mean by normalization doesn't include your case, where the
Space is *outside* the Emph and would be brought inside it.

You could write a filter that does this sort of normalization.
It would look in [Inline] lists for the pattern

  Emph xs , Space , Emph ys

and convert this to

  Emph (xs ++ Space:ys)
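
The rewrite described above can be sketched in Python on pandoc's JSON
representation of inlines, where each node is a dict such as
{"t": "Emph", "c": [...]}, {"t": "Space"} or {"t": "Str", "c": "text"}.
This is only a minimal sketch, not pandoc's own code: the name
merge_adjacent_emph is hypothetical, and also merging two directly
adjacent Emph nodes (with no Space between them) is an assumption taken
from the original question.

```python
def merge_adjacent_emph(inlines):
    """Return a new inline list with `Emph, Emph` and `Emph, Space, Emph` merged.

    `inlines` is a list of dicts in pandoc's JSON AST shape. The input list
    and its nodes are not modified.
    """
    out = []
    for node in inlines:
        if node.get("t") == "Emph":
            # Case 1: this Emph directly follows another Emph.
            if out and out[-1].get("t") == "Emph":
                out[-1] = {"t": "Emph", "c": out[-1]["c"] + node["c"]}
                continue
            # Case 2: Emph xs, Space, Emph ys  ->  Emph (xs ++ Space:ys)
            if (
                len(out) >= 2
                and out[-1].get("t") == "Space"
                and out[-2].get("t") == "Emph"
            ):
                space = out.pop()
                out[-1] = {"t": "Emph", "c": out[-1]["c"] + [space] + node["c"]}
                continue
        out.append(node)
    return out


if __name__ == "__main__":
    para = [
        {"t": "Emph", "c": [{"t": "Str", "c": "text"}]},
        {"t": "Space"},
        {"t": "Emph", "c": [{"t": "Str", "c": "abc"}]},
    ]
    print(merge_adjacent_emph(para))
```

Because each merged Emph stays at the end of the output list, a longer
run such as Emph, Space, Emph, Space, Emph collapses into one Emph in a
single left-to-right pass.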


* Re: Normalize adjacent Emph separated by Space?
       [not found]     ` <m2o85g6j3y.fsf-d8241O7hbXoP5tpWdHSM3tPlBySK3R6THiGdP5j34PU@public.gmane.org>
@ 2021-12-17  4:32       ` christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
  0 siblings, 0 replies; 3+ messages in thread
From: christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2021-12-17  4:32 UTC (permalink / raw)
  To: pandoc-discuss





Thanks. I implemented a similar idea in Python in 
https://github.com/ickc/pandoc-amsthm/commit/8f806f5c33c220a3c1007343dbd193a285a378cc

Reproduced below.

Expressing that idea in Python (with panflute) seems quite verbose. Maybe 
there’s a smarter way; the version below is quite procedural:

from __future__ import annotations

import panflute as pf
from panflute.elements import Element, Doc

def merge_emph(elem: Element, doc: Doc) -> list[Element] | None:
    """Merge neighboring Emph with optionally Space between them."""
    if isinstance(elem, pf.Block):
        content = elem.content
        n = len(content)

        mutated = False
        # walk in reverse direction to avoid mutating current location i
        # also start with the 2nd last entry because we're matching 2 or more elements
        for i in range(n - 2, -1, -1):
            elem_cur = content[i]
            # remember that we are mutating content, so len(content) changes too
            elem_next = None if i + 1 >= len(content) else content[i + 1]
            elem_next_next = None if i + 2 >= len(content) else content[i + 2]
            if isinstance(elem_cur, pf.Emph):
                if isinstance(elem_next, pf.Emph):
                    merged = list(elem_cur.content) + list(elem_next.content)
                    content = list(content[:i]) + [pf.Emph(*merged)] + list(content[i + 2 :])
                    mutated = True
                elif isinstance(elem_next, pf.Space):
                    if isinstance(elem_next_next, pf.Emph):
                        # pf.Space must be instantiated; the bare class is not an Inline
                        merged = list(elem_cur.content) + [pf.Space()] + list(elem_next_next.content)
                        content = list(content[:i]) + [pf.Emph(*merged)] + list(content[i + 3 :])
                        mutated = True
        if mutated:
            elem.content = content
    return None
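
The other case from the original question, an Emph whose last Str ends
in a space, could be handled by a similar pass. The following is a
minimal sketch on pandoc's JSON representation of inlines (dicts such as
{"t": "Emph", "c": [...]}, {"t": "Space"}, {"t": "Str", "c": "text"}).
It assumes that only trailing ASCII spaces in the final Str child of an
Emph need to be moved out; hoist_trailing_space is a hypothetical name,
not a panflute or pandoc function.

```python
def hoist_trailing_space(inlines):
    """Move trailing spaces of an Emph's last Str out into a Space node.

    Emph [Str "text "]  ->  Emph [Str "text"], Space
    The input list and its nodes are not modified.
    """
    out = []
    for node in inlines:
        if node.get("t") == "Emph" and node["c"]:
            last = node["c"][-1]
            if last.get("t") == "Str" and last["c"].endswith(" "):
                stripped = last["c"].rstrip(" ")
                inner = node["c"][:-1]
                if stripped:
                    inner = inner + [{"t": "Str", "c": stripped}]
                # Drop the Emph entirely if nothing but spaces was inside it.
                if inner:
                    out.append({"t": "Emph", "c": inner})
                out.append({"t": "Space"})
                continue
        out.append(node)
    return out


if __name__ == "__main__":
    para = [
        {"t": "Emph", "c": [{"t": "Str", "c": "text "}]},
        {"t": "Emph", "c": [{"t": "Str", "c": "abc"}]},
    ]
    print(hoist_trailing_space(para))
```

Running this pass before a merge of Emph, Space, Emph runs would let
both situations from the first message end up as a single Emph.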


