* Normalize adjacent Emph separated by Space?
From: christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2021-12-17 1:25 UTC
To: pandoc-discuss
Hi,
I recently ran into a situation where I want adjacent Emph elements
separated by a Space to be “normalized”, i.e. combined into a single Emph
(and variants of this).
--normalize was removed in
https://github.com/jgm/pandoc/commit/8165014df679338d5bf228d84efc742c5ac39d2
and I’m not sure whether that is related.
Example:
$ echo "*text* *abc*" | pandoc -f markdown -t native
[ Para [ Emph [ Str "text" ] , Space , Emph [ Str "abc" ] ]
]
Is there some way to make it
[ Para [ Emph [ Str "text" , Space , Str "abc" ] ] ]
instead?
Another example is (perhaps generated in a filter)
[ Para [ Emph [ Str "text " ] , Emph [ Str "abc" ] ]]
Is there some way to normalize it to
[ Para [ Emph [ Str "text" ] , Space, Emph [ Str "abc" ] ]]
When I say “some way”, I would prefer pandoc flags (for example,
round-tripping native to markdown and back to native is not reliable for
the space problem above and won’t merge adjacent Emph). But even if this
has to be done in a filter, how can things like this be done reliably?
Thanks.
P.S.
Even directly adjacent Emph, as in
[ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ]]
are not normalized:
$ echo '[ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ]]' | pandoc -f native -t native
[ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ] ]
But the commit message seems to suggest it should be:
normalization is handled automatically by the Builder monoid instance.
(But the commit is old, so maybe later commits changed this.)
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/c887615c-9a1e-40a5-b6b0-a7bf22c87350n%40googlegroups.com.
* Re: Normalize adjacent Emph separated by Space?
From: John MacFarlane @ 2021-12-17 2:49 UTC
To: christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, pandoc-discuss
Yes, normalization is handled automatically -- but what we
mean by normalization doesn't include your case, where the
Space is *outside* the Emph and would be brought inside it.
You could write a filter that does this sort of normalization.
It would look in [Inline] lists for the pattern
Emph xs , Space , Emph ys
and convert this to
Emph (xs ++ Space:ys)
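The pattern described in the reply above (Emph xs, Space, Emph ys becoming Emph (xs ++ Space:ys)) can be sketched without any filter library by working directly on pandoc's JSON AST, where an inline is a dict such as {"t": "Emph", "c": [...]} or {"t": "Space"}. This is only a minimal sketch, not code from the thread: the function name merge_emph_space is made up for illustration, and it handles just the one pattern on a single list of inlines.

```python
from typing import Any, Dict, List

Inline = Dict[str, Any]  # one inline node of pandoc's JSON AST


def is_type(node: Inline, tag: str) -> bool:
    """True if the node's constructor name ("t") equals tag."""
    return node.get("t") == tag


def merge_emph_space(inlines: List[Inline]) -> List[Inline]:
    """Rewrite `Emph xs, Space, Emph ys` as `Emph (xs ++ Space:ys)`.

    Hypothetical helper: returns a new list, the input is not mutated.
    """
    out: List[Inline] = []
    for item in inlines:
        if (
            is_type(item, "Emph")
            and len(out) >= 2
            and is_type(out[-1], "Space")
            and is_type(out[-2], "Emph")
        ):
            out.pop()          # drop the Space that sat outside the Emph
            prev = out.pop()   # the Emph that preceded it
            item = {"t": "Emph", "c": prev["c"] + [{"t": "Space"}] + item["c"]}
        out.append(item)
    return out


para = [
    {"t": "Emph", "c": [{"t": "Str", "c": "text"}]},
    {"t": "Space"},
    {"t": "Emph", "c": [{"t": "Str", "c": "abc"}]},
]
merged = merge_emph_space(para)
```

Because the merged Emph is pushed back onto the output list, a longer run such as Emph, Space, Emph, Space, Emph collapses into a single Emph in one pass.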
* Re: Normalize adjacent Emph separated by Space?
From: christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2021-12-17 4:32 UTC
To: pandoc-discuss
Thanks. I implemented a similar idea in Python in
https://github.com/ickc/pandoc-amsthm/commit/8f806f5c33c220a3c1007343dbd193a285a378cc
reproduced below.
Expressing that idea in Python (with panflute) seems quite verbose. Maybe
there’s a smarter way; the code below is quite procedural:
from __future__ import annotations

import panflute as pf
from panflute.elements import Doc, Element


def merge_emph(elem: Element, doc: Doc) -> list[Element] | None:
    """Merge neighboring Emph with optionally Space between them."""
    if isinstance(elem, pf.Block):
        content = elem.content
        n = len(content)
        mutated = False
        # walk in reverse so that a merge never shifts the indices still to be visited;
        # start at the 2nd-to-last entry because we match 2 or more elements
        for i in range(n - 2, -1, -1):
            elem_cur = content[i]
            # content may already have been shortened by a merge, so re-check its length
            elem_next = None if i + 1 >= len(content) else content[i + 1]
            elem_next_next = None if i + 2 >= len(content) else content[i + 2]
            if isinstance(elem_cur, pf.Emph):
                if isinstance(elem_next, pf.Emph):
                    merged = list(elem_cur.content) + list(elem_next.content)
                    content = list(content[:i]) + [pf.Emph(*merged)] + list(content[i + 2 :])
                    mutated = True
                elif isinstance(elem_next, pf.Space):
                    if isinstance(elem_next_next, pf.Emph):
                        # pf.Space() is an instance; the bare class pf.Space is not an Inline
                        merged = list(elem_cur.content) + [pf.Space()] + list(elem_next_next.content)
                        content = list(content[:i]) + [pf.Emph(*merged)] + list(content[i + 3 :])
                        mutated = True
        if mutated:
            elem.content = content
    return None
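The second case from the original question, an Emph whose last Str carries a trailing space (as a filter might generate), could be handled as a separate pass that moves the space out of the Emph. The following is a rough sketch on pandoc's JSON AST using plain dicts rather than panflute; the name split_trailing_space is hypothetical, and it only looks at the last child of each Emph in one list of inlines.

```python
from typing import Any, Dict, List

Inline = Dict[str, Any]  # one inline node of pandoc's JSON AST


def split_trailing_space(inlines: List[Inline]) -> List[Inline]:
    """Turn `Emph [.., Str "text "]` into `Emph [.., Str "text"], Space`.

    Hypothetical helper: returns a new list, the input is not mutated.
    """
    out: List[Inline] = []
    for item in inlines:
        if item.get("t") == "Emph" and item["c"]:
            last = item["c"][-1]
            if last.get("t") == "Str":
                stripped = last["c"].rstrip(" ")
                # only act when there was a trailing space and some text remains
                if stripped and stripped != last["c"]:
                    trimmed = {"t": "Str", "c": stripped}
                    out.append({"t": "Emph", "c": item["c"][:-1] + [trimmed]})
                    out.append({"t": "Space"})
                    continue
        out.append(item)
    return out


para = [
    {"t": "Emph", "c": [{"t": "Str", "c": "text "}]},
    {"t": "Emph", "c": [{"t": "Str", "c": "abc"}]},
]
result = split_trailing_space(para)
```

Running this pass first and a merge pass afterwards would cover both examples from the question, at the cost of walking each inline list twice.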