* Normalize adjacent Emph separated by Space?
From: christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2021-12-17 1:25 UTC
To: pandoc-discuss

Hi,

I recently ran into a situation where I want adjacent Emph elements separated by a Space to be “normalized”, i.e. combined into a single Emph. (And variants like this.)

--normalize was removed in https://github.com/jgm/pandoc/commit/8165014df679338d5bf228d84efc742c5ac39d2 and I’m not sure if that is related.

Example:

    $ echo "*text* *abc*" | pandoc -f markdown -t native
    [ Para [ Emph [ Str "text" ] , Space , Emph [ Str "abc" ] ]
    ]

Is there some way to make it

    [ Para [ Emph [ Str "text" , Space , Str "abc" ] ] ]

instead?

Another example (perhaps generated in a filter) is

    [ Para [ Emph [ Str "text " ] , Emph [ Str "abc" ] ]]

Is there some way to normalize it to

    [ Para [ Emph [ Str "text" ] , Space, Emph [ Str "abc" ] ]]

When I say “some way”, preferably it would be some pandoc flags (for example, converting native to markdown and then markdown back to native is not reliable for the space problem above and won’t merge adjacent Emph). But even if we’re talking about doing it in a filter, how can things like this be done reliably?

Thanks.

P.S.

Even

    [ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ]]

won’t be normalized:

    $ echo '[ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ]]' | pandoc -f native -t native
    [ Para [ Emph [ Str "text" ] , Emph [ Str "abc" ] ] ]

But the text in the commit message seems to suggest it should be:

    normalization is handled automatically by the Builder monoid instance.

(But the commit is old, so maybe there’s more to it in later commits.)

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/c887615c-9a1e-40a5-b6b0-a7bf22c87350n%40googlegroups.com.
* Re: Normalize adjacent Emph separated by Space?
From: John MacFarlane @ 2021-12-17 2:49 UTC
To: christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, pandoc-discuss

Yes, normalization is handled automatically -- but what we
mean by normalization doesn't include your case, where the
Space is *outside* the Emph and would be brought inside it.

You could write a filter that does this sort of normalization.
It would look in [Inline] lists for the pattern

    Emph xs , Space , Emph ys

and convert this to

    Emph (xs ++ Space:ys)
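The rewrite rule described in this reply can be sketched in a few lines of Python. This is only a toy model, not pandoc's or panflute's API: inlines are represented as plain tuples such as `("Str", "text")`, `("Space",)` and `("Emph", [...])`, and the names `is_emph` and `merge_adjacent_emph` are made up for the illustration. It also covers the case from the P.S. where two Emph elements sit directly next to each other.

```python
def is_emph(node: tuple) -> bool:
    """True for the toy Emph node ("Emph", [children])."""
    return node[0] == "Emph"


def merge_adjacent_emph(inlines: list) -> list:
    """Rewrite Emph xs, Space, Emph ys (and Emph xs, Emph ys) as one Emph."""
    out: list = []
    for node in inlines:
        if is_emph(node) and out:
            # Emph xs , Emph ys  ->  Emph (xs ++ ys)
            if is_emph(out[-1]):
                prev = out.pop()
                out.append(("Emph", prev[1] + node[1]))
                continue
            # Emph xs , Space , Emph ys  ->  Emph (xs ++ Space:ys)
            if len(out) >= 2 and out[-1] == ("Space",) and is_emph(out[-2]):
                out.pop()  # drop the Space that sat between the two Emphs
                prev = out.pop()
                out.append(("Emph", prev[1] + [("Space",)] + node[1]))
                continue
        out.append(node)
    return out


para = [("Emph", [("Str", "text")]), ("Space",), ("Emph", [("Str", "abc")])]
print(merge_adjacent_emph(para))
# [('Emph', [('Str', 'text'), ('Space',), ('Str', 'abc')])]
```

Because each merged Emph is pushed back onto the output list, a run such as Emph, Space, Emph, Space, Emph collapses into a single Emph in one pass.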
* Re: Normalize adjacent Emph separated by Space?
From: christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2021-12-17 4:32 UTC
To: pandoc-discuss

Thanks. I implemented a similar idea in Python in https://github.com/ickc/pandoc-amsthm/commit/8f806f5c33c220a3c1007343dbd193a285a378cc (reproduced below).

Expressing that idea in Python (and panflute) seems quite verbose. Maybe there’s a smarter way; the version below is quite procedural:

    from __future__ import annotations

    import panflute as pf
    from panflute.elements import Doc, Element


    def merge_emph(elem: Element, doc: Doc) -> list[Element] | None:
        """Merge neighboring Emph with optionally Space between them."""
        if isinstance(elem, pf.Block):
            content = elem.content
            n = len(content)
            mutated = False
            # walk in reverse so that a merge never shifts the indices still to be visited;
            # start at the 2nd last entry because we are matching 2 or more elements
            for i in range(n - 2, -1, -1):
                elem_cur = content[i]
                # content is being mutated, so len(content) changes as well
                elem_next = None if i + 1 >= len(content) else content[i + 1]
                elem_next_next = None if i + 2 >= len(content) else content[i + 2]
                if isinstance(elem_cur, pf.Emph):
                    if isinstance(elem_next, pf.Emph):
                        # Emph, Emph -> one Emph
                        merged = list(elem_cur.content) + list(elem_next.content)
                        content = list(content[:i]) + [pf.Emph(*merged)] + list(content[i + 2 :])
                        mutated = True
                    elif isinstance(elem_next, pf.Space):
                        if isinstance(elem_next_next, pf.Emph):
                            # Emph, Space, Emph -> one Emph; note that Space must be
                            # instantiated (pf.Space()), not passed as the bare class
                            merged = list(elem_cur.content) + [pf.Space()] + list(elem_next_next.content)
                            content = list(content[:i]) + [pf.Emph(*merged)] + list(content[i + 3 :])
                            mutated = True
            if mutated:
                elem.content = content
        return None
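The second case from the original question, a Str with trailing whitespace inside an Emph (`Emph [ Str "text " ]`), is not handled by the merge filter above. A possible approach, sketched here on a toy tuple representation rather than on real panflute elements, is to strip whitespace from the first and last Str of each Emph and emit a Space outside it. The tuple encoding and the name `hoist_edge_spaces` are assumptions made for this illustration; whitespace in the middle of a Str is deliberately left alone.

```python
def hoist_edge_spaces(inlines: list) -> list:
    """Move leading/trailing spaces of an Emph's first/last Str outside the Emph."""
    out: list = []
    for node in inlines:
        # Only non-empty toy Emph nodes ("Emph", [children]) are rewritten.
        if node[0] != "Emph" or not node[1]:
            out.append(node)
            continue
        children = list(node[1])  # copy, so the input is not mutated
        before, after = [], []
        first = children[0]
        if first[0] == "Str" and first[1] != first[1].lstrip(" "):
            before = [("Space",)]
            children[0] = ("Str", first[1].lstrip(" "))
        last = children[-1]
        if last[0] == "Str" and last[1] != last[1].rstrip(" "):
            after = [("Space",)]
            children[-1] = ("Str", last[1].rstrip(" "))
        # Drop Str nodes that became empty after stripping.
        children = [c for c in children if c != ("Str", "")]
        out.extend(before)
        if children:
            out.append(("Emph", children))
        out.extend(after)
    return out


para = [("Emph", [("Str", "text ")]), ("Emph", [("Str", "abc")])]
print(hoist_edge_spaces(para))
# [('Emph', [('Str', 'text')]), ('Space',), ('Emph', [('Str', 'abc')])]
```

In a real filter the same steps would be applied to the inline list of each block; running this pass before an Emph-merging pass would then let both normalizations compose.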