From: Julien Dutant
Newsgroups: gmane.text.pandoc
Subject: Re: Lua filter to fix incorrectly nested lists?
Date: Mon, 27 Feb 2023 12:11:13 -0800 (PST)
To: pandoc-discuss

Well, couldn't help but give it a shot. Here's a short filter that does the trick. Will work at arbitrary depth.

    https://gist.github.com/jdutant/549ef06074d3ae00b78ca6ec8ed2cfe1

function fixList(elem)
  local changed = false
  local newList = pandoc.List:new()

  local function isSubList(list)
    return #list == 1
      and (list[1].t == 'BulletList' or list[1].t == 'OrderedList')
  end

  for _,item in ipairs(elem.c) do
    if #newList > 0 and isSubList(item) then
      -- append item's sublist to the last item of newList
      changed = true
      newList[#newList]:insert(item[1])
    else
      -- otherwise append item to newList
      newList:insert(item)
    end
  end

  if changed then
    elem.c = newList
  end

  return changed and elem or nil
end

return {{
  OrderedList = fixList,
  BulletList = fixList,
}}

On Monday, February 27, 2023 at 12:33:54 AM UTC JDTS wrote:

Thanks, I'll investigate this. The HTML structure is generated and therefore quite uniform, so it may be possible to do the munging there.

On Sunday, February 26, 2023 at 10:47:36 AM UTC-5 Julien Dutant wrote:
From my labelled-lists filter (https://github.com/dialoa/dialectica-filters/blob/main/labelled-lists/labelled-lists.lua), here is a filter + function that checks whether every item in a bullet list starts with a Span element.

    ```lua

--- is_custom_labelled_list: Look for custom labels markup
-- Custom label markup requires each item starting with a span
-- containing the label
-- @param element pandoc BulletList element
function is_custom_labelled_list (element)
  local is_cl_list = true

  -- the content of BulletList is a List of List of Blocks
  for _,blocks in ipairs(element.c) do
    -- check that the first element of the first block is Span
    if not( blocks[1].c[1].t == 'Span' ) then
      is_cl_list = false
      break
    end
  end
  return is_cl_list

end

return {{
  BulletList = function(element)
    if is_custom_labelled_list(element) then
      return pandoc.Para(pandoc.Str('Was a list of the required kind!'))
    end
  end,
}}

    ```
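
One caveat with the check above: `blocks[1].c[1].t` can raise an error, for instance when an item is empty. A minimal guarded sketch follows, written against plain Lua tables that only mimic the `t`/`c` shape of pandoc elements. The helper name `starts_with_span` and the sample data are made up for illustration, and none of this is the pandoc API itself:

```lua
-- Plain-table stand-ins for pandoc elements: only the fields `t` (tag)
-- and `c` (content) are modelled. This is an assumption, not the pandoc API.
local function starts_with_span(item)
  local first_block = item[1]
  -- guard: empty item, or a block whose content is not a list
  if not first_block or type(first_block.c) ~= 'table' then
    return false
  end
  local first_inline = first_block.c[1]
  return first_inline ~= nil and first_inline.t == 'Span'
end

local function is_custom_labelled_list(items)
  for _, item in ipairs(items) do
    if not starts_with_span(item) then
      return false
    end
  end
  return true
end

-- hypothetical sample data
local labelled = {
  { { t = 'Plain', c = { { t = 'Span' }, { t = 'Str' } } } },
  { { t = 'Para',  c = { { t = 'Span' } } } },
}
local mixed = {
  { { t = 'Plain', c = { { t = 'Span' } } } },
  {},                                       -- empty item
  { { t = 'CodeBlock', c = 'print(1)' } },  -- content is a string
}

print(is_custom_labelled_list(labelled))  -- true
print(is_custom_labelled_list(mixed))     -- false
```

Returning early from the loop replaces the `is_cl_list` flag and `break`; the result is the same.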
The difficulty with manipulating lists is following their intricate structure: a BulletList element has a content (element.c) that is a pandoc List. Each item in it (element.c[1], element.c[2]) is of Blocks type, i.e. a pandoc.List where each element is a block. In your case you should check that the list item only contains one block of type ordered list:

if #elem.c[i] == 1 then list_item_contains_one_block_only = true end

and check that this block is of type OrderedList:
if #elem.c[i] == 1 and elem.c[i][1].t == 'OrderedList' then ...

You should then add that block to the previous item, and remove the current item.
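
The indexing is easy to get wrong: the tag lives on the block inside the item (`item[1].t`), not on the item itself. Here is a small sketch of just the detection step, assuming plain Lua tables shaped like the AST (a list is `{ t = ..., c = { item, ... } }` and each item is an array of blocks). The function names and sample list are hypothetical, and the code does not call pandoc:

```lua
-- An item is a "lone sublist" when it holds exactly one block
-- and that block is itself a list.
local function is_lone_sublist(item)
  return #item == 1
    and (item[1].t == 'BulletList' or item[1].t == 'OrderedList')
end

-- Return the positions of items that should be merged into their predecessor.
local function lone_sublist_positions(list)
  local positions = {}
  for i, item in ipairs(list.c) do
    -- i > 1: the first item has no previous item to receive the sublist
    if i > 1 and is_lone_sublist(item) then
      positions[#positions + 1] = i
    end
  end
  return positions
end

-- hypothetical sample list
local list = {
  t = 'BulletList',
  c = {
    { { t = 'Plain' } },
    { { t = 'OrderedList', c = {} } },
    { { t = 'Plain' }, { t = 'OrderedList', c = {} } },  -- two blocks: not "lone"
    { { t = 'BulletList', c = {} } },
  },
}

print(table.concat(lone_sublist_positions(list), ', '))  -- 2, 4
```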

    Hope this helps,

    J
On Saturday, February 25, 2023 at 10:06:45 PM UTC JDTS wrote:
Thanks. Any pointers to Lua filters that do something similar?

On Saturday, February 25, 2023 at 10:01:08 AM UTC-5 Julien Dutant wrote:
Looks feasible. Pandoc converts the first HTML to:

[ BulletList
    [ [ Plain
          [ ... Inlines ]
      ]
    , [ BulletList
          [ [ Plain
                [ ... Inlines ]
            ]
          , [ Plain
                [ ... Inlines ]
            ]
          ]
      ]
    , [ Plain
          [ Inlines ]
      ]
    ]
]

I.e., the sublist is converted to its own list item. So the filter should pick up each list, check whether any of its items consists of a lone sublist, and if so, move that sublist to the previous item. (Ideally, also apply the filter recursively to the sublist itself.)
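
To make the recursive step concrete, here is a rough sketch on plain Lua tables shaped like the AST above. It is not a pandoc filter: `fix_list`, `is_list`, and the `text` field are made-up names for illustration, and the sample adds one deeper level than the quoted AST. The filter at the top of this thread relies on pandoc calling it for every list, whereas this sketch spells the recursion out by hand:

```lua
local function is_list(block)
  return block.t == 'BulletList' or block.t == 'OrderedList'
end

-- Merge every lone sublist into the preceding item, then recurse.
local function fix_list(list)
  local fixed = {}
  for _, item in ipairs(list.c) do
    if #fixed > 0 and #item == 1 and is_list(item[1]) then
      -- lone sublist: attach it to the previous item
      local prev = fixed[#fixed]
      prev[#prev + 1] = item[1]
    else
      fixed[#fixed + 1] = item
    end
  end
  list.c = fixed
  -- recurse into every sublist, wherever it ended up
  for _, item in ipairs(fixed) do
    for _, block in ipairs(item) do
      if is_list(block) then fix_list(block) end
    end
  end
  return list
end

-- hypothetical sample: three levels, each sublist wrongly placed in its own item
local doc = {
  t = 'BulletList',
  c = {
    { { t = 'Plain', text = 'L1 e1' } },
    { { t = 'BulletList', c = {
        { { t = 'Plain', text = 'L2 e1' } },
        { { t = 'BulletList', c = {
            { { t = 'Plain', text = 'L3 e1' } },
        } } },
        { { t = 'Plain', text = 'L2 e2' } },
    } } },
    { { t = 'Plain', text = 'L1 e2' } },
  },
}

fix_list(doc)
print(#doc.c)          -- 2: the lone-sublist item is gone
print(#doc.c[1])       -- 2: first item now holds Plain + BulletList
print(#doc.c[1][2].c)  -- 2: the level-2 list was fixed as well
```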

On Saturday, February 25, 2023 at 2:26:04 PM UTC JDTS wrote:
The Apple Notes app produces (via AppleScript) HTML for notes with nested lists structured like:

<ul>
<li>Level 1 element 1</li>
<ul>
<li>Level 2 element 1</li>
<li>Level 2 element 2</li>
</ul>
<li>Level 1 element 2</li>
</ul>


As you can see, the sublist is incorrectly positioned. It should be positioned *within* the <li>Level 1 element 1 item, à la:

<ul>
<li>Level 1 element 1
    <ul>
    <li>Level 2 element 1</li>
    <li>Level 2 element 2</li>
    </ul>
</li>
<li>Level 1 element 2</li>
</ul>


Is there a straightforward way with Lua filters to fix this at the AST level, for arbitrary-depth sublist nesting?
