From: JDTS
Newsgroups: gmane.text.pandoc
Subject: Re: Lua filter to fix incorrectly nested lists?
Date: Mon, 27 Feb 2023 16:28:55 -0800 (PST)
To: pandoc-discuss

One other quick question: pandoc parses <br> as a line break, and translates that into org as a double backslash \\. Any way to disable this?
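One possible approach, offered as an untested sketch rather than an answer from the thread: a Lua filter can replace every LineBreak inline before the org writer sees it. Replacing it with `pandoc.Space()` is an assumption about what "disable" should mean here; returning an empty table instead would drop the break entirely. The fallback table on the first lines is a hypothetical stand-in so the file also loads outside pandoc.

```lua
-- Stand-in for the pandoc module when this file is loaded outside pandoc
-- (hypothetical shim for experimentation only; pandoc supplies the real one).
local pandoc = pandoc or {
  Space = function() return { t = 'Space' } end,
}

-- Replace each hard line break (e.g. one read from <br>) with an ordinary
-- space, so the org writer has no LineBreak left to render as "\\".
function LineBreak()
  return pandoc.Space()
end
```

Since the file returns no filter table, pandoc picks up the global `LineBreak` function by its name; it could be listed alongside the list-fixing filter with a second `--lua-filter` option.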

On Monday, February 27, 2023 at 7:14:24 PM UTC-5 JDTS wrote:
This works perfectly (including when targeting org, my use case). Thanks so much!

On Monday, February 27, 2023 at 3:11:13 PM UTC-5 Julien Dutant wrote:
Well, couldn't help but give it a shot. Here's a short filter that does the trick. Will work at arbitrary depth.

https://gist.github.com/jdutant/549ef06074d3ae00b78ca6ec8ed2cfe1



function fixList(elem)
  local changed = false
  local newList = pandoc.List:new()

  local function isSubList(list)
    return #list == 1
      and (list[1].t == 'BulletList' or list[1].t == 'OrderedList')
  end

  for _,item in ipairs(elem.c) do

    if #newList > 0 and isSubList(item) then
      -- append item's sublist to the last item of newList
      changed = true
      newList[#newList]:insert(item[1])
    else
      -- otherwise append item to newList
      newList:insert(item)
    end

  end

  if changed then
    elem.c = newList
  end

  return changed and elem or nil
end

return {{
  OrderedList = fixList,
  BulletList = fixList,
}}
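To see the effect of the merge without running pandoc, the same step can be tried on plain Lua tables. This is only an illustrative sketch: `mergeLoneSublists` and the mock constructors `plain` and `bullets` are hypothetical names, and the tables imitate just the `t` and `c` fields used above, not real pandoc elements. Unlike the filter, which pandoc applies to every list in the document, the sketch has to recurse into sublists explicitly.

```lua
-- Mock constructors imitating only the .t and .c fields of pandoc elements.
local function plain(text) return { t = 'Plain', c = text } end
local function bullets(items) return { t = 'BulletList', c = items } end

local function isList(block)
  return block.t == 'BulletList' or block.t == 'OrderedList'
end

-- Same idea as fixList: an item holding nothing but a list is appended
-- to the blocks of the item before it.
local function mergeLoneSublists(list)
  local merged = {}
  for _, item in ipairs(list.c) do
    if #merged > 0 and #item == 1 and isList(item[1]) then
      table.insert(merged[#merged], item[1])
    else
      table.insert(merged, item)
    end
  end
  list.c = merged
  -- Recurse into any lists now nested inside the items.
  for _, item in ipairs(list.c) do
    for _, block in ipairs(item) do
      if isList(block) then mergeLoneSublists(block) end
    end
  end
  return list
end

-- The shape Apple Notes produces: the sublist sits in its own item.
local notes = bullets({
  { plain('Level 1 element 1') },
  { bullets({ { plain('Level 2 element 1') }, { plain('Level 2 element 2') } }) },
  { plain('Level 1 element 2') },
})

local fixed = mergeLoneSublists(notes)
print(#fixed.c)         -- 2: the lone-sublist item is gone
print(fixed.c[1][2].t)  -- BulletList: the sublist now lives in the first item
```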

On Monday, February 27, 2023 at 12:33:54 AM UTC JDTS wrote:
Thanks, I'll investigate this. The HTML structure is generated and therefore quite uniform, so it may be possible to do the munging there.

On Sunday, February 26, 2023 at 10:47:36 AM UTC-5 Julien Dutant wrote:
From my labelled-lists filter (https://github.com/dialoa/dialectica-filters/blob/main/labelled-lists/labelled-lists.lua), here is a filter + function that checks whether every item in a bullet list starts with a Span element.

```lua

--- is_custom_labelled_list: Look for custom labels markup
-- Custom label markup requires each item starting with a span
-- containing the label
-- @param element pandoc BulletList element
function is_custom_labelled_list (element)
   local is_cl_list = true

   -- the content of BulletList is a List of List of Blocks
   for _,blocks in ipairs(element.c) do
      -- check that the first element of the first block is Span
      if not( blocks[1].c[1].t == 'Span' ) then
         is_cl_list = false
         break
      end
   end
   return is_cl_list

end

return {{
   BulletList = function(element)
      if is_custom_labelled_list(element) then
         return pandoc.Para(pandoc.Str('Was a list of the required kind!'))
      end
   end,
}}

```
The difficulty with manipulating lists is following their intricate structure: a BulletList element has a content (element.c) that is a pandoc List. Each item in it (element.c[1], element.c[2]) is of Blocks type, i.e. a pandoc.List where each element is a block. In your case you should check that the list item contains only one block:

if #elem.c[i] == 1 then list_item_contains_one_block_only = true end

and check that this block is of type OrderedList:

if #elem.c[i] == 1 and elem.c[i][1].t == 'OrderedList' then ...

You should then add that block to the previous item, and remove the current item.
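The two levels of indexing are easy to mix up, so here is a small plain-Lua sketch of them. The table is a hand-built mock that imitates only the `t` and `c` fields (it is not a real pandoc element), and `describeItems` is a hypothetical helper name. `elem.c[i]` is an item, i.e. a list of blocks, while `elem.c[i][1]` is that item's first block, which is where the type tag lives.

```lua
-- Hand-built mock of a BulletList whose second item holds only an OrderedList.
local elem = {
  t = 'BulletList',
  c = {
    { { t = 'Plain' } },                  -- item 1: one Plain block
    { { t = 'OrderedList', c = {} } },    -- item 2: a lone sublist
    { { t = 'Plain' }, { t = 'Para' } },  -- item 3: two blocks
  },
}

-- Report, for each item, how many blocks it has and the type of the first one.
local function describeItems(list)
  local lines = {}
  for i, item in ipairs(list.c) do
    local lone = (#item == 1 and item[1].t == 'OrderedList')
    lines[#lines + 1] = string.format('%d: %d block(s), first is %s%s',
      i, #item, item[1].t, lone and ' (lone sublist)' or '')
  end
  return lines
end

for _, line in ipairs(describeItems(elem)) do
  print(line)
end
```

Only the second item is reported as a lone sublist; the third also starts with a single type tag but has two blocks, so the length check rules it out.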

    Hope this helps,

    J
On Saturday, February 25, 2023 at 10:06:45 PM UTC JDTS wrote:
Thanks. Any pointers to Lua filters that do something similar?

On Saturday, February 25, 2023 at 10:01:08 AM UTC-5 Julien Dutant wrote:
Looks feasible. Pandoc converts the first HTML to:

[ BulletList
    [ [ Plain
          [ ... Inlines ]
      ]
    , [ BulletList
          [ [ Plain
                [ ... Inlines ]
            ]
          , [ Plain
                [ ... Inlines ]
            ]
          ]
      ]
    , [ Plain
          [ Inlines ]
      ]
    ]
]

I.e., the sublist is converted to its own list item. So the filter should pick up lists, check whether any item within them consists of a lone sublist, and if so, move it to the previous item. (And ideally, apply the filter recursively to that sublist itself.)

On Saturday, February 25, 2023 at 2:26:04 PM UTC JDTS wrote:
The Apple Notes app produces (via AppleScript) HTML for notes with nested lists structured like:

<ul>
<li>Level 1 element 1</li>
<ul>
<li>Level 2 element 1</li>
<li>Level 2 element 2</li>
</ul>
<li>Level 1 element 2</li>
</ul>


As you can see, the sublist is incorrectly positioned. It should be positioned *within* the <li>Level 1 element 1 item, à la:

<ul>
<li>Level 1 element 1
    <ul>
    <li>Level 2 element 1</li>
    <li>Level 2 element 2</li>
    </ul>
</li>
<li>Level 1 element 2</li>
</ul>

Is there a straightforward way with Lua filters to fix this at the AST level, for arbitrary-depth sublist nesting?
