From mboxrd@z Thu Jan 1 00:00:00 1970
From: JDTS
Newsgroups: gmane.text.pandoc
Subject: Re: Lua filter to fix incorrectly nested lists?
Date: Sun, 26 Feb 2023 16:33:54 -0800 (PST)
Message-ID: <8c2cd1be-52b9-467b-a747-a88fc062209bn@googlegroups.com>
References: <163effbf-b672-4501-9171-8c4681034a96n@googlegroups.com> <80183457-60c8-4fc3-aa16-13d2f93104f1n@googlegroups.com>
In-Reply-To: <80183457-60c8-4fc3-aa16-13d2f93104f1n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To: pandoc-discuss
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_552_1405842131.1677458034812"

------=_Part_552_1405842131.1677458034812
Content-Type: multipart/alternative; boundary="----=_Part_553_200784311.1677458034812"

------=_Part_553_200784311.1677458034812
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Thanks, I'll investigate this.
The HTML structure is generated and therefore quite uniform, so it may be
possible to do the munging there.

On Sunday, February 26, 2023 at 10:47:36 AM UTC-5 Julien Dutant wrote:

> From my labelled-lists filter (
> https://github.com/dialoa/dialectica-filters/blob/main/labelled-lists/labelled-lists.lua),
> here is a filter + function that checks whether every item in a bullet list
> starts with a Span element.
>
> ```lua
> --- is_custom_labelled_list: Look for custom labels markup
> -- Custom label markup requires each item starting with a span
> -- containing the label
> -- @param element pandoc BulletList element
> function is_custom_labelled_list (element)
>   local is_cl_list = true
>
>   -- the content of BulletList is a List of List of Blocks
>   for _,blocks in ipairs(element.c) do
>     -- check that the first element of the first block is Span
>     if not( blocks[1].c[1].t == 'Span' ) then
>       is_cl_list = false
>       break
>     end
>   end
>   return is_cl_list
>
> end
>
> return {{
>   BulletList = function(element)
>     if is_custom_labelled_list(element) then
>       return pandoc.Para(pandoc.Str('Was a list of the required kind!'))
>     end
>   end,
> }}
> ```
>
> The difficulty with manipulating lists is to follow their intricate
> structure: a BulletList element has a content (element.c) that is a pandoc
> List. Each item in it (element.c[1], element.c[2]) is of Blocks type, i.e.
> a pandoc.List where each element is a block. In your case you should
> check that the list item contains only one block:
>
> if #elem.c[i] == 1 then list_item_contains_one_block_only = true end
>
> and check that this block is of type OrderedList (or BulletList, as in
> your example):
>
> if #elem.c[i] == 1 and elem.c[i][1].t == 'OrderedList' then ...
>
> you should then add that block to the previous item, and remove the
> current item.
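
For reference, the merging step described above might be sketched roughly as
follows. This is an untested sketch, not Julien's code: the names `is_list`,
`merge_lone_sublists` and `fix_list` are made up for illustration, and it
assumes that `el.content` of a BulletList or OrderedList is a sequence of
items, each item being a sequence of blocks, as in the AST quoted further
down the thread.

```lua
-- Sketch only: move list items that consist of a lone sublist into the
-- item before them. All function names here are hypothetical.

local function is_list(block)
  return block.t == 'BulletList' or block.t == 'OrderedList'
end

-- items: a sequence of list items, each item being a sequence of blocks
local function merge_lone_sublists(items)
  local fixed = {}
  for _, item in ipairs(items) do
    local prev = fixed[#fixed]
    if prev and #item == 1 and is_list(item[1]) then
      -- the item is nothing but a sublist: append it to the previous item
      table.insert(prev, item[1])
    else
      -- ordinary item (or a lone sublist with no previous item): keep it
      table.insert(fixed, item)
    end
  end
  return fixed
end

local function fix_list(el)
  el.content = merge_lone_sublists(el.content)
  return el
end

-- When a filter script returns nothing, pandoc collects global functions
-- named after element types and applies them to every such element,
-- nested lists included, so no explicit recursion is written here.
BulletList = fix_list
OrderedList = fix_list
```

Saved under a hypothetical name such as fix-nested-lists.lua, this would be
passed to pandoc with `--lua-filter fix-nested-lists.lua`. A lone sublist in
the very first item has no previous item to join, so the sketch leaves it
where it is.
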
>
> Hope this helps,
>
> J
>
> On Saturday, February 25, 2023 at 10:06:45 PM UTC JDTS wrote:
>
>> Thanks. Any pointers to Lua filters that do something similar?
>>
>> On Saturday, February 25, 2023 at 10:01:08 AM UTC-5 Julien Dutant wrote:
>>
>>> Looks feasible. Pandoc converts the first HTML to:
>>>
>>> [ BulletList
>>>     [ [ Plain
>>>           [ ... Inlines ]
>>>       ]
>>>     , [ BulletList
>>>           [ [ Plain
>>>                 [ ... Inlines ]
>>>             ]
>>>           , [ Plain
>>>                 [ ... Inlines ]
>>>             ]
>>>           ]
>>>       ]
>>>     , [ Plain
>>>           [ Inlines ]
>>>       ]
>>>     ]
>>> ]
>>>
>>> I.e., the sublist is converted to its own list item. So the filter
>>> should pick up lists, check whether any item within them consists of a
>>> lone sublist, and if so, move it to the previous item. (And best, apply
>>> the filter recursively to that sublist itself.)
>>>
>>> On Saturday, February 25, 2023 at 2:26:04 PM UTC JDTS wrote:
>>>
>>>> The Apple Notes app produces (via AppleScript) HTML for notes with
>>>> nested lists structured like:
>>>>
>>>> <ul>
>>>> <li>Level 1 element 1</li>
>>>> <ul>
>>>> <li>Level 2 element 1</li>
>>>> <li>Level 2 element 2</li>
>>>> </ul>
>>>> <li>Level 1 element 2</li>
>>>> </ul>
>>>>
>>>> As you can see, the sublist is incorrectly positioned. It should be
>>>> positioned *within* the <li> Level 1 element 1 item, à la:
>>>>
>>>> <ul>
>>>> <li>Level 1 element 1
>>>>     <ul>
>>>>     <li>Level 2 element 1</li>
>>>>     <li>Level 2 element 2</li>
>>>>     </ul>
>>>> </li>
>>>> <li>Level 1 element 2</li>
>>>> </ul>
>>>>
>>>> Is there a straightforward way with Lua filters to fix this at the AST
>>>> level, for arbitrary-depth sublist nesting?
>>>>
>>>

------=_Part_553_200784311.1677458034812
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
------=_Part_553_200784311.1677458034812--

------=_Part_552_1405842131.1677458034812--