From: Bastien DUMONT
Newsgroups: gmane.text.pandoc
Subject: Re: Custom styles in docx to markdown conversion.
Date: Thu, 16 Dec 2021 14:06:19 +0000
References: <877dcckzsu.fsf@fastmail.fm> <87czlwhel7.fsf@fastmail.fm>

As for your second question, what you want is to get the content of the span, without the containing span element itself. So:

```
function Span(el)
  -- attributes['custom-style'] is nil for spans without a custom style,
  -- so check it before calling :match on it.
  local style = el.attributes['custom-style']
  if style and style:match('XYZ Body Text Char') then
    -- Returning the list of inlines splices them in place of the span.
    return el.content
  end
end
```

The solution to "get rid of any OrderedList that immediately contains a Major/Minor Head, but leave "normal" OrderedLists intact" is similar: since the content of an OrderedList is a list of lists of Blocks, you want to return the content of the only Block in the first list of Blocks. This may work:

```
function OrderedList(el)
  -- el.content is a list of items; each item is a list of Blocks.
  local firstItem = el.content[1]
  local possibleHeader = firstItem and firstItem[1]
  if not possibleHeader or possibleHeader.t ~= 'Div' then
    return nil -- leave ordinary lists untouched
  end
  -- The attribute is nil when the Div has no custom style.
  local style = possibleHeader.attributes['custom-style']
  if style and style:match('XYZ Minor Head') then
    return pandoc.Header(2, pandoc.utils.blocks_to_inlines(possibleHeader.content))
  end
end
```

This only handles 'XYZ Minor Head'; a Major Head needs an analogous branch that returns a level-1 Header.

On Thursday 16 December 2021 at 01:28:18PM, Joost Kremers wrote:
>
> On Fri, Dec 10 2021, John MacFarlane wrote:
> >> Does that help?
> >
> > Yeah, that's enough information for me.
> >
> > What you need to do is to write a Lua filter like this:
> >
> > function Div(el)
> >   if el.attributes['custom-style']:match('XYZ Minor Head') then
> >     return pandoc.Header(2, pandoc.utils.blocks_to_inlines(el.content))
> >   end
> > end
> >
> > Hope it's clear what this does.
>
> For some reason, it doesn't work...
> I tried to extend your filter to the following:
>
> ```
> function Div(el)
>   if el.attributes['custom-style']:match('XYZ Major Head') then
>     return pandoc.Header(1, pandoc.utils.blocks_to_inlines(el.content))
>   elseif el.attributes['custom-style']:match('XYZ Minor Head') then
>     return pandoc.Header(2, pandoc.utils.blocks_to_inlines(el.content))
>   elseif el.attributes['custom-style']:match('XYZ Body Text') then
>     return pandoc.Para(pandoc.utils.blocks_to_inlines(el.content))
>   end
> end
> ```
>
> Using this filter, the custom style 'XYZ Body Text' is converted, but the Major
> and Minor Heads are not. When I convert to native (without the filter), I don't
> see a difference between Body Text on the one hand and Major or Minor Heads on
> the other: both are Div elements with "custom-style" set as indicated. Only the
> body text is changed, the headers are not.
>
> Could the problem be that the header Div's tend to appear inside an OrderedList?
> For some strange reason, the Major and Minor Heads don't use numbering. Instead,
> each header is an item in a numbered list... Is there a way to clean up such
> cases? I.e., get rid of any OrderedList that immediately contains a Major/Minor
> Head, but leave "normal" OrderedLists intact?
>
> Another question: body text in the converted document is often enclosed in a
> Span with a specific custom-style. I'd like to get rid of the span, since the
> style is of no interest to me, but I'm not sure what I should have the function
> return. For example, the following:
>
> ```
> function Span(el)
>   if el.attributes['custom-style']:match('XYZ Body Text Char') then
>     return pandoc.Para(pandoc.utils.blocks_to_inlines(el.content))
>   end
> end
> ```
>
> raises an error. I also tried converting to Plain (honestly, I don't know what
> the correct type would be), and I tried just passing `el.content` to
> `pandoc.Para`, but I keep getting errors.
> (Specifically: "Block expected, got userdata", and also "table expected, got
> userdata" with Plain instead of Para.)
>
> I apologise for what is probably a barrage of newbie questions, but having no
> previous knowledge of Lua and only a vague understanding of Pandoc's internal
> data types, I have a hard time figuring things out from the documentation.
>
> I appreciate any pointers.
>
> TIA
>
> --
> Joost Kremers
> Life has its moments

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/YbtH2yG9dD%2BfbURH%40localhost.
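
A note on the style tests used in the filters in this thread: `string.match` searches for a pattern anywhere in the string, so `:match('XYZ Body Text')` also accepts 'XYZ Body Text Char', and calling `:match` on a missing attribute raises "attempt to index a nil value". Below is a minimal sketch of a stricter test. The helper name `has_style` is hypothetical, and plain Lua tables stand in for pandoc elements so that the snippet runs on its own; as far as I can tell, a real element's `attributes` also yields nil for a missing key.

```lua
-- Hypothetical helper (not part of pandoc): true only when the element's
-- custom-style attribute is exactly `name`.
function has_style(el, name)
  -- `attributes` or the key itself may be absent; both cases give nil.
  local style = el.attributes and el.attributes['custom-style']
  -- Exact comparison: 'XYZ Body Text' does not accept 'XYZ Body Text Char'.
  return style == name
end

-- Plain tables standing in for pandoc elements (an assumption for this demo).
local body  = { attributes = { ['custom-style'] = 'XYZ Body Text' } }
local char  = { attributes = { ['custom-style'] = 'XYZ Body Text Char' } }
local plain = { attributes = {} }

print(has_style(body, 'XYZ Body Text'))   -- true
print(has_style(char, 'XYZ Body Text'))   -- false
print(has_style(plain, 'XYZ Body Text'))  -- false

-- For comparison, the substring search accepts the longer style name:
print(('XYZ Body Text Char'):match('XYZ Body Text') ~= nil)  -- true
```

In a filter, the test would then read `if has_style(el, 'XYZ Minor Head') then ... end`. Whether exact matching is what you want depends on how the styles are named in your document.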