From: Joost Kremers
Newsgroups: gmane.text.pandoc
Subject: Re: Custom styles in docx to markdown conversion.
Date: Thu, 16 Dec 2021 13:28:18 +0100
Message-ID: <87czlwhel7.fsf@fastmail.fm>
References: <877dcckzsu.fsf@fastmail.fm>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Cc: John MacFarlane

On Fri, Dec 10 2021, John MacFarlane wrote:
>> Does that help?
>
> Yeah, that's enough information for me.
>
> What you need to do is to write a Lua filter like this:
>
> function Div(el)
>   if el.attributes['custom-style']:match('XYZ Minor Head') then
>     return pandoc.Header(2, pandoc.utils.blocks_to_inlines(el.content))
>   end
> end
>
> Hope it's clear what this does.

For some reason, it doesn't work...
I tried to extend your filter to the following:

```
function Div(el)
  if el.attributes['custom-style']:match('XYZ Major Head') then
    return pandoc.Header(1, pandoc.utils.blocks_to_inlines(el.content))
  elseif el.attributes['custom-style']:match('XYZ Minor Head') then
    return pandoc.Header(2, pandoc.utils.blocks_to_inlines(el.content))
  elseif el.attributes['custom-style']:match('XYZ Body Text') then
    return pandoc.Para(pandoc.utils.blocks_to_inlines(el.content))
  end
end
```

Using this filter, the custom style 'XYZ Body Text' is converted, but the
Major and Minor Heads are not.

When I convert to native (without the filter), I don't see a difference
between Body Text on the one hand and Major or Minor Heads on the other:
both are Div elements with "custom-style" set as indicated. Only the body
text is changed, the headers are not.

Could the problem be that the header Div's tend to appear inside an
OrderedList? For some strange reason, the Major and Minor Heads don't use
numbering. Instead, each header is an item in a numbered list...

Is there a way to clean up such cases? I.e., get rid of any OrderedList
that immediately contains a Major/Minor Head, but leave "normal"
OrderedLists intact?

Another question: body text in the converted document is often enclosed in
a Span with a specific custom-style. I'd like to get rid of the span, since
the style is of no interest to me, but I'm not sure what I should have the
function return. For example, the following:

```
function Span(el)
  if el.attributes['custom-style']:match('XYZ Body Text Char') then
    return pandoc.Para(pandoc.utils.blocks_to_inlines(el.content))
  end
end
```

raises an error. I also tried converting to Plain (honestly, I don't know
what the correct type would be), and I tried just passing `el.content` to
`pandoc.Para`, but I keep getting errors. (Specifically: "Block expected,
got userdata", and also "table expected, got userdata" with Plain instead
of Para.)

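One reading of the Span errors above: a Span is an inline element, while
Para and Plain are blocks, so a `Span` function would need to hand back
inlines rather than a block. The sketch below is only an illustration under
that assumption, not a confirmed fix for this document. It returns the
span's own content, which should replace the span with the inlines it
wrapped. The helper `has_style` is a made-up name, not part of pandoc.

```lua
-- Hypothetical helper (not a pandoc function): true if the element has a
-- 'custom-style' attribute that contains `name` as a plain substring.
-- The nil check avoids calling a string method on a missing attribute.
local function has_style(el, name)
  local style = el.attributes and el.attributes['custom-style']
  return style ~= nil and style:find(name, 1, true) ~= nil
end

-- Return inlines (the span's content) instead of a block such as Para.
-- Returning nil (the implicit result otherwise) leaves the span unchanged.
function Span(el)
  if has_style(el, 'XYZ Body Text Char') then
    return el.content
  end
end
```

The nil check is there because `el.attributes['custom-style']` is nil for
an element without that attribute, and `:match` on nil raises an error in
Lua. The plain-substring `find` is a design choice for the sketch: it
avoids treating characters in a style name as Lua pattern syntax.
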
I apologise for what is probably a barrage of newbie questions, but having
no previous knowledge of Lua and only a vague understanding of Pandoc's
internal data types, I have a hard time figuring things out from the
documentation. I appreciate any pointers.

TIA

-- 
Joost Kremers
Life has its moments