From: BPJ
Newsgroups: gmane.text.pandoc
Subject: Re: Docx reader: First column header
Date: Thu, 7 Oct 2021 13:50:11 +0200
To: pandoc-discuss

The bold style was just a proof of concept, because bold is what people most often want. You could just as well wrap the cell content in a span or div for CSS styling, or inject raw LaTeX or other markup, depending on your target format.
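To make the cell-wrapping idea concrete, here is a minimal sketch. It is written as a Python JSON filter (run with pandoc's `--filter` option) rather than a Lua filter, so it needs nothing beyond the standard library. It assumes pandoc 2.10 or later, whose JSON table layout is what the index comments describe. When the target format is `latex` or `beamer`, it wraps each paragraph of the first cell of every body row in raw `\textbf{...}`; for any other format it wraps the cell content in a div with the class `stub`. The class name, the LaTeX markup and the file name `stub_cells.py` are placeholders of mine. The sketch also ignores row spans, so a row that continues a spanned first cell gets its second column marked instead.

```python
#!/usr/bin/env python3
"""Mark the first cell of every table body row (pandoc JSON filter sketch).

Assumes pandoc >= 2.10, whose JSON table layout the index comments describe.
Row spans are ignored: cells[0] is simply the first cell stored in each row.
"""
import json
import sys

STUB_CLASS = "stub"        # placeholder CSS class name
LATEX_OPEN = r"\textbf{"   # placeholder raw LaTeX markup
LATEX_CLOSE = "}"


def wrap_for_latex(blocks):
    """Surround the inlines of each Plain/Para block with raw LaTeX."""
    wrapped = []
    for block in blocks:
        if block.get("t") in ("Plain", "Para"):
            inlines = (
                [{"t": "RawInline", "c": ["latex", LATEX_OPEN]}]
                + block["c"]
                + [{"t": "RawInline", "c": ["latex", LATEX_CLOSE]}]
            )
            wrapped.append({"t": block["t"], "c": inlines})
        else:
            wrapped.append(block)
    return wrapped


def mark_stub(blocks, fmt):
    """Return the cell's blocks marked up for the given target format."""
    if fmt in ("latex", "beamer"):
        return wrap_for_latex(blocks)
    # Any other format: wrap the content in a Div that CSS can target.
    return [{"t": "Div", "c": [["", [STUB_CLASS], []], blocks]}]


def walk(node, fmt):
    """Visit the AST recursively and rewrite Table nodes in place."""
    if isinstance(node, list):
        for item in node:
            walk(item, fmt)
    elif isinstance(node, dict):
        if node.get("t") == "Table":
            # Table: [attr, caption, colspecs, head, bodies, foot]
            for body in node["c"][4]:
                # TableBody: [attr, row_head_columns, head_rows, body_rows]
                for row in body[3]:
                    cells = row[1]  # Row: [attr, cells]
                    if cells:
                        # Cell: [attr, alignment, row_span, col_span, blocks]
                        cells[0][4] = mark_stub(cells[0][4], fmt)
        for value in node.values():
            walk(value, fmt)  # also reaches tables nested inside cells


def main():
    fmt = sys.argv[1]  # pandoc passes the target format as the first argument
    doc = json.load(sys.stdin)
    walk(doc["blocks"], fmt)
    json.dump(doc, sys.stdout)


# Only act as a filter when pandoc supplies the format argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Made executable, something like `pandoc input.docx --filter ./stub_cells.py -o output.html`, together with a CSS rule such as `.stub { font-weight: bold; }`, should reproduce the bold look in HTML while leaving the styling itself up to the stylesheet.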

If the content of the cells in your docx file is styled with a named character style[^1], you can run pandoc with `--from docx+styles`: text with a named character style is then wrapped in a span, and a paragraph with a named paragraph style in a div, each with a custom attribute `custom-style` whose value is the style name. A filter script can locate spans or divs with such an attribute and change their attributes to something more CSS-friendly and/or inject markup. It might even pay off to go through the docx file in a word processor and apply one or more named paragraph styles to be picked up by filter script(s). It is also possible to match the raw text of table cells, even though Lua patterns (Lua's much simpler substitute for regular expressions) are rather limited.[^2]
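The `custom-style` route can be sketched the same way, again as a Python JSON filter rather than a Lua one. Assuming the input was read with `--from docx+styles`, it moves each span's or div's `custom-style` value into a lowercase, hyphenated class (so a hypothetical style named `Table Stub` becomes the class `table-stub`) and drops the original attribute. Dropping it suits HTML output; if you convert back to docx you would rather keep `custom-style`, since the docx writer uses it to apply the style.

```python
#!/usr/bin/env python3
"""Turn docx `custom-style` attributes into CSS classes (JSON filter sketch).

Intended for input read with `--from docx+styles`, where paragraph styles
arrive as Divs and character styles as Spans carrying `custom-style`.
"""
import json
import re
import sys


def style_to_class(name):
    """Lowercase a style name and join its words with hyphens."""
    return re.sub(r"[\W_]+", "-", name.lower()).strip("-")


def restyle(node):
    """Recursively move `custom-style` values into classes, in place."""
    if isinstance(node, list):
        for item in node:
            restyle(item)
    elif isinstance(node, dict):
        if node.get("t") in ("Span", "Div"):
            # Attr: [identifier, classes, list of [key, value] pairs]
            ident, classes, attributes = node["c"][0]
            kept = []
            for key, val in attributes:
                if key == "custom-style":
                    cls = style_to_class(val)
                    if cls and cls not in classes:
                        classes.append(cls)
                else:
                    kept.append([key, val])
            node["c"][0] = [ident, classes, kept]
        for value in node.values():
            restyle(value)


# pandoc passes the target format as argv[1]; without it, do nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    doc = json.load(sys.stdin)
    restyle(doc["blocks"])
    json.dump(doc, sys.stdout)
```

A run would look like `pandoc --from docx+styles input.docx --filter ./restyle.py -o output.html`, where `restyle.py` is only a placeholder name.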

[^1]: I'm unsure whether paragraph styles work in tables (I'm not a Word user, and only a very reluctant LibreOffice user), and table styles are not yet supported by pandoc.

[^2]: Alternation, quantified groups and Unicode-aware matching are not supported, which severely limits matching possibilities. Sometimes it is possible to try multiple patterns instead.
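Matching the raw text of cells can be sketched with two small Python helpers working on the same JSON AST: one flattens a cell's content to plain text (a rough stand-in for `pandoc.utils.stringify` in Lua filters), the other tests that text against a list of patterns one at a time, which is the multiple-pattern workaround described above. Python's `re` does support alternation, so there a single `^(?:Sum|Total)\b` pattern would do; the list form only mirrors what a Lua filter has to do. The function names and the sample cell are mine, and the flattening drops notes, raw markup and quotation marks.

```python
import re


def stringify(node):
    """Flatten pandoc JSON AST content (blocks or inlines) to plain text."""
    if isinstance(node, list):
        return "".join(stringify(item) for item in node)
    if not isinstance(node, dict):
        return ""  # bare strings here are attributes, URLs, etc.
    kind = node.get("t")
    if kind == "Str":
        return node["c"]
    if kind in ("Space", "SoftBreak", "LineBreak"):
        return " "
    if kind in ("Code", "CodeBlock", "Math"):
        return node["c"][1]  # [attr or math type, text]
    if kind in ("Note", "RawInline", "RawBlock"):
        return ""
    return stringify(node.get("c", []))


def matches_any(text, patterns):
    """Return True if at least one of the patterns matches somewhere in text."""
    return any(re.search(pattern, text) for pattern in patterns)


# Example: the block list of one table cell.
cell_blocks = [{"t": "Plain", "c": [
    {"t": "Str", "c": "Total"},
    {"t": "Space"},
    {"t": "Strong", "c": [{"t": "Str", "c": "2021"}]},
]}]
text = stringify(cell_blocks)
print(text, matches_any(text, [r"^Sum\b", r"^Total\b"]))  # prints: Total 2021 True
```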

On Wed, 6 Oct 2021 at 11:44, Cardea <gchapuis10-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I'm sorry, I know my jargon is not really precise. Your script is closely
> related to what I want, except that it systematically bolds the first column.
> I looked into the documentation and I guess it is related to the conditional
> formatting of the first "VBand".
>
> On Tuesday, 5 October 2021 at 13:13:02 UTC+2 BP wrote:
>
>> On 2021-10-05 10:18, Cardea wrote:
>> >
>> > Greetings,
>> > The docx reader now converts Word tables pretty accurately, but it
>> > looks like first-column table headers are not kept. I guess this is
>> > because the AST cannot accommodate this kind of structure.
>> > Is there any plan to at least keep this information around?
>> >
>> > Thanks
>> >
>>
>> Do you mean that you have paragraphs formatted with, say, "Heading 3" in a
>> table cell, or that you want the text in the first column (for which the
>> proper term is _stub_) formatted like a column heading?
>>
>> If the latter, and if the formatting you want is bold, you can fake it
>> with a filter like the one attached.
>>
>> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh4Ykp1iOSErHA@public.gmane.org.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/668fa3f9-b2a2-4d2c-a113-ed93963cc3cbn%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhB9nkcrHiC7XJgFN58WqqYtqotCoPz-1D9GHV0cwiqORg%40mail.gmail.com.