* Docx reader: First column header
From: Cardea @ 2021-10-05 8:18 UTC
To: pandoc-discuss
Greetings,
The docx reader now converts Word tables pretty accurately. However, it
looks like first-column table headers are not preserved. I guess this is
because the AST cannot accommodate this kind of structure.
Is there any plan to at least keep this information around?
Thanks
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/7a08b5a6-e1d5-437d-9d5e-5bf6cb57fa9cn%40googlegroups.com.
* Re: Docx reader: First column header
From: Benct Philip Jonsson @ 2021-10-05 11:12 UTC
To: pandoc-discuss
On 2021-10-05 10:18, Cardea wrote:
>
> Greetings,
> The docx reader now converts Word tables pretty accurately. However, it
> looks like first-column table headers are not preserved. I guess this is
> because the AST cannot accommodate this kind of structure.
> Is there any plan to at least keep this information around?
>
> Thanks
>
Do you mean that you have paragraphs formatted with, say, "Heading 3" in a
table cell, or that you want the text in the first column formatted like
a column heading (the proper term for such a row heading is _stub_)?
If the latter, and if the formatting you want is bold, you can fake it
with a filter like the one attached.
[-- Attachment #2: fake-stubs.lua --]
-- Pandoc filter to simulate stubs (aka "row headers") in the first column of
-- tables by formatting simple cell content as bold/strong.
--
-- This filter is a heavily edited version of code compiled from MoonScript.
-- Please report any problems to <bpjonsson+pandoc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
--
-- This software is Copyright (c) 2021 by Benct Philip Jonsson.
-- This is free software, licensed under:
-- The MIT (X11) License
-- http://www.opensource.org/licenses/mit-license.php
-- Save some typing...
local p = assert(pandoc, "Cannot get the pandoc library")
local u = assert(p.utils, "Cannot get the pandoc.utils library")
local L = assert(p.List, "Cannot get the pandoc.List class")
local pt = p.types
-- More saving on typing; defined here because the version check
-- below already needs format
local format = string.format
local unpack = table.unpack
-- I'm using SimpleTable based on the assumption that most/all
-- docx tables fit into a SimpleTable, because I'm not sure
-- that I can find my way around the new Table model...
local to_simple_table = u.to_simple_table
if not to_simple_table then
  -- No conversion helper available: give up if this pandoc already
  -- uses the new Table model (2.10 and later)
  if pt and PANDOC_VERSION then
    local Version = pt.Version
    if Version and PANDOC_VERSION >= Version('2.10.0') then
      error(format("This filter does not work with pandoc %s", tostring(PANDOC_VERSION)))
    end
  end
end
-- Package some boilerplate: call fun(...) and, on failure, raise an
-- error that says what we were trying to do
local function try (what, fun, ...)
  -- In case fun is a callable object
  local call = function (...) return fun(...) end
  -- Collect any number of return values
  local res = {pcall(call, ...)}
  if res[1] then -- on success
    -- Return the return values
    return unpack(res, 2)
  else -- on failure
    -- Propagate the error message along with the description
    error(format("Error %s: %s", what, res[2]))
  end
end
function Table (tab)
  if to_simple_table then
    tab = try("converting Table to SimpleTable", to_simple_table, tab)
  end
  -- Loop over the indices so that we don't confuse the stateless iterator
  -- when we modify the list of rows
  for r = 1, #tab.rows do
    local row = tab.rows[r]
    local stub = row[1] -- first cell in the row (nil if the row is empty)
    -- Check that we got simple cell contents
    if stub and 1 == #stub then -- if only one block in the cell...
      local block = stub[1]
      local tag = block.tag
      -- ...and that block is a Para or Plain
      if ('Para' == tag) or ('Plain' == tag) then
        -- Wrap the block content in a Strong to simulate a stub
        block.content = L{ p.Strong(block.content) }
        -- Is all this rearguard action necessary?
        stub[1] = block
        row[1] = stub
        tab.rows[r] = row
      end
    end
  end
  if to_simple_table then
    tab = try("converting SimpleTable to Table", u.from_simple_table, tab)
  end
  return tab
end
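The comment near the top of the filter mentions the new Table model introduced in pandoc 2.10. In that model each table body carries a `row_head_columns` field giving the number of leading columns that act as row headers (stubs), so the AST itself can represent a first-column header. A minimal sketch, assuming pandoc 2.10 or later, that marks the first column of every table body as a row-header column instead of faking it with bold:

```lua
-- Sketch, assuming pandoc >= 2.10 (new Table model): declare the first
-- column of every table body as a row-header (stub) column.
function Table (tbl)
  local bodies = tbl.bodies
  for i, body in ipairs(bodies) do
    -- number of leading columns in this body that are row headers
    body.row_head_columns = 1
    bodies[i] = body
  end
  -- assign the list back so the change is picked up
  tbl.bodies = bodies
  return tbl
end
```

Like the attached filter, this applies to every table, since the docx reader does not appear to record Word's first-column formatting. Whether and how a writer renders row-header columns differs by output format, so check the result for your target format before relying on it.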
* Re: Docx reader: First column header
From: Cardea @ 2021-10-06 9:43 UTC
To: pandoc-discuss
I'm sorry, I know my jargon is not very precise. Your script is closely
related to what I want, except that it systematically bolds the first
column of every table.
I looked into the documentation
<https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.tablelook?view=openxml-2.8.1>
and I guess it is related to the conditional formatting of the first "VBand".
On Tuesday, 5 October 2021 at 13:13:02 UTC+2 BP wrote:
> Do you mean that you have paragraphs formatted with, say, "Heading 3" in a
> table cell, or that you want the text in the first column formatted like
> a column heading (the proper term for such a row heading is _stub_)?
>
> If the latter, and if the formatting you want is bold, you can fake it
> with a filter like the one attached.
* Re: Docx reader: First column header
From: BPJ @ 2021-10-07 11:50 UTC
To: pandoc-discuss
The bold style was just a proof of concept because that is what people most
often want. You could just as well wrap the cell content in a span or div
for CSS styling, or inject raw LaTeX or other markup depending on your
target format.
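A minimal sketch of those alternatives: a hypothetical helper `make_stub` that could replace the `L{ p.Strong(block.content) }` expression in the attached filter. It wraps the cell content in a Span with a `stub` class for CSS styling, or in raw LaTeX when the output format is LaTeX. The class name `stub` and the LaTeX macro `\stub` are assumptions; you would have to define the CSS rule, or the macro (e.g. via `header-includes`), yourself.

```lua
-- Hypothetical helper: build replacement inlines for a stub cell
-- depending on the output format (FORMAT is set by pandoc).
local function make_stub (inlines)
  if FORMAT == 'latex' or FORMAT == 'beamer' then
    -- Surround the content with raw LaTeX; assumes a \stub{...} macro
    -- is defined in your template or header-includes
    local out = { pandoc.RawInline('latex', '\\stub{') }
    for _, inline in ipairs(inlines) do
      out[#out + 1] = inline
    end
    out[#out + 1] = pandoc.RawInline('latex', '}')
    return pandoc.List(out)
  end
  -- Any other format: wrap in a Span with class "stub" for CSS styling
  return pandoc.List{ pandoc.Span(inlines, pandoc.Attr('', {'stub'})) }
end
```

In `fake-stubs.lua` the wrapping line would then read `block.content = make_stub(block.content)`.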
If the content of the cells in your docx file is styled with a named
character style[^1], you can run pandoc with `--from docx+styles`, and
paragraphs/text styled with a named style are wrapped in divs/spans with a
custom attribute `custom-style` with the style name as value. A filter
script can locate spans or divs with such an attribute and modify their
attributes to something more CSS-friendly and/or inject markup. It might
even pay off to go through the docx file in a word processor and apply one
or more named paragraph styles to be picked up by filter script(s). It is
also possible to match the raw text of table cells, even though Lua
patterns are rather limited compared to regular expressions.[^2]
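A minimal sketch of such a filter, assuming the first-column cells were given a named style called `Row Header` (a hypothetical name; use whatever style you actually applied) and that pandoc is run with `--from docx+styles`. Character styles arrive as Spans and paragraph styles as Divs, so one function handles both; it replaces the `custom-style` attribute with a `stub` class that a stylesheet can target.

```lua
-- Sketch: turn a docx named style into a CSS-friendly class.
-- Run with: pandoc --from docx+styles --lua-filter this-file.lua ...
local STUB_STYLE = 'Row Header'  -- hypothetical style name

local function restyle (el)
  if el.attributes['custom-style'] == STUB_STYLE then
    el.attributes['custom-style'] = nil  -- drop the docx-specific attribute
    el.classes:insert('stub')            -- add a class for CSS
    return el
  end
  -- returning nil leaves the element unchanged
end

-- Character styles become Spans, paragraph styles become Divs
Span = restyle
Div = restyle
```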
[^1]: I'm unsure whether paragraph styles work in tables. I'm not a Word
user and only a very reluctant LibreOffice user, and table styles are not
yet supported by Pandoc.
[^2]: Alternation, quantified groups and Unicode-aware matching are not
supported, which severely limits matching possibilities. Sometimes it is
possible to try multiple patterns instead.
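A minimal sketch of the multiple-patterns workaround: since a single Lua pattern cannot express `Total|Subtotal`, each alternative becomes its own pattern and the text is tested against them in turn. The patterns are hypothetical examples; in a filter the text of a cell could be obtained with `pandoc.utils.stringify`.

```lua
-- Lua patterns have no alternation, so keep a list of patterns
-- (hypothetical examples of first-column labels)
local STUB_PATTERNS = {
  '^Total',      -- text starting with "Total"
  '^Subtotal',   -- text starting with "Subtotal"
  '^%d%d%d%d$',  -- exactly four digits, e.g. a year
}

-- Return true if text matches at least one of the patterns
local function matches_any (text, patterns)
  for _, pat in ipairs(patterns) do
    if text:match(pat) then
      return true
    end
  end
  return false
end
```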
On Wed 6 Oct 2021 11:44, Cardea <gchapuis10-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I'm sorry, I know my jargon is not very precise. Your script is closely
> related to what I want, except that it systematically bolds the first
> column of every table.
> I looked into the documentation
> <https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.tablelook?view=openxml-2.8.1>
> and I guess it is related to the conditional formatting of the first "VBand".