public inbox archive for pandoc-discuss@googlegroups.com
* Docx reader: First column header
@ 2021-10-05  8:18 Cardea
From: Cardea @ 2021-10-05  8:18 UTC
  To: pandoc-discuss


Greetings,
The docx reader can now convert Word tables pretty accurately, but it
looks like first-column table headers are not kept. I guess this is
because the AST cannot accommodate this kind of structure.
Is there any plan to at least keep this information around?

Thanks

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/7a08b5a6-e1d5-437d-9d5e-5bf6cb57fa9cn%40googlegroups.com.


* Re: Docx reader: First column header
@ 2021-10-05 11:12   ` Benct Philip Jonsson
From: Benct Philip Jonsson @ 2021-10-05 11:12 UTC
  To: pandoc-discuss

Do you mean that you have paragraphs formatted with, say, "Heading 3" in a
table cell, or that you want the text in the first column formatted like
a column heading (the proper term for such a first column is _stub_)?

If the latter, and if the formatting you want is bold, you can fake it
with a filter like the one attached.

[-- Attachment #2: fake-stubs.lua --]
[-- Type: text/x-lua, Size: 2736 bytes --]

-- Pandoc filter to simulate stubs (aka "row headers") in the first column of
-- tables by formatting simple cell content as bold/strong.
--
-- This filter is a heavily edited version of code compiled from MoonScript.
-- Please report any problems to <bpjonsson+pandoc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
--
-- This software is Copyright (c) 2021 by Benct Philip Jonsson.

-- This is free software, licensed under:

--   The MIT (X11) License

-- http://www.opensource.org/licenses/mit-license.php

-- Save some typing...
local p = assert(pandoc, "Cannot get the pandoc library")
local u = assert(p.utils, "Cannot get the pandoc.utils library")
local L = assert(p.List, "Cannot get the pandoc.List class")
local pt = p.types

-- I'm using SimpleTable based on the assumption that most/all
-- docx tables fit into a SimpleTable, because I'm not sure
-- that I can find my way around the new Table model...

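-- More saving on typing. These come first because the version
-- check below already needs format.
local format = string.format
local unpack = table.unpack

local to_simple_table = u.to_simple_table
if not to_simple_table then
  -- Without to_simple_table the filter only works if Table elements
  -- still use the old model, i.e. with pandoc older than 2.10.0
  if pt and PANDOC_VERSION then
    local Version = pt.Version
    if Version and PANDOC_VERSION >= Version('2.10.0') then
      error(format("This filter does not work with pandoc %s", tostring(PANDOC_VERSION)))
    end
  end
end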

-- Package some boilerplate
local function try (what, fun, ...)
  -- In case fun is a callable object
  local call = function (...) return fun(...) end
  -- Collect any number of return values
  local res = {pcall(call, ...)}
  if res[1] then -- on success
    -- Return the return values
    return unpack(res, 2)
  else -- on failure
    -- Propagate the error message along with the description
    -- (tostring because the error value need not be a string)
    error(format("Error %s: %s", what, tostring(res[2])))
  end
end

function Table (tab)
  if to_simple_table then
    tab = try( "converting Table to SimpleTable", to_simple_table, tab )
  end
  -- Loop over the indices so that we don't confuse the stateless iterator
  -- when we modify the list of rows
  for r=1, #tab.rows do
    local row  = tab.rows[r]
    local stub = row[1] -- first cell in the row (nil if the row is empty)
    -- Check that we got simple cell contents
    if stub and 1 == #stub then -- if only one block in the cell...
      local block = stub[1]
      local tag = block.tag
      -- ...and that block is a Para or Plain
      if ('Para' == tag) or ('Plain' == tag) then
        -- Wrap the block content in a Strong to simulate a stub
        block.content = L{ p.Strong(block.content) }
        -- Is all this rearguard action necessary?
        stub[1] = block
        row[1]  = stub
        tab.rows[r] = row
      end
    end
  end
  if to_simple_table then
    tab = try( "converting SimpleTable to Table", u.from_simple_table, tab )
  end
  return tab
end


* Re: Docx reader: First column header
@ 2021-10-06  9:43       ` Cardea
From: Cardea @ 2021-10-06  9:43 UTC
  To: pandoc-discuss


I'm sorry, I know my jargon is not very precise. Your script is close to
what I want, except that it systematically bolds the first column.
I looked into the documentation
<https://docs.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.tablelook?view=openxml-2.8.1>
and I guess it is related to the conditional formatting of the first "VBand".

* Re: Docx reader: First column header
@ 2021-10-07 11:50           ` BPJ
From: BPJ @ 2021-10-07 11:50 UTC
  To: pandoc-discuss

The bold style was just a proof of concept because that is what people most
often want. You could just as well wrap the cell content in a span or div
for CSS styling, or inject raw LaTeX or other markup depending on your
target format.
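
As a rough sketch of the span variant: the helper below could stand in for
the `p.Strong(...)` line of the attached filter. The name
`wrap_in_stub_span` and the class name `stub` are made up for illustration;
only `pandoc.Attr`, `pandoc.Span` and `pandoc.List` come from pandoc itself.

```lua
-- Sketch only: wrap_in_stub_span and the class "stub" are hypothetical
-- names, not part of the attached filter.
local function wrap_in_stub_span (block)
  -- Attr(identifier, classes): no identifier, one class
  local attr = pandoc.Attr("", {"stub"})
  -- Replace the inline content with a single Span holding the old content
  block.content = pandoc.List{ pandoc.Span(block.content, attr) }
  return block
end
```

In HTML output the cell text should then end up inside a
`<span class="stub">`, which a stylesheet can target.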

If the content of the cells in your docx file is styled with a named
character style[^1], you can run pandoc with `--from docx+styles`.
Paragraphs and text styled with a named style are then wrapped in divs or
spans that carry a custom attribute `custom-style` with the style name as
its value. A filter script can locate spans or divs with such an attribute
and change their attributes to something more CSS-friendly and/or inject
markup. It might even pay off to go through the docx file in a word
processor and apply one or more named paragraph styles to be picked up by
filter script(s). It is also possible to match the raw text of table cells,
even though Lua patterns (Lua's counterpart to regular expressions) are
rather limited.[^2]
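
A minimal sketch of such a filter, assuming the input was read with
`--from docx+styles`: it adds a class derived from the style name to every
span or div that has a `custom-style` attribute and leaves the attribute
itself in place. The helper names `style_to_class` and `restyle` are
invented for this example.

```lua
-- Sketch only: style_to_class and restyle are hypothetical helper names.

-- Turn a style name such as "Row Header" into a class name like "row-header".
-- Anything that is not an ASCII letter or digit becomes a hyphen.
local function style_to_class (name)
  local class = name:lower():gsub("[^%w]+", "-")
  class = class:gsub("^%-+", ""):gsub("%-+$", "")
  return class
end

-- Add a CSS-friendly class to elements that carry a custom-style attribute.
local function restyle (el)
  local style = el.attributes["custom-style"]
  if style then
    -- Build the new class list and assign it back to the element
    local classes = el.classes
    classes[#classes + 1] = style_to_class(style)
    el.classes = classes
  end
  return el
end

-- pandoc calls these global functions for every Span and Div
Span = restyle
Div = restyle
```

Saved as, say, `restyle.lua` (a made-up file name), it would be run with
`--lua-filter restyle.lua` together with `--from docx+styles`.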

[^1]: I'm unsure whether paragraph styles work in tables. I'm not a Word
user and only a very reluctant LibreOffice user, and table styles are not
yet supported by Pandoc.

[^2]: Alternation, quantified groups and Unicode are not supported, which
severely limits the matching possibilities. Sometimes it is possible to try
multiple patterns instead.
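
Trying multiple patterns could look roughly like this. The helper name
`matches_any` and the sample patterns are assumptions for illustration; in a
filter the text would typically come from `pandoc.utils.stringify` applied
to the cell content.

```lua
-- Sketch only: matches_any is a hypothetical helper.
-- Returns true if text matches at least one of the Lua patterns,
-- which stands in for an alternation like ^(Total|Sum).
local function matches_any (text, patterns)
  for _, pattern in ipairs(patterns) do
    if text:find(pattern) then
      return true
    end
  end
  return false
end

-- Example patterns standing in for the alternation ^(Total|Sum)
local stub_patterns = { "^Total", "^Sum" }
```

A cell whose text makes `matches_any(text, stub_patterns)` true could then
be treated as a stub, instead of every cell in the first column.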
