public inbox archive for pandoc-discuss@googlegroups.com
* Header attributes and docx
@ 2023-05-06  5:51 Miguel
From: Miguel @ 2023-05-06  5:51 UTC (permalink / raw)
  To: pandoc-discuss




Say you have the following file:

# This is a section {#section}

Converting this to html/docbook/latex and back to markdown preserves the
header attribute:

> pandoc -t docbook test.md | pandoc -f docbook -t markdown

# This is a section {#section}

But when converting to docx and back, the header attribute is lost:

> pandoc -t docx test.md | pandoc -f docx -t markdown 

# This is a section

How can this be fixed, so that converting back from docx does not lose the
given header id?
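
A small sketch for checking which header ids get lost in a round trip like the one above. It only compares two markdown strings, so producing them with pandoc is left to the caller. The function names (heading_ids, lost_ids) are made up for illustration and are not part of pandoc, and the regex only covers ATX headings with a trailing {#id ...} attribute block.

```python
import re

# ATX heading ending in an attribute block that contains an identifier,
# e.g. "# Title {#my-id}" or "## Title {#my-id .unnumbered}".
# Simplification: setext headings and multi-line attributes are not handled.
HEADING_ID = re.compile(
    r"^#{1,6}[ \t]+.*\{[^}\n]*#([^\s}]+)[^}\n]*\}[ \t]*$",
    re.MULTILINE,
)

def heading_ids(markdown):
    """Return the set of explicit heading identifiers found in the text."""
    return set(HEADING_ID.findall(markdown))

def lost_ids(original, roundtripped):
    """Identifiers present in the original but missing after the round trip."""
    return heading_ids(original) - heading_ids(roundtripped)
```

Calling lost_ids with the contents of test.md and the output of the docx round trip would report the identifiers that did not survive.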

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/d9a83a0b-50b4-46fb-9b7e-4f0855cf0599n%40googlegroups.com.


* Re: Header attributes and docx
@ 2023-05-06  6:41   ` Albert Krewinkel
From: Albert Krewinkel @ 2023-05-06  6:41 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Conversions are generally lossy; some information cannot be preserved. In
this case the issue is that the docx reader ignores "bookmark" entries in
the input document. Changing this would require updates to the docx
reader:
https://github.com/jgm/pandoc/blob/main/src/Text/Pandoc/Readers/Docx.hs

-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: Header attributes and docx
@ 2023-05-06  7:18       ` Miguel
From: Miguel @ 2023-05-06  7:18 UTC (permalink / raw)
  To: pandoc-discuss



Thanks Albert. 

I am not familiar with the pandoc readers and writers, but it seems that
the bookmark is written to the docx correctly:

<?xml version="1.0" encoding="UTF-8"?>
<w:document 
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" 
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" 
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
xmlns:o="urn:schemas-microsoft-com:office:office" 
xmlns:v="urn:schemas-microsoft-com:vml" 
xmlns:w10="urn:schemas-microsoft-com:office:word" 
xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" 
xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" 
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
  <w:body>
    <w:bookmarkStart w:id="20" w:name="section" />
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Heading1" />
      </w:pPr>
      <w:r>
        <w:t xml:space="preserve">This is a section</w:t>
      </w:r>
    </w:p>
    <w:bookmarkEnd w:id="20" />
    <w:sectPr />
  </w:body>
</w:document>
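
As a rough illustration of how the identifier could be recovered from XML like the above, here is a Python sketch that pairs each heading paragraph with a bookmark name. It is not how pandoc's docx reader works. It assumes the bookmark is either the body-level sibling directly before the heading paragraph (as in the XML shown) or a direct child of that paragraph, and that heading styles are named "Heading...". The function name heading_bookmarks is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def q(tag):
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return "{%s}%s" % (W, tag)

def heading_bookmarks(document_xml):
    """Return a list of (heading text, bookmark name) pairs.

    document_xml is the content of word/document.xml (str or bytes).
    """
    body = ET.fromstring(document_xml).find(q("body"))
    pairs = []
    pending = None  # bookmark name seen immediately before the current element
    for child in body:
        if child.tag == q("bookmarkStart"):
            pending = child.get(q("name"))
            continue
        if child.tag == q("p"):
            style = child.find(q("pPr") + "/" + q("pStyle"))
            is_heading = (style is not None
                          and style.get(q("val"), "").startswith("Heading"))
            inner = child.find(q("bookmarkStart"))
            name = inner.get(q("name")) if inner is not None else pending
            if is_heading and name:
                text = "".join(t.text or "" for t in child.iter(q("t")))
                pairs.append((text, name))
        pending = None  # a bookmark only applies to the very next element
    return pairs

# Possible usage, with "test.docx" as a placeholder file name:
#   import zipfile
#   xml = zipfile.ZipFile("test.docx").read("word/document.xml")
#   print(heading_bookmarks(xml))
```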

So the heading attribute is lost on the way back, when the docx is
converted to markdown.
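
Until the docx reader handles this, one possible workaround is to post-process the markdown output. The sketch below assumes a mapping from heading text to identifier is already available (for instance, collected from the bookmark names in word/document.xml) and appends {#id} to matching ATX headings. It matches on the plain heading text, so it assumes titles are unique and carry no inline formatting. reattach_ids is a made-up name, not a pandoc feature.

```python
import re

ATX = re.compile(r"^(#{1,6})[ \t]+(.*?)[ \t]*$")

def reattach_ids(markdown, ids_by_title):
    """Append {#id} to ATX headings whose title is a key in ids_by_title.

    Headings that already carry an identifier are left untouched.
    Caveat: lines starting with '#' inside code blocks are not excluded.
    """
    out = []
    for line in markdown.splitlines():
        m = ATX.match(line)
        if m and "{#" not in line:
            ident = ids_by_title.get(m.group(2))
            if ident:
                line = "%s %s {#%s}" % (m.group(1), m.group(2), ident)
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if markdown.endswith("\n") else "")
```

For the example in this thread, reattach_ids("# This is a section\n", {"This is a section": "section"}) gives back the heading with its {#section} attribute.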


