public inbox archive for pandoc-discuss@googlegroups.com
* Docx to Markdown and Front Matter
@ 2021-04-22  9:04 Doeke Zanstra
       [not found] ` <100da112-ed0d-4618-b949-721a3079538bn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Doeke Zanstra @ 2021-04-22  9:04 UTC
  To: pandoc-discuss



I'm converting docx to Markdown, and I need a YAML front matter block 
just before the Markdown. I recently learnt to do this with the 
--standalone (-s) option.

However, it is unclear exactly how this works. I only get front matter when 
the first paragraph is styled with the "Title" style (actually the Dutch 
localized "Titel" style).

Are there other options available to get more metadata out of the Word 
document? In Word on macOS, the menu File > Properties > Summary lists 
all kinds of metadata which could be useful as front matter:

- Title
- Subject
- Author
- Manager
- Company
- Category
- Keywords
- Comments
- Hyperlink base

Can this be used? Or are there other ways to get meta-data out of Word? 
Or would this need a feature request in pandoc?

Thanks in advance,
Doeke Zanstra


-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/100da112-ed0d-4618-b949-721a3079538bn%40googlegroups.com.


* Re: Docx to Markdown and Front Matter
       [not found] ` <100da112-ed0d-4618-b949-721a3079538bn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-04-26  6:01   ` John MacFarlane
       [not found]     ` <m235vdr3d6.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: John MacFarlane @ 2021-04-26  6:01 UTC
  To: Doeke Zanstra, pandoc-discuss


There are already some issues on the tracker that seem
relevant, e.g. #3109 and #3034.



* Re: Docx to Markdown and Front Matter
       [not found]     ` <m235vdr3d6.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-04-26  7:37       ` BPJ
  0 siblings, 0 replies; 3+ messages in thread
From: BPJ @ 2021-04-26  7:37 UTC
  To: pandoc-discuss; +Cc: Doeke Zanstra


At one point I experimented with a LibreOffice (or was it as long ago as
OpenOffice?) macro which pulled out the metadata and put it as lines of
KEY: VALUE pairs at the top of the text. It was not really successful, but
you might have better luck with python-docx.
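
A .docx file is a zip package, and the Summary fields are stored as XML in
docProps/core.xml and docProps/app.xml, so they can also be read with the
Python standard library alone. The following is only a minimal sketch under
that assumption: the function names (read_docx_properties, to_front_matter)
and the front matter key names are made up for illustration, and it has not
been checked against documents produced by every Word version.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in docProps/core.xml and docProps/app.xml of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Front matter key -> (package part, element). The key names are illustrative.
FIELDS = {
    "title": ("docProps/core.xml", "dc:title"),
    "subject": ("docProps/core.xml", "dc:subject"),
    "author": ("docProps/core.xml", "dc:creator"),
    "keywords": ("docProps/core.xml", "cp:keywords"),
    "category": ("docProps/core.xml", "cp:category"),
    "description": ("docProps/core.xml", "dc:description"),
    "manager": ("docProps/app.xml", "ep:Manager"),
    "company": ("docProps/app.xml", "ep:Company"),
}


def read_docx_properties(path):
    """Return a dict of the non-empty document properties found in a .docx file."""
    props = {}
    roots = {}  # parsed XML roots, keyed by package part name
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        for key, (part, tag) in FIELDS.items():
            if part not in names:
                continue  # the part is optional; skip if the file lacks it
            if part not in roots:
                roots[part] = ET.fromstring(package.read(part))
            element = roots[part].find(tag, NS)
            if element is not None and element.text and element.text.strip():
                props[key] = element.text.strip()
    return props


def to_front_matter(props):
    """Render the properties as a YAML block, double-quoting every value."""
    lines = ["---"]
    for key, value in props.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{key}: "{escaped}"')
    lines.append("---")
    return "\n".join(lines) + "\n"
```

The string returned by to_front_matter(read_docx_properties("input.docx"))
could then be written in front of the Markdown that pandoc produces from the
same file ("input.docx" is a placeholder name). Every value is quoted so that
colons or other YAML-significant characters in a title do not break the block.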



