public inbox archive for pandoc-discuss@googlegroups.com
* Curious: ODT reader
@ 2015-01-26 22:16 Phillip Smith
       [not found] ` <4fef1220-23ec-441c-9e42-41ef29d6f1ea-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Phillip Smith @ 2015-01-26 22:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


[-- Attachment #1.1: Type: text/plain, Size: 1556 bytes --]

Hello,

Quick note to say thank you for this incredibly useful library. I wish I
had found it earlier. :)

I've searched through the Google Group archives and GitHub issues, and I
can't find an answer to the question of whether an odt reader is on the
roadmap.

I'm completely ignorant here, having just started using pandoc, but -- if
it's not a waste of time to explain it to me -- I'd be curious to
understand how an odt reader could be added.

There appears to be a suggestion for those needing this workflow (odt->other
formats) of converting from odt to HTML, then using pandoc to convert that
HTML to what-have-you. And there appear to be libraries in other languages
for converting odt to html (e.g., https://github.com/imanel/odt2html).
Though it might not be elegant, would one idea be to provide a reader that
wraps that two-step process into pandoc for odt documents?

Many thanks in advance for your help in gaining a deeper understanding 
here. :)

Phillip.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/4fef1220-23ec-441c-9e42-41ef29d6f1ea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[-- Attachment #1.2: Type: text/html, Size: 2066 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
  2015-01-26 22:42   ` Raniere Silva
@ 2015-01-26 22:36     ` Phillip Smith
       [not found]       ` <7EE5FAC3-481F-468F-AFE1-E898FC1E5387-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2015-01-27 16:17     ` John MacFarlane
  1 sibling, 1 reply; 15+ messages in thread
From: Phillip Smith @ 2015-01-26 22:36 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 933 bytes --]


On 2015-01-26, at 2:42 PM, Raniere Silva <raniere-Xhq86aZylhRTIXuwt5Zssw@public.gmane.org> wrote:

> Hi Phillip,
> 
>> I'm completely ignorant here, having just started using pandoc, but -- if
>> it's not a waste of time to explain it to me -- I'd be curious to 
>> understand how an odt reader could be added.
> 
> At src/Text/Pandoc/Readers/ you will find the source code of all readers.
> If you take a look at src/Text/Pandoc/Readers/Docx{/,.hs} you will see the
> source code of the DOCX reader. You can use it as a starting point for the
> ODT reader.

Hi Raniere,

Many thanks for your response. I did have a look at the DOCX reader earlier.

Let me perhaps re-phrase my question: What have been the barriers that have prevented an odt reader from being added before?

I'm curious why so many readers are available, but not odt? Are there obstacles that are well-known and hard to overcome?

Many thanks in advance,

Phillip.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found] ` <4fef1220-23ec-441c-9e42-41ef29d6f1ea-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-01-26 22:42   ` Raniere Silva
  2015-01-26 22:36     ` Phillip Smith
  2015-01-27 16:17     ` John MacFarlane
  0 siblings, 2 replies; 15+ messages in thread
From: Raniere Silva @ 2015-01-26 22:42 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 458 bytes --]

Hi Phillip,

> I'm completely ignorant here, having just started using pandoc, but -- if
> it's not a waste of time to explain it to me -- I'd be curious to 
> understand how an odt reader could be added.

At src/Text/Pandoc/Readers/ you will find the source code of all readers.
If you take a look at src/Text/Pandoc/Readers/Docx{/,.hs} you will see the
source code of the DOCX reader. You can use it as a starting point for the
ODT reader.

Cheers,
Raniere

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
  2015-01-26 22:42   ` Raniere Silva
  2015-01-26 22:36     ` Phillip Smith
@ 2015-01-27 16:17     ` John MacFarlane
  1 sibling, 0 replies; 15+ messages in thread
From: John MacFarlane @ 2015-01-27 16:17 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Raniere Silva [Jan 26 15 20:42 ]:
>Hi Phillip,
>
>> I'm completely ignorant here, having just started using pandoc, but -- if
>> it's not a waste of time to explain it to me -- I'd be curious to
>> understand how an odt reader could be added.

https://github.com/jgm/pandoc/issues/1768


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]       ` <7EE5FAC3-481F-468F-AFE1-E898FC1E5387-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-01-27 16:27         ` Jesse Rosenthal
       [not found]           ` <87ppa0rxyd.fsf-4GNroTWusrE@public.gmane.org>
  2015-01-27 18:10         ` John MacFarlane
  1 sibling, 1 reply; 15+ messages in thread
From: Jesse Rosenthal @ 2015-01-27 16:27 UTC (permalink / raw)
  To: Phillip Smith, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hi Phillip,

I'm actually the author of the docx reader. I wrote it because I wanted
to use it. I'd guess that's the case for most of the readers in pandoc.

Which is to say that I don't think there's any technical limitation to
constructing such a reader. It would likely follow the same basic
pattern as the docx reader: unzip, parse, convert. It would in fact
probably be a bit easier than the docx reader because lists etc. seem
much more sensible in ODT due to proper nesting. I just don't think
anyone has had sufficient desire (or the combination of desire and
haskell chops) to make it happen.

By the way, before I wrote the docx reader, I prototyped it in python,
and wrote a python script that output pandoc json. If you don't want to
work in haskell you could look around for an ODT reader in another
language and go from there.
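
The unzip, parse, convert pattern described above can be sketched in Python. This is only an illustration under stated assumptions, not the docx reader's actual design and not the prototype mentioned here: the names `inlines` and `odt_to_blocks` are hypothetical, only top-level `text:h` and `text:p` elements are handled, and the inline shapes follow the older pandoc JSON layout that appears elsewhere in this thread (newer pandoc releases use a different envelope and omit the empty `c` on `Space`).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument XML namespaces used in content.xml.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def inlines(text):
    """Split plain text into pandoc-style Str/Space inlines (older JSON shape)."""
    out = []
    for i, word in enumerate(text.split()):
        if i:
            out.append({"t": "Space", "c": []})
        out.append({"t": "Str", "c": word})
    return out


def odt_to_blocks(odt_bytes):
    """Hypothetical helper: turn the bytes of an .odt file into a list of blocks."""
    # Step 1: unzip. An ODT file is a zip archive; the body lives in content.xml.
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        content = zf.read("content.xml")

    # Step 2: parse the XML and locate office:body/office:text.
    root = ET.fromstring(content)
    body = root.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    if body is None:
        return []

    # Step 3: convert headings and paragraphs; everything else is skipped.
    blocks = []
    for el in body:
        text = "".join(el.itertext())
        if el.tag == f"{{{TEXT_NS}}}h":
            level = int(el.get(f"{{{TEXT_NS}}}outline-level", "1"))
            blocks.append({"t": "Header", "c": [level, ["", [], []], inlines(text)]})
        elif el.tag == f"{{{TEXT_NS}}}p":
            blocks.append({"t": "Para", "c": inlines(text)})
    return blocks
```

Lists, tables, styles, and footnotes are deliberately left out; a real reader would need to handle all of them, and would wrap the blocks in whatever envelope the installed pandoc expects.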

Best,
Jesse



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]           ` <87ppa0rxyd.fsf-4GNroTWusrE@public.gmane.org>
@ 2015-01-27 16:53             ` Phillip Smith
  2015-01-27 20:00             ` Phillip Smith
  1 sibling, 0 replies; 15+ messages in thread
From: Phillip Smith @ 2015-01-27 16:53 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, Jesse Rosenthal

On 2015-01-27, at 8:27 AM, Jesse Rosenthal <jrosenthal-4GNroTWusrE@public.gmane.org> wrote:
> Hi Phillip,
> 
> I'm actually the author of the docx reader. I wrote it because I wanted
> to use it. I'd guess that's the case for most of the readers in pandoc.
> 
> Which is to say that I don't think there's any technical limitation to
> constructing such a reader. It would likely follow the same basic
> pattern as the docx reader: unzip, parse, convert. It would in fact
> probably be a bit easier than the docx reader because lists etc. seem
> much more sensible in ODT due to proper nesting. I just don't think
> anyone has had sufficient desire (or the combination of desire and
> haskell chops) to make it happen.
> 
> By the way, before I wrote the docx reader, I prototyped it in python,
> and wrote a python script that output pandoc json. If you don't want to
> work in haskell you could look around for an ODT reader in another
> language and go from there.
> 
> Best,
> Jesse

Hi Jesse,

That's very helpful context to have.

I have been looking at the other libraries here, http://www.opendocumentformat.org/developers/  … and I'd be curious to see an example of the "pandoc JSON" if you could point me to one (and/or your python prototype). 

It does look like an interesting problem to solve, haskell aside. ;)  However, the team I'm working with at the moment is quite small, and we are heavily committed (like everyone!).

I see that the idea of a "bounty" has been raised before on this list. Our project is well funded and this would be a worthwhile investment.

Before I go searching for someone to write this reader out in the wild, is there anyone on this list that would be interested in taking it on? If so, please drop me a note off-list.

Phillip.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]       ` <7EE5FAC3-481F-468F-AFE1-E898FC1E5387-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2015-01-27 16:27         ` Jesse Rosenthal
@ 2015-01-27 18:10         ` John MacFarlane
       [not found]           ` <20150127181016.GB5844-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: John MacFarlane @ 2015-01-27 18:10 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Phillip Smith [Jan 26 15 14:36 ]:
>Let me perhaps re-phrase my question: What have been the barriers that have prevented an odt reader from being added before?
>
>I'm curious why so many readers are available, but not odt? Are there obstacles that are well-known and hard to overcome?

No.  It has just been waiting for somebody to have an itch severe enough to need scratching.  (Note that you might get decent results using libreoffice to do HTML or docbook export, and running that through pandoc.)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]           ` <20150127181016.GB5844-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
@ 2015-01-27 18:19             ` Phillip Smith
  2015-01-27 20:01             ` Phillip Smith
  1 sibling, 0 replies; 15+ messages in thread
From: Phillip Smith @ 2015-01-27 18:19 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


On 2015-01-27, at 10:10 AM, John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:

> +++ Phillip Smith [Jan 26 15 14:36 ]:
>> Let me perhaps re-phrase my question: What have been the barriers that have prevented an odt reader from being added before?
>> 
>> I'm curious why so many readers are available, but not odt? Are there obstacles that are well-known and hard to overcome?
> 
> No.  It has just been waiting for somebody to have an itch severe enough to need scratching.  (Note that you might get decent results using libreoffice to do HTML or docbook export, and running that through pandoc.)

We need it to be scripted, so I'm not sure that would work...

Basically, we're working on a "book" of sorts in GitHub. However, many of the contributors and editors are not (yet) comfortable editing Markdown files. So we're working to rig up a simple workflow that listens for commits to the GitHub repository (using a web hook) and then generates what's needed, i.e., if the editor commits a .docx, a .md and an .odt version of that file are generated and committed back to the repository.

This is working nicely for .md and .docx files -- and, frankly, those are the main files being actively edited at the moment -- but I'm sure that some folks will prefer .odt and we'd like to have the workflow support that.
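
A rough sketch of the regeneration step in that workflow, written in Python purely for illustration. The function name `regeneration_commands` and the tables `FORMATS` and `READABLE` are hypothetical; the sketch assumes pandoc is called once per target format and that .odt is only ever an output, since no ODT reader exists at the time of this thread. The web hook and the commit-back step are not shown.

```python
from pathlib import PurePosixPath

# File extension -> pandoc writer name for each format kept in the repository.
FORMATS = {".md": "markdown", ".docx": "docx", ".odt": "odt"}

# Extensions pandoc can read in this workflow (assumption: no ODT reader yet).
READABLE = {".md", ".docx"}


def regeneration_commands(committed_path):
    """Return the pandoc command lines needed after `committed_path` changes."""
    src = PurePosixPath(committed_path)
    if src.suffix not in READABLE:
        return []  # e.g. an .odt commit cannot be converted yet
    commands = []
    for ext, writer in FORMATS.items():
        if ext == src.suffix:
            continue  # never regenerate the file that was just committed
        target = src.with_suffix(ext)
        commands.append(["pandoc", str(src), "-t", writer, "-o", str(target)])
    return commands
```

One design point worth keeping in mind: because the generated files are committed back, the hook would need some way to ignore its own commits, otherwise each regeneration triggers another.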

Phillip.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]           ` <87ppa0rxyd.fsf-4GNroTWusrE@public.gmane.org>
  2015-01-27 16:53             ` Phillip Smith
@ 2015-01-27 20:00             ` Phillip Smith
       [not found]               ` <e8873929-e613-43f6-98f9-a760a6e33772-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Phillip Smith @ 2015-01-27 20:00 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
  Cc: phillipadsmith-Re5JQEeQqe8AvxtiuMwx3w


[-- Attachment #1.1: Type: text/plain, Size: 4433 bytes --]

My responses seem to be getting delayed or grey listed for several hours, 
so I'm just going to re-post directly from the Google Groups interface here 
(apologies if you receive a duplicate tomorrow):

On Tuesday, January 27, 2015 at 8:27:07 AM UTC-8, Jesse Rosenthal wrote:
>
> Hi Phillip, 
>
> I'm actually the author of the docx reader. I wrote it because I wanted 
> to use it. I'd guess that's the case for most of the readers in pandoc. 
>
> Which is to say that I don't think there's any technical limitation to 
> constructing such a reader. It would likely follow the same basic 
> pattern as the docx reader: unzip, parse, convert. It would in fact 
> probably be a bit easier than the docx reader because lists etc. seem
> much more sensible in ODT due to proper nesting. I just don't think
> anyone has had sufficient desire (or the combination of desire and
> haskell chops) to make it happen.
>
> By the way, before I wrote the docx reader, I prototyped it in python, 
> and wrote a python script that output pandoc json. If you don't want to 
> work in haskell you could look around for an ODT reader in another 
> language and go from there. 
>
> Best, 
> Jesse 
>

Hi Jesse,

That's very helpful context to have.

I have been looking at the other libraries here, 
http://www.opendocumentformat.org/developers/  … and I'd be curious to see 
an example of the "pandoc JSON" if you could point me to one (and/or your 
python prototype). 

It does look like an interesting problem to solve, haskell aside. ;) 
 However, the team I'm working with at the moment is quite small, and we 
are heavily committed (like everyone!).

I see that the idea of a "bounty" has been raised before on this list. Our 
project is well funded and this would be a worthwhile investment.

Before I go searching for someone to write this reader out in the wild, is 
there anyone on this list that would be interested in taking it on? If so, 
please drop me a note off-list.
Phillip. 

 


[-- Attachment #1.2: Type: text/html, Size: 6820 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]           ` <20150127181016.GB5844-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
  2015-01-27 18:19             ` Phillip Smith
@ 2015-01-27 20:01             ` Phillip Smith
       [not found]               ` <cb88bfc2-97c1-4d3d-a7d3-3140b8086cb5-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Phillip Smith @ 2015-01-27 20:01 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


[-- Attachment #1.1: Type: text/plain, Size: 2020 bytes --]



On Tuesday, January 27, 2015 at 10:10:31 AM UTC-8, John MacFarlane wrote:
>
> +++ Phillip Smith [Jan 26 15 14:36 ]: 
> >Let me perhaps re-phrase my question: What have been the barriers that 
> have prevented an odt reader from being added before? 
> > 
> >I'm curious why so many readers are available, but not odt? Are there 
> obstacles that are well-known and hard to overcome? 
>
> No.  It has just been waiting for somebody to have an itch severe enough 
> to need scratching.  (Note that you might get decent results using 
> libreoffice to do HTML or docbook export, and running that through pandoc.) 
>

We need it to be scripted, so I'm not sure that would work... (I'm 
currently trying to find documentation for the lowriter library. Any 
pointers appreciated.)

Basically, we're working on a "book" of sorts in GitHub. However, many of
the contributors and editors are not (yet) comfortable editing Markdown
files. So we're working to rig up a simple workflow that listens for
commits to the GitHub repository (using a web hook) and then generates
what's needed, i.e., if the editor commits a .docx, a .md and an .odt
version of that file are generated and committed back to the repository.

This is working nicely for .md and .docx files -- and, frankly, those
are the main files being actively edited at the moment -- but I'm sure that
some folks will prefer .odt and we'd like to have the workflow support that.
Phillip. 


[-- Attachment #1.2: Type: text/html, Size: 2715 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]               ` <cb88bfc2-97c1-4d3d-a7d3-3140b8086cb5-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-01-27 20:08                 ` Jesse Rosenthal
  2015-01-27 23:52                 ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
  1 sibling, 0 replies; 15+ messages in thread
From: Jesse Rosenthal @ 2015-01-27 20:08 UTC (permalink / raw)
  To: Phillip Smith, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Phillip Smith <phillipadsmith-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> We need it to be scripted, so I'm not sure that would work... (I'm 
> currently trying to find documentation for the lowriter library. Any 
> pointers appreciated.)

unoconv could take care of this for you:

http://dag.wiee.rs/home-made/unoconv/

(available in the repos for most distros, if you're running linux).

unoconv -f html input.odt && pandoc input.html -o output.whatever

I would try out html, docx, and docbook -- see what you have the best
results with. 
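
To compare the three intermediate formats from a script, the two-step conversion could be wrapped roughly as follows. This is a sketch, not a tested recipe: the names `two_step_commands` and `convert` are hypothetical, the `.xml` extension for the DocBook intermediate is an assumption, and it relies on unoconv's `-o` option to name the intermediate file explicitly instead of guessing where unoconv would write it.

```python
import subprocess
from pathlib import Path

# Intermediate format -> file extension used for the temporary file.
# The ".xml" extension for DocBook is an assumption made for this sketch.
EXTENSION = {"html": ".html", "docx": ".docx", "docbook": ".xml"}


def two_step_commands(odt_path, via, final_path):
    """Build the unoconv and pandoc command lines for one intermediate format."""
    odt = Path(odt_path)
    intermediate = odt.with_suffix(EXTENSION[via])
    return [
        # Step 1: let LibreOffice (through unoconv) export the ODT file.
        ["unoconv", "-f", via, "-o", str(intermediate), str(odt)],
        # Step 2: let pandoc read the export and write the final format,
        # which pandoc infers from the extension of final_path.
        ["pandoc", "-f", via, str(intermediate), "-o", str(final_path)],
    ]


def convert(odt_path, via, final_path):
    """Run both steps, stopping at the first command that fails."""
    for command in two_step_commands(odt_path, via, final_path):
        subprocess.run(command, check=True)
```

Calling `convert("input.odt", via, "output.md")` once for each of `html`, `docx`, and `docbook`, with a different output name per run, would make it easy to diff the results and pick the best intermediate.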


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]               ` <e8873929-e613-43f6-98f9-a760a6e33772-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-01-27 23:28                 ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
       [not found]                   ` <49b6d469-ce18-4ca6-a340-426122681018-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg @ 2015-01-27 23:28 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
  Cc: phillipadsmith-Re5JQEeQqe8AvxtiuMwx3w


[-- Attachment #1.1: Type: text/plain, Size: 1352 bytes --]



On Tuesday, January 27, 2015 at 9:00:20 PM UTC+1, Phillip Smith wrote:
[….] 

> I'd be curious to see an example of the "pandoc JSON" if you could point me
> to one (and/or your python prototype). 
>
You can “live”-produce one yourself in a terminal window with the help of 
pandoc, using simple Markdown input:

kp@mbp> pandoc -t json

# Headline

Paragraph.

List:

1. One
1. Two

^D

[{"unMeta":{}},[{"t":"Header","c":[1,["headline",[],[]],[{"t":"Str","c":"Headline"}]]},{"t":"Para","c":[{"t":"Str","c":"Paragraph."}]},{"t":"Para","c":[{"t":"Str","c":"List:"}]},{"t":"OrderedList","c":[[1,{"t":"Decimal","c":[]},{"t":"Period","c":[]}],[[{"t":"Plain","c":[{"t":"Str","c":"One"}]}],[{"t":"Plain","c":[{"t":"Str","c":"Two"}]}]]]}]]
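
To get a feel for that structure, the JSON can be loaded and walked with a few lines of Python. This is a sketch under assumptions: the helper names `split_document` and `plain_text` are hypothetical, the sample is a shortened copy of the output above, and the second envelope shape (an object with `meta` and `blocks` keys) is the one used by newer pandoc releases, not the one shown in this thread.

```python
import json

# Shortened copy of the output above (older envelope: [meta, blocks]).
SAMPLE = (
    '[{"unMeta":{}},[{"t":"Header","c":[1,["headline",[],[]],'
    '[{"t":"Str","c":"Headline"}]]},'
    '{"t":"Para","c":[{"t":"Str","c":"Paragraph."}]}]]'
)


def split_document(doc):
    """Return (meta, blocks) for either the older or the newer envelope."""
    if isinstance(doc, dict):
        # Newer shape: {"pandoc-api-version": [...], "meta": {...}, "blocks": [...]}
        return doc.get("meta", {}), doc["blocks"]
    # Older shape, as shown in this thread: [{"unMeta": {...}}, [blocks]]
    meta, blocks = doc
    return meta.get("unMeta", meta), blocks


def plain_text(node):
    """Recursively collect the text of Str nodes; Space becomes one blank."""
    if isinstance(node, list):
        return "".join(plain_text(child) for child in node)
    if isinstance(node, dict) and "t" in node:
        if node["t"] == "Str":
            return node["c"]
        if node["t"] in ("Space", "SoftBreak"):
            return " "
        return plain_text(node.get("c", []))
    return ""  # numbers, bare strings (ids, classes), and anything else


meta, blocks = split_document(json.loads(SAMPLE))
outline = [(block["t"], plain_text(block)) for block in blocks]
```

Every node is an object with a type tag `t` and contents `c`, so a reader written in any language only has to emit nested objects of this shape and pipe them to `pandoc -f json`.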



[-- Attachment #1.2: Type: text/html, Size: 5030 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]               ` <cb88bfc2-97c1-4d3d-a7d3-3140b8086cb5-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2015-01-27 20:08                 ` Jesse Rosenthal
@ 2015-01-27 23:52                 ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
       [not found]                   ` <eff8dcfa-a407-4ef6-8c3e-0c740ef3a56a-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg @ 2015-01-27 23:52 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


[-- Attachment #1.1: Type: text/plain, Size: 2949 bytes --]



On Tuesday, January 27, 2015 at 9:01:45 PM UTC+1, Phillip Smith wrote:


>
> On Tuesday, January 27, 2015 at 10:10:31 AM UTC-8, John MacFarlane wrote:
>>
>> +++ Phillip Smith [Jan 26 15 14:36 ]: 
>> >Let me perhaps re-phrase my question: What have been the barriers that 
>> have prevented an odt reader from being added before? 
>> > 
>> >I'm curious why so many readers are available, but not odt? Are there 
>> obstacles that are well-known and hard to overcome? 
>>
>> No.  It has just been waiting for somebody to have an itch severe enough 
>> to need scratching.  (Note that you might get decent results using 
>> libreoffice to do HTML or docbook export, and running that through pandoc.) 
>>
>
> We need it to be scripted, so I'm not sure that would work... (I'm 
> currently trying to find documentation for the lowriter library. Any 
> pointers appreciated.)
>
Someone has already posted a pointer to unoconv.

But LibreOffice can also be used on the command line directly to work as an 
export filter. (unoconv is just a sophisticated wrapper around the LO 
command line interface.) 

You can let LO convert to any file format it can export via its GUI. (I use 
it with a Makefile to generate LibreOffice-flavored PDFs from my 
Markdown/Pandoc workflow whenever a LaTeX-flavored PDF is not desired by my 
customer.)

So an ODT file to be converted to HTML would require a command line similar 
to the following (environment assumed to be OSX or Linux or Unix):

# Convert input.odt to XHTML, writing the result to the home directory.
cd /where/LibreOffice/is/installed
./soffice "-env:UserInstallation=file:///tmp/LibO_Conversion__${USER}" \
            --headless \
            --convert-to "html:XHTML Writer File:UTF8" \
            --outdir "${HOME}" \
              input.odt

The -env:... stuff is there to make sure the --headless works independently
from any currently/possibly active GUI instance of LibreOffice. (You can
even use the command line interface to *import* all formats LO can read via
--infilter=....)
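
For a scripted workflow, the same command line can be assembled as an argument list, which sidesteps shell quoting of the filter name. A minimal sketch, assuming `soffice` is on the PATH (or that its full path is passed in); the function name `soffice_convert_command` and its parameters are hypothetical, while the options themselves are the ones used in the command above.

```python
import getpass
import subprocess


def soffice_convert_command(input_path, outdir,
                            convert_to="html:XHTML Writer File:UTF8",
                            soffice="soffice", profile_dir=None):
    """Build the argument list for a headless LibreOffice conversion."""
    if profile_dir is None:
        # Separate profile so the conversion does not collide with a running GUI.
        profile_dir = f"/tmp/LibO_Conversion__{getpass.getuser()}"
    return [
        soffice,
        f"-env:UserInstallation=file://{profile_dir}",
        "--headless",
        "--convert-to", convert_to,
        "--outdir", outdir,
        input_path,
    ]


def run_conversion(input_path, outdir):
    """Run the conversion and raise CalledProcessError if soffice fails."""
    subprocess.run(soffice_convert_command(input_path, outdir), check=True)
```

Because each argument is a separate list element, the spaces in "html:XHTML Writer File:UTF8" need no extra quoting.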

To see an overview of command line options, run ./soffice -help. For more
detailed info about the available import and export filters, see:

   - http://cgit.freedesktop.org/libreoffice/core/tree/filter/source/config/fragments/filters
   - http://ask.libreoffice.org/en/question/2641/convert-to-command-line-parameter/



[-- Attachment #1.2: Type: text/html, Size: 12267 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]                   ` <49b6d469-ce18-4ca6-a340-426122681018-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-01-27 23:57                     ` Phillip Smith
  0 siblings, 0 replies; 15+ messages in thread
From: Phillip Smith @ 2015-01-27 23:57 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
  Cc: phillipadsmith-Re5JQEeQqe8AvxtiuMwx3w


[-- Attachment #1.1: Type: text/plain, Size: 1079 bytes --]



On Tuesday, January 27, 2015 at 3:28:27 PM UTC-8, kurt.p...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org 
wrote:
>
> On Tuesday, January 27, 2015 at 9:00:20 PM UTC+1, Phillip Smith wrote:
> [….] 
>
>> I'd be curious to see an example of the "pandoc JSON" if you could point
>> me to one (and/or your python prototype). 
>>
> You can “live”-produce one yourself in a terminal window with the help of 
> pandoc, using simple Markdown input:
>
Very helpful. Thank you. :)

Phillip. 


[-- Attachment #1.2: Type: text/html, Size: 2220 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Curious: ODT reader
       [not found]                   ` <eff8dcfa-a407-4ef6-8c3e-0c740ef3a56a-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-01-28  0:06                     ` Phillip Smith
  0 siblings, 0 replies; 15+ messages in thread
From: Phillip Smith @ 2015-01-28  0:06 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


[-- Attachment #1.1: Type: text/plain, Size: 3557 bytes --]



On Tuesday, January 27, 2015 at 3:52:02 PM UTC-8, kurt.p...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org 
wrote:
>
> On Tuesday, January 27, 2015 at 9:01:45 PM UTC+1, Phillip Smith wrote:
>
>
>>
>> On Tuesday, January 27, 2015 at 10:10:31 AM UTC-8, John MacFarlane wrote:
>>>
>>> +++ Phillip Smith [Jan 26 15 14:36 ]: 
>>> >Let me perhaps re-phrase my question: What have been the barriers that 
>>> have prevented an odt reader from being added before? 
>>> > 
>>> >I'm curious why so many readers are available, but not odt? Are there 
>>> obstacles that are well-known and hard to overcome? 
>>>
>>> No.  It has just been waiting for somebody to have an itch severe enough 
>>> to need scratching.  (Note that you might get decent results using 
>>> libreoffice to do HTML or docbook export, and running that through pandoc.) 
>>>
>>
>> We need it to be scripted, so I'm not sure that would work... (I'm 
>> currently trying to find documentation for the lowriter library. Any 
>> pointers appreciated.)
>>
> Someone has already posted a pointer to unoconv.
>

Yes. Thank you. I've started experimenting with `unoconv`.
 

> But LibreOffice can also be used on the command line directly to work as 
> an export filter. (unoconv is just a sophisticated wrapper around the LO 
> command line interface.)
>

Okay. I was looking for some documentation, but was hunting for `lowriter`, 
not `soffice`.
 

>  To see an overview of command line options, run ./soffice -help. For 
> more detailed info about the available import and export filters, see:
>
>    - 
>    http://cgit.freedesktop.org/libreoffice/core/tree/filter/source/config/fragments/filters 
>    <http://www.google.com/url?q=http%3A%2F%2Fcgit.freedesktop.org%2Flibreoffice%2Fcore%2Ftree%2Ffilter%2Fsource%2Fconfig%2Ffragments%2Ffilters&sa=D&sntz=1&usg=AFQjCNH_4tlF-KcbBJtbBUtMOVNsxB0Njw> 
>    - 
>    http://ask.libreoffice.org/en/question/2641/convert-to-command-line-parameter/ 
>    <http://www.google.com/url?q=http%3A%2F%2Fask.libreoffice.org%2Fen%2Fquestion%2F2641%2Fconvert-to-command-line-parameter%2F&sa=D&sntz=1&usg=AFQjCNFbjqrYFCqr0_Nujk7yLUrc3a1Eww>
>
Helpful. I'll do some digging here.

The one immediate hurdle I'm seeing is that both LibreOffice (via the GUI) 
and `unoconv` produce HTML output containing data that we don't need, e.g., 
classes on headings and page numbers, which then carry over into the 
markdown file.
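
One way to deal with the unwanted heading classes is to clean the HTML 
before handing it to pandoc. The following is only a minimal sketch using 
Python's standard library, not something taken from this thread: the names 
`AttributeStripper` and `strip_attributes` are made up for illustration, and 
it only drops attributes. It does not try to remove page numbers, since the 
markup used for them is not shown here.

```python
from html import escape
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit HTML unchanged, except that selected attributes are dropped."""

    def __init__(self, drop=("class",)):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.drop = set(drop)
        self.parts = []

    def _open_tag(self, tag, attrs, closing=">"):
        # Attribute values arrive unescaped, so escape them again on output.
        kept = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name not in self.drop
        )
        self.parts.append(f"<{tag}{kept}{closing}")

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_attributes(html_text, drop=("class",)):
    """Return html_text with every attribute named in `drop` removed."""
    parser = AttributeStripper(drop)
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

Running the exported HTML through `strip_attributes` before the pandoc step 
should keep the heading classes out of the markdown. Passing something like 
`drop=("class", "style")` would remove inline styles as well.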

I'm reluctant to go too far down the path of developing a less flexible 
two-step approach (odt -> html, then html -> markdown/docx) when it seems 
like there might be an option to create a new reader for .odt that would 
handle this more directly and elegantly.
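
For reference, the two-step approach could be scripted roughly as follows. 
This is a hedged sketch rather than a tested workflow: it assumes `soffice` 
and `pandoc` are both on the PATH, relies on the LibreOffice `--headless`, 
`--convert-to` and `--outdir` options and on pandoc's `-f`, `-t` and `-o` 
options, and the function names `build_commands` and `odt_to_format` are 
invented for this example.

```python
import subprocess
from pathlib import Path


def build_commands(odt_path, output_path, to_format="markdown", workdir="."):
    """Return the two command lines: odt -> html, then html -> to_format."""
    odt = Path(odt_path)
    # LibreOffice writes <stem>.html into the directory given by --outdir.
    html = Path(workdir) / f"{odt.stem}.html"
    to_html = [
        "soffice", "--headless",
        "--convert-to", "html",
        "--outdir", str(workdir),
        str(odt),
    ]
    to_target = [
        "pandoc", "-f", "html", "-t", to_format,
        "-o", str(output_path),
        str(html),
    ]
    return to_html, to_target


def odt_to_format(odt_path, output_path, to_format="markdown", workdir="."):
    """Run both steps in order, raising CalledProcessError if either fails."""
    for command in build_commands(odt_path, output_path, to_format, workdir):
        subprocess.run(command, check=True)
```

For example, `odt_to_format("report.odt", "report.md")` would run the 
LibreOffice export and then pandoc. Keeping the command construction in its 
own function makes it possible to inspect or log the commands without 
running anything.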

My colleague is going to take a closer look at the docx reader this week. 
Still open to the idea of a bounty if anyone's got the interest and time.

Many thanks for all the help so far. Greatly appreciated.

Phillip.


[-- Attachment #1.2: Type: text/html, Size: 12859 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-01-28  0:06 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-26 22:16 Curious: ODT reader Phillip Smith
     [not found] ` <4fef1220-23ec-441c-9e42-41ef29d6f1ea-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2015-01-26 22:42   ` Raniere Silva
2015-01-26 22:36     ` Phillip Smith
     [not found]       ` <7EE5FAC3-481F-468F-AFE1-E898FC1E5387-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-01-27 16:27         ` Jesse Rosenthal
     [not found]           ` <87ppa0rxyd.fsf-4GNroTWusrE@public.gmane.org>
2015-01-27 16:53             ` Phillip Smith
2015-01-27 20:00             ` Phillip Smith
     [not found]               ` <e8873929-e613-43f6-98f9-a760a6e33772-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2015-01-27 23:28                 ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
     [not found]                   ` <49b6d469-ce18-4ca6-a340-426122681018-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2015-01-27 23:57                     ` Phillip Smith
2015-01-27 18:10         ` John MacFarlane
     [not found]           ` <20150127181016.GB5844-nFAEphtLEs/fysO+viCLMa55KtNWUUjk@public.gmane.org>
2015-01-27 18:19             ` Phillip Smith
2015-01-27 20:01             ` Phillip Smith
     [not found]               ` <cb88bfc2-97c1-4d3d-a7d3-3140b8086cb5-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2015-01-27 20:08                 ` Jesse Rosenthal
2015-01-27 23:52                 ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
     [not found]                   ` <eff8dcfa-a407-4ef6-8c3e-0c740ef3a56a-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2015-01-28  0:06                     ` Phillip Smith
2015-01-27 16:17     ` John MacFarlane
