From: Phillip Smith
Newsgroups: gmane.text.pandoc
Subject: Re: Curious: ODT reader
Date: Tue, 27 Jan 2015 16:06:34 -0800 (PST)
Message-ID: <8a1fd1ad-bce5-4ddc-8451-b3199eea6375@googlegroups.com>
References: <4fef1220-23ec-441c-9e42-41ef29d6f1ea@googlegroups.com>
 <20150126224239.GA30710@pupunha>
 <7EE5FAC3-481F-468F-AFE1-E898FC1E5387@gmail.com>
 <20150127181016.GB5844@protagoras.berkeley.edu>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

On Tuesday, January 27, 2015 at 3:52:02 PM UTC-8,
kurt.p...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org wrote:
>
> On Tuesday, January 27, 2015 at 9:01:45 PM UTC+1, Phillip Smith wrote:
>
>> On Tuesday, January 27, 2015 at 10:10:31 AM UTC-8, John MacFarlane wrote:
>>>
>>> +++ Phillip Smith [Jan 26 15 14:36 ]:
>>> >Let me perhaps re-phrase my question: What have been the barriers that
>>> >have prevented an odt reader from being added before?
>>> >
>>> >I'm curious why so many readers are available, but not odt? Are there
>>> >obstacles that are well-known and hard to overcome?
>>>
>>> No. It has just been waiting for somebody to have an itch severe enough
>>> to need scratching. (Note that you might get decent results using
>>> libreoffice to do HTML or docbook export, and running that through
>>> pandoc.)
>>
>> We need it to be scripted, so I'm not sure that would work... (I'm
>> currently trying to find documentation for the lowriter library. Any
>> pointers appreciated.)
>
> Someone has already posted a pointer to unoconv.

Yes. Thank you. I've started experimenting with `unoconv`.

> But LibreOffice can also be used on the command line directly to work as
> an export filter. (unoconv is just a sophisticated wrapper around the LO
> command line interface.)

Okay. I was looking for some documentation but was hunting for
`lowriter`, not `soffice`.

> To see an overview of command line options, run ./soffice -help. For
> more detailed info about the available import and export filters, see:
>
> - http://cgit.freedesktop.org/libreoffice/core/tree/filter/source/config/fragments/filters
> - http://ask.libreoffice.org/en/question/2641/convert-to-command-line-parameter/

Helpful. I'll do some digging here.

The one immediate hurdle I'm seeing is that both LO (via GUI) and
`unoconv` produce HTML output that contains data that we don't need,
e.g., classes on headings and page numbers, which subsequently get
added to the markdown file.

I'm reluctant to start too far down the path of developing a less
flexible two-step approach (odt -> html, then html -> markdown/docx)
when it seems like there might be an option to create a new reader for
.odt that would handle this more directly and elegantly.

My colleague is going to take a closer look at the docx reader this
week. Still open to the idea of a bounty if anyone's got the interest
and time.

Many thanks for all the help so far. Greatly appreciated.

Phillip.
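For reference, the two-step route mentioned above (odt -> html, then
html -> markdown) could be scripted roughly as follows. This is only a
minimal sketch, not something posted in the thread: it assumes `soffice`
and `pandoc` are both on the PATH and that the exported HTML is UTF-8.
The function names are hypothetical, and the regex cleanup only covers
the heading-attribute problem, not the page numbers.

```python
import re
import subprocess
from pathlib import Path

# Matches an opening <h1>..<h6> tag together with any attributes it carries.
_HEADING_TAG = re.compile(r"<(h[1-6])\b[^>]*>", re.IGNORECASE)


def strip_heading_attributes(html: str) -> str:
    """Drop every attribute (class, id, style, ...) from heading open tags."""
    return _HEADING_TAG.sub(lambda m: f"<{m.group(1)}>", html)


def build_commands(odt_path: Path, work_dir: Path, md_path: Path) -> list[list[str]]:
    """Return argv lists for the two steps: odt -> html, then html -> markdown."""
    # LibreOffice writes <stem>.html into the directory given by --outdir.
    html_path = work_dir / (odt_path.stem + ".html")
    return [
        ["soffice", "--headless", "--convert-to", "html",
         "--outdir", str(work_dir), str(odt_path)],
        ["pandoc", "-f", "html", "-t", "markdown",
         "-o", str(md_path), str(html_path)],
    ]


def convert(odt_path: Path, work_dir: Path, md_path: Path) -> None:
    """Run both steps, cleaning the intermediate HTML in between."""
    to_html, to_markdown = build_commands(odt_path, work_dir, md_path)
    subprocess.run(to_html, check=True)
    html_path = work_dir / (odt_path.stem + ".html")
    # Assumption: the export is UTF-8; adjust the encoding if it is not.
    cleaned = strip_heading_attributes(html_path.read_text(encoding="utf-8"))
    html_path.write_text(cleaned, encoding="utf-8")
    subprocess.run(to_markdown, check=True)
```

Calling `convert(Path("report.odt"), Path("out"), Path("report.md"))`
would run both tools in sequence. A regex is a blunt instrument for
HTML (it would trip on an attribute value containing `>`), which is
part of why a native .odt reader still looks like the cleaner option.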

