From: Peter Sefton
Newsgroups: gmane.text.pandoc
Subject: Re: Please give the Docx reader a test drive
Date: Fri, 15 Aug 2014 15:37:53 +1000

Sorry all, I managed to hit send prematurely, so this post is garbled. I'll get back to you!


On Fri, Aug 15, 2014 at 3:36 PM, Peter Sefton <ptsefton-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Hi Jesse,

Thanks for this - it's looking pretty good. Attached is a test document where I don't think the output is right.

The markdown I'm looking for is:

```

Test document – multi paragraph lists

-   First level bullet

-   First level bullet

> Part of above bullet

> Quote

-   First level bullet

    Quote

```

What I get is:
```
Test document – multi paragraph lists

-   First level bullet

-   First level bullet

> Part of above bullet

-   First level bullet

```

I think the behaviour should be to make anything indented to the same level as the text of a list paragraph part of that list item.
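A minimal Python sketch of the rule proposed above, under a hypothetical simplified model in which each docx paragraph has already been reduced to a kind (`bullet` or `para`), the indentation of its text, and the text itself. `Para`, `Item` and `group_list_items` are illustrative names, not part of pandoc (whose docx reader is written in Haskell), and the indentation values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Para:
    kind: str    # "bullet" or "para" (hypothetical simplified model)
    indent: int  # indentation of the paragraph's text (made-up units)
    text: str


@dataclass
class Item:
    text_indent: int                            # indentation of the bullet's text
    blocks: list = field(default_factory=list)  # paragraphs belonging to the item


def group_list_items(paras):
    """Attach paragraphs indented to a bullet's text level to that bullet."""
    out = []        # top-level blocks: Item objects or plain paragraph strings
    current = None  # the list item that may still receive continuations
    for p in paras:
        if p.kind == "bullet":
            current = Item(text_indent=p.indent, blocks=[p.text])
            out.append(current)
        elif current is not None and p.indent >= current.text_indent:
            current.blocks.append(p.text)  # same indentation: part of the item
        else:
            current = None  # less indented: the open item is finished
            out.append(p.text)
    return out


# Hypothetical indentation values for the test document's paragraphs.
doc = [
    Para("bullet", 720, "First level bullet"),
    Para("bullet", 720, "First level bullet"),
    Para("para", 720, "Part of above bullet"),
    Para("para", 0, "Quote"),
    Para("bullet", 720, "First level bullet"),
    Para("para", 720, "Quote"),
]
for block in group_list_items(doc):
    print(block)
```

With these made-up values, the second item picks up "Part of above bullet", the unindented "Quote" stays a separate block, and the last item keeps its indented "Quote".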



On Wed, Aug 13, 2014 at 2:27 PM, Jesse Rosenthal <jrosenthal@jhu.edu> wrote:
Dear Andrew,

Thanks so much -- this was *extremely* helpful. I haven't solved all the
issues you brought up, but I've solved a number of them (I hope you
don't mind, but they're on my "dunning-fixes" branch). I've attached a
markdown version of the new and improved output, in case you want to
compare without pulling and building.

Almost all of the issues have, I think, been fixed. (Individual notes
below, including why a couple probably can't be fixed.) May I ask your
permission to cut out chunks of this to use for test cases?

On the individual issues:

> - 18/19, 1123, 1130: Not quite sure what ‘<span
> class="anchor"></span>’ is for.

Has to do with how docx does header anchors. I had been ignoring anchor
spans with no id. Fixed.

> - 83 to 120: Not sure if there’s a better way of dealing with this
> list. It’s pretty non-standard (should be a definition list), so
> probably not.

I don't quite see how. It's not a list, or at least docx doesn't think
it is, so it just ends up being treated like weird paragraphs. And,
unfortunately, we currently collapse tabs into spaces. That could
be rethought if it's clear that tabs are used as you use them here.

> - 188/89 (line in the output file): 'De uiris illustribus' italicized
> in Word, but reduced to the colon; something similar happens at lines
> 934 and 944. It looks as if italics are not applied if an ‘Italic’
> character style is applied?

I hadn't been interpreting this sort of character style before, since
italics usually just come from the ctrl-i setting. I now interpret "Italic"
and "Bold". I'll keep an eye out for others to support as well.
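A minimal Python sketch of interpreting these character styles on a run element parsed from `word/document.xml` with `xml.etree.ElementTree`, assuming the style IDs are literally `Italic` and `Bold`. `run_formats` and `STYLE_FORMATS` are hypothetical names; this is not how pandoc's Haskell reader is structured.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Character style IDs mapped to formatting (the two named in the message).
STYLE_FORMATS = {"Italic": "italic", "Bold": "bold"}


def run_formats(run):
    """Formats for a <w:r>, taken from its character style (w:rStyle)
    and from direct <w:i/> / <w:b/> elements. The on/off w:val attribute
    of direct formatting is deliberately ignored in this sketch."""
    formats = set()
    rpr = run.find("w:rPr", NS)
    if rpr is None:
        return formats
    style = rpr.find("w:rStyle", NS)
    if style is not None and style.get(f"{{{W}}}val") in STYLE_FORMATS:
        formats.add(STYLE_FORMATS[style.get(f"{{{W}}}val")])
    if rpr.find("w:i", NS) is not None:
        formats.add("italic")
    if rpr.find("w:b", NS) is not None:
        formats.add("bold")
    return formats


styled = ET.fromstring(
    f'<w:r xmlns:w="{W}"><w:rPr><w:rStyle w:val="Italic"/></w:rPr>'
    "<w:t>De uiris illustribus</w:t></w:r>"
)
print(run_formats(styled))
```

A run carrying only the `Italic` character style, and no direct `<w:i/>`, is still reported as italic, which is the case described for lines 188/89.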

> - 191–205, 568–70, 576–79: A block quotation is not picked up, but
> that’s my fault for using a non-standard style name. I only bring it
> up because it seems odd that the one block quotation that was picked
> up was the one that didn’t use my ‘Block Quotation’ style.

I had previously picked up "Quote" and "BlockQuote". I've now added
"BlockQuotation" to the list.
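Matching works on the style ID stored in a paragraph's `w:pStyle` element, not on the display name. The sketch assumes Word wrote the ID `BlockQuotation` for a style named ‘Block Quotation’ (spaces dropped), which is typical but not guaranteed. A minimal Python version with `xml.etree.ElementTree`, where `is_block_quote` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Paragraph style IDs treated as block quotations (the list from the message).
BLOCK_QUOTE_STYLES = {"Quote", "BlockQuote", "BlockQuotation"}


def is_block_quote(par):
    """True if a <w:p> has a paragraph style (w:pStyle) in BLOCK_QUOTE_STYLES."""
    style = par.find("w:pPr/w:pStyle", NS)
    return style is not None and style.get(f"{{{W}}}val") in BLOCK_QUOTE_STYLES


par = ET.fromstring(
    f'<w:p xmlns:w="{W}"><w:pPr><w:pStyle w:val="BlockQuotation"/></w:pPr>'
    "<w:r><w:t>Quoted text</w:t></w:r></w:p>"
)
print(is_block_quote(par))
```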

> - 211, 706: Unexpected phrases italicized.

I hadn't taken into account all the options for the italics tags (the
tag can be present just to tell me *not* to italicize). Anyway, it should
now work.
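In WordprocessingML a toggle property such as `<w:i>` carries an optional `w:val`; when that value is `0`, `false` or `off`, the element is present but switches italics off, so its mere presence is not enough. A minimal Python sketch with `xml.etree.ElementTree`, where `toggle_on` is a hypothetical helper and inheritance from styles is not modelled:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# w:val values that switch a toggle property such as <w:i> off.
OFF_VALUES = {"0", "false", "off"}


def toggle_on(rpr, tag):
    """True if run properties <w:rPr> contain w:<tag> and it is not switched off.
    Inheritance from paragraph/character styles is not modelled here."""
    if rpr is None:
        return False
    el = rpr.find(f"w:{tag}", NS)
    if el is None:
        return False
    val = el.get(f"{{{W}}}val")  # an absent w:val means "on"
    return val is None or val.lower() not in OFF_VALUES


rpr = ET.fromstring(f'<w:rPr xmlns:w="{W}"><w:i w:val="0"/><w:b/></w:rPr>')
print(toggle_on(rpr, "i"), toggle_on(rpr, "b"))
```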

> - 300: Adjacent styles for small capitals should perhaps be combined?

Bug, plain and simple.

> - 349, 376, 557, 558 (etc.): Space after a word set in small caps:
> this is surely a problem in the original file and fixing it may have
> issues, but it would be really neat if this could be cleaned up.

Cleaned up now. This was a symptom of the above bug.
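Combining adjacent small-caps styles can be pictured as merging consecutive runs with identical formatting before converting them. A minimal Python sketch over a hypothetical list of `(formats, text)` pairs; pandoc's own types and code differ, and whether this mirrors the actual fix exactly is an assumption.

```python
from itertools import groupby


def merge_runs(runs):
    """Merge consecutive (formats, text) runs whose formats are equal,
    so e.g. two adjacent small-caps runs become a single span."""
    merged = []
    for formats, group in groupby(runs, key=lambda run: run[0]):
        merged.append((formats, "".join(text for _, text in group)))
    return merged


SC = frozenset({"smallcaps"})
runs = [(SC, "small"), (SC, " caps "), (frozenset(), "then plain")]
print(merge_runs(runs))
```

`groupby` only groups neighbours, so identically formatted runs separated by differently formatted text stay separate, which is the intended behaviour here.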

> - The reader sometimes applies italics to headings (704, 880, etc.)
> and sometimes doesn’t (it’s part of the paragraph style), but I
> imagine this is an inconsistency in the source document.

I don't know if there's a way to solve this. The text is manually
italicized, so I couldn't know that it's not a foreign word, or a book
title, or something. Were it *just* the paragraph style, I think it
would come out unitalicized.

Thanks again,
Jesse




--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/m1k36dt3nx.fsf%40jhu.edu.
For more options, visit https://groups.google.com/d/optout.




--

Peter Sefton +61410326955 pt-uoIRqaBSbk9Wk0Htik3J/w@public.gmane.org http://ptsefton.com
Gmail, Twitter & Skype name: ptsefton



