public inbox archive for pandoc-discuss@googlegroups.com
* Docx reader: how much to correct word's mistakes?
@ 2014-06-22  5:12 Jesse Rosenthal
From: Jesse Rosenthal @ 2014-06-22  5:12 UTC
  To: pandoc-discuss

Dear All,

One problem that I've been encountering in writing the docx reader is
the fact that MS Word, for reasons known only to itself, will sometimes
decide to unformat spaces between formatted chunks of text. So even
though the text looks like:

    This is *italics and **bold italics***.

Word will for some reason produce something which will translate to

    This is *italics* *and **bold italics***.
or 

    This is *italics and* ***bold italics***.

Now, I know it does this even when you don't tell it to, because I've
written a lot of one-sentence Word docs as tests for the docx reader.

My question, though, is whether folks would want the reader to *assume*
that this is always a mistake and fix it. In other words, are there ever
times that people would want unformatted spaces between formatted runs
of text?

It would be simple enough to have something like:

~~~
f (Emph xs : Space : Emph ys : zs) = f (Emph (xs ++ [Space] ++ ys) : zs)
~~~

(Not exactly how it would work, but close enough). 
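
To make the idea concrete, here is a minimal, self-contained sketch of
such a pass. The `Inline` type below is a cut-down stand-in for pandoc's
own type, and the name `mergeEmph` is made up for illustration; neither
is taken from the actual docx reader.

```haskell
-- Simplified stand-in for pandoc's Inline type (an assumption for this
-- sketch; the real type has many more constructors).
data Inline
  = Str String
  | Space
  | Emph [Inline]
  | Strong [Inline]
  deriving (Eq, Show)

-- Merge adjacent emphasized runs that are separated only by an
-- unformatted space, pulling that space into the merged run.
mergeEmph :: [Inline] -> [Inline]
mergeEmph (Emph xs : Space : Emph ys : zs) =
  -- Feed the merged run back in so that chains of three or more
  -- emphasized runs collapse into one.
  mergeEmph (Emph (xs ++ [Space] ++ ys) : zs)
mergeEmph (Emph xs : zs)   = Emph (mergeEmph xs) : mergeEmph zs
mergeEmph (Strong xs : zs) = Strong (mergeEmph xs) : mergeEmph zs
mergeEmph (x : zs)         = x : mergeEmph zs
mergeEmph []               = []
```

Applied to the inlines for `*italics* *and **bold italics***`, this
yields a single emphasized run equivalent to
`*italics and **bold italics***`. It only handles `Emph`; the same
clause would have to be repeated for `Strong` and any other formatting
one wanted to repair.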

So is this sort of error-correction something pandoc users would want?
Or is it too intrusive?

Best,
Jesse



* Re: Docx reader: how much to correct word's mistakes?
@ 2014-06-22  5:41   ` Matthew Pickering
From: Matthew Pickering @ 2014-06-22  5:41 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I think what you describe is the desired behavior. If I'm not mistaken, this is
how you have already implemented it in the reader?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CALuQ0m_30wYe-CG57_6E4hxxG4-aViURc%2BS5FPx1NtX%3DkdR9fA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


* Retracting my question [was: Docx reader: how much to correct word's mistakes?]
@ 2014-06-22  5:52       ` Jesse Rosenthal
From: Jesse Rosenthal @ 2014-06-22  5:52 UTC
  To: Matthew Pickering, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Yes -- I was actually just about to write back to the list, and
apologize for the noise. I had been trying to fix another problem with
spaces (the fact that people will turn off their formatting after they
type their spaces, not before), and ended up creating another set of
problems. So I unfairly blamed MS Word for my own bug, which is on its
way to being fixed. Apologies, Redmond. And sorry to all of your inboxes
as well.
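
The other problem mentioned here, a trailing space typed before the
formatting is switched off, could be handled by a pass along these
lines. This is only a sketch: `Inline` is a simplified stand-in for
pandoc's type and `hoistTrailingSpace` is a hypothetical name, not the
reader's actual code.

```haskell
import Data.List (dropWhileEnd)

-- Simplified stand-in for pandoc's Inline type (an assumption for this
-- sketch; the real type has many more constructors).
data Inline
  = Str String
  | Space
  | Emph [Inline]
  deriving (Eq, Show)

-- Move spaces at the end of an emphasized run to just after the run, so
-- that Emph [Str "italics", Space] becomes Emph [Str "italics"], Space.
-- Only top-level runs are handled; nested runs are left untouched.
hoistTrailingSpace :: [Inline] -> [Inline]
hoistTrailingSpace = concatMap hoist
  where
    hoist (Emph xs) =
      let body     = dropWhileEnd (== Space) xs
          trailing = drop (length body) xs
      in  -- Drop the Emph wrapper entirely if it held nothing but spaces.
          [Emph body | not (null body)] ++ trailing
    hoist x = [x]
```

Running a pass like this before any merging step keeps the two repairs
separate, which makes it easier to see which one is responsible when
the output looks wrong.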

--Jesse


* Re: Docx reader: how much to correct word's mistakes?
@ 2014-06-22  5:53       ` Ivan Lazar Miljenovic
From: Ivan Lazar Miljenovic @ 2014-06-22  5:53 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Just brainstorming here, but I can think of times when you might not
want it to do so: when it would require formatting to span several
lines, especially if there is a newline before "bold italics".  I
suppose this is the distinction between breaking and non-breaking spaces.

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
http://IvanMiljenovic.wordpress.com

