* preserving blockquote spaces when converting from docx
@ 2017-10-30 14:39 Stefano Zacchiroli
[not found] ` <5d590149-457b-4de3-b863-57f70366e6d9-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
0 siblings, 1 reply; 3+ messages in thread
From: Stefano Zacchiroli @ 2017-10-30 14:39 UTC (permalink / raw)
To: pandoc-discuss
Heya,
I'm using pandoc to convert documents exported from Google Docs in docx
format to reStructuredText (or anything else, really: the issue I'm facing
seems independent of the output format).
A concrete example is this document, which uses indented paragraphs for
code samples:
https://docs.google.com/document/d/1wAMVrKIA2qtRGmoVDSUBJGmYZSygUaR0uOMW1GV3YE0/edit#heading=h.9zjhwskw53j8
I'm trying to preserve the spaces used for indentation in those code
samples, but I'm failing to convince pandoc to do so.
The spaces I'm interested in are there in the docx. Here's an excerpt from
its xml markup:
<w:r w:rsidDel="00000000" w:rsidR="00000000" w:rsidRPr="00000000">
<w:rPr>
<w:rFonts w:ascii="Consolas" w:cs="Consolas"
w:eastAsia="Consolas" w:hAnsi="Consolas"/>
<w:sz w:val="20"/>
<w:szCs w:val="20"/>
<w:rtl w:val="0"/>
</w:rPr>
<w:t xml:space="preserve">2015-01-01 * "Taxi home from concert in
Brooklyn"</w:t>
<w:br w:type="textWrapping"/>
<w:t xml:space="preserve"> Assets:Cash -20 USD ; inline
comment</w:t>
<w:br w:type="textWrapping"/>
<w:t xml:space="preserve"> Expenses:Taxi</w:t>
</w:r>
Pandoc recognizes those paragraphs as blockquotes (not sure why, maybe
on the basis of their indentation?), e.g.:
,BlockQuote
[Para [Strong [Str ";",Space,Str "I",Space,Str "paid",Space,Str
"and",Space,Str "left",Space,Str "the",Space,Str "taxi,",Space,Str
"forgot",Space,Str "to",Space,Str "take",Space,Str "change,",Space,Str
"it",Space,Str "was",Space,Str "cold.",LineBreak],Str
"2015-01-01",Space,Str "*",Space,Str "\"Taxi",Space,Str "home",Space,Str
"from",Space,Str "concert",Space,Str "in",Space,Str
"Brooklyn\"",LineBreak,Str "Assets:Cash",Space,Str "-20",Space,Str
"USD",Space,Str ";",Space,Str "inline",Space,Str "comment",LineBreak,Str
"Expenses:Taxi"]]
but note how it has normalized spaces to single Space tokens used as
separators.
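To make the effect concrete, here is a small Python model of the normalization visible in the AST above. It is only an approximation of the observed output, not pandoc's actual reader code, and `to_inlines` is a hypothetical name used for illustration.

```python
def to_inlines(line):
    """Model the observed AST: words become Str, whitespace runs become one Space.

    This only mimics the output shown above; it is not how pandoc is implemented.
    """
    inlines = []
    for word in line.split():  # split() drops leading/trailing whitespace
        if inlines:
            inlines.append("Space")  # any run of spaces collapses to one token
        inlines.append(("Str", word))
    return inlines
```

With this model, an indented line such as `"  Expenses:Taxi"` yields a single `Str` with no trace of the indentation, which matches what the AST shows after each `LineBreak`.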
Is there a way to tell pandoc to preserve those spaces, which are valuable
to me, in anything it decides is a BlockQuote?
(Note that this happens before external filters are called, so AFAICT I
can't work around this using --filter)
Many thanks in advance,
Cheers.
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/5d590149-457b-4de3-b863-57f70366e6d9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: preserving blockquote spaces when converting from docx
[not found] ` <5d590149-457b-4de3-b863-57f70366e6d9-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-10-30 17:09 ` John MACFARLANE
2017-10-31 7:52 ` Stefano Zacchiroli
0 siblings, 1 reply; 3+ messages in thread
From: John MACFARLANE @ 2017-10-30 17:09 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
You might try using nonbreaking spaces in the docx;
pandoc should preserve those.
* Re: preserving blockquote spaces when converting from docx
2017-10-30 17:09 ` John MACFARLANE
@ 2017-10-31 7:52 ` Stefano Zacchiroli
0 siblings, 0 replies; 3+ messages in thread
From: Stefano Zacchiroli @ 2017-10-31 7:52 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
On Mon, Oct 30, 2017 at 10:09:10AM -0700, John MACFARLANE wrote:
> You might try using nonbreaking spaces in the docx;
> pandoc should preserve those.
Thanks for your answer! As I understand it, that would indeed work, but
unfortunately I don't control the documents myself. Also, it would be
annoying to change those spaces into nbsp, which conceptually they
aren't, just to make pandoc preserve them.
Is there really no way to hook into pandoc *before* it decides to remove
them?
Alternatively, what policy does pandoc use to decide that those
paragraphs are blockquotes? If nothing else works, I can write a docx
pre-filter that adds the nbsp automatically, but I need to be sure that
it does so only where pandoc will "see" blockquotes.
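A minimal sketch of such a docx pre-filter, in Python with only the standard library. The function names are hypothetical, and it rests on several assumptions: word/document.xml is UTF-8, the text of interest sits in plain <w:t> elements, and only runs of two or more spaces should be protected. It does not know which paragraphs pandoc will treat as blockquotes, so it converts such runs everywhere in the document body.

```python
import re
import zipfile

NBSP = "\u00a0"

# Matches <w:t ...>text</w:t>; does not match <w:tab/> or <w:tbl>.
W_T = re.compile(r"(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)")


def protect_spaces(text):
    """Replace every run of two or more spaces with no-break spaces."""
    return re.sub(r" {2,}", lambda m: NBSP * len(m.group()), text)


def protect_document_xml(xml):
    """Apply protect_spaces to the text of each <w:t> element."""
    return W_T.sub(
        lambda m: m.group(1) + protect_spaces(m.group(2)) + m.group(3), xml
    )


def rewrite_docx(src, dst):
    """Copy a docx, rewriting only word/document.xml (assumed UTF-8)."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                xml = protect_document_xml(data.decode("utf-8"))
                data = xml.encode("utf-8")
            zout.writestr(item, data)
```

Usage would be along the lines of `rewrite_docx("in.docx", "out.docx")` followed by `pandoc out.docx -t rst`. Single spaces are deliberately left alone so that ordinary word separators in running text are not turned into nbsp.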
TIA,
Cheers.
--
Stefano Zacchiroli . zack-CfJEcLwHECWjKv3TNrM5DQ@public.gmane.org . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »