public inbox archive for pandoc-discuss@googlegroups.com
* DOCX to markdown. Poetry. Keep whitespace and verse structure.
@ 2018-12-28 12:51 Lars Bingchong
  0 siblings, 1 reply; 6+ messages in thread
From: Lars Bingchong @ 2018-12-28 12:51 UTC
  To: pandoc-discuss


Hello ladies and gentlemen. This is my first post in the "pandoc-discuss" 
group. Allow me to explain what I am trying to do.

====TRYING TO====

* Convert a lot of DOCX documents that have text structured like this:

----

Evigt liv til salg

Hvis nyeste forskning besad.
Gralen til evigt liv via morgenmad.
Ville du gå til bords?
Mæske dig i libidoens buffet.

Eller tror du, fordi du tror?
At evigt liv er en Guds givet gave.
Til frit valg på hylde 1.
Et omfavnende selv tak - det var så lidt.

At det eksisterer kan vi ikke bevise.
At det gør kan vi ikke benægte.
Fakta, aktualitetens modstander.
Og aktuelt er det evige liv for os.

På den ene eller den anden led.
Ønsker vi livet bliver ved.
For det er i det levne liv.
At livet giver 4.

Så hvad gør en klog.
Forsøger at leve evigt sæføli.
Om ikke i kød og dundrende mørkt blod.
Så ihukommelse af os selv i andre.

Skakmat
160603
(Genfødsel - evigt liv) - 2660 - kulturweekend

----

When run on a DOCX document like the one above, I would like pandoc to:


   1. Keep the blank lines between the stanzas (verses) and after the first 
   line, which is the title
   2. Keep the stanza structure, so that lines that are not separated by a 
   blank line stay together

====TRIED====


   1. Ran `sudo pandoc -s file.docx -t markdown -o mydoc.md --wrap=none 
   --extract-media .`, which did not do the job.
   2. Searched this discussion group to see whether this had already been 
   solved.
   3. Read through the pandoc documentation. Disclaimer: I have no prior 
   experience with Lua and have not used pandoc much.
   4. Then tried a Lua filter, inspired by this discussion: 
   https://groups.google.com/forum/#!searchin/pandoc-discuss/paragraphs%7Csort:date/pandoc-discuss/wlP6AL11NIY/PxF4d6ilBQAJ
   I modified it a bit and ended up with:
   
```
function Pandoc(doc)
  -- Collect the inlines of every paragraph as one line
  local lines = {}
  for _, b in ipairs(doc.blocks) do
    if b.t == "Para" then
      table.insert(lines, b.content)
    end
  end
  -- Replace the whole document body with a single line block
  return pandoc.Pandoc({pandoc.LineBlock(lines)}, doc.meta)
end
```
This gets the conversion moving in the right direction. The lines no longer come out like this:

```
Evigt liv til salg

Hvis nyeste forskning besad.

Gralen til evigt liv via morgenmad.

Ville du gå til bords?

Mæske dig i libidoens buffet.

Eller tror du, fordi du tror?
```

but like this:

```
| Evigt liv til salg
| Hvis nyeste forskning besad.
| Gralen til evigt liv via morgenmad.
| Ville du gå til bords?
| Mæske dig i libidoens buffet.
| Eller tror du, fordi du tror?
| At evigt liv er en Guds givet gave.
```

However, as stated in the "what I would like" section above, it does not:


   1. Keep the blank lines between the stanzas (verses) and after the first 
   line, which is the title
   2. Keep the stanza structure, so that lines that are not separated by a 
   blank line stay together

----

So I'm looking for help with accomplishing this in a Lua filter, as that 
seems like the right path.

Thank you very much :-) and a happy new year (it's soon :-).


sudo pandoc -s /Volumes/IBIGDATA/IBIG\ Data/Documents/POEMS\ -\ PHILOSOPHIES\ -\ WORDPLAY/FINISHED\ POEMS/DANISH/2016/Evigt\ liv\ til\ salg\ 160603.docx -t markdown -o mydoc.md --wrap=none --extract-media .
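
Since the goal is to convert a lot of DOCX documents, typing this command once per file does not scale. A minimal sketch of a plain-Lua driver that runs the same options (without sudo) for every file passed on the command line; the script name `convert.lua` and the helpers `shell_quote` and `pandoc_command` are hypothetical, and pandoc is assumed to be on the PATH with a POSIX shell (macOS/Linux):

```lua
-- convert.lua (hypothetical name): batch-convert DOCX files to Markdown by
-- running pandoc once per file. Usage: lua convert.lua *.docx
-- Assumes pandoc is on the PATH and commands run through a POSIX shell.

-- Quote a string for a POSIX shell: wrap it in single quotes and turn each
-- embedded single quote into '\'' so paths with spaces are passed intact.
local function shell_quote(s)
  return "'" .. s:gsub("'", "'\\''") .. "'"
end

-- Build the pandoc command for one input file, writing <name>.md next to it.
-- Uses the same options as the single-file command above, minus sudo.
-- `filter` is an optional path to a Lua filter.
local function pandoc_command(input, filter)
  local output = input:gsub("%.docx$", "") .. ".md"
  local parts = {
    "pandoc", "-s", shell_quote(input), "-t", "markdown",
    "-o", shell_quote(output), "--wrap=none", "--extract-media", ".",
  }
  if filter then
    table.insert(parts, "--lua-filter=" .. shell_quote(filter))
  end
  return table.concat(parts, " ")
end

-- Only run when started as a script with at least one file argument.
if arg and arg[1] then
  for _, input in ipairs(arg) do
    local result = os.execute(pandoc_command(input))
    -- Lua 5.1 returns an exit code (0 = success); Lua 5.2+ returns true/nil.
    if result ~= true and result ~= 0 then
      io.stderr:write("pandoc failed for: " .. input .. "\n")
    end
  end
end
```

Quoting each path means names like the one above no longer need every space escaped by hand. Writing each `.md` beside its source (rather than to a fixed `mydoc.md`) is a design choice so that one run does not overwrite the previous file's output.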

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/4dfd9f3f-ca60-40b4-9925-9618ef468000%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: DOCX to markdown. Poetry. Keep whitespace and verse structure.
@ 2018-12-28 15:50   ` mb21
  0 siblings, 1 reply; 6+ messages in thread
From: mb21 @ 2018-12-28 15:50 UTC
  To: pandoc-discuss


You really shouldn't need to use sudo to run pandoc.

About your question: see https://pandoc.org/MANUAL.html#line-blocks


* Re: DOCX to markdown. Poetry. Keep whitespace and verse structure.
@ 2018-12-29 15:39       ` Lars Bingchong
  2019-01-01  2:19       ` Lars Bingchong
  1 sibling, 1 reply; 6+ messages in thread
From: Lars Bingchong @ 2018-12-29 15:39 UTC
  To: pandoc-discuss


Hi @mb21,

Thank you for replying. My thoughts on your reply:

- I would like to avoid editing the DOCX/Word documents; I have over 500 of 
them. I want to convert them while retaining the structure described above.
- I read the line-blocks section you referred to. As you can see, I've 
already converted the output to line blocks with a Lua filter. What I need 
help with is retaining the structure described above. For good measure, 
here are the requirements again:


   1. Keep the blank lines between the stanzas (verses) and after the first 
   line, which is the title
   2. Keep the stanza structure, so that lines that are not separated by a 
   blank line stay together

See my initial post for the full example.

Thank you very much.

* Re: DOCX to markdown. Poetry. Keep whitespace and verse structure.
@ 2018-12-31 18:09           ` BP Jonsson
  0 siblings, 1 reply; 6+ messages in thread
From: BP Jonsson @ 2018-12-31 18:09 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, Lars Bingchong

On 2018-12-29 at 16:39, Lars Bingchong wrote:
> - I would like to avoid editing the docx/word documents. I have over 500
> hundred. So I want to convert them, retain the structure as described above.
> - I read the line-blocks link you refered to. As you can see I've
> successfully converted, via a lua-filter, the output to line-blocks.
> However, I'm looking out for someone to help me with retaining the
> structure as described above. For good measure, here is the pointers again.
> 
> 
>     1. Keep the whitespace between the verses and the first line which is
>     the title
>     2. Keep the verse structure so that lines that are not divided by a
>     whitespace line stay together

Please try this filter: <https://git.io/fhtnX>

Don't forget to read the usage instructions: you must explicitly 
specify `docx+empty_paragraphs` as input format!
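
The filter behind the link is not reproduced in this thread. The following is only a hedged sketch of the same idea, not BP Jonsson's code: it assumes that with `docx+empty_paragraphs` each blank line in the Word file arrives as an empty `Para`, so a filter can start a new `LineBlock` at every empty paragraph. The file name `stanzas.lua` and the helper `is_blank` are hypothetical.

```lua
-- stanzas.lua (hypothetical name): a sketch of a pandoc Lua filter that
-- turns each run of consecutive non-empty paragraphs into one LineBlock.
-- Assumes the input is read with -f docx+empty_paragraphs, so blank lines
-- in the Word file arrive as empty Para elements.

-- A paragraph counts as blank if it contains only whitespace inlines.
local function is_blank(para)
  for _, inline in ipairs(para.content) do
    if inline.t ~= "Space" and inline.t ~= "SoftBreak"
        and inline.t ~= "LineBreak" then
      return false
    end
  end
  return true
end

function Pandoc(doc)
  local blocks = {}  -- the new list of top-level blocks
  local lines = {}   -- lines of the stanza currently being collected

  -- Emit the collected lines (if any) as one LineBlock, then start afresh.
  local function flush()
    if #lines > 0 then
      table.insert(blocks, pandoc.LineBlock(lines))
      lines = {}
    end
  end

  for _, block in ipairs(doc.blocks) do
    if block.t == "Para" then
      if is_blank(block) then
        flush()                             -- blank paragraph ends a stanza
      else
        table.insert(lines, block.content)  -- one paragraph = one line
      end
    else
      flush()                               -- other blocks pass through
      table.insert(blocks, block)
    end
  end
  flush()

  return pandoc.Pandoc(blocks, doc.meta)
end
```

It could be run as `pandoc -f docx+empty_paragraphs -t markdown --wrap=none --lua-filter=stanzas.lua -o poem.md poem.docx`. The title and each stanza then become separate line blocks, and pandoc's Markdown writer separates consecutive blocks with a blank line.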

/bpj


* Re: DOCX to markdown. Poetry. Keep whitespace and verse structure.
@ 2019-01-01  2:17               ` Lars Bingchong
  0 siblings, 0 replies; 6+ messages in thread
From: Lars Bingchong @ 2019-01-01  2:17 UTC
  To: pandoc-discuss


Hi BP Jonsson,

That filter worked wonders. I'm thrilled to have it working. Man, this is 
nice.

Thank you so much for your help. Highly appreciated :-D.

Happy new year.

* Re: DOCX to markdown. Poetry. Keep whitespace and verse structure.
  2018-12-29 15:39       ` Lars Bingchong
@ 2019-01-01  2:19       ` Lars Bingchong
  1 sibling, 0 replies; 6+ messages in thread
From: Lars Bingchong @ 2019-01-01  2:19 UTC
  To: pandoc-discuss


Sure thing, you were right, @mb21: I did not have to run `pandoc` with sudo. 
Always good to avoid that.

Thank you and a happy new year to you.

Thread overview: 6+ messages
2018-12-28 12:51 DOCX to markdown. Poetry. Keep whitespace and verse structure Lars Bingchong
2018-12-28 15:50   ` mb21
2018-12-29 15:39       ` Lars Bingchong
2018-12-31 18:09           ` BP Jonsson
2019-01-01  2:17               ` Lars Bingchong
2019-01-01  2:19       ` Lars Bingchong