public inbox archive for pandoc-discuss@googlegroups.com
* Pandoc don't support 'span style' in table when convert from mediawiki to dokuwiki
From: Jason Wang @ 2019-07-24  5:09 UTC
  To: pandoc-discuss

I want to convert a MediaWiki page to a DokuWiki page,
but I have not found any command-line option that handles the following case:
the 'span style' markup inside a table is not converted to normal formatting.
Test case:

MediaWiki format
=======
{| style="width: 500px" cellspacing="1" cellpadding="1" border="1"
|-
| '''Project'''<br/>
| '''Product'''<br/>
|-
| X1<br/>
| MZ<br/>
|-
| X2<br/>
| DS<br/>
|-
| X3<br/>
| H1<br/>
|-
| <span style="color:#ff0000">X4</span><br/>
| <span style="color:#ff0000">YA</span><br/>
|-
| X5<br/>
| G1<br/>
|-
| X6<br/>
| BA<br/>
|}
======

Output when converted to DokuWiki (try it online at https://pandoc.org/try/):

=======

|**Project**\\                                                     |**Product**\\                                                     |
|X1\\                                                              |MZ\\                                                              |
|X2\\                                                              |DS\\                                                              |
|X3\\                                                              |H1\\                                                              |
|<html><span style="color:#ff0000"></html>X4<html></span></html>\\ |<html><span style="color:#ff0000"></html>YA<html></span></html>\\ |
|X5\\                                                              |G1\\                                                              |
|X6\\                                                              |BA\\                                                              |

======

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/b7a02e85-4e10-473a-90b9-74cb2ff41bca%40googlegroups.com.

* Re: Pandoc don't support 'span style' in table when convert from mediawiki to dokuwiki
From: Benct Philip Jonsson @ 2019-07-24 10:06 UTC
  To: pandoc-discuss

On 2019-07-24 07:09, Jason Wang wrote:
> I want to convert a MediaWiki page to a DokuWiki page,
> but I have not found any command-line option that handles the following case:
> the 'span style' markup inside a table is not converted to normal formatting.

For some reason Pandoc doesn't parse HTML which is embedded in
MediaWiki, but returns it as raw HTML inlines.  This Lua filter fixes that:

````
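-- real-span.lua -- convert raw HTML spans/divs to native spans/divs in Pandoc
--
--      pandoc --lua-filter real-span.lua ...

-- Re-parse a raw HTML inline that holds a complete <span>...</span>.
function RawInline (elem)
     if 'html' == elem.format then
         if elem.text:match('^%<span[^%>]*%>.-%<%/span%>$') then
             local ast = pandoc.read(elem.text, 'html')
             -- The HTML reader returns blocks; flatten them to inlines.
             return pandoc.utils.blocks_to_inlines(ast.blocks, {pandoc.Space()})
         end
     end
     return nil -- leave every other element unchanged
end

-- Re-parse a raw HTML block that holds a complete <div>...</div>.
function RawBlock (elem)
     if 'html' == elem.format then
         if elem.text:match('^%<div[^%>]*%>.-%<%/div%>$') then
             local ast = pandoc.read(elem.text, 'html')
             return ast.blocks
         end
     end
     return nil -- leave every other element unchanged
end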

-- This software is Copyright (c) 2019 by Benct Philip Jonsson.
-- 
-- This is free software, licensed under:
-- 
--   The MIT (X11) License
-- See <http://www.opensource.org/licenses/mit-license.php>.
````

````sh
$ pandoc --lua-filter real-span.lua -r mediawiki -w dokuwiki example.mediawiki
|**Project**\\ |**Product**\\ |
|X1\\          |MZ\\          |
|X2\\          |DS\\          |
|X3\\          |H1\\          |
|X4\\          |YA\\          |
|X5\\          |G1\\          |
|X6\\          |BA\\          |
````


* Re: Pandoc don't support 'span style' in table when convert from mediawiki to dokuwiki
From: John MacFarlane @ 2019-07-24 15:12 UTC
  To: Benct Philip Jonsson, pandoc-discuss

Benct Philip Jonsson <melroch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> On 2019-07-24 07:09, Jason Wang wrote:
>> I want to convert a MediaWiki page to a DokuWiki page,
>> but I have not found any command-line option that handles the following case:
>> the 'span style' markup inside a table is not converted to normal formatting.
>
> For some reason Pandoc doesn't parse HTML which is embedded in
> MediaWiki, but returns it as raw HTML inlines.  This Lua filter fixes that:

"some reason" is this:  pandoc's HTML -> other conversions are,
in general, lossy.  So we assume that if raw HTML is being used,
the author wants it to be conveyed exactly to the target.
This is also how raw HTML and LaTeX are treated in markdown.


* Re: Pandoc don't support 'span style' in table when convert from mediawiki to dokuwiki
From: BP Jonsson @ 2019-07-24 23:10 UTC
  To: pandoc-discuss; +Cc: bpj

That's a good reason. I just wasn't sure (and Pandoc does still parse
`<span>` tags in Markdown, doesn't it?) Sorry if I didn't use the most
idiomatic expression. English still trips me sometimes.

* Re: Pandoc don't support 'span style' in table when convert from mediawiki to dokuwiki
From: John MacFarlane @ 2019-07-25  0:21 UTC
  To: BP Jonsson, pandoc-discuss; +Cc: bpj

BP Jonsson <bpjonsson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> That's a good reason. I just wasn't sure (and Pandoc does still parse
> `<span>` tags in Markdown, doesn't it?) Sorry if I didn't use the most
> idiomatic expression. English still trips me sometimes.

Yes, span and div are special cases, because we have native Span and
Div elements in pandoc's AST.  So we can do lossless conversions here.


* Re: Pandoc don't support 'span style' in table when convert from mediawiki to dokuwiki
From: Jason Wang @ 2019-08-02  5:28 UTC
  To: pandoc-discuss

Dear Benct Philip Jonsson, John MacFarlane:

  Thanks for the explanation and the sample.
  I will try to resolve the other MediaWiki display anomalies.

Thanks

* Re: Pandoc don't support 'span style' in table when convert from mediawiki to dokuwiki
From: Jason Wang @ 2019-08-05  3:14 UTC
  To: pandoc-discuss

I am trying to convert the following table, which is in MediaWiki format:
{| border = "1" width="100%" class="wikitable"
|-
|ABC.
''Dec 27, 2018''
|- 
|}

Pandoc fails to read it, seemingly because the table's attribute line does
not support options other than 'style=xxx':


Error at "source" (line 2, column 4):
unexpected "b"
expecting lf new-line, "!", "<" or "|"
{| border = "1" width="100%" class="wikitable"

Dear Benct Philip Jonsson, John MacFarlane,
For this case, does src/Text/Pandoc/Readers/MediaWiki.hs perhaps need to be
changed? I have not found an option (e.g. --skip-tables or similar) that can
handle it.

* Re: Pandoc don't support 'span style' in table when convert from mediawiki to dokuwiki
From: John MacFarlane @ 2019-08-05  4:01 UTC
  To: Jason Wang, pandoc-discuss


If there is valid mediawiki that causes pandoc's
parser to fail with an error, then I'd consider
that a bug.  Can you report your case on our bug
tracker, https://github.com/jgm/pandoc/issues, so
we don't lose track of it?

* Re: Pandoc don't support 'span style' in table when convert from mediawiki to dokuwiki
From: Jason Wang @ 2019-08-07 11:29 UTC
  To: pandoc-discuss

I found that the root cause is the space between the attribute name (e.g.
border) and the equals sign. I fixed this non-standard formatting before
running pandoc.
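
A minimal sketch of such a preprocessing step, assuming a POSIX shell and a
sed that supports -E. The function name normalize_attrs and the file name
page.mediawiki are hypothetical, and this is not necessarily the exact fix
used here. It removes whitespace around an equals sign only on table-opening
lines (those starting with "{|") and only where a double-quoted value follows:

```sh
# Hypothetical helper: tighten `name = "value"` to `name="value"` on
# MediaWiki table-opening lines (lines that start with "{|").
# All other lines pass through unchanged.
normalize_attrs() {
    sed -E '/^[{][|]/ s/[[:space:]]*=[[:space:]]*"/="/g'
}

# Example with the table line from this thread:
printf '%s\n' '{| border = "1" width="100%" class="wikitable"' | normalize_attrs

# Assumed usage before conversion (page.mediawiki is a placeholder name):
#   normalize_attrs < page.mediawiki | pandoc -f mediawiki -t dokuwiki
```

For the line above this prints {| border="1" width="100%" class="wikitable"
which is the form the reader accepted. The substitution is deliberately
narrow; an attribute value that itself contains an equals sign followed by a
double quote would also be rewritten, so check the result on real pages.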
