Accepted HTML input

public inbox archive for pandoc-discuss@googlegroups.com

* Accepted HTML input
@ 2012-04-09  9:04 mb21
       [not found] ` <da514ded-a9df-448e-9dce-821f0128df79-EyPQ8oKdLiaB2x89WGtKiFYGCWtFR9XvQQ4Iyu8u01E@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-04-09  9:04 UTC (permalink / raw)
  To: pandoc-discuss

Hi,

I was just wondering where I can look up exactly which subset of HTML
is supported as an input format. I couldn't find anything anywhere... in the
source code, maybe?

Thanks a lot!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Accepted HTML input
       [not found] ` <da514ded-a9df-448e-9dce-821f0128df79-EyPQ8oKdLiaB2x89WGtKiFYGCWtFR9XvQQ4Iyu8u01E@public.gmane.org>
@ 2012-04-09 14:41   ` John MacFarlane
       [not found]     ` <20120409144128.GA19039-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2012-05-02  7:55   ` Accepted HTML input mb21
  1 sibling, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-04-09 14:41 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

You could look in src/Text/Pandoc/Readers/HTML.hs;
otherwise, it isn't documented.

+++ mb21 [Apr 09 12 02:04 ]:
> Hi,
> 
> I was just wondering where I can look up exactly what subset of HTML
> is supported as input format. Couldn't find anything anywhere.. in the
> source code maybe..?
> 
> Thanks a lot!
> 
> -- 
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To unsubscribe from this group, send email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> For more options, visit this group at http://groups.google.com/group/pandoc-discuss?hl=en.
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Accepted HTML input
       [not found]     ` <20120409144128.GA19039-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-04-10  8:46       ` mb21
  2012-04-10 19:52         ` XML Serialization of Markdown extended mb21
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-04-10  8:46 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Thanks a lot.

The reason I'm looking into this is that I have a custom XHTML dialect (a
subset of tags plus a few attributes added to enable footnotes, citations,
etc.) which I'd like to convert to a number of formats. So when I
discovered Pandoc, I was really amazed. The question now, however, is how
to get my content into pandoc programmatically. AFAIK, the only input formats
that support all of pandoc's features are its extended Markdown and the
native and JSON serializations of its AST. None of these is much fun to
generate from an XML source: Markdown would require escaping in lots of
scenarios, and the native and JSON formats split everything at word
boundaries.
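To make "split at word boundaries" concrete, here is a minimal Haskell sketch. It assumes a hypothetical, cut-down mirror of pandoc's 2012-era `Inline` type with only `Str` and `Space` (the real type has many more constructors), and `toInlines` is an illustrative helper, not part of pandoc's API. It shows why a generator has to emit one token per word plus explicit `Space` tokens:

```haskell
import Data.List (intercalate)

-- Hypothetical, cut-down mirror of pandoc's Inline type:
-- only the two constructors needed to show word splitting.
data Inline = Str String | Space
  deriving (Show, Eq)

-- Split running text into one Str per word, with a single Space token
-- between words. Runs of whitespace collapse into one Space; leading and
-- trailing whitespace is simply dropped in this sketch.
toInlines :: String -> [Inline]
toInlines = intercalate [Space] . map (\w -> [Str w]) . words

-- ghci> toInlines "Hello  XML world"
-- [Str "Hello",Space,Str "XML",Space,Str "world"]
```

An XSLT or other non-Haskell generator would have to reproduce exactly this tokenization before emitting native or JSON output.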

So what do you guys think is the best way to handle such scenarios? Would
it make sense to define an XML serialization of extended Markdown and write
a Reader for that? (I learned a bit of Haskell once... :P) Then I could use
some XSLT to get from my XML to that intermediate format. Or is either the
native or the JSON format stable enough that I should invest the time to
export to it? (I
[read](http://stackoverflow.com/questions/8770034/converting-ipython-notebook-files-json-based-to-other-formats-with-pandoc)
that they are not documented yet.)

Thanks! It would be great if there were a viable bridge from the XML world
into Pandoc.

On Monday, 9 April 2012 16:41:28 UTC+2, fiddlosopher wrote:
>
> You could look in src/Text/Pandoc/Readers/HTML.hs
> Otherwise it isn't documented.
>
> +++ mb21 [Apr 09 12 02:04 ]:
> > Hi,
> > 
> > I was just wondering where I can look up exactly what subset of HTML
> > is supported as input format. Couldn't find anything anywhere.. in the
> > source code maybe..?
> > 
> > Thanks a lot!
> > 
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* XML Serialization of Markdown extended
  2012-04-10  8:46       ` mb21
@ 2012-04-10 19:52         ` mb21
  2012-04-11  5:08           ` HansBKK
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-04-10 19:52 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Changed to a more appropriate subject. Thanks for any feedback.

> The reason I'm looking into this is that I have a custom XHTML dialect (a
> subset of tags plus a few attributes added to enable footnotes, citations, 
> etc.) which I'd like to convert to a number of formats. So when I 
> discovered Pandoc, I was really amazed. The question now is, however, how 
> to get my stuff programmatically into pandoc. AFAIK, the only input formats 
> that support all of pandoc's goodies are its Markdown extended, and native 
> and json serializations of its AST. Neither of which are that much fun to 
> generate from an XML source; Markdown would require escaping in lots of 
> scenarios and I looked at the native and json formats which split 
> everything at the word boundaries..
>
> So what do you guys think is the best way to handle such scenarios? Would 
> it make sense to define an XML serialization of Markdown extended and write 
> a Reader for that? (I learned a bit of Haskell once.. :P) Then I could use 
> some XSLT to get from my XML to that intermediate format. Or is either the 
> native or json format stable enough that I should invest the time to export 
> to those? (I [read](
> http://stackoverflow.com/questions/8770034/converting-ipython-notebook-files-json-based-to-other-formats-with-pandoc) 
> that those are not documented yet..)
>
> Thanks, would be great if there were a viable bridge from the XML-world 
> into Pandoc!
>
>
> On Monday, 9 April 2012 16:41:28 UTC+2, fiddlosopher wrote:
>>
>> You could look in src/Text/Pandoc/Readers/HTML.hs
>> Otherwise it isn't documented.
>>
>> +++ mb21 [Apr 09 12 02:04 ]:
>> > Hi,
>> > 
>> > I was just wondering where I can look up exactly what subset of HTML
>> > is supported as input format. Couldn't find anything anywhere.. in the
>> > source code maybe..?
>> > 
>> > Thanks a lot!
>> > 
>>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-10 19:52         ` XML Serialization of Markdown extended mb21
@ 2012-04-11  5:08           ` HansBKK
  2012-04-12 10:56             ` mb21
  0 siblings, 1 reply; 29+ messages in thread
From: HansBKK @ 2012-04-11  5:08 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Wednesday, April 11, 2012 2:52:20 AM UTC+7, mb21 wrote:
>
> Changed to a more appropriate subject. Thanks for any feedback.
>

This topic is way above my paygrade, but perhaps even a nonsensical
contribution will stimulate someone with more experience to continue the
conversation.

My understanding is that the "native" and "internal JSON" representations
are relatively stable in practice; however, if John exercised his
prerogative to swap out libraries or otherwise change pandoc's internal
behavior, that would break any work you put into an external "add-on".

I would think it would be a far better investment of your time to fork the 
HTML and markdown readers and writers, and see if you can add to them, so 
that all the features supported in p-markdown which you'd like to 
"standardize" in HTML can survive a two-way roundtrip conversion.

If you do this in an incremental fashion and solicit feedback on specific 
approaches with examples, I think it is quite possible that this could be 
done in a way which will improve pandoc, and possibly help advance the idea 
that such structures can be standardized in the larger XHTML world.

This approach may be more difficult, but IMO would bring a more valuable 
and longer-lasting result than adding yet another intermediate syntax to 
bridge the two formats.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-11  5:08           ` HansBKK
@ 2012-04-12 10:56             ` mb21
  2012-04-12 16:04               ` John MacFarlane
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-04-12 10:56 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Thanks for the feedback. Yes, by "XML serialization of Markdown
extended" I was actually thinking of something like XHTML plus a few new
tags for footnotes etc., or maybe a subset of DocBook. But maybe most folks
around here don't think it's a good idea to define such a thing, since they
just write a new Reader for every XML dialect they want to import into
pandoc? The thing is, pandoc would be way more useful for people who are
not Haskell wizards if there were such a well-defined XML input format...

On Wednesday, 11 April 2012 07:08:41 UTC+2, HansBKK wrote:
>
> On Wednesday, April 11, 2012 2:52:20 AM UTC+7, mb21 wrote:
>>
>> Changed to a more appropriate subject. Thanks for any feedback.
>>
>
> This topic is way above my paygrade, but perhaps even a nonsensical
> contribution will stimulate someone with more experience to continue the
> conversation.
>
> My understanding is that the "native" and "internal JSON" representations 
> are relatively stable in practice, however if John exercised his 
> prerogative to swap out libraries or otherwise change pandoc's internal 
> behavior, that would break any work you put into an external "add-on".
>
> I would think it would be a far better investment of your time to fork the 
> HTML and markdown readers and writers, and see if you can add to them, so 
> that all the features supported in p-markdown which you'd like to 
> "standardize" in HTML can survive a two-way roundtrip conversion.
>
> If you do this in an incremental fashion and solicit feedback on specific 
> approaches with examples, I think it is quite possible that this could be 
> done in a way which will improve pandoc, and possibly help advance the idea 
> that such structures can be standardized in the larger XHTML world.
>
> This approach may be more difficult, but IMO would bring a more valuable 
> and longer-lasting result than adding yet another intermediate syntax to 
> bridge the two formats.
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-12 10:56             ` mb21
@ 2012-04-12 16:04               ` John MacFarlane
       [not found]                 ` <20120412160409.GB28518-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-04-12 16:04 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ mb21 [Apr 12 12 03:56 ]:
>    Thanks for the feedback. Yes, with that "XML Serialization of markdown
>    extended" I was actually thinking of something like XHTML plus a few
>    new tags for footnotes etc, or maybe a subset of DocBook. But maybe
>    most folks around here don't think it's a good idea to define such a
>    thing as they just write a new Reader for every XML dialect they want
>    to import into pandoc? The thing is, pandoc would be way more useful
>    for people that are no Haskell-wizards if there were such a well
>    defined XML input format...

There are a couple of packages on HackageDB that would allow
automatic serialization of pandoc's data structures (and of any
instance of the Data typeclass) to and from XML.  But they
don't seem to be well maintained: neither currently builds for
me.

http://hackage.haskell.org/package/text-xml-generic
http://hackage.haskell.org/package/generic-xml

I think it should be possible to use DrIFT and HaXml, though.
I'll look into it.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
       [not found]                 ` <20120412160409.GB28518-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-04-12 16:45                   ` John MacFarlane
       [not found]                     ` <20120412164509.GD28518-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-04-12 16:45 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ John MacFarlane [Apr 12 12 09:04 ]:
> +++ mb21 [Apr 12 12 03:56 ]:
> >    Thanks for the feedback. Yes, with that "XML Serialization of markdown
> >    extended" I was actually thinking of something like XHTML plus a few
> >    new tags for footnotes etc, or maybe a subset of DocBook. But maybe
> >    most folks around here don't think it's a good idea to define such a
> >    thing as they just write a new Reader for every XML dialect they want
> >    to import into pandoc? The thing is, pandoc would be way more useful
> >    for people that are no Haskell-wizards if there were such a well
> >    defined XML input format...
> 
> There are a couple of packages on hackagedb that would allow
> automatic serialization of pandoc's data structures (and any
> instance of the Data typeclass) to and from XML.  But they
> don't seem to be well maintained -- neither currently builds for
> me.
> 
> http://hackage.haskell.org/package/text-xml-generic
> http://hackage.haskell.org/package/generic-xml
> 
> I think it should be possible to use DrIFT and HaXML, though.
> I'll look into it.

I didn't have much success with this.  Does anyone know of a simple
way to serialize instances of Data and Typeable as XML, the way we
currently serialize them as JSON?
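One possible answer, sketched under assumptions: `Data.Data` in base already offers enough to walk any `Data` instance generically (`cast`, `toConstr`, `showConstr`, `gmapQ`, `gmapQi`, `dataTypeRep`). The `Block`/`Inline` types below are hypothetical stand-ins for pandoc's, and the XML vocabulary (one element per constructor, `<list>` for Haskell lists, plain text for strings and numbers) is an invented convention, not an existing pandoc format.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data

-- Hypothetical stand-ins for a small part of pandoc's AST.
data Inline = Str String | Emph [Inline] | Space
  deriving (Show, Data)

data Block = Para [Inline] | Header Int [Inline]
  deriving (Show, Data)

-- Serialize any Data instance: each constructor becomes an element named
-- after it, and its fields become the element's content, in order.
toXml :: Data a => a -> String
toXml x
  | Just s <- (cast x :: Maybe String) = escape s     -- strings as text
  | con == "(:)" || con == "[]"        = "<list>" ++ items x ++ "</list>"
  | isPrimitive                        = con           -- Int, Double, ...
  | null kids                          = "<" ++ con ++ "/>"
  | otherwise = "<" ++ con ++ ">" ++ concat kids ++ "</" ++ con ++ ">"
  where
    con  = showConstr (toConstr x)
    kids = gmapQ toXml x
    isPrimitive = case dataTypeRep (dataTypeOf x) of
      AlgRep _ -> False
      _        -> True

-- Walk a list's cons cells: field 0 is the head, field 1 the tail.
items :: Data a => a -> String
items x
  | showConstr (toConstr x) == "(:)" = gmapQi 0 toXml x ++ gmapQi 1 items x
  | otherwise                        = ""

-- Escape XML special characters in text content.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- ghci> putStrLn (toXml [Para [Str "hi", Space]])
-- <list><Para><list><Str>hi</Str><Space/></list></Para></list>
```

Tuples (whose constructor shows as `(,)`, as in pandoc's `Target`) would need a case of their own, and reading the XML back would need the reverse walk via `gunfold`/`fromConstr`, which is more involved and not shown.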


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
       [not found]                     ` <20120412164509.GD28518-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-04-12 22:11                       ` John MacFarlane
       [not found]                         ` <20120412221122.GA1327-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-04-12 22:11 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ John MacFarlane [Apr 12 12 09:45 ]:
> +++ John MacFarlane [Apr 12 12 09:04 ]:
> > +++ mb21 [Apr 12 12 03:56 ]:
> > >    Thanks for the feedback. Yes, with that "XML Serialization of markdown
> > >    extended" I was actually thinking of something like XHTML plus a few
> > >    new tags for footnotes etc, or maybe a subset of DocBook. But maybe
> > >    most folks around here don't think it's a good idea to define such a
> > >    thing as they just write a new Reader for every XML dialect they want
> > >    to import into pandoc? The thing is, pandoc would be way more useful
> > >    for people that are no Haskell-wizards if there were such a well
> > >    defined XML input format...

On reflection, maybe a DocBook reader would be useful.  It would be a
lot of work to make a complete one, since DocBook is so big, but
not too hard to make one that parses the subset of DocBook that pandoc
can generate.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
       [not found]                         ` <20120412221122.GA1327-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-04-13  7:31                           ` mb21
  2012-04-14  4:56                             ` HansBKK
                                               ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: mb21 @ 2012-04-13  7:31 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Glad you're interested.
Well, what also makes the native JSON hard to parse and generate is all the explicit whitespace...
About the DocBook reader, would you rather fork the HTML reader or use something like HaXml?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-13  7:31                           ` mb21
@ 2012-04-14  4:56                             ` HansBKK
  2012-04-14  5:28                             ` fiddlosopher
                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: HansBKK @ 2012-04-14  4:56 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Wasn't someone already working on an AsciiDoc reader?

If that covers all the features you're looking for, it's IMO more in the 
spirit of pandoc, especially from a non-tech usability POV.



On Friday, April 13, 2012 2:31:22 PM UTC+7, mb21 wrote:
>
> Glad you're interested.
> Well, what makes the native json hard to parse and generate are also all 
> the explicit whitespaces..
> About the docbook reader, would you rather fork the html reader or use 
> something like haXML ?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-13  7:31                           ` mb21
  2012-04-14  4:56                             ` HansBKK
@ 2012-04-14  5:28                             ` fiddlosopher
  2012-04-14  5:36                             ` fiddlosopher
  2012-04-14 23:44                             ` John MacFarlane
  3 siblings, 0 replies; 29+ messages in thread
From: fiddlosopher @ 2012-04-14  5:28 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Friday, April 13, 2012 12:31:22 AM UTC-7, mb21 wrote:
>
> Glad you're interested.
> Well, what makes the native json hard to parse and generate are also all 
> the explicit whitespaces..
> About the docbook reader, would you rather fork the html reader or use 
> something like haXML ?


The HTML reader is designed to deal with real-world HTML, with unclosed
tags, etc.  For DocBook, it would be best just to use an XML parser.
Pandoc already depends on xml
(http://hackage.haskell.org/package/xml-1.3.12); it's not quite as nice as
HaXml, but it should be perfectly adequate for a DocBook reader.

John

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-13  7:31                           ` mb21
  2012-04-14  4:56                             ` HansBKK
  2012-04-14  5:28                             ` fiddlosopher
@ 2012-04-14  5:36                             ` fiddlosopher
  2012-04-14 23:44                             ` John MacFarlane
  3 siblings, 0 replies; 29+ messages in thread
From: fiddlosopher @ 2012-04-14  5:36 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


>
> Well, what makes the native json hard to parse and generate are also all 
> the explicit whitespaces..
>

I didn't understand this remark.  It's trivial to parse and generate the
native JSON; you can do it with one line of code.

Example, in ghci:

Prelude Text.Pandoc Text.JSON.Generic> putStrLn $ encodeJSON [Para [Str "hi"]]
[{"Para":[{"Str":"hi"}]}]
Prelude Text.Pandoc Text.JSON.Generic> decodeJSON "[{\"Para\":[{\"Str\":\"hi\"}]}]" :: [Block]
[Para [Str "hi"]]
 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-13  7:31                           ` mb21
                                               ` (2 preceding siblings ...)
  2012-04-14  5:36                             ` fiddlosopher
@ 2012-04-14 23:44                             ` John MacFarlane
       [not found]                               ` <20120414234404.GB11272-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  3 siblings, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-04-14 23:44 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I've added a branch with a skeleton DocBook reader. So far it
just supports <para> and <emphasis>, by way of example, but it should
be pretty clear how to extend it further.

https://github.com/jgm/pandoc/tree/docbookreader
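For readers wondering what "extend it further" involves, here is a minimal, self-contained Haskell sketch of the same idea. It is not the code on that branch: `Node` is a hypothetical stand-in for the tree an XML parser (such as the xml package mentioned in this thread) returns, the AST types are a cut-down mirror of pandoc's, and the `parseBlock`/`parseInline` split is only modeled on the function names used later in the thread. Each supported DocBook element gets one equation, and unknown elements are descended into:

```haskell
import Data.Char (isSpace)
import Data.List (intercalate)

-- Hypothetical minimal XML tree: element name, attributes, children; or text.
data Node = Elem String [(String, String)] [Node]
          | Text String
  deriving (Show, Eq)

-- Cut-down, hypothetical mirror of pandoc's AST.
data Inline = Str String | Space | Emph [Inline] | Code String
  deriving (Show, Eq)
data Block = Para [Inline] | BlockQuote [Block]
  deriving (Show, Eq)

-- Block-level DocBook elements; each supported element is one equation.
parseBlock :: Node -> [Block]
parseBlock (Elem "para" _ kids)       = [Para (trimSpaces (concatMap parseInline kids))]
parseBlock (Elem "blockquote" _ kids) = [BlockQuote (concatMap parseBlock kids)]
parseBlock (Elem _ _ kids)            = concatMap parseBlock kids  -- unknown: descend
parseBlock (Text _)                   = []  -- text outside a block is dropped here

-- Inline-level DocBook elements.
parseInline :: Node -> [Inline]
parseInline (Text s)                 = textToInlines s
parseInline (Elem "emphasis" _ kids) = [Emph (concatMap parseInline kids)]
parseInline (Elem "literal" _ kids)  = [Code (concatMap textOf kids)]
parseInline (Elem "code" _ kids)     = [Code (concatMap textOf kids)]
parseInline (Elem _ _ kids)          = concatMap parseInline kids  -- unknown: descend

-- All text below a node, concatenated.
textOf :: Node -> String
textOf (Text s)        = s
textOf (Elem _ _ kids) = concatMap textOf kids

-- Words become Str tokens separated by Space; whitespace at either edge of
-- a text node becomes a Space, so "a <emphasis>b</emphasis> c" keeps its spacing.
textToInlines :: String -> [Inline]
textToInlines s
  | all isSpace s = [Space | not (null s)]
  | otherwise     = lead ++ body ++ trail
  where
    body  = intercalate [Space] (map (\w -> [Str w]) (words s))
    lead  = [Space | any isSpace (take 1 s)]
    trail = [Space | any isSpace (take 1 (reverse s))]

-- Drop Space tokens at the start and end of a paragraph.
trimSpaces :: [Inline] -> [Inline]
trimSpaces = reverse . dropWhile (== Space) . reverse . dropWhile (== Space)
```

Adding an element is then a matter of choosing the right level (block or inline) and writing one more equation.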

+++ mb21 [Apr 13 12 00:31 ]:
> Glad you're interested.
> Well, what makes the native json hard to parse and generate are also all the explicit whitespaces..
> About the docbook reader, would you rather fork the html reader or use something like haXML ?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
       [not found]                               ` <20120414234404.GB11272-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-04-15  5:35                                 ` John MacFarlane
       [not found]                                   ` <20120415053500.GB23326-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-04-15  5:35 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ John MacFarlane [Apr 14 12 16:44 ]:
> I've added a branch with a skeleton docbook reader -- so  far it
> just supports <para> and <emphasis>, by way of example, but it should
> be pretty clear how to extend it further.
> 
> https://github.com/jgm/pandoc/tree/docbookreader

And now it supports about 70% of the DocBook that pandoc can
produce from markdown...


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
       [not found]                                   ` <20120415053500.GB23326-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-04-15 15:15                                     ` mb21
  2012-04-19 12:57                                       ` mb21
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-04-15 15:15 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Wow, that's amazing, thanks!
I'm gonna try to compile it, but I need to download the Haskell Platform for
Mac first (I'm on a slow line). I'll post back as soon as I know more.

About the native JSON, I meant that it's not that easy to parse/generate with
non-Haskell tools (e.g. XSLT).

On Sunday, 15 April 2012 07:35:01 UTC+2, fiddlosopher wrote:
>
> +++ John MacFarlane [Apr 14 12 16:44 ]:
> > I've added a branch with a skeleton docbook reader -- so  far it
> > just supports <para> and <emphasis>, by way of example, but it should
> > be pretty clear how to extend it further.
> > 
> > https://github.com/jgm/pandoc/tree/docbookreader
>
> And now it supports about 70% of the docbook pandoc markdown can
> produce...
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-15 15:15                                     ` mb21
@ 2012-04-19 12:57                                       ` mb21
  2012-04-20 17:38                                         ` John MacFarlane
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-04-19 12:57 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hello again,
The reader works pretty well for me. I added support for the elements
'code' (an inline element, just like 'literal'), 'link' and 'info' (which are
preferred in DocBook 5). I sent you a pull request; please feel free to
clean up or improve my code. I'm new to Haskell, but it seems to work for me.
The next thing I'll look into is images, especially parsing something like:

        <mediaobject>
            <imageobject>
                <imagedata fileref="myimage.png" />
            </imageobject>
        </mediaobject>

I'm not sure how to handle this mix of block and inline elements in the code,
though: what goes into parseBlock and what into parseInline? (There is
also http://docbook.org/tdg51/en/html/inlinemediaobject.html)


On Sunday, 15 April 2012 17:15:36 UTC+2, mb21 wrote:
>
> wow, that's amazing, thanks!
> I'm gonna try to compile it but need to download the haskell platform for 
> mac first (i'm on a slow line). i'll post back as soon as i know more.
>
> about the native json, i meant it's not that easy to parse/generate with 
> non-haskell tools (e.g. XSLT)
>
> On Sunday, 15 April 2012 07:35:01 UTC+2, fiddlosopher wrote:
>>
>> +++ John MacFarlane [Apr 14 12 16:44 ]:
>> > I've added a branch with a skeleton docbook reader -- so  far it
>> > just supports <para> and <emphasis>, by way of example, but it should
>> > be pretty clear how to extend it further.
>> > 
>> > https://github.com/jgm/pandoc/tree/docbookreader
>>
>> And now it supports about 70% of the docbook pandoc markdown can
>> produce...
>>
>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-19 12:57                                       ` mb21
@ 2012-04-20 17:38                                         ` John MacFarlane
       [not found]                                           ` <20120420173829.GC14589-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-04-20 17:38 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ mb21 [Apr 19 12 05:57 ]:
>    Hello again,
> 
>    the reader works pretty good for me. I added support for the elements:
>    'code' (inline element just as 'literal'), 'link' and 'info' (which are
>    preferred in DocBook 5). sent you a pull request, please feel free to
>    clean/improve my code, I'm new to Haskell but seems to work for me.

I've merged this (into master) and made some stylistic changes.
Use spaces, not tabs, for indentation.

>    So
>    the next thing i'll look into are images, especially parsing something
>    like:
> 
>            <mediaobject>
>                <imageobject>
>                    <imagedata fileref="myimage.png" />
>                </imageobject>
>            </mediaobject>
> 
>    not sure how to handle this mix of block and inline elements in the
>    code though, what goes into parseBlock and what into parseInline?
>    (there is also http://docbook.org/tdg51/en/html/inlinemediaobject.html)

In the docbook writer, I use mediaobject for block-level images
with captions, and inlinemediaobject for inline images.

You can test this with

pandoc -t docbook
![pic](pic.jpg)

in a paragraph: ![pic](pic.jpg)
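A minimal sketch of the block/inline split described above, in a toy model: `Elem`, `Inline`, `Block`, `parseBlockImage` and `parseInlineImage` are hypothetical stand-ins, not pandoc's reader functions or the xml package's types. `<mediaobject>` is handled where blocks are parsed and yields a paragraph holding a single image; `<inlinemediaobject>` is handled where inlines are parsed and yields just the image. Both pull the `fileref` attribute out of the nested `<imagedata>`.

```haskell
-- Toy model only: Elem, Inline and Block are hypothetical stand-ins,
-- not pandoc's AST or the xml package's Element type.
import Data.Maybe (listToMaybe)

-- A minimal XML element: name, attributes, child elements (no text nodes).
data Elem = Elem
  { elName     :: String
  , elAttrs    :: [(String, String)]
  , elChildren :: [Elem]
  }

data Inline = Image String String -- alt text, source URL
  deriving (Show, Eq)

data Block = Para [Inline]
  deriving (Show, Eq)

-- First descendant (in document order) with the given element name.
findDescendant :: String -> Elem -> Maybe Elem
findDescendant name = listToMaybe . concatMap go . elChildren
  where
    go c
      | elName c == name = [c]
      | otherwise        = concatMap go (elChildren c)

-- The fileref of the <imagedata> nested somewhere inside an element.
imageRef :: Elem -> Maybe String
imageRef e = findDescendant "imagedata" e >>= lookup "fileref" . elAttrs

-- Block context: <mediaobject> becomes a paragraph holding one image.
parseBlockImage :: Elem -> Maybe Block
parseBlockImage e
  | elName e == "mediaobject" = (\src -> Para [Image "" src]) <$> imageRef e
  | otherwise                 = Nothing

-- Inline context: <inlinemediaobject> becomes a bare image inline.
parseInlineImage :: Elem -> Maybe Inline
parseInlineImage e
  | elName e == "inlinemediaobject" = Image "" <$> imageRef e
  | otherwise                       = Nothing

-- The <mediaobject> example from mb21's message.
example :: Elem
example =
  Elem "mediaobject" []
    [Elem "imageobject" [] [Elem "imagedata" [("fileref", "myimage.png")] []]]

main :: IO ()
main = do
  print (parseBlockImage example)
  print (parseInlineImage (example { elName = "inlinemediaobject" }))
```

Captions are ignored to keep the sketch short; in a real reader the two cases would presumably live in the block-level and inline-level parsing functions respectively.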



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
       [not found]                                           ` <20120420173829.GC14589-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-04-21 11:40                                             ` mb21
  2012-04-21 16:31                                               ` John MacFarlane
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-04-21 11:40 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2689 bytes --]

great, thanks.

Meanwhile, I've been trying to parse
<blockquote>
  <attribution>Shakespeare</attribution>
  <para>To be, or not...</para>
</blockquote>

into something like

> To be, or not...
>
>      – Shakespeare

with the following code:

"blockquote" -> case findChild (unqual "attribution") e of -- TODO this match doesn't work yet for some reason
                            Just a  -> blockQuote <$> ( (getBlocks e) >> (getBlocks a) )
                            Nothing -> blockQuote <$> getBlocks e

I guess I'll first have to take a closer look at monads and state, as most
of the time when I try something I get errors like:
Couldn't match expected type `Inlines'
   with actual type `StateT
        DBState
        transformers-0.2.2.0:Data.Functor.Identity.Identity
        Inlines'



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-21 11:40                                             ` mb21
@ 2012-04-21 16:31                                               ` John MacFarlane
       [not found]                                                 ` <20120421163127.GA7585-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-04-21 16:31 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ mb21 [Apr 21 12 04:40 ]:
>    great, thanks.
> 
>    Meanwhile, I've been trying to parse
>
>    <blockquote>
>      <attribution>Shakespeare</attribution>
>      <para>To be, or not...</para>
>    </blockquote>
>
>    into something like
>
>    > To be, or not...
>    >
>    >      – Shakespeare
>
>    with the following code:
> 

I've added code to handle this.  For more complex things like
this, it's easier to use do notation, I think.  Let me know if
anything is puzzling about this code.

John
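The do-notation approach can be illustrated with a toy model; `Inline`, `Block`, `Elem`, `DB` and `parseBlock` below are hypothetical stand-ins, not the code John committed or pandoc's Builder API. It also shows where the `Couldn't match expected type` error quoted above comes from: a reader action has type `DB [Block]` (a `StateT` over `Identity`), not `[Block]`, so its result has to be bound with `<-` before it can be combined. In mb21's snippet, `getBlocks e >> getBlocks a` runs both actions but keeps only the second result.

```haskell
-- Toy model only: Inline, Block, Elem and DB are hypothetical stand-ins
-- for pandoc's AST and the DocBook reader's state monad.
import Control.Monad.Trans.State (State, evalState)

data Inline = Str String
  deriving (Show, Eq)

data Block = Para [Inline] | BlockQuote [Block]
  deriving (Show, Eq)

-- A minimal element: name, text content, child elements.
data Elem = Elem
  { elName     :: String
  , elText     :: String
  , elChildren :: [Elem]
  }

-- Reader actions live in a state monad (StateT () Identity here), so
-- parseBlock returns DB [Block], not [Block].
type DB = State ()

parseBlock :: Elem -> DB [Block]
parseBlock e = case elName e of
  "para"        -> return [Para [Str (elText e)]]
  -- Emitted by the enclosing blockquote instead, after the body.
  "attribution" -> return []
  "blockquote"  -> do
    -- `<-` unwraps each action's result; `>>` would discard it.
    body <- concat <$> mapM parseBlock (elChildren e)
    let attrib = [ Para [Str ("\8211 " ++ elText a)]  -- en dash, then the name
                 | a <- elChildren e, elName a == "attribution" ]
    return [BlockQuote (body ++ attrib)]
  _             -> return []

shakespeare :: Elem
shakespeare = Elem "blockquote" ""
  [ Elem "attribution" "Shakespeare" []
  , Elem "para" "To be, or not..." []
  ]

main :: IO ()
main = mapM_ print (evalState (parseBlock shakespeare) ())
```

Running `main` prints a single `BlockQuote` with the attribution moved after the quoted paragraph, matching the markdown output mb21 was after.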


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
       [not found]                                                 ` <20120421163127.GA7585-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-04-22 11:22                                                   ` mb21
  2012-05-02  7:49                                                     ` mb21
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-04-22 11:22 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1285 bytes --]

Thanks!
I think I'm slowly getting the hang of it; I've added support for mediaobject,
inlinemediaobject and caption.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-04-22 11:22                                                   ` mb21
@ 2012-05-02  7:49                                                     ` mb21
  2012-05-02 18:04                                                       ` John MacFarlane
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-05-02  7:49 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1168 bytes --]

So I was wondering how I could get typographic quotes with the DocBook
reader. Does it make sense that "--smart" is a reader option? Well,
for converting -- and --- it probably makes sense. But I think it would be
great if there were an option like --typographicquotes that simply did the
conversion in the middle of the pipeline (that is, after the reader and
before the writer is invoked). What do you think, is that hard to do?

Btw, I currently get the following error when compiling the master branch
from GitHub (I'm on Mac OS X 10.7.3):

"cabal: cannot configure pandoc-1.9.3. It requires blaze-html ==0.5.*
and blaze-markup >=0.5.1 && <0.6"


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Accepted HTML input
       [not found] ` <da514ded-a9df-448e-9dce-821f0128df79-EyPQ8oKdLiaB2x89WGtKiFYGCWtFR9XvQQ4Iyu8u01E@public.gmane.org>
  2012-04-09 14:41   ` John MacFarlane
@ 2012-05-02  7:55   ` mb21
  1 sibling, 0 replies; 29+ messages in thread
From: mb21 @ 2012-05-02  7:55 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 948 bytes --]

So I was wondering how I could get typographic quotes with the DocBook
reader. Does it make sense that "--smart" is a reader option? Well,
for converting -- and --- it probably makes sense. But I think it would be
great if there were an option like --typographicquotes that simply did the
conversion in the middle of the pipeline (that is, after the reader and
before the writer is invoked). What do you think, is that hard to do?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-05-02  7:49                                                     ` mb21
@ 2012-05-02 18:04                                                       ` John MacFarlane
       [not found]                                                         ` <20120502180452.GB14046-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-05-02 18:04 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ mb21 [May 02 12 00:49 ]:
>    So I was wondering how I could get typographic quotes with the docbook
>    reader. Does it make sense that "--smart" is a reader option? Well,
>    probably for converting -- and --- it makes sense. But I think it would
>    be great if there was an option like --typographicquotes that simply
>    did a conversion in the middle of the pipeline (that is after the
>    reader and before the writer is invoked). What do you think, hard to
>    do?

It's probably possible.  But why would you want to do this?  DocBook
input should be using `<quote>` tags or unicode curly quotes.

>    btw, currently I get the following error when compiling the master
>    branch from github (I'm on Mac OS X 10.7.3)
> 
>    "cabal: cannot configure pandoc-1.9.3. It requires blaze-html ==0.5.*
>    and blaze-markup >=0.5.1 && <0.6"

Do a 'cabal update', so cabal will be able to fetch the needed packages,
which were recently released.
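The middle-of-the-pipeline pass mb21 proposes would be a document-to-document function run between reader and writer. Below is a naive sketch over a toy AST; `Inline`, `Block`, `curl`, `curlInlines` and `smartQuotes` are hypothetical, this is not how pandoc's `--smart` is implemented, and `--typographicquotes` is only mb21's proposed name. Straight double quotes become alternating “ and ”, with the open/close state threaded across a whole paragraph so a quote can open in one word and close in a later one. Apostrophes, single quotes and nesting are ignored.

```haskell
-- Toy model only: Inline and Block are hypothetical stand-ins for
-- pandoc's AST; this is not pandoc's --smart implementation.
import Data.List (mapAccumL)

data Inline = Str String | Space
  deriving (Show, Eq)

data Block = Para [Inline]
  deriving (Show, Eq)

-- Curl the straight double quotes in one string. The Bool says whether
-- the next quote opens (True) or closes (False); the updated flag is
-- returned so the caller can carry it into the next string.
curl :: Bool -> String -> (Bool, String)
curl open [] = (open, [])
curl open ('"' : rest) =
  let (open', rest') = curl (not open) rest
  in (open', (if open then '\8220' else '\8221') : rest')
curl open (c : rest) =
  let (open', rest') = curl open rest
  in (open', c : rest')

-- Walk a paragraph left to right, threading the open/close flag.
curlInlines :: [Inline] -> [Inline]
curlInlines = snd . mapAccumL step True
  where
    step open (Str s) = fmap Str (curl open s)
    step open Space   = (open, Space)

-- The pass itself: document in, document out, so it could sit between
-- any reader and any writer.
smartQuotes :: [Block] -> [Block]
smartQuotes = map (\(Para xs) -> Para (curlInlines xs))

main :: IO ()
main = print (smartQuotes [Para [Str "My", Space, Str "\"home\"", Space, Str "is"]])
```

Because the pass only sees the AST, it would work the same whatever the input format was; the workaround John suggests later in the thread gets a similar effect by round-tripping through markdown and pandoc's own `--smart`.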


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
       [not found]                                                         ` <20120502180452.GB14046-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-05-04 13:28                                                           ` mb21
  2012-05-04 14:48                                                             ` John MacFarlane
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-05-04 13:28 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2150 bytes --]

On Wednesday, 2 May 2012 20:04:52 UTC+2, fiddlosopher wrote:
>
> +++ mb21 [May 02 12 00:49 ]: 
> >    So I was wondering how I could get typographic quotes with the docbook
> >    reader. Does it make sense that "--smart" is a reader option? Well,
> >    probably for converting -- and --- it makes sense. But I think it would
> >    be great if there was an option like --typographicquotes that simply
> >    did a conversion in the middle of the pipeline (that is after the
> >    reader and before the writer is invoked). What do you think, hard to
> >    do?
>
> It's probably possible.  But why would you want to do this?  DocBook 
> input should be using `<quote>` tags or unicode curly quotes.
>

Well, yes, I see your point: the input document "should" use Unicode curly
quotes. But originally someone wrote the text, usually with the typewriter
quotes on the keyboard, so some program has to implement smart quotes, and
it would be nice if pandoc could (optionally) handle that for all input
formats, since it's already implemented. Also, many people don't bother with
proper typographic quotes until they typeset with TeX, where the wrong ones
really show.

 

> >    btw, currently I get the following error when compiling the master 
> >    branch from github (I'm on Mac OS X 10.7.3) 
> > 
> >    "cabal: cannot configure pandoc-1.9.3. It requires blaze-html ==0.5.* 
> >    and blaze-markup >=0.5.1 && <0.6" 
>
> Do a 'cabal update', so cabal will be able to fetch the needed packages, 
> which were recently released.

 
Thanks, obviously that fixed it 8)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-05-04 13:28                                                           ` mb21
@ 2012-05-04 14:48                                                             ` John MacFarlane
       [not found]                                                               ` <20120504144837.GB19229-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-05-04 14:48 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ mb21 [May 04 12 06:28 ]:
>      It's probably possible.  But why would you want to do this?  DocBook
>      input should be using `<quote>` tags or unicode curly quotes.
> 
>    well yes, i see your point. the input document "should" use unicode
>    curly quotes. but originally, someone always wrote the text and usually
>    used the typewriter quotes on the keyboard. so some program will have
>    to implement smart quotes and it would be nice if pandoc could
>    (optionally) handle that for all input formats, since it's already
>    implemented. Also, many people don't bother with proper typographic
>    quotes until they typeset it with TeX where it really shows when you
>    use the wrong ones..

If you use pandoc to convert your docbook to markdown, then pipe
through 'pandoc --smart', you'll get smart quotes in the result. That
seems the easiest solution.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
       [not found]                                                               ` <20120504144837.GB19229-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-05-04 15:08                                                                 ` mb21
  2012-05-04 19:28                                                                   ` John MacFarlane
  0 siblings, 1 reply; 29+ messages in thread
From: mb21 @ 2012-05-04 15:08 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2235 bytes --]



On Friday, 4 May 2012 16:48:38 UTC+2, fiddlosopher wrote:
>
> +++ mb21 [May 04 12 06:28 ]: 
> >      It's probably possible.  But why would you want to do this?
> >      DocBook input should be using `<quote>` tags or unicode curly quotes.
> >
> >    well yes, i see your point. the input document "should" use unicode
> >    curly quotes. but originally, someone always wrote the text and usually
> >    used the typewriter quotes on the keyboard. so some program will have
> >    to implement smart quotes and it would be nice if pandoc could
> >    (optionally) handle that for all input formats, since it's already
> >    implemented. Also, many people don't bother with proper typographic
> >    quotes until they typeset it with TeX where it really shows when you
> >    use the wrong ones..
>
> If you use pandoc to convert your docbook to markdown, then pipe 
> through 'pandoc --smart', you'll get smart quotes in the result. That 
> seems the easiest solution. 
>


Sorry if I'm missing something, but that currently doesn't seem to work, as
the DocBook reader doesn't support the --smart option.

$ cat test.docbook 
<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
   <info>
      <title>My text</title>
   </info>
   <section>
      <title>Introduction</title>
      <para>My "home" is my castle</para>
  </section>
</article>

$ pandoc --smart -f docbook -t markdown test.docbook 
Introduction
============

My "home" is my castle

But I'd like to have: My “home” is my castle.
Well, actually I'd like to have that when using -t latex, but I guess the 
issue is the same. Thanks for your patience :)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
  2012-05-04 15:08                                                                 ` mb21
@ 2012-05-04 19:28                                                                   ` John MacFarlane
       [not found]                                                                     ` <20120504192856.GA7379-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: John MacFarlane @ 2012-05-04 19:28 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ mb21 [May 04 12 08:08 ]:
>    On Friday, 4 May 2012 16:48:38 UTC+2, fiddlosopher wrote:
> 
>      +++ mb21 [May 04 12 06:28 ]:
>      >      It's probably possible.  But why would you want to do this?
>      >      DocBook input should be using `<quote>` tags or unicode curly quotes.
>      >
>      >    well yes, i see your point. the input document "should" use unicode
>      >    curly quotes. but originally, someone always wrote the text and usually
>      >    used the typewriter quotes on the keyboard. so some program will have
>      >    to implement smart quotes and it would be nice if pandoc could
>      >    (optionally) handle that for all input formats, since it's already
>      >    implemented. Also, many people don't bother with proper typographic
>      >    quotes until they typeset it with TeX where it really shows when you
>      >    use the wrong ones..
>      If you use pandoc to convert your docbook to markdown, then pipe
>      through 'pandoc --smart', you'll get smart quotes in the result. That
>      seems the easiest solution.
> 
>    Sorry if I'm missing something, but that currently doesn't seem to work
>    as the Docbook Reader doesn't support the --smart option.
> 
>    $ cat test.docbook
>    <?xml version="1.0" encoding="UTF-8"?>
>    <article xmlns="http://docbook.org/ns/docbook" version="5.0">
>       <info>
>          <title>My text</title>
>       </info>
>       <section>
>          <title>Introduction</title>
>          <para>My "home" is my castle</para>
>      </section>
>    </article>
>    $ pandoc --smart -f docbook -t markdown test.docbook
>    Introduction
>    ============
>    My "home" is my castle
> 
>    But I'd like to have: My “home” is my castle.
>    Well, actually I'd like to have that when using -t latex, but I guess
>    the issue is the same. Thanks for your patience :)

What I meant was:

pandoc -f docbook -t markdown mydoc.db | pandoc --smart -f markdown -t latex



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: XML Serialization of Markdown extended
       [not found]                                                                     ` <20120504192856.GA7379-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2012-05-05 12:46                                                                       ` mb21
  0 siblings, 0 replies; 29+ messages in thread
From: mb21 @ 2012-05-05 12:46 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 197 bytes --]


>
> What I meant was: 
>
> pandoc -f docbook -t markdown mydoc.db | pandoc --smart -f markdown -t latex
>

Ah, I see. Probably not the most elegant solution, but it certainly works :)
Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2012-05-05 12:46 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-09  9:04 Accepted HTML input mb21
     [not found] ` <da514ded-a9df-448e-9dce-821f0128df79-EyPQ8oKdLiaB2x89WGtKiFYGCWtFR9XvQQ4Iyu8u01E@public.gmane.org>
2012-04-09 14:41   ` John MacFarlane
     [not found]     ` <20120409144128.GA19039-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-04-10  8:46       ` mb21
2012-04-10 19:52         ` XML Serialization of Markdown extended mb21
2012-04-11  5:08           ` HansBKK
2012-04-12 10:56             ` mb21
2012-04-12 16:04               ` John MacFarlane
     [not found]                 ` <20120412160409.GB28518-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-04-12 16:45                   ` John MacFarlane
     [not found]                     ` <20120412164509.GD28518-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-04-12 22:11                       ` John MacFarlane
     [not found]                         ` <20120412221122.GA1327-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-04-13  7:31                           ` mb21
2012-04-14  4:56                             ` HansBKK
2012-04-14  5:28                             ` fiddlosopher
2012-04-14  5:36                             ` fiddlosopher
2012-04-14 23:44                             ` John MacFarlane
     [not found]                               ` <20120414234404.GB11272-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-04-15  5:35                                 ` John MacFarlane
     [not found]                                   ` <20120415053500.GB23326-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-04-15 15:15                                     ` mb21
2012-04-19 12:57                                       ` mb21
2012-04-20 17:38                                         ` John MacFarlane
     [not found]                                           ` <20120420173829.GC14589-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-04-21 11:40                                             ` mb21
2012-04-21 16:31                                               ` John MacFarlane
     [not found]                                                 ` <20120421163127.GA7585-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-04-22 11:22                                                   ` mb21
2012-05-02  7:49                                                     ` mb21
2012-05-02 18:04                                                       ` John MacFarlane
     [not found]                                                         ` <20120502180452.GB14046-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-05-04 13:28                                                           ` mb21
2012-05-04 14:48                                                             ` John MacFarlane
     [not found]                                                               ` <20120504144837.GB19229-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-05-04 15:08                                                                 ` mb21
2012-05-04 19:28                                                                   ` John MacFarlane
     [not found]                                                                     ` <20120504192856.GA7379-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2012-05-05 12:46                                                                       ` mb21
2012-05-02  7:55   ` Accepted HTML input mb21

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).