public inbox archive for pandoc-discuss@googlegroups.com
* markdown writer line wrapping
From: Nathan Gass @ 2010-11-24  1:51 UTC
  To: pandoc-discuss@googlegroups.com

Some questions about the line wrapping implementation for the markdown 
writer.

Is it correct that line wrapping only happens at the top level, so 
something like *very long emphasized text ... end of it* currently does 
not get wrapped?

And how would I go about implementing arbitrary-depth line wrapping, so
that something like *very long emphasized text [with a very long
citation inside for @key p. 10]* gets wrapped correctly?

How does line-wrapping work in pandoc?

Nathan


* Re: markdown writer line wrapping
From: John MacFarlane @ 2010-11-24  3:25 UTC
  To: pandoc-discuss@googlegroups.com

+++ Nathan Gass [Nov 24 10 02:51 ]:
> Some questions about the line wrapping implementation for the
> markdown writer.
> 
> Is it correct that line wrapping only happens at the top level, so
> something like *very long emphasized text ... end of it* currently
> does not get wrapped?

Right.

> And how would I go about implementing arbitrary-depth line wrapping,
> so that something like *very long emphasized text [with a very long
> citation inside for @key p. 10]* gets wrapped correctly?
> 
> How does line-wrapping work in pandoc?

See the function wrapped in Text.Pandoc.Shared.  (See also wrappedMarkdown
in the Markdown writer, which handles complications due to line breaks.)

You're right: it splits an [Inline] on Space at the top level only, applies
a function of type [Inline] -> m Doc to each resulting sublist, and then
uses fsep (from Text.PrettyPrint.HughesPJ) to combine the resulting [Doc]
into a single Doc with line wrapping.  Spaces nested inside an element such
as Emph are never split on, so that whole element is laid out as one
unbreakable unit.
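A toy model may make the top-level limitation concrete. The Python sketch below is not pandoc's code: the Str/Space/Emph classes only mimic the shape of pandoc's Inline type, and fill() is a simple greedy stand-in for fsep's paragraph fill. Because it splits on Space only at the top level, a long emphasized phrase becomes one unbreakable chunk that overflows the requested width.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Str:
    text: str


@dataclass
class Space:
    pass


@dataclass
class Emph:
    content: List["Inline"]


Inline = Union[Str, Space, Emph]


def render(inlines: List[Inline]) -> str:
    """Render inlines on a single line; no breaking happens in here."""
    out = []
    for x in inlines:
        if isinstance(x, Str):
            out.append(x.text)
        elif isinstance(x, Space):
            out.append(" ")
        else:
            out.append("*" + render(x.content) + "*")
    return "".join(out)


def split_top_level(inlines: List[Inline]) -> List[List[Inline]]:
    """Split on Space at the top level only; Spaces inside Emph are untouched."""
    chunks, current = [], []
    for x in inlines:
        if isinstance(x, Space):
            if current:
                chunks.append(current)
            current = []
        else:
            current.append(x)
    if current:
        chunks.append(current)
    return chunks


def fill(words: List[str], width: int) -> List[str]:
    """Greedy stand-in for fsep: pack words onto lines of at most `width` chars."""
    lines, line = [], ""
    for w in words:
        if line and len(line) + 1 + len(w) > width:
            lines.append(line)
            line = w
        else:
            line = w if not line else line + " " + w
    if line:
        lines.append(line)
    return lines


def wrapped(inlines: List[Inline], width: int) -> List[str]:
    # Each top-level chunk is rendered whole, then the chunks are filled.
    return fill([render(c) for c in split_top_level(inlines)], width)


example = [Str("Short"), Space(), Str("intro"), Space(),
           Emph([Str("very"), Space(), Str("long"), Space(),
                 Str("emphasized"), Space(), Str("text")])]

if __name__ == "__main__":
    # Prints "Short intro", then "*very long emphasized text*" (27 chars,
    # past the limit of 15), because the Emph chunk cannot be broken.
    for line in wrapped(example, 15):
        print(line)
```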

Unfortunately, I can't see an easy fix (one that doesn't require major
architectural changes).

John


* Re: markdown writer line wrapping
From: John MacFarlane @ 2010-12-13  4:38 UTC
  To: pandoc-discuss@googlegroups.com

+++ John MacFarlane [Nov 23 10 19:25 ]:
> Unfortunately, I can't see an easy fix (one that doesn't require major
> architectural changes).

I've been working on a small prettyprinting library to use instead
of the one from the 'pretty' package.  It's designed to be a better fit
for pandoc, and it solves this line wrapping issue.

Unfortunately, it's currently slower than the standard prettyprinting library.
If you want to look at it, it's in the 'pretty' branch of jgm/pandoc on
GitHub.

I haven't done much optimization yet, and so far I've only worked it
into the markdown writer -- and only incompletely.  Benchmarks show that
it's significantly slower than the old version (33 ms vs. 18 ms).

Nonetheless I'm thinking about using it to replace Text.PrettyPrint.HughesPJ
throughout, if it can be optimized a bit...  Right now I'm using DLists;
I might try using Blaze.Builder.

John


* Re: markdown writer line wrapping
From: John MacFarlane @ 2010-12-18 21:46 UTC
  To: pandoc-discuss@googlegroups.com

+++ John MacFarlane [Nov 23 10 19:25 ]:
> Unfortunately, I can't see an easy fix (one that doesn't require major
> architectural changes).

OK, I've made the major architectural changes that were required.
HEAD now contains a new prettyprinting library that is much better
suited to pandoc than Text.PrettyPrint.HughesPJ (which is designed
for source code, not text).

Wrapping in markdown, plain, and rst now works much better.
Also, duplicate blank lines are eliminated.
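One way to picture why nested wrapping becomes possible, as a rough sketch under assumptions: if breakable spaces are atoms of the document itself rather than separators between top-level chunks, nesting no longer hides them. In the Python toy below, Text, BreakingSpace, words(), emph() and layout() are hypothetical names chosen for illustration, not the new library's API, and the width argument plays the role of the wrap column.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    s: str


@dataclass
class BreakingSpace:
    """A space the layout step may turn into a newline, at any nesting depth."""


Atom = Union[Text, BreakingSpace]


def words(*ws: str) -> List[Atom]:
    """Hypothetical helper: words separated by breakable spaces."""
    out: List[Atom] = []
    for i, w in enumerate(ws):
        if i:
            out.append(BreakingSpace())
        out.append(Text(w))
    return out


def emph(doc: List[Atom]) -> List[Atom]:
    # Nesting is plain concatenation, so inner BreakingSpaces stay visible
    # to the layout step instead of being sealed inside one chunk.
    return [Text("*")] + doc + [Text("*")]


def layout(doc: List[Atom], width: int) -> List[str]:
    # Glue adjacent Text atoms into unbreakable runs; break only at BreakingSpace.
    runs, cur = [], ""
    for a in doc:
        if isinstance(a, BreakingSpace):
            runs.append(cur)
            cur = ""
        else:
            cur += a.s
    runs.append(cur)
    # Greedy fill of the runs into lines of at most `width` chars
    # (a single run longer than `width` still overflows).
    lines, line = [], ""
    for r in runs:
        if line and len(line) + 1 + len(r) > width:
            lines.append(line)
            line = r
        else:
            line = r if not line else line + " " + r
    if line:
        lines.append(line)
    return lines


example = (words("Short", "intro") + [BreakingSpace()]
           + emph(words("very", "long", "emphasized", "text")))

if __name__ == "__main__":
    # Prints "Short intro", "*very long", "emphasized", "text*" on separate lines.
    for line in layout(example, 15):
        print(line)
```

Here the emphasized phrase is broken across lines, and every line stays within the width.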

It still remains to adapt the other writers to use the new library,
but this can happen gradually.

I've also added a --columns option to pandoc.

John


* Re: markdown writer line wrapping
From: John MacFarlane @ 2010-12-18 22:58 UTC
  To: pandoc-discuss@googlegroups.com

+++ John MacFarlane [Dec 18 10 13:46 ]:
> I've also added a --columns option to pandoc.

To clarify:  this specifies the column width for text wrapping.

I should also note that the new prettyprinting library is significantly
faster, so the writers that use it get good speed improvements.

John


* Re: markdown writer line wrapping
From: Simon Michael @ 2010-12-18 23:46 UTC
  To: pandoc-discuss@googlegroups.com

Very nice.


* Re: markdown writer line wrapping
From: BP Jonsson @ 2010-12-20 18:50 UTC
  To: pandoc-discuss@googlegroups.com

2010-12-18 23:58, John MacFarlane wrote:
>> I've also added a --columns option to pandoc.
>
> To clarify:  this specifies the column width for text wrapping.
>
> I should also note that the new prettyprinting library is significantly
> faster - so we have good speed improvements in the writers that use
> it.
>
> John
>

When will these things be in the release?

And what about the other things I whined ;-) about recently?

> *   I prefer asterisk or plus as list bullets, the markdown writer
>     uses hyphen.
> *   I like to use underscores for emphasis and asterisks for strong
>     emphasis. The markdown writer uses asterisks for both.
> *   I prefer a smaller wrap width than 75 columns.
> *   I prefer \ + newline for hard breaks, the markdown writer uses
>     two spaces + newline.
> *   The markdown writer squeezes tables laterally. I prefer the
>     left margin of columns to fall at tabstops.
> *   Delimited code blocks are converted to indented code blocks,
>     and any highlighting classes are lost -- which is really serious.
>
> The first three are minor annoyances which can be fixed with a Perl
> one-liner, but each of the others is more serious than the one before.
> The desirability of configurability and a config file rears its
> ugly head again.

Is the Google Code bug tracker still the place to go?
And should I enter them as one issue or several?
I'd suggest one enhancement request for configurability
of the first four, and one bug report each for the last two.

/bpj


* Re: markdown writer line wrapping
From: John MacFarlane @ 2010-12-20 19:16 UTC
  To: pandoc-discuss@googlegroups.com

+++ BP Jonsson [Dec 20 10 19:50 ]:
> 2010-12-18 23:58, John MacFarlane wrote:
> >>
> >>I've also added a --columns option to pandoc.
> >
> >To clarify:  this specifies the column width for text wrapping.
> >
> >I should also note that the new prettyprinting library is significantly
> >faster - so we have good speed improvements in the writers that use
> >it.
> >
> >John
> >
> 
> When will these things be in the release?

They will be in the 1.7 release.  I expect this will happen within
a few weeks.

> And what about the other things I whined ;-) about recently?
> 
> >*   I prefer asterisk or plus as list bullets, the markdown writer
> >    uses hyphen.

Everyone has different preferences here.  I can't satisfy
everyone!

> >*   I like to use underscores for emphasis and asterisks for strong
> >    emphasis. The markdown writer uses asterisks for both.

See above.

> >*   I prefer a smaller wrap width than 75 columns.

You can get that now using --columns.

> >*   I prefer \ + newline for hard breaks, the markdown writer uses
> >    two spaces + newline.

The reason for this is that two spaces + newline is compatible with
standard markdown.  I've tried to keep the writer's output compatible
where possible, which seems valuable.

> >*   The markdown writer squeezes tables laterally. I prefer the
> >    left margin of columns to fall at tabstops.

Pandoc tries to size the table columns in the same proportions as in the
original document.  The new writer should be a bit better at preserving
absolute widths when your input and output use the same number of columns.
But there are always potential rounding issues.
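A small worked example of the rounding issue, assuming (hypothetically) that each column's share of the source line width is stored and then scaled to the output width, rounding down. This is not pandoc's actual sizing code. With exact fractions the widths survive when input and output widths match; at a different width, flooring drops characters. Floating-point shares can introduce further off-by-one errors even at the same width.

```python
import math
from fractions import Fraction
from typing import List


def to_shares(widths: List[int], line_width: int) -> List[Fraction]:
    """Reader side (hypothetical): each column's share of the source line width."""
    return [Fraction(w, line_width) for w in widths]


def to_widths(shares: List[Fraction], columns: int) -> List[int]:
    """Writer side (hypothetical): scale each share to the output width, rounding down."""
    return [math.floor(s * columns) for s in shares]


source = [10, 25, 37]              # column widths of a table spanning 72 characters
shares = to_shares(source, 72)

same = to_widths(shares, 72)       # same width in and out: [10, 25, 37]
wider = to_widths(shares, 80)      # 80 columns: [11, 27, 41], only 79 of 80 used

if __name__ == "__main__":
    print(same, wider, sum(wider))
```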

> >*   Delimited code blocks are converted to indented code blocks,
> >    and any highlighting classes are lost -- which is really serious.

Again, this was motivated by the desire to keep the output compatible
with standard markdown, which doesn't have delimited code blocks.
Think about the use case of someone converting HTML to markdown.

But maybe what I should do is make the writer sensitive to the --strict
flag?

> Is the google code bug tracker still the place to go?

Yes.

John


* Re: markdown writer line wrapping
From: John MacFarlane @ 2010-12-21  3:50 UTC
  To: pandoc-discuss@googlegroups.com

+++ John MacFarlane [Dec 20 10 11:16 ]:
> But maybe what I should do is make the writer sensitive to the --strict
> flag?

OK, I've changed the markdown writer so that, provided you haven't
used the --strict option,

- it will use \ for line breaks
- it will use a delimited code block if you've specified any attributes
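The two rules above can be sketched as follows. This is a hypothetical Python model, not the writer's code: it only looks at classes (not ids or key-value attributes), and the exact fence and attribute layout ("~~~~ {.haskell}") is an assumption about the output format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CodeBlock:
    code: str
    classes: List[str] = field(default_factory=list)  # e.g. ["haskell"]


def render_line_break(strict: bool) -> str:
    # --strict: standard markdown hard break, two trailing spaces + newline.
    # Otherwise: backslash + newline, as described in the thread.
    return "  \n" if strict else "\\\n"


def render_code_block(block: CodeBlock, strict: bool) -> str:
    if not strict and block.classes:
        # Delimited block: the classes (e.g. for highlighting) survive.
        attrs = " ".join("." + c for c in block.classes)
        return "~~~~ {" + attrs + "}\n" + block.code + "\n~~~~\n"
    # Indented block: the only form standard markdown understands;
    # any classes are dropped.
    return "".join("    " + line + "\n" for line in block.code.split("\n"))


if __name__ == "__main__":
    b = CodeBlock("main = print 1", ["haskell"])
    print(render_code_block(b, strict=False))
    print(render_code_block(b, strict=True))
```

A block without attributes is still written as an indented block even without --strict, so output stays compatible with standard markdown whenever nothing would be lost.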

John


* Re: markdown writer line wrapping
From: BP Jonsson @ 2010-12-22 18:57 UTC
  To: pandoc-discuss@googlegroups.com

2010-12-21 04:50, John MacFarlane wrote:
> OK, I've changed the markdown writer so that, provided you haven't
> used the --strict option,
>
> - it will use \ for line breaks
> - it will use a delimited code block if you've specified any attributes
>

Thanks!


> Everyone has different preferences here.  I can't satisfy
> everyone!
>

Yeah, that's why I wanted it to be configurable.
With these changes and the --columns option, there
won't be anything I can't fix with a rather simple
Perl script. (Well, the table-realigning script is
perhaps not that simple... ;-) )

/bpj

