public inbox archive for pandoc-discuss@googlegroups.com
* Is it possible to control maximum line length in html output?
From: Geoff Russell @ 2015-06-22  1:40 UTC
  To: pandoc-discuss



I'm sending HTML files in emails, and the MIME::Lite Perl module doesn't 
like long lines (>1000 chars). I'm not sure where the problem is, but I get 
an occasional added space in the middle of words when it breaks the lines 
up to MIME them.

I tried --columns=100 thinking it might split long html lines up, but that 
didn't seem to affect html output.  

Any ideas? 


-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/420dbcea-3fda-4acc-920e-7cb6b3b3dbbd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: Is it possible to control maximum line length in html output?
From: Daniel Staal @ 2015-06-22  2:26 UTC
  To: pandoc-discuss

--As of June 21, 2015 6:40:02 PM -0700, Geoff Russell is alleged to have 
said:

> I'm sending html files in emails and the MIME::Lite perl module doesn't
> like long lines (>1000 chars) ... not sure where
> the problem is, but I get occasional added space in the middle of words
> when it breaks the lines up to MIME them.
>
>
> I tried --columns=100 thinking it might split long html lines up, but
> that didn't seem to affect html output.
>
>
> Any ideas?

--As for the rest, it is mine.

Since you're already using Perl, I have a quick one: Feed the text through 
Text::Wrap first.  A couple of extra lines in your script should fix the 
problem:

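~~~

use Text::Wrap;

# @input_text is assumed to already hold the HTML to be sent.
$Text::Wrap::columns = 100;    # wrapped lines stay under 100 characters
# wrap() returns the wrapped text as a single string.
my $wrapped_text = wrap('', '', @input_text);

~~~

(The above is just an example.  See the docs for more info.)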

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------



* Re: Is it possible to control maximum line length in html output?
From: Geoff Russell @ 2015-06-22  2:57 UTC
  To: pandoc-discuss





On Monday, June 22, 2015 at 11:56:23 AM UTC+9:30, Daniel Staal wrote:
>
> [snip]
>
> --As for the rest, it is mine. 
>
> Since you're already using Perl, I have a quick one: Feed the text through 
> Text::Wrap first.  A couple of extra lines in your script should fix the 
> problem: 
>

Thanks, Daniel ... definitely worth investigating, but I'm a little worried 
that breaking at word boundaries might break HTML tags in weird places. 
Perhaps I need to check HTML syntax details first.

Cheers,
Geoff
 



* Re: Is it possible to control maximum line length in html output?
From: Daniel Staal @ 2015-06-22  4:35 UTC
  To: pandoc-discuss

--As of June 21, 2015 7:57:23 PM -0700, Geoff Russell is alleged to have 
said:

> Thanks, Daniel ... definitely worth investigating, but I'm a little worried
> that breaking at word boundaries might break HTML tags in weird places.
> Perhaps I need to check HTML syntax details first.

--As for the rest, it is mine.

A quick look at the code makes me think the relevant regex they are using 
to break lines is `(?=\s)\X` - meaning 'any whitespace character'.  So you 
should be safe: whitespace is collapsed in HTML (outside elements like 
`<pre>`), so adding to it or changing it won't break anything.  (As long as 
no whitespace is created where there was none, and it looks like Text::Wrap 
doesn't do that for words that fit within the line width.)

It should also be possible to change that regex by setting 
`$Text::Wrap::break` to whatever you need - though as I said you shouldn't 
need to.
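
For illustration, here is a minimal sketch of those settings in use (it is not taken from this thread, and the sample HTML is made up). One caveat it assumes: by default Text::Wrap has `$Text::Wrap::huge` set to `'wrap'`, which splits a word longer than the line width in the middle; for HTML that could cut a long URL inside a tag, so the sketch sets it to `'overflow'` and accepts an occasional over-long line instead.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Made-up sample: one very long HTML line containing a long URL token.
my $long_url = 'http://example.com/' . ('x' x 150);
my $html = '<p>' . join(' ', ('word') x 300)
         . qq{ <a href="$long_url">link</a></p>};

$Text::Wrap::columns  = 100;         # normal lines come out under 100 chars
$Text::Wrap::huge     = 'overflow';  # leave over-long tokens whole
$Text::Wrap::unexpand = 0;           # do not turn runs of spaces into tabs
# $Text::Wrap::break is left at its default, so breaks replace whitespace.

my $wrapped = wrap('', '', $html);

# Collapsing whitespace should give back the same text as before wrapping.
(my $before = $html)    =~ s/\s+/ /g;
(my $after  = $wrapped) =~ s/\s+/ /g;
print $before eq $after ? "content unchanged\n" : "content changed\n";

# Any line at or over the limit should be a single unbreakable token.
my @long = grep { length($_) >= 100 } split /\n/, $wrapped;
print scalar(@long), " over-long line(s)\n";
```

If an over-long token would still upset the mailer, that token has to be dealt with separately; wrapping on whitespace alone cannot shorten it.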

Daniel T. Staal


* Re: Is it possible to control maximum line length in html output?
From: Geoff Russell @ 2015-06-22  6:08 UTC
  To: pandoc-discuss





On Monday, June 22, 2015 at 2:05:19 PM UTC+9:30, Daniel Staal wrote:
>
> [snip]
>

I "rolled my own" to just break the long lines on a space ... and it works 
fine. Thanks for suggesting it.
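
For readers who want to try the same thing, a hand-rolled breaker along these lines might look like the sketch below. This is not Geoff's actual code: the helper name `break_long_line` and the sample input are hypothetical, and 998 is assumed as the limit because that is the maximum line length RFC 5322 allows, excluding the CRLF.

```perl
use strict;
use warnings;

# Hypothetical helper (not Geoff's code): split one long line into pieces of
# at most $max characters, cutting only at a space that is already there.
sub break_long_line {
    my ($line, $max) = @_;
    my @pieces;
    while (length($line) > $max) {
        # Last space at or before offset $max.
        my $cut = rindex($line, ' ', $max);
        # No usable space in range: fall back to the next space, if any.
        $cut = index($line, ' ', $max) if $cut < 1;
        last if $cut < 1;    # no space left, keep the remainder whole
        push @pieces, substr($line, 0, $cut);
        $line = substr($line, $cut + 1);    # drop the space we broke on
    }
    push @pieces, $line;
    return @pieces;
}

# 998 is assumed here as the per-line limit for mail (RFC 5322).
my $html = join(' ', ('<em>word</em>') x 200);
print "$_\n" for break_long_line($html, 998);
```

Because each break replaces exactly one existing space with a newline, the HTML renders the same wherever whitespace is collapsed.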

Cheers,
Geoff


 


* Re: Is it possible to control maximum line length in html output?
From: BP Jonsson @ 2015-06-22 14:09 UTC
  To: pandoc-discuss

On 2015-06-22 04:57, Geoff Russell wrote:
>
>
> On Monday, June 22, 2015 at 11:56:23 AM UTC+9:30, Daniel Staal wrote:
>>
>> [snip]
>>
>> --As for the rest, it is mine.
>>
>> Since you're already using Perl, I have a quick one: Feed the text through
>> Text::Wrap first.  A couple of extra lines in your script should fix the
>> problem:
>>
>
> Thanks, Daniel ... definitely worth investigating, but I'm a little worried
> that breaking at word boundaries might break HTML tags in weird places.
> Perhaps I need to check HTML syntax details first.

I would use HTML Tidy. There are some useful links on its WP
page: <https://en.wikipedia.org/wiki/HTML_Tidy>. If you are on
something Unixish it should be easy to install. In the Ubuntu
repo it's simply called "tidy". You will get a slightly old
version, but that shouldn't affect most normal use cases. It has
an option exactly for this, presumably designed not to break
anywhere harmful:

| --wrap
| Type: Integer
| Default: 68
| Example: 0 (no wrapping), 1, 2, ...
| This option specifies the right margin Tidy uses for line wrapping.
| Tidy tries to wrap lines so that they do not exceed this length.
| Set wrap to zero if you want to disable line wrapping.

If you are on something Unixish you should say
`man -H<browser> tidy`, where `<browser>` is the name of your
favorite web browser, and read its manual in the comfort of said
browser. Be careful with the --write-back option since there is
no backup option!

The Perl wrapper around HTML Tidy is less useful now, since it
needs you to build its own fork of tidylib, which never
succeeded for me. I have run the tidy executable successfully
with Capture::Tiny, though.
A custom config file is most useful when doing that!
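
As a rough sketch of driving the tidy executable from a Perl script: the version below swaps Capture::Tiny for a plain list-form pipe open from core Perl, and passes options on the command line rather than through a config file. The helper names `tidy_args` and `tidy_wrap` are hypothetical, and it assumes a `tidy` (or `tidy5`) executable is on the PATH.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: command-line arguments for a given right margin.
sub tidy_args {
    my ($columns) = @_;
    return ('-q', '-wrap', $columns, '--tidy-mark', 'no');
}

# Hypothetical helper: run tidy on an HTML string and return its output.
# $exe defaults to 'tidy'; pass 'tidy5' if that is what is installed.
sub tidy_wrap {
    my ($html, $columns, $exe) = @_;
    $exe = 'tidy' unless defined $exe;

    # Work on a temporary copy so nothing is written back to a real file.
    my ($tmp, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
    print {$tmp} $html;
    close $tmp or die "cannot write $path: $!";

    # List-form pipe open: no shell is involved, tidy's stdout is read back.
    open(my $out, '-|', $exe, tidy_args($columns), $path)
        or die "cannot run $exe: $!";
    my $result = do { local $/; <$out> };
    close $out;    # sets $? to tidy's exit status

    # tidy exits with 1 for warnings and 2 for errors.
    die "$exe reported errors\n" if ($? >> 8) >= 2;
    return $result;
}

# Example use (needs tidy installed):
#   print tidy_wrap('<p>some very long line ...</p>', 100);
```

Reading tidy's standard output this way avoids the --write-back option entirely, so the original file is never touched.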

/bpj



* Fwd: Re: Is it possible to control maximum line length in html output?
From: BP Jonsson @ 2015-06-22 14:28 UTC
  To: pandoc-discuss



I just downloaded and installed the current tidy5 from
<http://www.htacg.org/binaries/>. There were some harmless
warnings about missing metadata, which I inspected and ignored.

You call it as tidy5 instead of tidy.



* Re: Re: Is it possible to control maximum line length in html output?
From: Geoff Russell @ 2015-06-22 23:34 UTC
  To: pandoc-discuss; +Cc: bpj



Another good suggestion ... thanks BPJ.

