Another good suggestion ... thanks BPJ.
On Monday, June 22, 2015 at 11:58:10 PM UTC+9:30, BP Jonsson wrote:
I just downloaded and installed the current tidy5 from
<http://www.htacg.org/binaries/>. There were some harmless
warnings about missing metadata which I inspected and ignored.
You call it as tidy5 instead of tidy.
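A minimal sketch of how that might look from the shell, assuming either `tidy5` (the htacg build) or `tidy` (the distro package) is on your PATH. The function names `find_tidy` and `wrap_html` are made up for illustration; `-q` and `--wrap` are regular Tidy options.

```shell
#!/bin/sh
# Print the name of the first HTML Tidy binary found on PATH:
# tidy5 (htacg build) is preferred, tidy (distro package) is the fallback.
find_tidy() {
    for candidate in tidy5 tidy; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Re-wrap the HTML file $1 at column $2 (default 72), writing to stdout.
# Tidy exits with 1 when it only reports warnings, so 0 and 1 both count
# as success here; 2 (errors) and anything higher is a failure.
wrap_html() {
    tidy_bin=$(find_tidy) || { echo "HTML Tidy not found" >&2; return 127; }
    "$tidy_bin" -q --wrap "${2:-72}" "$1"
    status=$?
    [ "$status" -le 1 ]
}
```

Something like `wrap_html page.html 80 > page-wrapped.html` would then leave the original file alone, which sidesteps the lack of a backup option.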
> On 2015-06-22 04:57, Geoff Russell wrote:
> >
> >
> > On Monday, June 22, 2015 at 11:56:23 AM UTC+9:30, Daniel Staal wrote:
> >>
> >> [snip]
> >>
> >> --As for the rest, it is mine.
> >>
> >> Since you're already using Perl, I have a quick one: Feed the text through
> >> Text::Wrap first. A couple of extra lines in your script should fix the
> >> problem:
> >>
> >
> > Thanks Daniel ... definitely worth investigating, but I'm a little worried
> > that breaking at word boundaries might break HTML tags in weird places.
> > Perhaps I need to check HTML syntax details first.
> I would use HTML Tidy. There are some useful links on its WP
> page: <https://en.wikipedia.org/wiki/HTML_Tidy>. If you are on
> something Unixish it should be easy to install. In the Ubuntu
> repo it's simply called "tidy". You will get a slightly old
> version, but that shouldn't affect most normal use cases. It has
> an option exactly for this, presumably designed not to break
> anywhere harmful:
> | --wrap
> | Type: Integer
> | Default: 68
> | Example: 0 (no wrapping), 1, 2, ...
> | This option specifies the right margin Tidy uses for line wrapping.
> | Tidy tries to wrap lines so that they do not exceed this length.
> | Set wrap to zero if you want to disable line wrapping.
> If you are on something Unixish you should say
> `man -H<browser> tidy`, where `<browser>` is the name of your
> favorite web browser, and read its manual in the comfort of said
> browser. Be careful with the --write-back option since there is
> no backup option!
> The Perl wrapper around HTML Tidy is less useful now, since it
> needs you to build its own fork of tidylib, which never
> succeeded for me. I have run it successfully with Capture::Tiny
> though.
> A custom config file is most useful when doing that!
> /bpj
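
For the custom config file mentioned above, a rough sketch could look like this. The helper name `write_tidy_config` and the particular values are assumptions for illustration only; `wrap`, `indent`, `quiet` and `tidy-mark` are ordinary Tidy options.

```shell
#!/bin/sh
# Write a small HTML Tidy configuration file to the path given as $1
# (default: ./tidy.conf). The values are illustrative, not recommendations.
write_tidy_config() {
    cat > "${1:-tidy.conf}" <<'EOF'
# Hypothetical settings; adjust to taste.
wrap: 72
indent: auto
quiet: yes
tidy-mark: no
EOF
}
```

After `write_tidy_config tidy.conf`, a call such as `tidy5 -config tidy.conf input.html > output.html` should pick the settings up, and the same file can be reused when running Tidy from a Perl script.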