public inbox archive for pandoc-discuss@googlegroups.com
* filter to break urls in HTML
@ 2014-12-17 20:56 Pablo Rodríguez
From: Pablo Rodríguez @ 2014-12-17 20:56 UTC
  To: pandoc-discuss@googlegroups.com

Dear list,

Having URLs in HTML documents (especially ePub files) leads to some weird
line breaks.

I discovered that the best way to break URLs in HTML is to put
zero-width spaces at the points where the URL may be broken.

I would need a filter that renders the following URL:

    <http://www.link.com#a=b.php?what>

in HTML as:

    <a href="http://www.link.com#a=b.php?what">http://&#8203;
    www&#8203;.&#8203;link&#8203;.&#8203;com&#8203;#&#8203;a&#8203;=
    &#8203;b&#8203;.&#8203;php&#8203;?&#8203;what</a>

These are the two basic rules:

There is no zero-width space (&#8203;) before or inside the string "://"
(without quotes); the first one comes right after it.

After that, any character outside the ranges [0-9A-Za-z] should have a
&#8203; before and after it.

I think this would be the easiest way to implement it (instead of
defining the list of characters that should have the zero-width space
before and after).
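
The two rules above can be sketched in a few lines of Python. This is
only a minimal illustration of the string transformation, not a ready-made
pandoc filter: the names `break_url` and `to_html_link` are hypothetical,
and wiring the function into a filter that visits link elements is left
out. It also assumes that two adjacent non-alphanumeric characters share
one zero-width space between them rather than getting two.

```python
import html
import re

ZWSP = "\u200b"  # zero-width space, written as &#8203; in HTML


def break_url(url: str) -> str:
    """Return the link text for `url` with zero-width spaces inserted.

    Rule 1: nothing is inserted before or inside "://".
    Rule 2: after "://", every character outside [0-9A-Za-z] gets a
    zero-width space before and after it.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        # No "://" in the string: apply rule 2 to the whole string.
        scheme, rest = "", url
    # Split into runs of [0-9A-Za-z] and single other characters, then
    # join the pieces with one zero-width space (assumption: adjacent
    # separators share a single zero-width space).
    tokens = re.findall(r"[0-9A-Za-z]+|[^0-9A-Za-z]", rest)
    broken = ZWSP.join(tokens)
    if sep:
        return scheme + sep + ZWSP + broken
    return broken


def to_html_link(url: str) -> str:
    """Render `url` as an <a> element whose text carries &#8203; entities."""
    text = html.escape(break_url(url)).replace(ZWSP, "&#8203;")
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True), text)


if __name__ == "__main__":
    print(to_html_link("http://www.link.com#a=b.php?what"))
```

For the example URL in this message, `to_html_link` produces the markup
shown above on a single line, without the line wrapping added by the mail.
Splitting on "anything outside [0-9A-Za-z]" keeps the code free of an
explicit list of break characters, as suggested.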

I think this filter could be useful for anyone. For me this is important
for a book that I plan to write about pandoc (and git) for authors.
(Otherwise, paragraphs containing URLs look really weird in some cases.)

Would a kind soul provide me with this kind of filter?

Many thanks for your help,


Pablo
-- 
http://www.ousia.tk


