public inbox archive for pandoc-discuss@googlegroups.com
From: Matthew Pickering <matthewtpickering-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: filter to break urls in HTML
Date: Wed, 17 Dec 2014 22:32:39 +0000
Message-ID: <CALuQ0m8i3LN3oB-54Km1v0mZzh1FmZa-ugUWuyevufoNjv7HDw@mail.gmail.com>
In-Reply-To: <5491EDED.7030000-S0/GAf8tV78@public.gmane.org>

Dear Pablo,

Sorry that I took over 90 minutes (!) this time, but here is your
filter nevertheless.

https://gist.github.com/mpickering/fdc747b9c8306659cb43

Example output:

```
pandoc -f markdown -t native --filter=pablourl.hs
<http://www.link.com#a=b.php?what>
[Para [Link [Str
"http://www\33283.\33283link\33283.\33283com\33283#\33283a\33283=\33283b\33283.\33283php\33283?\33283what"]
("http://www.link.com#a=b.php?what","")]]
```
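The native output above is printed as Haskell string literals, so non-ASCII characters appear as decimal escapes. The small sketch below relates such escapes to Unicode code points. The names `zeroWidthSpace` and `describe` are illustrative and not part of the gist. The zero-width space denoted by the HTML entity `&#8203;` is U+200B, which Haskell shows as `\8203`, whereas `\33283` is the escape for U+8203.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- U+200B ZERO WIDTH SPACE: the character written &#8203; in HTML.
zeroWidthSpace :: Char
zeroWidthSpace = '\x200B'

-- Give a character's code point in decimal (as in HTML entities and in the
-- escapes produced by Haskell's `show`) and in U+XXXX hexadecimal notation.
describe :: Char -> String
describe c = "decimal " ++ show n ++ ", U+" ++ pad (map toUpper (showHex n ""))
  where
    n = ord c
    -- Pad to at least four hex digits, as Unicode notation does.
    pad s = replicate (4 - length s) '0' ++ s
```

In GHCi, `describe zeroWidthSpace` gives `"decimal 8203, U+200B"`, `describe '\x8203'` gives `"decimal 33283, U+8203"`, and `show [zeroWidthSpace]` is the literal `"\8203"`.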
I hope this meets your needs. Good luck with your book.



On Wed, Dec 17, 2014 at 8:56 PM, Pablo Rodríguez <oinos-S0/GAf8tV78@public.gmane.org> wrote:
> Dear list,
>
> Having URLs in HTML documents (especially EPUB files) leads to some weird
> line breaks.
>
> I discovered that the best way to break URLs in HTML is to put zero-width
> spaces where the URL may be broken.
>
> I need a filter that renders the following URL:
>
>     <http://www.link.com#a=b.php?what>
>
> in HTML as:
>
>     <a href="http://www.link.com#a=b.php?what">http://&#8203;
>     www&#8203;.&#8203;link&#8203;.&#8203;com&#8203;#&#8203;a&#8203;=
>     &#8203;b&#8203;.&#8203;php&#8203;?&#8203;what</a>
>
> There are two basic rules:
>
> There is no zero-width space (&#8203;) before the string "://" (without
> quotes).
>
> After that, any character outside the ranges [0-9A-Za-z] should
> have a &#8203; before and after it.
>
> I think this would be the easiest way to implement it (instead of
> defining the list of characters that should have the zero-width space
> before and after).
>
> I think this filter could be useful to anyone. For me it is important
> for a book I plan to write about pandoc (and git) for authors.
> (Otherwise, paragraphs containing URLs look really weird in some cases.)
>
> Would a kind soul provide me with this kind of filter?
>
> Many thanks for your help,
>
>
> Pablo
> --
> http://www.ousia.tk
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/5491EDED.7030000%40web.de.
> For more options, visit https://groups.google.com/d/optout.
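Pablo's two rules can be sketched as a pure Haskell function, Haskell being the language of the filter in the gist. This is a minimal sketch under stated assumptions, not the gist's code: `breakUrl`, `splitScheme`, `wrap` and `collapse` are hypothetical names. It assumes the link text arrives as a `String`, leaves everything up to and including the first `://` untouched, puts one zero-width space right after `://` (as in Pablo's expected HTML), and merges the doubled zero-width space that two adjacent separators such as `/?` would otherwise produce.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)
import Data.List (isPrefixOf)

-- U+200B ZERO WIDTH SPACE, written &#8203; in HTML.
zwsp :: Char
zwsp = '\x200B'

-- True exactly for the characters in [0-9A-Za-z].
isUrlAlnum :: Char -> Bool
isUrlAlnum c = isAsciiLower c || isAsciiUpper c || isDigit c

-- Split "scheme://rest" into ("scheme://", "rest") at the first "://".
splitScheme :: String -> Maybe (String, String)
splitScheme = go []
  where
    go acc s@(c:cs)
      | "://" `isPrefixOf` s = Just (reverse acc ++ "://", drop 3 s)
      | otherwise            = go (c : acc) cs
    go _ [] = Nothing

-- Rule 2: a character outside [0-9A-Za-z] gets a ZWSP before and after it.
wrap :: Char -> String
wrap c
  | isUrlAlnum c = [c]
  | otherwise    = [zwsp, c, zwsp]

-- Merge runs of ZWSPs, e.g. the pair produced between "/" and "?" in "/?".
collapse :: String -> String
collapse (a:b:rest) | a == zwsp && b == zwsp = collapse (b : rest)
collapse (x:rest) = x : collapse rest
collapse [] = []

-- Rule 1: nothing is inserted before "://"; a single ZWSP follows it.
breakUrl :: String -> String
breakUrl url = case splitScheme url of
  Just (scheme, rest) -> scheme ++ collapse (zwsp : concatMap wrap rest)
  Nothing             -> collapse (concatMap wrap url)
```

In GHCi, `breakUrl "http://www.link.com#a=b.php?what"` yields the same character sequence as Pablo's expected HTML, with U+200B in place of each `&#8203;`. Hooking it into pandoc would mean applying it to the `Str` inside autolinks, for instance from a filter built on `toJSONFilter` in `Text.Pandoc.JSON`; that wiring is left out of this sketch.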




Thread overview: 9+ messages
2014-12-17 20:56 Pablo Rodríguez
2014-12-17 22:32   ` Matthew Pickering [this message]
2014-12-18  6:37       ` Pablo Rodríguez
2014-12-18 19:29           ` Pablo Rodríguez
2014-12-18 19:57               ` Matthew Pickering
2014-12-18 22:33                   ` Pablo Rodríguez
2014-12-21  0:29   ` BP Jonsson
2014-12-21  0:49       ` Matthew Pickering
2014-12-21  1:01           ` BP Jonsson
