public inbox archive for pandoc-discuss@googlegroups.com
* filter to break urls in HTML
@ 2014-12-17 20:56 Pablo Rodríguez
       [not found] ` <5491EDED.7030000-S0/GAf8tV78@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Pablo Rodríguez @ 2014-12-17 20:56 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Dear list,

Having URLs in HTML documents (especially EPUB files) leads to some weird
line breaks.

I discovered that the best way to break URLs in HTML is to insert
zero-width spaces at the points where the URL may be broken.

I need a filter that renders the following URL:

    <http://www.link.com#a=b.php?what>

in HTML as:

    <a href="http://www.link.com#a=b.php?what">http://&#8203;
    www&#8203;.&#8203;link&#8203;.&#8203;com&#8203;#&#8203;a&#8203;=
    &#8203;b&#8203;.&#8203;php&#8203;?&#8203;what</a>

There are two basic rules:

There is no zero-width space (&#8203;) before the string "://" (without
quotes).

After that, any character outside the ranges [0-9A-Za-z] should
have a &#8203; before and after it.

I think this would be the easiest way to implement it (instead of
defining the list of characters that should have the zero-width space
before and after).
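
The two rules above can be sketched in a few lines of Python. This is only
an illustration of the string transformation, not the filter that was later
posted to the list; the function name `break_url` is hypothetical, and the
collapsing of adjacent zero-width spaces is an added assumption for URLs
with consecutive punctuation.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE, written &#8203; in HTML


def break_url(url):
    """Insert zero-width spaces into a URL following the two rules above."""
    # Rule 1: keep the scheme and "://" intact, with no ZWSP before "://".
    scheme, sep, rest = url.partition("://")
    if not sep:
        # No "://" found: treat the whole string as the breakable part.
        rest = url
    # Rule 2: wrap every character outside [0-9A-Za-z] in ZWSPs.
    body = re.sub(r"[^0-9A-Za-z]", lambda m: ZWSP + m.group(0) + ZWSP, rest)
    # Assumption: collapse runs of ZWSPs and drop leading/trailing ones.
    body = re.sub(ZWSP + "{2,}", ZWSP, body).strip(ZWSP)
    return scheme + sep + ZWSP + body if sep else body


if __name__ == "__main__":
    broken = break_url("http://www.link.com#a=b.php?what")
    # Show the result with the HTML entity instead of the invisible character.
    print(broken.replace(ZWSP, "&#8203;"))
```

Replacing the invisible character with `&#8203;` for display makes it easy
to compare the result with the HTML requested above.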

I think this filter could be useful for anyone. For me this is important
for a book that I plan to write about pandoc (and git) for authors.
(Otherwise, paragraphs containing urls look really weird in some cases.)

Would a kind soul provide me with this kind of filter?

Many thanks for your help,


Pablo
-- 
http://www.ousia.tk



* Re: filter to break urls in HTML
       [not found] ` <5491EDED.7030000-S0/GAf8tV78@public.gmane.org>
@ 2014-12-17 22:32   ` Matthew Pickering
       [not found]     ` <CALuQ0m8i3LN3oB-54Km1v0mZzh1FmZa-ugUWuyevufoNjv7HDw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2014-12-21  0:29   ` BP Jonsson
  1 sibling, 1 reply; 9+ messages in thread
From: Matthew Pickering @ 2014-12-17 22:32 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Dear Pablo,

Sorry that I took over 90 minutes (!) this time but here is your
filter nevertheless.

https://gist.github.com/mpickering/fdc747b9c8306659cb43

Example output:

```
pandoc -f markdown -t native --filter=pablourl.hs
<http://www.link.com#a=b.php?what>
[Para [Link [Str
"http://www\33283.\33283link\33283.\33283com\33283#\33283a\33283=\33283b\33283.\33283php\33283?\33283what"]
("http://www.link.com#a=b.php?what","")]]
```
I hope this satisfies your needs and good luck with your book.




-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CALuQ0m8i3LN3oB-54Km1v0mZzh1FmZa-ugUWuyevufoNjv7HDw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



* Re: filter to break urls in HTML
       [not found]     ` <CALuQ0m8i3LN3oB-54Km1v0mZzh1FmZa-ugUWuyevufoNjv7HDw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-12-18  6:37       ` Pablo Rodríguez
       [not found]         ` <54927626.8020401-S0/GAf8tV78@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Pablo Rodríguez @ 2014-12-18  6:37 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 12/17/2014 11:32 PM, Matthew Pickering wrote:
> Dear Pablo,
> 
> Sorry that I took over 90 minutes (!) this time but here is your
> filter nevertheless.

Dear Matthew,

many thanks for your filter. Sorry that it took so long.

Line breaks now look really great.

> https://gist.github.com/mpickering/fdc747b9c8306659cb43

(There is a minor issue: the zero-width char should be '\x200B' instead
of '\x8203'.)
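
The mix-up is between decimal and hexadecimal notation: the HTML entity
&#8203; is decimal, whereas a `\x` escape is hexadecimal. A quick check in
Python (used here only as a calculator; the helper name `describe` is
hypothetical) shows which character each value names:

```python
import html
import unicodedata


def describe(code_point):
    """Return the U+XXXX form and the Unicode name of a code point."""
    return "U+%04X %s" % (code_point, unicodedata.name(chr(code_point)))


print(describe(8203))    # decimal 8203, as in the HTML entity &#8203;
print(describe(0x8203))  # hexadecimal 8203, as in the escape '\x8203'

# The HTML entity and the hexadecimal escape \u200b are the same character.
print(html.unescape("&#8203;") == "\u200b")

# 0x8203 is 33283 in decimal, which is the \33283 in the example output.
print(0x8203)
```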

I have a further request about the filter. (Sorry, I would rather not
ask, but this is about readability.)

Autolinks are rendered in HTML with the class "uri". This is perfect for
styling them differently from standard text and from other links such as
"[a link](http://www.link.com)".

Would it be possible for the filter to add class="uri" when rendering
the autolink in HTML?

And if the following requires no more than ten seconds: could the
filter be applied only when rendering autolinks in HTML?

Many thanks for your help again,


Pablo





-- 
http://www.ousia.tk


* Re: filter to break urls in HTML
       [not found]         ` <54927626.8020401-S0/GAf8tV78@public.gmane.org>
@ 2014-12-18 19:29           ` Pablo Rodríguez
       [not found]             ` <54932B28.6080308-S0/GAf8tV78@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Pablo Rodríguez @ 2014-12-18 19:29 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 12/18/2014 07:37 AM, Pablo Rodríguez wrote:
> [...]
> I have a further request about the filter. (Sorry, I would rather not
> ask, but this is about readability.)
> 
> Autolinks are rendered in HTML with the class "uri". This is perfect for
> styling them differently from standard text and from other links such as
> "[a link](http://www.link.com)".
> 
> Would it be possible for the filter to add class="uri" when rendering
> the autolink in HTML?

Dear Matthew,

I think I have found a workaround that seems to work.

I don’t really know whether this is right but I replaced:

  | x == url = Link [Str (insertSpaces' x)] (url, tit)

with:

  | x == url = Span ("",["uri"],[]) [Link [Str (insertSpaces' x)] (url, tit)]

From what I understand, this wraps the link in a span with the class
"uri". (At least that is what I intended; this is all Greek to me.)
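
For readers who do not know Haskell, the same idea can be sketched on
pandoc's JSON representation of the document, here in Python. This is an
illustration under assumptions, not part of the posted filter: the function
names are hypothetical, and the layout of the Link node differs between
pandoc versions (newer ones put an attribute triple first), so the sketch
only relies on the last two fields. Check the output of `pandoc -t json`
for the version in use.

```python
def is_autolink(link):
    """True if the link's visible text is exactly its target URL.

    Mirrors the `x == url` guard: only links written as <http://...>
    have a single Str whose text equals the URL.
    """
    # Assumption: the last two fields are the inlines and [url, title].
    inlines, target = link["c"][-2], link["c"][-1]
    url = target[0]
    return (len(inlines) == 1
            and inlines[0].get("t") == "Str"
            and inlines[0].get("c") == url)


def wrap_in_uri_span(inline):
    """Wrap an inline in a Span with id "", classes ["uri"], no attributes."""
    return {"t": "Span", "c": [["", ["uri"], []], [inline]]}


def style_autolink(inline):
    """Wrap autolinks in a "uri" span; leave every other inline untouched."""
    if inline.get("t") == "Link" and is_autolink(inline):
        return wrap_in_uri_span(inline)
    return inline


if __name__ == "__main__":
    auto = {"t": "Link",
            "c": [[{"t": "Str", "c": "http://www.link.com"}],
                  ["http://www.link.com", ""]]}
    print(style_autolink(auto)["t"])
```

The autolink test has to run before any zero-width spaces are inserted,
because afterwards the link text no longer equals the URL.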

Many thanks for your extremely generous help,


Pablo
-- 
http://www.ousia.tk


* Re: filter to break urls in HTML
       [not found]             ` <54932B28.6080308-S0/GAf8tV78@public.gmane.org>
@ 2014-12-18 19:57               ` Matthew Pickering
       [not found]                 ` <CALuQ0m-RMv-RkiF9gsKM8VPGi2v=vCsWJ+AQU1fxyBD=NEoT5Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Pickering @ 2014-12-18 19:57 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

That's exactly right. (Sorry, I didn't mean to imply that it took me 90
minutes. It took no more than 5, but I was only able to help you out 90
minutes after you had posted to the list.)




* Re: filter to break urls in HTML
       [not found]                 ` <CALuQ0m-RMv-RkiF9gsKM8VPGi2v=vCsWJ+AQU1fxyBD=NEoT5Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-12-18 22:33                   ` Pablo Rodríguez
  0 siblings, 0 replies; 9+ messages in thread
From: Pablo Rodríguez @ 2014-12-18 22:33 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 12/18/2014 08:57 PM, Matthew Pickering wrote:
> That's exactly right. (Sorry, I didn't mean to imply that it took me 90
> minutes. It took no more than 5, but I was only able to help you out 90
> minutes after you had posted to the list.)

Many thanks for the confirmation, Matthew.

Your reply in 90 minutes was more than fast. I’m really thankful for that.

And sorry for not having replied myself that fast :-).


Pablo



-- 
http://www.ousia.tk


* Re: filter to break urls in HTML
       [not found] ` <5491EDED.7030000-S0/GAf8tV78@public.gmane.org>
  2014-12-17 22:32   ` Matthew Pickering
@ 2014-12-21  0:29   ` BP Jonsson
       [not found]     ` <54961478.30103-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: BP Jonsson @ 2014-12-21  0:29 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 2014-12-17 21:56, Pablo Rodríguez wrote:
> I would need a filter that parses the following url:
> 
>      <http://www.link.com#a=b.php?what>
> 
> in HTML as:
> 
>      <a href="http://www.link.com#a=b.php?what">http://&#8203;
>      www&#8203;.&#8203;link&#8203;.&#8203;com&#8203;#&#8203;a&#8203;=
>      &#8203;b&#8203;.&#8203;php&#8203;?&#8203;what</a>

Are you not afraid that someone who copy-pastes that URL will get angry at you? :-)

Actually, I wrote such a filter in Perl just for kicks, since I realized that the main action could be compacted into a single substitution:

    $_->{c}[0][0]{c} =~ s{ (^.+?://) | (?= [^-:a-z0-9%] ) | (?<= [-:] ) }{ $1 || "\x{200b}" }egix;

I figured that I would like to have breaks after hyphens and colons but before other punctuation, except for percent-encoded characters/bytes where I wouldn't want any breaks at all, hence the three-way alternation in the search pattern. 
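
For comparison, the same three-way alternation can be sketched with
Python's `re` module. This is a port for illustration only, assuming
Python 3.7 or later for its handling of empty matches in `re.sub`; the
name `break_url_bpj` is hypothetical, and it works on a plain string rather
than on the pandoc data structure that the Perl line indexes into.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Three alternatives, tried in this order at each position:
#   1. the scheme up to and including "://" (captured, put back unchanged)
#   2. an empty match before any character other than - : a-z 0-9 %
#   3. an empty match after a hyphen or a colon
BREAK_POINTS = re.compile(r"(^.+?://)|(?=[^-:a-z0-9%])|(?<=[-:])", re.IGNORECASE)


def break_url_bpj(url):
    """Insert zero-width spaces at the break points described above."""
    # Group 1 is None for the two empty alternatives, so those get a ZWSP.
    return BREAK_POINTS.sub(lambda m: m.group(1) or ZWSP, url)


if __name__ == "__main__":
    for url in ("http://www.link.com#a=b.php?what", "http://x.org/a%20b-c"):
        print(break_url_bpj(url).replace(ZWSP, "&#8203;"))
```

Unlike the rule in the original request, this variant places a single
break before most punctuation, places it after hyphens and colons, and
leaves percent-encoded sequences such as `%20` unbroken.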

BTW Matthew, isn't `nb = '\x8203'` in your Haskell version a mistake? Code point 8203 *hex* is U+8203 CJK UNIFIED IDEOGRAPH-8203, while 8203 *decimal* is U+200B ZERO WIDTH SPACE!

/bpj


* Re: filter to break urls in HTML
       [not found]     ` <54961478.30103-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-12-21  0:49       ` Matthew Pickering
       [not found]         ` <CALuQ0m8BFu_YCtuLRbT7P78p1FWsW5XMf+3gHMPn76rva50u6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Pickering @ 2014-12-21  0:49 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hi bpj,

Yes, it is a mistake, as Pablo pointed out above.




* Re: filter to break urls in HTML
       [not found]         ` <CALuQ0m8BFu_YCtuLRbT7P78p1FWsW5XMf+3gHMPn76rva50u6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-12-21  1:01           ` BP Jonsson
  0 siblings, 0 replies; 9+ messages in thread
From: BP Jonsson @ 2014-12-21  1:01 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Ah yes, he did indeed. Sorry about the noise!

/bpj

