From mboxrd@z Thu Jan 1 00:00:00 1970
From: Matthew Pickering
Newsgroups: gmane.text.pandoc
Subject: Re: filter to break urls in HTML
Date: Sun, 21 Dec 2014 00:49:58 +0000
References: <5491EDED.7030000@web.de> <54961478.30103@gmail.com>

Hi bpj,

Yes, it is a mistake, as Pablo pointed out above.

On Sun, Dec 21, 2014 at 12:29 AM, BP Jonsson wrote:
> On 2014-12-17 21:56, Pablo Rodríguez wrote:
>> I would need a filter that parses the following url:
>>
>>
>>
>> in HTML as:
>>
>> http://​
>> www​.​link​.​com​#​a​=
>> ​b​.​php​?​what
>
> Are you not afraid that someone who copypastes that URL will get
> angry at you? :-)
>
> Actually I wrote such a filter in Perl just for kicks, since I
> realized that the main action could be compacted into a single
> substitution:
>
> $_->{c}[0][0]{c} =~ s{ (^.+?://) | (?= [^-:a-z0-9%] ) | (?<= [-:] ) }{ $1 || "\x{200b}" }egix;
>
> I figured that I would like to have breaks after hyphens and colons
> but before other punctuation, except for percent-encoded
> characters/bytes where I wouldn't want any breaks at all, hence the
> three-way alternation in the search pattern.
>
> BTW Matthew, isn't `nb = '\x8203'` in your Haskell version a
> mistake? Codepoint 8203 *hex* is U+8203 CJK UNIFIED IDEOGRAPH-8203
> while 8203 *decimal* is U+200B ZERO WIDTH SPACE!
>
> /bpj
>
> --
> You received this message because you are subscribed to the Google
> Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/54961478.30103%40gmail.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google
Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit
https://groups.google.com/d/msgid/pandoc-discuss/CALuQ0m8BFu_YCtuLRbT7P78p1FWsW5XMf%2B3gHMPn76rva50u6w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
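
P.S. For anyone wanting to check the hex/decimal mix-up, here is a minimal Haskell sketch. It is an illustration under stated assumptions, not the filter from this thread: the names `wrongNb`, `zwspDecimal`, `zwspHex`, `codepoint`, `splitScheme` and `breakUrl` are hypothetical, and `breakUrl` only loosely follows the rules bpj describes (break after hyphens and colons, before other punctuation, never before `%`). The sample URL is assembled from the pieces shown in the quoted message.

```haskell
module Main where

import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Data.List (isPrefixOf)
import Numeric (showHex)

-- The escape from the thread: \x reads its digits as hexadecimal,
-- so this is U+8203 (a CJK ideograph), not a zero-width space.
wrongNb :: Char
wrongNb = '\x8203'

-- Two equivalent spellings of U+200B ZERO WIDTH SPACE:
-- a decimal escape and a hexadecimal escape.
zwspDecimal, zwspHex :: Char
zwspDecimal = '\8203'
zwspHex = '\x200b'

-- Render a character as U+XXXX for inspection.
codepoint :: Char -> String
codepoint c = "U+" ++ pad (map toUpper (showHex (ord c) ""))
  where
    pad s = replicate (4 - length s) '0' ++ s

-- Hypothetical helper: split off a leading "scheme://" if there is one.
splitScheme :: String -> (String, String)
splitScheme url = go "" url
  where
    go acc rest
      | not (null acc) && "://" `isPrefixOf` rest =
          (reverse acc ++ "://", drop 3 rest)
      | otherwise = case rest of
          []       -> ("", url)
          (c : cs) -> go (c : acc) cs

-- Hypothetical helper: insert a zero-width space after '-' or ':' and
-- before any other punctuation except '%', leaving the scheme intact.
breakUrl :: String -> String
breakUrl url = scheme ++ go Nothing rest
  where
    (scheme, rest) = splitScheme url
    go prev (c : cs)
      | breakBefore c || maybe False breakAfter prev =
          zwspHex : c : go (Just c) cs
      | otherwise = c : go (Just c) cs
    go prev [] = [zwspHex | maybe False breakAfter prev]
    breakAfter c = c `elem` "-:"
    breakBefore c = not (isAscii c && isAlphaNum c) && c `notElem` "-:%"

main :: IO ()
main = do
  putStrLn (codepoint wrongNb)      -- U+8203
  putStrLn (codepoint zwspDecimal)  -- U+200B
  print (zwspDecimal == zwspHex)    -- True
  -- Number of break opportunities inserted into the sample URL: 6
  print (length (filter (== zwspHex)
                        (breakUrl "http://www.link.com#a=b.php?what")))
```

Writing the constant as `'\x200b'` (or `'\8203'`) instead of `'\x8203'` is the whole fix; the rest of the sketch is only there to make the difference visible.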