From: Jürgen Schulze <1manfactory-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Newsgroups: gmane.text.pandoc
Subject: Re: Hyphenation
Date: Sat, 17 Dec 2016 00:53:28 -0800 (PST)
Message-ID: <112f7607-4771-4672-924d-98850eba8616@googlegroups.com>
To: pandoc-discuss
Cc: albert+pandoc-9EawChwDxG8hFhg+JK9F0w@public.gmane.org

Hello, I decided to give your solution a try.
When running
pandoc -s in.txt --filter hyphfilter.py -o out.txt
I get this:
pandoc: Error running filter hyphfilter.py
hyphfilter.py: createProcess: runInteractiveProcess: exec: does not exist (No such file or directory)
What am I doing wrong?
I have no clue about Python.

Juergen
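
An error of this kind is raised when the operating system cannot start the filter as a program, so it likely appears before any Python code runs. The following is a minimal diagnostic sketch, not part of the original thread: `check_filter` is a hypothetical helper, and the list of causes it checks (missing file, missing interpreter named in the `#!` line, Windows line endings, missing executable bit) is an assumption about what commonly triggers the message.

```python
import os
import shutil


def check_filter(path):
    """Return a list of likely reasons why a filter script cannot be started.

    Hypothetical helper for illustration; the checks are assumptions,
    not an exhaustive list of what pandoc itself verifies.
    """
    if not os.path.isfile(path):
        return [f"{path}: file not found"]

    problems = []
    with open(path, "rb") as fh:
        first_line = fh.readline()

    if not first_line.startswith(b"#!"):
        problems.append("no #! line at the top of the script")
    else:
        if first_line.endswith(b"\r\n"):
            problems.append("#! line ends with CRLF (Windows line endings)")
        words = first_line[2:].decode("utf-8", "replace").split()
        if words:
            # For "#!/usr/bin/env python3" the interpreter is the second word.
            uses_env = os.path.basename(words[0]) == "env" and len(words) > 1
            interpreter = words[1] if uses_env else words[0]
            if shutil.which(interpreter) is None:
                problems.append(f"interpreter not found: {interpreter}")

    if not os.access(path, os.X_OK):
        problems.append("script is not marked executable")
    return problems


# Print whatever looks wrong with the filter in the current directory.
for problem in check_filter("hyphfilter.py"):
    print(problem)
```

If the script prints nothing, the filter file itself looks startable and the cause probably lies elsewhere, for example in the Python libraries the filter imports.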


On Friday, 18 November 2016 at 20:20:51 UTC+1, Albert Krewinkel wrote:
Jürgen Schulze <1manf...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Hello, I would like to use the hyphenation abilities pandoc has when generating
> PDF documents for simple text/HTML files, where the hyphenation is inserted
> with the entity "&shy;".
>
> Something like
>
> pandoc --smart --wrap=none text.txt -o text2.txt
>
> How can this be done?

One method would be to rely on the CSS `hyphens` property. Unfortunately,
that property is not supported by Chrome/Webkit, so an additional
polyfill library like [Hyphenator](https://github.com/mnater/Hyphenator)
would be required.

Alternatively, a pandoc filter can insert soft hyphens directly.
You'll need to install the Python libraries `panflute` and `pyphen`.
Put the following code into a file and call it as a pandoc filter.


    #!/usr/bin/env python3
    from panflute import *
    import pyphen

    # Hyphenation dictionary for US English
    dic = pyphen.Pyphen(lang='en_US')

    def hyphenate(inline, doc):
        # Insert soft hyphens (U+00AD) into plain text elements only
        if type(inline) == Str:
            hyphenated = dic.inserted(inline.text, hyphen='\u00ad')
            return Str(hyphenated)

    if __name__ == "__main__":
        toJSONFilter(hyphenate)
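
For readers unfamiliar with filters, here is a rough sketch of the mechanism behind them, using only the standard library. Pandoc hands a filter the document as JSON, in which plain text appears as elements of the form `{"t": "Str", "c": "..."}`; the filter rewrites those elements and hands the JSON back. The helper `walk_str` and the sample `blocks` fragment are made up for illustration, and `str.upper` merely stands in for a real hyphenation function.

```python
import json


def walk_str(node, fn):
    """Return a copy of a pandoc JSON tree with fn applied to every Str text."""
    if isinstance(node, dict):
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            return {"t": "Str", "c": fn(node["c"])}
        return {key: walk_str(value, fn) for key, value in node.items()}
    if isinstance(node, list):
        return [walk_str(item, fn) for item in node]
    return node  # numbers, strings, booleans and None are left alone


# A hand-written fragment of a document: one paragraph with text and inline code.
blocks = [
    {"t": "Para", "c": [
        {"t": "Str", "c": "hyphenation"},
        {"t": "Space"},
        {"t": "Code", "c": [["", [], []], "x = 1"]},
    ]}
]

# str.upper is a visible stand-in for the hyphenation step.
result = walk_str(blocks, str.upper)
print(json.dumps(result))
```

Only the `Str` element changes; the inline code is left untouched, which is the same reason the filter tests the element type before hyphenating.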


The `hyphenate` filter above uses the Unicode soft hyphen instead of the HTML entity,
which helps keep the file size low. Just change the `hyphen`
parameter if that's not what you want.
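
To make the size difference concrete, here is a small sketch using only the standard library. The soft hyphen (U+00AD) takes two bytes in UTF-8, while the entity `&shy;` takes five. The names `SOFT_HYPHEN` and `to_entity` are hypothetical, and replacing the character in already generated text is just one assumed way of getting the entity form.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, invisible unless a line is broken at that point


def to_entity(text):
    """Replace every Unicode soft hyphen with the HTML entity &shy;."""
    return text.replace(SOFT_HYPHEN, "&shy;")


word = "hy" + SOFT_HYPHEN + "phen" + SOFT_HYPHEN + "ation"

unicode_size = len(word.encode("utf-8"))
entity_size = len(to_entity(word).encode("utf-8"))

print(to_entity(word))            # hy&shy;phen&shy;ation
print(unicode_size, entity_size)  # 15 21
```

Each hyphenation point costs three extra bytes in the entity form, which adds up in a long document where most words carry at least one.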

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124
