From: Kip Warner
To: ntg-context@ntg.nl
Newsgroups: gmane.comp.tex.context
Subject: Bad PDF to text crawlers
Date: Wed, 19 Aug 2015 14:05:51 -0700
Message-ID: <20150819210551.GH26883@kip-desktop>

Hey list,

I have an important document online that I would prefer to keep as a PDF
and not in another format. Unfortunately, bots frequently try to provide
those looking for it with a text version that they extract themselves
(this is beyond my control). The extracted text looks absolutely awful
and has been a major pain, leaving readers with a really bad
understanding of the document's contents.

Assuming the bots will always find the PDF, I was thinking there must be
some way, depending on how they are implemented, of tricking them into
extracting only a small invisible layer. That layer would contain just
some hidden text directing the user to the location where they can
download the original high-quality ConTeXt PDF.

Any suggestions?

-- 
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com