ntg-context - mailing list for ConTeXt users
From: Kip Warner <kip@thevertigo.com>
To: ntg-context@ntg.nl
Subject: Bad PDF to text crawlers
Date: Wed, 19 Aug 2015 14:05:51 -0700
Message-ID: <20150819210551.GH26883@kip-desktop>


Hey list,

I have an important document online that I would prefer to keep as a PDF
rather than in another format. Unfortunately, bots frequently offer
people searching for it a text version that they extract themselves,
which is beyond my control. The extraction looks absolutely awful and
has been a major pain, leaving readers with a really bad understanding
of the contents of the document.

I was thinking there must be some way of tricking these bots, depending
on how they are implemented. Assuming they will always find the PDF, I
would like them to extract only a small invisible layer containing
hidden text that directs the user to the location where the original
high-quality ConTeXt PDF can be downloaded.

Any suggestions?

-- 
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com

