ntg-context - mailing list for ConTeXt users
* Bad PDF to text crawlers
@ 2015-08-19 21:05 Kip Warner
  2015-08-19 21:35 ` Peter Münster
  2015-08-20 17:57 ` creating multirow curly brace in tables to symbolize row span Henry House
  0 siblings, 2 replies; 5+ messages in thread
From: Kip Warner @ 2015-08-19 21:05 UTC
  To: ntg-context


Hey list,

I have an important document online that I would prefer to keep as a PDF 
rather than convert to another format. Unfortunately, bots frequently 
extract a text version from it and serve that to people looking for the 
document, which is beyond my control. The extracted text looks absolutely 
awful and leaves readers with a really poor understanding of the 
document's contents.

I was thinking that there must be some way of tricking these bots, 
depending on how they are implemented, and let's assume they will always 
find the PDF, to get them to extract only a small invisible layer that 
just contains some hidden text directing a user to the location to 
download the original high quality ConTeXt PDF.

Any suggestions?

-- 
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com


___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________

* Re: Bad PDF to text crawlers
  2015-08-19 21:05 Bad PDF to text crawlers Kip Warner
@ 2015-08-19 21:35 ` Peter Münster
  2015-08-20 16:43   ` Kip Warner
  2015-08-20 17:57 ` creating multirow curly brace in tables to symbolize row span Henry House
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Münster @ 2015-08-19 21:35 UTC
  To: ntg-context

On Wed, Aug 19 2015, Kip Warner wrote:

> I was thinking that there must be some way of tricking these bots, 
> depending on how they are implemented, and let's assume they will always 
> find the PDF, to get them to extract only a small invisible layer that 
> just contains some hidden text directing a user to the location to 
> download the original high quality ConTeXt PDF.

Even if you found a way today, tomorrow there would be other bots
that see the same text as humans do.


> Any suggestions?

Get the value of HTTP_USER_AGENT and send the replacement text, if the
agent is a bot. Or use robots.txt.
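
A minimal sketch of the user-agent check, assuming a CGI- or WSGI-style
environment in which the request header is exposed as HTTP_USER_AGENT. The
marker list, the notice text, and the file name document.pdf are made up
for illustration, and a crawler can send any User-Agent it likes, so this
only catches bots that identify themselves:

```python
# Hypothetical marker substrings; real crawlers vary and may spoof the header.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

# Hypothetical replacement text served instead of the PDF.
NOTICE = (
    "This document is best read as the original PDF.\n"
    "Please download it from the publisher's site.\n"
)


def looks_like_bot(user_agent):
    """Return True if the User-Agent string contains a known bot marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def choose_response(environ):
    """Pick (content type, payload) from a CGI/WSGI-style environ dict."""
    if looks_like_bot(environ.get("HTTP_USER_AGENT")):
        return ("text/plain", NOTICE)
    # Hypothetical file name; a real handler would stream the file's bytes.
    return ("application/pdf", "document.pdf")
```

The robots.txt route needs no code: a Disallow rule for the PDF's path asks
well-behaved crawlers not to fetch it, but it is only a request and
ill-behaved bots can ignore it.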

-- 
           Peter

* Re: Bad PDF to text crawlers
  2015-08-19 21:35 ` Peter Münster
@ 2015-08-20 16:43   ` Kip Warner
  0 siblings, 0 replies; 5+ messages in thread
From: Kip Warner @ 2015-08-20 16:43 UTC
  To: mailing list for ConTeXt users; +Cc: Peter Münster


On Wed, 2015-08-19 at 23:35 +0200, Peter Münster wrote:
> Even if you found a way today, tomorrow there would be other bots
> that see the same text as humans do.

Yes, probably.

> Get the value of HTTP_USER_AGENT and send the replacement text, if 
> the agent is a bot. Or use robots.txt.

I'll give that some thought. Thank you.

-- 
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com


* creating multirow curly brace in tables to symbolize row span
  2015-08-19 21:05 Bad PDF to text crawlers Kip Warner
  2015-08-19 21:35 ` Peter Münster
@ 2015-08-20 17:57 ` Henry House
  2015-08-20 18:05   ` Aditya Mahajan
  1 sibling, 1 reply; 5+ messages in thread
From: Henry House @ 2015-08-20 17:57 UTC
  To: mailing list for ConTeXt users

List:

I'm trying to create a table with this effect:

Parcel      |Area
          ⎧ | 1 acre trees
parcel 1  ⎨ | 2 acre vines
          ⎩ | 3 acre open
          ⎧ | 5 acre trees
parcel 2  ⎨ | 6 acre vines
          ⎩ | 4 acre open


In other words, I would like a large, left-pointing curly brace spanning three
table rows to tell the reader that each entry in the leftmost column applies to
a three-row span in the next column (a style often seen in tables in older
books).

I've tried, probably naively, the approach below: using the Unicode symbols
for the bracket pieces directly and, alternatively, using the math-mode symbol
names listed in "Every symbol defined by unicode-math":
http://meeting.contextgarden.net/2011/talks/day3_05_ulrik_opentype/Samples/unimath-symbols.pdf
Unfortunately, the symbols are not rendered in either form.

Any suggestions on how I can either make these symbols render, or a different
approach to achieve my goal?

Complete test document:


\enableregime[utf]\setuppapersize[letter][letter]
\usetypescript[serif,sans,mono][hanging][normal]
\setupalign[hanging]
\usetypescript[modern-base][texnansi]
\setupbodyfont[reset]
\setupbodyfont[modern]
\definetypeface[boldmath][mm][boldmath][modern][default]
\usemodule[cmscbf]
\usemodule[unicode-math]
\setupbodyfont[11pt]

\starttext

\bTABLE
\bTR{}\bTD{}Parcel   \eTD\bTD{}                           \eTD\bTD Area \eTD\eTR%
\bTR{}\bTD{}         \eTD\bTD{}\mathematics{\lbraceuend}  \eTD\bTD 1 \eTD\eTR%
\bTR{}\bTD{}parcel 2 \eTD\bTD{}\mathematics{\lbracemid}   \eTD\bTD 2 \eTD\eTR%
\bTR{}\bTD{}         \eTD\bTD{}\mathematics{\lbracelend}  \eTD\bTD 3 \eTD\eTR%
\eTABLE

\bTABLE
\bTR{}\bTD{}         \eTD\bTD{}⎧                          \eTD\bTD 1 \eTD\eTR%
\bTR{}\bTD{}parcel 4 \eTD\bTD{}⎨                          \eTD\bTD 2 \eTD\eTR%
\bTR{}\bTD{}         \eTD\bTD{}⎩                          \eTD\bTD 3 \eTD\eTR%
\eTABLE

\stoptext



--
Henry House

* Re: creating multirow curly brace in tables to symbolize row span
  2015-08-20 17:57 ` creating multirow curly brace in tables to symbolize row span Henry House
@ 2015-08-20 18:05   ` Aditya Mahajan
  0 siblings, 0 replies; 5+ messages in thread
From: Aditya Mahajan @ 2015-08-20 18:05 UTC
  To: mailing list for ConTeXt users

On Thu, 20 Aug 2015, Henry House wrote:

> List:
>
> I'm trying to create a table with this effect:
>
> Parcel      |Area
>           ⎧ | 1 acre trees
> parcel 1  ⎨ | 2 acre vines
>           ⎩ | 3 acre open
>           ⎧ | 5 acre trees
> parcel 2  ⎨ | 6 acre vines
>           ⎩ | 4 acre open
>
>
> In other words, I would like a large, left-pointing curly brace spanning three
> table rows to tell the reader that each entry in the leftmost column applies to
> a three-row span in the next column (a style often seen in tables in older
> books).
>
> Any suggestions on how I can either make these symbols render, or a different
> approach to achieve my goal?

As an alternative, have you considered separating the groups with white 
space? For example:

\starttext

\bTABLE[frame=off, loffset=0.5em]
   \bTR
     \bTH Parcel \eTH
     \bTH Area   \eTH
   \eTR
   \bTR
     \bTD[nr=3, align={lohi}]
       Parcel 1
     \eTD
     \bTD 1 acre trees \eTD
   \eTR
   \bTR
     \bTD 2 acre vines \eTD
   \eTR
   \bTR
     \bTD 3 acre open \eTD
   \eTR
   \bTR[topdistance=\lineheight]
     \bTD[nr=3, align={lohi}]
       Parcel 2
     \eTD
     \bTD 5 acre trees \eTD
   \eTR
   \bTR
     \bTD 6 acre vines \eTD
   \eTR
   \bTR
     \bTD 4 acre open \eTD
   \eTR
\eTABLE

\stoptext

Aditya

