ntg-context - mailing list for ConTeXt users
* Post-processing ConTeXt's output for text search
From: Piotr Kopszak @ 2007-07-04  8:28 UTC
  To: ntg-context

Hello list, 


I would like to implement online text search for a book I am currently
publishing with ConTeXt, without making the book itself available
online: something like the snippet view in Google Books search. I don't
even need the snippet view; page numbers alone would be sufficient to
begin with. I am not asking for a complete solution, of course, but
rather for advice on which direction to take. My first, still vague
idea is the following: to make queries fast, I think it would be useful
to obtain a list of words with the numbers of the pages on which they
appear, much like an ordinary index. So perhaps it would be possible to
force the indexing engine to treat every word in the text as if it were
an argument of the \index command, and to write out a list in text
format that would be easy to feed into a database. This is just a blind
guess, far from perfect by design, and merely what seems easiest to
implement. Please tell me which other approaches would be more
promising.
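
The word-to-pages list described above is what is usually called an
inverted index. Below is a minimal Python sketch, assuming the book's
text is already available as one plain-text string per page (how to get
it out of ConTeXt or the PDF is not covered here). The names
`build_word_index` and `export_tsv` and the tab-separated
`word<TAB>page,page,...` output format are illustrative choices, not
anything ConTeXt provides.

```python
import re
from collections import defaultdict

# Assumption (hypothetical input): `pages` holds the book's plain text,
# one string per page, with pages[0] being printed page `first_page`.
WORD_RE = re.compile(r"\w+")  # \w is Unicode-aware, so Polish letters count


def build_word_index(pages, first_page=1):
    """Map each lower-cased word to the sorted list of pages it occurs on."""
    index = defaultdict(set)
    for offset, text in enumerate(pages):
        for word in WORD_RE.findall(text.lower()):
            index[word].add(first_page + offset)
    return {word: sorted(nums) for word, nums in index.items()}


def export_tsv(index):
    """Render the index as 'word<TAB>page,page,...' lines sorted by word,
    a plain-text format that is easy to bulk-load into a database."""
    return "\n".join(
        word + "\t" + ",".join(str(n) for n in index[word])
        for word in sorted(index)
    )


if __name__ == "__main__":
    demo = ["Polish art in Warsaw.", "Art and the museum.", "Warsaw again."]
    print(export_tsv(build_word_index(demo)))
```

Each output line maps onto a two-column table, so a query becomes a
single lookup by word instead of a scan of the whole text.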



Thanks in advance

Piotr

--

  Piotr Kopszak, Ph.D.
  Polish Art Gallery, National Museum in Warsaw
  ----------------------------->    http://kopszak.mnw.art.pl/
  http://www.magnatune.com/artists/altri_stromenti

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________



* Re: Post-processing ConTeXt's output for text search
From: Patrick Gundlach @ 2007-07-04 10:05 UTC
  To: ntg-context

Hi,

you could split your PDF into separate pages, use a full-text search
engine such as swish-e to index each of the pages, and store the
results together with the page number (taken from the split pages).

IMO this would be the easiest thing.
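
The per-page approach can be sketched in Python as follows. This is a
minimal sketch under stated assumptions: poppler's `pdftotext` is
installed, and instead of physically splitting the PDF into files, the
sketch relies on `pdftotext` separating pages with form feed characters
(its default unless `-nopgbrk` is given). The naive case-insensitive
substring search stands in for a real engine such as swish-e, whose
configuration is not shown. `first_page` is a hypothetical parameter for
books whose printed page 1 is not the first PDF page.

```python
import re
import subprocess
import sys


def pdf_text(pdf_path):
    """Run poppler's pdftotext (assumed installed) and return its output.
    "-" as the output file makes pdftotext write to stdout."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True, capture_output=True, encoding="utf-8",
    )
    return result.stdout


def split_pages(text):
    """Split pdftotext output into one string per page.
    pdftotext ends every page with a form feed (\\f), so the last
    element of the split is empty and is dropped."""
    pages = text.split("\f")
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def search(pages, term, first_page=1, context=30):
    """Yield (page_number, snippet) for every page containing `term`.
    Naive case-insensitive substring match (a stand-in for swish-e);
    the snippet is up to `context` characters around the first hit on
    the page, with whitespace collapsed."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    for offset, text in enumerate(pages):
        match = pattern.search(text)
        if match is None:
            continue
        start = max(0, match.start() - context)
        end = match.end() + context
        snippet = " ".join(text[start:end].split())
        yield first_page + offset, snippet


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (script name hypothetical): python search_pages.py book.pdf WORD
    for page_no, snippet in search(split_pages(pdf_text(sys.argv[1])), sys.argv[2]):
        print(page_no, snippet)
```

Each hit yields the page number plus a short snippet, so the same data
would also cover the snippet view mentioned in the question; returning
only the page numbers is a matter of dropping the second element.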

Patrick