* Post-processing ConTeXt's output for text search
@ 2007-07-04 8:28 Piotr Kopszak
2007-07-04 10:05 ` Patrick Gundlach
0 siblings, 1 reply; 2+ messages in thread
From: Piotr Kopszak @ 2007-07-04 8:28 UTC (permalink / raw)
To: ntg-context
Hello list,
I would like to implement online text search for a book I'm currently
publishing with ConTeXt, without making the book itself available
online, something like the snippet view in text search on the Google
Books site. I don't even need a snippet view; page numbers alone would
be sufficient to begin with. I am not asking for a complete solution,
of course, just advice on which direction to take. My first vague idea
is the following: to make queries fast, I think it would be useful to
obtain a list of words with the numbers of the pages on which they
appear, something very similar to a plain index. So perhaps it would be
possible to force the indexing engine to treat every word in the text
as if it were an argument of the \index command and write out a list in
text format that would be easy to feed into a database. This is just a
blind guess, far from perfect by design, only something that seems
easiest to implement. Please tell me which other approaches would be
more promising.
Thanks in advance
Piotr
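
[Editor's sketch] The word list described above (every word paired with the
pages on which it appears) can be illustrated outside ConTeXt as well. The
following is a minimal sketch, not a ConTeXt feature: it assumes the text of
each page is already available as a separate string (how it is extracted is
left open), and the names build_word_index and format_index are hypothetical.

```python
import re
from collections import defaultdict

# Runs of letters only (digits and underscores excluded); Unicode-aware,
# so accented and non-ASCII letters are kept inside words.
WORD_RE = re.compile(r"[^\W\d_]+")


def build_word_index(pages):
    """Map each lowercased word to the sorted page numbers it appears on.

    `pages` is a list of strings, one per page; numbering starts at 1.
    This is an assumption: the book's real page numbering may differ.
    """
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in WORD_RE.findall(text):
            index[word.lower()].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


def format_index(index):
    """Render the index as lines of: word, a tab, comma-separated pages."""
    return "\n".join(
        f"{word}\t{','.join(str(n) for n in index[word])}"
        for word in sorted(index)
    )


if __name__ == "__main__":
    sample_pages = [
        "The gallery opened in Warsaw.",
        "Warsaw paintings fill the gallery halls.",
    ]
    print(format_index(build_word_index(sample_pages)))
```

The tab-separated output is only one possible text format; it was chosen here
because most databases can import it directly.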
--
Piotr Kopszak, Ph.D.
Polish Art Gallery, National Museum in Warsaw
-----------------------------> http://kopszak.mnw.art.pl/
http://www.magnatune.com/artists/altri_stromenti
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
archive : https://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___________________________________________________________________________________
* Re: Post-processing ConTeXt's output for text search
2007-07-04 8:28 Post-processing ConTeXt's output for text search Piotr Kopszak
@ 2007-07-04 10:05 ` Patrick Gundlach
0 siblings, 0 replies; 2+ messages in thread
From: Patrick Gundlach @ 2007-07-04 10:05 UTC (permalink / raw)
To: ntg-context
Hi,
you could split your PDF into separate pages, use a full-text search
engine such as swish-e to index each of the pages, and store the results
together with the page number (taken from the split pages).
IMO this would be the easiest approach.
Patrick
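
[Editor's sketch] The per-page approach suggested above can be illustrated
roughly as follows. This is a sketch under stated assumptions, not the
swish-e workflow itself: a plain in-memory SQLite table stands in for the
search engine, the text of each split page is assumed to be available
already, and the names create_page_store and pages_matching are
hypothetical.

```python
import sqlite3


def create_page_store(page_texts):
    """Store each page's text under its page number (starting at 1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (page_number INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO pages (page_number, body) VALUES (?, ?)",
        enumerate(page_texts, start=1),
    )
    conn.commit()
    return conn


def pages_matching(conn, phrase):
    """Return the page numbers whose text contains `phrase`.

    Uses a substring match via LIKE, which SQLite treats as
    case-insensitive for ASCII letters by default.
    """
    # Escape LIKE wildcards so the phrase is matched literally.
    escaped = (
        phrase.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT page_number FROM pages "
        "WHERE body LIKE ? ESCAPE '\\' ORDER BY page_number",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = create_page_store(
        [
            "The gallery opened in Warsaw.",
            "Paintings fill the halls.",
            "Back in warsaw again.",
        ]
    )
    print(pages_matching(conn, "warsaw"))
```

Only page numbers are returned, so the page text never has to leave the
server. A substring scan like this is slow on large books; a dedicated
full-text engine would be the natural replacement.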