9front - general discussion about 9front
From: "Sigrid Solveig Haflínudóttir" <ftrvxmtrx@gmail.com>
To: 9front@9front.org
Subject: Re: [9front] PDF search bounty
Date: Mon, 31 May 2021 01:33:34 +0200
Message-ID: <FFD3759B654A57BD7EB227EA36833E50@gmail.com>
In-Reply-To: <20CB1473-5752-4552-BE6E-B86988738BB6@cpan.org>

Quoth Romano <unobe@cpan.org>:
> What is included in Sigrid's attempted pdffs? Perhaps that would include search.
> 
> On May 30, 2021 10:59:04 PM UTC, Stanley Lieber <sl@stanleylieber.com> wrote:
> >On May 30, 2021 4:10:56 PM EDT, binary cat <dogedoge61@gmail.com>
> >wrote:
> >>What is the state of the $200 bounty on searching through PDFs?
> >>I thought I might give it a shot.
> >>
> >
> >i'm not aware of anyone having done any work on this.
> >
> >sl

Mostly just object extraction.  Text, images, etc.  Unpacking (gzip,
lzw and so on).  The part that is required for pdf2text is not there,
but is allegedly not too complex to implement.  Page contents usually
are a bunch of drawing operations that also include parts of text
being placed in specific locations (defined by coordinates X and Y) on
the page.  Search was definitely part of the plan.
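
To illustrate what "drawing operations that place text" means, here is
a minimal sketch in Python.  It is not code from pdffs.  It assumes the
content stream has already been unpacked, and it only understands the
standard text operators BT, Td/TD, Tm, Tj and TJ.  The names TOKEN,
unescape, text_runs and search are made up for this example.  Font
encodings, hex strings, text matrix scaling and glyph widths are all
ignored, so each run is reported at the start of its text line.

```python
import re

# Tokens of an already decompressed content stream.  Hex strings,
# nested parentheses and inline images are ignored in this sketch.
TOKEN = re.compile(r"""
    \((?:\\.|[^\\()])*\)        # literal string, e.g. (Hello)
  | [\[\]]                      # array delimiters, used by TJ
  | [+-]?(?:\d+\.?\d*|\.\d+)    # number
  | /[^\s\[\]()/<>]*            # name, e.g. /F1
  | [A-Za-z'"*]+                # operator, e.g. BT, Td, Tj
""", re.VERBOSE | re.DOTALL)


def unescape(body):
    # Only \( \) and \\ are handled; other escapes are left as-is.
    return re.sub(r"\\([()\\])", r"\1", body)


def text_runs(stream):
    """Return a list of (x, y, text) for each Tj/TJ in the stream."""
    runs = []
    operands = []
    x = y = 0.0  # start of the current text line
    for m in TOKEN.finditer(stream):
        tok = m.group(0)
        if tok[0] == "(":
            operands.append(unescape(tok[1:-1]))
        elif tok in ("[", "]") or tok[0] == "/":
            continue  # arrays are kept flat, names are not needed here
        elif tok[0] in "+-.0123456789":
            operands.append(float(tok))
        else:
            # Anything else is an operator that consumes the operands.
            if tok == "BT":
                x = y = 0.0
            elif tok in ("Td", "TD") and len(operands) >= 2:
                x += operands[-2]
                y += operands[-1]
            elif tok == "Tm" and len(operands) >= 2:
                # Assumption: only the translation part (e, f) matters.
                x, y = operands[-2], operands[-1]
            elif tok in ("Tj", "TJ"):
                text = "".join(o for o in operands if isinstance(o, str))
                runs.append((x, y, text))
            operands = []
    return runs


def search(stream, needle):
    """Return the (x, y) positions of the runs containing needle."""
    return [(x, y) for x, y, text in text_runs(stream) if needle in text]
```

For a stream such as "BT /F1 12 Tf 72 720 Td (Hello) Tj ET",
text_runs gives one run for "Hello" at x=72, y=720.  A real search
would also have to join runs that belong to the same line and map
character codes through the font encoding, which this sketch skips.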

Further development has been stalled due to the assumption that it might
get accepted as a GSoC project.  Since that did not happen, I will
continue as I have free time (and will) for this.  Noam might do that
too.

