On 2/3/23 10:01 AM, Bakul Shah wrote:
>
> https://github.com/ocrmypdf/OCRmyPDF
>
> It's a Python script that runs on most any Unix and uses
> tesseract. Its author's motivation seems similar to yours:
>
> I searched the web for a free command line tool to OCR PDF files: I found many, but none of them were really satisfying:
> • Either they produced PDF files with misplaced text under the image (making copy/paste impossible)
> • Or they did not handle accents and multilingual characters
> • Or they changed the resolution of the embedded images
> • Or they generated ridiculously large PDF files
> • Or they crashed when trying to OCR
> • Or they did not produce valid PDF files
> • On top of that none of them produced PDF/A files (format dedicated for long time storage)
> ...so I decided to develop my own tool.

Nice. Off to check out OCRmyPDF!

> I rarely print PDFs any more.

I can't seem to get away from having to highlight and mark up the stuff I read. I love that PDFs are searchable by word, but they don't work for me when it comes to quickly locating a section, or just browsing and studying. I can flip pages much faster with paper than with an ebook, it seems :).

-will