From: binary cat
Date: Mon, 31 May 2021 14:12:19 -0400
To: 9front@9front.org
Subject: Re: [9front] PDF search bounty

For extracting text from PDFs, I found `/sys/lib/ghostscript/ps2ascii.gs`
useful. Currently I just have an awk+rc script wrapped around it. It
already does a decent job, using `plumb` to bring up the correct page.

I modified `page` to accept x/y coordinate arguments, but it doesn't use
the same scaling/origin as `ps2ascii.gs`, so that's giving me some trouble.
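
To make the scaling/origin mismatch concrete, here is a rough sketch of the kind of conversion that might be needed. It rests on assumptions that are not confirmed above: that `ps2ascii.gs` reports positions in PostScript user space (origin at the bottom-left, 72 units per inch, possibly multiplied by some fixed factor), and that the modified `page` wants pixel coordinates with the origin at the top-left at some rendering resolution. The function name `ps_to_pixel` and the parameters `page_height_pt`, `dpi`, and `units_per_pt` are hypothetical, chosen only for illustration.

```python
def ps_to_pixel(x, y, page_height_pt, dpi=100.0, units_per_pt=1.0):
    """Map a bottom-left-origin PostScript coordinate to a top-left-origin pixel.

    Assumptions (not taken from ps2ascii.gs or page themselves):
      - x, y are in units of 1/units_per_pt points, origin at bottom-left
      - the viewer renders at `dpi` pixels per inch, origin at top-left
    """
    scale = dpi / 72.0                        # pixels per PostScript point
    x_pt = x / units_per_pt                   # normalise to points
    y_pt = y / units_per_pt
    px = x_pt * scale                         # x axis keeps its direction
    py = (page_height_pt - y_pt) * scale      # y axis is flipped
    return round(px), round(py)


# US Letter is 612 x 792 pt; a point one inch in from the top-left corner
print(ps_to_pixel(72, 720, page_height_pt=792, dpi=100))  # (100, 100)
```

If the extractor turns out to report tenths of a point, setting `units_per_pt=10` would cover that without changing the rest. The actual page height and resolution would have to come from whatever `page` uses when it renders.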