From: Sanjoy Mahajan <sanjoy@MIT.EDU>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: comparing pdfs
Date: Sun, 25 Oct 2009 09:21:59 -0400
Message-ID: <E1N232l-0008Ge-JN@approx.mit.edu>
In-Reply-To: Your message of "Wed, 21 Oct 2009 16:13:19 +0200." <fe8d59da0910210713h7e665822s640a95c3fd1e0fa3@mail.gmail.com>
> Is there a way more quick and clean than for cycle and ppm files?
The following shell script is not quick or clean, but it is thorough and
I use it to find changes from one version of a pdf file to the next.
For example, for my textbook page proofs, after I fix a bad line break,
I compare the new and most recent previous versions to check that the
fix has not created subsequent bad page breaks.
The script runs on GNU/Linux and requires pdftoppm (from xpdf or
poppler) and ImageMagick (for the 'compare' utility).
It generates the comparison bitmaps in a /tmp directory. The output
looks like:
/tmp/tmp.IWHtn0z7hk/diff-1.ppm 4250.32 (0.0648558)
/tmp/tmp.IWHtn0z7hk/diff-2.ppm 3429.2 (0.0523262)
/tmp/tmp.IWHtn0z7hk/diff-3.ppm 2890.33 (0.0441036)
/tmp/tmp.IWHtn0z7hk/diff-4.ppm 1455.9 (0.0222157)
where column 1 is the filename, which tells you which page is being
compared, column 2 is a measure of the difference between the two files
on that page (the mean absolute error from 'compare -metric mae'), and
column 3 is column 2 normalized to the range 0 to 1.
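As a rough illustration of how the columns can be used, the sketch below keeps only the pages whose normalized difference (the value in parentheses) exceeds a threshold. The function name changed_pages and the default threshold of 0.05 are assumptions made for this example; they are not part of the script.

```shell
#!/bin/bash
# Hypothetical helper: read compare-pdfs output on stdin and print the
# diff files whose normalized difference is greater than a threshold.
changed_pages() {
    local threshold=${1:-0.05}   # assumed default; tune per project
    awk -v t="$threshold" '
        NF == 3 {                         # skip "N: missing from ..." lines
            norm = $3
            gsub(/[()]/, "", norm)        # "(0.0648558)" -> "0.0648558"
            if (norm + 0 > t + 0) print $1, norm
        }'
}

# Sample lines copied from the output shown above
sample='/tmp/tmp.IWHtn0z7hk/diff-1.ppm 4250.32 (0.0648558)
/tmp/tmp.IWHtn0z7hk/diff-2.ppm 3429.2 (0.0523262)
/tmp/tmp.IWHtn0z7hk/diff-3.ppm 2890.33 (0.0441036)
/tmp/tmp.IWHtn0z7hk/diff-4.ppm 1455.9 (0.0222157)'

printf '%s\n' "$sample" | changed_pages 0.05
```

With the sample above, only the lines for pages 1 and 2 pass the 0.05 threshold.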
To view the pages in order from most to least changed (this also needs
the feh image viewer):
compare-pdfs a.pdf b.pdf | sort -nr -k2 | awk '{print $1}' | xargs feh -FV
I put this command in the Makefile for a project.
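The pipeline above assumes that every output line has a numeric second column. When the two files have different page counts, the script also prints lines such as "3: missing from a.pdf", and those would reach the viewer as bogus filenames. A minimal sketch of a more defensive ranking step follows; rank_pages is a hypothetical name, the /tmp/x paths are invented sample data, and the "second field starts with a digit" test is an assumption about what the metric looks like.

```shell
#!/bin/bash
# Hypothetical helper: read compare-pdfs output on stdin and print the
# diff filenames, most-changed page first, ignoring non-metric lines.
rank_pages() {
    awk '$2 ~ /^[0-9]/' |        # keep only lines with a numeric column 2
        sort -k2,2gr |           # general-numeric sort, largest first
        awk '{print $1}'         # keep just the filename
}

# Invented sample, including one "missing" line
sample='/tmp/x/diff-1.ppm 1455.9 (0.0222157)
3: missing from a.pdf
/tmp/x/diff-2.ppm 4250.32 (0.0648558)'

printf '%s\n' "$sample" | rank_pages
```

In real use the function would sit between the script and the viewer, for example: compare-pdfs a.pdf b.pdf | rank_pages | xargs feh -FV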
-Sanjoy
#! /bin/bash
# Usage: $0 file1.pdf file2.pdf
# compares file1.pdf and file2.pdf by converting each page to bitmaps using
# pdftoppm and then using the 'compare' ImageMagick utility
#
# Copyright 2007-2009 Sanjoy Mahajan. Licensed under the GNU GPL version 2
# or (at your option) any later version.
#
# HISTORY
# 2009-09-30: Fix capture of dB output; don't use a viewer; use pdftoppm
# 2007-01-15: First version
#

dpi=144    # rendering resolution passed to pdftoppm

if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: $0 file1.pdf file2.pdf"
    exit 3
fi

# generate the many page images in a temporary directory
# (both conversions run in parallel; 'wait' blocks until both finish)
d=$(mktemp -d)
pdftoppm -r "$dpi" "$1" "$d/one" &
pdftoppm -r "$dpi" "$2" "$d/two" &
wait

# find the union of the page numbers (in case one pdf has more pages)
pages=$(ls "$d"/{one,two}-*.ppm | sed 's/.*-\([0-9][0-9]*\)\.ppm/\1/' | sort -un)

# compare each page
for p in $pages ; do
    if ! [ -e "$d/one-$p.ppm" ] ; then
        echo "$p: missing from $1"
        continue
    fi
    if ! [ -e "$d/two-$p.ppm" ] ; then
        echo "$p: missing from $2"
        continue
    fi
    # print the diff filename, then let 'compare' append the metric
    # (it writes the metric to stderr, hence the 2>&1)
    printf '%s ' "$d/diff-$p.ppm"
    compare -metric mae "$d"/{one,two}-"$p".ppm "$d/diff-$p.ppm" 2>&1
done
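The page-number step in the script can be tried in isolation. The sketch below applies the same sed and sort -un combination to a list of invented filenames (the /tmp/x paths and the name page_union are assumptions for illustration) and prints the union of page numbers. It is worth checking how your pdftoppm names its output: if it zero-pads page numbers to different widths for the two files, sort -un would still merge "1" and "01" into one page while the filenames themselves would no longer match.

```shell
#!/bin/bash
# Hypothetical helper: read .ppm filenames on stdin and print the sorted,
# de-duplicated page numbers, as the script does for its 'pages' variable.
page_union() {
    sed 's/.*-\([0-9][0-9]*\)\.ppm/\1/' | sort -un
}

# Invented sample: the second file has one more page than the first
names='/tmp/x/one-1.ppm
/tmp/x/one-2.ppm
/tmp/x/two-1.ppm
/tmp/x/two-2.ppm
/tmp/x/two-3.ppm'

printf '%s\n' "$names" | page_union
```

For this sample the result is the three page numbers 1, 2 and 3, one per line; page 3 would then be reported as missing from the first file.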