ntg-context - mailing list for ConTeXt users
From: Sanjoy Mahajan <sanjoy@MIT.EDU>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: comparing pdfs
Date: Sun, 25 Oct 2009 09:21:59 -0400
Message-ID: <E1N232l-0008Ge-JN@approx.mit.edu>
In-Reply-To: Your message of "Wed, 21 Oct 2009 16:13:19 +0200." <fe8d59da0910210713h7e665822s640a95c3fd1e0fa3@mail.gmail.com>

> Is there a way more quick and clean than  for cycle and ppm files?

The following shell script is not quick or clean, but it is thorough and
I use it to find changes from one version of a pdf file to the next.
For example, for my textbook page proofs, after I fix a bad line break,
I compare the new and most recent previous versions to check that the
fix has not created subsequent bad page breaks.

The script runs on GNU/Linux and requires pdftoppm (from xpdf or
poppler) and ImageMagick (for the 'compare' utility).
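
A quick way to confirm that both tools are installed before running the
script is sketched below. The helper name missing_tools is hypothetical
and not part of the script; it only checks that the named commands are
on PATH, not which package provides them.

```shell
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Report anything the comparison script needs but cannot find.
missing=$(missing_tools pdftoppm compare)
if [ -n "$missing" ]; then
  echo "missing tools:" $missing >&2
fi
```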

It generates the comparison bitmaps in a /tmp directory.  The output
looks like:

/tmp/tmp.IWHtn0z7hk/diff-1.ppm   4250.32 (0.0648558)
/tmp/tmp.IWHtn0z7hk/diff-2.ppm   3429.2 (0.0523262)
/tmp/tmp.IWHtn0z7hk/diff-3.ppm   2890.33 (0.0441036)
/tmp/tmp.IWHtn0z7hk/diff-4.ppm   1455.9 (0.0222157)

where column 1 is the filename, which tells you which page is being
compared, column 2 is a measure of the difference between the two files
on that page (the mean absolute error reported by 'compare -metric mae'),
and column 3 is a normalized version of column 2.
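
Because column 3 is normalized, it is convenient for filtering. The
following is a minimal sketch, assuming the three-column output format
shown above; the function name changed_pages and the 0.05 threshold are
made up for illustration and would need tuning for real documents.

```shell
# Hypothetical helper: read compare-pdfs output on stdin and print only
# the lines whose normalized difference (column 3) exceeds the threshold
# given as the first argument.
changed_pages() {
  awk -v t="$1" '{
    v = $3
    gsub(/[()]/, "", v)        # "(0.0648558)" -> "0.0648558"
    if (v + 0 > t + 0) print
  }'
}

# Usage (a.pdf and b.pdf are placeholders):
#   compare-pdfs a.pdf b.pdf | changed_pages 0.05
```

Lines that do not carry a number in column 3, such as the "missing from"
messages, evaluate to 0 and are dropped by this filter.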

To view the pages in order of most to least changed:

compare-pdfs a.pdf b.pdf | sort -nr -k2 | awk '{print $1}' | xargs feh -FV

I put this command in the Makefile for a project.

-Sanjoy

#! /bin/bash

# Usage: $0 file1.pdf file2.pdf
#   compares file1.pdf and file2.pdf by converting each page to bitmaps using
#   pdftoppm and then using the 'compare' ImageMagick utility
#
# Copyright 2007-2009 Sanjoy Mahajan.  Licensed under the GNU GPL version 2
# or (at your option) any later version.
#
# HISTORY
#   2009-09-30: Fix capture of dB output; don't use a viewer; use pdftoppm
#   2007-01-15: First version
#

dpi=144

if [ -z "$1" ] || [ -z "$2" ]; then
  echo "Usage: $0 file1.pdf file2.pdf" >&2
  exit 3
fi

# generate the many page images in a temporary directory
d=$(mktemp -d)
# quote the arguments so that filenames containing spaces work
pdftoppm -r "$dpi" "$1" "$d/one" &
pdftoppm -r "$dpi" "$2" "$d/two" &
wait

# find the union of the page numbers (in case one pdf has more pages)
pages=$(ls "$d"/{one,two}-*.ppm | sed 's/.*-\([0-9][0-9]*\)\.ppm$/\1/' | sort -un)

# compare each page
for p in $pages ; do
  # report missing pages on stderr so they stay out of the results on stdout
  if ! [ -e "$d/one-$p.ppm" ] ; then
    echo "$p: missing from $1" >&2
    continue
  fi
  if ! [ -e "$d/two-$p.ppm" ] ; then
    echo "$p: missing from $2" >&2
    continue
  fi
  printf '%s   ' "$d/diff-$p.ppm"
  # compare writes the metric to stderr, so redirect it to stdout
  compare -metric mae "$d/one-$p.ppm" "$d/two-$p.ppm" "$d/diff-$p.ppm" 2>&1
done
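
The page-number step in the script above can be tried in isolation. This
is a minimal sketch with made-up filenames; the function name page_union
is hypothetical, and it assumes the page images are named like
one-N.ppm and two-N.ppm, as in the script.

```shell
# Hypothetical helper: read page-image filenames on stdin and print the
# sorted union of their page numbers, one per line.
page_union() {
  sed 's/.*-\([0-9][0-9]*\)\.ppm$/\1/' | sort -un
}

# Example: the second file has one more page than the first.
printf '%s\n' /tmp/d/one-1.ppm /tmp/d/one-2.ppm \
              /tmp/d/two-1.ppm /tmp/d/two-2.ppm /tmp/d/two-3.ppm | page_union
```

A page number that appears for only one of the two files still ends up
in the list, which is what lets the loop report it as missing from the
other file.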

