From: Sanjoy Mahajan
Newsgroups: gmane.comp.tex.context
Subject: Re: comparing pdfs
Date: Sun, 25 Oct 2009 09:21:59 -0400
To: mailing list for ConTeXt users

> Is there a way more quick and clean than for cycle and ppm files?

The following shell script is not quick or clean, but it is thorough and
I use it to find changes from one version of a pdf file to the next.
For example, for my textbook page proofs, after I fix a bad line break,
I compare the new and most recent previous versions to check that the
fix has not created subsequent bad page breaks.

The script runs on GNU/Linux and requires pdftoppm (from xpdf or
poppler) and ImageMagick (for the 'compare' utility).  It generates the
comparison bitmaps in a temporary directory under /tmp.
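A minimal sketch for checking those two prerequisites before running the
script, assuming a POSIX shell.  The helper name missing_tools is
hypothetical and is not part of the original script.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper, not part of the original script.
# It prints the name of every argument that is not an executable on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The script needs pdftoppm (xpdf or poppler) and compare (ImageMagick).
missing=$(missing_tools pdftoppm compare)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Please install:" $missing >&2
fi
```

If both tools are installed, the check prints nothing.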
The output looks like:

  /tmp/tmp.IWHtn0z7hk/diff-1.ppm 4250.32 (0.0648558)
  /tmp/tmp.IWHtn0z7hk/diff-2.ppm 3429.2 (0.0523262)
  /tmp/tmp.IWHtn0z7hk/diff-3.ppm 2890.33 (0.0441036)
  /tmp/tmp.IWHtn0z7hk/diff-4.ppm 1455.9 (0.0222157)

where column 1 is the filename, which tells you which page is being
compared, column 2 is a measure of the difference between the two files
on that page (the mean absolute error, from 'compare -metric mae'), and
column 3 is a normalized measure of column 2.

To view the pages in order of most to least changed:

  compare-pdfs a.pdf b.pdf | sort -nr -k2 | awk '{print $1}' | xargs feh -FV

I put this command in the Makefile for a project.

-Sanjoy

#! /bin/bash
# Usage: $0 file1.pdf file2.pdf
# compares file1.pdf and file2.pdf by converting each page to bitmaps using
# pdftoppm and then using the 'compare' ImageMagick utility
#
# Copyright 2007-2009 Sanjoy Mahajan.  Licensed under the GNU GPL version 2
# or (at your option) any later version.
#
# HISTORY
# 2009-09-30: Fix capture of dB output; don't use a viewer; use pdftoppm
# 2007-01-15: First version
#

dpi=144

if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: $0 file1.pdf file2.pdf"
    exit 3
fi

# generate the many page images in a temporary directory
# (the two conversions run in parallel)
d=$(mktemp -d)
pdftoppm -r "$dpi" "$1" "$d/one" &
pdftoppm -r "$dpi" "$2" "$d/two" &
wait

# find the union of the page numbers (in case one pdf has more pages)
pages=$(ls "$d"/{one,two}-*.ppm | sed 's/.*-\([0-9][0-9]*\)\.ppm/\1/' | sort -un)

# compare each page
for p in $pages ; do
    if ! [ -e "$d/one-$p.ppm" ] ; then
        echo "$p: missing from $1"
        continue
    fi
    if ! [ -e "$d/two-$p.ppm" ] ; then
        echo "$p: missing from $2"
        continue
    fi
    # print the diff filename, then the metric on the same line;
    # 'compare' writes the metric to stderr, hence the 2>&1
    printf '%s ' "$d/diff-$p.ppm"
    compare -metric mae "$d"/{one,two}-"$p".ppm "$d/diff-$p.ppm" 2>&1
done
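The sorting pipeline can be tried without any PDFs by feeding it the
sample report lines shown in the message.  The sketch below is a variant
under stated assumptions, not the author's command: it assumes the
report format is exactly as in the sample, and it adds a hypothetical
helper, changed_pages, that skips "missing from" lines and lists only
the pages whose third column (the value in parentheses) exceeds a
threshold, most changed first.  The "7: missing from a.pdf" line is made
up for illustration.

```shell
#!/bin/sh
# Sample report: the first four lines are copied from the message; the
# "missing from" line is invented to show that such lines are skipped.
report='/tmp/tmp.IWHtn0z7hk/diff-1.ppm 4250.32 (0.0648558)
/tmp/tmp.IWHtn0z7hk/diff-2.ppm 3429.2 (0.0523262)
/tmp/tmp.IWHtn0z7hk/diff-3.ppm 2890.33 (0.0441036)
/tmp/tmp.IWHtn0z7hk/diff-4.ppm 1455.9 (0.0222157)
7: missing from a.pdf'

# changed_pages THRESHOLD  (hypothetical helper, not in the original script)
# Reads report lines on stdin, strips the parentheses around column 3,
# keeps lines that name a diff image and whose column 3 exceeds THRESHOLD,
# and prints the filenames sorted from most to least changed (column 2).
changed_pages() {
    tr -d '()' |
        awk -v t="$1" '$1 ~ /diff-[0-9]+\.ppm$/ && $3 > t { print $2, $1 }' |
        LC_ALL=C sort -nr |
        cut -d' ' -f2
}

# Prints the diff-1 and diff-2 paths, in that order.
printf '%s\n' "$report" | changed_pages 0.05
```

In real use the report would come from the script itself, for example
compare-pdfs a.pdf b.pdf | changed_pages 0.05 | xargs feh -FV, with the
threshold chosen by trial on one's own documents.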