Re: Charts, Graphs, Tufte, and ConTeXt

From: Karl Ove Hufthammer <karloh@mi.uib.no>
Subject: Re: Charts, Graphs, Tufte, and ConTeXt
Date: Thu, 27 Jul 2006 16:14:23 +0200
Message-ID: <eaaho0$4am$1@sea.gmane.org>
In-Reply-To: <554cfd4f0607261402g3fa7767cp3be2ac11001199d3@mail.gmail.com>

Nicolas Grilly wrote:

> Karl Ove Hufthammer <karloh@mi.uib.no> wrote:
>> Yes! R (especially using the new grid and lattice framework) produces
>> excellent charts and graphs, with very sensible default options (much
>> of it based on Cleveland's research).
> 
> What is Cleveland's research? Can you provide references on the web?

Cleveland has done much research on graphical perception and the visual
decoding of information from data displays. He was one of the first to
study this scientifically.

Earlier, many people had opinions on various common graphs (e.g., ‘pie
charts are bad – I don’t like them’). Cleveland came along and did actual
scientific *experiments* to show why some types of graphs are worse than
others for presenting data (e.g., ‘humans are very bad at judging angles
and very good at judging position along a common scale; that’s why pie
charts are terrible and dot plots good at presenting (the same) data’),
and he proposed new graphical displays *based* on this research.
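
To make the ‘position along a common scale’ idea concrete, here is a
minimal Python sketch that prints a plain-text dot plot. It is only an
illustration, not Cleveland's or R's implementation: the function name
text_dot_plot is hypothetical, and the values are made up. Five shares
that would be hard to rank as pie slices are easy to rank when each is
a position on the same horizontal scale.

```python
def text_dot_plot(data, width=40):
    """Return the rows of a plain-text dot plot.

    Each row holds a label, a dotted guide line and an 'o' marking the
    value's position on a shared scale from 0 to the largest value.
    Assumes all values are non-negative.
    """
    top = max(data.values())
    label_width = max(len(label) for label in data)
    rows = []
    # Sort descending so the ranking can be read straight off the rows.
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        pos = round(value / top * width) if top > 0 else 0
        rows.append(f"{label:<{label_width}} |{'.' * pos}o")
    return rows


# Made-up shares, purely for illustration.
shares = {"A": 23, "B": 21, "C": 20, "D": 19, "E": 17}
for row in text_dot_plot(shares):
    print(row)
```

Differences of one or two units show up as visibly different marker
positions, which is the comparison the quoted finding says readers are
good at.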

See for example this very interesting and easy to read article:

Title:               Graphical Perception: Theory, Experimentation, and
                     Application to the Development of Graphical Methods
Author(s):           William S. Cleveland; Robert McGill
Source:              Journal of the American Statistical Association, Vol. 79,
                     No. 387. (Sep., 1984), pp. 531-554.
Stable URL:         
http://links.jstor.org/sici?sici=0162-1459%28198409%2979%3A387%3C531%3AGPTEAA%3E2.0.CO%3B2-Y

Some of Cleveland’s research resulted in novel graphical displays, such as
trellis displays, coplots and dot plots, and much of it resulted in
improvements to common displays. Unfortunately, many of these smaller
improvements and very minor but important details seem to be unknown to
people who design graphing software. Let me mention a few (not too
exciting) examples:

Circles should be used instead of rectangles as plotting symbols, especially
when data points overlap, because overlapping rectangles still look like
rectangles, while overlapping circles look nothing like circles, so the
overlap stays visible. Cleveland actually recommended a list of plotting
symbols (for plotting several groups in one plot) for use in scatterplots;
see:

Title:               A Model for Studying Display Methods of Statistical
                     Graphics
Author(s):           William S. Cleveland
Source:              Journal of Computational and Graphical Statistics, Vol. 2,
                     No. 4. (Dec., 1993), pp. 323-343.
Stable URL:         
http://links.jstor.org/sici?sici=1061-8600%28199312%292%3A4%3C323%3AAMFSDM%3E2.0.CO%3B2-Y

Tick marks should point outwards, not inwards (so they don’t camouflage
data).

The data rectangle should always be slightly smaller than the scale-line
rectangle (the box around the data), again to avoid camouflaging the data.

These are just a few (perhaps less interesting) features of graph design
that R does correctly, but many other programs (e.g., gnuplot, at least for
tick marks and data rectangles) don’t (by default).
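
The data-rectangle rule can be sketched in a few lines of Python. This
is an illustration under stated assumptions, not R's actual code: the
function name padded_limits is hypothetical, and the handling of a
zero-width range is my own choice. The 4 percent default mirrors what
R's documentation describes for its regular axis style (xaxs = "r"),
which extends the data range by 4 percent at each end before choosing
tick labels; the label-choosing step is left out here.

```python
def padded_limits(values, fraction=0.04):
    """Return (low, high) axis limits that extend the data range by
    `fraction` of its width at each end, so that no point sits on the
    frame around the plot."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: fall back to a pad relative to the value
        # itself (or to `fraction` if the value is 0). This fallback is
        # an assumption of the sketch, not taken from R.
        pad = abs(lo) * fraction or fraction
    else:
        pad = span * fraction
    return lo - pad, hi + pad


low, high = padded_limits([0.0, 2.5, 10.0])
print(low, high)
```

With these limits the smallest and largest observations fall strictly
inside the scale-line rectangle instead of being drawn on top of it.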

Much of Cleveland’s research has been summarised in his excellent book

W.S. Cleveland. The Elements of Graphing Data. Revised edition. 1994.

See also his other book

W.S. Cleveland. Visualizing Data. 1993.

Other articles of his that may be of interest:

Title:               Graphical Perception and Graphical Methods for Analyzing
                     Scientific Data
Author(s):           William S. Cleveland; Robert McGill
Source:              Science, New Series, Vol. 229, No. 4716. (Aug. 30, 1985),
                     pp. 828-833.
Stable URL:         
http://links.jstor.org/sici?sici=0036-8075%2819850830%293%3A229%3A4716%3C828%3AGPAGMF%3E2.0.CO%3B2-D
Abstract:            Graphical perception is the visual decoding of the
                     quantitative and qualitative information encoded on
                     graphs. Recent investigations have uncovered basic
                     principles of human graphical perception that have
                     important implications for the display of data. The
                     computer graphics revolution has stimulated the invention
                     of many graphical methods for analyzing and presenting
                     scientific data, such as box plots, two-tiered error bars,
                     scatterplot smoothing, dot charts, and graphing on a log
                     base 2 scale.

Title:               Graphical Perception: The Visual Decoding of Quantitative
                     Information on Graphical Displays of Data
Author(s):           William S. Cleveland; Robert McGill
Source:              Journal of the Royal Statistical Society. Series A
                     (General), Vol. 150, No. 3. (1987), pp. 192-229.
Stable URL:         
http://links.jstor.org/sici?sici=0035-9238%281987%29150%3A3%3C192%3AGPTVDO%3E2.0.CO%3B2-T
Abstract:            Studies in graphical perception, both theoretical and
                     experimental, provide a scientific foundation for the
                     construction area of statistical graphics. From these
                     studies a paradigm that has important applications for
                     practice has begun to emerge. The paradigm is based on
                     elementary codes: Basic geometric and textural aspects of
                     a graph that encode the quantitative information. The
                     methodology that can be invoked to study graphical
                     perception is illustrated by an investigation of the shape
                     parameter of a two-variable graph, a topic that has had
                     much discussion, but little scientific study, for at least
                     70 years.

Title:               The Many Faces of a Scatterplot
Author(s):           William S. Cleveland; Robert McGill
Source:              Journal of the American Statistical Association, Vol. 79,
                     No. 388. (Dec., 1984), pp. 807-822.
Stable URL:         
http://links.jstor.org/sici?sici=0162-1459%28198412%2979%3A388%3C807%3ATMFOAS%3E2.0.CO%3B2-G
Abstract:            The scatterplot is one of our most powerful tools for data
                     analysis. Still, we can add graphical information to
                     scatterplots to make them considerably more powerful.
                     These graphical additions, faces of sorts, can enhance
                     capabilities that scatterplots already have or can add
                     whole new capabilities that faceless scatterplots do not
                     have at all. The additions we discuss here, some new and
                     some old, are (a) sunflowers, (b) category codes, (c) point
                     cloud sizings, (d) smoothings for the dependence of $y$ on
                     $x$ (middle smoothings, spread smoothings, and upper and
                     lower smoothings), and (e) smoothings for the bivariate
                     distribution of $x$ and $y$ (pairs of middle smoothings,
                     sum-difference smoothings, scale-ratio smoothings, and
                     polar smoothings). The development of these additions is
                     based in part on a number of graphical principles that can
                     be applied to the development of statistical graphics in
                     general.

-- 
Karl Ove Hufthammer
