Area Judgements: Areal Problem

Tim Brock / Monday, April 6, 2015

Research suggests we aren't very good at judging areas; we're much better at judging lengths and positions. I've touched on this before when discussing whether to choose a pie chart or a bar chart, but area judgements are also required in other popular visualization formats. If we want to communicate data effectively through visualizations, we should favour encodings that require length or position judgements. Hence we should reject the proportional circles on the left in favour of the bars on the right (or something similar, like a dot plot).

If that were all there was to it, this would be a very short article. But there are other reasons one might choose to use area. In his book Information Visualization, Colin Ware suggests using area in place of length when there is a large difference between the largest and smallest values that require encoding. (For example, most of the spices above retail in the UK at a few pence per gram. Saffron, on the other hand, costs ~£5-15 per gram!) Because area grows with the square of a shape's linear size, using a second dimension means smaller values can still be seen and magnitudes distinguished without the need for a very large screen or piece of paper. Large magnitude differences are often displayed using logarithmic scales or transformations instead. These, however, may not be appropriate for a general audience.
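To put rough numbers on Ware's point, here is a minimal Python sketch comparing how much room each encoding needs. The prices are round figures assumed purely for illustration (5 pence and 1,000 pence per gram, in line with the ranges quoted above), and the function names and the 2-pixel minimum mark size are likewise my own assumptions.

```python
import math

def bar_length(value, smallest_value, smallest_length):
    """Length of a bar when LENGTH is proportional to value."""
    return smallest_length * value / smallest_value

def circle_diameter(value, smallest_value, smallest_diameter):
    """Diameter of a circle when AREA is proportional to value."""
    return smallest_diameter * math.sqrt(value / smallest_value)

# Assumed, illustrative prices in pence per gram (not measured data).
cheap_spice = 5
saffron = 1000

# Suppose the cheapest spice is drawn 2 pixels long (bar) or wide (circle).
longest_bar = bar_length(saffron, cheap_spice, smallest_length=2.0)
widest_circle = circle_diameter(saffron, cheap_spice, smallest_diameter=2.0)

print(f"bar: {longest_bar:.0f} px, circle: {widest_circle:.1f} px")
```

With these assumed values the saffron bar would need 400 pixels, while the saffron circle would be only about 28 pixels across: a 200-fold difference in value becomes a roughly 14-fold difference in diameter.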

A different (probably more common) use of area encoding is in data-map making. In this case position is already being put to use to show geographic features and locations. In the example below, the center of each circle marks the location of the stadium of a Barclays Premier League soccer team while the area of each circle is proportional to the capacity of the stadium (as given in the official Premier League Handbook).
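If you are drawing proportional circles yourself, the detail to get right is that the area, not the radius, should scale with the value. The sketch below shows one way to do this in Python. The stadium names and capacities are hypothetical placeholders, not figures from the Premier League Handbook, and radius_for and max_radius are names I've made up for the example.

```python
import math

def radius_for(value, max_value, max_radius):
    """Radius of a circle whose AREA is proportional to value.

    The largest value is drawn with radius max_radius.
    """
    return max_radius * math.sqrt(value / max_value)

def area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2

# Hypothetical capacities, for illustration only.
capacities = {"Stadium A": 60_000, "Stadium B": 30_000, "Stadium C": 15_000}
largest = max(capacities.values())

radii = {
    name: radius_for(capacity, largest, max_radius=20.0)
    for name, capacity in capacities.items()
}

for name, radius in radii.items():
    share = area(radius) / area(radii["Stadium A"])
    print(f"{name}: radius {radius:.1f}, area relative to A {share:.2f}")
```

Scaling the radius linearly instead would make a stadium with half the capacity appear with only a quarter of the area, exaggerating the difference.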

As an alternative to "patch maps" (now frequently termed choropleth maps), Cleveland and McGill suggest the use of "framed rectangles" for encoding values on maps. These are individual bars encased in their own outer rectangle that provides a fixed reference for (theoretically) easier comparison. The perceptual task in decoding the data is then length measurement or relative position along non-aligned scales. In principle this could also work as an alternative to the proportional circles used above. The map below, using the same data as the map above, illustrates this option.
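The geometry of a framed rectangle is simple enough to sketch in a few lines of Python. This is my own minimal reading of the idea rather than Cleveland and McGill's specification: every frame has the same fixed size, and only the height of the filled bar inside it varies with the value. The function name, the default dimensions and the example capacities are all assumptions.

```python
def framed_rectangle(value, max_value, frame_height=30.0, frame_width=8.0):
    """Return (width, height) pairs for the fixed frame and the filled bar.

    The frame is the same size for every data point; the fill height is
    proportional to value, reaching the top of the frame at max_value.
    """
    if not 0 <= value <= max_value:
        raise ValueError("value must lie between 0 and max_value")
    fill_height = frame_height * value / max_value
    return {
        "frame": (frame_width, frame_height),
        "fill": (frame_width, fill_height),
    }

# Hypothetical capacities, for illustration only.
half_full = framed_rectangle(30_000, max_value=60_000)
print(half_full)
```

Because every frame is identical, the reader can judge each bar against its own frame (half full, nearly full and so on) even when the bars are not aligned on a common baseline.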

How do the two methods compare? Firstly, I think it's intuitive that the center of a circle represents a location. In the framed-rectangle version it is the center of the base of the frame that marks the location; this is probably less obvious. A much bigger issue for the framed-rectangle map is what happens when points overlap. This is most problematic in London, where six clubs are based. With circles we can distinguish between the grounds despite the overlapping; we can, for example, see that one stadium (Arsenal's) is noticeably bigger than the other five. In the framed-rectangle map not even the number of stadiums in London is clear. This alone makes it a poor choice in this particular instance. Where data points are more dispersed and/or more evenly distributed, framed rectangles may not suffer from this problem.

Of course, if your main interest is to communicate stadium capacities and not locations, then a simple bar chart or dot plot would (again) be appropriate.

This last point can be generalised: don't automatically assume a data map is the best choice of visualization for your data just because it contains geographical aspects. Hans Rosling's famous TED talk of February 2006 illustrates this wonderfully. While his talk centers on political geography, his main visualization tool is not a map but a bubble plot (aka a bubble chart). He still uses area (and indeed color) to encode data, but by eschewing longitude and latitude he's able to use horizontal and vertical position to show more information about his topic of concern - global development. (Gapminder World allows you to explore the data he discusses interactively, at your leisure.)

So bubble plots represent another type of visualization where area encoding helps to convey a message. This doesn't change the fact that we're better at judging position than we are at judging area, so if you go down the bubble plot route, use the horizontal and vertical scales for the most important variables in the dataset. Which variables are most important may, of course, depend on what information is critical to convey at any given time, and thus may differ between publications, presentations or even separate slides in the same presentation.