… damn lies and statistics

The saying, “there are three kinds of lies: lies, damned lies, and statistics”, is attributed to Mark Twain but may have originated from Benjamin Disraeli. It’s one of those phrases that has been misquoted more times than I can remember, because in truth statistics is a branch of mathematics essential to understanding the world. Statistics is like a scalpel; in the hands of a skilled surgeon it can save lives, but in the hands of a serial killer it is an instrument of death. And the worst serial killers of statistics are marketing executives and politicians.

Probably the most common statistical parameter used by the news media is the average. The average size of family, or the average income; it has become so familiar it hardly gets a second thought. An average is calculated by adding up all the numbers in a data set and then dividing by the number of data points. An average can be useful as it reduces a lot of data to one easily understood number, but it can also mislead for precisely the same reason. The problem with the average is that it says nothing about the way the data are distributed. An average is most useful when the data follow what is known as a Gaussian, or normal, distribution, whereby they are spread symmetrically either side of the average and the spread of numbers is bell-shaped (see Figure-1).

The results of random processes, such as the height of individuals in a country or the weight of grains of sand on the beach, tend to form a classical Gaussian distribution. Other data, such as earnings in the UK, are not Gaussian; they are skewed to one side. To take an extreme example to illustrate what I mean, fifty people might have an average salary of £30,000 per year, but that average could equally arise from all fifty earning £30,000 each or from forty-nine having a salary of one pound per year and one person having a salary of £1,499,951.
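For anyone who wants to check the arithmetic, here is a minimal sketch in Python of the two extreme cases above. The variable names are my own, and the median (the mid-way point of the data) is included only to show how differently the two groups behave despite sharing the same average.

```python
from statistics import mean, median

# Case 1: all fifty people earn £30,000 per year.
equal = [30_000] * 50

# Case 2: forty-nine people earn £1 per year and one earns £1,499,951.
skewed = [1] * 49 + [1_499_951]

for label, salaries in (("equal", equal), ("skewed", skewed)):
    # Average = sum of all the numbers divided by the number of data points.
    print(label, "average:", mean(salaries), "median:", median(salaries))
```

Both groups report the same average, yet the mid-way salary in the second group is one pound.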

I want to be careful to focus on the use – or misuse – of statistics rather than make political points. All politicians, in my experience, have selected statistics to suit their purpose at one time or another, but it just happens that a particular example has recently come along. There was a Tweet from UK politician Dominic Raab saying real wages are rising at the fastest rate for 10 years (Figure-2).

The plot shows average weekly earnings versus year. It shows the latest average weekly wage as £495, which is equivalent to £25,740 per year. The first thing that strikes me is that the graph is pretty flat across 2016-2018 and, to my eye, the fastest rate of increase was around 2015. The claim that wages are rising at the fastest rate for 10 years therefore seems to me to be somewhat ambitious based upon the graph in the Tweet.

Nevertheless, putting the basics of graphical interpretation to one side, the use of average is fine if the data have a Gaussian distribution such as in Figure-1 but, as I’ve already said, UK earnings are not Gaussian. Take the 2013/14 earnings statistics for example (Figure-3). In such cases, two other parameters are more meaningful: the median and the mode. (For a Gaussian distribution, the average, median and mode all coincide, but these parameters part company if the spread of data leans one way or another.)

The median shows the point at which the data are centred, that is, the mid-way point: half of the values lie below it and half above. Since those earning lower wages are more numerous than those earning higher wages, and a small number of very high earners pull the average upwards, the average is a higher number than the median. The mode sits at the apex of the data, coinciding with the most frequently occurring number. For the 2013/14 statistics the mode was around £16,500 per year and, given the distribution pattern, it is a lower number than both the median and the average. You can see the mode of £16,500 is around 57% of the mean of £29,172, a very significant difference. In short, the use of average with wages data skewed in this way will give a higher, more favourable number than the median and mode. This is important because the use of average in this case hides the income of most wage earners.
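To illustrate how the three parameters part company, here is a small sketch in Python. The list of salaries is invented purely for illustration and is not taken from Figure-3; only the two figures in the final calculation (£16,500 and £29,172) are the ones quoted above.

```python
from statistics import mean, median, mode

# Hypothetical annual salaries in £ thousands (made up for illustration):
# many lower earners and a few very high earners, so the data lean to one side.
salaries = [12, 14, 16, 16, 16, 18, 20, 22, 25, 30, 40, 60, 120]

print("mode:  ", mode(salaries))            # most frequently occurring value
print("median:", median(salaries))          # mid-way point of the sorted data
print("mean:  ", round(mean(salaries), 1))  # the familiar average

# The quoted 2013/14 figures: mode as a fraction of the mean.
ratio = 16_500 / 29_172
print(f"mode as a percentage of the mean: {ratio:.1%}")
```

With data skewed like this, the mode comes out lowest, the median sits in the middle and the average is the highest of the three.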

Does this really matter? Yes it does. If a democratic society is to have informed opinions then at the very least those who are elected to office should present a complete picture of what’s happening in the country. We hear a lot about fake news – which is, frankly, just stuff made up. The misuse of statistics, however, is much more subtle. It can give an air of authority but also be perniciously misleading. Whether the politicians are spinning the statistics to suit their preconceived beliefs or whether they are just ignorant, I will leave to the reader to decide. In the meantime, beware of political statistics and remember Mark Twain.
