Statistics 101: The Arithmetic Mean (aka Mean or Average)

Statistics! Not Sadistics! The Start of a Series

Peter Flom
In this introductory statistics article, we will explore the mean, formally known as the arithmetic mean and informally as the average, and how it's used and abused. In later articles we will look at other measures of central tendency, such as the median, the mode, and some others.

When can the mean be calculated?

There are various ways to classify variables. One useful way is to distinguish between continuous and categorical data. Data is continuous if it can (at least in theory) take on any number. Data is categorical if it can take on only certain values. For example, weight, income, age and IQ are continuous. Choice of whom to vote for (e.g. McCain or Obama), party, hair color, and marital status are categorical. We will discuss this more in a later article.

When you have continuous data, two things that you often want to know are "What values are likely?" and "How spread out are the values?" Today, we will look at the first question, which, in statistician's language, is called central tendency. The most common measure of central tendency is the mean, more formally the arithmetic mean, and less formally the average. (To see why the mean makes no sense for categorical data - well, what's the average of McCain and Obama? Or of married and single? Perhaps the latter is "engaged"?)

How to calculate the mean

The mean is probably familiar, even if you only know it as the average. Add up the numbers, divide by how many numbers there are, and you've got the mean. So, for example, if the IQs of the people in your family are

155 (that would be you)
135 (your sister)
and
70 (her husband)
then the mean is (155 + 135 + 70) / 3 = 120.

Or, suppose the heights of the students in introductory psychology are (in inches, rounded to the nearest inch)

64 65 64 67 64 67 66 70 66 66

66 64 69 69 62 67 64 59 66 67

65 71 67 68 59 69 67 65 68 66

68 67 75 67 69 70 67 76 67 70

68 67 78 67 73 64 75 65 70 68.

The arithmetic mean of the above is 67.36 inches.
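If you would rather let a computer do the adding, here is a minimal Python sketch of the same recipe, applied to the two examples above. The function name arithmetic_mean is just a label chosen for illustration; Python's standard library already provides statistics.mean, which is used at the end as a cross-check.

```python
from statistics import mean

# IQs from the family example above
iqs = [155, 135, 70]

# Heights (inches) of the 50 introductory psychology students above
heights = [
    64, 65, 64, 67, 64, 67, 66, 70, 66, 66,
    66, 64, 69, 69, 62, 67, 64, 59, 66, 67,
    65, 71, 67, 68, 59, 69, 67, 65, 68, 66,
    68, 67, 75, 67, 69, 70, 67, 76, 67, 70,
    68, 67, 78, 67, 73, 64, 75, 65, 70, 68,
]


def arithmetic_mean(values):
    """Add up the numbers, then divide by how many numbers there are."""
    return sum(values) / len(values)


iq_mean = arithmetic_mean(iqs)
height_mean = arithmetic_mean(heights)

print(iq_mean)      # 120.0
print(height_mean)  # 67.36

# The standard library function should agree with the hand-rolled one.
library_height_mean = mean(heights)
```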

The mean: When not to use it

The mean is a bad choice if the data are skewed, which means that there is a 'tail' to the distribution on one side, but not the other. One common example of this is income. Some people make a whole lot more than the average person, but no one can make that much less. For instance, if the average income in the USA is $30,000 per year (I made that up), then there are some people who make millions more than that, but even the poorest people make at most $30,000 less. When the data are skewed, the median and the trimmed or Winsorized mean are good choices. (You don't see the trimmed mean much, but it can be very useful.) I will cover these in later articles.
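To see how much one long tail can pull the mean, here is a small Python sketch. The incomes are invented purely for illustration (they are not real data), and trimmed_mean is a hypothetical helper written for this example that simply drops the k smallest and k largest values before averaging.

```python
from statistics import mean, median

# Hypothetical yearly incomes in dollars, made up for illustration only:
# most are near $30,000 and one is very large (the long right tail).
incomes = [18_000, 22_000, 25_000, 28_000, 30_000,
           32_000, 35_000, 40_000, 2_000_000]


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and the k largest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this many values")
    return mean(ordered[k:len(ordered) - k])


income_mean = mean(incomes)              # pulled far upward by the one big value
income_median = median(incomes)          # the middle value, unaffected by the tail
income_trimmed = trimmed_mean(incomes, 1)

print(f"mean:         {income_mean:,.0f}")
print(f"median:       {income_median:,.0f}")
print(f"trimmed mean: {income_trimmed:,.0f}")
```

With these made-up numbers the mean lands near $248,000, which describes nobody in the list, while the median and the trimmed mean both stay close to $30,000.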

The mean is also a bad choice if the data are multimodal, which means they have two or more "humps". For instance, if you had data on the heights of basketball players and jockeys, taking the overall mean would not be very informative.
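A quick sketch of the two-hump problem, again in Python. The heights below are invented for illustration (they are not measurements of real jockeys or basketball players); the point is only that the overall mean falls in the gap between the two groups.

```python
from statistics import mean

# Hypothetical heights in inches, made up for illustration only.
jockeys = [61, 62, 62, 63, 64]
basketball_players = [77, 78, 79, 79, 80]

everyone = jockeys + basketball_players

jockey_mean = mean(jockeys)
player_mean = mean(basketball_players)
overall_mean = mean(everyone)

# How far is the closest actual person from the overall mean?
closest_gap = min(abs(h - overall_mean) for h in everyone)

print(jockey_mean, player_mean, overall_mean, closest_gap)
```

Here the overall mean is 70.5 inches, yet no one in either group is within 6 inches of it. Reporting the two group means separately says much more.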

The mean: What can go wrong

People sometimes try to average things that shouldn't be averaged. The most common is to average percentages. This is a bad idea. Here are some data from the last presidential election (I will use just 4 states, to keep it simple; the same thing applies with all 50):

State  Obama  McCain
CA     61%    37%
NY     63%    36%
WY     33%    65%
UT     34%    63%

If one averages the percentages, one would get 48% for Obama, (61 + 63 + 33 + 34)/4, and 50% for McCain, (37 + 36 + 65 + 63)/4, but that isn't right. A percentage is a form of a fraction, and you have to add the numerators and denominators and then form a new percentage, that is, add up the NUMBER voting Dem and Repub. and then get the percentage from the total. Here are the totals voting for each candidate, in millions of people:

State  Obama  McCain
CA      8.2    5.0
NY      4.8    2.8
WY      0.1    0.2
UT      0.3    0.6
Tot    13.4    8.6

In these four states, Obama got 13.4/(13.4 + 8.6) = 13.4/22.0, or about 61%, of the votes cast for the two candidates.
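The two calculations can be put side by side in a short Python sketch using the figures from the tables above. One assumption to flag: the pooled percentage here is Obama's share of the votes cast for these two candidates only, since those are the only totals listed. The variable names are just illustrative labels.

```python
# Obama's percentage in each state, from the first table.
obama_pct = {"CA": 61, "NY": 63, "WY": 33, "UT": 34}

# Votes in millions as (Obama, McCain), from the second table.
votes = {
    "CA": (8.2, 5.0),
    "NY": (4.8, 2.8),
    "WY": (0.1, 0.2),
    "UT": (0.3, 0.6),
}

# The wrong way: average the four percentages, ignoring state size.
naive_pct = sum(obama_pct.values()) / len(obama_pct)

# The right way: add up the votes first, then form one percentage.
obama_total = sum(obama for obama, _ in votes.values())
mccain_total = sum(mccain for _, mccain in votes.values())
pooled_pct = 100 * obama_total / (obama_total + mccain_total)

print(f"average of percentages: {naive_pct:.1f}%")
print(f"percentage of totals:   {pooled_pct:.1f}%")
```

The simple average treats Wyoming and California as equally important, even though California has dozens of times as many voters in this table. That is why the two answers differ by more than ten percentage points.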

As another example, suppose I ask the following:

In September, Joe's average gas mileage was 30 mpg. In October, it was 20 mpg. What was his average gas mileage for September and October? You might think: 20 + 30 = 50, divide by 2 = 25. But that's not the mean, because he might have driven different distances in the two months. If he drove 2000 miles in September, and 500 in October, then in September he used 2000/30 = about 66.7 gallons, and in October he used 500/20 = 25 gallons. So, in total, he used about 91.7 gallons to drive 2500 miles, and the mean is 2500/91.7 = about 27.3 mpg.
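Here is the same reasoning as a minimal Python sketch: total miles divided by total gallons. The function name overall_mpg and the (miles, mpg) pair layout are choices made for this illustration only. The code carries the gallons unrounded, which gives about 27.27 mpg; rounding the gallons to whole numbers first gives a slightly lower figure.

```python
def overall_mpg(trips):
    """Overall mileage for a list of (miles, mpg) pairs.

    Computed as total miles driven divided by total gallons used,
    not as the plain average of the mpg figures.
    """
    total_miles = sum(miles for miles, _ in trips)
    total_gallons = sum(miles / mpg for miles, mpg in trips)
    return total_miles / total_gallons


# Joe's two months: 2000 miles at 30 mpg, then 500 miles at 20 mpg.
joe = [(2000, 30), (500, 20)]

naive_average = (30 + 20) / 2       # 25.0, ignores how far he drove
joe_mpg = overall_mpg(joe)

print(naive_average)       # 25.0
print(round(joe_mpg, 2))   # 27.27

# Same mileage figures, but equal distances in both months.
equal_miles_mpg = overall_mpg([(1000, 30), (1000, 20)])
```

Note that even if Joe had driven the same distance in both months, the answer would be 24 mpg rather than 25, because the lower-mileage month burns more gallons for the same miles.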

Published by Peter Flom

I am a statistician, working with a wide variety of clients, mostly researchers in psychology, education, medicine, social sciences and other fields. I also have given talks and written articles on learning...

3 Comments

  • Peter Flom, 10/18/2009

    Satrina ... There's no better way to find the mean, but I hope you are doing this in some software package! Even Excel is OK for means (although I wouldn't do any complex statistics in it). If you don't need the exact mean, you can round the numbers, but it won't really save much time.

  • Satrina M. Hill, 10/18/2009

    I am calculating the mean for hospital data. The problem is that the numbers are extremely long (for example, operating expense 205,292,690), keeping in mind I have 12 items with lengthy numbers that must be averaged for 2001 and 2005. For numbers this large, is there a better way to find the mean, or is there a rule that should be followed when calculating the mean for financial comparison?

  • Kristie Leong M.D., 9/7/2009

    You explain it better than the textbooks did when I was in college. :-)
