Statistics 101: The Arithmetic Mean (aka Mean or Average)
Statistics! Not Sadistics! The Start of a Series
When can the mean be calculated?
There are various ways to classify variables. One useful way is to distinguish between continuous and categorical data. Data are continuous if they can (at least in theory) take on any numeric value within a range. Data are categorical if they can only take on one of a fixed set of values. For example, weight, income, age and IQ are continuous. Choice of whom to vote for (e.g. McCain or Obama), party, hair color, and marital status are categorical. We will discuss this more in a later article.
When you have continuous data, two things that you often want to know are "What values are likely?" and "How spread out are the values?" Today, we will look at the first question, which, in statistician's language, is called central tendency. The most common measure of central tendency is the mean, more formally the arithmetic mean, and less formally the average. (To see why the mean makes no sense for categorical data - well, what's the average of McCain and Obama? Or of married and single? Perhaps the latter is "engaged"?)
How to calculate the mean
The mean is probably familiar, even if you only know it as the average. Add up the numbers, divide by how many numbers there are, and you've got the mean. So, for example, if the IQs of the people in your family are
155 (that would be you)
135 (your sister)
and
70 (her husband)
then the mean is (155 + 135 + 70) / 3 = 120.
Or, suppose the heights of the students in introductory psychology are (in inches, rounded to the nearest inch)
64 65 64 67 64 67 66 70 66 66
66 64 69 69 62 67 64 59 66 67
65 71 67 68 59 69 67 65 68 66
68 67 75 67 69 70 67 76 67 70
68 67 78 67 73 64 75 65 70 68.
The arithmetic mean of the above is 67.36 inches.
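For anyone who would rather let a computer do the adding, here is a minimal Python sketch of the same two calculations. The function name arithmetic_mean is my own label for illustration, not something from a statistics package; the numbers are the IQs and heights listed above.

```python
# Heights in inches, copied from the listing above (50 students).
heights = [
    64, 65, 64, 67, 64, 67, 66, 70, 66, 66,
    66, 64, 69, 69, 62, 67, 64, 59, 66, 67,
    65, 71, 67, 68, 59, 69, 67, 65, 68, 66,
    68, 67, 75, 67, 69, 70, 67, 76, 67, 70,
    68, 67, 78, 67, 73, 64, 75, 65, 70, 68,
]


def arithmetic_mean(values):
    """Add up the numbers and divide by how many numbers there are."""
    return sum(values) / len(values)


iq_mean = arithmetic_mean([155, 135, 70])
height_mean = arithmetic_mean(heights)

print(iq_mean)      # 120.0
print(height_mean)  # 67.36
```

The heights add up to 3368, and 3368 / 50 = 67.36, which matches the figure above.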
The mean: When not to use it
The mean is a bad choice if the data are skewed, which means that there is a 'tail' to the distribution on one side, but not the other. One common example of this is income. Some people make a whole lot more than the average person, but no one makes that much less. For instance, if the average income in the USA is $30,000 per year (I made that up), then there are some people who make millions more than that, but even the poorest people make at most $30,000 less. When the data are skewed, the median and the trimmed or Winsorized mean are good choices. (You don't see the trimmed mean much, but it can be very useful.) I will cover these in later articles.
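To see how much one long tail can pull the mean, here is a small Python sketch. The eight incomes are invented purely for illustration, and trimmed_mean is a hypothetical helper that uses one simple way of trimming (drop the k smallest and k largest values); it is only a preview of ideas covered properly in later articles.

```python
from statistics import mean, median

# Hypothetical yearly incomes in dollars (invented for illustration).
incomes = [20_000, 25_000, 28_000, 30_000, 32_000, 35_000, 40_000, 2_000_000]


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and the k largest values."""
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


income_mean = mean(incomes)
income_median = median(incomes)
income_trimmed = trimmed_mean(incomes, 1)

print(f"mean:         {income_mean:,.0f}")     # 276,250
print(f"median:       {income_median:,.0f}")   # 31,000
print(f"trimmed mean: {income_trimmed:,.0f}")  # 31,667
```

One person earning $2,000,000 drags the mean far above what seven of the eight people actually earn, while the median and the trimmed mean stay close to the bulk of the data.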
The mean is also a bad choice if the data are multimodal, which means they have two or more "humps". For instance, if you had data on the heights of basketball players and jockeys, taking the overall mean would not be very informative.
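A quick Python sketch makes the "two humps" problem concrete. The heights below are invented for illustration (five basketball players and five jockeys, in inches); they are not real measurements.

```python
# Hypothetical heights in inches (invented for illustration).
basketball_players = [78, 79, 80, 81, 82]
jockeys = [61, 62, 63, 64, 65]

everyone = basketball_players + jockeys

overall_mean = sum(everyone) / len(everyone)
player_mean = sum(basketball_players) / len(basketball_players)
jockey_mean = sum(jockeys) / len(jockeys)

# How far is the closest real person from the overall mean?
closest_gap = min(abs(h - overall_mean) for h in everyone)

print(overall_mean)  # 71.5
print(player_mean)   # 80.0
print(jockey_mean)   # 63.0
print(closest_gap)   # 6.5
```

The overall mean of 71.5 inches describes nobody in this made-up group: the nearest person is 6.5 inches away. Reporting the two group means separately says much more.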
The mean: What can go wrong
People sometimes try to average things that shouldn't be averaged. The most common mistake is to average percentages. This is a bad idea. Here are some data from the last presidential election (I will use just 4 states, to keep it simple; the same thing applies with all 50):
State Obama McCain
CA 61% 37%
NY 63% 36%
WY 33% 65%
UT 34% 63%
If one averages the percentages, one gets 48% for Obama, (61 + 63 + 33 + 34)/4, and 50% for McCain, (37 + 36 + 65 + 63)/4, but that isn't right, because the states have very different numbers of voters. A percentage is a form of fraction, and you have to add the numerators and the denominators and then form a new percentage; that is, add up the NUMBER voting Democratic and Republican, and then get the percentage from the total. Here are the vote totals, in millions of people:
State Obama McCain
CA 8.2 5.0
NY 4.8 2.8
WY 0.1 0.2
UT 0.3 0.6
Tot 13.4 8.6
In these four states, Obama got 13.4 of the 22.0 million votes cast for the two candidates, or 61%, not 48%.
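The two ways of combining the states can be compared side by side in a short Python sketch. The figures are the ones in the two tables above; the variable names are my own, and the pooled percentage is a share of the votes for these two candidates only, since those are the only totals listed.

```python
# Obama's percentage in each state, from the first table.
obama_pct = {"CA": 61, "NY": 63, "WY": 33, "UT": 34}

# Votes in millions as (Obama, McCain), from the second table.
votes = {
    "CA": (8.2, 5.0),
    "NY": (4.8, 2.8),
    "WY": (0.1, 0.2),
    "UT": (0.3, 0.6),
}

# Wrong: treat every state as if it had the same number of voters.
naive_pct = sum(obama_pct.values()) / len(obama_pct)

# Right: add up the votes first, then form one percentage.
obama_total = sum(obama for obama, mccain in votes.values())
mccain_total = sum(mccain for obama, mccain in votes.values())
pooled_pct = 100 * obama_total / (obama_total + mccain_total)

print(round(naive_pct))   # 48
print(round(pooled_pct))  # 61
```

The gap between 48% and 61% comes entirely from state size: California and New York have far more voters than Wyoming and Utah, but the naive average counts all four states equally.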
As another example, suppose I ask the following:
In September, Joe's average gas mileage was 30 mpg. In October, it was 20 mpg. What was his average gas mileage for September and October? You might think: 20 + 30 = 50, divided by 2 is 25. But that's not necessarily his overall mileage, because he might have driven different distances in the two months. If he drove 2000 miles in September, and 500 in October, then in September he used 2000/30 = 66.7 gallons, and in October he used 500/20 = 25 gallons. So, in total, he used 91.7 gallons to drive 2500 miles, and the mean is 2500/91.7 = 27.3 mpg.
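Here is the same gas mileage calculation as a minimal Python sketch. The miles driven (2000 and 500) are the assumed distances from the example above, and the variable names are mine.

```python
# (miles driven, miles per gallon) for September and October.
months = [(2000, 30), (500, 20)]

# Wrong: average the two mpg figures directly.
naive_mpg = sum(mpg for miles, mpg in months) / len(months)

# Right: total miles divided by total gallons used.
total_miles = sum(miles for miles, mpg in months)
total_gallons = sum(miles / mpg for miles, mpg in months)
overall_mpg = total_miles / total_gallons

print(naive_mpg)                # 25.0
print(round(total_gallons, 1))  # 91.7
print(round(overall_mpg, 1))    # 27.3
```

Keeping the gallons unrounded gives about 27.3 mpg; rounding to 92 gallons before dividing gives about 27.2. Either way the answer sits closer to 30 than to 20, because most of the miles were driven at 30 mpg.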
Published by Peter Flom
I am a statistician, working with a wide variety of clients, mostly researchers in psychology, education, medicine, social sciences and other fields. I also have given talks and written articles on learning...
3 Comments
Satrina ... There's no better way to find the mean, but I hope you are doing this in some software package! Even Excel is OK for means (although I wouldn't do any complex statistics in it). If you don't need the exact mean, you can round the numbers, but it won't really save much time.
I am calculating the mean for hospital data. The problem is that the numbers are extremely long (for example, operating expense 205,292,690), keeping in mind I have 12 items with lengthy numbers that must be averaged for 2001 and 2005. For numbers this large, is there a better way to find the mean, or is there a rule that should be followed when calculating the mean for financial comparison?
You explain it better than the textbooks did when I was in college. :-)