Statistics 101: Which Type of Regression Should I Choose?

It Mostly Depends on Your Dependent Variable

Peter Flom
Regression is a set of statistical techniques for relating a dependent variable to one or more independent variables. Briefly, a dependent variable (sometimes called an outcome variable) is one that you think is related to the independent variables. Although regression can't prove causation, you usually think of the relationship as running from the independent variable(s) to the dependent variable. (For more on the distinction, see this article.)

The most common kind of regression, and the first one (sometimes the only one) learned in statistics classes, is ordinary least squares, or OLS, regression. OLS regression is appropriate when the dependent variable is continuous, at either the interval or the ratio level (if you do not know what those terms mean, see this article). OLS regression also makes other assumptions, but if the dependent variable is not continuous, OLS regression cannot be appropriate.
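To make the idea concrete, here is a minimal sketch of OLS with a single independent variable, written in plain Python (my choice for illustration; in practice you would use a statistics package). It uses the closed-form least-squares estimates: the slope is the sum of cross-products of deviations from the means divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The function name `ols_fit` and the study-hours data are hypothetical.

```python
def ols_fit(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equal-length lists with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: hours studied (x) and a continuous test score (y)
hours = [1, 2, 3, 4, 5]
score = [52.0, 55.0, 61.0, 64.0, 68.0]
a, b = ols_fit(hours, score)
print(f"predicted score = {a:.2f} + {b:.2f} * hours")
```

With these made-up numbers it prints `predicted score = 47.70 + 4.10 * hours`. Nothing here checks the other OLS assumptions; the point is only that the dependent variable, the test score, is continuous.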

Of course, many variables are continuous, or nearly so: income, weight, IQ, SAT scores and many others. But many are not, and they come in various forms.

Some variables are counts: for example, the number of children you have, the number of cars you own, or the number of times you have been married. What distinguishes count variables is that they are integers (whole numbers) and that they are never negative. You cannot have a negative number of children or cars. The two most common kinds of regression for count variables are Poisson regression and negative binomial regression. So, for example, if you wanted to look at the number of children people had, and relate it to (say) age, income, racial/ethnic group, and the number of brothers and sisters the parents have, you would use one of these models. The main difference is that Poisson regression makes a very restrictive assumption about the relationship between the conditional mean and the conditional variance (namely, that they are equal), while negative binomial regression relaxes this assumption and allows the variance to be larger than the mean. In my experience, the assumption of Poisson regression is almost never met.
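Here is a minimal sketch of Poisson regression with one independent variable, fit by Newton-Raphson in plain Python, along with a common rough check of the equal-mean-and-variance assumption: the Pearson chi-square statistic divided by its residual degrees of freedom. Values well above 1 suggest overdispersion, which is when negative binomial regression is usually the better choice. The function names and data are hypothetical, and a real analysis would use a statistics package and look at more than this one number.

```python
import math


def poisson_fit(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y | x]) = b0 + b1*x by maximum likelihood (Newton-Raphson).
    Assumes at least one nonzero count so the starting value exists."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix [[i00, i01], [i01, i11]]
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information) * step = score
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual df; near 1 if variance == mean."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - 2)


# Hypothetical data: a predictor x and a count outcome y
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 0, 2, 3, 2, 5, 6, 8]
b0, b1 = poisson_fit(x, y)
print(f"log(mean count) = {b0:.3f} + {b1:.3f} * x")
print(f"dispersion = {pearson_dispersion(x, y, b0, b1):.2f}")
```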

Many count variables have a lot of zeros, more than a standard Poisson or negative binomial model would predict. For example, consider the number of heart attacks a person has had: most people have had none. To deal with such variables, there are variants called zero-inflated Poisson regression and zero-inflated negative binomial regression.
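A zero-inflated Poisson model treats the data as a mixture: with some probability a person is an extra, "structural" zero, and otherwise the count comes from an ordinary Poisson distribution. The sketch below, in plain Python, writes out that probability function and a quick comparison of the observed share of zeros with the share a plain Poisson with the same mean would imply, exp(-mean). It does not fit a zero-inflated regression; the function names and the heart-attack counts are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, p_zero):
    """Zero-inflated Poisson: with probability p_zero the count is an extra
    ("structural") zero; otherwise it comes from Poisson(lam)."""
    p = (1 - p_zero) * poisson_pmf(k, lam)
    if k == 0:
        p += p_zero
    return p


def zero_check(y):
    """Observed share of zeros vs. the share a plain Poisson with the same mean implies."""
    observed = sum(1 for v in y if v == 0) / len(y)
    expected = math.exp(-sum(y) / len(y))
    return observed, expected


# Hypothetical counts: number of heart attacks for 20 people
attacks = [0] * 17 + [2, 3, 4]
observed_share, expected_share = zero_check(attacks)
print(f"observed zeros: {observed_share:.2f}, Poisson-implied zeros: {expected_share:.2f}")
print(f"ZIP P(0) with lam=1.5, p_zero=0.6: {zip_pmf(0, 1.5, 0.6):.3f}")
```

When the observed share of zeros is well above the Poisson-implied share, as in these made-up counts, a zero-inflated model is worth considering.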

Some variables are dichotomies: they can take only two values. Examples include living vs. dying, getting a particular disease vs. not getting it, voting for Obama vs. voting for McCain, and so on. For these variables, the appropriate regression method is binary logistic regression.
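The sketch below fits a binary logistic regression with one independent variable by Newton-Raphson, in plain Python. The model says that the log-odds of the outcome is a linear function of x, so exp(b1) is the odds ratio for a one-unit increase in x. The function name and the dose/disease data are hypothetical, and the data were chosen so the two outcomes overlap; if x perfectly separates the 0s from the 1s, finite estimates do not exist and this simple loop will not give sensible answers.

```python
import math


def logistic_fit(x, y, max_iter=50, tol=1e-10):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pr * (1.0 - pr) for pr in p]  # Bernoulli variance at each point
        # Score vector: gradient of the log-likelihood
        u0 = sum(yi - pr for yi, pr in zip(y, p))
        u1 = sum(xi * (yi - pr) for xi, yi, pr in zip(x, y, p))
        # Information matrix [[i00, i01], [i01, i11]]
        i00 = sum(w)
        i01 = sum(xi * wi for xi, wi in zip(x, w))
        i11 = sum(xi * xi * wi for xi, wi in zip(x, w))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information) * step = score
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: dose (x) and whether a disease developed (1) or not (0)
dose = [1, 2, 3, 4, 5, 6, 7, 8]
sick = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = logistic_fit(dose, sick)
print(f"log-odds = {b0:.3f} + {b1:.3f} * dose; odds ratio per unit = {math.exp(b1):.2f}")
```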

Some variables are nominal or ordinal (again, see this article for a detailed explanation). Briefly, ordinal variables have an order but no defined interval between categories (a rating of poor, fair, good or excellent, for example), and nominal variables do not even have an order (religious affiliation, for example). For nominal variables, the right kind of regression is multinomial logistic regression; for ordinal variables, the right method is ordinal logistic regression.
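The two models differ in how they turn a linear predictor into category probabilities. The plain-Python sketch below computes those probabilities for given coefficients (it does not estimate them). The multinomial model gives each category except a reference category its own intercept and slope; the ordinal (proportional-odds) model uses a single slope and a set of increasing cutpoints on the cumulative probabilities. The sign convention for the slope varies between software packages, and all function names and coefficient values here are hypothetical.

```python
import math


def multinomial_probs(x, coefs):
    """Multinomial logit with one predictor x.
    coefs: (intercept, slope) for each non-reference category; the
    reference category's linear predictor is fixed at 0."""
    etas = [0.0] + [a + b * x for a, b in coefs]
    m = max(etas)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


def ordinal_probs(x, cutpoints, slope):
    """Proportional-odds (cumulative logit) model with one predictor x:
    P(y <= k) = 1 / (1 + exp(-(cutpoints[k] - slope * x))), cutpoints increasing.
    The same slope applies at every cutpoint; that is the proportional-odds assumption."""
    cum = [1.0 / (1.0 + math.exp(-(c - slope * x))) for c in cutpoints] + [1.0]
    # Category probabilities are differences of adjacent cumulative probabilities
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# Hypothetical coefficients, not estimates from real data
# Nominal outcome with 3 unordered categories (the first is the reference)
print(multinomial_probs(2.0, [(0.5, -0.3), (-1.0, 0.4)]))
# Ordinal outcome with 4 ordered categories (poor < fair < good < excellent)
print(ordinal_probs(2.0, [-1.0, 0.5, 2.0], 0.8))
```

With a positive slope in this parameterization, larger values of x shift probability toward the higher ordinal categories.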

Finally, some variables are times to events: for example, time until death, time until getting a degree, and so on. One key trait of such variables is that they are often censored. Although there are different kinds of censoring, the most common is right censoring, which means that some people do not have the event while you are studying them, although they may have it after the study is over. For those people, all you know is that their time to the event is longer than the time you observed them. There is a whole field of statistics, called survival analysis, that deals with these variables, but the most common kind of regression for time-to-event data is Cox proportional hazards regression.
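The sketch below, in plain Python, fits a Cox model with one independent variable by maximizing the partial likelihood with Newton-Raphson. It shows how censoring is handled: each observed event compares the person who had it with everyone still at risk at that time, and a censored person never contributes an event but does count in the risk sets up to the censoring time. Tied event times would be handled with Breslow's approximation. The function names and data are hypothetical; exp(beta) is the estimated hazard ratio for group 1 versus group 0.

```python
import math


def cox_score(beta, times, events, x):
    """Score (first derivative) and information (minus the second derivative) of
    the Cox partial log-likelihood for one covariate. Ties use Breslow's approximation."""
    score, info = 0.0, 0.0
    for i, (t_i, event_i) in enumerate(zip(times, events)):
        if not event_i:
            continue  # censored people contribute only through the risk sets
        # Risk set: everyone whose follow-up time is at least t_i
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def cox_fit(times, events, x, max_iter=50, tol=1e-10):
    """Maximize the partial likelihood by Newton-Raphson and return beta."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: follow-up time, event indicator (1 = event observed,
# 0 = right-censored), and a 0/1 group indicator
times = [5, 8, 12, 3, 9, 15, 7, 11]
events = [1, 1, 0, 1, 1, 0, 1, 1]
group = [1, 0, 0, 1, 1, 0, 1, 0]
beta = cox_fit(times, events, group)
print(f"beta = {beta:.3f}, hazard ratio = {math.exp(beta):.2f}")
```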

This is just a brief survey; entire courses are typically offered on each of these topics. But I hope it gives you some idea of the varieties of regression.

Published by Peter Flom

I am a statistician, working with a wide variety of clients, mostly researchers in psychology, education, medicine, social sciences and other fields. I also have given talks and written articles on learning...
