## Overview

For all of the regression analyses that we have performed so far in this course, it has been obvious which of the major predictors we should include in our regression model. Unfortunately, this is typically not the case. More often than not, a researcher has a large set of candidate predictor variables from which they try to identify the most appropriate predictors to include in their regression model.

Of course, the larger the number of candidate predictor variables, the larger the number of possible regression models. Because each candidate predictor is either included in or excluded from a model, a researcher with (only) 10 candidate predictor variables has \(2^{10} = 1024\) possible regression models from which to choose. Clearly, some assistance is needed to evaluate all of the possible regression models. That's where the two variable selection methods, **stepwise regression** and **best subsets regression**, come in handy.
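The count \(2^{10}\) can be checked directly. The following is a minimal Python sketch that enumerates every subset of a list of candidate predictors with `itertools.combinations`; the predictor names `x1` through `x10` and the helper `all_subsets` are hypothetical, chosen only for illustration. The empty subset corresponds to the intercept-only model.

```python
from itertools import combinations


def all_subsets(predictors):
    """Return every subset of the candidate predictors as a tuple,
    including the empty tuple (the intercept-only model)."""
    subsets = []
    for k in range(len(predictors) + 1):
        # All models that contain exactly k of the candidate predictors.
        subsets.extend(combinations(predictors, k))
    return subsets


# Ten hypothetical candidate predictors named x1, ..., x10.
candidates = [f"x{i}" for i in range(1, 11)]
models = all_subsets(candidates)
print(len(models))  # 1024, which equals 2 ** 10
```

Doubling the number of candidates squares the number of models (\(2^{20}\) is over a million), which is why exhaustively fitting every model quickly becomes impractical.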

In this lesson, we'll learn about these two variable selection methods. Our goal throughout will be to choose a small subset of predictors from the larger set of candidate predictors so that the resulting regression model is **simple** yet **useful**. That is, as always, our resulting regression model should:

- provide a good summary of the trend in the response, and/or
- provide good predictions of the response, and/or
- provide good estimates of the slope coefficients.

**Note!** The data sets herein are not really all that large. For the sake of illustration, they necessarily have to be small, so that the largeness of the data set does not obscure the pedagogical point being made.

## Objectives

- Understand the impact of the four different kinds of models with respect to their "correctness" — correctly specified, underspecified, overspecified, and correct but with extraneous predictors.
- As a way of ensuring that you understand the general idea behind stepwise regression, be able to conduct stepwise regression "by hand."
- Know the limitations of stepwise regression.
- Know the general idea behind best subsets regression.
- Know how to choose an optimal model based on the \(R^{2}\) value, the adjusted \(R^{2}\) value, *MSE*, and the \(C_p\) criterion.
- Know the limitations of best subsets regression.
- Know the seven steps of good model building strategy.
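To make the best subsets criteria concrete, here is a minimal Python sketch, assuming NumPy is available, that computes \(R^{2}\), adjusted \(R^{2}\), *MSE*, and \(C_p\) for one candidate subset using the standard textbook formulas. Here \(p\) counts the parameters in the subset model (intercept included), and \(C_p = SSE_p / MSE_{full} - (n - 2p)\), where \(MSE_{full}\) comes from the model containing all candidate predictors. The helper names `sse` and `subset_criteria` and the simulated data are hypothetical and are not part of this lesson's data sets.

```python
import numpy as np


def sse(X, y):
    """Error sum of squares from a least-squares fit of y on X.
    X is assumed to already contain the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def subset_criteria(X_full, y, cols):
    """R^2, adjusted R^2, MSE, and Mallows' Cp for the model with an
    intercept plus the predictor columns of X_full listed in `cols`."""
    n = len(y)
    ones = np.ones((n, 1))
    X_sub = np.hstack([ones, X_full[:, cols]])
    X_all = np.hstack([ones, X_full])
    p = X_sub.shape[1]  # parameters in the subset model, intercept included
    sse_p = sse(X_sub, y)
    ssto = float(((y - y.mean()) ** 2).sum())
    mse_full = sse(X_all, y) / (n - X_all.shape[1])
    return {
        "R2": 1 - sse_p / ssto,
        "adj_R2": 1 - (n - 1) / (n - p) * sse_p / ssto,
        "MSE": sse_p / (n - p),
        "Cp": sse_p / mse_full - (n - 2 * p),
    }


# Simulated (hypothetical) data: three candidate predictors, two of which matter.
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 3))
y = 2 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n)

for cols in ([0], [0, 1], [0, 1, 2]):
    print(cols, subset_criteria(X, y, cols))
```

One quick sanity check: for the model containing all candidate predictors, \(C_p\) equals \(p\) exactly, because \(SSE_p / MSE_{full}\) reduces to \(n - p\) in that case.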

#### Lesson 10 Code Files

The zip file for this lesson contains all of the data sets used in the lesson:

- bloodpress.txt
- cement.txt
- iqsize.txt
- martian.txt
- peru.txt
- Physical.txt