Prediction

The estimated regression line is

\[ \hat Y=\hat \alpha_1+\hat \beta_1 x \]

We can use this regression line to predict values of Y for specific values of x. We will denote this predicted value with \(\hat \mu_{Y|X}(x)\).

Example: What is the predicted height for a woman whose mother is 69 inches tall?

\(\hat y=\hat \alpha_1+\hat\beta_1x=29.917+0.542\times 69=67.315\)

\(\hat \mu_{Y|X}(69)=\hat E(Y|X=69)=67.315\) inches.

This value represents two types of prediction.

  1. The average value of Y for the sub-population with \(X=x\)

Example: the average height of all women whose mothers are 69 inches tall.

The error is smaller because of averaging.

  2. An individual value of Y when \(X=x\)

Example: the height of an individual woman (a future observation) whose mother is 69 inches tall.

The error is larger for an individual prediction.

The errors and intervals associated with these two cases are different.

It is easier to estimate the average for a sub-population (smaller error, narrower confidence interval) than to predict the response of an individual (larger error, wider prediction interval).
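One way to see why the two errors differ is to compare variances. The sketch below assumes the usual simple linear regression model with independent errors of constant variance \(\sigma^2\) and a new observation \(Y_{new}\) that is independent of the sample; the symbols \(\sigma^2\) and \(Y_{new}\) are introduced here for illustration.

\[ \mathrm{Var}\left(\hat \mu_{Y|X}(x)\right)=\sigma^2\left({1\over n}+{(x-\bar X)^2\over \sum(X_i-\bar X)^2}\right) \]

\[ \mathrm{Var}\left(Y_{new}-\hat \mu_{Y|X}(x)\right)=\sigma^2+\mathrm{Var}\left(\hat \mu_{Y|X}(x)\right)=\sigma^2\left(1+{1\over n}+{(x-\bar X)^2\over \sum(X_i-\bar X)^2}\right) \]

The extra \(\sigma^2\) is the variation of a single response around its mean, which does not shrink as more data are collected.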

Confidence Interval for an Average Response

A \((1-\alpha)100\)% confidence interval is

\[ \hat \mu_{Y|X}(x)\pm t_{n-2,\alpha/2}S_{\epsilon}\sqrt{{1\over n}+{(x-\bar X)^2\over \sum(X_i-\bar X)^2}} \]

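The margin of error of the average-response interval converges to zero as \(n\to \infty\).

  • The margin of error grows with the variation of the response around the line (the error, estimated by \(S_{\epsilon}\)).

  • The confidence interval is narrowest at the center of the predictor values, \(x=\bar X\), and becomes wider as x moves away from \(\bar X\).

  • The confidence interval becomes narrower as the sample size increases.

  • A larger spread in X, that is, a larger \(\sum(X_i-\bar X)^2\), leads to a narrower confidence interval, because a wider variety of predictor values has been sampled.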
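The confidence interval formula can be checked by hand against predict(). The following is a minimal sketch on simulated data: the data frame `sim`, the variable names `x` and `y`, and the parameters used to generate them are assumptions made for illustration, not the heights data used in these notes.

```r
# Simulated data (hypothetical; not the mother/daughter heights data)
set.seed(1)
n <- 50
sim <- data.frame(x = rnorm(n, mean = 64, sd = 2.5))
sim$y <- 30 + 0.54 * sim$x + rnorm(n, sd = 2.2)

fit <- lm(y ~ x, data = sim)
x0 <- 69        # value of x at which to estimate the average response
alpha <- 0.05   # 1 - confidence level

# Pieces of the formula
mu_hat <- sum(coef(fit) * c(1, x0))           # fitted mean at x0
s_eps  <- sqrt(sum(resid(fit)^2) / (n - 2))   # residual standard error S_epsilon
sxx    <- sum((sim$x - mean(sim$x))^2)        # sum of squares of X
t_crit <- qt(1 - alpha / 2, df = n - 2)       # t quantile with n - 2 df

margin  <- t_crit * s_eps * sqrt(1 / n + (x0 - mean(sim$x))^2 / sxx)
ci_hand <- c(fit = mu_hat, lwr = mu_hat - margin, upr = mu_hat + margin)

# Same interval from predict()
ci_r <- predict(fit, data.frame(x = x0),
                interval = "confidence", level = 1 - alpha)
```

Under these assumptions, `ci_hand` and `ci_r` should agree up to floating-point error, which can be confirmed with `all.equal(unname(ci_hand), as.numeric(ci_r))`.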

Prediction Interval for an Individual Response

A \((1-\alpha)100\)% prediction interval is

\[ \hat \mu_{Y|X}(x)\pm t_{n-2,\alpha/2}S_{\epsilon}\sqrt{1+{1\over n}+{(x-\bar X)^2\over \sum(X_i-\bar X)^2}} \]

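The margin of error of the prediction interval does not converge to zero as \(n\to \infty\), because the 1 under the square root remains.

At any given x, the prediction interval is wider than the confidence interval for the average response, so the prediction band always lies outside the confidence band.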
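The different large-sample behavior of the two margins can be seen by evaluating both formulas for growing n. This is only a sketch: the helper `half_width`, the evenly spaced predictor values, and the fixed residual standard error of 2.2 are hypothetical choices for illustration (in practice \(S_{\epsilon}\) is re-estimated from each sample).

```r
# Half-width (margin of error) of the interval at x0.
# individual = FALSE: confidence interval for the average response
# individual = TRUE : prediction interval for a single new response
half_width <- function(x_design, x0, s_eps, level = 0.95, individual = FALSE) {
  n <- length(x_design)
  sxx <- sum((x_design - mean(x_design))^2)
  t_crit <- qt(1 - (1 - level) / 2, df = n - 2)
  t_crit * s_eps *
    sqrt(as.numeric(individual) + 1 / n + (x0 - mean(x_design))^2 / sxx)
}

s_eps <- 2.2                        # hypothetical residual standard error, held fixed
sizes <- c(10, 100, 1000, 100000)   # sample sizes to compare

widths <- t(sapply(sizes, function(n) {
  x_design <- seq(58, 70, length.out = n)   # hypothetical predictor values
  c(n = n,
    mean_response = half_width(x_design, 69, s_eps),
    individual    = half_width(x_design, 69, s_eps, individual = TRUE))
}))
print(widths)

# Large-sample limit of the prediction margin: normal quantile times s_eps
limit <- qnorm(0.975) * s_eps
```

In this setup the `mean_response` column shrinks toward zero as n grows, while the `individual` column levels off near `limit`, the normal quantile times the residual standard error.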

In R

Both of these intervals can be computed using R.

Example: Compute and interpret the 95% confidence interval and 95% prediction interval when mother’s height is 69 inches.

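# Mother's height at which to predict; `model` is the fitted regression with predictor Mheight
new_data <- data.frame(Mheight = 69)
# 95% confidence interval for the average response
predict(model, new_data, interval = "confidence", level = 0.95)
##        fit      lwr      upr
## 1 67.29798 66.94365 67.65231
# 95% prediction interval for an individual response
predict(model, new_data, interval = "prediction", level = 0.95)
##        fit      lwr      upr
## 1 67.29798 62.83808 71.75789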

Caution: Don’t do prediction for values of x that are far outside of the range of the data.

In the output, fit is the point estimate \(\hat\mu_{Y|X}(69)\), and lwr and upr are the lower and upper limits of the interval.

Interpretation:

  • Average interval: We are 95% confident that the average height of women whose mothers are 69 inches tall is between 66.94 and 67.65 inches.

  • Prediction interval: We are 95% confident that the height of a particular woman whose mother is 69 inches tall is between 62.84 and 71.76 inches.