The estimated regression line is
\[ \hat Y=\hat \alpha_1+\hat \beta_1 x \]
We can use this regression line to predict values of Y for specific values of x. We will denote this predicted value with \(\hat \mu_{Y|X}(x)\).
Example: What is the predicted height for a woman whose mother is 69 inches tall?
\(\hat y=\hat \alpha_1+\hat\beta_1x=29.917+0.542\times 69=67.315\)
\(\hat \mu_{Y|X=69}=\hat E(Y|X=69)=67.315\) inches.
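The arithmetic above can be checked directly in R. This is a minimal sketch that uses only the rounded coefficients quoted in the example; the names `alpha_hat`, `beta_hat` and `x_new` are chosen here for illustration and do not come from the fitted model object.

```r
# Rounded coefficient estimates quoted in the worked example
alpha_hat <- 29.917   # estimated intercept
beta_hat  <- 0.542    # estimated slope
x_new     <- 69       # mother's height in inches

# Point prediction from the estimated regression line
y_hat <- alpha_hat + beta_hat * x_new
y_hat   # 67.315
```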
This value serves as two different kinds of prediction:
The average height of all women whose mothers are 69 inches tall.
Smaller error, because individual variation averages out.
The height of an individual woman (a future observation) whose mother is 69 inches tall.
Larger error, because the individual's own deviation from the average is also uncertain.
The errors and intervals associated with these two cases are different.
It is easier to estimate the average for a population (smaller error, narrower confidence interval) than to predict the response of an individual (larger error, wider prediction interval).
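One way to see why the two errors differ is to compare the variances involved. The sketch below assumes the usual simple linear regression model with independent errors of common variance \(\sigma^2\), and a future observation \(Y_{new}\) at x that is independent of the sample; the symbols \(\sigma^2\) and \(Y_{new}\) are notation introduced here for illustration.
\[ \mathrm{Var}\big(\hat \mu_{Y|X}(x)\big)=\sigma^2\left[{1\over n}+{(x-\bar X)^2\over \sum(X_i-\bar X)^2}\right] \]
\[ \mathrm{Var}\big(Y_{new}-\hat \mu_{Y|X}(x)\big)=\sigma^2+\mathrm{Var}\big(\hat \mu_{Y|X}(x)\big)=\sigma^2\left[1+{1\over n}+{(x-\bar X)^2\over \sum(X_i-\bar X)^2}\right] \]
The additional \(\sigma^2\) is the variation of a single individual around the average. It does not shrink with more data, which is why an individual prediction always carries the larger error.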
A \((1-\alpha)100\)% confidence interval for the average response at x is
\[ \hat \mu_{Y|X}(x)\pm t_{n-2,\alpha/2}S_{\epsilon}\sqrt{{1\over n}+{(x-\bar X)^2\over \sum(X_i-\bar X)^2}} \]
The margin of error of the interval for the average response converges to zero as \(n\to \infty\).
The margin of error depends on the variation of the response around the line (the error), through \(S_{\epsilon}\).
The confidence interval is narrowest at the center of the predictor values, \(x=\bar X\), and widens as x moves away from \(\bar X\).
The confidence interval narrows as the sample size increases.
A larger spread of the X values leads to a narrower confidence interval (a more varied set of x values was sampled).
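The confidence interval formula can be evaluated term by term and compared with what `predict()` returns. The sketch below is illustrative only: the heights data are not reproduced here, so it simulates stand-in data, and the names `Dheight`, `model_sim`, `x0` and the simulation parameters are assumptions made for this example.

```r
# Simulated stand-in data (hypothetical, NOT the heights data from the example)
set.seed(1)
n <- 50
Mheight <- rnorm(n, mean = 62.5, sd = 2.4)            # mothers' heights
Dheight <- 30 + 0.54 * Mheight + rnorm(n, sd = 2.3)   # daughters' heights
model_sim <- lm(Dheight ~ Mheight)

x0 <- 69             # value of x at which to estimate the average response
alpha_level <- 0.05  # gives a 95% interval

# Pieces of the formula
mu_hat <- unname(coef(model_sim)[1] + coef(model_sim)[2] * x0)  # point estimate
s_eps  <- sigma(model_sim)                     # residual standard error
sxx    <- sum((Mheight - mean(Mheight))^2)     # sum of squares of X
t_crit <- qt(1 - alpha_level / 2, df = n - 2)  # t critical value

# Margin of error for the average response
moe_mean  <- t_crit * s_eps * sqrt(1 / n + (x0 - mean(Mheight))^2 / sxx)
ci_manual <- c(fit = mu_hat, lwr = mu_hat - moe_mean, upr = mu_hat + moe_mean)

# Same interval from predict()
ci_predict <- predict(model_sim, data.frame(Mheight = x0),
                      interval = "confidence", level = 0.95)

# TRUE when the manual computation agrees with predict()
all.equal(unname(ci_manual), unname(ci_predict[1, ]))
```

Changing `x0` toward `mean(Mheight)` shrinks `moe_mean`, which matches the remark that the interval is narrowest at the center of the predictor values.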
A \((1-\alpha)100\)% prediction interval for an individual response at x is
\[ \hat \mu_{Y|X}(x)\pm t_{n-2,\alpha/2}S_{\epsilon}\sqrt{1+{1\over n}+{(x-\bar X)^2\over \sum(X_i-\bar X)^2}} \]
The margin of error of the prediction interval does not converge to zero as \(n\to \infty\), because of the extra 1 under the square root.
The prediction band is wider than the confidence band for the average response.
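The different limiting behaviour of the two margins of error can be seen numerically. The following is a rough sketch on simulated data, not on the heights data from the example; the helper `margins()` and all simulation parameters are hypothetical and chosen only for illustration.

```r
set.seed(2)

# Hypothetical helper: simulate n observations, fit a line, and return both
# margins of error at x0
margins <- function(n, x0 = 69, level = 0.95) {
  x <- rnorm(n, mean = 62.5, sd = 2.4)        # simulated predictor
  y <- 30 + 0.54 * x + rnorm(n, sd = 2.3)     # simulated response
  fit <- lm(y ~ x)
  s_eps  <- sigma(fit)                        # residual standard error
  t_crit <- qt(1 - (1 - level) / 2, df = n - 2)
  lev <- 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)
  c(n = n,
    confidence = t_crit * s_eps * sqrt(lev),      # average response
    prediction = t_crit * s_eps * sqrt(1 + lev))  # individual response
}

# One row per sample size
result <- t(sapply(c(20, 200, 2000, 20000), margins))
result
```

In runs like this the `confidence` column keeps shrinking as n grows, while the `prediction` column levels off near the normal critical value times the error standard deviation used in the simulation.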
Both of these intervals can be computed using R.
Example: Compute and interpret the 95% confidence interval and 95% prediction interval when mother’s height is 69 inches.
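# New data: a mother's height of 69 inches
new_mother <- data.frame(Mheight=69)
# 95% confidence interval for the average daughter's height
predict(model, new_mother, interval = "confidence", level = 0.95)
## fit lwr upr
## 1 67.29798 66.94365 67.65231
# 95% prediction interval for an individual daughter's height
predict(model, new_mother, interval = "prediction", level = 0.95)
## fit lwr upr
## 1 67.29798 62.83808 71.75789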
Caution: Do not predict for values of x that are far outside the range of the data (extrapolation). The fitted line is only supported where data were observed.
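A simple guard against accidental extrapolation is to compare the new x values with the observed range before predicting. The helper `outside_range()` below is hypothetical (it is not part of base R), and the observed heights are made-up values for illustration. It flags any value outside the observed range, which is stricter than the "far outside" wording of the caution.

```r
# Hypothetical helper: TRUE for new x values outside the observed range
outside_range <- function(x_new, x_observed) {
  r <- range(x_observed)
  x_new < r[1] | x_new > r[2]
}

# Made-up observed mothers' heights, for illustration only
Mheight_obs <- c(58, 60.5, 62, 63, 64.5, 66, 70)

outside_range(c(69, 80), Mheight_obs)   # FALSE TRUE
```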
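In the output, fit is the point estimate \(\hat\mu_{Y|X}(69)\), and lwr and upr are the lower and upper endpoints of the interval. The fitted value 67.298 differs slightly from the hand calculation of 67.315, presumably because the coefficients used there were rounded to three decimals.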
Interpretation:
Average interval: We are 95% confident that the average height of women whose mothers are 69 inches tall is between 66.94 and 67.65 inches.
Prediction interval: We are 95% confident that the height of a particular woman whose mother is 69 inches tall is between 62.84 and 71.76 inches.