Comparing Two Means

In this section, we discuss how to compare the means of two populations.

Case 1: Equal Variances

  • Assume \(\sigma_1^2=\sigma_2^2=\sigma^2\).

  • A pooled estimator of the common variance \(\sigma^2\) is

\[ \begin{aligned} S^2_p&={(n_1-1)S^2_1+(n_2-1)S_2^2\over n_1+n_2-2}\\ &={\sum_{i=1}^{n_1}(X_{1i}-\bar X_1)^2+\sum_{i=1}^{n_2}(X_{2i}-\bar X_2)^2\over n_1 + n_2 -2} \end{aligned} \]

which is a weighted average of \(S_1^2\) and \(S_2^2\), with weights proportional to their degrees of freedom \(n_1-1\) and \(n_2-1\).
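The two expressions for \(S_p^2\) should agree, which is easy to check numerically. This is a minimal sketch assuming the raw observations are available as Python lists; the function names `pooled_variance` and `pooled_variance_ss` and the small data set are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance


def pooled_variance(x1, x2):
    """S_p^2 via the first form: weighted average of the sample variances."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance uses the n - 1 divisor, i.e. the sample variance S^2
    return ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)


def pooled_variance_ss(x1, x2):
    """S_p^2 via the second form: pooled sums of squared deviations."""
    m1, m2 = mean(x1), mean(x2)
    ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    return ss / (len(x1) + len(x2) - 2)


# Hypothetical samples, for illustration only
x1 = [4.0, 6.0, 5.0, 7.0]
x2 = [10.0, 12.0, 11.0]
print(pooled_variance(x1, x2), pooled_variance_ss(x1, x2))  # both approximately 1.4
```

Here \(S_1^2=5/3\) and \(S_2^2=1\), and the pooled value \(1.4\) lies between them, as a weighted average must.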

  • If both populations are normal, then

\[ {(\bar X_1-\bar X_2)-(\mu_1-\mu_2)\over \sqrt{S_p^2({1\over n_1}+{1\over n_2})}}\sim T_{n_1+n_2-2} \]

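The argument starts from the sampling distributions of the two sample means. Under normality,

\[ \bar X_1\sim N\left(\mu_1,{\sigma_1^2\over n_1}\right),\qquad \bar X_2\sim N\left(\mu_2,{\sigma_2^2\over n_2}\right) \]

Since the two samples are drawn independently, \(\bar X_1\) is independent of \(\bar X_2\), so

\[ \bar X_1-\bar X_2\sim N\left(\mu_1-\mu_2,{\sigma^2_1\over n_1}+{\sigma^2_2\over n_2}\right) \]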
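Setting \(\sigma_1^2=\sigma_2^2=\sigma^2\) as in Case 1 and standardizing completes the argument. The sketch below relies on two standard facts for normal samples: \((n_j-1)S_j^2/\sigma^2\sim\chi^2_{n_j-1}\), and the sample mean is independent of the sample variance. Adding the two independent chi-square variables gives

\[ Z={(\bar X_1-\bar X_2)-(\mu_1-\mu_2)\over \sqrt{\sigma^2({1\over n_1}+{1\over n_2})}}\sim N(0,1),\qquad V={(n_1+n_2-2)S_p^2\over \sigma^2}\sim \chi^2_{n_1+n_2-2} \]

with \(Z\) independent of \(V\). The unknown \(\sigma\) cancels in the ratio:

\[ {Z\over \sqrt{V/(n_1+n_2-2)}}={(\bar X_1-\bar X_2)-(\mu_1-\mu_2)\over \sqrt{S_p^2({1\over n_1}+{1\over n_2})}}\sim T_{n_1+n_2-2} \]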

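  • In practice, the inference above remains reasonable provided that
  1. the populations are normal or, if they are not, both samples are large (\(n_1\geq 30\) and \(n_2\geq 30\)), so that \(\bar X_1\) and \(\bar X_2\) are approximately normal by the central limit theorem; and

  2. \(S_1^2\) and \(S_2^2\) are “close enough” for the equal-variance assumption to be plausible, for example by the rule of thumb: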

\[ {\max\{S_1^2,S_2^2\}\over \min\{S_1^2,S_2^2\}}<\begin{cases}5 & \text{if } n_1,n_2\approx 7\\3 & \text{if } n_1,n_2\approx 15\\2 & \text{if } n_1,n_2\approx 30\end{cases} \]

Example: During the 2003 season, Major League Baseball took steps to speed up the play of baseball games in order to maintain fan interest. For a sample of 32 games played during the summer of 2002 and a sample of 35 games played during the summer of 2003, the sample mean duration of the games was computed. For games in 2002, the sample mean was 172 minutes with a standard deviation of 10.1 minutes. For games in 2003, the sample mean was 166 minutes with a standard deviation of 12.2 minutes.

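Population 1: durations of games played during the summer of 2002, with mean \(\mu_1\). Sample: \(n_1=32\), \(\bar x_1=172\) minutes, \(s_1=10.1\) minutes.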

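Population 2: durations of games played during the summer of 2003, with mean \(\mu_2\). Sample: \(n_2=35\), \(\bar x_2=166\) minutes, \(s_2=12.2\) minutes.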

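Check assumptions: both samples are large (\(n_1=32\geq 30\), \(n_2=35\geq 30\)), and the ratio of sample variances is \(12.2^2/10.1^2=148.84/102.01\approx 1.46<2\), so the pooled procedure is reasonable.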
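Both conditions can be checked directly from the summary statistics of the baseball example (\(n_1=32\), \(s_1=10.1\); \(n_2=35\), \(s_2=12.2\)). This is a minimal sketch; the helper names `sample_sizes_ok` and `variance_ratio` are hypothetical, and using the cutoff 2 is a judgment call based on both sample sizes being near 30 in the rule of thumb.

```python
def sample_sizes_ok(n1, n2, minimum=30):
    """Large-sample condition: both sample sizes are at least `minimum`."""
    return n1 >= minimum and n2 >= minimum


def variance_ratio(s1, s2):
    """max{S1^2, S2^2} / min{S1^2, S2^2}, computed from sample standard deviations."""
    v1, v2 = s1 ** 2, s2 ** 2
    return max(v1, v2) / min(v1, v2)


# Summary statistics from the 2002 vs. 2003 game-duration example
n1, s1 = 32, 10.1  # 2002 games
n2, s2 = 35, 12.2  # 2003 games

ratio = variance_ratio(s1, s2)
# Both n1 and n2 are near 30, so compare against the rule-of-thumb cutoff of 2
print(sample_sizes_ok(n1, n2), round(ratio, 2), ratio < 2)  # True 1.46 True
```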

Confidence Interval for \(\mu_1-\mu_2\)

A \((1-\alpha)100\%\) confidence interval for \(\mu_1-\mu_2\) is

\[ (\bar X_1-\bar X_2)\pm t_{n_1+n_2-2,\alpha/2}\sqrt{S^2_p({1\over n_1}+{1\over n_2})} \]
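The interval translates directly into a few lines of Python. This is a minimal sketch that works from summary statistics rather than raw data; the function name `pooled_t_interval` is hypothetical, and the critical value \(t_{n_1+n_2-2,\alpha/2}\) is passed in by the caller (for example, read from a t table) so the code needs only the standard library. If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n1 + n2 - 2)` returns that value.

```python
from math import sqrt


def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """(1 - alpha)100% CI for mu1 - mu2, assuming equal population variances.

    t_crit is the upper alpha/2 critical value of the t distribution with
    n1 + n2 - 2 degrees of freedom, supplied by the caller.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance S_p^2
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # estimated standard error of Xbar1 - Xbar2
    diff = xbar1 - xbar2
    margin = t_crit * se
    return diff - margin, diff + margin
```

Passing the critical value explicitly keeps the function independent of any statistics library and makes the confidence level visible at the call site.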

Example: Compute a 95% confidence interval for the difference in average game time between 2002 and 2003.
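A worked version of the arithmetic, using the sample summaries from the example (\(n_1=32\), \(\bar x_1=172\), \(s_1=10.1\); \(n_2=35\), \(\bar x_2=166\), \(s_2=12.2\)) and the critical value \(t_{65,0.025}\approx 1.997\) from a t table, with intermediate results rounded:

\[ S_p^2={(32-1)(10.1)^2+(35-1)(12.2)^2\over 32+35-2}={3162.31+5060.56\over 65}\approx 126.51 \]

\[ \sqrt{S_p^2\left({1\over 32}+{1\over 35}\right)}\approx\sqrt{126.51\times 0.05982}\approx 2.751 \]

\[ (172-166)\pm 1.997\times 2.751=6\pm 5.49\quad\Rightarrow\quad (0.51,\ 11.49) \]

So, with 95% confidence, the mean game time in 2002 exceeded that in 2003 by roughly 0.5 to 11.5 minutes. Because the interval excludes 0, the data are consistent with games having become shorter in 2003.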