In this section, we will discuss comparing means for two populations.
Let \(X_{11},\ldots,X_{1n_1}\) be a simple random sample from a population with mean \(\mu_1\) and variance \(\sigma_1^2\).
Let \(X_{21},\ldots,X_{2n_2}\) be a simple random sample from a population with mean \(\mu_2\) and variance \(\sigma_2^2\).
We will assume the two samples are independent.
Our inference will be focused on \(\mu_1-\mu_2\).
Assume \(\sigma_1^2=\sigma_2^2=\sigma^2\).
A pooled estimator of the common variance \(\sigma^2\) is
\[ \begin{aligned} S^2_p&={(n_1-1)S^2_1+(n_2-1)S_2^2\over n_1+n_2-2}\\ &={\sum_{i=1}^{n_1}(X_{1i}-\bar X_1)^2+\sum_{i=1}^{n_2}(X_{2i}-\bar X_2)^2\over n_1 + n_2 -2} \end{aligned} \]
a weighted average of \(S_1^2\) and \(S_2^2\), with weights proportional to the degrees of freedom \(n_1-1\) and \(n_2-1\).
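With \(S_p^2\) in place of \(\sigma^2\), the standardized difference of the sample means has a \(t\) distribution with \(n_1+n_2-2\) degrees of freedom:
\[ {(\bar X_1-\bar X_2)-(\mu_1-\mu_2)\over \sqrt{S_p^2({1\over n_1}+{1\over n_2})}}\sim T_{n_1+n_2-2} \]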
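The pooled variance and the \(T\) statistic can be computed directly from summary statistics. The following is a minimal sketch, not a reference implementation: the function names, the parameter `delta0` (a hypothesized value of \(\mu_1-\mu_2\)), and the numbers in the usage lines are all illustrative assumptions that do not come from the notes.

```python
import math


def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of the common variance from two sample standard deviations."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def pooled_t_statistic(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """T statistic for mu1 - mu2 = delta0 under the equal-variance assumption."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return ((xbar1 - xbar2) - delta0) / standard_error


# Hypothetical summary statistics, chosen only to make the arithmetic easy to follow.
sp2 = pooled_variance(s1=2.0, n1=5, s2=4.0, n2=9)
t_stat = pooled_t_statistic(xbar1=10.0, s1=2.0, n1=5, xbar2=7.0, s2=4.0, n2=9)
print(f"pooled variance = {sp2:.4f}, T = {t_stat:.4f}, df = {5 + 9 - 2}")
```

Because \(S_p^2\) is a weighted average, the value returned by `pooled_variance` always lies between the two sample variances, which is a quick sanity check on any hand calculation.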
If both populations are normal, the sample means are normally distributed:
\[ \bar X_1\sim N(\mu_1,{\sigma_1^2\over n_1}),\quad \bar X_2\sim N(\mu_2,{\sigma_2^2\over n_2}) \]
Since \(\bar X_1\) is independent of \(\bar X_2\),
\[ \bar X_1-\bar X_2\sim N(\mu_1-\mu_2,{\sigma^2_1\over n_1}+{\sigma^2_2\over n_2}) \]
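As a short worked step, here is how this distribution connects to the \(T\) statistic under the equal-variance assumption \(\sigma_1^2=\sigma_2^2=\sigma^2\). The symbol \(Z\) is introduced here only for illustration.

\[ \begin{aligned} \bar X_1-\bar X_2&\sim N\left(\mu_1-\mu_2,\sigma^2\left({1\over n_1}+{1\over n_2}\right)\right)\\ Z&={(\bar X_1-\bar X_2)-(\mu_1-\mu_2)\over \sqrt{\sigma^2({1\over n_1}+{1\over n_2})}}\sim N(0,1) \end{aligned} \]

Since \(\sigma^2\) is unknown, it is replaced by the pooled estimator \(S_p^2\), and the resulting statistic follows a \(t\) distribution with \(n_1+n_2-2\) degrees of freedom rather than a standard normal distribution.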
If the populations are not normal but \(n_1\geq 30\) and \(n_2\geq 30\), the same results hold approximately by the central limit theorem.
The equal-variance assumption is reasonable when \(S_1^2\) and \(S_2^2\) are “close enough”:
\[ {\max\{S_1^2,S_2^2\}\over \min\{S_1^2,S_2^2\}}<\begin{cases}5 & \text{if } n_1,n_2\approx 7\\ 3 & \text{if } n_1,n_2\approx 15\\ 2 & \text{if } n_1,n_2\approx 30\end{cases} \]
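The rule of thumb above can be sketched as a small check. The rule only gives cutoffs near sample sizes 7, 15, and 30, so this sketch makes an assumption the notes do not state: it uses the smaller of the two sample sizes and picks the cutoff whose reference size is nearest to it. The function name and the example variances are hypothetical.

```python
def variances_close_enough(var1, var2, n1, n2):
    """Return (ratio, cutoff, ok) for the max/min sample-variance rule of thumb."""
    ratio = max(var1, var2) / min(var1, var2)
    # Reference sample sizes and cutoffs taken from the rule of thumb.
    cutoffs = {7: 5.0, 15: 3.0, 30: 2.0}
    # Assumption: interpret "n is approximately k" as "k is the nearest reference size".
    n_small = min(n1, n2)
    nearest = min(cutoffs, key=lambda k: abs(k - n_small))
    cutoff = cutoffs[nearest]
    return ratio, cutoff, ratio < cutoff


# Hypothetical sample variances 4.0 and 9.0 (ratio 2.25) at two different sample sizes.
print(variances_close_enough(4.0, 9.0, 15, 15))
print(variances_close_enough(4.0, 9.0, 30, 30))
```

The same ratio can pass at one sample size and fail at another, because the cutoff tightens as the samples grow.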
Example: During the 2003 season, Major League Baseball took steps to speed up the play of baseball games in order to maintain fan interest. For a sample of 32 games played during the summer of 2002 and a sample of 35 games played during the summer of 2003, the sample mean duration of the games was computed. For games in 2002, the sample mean was 172 minutes with a standard deviation of 10.1 minutes. For games in 2003, the sample mean was 166 minutes with a standard deviation of 12.2 minutes.
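Population 1: games played during the summer of 2002, with \(n_1=32\), \(\bar x_1=172\), \(s_1=10.1\).
Population 2: games played during the summer of 2003, with \(n_2=35\), \(\bar x_2=166\), \(s_2=12.2\).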
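Check assumptions: both samples are large (\(n_1=32\geq 30\) and \(n_2=35\geq 30\)), and \(\max\{s_1^2,s_2^2\}/\min\{s_1^2,s_2^2\}=12.2^2/10.1^2\approx 1.46<2\), so pooling the variances is reasonable.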
Confidence Interval for \(\mu_1-\mu_2\)
A \((1-\alpha)100\%\) confidence interval for \(\mu_1-\mu_2\) is
\[ (\bar X_1-\bar X_2)\pm t_{n_1+n_2-2,\alpha/2}\sqrt{S^2_p({1\over n_1}+{1\over n_2})} \]
Example: Compute a 95% confidence interval for the difference in average game time between 2002 and 2003.
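One way to carry out this computation is sketched below, using the summary statistics from the baseball example. To stay self-contained, the sketch finds the critical value \(t_{n_1+n_2-2,\alpha/2}\) numerically (Simpson's rule for the \(t\) CDF plus bisection) rather than reading it from a table. All function names are hypothetical, and the numerical approach is a choice made for this illustration, not something taken from the notes.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_upper_quantile(tail_prob, df):
    """Value t with P(T > t) = tail_prob, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail_prob:
            lo = mid  # upper tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2


def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, conf=0.95):
    """Pooled-variance confidence interval for mu1 - mu2."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    margin = t_upper_quantile((1 - conf) / 2, df) * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


# Summary statistics from the example: 2002 games first, 2003 games second.
lower, upper = pooled_t_interval(172, 10.1, 32, 166, 12.2, 35)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

Here \(S_p^2\approx 126.5\) with 65 degrees of freedom, and the interval comes out to roughly 0.5 to 11.5 minutes. It lies entirely above zero, which is consistent with shorter average game times in 2003. If SciPy is available, `scipy.stats.t.ppf(0.975, 65)` could replace the hand-rolled quantile function.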