We can compare proportions from two populations using data from two independent samples.
Let \(X_1\sim Bin(n_1,p_1)\) and let \(\hat p_1={X_1\over n_1}\); then \(\hat p_1\dot \sim N(p_1,{p_1(1-p_1)\over n_1})\)
Let \(X_2\sim Bin(n_2,p_2)\) and let \(\hat p_2={X_2\over n_2}\); then \(\hat p_2\dot \sim N(p_2,{p_2(1-p_2)\over n_2})\)
Result: If \(n_1\hat p_1\geq 8\), \(n_1(1-\hat p_1)\geq 8\), \(n_2\hat p_2\geq8\), and \(n_2(1-\hat p_2)\geq8\), then
\[ {(\hat p_1-\hat p_2)-(p_1-p_2)\over \sqrt{{\hat p_1(1-\hat p_1)\over n_1}+{\hat p_2(1-\hat p_2)\over n_2}}}\dot \sim N(0,1) \]
\(\sqrt{{\hat p_1(1-\hat p_1)\over n_1}+{\hat p_2(1-\hat p_2)\over n_2}}\) is the standard error of \(\hat p_1-\hat p_2\).
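As a quick check of this result, a small simulation sketch (the sample sizes and true proportions below are hypothetical choices) shows that the standardized difference behaves like a standard normal:

set.seed(1)
n1 <- 200; p1 <- 0.4   # hypothetical values
n2 <- 150; p2 <- 0.3
phat1 <- rbinom(10000, n1, p1) / n1
phat2 <- rbinom(10000, n2, p2) / n2
se <- sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
z <- ((phat1 - phat2) - (p1 - p2)) / se
c(mean(z), sd(z))      # should be close to 0 and 1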
Example: Time magazine reported the result of a telephone poll of 800 adult Americans. The question posed to those surveyed was: "Should the federal tax on cigarettes be raised to pay for health care reform?"

| | Non-smokers | Smokers |
|---|---|---|
| Sample size | \(n_1 = 605\) | \(n_2 = 195\) |
| Said "yes" | 351 | 41 |
\(\hat p_1={351\over 605}=0.58\), \(\hat p_2={41\over 195}=0.21\)
A \((1-\alpha)100\%\) confidence interval for \(p_1-p_2\) is
\[ (\hat p_1-\hat p_2)\pm z_{\alpha/2}\sqrt{{\hat p_1(1-\hat p_1)\over n_1}+{\hat p_2(1-\hat p_2)\over n_2}} \]
Example: Find a 95% confidence interval for the difference in the proportions of non-smokers and smokers who say "yes".
\[ z_{0.025}=1.96 \quad (\text{in R: } qnorm(0.975))\\ (0.58-0.21)\pm 1.96\sqrt{{0.58(1-0.58)\over 605}+{0.21(1-0.21)\over 195}}=0.37\pm 0.07=(0.30,\,0.44) \]
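The same interval can be computed in R; this is a sketch using the poll counts above:

phat1 <- 351 / 605
phat2 <- 41 / 195
se <- sqrt(phat1 * (1 - phat1) / 605 + phat2 * (1 - phat2) / 195)
(phat1 - phat2) + c(-1, 1) * qnorm(0.975) * se   # about (0.30, 0.44)

This matches the two-sided interval reported by prop.test below.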
We can test hypotheses of the following forms about \(p_1-p_2\):
\[ \begin{matrix} H_0:p_1-p_2=\Delta_0 & H_0:p_1-p_2=\Delta_0 & H_0:p_1-p_2=\Delta_0\\ H_a:p_1-p_2>\Delta_0 & H_a:p_1-p_2<\Delta_0 & H_a:p_1-p_2\neq\Delta_0 \end{matrix} \]
The test statistic is
\[ Z_{H_0}^{P_1P_2}={(\hat p_1-\hat p_2)-\Delta_0\over \sqrt{{\hat p_1(1-\hat p_1)\over n_1}+{\hat p_2(1-\hat p_2)\over n_2}}} \]
When \(\Delta_0=0\), we instead use the pooled test statistic
\[ Z_{H_0}^{P_1P_2}={\hat p_1-\hat p_2\over \sqrt{\hat p(1-\hat p)\left({1\over n_1}+{1\over n_2}\right)}} \]
where
\[ \hat p={n_1\hat p_1+n_2\hat p_2\over n_1+n_2} \]
If \(H_0\) is true, \(Z_{H_0}^{P_1P_2}\dot \sim N(0,1)\)
We can find rejection rules and p-values using the same method as our previous Z test.
Example: Use \(\alpha = 0.05\) to test whether the proportion of non-smokers who say “yes” is greater than the proportion of smokers.
\[ H_0:p_1=p_2 \quad \text{vs.} \quad H_a:p_1>p_2 \]
\[ \hat p={351+41\over 605+195}=0.49\\ \]
\[ Z_{H_0}^P={(0.58-0.21)-0\over\sqrt{0.49\times(1-0.49)\times({1\over605 }+{1\over 195})}}=8.988 \]
Method 1. Rejection rule: reject \(H_0\) if \(Z_{H_0}^{P_1P_2}>z_{0.05}=1.645\). Since \(8.988>1.645\), we reject \(H_0\).
Method 2. p-value: \(P(Z>8.988)=1-pnorm(8.988)\approx0<\alpha\), so we reject \(H_0\).
We have sufficient evidence that the proportion of non-smokers who say "yes" is greater than that of smokers.
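For reference, the test statistic and p-value can be computed step by step in R (a sketch using the counts above):

phat <- (351 + 41) / (605 + 195)    # pooled estimate, 0.49
z <- (351/605 - 41/195) / sqrt(phat * (1 - phat) * (1/605 + 1/195))
z               # about 8.99
1 - pnorm(z)    # one-sided p-value, essentially 0

Note that \(z^2 \approx 80.75\), which matches the X-squared statistic reported by prop.test below: the chi-squared statistic with 1 df is the square of the z statistic.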
The prop.test function in R will perform the hypothesis test in the case of \(\Delta_0 = 0\).
x <- c(351,41)
n <- c(605,195)
prop.test(x, n, alternative = "greater", correct = F)
##
## 2-sample test for equality of proportions without continuity correction
##
## data: x out of n
## X-squared = 80.746, df = 1, p-value < 2.2e-16
## alternative hypothesis: greater
## 95 percent confidence interval:
## 0.3116585 1.0000000
## sample estimates:
## prop 1 prop 2
## 0.5801653 0.2102564
prop.test(x, n, alternative = "two.sided", correct = F)
##
## 2-sample test for equality of proportions without continuity correction
##
## data: x out of n
## X-squared = 80.746, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.3004992 0.4393185
## sample estimates:
## prop 1 prop 2
## 0.5801653 0.2102564
In practice, use the continuity correction (R's default, correct = TRUE).
When the samples are large, the results with and without the correction are very similar.
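For example, rerunning the test with the default settings applies the correction; with samples this large, the conclusion is unchanged:

prop.test(x, n, alternative = "greater")   # correct = TRUE is the default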
Previously, we’ve learned how to do a hypothesis test comparing two means.
Now, we would like to compare means across more than two populations.
Why don't we just do multiple pairwise comparisons? Because we would lose control of the type I error rate (\(\alpha\)): with many tests, we become very likely to identify a false significance by chance, as the quick calculation below illustrates.
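With \(k\) groups there are \(\binom{k}{2}\) pairwise comparisons. If each test uses \(\alpha = 0.05\) and we treat the tests as independent (a rough approximation), the chance of at least one false rejection grows quickly:

k <- 3:6
m <- choose(k, 2)               # number of pairwise comparisons
round(1 - (1 - 0.05)^m, 2)      # approx. familywise error: 0.14 0.26 0.40 0.54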
ANOVA uses a single hypothesis test to check whether the means across several populations are equal.
\[ H_0 : \mu_1 = \mu_2 = \cdots = \mu_k\\ H_a: \text{at least one mean is different} \]
Example: A psychologist predicts that students will learn most effectively with a constant background sound, as opposed to an unpredictable sound or no sound at all. She randomly divides fifteen students into three groups of five. All students study a passage of text for 30 minutes. Those in group 1 study with background sound at a constant volume. Those in group 2 study with noise that changes volume periodically and randomly. Those in group 3 study with no sound at all. After studying, all students take a 10-point multiple-choice test over the material. Their scores are in the table below.
| | Constant Sound | Random Sound | No Sound |
|---|---|---|---|
| | 7 | 5 | 2 |
| | 4 | 5 | 4 |
| | 6 | 3 | 6 |
| | 8 | 1 | 1 |
| | 9 | 4 | 2 |
| Sample Mean | 6.8 | 3.6 | 3.0 |
| Sample Std. Dev. | 1.9 | 1.7 | 2.0 |
Idea behind ANOVA:
If \(H_0\) is true (\(\mu_1 = \mu_2 = \mu_3\)), the variability between the sample means should be small (the sample means should be similar to one another).
If \(H_0\) is false, the variability between the sample means should be large.
The variability between the sample means is called the variability between groups.
In order to determine if the variability between groups is large or small, we need to compare it to the variability within each group.
This is why this method is called Analysis of Variance.
When \(H_0:\mu_1 = \mu_2 = \mu_3\) is false, the between-groups variability will be much greater than the within-groups variability.
When \(H_0:\mu_1 = \mu_2 = \mu_3\) is true, the between-groups variability will be about the same as the within-groups variability.
Suppose we have independent random samples from \(k\) populations:
\[ \begin {matrix} X_{11},X_{12},...,X_{1n_1}\\ X_{21},X_{22},...,X_{2n_2}\\ ...\\ X_{k1},X_{k2},...,X_{kn_k}\\ \end{matrix} \]
Notation
\(\bar X_i\) is the sample mean for the ith random sample.
\(S_i^2\) is the sample variance for the ith random sample.
\(\bar X\) is the overall sample mean (also called the grand mean).
\(N=n_1+...+n_k\) is the overall sample size.
Example (sound):
\[ N=5+5+5=15\\ \bar X=4.47\\ n_1=5,~\bar X_1=6.8,~S_1=1.9\\ n_2=5,~\bar X_2=3.6,~S_2=1.7\\ n_3=5,~\bar X_3=3.0,~S_3=2.0\\ \]
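These summary statistics can be reproduced in R; scores and group are names chosen here for illustration:

scores <- c(7, 4, 6, 8, 9,    # constant sound
            5, 5, 3, 1, 4,    # random sound
            2, 4, 6, 1, 2)    # no sound
group <- factor(rep(c("constant", "random", "none"), each = 5),
                levels = c("constant", "random", "none"))
tapply(scores, group, mean)   # 6.8 3.6 3.0
tapply(scores, group, sd)     # about 1.92 1.67 2.00
mean(scores)                  # grand mean, about 4.47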
Variability between groups:
\[ SSTr=\sum_{i=1}^kn_i(\bar X_i-\bar X)^2 \]
\[ MSTr={SSTr\over k-1} \]
SSTr is called the treatment sum of squares.
MSTr is called the mean squares for treatment.
Example: What are the SSTr and MSTr for the sound example?
\[ SSTr=5(6.8-4.47)^2+5(3.6-4.47)^2+5(3.0-4.47)^2=41.73\\ MSTr={41.73\over3-1}=20.87 \]
Variability within groups:
\[ SSE=\sum _{i=1}^k\sum_{j=1}^{n_i}(X_{ij}-\bar X_{i})^2=\sum _{i=1}^k(n_i-1)S_i^2 \]
\[ MSE={SSE\over N-k} \]
SSE is called the error sum of squares.
MSE is called the mean squares for the error.
Example: What are SSE and MSE for the sound example?
\[ SSE=4\times 1.9^2+4\times 1.7^2+4\times 2.0^2=42.0\\ MSE={42.0\over 15-3}=3.5 \]
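Using the scores and group vectors defined earlier, these quantities can be checked in R (a sketch):

ni   <- tapply(scores, group, length)        # 5 5 5
xbar <- tapply(scores, group, mean)
s2   <- tapply(scores, group, var)
SSTr <- sum(ni * (xbar - mean(scores))^2)    # about 41.73
MSTr <- SSTr / (3 - 1)                       # about 20.87
SSE  <- sum((ni - 1) * s2)                   # 42
MSE  <- SSE / (15 - 3)                       # 3.5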
The test statistic \(F_{H_0}\) is the ratio of the MSTr and MSE:
\[ F_{H_0}={MSTr\over MSE} \]
Small values of \(F_{H_0}\) are consistent with \(H_0\); large values are evidence for \(H_a\).
If \(H_0\) is true, \(F_{H_0}\) has an F distribution.
– The F distribution is skewed to the right.
– The F distribution is frequently used when the test statistic is a ratio.
– Its shape is determined by two degrees of freedom: \(\nu_1\) and \(\nu_2\).
The degrees of freedom:
– \(\nu_1 = DF_{SSTr} = k - 1\)
– \(\nu_2 = DF_{SSE} = N - k\)
Example: What is the test statistic and what are the degrees of freedom for the sound example?
\[ F={20.87\over 3.5}=5.96\\ v_1=2,v_2=12 \]
Conclusion
In R: qf(1-α, ν1, ν2) gives the critical value \(F_{\nu_1,\nu_2,\alpha}\).
In R: pf(x, ν1, ν2) gives \(P(F_{\nu_1,\nu_2} \leq x)\).
Example: What is the conclusion for the sound example? Use \(\alpha = 0.05\).
Method 1: Rejection rule
qf(0.95,2,12)
## [1] 3.885294
Since \(5.96 > 3.885\), we reject \(H_0\).
Method 2: p-value
pf(5.96,2,12)
## [1] 0.9840588
The p-value is \(P(F_{2,12}>5.96)=1-0.984=0.016<0.05\), so we reject \(H_0\): we have sufficient evidence that at least one mean test score differs across the sound conditions.
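The whole analysis can also be done in one step with the aov function, using the scores and group vectors defined earlier:

summary(aov(scores ~ group))
# The F statistic (about 5.96 on 2 and 12 df) and p-value (about 0.016)
# agree with the hand computations above.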