Confidence Intervals for Proportions

Example: Suppose we take a survey of 100 American adults and find that 89 of them favor expanding solar energy. How would we estimate the true proportion of American adults who favor expanding solar energy?

We estimate p by the sample proportion \(\hat p={89\over 100}=0.89\), or 89%. The sample proportion \(\hat p\) is a point estimator of p. However, \(\hat p\) is random: if we re-sampled, we would get a different value.
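To see that \(\hat p\) changes from sample to sample, the sketch below draws five new surveys of 100 adults. It assumes, purely for illustration, that the true proportion is p = 0.89; in practice p is unknown.

```r
# Illustration only: assume the true proportion is p = 0.89 (in practice p is unknown)
set.seed(1)            # make the simulation reproducible
n <- 100               # survey size
p_true <- 0.89         # hypothetical true proportion
# Each draw is the number of "favor" responses in a fresh survey of n adults
x <- rbinom(5, size = n, prob = p_true)
p_hat <- x / n         # five different sample proportions
p_hat
```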

Next, we'll construct an interval of plausible values for this proportion.

Let \(X \sim Bin(n,p)\) be the number of successes in n trials and let \(\hat p={X\over n}\). By the Central Limit Theorem, for large n,

\[ {\hat p -p \over \sqrt{\hat p(1-\hat p) \over n}}\ \dot\sim\ N(0,1) \]

Here \(p\) is the mean of \(\hat p\), and \(\sqrt{\hat p(1-\hat p) \over n}\) is the estimated standard error of \(\hat p\) (it estimates the true standard error \(\sqrt{p(1-p) \over n}\)).
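A simulation can check both claims: across many repeated samples, the average of \(\hat p\) is close to p, and its standard deviation is close to \(\sqrt{p(1-p)/n}\), which the estimated standard error approximates. As before, p = 0.89 is an assumed value chosen for illustration.

```r
set.seed(2)
n <- 100
p_true <- 0.89                       # assumed true proportion (illustration only)
reps <- 10000                        # number of simulated surveys
p_hat <- rbinom(reps, size = n, prob = p_true) / n
mean(p_hat)                          # close to p_true
sd(p_hat)                            # close to the theoretical standard error
sqrt(p_true * (1 - p_true) / n)      # theoretical SE sqrt(p(1-p)/n), about 0.0313
# Estimated SE from the observed survey (p-hat = 0.89, n = 100)
sqrt(0.89 * (1 - 0.89) / n)
```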

To build a \((1-\alpha)100\%\) confidence interval for p, start with the 95% case (\(\alpha = 0.05\), \(z_{0.025}\approx 1.96\)):

\[ P\left(-1.96\leq {\hat p -p \over \sqrt{\hat p(1-\hat p) \over n}}\leq 1.96\right)\approx 95\% \]

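Rearranging the inequalities to isolate \(p\), and replacing 1.96 by \(z_{\alpha/2}\) for a general confidence level, gives the approximate \((1-\alpha)100\%\) confidence interval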

\[ \left(\hat p-z_{\alpha/2} \sqrt{\hat p(1-\hat p) \over n},\; \hat p+z_{\alpha/2} \sqrt{\hat p(1-\hat p) \over n}\right) \]

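where \(z_{\alpha/2}\) is the value with area \(\alpha/2\) to its right under the standard normal curve, and \(z_{\alpha/2} \sqrt{\hat p(1-\hat p) \over n}\) is called the margin of error (ME).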
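The interval can be wrapped in a small R function (this normal-approximation interval is often called the Wald interval). This is a sketch; `prop_ci` is a hypothetical helper name, not part of base R.

```r
# Hypothetical helper (not from the notes): approximate (1 - alpha)100% CI for p
# x = number of successes, n = sample size, conf = confidence level
prop_ci <- function(x, n, conf = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf) / 2)            # z_{alpha/2}
  me <- z * sqrt(p_hat * (1 - p_hat) / n)   # margin of error
  c(lower = p_hat - me, upper = p_hat + me)
}

prop_ci(89, 100)   # solar-energy survey at the 95% level: about (0.829, 0.951)
```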

This confidence interval works well as long as \(n\hat p\geq 8\) (the number of observed successes) and \(n(1-\hat p)\geq 8\) (the number of observed failures). In general, the larger \(np\) and \(n(1-p)\) are, the better the normal approximation from the Central Limit Theorem works.
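The rule of thumb amounts to counting successes and failures. The helper below is a hypothetical sketch; the threshold of 8 follows these notes, and `min_count` can be changed if a different rule is preferred.

```r
# Hypothetical helper: checks the rule of thumb from the notes,
# n * p_hat >= 8 successes and n * (1 - p_hat) >= 8 failures
ci_conditions_ok <- function(x, n, min_count = 8) {
  successes <- x          # n * p_hat
  failures  <- n - x      # n * (1 - p_hat)
  successes >= min_count && failures >= min_count
}

ci_conditions_ok(89, 100)   # 89 successes, 11 failures: TRUE
ci_conditions_ok(97, 100)   # only 3 failures: FALSE
```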

Recall that \(z_{\alpha/2}\) can be found in R using qnorm(1 - alpha/2, 0, 1), or simply qnorm(1 - alpha/2), since the standard normal is the default.
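Because qnorm is vectorized, the critical values for several common confidence levels can be computed at once:

```r
alpha <- c(0.10, 0.05, 0.01)                 # 90%, 95%, 99% confidence
z <- qnorm(1 - alpha / 2, mean = 0, sd = 1)  # z_{alpha/2} for each level
round(z, 3)                                  # 1.645 1.960 2.576
```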

Example: Find the 95% confidence interval for the proportion of American adults who favor expanding solar energy.

For this example, the endpoints of the 95% confidence interval for p are:

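# upper limit: p-hat + z_{0.025} * SE, where z_{0.025} = qnorm(0.975)
(0.89+qnorm(0.975)*sqrt(0.89*(1-0.89)/100))
## [1] 0.9513253
# lower limit: qnorm(0.025) = -qnorm(0.975), so this is p-hat - z_{0.025} * SE
(0.89+qnorm(0.025)*sqrt(0.89*(1-0.89)/100))
## [1] 0.8286747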

Increasing the sample size narrows the interval. For example, with the same sample proportion of 0.89 but n = 1000:

(0.89+qnorm(0.975)*sqrt(0.89*(1-0.89)/1000))
## [1] 0.9093928
(0.89+qnorm(0.025)*sqrt(0.89*(1-0.89)/1000))
## [1] 0.8706072
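Both sample sizes can be handled in one vectorized computation. Assuming, as above, that \(\hat p\) stays at 0.89, the ratio of the two margins of error is \(\sqrt{1000/100}=\sqrt{10}\approx 3.16\), because the ME shrinks like \(1/\sqrt n\).

```r
p_hat <- 0.89
n <- c(100, 1000)
me <- qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)   # margin of error for each n
cbind(n, lower = p_hat - me, upper = p_hat + me, length = 2 * me)
me[1] / me[2]    # sqrt(10), about 3.16
```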

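How do we interpret this confidence interval? We are 95% confident that the true proportion of American adults who favor expanding solar energy lies between about 0.829 and 0.951. The confidence refers to the method: if we repeated the survey many times and built an interval the same way each time, about 95% of those intervals would contain p. Any single interval either contains p or it does not.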
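The repeated-sampling interpretation can be checked by simulation: generate many surveys, build the 95% interval from each, and record how often it captures p. This sketch assumes a true value p = 0.89 with n = 100. For this normal-approximation interval the actual coverage can fall somewhat below the nominal 95%; with these values it comes out near 92%.

```r
set.seed(3)
p_true <- 0.89      # assumed true proportion (illustration only)
n <- 100
reps <- 10000       # number of simulated surveys
z <- qnorm(0.975)
p_hat <- rbinom(reps, size = n, prob = p_true) / n
me <- z * sqrt(p_hat * (1 - p_hat) / n)
# TRUE when the interval from that simulated survey contains p_true
covered <- (p_hat - me <= p_true) & (p_true <= p_hat + me)
mean(covered)       # share of intervals that capture p_true, roughly 0.92 here
```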

Precision

The precision of the estimate of p is quantified by the margin of error or, equivalently, by the length of the confidence interval, 2ME.

For proportions:

\[ MOE=z_{\alpha/2}\sqrt{\hat p (1-\hat p)\over n} \]

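The MOE depends on \(\alpha\): as \(\alpha\) increases (a lower confidence level), \(z_{\alpha/2}\) decreases, so the margin of error decreases. The MOE also depends on \(n\): as \(n\) increases, the margin of error decreases, in proportion to \(1/\sqrt n\).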
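Both dependencies show up in a small grid of margins of error for \(\hat p = 0.89\). This is a sketch; `moe` is a hypothetical helper name, and the grid of confidence levels and sample sizes is chosen only for illustration.

```r
# Hypothetical helper: margin of error of the normal-approximation interval
moe <- function(p_hat, n, conf) {
  qnorm(1 - (1 - conf) / 2) * sqrt(p_hat * (1 - p_hat) / n)
}
conf_levels <- c(0.90, 0.95, 0.99)     # higher confidence = smaller alpha
sizes <- c(100, 400, 1000)
# Rows: confidence level; columns: sample size
tab <- outer(conf_levels, sizes, function(cl, n) moe(0.89, n, cl))
dimnames(tab) <- list(conf = conf_levels, n = sizes)
round(tab, 4)   # MOE grows down each column and shrinks along each row
```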

How can we make our confidence interval shorter?

  1. Reduce the confidence level. A lower confidence level uses a smaller critical value \(z_{\alpha/2}\):
qnorm(0.995)  # z for 99% confidence
## [1] 2.575829
qnorm(0.975)  # z for 95% confidence
## [1] 1.959964
qnorm(0.95)   # z for 90% confidence
## [1] 1.644854
  2. Increase the sample size.

Let L be the desired length of the confidence interval.

\(L =2z_{\alpha/2}\sqrt{\hat p(1-\hat p)\over n} \;\Rightarrow\; n={4 z^2_{\alpha/2}\hat p(1-\hat p)\over L^2}\)
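Written out step by step, solving for n means isolating the square root and then squaring both sides:

\[ \begin{aligned} L &= 2z_{\alpha/2}\sqrt{\hat p(1-\hat p)\over n} \\ {L\over 2z_{\alpha/2}} &= \sqrt{\hat p(1-\hat p)\over n} \\ {L^2\over 4z^2_{\alpha/2}} &= {\hat p(1-\hat p)\over n} \\ n &= {4z^2_{\alpha/2}\hat p(1-\hat p)\over L^2} \end{aligned} \]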

Choose n so that

\[ n\geq {4z^2_{\alpha/2}\hat p_{pr}(1-\hat p_{pr})\over L^2} \]

(Always round n up to the next whole number; rounding down would give an interval slightly longer than L.)

  • \(\hat p_{pr}\) is a preliminary estimate of p.

  • If no preliminary estimate is available, use \(\hat p_{pr}=0.5\). This is the worst case: it maximizes \(\hat p(1-\hat p)\) and therefore gives the largest required n.

Example: Find the sample size required for a 95% confidence interval of length 0.06 for the proportion of American adults who favor expanding solar energy.

\[ L=0.06,\quad z_{\alpha/2}=1.96,\quad \hat p_{pr}=0.89 \]
\[ n\geq {4\times(1.96)^2\times 0.89\times (1-0.89)\over0.06^2}=417.88 \;\Rightarrow\; n=418 \]

In the worst case, with no preliminary estimate, use \(\hat p_{pr}=0.5\):

\[ n\geq {4\times(1.96)^2\times 0.5\times (1-0.5)\over 0.06^2}=1067.1 \;\Rightarrow\; n=1068 \]
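The two sample-size calculations above can be reproduced with a small R function. This is a sketch: `sample_size` is a hypothetical helper name, and it uses the exact quantile qnorm(0.975) rather than the rounded 1.96. That gives 417.87 and 1067.07 before rounding, so the rounded-up answers are the same.

```r
# Hypothetical helper: smallest n giving a CI of length at most L
# (normal-approximation interval; p_pr = 0.5 is the worst case)
sample_size <- function(L, conf = 0.95, p_pr = 0.5) {
  z <- qnorm(1 - (1 - conf) / 2)               # z_{alpha/2}
  ceiling(4 * z^2 * p_pr * (1 - p_pr) / L^2)   # always round up
}

sample_size(0.06, conf = 0.95, p_pr = 0.89)   # 418
sample_size(0.06, conf = 0.95)                # 1068 (worst case, p_pr = 0.5)
```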