Confidence Interval for the Mean

Let \(X_1,\ldots,X_n\) be a simple random sample from a population with mean \(\mu\) and variance \(\sigma ^2\).

\(\bar X\) is the sample mean and \(S^2\) is the sample variance.

Z Confidence Interval for the Mean

If the population variance \(\sigma^2\) is known, then by the central limit theorem (CLT), approximately

\[ {\bar X-\mu\over \sigma/\sqrt{n}}\dot\sim N(0,1) \]

This is equivalent to

\[ \bar X \dot\sim N(\mu,{\sigma^2\over n}) \]

\[ P(-z_{\alpha/2}\leq {\bar X-\mu\over \sigma/\sqrt{n}}\leq z_{\alpha/2})=1-\alpha\\ P(-z_{\alpha/2}\cdot \sigma/\sqrt{n}\leq \bar X-\mu\leq z_{\alpha/2}\cdot \sigma/\sqrt{n})=1-\alpha\\ P( \bar X-z_{\alpha/2}\cdot \sigma/\sqrt{n}\leq\mu\leq \bar X+z_{\alpha/2}\cdot \sigma/\sqrt{n})=1-\alpha \]

A \((1-\alpha)100\%\) confidence interval for \(\mu\) is

\[ \begin{pmatrix}\bar X-z_{\alpha/2}{\sigma\over \sqrt{n}},\bar X+z_{\alpha/2}{\sigma\over \sqrt{n}}\end{pmatrix} \]

Here \(z_{\alpha/2}\) is the upper \(\alpha/2\) critical value of \(N(0,1)\), and \(z_{\alpha/2}{\sigma\over \sqrt{n}}\) is the margin of error.

This confidence interval is valid only if both of the following hold:

  1. \(n\geq 30\) (so that the central limit theorem applies) or the population has a normal distribution

  2. \(\sigma^2\) is known

In practice \(\sigma^2\) is rarely known, so this interval is not commonly used.
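As a rough illustration of the Z interval, the sketch below plugs numbers into the formula. The values of `xbar`, `sigma`, `n`, and `alpha` are made up for this example and do not come from any data set in these notes.

```r
# Hypothetical inputs, chosen only to illustrate the formula
xbar  <- 50    # sample mean (made-up value)
sigma <- 8     # population standard deviation, assumed known
n     <- 36    # sample size
alpha <- 0.05  # gives a 95% confidence interval

z_crit <- qnorm(1 - alpha / 2)      # upper alpha/2 critical value of N(0, 1)
moe    <- z_crit * sigma / sqrt(n)  # margin of error

z_ci <- c(lower = xbar - moe, upper = xbar + moe)
z_ci
```

Note that `qnorm(1 - alpha / 2)` is used because \(z_{\alpha/2}\) leaves area \(\alpha/2\) in the upper tail.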

T Confidence Interval for the Mean

If we make the additional assumption that \(X_1,\ldots,X_n\) are iid \(N(\mu,\sigma^2)\), then

\[ {\bar X-\mu\over S/\sqrt{n}}\sim T_{n-1} \]

Here \(\sigma\) is replaced by the sample standard deviation \(S\), and the resulting statistic has a t distribution with \(n-1\) degrees of freedom.

A \((1-\alpha)100\%\) confidence interval for \(\mu\) is

\[ \begin{pmatrix}\bar X-t_{n-1,\alpha/2}{S\over \sqrt{n}},\bar X+t_{n-1,\alpha/2}{S\over \sqrt{n}}\end{pmatrix} \]

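Compared with the Z interval, \(z_{\alpha/2}\) is replaced by the critical value \(t_{n-1,\alpha/2}\). Since \(z_{\alpha/2}<t_{n-1,\alpha/2}\), the T interval is wider for the same standard error.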

Notes:

  1. If the population is normal, this confidence interval holds for all \(n\).

  2. If the population is not normal, this confidence interval still works well if \(n\geq 30\).

  3. \(t_{n-1,\alpha/2}\) can be found in R using qt(1 - alpha/2, n - 1).
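To see how the t critical value compares with the z critical value, a small sketch like the following can help. The confidence level and the sample sizes in `n_values` are arbitrary choices made for illustration.

```r
alpha    <- 0.10                 # 90% confidence, chosen for illustration
n_values <- c(5, 10, 30, 100)    # arbitrary sample sizes

z_crit <- qnorm(1 - alpha / 2)                  # does not depend on n
t_crit <- qt(1 - alpha / 2, df = n_values - 1)  # one critical value per sample size

# t critical values exceed z and shrink toward it as n grows
data.frame(n = n_values, t_crit = t_crit, z_crit = z_crit)
```

For small \(n\) the difference is noticeable, which is why the T interval is wider than the Z interval would be.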

Example: The following data include temperatures for a random sample of 10 time points in New York in September 1973. Find a 90% confidence interval for the average temperature.

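# Temperatures at the 10 sampled time points
temp <- c(81, 72, 67, 88, 93, 96, 84, 82, 82, 67)
# Normal Q-Q plot with a reference line to check normality
qqnorm(temp)
qqline(temp)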

The Q-Q plot looks consistent with normality, so the T interval is appropriate.

\[ n=10,\ \bar X=81.20,\ S=10.01\\ t_{9,0.05}=qt(0.95,9)=1.833\\ MOE=1.833\times{10.01\over\sqrt{10}}=5.80\\ 81.20\pm5.80=[75.40,87.00] \]

Interpretation:

We are 90% confident that the average temperature in New York in September 1973 was between 75.40 and 87.00 degrees.
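The hand calculation for this example can be reproduced step by step in R. This is only a sketch of the arithmetic; the variable names (`xbar`, `s`, `t_crit`, `moe`, `t_ci`) are chosen here for readability.

```r
temp <- c(81, 72, 67, 88, 93, 96, 84, 82, 82, 67)

n     <- length(temp)  # sample size
xbar  <- mean(temp)    # sample mean
s     <- sd(temp)      # sample standard deviation
alpha <- 0.10          # 90% confidence level

t_crit <- qt(1 - alpha / 2, df = n - 1)  # t critical value with n - 1 df
moe    <- t_crit * s / sqrt(n)           # margin of error

t_ci <- c(lower = xbar - moe, upper = xbar + moe)
round(t_ci, 2)
```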

We can obtain T confidence intervals in R:

confint(lm(temp ~ 1),level =0.9)
##                  5 %     95 %
## (Intercept) 75.39804 87.00196
# lm stands for linear model; level is the confidence level (90% in this case)

Precision

For means, the margin of error is

\[ MOE=z_{\alpha/2}{\sigma\over \sqrt{n}}\quad\text{or}\quad MOE=t_{n-1,\alpha/2} {S\over \sqrt{n}} \]

Let \(L\) be the desired length of the confidence interval. The length is twice the margin of error, so we can solve for \(n\):

\(L=2\times z_{\alpha/2}\times {\sigma\over \sqrt{n}}\to n=(2\times z_{\alpha/2}\times {\sigma\over L})^2\)

To make the confidence interval shorter, we can increase the sample size.

Choose n so that

\[ n\geq(2z_{\alpha/2}{S_{pr}\over L})^2 \]

  • \(S_{pr}\) is a preliminary estimate of \(\sigma\).

Note: Often a preliminary estimate of \(\sigma\) is not available. In that case:

  1. Guess a likely range of values that the variable will take.

  2. Use \(S_{pr}={range\over 3.5}\) if the distribution is roughly uniform, or \(S_{pr}={range\over 4}\) if it is roughly normal.

Example: What sample size should we use to obtain a 90% confidence interval of length 5 for the average temperature in New York in 1973?

\[ L=5,\ S_{pr}=10.01,\ z_{\alpha/2}=qnorm(0.95)=1.645\\ n\geq\left[2\times 1.645\times{10.01\over 5}\right]^2=43.38,\quad\text{so } n=44 \]

Always round up.
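The sample size calculation can be sketched in R as follows, using the preliminary estimate \(S_{pr}=10.01\) from the temperature sample. The variable names (`s_pr`, `n_raw`, `n_needed`) are chosen for this illustration.

```r
L     <- 5      # desired length of the confidence interval
s_pr  <- 10.01  # preliminary estimate of sigma from the earlier sample
alpha <- 0.10   # 90% confidence level

z_crit <- qnorm(1 - alpha / 2)

n_raw    <- (2 * z_crit * s_pr / L)^2  # exact solution, usually not an integer
n_needed <- ceiling(n_raw)             # round up so the interval is no longer than L
n_needed
```

Using `ceiling()` rather than `round()` guarantees that the interval is no longer than the target length.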

Confidence Intervals for the Variance

Example: Suppose we want a confidence interval for the variance in temperature in New York in 1973.

Here \(\sigma^2\) is the true variance and \(S^2\) is the sample variance used to estimate \(\sigma^2\).

It turns out that if the population has a normal distribution, then

\[ {(n-1)S^2\over \sigma^2}\sim\chi^2_{n-1} \]

The Chi-Square Distribution

If \(Z_1,\ldots,Z_n\) are iid \(N(0,1)\), then \(Z_1^2+\cdots+Z_n^2\sim \chi_n^2\).
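The fact that a sum of \(n\) squared independent standard normals follows a \(\chi^2_n\) distribution can be checked informally by simulation. The sketch below is an illustration only; the seed, the number of terms `k`, and the number of replications `reps` are arbitrary choices.

```r
set.seed(1)     # arbitrary seed, for reproducibility
k    <- 4       # number of standard normals summed (degrees of freedom)
reps <- 100000  # number of simulated sums

# Each row holds k independent N(0, 1) draws; sum their squares row by row
z      <- matrix(rnorm(reps * k), nrow = reps, ncol = k)
sum_sq <- rowSums(z^2)

# Compare the simulated mean and 95th percentile with the chi-square values
c(simulated_mean = mean(sum_sq), theoretical_mean = k)
c(simulated_q95 = unname(quantile(sum_sq, 0.95)),
  theoretical_q95 = qchisq(0.95, df = k))
```

The simulated values should be close to, but not exactly equal to, the theoretical ones.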

  • \(X\sim \chi_v^2\) means that \(X\) has the \(\chi^2\) distribution with \(v\) degrees of freedom

  • Plot of \(\chi_v^2\) densities:

    • Positive support

    • Non-symmetric

# Plot chi-square densities on [0, 20] for 1, 2, 4, and 8 degrees of freedom
library(ggplot2)

ggplot(data.frame(x = c(0, 20)), aes(x = x)) +
     stat_function(fun = dchisq, args = list(df = 1))

ggplot(data.frame(x = c(0, 20)), aes(x = x)) +
     stat_function(fun = dchisq, args = list(df = 2))

ggplot(data.frame(x = c(0, 20)), aes(x = x)) +
     stat_function(fun = dchisq, args = list(df = 4))

ggplot(data.frame(x = c(0, 20)), aes(x = x)) +
     stat_function(fun = dchisq, args = list(df = 8))

Confidence Interval

A \((1-\alpha)100\%\) confidence interval for \(\sigma^2\) is

\[ {(n-1)S^2\over \chi_{n-1,\alpha/2}^2}<\sigma^2<{(n-1)S^2\over \chi_{n-1,1-\alpha/2}^2} \]

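Percentiles for the \(\chi^2\) distribution can be found in R:

qchisq(p, v) gives \(\chi^2_{v,1-p}\), where \(\chi^2_{v,a}\) denotes the value with upper-tail area \(a\). For example, \(\chi^2_{n-1,\alpha/2}\) is qchisq(1 - alpha/2, n - 1).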
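As an illustration, the interval for \(\sigma^2\) can be computed for the temperature sample used earlier. The notes do not state a confidence level for this example, so the 90% level below is an assumption, and the variable names are chosen for this sketch.

```r
temp <- c(81, 72, 67, 88, 93, 96, 84, 82, 82, 67)

n     <- length(temp)
s2    <- var(temp)  # sample variance S^2
alpha <- 0.10       # assumed 90% confidence level

# The upper-tail critical value chi^2_{n-1, alpha/2} is the 1 - alpha/2 quantile
lower <- (n - 1) * s2 / qchisq(1 - alpha / 2, df = n - 1)
# The critical value chi^2_{n-1, 1 - alpha/2} is the alpha/2 quantile
upper <- (n - 1) * s2 / qchisq(alpha / 2, df = n - 1)

var_ci <- c(lower = lower, upper = upper)
var_ci
```

Because the \(\chi^2\) distribution is not symmetric, the interval is not centered at \(S^2\). As with the formula itself, this relies on the population being normal.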