Recap for Law of Large Numbers

The Central Limit Theorem

  • In general, finding the exact distribution of \(\bar X\) is difficult.

  • However, for large sample sizes, the distribution of \(\bar X\) can be approximated by a normal distribution.

Central Limit Theorem (CLT): Let \(X_1,\dots,X_n\) be iid with finite mean \(\mu\) and finite variance \(\sigma^2\). Then, for large enough n,

  1. \(\bar X\) has approximately a normal distribution with mean \(\mu\) and variance \({\sigma^2\over n}\).

\[ \bar X\dot\sim N(\mu,{\sigma^2\over n}) \]

  2. \(T=X_1+...+X_n\) has approximately a normal distribution with mean \(n\mu\) and variance \(n\sigma^2\).

\[ T=X_1+...+X_n\dot\sim N(n\mu,n\sigma^2) \]

The approximation improves as n increases. In practice, it works well for \(n\geq 30\).

This is the key result we use for inference.
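A small simulation can make the theorem concrete. The sketch below is illustrative only: the Exp(1) population (so \(\mu=1\), \(\sigma^2=1\)), the seed, the sample size, and the number of replications are all assumptions chosen for the demonstration, not part of the notes.

```r
# Simulation sketch of the CLT (all settings below are illustrative assumptions).
set.seed(1)      # arbitrary seed for reproducibility
n    <- 50       # sample size
reps <- 10000    # number of simulated samples

# Skewed population: Exp(1), which has mu = 1 and sigma^2 = 1.
# Each row is one sample of size n; rowMeans() gives one sample mean per row.
samples <- matrix(rexp(n * reps, rate = 1), nrow = reps, ncol = n)
xbar    <- rowMeans(samples)

mean(xbar)       # should be close to mu = 1
var(xbar)        # should be close to sigma^2 / n = 0.02

# Compare a simulated probability with the CLT normal approximation.
sim_prob    <- mean(xbar <= 1.1)
normal_prob <- pnorm(1.1, mean = 1, sd = sqrt(1 / n))
c(simulated = sim_prob, normal = normal_prob)
```

Even though the population is far from normal, the simulated mean and variance of \(\bar X\) should land near \(\mu\) and \({\sigma^2\over n}\), and the two probabilities should roughly agree.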

Example: The level of impurity in a randomly selected batch of chemicals is a random variable with \(\mu\) = 4.0% and \(\sigma\) = 1.5%. For a random sample of 50 batches, find

  1. an approximation to the probability that the average level of impurity is between 3.5% and 3.8%.

Let \(X_1,\dots,X_{50}\) be the impurity levels (in %) of the 50 batches.

\(X_1,\dots,X_{50}\) are iid with mean \(\mu=4\) and variance \(\sigma^2=1.5^2\).

By the Central Limit Theorem, working in percentage units to match the R code: \(\bar X \dot\sim N(4,{1.5^2\over 50})\)

\[ P(3.5 < \bar X<3.8)\approx 0.1636782 \]

pnorm(3.8,4,sqrt(1.5^2/50))-pnorm(3.5,4,sqrt(1.5^2/50))
## [1] 0.1636782

  2. an approximation to the 95th percentile of the average impurity level.

qnorm(0.95,4,sqrt(1.5^2/50))
## [1] 4.348926
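The two R results can also be traced by hand through standardization. In the sketch below, \(\Phi\) denotes the standard normal CDF and \(z_{0.95}\approx 1.645\) its 95th percentile. This notation is introduced here and is not used elsewhere in these notes. All values are in percentage units and rounded.

\[ P(3.5<\bar X<3.8)\approx \Phi\left({3.8-4\over 1.5/\sqrt{50}}\right)-\Phi\left({3.5-4\over 1.5/\sqrt{50}}\right)\approx \Phi(-0.943)-\Phi(-2.357)\approx 0.1637 \]

\[ x_{0.95}\approx \mu+z_{0.95}{\sigma\over\sqrt{n}}\approx 4+1.645\times 0.2121\approx 4.349 \]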

Remark: \(\bar X\) is more concentrated around \(\mu\) than an individual \(X_i\), because \(Var(\bar X)={\sigma^2\over n}\) is smaller than \(Var(X_i)=\sigma^2\).

Normal Approximation to the Binomial

De Moivre-Laplace Theorem: If \(T\sim Bin (n,p)\) then, for large enough n,

\[ T\dot\sim N(np,np(1-p)) \]

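Recall \(T=\sum^n_{i=1}X_i\), where \(X_1,\dots,X_n\) are iid \(Ber(p)\)

with \(E(X_i)=p, Var(X_i)=p(1-p)\). The CLT for sums therefore applies with \(\mu=p\) and \(\sigma^2=p(1-p)\) when n is large, e.g. \(n\geq 30\).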

Continuity correction: Let \(T\sim Bin(n,p)\) and let \(Y\sim N(np,np(1-p))\).

For the binomial, the usual condition for applying the CLT is \(np\geq 5\) and \(n(1-p)\geq 5\), which adapts the \(n\geq 30\) guideline to this case.

The continuity correction adjusts for approximating a discrete random variable by a continuous one:

  • \(P(T\leq k)\approx P(Y\leq k+0.5)\),\(T\sim Bin(n,p),Y\sim N(np,np(1-p))\)

  • \(P(T= k)\approx P(k-0.5< Y < k+0.5)\)

  • \(P(T\geq k)\approx P(Y\geq k-0.5)\)
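The three rules above can be checked numerically against exact binomial probabilities. The following is a minimal sketch; the values n = 50, p = 0.4, and k = 22 are hypothetical and chosen only for illustration (they satisfy \(np\geq 5\) and \(n(1-p)\geq 5\)).

```r
# Hypothetical parameters, chosen only for illustration.
n <- 50; p <- 0.4; k <- 22
mu    <- n * p                    # mean of T
sigma <- sqrt(n * p * (1 - p))    # standard deviation of T

# P(T <= k) versus P(Y <= k + 0.5)
le_exact  <- pbinom(k, n, p)
le_approx <- pnorm(k + 0.5, mu, sigma)

# P(T = k) versus P(k - 0.5 < Y < k + 0.5)
eq_exact  <- dbinom(k, n, p)
eq_approx <- pnorm(k + 0.5, mu, sigma) - pnorm(k - 0.5, mu, sigma)

# P(T >= k) versus P(Y >= k - 0.5)
ge_exact  <- 1 - pbinom(k - 1, n, p)
ge_approx <- 1 - pnorm(k - 0.5, mu, sigma)

# Exact and approximate values side by side, one row per rule.
rbind(le = c(exact = le_exact, approx = le_approx),
      eq = c(exact = eq_exact, approx = eq_approx),
      ge = c(exact = ge_exact, approx = ge_approx))
```

Note that \(P(T\geq k)\) is computed exactly as `1 - pbinom(k - 1, n, p)`, since `pbinom` returns \(P(T\leq k)\).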

Example: Suppose that only 60% of all drivers wear seat belts at all times. In a random sample of 500 drivers, let X denote the number of drivers who wear seat belts at all times.

  1. State the exact distribution of X and use R to find \(P(270 \leq X \leq 320)\).

\(X\sim Bin(n,p)\) where \(n=500,p=0.6\)

# subtract pbinom(269,...) rather than pbinom(270,...) so that X = 270 is included
pbinom(320,500,0.6)-pbinom(269,500,0.6)
## [1] 0.9671151

  2. Approximate \(P(270 \leq X \leq 320)\) using the continuity correction.

By the Central Limit Theorem, \(X\dot \sim N(np,np(1-p))\), with \(np=500\times 0.6=300\) and \(np(1-p)=500\times 0.6\times (1-0.6)=120\). Let \(Y\sim N(300,120)\).

\[ P(270\leq X\leq 320)\approx P(Y\leq 320.5)-P(Y\leq 269.5) \]

In R

pnorm(320.5,500*0.6,sqrt(500*0.6*0.4))-pnorm(269.5,500*0.6,sqrt(500*0.6*0.4))
## [1] 0.9666716
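To see what the continuity correction contributes in this example, the sketch below recomputes the exact and corrected values from above and adds an uncorrected normal approximation for comparison. The variable names are introduced here for illustration.

```r
n <- 500; p <- 0.6
mu    <- n * p                    # 300
sigma <- sqrt(n * p * (1 - p))    # sqrt(120)

# Exact binomial probability P(270 <= X <= 320)
exact <- pbinom(320, n, p) - pbinom(269, n, p)

# Normal approximation with the continuity correction (as in the notes)
with_cc <- pnorm(320.5, mu, sigma) - pnorm(269.5, mu, sigma)

# Normal approximation without the correction, for comparison only
without_cc <- pnorm(320, mu, sigma) - pnorm(270, mu, sigma)

# Absolute errors relative to the exact value
c(with_cc = abs(exact - with_cc), without_cc = abs(exact - without_cc))
```

The corrected approximation should be noticeably closer to the exact value than the uncorrected one.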

When the sample size is small, the normal approximation may be inaccurate.

Use the approximation only with large samples.
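One way to see the effect of sample size is to measure the largest gap between the exact binomial CDF and its continuity-corrected normal approximation. The helper `max_cdf_error` and the values p = 0.1, n = 10, and n = 500 below are hypothetical, chosen only to illustrate the point.

```r
# Largest absolute difference between the exact binomial CDF and the
# continuity-corrected normal approximation, over k = 0, ..., n.
# (Hypothetical helper, not part of the notes.)
max_cdf_error <- function(n, p) {
  k      <- 0:n
  exact  <- pbinom(k, n, p)
  approx <- pnorm(k + 0.5, mean = n * p, sd = sqrt(n * p * (1 - p)))
  max(abs(exact - approx))
}

err_small <- max_cdf_error(10, 0.1)    # np = 1: rule of thumb violated
err_large <- max_cdf_error(500, 0.1)   # np = 50, n(1 - p) = 450: rule satisfied

c(n10 = err_small, n500 = err_large)
```

With n = 10 the condition \(np\geq 5\) fails and the error should be visibly larger than with n = 500.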