Lecture

Recap of the Law of Large Numbers

The Central Limit Theorem

In general, finding the exact distribution of $\bar X$ is difficult.
However, for large sample sizes, the distribution of $\bar X$ can be approximated by a normal distribution.

Central Limit Theorem (CLT): Let $X_1,\dots,X_n$ be iid with finite mean $\mu$ and finite variance $\sigma^2$. Then for large enough $n$,

$\bar X$ has approximately a normal distribution with mean $\mu$ and variance ${\sigma^2\over n}$

\[ \bar X\dot\sim N(\mu,{\sigma^2\over n}) \]

$T=X_1+...+X_n$ has approximately a normal distribution with mean $n\mu$ and variance $n\sigma^2$.

\[ T=X_1+...+X_n\dot\sim N(n\mu,n\sigma^2) \]

The approximation improves as $n$ increases. In practice, it works well for $n\geq 30$.

This is the key result we use for inference.
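A quick simulation can make the theorem concrete. The sketch below is only an illustration: the Exponential(1) population (so $\mu=1$, $\sigma^2=1$), the seed, the cutoff 1.2, and the names `xbar`, `reps`, `emp`, and `clt` are assumptions chosen here, not part of the lecture.

```r
# Simulation sketch: sample means from a skewed population.
# Assumed population: Exponential(rate = 1), so mu = 1 and sigma^2 = 1.
set.seed(1)                        # arbitrary seed, for reproducibility only
n    <- 30                         # sample size
reps <- 10000                      # number of simulated samples
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbar)                         # should be close to mu = 1
var(xbar)                          # should be close to sigma^2 / n = 1/30

# Compare an empirical probability with the CLT approximation
emp <- mean(xbar <= 1.2)
clt <- pnorm(1.2, mean = 1, sd = sqrt(1 / n))
c(empirical = emp, clt = clt)
```

Even though the population is far from normal, the two probabilities should roughly agree at $n=30$.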

Example: The level of impurity in a randomly selected batch of chemicals is a random variable with $\mu$ = 4.0% and $\sigma$ = 1.5%. For a random sample of 50 batches, find

an approximation to the probability that the average level of impurity is between 3.5% and 3.8%.

Let $X_1,\dots,X_{50}$ be the impurity levels of the 50 batches.

$X_1,\dots,X_{50}$ are iid with mean $\mu=4\%$ and variance $\sigma^2=(1.5\%)^2$.

By the Central Limit Theorem, working in percentage points: $\bar X \dot\sim N(4,{1.5^2\over 50})$

\[ P(3.5\% < \bar X<3.8\%)\approx 0.1636782 \]

pnorm(3.8,4,sqrt(1.5^2/50))-pnorm(3.5,4,sqrt(1.5^2/50))

## [1] 0.1636782
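The same probability can be reached by standardizing, as one would with a normal table. Here $\Phi$ denotes the standard normal cdf (notation introduced for this check), the standard deviation of $\bar X$ is $1.5/\sqrt{50}\approx 0.2121$, and all values are rounded:

\[ P(3.5<\bar X<3.8)\approx \Phi\left({3.8-4\over 0.2121}\right)-\Phi\left({3.5-4\over 0.2121}\right)\approx \Phi(-0.943)-\Phi(-2.357)\approx 0.1637 \]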

an approximation to the 95th percentile of the average impurity level.

qnorm(0.95,4,sqrt(1.5^2/50))

## [1] 4.348926
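The percentile can also be written as $\mu+z_{0.95}\,\sigma/\sqrt n$, where $z_{0.95}$ is the 95th percentile of the standard normal. A minimal check of that formula, with variable names chosen here for illustration:

```r
mu    <- 4                           # mean, in percentage points
sigma <- 1.5                         # standard deviation, in percentage points
n     <- 50                          # number of batches
z95   <- qnorm(0.95)                 # standard normal 95th percentile, about 1.645
p95   <- mu + z95 * sigma / sqrt(n)  # agrees with qnorm(0.95, 4, sqrt(1.5^2/50))
p95
```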

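Remark: $\bar X$ is more concentrated around $\mu$ than an individual $X_i$, since its standard deviation is $\sigma/\sqrt n$ rather than $\sigma$ (here $1.5/\sqrt{50}\approx 0.21$ versus $1.5$).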

Normal Approximation to the Binomial

De Moivre-Laplace Theorem: If $T\sim Bin(n,p)$ then, for large enough $n$,

\[ T\dot\sim N(np,np(1-p)) \]

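Recall that $T=\sum^n_{i=1}X_i$, where $X_1,\dots,X_n$ are iid $Ber(p)$ with $E(X_i)=p$ and $Var(X_i)=p(1-p)$, so the CLT applies with $\mu=p$ and $\sigma^2=p(1-p)$ when $n$ is large (say $n\geq 30$).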
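Spelling out the mean and variance of $T$ (the variance step relies on the independence of the $X_i$):

\[ E(T)=\sum_{i=1}^n E(X_i)=np,\qquad Var(T)=\sum_{i=1}^n Var(X_i)=np(1-p) \]

These are exactly the parameters $n\mu$ and $n\sigma^2$ that the CLT gives for a sum.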

The usual rule of thumb for using this approximation is $np\geq 5$ and $n(1-p)\geq 5$, the binomial counterpart of the $n\geq 30$ guideline.

Continuity correction: Let $T\sim Bin(n,p)$ and let $Y\sim N(np,np(1-p))$.

Because a continuous random variable is used to approximate a discrete one, each cutoff is shifted by 0.5:

$P(T\leq k)\approx P(Y\leq k+0.5)$
$P(T= k)\approx P(k-0.5< Y < k+0.5)$
$P(T\geq k)\approx P(Y\geq k-0.5)$
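To see what the correction buys, the sketch below compares the exact binomial probabilities with the normal approximation, with and without the 0.5 shift. The values $n=40$, $p=0.3$, $k=10$ and the variable names are illustrative assumptions, not from the lecture.

```r
n <- 40; p <- 0.3; k <- 10            # illustrative values (np = 12, n(1-p) = 28)
mu <- n * p                           # mean of T
s  <- sqrt(n * p * (1 - p))           # standard deviation of T

# P(T <= k): exact, normal without correction, normal with correction
exact     <- pbinom(k, n, p)
plain     <- pnorm(k, mu, s)
corrected <- pnorm(k + 0.5, mu, s)
c(exact = exact, plain = plain, corrected = corrected)

# P(T = k): exact versus the corrected interval (k - 0.5, k + 0.5)
point_exact  <- dbinom(k, n, p)
point_approx <- pnorm(k + 0.5, mu, s) - pnorm(k - 0.5, mu, s)
c(exact = point_exact, corrected = point_approx)
```

Without the shift, a point probability such as $P(T=k)$ would be approximated by $P(Y=k)=0$, which is why the interval form is needed.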

Example: Suppose that only 60% of all drivers wear seat belts at all times. In a random sample of 500 drivers, let X denote the number of drivers who wear a seat belt at all times.

State the exact distribution of X and use R to find $P(270 \leq X \leq 320)$.

$X\sim Bin(n,p)$ where $n=500,p=0.6$

pbinom(320,500,0.6)-pbinom(269,500,0.6)

## [1] 0.9671151

# pbinom(269, ...) gives P(X <= 269), so subtracting it keeps X = 270 in the interval.

Approximate $P(270 \leq X \leq 320)$ using the continuity correction.

By the Central Limit Theorem, $X\dot \sim N(np,np(1-p))$, with mean $np=500\times 0.6=300$ and variance $np(1-p)=500\times 0.6\times (1-0.6)=120$. Let $Y\sim N(300,120)$.

\[ P(270\leq X\leq 320)\approx P(Y<320.5)-P(Y<269.5) \]

In R

pnorm(320.5,500*0.6,sqrt(500*0.6*0.4))-pnorm(269.5,500*0.6,sqrt(500*0.6*0.4))

## [1] 0.9666716
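As a rough check on how much the correction matters in this example, the sketch below recomputes the exact and corrected probabilities and adds the uncorrected normal approximation. The variable names are chosen here for illustration.

```r
n <- 500; p <- 0.6
mu <- n * p                          # 300
s  <- sqrt(n * p * (1 - p))          # sqrt(120)

exact       <- pbinom(320, n, p) - pbinom(269, n, p)      # exact binomial
corrected   <- pnorm(320.5, mu, s) - pnorm(269.5, mu, s)  # with continuity correction
uncorrected <- pnorm(320, mu, s) - pnorm(270, mu, s)      # without correction

# Absolute error of each approximation relative to the exact value
abs(c(corrected = corrected, uncorrected = uncorrected) - exact)
```

The corrected approximation should land noticeably closer to the exact binomial probability than the uncorrected one.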

When the sample size is small, the normal approximation can be inaccurate, so use it only with large samples.
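One way to see the effect of sample size is to measure the worst-case error of the corrected approximation over all $k$. The helper `max_err` and the two parameter settings below are hypothetical, chosen only to contrast a case that violates $np\geq 5$ with one that satisfies it.

```r
# Largest absolute error of the continuity-corrected approximation
# to P(T <= k), taken over k = 0, ..., n (hypothetical helper).
max_err <- function(n, p) {
  k <- 0:n
  max(abs(pbinom(k, n, p) - pnorm(k + 0.5, n * p, sqrt(n * p * (1 - p)))))
}

small <- max_err(10, 0.1)    # np = 1 < 5: rule of thumb violated
large <- max_err(500, 0.1)   # np = 50, n(1-p) = 450: rule of thumb satisfied
c(small = small, large = large)
```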

Lecture_23

Zheyuan Wu

2023-03-20
