In general, finding the exact distribution of \(\bar X\) is difficult.
However, for large sample sizes, the distribution of \(\bar X\) can be approximated by a normal distribution.
Central Limit Theorem (CLT): Let \(X_1,\dots,X_n\) be iid with finite mean \(\mu\) and finite variance \(\sigma^2\). Then for large enough \(n\),
\[ \bar X\dot\sim N(\mu,{\sigma^2\over n}) \]
\[ T=X_1+...+X_n\dot\sim N(n\mu,n\sigma^2) \]
The approximation improves as \(n\) increases; in practice, it works well for \(n\geq 30\).
This is the key result we use for inference.
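A quick simulation can make the theorem concrete. The sketch below is an illustration only: it assumes an exponential population with rate 1 (so \(\mu=1\) and \(\sigma^2=1\)), and the names `n`, `reps`, and `xbar` are chosen here for the example.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n    <- 30                       # sample size (assumed for illustration)
reps <- 10000                    # number of simulated samples

# Each replicate draws n values from Exp(1) and records the sample mean
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbar)                       # should be close to mu = 1
var(xbar)                        # should be close to sigma^2 / n = 1/30

# Compare a simulated probability with the CLT approximation
sim_prob <- mean(xbar <= 1.2)                       # simulated P(Xbar <= 1.2)
clt_prob <- pnorm(1.2, mean = 1, sd = sqrt(1 / n))  # normal approximation
c(simulated = sim_prob, clt = clt_prob)
```

Even though the exponential distribution is strongly skewed, the simulated mean, variance, and probability should land near the values the normal approximation predicts.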
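Example: The level of impurity in a randomly selected batch of chemicals is a random variable with \(\mu\) = 4.0% and \(\sigma\) = 1.5%. For a random sample of 50 batches, find the probability that the average impurity level is between 3.5% and 3.8%, and the 95th percentile of the average impurity level.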
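Let \(X_1,\dots,X_{50}\) be the impurity levels in the batches.
\(X_1,\dots,X_{50}\) are iid with mean \(\mu=4\%\) and variance \(\sigma^2=(1.5\%)^2\).
By the Central Limit Theorem: \(\bar X \dot\sim N(0.04,{0.015^2\over 50})\). The R calculations work in percent units, that is, with mean 4 and standard deviation \(1.5/\sqrt{50}\).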
\[ P(3.5\% < \bar X<3.8\%)\approx 0.1636782 \]
pnorm(3.8,4,sqrt(1.5^2/50))-pnorm(3.5,4,sqrt(1.5^2/50))
## [1] 0.1636782
# 95th percentile of the sample mean
qnorm(0.95,4,sqrt(1.5^2/50))
## [1] 4.348926
Remark: \(\bar X\) is more concentrated around \(\mu\) than an individual \(X_i\), since \(Var(\bar X)=\sigma^2/n\) is smaller than \(Var(X_i)=\sigma^2\).
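The remark can be checked numerically with the figures from the impurity example. This is a sketch only: treating an individual batch as normally distributed is an assumption made here so that the two probabilities can be compared, since the example specifies just the mean and standard deviation.

```r
mu    <- 4      # mean impurity, in percent
sigma <- 1.5    # standard deviation of one batch, in percent
n     <- 50     # number of batches in the sample

sd_single <- sigma             # spread of an individual X_i
sd_mean   <- sigma / sqrt(n)   # spread of the sample mean
c(sd_single = sd_single, sd_mean = sd_mean)

# Probability of landing within 0.5 percentage points of mu
# ASSUMPTION: X_i ~ Normal, which the example does not state
p_single <- pnorm(mu + 0.5, mu, sd_single) - pnorm(mu - 0.5, mu, sd_single)
p_mean   <- pnorm(mu + 0.5, mu, sd_mean)   - pnorm(mu - 0.5, mu, sd_mean)
c(p_single = p_single, p_mean = p_mean)
```

The standard deviation shrinks by a factor of \(\sqrt{50}\), so the sample mean falls near \(\mu\) far more often than a single batch does.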
DeMoivre-Laplace Theorem: If \(T\sim Bin (n,p)\) then, for large enough \(n\),
\[ T\dot\sim N(np,np(1-p)) \]
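Recall that \(T=\sum^n_{i=1}X_i\), where \(X_1,\dots,X_n\) are iid \(Ber(p)\)
with \(E(X_i)=p\) and \(Var(X_i)=p(1-p)\), so this is the CLT applied to a sum of Bernoulli variables. As before, the guideline is \(n\geq 30\).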
Continuity correction: Let \(T\sim Bin(n,p)\) and let \(Y\sim N(np,np(1-p))\).
For the binomial case, the usual rule of thumb for the normal approximation is \(np\geq 5\) and \(n(1-p)\geq 5\), which refines the \(n\geq 30\) guideline by also accounting for \(p\).
The continuity correction adjusts for approximating a discrete random variable by a continuous one:
\(P(T\leq k)\approx P(Y\leq k+0.5)\)
\(P(T= k)\approx P(k-0.5< Y < k+0.5)\)
\(P(T\geq k)\approx P(Y\geq k-0.5)\)
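To see what the correction buys, one can compare the exact binomial probability with the normal approximation, with and without the 0.5 adjustment. The values \(n=40\), \(p=0.3\), \(k=15\) below are hypothetical, chosen only for illustration.

```r
n <- 40; p <- 0.3; k <- 15       # hypothetical values for illustration
m <- n * p                       # mean of T
s <- sqrt(n * p * (1 - p))       # standard deviation of T

exact       <- pbinom(k, n, p)           # P(T <= k), exact
uncorrected <- pnorm(k, m, s)            # P(Y <= k), no correction
corrected   <- pnorm(k + 0.5, m, s)      # P(Y <= k + 0.5)
c(exact = exact, uncorrected = uncorrected, corrected = corrected)

# P(T = k): exact versus corrected approximation
point_exact  <- dbinom(k, n, p)
point_approx <- pnorm(k + 0.5, m, s) - pnorm(k - 0.5, m, s)
c(exact = point_exact, approx = point_approx)
```

Without the correction, \(P(Y=k)\) would be zero for a continuous \(Y\), which is why the interval \((k-0.5,k+0.5)\) is needed to approximate a point probability.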
Example: Suppose that only 60% of all drivers wear seat belts at all times. In a random sample of 500 drivers, let \(X\) denote the number of drivers who wear seat belts at all times. Find \(P(270\leq X\leq 320)\).
\(X\sim Bin(n,p)\) where \(n=500,p=0.6\)
pbinom(320,500,0.6)-pbinom(269,500,0.6)
## [1] 0.9671151
# pbinom(269, ...) is P(X <= 269), so subtracting it keeps X = 270 in the interval.
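By the Central Limit Theorem, \(X\dot \sim N(np,np(1-p))\), with mean \(np=500\times 0.6=300\) and variance \(np(1-p)=500\times 0.6\times (1-0.6)=120\). Let \(Y\sim N(300,120)\); with the continuity correction,
\[ P(270\leq X\leq 320)\approx P(269.5<Y<320.5)=P(Y<320.5)-P(Y<269.5) \]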
In R
pnorm(320.5,500*0.6,sqrt(500*0.6*0.4))-pnorm(269.5,500*0.6,sqrt(500*0.6*0.4))
## [1] 0.9666716
When the sample size is small, the normal approximation may be inaccurate.
Use it only with large samples.
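One way to see the effect of sample size is to measure the worst-case gap between the exact binomial CDF and its continuity-corrected normal approximation. The helper `max_error` and the sample sizes below are hypothetical, introduced only for this sketch; \(p=0.6\) is taken from the seat belt example.

```r
# Largest absolute difference between P(T <= k) and P(Y <= k + 0.5) over all k
max_error <- function(n, p) {
  k <- 0:n
  exact  <- pbinom(k, n, p)
  approx <- pnorm(k + 0.5, n * p, sqrt(n * p * (1 - p)))
  max(abs(exact - approx))
}

sizes  <- c(10, 50, 500)                       # hypothetical sample sizes
errors <- sapply(sizes, max_error, p = 0.6)
data.frame(n = sizes, max_error = errors)
```

The gap should shrink as \(n\) grows, which is the practical reason for reserving the approximation for large samples.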