Normal Distribution

From Binomial Distribution

\(X \sim Bin(n,p)\)

E.g., the number of heads when flipping a coin twice, or the number of boys among 100 newborns.

…. CDF:

Normal Distribution

Motivating Example of Normal Distribution

Let \(X\sim Bin(n,p)\) and consider the sample proportion

\[ Y=X/n \]

Its expected value is \(E(Y)=p\) no matter what n is.

And \(Var(Y)=p(1-p)/n\), so \(Var(Y)\) decreases as n increases.
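Both moment formulas can be checked exactly from the Bin(n, p) pmf. Below is a minimal sketch in R that assumes p = 0.3 purely for illustration. The helper name `proportion_moments` is hypothetical.

```r
# Exact mean and variance of Y = X/n, where X ~ Bin(n, p),
# computed directly from the binomial pmf (dbinom).
proportion_moments <- function(n, p) {
  x <- 0:n                               # support of X
  pmf <- dbinom(x, size = n, prob = p)   # P(X = x)
  y <- x / n                             # corresponding values of Y
  ey <- sum(y * pmf)                     # E(Y)
  vy <- sum((y - ey)^2 * pmf)            # Var(Y)
  c(mean = ey, var = vy, formula_var = p * (1 - p) / n)
}

# p = 0.3 is an arbitrary illustrative choice
for (n in c(10, 100, 1000)) print(proportion_moments(n, 0.3))
```

The mean stays at 0.3 for every n, while the variance shrinks by a factor of 10 each time n grows by a factor of 10.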

From Discrete to Continuous Distribution

  • The discrete jumps in the cdf become continuous increments, so the cdf of a continuous random variable has no jumps.

  • For a discrete random variable, the jump size is the pmf. For the continuous version, we take the derivative of the cdf to define the pdf.

  • For a discrete r.v., the jump sizes sum to 1. For a continuous r.v., we have \(\int^{\infty}_{-\infty}f(x)dx=1\).

For continuous random variable:

  • \(f(x)\) is NOT the jump size but the rate of increase of the cdf: \(f(x)=\lim_{\epsilon\to0}{P(x-\epsilon<X<x+\epsilon)\over 2\epsilon}={dF(x)\over dx}\).

  • \(f(x)\neq P(X=x)\)

  • \(f(x)\) can be larger than 1.

  • \(P(a<X<b)=\int^b_af(x)dx\): the probability is the area under the pdf curve.

  • \(P(X=x)=0\) for any given x.
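The following sketch checks these facts numerically. It uses R's built-in `dnorm`/`pnorm` for a normal variable as a concrete continuous example. The choice of distribution, of x = 0.7, of sd = 0.1, and of the interval (−1, 2) are all assumptions made for illustration only.

```r
# 1. The pdf integrates to 1 over the real line
integrate(dnorm, -Inf, Inf)$value

# 2. f(x) is the rate of increase of F: compare the symmetric difference
#    quotient P(x - eps < X < x + eps) / (2 eps) with the pdf at x
x <- 0.7; eps <- 1e-5
(pnorm(x + eps) - pnorm(x - eps)) / (2 * eps)
dnorm(x)

# 3. A pdf can exceed 1: a normal with sd = 0.1 has density about 3.99 at its mean
dnorm(0, mean = 0, sd = 0.1)

# 4. P(a < X < b) is the area under the pdf between a and b
integrate(dnorm, -1, 2)$value
pnorm(2) - pnorm(-1)
```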

Normal Distribution

\[ X\sim N(\mu,\sigma^2) \]

\[ f(x)={1\over\sqrt{2\pi\sigma^2}}e^{-{1\over2\sigma^2}(x-\mu)^2},~~~x\in R, \] where \(\mu\in R\) and \(\sigma^2>0\)

\[ F(x)=\int^x_{-\infty}f(t)dt,~~~x\in R \]
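The density formula can be coded directly and compared with R's `dnorm`. Likewise, F(x) can be obtained by numerically integrating f and compared with `pnorm`. In this sketch, `normal_pdf` and `normal_cdf` are hypothetical helper names, and μ = 10, σ² = 4 are arbitrary illustrative values.

```r
# pdf of N(mu, sigma^2), written out from the formula above
normal_pdf <- function(x, mu, sigma2) {
  1 / sqrt(2 * pi * sigma2) * exp(-(x - mu)^2 / (2 * sigma2))
}

# cdf F(x) = integral of f(t) from -Inf to x, by numerical integration
# (extra arguments mu and sigma2 are passed through to normal_pdf)
normal_cdf <- function(x, mu, sigma2) {
  integrate(normal_pdf, lower = -Inf, upper = x, mu = mu, sigma2 = sigma2)$value
}

mu <- 10; sigma2 <- 4                  # i.e. sd = 2
normal_pdf(9, mu, sigma2); dnorm(9, mean = mu, sd = sqrt(sigma2))
normal_cdf(9, mu, sigma2); pnorm(9, mean = mu, sd = sqrt(sigma2))
```

Note that R's normal functions are parameterized by the standard deviation, which is why `sqrt(sigma2)` is passed as `sd`.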

Standard Normal:

\(\mu=0,\sigma^2=1\), denoted by \(Z\sim N(0,1)\).

The standard normal distribution has pdf:

\[ f(x)={1\over\sqrt{2\pi}}e^{-{1\over2}x^2},~~~-\infty<x<\infty \]

  • We often use \(\phi(\cdot)\) and \(\Phi(\cdot)\) to denote the pdf and cdf of the standard normal distribution.

  • If \(X\sim N(\mu,\sigma^2)\), then \({X-\mu\over\sigma} \sim N(0,1)\); this standardized variable is often denoted by Z.

  • The normal cdf has no closed-form expression, so it is evaluated using tables (the Normal table or Z table) or a computer. In R: pnorm(x, mean, sd), where sd is the standard deviation \(\sigma\), not the variance.
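Standardization is why a single table (or a single function for N(0,1)) is enough. The sketch below checks that \(P(X<x)=\Phi((x-\mu)/\sigma)\). The values μ = 0.5, σ = 0.05, and x = 0.55 are chosen only for illustration.

```r
mu <- 0.5; sigma <- 0.05; x <- 0.55   # illustrative values
z <- (x - mu) / sigma                 # standardized value (here 1)
pnorm(x, mean = mu, sd = sigma)       # P(X < x) computed directly
pnorm(z)                              # Phi(z): the same probability
```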

Calculation

Classic values:

\(Z\sim N(0,1),P(Z<1)\)

pnorm(1,0,1)
## [1] 0.8413447

\(Z\sim N(0,1),P(-1<Z<1)\)

pnorm(1,0,1)-pnorm(-1,0,1)
## [1] 0.6826895

\(Z\sim N(0,1),P(-2<Z<2)\)

pnorm(2,0,1)-pnorm(-2,0,1)
## [1] 0.9544997

\(Z\sim N(0,1),P(-3<Z<3)\)

pnorm(3,0,1)-pnorm(-3,0,1)
## [1] 0.9973002

  • Conversely, we can obtain the cut-off value (or range) of the random variable that gives a specified probability. In R: qnorm(p, mean, sd).

  • Ex: Find the z-value such that P(Z < z) = 0.95

qnorm(0.95,0,1)
## [1] 1.644854
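
  • Ex: Find the interval [−z, z] such that P(−z < Z < z) = 0.90. By symmetry, this means solving for z such that P(Z > z) = 0.05, which gives z = 1.645.

  • For P(−z < Z < z) = 0.95, z = 1.960; for 0.99, z = 2.576.

  • Ex: Let \(X \sim N(0.5,0.05^2)\). Find the interval [a,b], symmetric about the mean, such that P(a < X < b) = 0.95, i.e., with probability 0.025 in each tail.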

b=qnorm(0.975,0.5,0.05)
a=qnorm(0.025,0.5,0.05)
b
## [1] 0.5979982
a
## [1] 0.4020018
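Because the normal distribution is symmetric, the central interval with probability \(1-\alpha\) puts \(\alpha/2\) in each tail. That interval is \(\mu\pm z\sigma\) with z = `qnorm(1 - alpha/2)`. The sketch below reproduces the critical values 1.645, 1.960, and 2.576, as well as the interval [a, b] above. The helper name `central_interval` is hypothetical.

```r
# Central interval [a, b] with P(a < X < b) = level for X ~ N(mean, sd^2)
central_interval <- function(level, mean = 0, sd = 1) {
  tail <- (1 - level) / 2                 # probability left in each tail
  c(a = qnorm(tail, mean, sd), b = qnorm(1 - tail, mean, sd))
}

# Critical values z for P(-z < Z < z) = 0.90, 0.95, 0.99
sapply(c(0.90, 0.95, 0.99), function(lv) central_interval(lv)[["b"]])

# The N(0.5, 0.05^2) example: equivalent to 0.5 +/- 1.96 * 0.05
central_interval(0.95, mean = 0.5, sd = 0.05)
```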

Properties of Normal

  • \({X-\mu\over\sigma}\sim N(0,1)\)

  • \(a+bX\sim N(a+b\mu,b^2\sigma^2)\) for constants \(a\) and \(b\neq0\)

  • Let \(x_\alpha\) denote the \(100(1-\alpha)\)th percentile of X, and let \(z_\alpha\) denote the \(100(1-\alpha)\)th percentile of Z. Then

\[ x_\alpha = \mu + \sigma z_\alpha \]
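Both properties can be checked numerically. This sketch uses μ = 10, σ = 2, and α = 0.05, all arbitrary choices for illustration. It also simulates a + bX with a = 1 and b = −3, which are likewise arbitrary. The simulation compares only the mean and variance; it does not, by itself, demonstrate normality.

```r
mu <- 10; sigma <- 2; alpha <- 0.05
# x_alpha: the 100(1 - alpha)th percentile of X, computed two ways
qnorm(1 - alpha, mean = mu, sd = sigma)
mu + sigma * qnorm(1 - alpha)                 # mu + sigma * z_alpha

# a + bX ~ N(a + b*mu, b^2 * sigma^2): compare simulated moments
set.seed(1)
a <- 1; b <- -3
y <- a + b * rnorm(1e5, mean = mu, sd = sigma)
c(mean(y), a + b * mu)                        # both near -29
c(var(y), b^2 * sigma^2)                      # both near 36
```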

Example: Tire Tread Thickness

A machine manufactures tires with an initial tread thickness that is normally distributed with mean 10 mm and standard deviation 2 mm. The tire has a 50,000-mile warranty. In order to last for 50,000 miles the tread thickness must initially be at least 7.9 mm. If the initial thickness of tread is measured to be less than 7.9 mm, then the tire is sold as an alternative brand with a warranty of less than 50,000 miles.

  • Find the proportion of tires sold under the alternative brand.

Let X be the initial tread thickness, so \(X\sim N(10,2^2)\). Then

\[ P(X<7.9)=P\left(Z<{7.9-10\over2}\right)=P(Z<-1.05)\approx0.1469 \]

pnorm(7.9,10,2)
## [1] 0.1468591

  • The demand for the alternative brand of tires is such that 30% of the total output should be sold under the alternative brand name. What should the critical thickness, originally 7.9 mm, be set at in order to meet the demand?

qnorm(0.3,10,2)
## [1] 8.951199
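As a check, the critical thickness can also be written as \(\mu+\sigma z\) with \(z=\Phi^{-1}(0.3)\approx-0.524\). Plugging the result back into `pnorm` should recover the 30% target.

```r
crit <- 10 + 2 * qnorm(0.3)     # mu + sigma * z, with z = qnorm(0.3)
crit                            # same as qnorm(0.3, 10, 2)
pnorm(crit, mean = 10, sd = 2)  # proportion below the new cut-off: 0.3
```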

Checking Normality: QQ plot

A QQ plot compares the empirical quantiles of the data against the theoretical quantiles of a reference distribution. It is often used to check a distributional assumption.

The most commonly used QQ plot is the one for checking normality, also referred to as a Normal plot or Normal probability plot.

  1. Standardize the data to obtain z-scores \(z_i=(x_i-\bar x)/s\).

  2. Rank the z-scores from smallest to largest, and use the rank \(k\) of each point to compute its empirical quantile: \(k/(n+1)\) or \((k-0.375)/(n+0.25)\).

  3. Find the z-value from the Normal table corresponding to the empirical quantile of each point: \(\tilde z_i=\Phi^{-1}(k/(n+1))\).

  4. Plot the z-scores from step 1 against these z-values (x-axis: \(\tilde z_i\), y-axis: \(z_i\)).

One may replace \(\Phi\) in step 3 with the cdf of any specified distribution to check that distributional assumption. If the points lie close to a straight line, the assumed distribution is plausible.

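x <- rnorm(50)   # example data (simulated); replace with your own numeric vector
qqnorm(x)        # sample quantiles vs. theoretical N(0,1) quantiles
qqline(x)        # reference line through the first and third quartiles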
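To see what `qqnorm` is doing, steps 1 to 4 can be carried out by hand. This is an illustrative sketch, not necessarily how `qqnorm` works internally. (R's `qqnorm` uses `ppoints`, which applies a closely related (k − a)/(n + 1 − 2a) rule.) The function name `manual_qq` is hypothetical, and the data are simulated.

```r
# Manual normal QQ plot following steps 1-4
manual_qq <- function(x) {
  n <- length(x)
  z <- (x - mean(x)) / sd(x)            # step 1: z-scores
  k <- rank(z, ties.method = "first")   # step 2: ranks 1..n
  p <- k / (n + 1)                      #         empirical quantiles
  z_tilde <- qnorm(p)                   # step 3: Phi^{-1}(k/(n+1))
  plot(z_tilde, z,                      # step 4: x-axis z_tilde, y-axis z
       xlab = "theoretical quantile", ylab = "z-score")
  abline(0, 1)                          # reference line y = x
  invisible(data.frame(z = z, z_tilde = z_tilde))
}

set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)       # simulated data for illustration
res <- manual_qq(x)
```

Because the data are standardized in step 1, the line y = x is a natural reference. Points close to it suggest that normality is reasonable.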