Lecture

Normal Distribution

From Binomial Distribution

$X \sim Bin(n,p)$

Consider a fixed number of independent Bernoulli trials with the same probability of success.
Binomial r.v.: Let X = the number of successes in n independent Bernoulli trials, each with success probability p.

E.g.: the number of heads when a coin is flipped twice, the number of boys among 100 newborns, …

PMF: $f(x)=P(X=x)=\begin{pmatrix}n\\x\end{pmatrix}p^xq^{n-x}$ for $x=0,1,2,\dots,n$, where $q=1-p$.
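As a quick check of the pmf formula, the sketch below evaluates it for the coin example, both by hand and with R's built-in `dbinom`. The fair coin ($p = 0.5$) is an assumption made only for illustration.

```r
# Binomial pmf for two flips of a coin.
# p = 0.5 (a fair coin) is assumed for illustration.
n <- 2
p <- 0.5
q <- 1 - p
x <- 0:n

pmf_formula <- choose(n, x) * p^x * q^(n - x)  # the pmf formula, term by term
pmf_builtin <- dbinom(x, size = n, prob = p)   # R's built-in Binomial pmf

pmf_formula                          # 0.25 0.50 0.25
all.equal(pmf_formula, pmf_builtin)  # the two computations agree
sum(pmf_formula)                     # the jump sizes sum to 1
```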

…. CDF:

Normal Distribution

Motivating Example of Normal Distribution

The number of successes in a Binomial distribution depends heavily on the number of trials, n.
Instead, if we consider the proportion of successes,

\[ Y=X/n \]

its expected value is $E(Y)=p$ no matter what n is.

Moreover, $Var(Y)=p(1-p)/n$, so $Var(Y)$ decreases as n increases.
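A small numerical sketch of these two facts: the helper `prop_moments` (a hypothetical name, not from the lecture) computes the mean and variance of $Y = X/n$ directly from the Binomial pmf. The value $p = 0.3$ and the choices of n are assumptions for illustration.

```r
# Exact mean and variance of Y = X/n, computed from the Binomial pmf.
# prop_moments is a hypothetical helper; p = 0.3 and the n values are assumed.
prop_moments <- function(n, p) {
  x <- 0:n
  pmf <- dbinom(x, size = n, prob = p)
  y <- x / n                      # possible values of the proportion
  m <- sum(y * pmf)               # E(Y)
  v <- sum((y - m)^2 * pmf)       # Var(Y)
  c(mean = m, var = v)
}

p <- 0.3
for (n in c(10, 100, 1000)) {
  res <- prop_moments(n, p)
  cat("n =", n,
      " E(Y) =", res[["mean"]],
      " Var(Y) =", res[["var"]],
      " p(1-p)/n =", p * (1 - p) / n, "\n")
}
```

The mean stays at p for every n, while the variance shrinks in proportion to 1/n.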

From Discrete to Continuous Distribution

The discrete jumps in the cdf become continuous increments, so the cdf of a continuous random variable has no jumps.
For a discrete random variable, the jump size of the cdf at x is the pmf. For the continuous version, we take the derivative of the cdf to define the pdf.
For a discrete r.v., the jump sizes sum to 1. For a continuous r.v., we have $\int^{\infty}_{-\infty}f(x)dx=1$.

For continuous random variable:

$f(x)$ is NOT the jump size but the rate of increase of the cdf: $f(x)=\lim_{\epsilon\to 0}{P(x-\epsilon<X<x+\epsilon)\over 2\epsilon}$, which is ${dF(x)\over dx}$.
$f(x)\neq P(X=x)$
$f(x)$ is a density, not a probability, so it can be greater than 1.
$P(a<X<b)=\int^b_af(x)dx$: the probability is the area under the pdf curve.
$P(X=x)=0$ for any given x.
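The points above can be illustrated with a simple continuous distribution. The sketch below uses a Uniform(0, 0.5) random variable, which is an assumption chosen for illustration and is not part of the lecture: its density equals 2 on its support, yet every probability is still an area between 0 and 1.

```r
# Uniform(0, 0.5) is assumed here only as an easy example of a density above 1.
f <- function(x) dunif(x, min = 0, max = 0.5)

f(0.2)   # 2: a density value, not a probability

# P(0.1 < X < 0.3) as the area under the pdf ...
area <- integrate(f, lower = 0.1, upper = 0.3)$value
# ... agrees with the difference of cdf values F(0.3) - F(0.1)
cdf_diff <- punif(0.3, 0, 0.5) - punif(0.1, 0, 0.5)
c(area = area, cdf_diff = cdf_diff)   # both 0.4

# The total area under the pdf is 1
integrate(f, lower = 0, upper = 0.5)$value
```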

Normal Distribution

\[ X\sim N(\mu,\sigma^2) \]

Normal random variable: a continuous random variable with pdf

\[ f(x)={1\over\sqrt{2\pi\sigma^2}}e^{-{1\over2\sigma^2}(x-\mu)^2},~~~x\in R, \] where $\mu\in R$ and $\sigma^2>0$

CDF:

\[ F(x)=\int^x_{-\infty}f(t)dt,~~~x\in R \]

It can be shown that $E(X)=\mu$ and $Var(X)=\sigma^2$.
A normal density (pdf) is fully specified by only two parameters: $\mu,\sigma^2$
Every normal distribution is symmetric about its mean, μ.
Changing μ: shifts the “bell-shaped” curve left and right
Changing σ: shrinks/stretches the curve
There are an infinite number of normal distributions (just change the mean and variance/standard deviation).
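These properties can be checked numerically. The sketch below integrates the normal pdf with R's `integrate`; the parameter values $\mu = 2$ and $\sigma = 3$ are assumptions chosen for illustration, and the results are numerical approximations rather than proofs.

```r
# mu and sigma are assumed values, used only for this check.
mu <- 2
sigma <- 3
f <- function(x) dnorm(x, mean = mu, sd = sigma)

total <- integrate(f, -Inf, Inf)$value                         # area under the pdf
m <- integrate(function(x) x * f(x), -Inf, Inf)$value          # E(X)
v <- integrate(function(x) (x - m)^2 * f(x), -Inf, Inf)$value  # Var(X)
c(total = total, mean = m, var = v)   # approximately 1, mu, sigma^2

# Symmetry about the mean: equal density at equal distances from mu
all.equal(f(mu - 1.5), f(mu + 1.5))
```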

Standard Normal:

$\mu=0,\sigma^2=1$, denoted by $Z\sim N(0,1)$

The standard normal distribution has pdf:

\[ f(x)={1\over\sqrt{2\pi}}e^{-{1\over2}x^2},~~~-\infty<x<\infty \]

We often use $\phi$(·) and $\Phi$(·) to denote the pdf and cdf of the standard normal distribution.
If $X\sim N(\mu,\sigma^2)$, then ${X-\mu\over\sigma} \sim N(0,1)$, which is often denoted by Z.
The normal cdf has no closed-form expression; it is evaluated using tables (referred to as the Normal table or Z table) or software. In R: pnorm(x, mean, sd)
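A minimal sketch of standardization in R: the probability $P(X<x)$ computed on the original scale equals $\Phi(z)$ computed at the z-score. The values of `mu`, `sigma` and `x` are assumptions for illustration.

```r
# mu, sigma and x are assumed values for illustration.
mu <- 5
sigma <- 3
x <- 7

z <- (x - mu) / sigma                        # z-score of x

p_original <- pnorm(x, mean = mu, sd = sigma)  # P(X < x) on the original scale
p_standard <- pnorm(z)                         # Phi(z) on the standard normal scale

c(p_original = p_original, p_standard = p_standard)
all.equal(p_original, p_standard)            # the two agree
```

This is why a single Z table is enough to evaluate the cdf of any normal distribution.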

Calculation

Classic values:

$Z\sim N(0,1),P(Z<1)$

pnorm(1,0,1)

## [1] 0.8413447

$Z\sim N(0,1),P(-1<Z<1)$

pnorm(1,0,1)-pnorm(-1,0,1)

## [1] 0.6826895

$Z\sim N(0,1),P(-2<Z<2)$

pnorm(2,0,1)-pnorm(-2,0,1)

## [1] 0.9544997

$Z\sim N(0,1),P(-3<Z<3)$

pnorm(3,0,1)-pnorm(-3,0,1)

## [1] 0.9973002

Obtaining the cut-off values (or range) of the random variable that satisfy a given probability
Ex: Find the z-value such that P(Z < z) = 0.95

qnorm(0.95,0,1)

## [1] 1.644854

Ex: Find the interval [−z,z] s.t. P(−z < Z < z) = 0.90. By symmetry, solve for z s.t. P(Z > z) = 0.05: z = 1.645.
For probability 0.95, z = 1.960; for 0.99, z = 2.576.
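The symmetric cut-offs can be computed with `qnorm`. In the sketch below, `central_z` is a hypothetical helper name: for a central probability `prob`, each tail holds `(1 - prob) / 2`, so the upper cut-off is the `1 - (1 - prob) / 2` quantile.

```r
# central_z is a hypothetical helper: returns z with P(-z < Z < z) = prob.
central_z <- function(prob) {
  tail_area <- (1 - prob) / 2   # probability left in each tail
  qnorm(1 - tail_area)          # upper cut-off of the standard normal
}

z <- central_z(c(0.90, 0.95, 0.99))
z                        # 1.644854 1.959964 2.575829

# Check: the central probability is recovered from the cut-offs
pnorm(z) - pnorm(-z)     # 0.90 0.95 0.99
```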
Ex: Let $X \sim N(0.5,0.05^2)$. Find the interval [a,b] such that P(a < X < b)=0.95.

b=qnorm(0.975,0.5,0.05)
a=qnorm(0.025,0.5,0.05)
b

## [1] 0.5979982

a

## [1] 0.4020018

Properties of Normal

${X-\mu\over\sigma}\sim N(0,1)$
$a+bX\sim N(a+b\mu,b^2\sigma^2)$
Let $x_\alpha$ denote the $100(1-\alpha)$th percentile of X, and let $z_\alpha$ denote the $100(1-\alpha)$th percentile of Z. Then

\[ x_\alpha = \mu + \sigma z_\alpha \]
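A short justification of this relation, using only the standardization property and $\sigma>0$:

\[ P(X\le\mu+\sigma z_\alpha)=P\left({X-\mu\over\sigma}\le z_\alpha\right)=P(Z\le z_\alpha)=1-\alpha, \]

so $\mu+\sigma z_\alpha$ is the $100(1-\alpha)$th percentile of X. For example, with $\alpha=0.05$ we have $z_\alpha=1.645$, so the 95th percentile of X is $\mu+1.645\sigma$.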

Example: Tire Tread Thickness

A machine manufactures tires with an initial tread thickness that is normally distributed with mean 10 mm and standard deviation 2 mm. The tire has a 50,000-mile warranty. In order to last for 50,000 miles the tread thickness must initially be at least 7.9 mm. If the initial thickness of tread is measured to be less than 7.9 mm, then the tire is sold as an alternative brand with a warranty of less than 50,000 miles.

Find the proportion of tires sold under the alternative brand.

Let X be the initial tread thickness (in mm), so $X\sim N(10,2^2)$.

\[ P(X<7.9)=P\left(Z<{7.9-10\over 2}\right)=\Phi(-1.05) \]

pnorm(7.9,10,2)

## [1] 0.1468591

The demand for the alternative brand of tires is such that 30% of the total output should be sold under the alternative brand name. What should the critical thickness, originally 7.9 mm, be set at in order to meet the demand?

qnorm(0.3,10,2)

## [1] 8.951199

Checking Normality: QQ plot

A QQ plot compares the empirical quantiles of the data against the theoretical quantiles of a reference distribution. It is often used to check a distributional assumption.

The most commonly used QQ plot is the one for checking normality, which is also referred to as a Normal plot or Normal probability plot.

Step 1: Standardize the data to find the z-scores $z_i$.
Step 2: Rank the z-scores from the smallest to the largest. Use the rank k to calculate the empirical quantile for each point: $k/(n + 1)$ or $(k-0.375)/(n + 1-0.75)$.
Step 3: Find the z-value from the Normal table corresponding to the empirical quantile of each point: $\tilde z_i = \Phi^{-1}(k/(n + 1))$.
Step 4: Plot the z-scores from the first step against these z-values. (x-axis: $\tilde z_i$, y-axis: $z_i$)

One may replace Φ in step 3 by the cdf of any specified distribution, to check the distribution assumption.

qqnorm(x)
qqline(x)
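The construction steps of the Normal QQ plot can also be carried out by hand. The sketch below is one possible implementation under stated assumptions: `manual_qq` is a hypothetical helper name, the sample `x` is simulated with an arbitrary seed, and the plotting position $k/(n+1)$ is used. R's own `qqnorm` uses slightly different plotting positions, so its points will not match these exactly.

```r
# manual_qq is a hypothetical helper that builds the QQ-plot points by hand.
manual_qq <- function(x) {
  n <- length(x)
  z <- sort((x - mean(x)) / sd(x))   # standardize, then rank smallest to largest
  k <- seq_len(n)                    # ranks 1, ..., n
  q <- k / (n + 1)                   # empirical quantile of each point
  z_tilde <- qnorm(q)                # theoretical z-value, Phi^{-1}(k / (n + 1))
  data.frame(theoretical = z_tilde, sample = z)
}

set.seed(1)                          # arbitrary seed, for reproducibility only
x <- rnorm(50, mean = 10, sd = 2)    # simulated sample (an assumption)

pts <- manual_qq(x)
plot(pts$theoretical, pts$sample,
     xlab = "Theoretical z-values", ylab = "Sample z-scores")
abline(0, 1)                         # points near this line suggest normality
```

To check a different distributional assumption, `qnorm` in the helper would be replaced by the quantile function of that distribution.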

Lecture_15

Zheyuan Wu

2023-02-22
