From Binomial Distribution
\(X \sim Bin(n,p)\)
Consider a fixed number of independent Bernoulli trials with the same probability of success.
Binomial r.v.: Let X = the number of successes in n independent Bernoulli trials, each with success probability p.
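E.g.: the number of heads when flipping a coin twice, the number of boys among 100 newborns, …
CDF: \(F(x)=P(X\le x)=\sum_{k=0}^{\lfloor x\rfloor}{n\choose k}p^k(1-p)^{n-k}\), a step function that jumps by \(P(X=k)\) at each \(k=0,1,\dots,n\).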
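To see the pmf and the step-shaped cdf concretely, here is a small sketch using R's built-in `dbinom` and `pbinom` for the two-coin-flip example, assuming a fair coin (p = 0.5).

```r
# Two flips of a fair coin: X = number of heads, so X ~ Bin(2, 0.5)
n <- 2
p <- 0.5
k <- 0:n

pmf <- dbinom(k, size = n, prob = p)  # P(X = k)
cdf <- pbinom(k, size = n, prob = p)  # P(X <= k)

# The cdf is the running sum of the pmf: it jumps by P(X = k) at each k
print(rbind(k = k, pmf = pmf, cdf = cdf))
print(all.equal(cumsum(pmf), cdf))
```

Here the pmf is 0.25, 0.50, 0.25 and the cdf takes the values 0.25, 0.75, 1, so the jump at each k equals the pmf value there.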
Motivating Example for the Normal Distribution
The number of successes in a binomial distribution depends heavily on the number of trials, n.
Instead, if we consider the proportion of successes,
\[ Y=X/n \]
its expected value is \(E(Y)=E(X)/n=np/n=p\), no matter what n is.
And \(Var(Y)=Var(X)/n^2=p(1-p)/n\), so \(Var(Y)\) decreases as n increases.
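The two moment formulas can be checked exactly (no simulation) by summing over the binomial pmf. The helper `prop_moments` below is a hypothetical name introduced only for this sketch, and p = 0.3 is an arbitrary illustrative value.

```r
# Exact mean and variance of the proportion Y = X/n, X ~ Bin(n, p),
# obtained by summing over the binomial pmf
prop_moments <- function(n, p) {
  k <- 0:n
  pmf <- dbinom(k, size = n, prob = p)
  y <- k / n                       # possible values of Y
  ey <- sum(y * pmf)               # E(Y)
  vy <- sum((y - ey)^2 * pmf)      # Var(Y)
  c(mean = ey, var = vy, theory_var = p * (1 - p) / n)
}

p <- 0.3  # illustrative choice
for (n in c(10, 100, 1000)) {
  print(round(prop_moments(n, p), 6))  # mean stays at p, variance shrinks with n
}
```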
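As n increases, the possible values of Y become denser and the jumps in its cdf shrink, so in the limit the discrete jumps become continuous increments. Hence the cdf of a continuous random variable is continuous, with no jumps.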
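The shrinking jumps can be seen numerically: the largest jump in the cdf of Y = X/n is the largest binomial pmf value, which goes to zero (roughly in proportion to \(1/\sqrt n\)). The function name `max_jump` is hypothetical, and p = 0.5 is chosen for illustration.

```r
# Largest jump in the cdf of Y = X/n equals the largest pmf value of X ~ Bin(n, p)
max_jump <- function(n, p) max(dbinom(0:n, size = n, prob = p))

p <- 0.5
jumps <- sapply(c(10, 100, 1000, 10000), max_jump, p = p)
print(jumps)  # decreasing toward 0 as n grows
```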
For a discrete random variable, the jump size of the cdf at each point is the pmf. For the continuous version, we take the derivative of the cdf to define the probability density function (pdf).
For a discrete r.v., the jump sizes sum to 1. For a continuous r.v., we have \(\int^{\infty}_{-\infty}f(x)dx=1\).
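Both normalization facts can be checked numerically. This sketch uses Bin(20, 0.3) as an arbitrary discrete example and, as the continuous example, the standard normal density `dnorm`, which is introduced later in these notes.

```r
# Discrete: the jump sizes (pmf values) of Bin(20, 0.3) sum to 1
total_pmf <- sum(dbinom(0:20, size = 20, prob = 0.3))

# Continuous: the standard normal density integrates to 1 over the real line
total_pdf <- integrate(dnorm, lower = -Inf, upper = Inf)$value

print(c(total_pmf = total_pmf, total_pdf = total_pdf))
```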
For a continuous random variable:
\(f(x)\) is NOT the jump size but the rate of increase of the cdf: \(f(x)=\lim_{\epsilon\to0}{P(x-\epsilon<X<x+\epsilon)\over 2\epsilon}={dF(x)\over dx}\).
\(f(x)\neq P(X=x)\)
\(f(x)\) need not be smaller than 1; a density value can exceed 1.
\(P(a<X<b)=\int^b_af(x)dx\): the probability is the area under the pdf curve.
\(P(X=x)=0\) for any given x.
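The properties above can be illustrated with the standard normal, using R's `pnorm` (cdf) and `dnorm` (pdf) as the example distribution; the point x = 0.7, the step eps and the interval (-1, 2) are arbitrary illustrative choices.

```r
cdf_fun <- function(x) pnorm(x)  # cdf of N(0, 1), used as the example
pdf_fun <- function(x) dnorm(x)  # its pdf

# (1) f(x) is approximately P(x - eps < X < x + eps) / (2 eps) for small eps
x <- 0.7
eps <- 1e-5
rate <- (cdf_fun(x + eps) - cdf_fun(x - eps)) / (2 * eps)
print(c(rate = rate, density = pdf_fun(x)))

# (2) A density can exceed 1: N(0, 0.1^2) evaluated at its mean
print(dnorm(0, mean = 0, sd = 0.1))  # about 3.99

# (3) P(a < X < b) is the area under the pdf between a and b
a <- -1
b <- 2
area <- integrate(pdf_fun, lower = a, upper = b)$value
print(c(area = area, cdf_diff = cdf_fun(b) - cdf_fun(a)))
```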
\[ X\sim N(\mu,\sigma^2) \]
\[ f(x)={1\over\sqrt{2\pi\sigma^2}}e^{-{1\over2\sigma^2}(x-\mu)^2},~~~x\in R, \] where \(\mu\in R\) and \(\sigma^2>0\)
\[ F(x)=\int^x_{-\infty}f(t)dt,~~~x\in R \]
It can be shown that \(E(X)=\mu\) and \(Var(X)=\sigma^2\).
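As a numerical sanity check (not a proof), the sketch below codes the pdf formula directly, compares it with R's `dnorm`, and recovers \(E(X)=\mu\) and \(Var(X)=\sigma^2\) by numerical integration. The values \(\mu=2\) and \(\sigma^2=2.25\) are illustrative assumptions, and `normal_pdf` is a hypothetical helper name.

```r
mu <- 2
sigma2 <- 2.25  # illustrative parameter values
sigma <- sqrt(sigma2)

# pdf written out from the formula for f(x)
normal_pdf <- function(x) {
  1 / sqrt(2 * pi * sigma2) * exp(-(x - mu)^2 / (2 * sigma2))
}

# Agrees with dnorm, which takes the standard deviation (not the variance)
xs <- seq(-3, 7, by = 0.5)
print(max(abs(normal_pdf(xs) - dnorm(xs, mean = mu, sd = sigma))))

# E(X) and Var(X) by numerical integration over the real line
ex <- integrate(function(x) x * normal_pdf(x), -Inf, Inf)$value
vx <- integrate(function(x) (x - ex)^2 * normal_pdf(x), -Inf, Inf)$value
print(c(EX = ex, VarX = vx))
```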
A normal density (pdf) is fully specified by only two parameters: \(\mu\) and \(\sigma^2\).
Every normal distribution is symmetric about its mean, μ.
Changing μ: shifts the “bell-shaped” curve left and right
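Changing σ: stretches (larger σ) or shrinks (smaller σ) the curve horizontally, and the peak height \(1/\sqrt{2\pi\sigma^2}\) decreases or increases accordingly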
There are infinitely many normal distributions (one for each choice of mean and variance/standard deviation).
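The shape facts above can be checked with `dnorm`. In this sketch \(\mu=1\), \(\sigma=2\) and the offsets d are arbitrary illustrative values.

```r
mu <- 1
sigma <- 2
d <- c(0.5, 1, 3)

# Symmetry about mu: f(mu - d) equals f(mu + d)
print(dnorm(mu - d, mean = mu, sd = sigma) - dnorm(mu + d, mean = mu, sd = sigma))

# Changing mu only shifts the curve: the N(mu, sigma^2) pdf at x
# equals the N(0, sigma^2) pdf at x - mu
print(dnorm(3, mean = mu, sd = sigma) - dnorm(3 - mu, mean = 0, sd = sigma))

# Changing sigma stretches/shrinks the curve: the peak 1 / (sigma * sqrt(2 * pi)) drops as sigma grows
peaks <- sapply(c(0.5, 1, 2), function(s) dnorm(0, mean = 0, sd = s))
print(peaks)
```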
Standard normal: \(\mu=0,\sigma^2=1\), denoted by \(Z\sim N(0,1)\).
The standard normal distribution has pdf:
\[ f(x)={1\over\sqrt{2\pi}}e^{-{1\over2}x^2},~~~-\infty<x<\infty \]
We often use \(\phi(\cdot)\) and \(\Phi(\cdot)\) to denote the pdf and cdf of the standard normal distribution.
If \(X\sim N(\mu,\sigma^2)\), then \({X-\mu\over\sigma} \sim N(0,1)\); this standardized variable is often denoted by Z.
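Standardization means any normal probability can be computed from the standard normal cdf. This sketch compares the two routes in R; \(\mu=10\), \(\sigma=2\) and the x values are illustrative.

```r
mu <- 10
sigma <- 2  # illustrative values
x <- c(6, 7.9, 10, 13)

# P(X <= x) computed directly from N(mu, sigma^2) ...
direct <- pnorm(x, mean = mu, sd = sigma)
# ... and via the z-score (x - mu) / sigma under N(0, 1)
via_z <- pnorm((x - mu) / sigma)

print(rbind(direct, via_z))
```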
The normal cdf has no closed-form expression, so it is evaluated using tables (referred to as the normal table or Z table) or a computer. In R: pnorm(x, mean, sd); note that the third argument is the standard deviation, not the variance.
Classic values:
\(Z\sim N(0,1),P(Z<1)\)
pnorm(1,0,1)
## [1] 0.8413447
\(Z\sim N(0,1),P(-1<Z<1)\)
pnorm(1,0,1)-pnorm(-1,0,1)
## [1] 0.6826895
\(Z\sim N(0,1),P(-2<Z<2)\)
pnorm(2,0,1)-pnorm(-2,0,1)
## [1] 0.9544997
\(Z\sim N(0,1),P(-3<Z<3)\)
pnorm(3,0,1)-pnorm(-3,0,1)
## [1] 0.9973002
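Because \(\phi\) is symmetric, \(\Phi(-k)=1-\Phi(k)\), so \(P(-k<Z<k)=2\Phi(k)-1\). The sketch below computes the three classic values both ways in one vectorized call.

```r
k <- 1:3
# Direct difference of cdf values
direct <- pnorm(k) - pnorm(-k)
# Symmetry shortcut: P(-k < Z < k) = 2 * Phi(k) - 1
shortcut <- 2 * pnorm(k) - 1
print(rbind(direct, shortcut))  # roughly 0.68, 0.95, 0.997 (the "68-95-99.7 rule")
```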
Conversely, we can obtain the cut-off value (or range) of the random variable that achieves a given probability. In R: qnorm(p, mean, sd).
Ex: Find the z-value such that P(Z < z) = 0.95
qnorm(0.95,0,1)
## [1] 1.644854
Ex: Find the interval [−z, z] such that P(−z < Z < z) = 0.90. By symmetry, solve for z such that P(Z > z) = 0.05: z = 1.645.
For probability 0.95, z = 1.960; for 0.99, z = 2.576.
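The three cut-offs come from putting half of the leftover probability in each tail, i.e. \(z=\Phi^{-1}(1-\alpha/2)\) with \(\alpha\) = 1 minus the target probability. A short sketch in R:

```r
target <- c(0.90, 0.95, 0.99)
alpha <- 1 - target
# Two-sided: alpha/2 in each tail, so z = Phi^{-1}(1 - alpha/2)
z <- qnorm(1 - alpha / 2)
print(round(z, 3))  # 1.645 1.960 2.576

# Check: P(-z < Z < z) recovers the target probabilities
print(pnorm(z) - pnorm(-z))
```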
Ex: Let \(X \sim N(0.5,0.05^2)\). Find the interval [a,b] such that P(a < X < b) = 0.95, with probability 0.025 in each tail.
b=qnorm(0.975,0.5,0.05)
a=qnorm(0.025,0.5,0.05)
b
## [1] 0.5979982
a
## [1] 0.4020018
If \(X\sim N(\mu,\sigma^2)\), then:
\({X-\mu\over\sigma}\sim N(0,1)\)
\(a+bX\sim N(a+b\mu,b^2\sigma^2)\) for constants \(a\) and \(b\neq0\)
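A quick check of the linear-transformation rule, deliberately with a negative slope so the inequality flips: for \(b<0\), \(P(a+bX\le y)=P(X\ge (y-a)/b)\). The values of \(\mu,\sigma,a,b\) and y are illustrative; note that R's `sd` argument for \(a+bX\) must be \(|b|\sigma\).

```r
mu <- 10
sigma <- 2  # X ~ N(10, 2^2), illustrative
a <- 3
b <- -0.5   # illustrative constants, b < 0 on purpose
y <- c(-3, -2, -1)

# For b < 0: P(a + bX <= y) = P(X >= (y - a) / b)
via_X <- pnorm((y - a) / b, mean = mu, sd = sigma, lower.tail = FALSE)
# Claimed distribution N(a + b*mu, b^2 sigma^2), i.e. sd = |b| * sigma
via_Y <- pnorm(y, mean = a + b * mu, sd = abs(b) * sigma)
print(rbind(via_X, via_Y))
```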
Let \(x_\alpha\) denote the \(100(1-\alpha)\)th percentile of X (so \(P(X\le x_\alpha)=1-\alpha\)), and let \(z_\alpha\) denote the \(100(1-\alpha)\)th percentile of Z. Then
\[ x_\alpha = \mu + \sigma z_\alpha \]
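The percentile relation can be verified on the \(N(0.5,0.05^2)\) example: computing the percentiles directly with `qnorm(..., mean, sd)` and rescaling standard normal percentiles give the same interval endpoints.

```r
mu <- 0.5
sigma <- 0.05              # X ~ N(0.5, 0.05^2)
alpha <- c(0.025, 0.975)   # upper-tail probabilities

# x_alpha is the 100(1 - alpha)th percentile, so it is qnorm(1 - alpha, ...)
x_alpha <- qnorm(1 - alpha, mean = mu, sd = sigma)
# Same percentiles from the standard normal: mu + sigma * z_alpha
z_alpha <- qnorm(1 - alpha)
rescaled <- mu + sigma * z_alpha

print(rbind(x_alpha, rescaled))  # about 0.598 and 0.402
```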
Example: Tire Tread Thickness
A machine manufactures tires with an initial tread thickness that is normally distributed with mean 10 mm and standard deviation 2 mm. The tire has a 50,000-mile warranty. In order to last for 50,000 miles the tread thickness must initially be at least 7.9 mm. If the initial thickness of tread is measured to be less than 7.9 mm, then the tire is sold as an alternative brand with a warranty of less than 50,000 miles.
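Let X be the initial tread thickness, so \(X\sim N(10,2^2)\). The probability that a tire is sold as the alternative brand is
\[ P(X<7.9)=P\left(Z<{7.9-10\over2}\right)=P(Z<-1.05)= \]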
pnorm(7.9,10,2)
## [1] 0.1468591
The 30th percentile of the initial tread thickness (30% of tires start thinner than this value):
qnorm(0.3,10,2)
## [1] 8.951199
A QQ (quantile-quantile) plot compares empirical (sample) quantiles against theoretical quantiles. It is often used to check a distributional assumption.
The most commonly used QQ plot checks normality; it is also referred to as a normal plot or normal probability plot.
1. Standardize the data to find the z-scores \(z_i\).
2. Rank the z-scores from smallest to largest, and use the rank k of each point to calculate its empirical quantile, \(k/(n+1)\) or \((k-0.375)/(n+1-0.75)\).
3. Find the z-value from the normal table corresponding to the empirical quantile of each point: \(\tilde z_i=\Phi^{-1}(k/(n+1))\).
4. Plot the z-scores from step 1 against these z-values (x-axis: \(\tilde z_i\), y-axis: \(z_i\)).
One may replace Φ in step 3 with the cdf of any specified distribution to check that distributional assumption.
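The steps above can be sketched by hand in R. `manual_qq` and its `qfun` argument are hypothetical names for this illustration; `qfun` is the inverse cdf used in step 3 (`qnorm` by default, swappable per the remark about other distributions). The sketch uses the \((k-0.375)/(n+1-0.75)\) positions; for n ≤ 10, R's `qqnorm` uses these same positions via `ppoints`, so the two agree on the small made-up sample below.

```r
# Manual normal (QQ) plot following steps 1-4.
# qfun: quantile function (inverse cdf) used in step 3; qnorm by default.
manual_qq <- function(x, qfun = qnorm) {
  n <- length(x)
  z <- (x - mean(x)) / sd(x)            # step 1: z-scores
  k <- rank(z, ties.method = "first")   # step 2: rank of each point
  p <- (k - 0.375) / (n + 1 - 0.75)     # empirical quantile (second formula)
  z_tilde <- qfun(p)                    # step 3: theoretical quantiles
  list(theoretical = z_tilde, sample = z)
}

x <- c(4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 7.1, 3.3)  # small illustrative sample
qq <- manual_qq(x)

# step 4: sample z-scores (y-axis) against theoretical z-values (x-axis)
plot(qq$theoretical, qq$sample,
     xlab = "Theoretical quantiles", ylab = "Sample z-scores")
abline(0, 1)  # reference line for the standard normal case

# Same theoretical quantiles as R's qqnorm for this n
print(all.equal(qq$theoretical, qqnorm(x, plot.it = FALSE)$x))
```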
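x <- rnorm(50)  # illustrative data; replace with your own sample
qqnorm(x)  # sample quantiles vs. standard normal quantiles
qqline(x)  # reference line through the first and third quartiles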