A binomial experiment consists of n independent Bernoulli trials, each with probability of success p.
The binomial random variable Y is the number of successes in the n Bernoulli trials.
The parameters of a binomial random variable are n and p.
Notation: \(Y\sim Bin(n,p)\).
The PMF:
\[ p(y)=P(Y=y)=\begin{pmatrix}n\\y\end{pmatrix}p^y(1-p)^{n-y},~~~y=0,1,2,\ldots,n \]
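As a quick check of the PMF, the formula can be written out by hand and compared with R's built-in `dbinom()`. This is only a sketch: the helper name `binom_pmf` and the values n = 10, p = 0.7 are illustrative assumptions, not part of the notes.

```r
# Hypothetical helper (not from the notes): the binomial PMF written out from the formula.
binom_pmf <- function(y, n, p) {
  choose(n, y) * p^y * (1 - p)^(n - y)
}

n <- 10   # illustrative number of trials
p <- 0.7  # illustrative probability of success
y <- 0:n  # the full support of Y

# The hand-written PMF should agree with dbinom() at every point of the support.
all.equal(binom_pmf(y, n, p), dbinom(y, n, p))

# A valid PMF sums to 1 over its support.
sum(binom_pmf(y, n, p))
```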
Example: Suppose 70% of all purchases in a certain store are made with a credit card. Let Y denote the number of credit card purchases among the next 10 purchases. What is \(P(5\leq Y \leq 8)\)?
\[ Y\sim Bin(10,0.7) \]
p=0.7
n=10
result=0
for (y in 5:8){
# R will compute the PMF, P(Y=y):
result=result+dbinom(y,n,p)
}
result
## [1] 0.8033427
# R will also compute the CDF, P(Y<=y):
pbinom(8,n,p)-pbinom(4,n,p)
## [1] 0.8033427
The expected value and variance of \(Y\sim Bin(n,p)\) are
\[ E(Y)=np~~~~~~~\sigma_Y^2=np(1-p) \]
Example: What is the expected number of credit card purchases in the next 10 purchases? What is the variance?
\[ E(Y)=n\times p=10\times 0.7=7\\ Var(Y)=10\times 0.7\times(1-0.7)=2.1 \]
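The same two values can be recovered directly from the PMF, which is a useful sanity check on the closed-form formulas. A minimal sketch, with variable names (`pmf`, `mean_y`, `var_y`) chosen here for illustration:

```r
n <- 10
p <- 0.7
y <- 0:n
pmf <- dbinom(y, n, p)

# E(Y) is the probability-weighted sum of the possible values.
mean_y <- sum(y * pmf)

# Var(Y) is the probability-weighted sum of squared deviations from the mean.
var_y <- sum((y - mean_y)^2 * pmf)

# These should match n*p and n*p*(1-p) up to floating-point error.
c(mean = mean_y, variance = var_y)
c(mean = n * p, variance = n * p * (1 - p))
```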
The geometric distribution is also constructed from Bernoulli trials.
In a geometric experiment, independent Bernoulli trials, each with probability of success p, are performed until the first success occurs.
The geometric random variable X is the number of trials up to and including the first success.
Notation: \(X\sim Geo(p)\)
The PMF:
\[ p(x)=P(X=x)=(1-p)^{x-1}p,~~~~~x=1,2,3,\ldots \]
The factor \((1-p)^{x-1}\) is the probability that the first x-1 trials are all failures.
The CDF: \[ F(x)=P(X\le x)=1-(1-p)^x~~~~~x=1,2,3,... \]
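The CDF can be justified in one step: the event \(X>x\) occurs exactly when the first x trials are all failures, so
\[ P(X>x)=(1-p)^x~~~\Rightarrow~~~F(x)=1-P(X>x)=1-(1-p)^x \]
Summing the PMF as a finite geometric series gives the same result:
\[ \sum_{k=1}^{x}(1-p)^{k-1}p=p\cdot{1-(1-p)^x\over 1-(1-p)}=1-(1-p)^x \]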
The expected value and variance of \(X\sim Geo(p)\) are
\[ E(X)={1\over p}~~~~~~~\sigma_X^2={1-p \over p^2} \]
Example: Suppose you need to find a store that carries a special printer ink. You know that of the stores that carry printer ink, 15% of them carry the special ink. You randomly call each store until one has the ink you need.
Let X denote the number of calls until you find the ink.
\[ X\sim Geo(0.15)\\ P(X=3)=(1-0.15)^2(0.15)=0.108 \]
\[ P(X\le 3)=1-(1-0.15)^3=0.386 \]
\[ E(X)={1\over0.15}=6.67\\ \sigma_X^2={1-0.15\over 0.15^2}=37.78 \]
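The printer ink example can also be reproduced in R. One caveat: `dgeom()` and `pgeom()` are parameterized by the number of failures before the first success, not the total number of trials, so x calls correspond to x - 1 failures. A sketch under that convention:

```r
p <- 0.15
x <- 3  # total number of calls, including the successful one

# R's geometric functions count failures before the first success,
# so pass x - 1 rather than x.
dgeom(x - 1, p)  # P(X = 3)
pgeom(x - 1, p)  # P(X <= 3)

# The same quantities from the PMF and CDF formulas.
(1 - p)^(x - 1) * p
1 - (1 - p)^x

# Mean and variance of the number of calls.
c(mean = 1 / p, variance = (1 - p) / p^2)
```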
In a negative binomial experiment, independent Bernoulli trials, each with probability of success p, are performed until the rth success occurs.
The negative binomial random variable Y is the total number of trials up to and including the rth success.
Notation: \(Y\sim NB(r,p)\)
The sample space of Y is \(S_Y=\{r,r+1,r+2,\ldots\}\).
The PMF:
\[ p(y)=P(Y=y)=\begin{pmatrix}y-1\\r-1\end{pmatrix}p^r(1-p)^{y-r} \]
Example: An oil company conducts a geological study that indicates that an exploratory oil well should have a 20% chance of striking oil.
Let X denote the number of wells drilled up to and including the first strike.
\[ X\sim Geo(0.2)\\ P(X=3)=(1-0.2)^2(0.2)=0.128 \]
Let Y denote the number of wells drilled up to and including the third strike.
\[ Y\sim NB(3,0.2)\\ P(Y=7)=\begin{pmatrix}6\\2\end{pmatrix}0.2^3(1-0.2)^{7-3}=0.049 \]
y=7
r=3
# y-r is the number of failures before the rth success;
# dnbinom() takes this failure count, not the total number of trials
p=0.2
dnbinom(y-r,r,p)
## [1] 0.049152
The expected value and variance of \(Y\sim NB(r,p)\) are
\[ E(Y)={r\over p}~~~~~~~\sigma_Y^2={r(1-p)\over p^2} \]
\[ \mu_Y=E(Y)={3\over 0.2}=15 \]
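For completeness, the variance for the same oil well example (r = 3, p = 0.2) follows from the variance formula in the same way:
\[ \sigma_Y^2={3(1-0.2)\over 0.2^2}={2.4\over 0.04}=60 \]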
The negative binomial distribution is an extension of the geometric distribution: \(Geo(p)=NB(1,p)\).
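This relationship is easy to check numerically. The sketch below compares the two PMFs in R, where both `dnbinom()` and `dgeom()` count failures before the final success; the value p = 0.2 and the range 0 to 20 are arbitrary choices for illustration.

```r
p <- 0.2
failures <- 0:20  # number of failures before the first success

# With size = 1 the negative binomial PMF should coincide with the geometric PMF.
all.equal(dnbinom(failures, size = 1, prob = p), dgeom(failures, prob = p))
```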
Suppose a population consists of \(M_1\) objects labeled 1 (the number of failures) and \(M_2\) objects labeled 0 (the number of successes), and that a sample of size n is selected at random without replacement.
The hypergeometric random variable X is the number of objects labeled 1 in the sample.
Notation: \(X\sim Hyp(M_1,M_2,n)\)
The PMF:
\[ p(x)=P(X=x)={\begin{pmatrix}M_1\\x\end{pmatrix}\begin{pmatrix}M_2\\n-x\end{pmatrix}\over\begin{pmatrix}M_1+M_2\\n\end{pmatrix}} \]
\(\begin{pmatrix}M_1\\x\end{pmatrix}\) is the number of ways of choosing x failures from the \(M_1\) failures in the population.
\(\begin{pmatrix}M_2\\n-x\end{pmatrix}\) is the number of ways of choosing n-x successes from the \(M_2\) successes in the population.
\(\begin{pmatrix}M_1+M_2\\n\end{pmatrix}\) is the number of ways of choosing n items from the \(M_1+M_2\) items in the population.
The sample space of X:
\[ S_X=\{\max(0,n-M_2),\ldots,\min(n,M_1)\} \]
Example: A crate contains 50 light bulbs, of which 5 are defective and 45 are not. A quality control inspector randomly samples 4 bulbs without replacement. Let X be the number of defective bulbs in the sample.
\[ X\sim Hyp(M_1,M_2,n),~~M_1=5,~M_2=45,~n=4\\ S_X=\{0,1,2,3,4\},~~0=\max(0,4-45),~~4=\min(4,5) \]
Find the probability that fewer than 2 bulbs are defective.
\[ P(X=x)={\begin{pmatrix}5\\x\end{pmatrix}\begin{pmatrix}45\\4-x\end{pmatrix}\over\begin{pmatrix}50\\4\end{pmatrix}},~~~x=0,1,2,3,4\\ P(X<2)=P(X=0)+P(X=1)=P(X\leq 1) \]
dhyper(0,5,45,4)+dhyper(1,5,45,4)
## [1] 0.9550369
phyper(1,5,45,4)
## [1] 0.9550369
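To connect the R call back to the counting argument, the PMF can also be built from `choose()` and compared with `dhyper()`. This is a sketch only; the variable names are illustrative, and the argument order `dhyper(x, M1, M2, n)` mirrors the calls used in the example.

```r
M1 <- 5   # defective bulbs in the crate
M2 <- 45  # non-defective bulbs in the crate
n  <- 4   # sample size
x  <- 0:4 # possible numbers of defective bulbs in the sample

# PMF from the counting formula: ways to pick x defective and n - x good bulbs,
# divided by the number of ways to pick any n bulbs.
pmf_formula <- choose(M1, x) * choose(M2, n - x) / choose(M1 + M2, n)

# Should agree with the built-in hypergeometric PMF.
all.equal(pmf_formula, dhyper(x, M1, M2, n))

# P(X < 2) from the hand-built PMF.
sum(pmf_formula[x < 2])
```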
The expected value and variance of \(X\sim Hyp(M_1,M_2,n)\) are
\[ E(X)={nM_1\over N}~~~~~~~\sigma_X^2={nM_1\over N}\left(1-{M_1\over N}\right)\left({N-n\over N-1}\right) \]
where \(N=M_1+M_2\).
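Applied to the light bulb example (5 defective, 45 non-defective, sample of 4), the mean and variance can be computed from the closed-form expressions and cross-checked against the PMF. A minimal sketch, with variable names chosen here for illustration:

```r
M1 <- 5
M2 <- 45
n  <- 4
N  <- M1 + M2

# Closed-form mean and variance, including the finite population correction (N - n)/(N - 1).
mean_x <- n * M1 / N
var_x  <- n * (M1 / N) * (1 - M1 / N) * ((N - n) / (N - 1))

# Cross-check by summing over the PMF.
x   <- 0:n
pmf <- dhyper(x, M1, M2, n)
mean_pmf <- sum(x * pmf)
var_pmf  <- sum((x - mean_pmf)^2 * pmf)

c(mean = mean_x, variance = var_x)
c(mean = mean_pmf, variance = var_pmf)
```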