The mean square error (MSE) of an estimator \(\hat \theta\) for \(\theta\) is
\[ \begin{aligned} MSE(\hat \theta)&=E_\theta[(\hat \theta - \theta)^2]\\ &=E[(\hat \theta-E(\hat \theta)+E(\hat \theta)-\theta)^2]\\ &=E[(\hat\theta-E(\hat \theta))^2]+(E(\hat\theta)-\theta)^2+2\,E[\hat\theta-E(\hat \theta)]\,(E(\hat\theta)-\theta)\\ &=Var_\theta(\hat \theta)+[bias(\hat \theta)]^2 \end{aligned} \]
The cross term drops out because \(E[\hat\theta-E(\hat \theta)]=0\) and \(E(\hat\theta)-\theta\) is a constant.
Note: This decomposition of the MSE into variance plus squared bias is known as the bias-variance decomposition:
\[ MSE(\hat\theta)=Var_\theta(\hat \theta)+[bias(\hat \theta)]^2 \]
For unbiased estimators, \(bias(\hat\theta)=0\), so \(MSE(\hat\theta)=Var_\theta(\hat\theta)\); among unbiased estimators, the preferred one is the one with the smallest variance.
More generally, the estimator with the smaller MSE is preferred, even if it has larger bias.
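As a quick numerical illustration (a minimal sketch, not part of the notes: the parameter values, seed, and the shrunken estimator \(0.9\bar X\) are made up for demonstration), the decomposition can be checked by simulation:

```python
import numpy as np

# Illustrative sketch: check MSE = Var + bias^2 by simulation.
rng = np.random.default_rng(0)
theta, n, reps = 5.0, 20, 100_000

# Draw many samples; compare the sample mean (unbiased) with a
# shrunken estimator 0.9 * mean (biased) as an example of the trade-off.
samples = rng.normal(loc=theta, scale=2.0, size=(reps, n))
for name, est in [("sample mean", samples.mean(axis=1)),
                  ("0.9 * sample mean", 0.9 * samples.mean(axis=1))]:
    mse = np.mean((est - theta) ** 2)
    var = np.var(est)
    bias = np.mean(est) - theta
    print(f"{name}: MSE = {mse:.4f}, Var + bias^2 = {var + bias**2:.4f}")
```

The two printed quantities agree for each estimator, which is exactly the decomposition above.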
Method of moments estimation is a model-based estimation technique.
It equates theoretical moments with sample moments.
This yields a set of equations to solve for the estimators.
The kth theoretical moment:
\[ \mu_k=E(X^k) \]
The kth sample moment:
\[ \hat \mu_k={1\over n}\sum^n_{i=1}{X_i^k} \]
Example: Let \(X_1,\ldots,X_n\) be iid \(Poisson(\lambda)\). Find the method of moments estimator of \(\lambda\).
Since there is one parameter (\(k=1\)), we only need the first moment.
\(\mu_1=E(X_i)=\lambda\) for \(i=1,\ldots,n\).
\(\hat \mu_1={1\over n}\sum^n_{i=1}{X_i}=\bar X\)
Setting \(\mu_1=\hat\mu_1\) gives \(\hat\lambda=\bar X\) as the method of moments estimator for \(\lambda\).
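A small simulation sketch of this estimator (the true \(\lambda\), seed, and sample size below are made-up values for illustration):

```python
import numpy as np

# Illustrative sketch: the method of moments estimator of lambda for iid
# Poisson data is the first sample moment, i.e. the sample mean.
rng = np.random.default_rng(1)
lam_true = 3.0                           # hypothetical true parameter
x = rng.poisson(lam=lam_true, size=500)  # simulated sample
lam_hat = x.mean()                       # mu_1-hat = X-bar
print(f"method of moments estimate: {lam_hat:.3f} (true lambda = {lam_true})")
```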
Example: Let \(X_1,\ldots,X_n\) be iid from the uniform distribution on \([\alpha, \beta]\). Find the method of moments estimators of \(\alpha\) and \(\beta\) using
\[ \mu_1 = E(X_i)={\alpha+\beta\over 2}\\ \mu_2=E(X_i^2)=Var(X_i)+[E(X_i)]^2={(\beta-\alpha)^2\over 12}+\left({\alpha+\beta\over 2}\right)^2 \]
Set \(\hat \mu_1={1\over n}\sum^n_{i=1}{X_i}\) and \(\hat \mu_2={1\over n}\sum^n_{i=1}{X_i^2}\), then solve the two equations \(\mu_1=\hat\mu_1\) and \(\mu_2=\hat\mu_2\) for \(\alpha\) and \(\beta\).
Note: The textbook uses a slightly different approach for the second sample moment.
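A sketch of solving the two moment equations (using the closed-form solution \(\hat\alpha,\hat\beta=\hat\mu_1\mp\sqrt{3(\hat\mu_2-\hat\mu_1^2)}\); the data-generating values below are made up for illustration):

```python
import numpy as np

# Illustrative sketch: method of moments for Uniform[alpha, beta].
# Solving mu_1 = (alpha+beta)/2 and mu_2 - mu_1^2 = (beta-alpha)^2/12
# gives alpha, beta = mu_1 -/+ sqrt(3 * (mu_2 - mu_1^2)).
rng = np.random.default_rng(2)
x = rng.uniform(low=2.0, high=7.0, size=1000)  # hypothetical sample

mu1_hat = np.mean(x)         # first sample moment
mu2_hat = np.mean(x ** 2)    # second sample moment
half_width = np.sqrt(3 * (mu2_hat - mu1_hat ** 2))
alpha_hat = mu1_hat - half_width
beta_hat = mu1_hat + half_width
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```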
Maximum likelihood (ML) estimation is a model-based estimation technique.
It answers the question “what value of the parameter is most likely to have generated the data?”
The estimate maximizes the probability (or density) of the observed data.
Suppose \(X_1, \ldots , X_n\) are iid random variables with PDF \(f(x_i| \theta)\) or PMF \(p(x_i| \theta)\).
The likelihood function is
\[ lik(\theta)=\prod^n_{i=1}f(x_i|\theta)\quad\text{or}\quad lik(\theta)=\prod^n_{i=1}p(x_i|\theta) \]
The likelihood is the joint PDF (or PMF) of \(X_1, \ldots , X_n\) evaluated at the observed values \(x_1, \ldots , x_n\), treated as a function of \(\theta\).
The value of \(\theta\) that maximizes \(lik(\theta)\) is the maximum likelihood estimator (MLE) \(\hat \theta\).
It is often easier to maximize the log of the likelihood function.
\[ \mathcal{L} (\theta)=log[lik(\theta)]=\sum^n_{i=1}log[f(x_i|\theta)] \]
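In simple models the maximizer has a closed form, but in general the log-likelihood can be maximized numerically. A hedged sketch (assuming scipy is available; the Poisson model, seed, and data here are simulated purely for illustration):

```python
import numpy as np
from scipy import optimize, stats

# Illustrative sketch: maximize a log-likelihood numerically by minimizing
# its negative. Here the model is Poisson(lambda) with simulated data.
rng = np.random.default_rng(3)
x = rng.poisson(lam=4.0, size=200)

def neg_log_lik(lam):
    return -np.sum(stats.poisson.logpmf(x, mu=lam))

result = optimize.minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print(f"numerical MLE: {result.x:.4f}, sample mean: {x.mean():.4f}")
```

For the Poisson model the MLE is the sample mean, so the numerical maximizer should agree with it up to optimizer tolerance.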
Example: Let \(X_1, \ldots , X_n\) be iid \(N(\theta,1)\). Find the MLE of \(\theta\).
Recall: \(f(x_i|\theta)={1\over \sqrt{2\pi\sigma^2}}exp\{-{1\over 2\sigma^2}(x_i-\theta)^2\}\)
With \(\sigma^2=1\), this becomes \(f(x_i|\theta)={1\over \sqrt{2\pi}}exp\{-{1\over 2}(x_i-\theta)^2\}\).
\[ log[f(x_i|\theta)]=-{1\over2}(x_i-\theta)^2+log\left({1\over \sqrt{2\pi}}\right) \]
so
\[ \mathcal{L}(\theta)=\sum^n_{i=1}log[f(x_i|\theta)]=-{1\over2}\sum^n_{i=1}(x_i-\theta)^2+n\,log\left({1\over \sqrt{2\pi}}\right) \]
\[ {\partial \mathcal{L}(\theta)\over\partial\theta}=-{1\over2}\sum^n_{i=1}2(x_i-\theta)(-1)=\sum^n_{i=1}x_i-n\theta \]
Setting this derivative to zero gives
\[ \hat \theta={1\over n}\sum^n_{i=1}x_i \]
Since
\[ {\partial^2 \mathcal{L}(\theta)\over\partial\theta^2}=-n<0, \]
this value maximizes the likelihood function. Hence, \(\hat \theta=\bar X\) is the MLE.
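To connect the algebra with computation, a small sketch (simulated data with a made-up true \(\theta\); the grid search is purely for illustration) evaluating the \(N(\theta,1)\) log-likelihood over a grid and confirming the maximizer is the sample mean:

```python
import numpy as np
from scipy import stats

# Illustrative sketch: evaluate the N(theta, 1) log-likelihood on a grid of
# theta values and check that it peaks at the sample mean.
rng = np.random.default_rng(4)
x = rng.normal(loc=2.5, scale=1.0, size=100)   # simulated data, sigma = 1

grid = np.linspace(x.mean() - 2, x.mean() + 2, 4001)
log_lik = np.array([np.sum(stats.norm.logpdf(x, loc=t, scale=1.0)) for t in grid])

print(f"grid maximizer: {grid[np.argmax(log_lik)]:.4f}")
print(f"sample mean:    {x.mean():.4f}")
```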