The mean square error (MSE) of an estimator \(\hat \theta\) for \(\theta\) is
\[ \begin{aligned} MSE(\hat \theta)&=E_\theta[(\hat \theta - \theta)^2]\\ &=E[(\hat \theta-E(\hat \theta)+E(\hat \theta)-\theta)^2]\\ &=E[(\hat\theta-E(\hat \theta))^2]+(E(\hat\theta)-\theta)^2+2\,E[\hat\theta-E(\hat \theta)]\,(E(\hat\theta)-\theta)\\ &=Var_\theta(\hat \theta)+[bias(\hat \theta)]^2 \end{aligned} \]
The cross term drops out because \(E[\hat\theta-E(\hat \theta)]=0\) and \(E(\hat\theta)-\theta\) is a constant.
Note: This decomposition of the MSE into variance plus squared bias is known as the bias-variance decomposition:
\[ MSE(\hat\theta)=Var_\theta(\hat \theta)+[bias(\hat \theta)]^2 \]
For unbiased estimators, \(bias(\hat\theta)=0\), so \(MSE(\hat\theta)=Var_\theta(\hat\theta)\); among unbiased estimators, the preferred one is the one with the smallest variance.
More generally, the estimator with the smaller MSE is preferred, even if it has larger bias.
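As a quick numerical illustration (a minimal sketch, not part of the notes: the parameter values, seed, and the shrunken estimator \(0.9\bar X\) are made up for demonstration), the decomposition can be checked by simulation:

```python
import numpy as np

# Illustrative sketch: check MSE = Var + bias^2 by simulation.
rng = np.random.default_rng(0)
theta, n, reps = 5.0, 20, 100_000

# Draw many samples; compare the sample mean (unbiased) with a
# shrunken estimator 0.9 * mean (biased) as an example of the trade-off.
samples = rng.normal(loc=theta, scale=2.0, size=(reps, n))
for name, est in [("sample mean", samples.mean(axis=1)),
                  ("0.9 * sample mean", 0.9 * samples.mean(axis=1))]:
    mse = np.mean((est - theta) ** 2)
    var = np.var(est)
    bias = np.mean(est) - theta
    print(f"{name}: MSE = {mse:.4f}, Var + bias^2 = {var + bias**2:.4f}")
```

The two printed quantities agree for each estimator, which is exactly the decomposition above.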
Method of moments estimation is a model-based estimation technique.
It equates theoretical moments with sample moments.
This yields a set of equations to solve for the estimators.
The kth theoretical moment:
\[ \mu_k=E(X^k) \]
The kth sample moment:
\[ \hat \mu_k={1\over n}\sum^n_{i=1}{X_i^k} \]
Example: Let \(X_1,\ldots,X_n\) be iid \(Poisson(\lambda)\). Find the method of moments estimator of \(\lambda\).
Since there is one parameter (\(k=1\)), we only need the first moment.
\(\mu_1=E(X_i)=\lambda\) for \(i=1,\ldots,n\).
\(\hat \mu_1={1\over n}\sum^n_{i=1}{X_i}=\bar X\)
Setting \(\mu_1=\hat\mu_1\) gives \(\hat\lambda=\bar X\) as the method of moments estimator for \(\lambda\).
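A small simulation sketch of this estimator (the true \(\lambda\), seed, and sample size below are made-up values for illustration):

```python
import numpy as np

# Illustrative sketch: the method of moments estimator of lambda for iid
# Poisson data is the first sample moment, i.e. the sample mean.
rng = np.random.default_rng(1)
lam_true = 3.0                           # hypothetical true parameter
x = rng.poisson(lam=lam_true, size=500)  # simulated sample
lam_hat = x.mean()                       # mu_1-hat = X-bar
print(f"method of moments estimate: {lam_hat:.3f} (true lambda = {lam_true})")
```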
Example: Let \(X_1,\ldots,X_n\) be iid from the uniform distribution on \([\alpha, \beta]\). Find the method of moments estimators of \(\alpha\) and \(\beta\) using
\[ \mu_1 = E(X_i)={\alpha+\beta\over 2}\\ \mu_2=E(X_i^2)=Var(X_i)+[E(X_i)]^2={(\beta-\alpha)^2\over 12}+\left({\alpha+\beta\over 2}\right)^2 \]
Set \(\hat \mu_1={1\over n}\sum^n_{i=1}{X_i}\) and \(\hat \mu_2={1\over n}\sum^n_{i=1}{X_i^2}\), then solve the two equations \(\mu_1=\hat\mu_1\) and \(\mu_2=\hat\mu_2\) for \(\alpha\) and \(\beta\).
Note: The textbook uses a slightly different approach for the second sample moment.
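A sketch of solving the two moment equations (using the closed-form solution \(\hat\alpha,\hat\beta=\hat\mu_1\mp\sqrt{3(\hat\mu_2-\hat\mu_1^2)}\); the data-generating values below are made up for illustration):

```python
import numpy as np

# Illustrative sketch: method of moments for Uniform[alpha, beta].
# Solving mu_1 = (alpha+beta)/2 and mu_2 - mu_1^2 = (beta-alpha)^2/12
# gives alpha, beta = mu_1 -/+ sqrt(3 * (mu_2 - mu_1^2)).
rng = np.random.default_rng(2)
x = rng.uniform(low=2.0, high=7.0, size=1000)  # hypothetical sample

mu1_hat = np.mean(x)         # first sample moment
mu2_hat = np.mean(x ** 2)    # second sample moment
half_width = np.sqrt(3 * (mu2_hat - mu1_hat ** 2))
alpha_hat = mu1_hat - half_width
beta_hat = mu1_hat + half_width
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```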
Maximum likelihood (ML) estimation is a model-based estimation technique.
It answers the question “what value of the parameter is most likely to have generated the data?”
The estimate maximizes the probability (or density) of the observed data.
Suppose \(X_1, \ldots , X_n\) are iid random variables with PDF \(f(x_i| \theta)\) or PMF \(p(x_i| \theta)\).
The likelihood function is
\[ lik(\theta)=\prod^n_{i=1}f(x_i|\theta)\quad\text{or}\quad lik(\theta)=\prod^n_{i=1}p(x_i|\theta) \]
The likelihood is the joint PDF (or PMF) of \(X_1, \ldots , X_n\) evaluated at the observed values \(x_1, \ldots , x_n\), treated as a function of \(\theta\).
The value of \(\theta\) that maximizes \(lik(\theta)\) is the maximum likelihood estimator (MLE) \(\hat \theta\).
It is often easier to maximize the log of the likelihood function.
\[ \mathcal{L} (\theta)=log[lik(\theta)]=\sum^n_{i=1}log[f(x_i|\theta)] \]
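In simple models the maximizer has a closed form, but in general the log-likelihood can be maximized numerically. A hedged sketch (assuming scipy is available; the Poisson model, seed, and data here are simulated purely for illustration):

```python
import numpy as np
from scipy import optimize, stats

# Illustrative sketch: maximize a log-likelihood numerically by minimizing
# its negative. Here the model is Poisson(lambda) with simulated data.
rng = np.random.default_rng(3)
x = rng.poisson(lam=4.0, size=200)

def neg_log_lik(lam):
    return -np.sum(stats.poisson.logpmf(x, mu=lam))

result = optimize.minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print(f"numerical MLE: {result.x:.4f}, sample mean: {x.mean():.4f}")
```

For the Poisson model the MLE is the sample mean, so the numerical maximizer should agree with it up to optimizer tolerance.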
Example: Let \(X_1, \ldots , X_n\) be iid \(N(\theta,1)\). Find the MLE of \(\theta\).
Recall: \(f(x_i|\theta)={1\over \sqrt{2\pi\sigma^2}}exp\{-{1\over 2\sigma^2}(x_i-\theta)^2\}\)
With \(\sigma^2=1\), this becomes \(f(x_i|\theta)={1\over \sqrt{2\pi}}exp\{-{1\over 2}(x_i-\theta)^2\}\).
\[ log[f(x_i|\theta)]=-{1\over2}(x_i-\theta)^2+log\left({1\over \sqrt{2\pi}}\right) \]
so
\[ \mathcal{L}(\theta)=\sum^n_{i=1}log[f(x_i|\theta)]=-{1\over2}\sum^n_{i=1}(x_i-\theta)^2+n\,log\left({1\over \sqrt{2\pi}}\right) \]
\[ {\partial \mathcal{L}(\theta)\over\partial\theta}=-{1\over2}\sum^n_{i=1}2(x_i-\theta)(-1)=\sum^n_{i=1}x_i-n\theta \]
Setting this derivative to zero gives
\[ \hat \theta={1\over n}\sum^n_{i=1}x_i \]
Since
\[ {\partial^2 \mathcal{L}(\theta)\over\partial\theta^2}=-n<0, \]
this value maximizes the likelihood function. Hence, \(\hat \theta=\bar X\) is the MLE.
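To connect the algebra with computation, a small sketch (simulated data with a made-up true \(\theta\); the grid search is purely for illustration) evaluating the \(N(\theta,1)\) log-likelihood over a grid and confirming the maximizer is the sample mean:

```python
import numpy as np
from scipy import stats

# Illustrative sketch: evaluate the N(theta, 1) log-likelihood on a grid of
# theta values and check that it peaks at the sample mean.
rng = np.random.default_rng(4)
x = rng.normal(loc=2.5, scale=1.0, size=100)   # simulated data, sigma = 1

grid = np.linspace(x.mean() - 2, x.mean() + 2, 4001)
log_lik = np.array([np.sum(stats.norm.logpdf(x, loc=t, scale=1.0)) for t in grid])

print(f"grid maximizer: {grid[np.argmax(log_lik)]:.4f}")
print(f"sample mean:    {x.mean():.4f}")
```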