Example from previous week.
Example: Let \(X_1, X_2\) be iid \(N(0, 1)\) and let \(Y = 4X_1 + X_2\).
\[ \begin{aligned} Var(Y)&=Var(4X_1+X_2)\\ &=Var(4X_1)+Var(X_2)+2Cov(4X_1,X_2)\\ &=4^2Var(X_1)+Var(X_2)+0\quad\text{(by independence)}\\ &=4^2\times 1 + 1+0\\ &=17 \end{aligned} \]
\[ \begin{aligned} Cov(X_1,Y)&=Cov(X_1,4X_1+X_2)\\ &=Cov(X_1,4X_1)+Cov(X_1,X_2)\\ &=4Cov(X_1,X_1)+Cov(X_1,X_2)\\ &=4Var(X_1)+0\\ &=4+0\\ &=4 \end{aligned} \]
\[ \begin{aligned} \sigma(X_1)&=1,\qquad \sigma(Y)=\sqrt{Var(Y)}=\sqrt{17}\\ Corr(X_1,Y)&={Cov(X_1,Y)\over \sigma(X_1)\sigma(Y)}={4\over 1\times\sqrt{17}}\approx 0.97 \end{aligned} \]
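The three results above can be checked by simulation. The following is a minimal sketch in R; the seed and the sample size `n` are arbitrary choices that are not part of the notes, and the sample estimates will only approximate 17, 4 and 0.97.

```r
# Simulation check of the worked example (seed and n are arbitrary assumptions)
set.seed(1)
n  <- 1e5
x1 <- rnorm(n)        # X1 ~ N(0, 1)
x2 <- rnorm(n)        # X2 ~ N(0, 1), generated independently of X1
y  <- 4 * x1 + x2     # Y = 4*X1 + X2

var(y)                # should be close to 17
cov(x1, y)            # should be close to 4
cor(x1, y)            # should be close to 4 / sqrt(17)
4 / sqrt(17)          # theoretical correlation, about 0.9701
```

Increasing `n` should bring the sample values closer to the theoretical ones.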
Note: Correlation and covariance only measure linear dependence. Independence implies \(Corr(X,Y)=0\), but \(Corr(X,Y)=0\) doesn’t imply that X and Y are independent.
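A small sketch in R may make this concrete. The data here are chosen purely for illustration and are not from the notes: `y` is completely determined by `x` through \(y = x^2\), yet the sample correlation is zero because the relationship has no linear component.

```r
# Illustrative data (an assumption, not from the notes): y is a function of x
x <- -3:3
y <- x^2              # perfect, but non-linear, dependence

cov(x, y)             # 0: no linear association
cor(x, y)             # 0, even though y is fully determined by x
```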
\(Corr(X,Y)\) is also written as \(\rho(X,Y)\) and is called the Pearson correlation coefficient.
If \((X_1,Y_1),\ldots,(X_n,Y_n)\) is a sample from the bivariate distribution of \((X,Y)\), the sample covariance \(S_{X,Y}\) and the sample correlation coefficient \(r_{X,Y}\) are
\[ S_{X,Y}={1\over n-1}\sum_{i=1}^n(X_i-\bar X)(Y_i-\bar Y) \]
\[ r_{X,Y}={S_{X,Y}\over S_X S_Y} \]
where \(\bar X\) and \(\bar Y\) are the sample means of X and Y, and \(S_X\) and \(S_Y\) are the sample standard deviations of X and Y.
In R, the sample covariance is given by cov(x,y).
In R, the sample correlation coefficient is given by cor(x,y).
Example: Consider the data set below.
x | 10 11 18 16 21
y |  9  7  4 11  8
x <- c(10, 11, 18, 16, 21)
y <- c(9, 7, 4, 11, 8)
cov(x, y)  # sample covariance
## [1] -2.45
cor(x, y)  # sample correlation coefficient
## [1] -0.2031884
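As a cross-check, the same values can be recomputed directly from the formulas for \(S_{X,Y}\) and \(r_{X,Y}\). This is a minimal sketch; the variable names `s_xy` and `r_xy` are introduced here only for illustration.

```r
x <- c(10, 11, 18, 16, 21)
y <- c(9, 7, 4, 11, 8)
n <- length(x)

# Sample covariance: sum of cross-products of deviations, divided by n - 1
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Sample correlation: covariance scaled by the two sample standard deviations
r_xy <- s_xy / (sd(x) * sd(y))

all.equal(s_xy, cov(x, y))   # TRUE
all.equal(r_xy, cor(x, y))   # TRUE
```

Here \(\bar X = 15.2\) and \(\bar Y = 7.8\), and the cross-products of the deviations sum to \(-9.8\), so \(S_{X,Y} = -9.8/4 = -2.45\), in agreement with `cov(x, y)`.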