extra office hour from undergraduates next week.
\[ \mu = {1\over{N}}\sum^n_{i=1}{v_i} \]
you may keep two digit for accuracy in homework.
\[ \bar{x} = {1\over{n}}\sum^n_{i=1}{x_i} \]
\[ \sigma^2={1\over{N}}\sum^N_{i=1}{(v_i-\mu)^2} \]
The square function is simpler than absolute value (square function is differentiable and looks nice.)
We can also demote the population variance as \(\sigma^2_X\) or Var(X), the variance of the random variable X.
The population standard deviation is the square root of the population variance.
\[ \sigma = \sqrt{\sigma^2} \]
unit of variance: (original unit)^2
unit of standard deviation: original unit
we use n-1 to make estimation unbiaseness of S^2 for ^2. (we don’t have full scope for the data anymore when we use sample to esitmate population)
if all the sample response are the same, S=0. (no variability in data)
Sample data
temperature<-c(1,6,3,4,5,2,7)
mean(temperature)
## [1] 4
var(temperature)
## [1] 4.666667
sd(temperature)
## [1] 2.160247
The \(x_{(i)}\) is the 100(\({i-0.5}\over n\))-th sample percentile. The 0.5 here is used to avoid 100 percentile.
sort(temperature)
## [1] 1 2 3 4 5 6 7
Some percentiles hare of particular interest:
quantile(temperature,0.25)
## 25%
## 2.5
quantile(temperature,0.75)
## 75%
## 5.5
# median
quantile(temperature,0.5)
## 50%
## 4
median(temperature)
## [1] 4
summary(temperature)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 2.5 4.0 4.0 5.5 7.0
sample average is sensitive for outliers, while the median is less sensitive to outliers.
The sample interquartile range (IQR) is an alternative measure of variability. \[ IQR=q_3-q_1 \] Boxplots
\[ max=min(q_3+1.5\times IQR,actual\ max) \\ min=max(q_1-1.5\times IQR,actual\ min) \] exceptions are called outliers.