Extra office hours from undergraduates next week.

Averages

Population mean of \(N\) values \(v_1,\dots,v_N\):

\[ \mu = {1\over{N}}\sum^N_{i=1}{v_i} \]

You may round to two decimal places in homework.

Sample mean of \(n\) observations \(x_1,\dots,x_n\):

\[ \bar{x} = {1\over{n}}\sum^n_{i=1}{x_i} \]
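A quick check of the sample-mean formula in R, reusing the temperature values from the R example in these notes. It simply confirms that \((1/n)\sum x_i\) and the built-in `mean()` agree.

```r
# Sample data (same values as the temperature example in these notes)
temperature <- c(1, 6, 3, 4, 5, 2, 7)

n <- length(temperature)
xbar <- sum(temperature) / n   # (1/n) * sum of x_i, here 28 / 7 = 4
xbar
mean(temperature)              # built-in sample mean, also 4
```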

Variance and standard deviation

\[ \sigma^2={1\over{N}}\sum^N_{i=1}{(v_i-\mu)^2} \]

Squared deviations are used instead of absolute deviations because the square function is easier to work with: it is differentiable everywhere and leads to clean algebra.

\[ \sigma = \sqrt{\sigma^2} \]
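The population formulas divide by \(N\). Base R has no built-in population variance (`var()` divides by \(n-1\)), so the sketch below uses a hypothetical helper `pop_var()` and treats the seven temperature values as if they were the entire population.

```r
# Hypothetical helper: population variance, dividing by N (not n - 1).
# Base R's var() uses n - 1, so it would not give sigma^2 here.
pop_var <- function(v) {
  mu <- mean(v)                  # population mean
  sum((v - mu)^2) / length(v)    # (1/N) * sum of squared deviations
}

v <- c(1, 6, 3, 4, 5, 2, 7)      # assume these 7 values are the whole population
sigma2 <- pop_var(v)             # 28 / 7 = 4   (unit: original unit squared)
sigma  <- sqrt(sigma2)           # 2            (unit: original unit)
```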

unit of variance: (original unit)^2
unit of standard deviation: original unit

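We divide by \(n-1\) instead of \(n\) in the sample variance

\[ S^2={1\over{n-1}}\sum^n_{i=1}{(x_i-\bar{x})^2} \]

so that \(S^2\) is an unbiased estimator of \(\sigma^2\). (With a sample we no longer see the whole population: the deviations are measured from \(\bar{x}\), which is computed from the same data, so dividing by \(n\) would underestimate \(\sigma^2\) on average.)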
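The unbiasedness claim can be checked by simulation. This sketch assumes a normal population with \(\sigma^2 = 4\) and samples of size \(n = 5\) (both arbitrary choices): averaging \(S^2\) over many samples lands near 4 with the \(n-1\) divisor, but near \(4(n-1)/n = 3.2\) with the \(n\) divisor.

```r
set.seed(1)      # reproducible simulation
n <- 5           # assumed small sample size

# Assumed population: N(0, sd = 2), so sigma^2 = 4.
# var() divides by n - 1, giving S^2 for each simulated sample.
s2_unbiased <- replicate(100000, var(rnorm(n, mean = 0, sd = 2)))

# The same statistic computed with divisor n instead of n - 1
s2_biased <- s2_unbiased * (n - 1) / n

mean(s2_unbiased)   # close to 4
mean(s2_biased)     # close to 3.2: too small on average
```

With a larger \(n\) the factor \((n-1)/n\) approaches 1, which is why the bias matters most for small samples.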

If all the sample responses are the same, \(S=0\) (there is no variability in the data).

In R

Sample data

temperature<-c(1,6,3,4,5,2,7)
mean(temperature)
## [1] 4
var(temperature)
## [1] 4.666667
sd(temperature)
## [1] 2.160247

Percentile

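Sort the data as \(x_{(1)}\le x_{(2)}\le\cdots\le x_{(n)}\). The \(i\)-th smallest value \(x_{(i)}\) is the \(100\left({{i-0.5}\over n}\right)\)-th sample percentile. The 0.5 is subtracted so that the largest observation is not assigned the 100th percentile (and the smallest is not assigned the 0th).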

In R

sort(temperature)
## [1] 1 2 3 4 5 6 7
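The percentile attached to each sorted value can be computed directly from \(100(i-0.5)/n\). As far as we know, R's `quantile()` with `type = 5` uses the same plotting positions \((i-0.5)/n\), so evaluating it at those probabilities should return the order statistics themselves; the last line is a sketch of that check.

```r
temperature <- c(1, 6, 3, 4, 5, 2, 7)
x_sorted <- sort(temperature)   # order statistics x_(1) <= ... <= x_(n)
n <- length(x_sorted)
i <- seq_len(n)

pct <- 100 * (i - 0.5) / n      # percentile assigned to x_(i)
data.frame(value = x_sorted, percentile = round(pct, 1))

# type = 5 interpolates using the (i - 0.5) / n rule,
# so at these probabilities it reproduces the sorted data
q5 <- quantile(temperature, probs = (i - 0.5) / n, type = 5)
```

For \(n=7\) the median \(x_{(4)}=4\) gets exactly the 50th percentile, while the smallest and largest values get about the 7th and 93rd.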

Some percentiles are of particular interest:

  • 50th percentile = median = \(\widetilde{x}\)
  • 25th percentile = lower sample quartile = \(q_1\)
  • 75th percentile = upper sample quartile = \(q_3\)

In R

quantile(temperature,0.25)
## 25% 
## 2.5
quantile(temperature,0.75)
## 75% 
## 5.5
# median
quantile(temperature,0.5)
## 50% 
##   4
median(temperature)
## [1] 4
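R's default `quantile()` (`type = 7`) interpolates differently from the \((i-0.5)/n\) percentile rule above, which corresponds to `type = 5`. For small samples the two conventions can give different quartiles, as this sketch on the temperature data shows; the median agrees here.

```r
temperature <- c(1, 6, 3, 4, 5, 2, 7)
probs <- c(0.25, 0.50, 0.75)

q_default <- quantile(temperature, probs)            # type = 7, R's default
q_type5   <- quantile(temperature, probs, type = 5)  # (i - 0.5) / n rule

q_default   # 2.5, 4, 5.5
q_type5     # 2.25, 4, 5.75
```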
  • The R command summary() gives the five-number summary \((\min, q_1, \widetilde{x}, q_3, \max)\) of the data, and also reports the mean:
summary(temperature)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     2.5     4.0     4.0     5.5     7.0

The sample mean is sensitive to outliers, while the median is much less sensitive to them.
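A small illustration, using hypothetical data in which the largest temperature 7 is mistyped as 70: one extreme value drags the mean from 4 to 13, while the median does not move.

```r
temperature  <- c(1, 6, 3, 4, 5, 2, 7)
with_outlier <- c(1, 6, 3, 4, 5, 2, 70)   # hypothetical: 7 mistyped as 70

c(mean = mean(temperature),  median = median(temperature))   # 4 and 4
c(mean = mean(with_outlier), median = median(with_outlier))  # 13 and 4
```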

The sample interquartile range (IQR) is an alternative measure of variability.

\[ IQR=q_3-q_1 \]

Boxplots

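The whiskers of a boxplot stop at the most extreme observations within \(1.5\times IQR\) of the box:

\[ \text{upper whisker}=\max\{x_i : x_i\le q_3+1.5\times IQR\} \]

\[ \text{lower whisker}=\min\{x_i : x_i\ge q_1-1.5\times IQR\} \]

Observations outside these limits are called outliers and are plotted as individual points.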
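The IQR and boxplot rules can be traced by hand in R. This sketch uses hypothetical data with one extreme value (70) and computes the quartiles with the default `quantile()`. R's `boxplot()` ends each whisker at the most extreme observation inside the fences \(q_1-1.5\times IQR\) and \(q_3+1.5\times IQR\). `boxplot.stats()` uses Tukey's hinges from `fivenum()`, which can differ slightly from `quantile()` for some sample sizes; for these data they agree.

```r
x <- c(1, 6, 3, 4, 5, 2, 70)   # hypothetical data with one extreme value

q1  <- unname(quantile(x, 0.25))   # 2.5
q3  <- unname(quantile(x, 0.75))   # 5.5
iqr <- q3 - q1                     # 3, same as IQR(x)

lower_fence <- q1 - 1.5 * iqr      # -2
upper_fence <- q3 + 1.5 * iqr      # 10

# Whiskers stop at the most extreme observations inside the fences
inside   <- x[x >= lower_fence & x <= upper_fence]
whiskers <- c(min(inside), max(inside))            # 1 and 6
outliers <- x[x < lower_fence | x > upper_fence]   # 70

# Cross-check with R's own boxplot computations
boxplot.stats(x)$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
boxplot.stats(x)$out     # points drawn individually as outliers
```

Note that the upper whisker ends at 6, the largest observation inside the fence, rather than at the fence value 10 itself.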