STAT 1010 - Fall 2022
By the end of this lesson you should:
Understand measures of center and know how to compute them
Understand measures of spread and know how to compute them
Be able to graph and interpret one numerical variable in R
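A numeric (quantitative) variable is a variable whose values are numbers that are measured or counted, so that arithmetic such as averaging makes sense.
Exploratory data analysis (EDA) is about learning the structure of a dataset through a series of numerical and graphical techniques.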
When you do EDA, you’ll look for both
general trends and
interesting outliers in your data.
You'll also generate questions that will help inform subsequent analysis.
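As a rough sketch of what this looks like in R, the two plots below show the overall shape of one numerical variable and flag unusual values. The `scores` data are made up for illustration, and the outlier rule is whatever `boxplot()` uses by default (1.5 times the interquartile range beyond the box), which this lesson does not itself define.

```r
# Hypothetical data: exam scores for 12 students (made up for illustration)
scores <- c(62, 68, 70, 71, 73, 74, 75, 77, 78, 80, 82, 99)

# Histogram: shows the overall shape (general trend) of the variable
hist(scores, main = "Exam scores", xlab = "Score")

# Boxplot: points drawn beyond the whiskers are potential outliers
boxplot(scores, horizontal = TRUE, xlab = "Score")

# The values that the boxplot's default rule flags as potential outliers
flagged <- boxplot.stats(scores)$out
flagged
```

A flagged value is a prompt for a question (is 99 a data entry error or a genuinely strong score?), not an automatic reason to delete it.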
Mean
Median (the \(50\)th percentile)
\(\overline{var} =\frac{the\; sum\; of\; all\; the\; observations\; in\; the \; var}{the \; number \;of \;observations}\)
Mean - sensitive to outliers and skewed distributions
Median - less sensitive to outliers and skew, so a more stable estimate of center
Mode - most common value
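The three measures of center can be computed in R along the following lines. This is a minimal sketch on made-up data; the vector `x` and the helper name `stat_mode` are hypothetical. Note that base R's `mode()` reports a storage type rather than the most common value, so the mode is found here by counting.

```r
# Hypothetical data (made up for illustration); 40 is a deliberate outlier
x <- c(2, 3, 3, 4, 5, 6, 40)

mean(x)    # sum of the observations divided by the number of observations
median(x)  # middle value of the sorted observations

# Count how often each value occurs and keep the most frequent one(s)
counts <- table(x)
stat_mode <- as.numeric(names(counts)[counts == max(counts)])
stat_mode

# Dropping the outlier shifts the mean a lot and the median only a little
mean(x[x != 40])
median(x[x != 40])
```

Comparing the last two lines with the first two shows why the median is described as the more stable estimate.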
Percentiles
Quartiles
Interquartile range
Variance \(s^2\)
Standard deviation \(s\)
Range
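The position-based measures of spread in this list can be sketched in R as below, on a made-up vector `x`. One assumption to keep in mind: `quantile()` uses R's default interpolation method, and other software or hand-calculation rules can give slightly different percentiles on small datasets.

```r
# Hypothetical data (made up for illustration)
x <- c(2, 4, 4, 5, 7, 9, 11)

# Percentiles: the value below which a given share of the data falls
quantile(x, probs = 0.90)

# Quartiles: the 25th, 50th, and 75th percentiles
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))
quartiles

# Interquartile range: third quartile minus first quartile
iqr_x <- IQR(x)
iqr_x

# range() returns the minimum and maximum; their difference is the range
range(x)
range_width <- diff(range(x))
range_width
```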
\(s_{var}^2 =\frac{(observation\; 1 \; in\; the \; var - \overline{var})^2 + (observation\; 2 \; in\; the \; var - \overline{var})^2 + \dots + (observation\; n \; in\; the \; var - \overline{var})^2}{the \; number \;of \;observations - 1}\)
\(s_{var} =\sqrt{\frac{(observation\; 1 \; in\; the \; var - \overline{var})^2 + (observation\; 2 \; in\; the \; var - \overline{var})^2 + \dots + (observation\; n \; in\; the \; var - \overline{var})^2}{the \; number \;of \;observations - 1}}\)
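To connect the two formulas above to R, the sketch below computes the variance and standard deviation step by step and then compares them with the built-in `var()` and `sd()`, which also divide by the number of observations minus 1. The data and the names `s2_by_hand` and `s_by_hand` are made up for illustration.

```r
# Hypothetical data (made up for illustration)
x <- c(2, 4, 4, 5, 7, 9, 11)

n <- length(x)     # the number of observations
x_bar <- mean(x)   # the mean of the variable

# Squared distance of each observation from the mean
squared_deviations <- (x - x_bar)^2

# Variance: sum of squared deviations divided by n - 1
s2_by_hand <- sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance
s_by_hand <- sqrt(s2_by_hand)

# The built-in functions should agree with the step-by-step results
all.equal(s2_by_hand, var(x))
all.equal(s_by_hand, sd(x))
```

The standard deviation is usually easier to interpret than the variance because it is in the same units as the original variable.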