Lec 4 - Exploring numeric data

STAT 1010 - Fall 2022

Learning outcomes

By the end of this lesson you should:

  • Understand measures of center and know how to compute them

  • Understand measures of spread and know how to compute them

  • Know how to represent one numerical variable graphically in R and interpret the result

Definition of numeric variable

A numeric or quantitative variable is a variable whose values are measured quantities (e.g. height or age), so arithmetic such as averaging makes sense.

The purpose of Exploratory Data Analysis (EDA)

  • EDA is about learning the structure of a dataset through a series of numerical and graphical techniques.

  • When you do EDA, you’ll look for both

    • general trends and

    • interesting outliers in your data.

  • Along the way, use EDA to generate questions that will help inform subsequent analysis.

[Figure: a dataset with its observations and its variables labeled]

Measures of center

  • Mean

  • Median (50th percentile)

Mean

\(\overline{var} = \dfrac{\text{the sum of all the observations in the var}}{\text{the number of observations}}\)
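As a minimal sketch of the formula above (written in Python, although the course itself uses R), the mean is just a sum divided by a count. The list `values` is hypothetical, made up for illustration; the standard-library `statistics.mean` should agree with the hand computation. In R, the built-in `mean()` plays the same role.

```python
import statistics

# Hypothetical observations of one numeric variable (made up for illustration)
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean = (sum of all the observations) / (number of observations)
mean_by_hand = sum(values) / len(values)

# The standard-library function should give the same value
mean_stdlib = statistics.mean(values)

print(mean_by_hand)  # 5.0
```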

Mean vs median

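Mean - sensitive to outliers and skewed distributions: a few extreme values pull it toward the long tail

Median - resistant to outliers, so a more stable measure of center for skewed data

Mode - the most common value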
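To see why the mean is called sensitive and the median stable, the sketch below takes a small hypothetical dataset, replaces its largest value (9) with an extreme one (90), and recomputes all three measures of center with Python's `statistics` module. The numbers are invented for illustration only.

```python
import statistics

# Hypothetical data, and the same data with the largest value replaced by an outlier
original = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = [2, 4, 4, 4, 5, 5, 7, 90]

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)
mode_before = statistics.mode(original)
mode_after = statistics.mode(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("mode:  ", mode_before, "->", mode_after)
```

The single outlier moves the mean from 5 to 15.125, while the median stays at 4.5 and the mode at 4.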

Measures of spread

  • Percentiles

    • Quartiles

    • Interquartile range (IQR)

  • Variance \(s^2\)

  • Standard deviation \(s\)

  • Range
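Percentiles can be computed in several slightly different ways. The sketch below assumes linear interpolation between sorted observations, which is the rule used by default (type 7) in R's `quantile()`; hand methods from a textbook may give slightly different quartiles. The function name `percentile` and the data are hypothetical, chosen for illustration. The IQR and the range then follow by subtraction.

```python
import math

def percentile(data, p):
    """Return the p-th quantile (0 <= p <= 1) by linear interpolation.

    Finds position (n - 1) * p in the sorted data (0-based) and
    interpolates between the two neighboring observations.
    """
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(data)
    pos = (len(xs) - 1) * p
    lower = math.floor(pos)
    upper = min(lower + 1, len(xs) - 1)
    frac = pos - lower
    return xs[lower] + frac * (xs[upper] - xs[lower])

# Hypothetical observations (unsorted on purpose)
values = [7, 2, 4, 5, 9, 4, 4, 5]

q1 = percentile(values, 0.25)              # lower quartile
q3 = percentile(values, 0.75)              # upper quartile
iqr = q3 - q1                              # interquartile range
data_range = max(values) - min(values)     # range = largest minus smallest

print(q1, q3, iqr, data_range)  # 4.0 5.5 1.5 7
```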

Variance

\(s_{var}^2 = \dfrac{(\text{observation 1 in the var} - \overline{var})^2 + (\text{observation 2 in the var} - \overline{var})^2 + \dots + (\text{observation } n \text{ in the var} - \overline{var})^2}{\text{the number of observations} - 1}\)
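The variance formula can be followed one step at a time: subtract the mean from each observation, square each difference, add the squares, and divide by the number of observations minus 1. The sketch below does exactly that on hypothetical data; Python's `statistics.variance` (like R's `var()`) also divides by \(n - 1\), so the two results should match.

```python
import statistics

# Hypothetical observations of one numeric variable
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)

mean = sum(values) / n                      # the mean, here 5.0
deviations = [x - mean for x in values]     # each observation minus the mean
squares = [d ** 2 for d in deviations]      # the squared deviations
variance_by_hand = sum(squares) / (n - 1)   # divide by n - 1, not n

print(squares)  # [9.0, 1.0, 1.0, 1.0, 0.0, 0.0, 4.0, 16.0]
print(variance_by_hand)  # 32 / 7, about 4.571

# Sample variance from the standard library, for comparison
variance_stdlib = statistics.variance(values)
```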

Graphically

[Figure: a few of the “squares”, each showing one squared deviation from the mean]

Standard deviation

\(s_{var} = \sqrt{\dfrac{(\text{observation 1 in the var} - \overline{var})^2 + (\text{observation 2 in the var} - \overline{var})^2 + \dots + (\text{observation } n \text{ in the var} - \overline{var})^2}{\text{the number of observations} - 1}}\)
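The standard deviation is the square root of the variance, which puts the spread back into the same units as the data (the variance is in squared units). A minimal sketch on the same kind of hypothetical data, checked against Python's `statistics.stdev`, which like R's `sd()` uses the \(n - 1\) denominator:

```python
import math
import statistics

# Hypothetical observations of one numeric variable
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
mean = sum(values) / n

# Sample variance: sum of squared deviations divided by n - 1
variance = sum((x - mean) ** 2 for x in values) / (n - 1)

# Standard deviation: square root of the variance
sd_by_hand = math.sqrt(variance)
sd_stdlib = statistics.stdev(values)

print(round(sd_by_hand, 3))  # 2.138
```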

Words to know

  • mean
  • median or quartile 2 (Q2)
  • lower quartile (LQ) or quartile 1 (Q1)
  • upper quartile (UQ) or quartile 3 (Q3)
  • interquartile range (IQR)
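As a quick reference, the sketch below computes each of these terms for one hypothetical dataset. It uses `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between sorted observations and should match the default of R's `quantile()`; other quartile conventions can give slightly different Q1 and Q3.

```python
import statistics

# Hypothetical observations of one numeric variable
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Three cut points that split the data into quarters: Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

summary = {
    "mean": statistics.mean(values),
    "lower quartile (Q1)": q1,
    "median (Q2)": q2,
    "upper quartile (Q3)": q3,
    "interquartile range (IQR)": q3 - q1,
}

for term, value in summary.items():
    print(f"{term}: {value}")
```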

Your turn

Click here or scan the QR code below to write your first line of code.