# Lec 4 - Exploring numeric data

STAT 1010 - Fall 2022

# Learning outcomes

By the end of this lesson you should:

• Understand measures of center and know how to compute them

• Understand measures of spread and know how to compute them

• Be able to create and interpret graphical representations of one numerical variable in R

## Definition of numeric variable

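A numeric (or quantitative) variable is a variable whose values are numbers representing a measured or counted quantity, so that arithmetic on them, such as taking an average, is meaningful.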

## The purpose of Exploratory Data Analysis (EDA)

• EDA is about learning the structure of a dataset through a series of numerical and graphical techniques.

• When you do EDA, you’ll look for both general trends and interesting outliers in your data.

• What you find helps you generate questions that will inform subsequent analysis.
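As a rough illustration of a first EDA pass on one numeric variable, the sketch below uses base R only. The vector `wait_times` is made-up data, not from the course, and the plot titles and labels are placeholders.

```r
# Hypothetical data: waiting times in minutes (made up for illustration)
wait_times <- c(5, 7, 8, 8, 9, 10, 11, 12, 35)

# Numerical summary: min, quartiles, median, mean, max
summary(wait_times)

# Graphical summaries: a histogram shows the overall shape,
# a boxplot makes potential outliers easy to spot
hist(wait_times, main = "Waiting times", xlab = "Minutes")
boxplot(wait_times, horizontal = TRUE, xlab = "Minutes")

# Values the boxplot rule flags as potential outliers
flagged <- boxplot.stats(wait_times)$out
flagged
```

Here most values sit between 5 and 12 (the general trend), while 35 stands apart and is worth a follow-up question.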

# Measures of center

• Mean

• Median ($50^{th}$ percentile)

# Mean

$\overline{var} =\frac{the\; sum\; of\; all\; the\; observations\; in\; the \; var}{the \; number \;of \;observations}$
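A minimal sketch of the formula in R, assuming a small made-up vector called `scores` (a hypothetical name and data set): the mean computed "by hand" matches the built-in `mean()`.

```r
# Hypothetical data: exam scores for five students (made up for illustration)
scores <- c(70, 85, 90, 65, 80)

# Mean by the formula: sum of the observations / number of observations
mean_by_hand <- sum(scores) / length(scores)

# Same calculation with R's built-in function
mean_builtin <- mean(scores)

mean_by_hand   # 78
mean_builtin   # 78
```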

# Mean vs median

Mean - sensitive to outliers and skewed distributions

Median - resistant to outliers and skew, so a more stable measure of center

Mode - most common value
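The comparison can be seen in a small example. The sketch below uses made-up income values (in thousands) and replaces one value with an extreme one. Base R has no built-in function for the statistical mode (`mode()` reports the storage type), so the `table()` approach shown is just one possible way to find it.

```r
# Hypothetical incomes in thousands (made up for illustration)
incomes         <- c(30, 35, 40, 45, 50)
incomes_outlier <- c(30, 35, 40, 45, 500)   # last value replaced by an outlier

mean(incomes)             # 40
median(incomes)           # 40

mean(incomes_outlier)     # 130: pulled up by the outlier
median(incomes_outlier)   # 40: unchanged

# One way to find the most common value (the mode)
x <- c(2, 3, 3, 5, 7)
counts <- table(x)
most_common <- as.numeric(names(counts)[counts == max(counts)])
most_common               # 3
```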

# Measures of spread

• Percentiles

• Quartiles

• Interquartile range

• Variance $s^2$

• Standard deviation $s$

• Range
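The position-based measures in this list can be computed with base R functions. The sketch below uses a made-up vector `x`. Note that `quantile()` interpolates between observations by default, so its results can differ slightly from quartiles found by hand with another rule.

```r
# Hypothetical data (made up for illustration)
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# Quartiles: the 25th, 50th and 75th percentiles
q <- quantile(x, probs = c(0.25, 0.50, 0.75))
q

# Interquartile range: Q3 - Q1
iqr_x <- IQR(x)
iqr_x

# range() returns the minimum and maximum;
# their difference is the range as a single number
range(x)                    # 2 12
range_width <- diff(range(x))
range_width                 # 10
```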

# Variance

$s_{var}^2 =\frac{(observation\; 1 \; in\; the \; var - \overline{var})^2 + (observation\; 2 \; in\; the \; var - \overline{var})^2 + \dots + (observation\; n \; in\; the \; var - \overline{var})^2}{the \; number \;of \;observations - 1}$
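To make the formula concrete, here is a small sketch with a made-up vector `x`: each observation's squared deviation from the mean is summed and divided by the number of observations minus 1, and the result agrees with R's `var()`.

```r
# Hypothetical data (made up for illustration)
x <- c(2, 4, 6, 8)

x_bar <- mean(x)                 # 5
deviations <- x - x_bar          # -3 -1  1  3
squared_dev <- deviations^2      #  9  1  1  9

# Sum of squared deviations divided by (number of observations - 1)
var_by_hand <- sum(squared_dev) / (length(x) - 1)   # 20 / 3

var_by_hand
var(x)                           # same value from the built-in function
```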

# Standard deviation

$s_{var} =\sqrt{\frac{(observation\; 1 \; in\; the \; var - \overline{var})^2 + (observation\; 2 \; in\; the \; var - \overline{var})^2 + \dots + (observation\; n \; in\; the \; var - \overline{var})^2}{the \; number \;of \;observations - 1}}$
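Because the standard deviation is the square root of the variance, it is in the same units as the data. A minimal sketch, again with a made-up vector `x`, checks the formula against R's `sd()`.

```r
# Hypothetical data (made up for illustration)
x <- c(2, 4, 6, 8)

# Square root of: sum of squared deviations / (number of observations - 1)
sd_by_hand <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))

sd_by_hand
sd(x)            # same value from the built-in function
sqrt(var(x))     # also the same: sd is the square root of the variance
```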

# Words to know

• mean
• median or quartile 2 (Q2)
• lower quartile (LQ) or quartile 1 (Q1)
• upper quartile (UQ) or quartile 3 (Q3)
• interquartile range (IQR)