Chapter 12: The normal probability model

STAT 1010 - Fall 2022

Learning outcomes

The shape of the normal distribution and why it is important
The central limit theorem
Shifts and scales, and how to compute them to find \(z\) scores
Basic tests of normality, including QQ plots and kurtosis

Binomial -> Normal

As the number of trials \(n\) increases, the binomial pmf becomes well approximated by a normal distribution with mean \(np\) and variance \(np(1-p)\).
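
A quick way to see this in `R` is to compare the two distributions numerically. The sketch below is illustrative only: the values `n = 100` and `p = 0.3` are assumptions chosen for the example, not taken from the slides.

```r
# Assumed illustration values (not from the slides)
n <- 100
p <- 0.3

k <- 0:n                                    # every possible number of successes
binom_pmf <- dbinom(k, size = n, prob = p)  # exact binomial probabilities

# Normal density with the matching mean and standard deviation
normal_approx <- dnorm(k, mean = n * p, sd = sqrt(n * p * (1 - p)))

# Largest pointwise gap between the two curves
max_abs_diff <- max(abs(binom_pmf - normal_approx))
max_abs_diff
```

Rerunning with a smaller `n` (say 10) should make the gap visibly larger.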

“Observed data often represent the accumulation of many small factors.”

Normal

\(Y \sim N(\mu = 4, \sigma^2 = 25)\)

We use two parameters to describe a normal distribution: the mean \(\mu\) and the variance \(\sigma^2\)
\(Z \sim N(\mu = 0, \sigma^2 = 1)\)
\(Z\) above is the standard normal distribution
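
One practical detail when moving to `R`: `pnorm()`, `dnorm()`, and `qnorm()` take the standard deviation (`sd`), not the variance. Here is a minimal sketch using the \(N(\mu = 4, \sigma^2 = 25)\) example above. The variable names are made up for illustration.

```r
mu     <- 4
sigma2 <- 25
sigma  <- sqrt(sigma2)   # R wants sd = 5, not the variance 25

# Half of the probability lies below the mean
p_below_mean <- pnorm(mu, mean = mu, sd = sigma)

# Probability of landing within one standard deviation of the mean
p_within_1sd <- pnorm(mu + sigma, mean = mu, sd = sigma) -
  pnorm(mu - sigma, mean = mu, sd = sigma)

c(p_below_mean, p_within_1sd)
```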

Central limit theorem

The probability distribution of a sum of independent random variables of comparable variance approaches a normal distribution as the number of summed random variables increases.
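
The theorem can be explored with a small simulation. This is only a sketch under assumed settings (sums of 30 independent Uniform(0, 1) draws, 10,000 repetitions, an arbitrary seed). None of these choices come from the slides.

```r
set.seed(1010)     # arbitrary seed so the run is reproducible

n_terms <- 30      # assumed number of variables in each sum
n_sims  <- 10000   # assumed number of simulated sums

# Each simulated value is a sum of independent Uniform(0, 1) draws
sums <- replicate(n_sims, sum(runif(n_terms)))

# A Uniform(0, 1) variable has mean 1/2 and variance 1/12
theory_mean <- n_terms * 0.5
theory_sd   <- sqrt(n_terms / 12)

# If the sums are roughly normal, about 68% fall within one sd of the mean
z <- (sums - theory_mean) / theory_sd
frac_within_1sd <- mean(abs(z) < 1)
frac_within_1sd
```

A histogram of `sums` (for example `hist(sums)`) should look bell-shaped even though each individual draw is flat.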

Shifts & scales

Standardizing

Historically, normal probabilities were looked up in printed tables of the standard normal distribution, so standardizing was essential.

use shift and scale information from above
\(Z = \frac{X-\mu_X}{\sigma_X}\)
Find \(E(Z)\)
\[\begin{aligned} E(Z) & = E\left(\frac{X-\mu_X}{\sigma_X}\right)\\ & = \frac{1}{\sigma_X}E(X-\mu_X) \\ & = \frac{1}{\sigma_X}\left(E(X)-\mu_X\right) \\ &= 0 \end{aligned}\]

Standardizing

use shift and scale information from above
\(Z = \frac{X-\mu_X}{\sigma_X}\)
Find \(Var(Z)\)
\[\begin{aligned} Var(Z) & = Var\left(\frac{X-\mu_X}{\sigma_X}\right)\\ & = \frac{1}{\sigma_X^2}Var(X-\mu_X) \\ &= \frac{\sigma_X^2}{\sigma_X^2} = 1 \end{aligned}\]
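
The two results, \(E(Z) = 0\) and \(Var(Z) = 1\), can be sanity-checked by simulation. In this sketch the values \(\mu_X = 5\) and \(\sigma_X = 2\), the sample size, and the seed are all assumptions picked for illustration.

```r
set.seed(12)   # arbitrary seed

mu_x    <- 5   # assumed mean
sigma_x <- 2   # assumed standard deviation

x <- rnorm(100000, mean = mu_x, sd = sigma_x)

# Shift by the mean, then scale by the standard deviation
z <- (x - mu_x) / sigma_x

mean_z <- mean(z)   # should be close to 0
var_z  <- var(z)    # should be close to 1
c(mean_z, var_z)
```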

Example

Let \(Y \sim N(\mu_Y = 5, \sigma_Y^2 = 4)\). Standardize the following and find the probability:

\(P(Y < 3)\)
\[\begin{aligned} P(Y < 3) &= P\left(\frac{Y - \mu_Y}{\sigma_Y} < \frac{3-\mu_Y}{\sigma_Y}\right) \\ & = P\left(Z < \frac{3-5}{2}\right) \\ & = P(Z < -1) \end{aligned}\]
pnorm(-1) \(0.1586553\)
pnorm(3, mean = 5, sd = 2)
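
Standardizing first and passing the original scale to `pnorm()` should give the same answer. A minimal sketch of both routes for \(P(Y < 3)\), with variable names chosen only for this example:

```r
mu_y    <- 5
sigma_y <- sqrt(4)   # sd is the square root of the variance

# Route 1: standardize by hand, then use the standard normal
z <- (3 - mu_y) / sigma_y
p_standardized <- pnorm(z)

# Route 2: let pnorm() do the shift and scale
p_direct <- pnorm(3, mean = mu_y, sd = sigma_y)

c(z, p_standardized, p_direct)
```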

Queen Bee

Percentiles are always to the left.

Example

Let \(Y \sim N(\mu_Y = 5, \sigma_Y^2 = 4)\). Standardize the following and find the probability:

\(P(Y > 7)\)
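\[\begin{aligned} P(Y > 7) &= P\left(\frac{Y - \mu_Y}{\sigma_Y} > \frac{7-\mu_Y}{\sigma_Y}\right) \\ & = P\left(Z > \frac{7-5}{2}\right) \\ & = P(Z > 1) \end{aligned}\]
pnorm(1) \(0.8413447\) (this is \(P(Z < 1)\), the area to the left, not yet \(P(Z > 1)\))
pnorm(7, mean = 5, sd = 2)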

Graph

Example (cont’d)

\(P(Z > 1) = 1 - P(Z<1)\)
1 - pnorm(1) \(0.1586553\)
1 - pnorm(7, mean = 5, sd = 2)
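
Besides subtracting from 1, `pnorm()` has a `lower.tail` argument that returns the upper tail directly. The sketch below computes \(P(Y > 7)\) three ways, which should all agree. The variable names are illustrative only.

```r
# Complement of the left-tail area on the standardized scale
p_complement <- 1 - pnorm(1)

# Upper tail directly on the standardized scale
p_upper <- pnorm(1, lower.tail = FALSE)

# Upper tail on the original scale of Y ~ N(5, sd = 2)
p_direct <- pnorm(7, mean = 5, sd = 2, lower.tail = FALSE)

c(p_complement, p_upper, p_direct)
```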

Rule

Undo

Percentiles

Find \(z : P(Z < z) = .1711\)
qnorm(.1711) \(-0.9498273\)
A medical test produces a score that measures the risk of a disease. In healthy adults, the test score is normally distributed with \(\mu=10\) points and \(\sigma = 2.5\) points. Lower scores suggest the disease is present. Below what threshold do the test scores of only \(1\%\) of healthy adults fall?
qnorm(.01, mean = 10, sd = 2.5) \(4.18413\)
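
The same threshold can be found by taking the standard normal percentile and undoing the standardization, \(x = \mu + \sigma z\). A short sketch for the medical test example, with variable names made up for illustration:

```r
mu    <- 10
sigma <- 2.5

# 1st percentile of the standard normal
z_01 <- qnorm(0.01)

# Undo the standardization by hand
threshold_by_hand <- mu + sigma * z_01

# Or let qnorm() work on the original scale
threshold_direct <- qnorm(0.01, mean = mu, sd = sigma)

# Feeding the threshold back into pnorm() should return 0.01
check <- pnorm(threshold_direct, mean = mu, sd = sigma)

c(threshold_by_hand, threshold_direct, check)
```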

Normality tests

Looking at the data

multimodal
skewness
outliers

Plot - normal

Plot - tale of 2 tails

Tale of 2 tails

Fat tails

Thin tails

Bimodal

QQPlot in `R`

library(ggplot2) # provides ggplot(), stat_qq(), and the diamonds data
ggplot(diamonds, aes(sample=price)) +
  stat_qq() + # add the dots
  stat_qq_line() # and the line
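
To see what a QQ plot is comparing, the points can be built by hand in base `R`: sorted data against the normal quantiles at matching probabilities. This is a sketch, not how `stat_qq()` is necessarily implemented. The helper `qq_correlation()` is hypothetical, and using a correlation as a rough "straightness" summary is an assumption of this example.

```r
set.seed(1010)   # arbitrary seed

# Hypothetical helper: correlation between sorted data and normal quantiles
qq_correlation <- function(x) {
  n <- length(x)
  theoretical <- qnorm(ppoints(n))   # normal quantiles at n evenly spread probabilities
  cor(sort(x), theoretical)          # near 1 when the QQ points hug a straight line
}

normal_sample <- rnorm(500)   # simulated normal data
skewed_sample <- rexp(500)    # simulated right-skewed data

cor_normal <- qq_correlation(normal_sample)
cor_skewed <- qq_correlation(skewed_sample)
c(cor_normal, cor_skewed)
```

Plotting `sort(x)` against `qnorm(ppoints(length(x)))` shows the same dots as a QQ plot. The skewed sample should bend away from a straight line.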

Skewness

find the \(z\) scores for all data (\(z_i = \frac{x_i - \bar{x}}{s}\))
\[K_3 = \frac{z_1^3 +z_2^3 + ... + z_n^3}{n}\]
If \(K_3 \approx 0\), then \(x\) is approximately symmetric
The further \(K_3\) is above 0, the more right-skewed \(x\) is
The further \(K_3\) is below 0, the more left-skewed \(x\) is
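
The formula above translates directly into a few lines of `R`. The helper name `k3` and the toy vectors are made up for illustration. Note that `sd()` divides by \(n - 1\), which this sketch assumes matches the \(s\) in the formula.

```r
# Hypothetical helper: average of the cubed z scores
k3 <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

symmetric_data    <- c(-2, -1, 0, 1, 2)   # toy symmetric data
right_skewed_data <- c(1, 1, 1, 2, 10)    # toy data with a long right tail

k3_symmetric <- k3(symmetric_data)
k3_right     <- k3(right_skewed_data)
k3_left      <- k3(-right_skewed_data)    # mirror image gives a long left tail

c(k3_symmetric, k3_right, k3_left)
```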

Kurtosis

find the \(z\) scores for all data (\(z_i = \frac{x_i - \bar{x}}{s}\))
\[K_4 = \frac{z_1^4 +z_2^4 + ... + z_n^4}{n} - 3\]
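If \(K_4 \approx 0\), then the tails of \(x\) are consistent with a normal distribution
If \(K_4 < 0\), the distribution is flat with thin tails (for example, a uniform distribution)
If \(K_4 > 0\), the distribution has fat tails (many outliers)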
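
A small simulation shows the sign of \(K_4\) for different tail shapes. The helper name `k4`, the sample size, the seed, and the choice of a \(t\) distribution with 5 degrees of freedom as the fat-tailed example are all assumptions of this sketch.

```r
set.seed(1010)   # arbitrary seed

# Hypothetical helper: average of the fourth powers of the z scores, minus 3
k4 <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^4) - 3
}

n <- 10000
k4_normal  <- k4(rnorm(n))        # normal data: close to 0
k4_uniform <- k4(runif(n))        # flat, no tails: negative
k4_fat     <- k4(rt(n, df = 5))   # fat tails: positive

c(k4_normal, k4_uniform, k4_fat)
```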

Take home

If you see departures from normality (kurtosis far from 0 in either direction, QQ plots that deviate from a straight line), PLOT the data and check.
