Chapter 14: Sampling variation and quality

STAT 1010 - Fall 2022

Learning outcomes

  • The shape of the distribution of the sample mean and why it is important
  • When we can assume a normal distribution for the sample mean
  • Control charts and what they are used for
  • Find control limits

Sampling

  • SRS
  • Stratified sampling
  • Cluster sampling
  • it would be nice to sample repeatedly to see how the mean values compare

Benefits of averaging

  • reduces variation
  • more normal than the original distribution

Normal model

  • Find kurtosis(\(K_4\))
  • If \(n > 10 |K_4|\), where \(n\) is the sample size, then a normal model adequately approximates the distribution of the sample mean \(\bar{X}\).
  • If we know the data come from a normal distribution this is also true.

Standard error

\[ SD(\bar{X}) = SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} \]

Sampling distribution

If \(X \sim N(\mu_X, \sigma_X^2)\)

\[ \bar{X} \sim N(\mu = \mu_X, \sigma^2 = \frac{\sigma_X^2}{n}) \]

Example

  1. Let \(Y \sim N(\mu = 5, \sigma^2 = 16)\), find the distribution of the mean of repeated samples of size 4.
    • \(\bar{Y} \sim N(\mu = 5, \sigma^2 = 4)\)

Control limits

  • if mean production is outside certain values, we may need to stop and recallibrate machinery
  • these values are called control limits
  • \(\mu - L \leq \bar{X} \leq \mu + L\)
  • \(\mu - L\) and \(\mu + L\) are control limits

Types of errors

  • False positives - type 1 error
    • act when you should not
    • probabiliy of occurence denoted \(\alpha\)
  • False negatives - type 2 error
    • don’t act when you should
    • probabiliy of occurence denoted \(\beta\)

Setting control limits - 1

If \(\bar{X} \sim N(\mu = 12, \sigma^2 = 2.3)\) how can we find the control limits?

  1. Set the control limits and find the \(\alpha\) value
    • We want the control limits to be between 10 and 14
    • \(Pr(\bar{X} < 10 \textrm{ or } \bar{X} > 14)\)
    • \(\begin{aligned}P(\bar{X} < 10) &= P(\frac{\bar{X} - \mu_\bar{X}}{\sigma_\bar{X}} < \frac{10-\mu_\bar{X}}{\sigma_\bar{X}}) \\ & = P(Z < \frac{10-12}{\sqrt{2.3}} = -1.318761)\end{aligned}\)
    • pnorm(-1.318761) \(\approx 0.09362451\)
    • pnorm(10, mean = 12, sd = sqrt(2.3))

Setting control limits - 1 cont’d

If \(\bar{X} \sim N(\mu = 12, \sigma^2 = 2.3)\) how can we find the control limits?

  1. Set the control limits and find the \(\alpha\) value
    • We want the control limits to be between 10 and 14
    • \(\begin{aligned}P(\bar{X} > 14) &= P(\frac{\bar{X} - \mu_\bar{X} }{\sigma_\bar{X}} > \frac{14-\mu_\bar{X}}{\sigma_\bar{X}}) \\ & = P(Z > \frac{14-12}{\sqrt{2.3}} = 1.318761)\end{aligned}\)
    • 1 - pnorm(1.318761) \(\approx 0.09362451\)
    • 1 - pnorm(14, mean = 12, sd = sqrt(2.3))
    • the \(\alpha\) value is \(19\%\) which is very high!

Setting control limits - 2

  1. Set the \(\alpha\) value and find the control limits
  • We want the \(\alpha\) to be \(0.025\)
    • \(Pr(\bar{X} < z_{0.0125} \textrm{ or } \bar{X} > z_{0.0125})\)
    • qnorm(0.0125) \(\approx -2.241403\)
    • \(\begin{aligned}-2.241403 =& \frac{X - \mu_\bar{X}}{\sigma_\bar{X}}\\ & = \frac{X - 12}{\sqrt{2.3}} \\ &= -2.241403\sqrt{2.3} +12 = 8.600744 \end{aligned}\)
    • qnorm(0.0125, mean = 12, sd = sqrt(2.3))

Setting control limits - 2 - cont’d

  1. Set the \(\alpha\) value and find the control limits
  • We want the \(\alpha\) to be \(0.025\)
    • \(Pr(\bar{X} < z_{0.0125} \textrm{ or } \bar{X} > z_{0.0125})\)
    • qnorm(1- 0.0125) \(\approx 2.241403\)
    • \(\begin{aligned}2.241403 =& \frac{X - \mu_\bar{X}}{\sigma_\bar{X}}\\ & = \frac{X - 12}{\sqrt{2.3}} \\ &= 2.241403\sqrt{2.3} +12 = 15.39926 \end{aligned}\)
    • qnorm(1-0.0125, mean = 12, sd = sqrt(2.3))

Repeated testing

We are not testing once, but multiple times. Assuming independence:

  • \(\begin{aligned}P(\textrm{within limits for 10 days}) =& P(\textrm{within limits for day 1}) \cdot P(\textrm{within limits for day 2}) \cdot \dots \cdot P(\textrm{within limits for day 10})\\ &= 0.975^{10} \approx 0.7763296 \end{aligned}\)

  • There is a \(1-0.7763296 = 0.2236704\) percent false positive rate

  • Management must decide if there is a false positive by checking for mechanical errors and inspecting equipment

  • Adjust \(\alpha\) value to address this

Control charts for variation

X-bar charts are slow to detect under or over filling

  • S-Chart tracks the standard deviation from sample to sample
  • R-Chart tracks the range from sample to sample