STAT 1010 - Fall 2022
We have discussed this before, and now we will formalize that work, one step at a time, like walking up a lighthouse.
\[H_0: \text{Qualitative variable 1 and Qualitative variable 2 are independent}\]
\[H_A: \text{Qualitative variable 1 and Qualitative variable 2 are not independent}\]
A manufacturing firm is considering a shift from a 5-day workweek (8 hours per day) to a 4-day workweek (10 hours per day). Samples of the preferences of 129 employees in two divisions produced the following contingency table:
Observed counts
| Preference | Clerical | Production | Total |
|---|---|---|---|
| 5-day | 17 | 46 | 63 |
| 4-day | 28 | 38 | 66 |
| Total | 45 | 84 | 129 |
a. What would it mean if the preference of employees is independent of division?
b. State \(H_0\) for the \(\chi^2\) test of independence in terms of the parameters of two segments of the population of employees.
If one variable is independent of the other, then within each level of the second variable, the first variable should have the same proportions as the overall (marginal) totals.
Expected counts
| Preference | Clerical | Production | Total |
|---|---|---|---|
| 5-day | \(45/129 \cdot 63 \approx 22\) | \(84/129 \cdot 63 \approx 41\) | 63 |
| 4-day | \(45/129 \cdot 66 \approx 23\) | \(84/129 \cdot 66 \approx 43\) | 66 |
| Total | 45 | 84 | 129 |
\[ \chi^2 = \sum\frac{(\text{observed} - \text{expected})^2}{\text{expected}}\]
We want to know how far the observed counts vary from the expected counts, relative to their size; that is why the expected count is in the denominator.
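As a sketch in R, the expected counts and the \(\chi^2\) statistic for the workweek example can be computed from the row and column totals (the variable names here are my own):

```r
# Observed counts for the workweek example
observed <- matrix(c(17, 46,
                     28, 38),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Preference = c("5-day", "4-day"),
                                   Division   = c("Clerical", "Production")))

# Expected count for each cell: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi_square <- sum((observed - expected)^2 / expected)
chi_square
```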
Find the expected counts for the following observed counts:
gender | happy | meh | sad |
---|---|---|---|
female | 100 | 30 | 110 |
male | 70 | 32 | 120 |
| gender | happy | meh | sad |
|---|---|---|---|
| female | 88.31169 | 32.20779 | 119.4805 |
| male | 81.68831 | 29.79221 | 110.5195 |
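These expected counts can be checked in R with `chisq.test`, which returns them in its `expected` component (a sketch; the matrix layout follows the table above):

```r
# Observed mood counts by gender
mood <- matrix(c(100, 30, 110,
                  70, 32, 120),
               nrow = 2, byrow = TRUE,
               dimnames = list(gender = c("female", "male"),
                               mood   = c("happy", "meh", "sad")))

# chisq.test computes the expected counts for us
round(chisq.test(mood)$expected, 5)
```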
If the color changes in the bars occur at approximately the same height in every level of the variable on the \(x\)-axis, this is evidence against rejecting \(H_0\); we might be more inclined to think that \(H_0\) is true.
\[df \text{ for a } \chi^2 \text{ test of independence} = (r-1)(c-1)\]
where \(r= \text{number of rows}\) and \(c= \text{number of columns}\)
Assumptions:

- The data are counts from a random sample, and each observation falls in exactly one cell.
- The expected count in every cell is at least 5.
```r
# p-value: upper-tail probability of the chi-square distribution
1 - pchisq(chi_square, df)
```
In example 2, perform a hypothesis test and find the \(\chi^2\) value and the p-value. Are gender and mood independent?
```r
1 - pchisq(5.099, df = 2)
```

\(\approx 0.078\). Since the p-value is greater than 0.05, we fail to reject \(H_0\); we do not have strong evidence that gender and mood are dependent.

The \(\chi^2\) test is a general test that provides evidence that one of the proportions might differ. To estimate which proportion differs, we will need to build confidence intervals like we did here.
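The same result can be obtained with R's built-in `chisq.test` (a sketch, reusing the observed gender and mood counts from above):

```r
# Observed mood counts by gender (rows: female, male)
mood <- matrix(c(100, 30, 110,
                  70, 32, 120),
               nrow = 2, byrow = TRUE)

test <- chisq.test(mood)
test$statistic  # chi-square statistic, about 5.099
test$p.value    # about 0.078
```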
There are a few places where \(\chi^2\) tests are particularly useful, for example in fraud detection. Benford's law gives the expected frequency of first digits (1-9) in naturally occurring numbers. If the observed proportions are significantly different from Benford's law, there may be evidence of fraud.
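As a sketch, a \(\chi^2\) goodness-of-fit test against Benford's law can be run in R with `chisq.test` and a vector of null proportions (the digit counts below are made up for illustration):

```r
# Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1, ..., 9
benford <- log10(1 + 1 / (1:9))

# Hypothetical counts of leading digits from a batch of invoices
digit_counts <- c(310, 170, 120, 95, 80, 68, 58, 52, 47)

# Goodness-of-fit test: do the counts follow Benford's proportions?
chisq.test(digit_counts, p = benford)
```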
When an outcome is Binomial or Poisson, one way to check that the model's predictions are correct is to perform a \(\chi^2\) goodness-of-fit test on the actual and predicted counts.
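For instance, here is a sketch of checking observed counts against a Poisson model in R (the observed counts and the rate \(\lambda = 2\) are assumptions for illustration):

```r
# Hypothetical observed counts of 0, 1, 2, 3, and 4-or-more events per hour
observed <- c(28, 52, 60, 35, 25)

# Predicted probabilities under a Poisson(2) model; last cell is P(X >= 4)
lambda <- 2
p <- c(dpois(0:3, lambda), 1 - ppois(3, lambda))

# Goodness-of-fit test of observed counts against the Poisson predictions
chisq.test(observed, p = p)
```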