library(tidyverse) # for data manipulation and data visualization
library(here) # to organize files
HW 1 - Titanic dataset
Due Friday, 23 September, 2pm on Gradescope
Introduction
For this homework assignment we will be exploring the Titanic dataset.
Learning goals
In this assignment, you will…
- Download and import data into R
- Explore categorical variables
- Compute the expected counts of a \(\chi^2\) distribution and
- Perform a \(\chi^2\) test on some variables
Getting started
Log in to RStudio
Click on your Stat1010.Rproj.
Go to File ➛ New File ➛ Quarto Document and name the document hw-1 click create
Packages
The following packages will be used in this assignment:
Data: Titanic
On April 15, 1912, the largest passenger liner ever made collided with an iceberg during her maiden voyage. When the Titanic sank it killed 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships. One of the reasons that the shipwreck resulted in such loss of life wasthat there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others.
The titanic.csv file contains data for 887 of the real Titanic passengers. Each row represents one person. The columns describe different attributes about the person including whether they survived, their age, their passenger-class, their sex, and the fare they paid
Pclass
: The class of passengers on the titanic, with \(1st\) being the highest class, and \(3rd\) the lowest.Survived
: A variable that records whether or not a passenger survived the sinking of the titanic.
Part 1: Download and import the data
Download the file into your Stat1010 R project in the data folder.
Import your data into R.
Part 2: Exploring categorical variables
How many observations and how many variables in this dataset?
Classify all the variables into two categories: qualitative or quantitative
How many levels of passenger class are there in the variable?
How many passengers survived the sinking of the titanic?
Make a contingency table of class and survival. Which combination of class and survival is the most common?
What proportion of \(3rd\) class passengers survived?
What proportion of \(1st\) class passengers survived?
Draw a plot that conditions on class and displays the proportions of each that survived and update labels. What does the plot suggest regarding the chance of survival and your class? Does it suggest that they are associated?
Assuming Survived and Pclass are independent, write R coding to find the expected counts for all cells of the contingency table. (Hint: you will need to use the rowSums(.[2:3]) and colSums(.[2:4]) functions, depending on the order that you compute the two.)
Perform a \(\chi^2\) test on the two variables Survived and Pclass. Include both \(H_0\) and \(H_A\) and interpret the \(p-value\). What conclusions can you draw about the independence of Survived and Pclass?
Submission
To submit your assignment:
- Then submit the pdf following the link on the course website.
Grading
Total points available: 25 points.
Component | Points |
---|---|
Part 1 Qu 1 | 2 |
Part 1 Qu 2 | 3 |
Part 2 Qu 1 | 2 |
Part 2 Qu 2 | 2 |
Part 2 Qu 3 | 1 |
Part 2 Qu 4 | 2 |
Part 2 Qu 5 Pt 1 | 1 |
Part 2 Qu 5 Pt 2 | 1 |
Part 2 Qu 6 | 3 |
Part 2 Qu 7 | 4 |
Part 2 Qu 8 | 4 |