STAT 1010 - Fall 2022
By the end of this lesson you should be able to:
Identify categorical variables and explain why they are important
Represent two categorical variables graphically in R
Represent two categorical variables in a table
Represent one categorical variable graphically
A categorical (or qualitative) variable is a variable whose values cannot be measured on a numeric scale. Its values are descriptors or grouping labels.
Exploratory Data Analysis (EDA) is about learning the structure of a dataset through a series of numerical and graphical techniques.
When you do EDA, you’ll look for both
general trends and
interesting outliers in your data.
You'll also generate questions that will help inform subsequent analysis.
How many freshmen are there in our class?
What proportion of our class turned in the diamonds assignment?
How many transportation stocks are in our portfolio?
DOES NOT RESTRICT THE POPULATION
Others?
The marginal distribution is the distribution (counts or proportions) of one variable on its own, ignoring any other variables.
It is called the marginal distribution because, in a contingency table, the row sums and column sums are usually written in the margins.
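As a minimal sketch of the marginal distribution in R: the `class_data` data frame below is hypothetical (its column names and all of its values are made up for illustration and are not the actual class data). It uses base R's `table()`, `margin.table()`, `addmargins()`, and `prop.table()`.

```r
# Hypothetical class data: one row per student (all values are made up).
class_data <- data.frame(
  year = c("Freshman", "Freshman", "Freshman", "Sophomore", "Sophomore",
           "Junior", "Junior", "Junior", "Senior", "Freshman"),
  turned_in = c("Yes", "Yes", "No", "Yes", "No",
                "Yes", "Yes", "No", "Yes", "Yes")
)

# Contingency table of the two categorical variables.
tab <- table(year = class_data$year, turned_in = class_data$turned_in)

# Marginal counts: sum over the other variable.
year_counts <- margin.table(tab, 1)       # how many students in each year
turned_in_counts <- margin.table(tab, 2)  # how many turned the work in

# The same table with the row and column sums written in the margins.
tab_with_margins <- addmargins(tab)

# Marginal proportions: what proportion of the whole class turned it in?
prop_turned_in <- prop.table(turned_in_counts)

print(tab_with_margins)
print(prop_turned_in)
```

Here `year_counts` answers a question like "How many freshmen are there in our class?" and `prop_turned_in` answers "What proportion of our class turned in the assignment?". Neither one restricts the population to a subgroup.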
Of all the freshmen in our class, what percent turned in the diamonds exercise?
Of all the students who turned in the diamonds exercise, how many were freshmen?
Of all the transportation stocks in our portfolio, how many performed well last year?
Of all the stocks in our portfolio that performed well last year, how many are transportation stocks?
RESTRICTED TO A SUBPOPULATION
Others?
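Questions restricted to a subpopulation can be sketched in R with `prop.table()` and its `margin` argument. The `class_data` data frame is again hypothetical (made-up column names and values, not the actual class data). Which margin you condition on decides which question you are answering.

```r
# Hypothetical class data: one row per student (all values are made up).
class_data <- data.frame(
  year = c("Freshman", "Freshman", "Freshman", "Sophomore", "Sophomore",
           "Junior", "Junior", "Junior", "Senior", "Freshman"),
  turned_in = c("Yes", "Yes", "No", "Yes", "No",
                "Yes", "Yes", "No", "Yes", "Yes")
)

# Contingency table: rows are class years, columns are turned-in status.
tab <- table(year = class_data$year, turned_in = class_data$turned_in)

# Condition on year (rows): each row of proportions sums to 1.
by_year <- prop.table(tab, margin = 1)
# Of all the freshmen, what proportion turned in the exercise?
freshmen_who_turned_in <- by_year["Freshman", "Yes"]

# Condition on turned-in status (columns): each column sums to 1.
by_status <- prop.table(tab, margin = 2)
# Of all the students who turned in the exercise, what proportion were freshmen?
turned_in_who_are_freshmen <- by_status["Freshman", "Yes"]

print(round(by_year, 2))
print(round(by_status, 2))
```

The two results use the same cell of the table (freshmen who turned in the exercise) but divide by different totals, which is why the two questions generally have different answers.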
Mutually exclusive
Associated
Independence
Chi-Squared test
\(H_0\) and \(H_A\)
Contingency table
Click here or scan the QR code below to write your first line of code