Why Data Types and Measurement Scales?
Data types are important in data visualization as they determine the visualization method or chart type. In other words, we need to know what type of variable we are dealing with in order to choose the most suitable chart type. Having a good understanding of different types of data and measurement scales is also important to perform Exploratory Data Analysis (EDA), as certain statistical measures only apply to certain measurement scales.
Here is what we will cover in this article:
- Types of data
- What are measurement scales?
Types of data
What is Qualitative or Categorical Data?
What is Quantitative Data?
Identifying data types – Example 1
Let's look at an example and try to identify the categorical and quantitative data.
IT, HR, Sales, and Total belong to a category called Department. These are categorical variables.
Employees and Contractors belong to a category that we could call types of workers. These are also categorical variables.
Are numbers always quantitative data?
Numbers can also represent categorical data. One example is Year, such as 2015, 2016, 2017. Another is Employee ID, such as 1000, 1012, 1016, 1090.
These are categorical variables because, although they take the form of numbers, they describe something rather than measure it.
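A minimal Python sketch of this point, using the Employee ID values from the text. The variable names are hypothetical and chosen only for illustration. It shows that arithmetic on such numbers runs without error but produces nothing meaningful, while counting and equality checks still make sense.

```python
from collections import Counter
from statistics import mean

# Employee IDs from the example above; they are labels, not measurements.
employee_ids = [1000, 1012, 1016, 1090]

# Python will average them without complaint, but the result (1029.5)
# identifies no employee and measures nothing.
meaningless_average = mean(employee_ids)

# Storing the IDs as strings makes the "label" intent explicit and
# prevents accidental arithmetic.
id_labels = [str(i) for i in employee_ids]

# Counting and equality checks remain meaningful for labels.
label_counts = Counter(id_labels)
is_same_employee = id_labels[0] == id_labels[1]

print(meaningless_average)
print(label_counts["1012"])
print(is_same_employee)
```

Converting numeric labels to strings is one possible convention, not a requirement; the key is to avoid summarising them with arithmetic.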
Is categorical data always non-numerical?
Is quantitative data always numerical?
Identifying data types – Example 2
Let's look at one more example and try to identify the quantitative and categorical variables.
x-axis: Q1, Q2, Q3 represent the quarters, hence categorical
y-axis: Count of employees is quantitative
Legend: IT, HR are categorical
Bars: The height of each bar represents the count of employees and is quantitative. The colours represent IT and HR and are categorical. The bars therefore represent both quantitative and categorical variables.
What are Measurement Scales?
There are four measurement scales. First we will define them, then we will look at each scale in depth.
- Nominal: Nominal data is “named” or “labelled” categorical data that can be divided into various groups.
- Ordinal: Ordinal data is a categorical data type whose categories have a natural order. The distance between the categories is not known in ordinal data.
- Interval: Interval data is a numeric data type in which we know not only the order of the data but also the exact distance between values.
- Ratio: Ratio is an interval scale with the added condition that zero indicates the absence of the quantity being measured.
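The four scales build on one another, so one way to picture them is as an ordered hierarchy in which each scale adds to what the previous one permits. The sketch below is an illustration only: the names `Scale`, `ADDED_BY_SCALE`, and `valid_measures` are hypothetical, and the inclusion of the mean at the interval level is a commonly cited convention rather than something stated in this article.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four measurement scales, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# What each scale adds on top of the scales below it (illustrative summary).
ADDED_BY_SCALE = {
    Scale.NOMINAL: {"mode"},
    Scale.ORDINAL: {"median"},
    Scale.INTERVAL: {"mean", "standard deviation"},
    Scale.RATIO: {"ratio"},
}


def valid_measures(scale):
    """Return every measure permitted at `scale`, including inherited ones."""
    allowed = set()
    for level in Scale:
        if level <= scale:
            allowed |= ADDED_BY_SCALE[level]
    return allowed


print(sorted(valid_measures(Scale.ORDINAL)))  # ['median', 'mode']
```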
Measurement scales in depth
Categorical or Qualitative
Eye colour: Blue, Green, Brown
Departments: IT, HR, Sales, Marketing
Employee names: John, Skye, Margaret
- Nominal data does not have an intrinsic order. For instance, we cannot order eye colours.
- Nominal data can be numbers. For example, a postal or zip code is nominal data. Take two postal codes, 12345 and 67890. As a number, 12345 is less than 67890, but that does not mean 12345 was issued before 67890. There is no intrinsic order to postal codes.
- Comparisons – equals and not equals – are the only mathematical operations on nominal data.
- Nominal data can be grouped. For example, all IT employees is a valid group.
Mode is the only valid statistical measure on nominal data.
Consider this sample for eye colours of 20 people.
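A minimal Python sketch of such a sample follows. The counts are invented for illustration (hypothetical values, not real data). It tallies each colour and finds the mode, which is the one summary statistic that applies to nominal data.

```python
from collections import Counter
from statistics import mode

# Hypothetical sample of 20 people; the counts are made up for illustration.
eye_colours = ["Brown"] * 11 + ["Blue"] * 6 + ["Green"] * 3

# Frequency of each category: grouping and counting are valid on nominal data.
counts = Counter(eye_colours)

# The mode is the most frequent category.
most_common_colour = mode(eye_colours)

print(counts["Brown"], counts["Blue"], counts["Green"])  # 11 6 3
print(most_common_colour)  # Brown
```

A mean or median of these values would be undefined, because the colours have no order and no numeric distance between them.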
Ordinal data is categorical or qualitative in nature.
Consider this question:
I am satisfied with opportunities for professional growth:
- Strongly Agree
- Agree
- Neutral/Neither agree nor disagree
- Disagree
- Strongly Disagree
These options could be coded as:
- 1 – Strongly Agree
- 2 – Agree
- 3 – Neutral/Neither agree nor disagree
- 4 – Disagree
- 5 – Strongly Disagree
Another common example of Ordinal Data is a Likert Scale:
- 1 – Like
- 2 – Like Somewhat
- 3 – Neutral
- 4 – Dislike Somewhat
- 5 – Dislike
- Ordinal data has a natural order
- The distance between the ordered categories is not meaningful. For example, it does not make sense to calculate distance between like and like somewhat.
- Ordinal data can be coded as numbers
- Comparisons – equals, not equals, greater than, less than – are possible on ordinal data, because the categories are ordered
- Median: Since ordinal data can be sorted, a median can be calculated.
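The median of ordinal data can be sketched in Python as follows. The responses are hypothetical, and the 1 to 5 coding follows the agreement scale shown earlier, with "Neutral" as a shortened label. The sketch uses `statistics.median_low`, which always returns a value that is actually present in the data, so the result maps back to a real category instead of an average of two codes.

```python
from statistics import median_low

# Ordered categories; the position in the list defines the order.
LEVELS = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]

# Code each label as 1..5, matching the coding shown in the text.
CODE = {label: i for i, label in enumerate(LEVELS, start=1)}

# Hypothetical survey responses, invented for illustration.
responses = [
    "Agree", "Strongly Agree", "Neutral", "Agree",
    "Disagree", "Agree", "Strongly Disagree",
]

# Sort the codes and take the middle value.
codes = sorted(CODE[r] for r in responses)
median_code = median_low(codes)
median_label = LEVELS[median_code - 1]

print(codes)         # [1, 2, 2, 2, 3, 4, 5]
print(median_label)  # Agree
```

Averaging the codes would silently assume equal spacing between categories, which ordinal data does not guarantee.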
Interval data is quantitative in nature.
Temperature: 10°C, 20°C, 0°C
- Interval data is ordered
- Distance between data points is meaningful. For example, the distance between 10°C and 20°C is 10°C.
- Interval data does not have a true zero.
What is a true zero:
True zero refers to the absence of what is being measured. In the case of interval data, zero is not the absence of a characteristic or quantity.
For example, zero money is the absence of money. But for interval data such as temperature, 0°C is not the absence of temperature.
What is the importance of true zero?
In the absence of true zero, ratios are meaningless. Hence interval scale does not have a ratio.
- In the absence of true zero, multiplication and division are meaningless. 20°C is not twice as hot as 10°C. Likewise, 12 PM is not twice 6 PM.
- Standard Deviation
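The claim that ratios are meaningless without a true zero can be checked numerically. The sketch below expresses the same two temperatures in Celsius, Fahrenheit, and Kelvin, using the standard conversion formulas; the helper names are hypothetical. If "twice as hot" were a real property of the temperatures, the ratio would not depend on the unit.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Standard conversion: K = C + 273.15."""
    return c + 273.15


low_c, high_c = 10.0, 20.0

# Ratios change with the unit, so "twice as hot" is an artefact of Celsius.
ratio_c = high_c / low_c                                                # 2.0
ratio_f = celsius_to_fahrenheit(high_c) / celsius_to_fahrenheit(low_c)  # 68 / 50
ratio_k = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)          # about 1.035

# Differences stay consistent: they only rescale by the fixed factor 9/5.
diff_c = high_c - low_c
diff_f = celsius_to_fahrenheit(high_c) - celsius_to_fahrenheit(low_c)

print(ratio_c, round(ratio_f, 2), round(ratio_k, 3))
print(diff_c, diff_f)
```

Kelvin is included because its zero is a true zero, so the Kelvin ratio is the only one of the three with a physical interpretation.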
Ratio type data is quantitative in nature.
Mass: 20kg, 40kg
Distance: 100m, 400m
- Ratio data is ordered
- Distance between points is meaningful
- Ratio data has a true zero. 0 kg is the absence of mass. 0 m is the absence of distance.
- Standard Deviation
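As a small illustration of why ratios are meaningful on a ratio scale, the sketch below takes the two masses from the example and expresses them in kilograms and in grams. The variable names are hypothetical. Because both units share the same true zero, the ratio is identical in either unit, and the standard deviation is also well defined.

```python
from statistics import stdev

# Masses from the example above, in kilograms.
masses_kg = [20.0, 40.0]

# The same masses in grams (1 kg = 1000 g); zero stays zero under this change.
masses_g = [m * 1000 for m in masses_kg]

# The ratio does not depend on the unit: 40 kg really is twice 20 kg.
ratio_kg = masses_kg[1] / masses_kg[0]
ratio_g = masses_g[1] / masses_g[0]

# Sample standard deviation, valid for interval and ratio data alike.
spread_kg = stdev(masses_kg)

print(ratio_kg, ratio_g)  # 2.0 2.0
print(round(spread_kg, 3))
```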