Why Data Types and Measurement Scales?
Data types are important in data visualization as they determine the visualization method or chart type. In other words, we need to know what type of variable we are dealing with in order to choose the most suitable chart type. Having a good understanding of different types of data and measurement scales is also important to perform Exploratory Data Analysis (EDA), as certain statistical measures only apply to certain measurement scales.
Types of data
Data is broadly classified as Qualitative Data (also known as categorical data) and Quantitative Data.
What is Qualitative or Categorical Data?
Qualitative data characterizes attributes and describes what quantitative values measure.
What is Quantitative Data?
Quantitative data measures something.
Identifying data types – Example 1
Let's look at an example and try to identify Categorical and Quantitative Data.
Department | Employees | Contractors |
IT | 251 | 47 |
HR | 22 | 5 |
Sales | 25 | 178 |
Total | 298 | 230 |
IT, HR, Sales, Total belong to a category called Department. These are categorical variables.
Employees and Contractors belong to a category which we could call types of workers. These are also categorical variables.
The numbers 251, 22, 25, 298, 47, 5, 178, 230 are quantitative variables.
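To make the split concrete, here is a minimal Python sketch of the table above. The dictionary layout and the variable names are assumptions made for this illustration; the numbers come straight from the table. The keys (departments and worker types) act as categorical labels, while the counts are the quantitative values that can be added up.

```python
# Headcount table from Example 1.
# Keys are categorical labels; the integers are quantitative values.
headcount = {
    "IT": {"Employees": 251, "Contractors": 47},
    "HR": {"Employees": 22, "Contractors": 5},
    "Sales": {"Employees": 25, "Contractors": 178},
}

# Adding the quantitative values is meaningful and reproduces the Total row.
total_employees = sum(row["Employees"] for row in headcount.values())
total_contractors = sum(row["Contractors"] for row in headcount.values())

print(total_employees, total_contractors)  # 298 230
```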
Are numbers always quantitative data?
No.
Numbers can also represent categorical data. An example is Year, say 2015, 2016, 2017. Another example is Employee ID, say 1000, 1012, 1016, 1090.
These represent categorical variables because though they take the form of numbers, they in fact describe something. They do not measure something.
A simple way to identify a quantitative variable is to ask the question: Does addition or subtraction of the numbers make sense? If yes, the variable is quantitative. If no, the variable is categorical.
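The addition test can be sketched in a few lines of Python. This is only an illustration: the variable names are hypothetical, and storing IDs as strings is one possible convention, not a rule from this article.

```python
# Employee IDs look like numbers but only label people.
employee_ids = [1000, 1012, 1016, 1090]

# Employee counts per department (IT, HR, Sales) measure something.
employee_counts = [251, 22, 25]

# Adding counts answers a real question: how many employees are there in total?
total_employees = sum(employee_counts)

# Python will happily add IDs too, but the result describes nothing.
meaningless_sum = sum(employee_ids)

# One way to guard against accidental arithmetic is to keep IDs as text.
id_labels = [str(employee_id) for employee_id in employee_ids]

print(total_employees)  # 298
print(id_labels)        # ['1000', '1012', '1016', '1090']
```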
Is categorical data always non-numerical?
As we saw above, the answer is No. Categorical data can sometimes be represented as numbers.
Is quantitative data always numerical?
Identifying data types – Example 2
Let's look at one more example and try to identify the quantitative and categorical variables.
x-axis: Q1, Q2, Q3 represent the quarters, hence categorical
y-axis: Count of employees is quantitative
Legend: IT, HR are categorical
Bars: The height of each bar represents the count of employees and is quantitative. The colours represent IT and HR and are categorical. The bars hence represent both quantitative and categorical variables.
Along with understanding variable types, it is important to know about measurement scales.
What are Measurement Scales?
Measurement scales describe the nature of information within the variable.
Definitions
There are 4 measurement scales. First we will define them, then we will look at each of the measurement types in depth.
- Nominal: Nominal data is “named” or “labelled” categorical data that can be divided into various groups.
- Ordinal: Ordinal data is a categorical data type where variables have naturally ordered categories. The distance between the categories is not known in ordinal data.
- Interval: Interval data is a numeric data type in which we not only know the order of the data but also the exact distance between the values.
- Ratio: Ratio is an interval scale with the added condition that zero indicates absence of that variable.
Measurement scales in depth
Nominal data
Data type:
Categorical or Qualitative
Examples:
Eye colour : Blue, Green, Brown
Departments: IT, HR, Sales, Marketing
Employee names: John, Skye, Margaret
Characteristics:
- Nominal data does not have an intrinsic order. For instance, we cannot order eye colours.
- Nominal data can be numbers. For example, a postal or zip code is nominal data. Take two postal codes, 12345 and 67890. 12345 is less than 67890, but that does not mean 12345 was issued before 67890. There is no intrinsic order to postal codes.
Mathematical Operations:
- Comparisons – equals and not equals – are the only mathematical operations on nominal data.
- Nominal data can be grouped. For example, all IT employees is a valid group.
Statistical Measures:
Mode is the only valid statistical measure on nominal data.
Consider this sample for eye colours of 20 people.
Eye Colour | Count |
Blue | 4 |
Green | 4 |
Brown | 12 |
The mode of this sample is Brown eye colour.
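As a small sketch, the mode of this sample can be found by counting how often each label occurs. The example below uses `collections.Counter` from the Python standard library; the list is rebuilt from the counts in the table, and the variable names are chosen for this illustration.

```python
from collections import Counter

# Rebuild the sample of 20 people from the table above.
eye_colours = ["Blue"] * 4 + ["Green"] * 4 + ["Brown"] * 12

# Counting needs only equality checks, the one comparison nominal data supports.
counts = Counter(eye_colours)

# most_common(1) returns a list holding the (label, count) pair with the highest count.
mode_colour, mode_count = counts.most_common(1)[0]

print(mode_colour, mode_count)  # Brown 12
```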
Ordinal Data
Data type:
Ordinal data is categorical or qualitative in nature.
Examples:
Consider this question:
I am satisfied with opportunities for professional growth:
- Strongly Agree
- Agree
- Neutral/Neither agree nor disagree
- Disagree
- Strongly Disagree
These options could be coded as:
- 1 – Strongly Agree
- 2 – Agree
- 3 – Neutral/Neither agree nor disagree
- 4 – Disagree
- 5 – Strongly Disagree
Another common example of Ordinal Data is a Likert Scale:
- 1 – Like
- 2 – Like Somewhat
- 3 – Neutral
- 4 – Dislike Somewhat
- 5 – Dislike
Characteristics:
- Ordinal data has a natural order
- The distance between the ordered categories is not meaningful. For example, it does not make sense to calculate distance between like and like somewhat.
- Ordinal data can be coded as numbers
Mathematical Operations:
- Comparisons – equals, not equals, less than and greater than – are possible on ordinal data
- Sorting
Statistical Measures:
- Median : Since ordinal data can be sorted, a median can be calculated.
- Mode
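Here is a minimal sketch of both measures, using the 1 to 5 coding of the satisfaction question above. The list of responses is made up for illustration. The sketch uses `statistics.median_low`, which always returns one of the observed values, so the result maps back to a real category even when the number of responses is even.

```python
import statistics

# Ordered categories, coded 1 to 5 as in the survey question above.
LEVELS = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]
CODE = {label: position for position, label in enumerate(LEVELS, start=1)}

# Hypothetical responses, invented for this example.
responses = [
    "Agree", "Strongly Agree", "Neutral", "Agree",
    "Disagree", "Agree", "Strongly Disagree",
]

# Sorting is valid because the categories have a natural order.
codes = sorted(CODE[response] for response in responses)

# median_low picks an actual data point, never a value between two categories.
median_label = LEVELS[statistics.median_low(codes) - 1]

# The mode only needs counting, so it works directly on the labels.
mode_label = statistics.mode(responses)

print(codes)         # [1, 2, 2, 2, 3, 4, 5]
print(median_label)  # Agree
print(mode_label)    # Agree
```

Note that the sketch deliberately avoids taking the mean of the codes, since the distance between categories is not meaningful.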
Interval Data
Data type:
Interval data is quantitative in nature.
Examples:
Temperature: 10°C, 20°C, 0°C
Time, Date
Characteristics:
- Interval data is ordered
- Distance between data points is meaningful. For example, the distance between 10°C and 20°C is 10°C.
- Interval data does not have a true zero.
What is a true zero:
True zero refers to the absence of what is being measured. In the case of interval data, zero is not the absence of a characteristic or quantity.
For example, zero money is the absence of money. But in the case of interval data like temperature, 0°C is not the absence of temperature.
What is the importance of true zero?
In the absence of a true zero, ratios are meaningless. Hence the interval scale does not support ratios.
Mathematical Operations:
- Addition
- Subtraction
- In the absence of a true zero, multiplication and division are meaningless. 20°C is not twice as hot as 10°C. 12 PM is not twice 6 PM.
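One way to see why ratios break down on an interval scale is to express the same two temperatures in another unit. The sketch below converts Celsius to Fahrenheit using the standard conversion formula; the function and variable names are chosen for this example. The difference between the two readings describes the same gap in either unit, but the ratio changes with the unit, so it cannot be a property of the temperatures themselves.

```python
def celsius_to_fahrenheit(celsius):
    """Standard Celsius to Fahrenheit conversion."""
    return celsius * 9 / 5 + 32


low_c, high_c = 10.0, 20.0
low_f = celsius_to_fahrenheit(low_c)    # 50.0
high_f = celsius_to_fahrenheit(high_c)  # 68.0

# Differences are meaningful: 18 Fahrenheit degrees is the same gap as 10 Celsius degrees.
diff_c = high_c - low_c
diff_f = high_f - low_f

# Ratios depend on where the scale puts its zero, so they disagree between units.
ratio_c = high_c / low_c
ratio_f = high_f / low_f

print(diff_c, diff_f)    # 10.0 18.0
print(ratio_c, ratio_f)  # 2.0 1.36
```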
Statistical Measures:
- Mean
- Median
- Mode
- Standard Deviation
Ratio Data
Data type:
Ratio type data is quantitative in nature.
Examples:
Mass: 20kg, 40kg
Distance: 100m, 400m
Characteristics:
- Ratio data is ordered
- Distance between points is meaningful
- Ratio data has a true zero. 0 kg is the absence of mass. 0 m is the absence of distance.
Mathematical Operations:
- Addition
- Subtraction
- Multiplication
- Division
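By contrast with interval data, a ratio between two ratio-scale values does not depend on the unit. The short sketch below uses the masses from the examples above and converts them from kilograms to grams; the variable names are chosen for this illustration.

```python
GRAMS_PER_KG = 1000

light_kg, heavy_kg = 20, 40

# Ratio in kilograms.
ratio_kg = heavy_kg / light_kg

# Ratio after converting both masses to grams.
ratio_g = (heavy_kg * GRAMS_PER_KG) / (light_kg * GRAMS_PER_KG)

# Both scales share the same true zero, so "twice as heavy" holds in either unit.
print(ratio_kg, ratio_g)  # 2.0 2.0
```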
Statistical Measures:
- Mean
- Median
- Mode
- Standard Deviation