How to Make Choropleth Maps Using Data Classification
You have your quantitative data ready to go. Your mouse is hovering over “classify” waiting to generate the many colors of choropleth maps.
But you can’t help but wonder if you’re choosing the right data classification mode.
Equal intervals, quantile, natural breaks, pretty breaks – there’s a lot to choose from. But what is the difference between each of them?
This post will help you understand the types of choropleth maps that exist and which one to choose for your maps.
Choose Your Number of Classes
First, decide how many classes to group your data into. More classes show more variation in the data, but the shades become harder to tell apart. If you want to test out different shading, ColorBrewer has a tool for color advice.
For example, here are 10 classes:
Fewer classes, such as the 5 classes below, show less variation in the data but make each shade easier to distinguish.
In the end, the number of classes you choose really depends on the purpose of your map.
Select Your Data Classification Method
Second, you will have to decide how to classify your data. To put it another way, data classification arranges your data with boundaries to separate classes. You could separate your classes with an equal interval mode:
Alternatively, you could select a quantile classifier, which arranges the data differently (more on this below):
Each data classification technique produces a different choropleth map, and each one tells a different story to the map reader. The one thing you must realize is that you’re using the same data in each choropleth map. What’s really changing is how you classify the data.
Our Example Data
The most important thing you have to realize is that for each of these choropleth maps we create, we use the same data. What’s changing is how we classify the data.
In this example, we count the number of characters in country names, including spaces. For example:
- Mali, Cuba, Peru and others are four-letter countries.
- Bosnia and Herzegovina, on the other hand, has 22 characters.
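If you want to reproduce the counts, here is a minimal Python sketch. It assumes that spaces count as characters, which is how Bosnia and Herzegovina reaches 22. The short list of names is only a sample, not the full dataset behind the maps.

```python
# Character counts for a handful of country names.
# Assumption: spaces are counted, so "Bosnia and Herzegovina" is 22.
names = [
    "Mali",
    "Cuba",
    "Peru",
    "Bosnia and Herzegovina",
    "Central African Republic",
]

letter_counts = {name: len(name) for name in names}

for name, count in letter_counts.items():
    print(f"{name}: {count}")
```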
If you give every value from 4 to 24 characters its own shade, the map will have a lot of colors.
For example, the four-letter countries are the lightest shades of green. As the letter count increases, the shading gets darker.
Which country belongs to which group? It’s hard to tell.
This is why we use data classification. When we group values into classes, there are fewer shades and each shade covers a range of values.
Ultimately, the question is how do we define those class boundaries or bins? In other words, how do we classify the data into groups?
First, let’s try dividing classes into evenly-spaced groupings like equal intervals below and see what happens.
Equal Interval Data Classification
Equal interval is cut and dried. All it really does is divide the range of values into classes of equal width.
- Class 1: 4 – 8 (113 countries have four, five, six, seven or eight letters)
- Class 2: 8 – 12 (41)
- Class 3: 12 – 16 (12)
- Class 4: 16 – 20 (8)
- Class 5: 20 – 24 (2)
The minimum number of characters in a country name is 4, as in Peru. The maximum is 24, for Central African Republic. When you plot each country and its number of characters on a map, it looks like this (the brackets indicate the count):
Equal interval data classification subtracts the minimum value from the maximum value (24 - 4 = 20). Then, it divides that range by the number of classes to get the interval (20 / 5 = 4). In our example, we generated 5 classes, but the number of classes is entirely up to you.
Almost always, equal interval choropleth maps result in an unequal count of countries per class. For example, class 1 holds 113 of the 176 countries: those with four, five, six, seven or eight letters.
However, only 2 countries have more than 20 letters. As a result, this map displays mostly light shades, with only 2 countries in the darkest shade.
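The arithmetic is easy to sketch in Python. This is a rough illustration, not the code any particular GIS package runs, and the function names are made up for this post. It also assumes that a value sitting exactly on a boundary belongs to the lower class, which matches class 1 containing the eight-letter countries.

```python
from bisect import bisect_left


def equal_interval_breaks(minimum, maximum, n_classes):
    """Return the n_classes + 1 evenly spaced class boundaries."""
    width = (maximum - minimum) / n_classes  # (24 - 4) / 5 = 4
    return [minimum + i * width for i in range(n_classes + 1)]


def class_of(value, breaks):
    """Return the 1-based class number for a value.

    Assumption: a value equal to a boundary falls in the lower class.
    """
    inner_breaks = breaks[1:-1]
    return bisect_left(inner_breaks, value) + 1


breaks = equal_interval_breaks(4, 24, 5)
print(breaks)                # [4.0, 8.0, 12.0, 16.0, 20.0, 24.0]
print(class_of(8, breaks))   # 1
print(class_of(24, breaks))  # 5
```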
But what happens if you want the count of countries in each class to be close to equal? That’s when you should use a quantile map.
Quantile (Equal Count) Classification
The quantile map tries to place the same count of features in each class (5 classes in our case). In other words, quantile maps try to arrange groups so they hold the same quantity. As a result, the shading looks evenly distributed in quantile maps.
- Class 1: 4 – 6 (56 countries have 4, 5 or 6-letter names)
- Class 2: 6 – 7 (38)
- Class 3: 7 – 8 (19)
- Class 4: 9 – 11 (36)
- Class 5: 12 – 24 (27)
A quantile map takes the total number of features (176 countries in our case). Then, it divides the total by the number of classes to get the average count per class (176 / 5 = 35.2). Finally, it sets the breaks so that the count in each class is as close to that average as possible.
You can see that the counts are far more balanced than with equal intervals, even though they don’t all land on 35.2. Many countries share the same letter count, and countries with the same value stay in the same class, so the classes can’t be split perfectly evenly.
Despite their balanced look, quantile choropleth maps can also be misleading. They mislead because readers tend to assume that everything with the same shade is similar. For example, a 12-letter country gets the same dark shading as a 24-letter country… and where’s the justice in that?
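Here is one simple way to sketch the quantile idea in Python. Real GIS packages may interpolate or break ties differently, so treat this as an illustration only. The function names and the 15 sample values are hypothetical and are not the 176 countries used in the maps. The sketch also assumes there are more values than classes.

```python
from bisect import bisect_left


def quantile_breaks(values, n_classes):
    """Return the n_classes - 1 inner break values for equal-count classes."""
    ordered = sorted(values)
    target = len(ordered) / n_classes  # ideal count per class
    breaks = []
    for i in range(1, n_classes):
        cut = round(i * target)          # how many values should sit at or below this break
        breaks.append(ordered[cut - 1])  # the value at that rank becomes the break
    return breaks


def class_counts(values, breaks):
    """Count the values per class; a value equal to a break goes to the lower class."""
    counts = [0] * (len(breaks) + 1)
    for value in values:
        counts[bisect_left(breaks, value)] += 1
    return counts


# Hypothetical letter counts, only for illustration.
sample = [4, 4, 4, 5, 6, 6, 7, 7, 8, 9, 11, 12, 15, 22, 24]
breaks = quantile_breaks(sample, 3)
counts = class_counts(sample, breaks)
print(breaks)  # [6, 9]
print(counts)  # [6, 4, 5]
```

The ideal count here is 5 per class, but the two 6-letter values have to stay together, so the first class ends up with 6 and the second with 4.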
Natural Breaks (Jenks) Classification
The first thing to remember about the Natural Breaks (Jenks) classification is that it is an optimization method for choropleth maps. In short, it arranges the groupings so that there is as little variation as possible within each class or shade.
- Class 1: 4 – 6 (56)
- Class 2: 6 – 8 (57)
- Class 3: 8 – 12 (41)
- Class 4: 12 – 18 (18)
- Class 5: 18 – 24 (4)
Natural Breaks (Jenks) takes an iterative approach. For each candidate grouping, it compares the sum of squared deviations from the class means with the sum of squared deviations from the array mean. From these two sums, the algorithm computes a goodness of variance fit, where 1 is a perfect fit and 0 is a poor fit.
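The goodness of variance fit is commonly written as (SDAM - SDCM) / SDAM, where SDAM is the sum of squared deviations from the array mean and SDCM is the sum of squared deviations from the class means. The Python sketch below only scores a grouping you hand it. It does not perform the Jenks search for the best breaks. The function names and the sample values are hypothetical.

```python
def sum_squared_deviations(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values)


def goodness_of_variance_fit(classes):
    """Score a grouping; classes is a list of lists, one list per class."""
    all_values = [value for group in classes for value in group]
    sdam = sum_squared_deviations(all_values)                    # from the array mean
    sdcm = sum(sum_squared_deviations(group) for group in classes)  # from class means
    return (sdam - sdcm) / sdam


# Two ways of grouping the same hypothetical letter counts.
tight = [[4, 4, 5], [8, 9], [22, 24]]
loose = [[4, 4, 5, 8], [9, 22], [24]]

print(goodness_of_variance_fit(tight))
print(goodness_of_variance_fit(loose))
```

The grouping that keeps similar values together scores closer to 1, which is the kind of grouping Natural Breaks looks for.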
The creator of the Natural Breaks data classification method was a cartographer named George Frederick Jenks. He specialized in monitoring the eye movements of people when looking at a map. And the results for this map looked great too.
You can see how this data classification method minimizes variation in each group. As we have lots of shorter country names, it finds suitable class ranges. But it still manages to group the outliers with longer country names in classes of their own.
Standard Deviation Classification
Standard deviation classification is a statistical technique based on how much the data differs from the mean. You calculate the mean and standard deviation of your data. Then, each standard deviation interval becomes a class in your choropleth map.
In our case, the mean number of characters is about 8.5 with a standard deviation of 3.7 characters. As a result, all countries with 5 to 8 characters are placed in the -1 to 0 standard deviation grouping. Likewise, countries with 9 to 12 letters are grouped in the 0 to 1 standard deviation range, like this:
- Class 1: <-1 σ (9)
- Class 2: -1 to 0 σ (104)
- Class 3: 0 to 1 σ (41)
- Class 4: 1 to 2 σ (10)
- Class 5: 2 to 3 σ (9)
- Class 6: 3 to 4 σ (2)
- Class 7: >=4 σ (1)
The raw categories need some explanation for the reader. What is the average? What is the range for each standard deviation?
Despite this drawback, standard deviation maps might be among the most appropriate because of their statistical basis. All the 4-letter countries are more than 1 standard deviation below the mean (< -1 σ). Countries with 5 to 8 letters fall between -1 and 0 standard deviations. The one 24-letter country is more than 4 standard deviations above the mean of 8.5 because of its extreme deviation.
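To make the bands concrete, here is a small Python sketch that works out which standard deviation band a letter count falls in. It assumes the rounded mean of 8.5 and standard deviation of 3.7 quoted above. The function name is made up for illustration.

```python
import math

MEAN = 8.5     # mean letter count quoted in this post (rounded)
STD_DEV = 3.7  # standard deviation quoted in this post (rounded)


def sigma_band(value, mean=MEAN, std_dev=STD_DEV):
    """Return k such that the value lies in the band from k to k + 1 sigma."""
    z_score = (value - mean) / std_dev
    return math.floor(z_score)


for letters in (4, 5, 8, 9, 12, 24):
    band = sigma_band(letters)
    print(f"{letters} letters: {band} to {band + 1} standard deviations")
```

A 4-letter name lands in the -2 to -1 band (shown as < -1 σ in the legend), 5 to 8 letters land in -1 to 0, 9 to 12 letters land in 0 to 1, and the 24-letter name lands above 4.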
Pretty Breaks Classification
If you want round numbers in your ranges, then you should choose pretty breaks. All pretty breaks does is round each break point up or down. So instead of a break point of 599,364, you get 600,000 with pretty breaks.
It’s a bit hard to see how round the numbers are (it’s grouping by 5’s) in this example because all the examples above also produce round numbers. But when you have large numbers like population estimates (see below), it will generate some very pretty breaks.
- Class 1: 4 – 5 (29)
- Class 2: 5 – 10 (111)
- Class 3: 10 – 15 (24)
- Class 4: 15 – 20 (10)
- Class 5: 20 – 24 (2)
Because it insists on round numbers, pretty breaks can also be picky about the number of classes you choose, and may not give you exactly the number you asked for.
Here’s how population estimates compare between the data classification techniques:
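Here is a simplified Python sketch of the idea. It is not the exact algorithm used by any GIS package, and unlike the classes listed above it does not clip the first and last break to the data range. It rounds the class width up to a "nice" step of 1, 2, 5 or 10 times a power of ten, then snaps the breaks to multiples of that step. The function name is hypothetical, and the sketch assumes the maximum is greater than the minimum.

```python
import math


def pretty_breaks(minimum, maximum, n_classes):
    """Return round-number breaks covering minimum..maximum.

    n_classes is only a request; the result may hold a different number.
    Assumption: maximum > minimum.
    """
    raw_step = (maximum - minimum) / n_classes
    magnitude = 10 ** math.floor(math.log10(raw_step))
    # Pick the smallest nice step that is at least as wide as the raw step.
    for multiplier in (1, 2, 5, 10):
        step = multiplier * magnitude
        if step >= raw_step:
            break
    start = math.floor(minimum / step) * step  # round the lower end down
    end = math.ceil(maximum / step) * step     # round the upper end up
    count = round((end - start) / step)
    return [start + i * step for i in range(count + 1)]


print(pretty_breaks(4, 24, 5))  # [0, 5, 10, 15, 20, 25]
print(pretty_breaks(4, 24, 4))  # [0, 5, 10, 15, 20, 25]
print(pretty_breaks(4, 24, 3))  # [0, 10, 20, 30]
```

Asking for 4 classes gives the same five classes as asking for 5, because a step of 5 is the nearest round width. That is the pickiness in action.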
Natural Breaks (Jenks):
Pretty Breaks. Now that’s pretty:
Try It Out Yourself
Choropleth maps use different shading and coloring to display the quantity or value in defined areas.
Often, the map maker uses a data classification method to produce a unique choropleth map. Each data classification method impacts the reader differently.
There are several ways to classify data in GIS. We’ve outlined their differences with examples for choropleth maps. Use this guide to classify practically anything, such as crime rates, level of education and politics.
What is your favorite data classification method? Let us know with a comment below.