A Critique of Alabama’s Visualization Techniques

James Kelly
Oct 28, 2020

Data is everywhere. In almost every industry, data is used to identify trends, analyze results, and predict the future. It is therefore imperative that data be visualized in a way that is appealing to the eye and simple to read, yet conveys the appropriate information to the reader without anything unnecessary. This past year, data has become even more important to our society because of Covid-19. Every day, new studies are done and new statistics are reported about Covid-19 trends. Unfortunately, many of these reports are misleading, confusing, or simply use bad data visualization techniques.

I found a pretty bad fact sheet published by the Alabama state government:

https://www.alabamapublichealth.gov/covid19/assets/cov-al-cases-062420.pdf

These data represent a running count of Covid-19 cases and tests in Alabama. They were reported during the week of June 24th, 2020.

Data

To begin, let us analyze the data. The data clearly comes from Covid-19 tests taken by people within Alabama, which is good. The only place Alabama is mentioned is the upper right corner, but that is acceptable because this visualization can only be accessed from the Alabama state website, so the person looking at it knows which state the data is from.

The headline numbers, however, only give us the most recent Covid-19 counts in Alabama. What do these numbers even mean? How can we interpret them? To make this visualization more effective, we need more context. 358,319 people were tested, but how does this compare to the population of Alabama? A quick Google search puts the population at about 4.9 million. After some simple math, I found that this means only about 7% of the population had been tested! The fact sheet reports 879 deaths alongside tests covering only 7% of the population. Scaling that figure proportionally to 100% of the population would suggest roughly 12,000 deaths, although this is only a crude extrapolation.
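The arithmetic behind that estimate can be sketched in a few lines of Python. The tested and death counts are the ones quoted from the fact sheet; the 4.9 million population is the approximate figure from my search, and the proportional scaling is a deliberately naive assumption, not an epidemiological model.

```python
# Figures quoted from the fact sheet (week of June 24th, 2020).
TESTED = 358_319
DEATHS = 879

# Assumption: approximate Alabama population from a web search.
POPULATION = 4_900_000

# Share of the population that had been tested.
tested_share = TESTED / POPULATION

# Naive assumption: deaths scale in proportion to the share tested.
# This is only a back-of-the-envelope illustration.
naive_scaled_deaths = DEATHS / tested_share

print(f"Share of population tested: {tested_share:.1%}")
print(f"Naively scaled deaths: {naive_scaled_deaths:,.0f}")
```

The point of the sketch is not the exact figure but how much the headline numbers depend on a denominator the fact sheet never shows.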

The data is misleading because it does not tell the whole story. Covid-19 cases are being reported based only on data from Covid-19 tests. That count cannot be accurate, because a large number of cases among untested people go unreported, and they would need to be accounted for to tell a more accurate story.

The worst part of the dataset is in the “Age Category” pie chart. All the pie sections add up to 100.1%. Huh?? Either somebody made a mistake and one of those numbers is off by 0.1, or each slice was rounded separately. The chart does not say which, so we have no way to know which value to trust. Even though 0.1% does not seem like a big deal, out of the 31,624 confirmed cases it is the difference between 13,123 25–49 year olds testing positive (41.5%) and 13,092 25–49 year olds testing positive (41.4%). That’s a difference of 31 people!
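To make that difference concrete, here is a minimal sketch of the calculation. It assumes the 31,624 confirmed-case total from the fact sheet and truncates fractional people, which is how the counts above were obtained; the helper name is my own.

```python
# Total confirmed cases reported on the fact sheet.
TOTAL_CONFIRMED = 31_624


def people_for_share(share_tenths_of_percent: int) -> int:
    """Convert a share given in tenths of a percent into a head count.

    Integer arithmetic avoids floating-point surprises; fractional
    people are truncated (an assumption made for this illustration).
    """
    return TOTAL_CONFIRMED * share_tenths_of_percent // 1000


as_reported = people_for_share(415)   # 41.5% as printed on the chart
if_off_by_one = people_for_share(414) # 41.4% if this slice is the one that is off

print(as_reported, if_off_by_one, as_reported - if_off_by_one)
```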

Motivation

The intention of this visualization is clearly to report Covid-19 case and test statistics for the state of Alabama. These data are especially useful for lawmakers when considering what measures to take against this type of virus. However, the data appears to be biased. One reason is that it only covers the small share of people in Alabama who were tested, and that group is not necessarily representative of the state. State officials are making decisions for the whole population based on data that was collected from only 7% of the population. How does that make you feel?

Obviously, they can only use the data they have obtained to make these visualizations. Their mistake is not in using these data in this fashion; it is in failing to make clear that the data is not representative of the whole state of Alabama.

Audience

The intended audience is obviously the people of Alabama, but also anyone interested in the Covid-19 statistics for this state. This could include federal lawmakers, lab researchers, and others as well. I would say that this visualization is not very effective for that audience. The top section simply gives us numbers to look at. The fact sheet also uses pie charts, which we learned are not an effective way of reporting data. How does this help us compare anything? I will go into further detail below in Visual Representation.

Visual Representation

Now on to the visual representation of the data. First of all, they are using pie charts… As stated in this article by Kieran Healy, pie charts are a bad idea. Pie charts pose a perceptual problem because humans “tend to misjudge quantities encoded as angles”. Acute angles are underestimated, while obtuse angles are overestimated. If you take the labels away from each pie slice, it is nearly impossible to tell what quantitative value each slice represents. Secondly, what do these pie charts show us? I honestly still do not know. Take another look.

I came up with a few reasonable interpretations. Do these charts show each group’s share of all the positive tests in Alabama? Or could it be the share of each group among everyone who was tested in Alabama? Or is it the share of people who died after testing positive? My instincts lead me towards the first interpretation, but that is only because I have had a bit of experience in reporting statistics. How would a construction worker from Alabama interpret this? The point is clear: somewhere on the page, describe what information the visualization is intended to report. The same argument applies to the column on the right side: what are these numbers being compared against?

Does this mean that 2,249 long-term care facility residents tested positive? Is this 2,249 / 3,000 residents? Or is it 2,249 / 1,000,000? Depending on the answer, lawmakers would take vastly different reactive measures. This is why it is important that we have the whole picture of these data. I would guess that those numbers are out of the group of people who were tested, but once again, it is unknown.
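A quick sketch shows how far apart those two readings are. Both denominators are the made-up ones from the questions above, not figures from the fact sheet, which gives no denominator at all.

```python
# Count shown on the fact sheet for long-term care facility residents.
POSITIVE_RESIDENTS = 2_249

# Hypothetical denominators from the questions above; the fact sheet
# does not say what this count should be compared against.
candidate_denominators = {
    "3,000 residents": 3_000,
    "1,000,000 residents": 1_000_000,
}

# Positive rate implied by each hypothetical denominator.
rates = {
    label: POSITIVE_RESIDENTS / denominator
    for label, denominator in candidate_denominators.items()
}

for label, rate in rates.items():
    print(f"2,249 out of {label}: {rate:.2%}")
```

The same count reads as a crisis under one denominator and a rounding error under the other, which is why the missing context matters.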

Yet another flaw in this visualization is aesthetic. First, look at the color palette, the range of colors used. They use blue, orange, green, red, and purple in this figure. The colors themselves are acceptable, since you can easily distinguish each pie slice. One major flaw I noticed is that in one chart “Unknown” is orange, while it is purple and green in other charts. Using the same color for “Unknown” in every chart would provide more consistency and prevent people from misinterpreting the information. Another flaw is that the red on the outside and the red used in the charts do not seem to be the same shade.

I would also like to point out the spacing in this visualization. The image feels squished together, which makes it harder to read. I believe they could have gotten away with smaller fonts and images. The larger fonts even overlap with smaller fonts in some places.

Details like this make the fact sheet look as if it was not created professionally.

My Redesign

In my redesign, I focused on making the important information the government is trying to communicate clearly visible. I turned each pie chart into a bar chart and, where appropriate, ordered the bars by category. I also included labels, numerical values, and text that conveys the total number of tests and deaths. I decided to use a bar chart because it is well suited to displaying quantitative data for categorical variables.

Note: the values in this one bar graph do not add up to 31,624 because of the 100.1% error I mentioned above. I had to work backwards from the total number of confirmed cases and the percentages, using simple math, to get these values.
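Working backwards looks roughly like the sketch below. Only the 31,624 total and the 41.5% share for 25–49 year olds come from the fact sheet; the other slices are labeled placeholders chosen so the shares add up to 100.1%, and truncating fractional people is my assumption.

```python
# Total confirmed cases reported on the fact sheet.
TOTAL_CONFIRMED = 31_624

# Shares in tenths of a percent. Only "25-49" (41.5%) is taken from the
# fact sheet; the "placeholder" slices are invented for illustration and
# chosen so that the shares sum to 100.1%, like the original pie chart.
shares_tenths = {
    "25-49": 415,
    "placeholder A": 200,
    "placeholder B": 186,
    "placeholder C": 120,
    "placeholder D": 80,
}

# Back out a head count for each slice (fractional people truncated).
counts = {
    label: TOTAL_CONFIRMED * tenths // 1000
    for label, tenths in shares_tenths.items()
}

share_total = sum(shares_tenths.values()) / 10
count_total = sum(counts.values())

print(f"Shares add up to {share_total}%")
print(f"Counts add up to {count_total:,} versus {TOTAL_CONFIRMED:,} reported")
```

Because the shares overshoot 100%, the reconstructed counts overshoot the reported total as well, which is the mismatch described in the note.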

I also changed the titles to be more descriptive. Now each title describes exactly what the data is showing us. I also decided to label each bar with its value, since the data varies so much. This way the exact number each bar represents is easy to see. Otherwise it would be nearly impossible to tell what the values are.

I also made it a point to use the same shade of each color. This is easier on the eye than a mix of dark and light shades. Note that I included the date of the data in the title, so it is clear to the reader when it is from. In the original it was tucked away in small font in the lower right-hand corner, barely noticeable.

I also made the decision to completely eliminate the right-hand column because I believe this information would be better displayed in its own graphic.

Clearly, it is imperative that these data be communicated in an appropriate way. Otherwise the data will mislead, or misrepresent the point that was being made. Pie charts are almost always bad; don’t use them, especially if you are a professional. I could not believe that a state government posted this type of visualization in 2020. It is important that we stop reporting data in these horrible ways so that we can better understand it.
