BI Trends

Why Bad Data Led to Botched Election Analytics & How to Avoid the Same Pitfalls

By Kevin Hardiman

A major concern for any analytics offering is data quality.

Unfortunately, dozens of news outlets are now apologizing for putting forth questionable data during the 2016 election. Looking at any number of news sites, Trump’s estimated chances of winning topped out at around 40 percent over the past six months. Going into Election Day, Clinton’s chances of winning hovered between 70 and 90 percent, according to news outlets like The New York Times, FiveThirtyEight, and Reuters. But in the last few hours of November 8th, Trump’s chances skyrocketed from 15 percent to over 85 percent.


Getting it right at the finish line does not excuse months and months of incorrect data. Even one major misstep like this can be dire to an organization’s reputation. In a recent KPMG report, 70 percent of organizations agreed that data and analytics can expose them to reputational risk. “Making decisions or targeting consumers based on inaccurate predictions will quickly erode, if not extinguish, consumer trust and shake the confidence of those executives who rely on these predictions to make informed decisions,” stated the report.

How could numerous established and trusted news outlets have all made such a misstep in their analytics? The answer lies as much in the data collection as in the analysis.

There are more polls today than ever, and many are conducted by people who don’t know how to poll. Respondents bring bias into polls, but that bias is often not reflected in the final analysis shown to consumers. In some instances, pollsters didn’t return to the same sample groups to see if opinions had changed. In other cases, reporters simply leaned too heavily on the polls, treating them as gospel when other factors (such as voter turnout) needed to be weighed more heavily.

Data analytics has been around for a long time. But data used to be consumed primarily by people with advanced mathematics backgrounds or advanced degrees in data science—experts who could review the methodology behind the polls and determine whether it was trustworthy or not. Now, data analytics has quickly spilled over into the realm of nontechnical people who don’t have the skills or tools to see flaws in methodology.

News outlets have a responsibility not to take polls at face value. They need to examine the process each poll uses, assess the quality of its polling, and exclude the ones that don’t meet a high standard.

This same lesson applies to any company offering analytics to its customers or internal end users. If you want people to trust the data you’re giving them, you have to perfect the methodology behind it. No matter what analytics solution you use, it’s essential to audit the data in it to ensure it’s the information users need and is governed from a data-quality standpoint.
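What such an audit looks like in practice depends on the tooling, but the basic checks translate directly into code. Below is a minimal sketch in Python with pandas; the column names, sample data, and specific checks are illustrative assumptions, not a prescribed audit process:

```python
import pandas as pd

# Hypothetical customer extract to audit before it reaches end users.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@x.com", None, "b@x.com", "c@x.com"],
    "signup_date": ["2016-01-05", "2016-02-10", "2016-02-10", "not a date"],
})

issues = []
# A key that should be unique but isn't is a classic governance failure.
if df["customer_id"].duplicated().any():
    issues.append("duplicate customer_id values")
if df["email"].isna().any():
    issues.append("missing email addresses")
# Dates that fail to parse are a common sign of dirty source data.
if pd.to_datetime(df["signup_date"], errors="coerce").isna().any():
    issues.append("unparseable signup_date values")

print(issues)
```

Checks like these are cheap to run on every refresh, so bad records can be flagged before users ever see a dashboard built on them.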

That starts with simply identifying the data you need. For election data this is fairly standard—poll constituents, collect data, analyze it. However, to my earlier point, there were likely missteps along the way (e.g. not continuing to sample the same groups).

Businesses have a somewhat different set of challenges to contend with. Nearly every organization today outsources the fundamental applications that are critical to running its business (e.g., CRM, marketing, finance, HR), and the data lives with those applications. What makes this difficult, and what can lead to dirty data, is that each of these applications stores data in a different place, and usually in different formats. Moreover, these applications don’t share common underlying data structures or definitions of common terms such as “customer.” The data is all very different: Each piece is specific to that application and is rarely designed for analysis.

After data collection comes merging the data. In political polling, this involves taking the data you’ve collected and correlating it with other data—from different polls, past election years, demographic data, and the like. For businesses, this involves blending various data sources. You can’t just download a bunch of CSV files and dump them into Excel; there’s no easy way to merge disparate data fields from different applications that way and get a complete picture of the business.

Finally, company leaders need to manage data for performance and analysis. Whether you’re a pollster or a product manager, your data will inevitably be pulled from a number of different places. For this, you need an intermediate place to store the data for analysis that’s smart enough to self-tune for performance and simple enough for anyone to manage in order to keep the data fresh.
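As a toy illustration of such an intermediate store, the sketch below stages a blended table in SQLite and adds an index on the query key. This is a deliberately simplified stand-in: a real staging layer would be a warehouse with scheduled refreshes and automated tuning, and the table and column names here are assumptions:

```python
import sqlite3
import pandas as pd

# Hypothetical blended result, ready for staging.
blended = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "annual_revenue": [50000, None, 120000],
})

conn = sqlite3.connect(":memory:")
# "Keeping the data fresh" is just a full rewrite in this toy example.
blended.to_sql("customers", conn, index=False, if_exists="replace")
# An index on the common query key is a simple stand-in for self-tuning.
conn.execute("CREATE INDEX idx_customers_id ON customers (customer_id)")

rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(rows)
```

The point of the intermediate layer is that analysis queries hit this tuned copy rather than hammering the source applications directly.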

It’s clear that consumers want analytics, and those data insights can keep audiences on a site—or in an application. But it’s more important than ever to get the data and the analysis right. You may only get one chance to prove yourself trustworthy and valuable to your end users, so make data quality a top priority of any analytics project.


Originally published November 17, 2016; updated on August 9th, 2017

About the Author

Kevin Hardiman is the Senior Director of Product Architecture at Logi Analytics. He has over 17 years of experience with Business Intelligence and data visualization.