Relying on Good Data to Make Smart Decisions: How to Identify What’s “Good”?

December 8, 2016 | Posted by Data Strategy Director

As I reflect on election night one month ago, I'm reminded that, regardless of who you voted for, the results surprised many. I think back on the explanations that noted pollsters at fivethirtyeight.com began offering in the late hours of November 8:

"Pop A turned out more than we expected."

"Pop B flipped from how they behaved four years ago."

"The polling missed on pop C."

Somewhere in my head, a voice trying to rationalize expectations spoke up and said, “Of course they were wrong, they are still relying on polls.”

As fast as the data space has moved over the past few years, many decision-makers have rooted their careers in the same sort of predictions that surprised most on election day. My hope is that this election brings a new generation through an old "aha" moment that applies to any projection: "bad data in, bad data out." While every data scientist will roll their eyes at my key questions below, this is intended as a simple cheat-sheet for the decision-makers who rely on data to guide their course. If you are making any decisions based on data, consider the following questions and takeaways:

Where is the data sourced and how did you get it? Polling data asks people a question that they answer consciously. That polling data then gets weighted against demographic data, which can be dubiously sourced. If you are relying on one self-reported data set that is scaled with another self-reported data set, that should create more questions than it answers.

Key Takeaway: If your company didn’t directly generate the data, ask your provider to explain their sources.

What scale is it at and how consistent is it? Panels and surveys are fantastic at capturing the current opinion of exactly who is asked. Size is the issue, because very few panels and polls can capture a group large and diverse enough to cover their target. There were over 125MM voters in this election, yet polls rarely get 2,000 respondents, and virtually none retain the same respondents over time.

Key Takeaway: Ask about coverage and bias. What could skew this data one way or the other, and how was it controlled for?
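To see why sample size alone leaves real uncertainty, here is a minimal sketch (assuming simple random sampling and a 95% confidence level; the numbers are illustrative) of the margin of error for a 2,000-person poll:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    n: sample size; p: observed proportion (0.5 is the worst case);
    z: z-score for the confidence level (1.96 is roughly 95%).
    """
    return z * math.sqrt(p * (1 - p) / n)

# A 2,000-respondent poll carries roughly +/- 2.2 points of sampling
# error, and that is before any coverage bias or nonresponse error.
print(f"{margin_of_error(2000):.3f}")  # ~0.022
```

And that figure is the best case: it assumes every respondent was reached at random and answered honestly, which is exactly what the coverage and bias questions above are probing.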

How current / timely is it? Most election polls root themselves in what was done two years ago. That doesn't account for how many people moved across states in that timeframe, got a new job, or had a kid. All of these are major life changes that can impact engagement with brands just as much as with political parties. Understanding the currency of any data you look at, whether it is sales figures or demographics, is important for feeling confident in your decision.

Key Takeaway: Ask about the age of the data or model that generated the insight. When was it last assembled and verified?
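As a simple illustration of that takeaway, here is a hedged sketch (the source names, dates, and freshness threshold are all hypothetical) that flags data sets that haven't been verified recently:

```python
from datetime import date

# Hypothetical metadata for the data sets feeding a decision;
# in practice this would come from your provider or data catalog.
sources = {
    "sales_figures": date(2016, 11, 30),
    "demographics": date(2014, 6, 1),   # a two-year-old snapshot
}

MAX_AGE_DAYS = 365  # assumed freshness threshold; tune per use case

today = date(2016, 12, 8)
for name, last_verified in sources.items():
    age = (today - last_verified).days
    status = "OK" if age <= MAX_AGE_DAYS else "STALE - reverify"
    print(f"{name}: {age} days old -> {status}")
```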

How corroborated is the insight and what's the impact of being wrong? This is probably the toughest question to focus on. Nate Silver was quick to publish how skeptical he was of his own projections, but that kind of self-skepticism is a tough factor to weigh, as it requires a level of contrarian thinking.

Key Takeaway: Ask your analyst how many variables were accounted for. Consider the error range. Understand the assumptions of the model and evaluate what happens if the assumptions are false.
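To make the assumption-testing concrete, here is a minimal sensitivity sketch (every number in it is hypothetical): shift one modeling assumption, turnout for a single subgroup, and watch whether the projected result flips.

```python
# All figures below are made up for illustration. If small changes to
# one assumption flip the answer, the insight is fragile.
base_support = {"group_a": 0.58, "group_b": 0.44}  # assumed support rates
base_turnout = {"group_a": 0.55, "group_b": 0.60}  # assumed turnout rates
population = {"group_a": 1_000_000, "group_b": 1_200_000}

def projected_share(turnout_shift_b=0.0):
    """Projected vote share for the candidate, shifting group_b turnout."""
    votes_for = 0.0
    votes_total = 0.0
    for g in population:
        turnout = base_turnout[g] + (turnout_shift_b if g == "group_b" else 0.0)
        voters = population[g] * turnout
        votes_for += voters * base_support[g]
        votes_total += voters
    return votes_for / votes_total

for shift in (-0.05, 0.0, 0.05):
    print(f"group_b turnout shift {shift:+.0%}: share = {projected_share(shift):.1%}")
```

In this toy example, a five-point swing in one group's turnout moves the projected share from about 50.4% to about 49.8%, enough to flip the predicted winner. That is the kind of fragility the takeaway above is asking you to uncover.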

Cardlytics leverages directly reported spend data from banks, and before that I worked for a credit bureau, so I have an obvious bias toward observed data over panels and polls. Whatever your opinion on the outcome of this election, the data most of us saw didn't predict the winner, and we should consider whether there is a lesson in that. Before we build our worldview of what will or won't happen, we should always ask about the validity of what's driving us toward a decision.