Data driven or driving with data?

By Dr Katherine O’Keefe

July 30, 2021

Data driven decision making seems intuitively a great idea. Basing your organisational strategy on sound information and having concrete figures supporting your decision are likely to be a much stronger foundation for good decisions and good outcomes than unsupported opinion or gut feeling. However, there are limits to this idea, and my question to you is: who’s really driving? Do you want data driven decision making or data informed decision making?

There’s a fundamental conception of what data is and what data is good for that underlies our understanding of data quality, our requirements for data strategy, and what we mean by “data driven”. Data as we model and collect it creates an abstract representation of concrete things, and how we use it has real world impacts. Data is the map, not the territory. Data in use is where the map and the territory interact. Real-world implementations of new technologies and data-driven decision making are a good reminder that we create the data and the models that “drive” decision making, and that data, information, knowledge, and wisdom are all different things.

Data vs. information vs. knowledge vs. wisdom

Data is a bunch of lines and squiggles created as a one-to-one representation of landscape features. Information is a map. Knowledge is the map key. You can automate a lot of this knowledge, inputting the data and the rules for reading the data into a machine that matches your location to the corresponding point on the map and outputs directions from there to your desired destination. But the difference between data driven decision making and informed decision making supported by data lies in understanding the context in which the decision is being made, the limitations of the data and model, and how these affect real world impacts.

Wisdom is recognising that even if the data says that is a road and you should go that way, the map is missing some key information and continuing in that direction would be dangerous. In the early days of Sat Navs in cars, there were multiple stories of people being led astray along inappropriate routes, and even nearly driving off cliffs, because their GPS device gave “wrong” instructions. The system did not appropriately distinguish between a narrow footpath and a drivable road. That’s the kind of data-driven driving you don’t want, and it is exactly that: data driven driving, without fuller knowledge of the context and limitations of the data driving the decisions.

Discriminative algorithms?

This doesn’t just show up in driving directions, of course. Five years ago, Amazon Prime launched a same day delivery service in the US for some ZIP codes in major cities, but people immediately noticed a disparity in the demographics of where Prime same day delivery was and wasn’t offered. The maps of service availability just happened to match the old redlining maps from historical housing discrimination practices, and still reflected disparate impacts on predominantly black neighbourhoods. This may have been a purely data-driven calculation that did not use racial demographics at all, but the social impact of the decision, deliberate or not, was damaging.

Undesirable outcomes can have root causes in multiple points:
• In the decisions made in what data to acquire and how to create or collect it;
• In the quality of the datasets;
• In the models created;
• In how data and models are interpreted or applied;
• In the context in which decisions or actions take place;
• In how we determine what is an acceptable result, what is quality outcome, and when something is ready for deployment;
• In misinformation or incorrect understandings regarding any of the above.

Machine learning is really good at identifying, replicating, and in some cases amplifying patterns in our datasets. This can be a good example of “data driven decision making”. But a black box algorithm risks generating biased and discriminatory decisions or outcomes that you can’t explain or stand over, because you don’t know what data, metrics, or logic the outcome was derived from. If a machine learning algorithm has taught itself shortcuts that make decisions based on something that your organisation cannot ethically or legally depend on, how do you know? AI is not good at understanding context. But to make a good decision regarding the data, you need to know the context and fitness for purpose of your data, and understand the real world impact of the decision.
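One way to start answering “how do you know?” is to check the outcomes of a decision against groups the model never saw. The sketch below is illustrative only: the records, the group labels, and the function names are hypothetical, and the 0.8 threshold is an assumption borrowed from the “four-fifths” rule of thumb used in US employment selection guidance, not a standard for delivery coverage or any other setting. It compares how often each group is offered a service and flags a large gap for human review.

```python
from collections import defaultdict

# Assumed review threshold (illustrative, not a legal test for this context).
REVIEW_THRESHOLD = 0.8


def coverage_rate_by_group(records):
    """Return the share of areas offered the service, per group."""
    offered = defaultdict(int)
    total = defaultdict(int)
    for group, is_offered in records:
        total[group] += 1
        if is_offered:
            offered[group] += 1
    return {group: offered[group] / total[group] for group in total}


def disparity_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group is offered the service, so no gap between groups
    return min(rates.values()) / highest


# Hypothetical records: (group label, was the service offered in this area?)
areas = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

rates = coverage_rate_by_group(areas)
ratio = disparity_ratio(rates)
needs_review = ratio < REVIEW_THRESHOLD

print(rates)
print(f"disparity ratio: {ratio:.2f}, needs review: {needs_review}")
```

A check like this does not explain why a gap exists or whether it is justified. It only tells you that a human needs to look at the context before the decision goes any further.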

Data quality and correlations (or the lack of pirates could be causing global warming)

The difference is between being solely “driven” by the data and driving with the support and understanding of data. If you incorporate techniques from AI or machine learning into risk models, underwriting, or decision-making processes in your organisation, you need to plan for and be aware of the potential for “digital redlining”. Machine learning and algorithmic or “AI” decision making are prime examples of data driven decision making. But whether it’s a machine or a human making the decisions based on data, you can only make decisions based on the data you actually have, on the map you created for yourself. If that data has gaps or quality issues, or produces a model that is in some way unfit for the type of decisions based on it, the outcomes may be worse than decisions made on a manager’s “feelings” (which may also be based on incomplete information or models that don’t match reality on the ground).

This brings us to the basics of data quality, knowing what questions to ask about the data you have and its context . . . and using wisdom to guide decisions, not just information. If you want to use data strategically and transform your organisation so that decisions are informed and supported by data and information, you need to know how well the map you have represents the territory for your particular purpose.
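As a small illustration of one such question, the sketch below measures completeness: what share of records actually has a value for each field you plan to rely on. The customer records, field names, and the `completeness` function are all hypothetical, and completeness is only one dimension of data quality. A full field can still be wrong, out of date, or unfit for your purpose.

```python
def completeness(rows, fields):
    """Share of rows with a non-empty value for each field."""
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) not in (None, "")) / len(rows)
        for field in fields
    }


# Hypothetical customer records with gaps in two fields.
customers = [
    {"name": "A", "postcode": "D01", "email": "a@example.com"},
    {"name": "B", "postcode": "", "email": None},
    {"name": "C", "postcode": "D02", "email": None},
    {"name": "D", "postcode": "D04", "email": "d@example.com"},
]

report = completeness(customers, ["name", "postcode", "email"])
for field, share in report.items():
    print(f"{field}: {share:.0%} complete")
```

Whether 50% complete email addresses is a problem depends entirely on the purpose. It may be fine for a postal campaign and unusable for an email one, which is why the question of purpose comes first.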

Twelve years ago, Daragh wrote his first Castlebridge blog post, quoting a model of learning for people and organisations, and he was asking a lot of these questions:
Listening and leading are really two basic steps of learning. Learning, in a more complete way, is:

• Asking the right questions
• Gathering the expected and unexpected data
• Analysing the data
• Responding to the analysis

A lot has changed in 12 years, but the fundamentals still apply. How was your data created? Under what circumstances and assumptions? Do the technologies and systems you use fit your needs and purposes? To know whether your data is fit for purpose, you need a clear understanding of not just these answers, but also what your purpose is. Whether you’re looking at strategy, digital transformation, automation, AI, or improving the quality of your organisation’s data, you need to start by asking the right questions so you can create quality data and useful models that support knowledge, understanding, and wisdom.