“Data Literacy” is a very current buzzword. It’s been identified as a strategic necessity for data-driven organizations and an essential competency for employees. But, as with many popular terms, it’s not always clear what people mean when they talk about “literacy”. Are we all talking about the same thing? If Gartner describes it as the ability to “speak data” as if it were a second language, what do they actually mean, and is “speaking data” even possible?

When we talk about “Data Literacy” (or, following the lead of reading and writing competencies, “Data Literacies”), what are we talking about?

Using a pen vs. being able to read and write

In last week’s blog, Daragh pointed out that tools-based definitions of data literacy, such as the ones promoted by tool vendors and Gartner, are the equivalent of confusing the ability to use a pen with the ability to read. Some of this comes from conflating “data literacy” with “digital literacy”, which has been used to describe the “ability to use information and communication technologies to find, evaluate, create, and communicate information”. The need for “digital literacy” in a changing technological landscape drove the teaching of Microsoft Word and Excel in university programmes 25 years ago. Today it is a skillset that many primary and secondary school teachers have had to teach on the fly mid-pandemic, just to get their students set up with remote schooling tools. Digital literacy is now a vital foundational ability: working with computers to access information, communicate online, and create and manage digital content. But advanced tool use is not data literacy.

The ability to use relevant tools is an important skillset, but fluent tool use isn’t the same thing as understanding, and being able to create meaning from, the data those tools process. Also, tools change, and there are a lot of them. Pinning literacy on tool use means chasing a constantly moving target. Imagine being confronted with this map of technologies as a way to assess your “literacy”!

Map of data tools in the cloud native landscape, from the Cloud Native Computing Foundation projects

 

https://twitter.com/ReinH/status/1303800051802628096

Hanging data “literacy” on advanced tool use may engender a form of learned helplessness as the tools and technology landscape rapidly changes. It is also likely to produce a sort of “Matthew Effect”, where “the rich get richer and the poor get poorer”: those with the base knowledge that allows them to “read to learn” can navigate a changing landscape, while those who are still “learning to read” are likely to grow more frustrated each time a new set of tool types suddenly resets their skill base.

A broader Data Literacy, a different set of functional skills that integrates concepts with practical knowledge, is necessary to understand how data is managed and how it affects the organization. It also helps determine which tools or technologies are appropriate to support your goals and get the most value out of your organization’s data.

 

Lies, Damn Lies, and Statistics Data Science

We need to consider “literacy” from both the “reading” and the “writing” sides: “reading” as the ability to understand and interpret data as presented, and “writing” as the ability to create accurate, actionable meaning that readers can use and create value from. “Speaking data” is not the same thing as speaking English or Mandarin.

Many of the basics of “Data Literacy” are also described as “numeracy”, “statistical literacy”, or “statistical numeracy”. At least some statistical literacy is vital for understanding and evaluating data for analysis and reporting in organizations, if only to spot what’s wrong in visualizations such as this one, which went viral on social media without attribution a few weeks ago (supposedly created by “An economics professor at a high-ranking public university”):

Scatter plot of dots labeled as states, with a diagonal line drawn through points that show no relation whatsoever, labeled "relation across states between physician salary and covid-19 mortality"

(It’s not only pie charts that are evil.)

Some of this basic statistical numeracy seems to have been lost in the conversation about “Big Data” analysis and data science in organizations. The technical capability to create and ingest massive amounts of data and run advanced pattern-recognition operations on it does not remove the need for a basic statistical understanding of the difference between correlation and causation.

(Perhaps the release of Britney Spears’ “Work Bitch” did not in fact have a measurable impact on US employment rates.)

Twitter post stating "I love data science", showing a chart of the unemployment rate declining from 2013 to 2018, with an arrow pointing to the far left labeled "Britney Spears releases 'Work Bitch', 2013"

https://twitter.com/GrantrGregory/status/1281396252828041216
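The joke works because any two series that both trend over time will look strongly correlated, whether or not one has anything to do with the other. A minimal sketch, using two made-up series (hypothetical numbers, not real streaming or unemployment data): correlating their levels gives r close to -1, while correlating their period-to-period changes, one common check for trend-driven correlation, gives r of essentially 0. The `pearson_r` and `diffs` helpers are written out here for illustration. Neither number proves or disproves causation on its own; it only shows how easily a shared trend manufactures a “relationship”.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def diffs(series: list[float]) -> list[float]:
    """Period-to-period changes: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical series, both driven mainly by the passage of time.
song_streams = [0, 3, 4, 7, 8, 11, 12]   # rises steadily
unemployment = [10, 9, 8, 6, 4, 3, 2]    # falls steadily

r_levels = pearson_r(song_streams, unemployment)
r_changes = pearson_r(diffs(song_streams), diffs(unemployment))

print(round(r_levels, 3))   # -0.985: looks like a strong relationship
print(round(r_changes, 3))  # essentially 0 once the shared trend is removed
```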

Nor does computation over large volumes of data do away with the need to understand and use the scientific method. (I have a whole other rant about how often “data science” forgets the “science”.) This is something Google forgot with Google Flu Trends, which claimed that Google search queries about symptoms could be used to model and predict influenza trends in the US.

The Map is not the Territory

From an organization’s perspective, Data Literacy might be defined as “the measure of the ability of an individual to understand the meaning and purpose of data in context so that they can be enabled to perform immediate job functions and also to extrapolate and correlate knowledge about data in new contexts and unforeseen/novel situations.” When we talk about data literacy, we need to remember that literacy is contextual, and that the meaning generated through data needs to be relevant to the organization’s needs. Some “data literacy” skillsets, like statistical literacy and understanding data visualization, are broadly necessary and useful in many organizations, and these competencies carry over to new contexts and situations. Other broadly applicable skills include understanding the importance of data quality, core data management skills, and the ability to use data to support decisions and action. Other skillsets are much narrower.

Take, for instance, the New Zealander who won the francophone Scrabble championships in 2015. Nigel Richards spent months memorizing the official French Scrabble dictionary, but his ability to communicate in French was limited to “bonjour” and stating the scores. He may have memorized the Scrabble dictionary, but he does not speak French. Richards’ Scrabble-playing and memorization skills are truly exceptional, but he is not literate in French, and his skills apply only in a very limited context.

As always, all models are wrong, but some are useful. Another essential foundation of data literacy is comprehending how data is created and used, its fitness for purpose, and the effects of data quality, good or poor, on the organization. Data is not the same thing as the entity it describes, and the quality of our data will affect the quality and usefulness of the models or maps we use to guide our decisions. Understanding this isn’t a given.

If a consultancy is asking “Do you speak Data?”, perhaps the first question should be, “What do you mean?” Is Data a language? Are we using the right metaphor? What are the strategic data needs of our organization, and how can we be sure to create and use good-quality, well-managed data to inform our actions and support good outcomes? What skills and competencies are relevant to our organization’s strategy and context?