Yuletide Information Quality
Here at Castlebridge Towers, we thought we might share a short case study of a recent consulting engagement and its unforeseen knock-on impacts.
We recently headed north of Tromso in Norway to visit with a leading logistics and distribution service provider. We had been invited by their CEO to assess the potential applications of Big Data technologies in their operation. But when we got there, we found that the scope of the project was very different. Our first meeting was with one of the management team, Buddy. He walked us through some of the business processes and functions that enable the organisation to achieve its incredible logistical feats, particularly during its peak season, which is a narrow window at the end of December and into early January (depending on target geographic markets).
The information gathering phase of any project is important. So we listened intently to Buddy’s description of their core processes to identify what business function might be a driver for what kind of technology investment. Below is a summary of our notes:
- He’s making a list >> This is indicative of a BI reporting requirement. Why is a list being made when a standardised report could automate some, if not all, of the analysis phase?
- He’s checking it twice >> OK. That’s not ideal. That suggests a degree of scrap and rework might be happening. Or it could mean there is a 2-step accuracy validation process. We need to find out more.
- He’s going to find out who’s naughty or nice >> Hmmm.. Behavioural analytics. Query: What is the source for this and is lineage of source to inferred data set traceable? What about data subjects in jurisdictions where there are restrictions on behavioural profiling in Data Privacy laws? Does this organisation have consent or other lawful basis for conducting this processing? Is there an internal ethics committee to avoid them crossing the line into behavioural manipulation or processing of sensitive personal data?
- He sees you when you’re sleeping >> OK… This smells bad. Mass surveillance activity to gather behavioural data? Is this proportionate? What is the source data set or data gathering process? Are wearable devices (fitbits, mobile phones etc.) being used to derive this data?
- He knows when you’re awake >> This is getting worse from a Privacy perspective. Technology investment should be curtailed until appropriate controls are in place over the access to and use of this data. The question of whether the data has been obtained fairly for a specified and lawful purpose also needs to be addressed! I suppose you can’t spell the CEO’s name without N, S, and A!
- He knows if you’ve been bad or good >> Oh heck, behavioural profiling based on social media data? Twitter activity? For a large data set that would be a use case for a Big Data Solution, but only if the original source data was obtained in a legitimate manner.
- So you’d better be good for goodness sake! >> This now poses a difficulty. Automated processing to derive behavioural and other profiles that may have a legal or equivalent impact on an identifiable data subject? That’s an issue under EU Data Protection laws. And frankly it’s just a bit Orwellian.
We filed a short report with the client organisation recommending that, amongst other things, they:
- Consider implementing a Data Governance programme with an emphasis on data privacy principles compliance
- Ensure that there is adequate prior notification of processing of personal data, and potentially sensitive personal data. Reliance on “word of mouth” and “custom and practice”, as has traditionally been the case, will not be sufficient.
- Review the methods by which data is obtained, logged, and reported on to reduce the need to check lists twice. Development of a Business Data Taxonomy or Ontology would be of use here, particularly with regard to classification of data subjects based on the behavioural profiling.
- Consider whether the processing is both necessary and proportionate in instances where a mass surveillance activity is being undertaken to build behavioural profiles.
- Put in place an appropriate appeals mechanism that allows data subjects to object to conclusions drawn about them through behavioural analytics, and ensure that some form of non-technology based interaction makes the final decisions, i.e. Big Data analytics processes should only feed indicators, based on probabilities, to a decision maker, with a right of appeal and correction of data.
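For the technically minded elves, the last recommendation can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the client's actual system: the names `Indicator`, `Decision`, `decide`, `appeal` and the field `p_naughty` are all hypothetical. The point it shows is that the analytics output is a probability with its source lineage attached, a named human makes the final call, and an appeal records the correction and triggers a fresh human decision.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Indicator:
    """Analytics output: a probability plus its data lineage, never a verdict."""
    subject_id: str
    p_naughty: float          # hypothetical model score between 0 and 1
    sources: List[str]        # where the underlying data came from


@dataclass
class Decision:
    """A classification that always records the human who made it."""
    subject_id: str
    label: str
    decided_by: str
    history: List[str] = field(default_factory=list)


def decide(indicator: Indicator, reviewer: str, label: str) -> Decision:
    """A named human reviewer makes the final call; the model only informs it."""
    if not reviewer:
        raise ValueError("a human reviewer is required for the final decision")
    note = f"{reviewer} chose '{label}' (indicator p_naughty={indicator.p_naughty:.2f})"
    return Decision(indicator.subject_id, label, reviewer, [note])


def appeal(decision: Decision, correction: str, reviewer: str, new_label: str) -> Decision:
    """Log the data subject's correction and have a human decide again."""
    if not reviewer:
        raise ValueError("a human reviewer is required to rule on an appeal")
    history = decision.history + [
        f"appeal: {correction}",
        f"{reviewer} chose '{new_label}' on appeal",
    ]
    return Decision(decision.subject_id, new_label, reviewer, history)


# Example run with made-up data
indicator = Indicator("child-001", 0.71, ["social media", "wearable device"])
first = decide(indicator, reviewer="Buddy", label="naughty")
final = appeal(first, "account was shared with a sibling", reviewer="Buddy", new_label="nice")
print(final.label)
print(final.history)
```

Keeping the full history on each decision is a deliberate choice in this sketch: it gives the data subject (and any auditor) a traceable record of who decided what, and on which indicator.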
The unintended consequences? Well, there are some new lyrics to a Christmas Carol now, to help the staff remember the process and culture change. At least that’s what Buddy tells us.
He will be preparing a list, based on data derived from a variety of first party and third party sources,
He’s checking only the low-quality score records in his BI dashboard twice
He’s using probability clustering and machine learning to infer who is naughty or nice (based on a documented algorithm)
Santa Claus is coming to town.
Santa Claus is coming to town.
He has inferred from social media and mobile data usage when you’re sleeping.
He has inferred from wearable device logging and social media posts when you’re awake.
He has applied an algorithmic classification based on behavioural characteristics,
But there’s a transparent process for appeal and correction of incorrect data for goodness sake.
Sigh… we miss the old version, but frankly the data quality, data privacy, and data governance issues that existed in the old processes simply didn’t stand up to scrutiny in a Big Data, post-Snowden world….