If you are a physicist or an astronomer (and in the world of Analytics & Information Management, we have a few of these), you will be very familiar with the imbalance in the universe between visible matter (basically the stars and planets we can see) and 'dark matter'. Dark matter seems to make up much of the mass of the universe, but it's difficult to work out how to see or measure it.
It's much the same with data. A lot of the data we have is visible: most of what you see through your SAP or Salesforce interface, or your mobile banking app. Customer names and IDs, transaction values, product codes and descriptions. By analysing this data we aim to gain insights that lead to business actions.
But, just like dark matter, there's a lot of hidden data out there that is not so easily analysed, and the volume of dark data massively outstrips the volume of visible data. This dark data is found in logs, metadata, text fields and documents, video, audio and pictures. While visible data can readily be analysed in databases, dark data needs complex extraction before it can be analysed at all.
There are different types of dark data. Take a message such as a tweet. A tweet is 'dark' because its text has to be extracted and processed before a computer can analyse what is written in it. The metadata around the tweet is 'dark' too: the time of day it was sent, the @user, the #hashtags, the device, the location. Analysing the text of the tweet gives you insight into what is being said, who said it, and how happy or angry the sender is. The tweet may also contain images or audio, which image or speech recognition tools can analyse to extract content such as descriptions or key terms.

Process metadata is 'dark' as well. In an SAP system, for example, the metadata recorded around each transaction (when data was created or changed, e.g. from 'in progress' to 'completed' or 'sent back') can be analysed, using what we call 'process bionics', to understand exactly how many transactions do not follow the designed process, and why.
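The article doesn't describe how process bionics works internally, so the following is only an illustration of the underlying idea: a minimal Python sketch of basic conformance checking, assuming status-change records can be exported as (case ID, timestamp, status) rows. The designed path, the status names and the functions `case_paths` and `conformance_summary` are all hypothetical, not part of SAP or of the process bionics method itself.

```python
from collections import Counter, defaultdict

# Hypothetical designed process: the status sequence every transaction is
# supposed to follow. Real SAP status names and flows will differ.
DESIGNED_PATH = ("created", "in progress", "completed")


def case_paths(events):
    """Group (case_id, timestamp, status) change records into one
    time-ordered status path per case."""
    by_case = defaultdict(list)
    for case_id, timestamp, status in events:
        by_case[case_id].append((timestamp, status))
    # Records rarely arrive in order, so sort each case's changes by timestamp.
    return {
        case_id: tuple(status for _, status in sorted(changes))
        for case_id, changes in by_case.items()
    }


def conformance_summary(events, designed=DESIGNED_PATH):
    """Return the share of cases that deviate from the designed path, plus a
    count of each deviating path so the most common variants stand out."""
    paths = case_paths(events)
    deviating = Counter(path for path in paths.values() if path != designed)
    share = sum(deviating.values()) / len(paths) if paths else 0.0
    return share, deviating


# Illustrative records only (made-up case IDs and integer timestamps).
events = [
    ("PO-1", 1, "created"), ("PO-1", 2, "in progress"), ("PO-1", 3, "completed"),
    ("PO-2", 1, "created"), ("PO-2", 2, "in progress"), ("PO-2", 3, "sent back"),
    ("PO-2", 4, "in progress"), ("PO-2", 5, "completed"),
    ("PO-3", 2, "in progress"), ("PO-3", 1, "created"), ("PO-3", 3, "completed"),
]

share, variants = conformance_summary(events)
print(f"{share:.0%} of cases deviate")  # prints: 33% of cases deviate
for path, count in variants.most_common():
    print(count, " -> ".join(path))
    # prints: 1 created -> in progress -> sent back -> in progress -> completed
```

Counting each deviating path, rather than simply flagging cases as non-conforming, is what turns the metadata into a starting point for the "why": a variant that recurs thousands of times, such as repeated 'sent back' loops, points to a specific step worth investigating.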
I take a photo with my phone and post it on Facebook, sharing it with my friends. I have shared two types of 'dark data': the post and photo themselves, and the metadata surrounding them (where and when the photo was taken and uploaded, and by whom). When it is 'liked', Facebook learns more about who I am connected to. It will aggregate this post with millions of other posts and photos to derive insights that lead, in Facebook's case, to targeted advertising or to sales of insights to other companies.
A sales transaction records a single point in time, and only yields real insight when it is collected, aggregated and analysed together with other transaction data. The same is true of dark data: it's not the individual message, transaction or document that provides a specific insight, but what it means in the context of wider patterns.
Last year we did just this for the digital platform of a major bank, helping the bank identify and drive actions that doubled customer engagement with its commercial digital app. The logs behind every website and mobile site recorded traces of each user's behaviour, so we could see which parts of the digital experience worked and which didn't. The client was then able to redesign the app and provide recommendations based on a combination of visible data (transactions and balances) and dark data (logs, pages viewed, advice seen), leading to a better customer experience and banking service.