The high number of systems, data transformations and extensive data and system legacy make it challenging for financial institutions to provide regulators with the required level of transparency on their data integration- and aggregation processes used for reporting. Data lineage provides insights into which business and technical transformation logic has been applied to your data and by whom.
In our previous article we elaborated on the definition and usage of metadata and its increasing importance in the fast changing regulatory environment.
In general, metadata is the prerequisite for proper data lineage which by the international association of data management professionals (DAMA, 2017) is defined as “the pathway along which data moves from its point of origin to its point of usage”.1 It also describes what happens to data as it flows through the organization.
Figure 1 demonstrates a typical data-driven reporting environment where source data is transformed in such a way it is accepted by the data model, which in turn is deployed by translation of reporting requirements into a semantic layer. Three types of data lineage are considered when aiming for a data-driven reporting process:
Data lineage has many applications which are important in order to release the full potential of data-driven reporting process:
Transparency and visualization of your data flows enable reporting specialists, business analysts and data specialists to work together and understand each other. It enables an efficient “search for data” in your organization, resulting in cost reductions and a shorter lead time of error solving and change requests.
When implemented and used correctly, data lineage may enhance the control on your data transformation- and reporting processes. Which in turn will result in improved quality of your data and reports and a eases the discussion with the supervisory bodies.
A prerequisite for data lineage is a data governance framework which includes a well-defined metadata strategy and ownership of metadata within your organization. Thereby the focus should be on metadata being available, accurate and complete. A next step is identification and connection of all your data and its metadata from source to transformation layer to reporting solution.
Metadata identification and management processes are currently known as being labour intensive, inefficient and expensive. Robotics and Artificial Intelligence (AI) can play a major role in metadata-tagging and generation of metadata. While manual metadata-tagging can be an expensive process. It can cost (on average) € 2 - € 5 per metadata item.2 On the other hand, AI can reduce this amount 10x while at the same time improves the overall quality of your metadata and thus data lineage.
Therefore, automation of real-time identification and connection of metadata as well as the visualization of data lineage will enhance many data intensive reporting processes in your organization, while also significantly reduces the costs associated to it.
This in turn will lay the cornerstone for your organization to become future-proof and a truly data-driven organization eventually. In our next article we will elaborate on the application of these automation techniques in the world of metadata management, data lineage and how this can help your organization in improving the reporting processes.
1 DAMA International, Data Management Body Of Knowledge, page 28, 2nd edition, 2017
2 EDIA (2018), Why is automated metadata-tagging better than manual tagging?
Would you like to know more about Metadata Management or Data Lineage? Please contact Yuri Jolly via the details below.