To the point
Current data repository solutions are reaching their limits as the global data volume continues to increase exponentially, and this is creating problems for companies. As data warehouses and data lakes become more and more complex, data-driven decision making, and with it companies' business, is affected.
In the era of big data, the volume of data produced globally increases exponentially each year. According to International Data Corporation (IDC) forecasts, the volume of data worldwide is expected to exceed 175 zettabytes by 2025. To put this figure into perspective, if all this data were stored on DVDs, the resulting stack would be long enough to circle the Earth 222 times [1]. Companies produce data in vast quantity and scope, and when managed effectively it can generate powerful business insights and bring a competitive advantage. The key to maximizing the data's potential is easy access to clear and coherent data. Having the right data architecture and data governance in place is an essential starting point for ensuring consistent data quality, and it opens the way to transforming data into a highly versatile product.
Data mesh continues to gain popularity with companies because its architectural design helps to overcome the pitfalls of current data repository solutions. By taking advantage of data mesh, you could make a bigger impact on your business. So, what is data mesh, what are its benefits, and how does it differ from current data architecture solutions?
Companies' current data architecture
The best-known platforms for centralizing data from multiple sources are the data repository solutions called data warehouses and data lakes. Data warehouses (DWH) are a reliable data management system because they aggregate large volumes of data from multiple sources into a single repository. Data is structured, historically unified and ready to use [2]. In contrast, a data lake contains raw data that can be structured and unstructured. It provides a massive data store, but it takes time to retrieve data because data lakes have a flat architecture [3].
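The contrast between the two repository types can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not how any real warehouse or lake product works: the schema, the in-memory lists and the function names (`load_into_warehouse`, `drop_into_lake`, `read_amounts_from_lake`) are all hypothetical. It shows a warehouse-style store that checks structure when data is written, and a lake-style store that accepts anything and leaves the interpretation work to the reader.

```python
import json

# Hypothetical warehouse-style table: a fixed schema is enforced on write.
WAREHOUSE_SCHEMA = {"customer_id": int, "amount": float}
warehouse_rows: list[dict] = []


def load_into_warehouse(record: dict) -> None:
    """Accept a record only if it matches the schema exactly."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(record)}")
    for column, expected_type in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[column], expected_type):
            raise TypeError(f"{column} must be {expected_type.__name__}")
    warehouse_rows.append(record)


# Hypothetical lake-style store: raw objects are kept as-is in a flat list.
lake_objects: list[str] = []


def drop_into_lake(raw: str) -> None:
    """Store any raw object without inspecting it."""
    lake_objects.append(raw)


def read_amounts_from_lake() -> list[float]:
    """Impose structure at read time, skipping objects this reader cannot parse."""
    amounts = []
    for raw in lake_objects:
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unstructured object, ignored by this particular reader
        if isinstance(payload, dict) and "amount" in payload:
            amounts.append(float(payload["amount"]))
    return amounts
```

In this sketch the warehouse rows are ready to use as soon as they are loaded, whereas every consumer of the lake has to scan and interpret the raw objects, which is one way to picture why retrieval from a flat store takes longer.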
These repository solutions were adopted by many companies at the beginning of the big data era to develop their business intelligence and support decision making. While these platforms play an essential role in storing and analyzing data, they are reaching their limits as the number of data sources and the data volume increase. It has become unrealistic to integrate everything into a single data platform, hence the idea of unifying data sources under a common semantic umbrella.
The limitations of data warehouses and data lakes
As the volume of available data expands, data repository solutions are becoming increasingly complex. Within the corporate context, creating data products in compliance with organizational and regulatory standards becomes a time-consuming and arduous activity. A more complex structure also decreases scalability and agility, which is problematic in a context where decisions need to be made quickly [4]. Furthermore, new types of data sources are emerging every day and need to be captured and understood in order to leverage their potential.
When data is leveraged through these centralized platforms, the duty to ingest, transform and deliver data to the different business teams falls on the central IT organization. When business domain data owners circumvent IT, or miscommunication leads to the use of "shadow IT," more disparate data sources are created that do not comply with internal processes [4], [5]. New architectures such as data mesh and data fabric, which are based on distributed data governance, were developed to overcome some of these shortcomings.
What is data mesh and how does it help to overcome challenges?
Data mesh has gained popularity since being introduced in 2019 [5], due to its new way of managing data through a democratized approach backed by a centralized, self-service infrastructure. Its main objective is to build business data products without specifying the technology involved [6]. This is facilitated by three layers:
Figure 1: Layers of data mesh architecture and their benefits
Data mesh architecture is based on four principles, which are designed to overcome the disadvantages of other types of data repositories [7], [8], [9]:
By implementing these four principles and treating data as a group of repositories containing data products, data mesh offers concrete solutions through restructuring, thereby alleviating companies' most pressing data architecture problems.
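To make the idea of a data product more tangible, here is a minimal Python sketch. It assumes the four principles as named by Dehghani [7] (domain ownership, data as a product, a self-serve data platform and federated computational governance) and maps each to a small piece of code. Everything in it is hypothetical and for illustration only: the `DataProduct` and `Catalog` classes, their fields, the two governance rules and the example URI are assumptions, not part of any data mesh standard or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor that a domain team publishes for one data product."""
    name: str
    domain: str              # owning business domain (domain ownership)
    owner: str               # accountable contact within that domain
    output_port: str         # where consumers read the data; any technology
    schema: dict[str, str]   # column name -> type, the published contract
    tags: frozenset[str] = frozenset()


class Catalog:
    """Hypothetical self-service catalog shared by all domains."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Federated governance: the same minimal rules apply to every domain.
        if not product.owner:
            raise ValueError(f"{product.name}: an owner is required")
        if not product.schema:
            raise ValueError(f"{product.name}: a published schema is required")
        key = f"{product.domain}.{product.name}"
        if key in self._products:
            raise ValueError(f"{key} is already registered")
        self._products[key] = product

    def find(self, tag: str) -> list[str]:
        """Let any team discover products by tag, without going through central IT."""
        return sorted(k for k, p in self._products.items() if tag in p.tags)


catalog = Catalog()
catalog.register(DataProduct(
    name="orders",
    domain="sales",
    owner="sales-data@example.com",
    output_port="s3://sales/orders/",  # illustrative location only
    schema={"order_id": "string", "amount": "decimal"},
    tags=frozenset({"revenue"}),
))
print(catalog.find("revenue"))
```

The design choice worth noting is that the catalog validates a contract (owner, schema) rather than the storage technology: the `output_port` is just a string, so each domain remains free to choose how it serves its data.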
Despite these benefits, one should be aware of potential barriers to overcome before switching to a data mesh. To improve its architecture and be more easily adopted by companies, the repository solution should address the following difficulties [4]:
Data valorization and AI integration through data mesh implementation
Data mesh designs valorize data by enhancing its versatility across businesses. Because data mesh provides better interoperability, organizations should consider using vendor-agnostic products to ensure that they are not locked in to a single provider. As the flexibility to manage data increases, businesses can extend their capabilities with new tools such as AI solutions. This will unlock many different possibilities that help organizations leverage their data and create a higher business impact through informed decision making.
[1] IDC, "The Digitization of the World - From Edge to Core," IDC, 11 2018. [Online]. Available: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf. [Accessed 10 06 2022].
[2] Qlik, "Data Warehouse," [Online]. Available: https://www.qlik.com/us/data-warehouse. [Accessed 18 05 2022].
[3] Qlik, "Data Lake," [Online]. Available: https://www.qlik.com/us/data-lake. [Accessed 18 05 2022].
[4] Deloitte, "From data mess to a data mesh," Deloitte, [Online]. Available: https://www2.deloitte.com/nl/nl/pages/strategy-analytics-and-ma/articles/from-data-mess-to-a-data-mesh.html. [Accessed 16 05 2022].
[5] Z. Dehghani, "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh," 20 05 2019. [Online]. Available: https://martinfowler.com/articles/data-monolith-to-mesh.html. [Accessed 16 05 2022].
[6] Gartner, "Quick Answer: Are Data Fabric and Data Mesh the Same or Different?," Gartner, 1 11 2021. [Online]. Available: https://www.gartner.com/doc/reprints?id=1-292DG4LD&ct=220209&st=sb&utm_campaign=TY%20Mailers&utm_medium=email&_hsmi=182672238&_hsenc=p2ANqtz--00qJgAzU26v3DoBBLqASGm_vJVdhGQV5gnAirC2zfIEj_o0wChJj9zj2wGnWiCV18YxKDIKMGFZDzhn6xkoGVW--VMw&utm_content=182672238. [Accessed 9 06 2022].
[7] Z. Dehghani, "Data Mesh Principles and Logical Architecture," martinFowler.com, 03 12 2020. [Online]. Available: https://martinfowler.com/articles/data-mesh-principles.html#DataAsAProduct. [Accessed 20 06 2022].
[8] J. Christ, L. Visengeriyeva and S. Harrer, "Data Mesh Architecture," [Online]. Available: https://www.datamesh-architecture.com/. [Accessed 17 05 2022].
[9] Starburst Data, "What is Data Mesh?," [Online]. Available: https://www.starburst.io/learn/data-fundamentals/what-is-data-mesh/. [Accessed 18 05 2022].