In the previous blogs in this series about the Layered Architecture for Data Platforms, we introduced the layered architecture, dove deeper into the Data Sources and Ingestion layers, discussed the Processing Layer, and investigated the various technologies that can be used to store data. In this blog, we look into the Analytics Layer, where the data is analyzed and prepared so that it can be visualized and reported on in the Visualization Layer (see figure 1).
We will discuss the different categories of analytics, the methods that can be used, the process that analytics typically follows, the technologies that can be brought into play, and the trends in analytics.
Figure 1- Layers of a Data Platform
When data is ingested, transformed and/or stored, it can then be further analyzed to find trends and answers (to questions) in the data. The purpose of the Analytics Layer is therefore to develop and run analytical models on the data, and for this to succeed, it is important that the source data is cleansed and well prepared. If the data is not of excellent quality, you cannot trust the results of the analytics (garbage in, garbage out).
Analytics can be divided into two categories: Business Intelligence and Advanced Analytics. Business Intelligence includes reports and/or dashboards that contain the results of KPIs (Key Performance Indicators) that are related to the performance of the business; Advanced Analytics often means that more advanced algorithms are applied to get the results.
In addition to the two categories described above, a distinction can be made between analytics purposes. This distinction says something about the type of analytics that is needed, the kind of data, how the data needs to be stored, and whether analytical models are needed. Examples of such purposes, which we discuss below, are traditional, predictive, cognitive, search-based, and ad-hoc analytics.
Data analytics can be done using a variety of different methods. To give you an idea of how extensive this list truly is, here are a few of them: reporting, dashboarding, self-service BI (Business Intelligence), ad-hoc queries, automatic monitoring and alerting, scorecards, online analytical processing (OLAP), statistical or quantitative analysis, data mining, predictive modeling, machine learning, image recognition, big data analytics, and natural language processing.
Traditional analytics is mostly done by providing the data that is stored in a relational database to a reporting or dashboarding tool. In some cases, an OLAP layer is used between the database and the reporting/dashboarding tool to improve performance by storing pre-calculated, aggregated results. Often this OLAP layer uses in-memory technology to improve performance even further. OLAP layers can be particularly useful when it is known which information needs to be shown (and can be pre-calculated), for instance when using pre-developed dashboards. When the information requirements are less well known, it is difficult to predict which data needs to be pre-calculated and stored in the OLAP layer.
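The idea of pre-calculating aggregates can be illustrated with a minimal sketch (the sales data and dimension names are purely illustrative, not from any real system): the aggregates are computed once, and dashboard queries are then served from the small in-memory "cube" instead of scanning the raw rows.

```python
from collections import defaultdict

# Raw fact rows, as they might sit in a relational database
# (illustrative data only).
sales = [
    {"region": "EMEA", "month": "2024-01", "amount": 120.0},
    {"region": "EMEA", "month": "2024-01", "amount": 80.0},
    {"region": "AMER", "month": "2024-01", "amount": 200.0},
    {"region": "EMEA", "month": "2024-02", "amount": 150.0},
]

# Pre-calculate the aggregates once, as an OLAP layer would,
# so dashboard queries read a small in-memory structure.
cube = defaultdict(float)
for row in sales:
    cube[(row["region"], row["month"])] += row["amount"]

def total(region, month):
    """Dashboard lookup served from the pre-aggregated cube."""
    return cube[(region, month)]

print(total("EMEA", "2024-01"))  # 200.0
```

This also shows the limitation mentioned above: only the combinations that were pre-calculated (here, region by month) can be answered quickly; any other question falls back to the raw data.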
For diagnostic, predictive, prescriptive or automated analytics, more advanced analytics methods are often used, such as predictive monitoring, machine learning, big data analytics and/or natural language processing. A practical example of predictive analytics is the Deloitte Impact Foundation initiative ‘Cognitive Deforestation Prevention’ that predicts where illegal deforestation will happen.
Cognitive analytics mostly uses natural language processing to understand speech or natural text, or applies image recognition to identify people or to detect emotions. A practical example of using image recognition is our AI4Animals solution that improves animal monitoring in slaughterhouses.
Search-based analytics also mostly uses natural language processing, but in combination with big data analytics. Natural language processing is used to understand the question from the user and big data analytics is then used to find the relevant information in a large number of documents. A practical example of this kind of analytics is represented by the chatbots that are often found on websites.
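At its simplest, the retrieval side of search-based analytics ranks documents by how well they match the terms of the user's question. The following is a deliberately naive sketch (the documents and matching rule are hypothetical; a real system would use far more sophisticated language processing):

```python
# Naive search-based analytics sketch: reduce the question to terms,
# then rank documents by term overlap.
def tokenize(text):
    return set(text.lower().split())

documents = {
    "returns": "you can return a product within 30 days of purchase",
    "shipping": "standard shipping takes 3 to 5 business days",
}

def best_match(question):
    q = tokenize(question)
    # Pick the document sharing the most terms with the question.
    return max(documents, key=lambda name: len(q & tokenize(documents[name])))

print(best_match("how many days does shipping take"))  # shipping
```

In a production chatbot, the tokenization step would be replaced by natural language processing (handling word forms, synonyms and intent), and the overlap count by a proper relevance score over a large document collection.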
Ad-hoc analytics mostly uses ad-hoc queries or self-service BI to find answers for one-time questions. It is important in this case that the analyst has access to all the relevant data.
Best Practices
Use a centralized model repository for your analytical models to keep track of model versions, status and training results.
Make sure that your analytical model is transparent, so that everybody can understand why the model comes to a certain outcome.
Carefully consider which data and parameters should be used by taking the privacy, sensitivity and politics into account.
Keep the development, training and production environment separate.
Authorization is important; who should have access to the models and data sources?
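The first best practice above, a centralized model repository, can be sketched as a minimal registry that records each model version together with its status and training results (all names, fields and values here are hypothetical):

```python
from datetime import date

# Hypothetical minimal model registry: one central place that tracks
# model versions, status and training results.
registry = {}

def register_model(name, version, status, metrics):
    # Keyed by (name, version) so every trained version stays traceable.
    registry[(name, version)] = {
        "status": status,      # e.g. "development", "training", "production"
        "metrics": metrics,    # training/validation results
        "registered": date.today().isoformat(),
    }

register_model("churn-predictor", "1.2.0", "production", {"auc": 0.87})
print(registry[("churn-predictor", "1.2.0")]["status"])  # production
```

Real platforms offer this as a managed service (a model registry), but the principle is the same: every version is recorded centrally, so you can always answer which model produced which result.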
Although the details depend on which methods are used, most types of analytics follow a certain process. An example of such a process is shown in figure 2.
Figure 2 - Analytics process
Firstly, it is crucial to define the business need or use case. What is the purpose of the analytics? Once that is clear, you need to find and get the required data. Where is the data stored? Is it already stored in the data platform? Is the data allowed to be used for this purpose? If the data is available, the analyst needs to understand what the data means so that it can then be prepared for the analytics. The analyst then needs to choose an algorithm with which the model will be built. The analytical model then needs to be built and subsequently validated. After the data has been prepared and analyzed, the results can be visualized and communicated.
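The steps above can be sketched as a minimal end-to-end flow. The data and the "model" (a closed-form least-squares line fit) are deliberately simple and purely illustrative:

```python
# 1. Business need: predict monthly sales from advertising spend
#    (a hypothetical use case).
# 2-3. Get and understand the data: here, hard-coded, already-cleansed
#      pairs of (ad_spend, sales).
data = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]

# 4. Prepare the data: split into features and targets.
xs = [x for x, _ in data]
ys = [y for _, y in data]

# 5-6. Choose an algorithm and build the model: ordinary least squares
#      for a single feature, computed in closed form.
n = len(data)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# 7. Validate the model: mean absolute error, here on the training data
#    (in practice you would validate on a held-out set).
mae = sum(abs((slope * x + intercept) - y) for x, y in data) / n
print(round(slope, 2), round(mae, 2))
```

In practice each step is far richer (data contracts, feature engineering, cross-validation, and so on), but the shape of the process is the same as in figure 2.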
The steps in this process that are grey in figure 2 are actually conducted in other layers of the layered architecture. “Get the data” is done in previous layers (Data Sources, Ingestion, Processing and Storage) and “Communicate the results” is part of the Visualization Layer, which we will cover in the next blog.
We have discussed that analytics can be done with several types of methods and that it should follow a certain process. Analytics can be done with a number of different technologies from many vendors, in the cloud as well as on-premises. There are many considerations to take into account when deciding on the best technology to use; the choice ultimately depends on your use case, what kind of analytics is needed, and which methods should be used.
We are seeing the following trends in data analytics:
Deloitte can help you choose which kind of analytics is best suited, which methods to apply, and which technologies to use to make sure they fit into the data platform. We can also help in developing the analytical models and implementing them. Our next blog in the series about the Layered Architecture will cover the Visualization Layer; read it if you want to know more about how data can be visualized.
Deloitte's Data Modernization & Analytics team helps clients modernize their data infrastructure to accelerate analytics delivery, such as self-service BI and AI-powered solutions. This is done by combining best practices and proven solutions with innovative, next-generation technologies, such as cloud-enabled platforms and big data architectures.