Layered Architecture for Data Platforms: the place where the data is analyzed

Data can be analyzed in many different ways. In this part 5 of the series on the Layered Architecture for Data Platforms, we describe the different technologies and techniques used to analyze data.

In the previous blogs about the Layered Architecture for Data Platforms we introduced the layered architecture for data platforms, dove deeper into the Data Sources and Ingestion Layer, discussed the Processing Layer, and investigated the various technologies that can be used to store data. In this blog, we look into the Analytics Layer, where the data is analyzed and prepared so that it can be visualized and reported in the Visualization Layer (see Figure 1).

We will discuss the different categories of analytics, the methods that can be used, how analytics follows a certain process, the technologies that can be brought into play and the trends in analytics.

 

Figure 1- Layers of a Data Platform

When data is ingested, transformed and/or stored, it can then be further analyzed to find trends and answers to questions in the data. The purpose of the Analytics Layer is therefore to develop and run analytical models on the data, and for this to succeed, it is important that the source data is cleansed and well prepared. If the data is not of excellent quality, you cannot trust the results of the analytics (garbage in, garbage out).

Analytics can be divided into two categories: Business Intelligence and Advanced Analytics. Business Intelligence includes reports and/or dashboards that contain the results of KPIs (Key Performance Indicators) related to the performance of the business; Advanced Analytics often means that more advanced algorithms are applied to obtain the results.

In addition to the two categories described above, a distinction can be made between analytics purposes. This distinction says something about the type of analytics that is needed, the kind of data, how the data needs to be stored, and whether analytical models are needed. A few examples of purposes for which analytics can be used are:

  • Traditional (descriptive) analytics: Using analytics on current and/or historic data to show the current or past performance.
  • Diagnostic analytics: Using analytics on current and/or historic data to give information about why certain events happened.
  • Predictive analytics: Using analytics in combination with current and historic data to predict future outcomes.
  • Prescriptive analytics: Using analytics to not only predict what will happen and why it will happen, but also to suggest options to mitigate the risks or to benefit from future opportunities.
  • Automated analytics: Using real-time data in combination with analytics to automate decision making for operational processes.
  • Cognitive analytics: Using human-like intelligence to give structure to unstructured data, such as natural language.
  • Search-based analytics: Using natural language processing to find and extract meaningful information from the data.
  • Ad-hoc analytics: Using analytics to find answers to specific (often one-time) questions using the available data.

Methods

Data analytics can be done using a variety of different methods. To give you an idea of how extensive this list truly is, here are a few of them: reporting, dashboarding, self-service BI (Business Intelligence), ad-hoc queries, automatic monitoring and alerting, scorecards, online analytical processing (OLAP), statistical or quantitative analysis, data mining, predictive modeling, machine learning, image recognition, big data analytics, and natural language processing.

Traditional analytics is mostly done by providing the data that is stored in a relational database to a reporting or dashboarding tool. In some cases, an OLAP layer is used between the database and the reporting/dashboarding tool to improve performance by storing pre-calculated, aggregated results. Often this OLAP layer uses in-memory technology to improve performance even further. OLAP layers are particularly useful when it is known which information needs to be shown (and can therefore be pre-calculated), for instance in pre-developed dashboards. When the information requirements are less well known, it is difficult to predict which data needs to be pre-calculated and stored in the OLAP layer.
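As an illustration, here is a minimal sketch (in Python, with made-up sales data) of the idea behind such an OLAP layer: the aggregates a dashboard is known to need are calculated once up front, so the reporting tool no longer has to scan the full fact table.

```python
import pandas as pd

# Hypothetical sales fact table as it might come out of the relational database.
sales = pd.DataFrame({
    "region":  ["EMEA", "EMEA", "APAC", "APAC", "AMER"],
    "product": ["A", "B", "A", "B", "A"],
    "month":   ["2024-01", "2024-01", "2024-01", "2024-02", "2024-02"],
    "revenue": [120.0, 80.0, 95.0, 60.0, 150.0],
})

# Pre-calculate the aggregates the dashboard needs (revenue per region per month),
# so the reporting tool reads a small cube instead of the full fact table.
cube = (
    sales.groupby(["region", "month"], as_index=False)["revenue"]
         .sum()
         .rename(columns={"revenue": "total_revenue"})
)
print(cube)
```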

For diagnostic, predictive, prescriptive or automated analytics, more advanced methods are often used, such as predictive modeling, machine learning, big data analytics and/or natural language processing. A practical example of predictive analytics is the Deloitte Impact Foundation initiative ‘Cognitive Deforestation Prevention’ that predicts where illegal deforestation will happen.
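To make the idea of predictive analytics concrete, here is a minimal, generic sketch using scikit-learn. The data and features are synthetic and purely for illustration; they are not related to the deforestation example.

```python
# Minimal predictive-analytics sketch: train a classifier on historic
# observations and evaluate how well it predicts an outcome on unseen data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))                  # synthetic features (e.g. sensor readings)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic target: "event happens" yes/no

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("hold-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```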

Cognitive analytics mostly uses natural language processing to understand speech or natural text, or applies image recognition to identify people or to detect emotions. A practical example of using image recognition is our AI4Animals solution that improves animal monitoring in slaughterhouses.

Search-based analytics also mostly uses natural language processing, but in combination with big data analytics. Natural language processing is used to understand the user's question, and big data analytics is then used to find the relevant information in a large collection of documents. A practical example of this kind of analytics is the chatbot functionality often found on websites.
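A very simplified sketch of the retrieval idea is shown below: the user's question and the documents are turned into vectors, and the most similar document is returned. The documents and question are invented for illustration; a real chatbot uses far richer natural language processing.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Our return policy allows returns within 30 days of purchase.",
    "Shipping usually takes three to five business days.",
    "Contact support by phone or chat for billing questions.",
]
question = "How many days does shipping take?"

# Vectorize the documents and the question, then pick the closest document.
vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(documents)
question_vector = vectorizer.transform([question])

scores = cosine_similarity(question_vector, doc_vectors).ravel()
print("best match:", documents[scores.argmax()])
```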

Ad-hoc analytics mostly uses ad-hoc queries or self-service BI to find answers for one-time questions. It is important in this case that the analyst has access to all the relevant data.
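A small sketch of such a one-time question is shown below. The data and column names are invented; in practice the analyst would pull them from the data platform, for instance via a warehouse query or a CSV export.

```python
import pandas as pd

# Ad-hoc question: "which customers placed more than two orders last quarter?"
# A tiny inline sample stands in for the real exported data.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "order_date": pd.to_datetime(
        ["2024-10-03", "2024-11-12", "2024-12-01", "2024-10-20", "2024-11-05", "2024-12-15"]
    ),
})

last_quarter = orders[orders["order_date"] >= "2024-10-01"]
answer = last_quarter.groupby("customer_id").size().loc[lambda counts: counts > 2]
print(answer)  # customer 1 placed 3 orders last quarter
```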

Best Practices

Use a centralized model repository for your analytical models to keep track of model versions, status and training results.
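One possible way to do this is with an open-source tool such as MLflow. The sketch below assumes an MLflow tracking server and model registry are already configured; the registered model name is purely illustrative.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Assumes a tracking server / registry backend is configured
# (e.g. `mlflow server` with a database-backed store).
with mlflow.start_run():
    mlflow.log_param("max_iter", 1000)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Registering the model keeps versions, status and training results in one place.
    mlflow.sklearn.log_model(model, "model", registered_model_name="churn-classifier")
```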

Make sure that your analytical model is transparent, so that everybody can understand why the model comes to a certain outcome.
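As a small illustration of one transparency technique, the sketch below uses permutation importance from scikit-learn to show how strongly each input feature drives the model's predictions (a toy dataset is used here).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)

# Print the five features that influence the predictions the most.
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda p: p[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```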

Carefully consider which data and parameters should be used by taking the privacy, sensitivity and politics into account.

Keep the development, training and production environment separate.

Authorization is important; who should have access to the models and data sources?

Analytics Process

Although the details depend on which methods are used, most types of analytics follow a similar process. An example of such a process is shown in Figure 2.

 

Figure 2 - Analytics process

Firstly, it is crucial to define the business need or use case: what is the purpose of the analytics? Once that is clear, you need to find and obtain the required data. Where is the data stored? Is it already stored in the data platform? Is the data allowed to be used for this purpose? If the data is available, the analyst needs to understand what the data means so that it can be prepared for the analytics. The analyst then needs to choose an algorithm with which the model will be built. The analytical model is then built and subsequently validated. After the data has been prepared and analyzed, the results can be visualized and communicated.
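As a small illustration of the "validate the model" step, the sketch below uses cross-validation to estimate how well a chosen algorithm performs before it is taken further; the dataset is a toy example.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation gives a more robust performance estimate than a single split.
scores = cross_val_score(model, X, y, cv=5)
print("accuracy per fold:", scores.round(3))
print("mean accuracy:", round(scores.mean(), 3))
```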

The steps in this process that are shown in grey in Figure 2 are actually conducted in other layers of the layered architecture. “Get the data” is done in previous layers (Data Sources, Ingestion, Processing and Storage) and “Communicate the results” is part of the Visualization Layer, which we will cover in the next blog.

Choice of technologies

We have discussed that analytics can be done by several types of methods and that it should follow a certain process. Analytics can be done with a number of different technologies from many vendors as well as in the cloud or on-premise. The choice of technology is dependent on:

  • Technologies of the other layers: the technologies in the analytics layer should match with the technologies in the other layers of the data platform.
  • Batch or Real-time: should the analytics be done in (near) real-time or can it run in batches? Or do you need both?
  • Sandbox vs Productionized environment: is it necessary to use a sandbox-like playground for one-offs or a productionized environment for repeatable analytics?
  • Cloud or On-Premise: do you need to use a cloud environment for the analytics? Especially if your analytics workload is very unpredictable, the scalability of the cloud is a great benefit.
  • Software-as-a-Service (SaaS) solution: should you use a SaaS solution for the analytics? Cloud providers offer Machine Learning-as-a-Service and pre-trained AI models. SaaS has the benefit that you do not have to worry about the infrastructure and that minimal setup time is required, which means that you can start working on the analytical use case almost immediately.
  • Schema-on-write vs. schema-on-read: will you use data that is already modeled in a database (schema-on-write), or should the modeling be part of the analytics (schema-on-read), for example when using data from a data lake? A small sketch of schema-on-read follows this list.
  • Containerization: should the analytics solution be packaged, deployed and run in containers?
  • Consumption: how should the results be consumed by the (end-)user?
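To illustrate the schema-on-read option mentioned above, the sketch below reads raw, semi-structured events as they might land in a data lake and applies structure and types only at analysis time. The field names and the inline sample are assumptions for illustration.

```python
import io
import pandas as pd

# In a real platform this would be a path into the data lake,
# e.g. "datalake/raw/events/2024-06-01.json"; an inline sample is used here.
raw_json_lines = io.StringIO(
    '{"ts": "2024-06-01T10:00:00", "sensor": "A1", "val": "21.5"}\n'
    '{"ts": "2024-06-01T10:05:00", "sensor": "A1", "val": "22.1"}\n'
)
raw_events = pd.read_json(raw_json_lines, lines=True)

# Schema-on-read: structure and types are applied while analyzing,
# not when the data was written (as a database schema would enforce).
events = (
    raw_events.rename(columns={"ts": "event_time", "val": "temperature"})
              .astype({"temperature": "float64"})
              .assign(event_time=lambda df: pd.to_datetime(df["event_time"]))
)
print(events.dtypes)
```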

There are a lot of considerations to take into account when deciding on the best technology to use; it really depends on your use case, what kind of analytics is needed and which methods should be used.

Trends

We are seeing the following trends in data analytics:

  • In the last couple of years, a lot of attention has been drawn to the methods of building good analytical models. Now, the attention is focused more on how to productionize the analytical models.
  • Agile working methods are becoming more popular for developing analytical models. An example of this is the MLOps method that can be used to develop machine learning models. Our colleagues have written an article about MLOps for the banking industry.
  • Developing analytical models is becoming easier because of the possibilities offered by cloud providers, where for some solutions pre-trained analytical models are already available. These pre-trained models have the benefit that you do not have to train your own model, which also means that you do not need the data that would otherwise be required for training. Getting enough data of excellent quality to train a model is often difficult, so this is a great advantage. However, keep in mind that pre-trained models are only available for a limited set of common use cases (a small sketch of using a pre-trained model follows this list).
  • Often, analytics use cases work with data from OLTP applications, for example ERP, CRM or manufacturing systems. Nowadays, however, event-based data sources such as IoT devices, sensors or machines are increasingly being used as a source for the analytics. This enables use cases like predictive maintenance, asset/plant performance optimization and improved quality control.
  • Analytics is now used for the data management processes of the data platform. This is called Augmented Data Management and can help to reduce data management tasks by 45 percent. You can read more about it here.
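As a small illustration of the pre-trained model trend mentioned above, the sketch below calls a publicly available pre-trained sentiment model through the open-source Hugging Face transformers library; cloud providers offer comparable hosted services. The example sentence is invented.

```python
from transformers import pipeline

# Downloads and runs a default pre-trained sentiment model; no training data needed.
sentiment = pipeline("sentiment-analysis")
print(sentiment("The delivery was fast and the product works perfectly."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```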

Deloitte can help you choose which kind of analytics is best suited, with which methods and on which technologies, to make sure that it fits in the data platform. We can also help in developing the analytical models and implementing them. Our next blog will be about the Visualization Layer. If you want to know more about how data can be visualized, please read the next blog in our series about the Layered Architecture.

Deloitte's Data Modernization & Analytics team helps clients modernize their data infrastructure to accelerate analytics delivery, such as self-service BI and AI-powered solutions. This is done by combining best practices and proven solutions with innovative, next-generation technologies, such as cloud-enabled platforms and big data architectures.
