
Unveiling modern automated testing

Confluent Kafka's distributed event streaming

By Malvika Munshi

Recent advances in technology, together with the need to handle massive streams of data across different industry platforms, have revolutionised how data is managed and used. These innovations have also influenced how traditional automation frameworks are developed and executed. Let’s explore a modern automated testing approach with the Confluent Kafka system.

 
What is Confluent Kafka?

As quoted on the official Confluent Kafka website, "Apache Kafka is an open source distributed streaming system used for stream processing, real-time data pipelines, and data integration at scale."

Distributed event streaming is the buzzword when we talk about Confluent Kafka; in plainer terms, it is the transmission of data across different platforms in real time.

Kafka has made real-time data streaming easier in many ways:

  1. Improved data consistency across systems
  2. Publishing data feeds to and from various applications
  3. Consuming raw input data from Kafka topics and then aggregating, enriching, or transforming it based on filtering criteria (see the sketch after this list)
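As a minimal sketch of that third capability, the snippet below uses the Kafka Streams API to consume a raw topic, keep only the events that match a simple filter, and publish the result to another topic. The topic names ("raw-orders", "high-value-orders"), the broker address, and the filter rule are illustrative assumptions, not part of any specific implementation.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class OrderFilterStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-filter-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read raw events, keep only those matching a simple (illustrative) filter,
        // and publish the filtered stream onwards to a second topic.
        KStream<String, String> rawOrders = builder.stream("raw-orders");
        rawOrders.filter((key, value) -> value != null && value.contains("\"priority\":\"high\""))
                 .to("high-value-orders");

        new KafkaStreams(builder.build(), props).start();
    }
}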

Let’s take a real-world scenario. Consider an online shopping website. One of its key data streams is customer activity on the web application: order placements, search history, transactions and many other behaviours captured as data. Traditionally this data is written to a database, and other systems must query that database to decide their next actions. With Kafka, there is no need for those queries. Instead, we can use a real-time stream of customer data that flows instantaneously between systems, connecting them in a web. Confluent Kafka can ingest this incoming stream, maintain the order of events, and then serve them to other outgoing services. It therefore provides access to real-time customer data without the associated application performance and scalability risks.
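To make the scenario concrete, here is a minimal producer sketch using the standard Kafka Java client. It assumes a broker on localhost:9092 and a topic named "customer-activity"; both names, and the example payload, are illustrative.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ActivityProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by customer ID so all events for one customer land on the same partition.
            producer.send(new ProducerRecord<>("customer-activity", "customer-42",
                    "{\"event\":\"order_placed\",\"orderId\":\"A1001\"}"));
            producer.flush();
        }
    }
}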

 

Relevant Kafka terminology

Before we dive into the Kafka test automation setup, let’s unpack some of the terminology.

Topic – Kafka topics are virtual groups or logs that hold messages and events in a logical sequence, allowing users to send and receive data between Kafka servers with ease.

Partition – to increase the throughput of Kafka and do work in parallel, one Kafka topic can be split into many partitions.

Producer – a producer writes data in the form of messages to the Kafka cluster. Producers use a partitioning strategy to assign each message to a partition.

Consumer – a consumer is an application that reads data from Kafka topics – it consumes and receives the data. It subscribes to one or more topics in the Kafka cluster and then reads the messages published to those topics.

Broker – Kafka is a cluster, or group, of brokers. Producers send messages to brokers, and these messages are then consumed by downstream consumers. How long data is kept depends on the retention period (one week by default).

Kafka cluster – a Kafka cluster is a system consisting of several brokers, the topics they host, and the partitions of those topics. The key objective is to distribute workloads evenly among replicas and partitions.

Offset – a consumer offset keeps track of the latest message read, and it is stored in an internal Kafka topic. Each record in a partition is assigned and identified by its unique offset. Offsets increment sequentially per message and are not reset when messages expire. The first event in a partition always has an offset of zero.
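Tying the consumer and offset concepts together, the sketch below subscribes to the same illustrative "customer-activity" topic, polls for records, and prints the partition and offset each record was read from before committing the consumer group’s offsets. The broker address, topic, and group ID are assumptions for the example.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ActivityConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "activity-readers");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("customer-activity"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                // Each record carries the partition and offset it was read from.
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
            consumer.commitSync(); // persist this consumer group's offsets
        }
    }
}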

 

High-level architecture description

Data and records are sent to a Kafka cluster by producers (e.g. application services, such as databases, that export real-time data as feeds). This data is stored in the Kafka cluster, where it is organised into topics (groups that hold data from one or more producer streams) and then into smaller subsets called partitions (the units that Kafka replicates and distributes for fault tolerance). Consumers and application services that want to access the data can then subscribe to specific topics and receive a real-time feed of the relevant data as required.
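For illustration, topics and their partitions can be created programmatically with the Kafka AdminClient. In the sketch below the topic name, partition count, and replication factor are assumptions; a replication factor of 3 requires at least three brokers in the cluster.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions spread the work across consumers; replication factor 3
            // keeps a copy of each partition on three separate brokers.
            NewTopic topic = new NewTopic("customer-activity", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}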

 

Test framework setup overview

At Deloitte Quality Engineering we have developed a Java Maven-based test automation framework that makes it easier to compare real-time data produced to a Kafka cluster with the data subsequently read by consumers, based on the filtering criteria.
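As a rough illustration of the kind of check such a framework automates (not the framework itself), the JUnit sketch below produces a uniquely tagged event and then asserts that a consumer reads the same event back from the topic. The broker address, topic name, and payload are assumptions for the example.

import static org.junit.jupiter.api.Assertions.assertTrue;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.junit.jupiter.api.Test;

class KafkaRoundTripTest {

    private static final String TOPIC = "customer-activity";
    private static final String BROKERS = "localhost:9092";

    @Test
    void producedEventIsConsumed() {
        // Tag the payload with a UUID so this test run only matches its own event.
        String payload = "{\"event\":\"order_placed\",\"testId\":\"" + UUID.randomUUID() + "\"}";

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", BROKERS);
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>(TOPIC, payload));
            producer.flush();
        }

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", BROKERS);
        consumerProps.put("group.id", "qa-" + UUID.randomUUID());
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());

        boolean found = false;
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList(TOPIC));
            long deadline = System.currentTimeMillis() + 10_000;
            while (!found && System.currentTimeMillis() < deadline) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    if (payload.equals(record.value())) {
                        found = true;
                    }
                }
            }
        }
        // The round trip succeeds only if the exact payload we produced comes back out.
        assertTrue(found, "Produced event was not consumed within the timeout");
    }
}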

Benefits of the test automation framework:

  • Ease of data consumption
  • Complete end-to-end (E2E) coverage
  • Reduced manual testing effort, earlier defect detection, and quicker feedback
  • Validation across different integration components
  • An extensible, reusable solution that can also be used for database and API validations
  • Reporting mechanism

 

Summary

Understanding what Confluent Kafka is, its architecture, and the sometimes head-scratching terminology that goes with it can help organisations decide whether this test automation framework is right for them. Feel free to reach out to Deloitte NZ Quality Engineering at qeresourceingneeds@deloitte.com with any further questions, or if you’d like to set up a test automation framework for a Kafka implementation.