Wed Jun 14 2023
Kafka CDC for Event-Driven Architectures
by Daniel Selans

Apps built around event-driven architectures often divide functionality across multiple services, each of which may have its own database or file storage. This division of responsibility allows databases to be chosen and optimized for specific use cases; a data lake has different concerns than a transaction processing system.
However, this separation creates a challenge—how can services share data quickly and reliably to maintain consistency?
Change data capture (CDC) is a widely used solution. CDC is a data integration and replication technique that captures changes in a source database and propagates them to one or more target databases. CDC is often used in scenarios requiring near-real-time data synchronization.
In this article, we’ll take a closer look at CDC, with a particular focus on change data capture with Kafka.
What is Kafka?
Apache Kafka is a distributed streaming platform designed to handle high-volume, high-throughput, low-latency streaming data. It is deployed as a cluster of brokers that manage the storage and distribution of messages between producers, which publish messages, and consumers, which retrieve them.
Kafka allows for the creation of fault-tolerant, scalable, and distributed applications that can process and analyze real-time data.
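To make the producer/consumer model concrete, here is a minimal sketch using the confluent-kafka Python client. The broker address, topic name, and consumer group are placeholder values for a local test cluster, not details from this article.

```python
# Minimal producer/consumer sketch using the confluent-kafka client.
# Broker address, topic, and group id are placeholders for a local cluster.
from confluent_kafka import Producer, Consumer

BOOTSTRAP = "localhost:9092"   # assumed local broker
TOPIC = "orders"               # hypothetical topic

# Publish a message to the topic.
producer = Producer({"bootstrap.servers": BOOTSTRAP})
producer.produce(TOPIC, key="order-123", value='{"amount": 42.50}')
producer.flush()  # block until the broker acknowledges the message

# Retrieve messages from the same topic.
consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": "example-group",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])

msg = consumer.poll(timeout=5.0)   # wait up to 5 seconds for a message
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```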
How Does Kafka CDC Work?
Kafka is ideal for CDC processes. As a data streaming solution, it can receive messages containing changes from the source database and publish them to Kafka consumers, which can then apply those changes to the target databases.
CDC with Kafka can be broken down into the following steps:
- CDC tools monitor the source database for changes, including inserts, updates, and deletes, and capture the changes as they occur. This is typically done by reading the database’s transaction log or by using triggers on the database tables, although the exact method differs depending on the database.
- The captured changes are published to Kafka topics. A Kafka topic is a category or feed name to which messages are published and from which messages are consumed. Topics in Kafka are partitioned and replicated across a cluster of brokers for high availability and scalability.
- The messages are typically stored in Kafka in JSON or Avro format. Each message describes a change, including the table name, the type of change (insert, update, or delete), and the values of the affected columns.
- The target database can use a Kafka consumer to read the messages from the topics and apply the changes.
For example, in a FinTech application, when a user makes a transaction, it is stored in a database. The CDC tool captures the change and publishes the transaction details as a message to a Kafka topic. The target database, which may be part of a fraud detection or credit assessment system, consumes the message from the Kafka topic and updates its records accordingly.
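As a sketch of what that consuming side might look like, the snippet below reads change events from a hypothetical transactions topic and applies them to an in-memory stand-in for the target store. The topic name, consumer group, and the Debezium-style JSON envelope (payload.op, payload.before, payload.after) are assumptions for illustration, not details from this article.

```python
# Sketch of a CDC consumer: read change events from a Kafka topic and
# apply them to a target store. The topic, group id, in-memory "target"
# dict, and Debezium-style envelope fields are illustrative assumptions.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "fraud-detection",        # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["appdb.public.transactions"])  # hypothetical CDC topic

target = {}  # stand-in for the target database, keyed by transaction id

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue

    event = json.loads(msg.value())["payload"]
    op = event["op"]                      # "c" = insert, "u" = update, "d" = delete

    if op in ("c", "u", "r"):             # "r" = snapshot read
        row = event["after"]
        target[row["id"]] = row           # upsert the new row state
    elif op == "d":
        row = event["before"]
        target.pop(row["id"], None)       # remove the deleted row
```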
The Benefits of Using Kafka for CDC
Change data capture with Kafka offers several advantages. Kafka is designed to handle high-volume data streams, making it an ideal platform for CDC in large-scale applications. It enables near-real-time synchronization of source and target databases, reducing the risk of errors and inconsistencies.
Because the Kafka Connect ecosystem provides pre-built connectors for many data sources, it is easy to integrate CDC with new and existing systems and applications. Additionally, Kafka client libraries are available for many programming languages, making it straightforward to write producers and consumers in the preferred language of each project.
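As an illustration of how a pre-built connector is wired up, the sketch below registers a Debezium PostgreSQL source connector through the Kafka Connect REST API. The Connect URL, database credentials, connector name, and table list are placeholders, and exact property names can vary between Debezium versions.

```python
# Sketch of registering a pre-built source connector (Debezium for
# PostgreSQL) via the Kafka Connect REST API. All connection details
# are placeholders; property names may differ across Debezium versions.
import requests

connector = {
    "name": "transactions-cdc",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "cdc_password",
        "database.dbname": "appdb",
        "topic.prefix": "appdb",                    # prefix for the CDC topics
        "table.include.list": "public.transactions",
    },
}

# Kafka Connect listens on port 8083 by default.
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())
```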
Streamdal and Kafka CDC
Streamdal’s Streaming Data Performance Monitoring (SDPM) platform connects to CDC streams from Kafka, allowing users to take advantage of a wide range of streaming data monitoring capabilities, including semantic data monitoring, anomaly and PII detection, and data replays to any destination.
Get started with Streamdal today, or book a demonstration to learn more.
Daniel Selans
CTO
Dan is the co-founder and CTO of Streamdal and a tech industry veteran with 20+ years of experience working as a principal engineer at companies like New Relic, InVision, Digital Ocean, and various data centers. He is passionate about distributed systems, software architecture, and enabling observability in next-gen systems.