Data streaming using Kafka for a retail company

By implementing a robust integration pipeline, we created an ecosystem to collect, process, and distribute vast amounts of data in real time.


Background

A large retail company wanted to build a robust integration pipeline to handle the vast data generated by various company departments and third-party collaborators across their IT systems. They needed a solution that could efficiently collect, process, and distribute data in real time to support critical business processes, such as inventory management, stock management, delivery, and more.

We chose Apache Kafka as the most suitable tool to build the integration pipeline solution.

Team

  • 2

Duration

  • Ongoing

Team role

  • Senior Software Developers

Industry

  • Retail

Technology

  • Java
  • Apache Kafka
  • Spring
  • Oracle
  • AWS
  • GCP
  • Kubernetes
  • OpenShift

Technology choice reasoning

Scalability and high performance
The retail company deals with a high volume of data from multiple sources. Kafka's ability to handle millions of messages per second ensured the pipeline could scale seamlessly, without downtime, as the company's data volumes grew. Scaling consumers across a topic's partitions sustained high throughput, while adding brokers to the cluster and replicating data across them provided durability.
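As a minimal sketch of how these two dials are set (not code from the project; the topic name, partition count, and broker addresses are illustrative assumptions), a topic can be created with Kafka's Java AdminClient so that twelve consumers in one group can read in parallel and every partition lives on three brokers:

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class TopicSetup {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Broker addresses are illustrative placeholders.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "broker-1:9092,broker-2:9092,broker-3:9092");

        try (Admin admin = Admin.create(props)) {
            // 12 partitions allow up to 12 consumers in one group to read in parallel;
            // replication factor 3 keeps a copy of every partition on three brokers.
            NewTopic topic = new NewTopic("inventory-events", 12, (short) 3)
                    .configs(Map.of(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2"));

            admin.createTopics(Set.of(topic)).all().get();
        }
    }
}
```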

Flexibility
One of the requirements was to support a diverse technology landscape, both inbound and outbound, including different databases, messaging systems from other providers, and APIs. The Kafka Connect ecosystem of hundreds of connectors allowed seamless integration with various IT systems and smooth data flow between the components of the integration pipeline. Confluent's Kafka REST Proxy covered the cases where data had to be provided over a REST API. This flexibility simplified the data integration process and reduced implementation complexity.
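To give a feel for the REST path, the rough sketch below posts a record through the REST Proxy's v2 produce endpoint using Java's built-in HTTP client; the proxy address, topic name, and payload are illustrative assumptions, and the media type follows the proxy's v2 embedded-JSON format.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduceExample {

    public static void main(String[] args) throws Exception {
        // Proxy URL and topic name are illustrative placeholders.
        String url = "http://rest-proxy:8082/topics/inventory-events";

        // v2 embedded-JSON format: a list of records, each with an optional key and a value.
        String body = "{\"records\":[{\"key\":\"sku-1042\",\"value\":{\"sku\":\"sku-1042\",\"stock\":17}}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The proxy replies with per-record offsets (or error codes) as JSON.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```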

High availability and fault tolerance
Reliability was a key consideration, as any downtime or data loss could lead to significant business disruptions. Kafka's distributed architecture, with data replication across multiple brokers, ensured fault tolerance. In case of a broker failure, the remaining brokers could seamlessly take over, ensuring uninterrupted data flow.
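On the producer side, this reliability is usually backed by acknowledgement settings. The sketch below (an illustration, not the project's code; broker addresses and topic are placeholders) waits for all in-sync replicas to confirm each write and enables idempotence so retries after a broker hiccup do not create duplicates.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReliableProducerExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "broker-1:9092,broker-2:9092,broker-3:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Wait for all in-sync replicas to acknowledge each write, so a single
        // broker failure does not lose acknowledged data.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Idempotence prevents duplicates when the producer resends after a
        // transient broker failure.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("inventory-events", "sku-1042", "{\"stock\":17}"));
        }
    }
}
```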

Durability
Data reliability was critical for the retail company's integration pipeline. Kafka's durability and persistence ensured that even if a consumer system temporarily went offline or experienced issues, data would be retained and made available once the system recovered. This feature prevented data loss and ensured the pipeline's reliability.
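A minimal consumer sketch of this behaviour (group and topic names are made up for illustration): offsets are committed manually only after records are processed, so after an outage the consumer resumes from its last committed position and replays anything it had not yet handled, as long as the records are still within the topic's retention period.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ResumingConsumerExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "stock-service");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Commit offsets only after records are processed, so a crash replays
        // unprocessed records instead of losing them.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        // With no committed offset yet, start from the oldest retained record.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("inventory-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
                // Records stay on the broker for the retention period, so an
                // offline consumer simply continues from here once it is back.
                consumer.commitSync();
            }
        }
    }
}
```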

Pub-Sub model
The retail company had multiple downstream systems and departments that required access to specific data sets. Kafka's pub-sub messaging model allowed the various IT systems to publish data from multiple sources to relevant topics, while downstream systems could subscribe to the topics of their interest. This decoupled architecture enabled flexible integration and reduced dependencies between systems.
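Since Spring is part of the stack, two independent downstream subscribers might look like the hedged sketch below (assuming spring-kafka is configured; topic and group names are illustrative). Each listener uses its own consumer group, so both receive every event published to the topic without knowing anything about each other or about the producers.

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// Two downstream services subscribe to the same topic under different
// consumer groups, so each receives its own copy of every event.
@Component
public class InventoryEventSubscribers {

    @KafkaListener(topics = "inventory-events", groupId = "stock-service")
    public void updateStockLevels(String event) {
        // Stock management reacts to the event.
        System.out.println("stock-service received: " + event);
    }

    @KafkaListener(topics = "inventory-events", groupId = "delivery-service")
    public void planDeliveries(String event) {
        // Delivery planning consumes the same stream independently.
        System.out.println("delivery-service received: " + event);
    }
}
```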

Results

By leveraging Apache Kafka as the integration pipeline, we helped our customer build a scalable, performant, reliable, real-time data processing system. Kafka's ability to handle huge data volumes efficiently, provide fault tolerance, and offer integration flexibility addressed the company's needs and enabled efficient data flow across its enterprise systems, supporting business-critical capabilities. All the integration services were written in Java.

Rafał Maciak
Senior Software Developer

"Leveraging Apache Kafka helped us unlock the power of real-time, fault-tolerant, and scalable data flows across IT systems."

If you need to build such a performant and reliable solution, contact us!

E-Book: Start with Apache Kafka
