FLAGSHIP CASE STUDY

Enterprise FMCG Vendor Data Platform

End-to-end Medallion Architecture on AWS Databricks — centralizing order management, vendor analytics, and sales reporting for a ₹20Cr+ monthly revenue FMCG operation with 5000+ active vendors.

Industry: FMCG / Food Products
Duration: 8 Months
Team Size: 6 Engineers
Status: ● Live in Production
AWS Databricks, Apache Kafka, PySpark, Apache Airflow, Delta Lake, AWS S3, Python, CI/CD, REST APIs
Vendors Managed: 5000+
Orders Processed: 1-2M/Day
Monthly Revenue: ₹20Cr+
Pipeline Uptime: 99.9%
Faster Reporting: 60%
Data Freshness: <5 min
Automated QA: 100%
Data Loss Incidents: Zero

1 The Challenge

A leading FMCG manufacturer with 5000+ active vendors across India was struggling with fragmented data systems. Order data flowed through multiple disconnected channels — vendor databases, REST APIs, purchase systems, company websites, and even SMS-based ordering.

  • No centralized data platform — data scattered across 15+ source systems with no single source of truth
  • Manual reporting taking 3-5 days — business decisions delayed due to slow Excel-based analytics
  • Data quality issues — duplicate records, inconsistent schemas, missing fields across vendor data
  • No real-time visibility — operations team couldn't see live order status, vendor performance, or revenue trends
  • Scale pressure — 1-2 million order transactions daily with growing vendor network

2 Our Solution

We designed and implemented an end-to-end Medallion Architecture (Bronze → Silver → Gold) on AWS Databricks, creating a unified enterprise data platform that ingests, transforms, validates, and serves analytics-ready data.

ARCHITECTURE DIAGRAM
🥉 BRONZE LAYER
Raw data ingestion from Kafka streams, REST APIs, vendor databases, SMS channels. Stored as-is in AWS S3 for auditability and reprocessing.
🥈 SILVER LAYER
PySpark transformations — schema validation, null handling, deduplication, nested JSON flattening, latest record selection, business rule validation.
🥇 GOLD LAYER
Curated Delta tables for vendor analytics, sales reporting, revenue trends, order success rates, operational dashboards, and executive KPIs.
Data flow: Sources → Kafka/API → S3 Bronze → Databricks → Gold Tables → Dashboards
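
To make the Silver layer steps more concrete, here is a minimal sketch of nested JSON flattening, null handling, deduplication, and latest record selection. It is written in plain Python rather than PySpark so it runs without a cluster, and it is not the production code. The field names (`order_id`, `updated_at`, `amount`, `vendor`) and the non-negative amount rule are assumptions made for illustration.

```python
from datetime import datetime
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with '_'."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def to_silver(bronze_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten rows, drop invalid ones, and keep the latest row per order."""
    latest: Dict[str, Dict[str, Any]] = {}
    for raw in bronze_rows:
        row = flatten(raw)
        # Null handling: skip rows missing the (hypothetical) required fields.
        if row.get("order_id") is None or row.get("updated_at") is None:
            continue
        # Business rule (assumed): amount must be present and non-negative.
        amount = row.get("amount")
        if amount is None or amount < 0:
            continue
        # Deduplication and latest record selection by update timestamp.
        current = latest.get(row["order_id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["order_id"]] = row
    return sorted(latest.values(), key=lambda r: r["order_id"])


# Illustrative Bronze rows: a duplicate order, a null key, a negative amount.
bronze = [
    {"order_id": "A1", "updated_at": datetime(2024, 1, 1, 9), "amount": 100,
     "vendor": {"id": "V7", "region": "UP"}},
    {"order_id": "A1", "updated_at": datetime(2024, 1, 1, 10), "amount": 120,
     "vendor": {"id": "V7", "region": "UP"}},
    {"order_id": None, "updated_at": datetime(2024, 1, 1, 11), "amount": 50,
     "vendor": {"id": "V9", "region": "MH"}},
    {"order_id": "B2", "updated_at": datetime(2024, 1, 1, 12), "amount": -5,
     "vendor": {"id": "V3", "region": "UP"}},
]
silver = to_silver(bronze)
```

In this example only the later of the two `A1` rows survives, with `vendor` flattened into `vendor_id` and `vendor_region`. In PySpark the same idea is usually expressed with a window function partitioned by the order key and ordered by the update timestamp.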

3 What We Built

  • Kafka + REST API ingestion pipelines — collecting data from vendor databases, APIs, purchase systems, websites, and SMS-based ordering channels in real-time
  • Scalable PySpark transformation engine — schema validation, null handling, deduplication, nested JSON flattening, latest record selection, and business rule validation
  • Apache Airflow orchestration — reliable batch and near real-time ETL workflows with monitoring, alerting, retry mechanisms, and SLA tracking
  • Delta Lake optimization — ACID transactions, schema evolution, time-travel for data recovery, Z-ordering for query performance
  • Automated data quality framework — 200+ validation rules, anomaly detection, freshness checks, and automated alerting on data quality SLA breaches
  • AI-powered pipeline failure analysis — lightweight LLM-based assistant that analyzes pipeline failures and suggests fixes automatically
  • CI/CD deployment workflows — automated testing, staging, and production deployment with rollback capabilities using GitHub Actions
  • Executive dashboards — real-time vendor performance, sales reporting, revenue trends, order success rates, and regional analytics
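
As an illustration of the data quality bullet above, the sketch below shows one way named validation rules and a freshness check could be structured. It is plain Python, not the framework used in the project. The three rules and the field names are hypothetical, and the five-minute lag limit is borrowed from the "<5 min" data freshness figure reported for this platform.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, List

Rule = Callable[[Dict[str, Any]], bool]

# Hypothetical rules; the real framework is described as having 200+.
RULES: Dict[str, Rule] = {
    "order_id_present": lambda r: bool(r.get("order_id")),
    "vendor_id_present": lambda r: bool(r.get("vendor_id")),
    "amount_non_negative": lambda r: (
        isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
    ),
}


def run_rules(rows: List[Dict[str, Any]],
              rules: Dict[str, Rule] = RULES) -> Dict[str, int]:
    """Return the number of rows that fail each named rule."""
    failures = {name: 0 for name in rules}
    for row in rows:
        for name, rule in rules.items():
            if not rule(row):
                failures[name] += 1
    return failures


def is_fresh(latest_event: datetime, now: datetime,
             max_lag: timedelta = timedelta(minutes=5)) -> bool:
    """True when the newest record is less than max_lag old."""
    return now - latest_event < max_lag


rows = [
    {"order_id": "A1", "vendor_id": "V7", "amount": 120},
    {"order_id": "", "vendor_id": "V9", "amount": 50},
    {"order_id": "B2", "vendor_id": None, "amount": -5},
]
failures = run_rules(rows)

now = datetime(2024, 1, 1, 12, 0)
fresh = is_fresh(datetime(2024, 1, 1, 11, 57), now)
stale = is_fresh(datetime(2024, 1, 1, 11, 50), now)
```

Keeping each rule as a named function makes it straightforward to report failures per rule and to raise an alert when a count or the freshness check breaches an agreed threshold.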

4 Tech Stack

AWS Databricks, Apache Kafka, PySpark, Apache Airflow, Delta Lake, AWS S3, Python, SQL, REST APIs, GitHub Actions, Great Expectations, Power BI, AWS Lambda, AWS CloudWatch, Terraform

5 Client Testimonial

"RM Technologies transformed our entire vendor management and data operations. What used to take 3-5 days of manual reporting now happens in real-time. The Databricks pipeline they built is rock-solid — processing over a million orders daily without a single failure. Our business intelligence has improved dramatically."

RK
Director
Leading FMCG Manufacturer, Uttar Pradesh

Need a Similar Data Platform?

Let's discuss how we can build a scalable, production-ready data solution for your enterprise.