M.Tech · Distributed Systems · 2026

Distributed Data Management

An interactive showcase of data parsing, distributed databases, exploratory analysis pipelines, and federated data architecture.


4 Assignments

12+ Experiments

5+ Technologies

Course Overview

What I Learned

A deep dive into managing data at scale, from raw ingestion to distributed query processing and predictive modeling.

Data Engineering

Mastering heterogeneous data ingestion (CSV, JSON, XML, binary formats, and HTML scraping) and SQL/NoSQL CRUD operations with Python.

Pandas · SQLite · MongoDB
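A minimal sketch of multi-format ingestion, assuming a hypothetical water-level record with fields `station`, `state`, and `level_m` (these names and the sample payloads are illustrative, not the lab's actual schema). Each parser uses only the standard library and emits the same record shape, which is the point of a normalising ingestion layer. Binary formats and HTML scraping are not covered here.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads: the field names (station, state, level_m) are
# illustrative assumptions, not the lab's actual dataset schema.
CSV_TEXT = "station,state,level_m\nS1,Kerala,4.2\nS2,Punjab,7.9\n"
JSON_TEXT = '[{"station": "S3", "state": "Bihar", "level_m": 3.1}]'
XML_TEXT = (
    "<readings>"
    "<reading station='S4' state='Assam'><level_m>5.5</level_m></reading>"
    "</readings>"
)


def record(station, state, level):
    """Normalise one reading into the common record shape."""
    return {"station": station, "state": state, "level_m": float(level)}


def from_csv(text):
    # csv.DictReader yields every value as a string, so level is cast in record().
    return [record(r["station"], r["state"], r["level_m"])
            for r in csv.DictReader(io.StringIO(text))]


def from_json(text):
    return [record(r["station"], r["state"], r["level_m"]) for r in json.loads(text)]


def from_xml(text):
    # Identifiers live in attributes; the measurement is a child element's text.
    root = ET.fromstring(text)
    return [record(el.get("station"), el.get("state"), el.findtext("level_m"))
            for el in root.iter("reading")]


def ingest_all():
    # Each parser emits the same dict shape, so downstream code
    # (a DataFrame constructor, a database insert) never sees format differences.
    return from_csv(CSV_TEXT) + from_json(JSON_TEXT) + from_xml(XML_TEXT)


if __name__ == "__main__":
    for rec in ingest_all():
        print(rec)
```

Because the output is a list of dicts, `pandas.DataFrame(ingest_all())` would turn it directly into a table for the analysis stage.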

Federated Architecture

Understanding distributed query processing, schema heterogeneity, mediator-wrapper patterns, and CAP theorem tradeoffs.

2PC Protocol · MVCC · Saga Pattern
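The mediator-wrapper pattern can be sketched in a few classes, assuming two hypothetical autonomous sources: a relational table that stores levels in centimetres under different column names, and a document store with nested fields and explicit units. Each wrapper translates its local schema into a shared global schema (`station`, `state`, `level_m`); the mediator only decomposes the global query and unions the answers. All class names, tables, and data here are illustrative assumptions.

```python
import sqlite3

# Global (mediated) schema every wrapper must produce: station, state, level_m.
# Both sources are hypothetical stand-ins for autonomous component databases.


class RelationalWrapper:
    """Wraps a SQL source whose local schema uses other names and centimetres."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE gauges (gauge_id TEXT, region TEXT, level_cm REAL)")
        self.conn.executemany(
            "INSERT INTO gauges VALUES (?, ?, ?)",
            [("G1", "Kerala", 420.0), ("G2", "Punjab", 790.0)])

    def query(self, state):
        # The selection is pushed down to the source; only the
        # name and unit translation happen in the wrapper.
        rows = self.conn.execute(
            "SELECT gauge_id, region, level_cm FROM gauges WHERE region = ?",
            (state,))
        return [{"station": g, "state": r, "level_m": cm / 100}
                for g, r, cm in rows]


class DocumentWrapper:
    """Wraps a document-style source with nested fields and explicit units."""

    def __init__(self):
        self.docs = [
            {"_id": "D1", "location": {"state": "Kerala"},
             "level": {"value": 3.9, "unit": "m"}},
            {"_id": "D2", "location": {"state": "Bihar"},
             "level": {"value": 270.0, "unit": "cm"}},
        ]

    def query(self, state):
        out = []
        for d in self.docs:
            if d["location"]["state"] != state:
                continue
            value, unit = d["level"]["value"], d["level"]["unit"]
            # Resolve unit heterogeneity per document, not per source.
            out.append({"station": d["_id"], "state": state,
                        "level_m": value / 100 if unit == "cm" else value})
        return out


class Mediator:
    """Splits a global query into per-source sub-queries and unions the answers."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def stations_in(self, state):
        results = []
        for wrapper in self.wrappers:
            results.extend(wrapper.query(state))
        return sorted(results, key=lambda rec: rec["station"])


if __name__ == "__main__":
    mediator = Mediator([RelationalWrapper(), DocumentWrapper()])
    print(mediator.stations_in("Kerala"))
```

Pushing the `WHERE` clause into the relational source, rather than filtering in the mediator, reflects the usual federated-query goal of shipping as little data as possible across sites.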

Exploratory Analysis

Preprocessing pipelines with feature engineering, outlier detection, and correlation heatmaps on Indian water resource data.

NumPy · Seaborn · Sklearn
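Two of the steps named above, IQR-based outlier detection and the Pearson correlation matrix behind a heatmap, can be sketched in plain Python. This assumes the common 1.5 × IQR fence and linear-interpolation quartiles (the same rule as NumPy's default percentile); the lab may have used a different rule, and the column names in the example are hypothetical.

```python
import math


def quartiles(values):
    """Q1 and Q3 by linear interpolation between sorted neighbours."""
    s = sorted(values)

    def pct(p):
        k = (len(s) - 1) * p
        lo, hi = math.floor(k), math.ceil(k)
        return s[lo] + (s[hi] - s[lo]) * (k - lo)

    return pct(0.25), pct(0.75)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(x, y):
    """Pearson r; raises ZeroDivisionError if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict of equally long numeric columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # -> [95]
    # Hypothetical columns, for illustration only.
    cols = {"rainfall_mm": [100, 150, 200, 250], "level_m": [2.0, 2.6, 3.1, 3.9]}
    print(correlation_matrix(cols))
```

The nested dict returned by `correlation_matrix` holds the values a correlation heatmap displays; in the Seaborn workflow the same matrix would come from `DataFrame.corr()` and be drawn with `seaborn.heatmap`.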

Machine Learning

Applying Linear Regression for predictive modeling with StandardScaler, train-test splits, and performance metrics (MSE, R²).

LinearRegression · R² Score · MSE
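The modeling steps listed above (train-test split, standardization, least-squares fit, MSE and R²) can be sketched from scratch for a single feature, so each stage is visible. This is a dependency-free illustration, not the lab's code; in scikit-learn the same stages map to `train_test_split`, `StandardScaler`, `LinearRegression`, `mean_squared_error`, and `r2_score`. The `Standardizer` class mirrors `StandardScaler`'s use of the population standard deviation, and the seed and test ratio are arbitrary assumptions.

```python
import random
import statistics


def train_test_split(xs, ys, test_ratio=0.25, seed=42):
    """Shuffle indices with a fixed seed, then hold out a test fraction."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(xs) * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train_idx], [xs[i] for i in test_idx],
            [ys[i] for i in train_idx], [ys[i] for i in test_idx])


class Standardizer:
    """z = (x - mean) / std, with mean and population std fitted on training data."""

    def fit(self, xs):
        self.mean = statistics.fmean(xs)
        self.std = statistics.pstdev(xs) or 1.0  # guard against a constant feature
        return self

    def transform(self, xs):
        return [(x - self.mean) / self.std for x in xs]


def fit_ols(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot, measured on the held-out targets."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def run_pipeline(xs, ys):
    x_tr, x_te, y_tr, y_te = train_test_split(xs, ys)
    scaler = Standardizer().fit(x_tr)  # fit on the training split only
    slope, intercept = fit_ols(scaler.transform(x_tr), y_tr)
    preds = [slope * x + intercept for x in scaler.transform(x_te)]
    return mse(y_te, preds), r2(y_te, preds)


if __name__ == "__main__":
    xs = list(range(20))
    ys = [3 * x + 2 + random.Random(x).uniform(-1, 1) for x in xs]
    print(run_pipeline(xs, ys))
```

Fitting the scaler on the training split alone keeps test-set statistics from leaking into training, which would otherwise make the reported MSE and R² optimistic.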

Portfolio

Assignments Showcase


Lab 01

Database & Analysis Pipeline

End-to-End Data Management

A comprehensive lab covering data parsing across multiple formats, SQL and NoSQL CRUD, EDA with visualization, and a Linear Regression model trained on Indian water resource data.

Python · Pandas · SQLite · MongoDB · Sklearn · Matplotlib

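The SQL side of the CRUD work can be sketched with Python's built-in `sqlite3` module, assuming a hypothetical `water_levels` table (the table and column names are illustrative). Every statement uses `?` placeholders rather than string formatting, and `with conn:` commits each write or rolls it back on error. The MongoDB counterparts in `pymongo` would be `insert_many`, `find`, `update_one`, and `delete_one` on a collection.

```python
import sqlite3


def open_db():
    """In-memory database with a hypothetical water_levels table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE water_levels ("
        "station TEXT PRIMARY KEY, state TEXT NOT NULL, level_m REAL)")
    return conn


def create(conn, rows):
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT INTO water_levels (station, state, level_m) VALUES (?, ?, ?)",
            rows)


def read(conn, state):
    cur = conn.execute(
        "SELECT station, level_m FROM water_levels WHERE state = ? ORDER BY station",
        (state,))
    return cur.fetchall()


def update(conn, station, level_m):
    with conn:
        cur = conn.execute(
            "UPDATE water_levels SET level_m = ? WHERE station = ?",
            (level_m, station))
    return cur.rowcount  # number of rows changed


def delete(conn, station):
    with conn:
        cur = conn.execute("DELETE FROM water_levels WHERE station = ?", (station,))
    return cur.rowcount


if __name__ == "__main__":
    db = open_db()
    create(db, [("S1", "Kerala", 4.2), ("S2", "Kerala", 3.9), ("S3", "Punjab", 7.9)])
    update(db, "S2", 4.0)
    delete(db, "S1")
    print(read(db, "Kerala"))  # -> [('S2', 4.0)]
```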

Theory 01

Federated Data Management

Academic Research Report

A professional book-format chapter covering distributed query processing, schema heterogeneity resolution, mediator-wrapper architecture, 2PC/Saga transactions, the CAP theorem, security, and real-world case studies.

LaTeX · TikZ · Academic Writing · Distributed Systems

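The two-phase commit protocol covered in the report can be sketched as an in-memory coordinator, assuming hypothetical `Participant` objects that stage writes on `prepare` and apply them on `commit`. This deliberately omits write-ahead logging, timeouts, and coordinator failure; that last case is where 2PC blocks, and it is part of the motivation for alternatives such as the Saga pattern.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical resource manager holding staged and committed data."""
    name: str
    will_vote_yes: bool = True
    state: str = "INIT"
    staged: dict = field(default_factory=dict)
    data: dict = field(default_factory=dict)

    def prepare(self, writes):
        # Phase 1 vote: a participant may refuse (e.g. a constraint failure).
        if not self.will_vote_yes:
            self.state = "ABORTED"
            return False
        self.staged = dict(writes)  # a real system would force this to a log
        self.state = "PREPARED"
        return True

    def commit(self):
        self.data.update(self.staged)
        self.staged = {}
        self.state = "COMMITTED"

    def abort(self):
        self.staged = {}
        self.state = "ABORTED"


def two_phase_commit(participants, writes):
    """writes maps participant name -> dict of key/value updates."""
    # Phase 1 (voting): every participant must vote yes.
    votes = [p.prepare(writes.get(p.name, {})) for p in participants]
    # Phase 2 (decision): commit everywhere only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.abort()
    return "ABORT"


if __name__ == "__main__":
    a, b = Participant("A"), Participant("B", will_vote_yes=False)
    print(two_phase_commit([a, b], {"A": {"x": 1}, "B": {"y": 2}}))  # -> ABORT
    print(a.data)  # -> {}
```

The unanimous-vote rule is what gives 2PC atomicity across sites: a single "no" vote aborts the whole transaction, so no participant is left with a partial update.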

Lab 02