Data Engineering Cohort
Course Syllabus
Complete topic breakdown — SQL, Python, Apache Spark, Azure Databricks & Azure Data Factory.
SQL
🗃️
SQL
Structured query language — fundamentals to advanced
Language
DDL — Data Definition Language
DML — Data Manipulation Language
Conditional Statements
GROUP BY
ORDER BY
Aggregation Functions
Joins
Window Functions
Subqueries
CTE — Common Table Expressions
Interview Problem Patterns
Indexing — Theory
Python
🐍
Python
Core programming fundamentals
Language
Data Types
Loops
Functions
OOP — Object-Oriented Programming
Data Structures
File Management
Exception Handling
Threading
Apache Spark
⚙️
Spark Architecture
Internals & execution model
Core
Spark Architecture
Internals of Job Submission
Driver Node Roles
Worker Node Roles
DAG Internals
Catalyst Optimiser
Job → Stages → Tasks
Narrow vs Wide Transformations
Data Shuffling
DataFrames vs Datasets vs RDD
📥
Data Read API
Sources, formats & schema
I/O
Spark Read API with Various Option Parameters
Schema Inference
Manual Schema — String
Manual Schema — Struct
File Formats — Complex JSON
File Formats — CSV, TSV, PSV
File Formats — ORC, Parquet, Delta
REST API Read
Read from Kafka Data Producer
Read Compressed Data
📤
Data Write API
Output & write strategies
I/O
Write as File — CSV, JSON, Parquet
Write as Managed Table
Write as External Table
Write with PartitionBy
Write with BucketBy
Write with Custom Partition Count
Write in Compressed Format
🔀
Transformations
Column ops, complex types, joins & aggregations
Transform
Column Manipulations — Add
Column Manipulations — Rename
Column Manipulations — Drop
Column Manipulations — Merge
String Manipulations
Deriving Calculated Values
Conditional Values — Case Statements
Handling Arrays in a Column
Handling Struct Columns
Handling Complex JSON
Filter Operations — filter API
Filter Operations — where API
Null Handling — Drop
Null Handling — Replace
Column Reversing
Type Casting
Joins — Inner
Joins — Outer
Joins — Anti
Joins — Semi
Window Functions
GroupBy Aggregations
▶️
Actions
Triggers execution
Action
Count Operation
Collect Operation
Grouping
Show
⚡
Performance Tuning
Optimization & execution analysis
Tuning
Caching
Persist
Broadcast Join Optimisation
Sort-Merge Bucket (SMB) Join Optimisation
Shuffle Hash Join (SHJ) Optimisation
Manual Data Skew Handling
Repartition
Coalesce
Predicate Pushdown
Column Pruning
Select Required Columns
Why UDF is Not Recommended
AQE — Adaptive Query Execution
Spark Execution Plan Analysis
Spark UI Walkthrough
Bucketing In Depth & Its Use Cases
Partitioning In Depth with Parquet
Azure Databricks
◈
Platform & Architecture
Workspace & infrastructure
Platform
Databricks Architecture
Control Plane
Data Plane
Production Grade Databricks Workspace Setup
Workflow Creation
Alerts Management & Monitoring
📋
Unity Catalog
Governance framework
Governance
Unity Catalog Governance Framework
Schemas
Tables
Volumes
Views
Functions
Δ
Delta Table
Internals & metadata
Delta
Delta Table / File Overview
Transaction Log Metadata Analysis
Internals of Delta Table
Delta Table Properties
Vacuum
Time Travel in Data
Optimize
Z-Order (ZORDER BY)
Deletion Vectors
Liquid Clustering
🔧
Advanced Features
Acceleration, CDC, pipelines, streaming & DevOps
Advanced
Photon Acceleration
CDC — Change Data Capture
Merge Statements
Spark Declarative Pipelines
Auto Loader
Structured Streaming
Databricks Asset Bundles — DevOps
Genie
AI Playground
Azure Data Factory
🔌
Runtime & Components
Connectivity & pipeline building blocks
IR
Integration Runtimes
Data Migration — On-Prem to Azure Cloud
Linked Services
Datasets
Triggers
⚙️
Activities
All pipeline activities
Activities
Copy Activity
Get Metadata — File and Folder
Lookup Activity
ForEach Activity
If Condition Activity
Switch Activity
Wait and Until Activities
Set Variable Activity
Delete Activity
🔗
Integrations & Scenarios
Databricks & industry patterns
Integration
Databricks Integration
Industry-standard Scenarios