Course Syllabus

Complete topic breakdown — SQL, Python, Apache Spark, Azure Databricks & Azure Data Factory.

SQL
🗃️
Structured Query Language — fundamentals to advanced
Language
  • DDL — Data Definition Language
  • DML — Data Manipulation Language
  • Conditional Statements
  • GROUP BY
  • ORDER BY
  • Aggregation Functions
  • Joins
  • Window Functions
  • Subqueries
  • CTE — Common Table Expressions
  • Interview Problem Patterns
  • Indexing — Theory
Python
🐍
Core programming fundamentals
Language
  • Data Types
  • Loops
  • Functions
  • OOP — Object-Oriented Programming
  • Data Structures
  • File Management
  • Exception Handling
  • Threading
Apache Spark
⚙️
Spark Architecture
Internals & execution model
Core
  • Spark Architecture
  • Internals of Job Submission
  • Driver Node Roles
  • Worker Node Roles
  • DAG Internals
  • Catalyst Optimizer
  • Job → Stages → Tasks
  • Narrow vs Wide Transformations
  • Data Shuffling
  • DataFrames vs Datasets vs RDD
📥
Data Read API
Sources, formats & schema
I/O
  • Spark Read API with Various Option Parameters
  • Schema Inference
  • Manual Schema — String
  • Manual Schema — Struct
  • File Formats — Complex JSON
  • File Formats — CSV, TSV, PSV
  • File Formats — ORC, Parquet, Delta
  • REST API Read
  • Read from Kafka Data Producer
  • Read Compressed Data
📤
Data Write API
Output & write strategies
I/O
  • Write as File — CSV, JSON, Parquet
  • Write as Managed Table
  • Write as External Table
  • Write with PartitionBy
  • Write with BucketBy
  • Write with Custom Partition Count
  • Write in Compressed Format
🔀
Transformations
Column ops, complex types, joins & aggregations
Transform
  • Column Manipulations — Add
  • Column Manipulations — Rename
  • Column Manipulations — Drop
  • Column Manipulations — Merge
  • String Manipulations
  • Deriving Calculated Values
  • Conditional Values — Case Statements
  • Handling Arrays in a Column
  • Handling Struct Columns
  • Handling Complex JSON
  • Filter Operations — filter API
  • Filter Operations — where API
  • Null Handling — Drop
  • Null Handling — Replace
  • Column Reversing
  • Type Casting
  • Joins — Inner
  • Joins — Outer
  • Joins — Anti
  • Joins — Semi
  • Window Functions
  • GroupBy Aggregations
▶️
Actions
Triggers execution
Action
  • Count Operation
  • Collect Operation
  • Grouping
  • Show
Performance Tuning
Optimization & execution analysis
Tuning
  • Caching
  • Persist
  • Broadcast Join Optimization
  • SMB (Sort Merge Bucket) Join Optimization
  • SHJ (Shuffle Hash Join) Optimization
  • Manual Data Skew Handling
  • Repartition
  • Coalesce
  • Predicate Pushdown
  • Column Pruning
  • Select Required Columns
  • Why UDFs Are Not Recommended
  • AQE — Adaptive Query Execution
  • Spark Execution Plan Analysis
  • Spark UI Walkthrough
  • Bucketing In Depth & Its Uses
  • Partitioning In Depth with Parquet
Azure Databricks
Platform & Architecture
Workspace & infrastructure
Platform
  • Databricks Architecture
  • Control Plane
  • Data Plane
  • Production-Grade Databricks Workspace Setup
  • Workflow Creation
  • Alert Management & Monitoring
📋
Unity Catalog
Governance framework
Governance
  • Unity Catalog Governance Framework
  • Schemas
  • Tables
  • Volumes
  • Views
  • Functions
Δ
Delta Table
Internals & metadata
Delta
  • Delta Table / File Overview
  • Transaction Log Metadata Analysis
  • Internals of Delta Table
  • Delta Table Properties
  • Vacuum
  • Time Travel in Data
  • Optimize
  • Z-Ordering (ZORDER BY)
  • Deletion Vectors
  • Liquid Clustering
🔧
Advanced Features
Acceleration, CDC, pipelines, streaming & DevOps
Advanced
  • Photon Acceleration
  • CDC — Change Data Capture
  • Merge Statements
  • Spark Declarative Pipelines
  • Auto Loader
  • Structured Streaming
  • Databricks Asset Bundles — DevOps
  • Genie
  • AI Playground
Azure Data Factory
🔌
Runtime & Components
Connectivity & pipeline building blocks
IR
  • Integration Runtimes
  • Data Migration — On-Prem to Azure Cloud
  • Linked Services
  • Datasets
  • Triggers
⚙️
Activities
Core pipeline activities
Activities
  • Copy Activity
  • Get Metadata Activity — File and Folder
  • Lookup Activity
  • ForEach Activity
  • If Condition Activity
  • Switch Activity
  • Wait and Until Activities
  • Set Variable Activity
  • Delete Activity
🔗
Integrations & Scenarios
Databricks & industry patterns
Integration
  • Databricks Integration
  • Industry-Standard Scenarios