What Is ETL Testing? (Definition + Real-World Example)
ETL testing is the process of validating data during Extract, Transform, and Load (ETL) operations to confirm that data moved from source systems to a data warehouse (DW) is accurate, complete, and consistent, and that the load itself runs within the expected time.
Real-world example
In an e-commerce project, order data is extracted from transactional databases, transformed to calculate discounts, taxes, and net sales, and loaded into fact tables for reporting.
An ETL tester validates:
- Source vs target record counts
- Transformation logic against Source-to-Target (S2T) mapping
- Aggregated report values against DW tables
This is why ETL testing interview questions usually begin with concepts and quickly move to scenario-based validation using SQL.
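The transformation check from the list above can be sketched with Python's built-in sqlite3 module. The table names (src_order_lines, fact_net_sales), the cent-based columns, and the rule net sales = gross - discount + tax are assumptions made for illustration; a real project would take the rule from its S2T mapping.

```python
import sqlite3

def find_net_sales_mismatches(conn):
    """Return (order_id, expected_cents, actual_cents) for rows where the
    target disagrees with the value recomputed from the source."""
    sql = """
        SELECT s.order_id,
               s.gross_cents - s.discount_cents + s.tax_cents AS expected_cents,
               f.net_sales_cents AS actual_cents
        FROM src_order_lines s
        JOIN fact_net_sales f ON f.order_id = s.order_id
        WHERE s.gross_cents - s.discount_cents + s.tax_cents <> f.net_sales_cents
        ORDER BY s.order_id
    """
    return conn.execute(sql).fetchall()

# Hypothetical sample data; amounts are stored in cents to avoid float rounding.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_order_lines (
        order_id INTEGER, gross_cents INTEGER,
        discount_cents INTEGER, tax_cents INTEGER);
    CREATE TABLE fact_net_sales (order_id INTEGER, net_sales_cents INTEGER);
    INSERT INTO src_order_lines VALUES (1, 10000, 1000, 900), (2, 5000, 0, 500);
    INSERT INTO fact_net_sales VALUES (1, 9900), (2, 5000);
""")

# Order 2 was loaded without tax, so it is the only row reported.
for order_id, expected, actual in find_net_sales_mismatches(conn):
    print(f"order {order_id}: expected {expected}, found {actual}")
```

An empty result means every loaded row agrees with the recomputed value; any row returned is a candidate defect to trace back to the mapping.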
Data Warehouse Flow: Source → Staging → Transform → Load → Reporting
- Source – OLTP databases, flat files, APIs
- Staging – Raw data landing area
- Transformation – Business rules, joins, SCD logic
- Load – Fact and Dimension tables
- Reporting – BI dashboards, analytics
ETL Testing Interview Questions & Answers
(Basic → Advanced | Interview-Oriented)
🔹 Basic ETL Testing Interview Questions (1–15)
- What is ETL testing?
  ETL testing validates data accuracy during extract, transform, and load processes.
- Why is ETL testing required?
  To ensure reports and analytics are based on correct data.
- What is a data warehouse?
  A centralized repository for historical and analytical data.
- What is Source-to-Target (S2T) mapping?
  A document defining source fields, transformations, and target fields.
- What are the types of ETL testing?
  Source, staging, transformation, target, reconciliation, and performance testing.
- What is staging area testing?
  Validating that staging data matches source data exactly.
- What are audit fields?
  Columns that record how and when a row was loaded, such as load_date, batch_id, created_by, and updated_date.
- What is data reconciliation?
  Matching record counts and totals between source and target.
- Difference between fact and dimension tables?
  Facts store measures; dimensions store descriptive attributes.
- What is full load vs incremental load?
  A full load reloads all data; an incremental load brings in only new or changed data.
- What is reject data testing?
  Validating records rejected due to rule violations.
- What is data quality testing?
  Checking accuracy, completeness, and consistency.
- What is mapping validation?
  Ensuring ETL logic matches the S2T mapping.
- What is end-to-end ETL testing?
  Validating data from the source through the DW to the final report (Source → DW → Report).
- Why is SQL important for ETL testers?
  SQL is used to validate data, transformations, and aggregations.
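As an illustration of the data reconciliation answer above, here is a minimal sketch that compares row counts and amount totals in one call. It uses Python's sqlite3 with the src_orders and fact_sales tables described later in this article; the reconcile helper and the sample rows are hypothetical.

```python
import sqlite3

def reconcile(conn, source_table, target_table, amount_col="amount"):
    """Compare row counts and amount totals between two tables."""
    def profile(table):
        # Table names cannot be bound as SQL parameters; in this sketch they
        # are assumed to come from trusted test configuration, not user input.
        rows, total = conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
        ).fetchone()
        return {"rows": rows, "total": total}

    src, tgt = profile(source_table), profile(target_table)
    return {
        "source": src,
        "target": tgt,
        "count_match": src["rows"] == tgt["rows"],
        "total_match": src["total"] == tgt["total"],
    }

# Hypothetical sample data: one source order never reached the fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount INTEGER, order_dt TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount INTEGER, load_dt TEXT);
    INSERT INTO src_orders VALUES
        (1, 10, 100, '2024-01-01'), (2, 10, 50, '2024-01-02'), (3, 20, 70, '2024-01-02');
    INSERT INTO fact_sales VALUES
        (1, 1, 100, '2024-01-03'), (2, 1, 50, '2024-01-03');
""")
result = reconcile(conn, "src_orders", "fact_sales")
print(result)
```

A count match with a total mismatch usually points at a transformation problem, while a count mismatch points at filters, rejects, or duplicates.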
🔹 Intermediate ETL Testing Questions (16–35)
- What is SCD Type 1?
  Overwrites old data without history.
- What is SCD Type 2?
  Maintains history using effective dates and current_flag.
- How do you test SCD2?
  Validate multiple records per business key and date ranges.
- What is surrogate key testing?
  Ensuring keys are unique and non-null.
- What is late-arriving dimension?
  Fact arrives before dimension data.
- How do you test aggregation logic?
  Compare aggregated source and target data.
- What is hashing used for?
  Change detection and deduplication.
- What is referential integrity testing?
  Fact foreign keys must exist in dimension tables.
- What is CDC (Change Data Capture)?
  Processing only changed data.
- What is soft delete testing?
  Validating delete_flag instead of physical deletion.
- What is schema drift?
  Source schema changes affecting ETL.
- How do you test incremental loads?
  Validate only delta records are loaded.
- What is partition testing?
  Validate correct data placement in partitions.
- How do you validate BI reports?
  Reconcile report values with DW tables.
- What is data masking testing?
  Ensuring sensitive data is obfuscated.
- What is data lineage?
  Tracking data from source to report.
- How do you test ETL error handling?
  Validate rejects, logs, and alerts.
- What is restartability testing?
  Ensuring reruns don’t create duplicates.
- What is archival testing?
  Validating data movement to history tables.
- What is performance testing in ETL?
  Ensuring jobs meet SLA.
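The hashing answer above (change detection) can be sketched in plain Python. The row layout, the column names, and the choice of SHA-256 are assumptions for illustration; ETL tools often compute an equivalent hash inside the database instead.

```python
import hashlib

def row_hash(row, columns):
    """Hash the tracked columns of a row in a fixed order.

    Note: None and the empty string hash to the same value in this sketch."""
    joined = "|".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def detect_changes(source_rows, target_rows, key, columns):
    """Classify source rows as new, changed, or unchanged versus the target."""
    target_hashes = {r[key]: row_hash(r, columns) for r in target_rows}
    result = {"new": [], "changed": [], "unchanged": []}
    for r in source_rows:
        if r[key] not in target_hashes:
            result["new"].append(r[key])
        elif row_hash(r, columns) != target_hashes[r[key]]:
            result["changed"].append(r[key])
        else:
            result["unchanged"].append(r[key])
    return result

# Hypothetical customer rows: customer 2 moved city, customer 3 is new.
source = [
    {"cust_id": 1, "name": "Ann", "city": "Pune"},
    {"cust_id": 2, "name": "Bob", "city": "Delhi"},
    {"cust_id": 3, "name": "Cy", "city": "Goa"},
]
target = [
    {"cust_id": 1, "name": "Ann", "city": "Pune"},
    {"cust_id": 2, "name": "Bob", "city": "Mumbai"},
]
changes = detect_changes(source, target, key="cust_id", columns=["name", "city"])
print(changes)
```

For an incremental load test, the "new" and "changed" keys are the delta the job is expected to process; anything else loaded in that run deserves a closer look.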
🔹 Advanced & Scenario-Based Questions (36–55)
- Scenario: Source has 1M records, target has 980K. What do you check?
  Filters, rejects, and deduplication.
- Scenario: NULLs appear in mandatory columns. What failed?
  Default/null-handling logic.
- Scenario: Duplicate rows in target. Root cause?
  Many-to-many joins.
- How do you validate window functions?
  Verify ranking and deduplication logic.
- Scenario: ETL job misses SLA. What do you analyze?
  Query plans, indexing, partitions.
- How do you test audit/control tables?
  Validate batch status and counts.
- Scenario: Report mismatch with DW. First check?
  S2T mapping and aggregation grain.
- How do you test ETL rollback?
  Ensure partial loads are reverted.
- Scenario: Parallel jobs cause deadlocks. What test?
  Concurrency validation.
- How do you test historical reloads?
  Check duplication and SCD logic.
- Scenario: Late data impacts aggregates. What do you do?
  Recalculate impacted partitions.
- How do you validate checksum/hash totals?
  Compare source vs target hashes.
- Scenario: Currency conversion mismatch. What to test?
  Exchange rate logic.
- How do you test cloud ETL pipelines?
  Scalability and cost performance.
- Scenario: NULLs after LEFT JOIN. Why?
  Missing dimension records.
- How do you test metadata tables?
  Validate load timestamps and counts.
- Scenario: Unexpected data spike. What test?
  Source anomaly validation.
- How do you test data quality thresholds?
  Validate reject percentages.
- How do you perform real-time ETL testing?
  Source → Target → Report reconciliation.
- How do you explain ETL testing in interviews?
  With real scenarios, SQL examples, and business impact.
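The first scenario above (source count higher than target count) comes down to accounting for every missing row. Here is a minimal sketch with Python's sqlite3, assuming a hypothetical reject table named rej_orders and simplified two-column tables; a real pipeline may record rejects and filters differently.

```python
import sqlite3

def explain_count_gap(conn):
    """Break the source-vs-target gap into duplicates, rejects, and unexplained rows."""
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    source_rows = scalar("SELECT COUNT(*) FROM src_orders")
    distinct_keys = scalar("SELECT COUNT(DISTINCT order_id) FROM src_orders")
    rejected = scalar("SELECT COUNT(DISTINCT order_id) FROM rej_orders")
    loaded = scalar("SELECT COUNT(*) FROM fact_sales")
    # Source keys that are neither loaded nor rejected need investigation.
    unexplained_ids = [row[0] for row in conn.execute("""
        SELECT DISTINCT s.order_id
        FROM src_orders s
        LEFT JOIN fact_sales f ON f.order_id = s.order_id
        LEFT JOIN rej_orders r ON r.order_id = s.order_id
        WHERE f.order_id IS NULL AND r.order_id IS NULL
        ORDER BY s.order_id
    """)]
    return {
        "source_rows": source_rows,
        "duplicates_removed": source_rows - distinct_keys,
        "rejected": rejected,
        "loaded": loaded,
        "unexplained_ids": unexplained_ids,
    }

# Hypothetical sample data: order 2 is duplicated, order 3 was rejected,
# and order 5 is missing from the target without any recorded reason.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount INTEGER);
    CREATE TABLE rej_orders (order_id INTEGER, reason TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, amount INTEGER);
    INSERT INTO src_orders VALUES (1, 100), (2, 50), (2, 50), (3, -10), (4, 70), (5, 30);
    INSERT INTO rej_orders VALUES (3, 'negative amount');
    INSERT INTO fact_sales VALUES (1, 100), (2, 50), (4, 70);
""")
gap = explain_count_gap(conn)
print(gap)
```

When duplicates and rejects fully explain the difference, the unexplained list is empty; otherwise it gives the exact keys to trace through filters and job logs.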
Real SQL Query Examples for ETL Validation
Sample Tables
- src_orders(order_id, cust_id, amount, order_dt)
- dim_customer(cust_sk, cust_id, current_flag)
- fact_sales(order_id, cust_sk, amount, load_dt)
1️⃣ Record Count Validation
SELECT COUNT(*) FROM src_orders;
SELECT COUNT(*) FROM fact_sales;
2️⃣ JOIN Validation
SELECT COUNT(*) missing_dim
FROM fact_sales f
LEFT JOIN dim_customer d
ON f.cust_sk = d.cust_sk
WHERE d.cust_sk IS NULL;
3️⃣ GROUP BY Aggregation
SELECT cust_id, SUM(amount)
FROM src_orders
GROUP BY cust_id;
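The query above aggregates only the source side. A sketch of the full comparison, using Python's sqlite3 and the sample tables from this section, could look like the following; the sample rows are hypothetical, and UNION ALL is used instead of a full outer join so the query also runs on older SQLite versions.

```python
import sqlite3

def aggregate_mismatches(conn):
    """Return (cust_id, source_total, target_total) where the totals differ."""
    sql = """
        SELECT cust_id, SUM(src_amt) AS src_total, SUM(tgt_amt) AS tgt_total
        FROM (
            SELECT cust_id, amount AS src_amt, 0 AS tgt_amt
            FROM src_orders
            UNION ALL
            SELECT d.cust_id, 0, f.amount
            FROM fact_sales f
            JOIN dim_customer d ON d.cust_sk = f.cust_sk
        ) u
        GROUP BY cust_id
        HAVING SUM(src_amt) <> SUM(tgt_amt)
        ORDER BY cust_id
    """
    return conn.execute(sql).fetchall()

# Hypothetical sample data: order 3 was loaded with 60 instead of 70.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount INTEGER, order_dt TEXT);
    CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, current_flag TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount INTEGER, load_dt TEXT);
    INSERT INTO src_orders VALUES
        (1, 10, 100, '2024-01-01'), (2, 10, 50, '2024-01-02'), (3, 20, 70, '2024-01-02');
    INSERT INTO dim_customer VALUES (1, 10, 'Y'), (2, 20, 'Y');
    INSERT INTO fact_sales VALUES
        (1, 1, 100, '2024-01-03'), (2, 1, 50, '2024-01-03'), (3, 2, 60, '2024-01-03');
""")
mismatched = aggregate_mismatches(conn)
print(mismatched)
```

A customer that exists on only one side also shows up here, because the missing side contributes a total of 0.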
4️⃣ Window Function – Deduplication
SELECT *
FROM (
    SELECT order_id,
           ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) AS rn
    FROM src_orders
) t
WHERE rn = 1;
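To see the deduplication logic run end to end, the same ROW_NUMBER pattern can be tried in Python's sqlite3. This sketch assumes SQLite 3.25 or later (the first version with window functions) and hypothetical sample rows; unlike the query above, it returns the full latest row per order_id.

```python
import sqlite3

def latest_rows(conn):
    """Keep only the most recent row per order_id, by order_dt."""
    sql = """
        SELECT order_id, cust_id, amount, order_dt
        FROM (
            SELECT order_id, cust_id, amount, order_dt,
                   ROW_NUMBER() OVER (
                       PARTITION BY order_id ORDER BY order_dt DESC
                   ) AS rn
            FROM src_orders
        ) t
        WHERE rn = 1
        ORDER BY order_id
    """
    return conn.execute(sql).fetchall()

# Hypothetical sample data: order 1 arrives twice, and the later version wins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount INTEGER, order_dt TEXT);
    INSERT INTO src_orders VALUES
        (1, 10, 100, '2024-01-01'),
        (1, 10, 120, '2024-01-03'),
        (2, 20, 50, '2024-01-02');
""")
deduped = latest_rows(conn)
print(deduped)
```

Comparing this expected set with the target table checks both that duplicates were removed and that the correct version of each order was kept.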
5️⃣ Performance Tuning Validation
EXPLAIN
SELECT *
FROM fact_sales
WHERE load_dt >= CURRENT_DATE - 1;
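Plan output differs between database engines, so the following is only a sketch of the idea using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3. The index name idx_fact_sales_load_dt is hypothetical, and the exact wording of the plan text varies between SQLite versions, so the check looks only for the index name.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each plan row is (id, parent, notused, detail).
    return [row[3] for row in rows]

def uses_index(details, index_name):
    """True if any plan step mentions the given index."""
    return any(index_name in detail for detail in details)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales "
    "(order_id INTEGER, cust_sk INTEGER, amount INTEGER, load_dt TEXT)"
)
query = "SELECT * FROM fact_sales WHERE load_dt >= ?"
cutoff = ("2024-01-01",)

# Without an index the filter can only be answered by scanning the table.
before = plan_details(conn, query, cutoff)

# Hypothetical index on the filter column used by the incremental query.
conn.execute("CREATE INDEX idx_fact_sales_load_dt ON fact_sales (load_dt)")
after = plan_details(conn, query, cutoff)

print("before:", before)
print("after: ", after)
```

The same before-and-after comparison applies on other engines with their own EXPLAIN syntax: capture the plan, apply the tuning change, and confirm the plan actually changed.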
ETL Tools Commonly Asked in Interviews
- Informatica
- Microsoft SQL Server Integration Services
- Ab Initio
- Pentaho
- Talend
ETL Defect Examples + Test Case Sample
Defect Example
Issue: Duplicate records in fact table
- Root Cause: Incorrect join logic
- Fix: Correct join keys and add deduplication
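A defect like this can be reproduced and detected with a short fan-out check. The sketch below uses Python's sqlite3 and assumes one specific cause, a dimension that wrongly holds two current rows for the same cust_id; the sample rows are hypothetical.

```python
import sqlite3

def join_fanout(conn):
    """Return (order_id, matches) for orders that match more than one
    current dimension row and would therefore be duplicated in the fact table."""
    sql = """
        SELECT s.order_id, COUNT(*) AS matches
        FROM src_orders s
        JOIN dim_customer d
          ON d.cust_id = s.cust_id AND d.current_flag = 'Y'
        GROUP BY s.order_id
        HAVING COUNT(*) > 1
        ORDER BY s.order_id
    """
    return conn.execute(sql).fetchall()

# Hypothetical sample data: customer 10 has two rows flagged as current.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount INTEGER, order_dt TEXT);
    CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, current_flag TEXT);
    INSERT INTO src_orders VALUES (1, 10, 100, '2024-01-01'), (2, 20, 50, '2024-01-02');
    INSERT INTO dim_customer VALUES (1, 10, 'Y'), (2, 10, 'Y'), (3, 20, 'Y');
""")
fanout = join_fanout(conn)
print(fanout)
```

An empty result means each order resolves to exactly one dimension row, so the join itself cannot be the source of duplicates.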
Sample Test Case
- Test Case: Validate SCD2 update
- Expected:
- Old record expired
- New record inserted with current_flag = 'Y'
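The expected results of this test case can be turned into automated checks. The sketch below uses Python's sqlite3 and assumes a dimension layout that goes beyond the sample tables in this article: hypothetical city, eff_start, and eff_end columns, an open-ended (NULL) eff_end on the current row, and an expired row whose eff_end equals the next row's eff_start.

```python
import sqlite3

def check_scd2(conn, cust_id):
    """Run basic SCD2 checks for one business key and return named results."""
    rows = conn.execute(
        "SELECT city, eff_start, eff_end, current_flag "
        "FROM dim_customer WHERE cust_id = ? ORDER BY eff_start",
        (cust_id,),
    ).fetchall()
    current = [r for r in rows if r[3] == "Y"]
    expired = [r for r in rows if r[3] == "N"]
    return {
        # Exactly one row should be flagged as current.
        "one_current_row": len(current) == 1,
        # The current row should have no end date yet.
        "current_row_open_ended": len(current) == 1 and current[0][2] is None,
        # At least one old row exists and every old row has an end date.
        "old_rows_expired": len(expired) >= 1
        and all(r[2] is not None for r in expired),
        # Each row should end exactly where the next one starts.
        "no_gaps_or_overlaps": all(
            rows[i][2] == rows[i + 1][1] for i in range(len(rows) - 1)
        ),
    }

# Hypothetical sample data: customer 10 is a correct SCD2 update,
# customer 20 wrongly ends up with two current rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        cust_sk INTEGER, cust_id INTEGER, city TEXT,
        eff_start TEXT, eff_end TEXT, current_flag TEXT);
    INSERT INTO dim_customer VALUES
        (1, 10, 'Pune',  '2024-01-01', '2024-03-01', 'N'),
        (2, 10, 'Delhi', '2024-03-01', NULL,         'Y'),
        (3, 20, 'Goa',   '2024-01-01', NULL,         'Y'),
        (4, 20, 'Agra',  '2024-02-01', NULL,         'Y');
""")
good = check_scd2(conn, 10)
bad = check_scd2(conn, 20)
print(good)
print(bad)
```

Returning named results makes it easy to report exactly which SCD2 expectation failed for a given business key.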
What Is ETL Testing – Quick Revision Sheet
- Validate counts, sums, NULLs, duplicates
- Verify S2T mapping carefully
- Test SCD1, SCD2, audit fields, hashing
- Use JOIN, GROUP BY, window functions
- Always reconcile source vs target vs reports
FAQs
Q1. What is ETL testing in simple words?
ETL testing ensures correct data movement from source to data warehouse.
Q2. What questions are asked in ETL testing interviews?
Conceptual, SQL-based, and scenario-based questions.
Q3. How do you test ETL using SQL?
By validating counts, joins, aggregations, and transformations.
