What Is ETL Testing Interview Questions -

What Is ETL Testing? (Definition + Real-World Example)

ETL Testing is the process of validating data during Extract, Transform, and Load (ETL) operations to ensure that data moved from source systems to a data warehouse (DW) is accurate, complete, consistent, and performant.

Real-world example

In an e-commerce project, order data is extracted from transactional databases, transformed to calculate discounts, taxes, and net sales, and loaded into fact tables for reporting.
An ETL tester validates:

Source vs target record counts
Transformation logic against Source-to-Target (S2T) mapping
Aggregated report values against DW tables

This is why what is ETL testing interview questions usually begin with concepts and quickly move to scenario-based validation using SQL.

Data Warehouse Flow: Source → Staging → Transform → Load → Reporting

Source – OLTP databases, flat files, APIs
Staging – Raw data landing area
Transformation – Business rules, joins, SCD logic
Load – Fact and Dimension tables
Reporting – BI dashboards, analytics

What Is ETL Testing Interview Questions & Answers

(Basic → Advanced | Interview-Oriented)

🔹 Basic ETL Testing Interview Questions (1–15)

What is ETL testing?
ETL testing validates data accuracy during extract, transform, and load processes.
Why is ETL testing required?
To ensure reports and analytics are based on correct data.
What is a data warehouse?
A centralized repository for historical and analytical data.
What is Source-to-Target (S2T) mapping?
A document defining source fields, transformations, and target fields.
What are the types of ETL testing?
Source, staging, transformation, target, reconciliation, performance testing.
What is staging area testing?
Validating that staging data matches source data exactly.
What are audit fields?
load_date, batch_id, created_by, updated_date.
What is data reconciliation?
Matching record counts and totals between source and target.
Difference between fact and dimension tables?
Facts store measures; dimensions store descriptive attributes.
What is full load vs incremental load?
Full loads all data; incremental loads only changed data.
What is reject data testing?
Validating records rejected due to rule violations.
What is data quality testing?
Checking accuracy, completeness, and consistency.
What is mapping validation?
Ensuring ETL logic matches S2T mapping.
What is end-to-end ETL testing?
Source → DW → Report validation.
Why is SQL important for ETL testers?
SQL is used to validate data, transformations, and aggregations.

🔹 Intermediate ETL Testing Questions (16–35)

What is SCD Type 1?
Overwrites old data without history.
What is SCD Type 2?
Maintains history using effective dates and current_flag.
How do you test SCD2?
Validate multiple records per business key and date ranges.
What is surrogate key testing?
Ensuring keys are unique and non-null.
What is late-arriving dimension?
Fact arrives before dimension data.
How do you test aggregation logic?
Compare aggregated source and target data.
What is hashing used for?
Change detection and deduplication.
What is referential integrity testing?
Fact foreign keys must exist in dimension tables.
What is CDC (Change Data Capture)?
Processing only changed data.
What is soft delete testing?
Validating delete_flag instead of physical deletion.
What is schema drift?
Source schema changes affecting ETL.
How do you test incremental loads?
Validate only delta records are loaded.
What is partition testing?
Validate correct data placement in partitions.
How do you validate BI reports?
Reconcile report values with DW tables.
What is data masking testing?
Ensuring sensitive data is obfuscated.
What is data lineage?
Tracking data from source to report.
How do you test ETL error handling?
Validate rejects, logs, and alerts.
What is restartability testing?
Ensuring reruns don’t create duplicates.
What is archival testing?
Validating data movement to history tables.
What is performance testing in ETL?
Ensuring jobs meet SLA.

🔹 Advanced & Scenario-Based Questions (36–55)

Scenario: Source has 1M records, target has 980K. What do you check?
Filters, rejects, and deduplication.
Scenario: NULLs appear in mandatory columns. What failed?
Default/null-handling logic.
Scenario: Duplicate rows in target. Root cause?
Many-to-many joins.
How do you validate window functions?
Verify ranking and deduplication logic.
Scenario: ETL job misses SLA. What do you analyze?
Query plans, indexing, partitions.
How do you test audit/control tables?
Validate batch status and counts.
Scenario: Report mismatch with DW. First check?
S2T mapping and aggregation grain.
How do you test ETL rollback?
Ensure partial loads are reverted.
Scenario: Parallel jobs cause deadlocks. What test?
Concurrency validation.
How do you test historical reloads?
Check duplication and SCD logic.
Scenario: Late data impacts aggregates. What do you do?
Recalculate impacted partitions.
How do you validate checksum/hash totals?
Compare source vs target hashes.
Scenario: Currency conversion mismatch. What to test?
Exchange rate logic.
How do you test cloud ETL pipelines?
Scalability and cost performance.
Scenario: NULLs after LEFT JOIN. Why?
Missing dimension records.
How do you test metadata tables?
Validate load timestamps and counts.
Scenario: Unexpected data spike. What test?
Source anomaly validation.
How do you test data quality thresholds?
Validate reject percentages.
How do you perform real-time ETL testing?
Source → Target → Report reconciliation.
How do you explain ETL testing in interviews?
With real scenarios, SQL examples, and business impact.

Real SQL Query Examples for ETL Validation

Sample Tables

src_orders(order_id, cust_id, amount, order_dt)
dim_customer(cust_sk, cust_id, current_flag)
fact_sales(order_id, cust_sk, amount, load_dt)

1️⃣ Record Count Validation

SELECT COUNT(*) FROM src_orders;

SELECT COUNT(*) FROM fact_sales;

2️⃣ JOIN Validation

SELECT COUNT(*) missing_dim

FROM fact_sales f

LEFT JOIN dim_customer d

ON f.cust_sk = d.cust_sk

WHERE d.cust_sk IS NULL;

3️⃣ GROUP BY Aggregation

SELECT cust_id, SUM(amount)

FROM src_orders

GROUP BY cust_id;

4️⃣ Window Function – Deduplication

SELECT *

FROM (

SELECT order_id,

ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) rn

FROM src_orders

) t

WHERE rn = 1;

5️⃣ Performance Tuning Validation

EXPLAIN

SELECT *

FROM fact_sales

WHERE load_dt >= CURRENT_DATE – 1;

ETL Tools Commonly Asked in Interviews

Informatica
Microsoft SQL Server Integration Services
Ab Initio
Pentaho
Talend

ETL Defect Examples + Test Case Sample

Defect Example

Issue: Duplicate records in fact table

Root Cause: Incorrect join logic
Fix: Correct join keys and add deduplication

Sample Test Case

Test Case: Validate SCD2 update
Expected:
- Old record expired
- New record inserted with current_flag = ‘Y’

What Is ETL Testing – Quick Revision Sheet

Validate counts, sums, NULLs, duplicates
Verify S2T mapping carefully
Test SCD1, SCD2, audit fields, hashing
Use JOIN, GROUP BY, window functions
Always reconcile source vs target vs reports

FAQs (Snippet-Friendly)

Q1. What is ETL testing in simple words?
ETL testing ensures correct data movement from source to data warehouse.

Q2. What questions are asked in ETL testing interviews?
Conceptual, SQL-based, and scenario-based questions.

Q3. How do you test ETL using SQL?
By validating counts, joins, aggregations, and transformations.