1️⃣ What is ETL Testing? (Definition + Example)
ETL Testing is the process of validating that data is correctly Extracted from source systems, Transformed as per business rules, and Loaded into the target data warehouse or data mart.
Real-World Example
In a retail analytics project:
- Data is extracted from POS systems (MySQL, flat files)
- Transformed to apply currency conversion, deduplication, and SCD2 logic
- Loaded into a star-schema data warehouse
- Consumed by reporting tools
For candidates with 7 years of experience, interviewers expect:
- Strong ETL architecture knowledge
- Advanced SQL validation
- Handling large data volumes
- Deep understanding of SCDs, audit fields, performance tuning
2️⃣ Data Warehouse Flow – Source → Staging → Transform → Load → Reporting
Typical ETL / DW Architecture
| Layer | Description | Testing Focus |
| --- | --- | --- |
| Source | OLTP DBs, APIs, Files | Data completeness |
| Staging | Raw landing tables | Data cleansing |
| Transformation | Business rules, SCDs | Logic validation |
| Load | Fact & Dimension tables | Accuracy, keys |
| Reporting | BI dashboards | Aggregation checks |
Key Validation Areas
- Source-to-Target (S2T) mapping
- Record count reconciliation
- Data type & length validation
- Audit columns (batch_id, load_date)
- Incremental load logic
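The audit-column and incremental-load checks above can be sketched as two count queries. This is a minimal illustration, not a definitive implementation: it runs against an in-memory SQLite database, and the `watermark` parameter (the latest `order_dt` already loaded by earlier batches) and the helper name `check_incremental_batch` are assumptions made for the example.

```python
import sqlite3


def check_incremental_batch(conn, batch_id, watermark):
    """Return (rows missing load_date, rows at or before the watermark) for one batch."""
    missing_audit = conn.execute(
        "SELECT COUNT(*) FROM fact_sales WHERE batch_id = ? AND load_date IS NULL",
        (batch_id,),
    ).fetchone()[0]
    stale_rows = conn.execute(
        "SELECT COUNT(*) FROM fact_sales WHERE batch_id = ? AND order_dt <= ?",
        (batch_id, watermark),
    ).fetchone()[0]
    return missing_audit, stale_rows


# In-memory stand-in for the warehouse; ISO date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (order_key INTEGER, total_amount REAL, "
    "order_dt TEXT, batch_id INTEGER, load_date TEXT)"
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100.0, "2024-05-30", 1, "2024-05-31"),
        (2, 250.0, "2024-06-01", 2, "2024-06-02"),
        (3, 75.0, "2024-05-29", 2, None),  # defect: old row reloaded, no load_date
    ],
)

result = check_incremental_batch(conn, batch_id=2, watermark="2024-05-31")
print(result)  # (1, 1): one row lacks load_date, one row is not newer than the watermark
```

A clean incremental batch should return `(0, 0)`; any other value points to rows worth inspecting.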
3️⃣ ETL Testing Interview Questions for 7 Years Experience (with Answers)
🔹 Basic & Conceptual Questions
- What is ETL testing?
Validation of data extraction, transformation rules, and loading accuracy.
- What is the difference between ETL testing and data warehouse testing?
ETL testing focuses on data movement; DW testing includes schema, facts, dimensions, and reporting.
- What is a staging area?
Intermediate storage for raw extracted data.
- What is S2T mapping?
A document defining source fields, target fields, and transformation rules.
- What are audit fields?
Metadata columns like load_date, batch_id, created_ts.
🔹 Intermediate ETL QA Questions
- Explain SCD Type 1.
Overwrites existing data; no history is kept.
- Explain SCD Type 2.
Maintains historical records using effective dates and active flags.
- What is incremental load testing?
Validating that only changed or new records are loaded.
- What is data reconciliation?
Comparing source and target datasets for consistency.
- What is hashing in ETL testing?
Using checksum/hash totals to validate large datasets efficiently.
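The hashing answer above can be illustrated with a small sketch that folds every row of an ordered result set into one SHA-256 digest and compares source against target. The helper name `table_fingerprint` and the sample rows are assumptions for the example; on real volumes the hash would normally be computed inside the database rather than by pulling rows to the client.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, query):
    """Hash every row of an ordered result set into a single SHA-256 digest."""
    digest = hashlib.sha256()
    for row in conn.execute(query):
        # Separate values so that ('ab', 'c') and ('a', 'bc') hash differently.
        digest.update("|".join(str(v) for v in row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_src (order_id INTEGER, cust_id TEXT, amount REAL, order_date TEXT)"
)
conn.execute(
    "CREATE TABLE fact_sales (order_key INTEGER, cust_key INTEGER, "
    "total_amount REAL, order_dt TEXT)"
)
conn.executemany(
    "INSERT INTO sales_src VALUES (?, ?, ?, ?)",
    [(1, "C1", 100.0, "2024-06-01"), (2, "C2", 250.0, "2024-06-01")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 10, 100.0, "2024-06-01"), (2, 20, 250.0, "2024-06-01")],
)

# Both queries must select comparable columns in the same deterministic order.
SRC_QUERY = "SELECT order_id, amount, order_date FROM sales_src ORDER BY order_id"
TGT_QUERY = "SELECT order_key, total_amount, order_dt FROM fact_sales ORDER BY order_key"

hashes_match = table_fingerprint(conn, SRC_QUERY) == table_fingerprint(conn, TGT_QUERY)
print("hash totals match:", hashes_match)
```

A mismatch only says that something differs; a row-level comparison is still needed to locate the offending records.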
🔹 Advanced ETL Interview Questions (7+ Years Level)
- How do you test ETL jobs with billions of records?
Sampling, hashing, partition-wise validation, and aggregate checks.
- How do you validate complex transformations?
By recreating the business logic in SQL and comparing results.
- How do you test restartability?
Force the job to fail mid-run and validate that it resumes correctly.
- How do you validate surrogate key generation?
Check that keys are unique and are not reused across loads.
- What is late-arriving dimension handling?
Loading fact records that arrive before their dimension records, using placeholder keys.
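As a rough sketch of the surrogate key answer above, the two checks (uniqueness and non-reuse across loads) can be written as follows. The `dim_customer` layout with a `batch_id` column and the helper `surrogate_key_issues` are hypothetical, and the non-reuse check assumes keys come from a monotonically increasing sequence, which may not hold for every tool.

```python
import sqlite3


def surrogate_key_issues(conn, new_batch_id):
    """Return (duplicated keys, keys in the new batch that overlap earlier batches)."""
    duplicates = [
        row[0]
        for row in conn.execute(
            "SELECT cust_key FROM dim_customer "
            "GROUP BY cust_key HAVING COUNT(*) > 1 ORDER BY cust_key"
        )
    ]
    # Assumption: keys only ever increase, so a new key must exceed every older key.
    reused = [
        row[0]
        for row in conn.execute(
            "SELECT cust_key FROM dim_customer WHERE batch_id = ? "
            "AND cust_key <= (SELECT MAX(cust_key) FROM dim_customer WHERE batch_id < ?) "
            "ORDER BY cust_key",
            (new_batch_id, new_batch_id),
        )
    ]
    return duplicates, reused


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (cust_key INTEGER, cust_id TEXT, batch_id INTEGER)")
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?)",
    [
        (1, "C1", 1),
        (2, "C2", 1),
        (3, "C3", 2),
        (2, "C4", 2),  # defect: key 2 was already issued in batch 1
    ],
)

issues = surrogate_key_issues(conn, new_batch_id=2)
print(issues)  # ([2], [2])
```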
4️⃣ Real SQL Query Examples for ETL Validation
Sample Source Table
sales_src(order_id, cust_id, amount, order_date)
Sample Target Table
fact_sales(order_key, cust_key, total_amount, order_dt)
✅ Record Count Validation
SELECT COUNT(*) FROM sales_src
WHERE order_date >= '2024-01-01';
SELECT COUNT(*) FROM fact_sales
WHERE order_dt >= '2024-01-01';
✅ JOIN Validation (Data Accuracy)
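-- Returns orders present on both sides whose amounts differ.
-- Assumes order_key carries the source order_id; adjust the join if it is a generated surrogate key.
SELECT s.order_id, s.amount, f.total_amount
FROM sales_src s
JOIN fact_sales f
ON s.order_id = f.order_key
WHERE s.amount <> f.total_amount;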
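An inner join like the one above only compares rows that exist on both sides, so records missing from the target never show up in its output. A minimal sketch of the complementary anti-join checks follows; it keeps the same join condition as the query above (`order_id = order_key`), and the sample rows are made up for illustration.

```python
import sqlite3

# Source rows with no matching target row (data loss).
MISSING_IN_TARGET = """
SELECT s.order_id
FROM sales_src s
LEFT JOIN fact_sales f ON s.order_id = f.order_key
WHERE f.order_key IS NULL
ORDER BY s.order_id
"""

# Target rows with no matching source row (orphans or stale loads).
ORPHANS_IN_TARGET = """
SELECT f.order_key
FROM fact_sales f
LEFT JOIN sales_src s ON s.order_id = f.order_key
WHERE s.order_id IS NULL
ORDER BY f.order_key
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_src (order_id INTEGER, cust_id TEXT, amount REAL, order_date TEXT)"
)
conn.execute(
    "CREATE TABLE fact_sales (order_key INTEGER, cust_key INTEGER, "
    "total_amount REAL, order_dt TEXT)"
)
conn.executemany(
    "INSERT INTO sales_src VALUES (?, ?, ?, ?)",
    [
        (1, "C1", 100.0, "2024-06-01"),
        (2, "C2", 250.0, "2024-06-01"),
        (3, "C1", 75.0, "2024-06-02"),
    ],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100.0, "2024-06-01"),
        (2, 20, 250.0, "2024-06-01"),
        (4, 10, 60.0, "2024-06-02"),
    ],
)

missing = [row[0] for row in conn.execute(MISSING_IN_TARGET)]
orphans = [row[0] for row in conn.execute(ORPHANS_IN_TARGET)]
print(missing, orphans)  # [3] [4]
```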
✅ GROUP BY Aggregation Validation
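-- Totals per customer on each side; compare the two result sets.
-- cust_key must be mapped back to cust_id (via the customer dimension) before comparing.
SELECT cust_id, SUM(amount)
FROM sales_src
GROUP BY cust_id;
SELECT cust_key, SUM(total_amount)
FROM fact_sales
GROUP BY cust_key;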
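Comparing the two result sets by eye does not scale, so one option is to join the aggregates and return only the customers whose totals disagree. The sketch below assumes a hypothetical `dim_customer(cust_key, cust_id)` lookup to translate the target's `cust_key` back to the source `cust_id`, and a tolerance of 0.005 for floating-point amounts; both are illustrative choices.

```python
import sqlite3

AGGREGATE_DIFF = """
WITH src AS (
    SELECT cust_id, SUM(amount) AS src_total
    FROM sales_src
    GROUP BY cust_id
),
tgt AS (
    SELECT d.cust_id, SUM(f.total_amount) AS tgt_total
    FROM fact_sales f
    JOIN dim_customer d ON d.cust_key = f.cust_key
    GROUP BY d.cust_id
)
SELECT src.cust_id, src.src_total, tgt.tgt_total
FROM src
LEFT JOIN tgt ON tgt.cust_id = src.cust_id
WHERE tgt.tgt_total IS NULL
   OR ABS(src.src_total - tgt.tgt_total) > 0.005
ORDER BY src.cust_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_src (order_id INTEGER, cust_id TEXT, amount REAL, order_date TEXT)"
)
conn.execute("CREATE TABLE dim_customer (cust_key INTEGER, cust_id TEXT)")
conn.execute(
    "CREATE TABLE fact_sales (order_key INTEGER, cust_key INTEGER, "
    "total_amount REAL, order_dt TEXT)"
)
conn.executemany(
    "INSERT INTO sales_src VALUES (?, ?, ?, ?)",
    [
        (1, "C1", 100.0, "2024-06-01"),
        (2, "C1", 50.0, "2024-06-01"),
        (3, "C2", 75.0, "2024-06-02"),
    ],
)
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(10, "C1"), (20, "C2")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100.0, "2024-06-01"),
        (2, 10, 50.0, "2024-06-01"),
        (3, 20, 70.0, "2024-06-02"),  # defect: amount loaded as 70 instead of 75
    ],
)

mismatches = conn.execute(AGGREGATE_DIFF).fetchall()
print(mismatches)  # [('C2', 75.0, 70.0)]
```

An empty result means every source customer total matches the target within the tolerance.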
✅ Window Function – Duplicate Detection
SELECT order_key
FROM (
SELECT order_key,
ROW_NUMBER() OVER (PARTITION BY order_key ORDER BY order_dt) rn
FROM fact_sales
) x
WHERE rn > 1;
✅ Performance Tuning Check
EXPLAIN ANALYZE
SELECT * FROM fact_sales
WHERE order_dt = '2024-06-01';
5️⃣ Scenario-Based ETL Testing Questions with Answers
🔹 Scenario 1: Record Count Mismatch
Q: Source has 2M records, target has 1.95M.
A: Validate filters, rejected rows, lookup failures, and error tables.
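One way to structure this investigation is to check whether the gap is fully explained by rejected rows. The sketch below assumes a hypothetical reject table `sales_reject(order_id, reject_reason)`; real ETL tools log rejects in their own formats, so the table and the helper `reconcile_counts` are illustrative only.

```python
import sqlite3


def reconcile_counts(conn):
    """Compare source and target counts and report how much of the gap rejects explain."""
    source = conn.execute("SELECT COUNT(*) FROM sales_src").fetchone()[0]
    target = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    rejects = dict(
        conn.execute(
            "SELECT reject_reason, COUNT(*) FROM sales_reject GROUP BY reject_reason"
        ).fetchall()
    )
    unexplained = source - target - sum(rejects.values())
    return {
        "source": source,
        "target": target,
        "rejects": rejects,
        "unexplained": unexplained,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_src (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE fact_sales (order_key INTEGER, total_amount REAL)")
conn.execute("CREATE TABLE sales_reject (order_id INTEGER, reject_reason TEXT)")
conn.executemany(
    "INSERT INTO sales_src VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 30.0)]
)
conn.execute("INSERT INTO sales_reject VALUES (4, 'lookup_failure')")

report = reconcile_counts(conn)
print(report)
```

Here one of the two missing rows is explained by a lookup failure, leaving `unexplained` at 1; a non-zero value suggests a filter or join is silently dropping records.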
🔹 Scenario 2: Null Values in Mandatory Columns
Q: Target has NULLs in columns that should be mandatory.
A: Check source nulls, default expressions, transformation logic.
🔹 Scenario 3: SCD2 Not Maintaining History
Q: Old records overwritten.
A: Verify effective_date, end_date, and active_flag logic.
🔹 Scenario 4: ETL Performance Issue
Q: Job exceeds SLA by 2 hours.
A: Analyze indexes, partitioning, parallelism, push-down optimization.
6️⃣ ETL Tools Commonly Asked in Interviews
- Informatica – Mappings, workflows, sessions
- Microsoft SSIS – Control Flow, Data Flow
- Ab Initio – High-performance processing
- Pentaho – Kettle transformations
- Talend – Cloud & open-source ETL
Interviewers focus more on logic + SQL than tool UI.
7️⃣ ETL Defect Examples (Real-Time)
| Defect Type | Example |
| --- | --- |
| Mapping defect | Incorrect source column mapped |
| Data loss | Filter removes valid records |
| SCD defect | History not preserved |
| Performance | Job exceeds SLA |
| Data type | Truncation issues |
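As an example of catching the truncation defect listed above, a length comparison between source and target values flags rows where the target column was too narrow. The `cust_name` column and the 10-character target width are assumptions for this sketch; SQLite does not enforce column widths, so the truncation is simulated in Python.

```python
import sqlite3

TRUNCATION_QUERY = """
SELECT s.cust_id, LENGTH(s.cust_name) AS src_len, LENGTH(d.cust_name) AS tgt_len
FROM customer_src s
JOIN dim_customer d ON d.cust_id = s.cust_id
WHERE LENGTH(d.cust_name) < LENGTH(s.cust_name)
ORDER BY s.cust_id
"""

TARGET_WIDTH = 10  # hypothetical width of the target column, e.g. VARCHAR(10)

source_rows = [("C1", "Alexander Hamilton"), ("C2", "Ann Lee")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_src (cust_id TEXT, cust_name TEXT)")
conn.execute("CREATE TABLE dim_customer (cust_key INTEGER, cust_id TEXT, cust_name TEXT)")
conn.executemany("INSERT INTO customer_src VALUES (?, ?)", source_rows)
# Simulate a load that silently cuts names down to the target width.
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?)",
    [
        (key, cust_id, name[:TARGET_WIDTH])
        for key, (cust_id, name) in enumerate(source_rows, start=1)
    ],
)

truncated = conn.execute(TRUNCATION_QUERY).fetchall()
print(truncated)  # [('C1', 18, 10)]
```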
8️⃣ Sample ETL Test Case
Test Case: Validate SCD Type 2 – Customer Dimension
- Source: customer_src
- Target: dim_customer
- Validation Points:
  - Only one active record per customer
  - Old record end_date populated
  - New surrogate key generated for each new version
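The validation points in this test case could be automated roughly as follows. The column names (`cust_key`, `cust_id`, `effective_date`, `end_date`, `active_flag`) are assumed for illustration, since the source does not spell out the layout of `dim_customer`, and the helper `scd2_issues` is hypothetical.

```python
import sqlite3


def scd2_issues(conn):
    """Return (customers without exactly one active row,
    inactive rows missing end_date, surrogate keys used more than once)."""
    not_one_active = [
        row[0]
        for row in conn.execute(
            "SELECT cust_id FROM dim_customer "
            "GROUP BY cust_id HAVING SUM(active_flag) <> 1 ORDER BY cust_id"
        )
    ]
    open_ended_history = [
        row[0]
        for row in conn.execute(
            "SELECT cust_key FROM dim_customer "
            "WHERE active_flag = 0 AND end_date IS NULL ORDER BY cust_key"
        )
    ]
    reused_keys = [
        row[0]
        for row in conn.execute(
            "SELECT cust_key FROM dim_customer "
            "GROUP BY cust_key HAVING COUNT(*) > 1 ORDER BY cust_key"
        )
    ]
    return not_one_active, open_ended_history, reused_keys


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (cust_key INTEGER, cust_id TEXT, city TEXT, "
    "effective_date TEXT, end_date TEXT, active_flag INTEGER)"
)
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "C1", "Pune", "2024-01-01", "2024-03-31", 0),  # correctly closed history
        (2, "C1", "Mumbai", "2024-04-01", None, 1),        # current version
        (3, "C2", "Delhi", "2024-01-01", None, 0),         # defect: inactive, no end_date
        (4, "C3", "Chennai", "2024-01-01", None, 1),       # defect: two active rows
        (5, "C3", "Kochi", "2024-02-01", None, 1),
    ],
)

issues = scd2_issues(conn)
print(issues)  # (['C2', 'C3'], [3], [])
```

All three lists should be empty for a dimension that passes the test case.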
9️⃣ ETL Testing Quick Revision Sheet
- ETL architecture & data flow
- SCD1 vs SCD2
- Incremental load logic
- SQL joins, GROUP BY, window functions
- Hash totals & reconciliation
- Performance tuning basics
🔟 FAQs
Q1. What SQL level is expected for 7 years of ETL testing experience?
Advanced SQL including joins, subqueries, window functions, and performance analysis.
Q2. Is automation required for ETL testing?
Not always; the work is primarily SQL-based manual testing with selective automation.
Q3. What is the most important skill for ETL testers?
Understanding business logic + strong SQL expertise.
Q4. How many ETL interview questions should I prepare?
At least 80–120 for senior-level roles.
