1️⃣ What is ETL Testing? (Definition + Example)
ETL Testing is the process of verifying that data is correctly Extracted from source systems, Transformed based on business rules, and Loaded into the target data warehouse or data mart without data loss, duplication, or corruption.
Real-World Example
In an e-commerce project:
- Orders data is extracted from MySQL and CSV files
- Business rules apply discounts, currency conversion, and deduplication
- Data is loaded into fact and dimension tables
- Reports show daily revenue and customer trends
An ETL tester validates:
- Source-to-Target (S2T) mappings
- Transformation logic
- Record counts and data accuracy
- Performance and restartability
This makes etl testing interview questions for testers highly focused on SQL, logic validation, and real production scenarios.
2️⃣ Data Warehouse Flow – Source → Staging → Transform → Load → Reporting
Typical ETL / DW Architecture
| Layer | Description | Testing Focus |
| Source | OLTP DBs, APIs, flat files | Data completeness |
| Staging | Raw extracted data | Cleansing & duplicates |
| Transformation | Business rules, SCDs | Logic accuracy |
| Load | Fact & Dimension tables | Keys & integrity |
| Reporting | BI dashboards | Aggregation checks |
Key Tester Responsibilities
- Validate S2T mapping
- Verify incremental and full loads
- Check audit fields (batch_id, load_date)
- Perform reconciliation & hashing
3️⃣ ETL Testing Interview Questions for Testers (Basic → Advanced)
🔹 Basic ETL Testing Interview Questions
- What is ETL testing?
Validation of extraction, transformation, and loading of data. - What is the purpose of staging tables?
To temporarily store raw data before transformation. - What is S2T mapping?
Document that maps source columns to target columns with transformation rules. - Difference between ETL testing and database testing?
ETL testing validates data movement and transformation; DB testing validates schema and constraints. - What are fact and dimension tables?
Facts store metrics; dimensions store descriptive attributes.
🔹 Intermediate ETL QA Questions
- What is SCD Type 1?
Overwrites old data without maintaining history. - What is SCD Type 2?
Maintains historical records using effective_date and active_flag. - What is incremental load?
Loading only new or changed data since last run. - What are audit fields?
Metadata columns like created_date, updated_date, batch_id. - What is data reconciliation?
Comparing source and target data for consistency.
🔹 Advanced / Real-Time ETL Interview Questions
- How do you test large ETL loads?
Using record count checks, hashing, sampling, and aggregates. - How do you validate transformations without ETL tool access?
By writing SQL that mimics transformation logic. - What is hashing in ETL testing?
Using checksum/hash totals to validate large datasets efficiently. - How do you test restartability of ETL jobs?
Fail the job intentionally and verify resume logic. - What is late-arriving dimension handling?
Loading facts before dimensions using default surrogate keys.
(You can expect 40–100 similar questions in interviews; preparing these core concepts is critical.)
4️⃣ Real SQL Query Examples for ETL Validation
Sample Source Table
orders_src(order_id, customer_id, amount, order_date)
Sample Target Table
fact_orders(order_key, cust_key, total_amount, order_dt)
✅ Record Count Validation
SELECT COUNT(*) FROM orders_src
WHERE order_date >= ‘2024-01-01’;
SELECT COUNT(*) FROM fact_orders
WHERE order_dt >= ‘2024-01-01’;
✅ JOIN Validation (Data Accuracy)
SELECT s.order_id, s.amount, f.total_amount
FROM orders_src s
JOIN fact_orders f
ON s.order_id = f.order_key
WHERE s.amount <> f.total_amount;
✅ GROUP BY Aggregation Validation
SELECT customer_id, SUM(amount)
FROM orders_src
GROUP BY customer_id;
SELECT cust_key, SUM(total_amount)
FROM fact_orders
GROUP BY cust_key;
✅ Window Function – Duplicate Check
SELECT order_key
FROM (
SELECT order_key,
ROW_NUMBER() OVER (PARTITION BY order_key ORDER BY order_dt) rn
FROM fact_orders
) t
WHERE rn > 1;
✅ Performance Tuning Validation
EXPLAIN ANALYZE
SELECT * FROM fact_orders
WHERE order_dt = ‘2024-06-01’;
5️⃣ Scenario-Based ETL Testing Questions with Answers
🔹 Scenario 1: Record Count Mismatch
Q: Source has 1,000,000 rows, target has 995,000.
A: Check filters, rejected rows, join conditions, and error tables.
🔹 Scenario 2: NULL Values in Target
Q: Mandatory column has NULL values.
A: Validate source nulls, default logic, and expression transformations.
🔹 Scenario 3: SCD2 Not Working
Q: Old records are overwritten.
A: Check effective_date, end_date, and active_flag logic.
🔹 Scenario 4: Performance Issue
Q: ETL job exceeds SLA.
A: Review indexes, partitions, parallelism, and push-down optimization.
6️⃣ ETL Tools Commonly Asked in Interviews
- Informatica – Mappings, workflows, sessions
- Microsoft SSIS – Control Flow, Data Flow
- Ab Initio – High-performance ETL
- Pentaho – Kettle transformations
- Talend – Open-source & cloud ETL
Interviewers mainly evaluate logic, SQL skills, and testing approach, not just tool UI knowledge.
7️⃣ ETL Defect Examples (Real-Time)
| Defect Type | Example |
| Mapping defect | Wrong source column mapped |
| Data loss | Valid records filtered out |
| SCD defect | History not preserved |
| Performance | Job exceeds SLA |
| Data type | Truncation issues |
8️⃣ Sample ETL Test Case
Test Case: Validate SCD Type 2 – Customer Dimension
- Source Table: customer_src
- Target Table: dim_customer
- Validations:
- Only one active record
- Old record has end_date
- New surrogate key generated
- Audit fields populated
9️⃣ ETL Testing Revision Sheet (Quick Review)
- ETL architecture & data flow
- S2T mapping validation
- SCD1 vs SCD2
- Incremental vs full load
- SQL joins, GROUP BY, window functions
- Hash totals & reconciliation
- Performance tuning basics
🔟 FAQs – ETL Testing Interview Questions for Testers
Q1. What SQL level is required for ETL testers?
Intermediate to advanced SQL with joins and window functions.
Q2. Is ETL testing manual or automated?
Primarily SQL-driven manual testing with selective automation.
Q3. What is the most important skill for ETL testers?
Understanding business logic and strong SQL validation.
Q4. How many ETL interview questions should testers prepare?
At least 60–100 questions covering concepts and scenarios.
