1️⃣ What is ETL Testing? (Definition + Example)
ETL Testing validates that data is Extracted from source systems, Transformed according to business rules, and Loaded accurately into a data warehouse.
Example
A banking ETL job extracts transactions from Oracle, converts currency values, applies SCD2 logic on customers, and loads results into a reporting mart. ETL testing ensures:
- No data loss
- Transformations are correct
- History is preserved
- Performance meets SLA
For 6 years experienced professionals, interviews focus on architecture understanding, SQL depth, performance tuning, and real production scenarios.
2️⃣ Data Warehouse Flow – Source → Staging → Transform → Load → Reporting
Typical ETL Architecture
| Layer | Purpose |
| Source | OLTP systems, APIs, flat files |
| Staging | Raw landing, data cleansing |
| Transformation | Business rules, SCD, aggregation |
| Load | Fact & Dimension tables |
| Reporting | BI tools, dashboards |
Key validations:
- Record count checks
- Hash total validation
- Data type & length checks
- Audit columns (created_dt, batch_id)
3️⃣ ETL Testing Interview Questions + Answers (Basic → Advanced)
🔹 Basic ETL QA Questions (Experience Warm-up)
- What is ETL testing?
Validating extraction, transformation rules, and accurate loading into the target system. - Difference between ETL testing and database testing?
ETL testing focuses on data movement + transformation, DB testing focuses on schema & data integrity. - What is staging area?
An intermediate storage used before transformation. - What is S2T mapping?
Document defining Source-to-Target column mapping and transformation logic. - What is data reconciliation?
Comparing source and target data for accuracy.
🔹 Intermediate ETL Interview Questions (Real-World)
- How do you validate source-to-target mapping?
Using S2T documents + SQL queries. - What is SCD Type 1?
Overwrites old data, no history. - What is SCD Type 2?
Maintains history using start_date, end_date, active_flag. - What are audit fields?
batch_id, load_date, created_by, updated_ts. - Difference between full load and incremental load?
Full reloads all data; incremental loads delta changes.
🔹 Advanced ETL Testing Interview Questions (6+ Years Level)
- How do you test incremental loads?
By validating CDC logic, timestamps, and surrogate keys. - What is data skew and how do you test it?
Uneven data distribution; validate using GROUP BY counts. - How do you validate transformations without UI access?
Using SQL-based expected vs actual logic. - Explain hash validation.
Sum or checksum comparison to validate large datasets. - How do you test restartability of ETL jobs?
By failing jobs mid-run and re-executing.
4️⃣ Real SQL Validation Examples (With Sample Data)
🔸 Sample Source Table
orders_src(order_id, cust_id, amount, order_date)
🔸 Target Fact Table
fact_orders(order_key, cust_key, total_amount, order_dt)
✅ Record Count Validation
SELECT COUNT(*) FROM orders_src
WHERE order_date >= ‘2024-01-01’;
SELECT COUNT(*) FROM fact_orders
WHERE order_dt >= ‘2024-01-01’;
✅ JOIN Validation
SELECT s.order_id, s.amount, t.total_amount
FROM orders_src s
JOIN fact_orders t
ON s.order_id = t.order_key
WHERE s.amount <> t.total_amount;
✅ GROUP BY Aggregation Validation
SELECT cust_id, SUM(amount)
FROM orders_src
GROUP BY cust_id;
SELECT cust_key, SUM(total_amount)
FROM fact_orders
GROUP BY cust_key;
✅ Window Function – Duplicate Detection
SELECT order_id
FROM (
SELECT order_id,
ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_date) rn
FROM fact_orders
) x
WHERE rn > 1;
✅ Performance Tuning Validation
EXPLAIN ANALYZE
SELECT * FROM fact_orders WHERE order_dt = ‘2024-01-01’;
5️⃣ Scenario-Based ETL Testing Questions with Answers
🔹 Scenario 1: Record Count Mismatch
Q: Source has 1M records, target has 990K.
A: Check rejected records, transformation filters, joins, lookup mismatches.
🔹 Scenario 2: Null Values in Target
Q: Mandatory field is NULL in target.
A: Validate source nulls, default logic, expression transformations.
🔹 Scenario 3: SCD2 Not Working
Q: Old records overwritten instead of versioned.
A: Validate effective_date, current_flag logic.
🔹 Scenario 4: ETL Job Slowness
Q: Job exceeded SLA.
A: Check indexing, partitioning, parallelism, push-down optimization.
6️⃣ ETL Tools Commonly Asked in Interviews
- Informatica – PowerCenter, mappings, sessions
- Microsoft SSIS – Control Flow, Data Flow
- Ab Initio – High-performance ETL
- Pentaho – Kettle, transformations
- Talend – Open & enterprise ETL
Interviewers expect tool-agnostic logic + SQL validation skills.
7️⃣ ETL Defect Examples (Real Project)
| Defect Type | Example |
| Mapping defect | Wrong column mapped |
| Data loss | Filter removes valid rows |
| SCD defect | History not maintained |
| Performance | Job exceeds SLA |
| Data type | Truncation issues |
8️⃣ Sample ETL Test Case
Test Case: Validate SCD2 Customer Dimension
- Source: customer_src
- Target: dim_customer
- Validation:
- Only one active record
- Old record end_date populated
- New surrogate key generated
9️⃣ Quick Revision Sheet (Interview Last-Minute)
- ETL flow & architecture
- SCD1 vs SCD2
- Incremental logic
- SQL joins & window functions
- Hash & reconciliation
- Performance tuning basics
🔟 FAQs (For Featured Snippets)
Q1. What level of SQL is expected for ETL testing?
Advanced SQL including joins, subqueries, window functions.
Q2. Is ETL testing manual or automation?
Primarily manual SQL-driven, with automation support.
Q3. What is the most important ETL testing skill?
Understanding business logic + strong SQL.
Q4. How many interview questions should I prepare?
At least 80–120 for 6+ years experience.
