What Is ETL Testing? (Definition + Real-World Example)
ETL Testing is the process of validating data as it moves through Extract → Transform → Load stages to ensure accuracy, completeness, consistency, and performance in a Data Warehouse (DW).
Real-world example
In a retail DW, daily sales data is extracted from POS systems, transformed to apply discounts, tax rules, and currency conversion, and loaded into fact tables. ETL testing ensures:
- No data loss during extraction
- Transformations follow business rules
- Aggregated sales in reports match source totals
This is why scenario-based ETL testing interview questions tend to be practical and business-oriented.
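One way to check that transformations follow business rules is to recompute the expected value independently of the ETL code and compare it with what was loaded. The sketch below is illustrative only: the rule order (discount, then tax, then currency conversion), the sample rates, and the function name `expected_net_amount` are assumptions, not the rules of any particular retail system.

```python
from decimal import Decimal, ROUND_HALF_UP

def expected_net_amount(gross, discount_pct, tax_pct, fx_rate):
    """Recompute the loaded amount from source values (hypothetical rule order)."""
    gross, discount_pct, tax_pct, fx_rate = (
        Decimal(str(v)) for v in (gross, discount_pct, tax_pct, fx_rate)
    )
    after_discount = gross * (Decimal("1") - discount_pct / Decimal("100"))
    after_tax = after_discount * (Decimal("1") + tax_pct / Decimal("100"))
    converted = after_tax * fx_rate
    # Round to cents the way the (assumed) business rule does
    return converted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Hypothetical source row and the value found in the fact table
source_row = {"gross": "100.00", "discount_pct": "10", "tax_pct": "8", "fx_rate": "0.5"}
loaded_amount = Decimal("48.60")

expected = expected_net_amount(**source_row)
print("match" if expected == loaded_amount else f"mismatch: expected {expected}")
```

Using `Decimal` instead of floats keeps the comparison free of binary rounding noise, which matters when the test asserts equality on monetary amounts.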
Data Warehouse Flow: Source → Staging → Transform → Load → Reporting
- Source – OLTP databases, flat files, APIs
- Staging – Raw data landing area
- Transformation – Business logic, joins, SCD handling
- Load – Fact & Dimension tables
- Reporting – BI dashboards, analytics
ETL Testing Scenario-Based Interview Questions & Answers
(Basic → Advanced | Real-Time Focus)
🔹 Basic ETL Testing Scenarios (1–15)
- How do you validate record count between source and target?
By comparing source count with target count after applying transformation filters.
- What if source has 1M records and target has 990K?
Check rejected records, filter conditions, and duplicate removal logic.
- How do you test null handling in ETL?
Validate mandatory columns for unexpected NULLs and default values.
- Scenario: Source column is nullable but target is NOT NULL. What do you test?
Verify default mapping or rejection logic.
- What is Source-to-Target (S2T) validation?
Verifying each source column mapping, transformation, and target field.
- How do you test data type mismatches?
Validate truncation, rounding, and casting rules.
- Scenario: Duplicate records in target but not in source. Why?
Incorrect joins or missing deduplication logic.
- What is audit field testing?
Validate load_date, batch_id, created_by fields.
- How do you test incremental loads?
Validate only delta records are processed using CDC logic.
- Scenario: Job rerun creates duplicates. How do you test restartability?
Ensure idempotent logic and proper delete/merge strategy.
- What is data reconciliation?
Matching totals, counts, and key metrics.
- How do you validate staging data?
Ensure staging matches source exactly (no transformation).
- What is reject data testing?
Validating rejected rows and error reasons.
- Scenario: File delimiter changes unexpectedly. What test?
File format and schema validation.
- How do you validate date transformations?
Check timezone, format, and business calendar rules.
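The restartability scenario (a job rerun creating duplicates) can be sketched as a small test: load the same batch twice and confirm the row count does not change. This is a minimal illustration using an in-memory SQLite database; the simplified `fact_sales` layout, the `load_batch` helper, and the delete-then-insert strategy keyed on `batch_id` are assumptions chosen for the demo, and a real pipeline might use MERGE or another approach instead.

```python
import sqlite3

def load_batch(conn, batch_id, rows):
    """Delete-then-insert by batch_id so a rerun replaces rows instead of duplicating them."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM fact_sales WHERE batch_id = ?", (batch_id,))
        conn.executemany(
            "INSERT INTO fact_sales (order_id, amount, batch_id) VALUES (?, ?, ?)",
            [(order_id, amount, batch_id) for order_id, amount in rows],
        )

def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL, batch_id TEXT)")

batch = [(1, 120.0), (2, 80.5), (3, 42.0)]  # hypothetical sample rows
load_batch(conn, "B001", batch)
count_first_run = row_count(conn)

load_batch(conn, "B001", batch)  # simulate a rerun of the same batch
count_after_rerun = row_count(conn)

print(count_first_run, count_after_rerun)
```

If the two counts differ, the load is not idempotent and a rerun after failure would inflate the target.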
🔹 Intermediate Scenario-Based Questions (16–35)
- What is SCD Type 1 scenario testing?
Validate overwrite of old values without history.
- What is SCD Type 2 scenario testing?
Validate new row creation with effective_from, effective_to, current_flag.
- Scenario: Customer address changes. What do you test?
SCD2 history preservation.
- How do you test surrogate keys?
Ensure uniqueness and non-null values.
- What is late arriving dimension scenario?
Fact arrives before dimension; test dummy key handling.
- Scenario: Fact row has invalid dimension key. What happens?
Reject or map to unknown key.
- How do you test aggregation logic?
Compare aggregated target data with source calculations.
- What is hashing used for in ETL testing?
Change detection and deduplication.
- Scenario: One-to-many join inflates data. How do you test?
Validate join cardinality.
- How do you test data quality rules?
Range checks, domain checks, pattern validation.
- Scenario: Negative sales amount appears. What do you test?
Business rule validation.
- How do you validate currency conversion?
Compare converted amounts with exchange rate tables.
- What is partition testing?
Validate correct partition placement of data.
- Scenario: Historical data reload. What do you test?
Data duplication and versioning.
- How do you test ETL error handling?
Validate alerts, logs, and rerun capability.
- What is referential integrity testing?
Fact-dimension key consistency.
- Scenario: Dimension record expires incorrectly. What to check?
Effective date logic.
- How do you test soft deletes?
Validate delete_flag instead of physical delete.
- What is schema drift scenario?
Source schema changes; test ETL adaptability.
- How do you validate report data?
Compare BI output with DW tables.
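The data quality answer above (range checks, domain checks, pattern validation) can be illustrated with a single query that labels each violating row. This is a sketch under stated assumptions: the `stg_orders` columns, the allowed currency list, and the ISO date pattern are hypothetical, and `GLOB` is SQLite-specific, so another database would need its own pattern operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stg_orders (order_id INTEGER, amount REAL, currency TEXT, order_dt TEXT)"
)
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?, ?)",
    [
        (1, 250.0, "USD", "2024-01-15"),
        (2, -40.0, "USD", "2024-01-15"),  # range violation: negative amount
        (3, 99.9, "XXX", "2024-01-16"),   # domain violation: unknown currency
        (4, 10.0, "EUR", "16/01/2024"),   # pattern violation: wrong date format
    ],
)

DQ_QUERY = """
SELECT order_id,
       CASE
           WHEN amount < 0 THEN 'range: negative amount'
           WHEN currency NOT IN ('USD', 'EUR', 'GBP') THEN 'domain: unknown currency'
           WHEN order_dt NOT GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'
               THEN 'pattern: bad date format'
       END AS violation
FROM stg_orders
WHERE amount < 0
   OR currency NOT IN ('USD', 'EUR', 'GBP')
   OR order_dt NOT GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'
ORDER BY order_id
"""

violations = conn.execute(DQ_QUERY).fetchall()
for order_id, reason in violations:
    print(order_id, reason)
```

An empty result means the batch passed all three rule types; any returned row should correspond to an entry in the reject or error table.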
🔹 Advanced & Real-Time Scenarios (36–55)
- Scenario: ETL job misses SLA. How do you test performance?
Analyze query plans, indexing, partitions.
- How do you test parallel ETL jobs?
Check locking, duplicates, deadlocks.
- Scenario: Data skew in big tables. What to test?
Distribution and partition balance.
- How do you validate window functions in ETL?
Compare ranked data with business rules.
- Scenario: Late-arriving facts affect aggregates. What to test?
Recalculation logic.
- How do you test CDC failures?
Validate missing delta data.
- Scenario: Reprocessing rejected data. What to validate?
Corrected rows and counts.
- How do you test ETL rollback?
Ensure partial loads are reverted.
- Scenario: PII columns exposed in target. What test?
Data masking validation.
- How do you test multi-source integration?
Source precedence and conflict resolution.
- Scenario: Time zone mismatch. What to validate?
Timestamp normalization.
- How do you test archival logic?
Validate data movement to history tables.
- Scenario: Slowly changing fact. How to test?
Validate adjustment logic.
- How do you test cloud DW ETL?
Cost, scalability, and performance checks.
- Scenario: Unexpected NULLs after join. Why?
Outer join issues.
- How do you test ETL metadata tables?
Batch status and counts.
- Scenario: File arrives late. What to test?
Dependency and rerun logic.
- How do you validate checksum/hash totals?
Compare source vs target hashes.
- Scenario: Aggregates don’t match reports. What to test?
Grain mismatch.
- How do you test end-to-end ETL flow?
Source → DW → BI reconciliation.
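The checksum/hash comparison mentioned above can be sketched in a few lines: hash the compared columns of each row on both sides, then report keys that are missing or whose hashes differ. The helper names `row_hash` and `diff_by_hash`, the `|` delimiter, the `<NULL>` marker, and the sample rows are all assumptions for illustration; in practice the hash is often computed inside the database on both sides with the same function.

```python
import hashlib

def row_hash(row, columns):
    """Hash the chosen columns in a fixed order; None (NULL) is encoded explicitly."""
    parts = ["<NULL>" if row[c] is None else str(row[c]) for c in columns]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

def diff_by_hash(source_rows, target_rows, key, columns):
    """Return (keys missing in target, keys whose row hash differs)."""
    src = {r[key]: row_hash(r, columns) for r in source_rows}
    tgt = {r[key]: row_hash(r, columns) for r in target_rows}
    missing = sorted(k for k in src if k not in tgt)
    changed = sorted(k for k in src if k in tgt and src[k] != tgt[k])
    return missing, changed

# Hypothetical sample data
source = [
    {"order_id": 1, "cust_id": 10, "amount": "120.00"},
    {"order_id": 2, "cust_id": 11, "amount": "80.50"},
    {"order_id": 3, "cust_id": 12, "amount": "42.00"},
]
target = [
    {"order_id": 1, "cust_id": 10, "amount": "120.00"},
    {"order_id": 2, "cust_id": 11, "amount": "80.05"},  # digits transposed during load
]

missing, changed = diff_by_hash(source, target, "order_id", ["cust_id", "amount"])
print("missing:", missing, "changed:", changed)
```

Both sides must format values identically before hashing (for example the same decimal scale and date format), otherwise the comparison reports false mismatches.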
Real SQL Query Examples for ETL Validation
Sample Tables
- src_orders(order_id, cust_id, amount, order_dt)
- stg_orders
- fact_sales(order_id, cust_sk, amount, load_dt)
- dim_customer(cust_sk) (dimension table referenced by the JOIN validation query)
1️⃣ Record Count Validation
SELECT COUNT(*) FROM src_orders;
SELECT COUNT(*) FROM fact_sales;
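Raw counts rarely match when the load filters or rejects rows, so a fuller check reconciles source = target + rejects and then lists the source keys that are neither loaded nor rejected. The sketch below runs against in-memory SQLite with made-up rows; the `rej_orders` table and its columns are hypothetical stand-ins for whatever reject table a project uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);
CREATE TABLE rej_orders (order_id INTEGER, reject_reason TEXT);  -- hypothetical reject table

INSERT INTO src_orders VALUES
  (1, 10, 100.0, '2024-01-01'), (2, 11, 50.0, '2024-01-01'),
  (3, 12, NULL,  '2024-01-02'), (4, 13, 75.0, '2024-01-02'),
  (5, 14, 20.0,  '2024-01-02');
INSERT INTO fact_sales VALUES
  (1, 1, 100.0, '2024-01-03'), (2, 2, 50.0, '2024-01-03'), (4, 4, 75.0, '2024-01-03');
INSERT INTO rej_orders VALUES (3, 'amount is NULL');
""")

src_count, tgt_count, rej_count = conn.execute("""
SELECT (SELECT COUNT(*) FROM src_orders),
       (SELECT COUNT(*) FROM fact_sales),
       (SELECT COUNT(*) FROM rej_orders)
""").fetchone()

# Source rows that are neither loaded nor rejected are the real defect candidates.
unexplained = [row[0] for row in conn.execute("""
SELECT s.order_id
FROM src_orders s
LEFT JOIN fact_sales f ON f.order_id = s.order_id
LEFT JOIN rej_orders r ON r.order_id = s.order_id
WHERE f.order_id IS NULL AND r.order_id IS NULL
ORDER BY s.order_id
""")]

print(src_count, tgt_count, rej_count, unexplained)
```

Here the counts do not reconcile (5 source rows versus 3 loaded plus 1 rejected), and the anti-join pinpoints the order that was silently dropped.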
2️⃣ JOIN Validation
SELECT COUNT(*) missing_dim
FROM fact_sales f
LEFT JOIN dim_customer d
ON f.cust_sk = d.cust_sk
WHERE d.cust_sk IS NULL;
3️⃣ GROUP BY Aggregation Check
SELECT cust_id, SUM(amount)
FROM src_orders
GROUP BY cust_id;
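Summing the source alone does not prove anything until it is compared with the target at the same grain. The following sketch aggregates both sides per customer and returns only the mismatches. It assumes `dim_customer` carries the natural key `cust_id` next to `cust_sk` (the source does not list that table's columns), and the 0.005 tolerance is an arbitrary choice for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER);  -- cust_id column is assumed
CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);

INSERT INTO src_orders VALUES
  (1, 10, 100.0, '2024-01-01'), (2, 10, 50.0, '2024-01-01'), (3, 11, 30.0, '2024-01-02');
INSERT INTO dim_customer VALUES (1, 10), (2, 11);
INSERT INTO fact_sales VALUES
  (1, 1, 100.0, '2024-01-03'), (2, 1, 50.0, '2024-01-03'),
  (3, 2, 35.0,  '2024-01-03');  -- loaded amount differs from source
""")

mismatches = conn.execute("""
SELECT s.cust_id, s.src_total, t.tgt_total
FROM (SELECT cust_id, SUM(amount) AS src_total
      FROM src_orders GROUP BY cust_id) s
LEFT JOIN (SELECT d.cust_id, SUM(f.amount) AS tgt_total
           FROM fact_sales f
           JOIN dim_customer d ON d.cust_sk = f.cust_sk
           GROUP BY d.cust_id) t
  ON t.cust_id = s.cust_id
WHERE t.tgt_total IS NULL                       -- customer missing in target
   OR ABS(s.src_total - t.tgt_total) > 0.005    -- totals differ beyond tolerance
ORDER BY s.cust_id
""").fetchall()

print(mismatches)
```

An empty list means every customer total in the fact table matches the source; each returned row shows the customer and both totals for investigation.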
4️⃣ Window Function – Deduplication
SELECT *
FROM (
    SELECT order_id,
           ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) AS rn
    FROM src_orders
) t
WHERE rn = 1;
5️⃣ Performance Tuning Validation
EXPLAIN
SELECT *
FROM fact_sales
WHERE load_dt >= CURRENT_DATE - 1;
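A plan check like this can be automated by capturing the plan text and asserting that the expected index appears in it. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module purely as a stand-in; the index name `idx_fact_sales_load_dt` is hypothetical, the date expression is SQLite syntax, and plan output differs between database engines and versions, so the string being matched would need adjusting elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT)"
)

# SQLite equivalent of "loaded since yesterday"
QUERY = "SELECT * FROM fact_sales WHERE load_dt >= date('now', '-1 day')"

def plan_text(conn, sql):
    """Join the human-readable detail column (last column) of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

plan_before = plan_text(conn, QUERY)

# Hypothetical index on the filter column
conn.execute("CREATE INDEX idx_fact_sales_load_dt ON fact_sales (load_dt)")
plan_after = plan_text(conn, QUERY)

uses_index = "idx_fact_sales_load_dt" in plan_after
print("before:", plan_before)
print("after: ", plan_after)
print("index used:", uses_index)
```

Comparing the plan before and after the index is created makes the test fail loudly if a later schema change drops the index or the optimizer stops using it.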
ETL Tools Commonly Asked in Interviews
- Informatica
- Microsoft SQL Server Integration Services
- Ab Initio
- Pentaho
- Talend
ETL Defect Examples + Test Case Sample
Defect Example
Issue: Duplicate fact records
- Root cause: Many-to-many join
- Fix: Correct join condition, add dedup logic
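A defect like this can usually be reproduced with two checks: compare the row count before and after the join, and list the join keys that appear more than once on the side that should be unique. The example below is a sketch on in-memory SQLite; the `stg_orders` and `dim_customer` columns and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_orders (order_id INTEGER, cust_id INTEGER, amount REAL);
CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER);

INSERT INTO stg_orders VALUES (1, 10, 100.0), (2, 11, 50.0);
-- cust_id 10 appears twice in the dimension, so the join fans out
INSERT INTO dim_customer VALUES (1, 10), (2, 10), (3, 11);
""")

source_count = conn.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
joined_count = conn.execute("""
SELECT COUNT(*)
FROM stg_orders s
JOIN dim_customer d ON d.cust_id = s.cust_id
""").fetchone()[0]

# Keys that break the expected many-to-one relationship
fan_out_keys = conn.execute("""
SELECT cust_id, COUNT(*) AS dim_rows
FROM dim_customer
GROUP BY cust_id
HAVING COUNT(*) > 1
""").fetchall()

print(source_count, joined_count, fan_out_keys)
```

A joined count higher than the source count confirms the inflation, and the key list shows where the dimension needs deduplication or an extra join condition such as a current-row filter.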
Sample Test Case
- Test Case: Validate SCD2 update
- Expected: Old record expires, new record inserted
- SQL: Validate current_flag and date ranges
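The SQL for this test case could look like the two checks below: one row per business key must be current, and date ranges for the same key must not overlap. This is a sketch, not a definitive implementation. The column names follow the ones mentioned earlier (effective_from, effective_to, current_flag), while the `dim_customer` layout, the '9999-12-31' open-ended date, the half-open range convention, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    cust_sk INTEGER, cust_id INTEGER, address TEXT,
    effective_from TEXT, effective_to TEXT, current_flag INTEGER
);
INSERT INTO dim_customer VALUES
  (1, 10, 'Old Street 1', '2023-01-01', '2024-03-01', 0),
  (2, 10, 'New Road 5',   '2024-03-01', '9999-12-31', 1),
  (3, 11, 'Main Ave 9',   '2023-06-01', '2024-02-01', 1),  -- defect: not expired
  (4, 11, 'Park Lane 2',  '2024-01-15', '9999-12-31', 1);  -- defect: overlaps row 3
""")

# Check 1: each business key must have exactly one current row.
bad_current = conn.execute("""
SELECT cust_id, SUM(current_flag) AS current_rows
FROM dim_customer
GROUP BY cust_id
HAVING SUM(current_flag) <> 1
""").fetchall()

# Check 2: ranges for the same key must not overlap
# (treated as half-open: effective_from <= d < effective_to).
overlaps = conn.execute("""
SELECT a.cust_id, a.cust_sk, b.cust_sk
FROM dim_customer a
JOIN dim_customer b
  ON a.cust_id = b.cust_id
 AND a.cust_sk < b.cust_sk
 AND a.effective_from < b.effective_to
 AND b.effective_from < a.effective_to
""").fetchall()

print(bad_current, overlaps)
```

Customer 10 passes both checks because the old row ends exactly where the new one starts; customer 11 fails both, which is what an SCD2 update that forgot to expire the old record looks like.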
ETL Testing Quick Revision Sheet
- Validate counts, sums, NULLs, duplicates
- Check S2T mappings thoroughly
- Test SCD1, SCD2, audit fields
- Use JOIN, GROUP BY, window functions
- Always reconcile source vs target vs reports
FAQs
Q1. What are scenario-based ETL testing interview questions?
They focus on real-time data issues like mismatches, null handling, and performance.
Q2. Which SQL is important for ETL testing?
JOINs, GROUP BY, window functions, and performance tuning queries.
Q3. How do you test SCD2 in real projects?
By validating history rows, effective dates, and current flags.
