ETL Testing Scenario-Based Interview Questions

What Is ETL Testing? (Definition + Real-World Example)

ETL Testing is the process of validating data as it moves through Extract → Transform → Load stages to ensure accuracy, completeness, consistency, and performance in a Data Warehouse (DW).

Real-world example

In a retail DW, daily sales data is extracted from POS systems, transformed to apply discounts, tax rules, and currency conversion, and loaded into fact tables. ETL testing ensures:

No data loss during extraction
Transformations follow business rules
Aggregated sales in reports match source totals

This is why scenario-based ETL testing interview questions tend to be practical and business-oriented.

Data Warehouse Flow: Source → Staging → Transform → Load → Reporting

Source – OLTP databases, flat files, APIs
Staging – Raw data landing area
Transformation – Business logic, joins, SCD handling
Load – Fact & Dimension tables
Reporting – BI dashboards, analytics

ETL Testing Scenario-Based Interview Questions & Answers

(Basic → Advanced | Real-Time Focus)

🔹 Basic ETL Testing Scenarios (1–15)

How do you validate record count between source and target?
By comparing source count with target count after applying transformation filters.
What if source has 1M records and target has 990K?
Check rejected records, filter conditions, and duplicate removal logic.
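The 1M vs 990K scenario comes down to accounting for every source row. Here is a minimal sketch using Python's built-in sqlite3 module so it runs without a real warehouse. The reject table `rej_orders` and the filter rule `amount > 0` are assumptions made for illustration; a real job will have its own reject store and filter conditions.

```python
import sqlite3

def reconcile_counts(conn):
    """Return (source, loaded, rejected, filtered, unexplained) row counts."""
    cur = conn.cursor()
    source = cur.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
    loaded = cur.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    rejected = cur.execute("SELECT COUNT(*) FROM rej_orders").fetchone()[0]
    # Rows intentionally dropped by the (assumed) business filter amount > 0.
    filtered = cur.execute(
        "SELECT COUNT(*) FROM src_orders WHERE amount <= 0"
    ).fetchone()[0]
    # Anything left over is a gap the tester has to explain.
    unexplained = source - (loaded + rejected + filtered)
    return source, loaded, rejected, filtered, unexplained

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);
    CREATE TABLE rej_orders (order_id INTEGER, reject_reason TEXT);
    INSERT INTO src_orders VALUES
        (1, 10,   100.0, '2024-01-01'),
        (2, 11,    -5.0, '2024-01-01'),  -- dropped by the filter
        (3, NULL,  40.0, '2024-01-01'),  -- rejected: missing customer
        (4, 12,    75.0, '2024-01-02');
    INSERT INTO fact_sales VALUES (1, 1, 100.0, '2024-01-03'), (4, 3, 75.0, '2024-01-03');
    INSERT INTO rej_orders VALUES (3, 'cust_id is NULL');
""")
result = reconcile_counts(conn)
print(result)
```

A non-zero last value means rows went missing without a reject or filter to explain them. The sketch assumes a row is never both filtered and rejected; if that can happen in your pipeline, the two counts would overlap.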
How do you test null handling in ETL?
Check mandatory columns for unexpected NULLs and confirm that default values are applied wherever the mapping specifies them.
Scenario: Source column is nullable but target is NOT NULL. What do you test?
Verify default mapping or rejection logic.
What is Source-to-Target (S2T) validation?
Verifying, column by column, that each source field lands in the correct target field with the documented transformation applied.
How do you test data type mismatches?
Validate truncation, rounding, and casting rules.
Scenario: Duplicate records in target but not in source. Why?
Incorrect joins or missing deduplication logic.
What is audit field testing?
Validate load_date, batch_id, created_by fields.
How do you test incremental loads?
Validate only delta records are processed using CDC logic.
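One way to check a delta load is to compare what was staged against a watermark. The sketch below assumes a timestamp-based CDC approach with a hypothetical `updated_at` column and a stored "last successful run" watermark; log-based CDC tools would need a different check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    CREATE TABLE stg_orders (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    INSERT INTO src_orders VALUES
        (1, 100.0, '2024-01-01 09:00:00'),
        (2,  50.0, '2024-01-02 10:00:00'),
        (3,  75.0, '2024-01-03 11:00:00');
    -- Rows the incremental job picked up in this run.
    INSERT INTO stg_orders VALUES
        (2, 50.0, '2024-01-02 10:00:00'),
        (3, 75.0, '2024-01-03 11:00:00');
""")

def delta_mismatches(conn, last_watermark):
    """Return (missed, stale) order_id lists for the given watermark."""
    cur = conn.cursor()
    # Source rows changed after the watermark that never reached staging.
    missed = cur.execute(
        """SELECT s.order_id FROM src_orders s
           LEFT JOIN stg_orders t ON t.order_id = s.order_id
           WHERE s.updated_at > ? AND t.order_id IS NULL
           ORDER BY s.order_id""",
        (last_watermark,),
    ).fetchall()
    # Staged rows at or before the watermark should not have been reprocessed.
    stale = cur.execute(
        "SELECT order_id FROM stg_orders WHERE updated_at <= ? ORDER BY order_id",
        (last_watermark,),
    ).fetchall()
    return [r[0] for r in missed], [r[0] for r in stale]

missed, stale = delta_mismatches(conn, "2024-01-01 23:59:59")
print(missed, stale)
```

Both lists should be empty for a correct run. The comparison relies on timestamps being stored in a sortable text format such as ISO 8601.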
Scenario: Job rerun creates duplicates. How do you test restartability?
Ensure idempotent logic and proper delete/merge strategy.
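A restartability test usually means running the same batch twice and checking that nothing changes the second time. The sketch below shows one possible idempotent strategy, delete-then-insert keyed on a hypothetical `batch_id` column; a MERGE or upsert would be an equally valid design.

```python
import sqlite3

def load_batch(conn, batch_id, rows):
    """Delete-then-insert for one batch inside a single transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("DELETE FROM fact_sales WHERE batch_id = ?", (batch_id,))
        conn.executemany(
            "INSERT INTO fact_sales (order_id, amount, batch_id) VALUES (?, ?, ?)",
            [(order_id, amount, batch_id) for order_id, amount in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL, batch_id TEXT)")

rows = [(1, 100.0), (2, 50.0)]
load_batch(conn, "2024-01-03", rows)
count_first = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

load_batch(conn, "2024-01-03", rows)  # simulated rerun of the same batch
count_rerun = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

print(count_first, count_rerun)
```

If the second count is higher than the first, the job is not safe to rerun.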
What is data reconciliation?
Matching totals, counts, and key metrics.
How do you validate staging data?
Ensure staging matches source exactly (no transformation).
What is reject data testing?
Validating rejected rows and error reasons.
Scenario: File delimiter changes unexpectedly. What test?
File format and schema validation.
How do you validate date transformations?
Check timezone, format, and business calendar rules.

🔹 Intermediate Scenario-Based Questions (16–35)

What is SCD Type 1 scenario testing?
Validate overwrite of old values without history.
What is SCD Type 2 scenario testing?
Validate new row creation with effective_from, effective_to, current_flag.
Scenario: Customer address changes. What do you test?
SCD2 history preservation.
How do you test surrogate keys?
Ensure uniqueness and non-null values.
What is late arriving dimension scenario?
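The fact row arrives before its dimension row exists; test how the placeholder (dummy) key is assigned and how it is resolved once the dimension row arrives.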
Scenario: Fact row has invalid dimension key. What happens?
Reject or map to unknown key.
How do you test aggregation logic?
Compare aggregated target data with source calculations.
What is hashing used for in ETL testing?
Change detection and deduplication.
Scenario: One-to-many join inflates data. How do you test?
Validate join cardinality.
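Join inflation can be caught by comparing row counts before and after the join, then finding the keys that match more than once. This is a sketch with made-up sample data; the `current_flag` filter and the column layout of `dim_customer` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_orders (order_id INTEGER, cust_id INTEGER, amount REAL);
    CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, current_flag TEXT);
    INSERT INTO stg_orders VALUES (1, 10, 100.0), (2, 11, 50.0);
    -- cust_id 10 has two rows flagged current: a defect that inflates the join.
    INSERT INTO dim_customer VALUES (1, 10, 'Y'), (2, 10, 'Y'), (3, 11, 'Y');
""")

rows_before = conn.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]

rows_after = conn.execute("""
    SELECT COUNT(*)
    FROM stg_orders o
    JOIN dim_customer d
      ON d.cust_id = o.cust_id AND d.current_flag = 'Y'
""").fetchone()[0]

# Keys on the dimension side that would match more than one row.
dup_keys = conn.execute("""
    SELECT cust_id, COUNT(*)
    FROM dim_customer
    WHERE current_flag = 'Y'
    GROUP BY cust_id
    HAVING COUNT(*) > 1
""").fetchall()

print(rows_before, rows_after, dup_keys)
```

For a lookup join that is meant to be many-to-one, the two counts should be equal and `dup_keys` should be empty.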
How do you test data quality rules?
Range checks, domain checks, pattern validation.
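The three rule types can be expressed as small row-level checks. This is only a sketch: the amount limits, the status list, and the loose email pattern are hypothetical rules, and the real ones come from the business requirements.

```python
import re

VALID_STATUSES = {"NEW", "SHIPPED", "CANCELLED"}            # assumed domain
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")   # deliberately loose

def check_row(row):
    """Return the names of the rules that the row violates."""
    failures = []
    if not (0 <= row["amount"] <= 1_000_000):     # range check
        failures.append("amount_range")
    if row["status"] not in VALID_STATUSES:       # domain check
        failures.append("status_domain")
    if not EMAIL_PATTERN.match(row["email"]):     # pattern check
        failures.append("email_pattern")
    return failures

good = check_row({"amount": 50.0, "status": "NEW", "email": "a@b.com"})
bad = check_row({"amount": -1, "status": "LOST", "email": "nope"})
print(good, bad)
```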
Scenario: Negative sales amount appears. What do you test?
Business rule validation.
How do you validate currency conversion?
Compare converted amounts with exchange rate tables.
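A currency check recomputes the converted amount from the rate table and flags rows that differ beyond a rounding tolerance. In this sketch the `fx_rates` table, the `currency` and `amount_usd` columns, the rates themselves, and the 0.005 tolerance are all made-up assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL, currency TEXT, order_dt TEXT);
    CREATE TABLE fx_rates (currency TEXT, rate_dt TEXT, rate_to_usd REAL);
    CREATE TABLE fact_sales (order_id INTEGER, amount_usd REAL);
    INSERT INTO src_orders VALUES
        (1, 100.0, 'EUR', '2024-01-01'),
        (2, 200.0, 'GBP', '2024-01-01');
    INSERT INTO fx_rates VALUES
        ('EUR', '2024-01-01', 1.10),
        ('GBP', '2024-01-01', 1.25);
    -- Order 2 was loaded with the wrong rate (1.20 instead of 1.25).
    INSERT INTO fact_sales VALUES (1, 110.0), (2, 240.0);
""")

mismatches = conn.execute("""
    SELECT s.order_id,
           ROUND(s.amount * r.rate_to_usd, 2) AS expected_usd,
           f.amount_usd                       AS loaded_usd
    FROM src_orders s
    JOIN fx_rates r
      ON r.currency = s.currency AND r.rate_dt = s.order_dt
    JOIN fact_sales f
      ON f.order_id = s.order_id
    WHERE ABS(s.amount * r.rate_to_usd - f.amount_usd) > 0.005
    ORDER BY s.order_id
""").fetchall()

print(mismatches)
```

Also confirm which rate date the business rule calls for (order date, load date, or month-end), since a correct formula with the wrong rate date produces the same kind of mismatch.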
What is partition testing?
Validate correct partition placement of data.
Scenario: Historical data reload. What do you test?
Data duplication and versioning.
How do you test ETL error handling?
Validate alerts, logs, and rerun capability.
What is referential integrity testing?
Fact-dimension key consistency.
Scenario: Dimension record expires incorrectly. What to check?
Effective date logic.
How do you test soft deletes?
Validate delete_flag instead of physical delete.
What is schema drift scenario?
Source schema changes; test ETL adaptability.
How do you validate report data?
Compare BI output with DW tables.

🔹 Advanced & Real-Time Scenarios (36–55)

Scenario: ETL job misses SLA. How do you test performance?
Analyze query plans, indexing, partitions.
How do you test parallel ETL jobs?
Check locking, duplicates, deadlocks.
Scenario: Data skew in big tables. What to test?
Distribution and partition balance.
How do you validate window functions in ETL?
Compare ranked data with business rules.
Scenario: Late-arriving facts affect aggregates. What to test?
Recalculation logic.
How do you test CDC failures?
Check that changes missed during the CDC failure are detected and loaded once capture resumes.
Scenario: Reprocessing rejected data. What to validate?
Corrected rows and counts.
How do you test ETL rollback?
Ensure partial loads are reverted.
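A rollback test forces a failure partway through a load and then confirms the target is unchanged. The sketch below uses SQLite transactions through Python's sqlite3 module as a stand-in; the function name and the NOT NULL constraint used to trigger the failure are illustrative choices.

```python
import sqlite3

def load_all_or_nothing(conn, rows):
    """Insert all rows in one transaction; any failure reverts the whole batch."""
    try:
        with conn:  # commit on success, rollback on exception
            conn.executemany(
                "INSERT INTO fact_sales (order_id, amount) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)

# The third row violates NOT NULL, so the first two must not remain in the table.
ok = load_all_or_nothing(conn, [(1, 100.0), (2, 50.0), (3, None)])
rows_after_failure = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

print(ok, rows_after_failure)
```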
Scenario: PII columns exposed in target. What test?
Data masking validation.
How do you test multi-source integration?
Source precedence and conflict resolution.
Scenario: Time zone mismatch. What to validate?
Timestamp normalization.
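To test normalization, compute the expected UTC value independently and compare it with what the ETL loaded. This sketch assumes the source sends naive local timestamps at a fixed UTC+05:30 offset; a zone with daylight saving time would need a proper time zone database rather than a fixed offset.

```python
from datetime import datetime, timedelta, timezone

SOURCE_TZ = timezone(timedelta(hours=5, minutes=30))  # assumed source offset

def to_utc(local_text, source_tz):
    """Parse a naive 'YYYY-MM-DD HH:MM:SS' string as source_tz and convert to UTC."""
    naive = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=source_tz).astimezone(timezone.utc)

expected = to_utc("2024-01-01 02:00:00", SOURCE_TZ)
print(expected.isoformat())
```

Note that the calendar date changes here (1 January local becomes 31 December UTC), which is exactly the kind of shift that moves a row into the wrong daily partition or aggregate.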
How do you test archival logic?
Validate data movement to history tables.
Scenario: Slowly changing fact. How to test?
Validate adjustment logic.
How do you test cloud DW ETL?
Cost, scalability, and performance checks.
Scenario: Unexpected NULLs after join. Why?
An outer join kept rows that had no match on the other side, typically because of missing or mismatched keys.
How do you test ETL metadata tables?
Batch status and counts.
Scenario: File arrives late. What to test?
Dependency and rerun logic.
How do you validate checksum/hash totals?
Compare source vs target hashes.
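One way to compare hashes is to hash each row, then hash the sorted row digests so that row order does not matter. This is a sketch of that idea using Python's hashlib and sqlite3; the helper name `table_checksum` is hypothetical, and many databases offer their own hash functions that would be used instead on large tables.

```python
import hashlib
import sqlite3

def table_checksum(conn, query):
    """Order-independent checksum: hash each row, sort the digests, hash the result."""
    digests = []
    for row in conn.execute(query):
        # repr() keeps NULL (None) distinct from the text 'None' and is type-sensitive.
        canonical = "|".join(repr(value) for value in row)
        digests.append(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
    combined = "".join(sorted(digests))
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE fact_sales (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 100.0), (2, 50.0);
    INSERT INTO fact_sales VALUES (2, 50.0), (1, 100.0);  -- same rows, different order
""")

src_sum = table_checksum(conn, "SELECT order_id, amount FROM src_orders")
tgt_sum = table_checksum(conn, "SELECT order_id, amount FROM fact_sales")
print(src_sum == tgt_sum)
```

Only compare columns that are expected to be identical on both sides; audit fields such as load dates will always differ.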
Scenario: Aggregates don’t match reports. What to test?
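Check for a grain mismatch: the report may aggregate at a different level (for example, order vs. order line) than the fact table stores.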
How do you test end-to-end ETL flow?
Source → DW → BI reconciliation.

Real SQL Query Examples for ETL Validation

Sample Tables

src_orders(order_id, cust_id, amount, order_dt)
stg_orders
fact_sales(order_id, cust_sk, amount, load_dt)

1️⃣ Record Count Validation

SELECT COUNT(*) FROM src_orders;

SELECT COUNT(*) FROM fact_sales;

2️⃣ JOIN Validation

-- Fact rows whose cust_sk has no matching dimension row (expected result: 0)
SELECT COUNT(*) AS missing_dim
FROM fact_sales f
LEFT JOIN dim_customer d
  ON f.cust_sk = d.cust_sk
WHERE d.cust_sk IS NULL;

3️⃣ GROUP BY Aggregation Check

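-- Source-side totals per customer; run the equivalent query on the target and compare
SELECT cust_id, SUM(amount) AS total_amount
FROM src_orders
GROUP BY cust_id;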

4️⃣ Window Function – Deduplication

-- Keeps the latest row per order_id; rows with rn > 1 would be the duplicates
SELECT *
FROM (
    SELECT order_id,
           ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) AS rn
    FROM src_orders
) t
WHERE rn = 1;

5️⃣ Performance Tuning Validation

-- Date arithmetic syntax varies by database; this form works in PostgreSQL
EXPLAIN
SELECT *
FROM fact_sales
WHERE load_dt >= CURRENT_DATE - 1;

ETL Tools Commonly Asked in Interviews

Informatica
Microsoft SQL Server Integration Services (SSIS)
Ab Initio
Pentaho
Talend

ETL Defect Examples + Test Case Sample

Defect Example

Issue: Duplicate fact records

Root cause: Many-to-many join
Fix: Correct join condition, add dedup logic
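A defect like this is typically confirmed with a duplicate check on the fact table's grain. The sketch below assumes the grain of `fact_sales` is one row per `order_id`; adjust the GROUP BY columns if the real grain is different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);
    INSERT INTO fact_sales VALUES
        (1, 1, 100.0, '2024-01-03'),
        (2, 2,  50.0, '2024-01-03'),
        (2, 2,  50.0, '2024-01-03');  -- duplicate produced by the bad join
""")

duplicates = conn.execute("""
    SELECT order_id, COUNT(*) AS copies
    FROM fact_sales
    GROUP BY order_id
    HAVING COUNT(*) > 1
    ORDER BY order_id
""").fetchall()

print(duplicates)
```

Rerunning the same query after the fix should return no rows.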

Sample Test Case

Test Case: Validate SCD2 update
Expected: Old record expires, new record inserted
SQL: Validate current_flag and date ranges
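The SQL for this test case could look like the two checks below, wrapped in Python's sqlite3 so the sketch is runnable. The `cust_id` and `address` columns, the `'Y'`/`'N'` flag values, and the `9999-12-31` open-ended date are assumptions; match them to the actual dimension design.

```python
import sqlite3

def rows_without_single_current(conn):
    """Customers that do not have exactly one row with current_flag = 'Y'."""
    return conn.execute("""
        SELECT cust_id
        FROM dim_customer
        GROUP BY cust_id
        HAVING SUM(CASE WHEN current_flag = 'Y' THEN 1 ELSE 0 END) <> 1
        ORDER BY cust_id
    """).fetchall()

def overlapping_versions(conn):
    """Version pairs of one customer whose date ranges overlap.
    Ranges are treated as half-open: effective_from <= t < effective_to."""
    return conn.execute("""
        SELECT a.cust_id, a.cust_sk, b.cust_sk
        FROM dim_customer a
        JOIN dim_customer b
          ON a.cust_id = b.cust_id AND a.cust_sk < b.cust_sk
        WHERE a.effective_from < b.effective_to
          AND b.effective_from < a.effective_to
        ORDER BY a.cust_id, a.cust_sk, b.cust_sk
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        cust_sk INTEGER PRIMARY KEY, cust_id INTEGER, address TEXT,
        effective_from TEXT, effective_to TEXT, current_flag TEXT
    );
    INSERT INTO dim_customer VALUES
        (1, 10, 'Old Street 1', '2023-01-01', '2024-01-01', 'N'),
        (2, 10, 'New Road 5',   '2024-01-01', '9999-12-31', 'Y'),
        (3, 11, 'Lake View 9',  '2023-06-01', '9999-12-31', 'Y');
""")

bad_current = rows_without_single_current(conn)
overlaps = overlapping_versions(conn)
print(bad_current, overlaps)
```

Both queries returning no rows means the old record was expired and exactly one current record exists per customer.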

ETL Testing Quick Revision Sheet

Validate counts, sums, NULLs, duplicates
Check S2T mappings thoroughly
Test SCD1, SCD2, audit fields
Use JOIN, GROUP BY, window functions
Always reconcile source vs target vs reports

FAQs

Q1. What are ETL testing scenario-based interview questions?
They focus on real-world data issues such as count mismatches, null handling, and performance.

Q2. Which SQL is important for ETL testing?
JOINs, GROUP BY, window functions, and performance tuning queries.

Q3. How do you test SCD2 in real projects?
By validating history rows, effective dates, and current flags.