ETL Testing Interview Questions for Testers – Complete Practical Guide with SQL & Real Scenarios

1️⃣ What is ETL Testing? (Definition + Example)

ETL Testing is the process of verifying that data is correctly Extracted from source systems, Transformed based on business rules, and Loaded into the target data warehouse or data mart without data loss, duplication, or corruption.

Real-World Example

In an e-commerce project:

  • Orders data is extracted from MySQL and CSV files
  • Business rules apply discounts, currency conversion, and deduplication
  • Data is loaded into fact and dimension tables
  • Reports show daily revenue and customer trends

An ETL tester validates:

  • Source-to-Target (S2T) mappings
  • Transformation logic
  • Record counts and data accuracy
  • Performance and restartability

This makes etl testing interview questions for testers highly focused on SQL, logic validation, and real production scenarios.


2️⃣ Data Warehouse Flow – Source → Staging → Transform → Load → Reporting

Typical ETL / DW Architecture

LayerDescriptionTesting Focus
SourceOLTP DBs, APIs, flat filesData completeness
StagingRaw extracted dataCleansing & duplicates
TransformationBusiness rules, SCDsLogic accuracy
LoadFact & Dimension tablesKeys & integrity
ReportingBI dashboardsAggregation checks

Key Tester Responsibilities

  • Validate S2T mapping
  • Verify incremental and full loads
  • Check audit fields (batch_id, load_date)
  • Perform reconciliation & hashing

3️⃣ ETL Testing Interview Questions for Testers (Basic → Advanced)

🔹 Basic ETL Testing Interview Questions

  1. What is ETL testing?
    Validation of extraction, transformation, and loading of data.
  2. What is the purpose of staging tables?
    To temporarily store raw data before transformation.
  3. What is S2T mapping?
    Document that maps source columns to target columns with transformation rules.
  4. Difference between ETL testing and database testing?
    ETL testing validates data movement and transformation; DB testing validates schema and constraints.
  5. What are fact and dimension tables?
    Facts store metrics; dimensions store descriptive attributes.

🔹 Intermediate ETL QA Questions

  1. What is SCD Type 1?
    Overwrites old data without maintaining history.
  2. What is SCD Type 2?
    Maintains historical records using effective_date and active_flag.
  3. What is incremental load?
    Loading only new or changed data since last run.
  4. What are audit fields?
    Metadata columns like created_date, updated_date, batch_id.
  5. What is data reconciliation?
    Comparing source and target data for consistency.

🔹 Advanced / Real-Time ETL Interview Questions

  1. How do you test large ETL loads?
    Using record count checks, hashing, sampling, and aggregates.
  2. How do you validate transformations without ETL tool access?
    By writing SQL that mimics transformation logic.
  3. What is hashing in ETL testing?
    Using checksum/hash totals to validate large datasets efficiently.
  4. How do you test restartability of ETL jobs?
    Fail the job intentionally and verify resume logic.
  5. What is late-arriving dimension handling?
    Loading facts before dimensions using default surrogate keys.

(You can expect 40–100 similar questions in interviews; preparing these core concepts is critical.)


4️⃣ Real SQL Query Examples for ETL Validation

Sample Source Table

orders_src(order_id, customer_id, amount, order_date)

Sample Target Table

fact_orders(order_key, cust_key, total_amount, order_dt)


✅ Record Count Validation

SELECT COUNT(*) FROM orders_src

WHERE order_date >= ‘2024-01-01’;

SELECT COUNT(*) FROM fact_orders

WHERE order_dt >= ‘2024-01-01’;


✅ JOIN Validation (Data Accuracy)

SELECT s.order_id, s.amount, f.total_amount

FROM orders_src s

JOIN fact_orders f

ON s.order_id = f.order_key

WHERE s.amount <> f.total_amount;


✅ GROUP BY Aggregation Validation

SELECT customer_id, SUM(amount)

FROM orders_src

GROUP BY customer_id;

SELECT cust_key, SUM(total_amount)

FROM fact_orders

GROUP BY cust_key;


✅ Window Function – Duplicate Check

SELECT order_key

FROM (

  SELECT order_key,

         ROW_NUMBER() OVER (PARTITION BY order_key ORDER BY order_dt) rn

  FROM fact_orders

) t

WHERE rn > 1;


✅ Performance Tuning Validation

EXPLAIN ANALYZE

SELECT * FROM fact_orders

WHERE order_dt = ‘2024-06-01’;


5️⃣ Scenario-Based ETL Testing Questions with Answers

🔹 Scenario 1: Record Count Mismatch

Q: Source has 1,000,000 rows, target has 995,000.
A: Check filters, rejected rows, join conditions, and error tables.


🔹 Scenario 2: NULL Values in Target

Q: Mandatory column has NULL values.
A: Validate source nulls, default logic, and expression transformations.


🔹 Scenario 3: SCD2 Not Working

Q: Old records are overwritten.
A: Check effective_date, end_date, and active_flag logic.


🔹 Scenario 4: Performance Issue

Q: ETL job exceeds SLA.
A: Review indexes, partitions, parallelism, and push-down optimization.


6️⃣ ETL Tools Commonly Asked in Interviews

  • Informatica – Mappings, workflows, sessions
  • Microsoft SSIS – Control Flow, Data Flow
  • Ab Initio – High-performance ETL
  • Pentaho – Kettle transformations
  • Talend – Open-source & cloud ETL

Interviewers mainly evaluate logic, SQL skills, and testing approach, not just tool UI knowledge.


7️⃣ ETL Defect Examples (Real-Time)

Defect TypeExample
Mapping defectWrong source column mapped
Data lossValid records filtered out
SCD defectHistory not preserved
PerformanceJob exceeds SLA
Data typeTruncation issues

8️⃣ Sample ETL Test Case

Test Case: Validate SCD Type 2 – Customer Dimension

  • Source Table: customer_src
  • Target Table: dim_customer
  • Validations:
    • Only one active record
    • Old record has end_date
    • New surrogate key generated
    • Audit fields populated

9️⃣ ETL Testing Revision Sheet (Quick Review)

  • ETL architecture & data flow
  • S2T mapping validation
  • SCD1 vs SCD2
  • Incremental vs full load
  • SQL joins, GROUP BY, window functions
  • Hash totals & reconciliation
  • Performance tuning basics

🔟 FAQs – ETL Testing Interview Questions for Testers

Q1. What SQL level is required for ETL testers?
Intermediate to advanced SQL with joins and window functions.

Q2. Is ETL testing manual or automated?
Primarily SQL-driven manual testing with selective automation.

Q3. What is the most important skill for ETL testers?
Understanding business logic and strong SQL validation.

Q4. How many ETL interview questions should testers prepare?
At least 60–100 questions covering concepts and scenarios.

Leave a Comment

Your email address will not be published. Required fields are marked *