ETL Testing Interview Questions for 6 Years Experienced – Complete Expert Guide

1️⃣ What is ETL Testing? (Definition + Example)

ETL Testing validates that data is Extracted from source systems, Transformed according to business rules, and Loaded accurately into a data warehouse.

Example
A banking ETL job extracts transactions from Oracle, converts currency values, applies SCD2 logic on customers, and loads results into a reporting mart. ETL testing ensures:

  • No data loss
  • Transformations are correct
  • History is preserved
  • Performance meets SLA

For 6 years experienced professionals, interviews focus on architecture understanding, SQL depth, performance tuning, and real production scenarios.


2️⃣ Data Warehouse Flow – Source → Staging → Transform → Load → Reporting

Typical ETL Architecture

LayerPurpose
SourceOLTP systems, APIs, flat files
StagingRaw landing, data cleansing
TransformationBusiness rules, SCD, aggregation
LoadFact & Dimension tables
ReportingBI tools, dashboards

Key validations:

  • Record count checks
  • Hash total validation
  • Data type & length checks
  • Audit columns (created_dt, batch_id)

3️⃣ ETL Testing Interview Questions + Answers (Basic → Advanced)

🔹 Basic ETL QA Questions (Experience Warm-up)

  1. What is ETL testing?
    Validating extraction, transformation rules, and accurate loading into the target system.
  2. Difference between ETL testing and database testing?
    ETL testing focuses on data movement + transformation, DB testing focuses on schema & data integrity.
  3. What is staging area?
    An intermediate storage used before transformation.
  4. What is S2T mapping?
    Document defining Source-to-Target column mapping and transformation logic.
  5. What is data reconciliation?
    Comparing source and target data for accuracy.

🔹 Intermediate ETL Interview Questions (Real-World)

  1. How do you validate source-to-target mapping?
    Using S2T documents + SQL queries.
  2. What is SCD Type 1?
    Overwrites old data, no history.
  3. What is SCD Type 2?
    Maintains history using start_date, end_date, active_flag.
  4. What are audit fields?
    batch_id, load_date, created_by, updated_ts.
  5. Difference between full load and incremental load?
    Full reloads all data; incremental loads delta changes.

🔹 Advanced ETL Testing Interview Questions (6+ Years Level)

  1. How do you test incremental loads?
    By validating CDC logic, timestamps, and surrogate keys.
  2. What is data skew and how do you test it?
    Uneven data distribution; validate using GROUP BY counts.
  3. How do you validate transformations without UI access?
    Using SQL-based expected vs actual logic.
  4. Explain hash validation.
    Sum or checksum comparison to validate large datasets.
  5. How do you test restartability of ETL jobs?
    By failing jobs mid-run and re-executing.

4️⃣ Real SQL Validation Examples (With Sample Data)

🔸 Sample Source Table

orders_src(order_id, cust_id, amount, order_date)

🔸 Target Fact Table

fact_orders(order_key, cust_key, total_amount, order_dt)


✅ Record Count Validation

SELECT COUNT(*) FROM orders_src

WHERE order_date >= ‘2024-01-01’;

SELECT COUNT(*) FROM fact_orders

WHERE order_dt >= ‘2024-01-01’;


✅ JOIN Validation

SELECT s.order_id, s.amount, t.total_amount

FROM orders_src s

JOIN fact_orders t

ON s.order_id = t.order_key

WHERE s.amount <> t.total_amount;


✅ GROUP BY Aggregation Validation

SELECT cust_id, SUM(amount)

FROM orders_src

GROUP BY cust_id;

SELECT cust_key, SUM(total_amount)

FROM fact_orders

GROUP BY cust_key;


✅ Window Function – Duplicate Detection

SELECT order_id

FROM (

  SELECT order_id,

         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_date) rn

  FROM fact_orders

) x

WHERE rn > 1;


✅ Performance Tuning Validation

EXPLAIN ANALYZE

SELECT * FROM fact_orders WHERE order_dt = ‘2024-01-01’;


5️⃣ Scenario-Based ETL Testing Questions with Answers

🔹 Scenario 1: Record Count Mismatch

Q: Source has 1M records, target has 990K.
A: Check rejected records, transformation filters, joins, lookup mismatches.


🔹 Scenario 2: Null Values in Target

Q: Mandatory field is NULL in target.
A: Validate source nulls, default logic, expression transformations.


🔹 Scenario 3: SCD2 Not Working

Q: Old records overwritten instead of versioned.
A: Validate effective_date, current_flag logic.


🔹 Scenario 4: ETL Job Slowness

Q: Job exceeded SLA.
A: Check indexing, partitioning, parallelism, push-down optimization.


6️⃣ ETL Tools Commonly Asked in Interviews

  • Informatica – PowerCenter, mappings, sessions
  • Microsoft SSIS – Control Flow, Data Flow
  • Ab Initio – High-performance ETL
  • Pentaho – Kettle, transformations
  • Talend – Open & enterprise ETL

Interviewers expect tool-agnostic logic + SQL validation skills.


7️⃣ ETL Defect Examples (Real Project)

Defect TypeExample
Mapping defectWrong column mapped
Data lossFilter removes valid rows
SCD defectHistory not maintained
PerformanceJob exceeds SLA
Data typeTruncation issues

8️⃣ Sample ETL Test Case

Test Case: Validate SCD2 Customer Dimension

  • Source: customer_src
  • Target: dim_customer
  • Validation:
    • Only one active record
    • Old record end_date populated
    • New surrogate key generated

9️⃣ Quick Revision Sheet (Interview Last-Minute)

  • ETL flow & architecture
  • SCD1 vs SCD2
  • Incremental logic
  • SQL joins & window functions
  • Hash & reconciliation
  • Performance tuning basics

🔟 FAQs (For Featured Snippets)

Q1. What level of SQL is expected for ETL testing?
Advanced SQL including joins, subqueries, window functions.

Q2. Is ETL testing manual or automation?
Primarily manual SQL-driven, with automation support.

Q3. What is the most important ETL testing skill?
Understanding business logic + strong SQL.

Q4. How many interview questions should I prepare?
At least 80–120 for 6+ years experience.

Leave a Comment

Your email address will not be published. Required fields are marked *