Scenario Based ETL Testing Interview Questions

What Is ETL Testing? (Definition + Real-World Example)

ETL Testing is the process of validating data as it is Extracted from source systems, Transformed using business rules, and Loaded into a Data Warehouse (DW) or data lake.

Real-world example

In a telecom project, daily call-detail records are extracted from multiple source systems, transformed to calculate billable duration, discounts, and taxes, and loaded into fact tables for reporting.
Scenario based ETL testing ensures:

  • No data loss during extraction
  • Transformations follow Source-to-Target (S2T) rules
  • Aggregated reports exactly match source totals

That’s why scenario based ETL testing interview questions focus heavily on real production issues, not just theory.


Data Warehouse Flow: Source → Staging → Transform → Load → Reporting

  1. Source – OLTP databases, files, APIs
  2. Staging – Raw data landing area
  3. Transformation – Business rules, joins, SCD logic
  4. Load – Fact and Dimension tables
  5. Reporting – BI dashboards, analytics, MIS

Scenario Based ETL Testing Interview Questions & Answers

(Basic → Advanced | Real-Time Focus)


🔹 Basic Scenario Based ETL Questions (1–15)

  1. Scenario: Source has 100,000 records, target has 98,000. What do you check?
    Filters, rejected records, duplicate removal logic, join conditions.
  2. How do you validate record count in ETL testing?
    By comparing source and target counts after accounting for filters, rejects, and deduplication.
  3. Scenario: NULL values appear in mandatory target columns. What do you test?
    Default value logic and null-handling rules in mapping.
  4. What is Source-to-Target (S2T) validation?
    Verifying that each source field is correctly mapped and transformed.
  5. Scenario: Duplicate rows appear in target but not in source. Why?
    Incorrect joins or missing deduplication logic.
  6. What is staging area testing?
    Ensuring staging data exactly matches source data.
  7. Scenario: Data type mismatch error during load. What do you validate?
    Casting, truncation, and precision rules.
  8. What are audit fields?
    load_date, batch_id, created_by, updated_date.
  9. Scenario: ETL job fails mid-way. What do you test after rerun?
    Restartability and duplicate prevention logic.
  10. What is data reconciliation?
    Matching totals and counts between source and target.
  11. Scenario: Wrong data in report but DW table looks correct. What next?
    Validate report query logic and aggregation grain.
  12. What is full load vs incremental load?
    Full reloads all data; incremental loads only deltas.
  13. Scenario: Incremental load misses some records. What to check?
    CDC logic and watermark conditions.
  14. What is reject data testing?
    Validating rejected records and error reasons.
  15. Why is SQL critical for ETL testing?
    SQL validates data accuracy and transformations.
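
The count-mismatch and reject scenarios above can be sketched as a single reconciliation: every source row should end up either in the target or in a reject table. This is a minimal sketch using Python's built-in sqlite3 so it runs anywhere. The `rej_orders` table, its columns, and the sample rows are assumptions for illustration and are not part of the sample schema used later in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);
    -- rej_orders is a hypothetical reject table (assumption, not in the sample schema)
    CREATE TABLE rej_orders (order_id INTEGER, reject_reason TEXT);

    INSERT INTO src_orders VALUES
        (1, 10, 100.0, '2024-01-01'),
        (2, 11, 250.0, '2024-01-01'),
        (3, NULL, 75.0, '2024-01-02');
    INSERT INTO fact_sales VALUES
        (1, 501, 100.0, '2024-01-03'),
        (2, 502, 250.0, '2024-01-03');
    INSERT INTO rej_orders VALUES (3, 'cust_id is NULL');
""")

def count_rows(table):
    """Return the row count of one of the fixed tables created above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

src_count = count_rows("src_orders")
tgt_count = count_rows("fact_sales")
rej_count = count_rows("rej_orders")

# Rows that are neither loaded nor rejected are the ones to investigate.
unexplained = src_count - tgt_count - rej_count

# List the exact source keys that are missing from both target and rejects.
missing_ids = [row[0] for row in conn.execute("""
    SELECT s.order_id
    FROM src_orders s
    LEFT JOIN fact_sales f ON f.order_id = s.order_id
    LEFT JOIN rej_orders r ON r.order_id = s.order_id
    WHERE f.order_id IS NULL AND r.order_id IS NULL
""")]

print(src_count, tgt_count, rej_count, unexplained, missing_ids)
```

If `unexplained` is non-zero, the `missing_ids` list gives the keys to trace through filters and join conditions.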

🔹 Intermediate Scenario Based ETL Questions (16–35)

  16. Scenario: Customer address changes. How do you test it?
    Validate SCD Type 2 history creation.
  17. What is SCD Type 1?
    Overwrites old data without keeping history.
  18. What is SCD Type 2?
    Maintains historical data using effective dates and current_flag.
  19. Scenario: Fact record arrives before dimension. What is this called?
    Late-arriving dimension scenario.
  20. How do you test surrogate key generation?
    Ensure uniqueness and non-null values.
  21. Scenario: Aggregated totals don’t match source. What do you test?
    GROUP BY logic and transformation rules.
  22. What is hashing used for in ETL testing?
    Change detection and deduplication.
  23. Scenario: Many-to-many join inflates data. How to test?
    Validate join cardinality by comparing row counts before and after the join.
  24. How do you test referential integrity?
    Ensure fact foreign keys exist in dimension tables.
  25. Scenario: Negative revenue appears in report. What do you test?
    Business rules for the revenue calculation, including sign handling.
  26. What is CDC (Change Data Capture)?
    Capturing and processing only changed data.
  27. Scenario: Date values shift by one day. Why?
    Time zone conversion issues.
  28. What is soft delete testing?
    Validate that delete_flag is set instead of the row being physically deleted.
  29. Scenario: Schema changes in source. What do you test?
    ETL adaptability and schema drift handling.
  30. How do you validate data masking?
    Ensure PII fields are obfuscated.
  31. Scenario: Reprocessing rejected records. What to validate?
    Corrected data and updated counts.
  32. What is partition testing?
    Validate correct partition placement.
  33. Scenario: Historical reload requested. What do you test?
    That the reload creates no duplicates and that SCD logic still holds.
  34. How do you validate BI reports?
    Reconcile reports with DW tables.
  35. What is end-to-end ETL testing?
    Source → DW → Report validation.
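
The join-cardinality scenario above can be illustrated with a small sketch: compare the row count before and after the join, then find the dimension keys that cause the fan-out. It uses Python's built-in sqlite3 with made-up rows. The duplicate current row for `cust_id` 11 is a seeded defect for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, current_flag TEXT);

    INSERT INTO src_orders VALUES
        (1, 10, 100.0, '2024-01-01'),
        (2, 11, 250.0, '2024-01-01');
    -- cust_sk 503 is a seeded defect: a second "current" row for cust_id 11
    INSERT INTO dim_customer VALUES
        (501, 10, 'Y'),
        (502, 11, 'Y'),
        (503, 11, 'Y');
""")

# Row count going into the join.
src_rows = conn.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]

# Row count coming out of the join; it should equal src_rows for a lookup join.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM src_orders s
    JOIN dim_customer d
      ON d.cust_id = s.cust_id AND d.current_flag = 'Y'
""").fetchone()[0]

# Business keys with more than one current row are the cause of the inflation.
fan_out_keys = conn.execute("""
    SELECT cust_id, COUNT(*) AS current_rows
    FROM dim_customer
    WHERE current_flag = 'Y'
    GROUP BY cust_id
    HAVING COUNT(*) > 1
""").fetchall()

print(src_rows, joined_rows, fan_out_keys)
```

A joined count higher than the source count signals inflation, and `fan_out_keys` points at the dimension rows to fix.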

🔹 Advanced & Real-Time ETL Scenarios (36–55)

  36. Scenario: ETL job misses SLA. What do you analyze?
    Query plans, indexing, partitioning.
  37. How do you test ETL performance tuning?
    Validate execution plans and join strategies.
  38. Scenario: Parallel jobs cause deadlocks. What do you test?
    Concurrency and locking validation.
  39. How do you validate window functions?
    Verify ranking and dedup logic.
  40. Scenario: Late data impacts aggregates. What do you do?
    Recalculate affected partitions.
  41. How do you test audit/control tables?
    Validate batch status and row counts.
  42. Scenario: Regulatory report mismatch. First step?
    Verify S2T mapping.
  43. How do you test ETL rollback?
    Ensure partial loads are reverted.
  44. Scenario: Source file arrives late. What to test?
    Dependency and rerun logic.
  45. How do you validate checksum/hash totals?
    Compare source vs target hashes.
  46. Scenario: Currency conversion mismatch. What to test?
    Exchange rate logic.
  47. How do you test cloud ETL pipelines?
    Scalability, cost, performance.
  48. Scenario: NULLs after LEFT JOIN. Why?
    Missing dimension records.
  49. How do you test archival logic?
    Validate movement to history tables.
  50. Scenario: Incorrect SCD expiry date. Root cause?
    Effective date logic error.
  51. How do you validate metadata tables?
    Batch IDs, timestamps, counts.
  52. Scenario: Unexpected data spike. What do you test?
    Source anomaly validation.
  53. How do you test restartability?
    Ensure loads are idempotent, so a rerun produces the same target data.
  54. Scenario: Data quality threshold breached. What action?
    Reject, alert, and reprocess.
  55. How do you perform real-time ETL testing?
    Source → Target → Report reconciliation.
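
The restartability scenarios above come down to one check: rerunning a load after a partial failure must not change the final target data. The sketch below simulates that with Python's built-in sqlite3. The `load_fact_sales` function is a hypothetical stand-in for the ETL job under test, not a real tool's API, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, current_flag TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);

    INSERT INTO src_orders VALUES
        (1, 10, 100.0, '2024-01-01'),
        (2, 11, 250.0, '2024-01-01');
    INSERT INTO dim_customer VALUES (501, 10, 'Y'), (502, 11, 'Y');
    -- Simulate an earlier run that committed order 1 and then failed.
    INSERT INTO fact_sales VALUES (1, 501, 100.0, '2024-01-02');
""")

def load_fact_sales(conn):
    """Hypothetical loader: inserts only orders not already in the target."""
    conn.execute("""
        INSERT INTO fact_sales (order_id, cust_sk, amount, load_dt)
        SELECT s.order_id, d.cust_sk, s.amount, DATE('now')
        FROM src_orders s
        JOIN dim_customer d
          ON d.cust_id = s.cust_id AND d.current_flag = 'Y'
        WHERE NOT EXISTS (
            SELECT 1 FROM fact_sales f WHERE f.order_id = s.order_id
        )
    """)
    conn.commit()

def fact_snapshot(conn):
    """Business columns only; load_dt is excluded because it varies by run."""
    return conn.execute(
        "SELECT order_id, cust_sk, amount FROM fact_sales ORDER BY order_id"
    ).fetchall()

load_fact_sales(conn)            # rerun after the simulated failure
after_rerun = fact_snapshot(conn)
load_fact_sales(conn)            # accidental second rerun
after_second_rerun = fact_snapshot(conn)

print(after_rerun == after_second_rerun, len(after_rerun))
```

The test passes when both snapshots are identical and the target holds exactly one row per source order.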

Real SQL Query Examples for ETL Validation

Sample Tables

  • src_orders(order_id, cust_id, amount, order_dt)
  • dim_customer(cust_sk, cust_id, current_flag)
  • fact_sales(order_id, cust_sk, amount, load_dt)

1️⃣ Record Count Validation

SELECT COUNT(*) FROM src_orders;

SELECT COUNT(*) FROM fact_sales;

2️⃣ JOIN Validation

SELECT COUNT(*) AS missing_dim
FROM fact_sales f
LEFT JOIN dim_customer d
  ON f.cust_sk = d.cust_sk
WHERE d.cust_sk IS NULL;

3️⃣ GROUP BY Aggregation

SELECT cust_id, SUM(amount)
FROM src_orders
GROUP BY cust_id;
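
The source aggregate above only becomes a validation once it is compared with the same aggregate on the target. One way to sketch this is a set difference with EXCEPT in both directions, shown here through Python's built-in sqlite3. The sample rows are invented, and the wrong amount for order 3 is a seeded defect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, current_flag TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);

    INSERT INTO src_orders VALUES
        (1, 10, 100.0, '2024-01-01'),
        (2, 10, 50.0,  '2024-01-02'),
        (3, 11, 250.0, '2024-01-02');
    INSERT INTO dim_customer VALUES (501, 10, 'Y'), (502, 11, 'Y');
    -- Order 3 is loaded as 205.0 instead of 250.0 (seeded defect)
    INSERT INTO fact_sales VALUES
        (1, 501, 100.0, '2024-01-03'),
        (2, 501, 50.0,  '2024-01-03'),
        (3, 502, 205.0, '2024-01-03');
""")

SRC_TOTALS = """
    SELECT cust_id, SUM(amount) FROM src_orders GROUP BY cust_id
"""
TGT_TOTALS = """
    SELECT d.cust_id, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer d ON d.cust_sk = f.cust_sk
    GROUP BY d.cust_id
"""

# Totals present in the source but not reproduced in the target.
src_minus_tgt = conn.execute(f"{SRC_TOTALS} EXCEPT {TGT_TOTALS}").fetchall()
# Totals present in the target that the source does not support.
tgt_minus_src = conn.execute(f"{TGT_TOTALS} EXCEPT {SRC_TOTALS}").fetchall()

print(src_minus_tgt, tgt_minus_src)
```

Both result sets should be empty on a clean load. Exact comparison of floating-point sums can give false mismatches on real data, so a tolerance or a fixed-precision decimal type is usually safer there.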

4️⃣ Window Function – Deduplication

SELECT *
FROM (
  SELECT order_id,
         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) AS rn
  FROM src_orders
) t
WHERE rn = 1;

5️⃣ Performance Tuning Validation

EXPLAIN
SELECT *
FROM fact_sales
WHERE load_dt >= CURRENT_DATE - 1;


ETL Tools Commonly Asked in Interviews

  • Informatica
  • Microsoft SQL Server Integration Services
  • Ab Initio
  • Pentaho
  • Talend

ETL Defect Examples + Test Case Sample

Defect Example

Issue: Duplicate records in fact table

  • Root Cause: Incorrect join logic
  • Fix: Correct join keys and add deduplication

Sample Test Case

  • Test Case: Validate SCD2 update
  • Expected Result:
    • Old record expired
    • New record inserted with current_flag = 'Y'
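
The test case above can be sketched end to end with Python's built-in sqlite3. The `address`, `eff_start_dt`, and `eff_end_dt` columns and the `apply_scd2` function are assumptions added for illustration. The function is a toy stand-in for the ETL mapping under test, not any tool's actual logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- address and the effective-date columns are assumed for this sketch
    CREATE TABLE dim_customer (
        cust_sk      INTEGER PRIMARY KEY,
        cust_id      INTEGER,
        address      TEXT,
        eff_start_dt TEXT,
        eff_end_dt   TEXT,
        current_flag TEXT
    );
    INSERT INTO dim_customer (cust_id, address, eff_start_dt, eff_end_dt, current_flag)
    VALUES (10, '12 Oak St', '2023-01-01', NULL, 'Y');
""")

def apply_scd2(conn, cust_id, new_address, change_dt):
    """Toy SCD Type 2 update: expire the current row, insert a new one."""
    current = conn.execute(
        "SELECT cust_sk, address FROM dim_customer "
        "WHERE cust_id = ? AND current_flag = 'Y'",
        (cust_id,),
    ).fetchone()
    if current is not None and current[1] == new_address:
        return  # no change, so no new history row
    if current is not None:
        conn.execute(
            "UPDATE dim_customer SET current_flag = 'N', eff_end_dt = ? "
            "WHERE cust_sk = ?",
            (change_dt, current[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer "
        "(cust_id, address, eff_start_dt, eff_end_dt, current_flag) "
        "VALUES (?, ?, ?, NULL, 'Y')",
        (cust_id, new_address, change_dt),
    )
    conn.commit()

apply_scd2(conn, 10, "9 Elm Rd", "2024-06-01")

# Expected: old row expired, new row current.
history = conn.execute("""
    SELECT address, eff_start_dt, eff_end_dt, current_flag
    FROM dim_customer
    WHERE cust_id = 10
    ORDER BY cust_sk
""").fetchall()

# Every business key must have exactly one current row.
bad_keys = conn.execute("""
    SELECT cust_id
    FROM dim_customer
    GROUP BY cust_id
    HAVING SUM(CASE WHEN current_flag = 'Y' THEN 1 ELSE 0 END) <> 1
""").fetchall()

print(history, bad_keys)
```

The two checks map directly to the expected results: `history` shows the expired and the current row, and an empty `bad_keys` confirms there is a single current row per customer.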

Scenario Based ETL Testing – Quick Revision Sheet

  • Validate counts, sums, NULLs, duplicates
  • Check S2T mapping line-by-line
  • Test SCD1, SCD2, audit fields, hashing
  • Use JOIN, GROUP BY, window functions
  • Always reconcile source vs target vs reports
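
As a rough illustration of the hashing item in this sheet, the sketch below computes a per-row SHA-256 hash on both sides and lists the keys whose hashes differ. It uses Python's built-in sqlite3 and hashlib. The choice of columns, the `|` delimiter, and the two-decimal formatting are assumptions, and the wrong amount for order 3 is a seeded defect.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);

    INSERT INTO src_orders VALUES
        (1, 10, 100.0, '2024-01-01'),
        (2, 11, 250.0, '2024-01-01'),
        (3, 12, 75.5,  '2024-01-02');
    -- Order 3 is loaded as 57.5 instead of 75.5 (seeded defect)
    INSERT INTO fact_sales VALUES
        (1, 501, 100.0, '2024-01-03'),
        (2, 502, 250.0, '2024-01-03'),
        (3, 503, 57.5,  '2024-01-03');
""")

def row_hashes(query):
    """Map order_id to a hash of the compared columns (order_id, amount)."""
    hashes = {}
    for order_id, amount in conn.execute(query):
        # Fixed formatting keeps the hash input identical on both sides.
        payload = f"{order_id}|{amount:.2f}"
        hashes[order_id] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return hashes

src_hashes = row_hashes("SELECT order_id, amount FROM src_orders")
tgt_hashes = row_hashes("SELECT order_id, amount FROM fact_sales")

# Keys missing on either side or with differing hashes need investigation.
mismatched = sorted(
    key for key in src_hashes.keys() | tgt_hashes.keys()
    if src_hashes.get(key) != tgt_hashes.get(key)
)

print(mismatched)
```

Only columns that are expected to be identical on both sides belong in the hash. Surrogate keys and load timestamps are left out here for that reason.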

FAQs (Snippet-Friendly)

Q1. What are scenario based ETL testing interview questions?
They focus on real-time ETL issues like mismatches, reruns, and performance.

Q2. Which SQL is mandatory for ETL testing?
JOINs, GROUP BY, window functions, and performance tuning queries.

Q3. How do you test SCD2 in real projects?
By validating history rows, effective dates, and current flags.
