Real-Time ETL Testing Interview Questions

What Is ETL Testing? (Definition + Real-Time Example)

ETL Testing is the process of validating data as it flows through Extract → Transform → Load stages to ensure accuracy, completeness, consistency, auditability, and performance in a Data Warehouse (DW).

Real-time project example

In a live production environment, daily transaction data is extracted from OLTP systems, transformed with business rules (currency conversion, deduplication, SCD handling), and loaded into fact and dimension tables.
A real-time ETL tester validates that:

  • No records are missed or duplicated
  • Transformations follow Source-to-Target (S2T) rules
  • Reports generated from DW exactly match business expectations

That is why real-time ETL testing interview questions are usually scenario-driven and SQL-heavy.


Data Warehouse Flow: Source → Staging → Transform → Load → Reporting

  1. Source – OLTP databases, files, APIs
  2. Staging – Raw data landing (no transformations)
  3. Transformation – Business rules, joins, SCD logic
  4. Load – Fact & Dimension tables
  5. Reporting – BI dashboards, analytics, MIS

Real-Time ETL Testing Interview Questions & Answers

(Basic → Advanced | Scenario-Based)


🔹 Basic Real-Time ETL Questions (1–15)

  1. What is real-time ETL testing?
    Validating ETL jobs using production-like scenarios and real business data flows.
  2. What is the difference between ETL testing and real-time ETL testing?
    Real-time ETL testing focuses on production issues such as delays, reruns, and data mismatches.
  3. What is Source-to-Target (S2T) mapping?
    A document mapping source fields to target fields with transformations.
  4. What are the main ETL testing types?
    Source, staging, transformation, target, reconciliation, and performance testing.
  5. What is staging area testing?
    Ensuring staging data exactly matches source data.
  6. What are audit fields?
    Columns that record load metadata, such as load_date, batch_id, created_by, and updated_date.
  7. How do you validate record counts?
    Compare source and target counts after applying the same filters, and account for rejected rows.
  8. What is data reconciliation?
    Matching totals and counts between source and target.
  9. What is full load vs incremental load?
    A full load reloads all data; an incremental load processes only new or changed data.
  10. What is reject data testing?
    Validating rejected records and error reasons.
  11. What is data quality testing?
    Accuracy, completeness, consistency checks.
  12. What is mapping validation?
    Ensuring ETL logic matches S2T rules.
  13. What is end-to-end ETL testing?
    Source → DW → Report validation.
  14. What is data lineage?
    Tracking data from source to report.
  15. Why is SQL important for ETL testers?
    SQL is used to validate data and transformations.
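
The staging and reconciliation answers above can be illustrated with a "minus" query that compares source and staging row by row. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database, and the stg_orders table and all sample rows are assumptions made for the example, not part of any real project.

```python
import sqlite3

# In-memory database standing in for source and staging (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    CREATE TABLE stg_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    INSERT INTO src_orders VALUES
        (1, 10, 100.0, '2024-01-01'),
        (2, 11, 250.0, '2024-01-01'),
        (3, 10,  75.5, '2024-01-02');
    -- Staging is missing order 3 and carries a wrong amount for order 2.
    INSERT INTO stg_orders VALUES
        (1, 10, 100.0, '2024-01-01'),
        (2, 11, 205.0, '2024-01-01');
""")

def minus(conn, left, right):
    """Return rows present in `left` but not in `right`, comparing all columns."""
    # Table names are interpolated only because they are fixed literals in this sketch.
    sql = f"SELECT * FROM {left} EXCEPT SELECT * FROM {right} ORDER BY order_id"
    return conn.execute(sql).fetchall()

missing_in_staging = minus(conn, "src_orders", "stg_orders")
extra_in_staging = minus(conn, "stg_orders", "src_orders")

print("In source but not staging:", missing_in_staging)
print("In staging but not source:", extra_in_staging)
```

Running the comparison in both directions matters: a one-way minus query would not reveal rows that exist only in staging.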

🔹 Scenario-Based Real-Time Questions (16–35)

  16. Scenario: Source has 1M records, target has 980K. What do you check?
    Filters, rejected rows, and deduplication logic that could account for the 20K difference.
  17. How do you test NULL handling?
    Validate mandatory columns and default values.
  18. What is SCD Type 1?
    Overwrites old values without keeping history.
  19. What is SCD Type 2?
    Maintains history using effective dates and current_flag.
  20. Scenario: Customer address changes. What do you test?
    Creation of a new SCD2 row and expiry of the old row.
  21. What is surrogate key testing?
    Ensure keys are unique and non-null.
  22. What is a late-arriving dimension scenario?
    The fact arrives before its dimension row; validate dummy key usage.
  23. Scenario: Duplicate records in target but not in source. Why?
    Many-to-many joins or missing dedup logic.
  24. How do you test aggregation logic?
    Compare aggregated source data with the target.
  25. What is hashing used for?
    Change detection and deduplication.
  26. How do you test CDC (Change Data Capture)?
    Validate that only delta records are processed.
  27. Scenario: Negative amount appears in report. What test?
    Business rule validation.
  28. What is referential integrity testing?
    Fact keys must exist in dimension tables.
  29. Scenario: Data type mismatch error. What do you test?
    Casting and truncation rules.
  30. How do you test date transformations?
    Time zone and format validation.
  31. What is soft delete testing?
    Validate that delete_flag is set instead of a physical delete.
  32. Scenario: ETL job rerun creates duplicates. Why?
    Restartability logic failure.
  33. What is schema drift?
    A source schema change that affects the ETL.
  34. How do you validate BI reports?
    Reconcile reports with DW tables.
  35. What is data masking testing?
    Validate that PII fields are masked.
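
The hashing and CDC answers above can be sketched as follows: hash the tracked attributes of each row and compare hashes between the previous snapshot and the current extract to find the delta. This is an illustrative sketch only; the column names, sample customers, and the choice of SHA-256 are assumptions, and a real pipeline would follow whatever hashing rule its S2T document specifies.

```python
import hashlib

TRACKED = ["name", "city"]  # hypothetical attribute columns used for change detection

def row_hash(row, columns):
    """Hash the tracked columns of a row.

    '|' separates values and '' stands in for NULL, so a NULL and an empty
    string hash identically in this simplified version.
    """
    payload = "|".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Previous snapshot and current extract, keyed by the business key cust_id.
previous = {
    101: {"name": "Asha", "city": "Pune"},
    102: {"name": "Ravi", "city": "Delhi"},
}
current = {
    101: {"name": "Asha", "city": "Mumbai"},    # changed attribute
    102: {"name": "Ravi", "city": "Delhi"},     # unchanged
    103: {"name": "Meera", "city": "Chennai"},  # new key
}

def detect_delta(previous, current, columns):
    """Return (new keys, changed keys); unchanged rows are excluded from the delta."""
    inserts = sorted(k for k in current if k not in previous)
    updates = sorted(
        k for k in current
        if k in previous
        and row_hash(current[k], columns) != row_hash(previous[k], columns)
    )
    return inserts, updates

inserts, updates = detect_delta(previous, current, TRACKED)
print("New keys:", inserts)
print("Changed keys:", updates)
```

A CDC test would then confirm that the ETL processed exactly these keys and left unchanged rows such as 102 untouched.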

🔹 Advanced Real-Time ETL Scenarios (36–55)

  36. Scenario: ETL job misses SLA. What do you analyze?
    Query plans, indexing, and partitions.
  37. How do you test performance tuning?
    Validate join strategy and execution plans.
  38. Scenario: Many-to-many join inflates data. What test?
    Join cardinality validation.
  39. How do you validate window functions?
    Check ranking and dedup logic.
  40. Scenario: Late data impacts aggregates. What do you do?
    Recalculate the impacted partitions.
  41. How do you test audit/control tables?
    Validate batch status and row counts.
  42. Scenario: Report mismatch with DW. First step?
    Verify S2T and aggregation logic.
  43. How do you test ETL rollback?
    Ensure partial loads are reverted.
  44. Scenario: Parallel jobs cause deadlocks. What test?
    Concurrency and locking validation.
  45. How do you test historical reloads?
    Check for duplication and verify SCD logic.
  46. Scenario: Source file arrives late. What test?
    Dependency and rerun logic.
  47. How do you validate checksum/hash totals?
    Compare source vs target hashes.
  48. Scenario: Currency conversion mismatch. What to validate?
    Exchange rate tables and conversion logic.
  49. How do you test cloud ETL pipelines?
    Scalability, cost, and performance.
  50. Scenario: NULLs after LEFT JOIN. Why?
    Missing dimension records.
  51. How do you test archival logic?
    Validate data movement to history tables.
  52. Scenario: Incorrect SCD expiry date. Root cause?
    An issue in the effective date logic.
  53. How do you validate metadata tables?
    Batch IDs, timestamps, and counts.
  54. Scenario: Unexpected data spike. What test?
    Source anomaly validation.
  55. How do you perform real-time end-to-end ETL testing?
    Source → DW → Report reconciliation.
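
The audit/control table and metadata answers above can be sketched as a check that the row count logged for each successful batch matches the rows actually loaded. The etl_batch_control table, its columns, and the batch_id column on fact_sales are hypothetical names chosen for this example, and sqlite3 is used only so the sketch runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- etl_batch_control and fact_sales.batch_id are hypothetical names.
    CREATE TABLE etl_batch_control (batch_id INTEGER, status TEXT, rows_loaded INTEGER);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, batch_id INTEGER);
    INSERT INTO etl_batch_control VALUES (1, 'SUCCESS', 2), (2, 'SUCCESS', 3);
    -- Batch 2 logged 3 rows but only 2 were loaded.
    INSERT INTO fact_sales VALUES
        (1, 1, 100.0, 1), (2, 2, 250.0, 1),
        (3, 1,  75.5, 2), (4, 2,  20.0, 2);
""")

# Batches whose logged row count differs from the rows present in the fact table.
mismatched_batches = conn.execute("""
    SELECT c.batch_id, c.rows_loaded, COUNT(f.order_id) AS actual_rows
    FROM etl_batch_control c
    LEFT JOIN fact_sales f ON f.batch_id = c.batch_id
    WHERE c.status = 'SUCCESS'
    GROUP BY c.batch_id, c.rows_loaded
    HAVING COUNT(f.order_id) <> c.rows_loaded
    ORDER BY c.batch_id
""").fetchall()

print("(batch_id, logged_rows, actual_rows):", mismatched_batches)
```

The LEFT JOIN keeps batches that loaded no rows at all, which an inner join would silently drop from the check.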

Real SQL Query Examples for ETL Validation

Sample Tables

  • src_orders(order_id, cust_id, amount, order_dt)
  • dim_customer(cust_sk, cust_id, current_flag)
  • fact_sales(order_id, cust_sk, amount, load_dt)

1️⃣ Record Count Validation

SELECT COUNT(*) FROM src_orders;

SELECT COUNT(*) FROM fact_sales;
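
Raw counts rarely match when the ETL rejects rows, so a count check usually reconciles source = target + rejects. The sketch below assumes a hypothetical reject table named rej_orders and made-up sample rows; it runs against an in-memory SQLite database purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);
    -- rej_orders is a hypothetical reject table.
    CREATE TABLE rej_orders (order_id INTEGER, reject_reason TEXT);
    INSERT INTO src_orders VALUES
        (1, 10,   100.0, '2024-01-01'),
        (2, 11,   250.0, '2024-01-01'),
        (3, 10,    75.5, '2024-01-02'),
        (4, NULL,  40.0, '2024-01-02');
    INSERT INTO fact_sales VALUES
        (1, 1, 100.0, '2024-01-03'),
        (2, 2, 250.0, '2024-01-03'),
        (3, 1,  75.5, '2024-01-03');
    INSERT INTO rej_orders VALUES (4, 'cust_id is NULL');
""")

def count_rows(conn, table):
    # Table names are fixed literals in this sketch, so interpolation is safe here.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

src_count = count_rows(conn, "src_orders")
tgt_count = count_rows(conn, "fact_sales")
rej_count = count_rows(conn, "rej_orders")

# Every source row should be either loaded or rejected with a reason.
counts_reconcile = src_count == tgt_count + rej_count
print(src_count, tgt_count, rej_count, counts_reconcile)
```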

2️⃣ JOIN Validation

SELECT COUNT(*) AS missing_dim
FROM fact_sales f
LEFT JOIN dim_customer d
ON f.cust_sk = d.cust_sk
WHERE d.cust_sk IS NULL;
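
To see the orphan check in action, the sketch below loads a few made-up rows into an in-memory SQLite database and lists the offending fact rows instead of only counting them. The sample data is an assumption; the query shape is the same LEFT JOIN pattern as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, current_flag TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);
    INSERT INTO dim_customer VALUES (1, 10, 'Y'), (2, 11, 'Y');
    -- Order 3 points at cust_sk 99, which has no dimension row.
    INSERT INTO fact_sales VALUES
        (1, 1,  100.0, '2024-01-03'),
        (2, 2,  250.0, '2024-01-03'),
        (3, 99,  75.5, '2024-01-03');
""")

# Fact rows whose surrogate key has no match in the dimension.
orphan_facts = conn.execute("""
    SELECT f.order_id, f.cust_sk
    FROM fact_sales f
    LEFT JOIN dim_customer d ON f.cust_sk = d.cust_sk
    WHERE d.cust_sk IS NULL
    ORDER BY f.order_id
""").fetchall()

print("Orphan fact rows:", orphan_facts)
```

Listing the keys rather than a bare count makes the defect report actionable, since the developer can trace the exact orders involved.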

3️⃣ GROUP BY Aggregation

SELECT cust_id, SUM(amount)
FROM src_orders
GROUP BY cust_id;
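
The source-side aggregate only becomes a test once it is compared with the same aggregate on the target. A minimal sketch, assuming made-up sample rows and a small tolerance for floating-point amounts, joins fact_sales back to dim_customer to recover cust_id and reports customers whose totals differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, current_flag TEXT);
    CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);
    INSERT INTO src_orders VALUES
        (1, 10, 100.0, '2024-01-01'),
        (2, 11, 250.0, '2024-01-01'),
        (3, 10,  75.5, '2024-01-02');
    INSERT INTO dim_customer VALUES (1, 10, 'Y'), (2, 11, 'Y');
    -- Order 3 was loaded as 57.5 instead of 75.5 (digits transposed).
    INSERT INTO fact_sales VALUES
        (1, 1, 100.0, '2024-01-03'),
        (2, 2, 250.0, '2024-01-03'),
        (3, 1,  57.5, '2024-01-03');
""")

# Customers whose source total is missing from, or different in, the target.
# Customers that exist only in the target would need the reverse comparison.
mismatched_totals = conn.execute("""
    WITH src AS (
        SELECT cust_id, SUM(amount) AS total FROM src_orders GROUP BY cust_id
    ),
    tgt AS (
        SELECT d.cust_id, SUM(f.amount) AS total
        FROM fact_sales f
        JOIN dim_customer d ON d.cust_sk = f.cust_sk
        GROUP BY d.cust_id
    )
    SELECT s.cust_id, s.total AS src_total, t.total AS tgt_total
    FROM src s
    LEFT JOIN tgt t ON t.cust_id = s.cust_id
    WHERE t.total IS NULL OR ABS(s.total - t.total) > 0.005
    ORDER BY s.cust_id
""").fetchall()

print("(cust_id, src_total, tgt_total):", mismatched_totals)
```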

4️⃣ Window Function – Deduplication

SELECT order_id, cust_id, amount, order_dt
FROM (
  SELECT order_id, cust_id, amount, order_dt,
         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) AS rn
  FROM src_orders
) t
WHERE rn = 1;
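
A deduplication test usually has two parts: find which keys are duplicated, then confirm that the row kept is the latest one. The sketch below shows both against made-up rows in an in-memory SQLite database; it assumes a SQLite build with window-function support (3.25 or later), which recent Python releases normally bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
    -- Order 1 arrives twice; the later row carries the corrected amount.
    INSERT INTO src_orders VALUES
        (1, 10, 100.0, '2024-01-01'),
        (1, 10, 120.0, '2024-01-02'),
        (2, 11, 250.0, '2024-01-01');
""")

# Step 1: which keys occur more than once, and how often?
duplicate_keys = conn.execute("""
    SELECT order_id, COUNT(*) AS copies
    FROM src_orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: keep only the most recent row per order_id.
deduped_rows = conn.execute("""
    SELECT order_id, cust_id, amount, order_dt
    FROM (
        SELECT order_id, cust_id, amount, order_dt,
               ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) AS rn
        FROM src_orders
    ) t
    WHERE rn = 1
    ORDER BY order_id
""").fetchall()

print("Duplicated keys:", duplicate_keys)
print("Rows after dedup:", deduped_rows)
```

The deduplicated result is what the target should contain, so it can be compared with the loaded table using a minus query.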

5️⃣ Performance Tuning Validation

EXPLAIN
SELECT *
FROM fact_sales
WHERE load_dt >= CURRENT_DATE - 1;
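
Execution-plan output differs by database, so the following is only a SQLite-flavoured sketch of the idea: capture the plan before and after adding an index on load_dt and check that the filter stops scanning the whole table. EXPLAIN QUERY PLAN and date('now', '-1 day') are SQLite syntax, and the index name idx_fact_sales_load_dt is a hypothetical choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT)"
)

# SQLite equivalent of filtering on rows loaded since yesterday.
QUERY = "SELECT * FROM fact_sales WHERE load_dt >= date('now', '-1 day')"

def query_plan(conn, sql):
    """Return the human-readable detail text of each plan step."""
    # The detail string is the last column of every EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = query_plan(conn, QUERY)

# Hypothetical index supporting the load_dt range filter.
conn.execute("CREATE INDEX idx_fact_sales_load_dt ON fact_sales (load_dt)")
plan_after = query_plan(conn, QUERY)

uses_index = any("idx_fact_sales_load_dt" in step for step in plan_after)
print("Before:", plan_before)
print("After: ", plan_after)
print("Index used:", uses_index)
```

The exact wording of the plan steps varies between SQLite versions, so the check looks for the index name rather than matching the full text.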


ETL Tools Asked in Real-Time Interviews

  • Informatica
  • Microsoft SQL Server Integration Services
  • Ab Initio
  • Pentaho
  • Talend

ETL Defect Examples + Test Case Sample

Defect Example

Issue: Duplicate fact records

  • Root Cause: Incorrect join logic
  • Fix: Correct join keys and add deduplication

Sample Test Case

  • Test Case: Validate SCD2 update
  • Expected Result:
    • Old record expired
    • New record inserted with current_flag = 'Y'
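
The expected result above can be turned into automated checks. The sketch below assumes dim_customer also carries city, eff_start_dt, and eff_end_dt columns and uses '9999-12-31' as the open-ended date; these are hypothetical conventions for the example. It verifies that each customer has exactly one current row and that no two versions overlap in time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- city, eff_start_dt and eff_end_dt are hypothetical columns for this sketch.
    CREATE TABLE dim_customer (
        cust_sk INTEGER, cust_id INTEGER, city TEXT,
        eff_start_dt TEXT, eff_end_dt TEXT, current_flag TEXT
    );
    INSERT INTO dim_customer VALUES
        (1, 10, 'Pune',   '2023-01-01', '2024-03-31', 'N'),
        (2, 10, 'Mumbai', '2024-04-01', '9999-12-31', 'Y'),
        (3, 11, 'Delhi',  '2023-06-01', '9999-12-31', 'Y');
""")

def scd2_violations(conn):
    """Return SCD2 rule violations; both lists are empty when the dimension is healthy."""
    not_one_current = conn.execute("""
        SELECT cust_id
        FROM dim_customer
        GROUP BY cust_id
        HAVING SUM(CASE WHEN current_flag = 'Y' THEN 1 ELSE 0 END) <> 1
        ORDER BY cust_id
    """).fetchall()
    # ISO-8601 date strings compare correctly as plain text.
    overlapping = conn.execute("""
        SELECT a.cust_id, a.cust_sk, b.cust_sk
        FROM dim_customer a
        JOIN dim_customer b
          ON a.cust_id = b.cust_id AND a.cust_sk < b.cust_sk
        WHERE a.eff_start_dt <= b.eff_end_dt
          AND b.eff_start_dt <= a.eff_end_dt
        ORDER BY a.cust_id, a.cust_sk
    """).fetchall()
    return {"not_one_current": not_one_current, "overlapping": overlapping}

clean = scd2_violations(conn)

# Simulate the defect: the old row was never expired when the new one was inserted.
conn.execute(
    "UPDATE dim_customer SET current_flag = 'Y', eff_end_dt = '9999-12-31' WHERE cust_sk = 1"
)
broken = scd2_violations(conn)

print("Healthy dimension:", clean)
print("After simulated defect:", broken)
```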

Real-Time ETL Testing Quick Revision Sheet

  • Validate counts, sums, NULLs, duplicates
  • Check S2T mapping thoroughly
  • Test SCD1, SCD2, audit fields, hashing
  • Use JOIN, GROUP BY, window functions
  • Always reconcile source vs target vs reports

FAQs

Q1. What are real-time ETL testing interview questions?
They focus on production-level ETL issues like mismatches, reruns, and performance.

Q2. Which SQL is mandatory for ETL testers?
JOINs, GROUP BY, window functions, and performance tuning queries.

Q3. How do you test SCD2 in real-time projects?
By validating history rows, effective dates, and current flags.
