Deloitte ETL Testing Interview Questions – Real-Time QA Guide with Answers

1. What is ETL Testing? (Definition + Real Example)

ETL Testing validates that data is correctly Extracted, Transformed, and Loaded from multiple source systems into a Data Warehouse (DW) while meeting business rules, data quality standards, audit requirements, and performance SLAs.

Deloitte-Style Real-World Example

In a Deloitte analytics engagement for a finance client:

  • Source: Core banking OLTP, CRM, flat files
  • Target: Enterprise DW used for regulatory and management reporting
  • ETL testing validates:
    • Source-to-target row counts
    • Transformation logic (currency conversion, risk scoring)
    • SCD Type 1 & Type 2 dimension behavior
    • Audit fields (batch_id, load_ts, source_system)
    • Performance & reconciliation for daily regulatory loads

Deloitte interviewers put heavy weight on conceptual clarity, SQL depth, and scenario-based thinking.


2. DW Flow – Source → Staging → Transform → Load → Reporting

  1. Source Layer – OLTP DBs, APIs, files
  2. Staging Layer – Raw data landing (minimal checks)
  3. Transformation Layer – Business rules, joins, aggregations, SCD handling
  4. Load Layer – Fact & dimension tables
  5. Reporting Layer – BI tools, dashboards, regulatory reports

Testing focus: S2T mapping validation, reconciliation, referential integrity, aggregates, and SLA adherence.
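S2T mapping validation usually means recomputing each target column from its source columns and comparing the result with what was loaded. The sketch below assumes a hypothetical currency-conversion rule similar to the finance example above: dw_txn.amount_usd = ROUND(src_txn.amount × fx_rate.rate_to_usd, 2). The table names, rates, and rule are all illustrative, and Python's built-in sqlite3 stands in for the warehouse. The same SQL pattern should carry over to other databases.

```python
import sqlite3

# In-memory database standing in for the warehouse (hypothetical tables and data)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_txn (txn_id INTEGER PRIMARY KEY, amount REAL, currency TEXT);
CREATE TABLE fx_rate (currency TEXT PRIMARY KEY, rate_to_usd REAL);
CREATE TABLE dw_txn  (txn_id INTEGER PRIMARY KEY, amount_usd REAL);

INSERT INTO src_txn VALUES (1, 100.00, 'EUR'), (2, 50.00, 'GBP'), (3, 10.00, 'USD');
INSERT INTO fx_rate VALUES ('EUR', 1.10), ('GBP', 1.25), ('USD', 1.00);
-- txn 2 was loaded with the wrong rate, and txn 3 never reached the target
INSERT INTO dw_txn VALUES (1, 110.00), (2, 55.00);
""")

# Recompute the expected value from the source and compare it with the target.
# A small tolerance avoids false alarms from floating-point rounding.
failures = conn.execute("""
    SELECT s.txn_id,
           ROUND(s.amount * r.rate_to_usd, 2) AS expected_usd,
           t.amount_usd                       AS actual_usd
    FROM src_txn s
    JOIN fx_rate r ON r.currency = s.currency
    LEFT JOIN dw_txn t ON t.txn_id = s.txn_id
    WHERE t.txn_id IS NULL
       OR ABS(t.amount_usd - ROUND(s.amount * r.rate_to_usd, 2)) > 0.005
    ORDER BY s.txn_id
""").fetchall()

print(failures)  # flags txn 2 (wrong amount) and txn 3 (missing in target)
```

The inner join to fx_rate silently skips any source row whose currency has no rate. In a real suite, that case would need its own lookup check.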


3. ETL Architecture – Deloitte QA Perspective

  • ETL Tool Layer (Informatica / SSIS / Ab Initio)
  • Metadata & Mapping Layer (S2T, business rules)
  • Control Tables (job status, row counts, rejects)
  • Audit & Reconciliation Framework
  • Parallel Processing / Partitioning

ETL testers ensure accuracy, completeness, restartability, and performance.
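One common completeness check against control tables works as follows: every extracted row must end up either loaded or rejected. The sketch below assumes a hypothetical etl_control table (batch_id, job_status, source_count, reject_count) written by the ETL job. It recounts the rows actually loaded per batch_id instead of trusting a stored target count. sqlite3 is only a stand-in, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical control table written by the ETL job after each batch
CREATE TABLE etl_control (batch_id INTEGER PRIMARY KEY, job_status TEXT,
                          source_count INTEGER, reject_count INTEGER);
CREATE TABLE fact_sales (order_sk INTEGER, sales_amt REAL, batch_id INTEGER);

INSERT INTO etl_control VALUES (101, 'SUCCESS', 3, 1), (102, 'SUCCESS', 2, 0);
INSERT INTO fact_sales VALUES (1, 10.0, 101), (2, 20.0, 101),   -- 2 loaded + 1 reject = 3
                              (3, 30.0, 102);                   -- only 1 of 2 loaded
""")

# Completeness rule: source_count = rows actually loaded + rejected rows
unbalanced = conn.execute("""
    SELECT c.batch_id, c.source_count, COUNT(f.order_sk) AS target_count, c.reject_count
    FROM etl_control c
    LEFT JOIN fact_sales f ON f.batch_id = c.batch_id
    WHERE c.job_status = 'SUCCESS'
    GROUP BY c.batch_id, c.source_count, c.reject_count
    HAVING c.source_count <> COUNT(f.order_sk) + c.reject_count
""").fetchall()

print(unbalanced)  # [(102, 2, 1, 0)]
```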


4. Deloitte ETL Testing Interview Questions & Answers (Basic → Advanced)

🔹 Basic ETL Testing Questions (1–15)

  1. What is ETL testing?
    Validation of extract, transform, and load processes.
  2. Why is ETL testing critical in Deloitte projects?
    Deloitte works on compliance-heavy and data-driven programs where data errors have financial and legal impact.
  3. What is a data warehouse?
    A centralized repository optimized for analytics and reporting.
  4. What is a staging area?
    Temporary storage for raw extracted data.
  5. What is S2T mapping?
    Source-to-Target document defining column mappings and transformation logic.
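  6. What is a fact table?
    Stores numeric business measures, plus foreign keys to its dimensions.
  7. What is a dimension table?
    Stores descriptive attributes used to filter and group facts.
  8. What are audit fields?
    Technical columns such as batch_id, load_date, record_source, and checksum that record how and when each row was loaded.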
  9. What is data reconciliation?
    Comparing source and target data to ensure completeness.
  10. What is full load?
    Reloading all records into target tables.
  11. What is incremental load?
    Loading only changed or new records.
  12. What is primary key testing?
    Validating uniqueness and non-null values.
  13. What is reject data?
    Records failing validation rules.
  14. What is data profiling?
    Analyzing source data patterns before ETL.
  15. What is null validation?
    Ensuring null handling follows business rules.
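Primary key testing usually comes down to two queries: one finds duplicate keys and one finds NULL keys. The sketch below assumes a hypothetical dim_customer table created without a PK constraint, so the bad rows can actually be loaded. sqlite3 stands in for the target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target table with no PK constraint, loaded with bad data on purpose
conn.executescript("""
CREATE TABLE dim_customer (cust_sk INTEGER, cust_name TEXT);
INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ben'), (2, 'Ben'), (NULL, 'Unknown');
""")

# Uniqueness: any key that appears more than once
duplicate_keys = conn.execute("""
    SELECT cust_sk, COUNT(*) AS cnt
    FROM dim_customer
    WHERE cust_sk IS NOT NULL
    GROUP BY cust_sk
    HAVING COUNT(*) > 1
""").fetchall()

# Non-null: key columns must never be NULL
null_keys = conn.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE cust_sk IS NULL"
).fetchone()[0]

print(duplicate_keys, null_keys)  # [(2, 2)] 1
```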

🔹 Intermediate ETL QA Questions (16–35)

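  16. Explain SCD Type 1.
    Overwrites existing dimension attributes in place, so no history is kept.
  17. Explain SCD Type 2.
    Maintains history by expiring the current row and inserting a new version, tracked with effective dates and a current flag.
  18. How do you test SCD2 logic?
    Validate the new row insert, the old row's expiry (end date set, current_flag = 'N'), and that only one current row exists per business key.
  19. What is a surrogate key?
    A system-generated unique identifier, independent of the source business key.
  20. How do you validate surrogate keys?
    Check uniqueness, non-null values, and sequence integrity.
  21. What is CDC?
    Change Data Capture: identifying inserted, updated, and deleted source records so only the delta is processed.
  22. How do you test CDC?
    Compare before/after source snapshots with the captured delta and its counts.
  23. What is referential integrity testing?
    Ensuring every foreign key in a fact table exists as a primary key in the related dimension.
  24. What is aggregation testing?
    Validating SUM, COUNT, and AVG logic.
  25. What is lookup testing?
    Verifying that reference data is mapped accurately.
  26. What is deduplication?
    Removing duplicate records for the same business key.
  27. How do you test deduplication?
    Using GROUP BY ... HAVING COUNT(*) > 1 or window functions such as ROW_NUMBER().
  28. What is a late-arriving dimension?
    A fact arrives before its dimension record. It is typically linked to a placeholder ("unknown") member and corrected once the dimension arrives.
  29. What is data skew?
    Uneven data distribution across partitions that degrades performance.
  30. What is restartability testing?
    Ensuring the ETL resumes correctly after a failure without losing or duplicating data.
  31. What is metadata testing?
    Validating column names, data types, and lengths.
  32. What is data lineage?
    Tracing data from source to report.
  33. What is threshold testing?
    Verifying that a job fails when its reject count exceeds a defined limit.
  34. What is hashing in ETL?
    Comparing hash values of row contents to detect changed records efficiently.
  35. What is SLA testing?
    Validating that ETL jobs complete within agreed time limits.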
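A referential integrity test is an anti-join: it finds fact rows whose foreign key has no matching dimension primary key. The sketch uses hypothetical dim_customer and fact_sales tables in sqlite3. A NULL foreign key also fails the join, so it is reported as an orphan as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (cust_sk INTEGER PRIMARY KEY, cust_id TEXT);
CREATE TABLE fact_sales (order_sk INTEGER, cust_sk INTEGER, sales_amt REAL);

INSERT INTO dim_customer VALUES (1, 'C001'), (2, 'C002');
-- order 12 points at a customer key that does not exist in the dimension
INSERT INTO fact_sales VALUES (10, 1, 100.0), (11, 2, 50.0), (12, 99, 75.0);
""")

# Orphan facts: FK values with no matching dimension PK
orphans = conn.execute("""
    SELECT f.order_sk, f.cust_sk
    FROM fact_sales f
    LEFT JOIN dim_customer d ON d.cust_sk = f.cust_sk
    WHERE d.cust_sk IS NULL
""").fetchall()

print(orphans)  # [(12, 99)]
```

If the design routes late-arriving dimensions to a placeholder "unknown" member, this query should return nothing. A separate check on how many facts point at that placeholder then becomes the useful signal.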

🔹 Advanced / Scenario-Based Questions (36–50)

  36. How do you handle a record count mismatch?
    Check filters, join conditions, rejected records, and CDC logic.
  37. How do you validate null handling?
    Verify that nulls are defaulted, rejected, or allowed as the business rules specify.
  38. How do you test ETL performance?
    Run jobs with production-like volumes, compare run times with the SLA, and review partitioning, indexing, and parallelism.
  39. How do you test incremental loads?
    Validate the watermark logic: only records changed since the last successful load are picked up.
  40. How do you test multi-source joins?
    Validate join keys and cardinality.
  41. How do you test aggregation failures?
    Recalculate totals independently in SQL and compare them with the target.
  42. How do you test timezone conversions?
    Validate that timestamps are converted correctly across zones, including daylight-saving boundaries.
  43. How do you test re-runs?
    Ensure idempotency: rerunning the same batch creates no duplicates.
  44. How do you test historical data loads?
    Validate backdated inserts.
  45. How do you validate audit tables?
    Compare source_count vs target_count for each batch.
  46. What causes ETL performance bottlenecks?
    Large joins, data skew, and missing indexes.
  47. How do you test file-based ETL?
    Validate header/footer records, delimiters, and encoding.
  48. How do you test schema changes?
    Run backward-compatibility checks.
  49. How do you validate reporting-layer data?
    Compare BI report totals with DW aggregates.
  50. Describe a critical ETL defect you found.
    Example: an SCD2 record that was not expired, or an incorrect aggregation.
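A re-run test loads the same batch twice and checks that row counts, distinct keys, and totals do not change. The sketch compares two hypothetical load functions. load_append simply inserts again. load_delete_insert first removes whatever that batch_id loaded and then reinserts it, which is one common way to make a load idempotent (MERGE/upsert on the business key is another). The batch data and function names are illustrative only.

```python
import sqlite3

# Hypothetical batch of (order_id, sales_amt, batch_id) rows
BATCH = [(1, 100.0, 7), (2, 250.0, 7), (3, 75.5, 7)]

def new_target():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (order_id INTEGER, sales_amt REAL, batch_id INTEGER)")
    return conn

def load_append(conn, rows):
    # Not idempotent: every run inserts the same rows again
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

def load_delete_insert(conn, rows):
    # Idempotent: clear what this batch loaded previously, then insert it again
    batch_ids = {r[2] for r in rows}
    conn.executemany("DELETE FROM fact_sales WHERE batch_id = ?", [(b,) for b in batch_ids])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

def snapshot(conn):
    # Row count, distinct business keys, and total amount after a run
    return conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT order_id), SUM(sales_amt) FROM fact_sales"
    ).fetchone()

def is_rerun_safe(load):
    conn = new_target()
    load(conn, BATCH)
    first = snapshot(conn)
    load(conn, BATCH)  # simulate a re-run of the same batch
    return snapshot(conn) == first

print(is_rerun_safe(load_append), is_rerun_safe(load_delete_insert))  # False True
```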

5. Real SQL Query Examples for ETL Validation

Sample Tables

src_orders(order_id, cust_id, amount, order_date)
fact_sales(order_sk, cust_sk, sales_amt, order_date, batch_id)

JOIN Validation (Missing Records)

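SELECT COUNT(*) AS missing_in_target
FROM src_orders s
LEFT JOIN fact_sales f
  ON s.order_id = f.order_sk  -- assumes order_sk is populated from order_id; otherwise join on the business key
WHERE f.order_sk IS NULL;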

GROUP BY Aggregation

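-- Source totals per day
SELECT order_date, SUM(amount) AS src_total
FROM src_orders
GROUP BY order_date
ORDER BY order_date;

-- Target totals per day (should match the source)
SELECT order_date, SUM(sales_amt) AS tgt_total
FROM fact_sales
GROUP BY order_date
ORDER BY order_date;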
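Comparing two result sets by eye does not scale well. A single query can combine both sides and return only the dates whose totals differ, including dates that exist on only one side. The sketch reuses the sample table names above, with made-up rows and sqlite3 as a stand-in. UNION ALL is used in place of FULL OUTER JOIN, which older SQLite versions do not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_date TEXT);
CREATE TABLE fact_sales (order_sk INTEGER, cust_sk INTEGER, sales_amt REAL,
                         order_date TEXT, batch_id INTEGER);

INSERT INTO src_orders VALUES
  (1, 10, 100.0, '2024-01-01'), (2, 11, 50.0, '2024-01-01'),
  (3, 10, 80.0, '2024-01-02'), (4, 12, 20.0, '2024-01-03');
-- wrong amount on 2024-01-02, and 2024-01-03 is missing from the target
INSERT INTO fact_sales VALUES
  (1, 1, 100.0, '2024-01-01', 7), (2, 2, 50.0, '2024-01-01', 7),
  (3, 1, 70.0, '2024-01-02', 7);
""")

# Stack both sides, then sum per date; only unequal totals survive the HAVING
mismatches = conn.execute("""
    SELECT order_date,
           SUM(src_amt) AS src_total,
           SUM(tgt_amt) AS tgt_total
    FROM (
        SELECT order_date, amount AS src_amt, 0 AS tgt_amt FROM src_orders
        UNION ALL
        SELECT order_date, 0, sales_amt FROM fact_sales
    ) u
    GROUP BY order_date
    HAVING ABS(SUM(src_amt) - SUM(tgt_amt)) > 0.005
    ORDER BY order_date
""").fetchall()

print(mismatches)  # flags 2024-01-02 (amount differs) and 2024-01-03 (missing in target)
```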

Window Function – De-duplication

-- Keep only the latest version of each order_id
SELECT *
FROM (
  SELECT s.*,
         ROW_NUMBER() OVER
         (PARTITION BY order_id ORDER BY load_date DESC) rn
  FROM stage_orders s
) t
WHERE rn = 1;

Performance Validation

-- EXPLAIN ANALYZE is PostgreSQL/MySQL syntax: it runs the query and reports actual timings
EXPLAIN ANALYZE
SELECT cust_sk, SUM(sales_amt)
FROM fact_sales
GROUP BY cust_sk;


6. Scenario-Based ETL Testing Use Cases

Scenario | Validation Approach
--- | ---
Record mismatch | Source vs target counts
Null values | Default or reject logic
Duplicate records | Window functions
Late-arriving data | SCD2 backdated insert
Slow job | Partitioning & indexing

7. ETL Tools Commonly Asked in Deloitte Interviews

  • Informatica
  • Microsoft SSIS
  • Ab Initio
  • Pentaho
  • Talend

Deloitte focuses more on ETL concepts, SQL depth, and data quality frameworks than tool-specific syntax.


8. ETL Defect Examples + Sample Test Case

Defect: SCD2 record not expiring

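  • Expected: old row current_flag = 'N'
  • Actual: two active records (current_flag = 'Y') for the same business key
  • Severity: High

Sample Test Case:

  • Update a tracked (SCD2) attribute of an existing business key in the source and run the load
  • Validate that a new row is inserted and the old row is expired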
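The defect above can be caught with two queries after the load. The first finds any business key that does not have exactly one current row. The second finds any expired row that still carries an open end date. The sketch assumes a hypothetical dim_customer SCD2 table and uses '9999-12-31' as the open-ended end date. That convention is common but is an assumption here, as are the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical SCD2 customer dimension
conn.executescript("""
CREATE TABLE dim_customer (
    cust_sk INTEGER PRIMARY KEY, cust_id TEXT, city TEXT,
    eff_start_date TEXT, eff_end_date TEXT, current_flag TEXT
);
INSERT INTO dim_customer VALUES
  (1, 'C001', 'Pune',   '2023-01-01', '2024-03-31', 'N'),
  (2, 'C001', 'Mumbai', '2024-04-01', '9999-12-31', 'Y'),
  -- defect: C002's old row was never expired
  (3, 'C002', 'Delhi',  '2023-01-01', '9999-12-31', 'Y'),
  (4, 'C002', 'Noida',  '2024-05-01', '9999-12-31', 'Y');
""")

# Each business key must have exactly one current row
bad_current = conn.execute("""
    SELECT cust_id, SUM(CASE WHEN current_flag = 'Y' THEN 1 ELSE 0 END) AS current_rows
    FROM dim_customer
    GROUP BY cust_id
    HAVING SUM(CASE WHEN current_flag = 'Y' THEN 1 ELSE 0 END) <> 1
""").fetchall()

# Expired rows must carry a real end date, not the open-ended high date
bad_expiry = conn.execute("""
    SELECT cust_sk FROM dim_customer
    WHERE current_flag = 'N' AND (eff_end_date IS NULL OR eff_end_date = '9999-12-31')
""").fetchall()

print(bad_current, bad_expiry)  # [('C002', 2)] []
```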

9. ETL Testing Quick Revision Sheet

  • ETL flow & architecture
  • S2T mapping validation
  • SCD1 vs SCD2
  • SQL JOIN, GROUP BY, window functions
  • Performance tuning & reconciliation

10. FAQs

Q1. Does Deloitte ask SQL in ETL testing interviews?
Yes. SQL validation is mandatory.

Q2. Is Informatica mandatory for Deloitte ETL roles?
No. Strong ETL concepts matter more than tools.

Q3. How many ETL interview rounds at Deloitte?
Typically 1–2 technical rounds.
