1. What is ETL Testing? (Definition + Real Example)
ETL Testing validates that data is correctly Extracted, Transformed, and Loaded from multiple source systems into a Data Warehouse (DW) while meeting business rules, data quality standards, audit requirements, and performance SLAs.
Deloitte-Style Real-World Example
In a Deloitte analytics engagement for a finance client:
- Source: Core banking OLTP, CRM, flat files
- Target: Enterprise DW used for regulatory and management reporting
- ETL testing validates:
- Source-to-target row counts
- Transformation logic (currency conversion, risk scoring)
- SCD Type 1 & Type 2 dimension behavior
- Audit fields (batch_id, load_ts, source_system)
- Performance & reconciliation for daily regulatory loads
- Source-to-target row counts
Deloitte interviewers strongly evaluate concept clarity, SQL depth, and scenario-based thinking.
2. DW Flow – Source → Staging → Transform → Load → Reporting
- Source Layer – OLTP DBs, APIs, files
- Staging Layer – Raw data landing (minimal checks)
- Transformation Layer – Business rules, joins, aggregations, SCD handling
- Load Layer – Fact & dimension tables
- Reporting Layer – BI tools, dashboards, regulatory reports
Testing focus: S2T mapping validation, reconciliation, referential integrity, aggregates, and SLA adherence.
3. ETL Architecture – Deloitte QA Perspective
- ETL Tool Layer (Informatica / SSIS / Ab Initio)
- Metadata & Mapping Layer (S2T, business rules)
- Control Tables (job status, row counts, rejects)
- Audit & Reconciliation Framework
- Parallel Processing / Partitioning
ETL testers ensure accuracy, completeness, restartability, and performance.
4. Deloitte ETL Testing Interview Questions & Answers (Basic → Advanced)
🔹 Basic ETL Testing Questions (1–15)
- What is ETL testing?
Validation of extract, transform, and load processes. - Why is ETL testing critical in Deloitte projects?
Deloitte works on compliance-heavy and data-driven programs where data errors have financial and legal impact. - What is a data warehouse?
A centralized repository optimized for analytics and reporting. - What is a staging area?
Temporary storage for raw extracted data. - What is S2T mapping?
Source-to-Target document defining column mappings and transformation logic. - What is a fact table?
Stores numeric business measures. - What is a dimension table?
Stores descriptive attributes. - What are audit fields?
batch_id, load_date, record_source, checksum. - What is data reconciliation?
Comparing source and target data to ensure completeness. - What is full load?
Reloading all records into target tables. - What is incremental load?
Loading only changed or new records. - What is primary key testing?
Validating uniqueness and non-null values. - What is reject data?
Records failing validation rules. - What is data profiling?
Analyzing source data patterns before ETL. - What is null validation?
Ensuring null handling follows business rules.
🔹 Intermediate ETL QA Questions (16–35)
- Explain SCD Type 1.
Overwrites existing dimension data. - Explain SCD Type 2.
Maintains historical records with effective dates and flags. - How do you test SCD2 logic?
Validate new row insert, old row expiry, and current_flag. - What is a surrogate key?
A system-generated unique identifier. - How do you validate surrogate keys?
Check uniqueness, non-null, and sequence integrity. - What is CDC?
Change Data Capture for delta processing. - How do you test CDC?
Compare before/after snapshots and counts. - What is referential integrity testing?
Ensuring fact FK exists in dimension PK. - What is aggregation testing?
Validating SUM, COUNT, AVG logic. - What is lookup testing?
Verifying reference data mapping accuracy. - What is deduplication?
Removing duplicate business keys. - How do you test deduplication?
Using GROUP BY or window functions. - What is a late-arriving dimension?
Fact arrives before dimension. - What is data skew?
Uneven data distribution impacting performance. - What is restartability testing?
Ensuring ETL resumes correctly after failure. - What is metadata testing?
Validating column names, types, lengths. - What is data lineage?
Tracing data from source to report. - What is threshold testing?
Failing jobs when reject count exceeds limits. - What is hashing in ETL?
Detecting data changes efficiently. - What is SLA testing?
Validating ETL job completion within time limits.
🔹 Advanced / Scenario-Based Questions (36–60)
- How do you handle record count mismatch?
Check filters, joins, rejects, CDC logic. - How do you validate null handling?
Verify defaults, rejects, or allowed nulls. - How do you test ETL performance?
Partitioning, indexing, parallelism. - How do you test incremental loads?
Validate watermark logic. - How do you test multi-source joins?
Validate join keys and cardinality. - How do you test aggregation failures?
Recalculate totals using SQL. - How do you test timezone conversions?
Validate timestamps across zones. - How do you test re-runs?
Ensure idempotency (no duplicates). - How do you test historical data loads?
Validate backdated inserts. - How do you validate audit tables?
Compare source_count vs target_count. - What causes ETL performance bottlenecks?
Large joins, data skew, missing indexes. - How do you test file-based ETL?
Header/footer, delimiter, encoding. - How do you test schema changes?
Backward compatibility checks. - How do you validate reporting layer data?
BI totals vs DW aggregates. - Describe a critical ETL defect you found.
Example: SCD2 failure, incorrect aggregation.
5. Real SQL Query Examples for ETL Validation
Sample Tables
src_orders(order_id, cust_id, amount, order_date)
fact_sales(order_sk, cust_sk, sales_amt, order_date, batch_id)
JOIN Validation (Missing Records)
SELECT COUNT(*)
FROM src_orders s
LEFT JOIN fact_sales f
ON s.order_id = f.order_sk
WHERE f.order_sk IS NULL;
GROUP BY Aggregation
SELECT order_date, SUM(amount)
FROM src_orders
GROUP BY order_date;
SELECT order_date, SUM(sales_amt)
FROM fact_sales
GROUP BY order_date;
Window Function – De-duplication
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER
(PARTITION BY order_id ORDER BY load_date DESC) rn
FROM stage_orders
) t
WHERE rn = 1;
Performance Validation
EXPLAIN ANALYZE
SELECT cust_sk, SUM(sales_amt)
FROM fact_sales
GROUP BY cust_sk;
6. Scenario-Based ETL Testing Use Cases
| Scenario | Validation Approach |
| Record mismatch | Source vs target counts |
| Null values | Default or reject logic |
| Duplicate records | Window functions |
| Late-arriving data | SCD2 backdated insert |
| Slow job | Partitioning & indexing |
7. ETL Tools Commonly Asked in Deloitte Interviews
- Informatica
- Microsoft SSIS
- Ab Initio
- Pentaho
- Talend
Deloitte focuses more on ETL concepts, SQL depth, and data quality frameworks than tool-specific syntax.
8. ETL Defect Examples + Sample Test Case
Defect: SCD2 record not expiring
- Expected: old row current_flag = ‘N’
- Actual: two active records
- Severity: High
Sample Test Case:
- Update dimension attribute
- Validate new row insertion and old row expiry
9. ETL Testing Quick Revision Sheet
- ETL flow & architecture
- S2T mapping validation
- SCD1 vs SCD2
- SQL JOIN, GROUP BY, window functions
- Performance tuning & reconciliation
10. FAQs – Featured Snippet Ready
Q1. Does Deloitte ask SQL in ETL testing interviews?
Yes. SQL validation is mandatory.
Q2. Is Informatica mandatory for Deloitte ETL roles?
No. Strong ETL concepts matter more than tools.
Q3. How many ETL interview rounds at Deloitte?
Typically 1–2 technical rounds.
