What Is ETL Testing? (Definition + EY-Style Real Example)
ETL Testing ensures that data extracted from multiple source systems is correctly transformed according to business rules and accurately loaded into a Data Warehouse (DW) for reporting and analytics.
EY-style real-world example
In large consulting programs (like those handled by Ernst & Young), ETL testing often involves:
- Multiple source systems (ERP, CRM, flat files)
- Complex transformations (tax rules, regulatory calculations)
- Strict audit, reconciliation, and data quality controls
An ETL tester validates that financial or regulatory reports exactly match approved business logic and source data.
Data Warehouse Flow: Source → Staging → Transform → Load → Reporting
- Source – OLTP databases, APIs, files
- Staging – Raw data landing (no transformation)
- Transformation – Business rules, joins, SCD logic
- Load – Fact & Dimension tables
- Reporting – BI tools, dashboards, regulatory reports
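The flow above can be sketched end to end, with Python's built-in sqlite3 module standing in for the warehouse. This is a minimal illustration, not a production pipeline. The tables are simplified, the data is made up, and the "drop non-positive amounts" rule is a hypothetical business rule. The sketch shows the two checks testers usually run between layers: staging must match source exactly, and every source row must be either loaded or explained by a rule.

```python
import sqlite3

# Simplified, hypothetical layer tables: src_orders -> stg_orders -> fact_sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
CREATE TABLE stg_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
CREATE TABLE fact_sales (order_id INTEGER, amount REAL, load_dt TEXT);
INSERT INTO src_orders VALUES
  (1, 10, 100.0, '2024-01-01'),
  (2, 11,  -5.0, '2024-01-01'),
  (3, 10,  42.0, '2024-01-02');
""")

# Staging: land the raw rows as-is, with no transformation.
conn.execute("INSERT INTO stg_orders SELECT * FROM src_orders")

# Transform + load: apply an illustrative business rule (drop non-positive amounts).
conn.execute("""
INSERT INTO fact_sales (order_id, amount, load_dt)
SELECT order_id, amount, DATE('now') FROM stg_orders WHERE amount > 0
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

# Staging check: no row may differ between source and staging, in either direction.
src_not_in_stg = scalar("SELECT COUNT(*) FROM (SELECT * FROM src_orders EXCEPT SELECT * FROM stg_orders)")
stg_not_in_src = scalar("SELECT COUNT(*) FROM (SELECT * FROM stg_orders EXCEPT SELECT * FROM src_orders)")

# Count checks: staging holds every source row; fact rows + filtered rows = source rows.
src_n = scalar("SELECT COUNT(*) FROM src_orders")
stg_n = scalar("SELECT COUNT(*) FROM stg_orders")
fact_n = scalar("SELECT COUNT(*) FROM fact_sales")
filtered_n = scalar("SELECT COUNT(*) FROM src_orders WHERE amount <= 0")
print(src_not_in_stg, stg_not_in_src, src_n, stg_n, fact_n, filtered_n)  # 0 0 3 3 2 1
```

EXCEPT removes duplicate rows, so it cannot detect a row that was staged twice. That is why the sketch also compares plain counts between source and staging.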
EY ETL Testing Interview Questions & Answers
(Basic → Advanced | Scenario-Based)
🔹 Basic ETL Testing Questions (1–15)
1. What is ETL testing?
   Validating that extracted, transformed, and loaded data is accurate and complete.
2. Why is ETL testing critical in EY projects?
   Because the data is often used for financial, audit, and regulatory reporting.
3. What is Source-to-Target (S2T) mapping?
   A document defining how each source field maps to a target field, including the transformation applied.
4. What are common ETL testing types?
   Source validation, transformation validation, target validation, reconciliation, and performance testing.
5. What is staging area testing?
   Ensuring staging data exactly matches source data.
6. What are audit fields?
   Columns that record load metadata, such as load_date, batch_id, created_by, and updated_date.
7. How do you validate record counts?
   Compare source and target counts after applying the same filters and reject rules.
8. What is data reconciliation?
   Matching totals and counts between source and target.
9. What is data warehouse testing?
   Testing schemas, facts, dimensions, and reports.
10. What is a fact table?
   Stores measurable business data (sales, revenue).
11. What is a dimension table?
   Stores descriptive attributes (customer, product).
12. What is the difference between ETL testing and database testing?
   ETL testing focuses on data movement and transformation across systems; database testing focuses on a single database's schema, constraints, and stored logic.
13. What is full load vs incremental load?
   A full load reloads all data; an incremental load processes only the changes (deltas) since the last run.
14. What is reject data testing?
   Validating rejected records and their error reasons.
15. What is data quality testing?
   Checking data for accuracy, consistency, and completeness.
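Several of the basic checks above (NULLs in mandatory fields, duplicate keys, reject reasons, and count reconciliation) fit in a few lines of plain Python. The sketch below assumes rows arrive as dictionaries, with None standing for NULL. The column names, the mandatory-column list, and the reject-reason wording are hypothetical.

```python
from collections import Counter

# Hypothetical extracted rows; None stands for a NULL.
rows = [
    {"order_id": 1, "cust_id": 10, "amount": 100.0},
    {"order_id": 2, "cust_id": None, "amount": 50.0},
    {"order_id": 2, "cust_id": 11, "amount": 50.0},
    {"order_id": 3, "cust_id": 12, "amount": None},
]
MANDATORY = ("order_id", "cust_id", "amount")  # assumed mandatory columns

def quality_report(rows, key="order_id", mandatory=MANDATORY):
    """Return NULL counts per mandatory column and keys that appear more than once."""
    null_counts = {col: sum(1 for r in rows if r[col] is None) for col in mandatory}
    key_counts = Counter(r[key] for r in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)
    return null_counts, duplicates

def split_rejects(rows, mandatory=MANDATORY):
    """Route rows with NULL mandatory fields to a reject list, with a reason."""
    accepted, rejected = [], []
    for r in rows:
        missing = [c for c in mandatory if r[c] is None]
        if missing:
            rejected.append((r["order_id"], "NULL in " + ", ".join(missing)))
        else:
            accepted.append(r)
    return accepted, rejected

nulls, dups = quality_report(rows)
accepted, rejected = split_rejects(rows)
print(nulls)     # {'order_id': 0, 'cust_id': 1, 'amount': 1}
print(dups)      # [2]
print(rejected)  # [(2, 'NULL in cust_id'), (3, 'NULL in amount')]
```

The reconciliation rule for rejects is that accepted plus rejected rows must equal the extracted row count. A test case would then verify each reject reason against the S2T mapping.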
🔹 Intermediate Scenario-Based Questions (16–35)
16. Scenario: Source has 1M rows, target has 980K. What do you check?
   Filters, rejects, and duplicate-removal logic; together they must fully explain the 20K gap.
17. How do you test NULL handling?
   Validate mandatory fields and default values.
18. What is SCD Type 1?
   Overwrites old data without keeping history.
19. What is SCD Type 2?
   Maintains history using effective dates and current flags.
20. Scenario: A customer's address changes. What do you test?
   SCD2 history creation: the old row is expired and a new current row is inserted.
21. What is surrogate key testing?
   Ensuring surrogate keys are unique and non-null.
22. What is a late-arriving dimension scenario?
   A fact arrives before its dimension row; validate that the fact is loaded with a dummy key.
23. How do you test aggregation logic?
   Compare aggregated source data with the corresponding target aggregates.
24. Scenario: Duplicate records in target. Why?
   Incorrect joins or missing deduplication logic.
25. What is hashing used for?
   Change detection and deduplication.
26. How do you test CDC (Change Data Capture)?
   Validate that only changed records are processed.
27. Scenario: Negative amount in a financial report. What do you test?
   Business rule validation.
28. What is referential integrity testing?
   Verifying that every fact key exists in its dimension.
29. Scenario: Data type mismatch error. What do you test?
   Casting and truncation rules.
30. How do you validate date transformations?
   Timezone and format validation.
31. What is soft delete testing?
   Validating that a delete_flag is set instead of the row being physically deleted.
32. Scenario: Re-running an ETL job causes duplicates. What failed?
   Restartability logic.
33. How do you test data masking?
   Validate that PII fields are obfuscated.
34. What is schema drift?
   Changes in the source schema; test how the ETL adapts to them.
35. How do you test BI reports?
   Reconcile report data with DW tables.
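Questions 25 and 26 (hashing and CDC) can be illustrated with a hash-compare sketch. It hashes the tracked attributes of each row in two snapshots and then classifies keys as inserts, updates, or deletes. MD5 via hashlib, the `|` delimiter, and the `\N` NULL token are illustrative choices, not a standard. The snapshots are made-up data.

```python
import hashlib

TRACKED = ("name", "city")  # assumed change-tracked attributes

def row_hash(row, cols=TRACKED):
    """MD5 of the tracked columns; NULLs get an explicit token so None != ''."""
    payload = "|".join("\\N" if row[c] is None else str(row[c]) for c in cols)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def detect_changes(previous, current, key="cust_id"):
    """Classify keys as inserts, updates, or deletes by comparing row hashes."""
    prev = {r[key]: row_hash(r) for r in previous}
    curr = {r[key]: row_hash(r) for r in current}
    inserts = sorted(k for k in curr if k not in prev)
    deletes = sorted(k for k in prev if k not in curr)
    updates = sorted(k for k in curr if k in prev and curr[k] != prev[k])
    return inserts, updates, deletes

yesterday = [
    {"cust_id": 1, "name": "Asha", "city": "Pune"},
    {"cust_id": 2, "name": "Ben", "city": "Leeds"},
    {"cust_id": 3, "name": "Chen", "city": "Austin"},
]
today = [
    {"cust_id": 1, "name": "Asha", "city": "Mumbai"},  # address change -> update
    {"cust_id": 2, "name": "Ben", "city": "Leeds"},    # unchanged -> not processed
    {"cust_id": 4, "name": "Dana", "city": "Oslo"},    # new key -> insert
]
print(detect_changes(yesterday, today))  # ([4], [1], [3])
```

In a CDC test, only the keys returned here should be processed by the job. The address change for customer 1 is also the row that should trigger an SCD2 update.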
🔹 Advanced & EY Real-Time Scenarios (36–55)
36. Scenario: An ETL job misses its SLA. What do you test?
   Query plans, indexes, and partitions.
37. How do you test performance tuning?
   Validate join strategy and parallelism.
38. Scenario: A many-to-many join inflates data. What do you test?
   Join cardinality.
39. How do you test window functions?
   Validate ranking and deduplication logic.
40. Scenario: Late data impacts aggregates. What do you do?
   Recalculate the impacted partitions.
41. How do you test audit and control tables?
   Validate batch status and row counts.
42. Scenario: Regulatory report mismatch. What do you check first?
   S2T mapping and aggregation logic.
43. How do you test ETL rollback?
   Ensure partial loads are reverted.
44. Scenario: Parallel jobs cause locking. What do you test?
   Concurrency and isolation.
45. How do you test historical reloads?
   Check for data duplication and correct SCD handling.
46. Scenario: A source file arrives late. What do you test?
   Dependency and rerun logic.
47. How do you test checksum/hash totals?
   Compare source and target hashes.
48. Scenario: Currency conversion mismatch. What do you validate?
   Exchange rate application.
49. How do you test cloud DW ETL?
   Cost, scalability, and performance checks.
50. Scenario: NULLs after a LEFT JOIN. Why?
   Missing dimension records.
51. How do you test archival logic?
   Validate data movement to history tables.
52. Scenario: Incorrect SCD expiry date. What failed?
   Effective date logic.
53. How do you validate metadata tables?
   Check batch IDs, counts, and timestamps.
54. Scenario: Unexpected data spike. What do you test?
   Source anomaly detection.
55. How do you perform end-to-end ETL testing?
   Reconcile Source → DW → Report.
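For question 48, one approach is to recompute the expected converted amount independently and compare it with what the ETL job loaded. This sketch assumes a rate table keyed by currency and rate date, half-up rounding to two decimals, and made-up transactions. In a real project, the rate source, rate date, and rounding rule come from the S2T mapping. Decimal is used instead of float so that the comparison is exact to the cent.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rates (source currency -> USD), keyed by (currency, rate date).
RATES = {
    ("EUR", "2024-03-01"): Decimal("1.0850"),
    ("GBP", "2024-03-01"): Decimal("1.2650"),
}

source = [
    {"txn_id": 1, "ccy": "EUR", "rate_dt": "2024-03-01", "amount": Decimal("200.00")},
    {"txn_id": 2, "ccy": "GBP", "rate_dt": "2024-03-01", "amount": Decimal("99.99")},
]
# USD amounts as loaded by the ETL job; txn 2 was deliberately converted with a different rate.
target_usd = {1: Decimal("217.00"), 2: Decimal("126.39")}

def expected_usd(row):
    """Recompute the conversion; half-up rounding to cents is an assumed rule."""
    rate = RATES[(row["ccy"], row["rate_dt"])]
    return (row["amount"] * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Collect (txn_id, expected, loaded) for every transaction that does not reconcile.
mismatches = [
    (r["txn_id"], expected_usd(r), target_usd[r["txn_id"]])
    for r in source
    if expected_usd(r) != target_usd[r["txn_id"]]
]
print(mismatches)  # [(2, Decimal('126.49'), Decimal('126.39'))]
```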
Real SQL Query Examples for ETL Validation
Sample Tables
- src_orders(order_id, cust_id, amount, order_dt)
- dim_customer(cust_sk, cust_id, current_flag)
- fact_sales(order_id, cust_sk, amount, load_dt)
1️⃣ Record Count Validation
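-- Counts should match once documented filters and rejects are accounted for
SELECT COUNT(*) AS src_count FROM src_orders;
SELECT COUNT(*) AS tgt_count FROM fact_sales;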
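Raw counts only tell you that something is off. The sketch below runs the same count check, plus an amount total, against an in-memory SQLite copy of the sample tables. It then lists the source keys missing from the target. The rows are made up, and SQLite stands in for the real source and warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);
INSERT INTO src_orders VALUES (1, 10, 100, '2024-01-01'), (2, 11, 250, '2024-01-01'), (3, 10, 75, '2024-01-02');
INSERT INTO fact_sales VALUES (1, 501, 100, '2024-01-03'), (3, 501, 75, '2024-01-03');
""")

# Count and amount totals on both sides.
src_cnt, src_sum = conn.execute("SELECT COUNT(*), SUM(amount) FROM src_orders").fetchone()
tgt_cnt, tgt_sum = conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_sales").fetchone()

# When totals differ, list the source keys that never reached the target.
missing = [r[0] for r in conn.execute(
    "SELECT order_id FROM src_orders EXCEPT SELECT order_id FROM fact_sales ORDER BY order_id")]
print(src_cnt, tgt_cnt, src_sum - tgt_sum, missing)  # 3 2 250.0 [2]
```

Each missing key should then be traced to a documented filter or a reject record. Any key that cannot be explained is a defect.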
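2️⃣ JOIN Validation (Orphan Keys)
-- Fact rows whose cust_sk has no matching dimension row; the expected result is 0
SELECT COUNT(*) AS missing_dim
FROM fact_sales f
LEFT JOIN dim_customer d
  ON f.cust_sk = d.cust_sk
WHERE d.cust_sk IS NULL;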
3️⃣ GROUP BY Aggregation
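-- Source totals per customer; compare with the same totals computed from the target
SELECT cust_id, SUM(amount) AS total_amount
FROM src_orders
GROUP BY cust_id;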
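The GROUP BY query above produces only the source side. One way to finish the reconciliation, assuming the sample tables and the made-up rows below, is to compute the same per-customer totals from fact_sales joined to dim_customer and subtract them with EXCEPT. Any row returned is a customer whose totals differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, current_flag TEXT);
CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT);
INSERT INTO src_orders VALUES (1, 10, 100, '2024-01-01'), (2, 10, 50, '2024-01-02'), (3, 11, 80, '2024-01-02');
INSERT INTO dim_customer VALUES (501, 10, 'Y'), (502, 11, 'Y');
-- Order 2 was loaded with the wrong amount, so customer 10 will not reconcile.
INSERT INTO fact_sales VALUES (1, 501, 100, '2024-01-03'), (2, 501, 5, '2024-01-03'), (3, 502, 80, '2024-01-03');
""")

# Source totals per customer EXCEPT target totals per customer (mapped back via the dimension).
diff = conn.execute("""
SELECT cust_id, SUM(amount) FROM src_orders GROUP BY cust_id
EXCEPT
SELECT d.cust_id, SUM(f.amount)
FROM fact_sales f JOIN dim_customer d ON f.cust_sk = d.cust_sk
GROUP BY d.cust_id
""").fetchall()
print(diff)  # [(10, 150.0)]
```

Run the comparison in the other direction as well, to catch customers that exist only in the target. Exact comparison of REAL sums works for this toy data; with real financial amounts, round both sides or use exact numeric types.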
4️⃣ Window Function – Deduplication
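-- Keep only the latest row per order_id (all columns, not just the key)
SELECT order_id, cust_id, amount, order_dt
FROM (
  SELECT s.*,
         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) AS rn
  FROM src_orders s
) t
WHERE rn = 1;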
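The deduplication query can be exercised against a tiny in-memory SQLite table that contains one duplicated order_id (made-up rows). Window functions require SQLite 3.25 or later, which recent Python builds ship with. The final check confirms that each order_id survives exactly once and that the latest version was kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
INSERT INTO src_orders VALUES
  (1, 10, 100, '2024-01-01'),
  (1, 10, 120, '2024-01-05'),  -- later version of order 1
  (2, 11,  80, '2024-01-02');
""")

# Same ROW_NUMBER() dedup as the SQL above: rn = 1 is the latest row per order_id.
latest = conn.execute("""
SELECT order_id, cust_id, amount, order_dt
FROM (
  SELECT s.*, ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) AS rn
  FROM src_orders s
) t
WHERE rn = 1
ORDER BY order_id
""").fetchall()

# Test check: no order_id may appear more than once after deduplication.
ids = [row[0] for row in latest]
print(latest)  # [(1, 10, 120.0, '2024-01-05'), (2, 11, 80.0, '2024-01-02')]
print(len(ids) == len(set(ids)))  # True
```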
5️⃣ Performance Tuning Validation
EXPLAIN
SELECT *
FROM fact_sales
WHERE load_dt >= CURRENT_DATE - 1;
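Plan output differs by engine (PostgreSQL EXPLAIN, Oracle EXPLAIN PLAN, and so on), so this sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in. It compares the plan for the same filter before and after an index is created on load_dt. The index name is hypothetical. Because the exact plan wording varies between SQLite versions, the check looks only for the SCAN and USING INDEX keywords.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (order_id INTEGER, cust_sk INTEGER, amount REAL, load_dt TEXT)")

QUERY = "EXPLAIN QUERY PLAN SELECT * FROM fact_sales WHERE load_dt >= ?"
yesterday = (date.today() - timedelta(days=1)).isoformat()  # stands in for CURRENT_DATE - 1

def plan_details():
    """Return the readable step text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute(QUERY, (yesterday,))]

before = plan_details()  # no index yet: expect a full SCAN of fact_sales
conn.execute("CREATE INDEX idx_fact_sales_load_dt ON fact_sales (load_dt)")  # hypothetical index
after = plan_details()   # expect a SEARCH that uses the new index
print(before)
print(after)
```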
ETL Tools Commonly Asked in EY Interviews
- Informatica
- Microsoft SQL Server Integration Services
- Ab Initio
- Pentaho
- Talend
ETL Defect Examples + Test Case Sample
Defect Example
Issue: Duplicate fact records in financial report
- Root Cause: Incorrect join logic
- Fix: Correct join keys and add deduplication
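The defect can be reproduced in a few lines. When customer 10 has both an expired and a current SCD2 row, joining orders to dim_customer on cust_id alone loads order 1 twice. The sketch below uses made-up rows in in-memory SQLite and compares the buggy join with one that also filters on current_flag = 'Y'. The test checks are row count, amount total, and duplicate order count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_dt TEXT);
CREATE TABLE dim_customer (cust_sk INTEGER, cust_id INTEGER, current_flag TEXT);
INSERT INTO src_orders VALUES (1, 10, 100, '2024-01-01'), (2, 11, 80, '2024-01-01');
-- Customer 10 has an expired and a current SCD2 version.
INSERT INTO dim_customer VALUES (501, 10, 'N'), (503, 10, 'Y'), (502, 11, 'Y');
""")

# Buggy load query: the natural key alone matches every SCD2 version of the customer.
BUGGY = """SELECT s.order_id, d.cust_sk, s.amount FROM src_orders s
           JOIN dim_customer d ON s.cust_id = d.cust_id"""
# Corrected join keys: only the current dimension version is used.
FIXED = BUGGY + " AND d.current_flag = 'Y'"

def load_check(sql):
    """Return (row count, amount total, number of duplicated order rows)."""
    rows = conn.execute(sql).fetchall()
    total = sum(r[2] for r in rows)
    dup_orders = len(rows) - len({r[0] for r in rows})
    return len(rows), total, dup_orders

print(load_check(BUGGY))  # (3, 280.0, 1)
print(load_check(FIXED))  # (2, 180.0, 0)
```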
Sample Test Case
- Test Case: Validate SCD2 update
- Expected Result:
  - Old record expired (end date set, current_flag = 'N')
  - New record inserted with current_flag = 'Y'
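The expected result above can be turned into an automated check. This is a minimal SCD2 sketch on SQLite. It assumes hypothetical eff_start_dt and eff_end_dt columns, a 9999-12-31 open end date, and an expiry convention where the old row ends on the change date. Projects that end the old row on the previous day would adjust the expected values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
  cust_sk INTEGER PRIMARY KEY AUTOINCREMENT,
  cust_id INTEGER, city TEXT,
  eff_start_dt TEXT, eff_end_dt TEXT, current_flag TEXT);
INSERT INTO dim_customer (cust_id, city, eff_start_dt, eff_end_dt, current_flag)
VALUES (10, 'Pune', '2023-01-01', '9999-12-31', 'Y');
""")

def apply_scd2(cust_id, new_city, change_dt):
    """Hypothetical SCD2 step: expire the current row, then insert a new current version."""
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute("""UPDATE dim_customer SET eff_end_dt = ?, current_flag = 'N'
                        WHERE cust_id = ? AND current_flag = 'Y'""", (change_dt, cust_id))
        conn.execute("""INSERT INTO dim_customer (cust_id, city, eff_start_dt, eff_end_dt, current_flag)
                        VALUES (?, ?, ?, '9999-12-31', 'Y')""", (cust_id, new_city, change_dt))

apply_scd2(10, "Mumbai", "2024-06-01")

# Expected result: old row expired, new row current, exactly one current row per customer.
history = conn.execute("""SELECT cust_sk, city, eff_start_dt, eff_end_dt, current_flag
                          FROM dim_customer WHERE cust_id = 10 ORDER BY cust_sk""").fetchall()
current_rows = conn.execute("""SELECT COUNT(*) FROM dim_customer
                               WHERE cust_id = 10 AND current_flag = 'Y'""").fetchone()[0]
print(history)
print(current_rows)  # 1
```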
EY ETL Testing Quick Revision Sheet
- Validate record counts, sums, NULLs, duplicates
- Check S2T mapping line by line
- Test SCD1, SCD2, audit fields, hashing
- Use JOIN, GROUP BY, window functions
- Always reconcile source vs target vs reports
FAQs
Q1. What are EY ETL testing interview questions?
They focus on real-time data validation, reconciliation, and regulatory-grade accuracy.
Q2. Which SQL is important for EY ETL roles?
JOINs, GROUP BY, window functions, and performance tuning queries.
Q3. How do you test SCD2 in real projects?
By validating history rows, effective dates, and current flags.
