What Is ETL Testing? (Definition + Wipro Project Example)
ETL Testing is the process of validating data during Extract, Transform, and Load operations to ensure accuracy, completeness, consistency, auditability, and performance in a Data Warehouse (DW).
Wipro real-world project context
In large enterprise and BFSI programs delivered by Wipro, ETL testing typically involves:
- Multiple heterogeneous source systems (OLTP, ERP, flat files, APIs)
- Complex transformations (business rules, aggregations, SCD handling)
- High data volumes with strict SLAs and regulatory needs
A Wipro ETL tester is expected to validate source data, transformations, SCD logic, audit fields, and final BI reports using strong SQL and scenario-based thinking.
Data Warehouse Flow: Source → Staging → Transform → Load → Reporting
- Source – OLTP databases, ERP/CRM systems, files
- Staging – Raw data landing zone (no business rules)
- Transformation – Cleansing, joins, aggregations, SCD logic
- Load – Fact and Dimension tables
- Reporting – BI dashboards, MIS, analytics
Wipro ETL Testing Interview Questions & Best Answers
(Basic → Advanced | Real-World & Scenario-Based)
🔹 Basic ETL Testing Questions (1–15)
- What is ETL testing?
Validation of data accuracy and transformations during ETL processing. - Why is ETL testing important in Wipro projects?
Because enterprise clients rely on accurate analytics and compliance reports. - What is Source-to-Target (S2T) mapping?
A document defining source fields, transformation logic, and target columns. - What are the main types of ETL testing?
Source testing, staging testing, transformation testing, target testing, reconciliation, performance testing. - What is staging area testing?
Ensuring staging data exactly matches source data. - What are audit fields?
load_date, batch_id, created_by, updated_date. - How do you validate record counts?
Compare source vs target counts after filters and transformations. - What is data reconciliation?
Matching totals and counts between source and target. - Difference between fact and dimension tables?
Facts store measures; dimensions store descriptive attributes. - What is data warehouse testing?
Testing schemas, ETL logic, and reports. - What is full load vs incremental load?
Full load reloads all data; incremental load processes only delta data. - What is reject data testing?
Validating rejected rows and error reasons. - What is mapping validation?
Verifying transformations strictly follow S2T mapping. - What is end-to-end ETL testing?
Source → DW → Report validation. - Why is SQL mandatory for ETL testers?
SQL validates data, transformations, and aggregates.
🔹 Scenario-Based Intermediate Questions (16–35)
- Scenario: Source has 1M records, target has 9.6L. What do you check?
Filters, rejected rows, duplicate elimination, join conditions. - How do you test NULL handling?
Validate mandatory columns and default value logic. - What is SCD Type 1?
Overwrites old data without maintaining history. - What is SCD Type 2?
Maintains history using effective_from, effective_to, and current_flag. - Scenario: Customer address changes. What do you test?
New SCD2 record insertion and expiry of old record. - What is surrogate key testing?
Ensure surrogate keys are unique and non-null. - What is late-arriving dimension?
Fact arrives before dimension; validate dummy/unknown key handling. - Scenario: Duplicate rows appear in target but not in source. Why?
Incorrect joins or missing deduplication logic. - How do you test aggregation logic?
Compare aggregated source totals with target data. - What is hashing used for?
Change detection and deduplication. - How do you test CDC (Change Data Capture)?
Validate only changed records are loaded. - Scenario: Negative amount appears in report. What do you test?
Business rule and transformation logic. - What is referential integrity testing?
Fact foreign keys must exist in dimension tables. - Scenario: Data type mismatch during load. What to validate?
Casting, truncation, precision rules. - How do you test date transformations?
Time zone and format validation. - What is soft delete testing?
Validate delete_flag instead of physical deletion. - Scenario: ETL job rerun creates duplicates. Why?
Restartability logic failure. - What is schema drift?
Source schema changes impacting ETL. - How do you validate BI reports?
Reconcile report totals with DW tables. - What is data masking testing?
Ensure PII fields are obfuscated.
🔹 Advanced & Real-Time Wipro Scenarios (36–55)
- Scenario: ETL job misses SLA. What do you analyze?
Query plans, indexing, partitioning. - How do you test ETL performance tuning?
Validate join strategy and execution plans. - Scenario: Many-to-many join inflates data. What test?
Join cardinality validation. - How do you validate window functions?
Check ranking and deduplication logic. - Scenario: Late data impacts aggregates. What do you do?
Recalculate impacted partitions. - How do you test audit/control tables?
Validate batch status and row counts. - Scenario: Report mismatch with DW. First step?
Verify S2T mapping and aggregation logic. - How do you test ETL rollback?
Ensure partial loads are reverted. - Scenario: Parallel jobs cause deadlocks. What test?
Concurrency and locking validation. - How do you test historical reloads?
Check duplication and SCD handling. - Scenario: Source file arrives late. What to test?
Dependency and rerun logic. - How do you validate checksum/hash totals?
Compare source vs target hashes. - Scenario: Currency conversion mismatch. What to validate?
Exchange rate tables and logic. - How do you test cloud ETL pipelines?
Scalability, cost, and performance. - Scenario: NULLs after LEFT JOIN. Why?
Missing dimension records. - How do you test archival logic?
Validate movement to history tables. - Scenario: Incorrect SCD expiry date. Root cause?
Effective date logic issue. - How do you validate metadata tables?
Batch IDs, timestamps, and counts. - Scenario: Unexpected data spike. What test?
Source anomaly and business rule validation. - How do you perform end-to-end ETL testing in Wipro projects?
Source → DW → Report reconciliation.
Real SQL Query Examples for ETL Validation
Sample Tables
- src_orders(order_id, cust_id, amount, order_dt)
- dim_customer(cust_sk, cust_id, current_flag)
- fact_sales(order_id, cust_sk, amount, load_dt)
1️⃣ Record Count Validation
SELECT COUNT(*) FROM src_orders;
SELECT COUNT(*) FROM fact_sales;
2️⃣ JOIN Validation
SELECT COUNT(*) AS missing_dim
FROM fact_sales f
LEFT JOIN dim_customer d
ON f.cust_sk = d.cust_sk
WHERE d.cust_sk IS NULL;
3️⃣ GROUP BY Aggregation
SELECT cust_id, SUM(amount)
FROM src_orders
GROUP BY cust_id;
4️⃣ Window Function – Deduplication
SELECT *
FROM (
SELECT order_id,
ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) rn
FROM src_orders
) t
WHERE rn = 1;
5️⃣ Performance Tuning Validation
EXPLAIN
SELECT *
FROM fact_sales
WHERE load_dt >= CURRENT_DATE – 1;
ETL Tools Commonly Asked in Wipro Interviews
- Informatica
- Microsoft SQL Server Integration Services
- Ab Initio
- Pentaho
- Talend
ETL Defect Examples + Test Case Sample
Defect Example
Issue: Duplicate records in fact table
- Root Cause: Incorrect join logic
- Fix: Correct join keys and add deduplication
Sample Test Case
- Test Case: Validate SCD2 update
- Expected Result:
- Old record expired
- New record inserted with current_flag = ‘Y’
- Old record expired
Wipro ETL Testing – Quick Revision Sheet
- Validate counts, sums, NULLs, duplicates
- Check S2T mappings line-by-line
- Test SCD1, SCD2, audit fields, hashing
- Use JOIN, GROUP BY, window functions
- Always reconcile source vs target vs reports
FAQs (Snippet-Friendly)
Q1. What are Wipro ETL testing interview questions?
They focus on real-time ETL scenarios, data reconciliation, and enterprise-grade data quality.
Q2. Which SQL is important for Wipro ETL roles?
JOINs, GROUP BY, window functions, and performance tuning queries.
Q3. How do you test SCD2 in real projects?
By validating history rows, effective dates, and current flags.
