What Is ETL Testing? (Definition + Project Example)
ETL Testing is the process of validating data during Extract, Transform, and Load operations to ensure correctness, completeness, consistency, and performance in a Data Warehouse (DW).
Real-world project context
In large digital-transformation programs delivered by Mindtree (now part of LTIMindtree), ETL testing typically involves:
- Multiple source systems (ERP, CRM, cloud apps)
- Complex transformations (business rules, aggregations, SCD handling)
- High data quality and SLA expectations for enterprise reporting
ETL testers are expected to validate data end-to-end and to explain confidently, in interviews, how they handle real-time data issues.
Data Warehouse Flow: Source → Staging → Transform → Load → Reporting
- Source – OLTP DBs, APIs, flat files
- Staging – Raw data landing zone
- Transformation – Business rules, joins, SCD logic
- Load – Fact and Dimension tables
- Reporting – BI dashboards, analytics, MIS reports
Mindtree ETL Testing Interview Questions & Answers
(Basic → Advanced | Scenario-Based & Real-Time)
🔹 Basic ETL Testing Questions (1–15)
- What is ETL testing?
  Validation of data accuracy and transformations during ETL processing.
- Why is ETL testing important in Mindtree projects?
  Because enterprise clients depend on accurate analytics and reports.
- What is Source-to-Target (S2T) mapping?
  A document defining source columns, transformations, and target columns.
- What are the main ETL testing types?
  Source, transformation, target, reconciliation, and performance testing.
- What is staging area testing?
  Verifying staging data matches the source exactly.
- What are audit fields?
  load_date, batch_id, created_by, updated_date.
- How do you validate record counts?
  Compare source vs target counts after applying filters.
- What is data reconciliation?
  Matching totals and counts between source and target.
- Difference between fact and dimension table?
  Facts store measures; dimensions store descriptive attributes.
- What is data warehouse testing?
  Testing schemas, data models, ETL processes, and reports.
- What is full load vs incremental load?
  A full load reloads all data; an incremental load processes only deltas.
- What is reject data testing?
  Validating rejected records and their error reasons.
- What is data quality testing?
  Accuracy, completeness, and consistency checks.
- What is mapping validation?
  Verifying transformations match the S2T document.
- What is end-to-end ETL testing?
  Source → DW → Report validation.
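The count and reconciliation checks above can be sketched with SQLite as a stand-in warehouse; the table and column names (src_orders, fact_sales, status) are illustrative, and real projects would run equivalent SQL against the actual source and target databases.

```python
import sqlite3

# Minimal reconciliation sketch: counts and totals must match
# after accounting for records the ETL legitimately filters out.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE src_orders (order_id INTEGER, amount REAL, status TEXT)")
cur.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO src_orders VALUES (?,?,?)",
                [(1, 100.0, 'OK'), (2, 50.0, 'OK'), (3, -5.0, 'REJECT')])

# The load applies the business filter: rejected records are excluded.
cur.execute("INSERT INTO fact_sales SELECT order_id, amount "
            "FROM src_orders WHERE status = 'OK'")

src_count = cur.execute("SELECT COUNT(*) FROM src_orders WHERE status = 'OK'").fetchone()[0]
tgt_count = cur.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
src_sum = cur.execute("SELECT SUM(amount) FROM src_orders WHERE status = 'OK'").fetchone()[0]
tgt_sum = cur.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]

# Counts and totals reconcile once the filter is applied on both sides.
assert src_count == tgt_count and src_sum == tgt_sum
print(src_count, tgt_count)  # → 2 2
```

The same pattern extends to column-level sums, MIN/MAX dates, and distinct-key counts.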
🔹 Scenario-Based Intermediate Questions (16–35)
- Scenario: Source has 1M records, target has 970K. What do you check?
  Filters, rejects, and duplicate-removal logic.
- How do you test NULL handling?
  Validate mandatory columns and default values.
- What is SCD Type 1?
  Overwrites old values without maintaining history.
- What is SCD Type 2?
  Maintains history with effective dates and a current_flag.
- Scenario: Customer address changed. What do you test?
  Insertion of the new SCD2 record and expiry of the old record.
- What is surrogate key testing?
  Ensuring surrogate keys are unique and non-null.
- What is a late-arriving dimension?
  The fact arrives before the dimension; validate dummy-key handling.
- Scenario: Duplicate records in target but not in source. Why?
  An incorrect join or missing deduplication.
- How do you test aggregation logic?
  Compare aggregated source data with the target.
- What is hashing used for?
  Change detection and deduplication.
- How do you test CDC (Change Data Capture)?
  Validate that only changed records are loaded.
- Scenario: A negative sales amount appears. What do you test?
  Business rule validation.
- What is referential integrity testing?
  Fact keys must exist in dimension tables.
- Scenario: Data type mismatch error. What do you validate?
  Casting and truncation rules.
- How do you test date transformations?
  Time zone and format validation.
- What is soft delete testing?
  Validating a delete_flag instead of a physical delete.
- Scenario: An ETL job rerun creates duplicates. Why?
  A restartability logic failure.
- What is schema drift?
  A source schema change; test the ETL's adaptation to it.
- How do you test BI reports?
  Compare report totals with DW tables.
- What is data masking testing?
  Validating that PII fields are obfuscated.
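The SCD Type 2 checks in this section can be expressed as two queries: exactly one current row per business key, and no overlapping effective-date ranges. A sketch with SQLite (the dim_customer schema here, with eff_start/eff_end columns, is illustrative):

```python
import sqlite3

# SCD2 validation sketch: one current row per key, no date overlaps.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE dim_customer (
    cust_sk INTEGER PRIMARY KEY, cust_id TEXT, address TEXT,
    eff_start TEXT, eff_end TEXT, current_flag TEXT)""")
cur.executemany("INSERT INTO dim_customer VALUES (?,?,?,?,?,?)", [
    (1, 'C100', 'Old Street', '2023-01-01', '2024-06-30', 'N'),  # expired history row
    (2, 'C100', 'New Street', '2024-07-01', '9999-12-31', 'Y'),  # current row
])

# Check 1: exactly one current_flag = 'Y' row per business key.
bad_current = cur.execute("""
    SELECT cust_id FROM dim_customer WHERE current_flag = 'Y'
    GROUP BY cust_id HAVING COUNT(*) <> 1""").fetchall()

# Check 2: no two versions of the same key have overlapping date ranges.
overlaps = cur.execute("""
    SELECT a.cust_id FROM dim_customer a JOIN dim_customer b
      ON a.cust_id = b.cust_id AND a.cust_sk <> b.cust_sk
    WHERE a.eff_start <= b.eff_end AND b.eff_start <= a.eff_end""").fetchall()

assert bad_current == [] and overlaps == []
print("SCD2 checks passed")
```

After an address change (the scenario above), the same two queries confirm the old record was expired and the new record is the only current one.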
🔹 Advanced & Real-Time Scenarios (36–55)
- Scenario: An ETL job misses its SLA. What do you analyze?
  Query plans, indexing, and partitioning.
- How do you test performance tuning?
  Validate the join strategy and parallel execution.
- Scenario: A many-to-many join inflates data. What test?
  Join cardinality validation.
- How do you validate window functions?
  Check ranking and deduplication logic.
- Scenario: Late data impacts aggregates. What do you do?
  Recalculate the impacted partitions.
- How do you test audit and control tables?
  Validate batch status and row counts.
- Scenario: A report mismatches the DW. First step?
  Verify the S2T mapping and aggregation logic.
- How do you test ETL rollback?
  Ensure partial loads are reverted.
- Scenario: Parallel jobs cause deadlocks. What test?
  Concurrency and isolation validation.
- How do you test historical reloads?
  Check for duplication and correct SCD logic.
- Scenario: A source file arrives late. What test?
  Dependency and rerun logic.
- How do you test checksum/hash totals?
  Compare source vs target hashes.
- Scenario: Currency conversion mismatch. What do you validate?
  Exchange rate logic.
- How do you test cloud ETL pipelines?
  Cost, scalability, and performance.
- Scenario: NULLs appear after a LEFT JOIN. Why?
  Missing dimension records.
- How do you test archival logic?
  Validate data movement to history tables.
- Scenario: Incorrect SCD expiry date. Root cause?
  An effective-date logic error.
- How do you validate metadata tables?
  Batch IDs, timestamps, and counts.
- Scenario: An unexpected data spike. What test?
  Source anomaly validation.
- How do you do end-to-end ETL testing?
  Source → DW → Report reconciliation.
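The checksum/hash-total question above can be demonstrated by hashing ordered rows on both sides and comparing the digests; the tables here are illustrative, and real implementations often hash per row or per column and compare aggregates instead.

```python
import hashlib
import sqlite3

# Hash-total comparison sketch: any value drift between source and
# target changes the digest, even when row counts still match.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE src_orders (order_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL)")
rows = [(1, 100.0), (2, 50.0)]
cur.executemany("INSERT INTO src_orders VALUES (?,?)", rows)
cur.executemany("INSERT INTO fact_sales VALUES (?,?)", rows)

def table_hash(table):
    # Deterministic digest over rows in key order; ORDER BY is essential,
    # otherwise identical data in different physical order hashes differently.
    data = cur.execute(
        f"SELECT order_id, amount FROM {table} ORDER BY order_id").fetchall()
    return hashlib.sha256(repr(data).encode()).hexdigest()

assert table_hash("src_orders") == table_hash("fact_sales")
print("hash totals match")
```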
Real SQL Query Examples for ETL Validation
Sample Tables
- src_orders(order_id, cust_id, amount, order_dt)
- dim_customer(cust_sk, cust_id, current_flag)
- fact_sales(order_id, cust_sk, amount, load_dt)
1️⃣ Record Count Validation
-- Counts should match after accounting for filters and rejected records
SELECT COUNT(*) FROM src_orders;
SELECT COUNT(*) FROM fact_sales;
2️⃣ JOIN Validation
-- Orphan fact rows: surrogate keys with no matching dimension record
SELECT COUNT(*) AS missing_dim
FROM fact_sales f
LEFT JOIN dim_customer d
  ON f.cust_sk = d.cust_sk
WHERE d.cust_sk IS NULL;
3️⃣ GROUP BY Aggregation
-- Compare these aggregates against the corresponding target totals
SELECT cust_id, SUM(amount) AS total_amount
FROM src_orders
GROUP BY cust_id;
4️⃣ Window Function – Deduplication
-- rn = 1 keeps only the latest record per order_id
SELECT *
FROM (
  SELECT order_id,
         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) AS rn
  FROM src_orders
) t
WHERE rn = 1;
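The deduplication query above can be run end-to-end with SQLite (window functions require SQLite 3.25+, which ships with modern Python builds); the sample rows are illustrative.

```python
import sqlite3

# Demonstrates the ROW_NUMBER deduplication pattern: keep the
# latest row per order_id when a rerun has loaded duplicates.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE src_orders "
            "(order_id INTEGER, cust_id TEXT, amount REAL, order_dt TEXT)")
cur.executemany("INSERT INTO src_orders VALUES (?,?,?,?)", [
    (1, 'C1', 100.0, '2024-01-01'),
    (1, 'C1', 100.0, '2024-01-02'),  # duplicate load of order 1
    (2, 'C2', 50.0, '2024-01-01'),
])
dedup = cur.execute("""
    SELECT order_id, order_dt FROM (
        SELECT order_id, order_dt,
               ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) rn
        FROM src_orders) t
    WHERE rn = 1 ORDER BY order_id""").fetchall()
print(dedup)  # → [(1, '2024-01-02'), (2, '2024-01-01')]
```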
5️⃣ Performance Tuning Validation
-- Inspect the query plan; a full table scan on load_dt suggests a missing index
EXPLAIN
SELECT *
FROM fact_sales
WHERE load_dt >= CURRENT_DATE - 1;
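A plan check can even be automated: SQLite's EXPLAIN QUERY PLAN (syntax and plan output differ by database) shows whether a filter uses an index. The table and index names here are illustrative.

```python
import sqlite3

# Sketch: assert the date filter is served by an index, not a full scan.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL, load_dt TEXT)")
cur.execute("CREATE INDEX ix_fact_load_dt ON fact_sales (load_dt)")
plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM fact_sales WHERE load_dt >= date('now','-1 day')"
).fetchall()
# The plan's detail column should name the index (a SEARCH, not a SCAN).
assert any("ix_fact_load_dt" in row[-1] for row in plan)
print(plan)
```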
ETL Tools Commonly Asked in Mindtree Interviews
- Informatica
- Microsoft SQL Server Integration Services (SSIS)
- Ab Initio
- Pentaho
- Talend
ETL Defect Examples + Test Case Sample
Defect Example
Issue: Duplicate records in fact table
- Root Cause: Incorrect join logic
- Fix: Correct join keys and add deduplication
Sample Test Case
- Test Case: Validate SCD2 update
- Expected Result:
- Old record expired
  - New record inserted with current_flag = 'Y'
Mindtree ETL Testing Quick Revision Sheet
- Validate counts, sums, NULLs, duplicates
- Check S2T mapping line-by-line
- Test SCD1, SCD2, audit fields, hashing
- Use JOIN, GROUP BY, window functions
- Always reconcile source vs target vs reports
FAQs (SEO & Snippet Friendly)
Q1. What are Mindtree ETL testing interview questions?
They focus on real-time ETL scenarios, data reconciliation, and enterprise-grade data quality.
Q2. Which SQL is important for ETL testing roles?
JOINs, GROUP BY, window functions, and performance tuning queries.
Q3. How do you test SCD2 in real projects?
By validating history rows, effective dates, and current flags.
