What Is ETL Testing? (Definition + Project Example)
ETL Testing is the process of validating data during Extract, Transform, and Load operations to ensure correctness, completeness, consistency, and performance in a Data Warehouse (DW).
Real-world project context
In large digital-transformation programs delivered by Mindtree (now part of LTIMindtree), ETL testing typically involves:
- Multiple source systems (ERP, CRM, cloud apps)
- Complex transformations (business rules, aggregations, SCD handling)
- High data quality and SLA expectations for enterprise reporting
ETL testers are expected to validate data end-to-end and to explain confidently, in interviews, how they handle real-time data issues.
Data Warehouse Flow: Source → Staging → Transform → Load → Reporting
- Source – OLTP DBs, APIs, flat files
- Staging – Raw data landing zone
- Transformation – Business rules, joins, SCD logic
- Load – Fact and Dimension tables
- Reporting – BI dashboards, analytics, MIS reports
Mindtree ETL Testing Interview Questions & Answers
(Basic → Advanced | Scenario-Based & Real-Time)
🔹 Basic ETL Testing Questions (1–15)
- What is ETL testing?
  Validation of data accuracy and transformations during ETL processing.
- Why is ETL testing important in Mindtree projects?
  Because enterprise clients depend on accurate analytics and reports.
- What is Source-to-Target (S2T) mapping?
  A document defining source columns, transformations, and target columns.
- What are the main ETL testing types?
  Source, transformation, target, reconciliation, and performance testing.
- What is staging area testing?
  Verifying staging data matches the source exactly.
- What are audit fields?
  load_date, batch_id, created_by, updated_date.
- How do you validate record counts?
  Compare source vs target counts after applying filters.
- What is data reconciliation?
  Matching totals and counts between source and target.
- Difference between fact and dimension table?
  Facts store measures; dimensions store descriptive attributes.
- What is data warehouse testing?
  Testing schemas, data models, ETL processes, and reports.
- What is full load vs incremental load?
  A full load reloads all data; an incremental load processes only deltas.
- What is reject data testing?
  Validating rejected records and their error reasons.
- What is data quality testing?
  Accuracy, completeness, and consistency checks.
- What is mapping validation?
  Verifying transformations match the S2T document.
- What is end-to-end ETL testing?
  Source → DW → Report validation.
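The count and reconciliation checks above can be sketched with SQLite as a stand-in warehouse; the table and column names (src_orders, fact_sales, status) are illustrative, and real projects would run equivalent SQL against the actual source and target databases.

```python
import sqlite3

# Minimal reconciliation sketch: counts and totals must match
# after accounting for records the ETL legitimately filters out.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE src_orders (order_id INTEGER, amount REAL, status TEXT)")
cur.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO src_orders VALUES (?,?,?)",
                [(1, 100.0, 'OK'), (2, 50.0, 'OK'), (3, -5.0, 'REJECT')])

# The load applies the business filter: rejected records are excluded.
cur.execute("INSERT INTO fact_sales SELECT order_id, amount "
            "FROM src_orders WHERE status = 'OK'")

src_count = cur.execute("SELECT COUNT(*) FROM src_orders WHERE status = 'OK'").fetchone()[0]
tgt_count = cur.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
src_sum = cur.execute("SELECT SUM(amount) FROM src_orders WHERE status = 'OK'").fetchone()[0]
tgt_sum = cur.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]

# Counts and totals reconcile once the filter is applied on both sides.
assert src_count == tgt_count and src_sum == tgt_sum
print(src_count, tgt_count)  # → 2 2
```

The same pattern extends to column-level sums, MIN/MAX dates, and distinct-key counts.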
🔹 Scenario-Based Intermediate Questions (16–35)
- Scenario: Source has 1M records, target has 970K. What do you check?
  Filters, rejects, and duplicate-removal logic.
- How do you test NULL handling?
  Validate mandatory columns and default values.
- What is SCD Type 1?
  Overwrites old values without maintaining history.
- What is SCD Type 2?
  Maintains history with effective dates and a current_flag.
- Scenario: Customer address changed. What do you test?
  Insertion of the new SCD2 record and expiry of the old record.
- What is surrogate key testing?
  Ensuring surrogate keys are unique and non-null.
- What is a late-arriving dimension?
  The fact arrives before the dimension; validate dummy-key handling.
- Scenario: Duplicate records in target but not in source. Why?
  An incorrect join or missing deduplication.
- How do you test aggregation logic?
  Compare aggregated source data with the target.
- What is hashing used for?
  Change detection and deduplication.
- How do you test CDC (Change Data Capture)?
  Validate that only changed records are loaded.
- Scenario: A negative sales amount appears. What do you test?
  Business rule validation.
- What is referential integrity testing?
  Fact keys must exist in dimension tables.
- Scenario: Data type mismatch error. What do you validate?
  Casting and truncation rules.
- How do you test date transformations?
  Time zone and format validation.
- What is soft delete testing?
  Validating a delete_flag instead of a physical delete.
- Scenario: An ETL job rerun creates duplicates. Why?
  A restartability logic failure.
- What is schema drift?
  A source schema change; test the ETL's adaptation to it.
- How do you test BI reports?
  Compare report totals with DW tables.
- What is data masking testing?
  Validating that PII fields are obfuscated.
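The SCD Type 2 checks in this section can be expressed as two queries: exactly one current row per business key, and no overlapping effective-date ranges. A sketch with SQLite (the dim_customer schema here, with eff_start/eff_end columns, is illustrative):

```python
import sqlite3

# SCD2 validation sketch: one current row per key, no date overlaps.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE dim_customer (
    cust_sk INTEGER PRIMARY KEY, cust_id TEXT, address TEXT,
    eff_start TEXT, eff_end TEXT, current_flag TEXT)""")
cur.executemany("INSERT INTO dim_customer VALUES (?,?,?,?,?,?)", [
    (1, 'C100', 'Old Street', '2023-01-01', '2024-06-30', 'N'),  # expired history row
    (2, 'C100', 'New Street', '2024-07-01', '9999-12-31', 'Y'),  # current row
])

# Check 1: exactly one current_flag = 'Y' row per business key.
bad_current = cur.execute("""
    SELECT cust_id FROM dim_customer WHERE current_flag = 'Y'
    GROUP BY cust_id HAVING COUNT(*) <> 1""").fetchall()

# Check 2: no two versions of the same key have overlapping date ranges.
overlaps = cur.execute("""
    SELECT a.cust_id FROM dim_customer a JOIN dim_customer b
      ON a.cust_id = b.cust_id AND a.cust_sk <> b.cust_sk
    WHERE a.eff_start <= b.eff_end AND b.eff_start <= a.eff_end""").fetchall()

assert bad_current == [] and overlaps == []
print("SCD2 checks passed")
```

After an address change (the scenario above), the same two queries confirm the old record was expired and the new record is the only current one.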
🔹 Advanced & Real-Time Scenarios (36–55)
- Scenario: An ETL job misses its SLA. What do you analyze?
  Query plans, indexing, and partitioning.
- How do you test performance tuning?
  Validate the join strategy and parallel execution.
- Scenario: A many-to-many join inflates data. What test?
  Join cardinality validation.
- How do you validate window functions?
  Check ranking and deduplication logic.
- Scenario: Late data impacts aggregates. What do you do?
  Recalculate the impacted partitions.
- How do you test audit and control tables?
  Validate batch status and row counts.
- Scenario: A report mismatches the DW. First step?
  Verify the S2T mapping and aggregation logic.
- How do you test ETL rollback?
  Ensure partial loads are reverted.
- Scenario: Parallel jobs cause deadlocks. What test?
  Concurrency and isolation validation.
- How do you test historical reloads?
  Check for duplication and correct SCD logic.
- Scenario: A source file arrives late. What test?
  Dependency and rerun logic.
- How do you test checksum/hash totals?
  Compare source vs target hashes.
- Scenario: Currency conversion mismatch. What do you validate?
  Exchange rate logic.
- How do you test cloud ETL pipelines?
  Cost, scalability, and performance.
- Scenario: NULLs appear after a LEFT JOIN. Why?
  Missing dimension records.
- How do you test archival logic?
  Validate data movement to history tables.
- Scenario: Incorrect SCD expiry date. Root cause?
  An effective-date logic error.
- How do you validate metadata tables?
  Batch IDs, timestamps, and counts.
- Scenario: An unexpected data spike. What test?
  Source anomaly validation.
- How do you do end-to-end ETL testing?
  Source → DW → Report reconciliation.
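The checksum/hash-total question above can be demonstrated by hashing ordered rows on both sides and comparing the digests; the tables here are illustrative, and real implementations often hash per row or per column and compare aggregates instead.

```python
import hashlib
import sqlite3

# Hash-total comparison sketch: any value drift between source and
# target changes the digest, even when row counts still match.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE src_orders (order_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL)")
rows = [(1, 100.0), (2, 50.0)]
cur.executemany("INSERT INTO src_orders VALUES (?,?)", rows)
cur.executemany("INSERT INTO fact_sales VALUES (?,?)", rows)

def table_hash(table):
    # Deterministic digest over rows in key order; ORDER BY is essential,
    # otherwise identical data in different physical order hashes differently.
    data = cur.execute(
        f"SELECT order_id, amount FROM {table} ORDER BY order_id").fetchall()
    return hashlib.sha256(repr(data).encode()).hexdigest()

assert table_hash("src_orders") == table_hash("fact_sales")
print("hash totals match")
```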
Real SQL Query Examples for ETL Validation
Sample Tables
- src_orders(order_id, cust_id, amount, order_dt)
- dim_customer(cust_sk, cust_id, current_flag)
- fact_sales(order_id, cust_sk, amount, load_dt)
1️⃣ Record Count Validation
-- Counts should match after accounting for filters and rejected records
SELECT COUNT(*) FROM src_orders;
SELECT COUNT(*) FROM fact_sales;
2️⃣ JOIN Validation
-- Orphan fact rows: surrogate keys with no matching dimension record
SELECT COUNT(*) AS missing_dim
FROM fact_sales f
LEFT JOIN dim_customer d
  ON f.cust_sk = d.cust_sk
WHERE d.cust_sk IS NULL;
3️⃣ GROUP BY Aggregation
-- Compare these aggregates against the corresponding target totals
SELECT cust_id, SUM(amount) AS total_amount
FROM src_orders
GROUP BY cust_id;
4️⃣ Window Function – Deduplication
-- rn = 1 keeps only the latest record per order_id
SELECT *
FROM (
  SELECT order_id,
         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) AS rn
  FROM src_orders
) t
WHERE rn = 1;
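The deduplication query above can be run end-to-end with SQLite (window functions require SQLite 3.25+, which ships with modern Python builds); the sample rows are illustrative.

```python
import sqlite3

# Demonstrates the ROW_NUMBER deduplication pattern: keep the
# latest row per order_id when a rerun has loaded duplicates.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE src_orders "
            "(order_id INTEGER, cust_id TEXT, amount REAL, order_dt TEXT)")
cur.executemany("INSERT INTO src_orders VALUES (?,?,?,?)", [
    (1, 'C1', 100.0, '2024-01-01'),
    (1, 'C1', 100.0, '2024-01-02'),  # duplicate load of order 1
    (2, 'C2', 50.0, '2024-01-01'),
])
dedup = cur.execute("""
    SELECT order_id, order_dt FROM (
        SELECT order_id, order_dt,
               ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) rn
        FROM src_orders) t
    WHERE rn = 1 ORDER BY order_id""").fetchall()
print(dedup)  # → [(1, '2024-01-02'), (2, '2024-01-01')]
```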
5️⃣ Performance Tuning Validation
-- Inspect the query plan; a full table scan on load_dt suggests a missing index
EXPLAIN
SELECT *
FROM fact_sales
WHERE load_dt >= CURRENT_DATE - 1;
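A plan check can even be automated: SQLite's EXPLAIN QUERY PLAN (syntax and plan output differ by database) shows whether a filter uses an index. The table and index names here are illustrative.

```python
import sqlite3

# Sketch: assert the date filter is served by an index, not a full scan.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL, load_dt TEXT)")
cur.execute("CREATE INDEX ix_fact_load_dt ON fact_sales (load_dt)")
plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM fact_sales WHERE load_dt >= date('now','-1 day')"
).fetchall()
# The plan's detail column should name the index (a SEARCH, not a SCAN).
assert any("ix_fact_load_dt" in row[-1] for row in plan)
print(plan)
```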
ETL Tools Commonly Asked in Mindtree Interviews
- Informatica
- Microsoft SQL Server Integration Services (SSIS)
- Ab Initio
- Pentaho
- Talend
ETL Defect Examples + Test Case Sample
Defect Example
Issue: Duplicate records in fact table
- Root Cause: Incorrect join logic
- Fix: Correct join keys and add deduplication
Sample Test Case
- Test Case: Validate SCD2 update
- Expected Result:
- Old record expired
  - New record inserted with current_flag = 'Y'
Mindtree ETL Testing Quick Revision Sheet
- Validate counts, sums, NULLs, duplicates
- Check S2T mapping line-by-line
- Test SCD1, SCD2, audit fields, hashing
- Use JOIN, GROUP BY, window functions
- Always reconcile source vs target vs reports
FAQs (SEO & Snippet Friendly)
Q1. What are Mindtree ETL testing interview questions?
They focus on real-time ETL scenarios, data reconciliation, and enterprise-grade data quality.
Q2. Which SQL is important for ETL testing roles?
JOINs, GROUP BY, window functions, and performance tuning queries.
Q3. How do you test SCD2 in real projects?
By validating history rows, effective dates, and current flags.
