TCS ETL Testing Interview Questions

What Is ETL Testing? (Definition + TCS Project Example)

ETL Testing is the process of validating data during Extract, Transform, and Load operations to ensure accuracy, completeness, consistency, auditability, and performance in a Data Warehouse (DW).

TCS real-world project context

In large-scale enterprise and BFSI programs delivered by Tata Consultancy Services, ETL testing typically involves:

  • Multiple heterogeneous source systems (OLTP, ERP, flat files)
  • High data volumes and strict SLAs
  • Regulatory, financial, and MIS reporting requirements

A TCS ETL tester is expected to validate source data, complex transformations, SCD logic, and final reports using strong SQL skills and scenario-based thinking.


Data Warehouse Flow: Source → Staging → Transform → Load → Reporting

  1. Source – OLTP databases, ERP/CRM, files, APIs
  2. Staging – Raw data landing (no business rules)
  3. Transformation – Cleansing, joins, aggregations, SCD handling
  4. Load – Fact and Dimension tables
  5. Reporting – BI dashboards, MIS, regulatory reports

TCS ETL Testing Interview Questions & Answers

(Basic → Advanced | Real-World & Scenario-Based)


🔹 Basic ETL Testing Questions (1–15)

  1. What is ETL testing?
    Validation of data accuracy and transformations during ETL processing.
  2. Why is ETL testing important in TCS projects?
    Because enterprise clients depend on accurate analytics and compliance reports.
  3. What is Source-to-Target (S2T) mapping?
    A document defining source fields, transformation logic, and target columns.
  4. What are the types of ETL testing?
    Source testing, staging testing, transformation testing, target testing, reconciliation, performance testing.
  5. What is staging area testing?
    Ensuring staging data exactly matches source data.
  6. What are audit fields?
    load_date, batch_id, created_by, updated_date.
  7. How do you validate record counts?
    Compare source vs target counts after filters.
  8. What is data reconciliation?
    Matching totals and counts between source and target.
  9. Difference between fact and dimension tables?
    Facts store measures; dimensions store descriptive attributes.
  10. What is data warehouse testing?
    Testing schemas, ETL logic, and reports.
  11. What is full load vs incremental load?
    Full load reloads all data; incremental load processes deltas.
  12. What is reject data testing?
    Validating rejected rows and error reasons.
  13. What is mapping validation?
    Verifying transformations follow S2T exactly.
  14. What is end-to-end ETL testing?
    Source → DW → Report validation.
  15. Why is SQL mandatory for ETL testers?
    SQL validates data, transformations, and aggregates.

🔹 Scenario-Based Intermediate Questions (16–35)

  1. Scenario: Source has 1M records, target has 9.7L. What do you check?
    Filters, rejected rows, duplicate elimination, join conditions.
  2. How do you test NULL handling?
    Validate mandatory columns and default values.
  3. What is SCD Type 1?
    Overwrites old data without maintaining history.
  4. What is SCD Type 2?
    Maintains history using effective_from, effective_to, and current_flag.
  5. Scenario: Customer address changes. What do you test?
    New SCD2 record insertion and expiry of old record.
  6. What is surrogate key testing?
    Ensure surrogate keys are unique and non-null.
  7. What is late-arriving dimension?
    Fact arrives before dimension; validate dummy/unknown key handling.
  8. Scenario: Duplicate rows appear in target but not in source. Why?
    Incorrect joins or missing deduplication logic.
  9. How do you test aggregation logic?
    Compare aggregated source totals with target data.
  10. What is hashing used for?
    Change detection and deduplication.
  11. How do you test CDC (Change Data Capture)?
    Validate only changed records are loaded.
  12. Scenario: Negative amount appears in report. What do you test?
    Business rule validation.
  13. What is referential integrity testing?
    Fact foreign keys must exist in dimension tables.
  14. Scenario: Data type mismatch during load. What to validate?
    Casting, truncation, precision rules.
  15. How do you test date transformations?
    Time zone and format validation.
  16. What is soft delete testing?
    Validate delete_flag instead of physical deletion.
  17. Scenario: ETL job rerun creates duplicates. Why?
    Restartability logic failure.
  18. What is schema drift?
    Source schema changes impacting ETL.
  19. How do you validate BI reports?
    Reconcile report totals with DW tables.
  20. What is data masking testing?
    Ensure PII fields are obfuscated.

🔹 Advanced & Real-Time TCS Scenarios (36–55)

  1. Scenario: ETL job misses SLA. What do you analyze?
    Query plans, indexing, partitioning.
  2. How do you test ETL performance tuning?
    Validate join strategy and execution plans.
  3. Scenario: Many-to-many join inflates data. What test?
    Join cardinality validation.
  4. How do you validate window functions?
    Check ranking and deduplication logic.
  5. Scenario: Late data impacts aggregates. What do you do?
    Recalculate impacted partitions.
  6. How do you test audit/control tables?
    Validate batch status and row counts.
  7. Scenario: Report mismatch with DW. First step?
    Verify S2T mapping and aggregation logic.
  8. How do you test ETL rollback?
    Ensure partial loads are reverted.
  9. Scenario: Parallel jobs cause deadlocks. What test?
    Concurrency and locking validation.
  10. How do you test historical reloads?
    Check duplication and SCD handling.
  11. Scenario: Source file arrives late. What to test?
    Dependency and rerun logic.
  12. How do you validate checksum/hash totals?
    Compare source vs target hashes.
  13. Scenario: Currency conversion mismatch. What to validate?
    Exchange rate tables and logic.
  14. How do you test cloud ETL pipelines?
    Scalability, cost, and performance.
  15. Scenario: NULLs after LEFT JOIN. Why?
    Missing dimension records.
  16. How do you test archival logic?
    Validate movement to history tables.
  17. Scenario: Incorrect SCD expiry date. Root cause?
    Effective date logic issue.
  18. How do you validate metadata tables?
    Batch IDs, timestamps, counts.
  19. Scenario: Unexpected data spike. What test?
    Source anomaly and business rule validation.
  20. How do you perform end-to-end ETL testing in TCS projects?
    Source → DW → Report reconciliation.

Real SQL Query Examples for ETL Validation

Sample Tables

  • src_orders(order_id, cust_id, amount, order_dt)
  • dim_customer(cust_sk, cust_id, current_flag)
  • fact_sales(order_id, cust_sk, amount, load_dt)

1️⃣ Record Count Validation

SELECT COUNT(*) FROM src_orders;

SELECT COUNT(*) FROM fact_sales;

2️⃣ JOIN Validation

SELECT COUNT(*) AS missing_dim

FROM fact_sales f

LEFT JOIN dim_customer d

ON f.cust_sk = d.cust_sk

WHERE d.cust_sk IS NULL;

3️⃣ GROUP BY Aggregation

SELECT cust_id, SUM(amount)

FROM src_orders

GROUP BY cust_id;

4️⃣ Window Function – Deduplication

SELECT *

FROM (

  SELECT order_id,

         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) rn

  FROM src_orders

) t

WHERE rn = 1;

5️⃣ Performance Tuning Validation

EXPLAIN

SELECT *

FROM fact_sales

WHERE load_dt >= CURRENT_DATE – 1;


ETL Tools Commonly Asked in TCS Interviews

  • Informatica
  • Microsoft SQL Server Integration Services
  • Ab Initio
  • Pentaho
  • Talend

ETL Defect Examples + Test Case Sample

Defect Example

Issue: Duplicate records in fact table

  • Root Cause: Incorrect join logic
  • Fix: Correct join keys and add deduplication

Sample Test Case

  • Test Case: Validate SCD2 update
  • Expected Result:
    • Old record expired
    • New record inserted with current_flag = ‘Y’

TCS ETL Testing – Quick Revision Sheet

  • Validate counts, sums, NULLs, duplicates
  • Check S2T mappings line-by-line
  • Test SCD1, SCD2, audit fields, hashing
  • Use JOIN, GROUP BY, window functions
  • Always reconcile source vs target vs reports

FAQs (Snippet-Friendly)

Q1. What are TCS ETL testing interview questions?
They focus on real-time ETL scenarios, data reconciliation, and enterprise-grade data quality.

Q2. Which SQL is important for TCS ETL roles?
JOINs, GROUP BY, window functions, and performance tuning queries.

Q3. How do you test SCD2 in real projects?
By validating history rows, effective dates, and current flags.

Leave a Comment

Your email address will not be published. Required fields are marked *