CTS ETL Testing Interview Questions – Real-Time Interview Guide with Answers

1. What is ETL Testing? (Definition + Real Example)

ETL Testing is the process of validating that data is correctly Extracted, Transformed, and Loaded from source systems into a data warehouse (DW) as per business rules, data quality standards, and performance SLAs.

Real-Time Example (CTS Project Scenario)

In a Cognizant (CTS) insurance or banking analytics project:

  • Source: OLTP systems, flat files, APIs
  • Target: Enterprise Data Warehouse
  • ETL Testing validates:
    • Source vs target record counts
    • Transformation logic (premium calculation, currency conversion)
    • SCD Type 1 & Type 2 dimension handling
    • Audit fields (batch_id, load_date, checksum)
    • Load performance for daily/nightly jobs
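
The first check in the list above, source vs target record counts, can be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the table names src_policy and dw_policy, their columns, and the sample rows are assumptions made up for the example, not part of any real CTS project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_policy (policy_id INTEGER, premium REAL);
    CREATE TABLE dw_policy  (policy_id INTEGER, premium REAL, batch_id INTEGER);
    INSERT INTO src_policy VALUES (1, 100.0), (2, 250.0), (3, 75.5);
    -- Hypothetical defect: policy 3 never reached the warehouse.
    INSERT INTO dw_policy  VALUES (1, 100.0, 10), (2, 250.0, 10);
""")

def count_and_total(table):
    # Table names are fixed strings in this sketch, so formatting them in is safe.
    return conn.execute(f"SELECT COUNT(*), SUM(premium) FROM {table}").fetchone()

src_count, src_total = count_and_total("src_policy")
tgt_count, tgt_total = count_and_total("dw_policy")
missing_rows = src_count - tgt_count
print(f"source={src_count} target={tgt_count} missing={missing_rows}")
```

Comparing a column total alongside the count helps catch cases where the row counts match but values were altered in flight.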

CTS interviewers typically focus on ETL fundamentals + strong SQL skills + real project experience.


2. Data Warehouse Flow – Source → Staging → Transform → Load → Reporting

  1. Source Layer
    OLTP databases, CRM, ERP, flat files
  2. Staging Layer
    Raw extracted data with minimal validation
  3. Transformation Layer
    Business rules, joins, aggregations, SCD handling
  4. Load Layer
    Fact and dimension tables
  5. Reporting Layer
    BI tools, dashboards, analytics

Testing focus: S2T mapping validation, reconciliation, referential integrity, aggregates, and performance.
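
One way to approach S2T mapping validation is to rebuild the expected target from the mapping rule and subtract the actual target with EXCEPT. The sketch below assumes a hypothetical currency-conversion rule (amount_usd = amount * rate_to_usd); the tables src_premium, fx_rate, and dw_premium and the rates are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_premium (policy_id INTEGER, amount REAL, currency TEXT);
    CREATE TABLE fx_rate     (currency TEXT, rate_to_usd REAL);
    CREATE TABLE dw_premium  (policy_id INTEGER, amount_usd REAL);
    INSERT INTO src_premium VALUES (1, 100, 'USD'), (2, 200, 'EUR'), (3, 400, 'GBP');
    INSERT INTO fx_rate     VALUES ('USD', 1.0), ('EUR', 1.25), ('GBP', 1.5);
    -- Hypothetical defect: policy 3 was loaded without conversion (should be 600.0).
    INSERT INTO dw_premium  VALUES (1, 100.0), (2, 250.0), (3, 400.0);
""")

# Expected rows (source + mapping rule) minus actual rows = rows the ETL got wrong.
mismatches = conn.execute("""
    SELECT s.policy_id, s.amount * r.rate_to_usd AS amount_usd
    FROM src_premium s
    JOIN fx_rate r ON r.currency = s.currency
    EXCEPT
    SELECT policy_id, amount_usd FROM dw_premium
""").fetchall()
print(mismatches)
```

An empty result means the target matches the rule for every source row. Running the subtraction in the opposite direction as well would reveal extra rows in the target. With real decimal amounts, rounding both sides to the documented precision before comparing is usually necessary.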


3. ETL Architecture – CTS Tester’s Perspective

  • ETL Tool Layer (Informatica / SSIS / Ab Initio)
  • Control Tables (job status, row counts, error counts)
  • Audit & Reconciliation Framework
  • Parallel Processing & Partitioning

ETL testers ensure data accuracy, completeness, restartability, and SLA compliance.
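
A control-table check might look like the sketch below: every successful job should account for all of its source rows as either loaded or rejected. The etl_control table layout, job names, and counts are assumptions for illustration; real frameworks name and structure these tables differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE etl_control (
        batch_id INTEGER, job_name TEXT, status TEXT,
        source_count INTEGER, target_count INTEGER, error_count INTEGER
    );
    INSERT INTO etl_control VALUES
        (101, 'load_customer', 'SUCCESS', 1000, 1000, 0),
        (101, 'load_policy',   'SUCCESS',  500,  495, 5),
        (101, 'load_claims',   'SUCCESS',  800,  790, 4),
        (101, 'load_agents',   'FAILED',   300,    0, 0);
""")

# Successful jobs where source rows are neither loaded nor rejected.
unbalanced = conn.execute("""
    SELECT job_name, source_count - target_count - error_count AS unaccounted
    FROM etl_control
    WHERE status = 'SUCCESS'
      AND source_count <> target_count + error_count
    ORDER BY job_name
""").fetchall()

# Jobs that did not finish successfully and need a restart check.
failed = [row[0] for row in conn.execute(
    "SELECT job_name FROM etl_control WHERE status <> 'SUCCESS' ORDER BY job_name")]
print(unbalanced, failed)
```

Here load_claims reports success but leaves 6 rows unaccounted for, which is the kind of silent data loss the reconciliation framework is meant to expose.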


4. CTS ETL Testing Interview Questions & Answers (Basic → Advanced)

🔹 Basic ETL Testing Interview Questions (1–15)

  1. What is ETL testing?
    Validation of data extraction, transformation, and loading processes.
  2. Why is ETL testing important in CTS projects?
    CTS works on enterprise-scale data where data quality directly impacts business decisions.
  3. What is a data warehouse?
    A centralized repository for reporting and analytics.
  4. What is a staging area?
    Temporary storage for raw extracted data.
  5. What is S2T mapping?
    Source-to-Target document defining column-level mappings and transformation rules.
  6. What is a fact table?
    Stores measurable business metrics like sales or premium amounts.
  7. What is a dimension table?
    Stores descriptive attributes like customer, product, policy.
  8. What are audit fields?
    batch_id, load_date, source_system, checksum.
  9. What is data reconciliation?
    Comparing source and target data for completeness and accuracy.
  10. What is full load?
    Reloading all data into the target table.
  11. What is incremental load?
    Loading only new or changed records.
  12. What is primary key testing?
    Ensuring uniqueness and non-null values.
  13. What is reject data?
    Records that fail validation rules.
  14. What is data profiling?
    Analyzing source data patterns before ETL.
  15. What is null validation?
    Verifying how null values are handled per business rules.
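
Primary key testing and null validation from the questions above can be sketched with three small queries. The dim_customer table, its columns, and the 'UNKNOWN' default are assumptions chosen for this example; the actual default comes from the project's business rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (cust_sk INTEGER, cust_id TEXT, country TEXT);
    INSERT INTO dim_customer VALUES
        (1, 'C001', 'IN'),
        (2, 'C002', 'UNKNOWN'),
        (2, 'C003', 'US'),
        (NULL, 'C004', NULL);
""")

# Uniqueness: any key that appears more than once is a defect.
duplicate_keys = conn.execute("""
    SELECT cust_sk, COUNT(*) FROM dim_customer
    WHERE cust_sk IS NOT NULL
    GROUP BY cust_sk HAVING COUNT(*) > 1
""").fetchall()

# Non-null: the key column must always be populated.
null_keys = conn.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE cust_sk IS NULL").fetchone()[0]

# Null handling: under the assumed rule, a null country should have become 'UNKNOWN'.
unhandled_nulls = conn.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE country IS NULL").fetchone()[0]
print(duplicate_keys, null_keys, unhandled_nulls)
```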

🔹 Intermediate ETL QA Questions (16–35)

  16. Explain SCD Type 1.
    Overwrites the old value in place, so no history is kept.
  17. Explain SCD Type 2.
    Keeps history by inserting a new row for each change, tracked with effective dates and a current flag.
  18. How do you test SCD2 logic?
    Validate new row insertion, old row expiry, and that only one row per business key is flagged as current.
  19. What is a surrogate key?
    A system-generated unique identifier with no business meaning.
  20. How do you validate surrogate keys?
    Check uniqueness, sequence, and non-null values.
  21. What is CDC (Change Data Capture)?
    Identifying and capturing only the records inserted, updated, or deleted in the source since the last load.
  22. How do you test CDC logic?
    Compare before and after snapshots and confirm that only the changed records were processed.
  23. What is referential integrity testing?
    Ensuring every foreign key in a fact table exists as a primary key in the related dimension.
  24. What is aggregation testing?
    Validating SUM, COUNT, and AVG results against values recomputed from the source.
  25. What is lookup testing?
    Verifying reference data mappings.
  26. What is deduplication?
    Removing records that share the same business key.
  27. How do you test dedup logic?
    Using GROUP BY with HAVING COUNT(*) > 1, or window functions such as ROW_NUMBER().
  28. What is a late-arriving dimension?
    A dimension record that arrives after the fact rows that reference it.
  29. What is data skew?
    Uneven data distribution across partitions, which impacts performance.
  30. What is restartability testing?
    Ensuring ETL resumes correctly after failure, without data loss or duplicates.
  31. What is metadata testing?
    Validating column names, data types, and lengths.
  32. What is data lineage?
    Tracking data from source to report.
  33. What is threshold testing?
    Verifying that the job fails when the reject count exceeds the configured limit.
  34. What is hashing in ETL?
    Computing a hash over a row's columns so that changes can be detected by comparing one value instead of every column.
  35. What is SLA testing?
    Ensuring ETL jobs complete within agreed time limits.
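
Hash-based change detection and snapshot comparison for CDC, both mentioned above, can be sketched together. The helper names row_hash and detect_changes, the column list, and the sample customers are hypothetical; MD5 is used only as a change fingerprint, and real ETL tools often compute the hash inside the database or mapping instead.

```python
import hashlib

def row_hash(row, columns):
    # Join values with a separator so ('ab', 'c') and ('a', 'bc') hash differently.
    # None is rendered as an empty string, so null and '' are treated as equal here.
    payload = "|".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def detect_changes(previous, current, key, columns):
    """Classify business keys as inserted, updated, or deleted between two snapshots."""
    old = {r[key]: row_hash(r, columns) for r in previous}
    new = {r[key]: row_hash(r, columns) for r in current}
    return {
        "inserted": sorted(k for k in new if k not in old),
        "updated": sorted(k for k in new if k in old and new[k] != old[k]),
        "deleted": sorted(k for k in old if k not in new),
    }

previous = [
    {"cust_id": "C001", "city": "Pune", "segment": "Retail"},
    {"cust_id": "C002", "city": "Chennai", "segment": "SME"},
    {"cust_id": "C003", "city": "Delhi", "segment": "Retail"},
]
current = [
    {"cust_id": "C001", "city": "Pune", "segment": "Retail"},      # unchanged
    {"cust_id": "C002", "city": "Bengaluru", "segment": "SME"},    # city changed
    {"cust_id": "C004", "city": "Kochi", "segment": "Corporate"},  # new customer
]
changes = detect_changes(previous, current, "cust_id", ["city", "segment"])
print(changes)
```

A tester can compare this classification with what the CDC job actually processed: unchanged keys should not appear in the delta at all.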

🔹 Advanced / Scenario-Based Questions (36–50)

  36. How do you handle record count mismatch?
    Check filters, joins, rejects, and CDC logic.
  37. How do you validate null handling?
    Verify defaults, rejects, or allowed nulls.
  38. How do you test ETL performance?
    Measure job run time against the SLA at production-like volumes, then review partitioning, indexing, and parallel execution.
  39. How do you test incremental loads?
    Validate watermark logic: only records newer than the last successful load should be picked up.
  40. How do you test multi-source joins?
    Validate join keys and cardinality.
  41. How do you test aggregation failures?
    Recalculate totals using SQL.
  42. How do you test timezone conversions?
    Validate timestamps across time zones.
  43. How do you test re-runs?
    Ensure no duplicate data is loaded.
  44. How do you test historical data loads?
    Validate back-dated inserts.
  45. How do you validate audit tables?
    Compare source_count vs target_count, accounting for rejected rows.
  46. What causes ETL performance bottlenecks?
    Large joins, data skew, missing indexes.
  47. How do you test file-based ETL?
    Header/footer, delimiter, encoding checks.
  48. How do you test schema changes?
    Backward compatibility validation.
  49. How do you validate reporting layer data?
    BI totals vs DW aggregates.
  50. Explain a critical ETL defect you found.
    Example: SCD2 failure, data loss, wrong aggregation.
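
The incremental-load and re-run questions above can be exercised together. The sketch below assumes a simple timestamp watermark kept in a hypothetical etl_watermark table and an idempotent load keyed on order_id; the function run_incremental_load and all table names are invented for the example. ISO-formatted timestamp strings are used because they compare correctly as text in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL, updated_at TEXT);
    CREATE TABLE tgt_orders (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    CREATE TABLE etl_watermark (last_loaded_at TEXT);
    INSERT INTO src_orders VALUES
        (1, 10.0, '2024-01-01 09:00:00'),
        (2, 20.0, '2024-01-02 09:00:00');
    INSERT INTO etl_watermark VALUES ('1900-01-01 00:00:00');
""")

def run_incremental_load():
    """Load source rows newer than the watermark, advance it, return the target row count."""
    with conn:  # commit the load and the watermark update together
        conn.execute("""
            INSERT OR REPLACE INTO tgt_orders (order_id, amount, updated_at)
            SELECT order_id, amount, updated_at FROM src_orders
            WHERE updated_at > (SELECT last_loaded_at FROM etl_watermark)
        """)
        conn.execute("""
            UPDATE etl_watermark
            SET last_loaded_at = COALESCE(
                (SELECT MAX(updated_at) FROM tgt_orders), last_loaded_at)
        """)
    return conn.execute("SELECT COUNT(*) FROM tgt_orders").fetchone()[0]

after_first_run = run_incremental_load()   # both initial rows are loaded
after_rerun = run_incremental_load()       # re-run must not add duplicates
conn.execute("INSERT INTO src_orders VALUES (3, 30.0, '2024-01-03 09:00:00')")
after_third_run = run_incremental_load()   # only the new row is picked up
watermark = conn.execute("SELECT last_loaded_at FROM etl_watermark").fetchone()[0]
print(after_first_run, after_rerun, after_third_run, watermark)
```

The re-run assertion is the important one: the target row count must be unchanged when nothing new has arrived in the source.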

5. Real SQL Query Examples for ETL Validation

Sample Tables

src_orders(order_id, cust_id, amount, order_date)
fact_sales(order_sk, cust_sk, sales_amt, order_date, batch_id)

JOIN Validation

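-- Counts source orders that have no matching row in the fact table.
-- Assumes order_sk holds the same value as order_id; if surrogate keys are
-- generated independently, join through the key-mapping table instead.
SELECT COUNT(*)
FROM src_orders s
LEFT JOIN fact_sales f
  ON s.order_id = f.order_sk
WHERE f.order_sk IS NULL;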

GROUP BY Aggregation

-- Source totals per day
SELECT order_date, SUM(amount)
FROM src_orders
GROUP BY order_date;

-- Target totals per day (should match the source totals)
SELECT order_date, SUM(sales_amt)
FROM fact_sales
GROUP BY order_date;
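
Comparing the two result sets by eye does not scale, so a tester would usually combine them into one query that returns only the dates whose totals differ. The sketch below runs such a query against the sample tables in an in-memory SQLite database; the rows are invented, and UNION ALL is used instead of a full outer join so that the query also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_date TEXT);
    CREATE TABLE fact_sales (order_sk INTEGER, cust_sk INTEGER, sales_amt REAL,
                             order_date TEXT, batch_id INTEGER);
    INSERT INTO src_orders VALUES
        (1, 10, 100.0, '2024-01-01'), (2, 11, 50.0, '2024-01-01'), (3, 10, 70.0, '2024-01-02');
    -- Hypothetical defect: order 3 was loaded as 7.0 instead of 70.0.
    INSERT INTO fact_sales VALUES
        (1, 1, 100.0, '2024-01-01', 7), (2, 2, 50.0, '2024-01-01', 7), (3, 1, 7.0, '2024-01-02', 7);
""")

# Stack source and target amounts side by side, then keep only dates that disagree.
mismatched_dates = conn.execute("""
    SELECT order_date, SUM(src_amt) AS src_total, SUM(tgt_amt) AS tgt_total
    FROM (
        SELECT order_date, amount AS src_amt, 0 AS tgt_amt FROM src_orders
        UNION ALL
        SELECT order_date, 0 AS src_amt, sales_amt AS tgt_amt FROM fact_sales
    ) u
    GROUP BY order_date
    HAVING SUM(src_amt) <> SUM(tgt_amt)
    ORDER BY order_date
""").fetchall()
print(mismatched_dates)
```

A date that exists on only one side also shows up, because its total on the other side is 0.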

Window Function – Deduplication

-- Keeps the most recently loaded row for each order_id
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER
         (PARTITION BY order_id ORDER BY load_date DESC) rn
  FROM stage_orders
) t
WHERE rn = 1;

Performance Validation

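-- EXPLAIN ANALYZE (e.g., in PostgreSQL) executes the query and reports the
-- actual plan and timings; other databases use different commands.
EXPLAIN ANALYZE
SELECT cust_sk, SUM(sales_amt)
FROM fact_sales
GROUP BY cust_sk;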


6. Scenario-Based ETL Testing Use Cases

Scenario            | Validation
Record mismatch     | Source vs target counts
Null values         | Default or reject logic
Duplicate records   | Window functions
Late-arriving data  | SCD2 backdated insert
Slow job            | Partitioning & indexing

7. ETL Tools Commonly Asked in CTS Interviews

  • Informatica
  • Microsoft SSIS
  • Ab Initio
  • Pentaho
  • Talend

CTS interviewers focus more on ETL concepts and SQL validation than tool-specific syntax.


8. ETL Defect Examples + Sample Test Case

Defect: SCD2 record not expiring

  • Expected: old row current_flag = 'N'
  • Actual: two active records
  • Severity: High

Sample Test Case:

  • Update dimension attribute
  • Validate new row insertion and old row expiry
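
The test case above could be automated roughly as follows. This is a sketch under assumed conventions: a dim_customer table with eff_start_date, eff_end_date, and current_flag columns, and '9999-12-31' as the open-ended end date. Real projects may use different column names and a NULL end date instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        cust_sk INTEGER PRIMARY KEY, cust_id TEXT, city TEXT,
        eff_start_date TEXT, eff_end_date TEXT, current_flag TEXT
    );
    INSERT INTO dim_customer VALUES
        (1, 'C001', 'Pune',    '2023-01-01', '2024-03-31', 'N'),
        (2, 'C001', 'Mumbai',  '2024-04-01', '9999-12-31', 'Y'),
        -- Reproduces the defect: the old C002 row was never expired.
        (3, 'C002', 'Chennai', '2023-01-01', '9999-12-31', 'Y'),
        (4, 'C002', 'Kochi',   '2024-04-01', '9999-12-31', 'Y');
""")

# Each business key must have exactly one current row.
bad_current = conn.execute("""
    SELECT cust_id, SUM(current_flag = 'Y') AS active_rows
    FROM dim_customer
    GROUP BY cust_id
    HAVING SUM(current_flag = 'Y') <> 1
""").fetchall()

# Expired rows must carry a real end date, not the open-ended one.
bad_expiry = conn.execute("""
    SELECT cust_sk FROM dim_customer
    WHERE current_flag = 'N' AND eff_end_date = '9999-12-31'
""").fetchall()
print(bad_current, bad_expiry)
```

The first query flags C002 with two active rows, which matches the "Actual" result in the defect description.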

9. ETL Testing Quick Revision Sheet

  • ETL flow & architecture
  • S2T mapping validation
  • SCD1 vs SCD2
  • SQL JOIN, GROUP BY, window functions
  • Performance tuning & reconciliation

10. FAQs

Q1. Does CTS ask SQL in ETL testing interviews?
Yes, SQL validation is mandatory.

Q2. Is Informatica mandatory for CTS ETL roles?
No. Strong ETL concepts matter more than tools.

Q3. How many ETL interview rounds at CTS?
Usually 1–2 technical rounds.
