CGI ETL Testing Interview Questions – Real-Time QA Guide with Answers

1. What is ETL Testing? (Definition + Example)

ETL testing verifies that data is correctly Extracted, Transformed, and Loaded from source systems into a data warehouse (DW) in line with business rules, performance SLAs, and data quality standards.

Real-World Example (CGI Project Context)

In a healthcare or banking DW project handled by CGI:

  • Source: OLTP databases, flat files, APIs
  • Target: Enterprise Data Warehouse
  • Testing scope:
    • Source vs target record counts
    • Transformation logic (business rules, calculations)
    • SCD Type 1 & Type 2 handling
    • Audit fields (batch_id, load_date, checksum)
    • Performance tuning for nightly loads
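The checksum audit field in the scope list is usually a hash of a row's business columns, so a changed value shows up as a changed hash. A minimal sketch in Python, assuming a hypothetical `row_checksum` helper, MD5 as the hash, and `|` as the column delimiter (none of these come from a specific CGI project):

```python
import hashlib

def row_checksum(row, columns):
    """Hash the chosen columns in a fixed order; NULLs get an explicit marker."""
    parts = ["<NULL>" if row.get(c) is None else str(row[c]) for c in columns]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()

# Hypothetical rows: the same order in source and target, plus a changed version
cols = ["order_id", "cust_id", "amount"]
source_row = {"order_id": 101, "cust_id": 7, "amount": "250.00"}
target_row = {"order_id": 101, "cust_id": 7, "amount": "250.00"}
changed_row = {"order_id": 101, "cust_id": 7, "amount": "275.00"}

same = row_checksum(source_row, cols) == row_checksum(target_row, cols)
differs = row_checksum(source_row, cols) != row_checksum(changed_row, cols)
print(same, differs)
```

The delimiter and the NULL marker matter: without them, ("ab", "c") and ("a", "bc") would hash to the same value, and a NULL would be indistinguishable from an empty string.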

CGI interviewers expect strong ETL fundamentals, SQL skills, and scenario-based thinking.


2. Data Warehouse Flow – Source → Staging → Transform → Load → Reporting

  1. Source Layer
    OLTP systems, CRM, ERP, files
  2. Staging Layer
    Raw data landing, minimal transformations
  3. Transformation Layer
    Business logic, joins, aggregations, SCD handling
  4. Load Layer
    Fact and dimension tables
  5. Reporting Layer
    BI dashboards and analytics

Testing focus: S2T mapping validation, data reconciliation, referential integrity, aggregates, and performance.
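One common way to check an S2T rule is to recompute the expected target value from the source and subtract what was actually loaded. The sketch below runs that check in SQLite through Python; the tables `src_customer` and `dim_customer` and the rule "cust_name = UPPER(TRIM(name))" are hypothetical stand-ins for a real mapping document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (cust_id INTEGER, name TEXT);
CREATE TABLE dim_customer (cust_id INTEGER, cust_name TEXT);
INSERT INTO src_customer VALUES (1, ' alice '), (2, 'Bob'), (3, 'carol');
-- cust_id 3 was loaded without the UPPER/TRIM rule applied
INSERT INTO dim_customer VALUES (1, 'ALICE'), (2, 'BOB'), (3, 'carol');
""")

# Expected rows (rule applied to source) minus actual target rows
mismatches = conn.execute("""
SELECT cust_id, UPPER(TRIM(name)) AS expected_name FROM src_customer
EXCEPT
SELECT cust_id, cust_name FROM dim_customer
ORDER BY 1
""").fetchall()
print(mismatches)
```

An empty result means every source row reached the target with the rule applied. Running the subtraction in the other direction (target minus expected) catches extra rows in the target. Oracle spells the operator MINUS rather than EXCEPT.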


3. ETL Architecture – Tester’s Perspective (CGI)

  • ETL Tool Layer (Informatica / SSIS / Ab Initio)
  • Control Tables (job status, row counts)
  • Audit & Reconciliation Framework
  • Parallel Processing & Partitioning

Testers validate data accuracy, completeness, restartability, and SLA compliance.
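Control-table checks can be automated with one query per batch. The sketch below assumes a hypothetical `etl_control` table with `status`, `source_count`, `target_count`, and `reject_count` columns; real frameworks name and structure these differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE etl_control (
    job_name TEXT, batch_id INTEGER, status TEXT,
    source_count INTEGER, target_count INTEGER, reject_count INTEGER
);
INSERT INTO etl_control VALUES
    ('load_dim_customer', 501, 'SUCCESS', 1000, 1000, 0),
    ('load_fact_sales',   501, 'SUCCESS', 5000, 4990, 4),
    ('load_dim_product',  501, 'FAILED',   300,    0, 0);
""")

# Flag jobs that did not succeed, or whose rows are not fully accounted for
problems = conn.execute("""
SELECT job_name,
       status,
       source_count - target_count - reject_count AS unexplained_rows
FROM etl_control
WHERE batch_id = 501
  AND (status <> 'SUCCESS'
       OR source_count <> target_count + reject_count)
ORDER BY job_name
""").fetchall()

for row in problems:
    print(row)
```

Here `load_fact_sales` reports SUCCESS but leaves 6 rows unexplained, which is the kind of silent data loss a status check alone would miss.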


4. CGI ETL Testing Interview Questions & Answers (Basic → Advanced)

🔹 Basic ETL Testing Questions (1–15)

  1. What is ETL testing?
    Validation of data extraction, transformation, and loading processes.
  2. Why is ETL testing important in CGI projects?
    CGI works with enterprise clients where incorrect data impacts business decisions and compliance.
  3. What is a data warehouse?
    Centralized repository optimized for reporting and analytics.
  4. What is a staging area?
    Temporary storage for raw extracted data.
  5. What is S2T mapping?
    Source-to-Target document defining column-level mappings and rules.
  6. What is a fact table?
    Stores measurable business metrics.
  7. What is a dimension table?
    Stores descriptive attributes.
  8. What are audit fields?
    batch_id, load_date, source_system, checksum.
  9. What is data reconciliation?
    Comparing source and target data for accuracy.
  10. What is a full load?
    Reloading all data into the target.
  11. What is an incremental load?
    Loading only new or changed records.
  12. What is primary key testing?
    Validating uniqueness and non-null values.
  13. What is reject data?
    Invalid records captured separately.
  14. What is data profiling?
    Analyzing source data patterns before ETL.
  15. What is null validation?
    Ensuring null handling follows business rules.
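Primary key testing from the list above comes down to two queries: one for duplicate keys and one for null keys. A minimal sketch, run in SQLite through Python, with a hypothetical `dim_customer` table seeded with one duplicate and one null key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (cust_sk INTEGER, cust_name TEXT);
INSERT INTO dim_customer VALUES
    (1, 'Alice'), (2, 'Bob'), (2, 'Bob again'), (NULL, 'Carol');
""")

# Uniqueness: any key that appears more than once is a defect
duplicate_keys = conn.execute("""
SELECT cust_sk, COUNT(*) AS occurrences
FROM dim_customer
WHERE cust_sk IS NOT NULL
GROUP BY cust_sk
HAVING COUNT(*) > 1
""").fetchall()

# Non-null: a key column should never be empty
null_keys = conn.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE cust_sk IS NULL"
).fetchone()[0]

print(duplicate_keys, null_keys)
```

Both checks should return nothing (an empty list and a count of 0) on a correctly loaded table.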

🔹 Intermediate ETL QA Questions (16–35)

  16. Explain SCD Type 1.
    Overwrites old data without keeping history.
  17. Explain SCD Type 2.
    Maintains history using effective dates and a current-row flag.
  18. How do you test SCD2?
    Validate new row insertion and old row expiry.
  19. What is a surrogate key?
    System-generated unique identifier.
  20. How do you validate surrogate keys?
    Check uniqueness and non-null constraints.
  21. What is CDC?
    Change Data Capture for delta processing.
  22. How do you test CDC logic?
    Compare before/after snapshots.
  23. What is referential integrity testing?
    Ensuring every fact FK exists as a dimension PK.
  24. What is aggregation testing?
    Validating SUM, COUNT, AVG calculations.
  25. What is lookup testing?
    Verifying reference data mappings.
  26. What is deduplication?
    Removing duplicate rows for the same business key.
  27. How do you test dedup logic?
    Check for duplicate business keys using window functions or GROUP BY with HAVING.
  28. What is a late-arriving dimension?
    A fact record arrives before its dimension record.
  29. What is data skew?
    Uneven data distribution affecting performance.
  30. What is restartability testing?
    Ensuring a job resumes correctly after failure.
  31. What is metadata testing?
    Validating column names, types, lengths.
  32. What is data lineage?
    Tracking data from source to report.
  33. What is threshold testing?
    Verifying that the job fails when rejects exceed the configured limit.
  34. What is hashing in ETL?
    Comparing row hashes to detect data changes efficiently.
  35. What is SLA testing?
    Validating job completion within time limits.
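Referential integrity testing from the list above is typically an orphan check: fact rows whose foreign key has no matching dimension row. A minimal sketch in SQLite through Python; the table contents are made up, and `cust_sk = 99` is planted as the orphan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (cust_sk INTEGER PRIMARY KEY, cust_name TEXT);
CREATE TABLE fact_sales (order_sk INTEGER, cust_sk INTEGER, sales_amt REAL);
INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO fact_sales VALUES (10, 1, 100.0), (11, 2, 50.0), (12, 99, 75.0);
""")

# Fact rows with no parent in the dimension
orphans = conn.execute("""
SELECT f.order_sk, f.cust_sk
FROM fact_sales f
LEFT JOIN dim_customer d ON d.cust_sk = f.cust_sk
WHERE d.cust_sk IS NULL
""").fetchall()
print(orphans)
```

An orphan is not always a load bug. With a late-arriving dimension, the fact may legitimately arrive first, so the expected behavior (reject, hold, or map to a default "unknown" dimension row) should come from the business rules.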

🔹 Advanced / Scenario-Based Questions (36–50)

  36. How do you handle record count mismatch?
    Check filters, joins, rejects, CDC logic.
  37. How do you validate null handling?
    Verify defaults, rejects, or allowed nulls.
  38. How do you test ETL performance?
    Measure run time against the SLA at production-like volumes, then review partitioning, indexing, and parallel execution.
  39. How do you test incremental loads?
    Validate watermark logic.
  40. How do you test multi-source joins?
    Validate join keys and cardinality.
  41. How do you test aggregation failures?
    Recalculate totals using SQL.
  42. How do you test timezone conversions?
    Validate timestamps across zones.
  43. How do you test re-runs?
    Ensure no duplicate records.
  44. How do you test historical data loads?
    Validate backdated records.
  45. How do you validate audit tables?
    Compare source_count vs target_count.
  46. What causes ETL performance bottlenecks?
    Large joins, data skew, missing indexes.
  47. How do you test file-based ETL?
    Header/footer, delimiter, encoding.
  48. How do you test schema changes?
    Backward compatibility checks.
  49. How do you validate reporting layer data?
    BI totals vs DW aggregates.
  50. Explain a critical ETL defect you found.
    Example: Data loss, wrong aggregation, SCD failure.
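The watermark and re-run answers above can be tested together: run the incremental load, run it again with no new source data, and confirm the second run adds nothing. The sketch below uses a hypothetical `incremental_load` function as a stand-in for the real ETL job, with assumed table names `src_orders`, `tgt_orders`, and `etl_watermark`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, amount REAL, updated_at TEXT);
CREATE TABLE tgt_orders (order_id INTEGER, amount REAL, updated_at TEXT);
CREATE TABLE etl_watermark (last_loaded_at TEXT);
INSERT INTO etl_watermark VALUES ('2024-01-01 00:00:00');
INSERT INTO src_orders VALUES
    (1, 100.0, '2023-12-31 23:00:00'),  -- older than the watermark
    (2, 200.0, '2024-01-02 09:00:00'),
    (3, 300.0, '2024-01-03 10:00:00');
""")

def target_count(conn):
    return conn.execute("SELECT COUNT(*) FROM tgt_orders").fetchone()[0]

def incremental_load(conn):
    """Stand-in ETL job: copy rows newer than the watermark, then advance it."""
    before = target_count(conn)
    (watermark,) = conn.execute(
        "SELECT last_loaded_at FROM etl_watermark"
    ).fetchone()
    conn.execute(
        "INSERT INTO tgt_orders SELECT * FROM src_orders WHERE updated_at > ?",
        (watermark,),
    )
    conn.execute(
        "UPDATE etl_watermark SET last_loaded_at = "
        "COALESCE((SELECT MAX(updated_at) FROM tgt_orders), last_loaded_at)"
    )
    conn.commit()
    return target_count(conn) - before

first_run = incremental_load(conn)   # picks up orders 2 and 3
rerun = incremental_load(conn)       # nothing new, so nothing should load
conn.execute("INSERT INTO src_orders VALUES (4, 400.0, '2024-01-04 08:00:00')")
third_run = incremental_load(conn)   # picks up only order 4
total = target_count(conn)
print(first_run, rerun, third_run, total)
```

The timestamps are ISO-formatted strings, so plain string comparison orders them correctly. A strict `>` comparison can miss rows that share the watermark's exact timestamp, which is a boundary case worth adding to the test data.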

5. Real SQL Query Examples for ETL Validation

Sample Tables

src_orders(order_id, cust_id, amount, order_date)
fact_sales(order_sk, cust_sk, sales_amt, order_date, batch_id)

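JOIN Validation

-- Source orders missing from the target; expect 0.
-- Assumes order_sk carries the source order_id. If order_sk is a generated
-- surrogate key, join through the natural key or a key-mapping table instead.
SELECT COUNT(*) AS missing_in_target
FROM src_orders s
LEFT JOIN fact_sales f
  ON s.order_id = f.order_sk
WHERE f.order_sk IS NULL;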

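GROUP BY Aggregation

-- Daily totals from the source
SELECT order_date, SUM(amount) AS src_total
FROM src_orders
GROUP BY order_date;

-- Daily totals from the target; each date should match the source
SELECT order_date, SUM(sales_amt) AS tgt_total
FROM fact_sales
GROUP BY order_date;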
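Comparing two result sets by eye does not scale, so the source and target aggregates can be combined into one query that returns only the dates where the totals disagree. A sketch in SQLite through Python using the sample tables; the rows and the 0.005 tolerance are made-up values for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL, order_date TEXT);
CREATE TABLE fact_sales (order_sk INTEGER, cust_sk INTEGER, sales_amt REAL,
                         order_date TEXT, batch_id INTEGER);
INSERT INTO src_orders VALUES
    (1, 10, 100.0, '2024-01-01'),
    (2, 11,  50.0, '2024-01-01'),
    (3, 10,  75.0, '2024-01-02');
-- The third order was loaded with the wrong amount (70 instead of 75)
INSERT INTO fact_sales VALUES
    (1, 1, 100.0, '2024-01-01', 501),
    (2, 2,  50.0, '2024-01-01', 501),
    (3, 1,  70.0, '2024-01-02', 501);
""")

# Stack both sides, total each per date, keep only dates that disagree
mismatched = conn.execute("""
SELECT order_date, SUM(src_amt) AS src_total, SUM(tgt_amt) AS tgt_total
FROM (
    SELECT order_date, amount AS src_amt, 0.0 AS tgt_amt FROM src_orders
    UNION ALL
    SELECT order_date, 0.0, sales_amt FROM fact_sales
) combined
GROUP BY order_date
HAVING ABS(SUM(src_amt) - SUM(tgt_amt)) > 0.005
ORDER BY order_date
""").fetchall()
print(mismatched)
```

The UNION ALL form also reports dates that exist on only one side, which an inner join between the two aggregates would silently drop.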

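Window Function – Deduplication

-- Keep the latest row per order_id (stage_orders and load_date are assumed
-- staging names; they are not part of the sample tables)
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER
         (PARTITION BY order_id ORDER BY load_date DESC) AS rn
  FROM stage_orders
) t
WHERE rn = 1;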
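To test dedup logic rather than just run it, it helps to look at both sides of the ROW_NUMBER filter: the rows kept (rn = 1) and the rows dropped (rn > 1). The sketch below runs the same window function in SQLite through Python (window functions need SQLite 3.25 or later); the `stage_orders` rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stage_orders (order_id INTEGER, amount REAL, load_date TEXT);
INSERT INTO stage_orders VALUES
    (1, 100.0, '2024-01-01'),
    (1, 120.0, '2024-01-02'),
    (2,  50.0, '2024-01-01');
""")

# Rank rows per business key, newest load first
RANKED = """
SELECT order_id, amount, load_date,
       ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY load_date DESC) AS rn
FROM stage_orders
"""

kept = conn.execute(
    f"SELECT order_id, amount, load_date FROM ({RANKED}) t "
    "WHERE rn = 1 ORDER BY order_id"
).fetchall()
dropped = conn.execute(
    f"SELECT order_id, amount, load_date FROM ({RANKED}) t "
    "WHERE rn > 1 ORDER BY order_id"
).fetchall()
distinct_ids = conn.execute(
    "SELECT COUNT(DISTINCT order_id) FROM stage_orders"
).fetchone()[0]

print(kept, dropped, distinct_ids)
```

Two checks follow from this: the number of kept rows should equal the number of distinct business keys, and every dropped row should be older than the kept row for the same key.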

Performance Validation

-- PostgreSQL-style syntax; other engines expose execution plans differently
EXPLAIN ANALYZE
SELECT cust_sk, SUM(sales_amt)
FROM fact_sales
GROUP BY cust_sk;


6. Scenario-Based ETL Testing Use Cases

Scenario | Validation
Record mismatch | Source vs target counts
Null values | Default or reject logic
Duplicate data | Window functions
Late data | SCD2 backdated insert
Slow job | Partitioning & indexing

7. ETL Tools Commonly Asked in CGI Interviews

  • Informatica
  • Microsoft SSIS
  • Ab Initio
  • Pentaho
  • Talend

CGI interviewers emphasize conceptual clarity over tool-specific syntax.


8. ETL Defect Examples + Sample Test Case

Defect: SCD2 record not expiring

  • Expected: old row current_flag = 'N'
  • Actual: two active records
  • Severity: High

Sample Test Case:

  • Update a dimension attribute in the source and re-run the load
  • Validate that a new current row is inserted and the old row is expired (current_flag = 'N')
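The sample test case can be sketched end to end in SQLite through Python. The `apply_scd2_change` function is a hypothetical stand-in for the ETL job under test, and the column names (`eff_start`, `eff_end`, `current_flag`) and the `9999-12-31` open-end date are assumed conventions, not taken from a specific project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    cust_sk INTEGER PRIMARY KEY AUTOINCREMENT,
    cust_id INTEGER, city TEXT,
    eff_start TEXT, eff_end TEXT, current_flag TEXT
);
INSERT INTO dim_customer (cust_id, city, eff_start, eff_end, current_flag)
VALUES (7, 'Pune', '2023-01-01', '9999-12-31', 'Y');
""")

def apply_scd2_change(conn, cust_id, new_city, change_date):
    """Stand-in ETL job: expire the current row, then insert the new version."""
    conn.execute(
        "UPDATE dim_customer SET eff_end = ?, current_flag = 'N' "
        "WHERE cust_id = ? AND current_flag = 'Y'",
        (change_date, cust_id),
    )
    conn.execute(
        "INSERT INTO dim_customer (cust_id, city, eff_start, eff_end, current_flag) "
        "VALUES (?, ?, ?, '9999-12-31', 'Y')",
        (cust_id, new_city, change_date),
    )

# Step 1: update a dimension attribute
apply_scd2_change(conn, 7, "Mumbai", "2024-06-01")

# Step 2: validate new row insertion and old row expiry
history = conn.execute(
    "SELECT city, eff_start, eff_end, current_flag FROM dim_customer "
    "WHERE cust_id = 7 ORDER BY eff_start"
).fetchall()

# Defect check: no business key may have more than one active row
multi_active = conn.execute(
    "SELECT cust_id, COUNT(*) FROM dim_customer "
    "WHERE current_flag = 'Y' GROUP BY cust_id HAVING COUNT(*) > 1"
).fetchall()

print(history, multi_active)
```

The `multi_active` query is the one that would have caught the defect described above: it returns a row for every business key with two or more active records.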

9. ETL Testing Quick Revision Sheet

  • ETL flow & architecture
  • S2T mapping validation
  • SCD1 vs SCD2
  • SQL JOIN, GROUP BY, window functions
  • Performance tuning & reconciliation

10. FAQs

Q1. Does CGI ask SQL in ETL testing interviews?
Yes, SQL validation is mandatory.

Q2. Is Informatica mandatory for CGI ETL roles?
No. ETL concepts are more important than tools.

Q3. How many ETL rounds are there at CGI?
Usually 1–2 technical rounds.
