PwC ETL Testing Interview Questions

What Is ETL Testing? (Definition + PwC-Style Example)

ETL Testing is the process of validating data during Extract, Transform, and Load operations to ensure correctness, completeness, consistency, auditability, and performance in a Data Warehouse (DW).

PwC real-world project example

In consulting and assurance programs delivered by PwC, ETL testing commonly supports:

  • Financial & regulatory reporting
  • Risk and compliance analytics
  • Multi-source enterprise data platforms

An ETL tester ensures that source data, transformation logic, and final reports are fully traceable, auditable, and compliant with business and regulatory rules.


Data Warehouse Flow: Source → Staging → Transform → Load → Reporting

  1. Source – OLTP databases, ERP/CRM, flat files, APIs
  2. Staging – Raw data landing area (no business rules)
  3. Transformation – Cleansing, joins, aggregations, SCD logic
  4. Load – Fact and Dimension tables
  5. Reporting – BI dashboards, MIS, regulatory reports

PwC ETL Testing Interview Questions & Answers

(Basic → Advanced | Scenario-Based & Real-Time)


🔹 Basic ETL Testing Questions (1–15)

  1. What is ETL testing?
    Validating data accuracy, completeness, and transformations during ETL processing.
  2. Why is ETL testing critical in PwC projects?
    Because data is used for financial, audit, and regulatory decision-making.
  3. What is Source-to-Target (S2T) mapping?
    A document defining how each source column maps to target columns with transformations.
  4. What are the types of ETL testing?
    Source testing, staging testing, transformation testing, target testing, reconciliation, performance testing.
  5. What is staging area testing?
    Verifying staging data exactly matches source data.
  6. What are audit fields in ETL?
    load_date, batch_id, created_by, updated_date.
  7. How do you validate record counts?
    Compare source vs target counts after filters and transformations.
  8. What is data reconciliation?
    Matching totals, counts, and metrics between source and target.
  9. Difference between fact and dimension tables?
    Facts store measures; dimensions store descriptive attributes.
  10. What is data warehouse testing?
    Testing data models, ETL logic, and reports.
  11. What is full load vs incremental load?
    Full loads reload all data; incremental loads only deltas.
  12. What is reject data testing?
    Validating rejected records and error reasons.
  13. What is data quality testing?
    Accuracy, completeness, consistency checks.
  14. What is mapping validation?
    Ensuring transformations follow S2T exactly.
  15. What is end-to-end ETL testing?
    Source → DW → Report validation.

🔹 Scenario-Based Intermediate Questions (16–35)

  1. Scenario: Source has 1M records, target has 960K. What do you check?
    Filters, rejected rows, duplicate removal, join conditions.
  2. How do you test NULL handling?
    Validate mandatory columns and default value logic.
  3. What is SCD Type 1?
    Overwrites old data without maintaining history.
  4. What is SCD Type 2?
    Maintains history using effective_from, effective_to, and current_flag.
  5. Scenario: Customer address changes. What do you test?
    SCD2 history row creation and expiry of old record.
  6. What is surrogate key testing?
    Ensure surrogate keys are unique and non-null.
  7. What is a late-arriving dimension?
    Fact arrives before dimension; validate dummy/unknown keys.
  8. Scenario: Duplicate rows appear in target but not in source. Why?
    Many-to-many joins or missing dedup logic.
  9. How do you test aggregation logic?
    Compare aggregated source data with target totals.
  10. What is hashing used for in ETL?
    Change detection and deduplication.
  11. How do you test CDC (Change Data Capture)?
    Validate only changed records are processed.
  12. Scenario: Negative revenue appears in report. What do you test?
    Business rule validation and transformation logic.
  13. What is referential integrity testing?
    Fact foreign keys must exist in dimension tables.
  14. Scenario: Data type mismatch error occurs. What to validate?
    Casting, truncation, and precision rules.
  15. How do you test date transformations?
    Time zone, format, and business calendar validation.
  16. What is soft delete testing?
    Validate delete_flag instead of physical deletion.
  17. Scenario: ETL rerun creates duplicates. What failed?
    Restartability and idempotency logic.
  18. What is schema drift?
    Source schema changes impacting ETL.
  19. How do you validate BI reports?
    Reconcile report numbers with DW tables.
  20. What is data masking testing?
    Ensure PII fields are obfuscated in target.

🔹 Advanced & PwC Real-Time Scenarios (36–55)

  1. Scenario: ETL job misses SLA. What do you analyze?
    Query plans, indexing, partitioning, parallelism.
  2. How do you test ETL performance tuning?
    Validate join strategies and execution plans.
  3. Scenario: Many-to-many join inflates data. What test?
    Join cardinality validation.
  4. How do you validate window functions?
    Verify ranking, deduplication, and SCD logic.
  5. Scenario: Late data affects aggregates. What do you do?
    Recalculate impacted partitions.
  6. How do you test audit and control tables?
    Validate batch status, row counts, timestamps.
  7. Scenario: Regulatory report mismatch. First check?
    S2T mapping and aggregation logic.
  8. How do you test ETL rollback?
    Ensure partial loads are reverted.
  9. Scenario: Parallel jobs cause locking. What test?
    Concurrency and isolation testing.
  10. How do you test historical reloads?
    Check duplication and SCD logic.
  11. Scenario: Source file arrives late. What test?
    Dependency handling and rerun logic.
  12. How do you validate checksum/hash totals?
    Compare source vs target hashes.
  13. Scenario: Currency conversion mismatch. What to validate?
    Exchange rate tables and application logic.
  14. How do you test cloud DW ETL pipelines?
    Scalability, cost, and performance.
  15. Scenario: NULLs after LEFT JOIN. Why?
    Missing dimension records.
  16. How do you test archival logic?
    Validate data movement to history tables.
  17. Scenario: Incorrect SCD expiry date. Root cause?
    Effective date logic issue.
  18. How do you validate metadata tables?
    Batch IDs, counts, load timestamps.
  19. Scenario: Unexpected data spike. What test?
    Source anomaly and business rule validation.
  20. How do you perform end-to-end ETL testing?
    Source → DW → Report reconciliation.

Real SQL Query Examples for ETL Validation

Sample Tables

  • src_orders(order_id, cust_id, amount, order_dt)
  • dim_customer(cust_sk, cust_id, current_flag)
  • fact_sales(order_id, cust_sk, amount, load_dt)

1️⃣ Record Count Validation

SELECT COUNT(*) FROM src_orders;

SELECT COUNT(*) FROM fact_sales;

2️⃣ JOIN Validation

SELECT COUNT(*) AS missing_dim

FROM fact_sales f

LEFT JOIN dim_customer d

ON f.cust_sk = d.cust_sk

WHERE d.cust_sk IS NULL;

3️⃣ GROUP BY Aggregation

SELECT cust_id, SUM(amount)

FROM src_orders

GROUP BY cust_id;

4️⃣ Window Function – Deduplication

SELECT *

FROM (

  SELECT order_id,

         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) rn

  FROM src_orders

) t

WHERE rn = 1;

5️⃣ Performance Tuning Validation

EXPLAIN

SELECT *

FROM fact_sales

WHERE load_dt >= CURRENT_DATE – 1;


ETL Tools Commonly Asked in PwC Interviews

  • Informatica
  • Microsoft SQL Server Integration Services
  • Ab Initio
  • Pentaho
  • Talend

ETL Defect Examples + Test Case Sample

Defect Example

Issue: Duplicate records in fact table

  • Root Cause: Incorrect join logic
  • Fix: Correct join keys and add deduplication

Sample Test Case

  • Test Case: Validate SCD2 update
  • Expected Result:
    • Old record expired
    • New record inserted with current_flag = ‘Y’

PwC ETL Testing Quick Revision Sheet

  • Validate counts, sums, NULLs, duplicates
  • Check S2T mappings thoroughly
  • Test SCD1, SCD2, audit fields, hashing
  • Use JOIN, GROUP BY, window functions
  • Always reconcile source vs target vs reports

FAQs (SEO & Snippet Friendly)

Q1. What are PwC ETL testing interview questions?
They focus on real-time ETL scenarios, reconciliation, and regulatory-grade data accuracy.

Q2. Which SQL is important for PwC ETL roles?
JOINs, GROUP BY, window functions, and performance tuning queries.

Q3. How do you test SCD2 in real projects?
By validating history rows, effective dates, and current flags.

Leave a Comment

Your email address will not be published. Required fields are marked *