Accenture ETL Testing Interview Questions – Real-Time QA Guide with Answers

1. What is ETL Testing? (Definition + Real Example)

ETL Testing is the process of validating Extract, Transform, and Load operations to ensure data accuracy, completeness, consistency, and performance when data moves from source systems to a data warehouse (DW).

Real-Time Example (Accenture Project Context)

In a retail analytics project:

  • Source: POS systems, CRM, flat files
  • Target: Enterprise Data Warehouse
  • Validation:
    • Source vs target record counts
    • Transformation rules (discounts, tax, currency conversion)
    • SCD Type 2 for customer history
    • Audit fields (batch_id, load_date)
    • Performance SLA (nightly load < 2 hours)

Accenture interviewers expect strong ETL fundamentals + SQL validation skills + real project scenarios.


2. Data Warehouse Flow – Source → Staging → Transform → Load → Reporting

  1. Source Layer
    • OLTP databases, APIs, CSV/XML/JSON files
  2. Staging Layer
    • Raw data landing, minimal validation
  3. Transformation Layer
    • Business rules, joins, aggregations, SCD handling
  4. Load Layer
    • Fact and dimension tables
  5. Reporting Layer
    • BI tools, dashboards, analytics

Key testing focus: S2T mapping, data reconciliation, referential integrity, aggregates, performance.


3. ETL Architecture (Accenture Testing Perspective)

  • ETL Tool Layer: Informatica / SSIS / Ab Initio
  • Control Tables: Batch status, row counts
  • Audit Framework: Rejects, duplicates, hash totals
  • Parallel Processing: Partitioning for performance

4. Accenture ETL Testing Interview Questions & Answers (Basic → Advanced)

🔹 Basic ETL Testing Questions (1–15)

  1. What is ETL testing?
    Validation of data extraction, transformation logic, and loading into target systems.
  2. Why is ETL testing important at Accenture?
    Accenture handles enterprise-scale data where data quality directly impacts business decisions.
  3. What is a data warehouse?
    A centralized repository optimized for reporting and analytics.
  4. What is staging area?
    Temporary storage for raw extracted data.
  5. What is S2T mapping?
    Source-to-Target document defining column mapping and transformation rules.
  6. What is a fact table?
    Stores business measures like sales_amount.
  7. What is a dimension table?
    Stores descriptive attributes like customer, product.
  8. What are audit columns?
    load_date, batch_id, source_system, checksum.
  9. What is data reconciliation?
    Comparing source and target data for accuracy.
  10. What is full load?
    Complete reload of target data.
  11. What is incremental load?
    Loads only changed/new data.
  12. What is primary key testing?
    Ensuring uniqueness and non-null values.
  13. What is reject data?
    Invalid records captured separately.
  14. What is data profiling?
    Analyzing data patterns before ETL.
  15. What is null validation?
    Ensuring null handling follows business rules.

🔹 Intermediate ETL QA Questions (16–35)

  1. Explain SCD Type 1.
    Overwrites old dimension data (no history).
  2. Explain SCD Type 2.
    Maintains history using effective dates and flags.
  3. How do you test SCD2?
    Validate new row insertion, old row expiry, current_flag.
  4. What is surrogate key?
    System-generated unique key.
  5. How do you validate surrogate keys?
    Check uniqueness and non-null values.
  6. What is CDC?
    Change Data Capture to process deltas only.
  7. How do you test CDC logic?
    Compare before/after data snapshots.
  8. What is referential integrity testing?
    Ensure fact FK exists in dimension PK.
  9. What is aggregation testing?
    Validating SUM, COUNT, AVG calculations.
  10. What is lookup testing?
    Validate reference data mapping.
  11. What is deduplication?
    Removing duplicate business keys.
  12. How do you test dedup logic?
    Using window functions or group by.
  13. What is late-arriving dimension?
    Fact arrives before dimension.
  14. What is data skew?
    Uneven data distribution affecting performance.
  15. What is restartability testing?
    Ensuring job resumes after failure.
  16. What is metadata testing?
    Validating column types, lengths.
  17. What is data lineage?
    Tracking data from source to report.
  18. What is threshold testing?
    Job fails if rejects exceed limit.
  19. What is hash key?
    Used to detect data changes efficiently.
  20. What is SLA testing?
    Ensuring ETL jobs meet time limits.

🔹 Advanced / Scenario-Based Questions (36–60)

  1. How do you handle record count mismatch?
    Check filters, joins, rejects, CDC logic.
  2. How do you validate null handling?
    Check defaults, rejects, or allowed nulls.
  3. How do you test performance tuning?
    Partitioning, indexing, parallelism.
  4. How do you test incremental loads?
    Validate watermark logic.
  5. How do you test multi-source joins?
    Validate join keys and cardinality.
  6. How do you test aggregation failures?
    Recalculate totals using SQL.
  7. How do you test timezone conversions?
    Validate timestamps across zones.
  8. How do you test re-runs?
    Ensure no duplicate data.
  9. How do you test historical data loads?
    Backdated data validation.
  10. How do you validate audit tables?
    Compare source_count vs target_count.
  11. What causes performance bottlenecks?
    Large joins, data skew, missing indexes.
  12. How do you test file-based ETL?
    Header/footer, delimiter, encoding.
  13. How do you test schema changes?
    Backward compatibility checks.
  14. How do you validate reporting layer?
    BI totals vs DW aggregates.
  15. Explain a critical ETL defect you found.
    (Data loss, wrong aggregation, SCD failure)

5. Real SQL Query Examples for ETL Validation

Sample Tables

src_orders(order_id, cust_id, amount, order_date)
fact_sales(order_sk, cust_sk, sales_amt, order_date, batch_id)

JOIN Validation

SELECT COUNT(*)

FROM src_orders s

LEFT JOIN fact_sales f

ON s.order_id = f.order_sk

WHERE f.order_sk IS NULL;

GROUP BY Aggregation

SELECT order_date, SUM(amount)

FROM src_orders

GROUP BY order_date;

SELECT order_date, SUM(sales_amt)

FROM fact_sales

GROUP BY order_date;

Window Function – Deduplication

SELECT *

FROM (

  SELECT *, ROW_NUMBER() OVER

  (PARTITION BY order_id ORDER BY load_date DESC) rn

  FROM stage_orders

) t

WHERE rn = 1;

Performance Validation

EXPLAIN ANALYZE

SELECT cust_sk, SUM(sales_amt)

FROM fact_sales

GROUP BY cust_sk;


6. Scenario-Based ETL Testing Use Cases

ScenarioValidation
Record mismatchSource vs target count
Null valuesDefault or reject
Duplicate dataWindow functions
Late dataSCD2 backdated insert
Slow jobPartition & indexing

7. ETL Tools Commonly Asked at Accenture

  • Informatica
  • Microsoft SSIS
  • Ab Initio
  • Pentaho
  • Talend

Accenture focuses more on ETL concepts than tool syntax.


8. ETL Defect Examples + Sample Test Case

Defect: SCD2 record not expiring

  • Expected: old row current_flag = ‘N’
  • Actual: two active records
  • Severity: High

Test Case:

  • Update dimension attribute
  • Validate historical and current rows

9. ETL Testing Quick Revision Sheet

  • ETL flow & architecture
  • S2T mapping validation
  • SCD1 vs SCD2
  • SQL joins, group by, window functions
  • Performance & reconciliation

10. FAQs – Featured Snippet Ready

Q1. Does Accenture ask SQL in ETL interviews?
Yes, SQL validation is mandatory.

Q2. Is Informatica mandatory?
No, ETL concepts matter more than tools.

Q3. How many ETL rounds at Accenture?
Typically 1–2 technical rounds.

Leave a Comment

Your email address will not be published. Required fields are marked *