Ab Initio ETL Testing Interview Questions – Real-Time QA Guide (40–150 Q&A)

1) What is ETL Testing? (Definition + Example)

ETL Testing validates Extract → Transform → Load processes to ensure data is accurately moved from source systems into a data warehouse (DW), with business rules and mappings correctly applied and loads meeting performance targets.

Example:
A banking DW loads daily transactions from OLTP tables into a fact table. ETL testing verifies:

Row counts match (after filters)
Transformations (currency conversion, de-duplication)
Slowly Changing Dimensions (SCD1/SCD2)
Audit fields (batch_id, load_ts)
Performance within SLA
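A minimal Python sketch of two of these checks: recomputing a currency conversion independently of the ETL, and confirming that audit fields are populated. The rate table `FX_RATES`, the column names `txn_id`, `amount`, `currency` and `amount_usd`, and rounding to cents are hypothetical assumptions. A real test would read both sides from the source and target systems.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical conversion rates to USD (illustration only)
FX_RATES = {"USD": Decimal("1.00"), "EUR": Decimal("1.10")}


def expected_usd(amount: Decimal, currency: str) -> Decimal:
    """Recompute the converted amount independently of the ETL logic."""
    return (amount * FX_RATES[currency]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def check_rows(source_rows, target_rows):
    """Return defect messages; an empty list means every check passed."""
    defects = []
    target_by_id = {r["txn_id"]: r for r in target_rows}
    for src in source_rows:
        tgt = target_by_id.get(src["txn_id"])
        if tgt is None:
            defects.append(f"{src['txn_id']}: missing in target")
            continue
        exp = expected_usd(src["amount"], src["currency"])
        if tgt["amount_usd"] != exp:
            defects.append(f"{src['txn_id']}: amount_usd {tgt['amount_usd']} != expected {exp}")
        # Audit fields must be populated on every loaded row
        for col in ("batch_id", "load_ts"):
            if tgt.get(col) is None:
                defects.append(f"{src['txn_id']}: audit field {col} is null")
    return defects


source = [
    {"txn_id": 1, "amount": Decimal("100.00"), "currency": "USD"},
    {"txn_id": 2, "amount": Decimal("50.00"), "currency": "EUR"},
]
target = [
    {"txn_id": 1, "amount_usd": Decimal("100.00"), "batch_id": 7, "load_ts": "2024-01-01 02:00"},
    {"txn_id": 2, "amount_usd": Decimal("55.00"), "batch_id": 7, "load_ts": None},
]
print(check_rows(source, target))  # ['2: audit field load_ts is null']
```

`Decimal` is used instead of float so that the recomputed amount and the loaded amount compare exactly.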

2) DW Flow – Source → Staging → Transform → Load → Reporting

Source: OLTP, files, APIs
Staging: Raw landing, minimal transforms
Transform: Business rules, lookups, aggregations
Load: Facts/dimensions with constraints
Reporting: BI tools consume curated data

Key validations: source-to-target (S2T), referential integrity, aggregates, reconciliation, restartability.

3) Ab Initio ETL Architecture (Tester’s View)

Graphs built in the GDE (Graphical Development Environment) orchestrate data flows
Transforms: Join, Rollup, Normalize, Denormalize
Metadata: Record formats, layouts
Control/Audit: Row counts, checksums, reject handling
Parallelism: Partitioning for performance
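The following Python sketch makes partitioning and skew concrete by hash-partitioning rows by key and computing a simple skew ratio. It is a conceptual illustration, not Ab Initio code. The helper names are hypothetical, and `zlib.crc32` is used because Python's built-in `hash()` is randomized per process for strings.

```python
import zlib


def partition_by_key(rows, key, n_partitions):
    """Send each row to a partition chosen by a stable hash of its key."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % n_partitions
        parts[idx].append(row)
    return parts


def skew_ratio(parts):
    """Largest partition size divided by the average size (1.0 means perfectly even)."""
    sizes = [len(p) for p in parts]
    avg = sum(sizes) / len(sizes)
    return max(sizes) / avg if avg else 0.0


# 90 of 100 rows share one customer, so one partition gets at least 90 rows
rows = [{"cust_id": "C1"}] * 90 + [{"cust_id": f"C{i}"} for i in range(2, 12)]
parts = partition_by_key(rows, "cust_id", 4)
print(f"skew ratio: {skew_ratio(parts):.2f}")  # at least 3.60 here
```

Every row for a key lands in the same partition. As a result, one dominant key (here C1) makes that partition the bottleneck no matter how many partitions are added.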

4) Interview Questions + Best Answers (Basic → Advanced)

Basics (1–15)

What is Ab Initio?
A high-performance ETL tool using parallel graphs for large-scale data processing.
What is ETL testing?
Validation of data extraction, transformation logic, and loading accuracy.
Difference between ETL and ELT?
ETL transforms data before loading it; ELT loads raw data first and transforms it inside the DW.
What is S2T mapping?
A source-to-target mapping document that maps each source field to its target field, along with the transformation rules.
What is staging?
Temporary area for raw data before transformations.
What is a fact table?
Stores measurable metrics (e.g., sales_amount).
What is a dimension?
Descriptive attributes (customer, product).
What are audit fields?
Control columns such as batch_id, load_date, record_source, and checksum that record how and when each row was loaded.
What is reject handling?
Capturing invalid rows with reasons.
What is data reconciliation?
Matching counts/totals between source and target.
What is a lookup?
Reference data used to enrich records.
What is normalization?
Splitting repeating groups into rows.
What is denormalization?
The inverse of normalization: combining multiple rows (or tables) into fewer, wider records, often for query performance.
What is partitioning?
Dividing data for parallel processing.
What is checksum/hashing?
Computing a checksum/hash over a row's columns so that changed rows can be detected by comparing one value instead of every column.
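Hash-based change detection can be sketched with Python's `hashlib`. The tracked column list, the `|` delimiter, and the explicit `<NULL>` marker are assumptions. Without a delimiter, ('ab', 'c') and ('a', 'bc') would hash the same; without a NULL marker, NULL and an empty string would also hash the same. MD5 is acceptable here because the goal is change detection, not security.

```python
import hashlib


def row_hash(row, columns, null_marker="<NULL>"):
    """MD5 hex digest of the tracked columns joined with '|' and an explicit NULL marker."""
    parts = [null_marker if row.get(c) is None else str(row[c]) for c in columns]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


TRACKED = ["name", "city", "segment"]  # hypothetical SCD2-tracked attributes

old = {"cust_id": 1, "name": "Asha", "city": "Pune", "segment": None}
new = {"cust_id": 1, "name": "Asha", "city": "Mumbai", "segment": None}

print(row_hash(old, TRACKED) != row_hash(new, TRACKED))  # True: city changed
```

When validating hash logic, the SQL recomputation must apply the same column order, delimiter, NULL handling, and value formatting. Otherwise every row will look changed.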

Intermediate (16–35)

Explain SCD Type 1 vs Type 2.
SCD1 overwrites; SCD2 preserves history with effective dates.
How do you validate SCD2?
Check that a new row is inserted, the old row is expired (end date set), and current_flag is correct on both rows.
How to validate joins in Ab Initio?
Verify join keys, cardinality, null handling.
What is rollup?
Aggregates rows (SUM, COUNT) by keys.
How to test aggregations?
Recompute aggregates via SQL and compare.
What is CDC?
Change Data Capture: only changed rows (inserts, updates, deletes) are processed.
How do you test CDC?
Compare before/after images and reconcile counts by operation type (insert/update/delete).
What is surrogate key?
System-generated unique key for dimensions.
How to validate surrogate keys?
Uniqueness, non-null, sequence integrity.
What is late-arriving dimension?
A fact arrives before its dimension row; handle it with a placeholder dimension row that is updated when the real dimension data arrives.
How to test null handling?
Validate defaults, rejects, or pass-through rules.
What is data skew?
Uneven distribution causing performance issues.
How to handle data skew?
Re-partition, salting, skew hints.
What is restartability?
ETL resumes from last successful checkpoint.
How to test restartability?
Fail the job mid-run, restart it, and validate idempotency (no duplicate or missing rows).
What is metadata testing?
Validate record formats, data types, lengths.
How do you test file formats?
Validate the delimiter, header/footer records, and character encoding.
What is referential integrity?
Fact foreign keys exist in dimensions.
How to validate RI?
Anti-join facts against dimensions; every fact row without a matching dimension key is an orphan.
What is SLA testing?
Ensure load completes within time limits.
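The RI and surrogate-key checks above can be run as plain SQL from a small Python harness. This sketch uses an in-memory SQLite database with simplified, hypothetical `dim_customer` and `fact_sales` tables. The tables are seeded with one orphan fact, one duplicated surrogate key, and one NULL surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (cust_sk INTEGER, cust_id TEXT);
CREATE TABLE fact_sales (order_sk INTEGER, cust_sk INTEGER, sales_amt REAL);
INSERT INTO dim_customer VALUES (1, 'C1'), (2, 'C2'), (2, 'C3'), (NULL, 'C4');
INSERT INTO fact_sales VALUES (10, 1, 100.0), (11, 2, 50.0), (12, 99, 75.0);
""")

# RI: anti-join facts against the dimension; each returned row is an orphan fact
orphans = conn.execute("""
    SELECT f.order_sk, f.cust_sk
    FROM fact_sales f
    LEFT JOIN dim_customer d ON f.cust_sk = d.cust_sk
    WHERE d.cust_sk IS NULL
""").fetchall()

# Surrogate keys: must be unique and non-null
duplicate_sks = conn.execute("""
    SELECT cust_sk, COUNT(*)
    FROM dim_customer
    WHERE cust_sk IS NOT NULL
    GROUP BY cust_sk
    HAVING COUNT(*) > 1
""").fetchall()
null_sks = conn.execute(
    "SELECT COUNT(*) FROM dim_customer WHERE cust_sk IS NULL"
).fetchone()[0]

print(orphans, duplicate_sks, null_sks)  # [(12, 99)] [(2, 2)] 1
```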

Advanced (36–50)

How do you test performance tuning?
Review partition strategy and parallelism, and compare run times and I/O metrics before and after tuning.
What is window function usage in ETL?
Running totals, dedup with ROW_NUMBER().
How to test deduplication?
Confirm the business key is unique after the transform.
What is hashing in SCD2?
Detect attribute change using hash diff.
How to validate hash logic?
Recompute hash in SQL and compare.
What is data lineage?
Trace data from source to report.
How to test lineage?
Verify S2T and report calculations.
What is bad record threshold?
Max allowed rejects before job fails.
How to test thresholds?
Inject bad rows just below and just above the threshold, and confirm the job continues or fails accordingly.
What is incremental vs full load?
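An incremental load processes only the delta (new or changed rows); a full load reloads all data.
How to test incremental loads?
Reconcile counts of source rows after the date watermark against the rows loaded in the run.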
How do you validate timestamps/time zones?
Check time-zone conversions and daylight saving time (DST) transitions.
What is multi-file dependency?
Jobs relying on multiple inputs.
How to test dependencies?
Use control tables and arrival checks to confirm all inputs are present before the job runs.
What is reconciliation at report level?
BI totals match DW aggregates.
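The following is a minimal sketch of an incremental-load check. It assumes a hypothetical watermark (the last successfully loaded order_dt, typically kept in a control table) and assumes the delta picks up rows strictly after it. Whether the boundary is `>` or `>=` is a design choice to confirm from the spec.

```python
from datetime import date


def expected_delta(source_rows, watermark):
    """Source order_ids an incremental run should pick up (strictly after the watermark)."""
    return {r["order_id"] for r in source_rows if r["order_dt"] > watermark}


def check_incremental(source_rows, loaded_ids, watermark):
    """Return (missing, reprocessed) sets of order_ids for one delta run."""
    exp = expected_delta(source_rows, watermark)
    missing = exp - loaded_ids      # deltas the run failed to load
    reprocessed = loaded_ids - exp  # rows at or before the watermark that should not be reloaded
    return missing, reprocessed


source = [
    {"order_id": 1, "order_dt": date(2024, 1, 1)},
    {"order_id": 2, "order_dt": date(2024, 1, 2)},
    {"order_id": 3, "order_dt": date(2024, 1, 3)},
]
watermark = date(2024, 1, 1)   # last successful load (sample value)
loaded_this_batch = {1, 2}     # order_ids loaded by the delta run (sample data)
print(check_incremental(source, loaded_this_batch, watermark))  # ({3}, {1})
```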

5) Real SQL Query Examples (Validation)

Sample Data

Source Orders (src_orders)
(order_id, cust_id, amount, order_dt)

Target Fact (fact_sales)
(order_sk, cust_sk, sales_amt, order_dt, batch_id)

JOIN Validation

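-- Source orders with no matching fact row (expected: 0)
-- Assumes order_sk holds the source order_id; if order_sk is a generated
-- surrogate key, resolve order_id through the key lookup first
SELECT COUNT(*) AS missing_in_target
FROM src_orders s
LEFT JOIN fact_sales f
  ON s.order_id = f.order_sk
WHERE f.order_sk IS NULL;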
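The JOIN validation can be tried without a warehouse. The sketch below runs it against an in-memory SQLite database built from the sample schema, with one order deliberately missing from the target. Like the query, it assumes order_sk holds the source order_id. It returns the missing ids rather than a count, which makes the defect easier to report, and it also checks the reverse direction (target rows with no source).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id TEXT, amount REAL, order_dt TEXT);
CREATE TABLE fact_sales (order_sk INTEGER, cust_sk INTEGER, sales_amt REAL,
                         order_dt TEXT, batch_id INTEGER);
INSERT INTO src_orders VALUES
  (1, 'C1', 100.0, '2024-01-01'),
  (2, 'C2',  50.0, '2024-01-01'),
  (3, 'C1',  75.0, '2024-01-02');
-- Order 3 is deliberately missing from the target (simulated defect)
INSERT INTO fact_sales VALUES
  (1, 1, 100.0, '2024-01-01', 7),
  (2, 2,  50.0, '2024-01-01', 7);
""")

# Source rows with no target row
missing = conn.execute("""
    SELECT s.order_id
    FROM src_orders s
    LEFT JOIN fact_sales f ON s.order_id = f.order_sk
    WHERE f.order_sk IS NULL
""").fetchall()

# Reverse direction: target rows with no source row
extra = conn.execute("""
    SELECT f.order_sk
    FROM fact_sales f
    LEFT JOIN src_orders s ON s.order_id = f.order_sk
    WHERE s.order_id IS NULL
""").fetchall()

print(missing, extra)  # [(3,)] []
```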

GROUP BY Aggregation

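-- Run both queries and compare src_sum with tgt_sum for each order_dt;
-- totals should match once filtered and rejected rows are accounted for
SELECT order_dt, SUM(amount) AS src_sum
FROM src_orders
GROUP BY order_dt;

SELECT order_dt, SUM(sales_amt) AS tgt_sum
FROM fact_sales
GROUP BY order_dt;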
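The two aggregate queries can be compared automatically. This Python sketch runs both against an in-memory SQLite copy of the sample tables, with one day's total deliberately wrong. It reports any order_dt whose totals differ or that exists on only one side. The half-cent tolerance is an assumption to absorb floating-point noise; with DECIMAL columns, an exact comparison is preferable.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id TEXT, amount REAL, order_dt TEXT);
CREATE TABLE fact_sales (order_sk INTEGER, cust_sk INTEGER, sales_amt REAL,
                         order_dt TEXT, batch_id INTEGER);
INSERT INTO src_orders VALUES (1, 'C1', 100.0, '2024-01-01'), (2, 'C2', 50.0, '2024-01-01'),
                              (3, 'C1', 75.0, '2024-01-02');
-- Order 3 was loaded with the wrong amount (simulated defect)
INSERT INTO fact_sales VALUES (1, 1, 100.0, '2024-01-01', 7), (2, 2, 50.0, '2024-01-01', 7),
                              (3, 1, 70.0, '2024-01-02', 7);
""")

# Each query yields (order_dt, total) pairs, which dict() turns into a lookup
src = dict(conn.execute("SELECT order_dt, SUM(amount) FROM src_orders GROUP BY order_dt"))
tgt = dict(conn.execute("SELECT order_dt, SUM(sales_amt) FROM fact_sales GROUP BY order_dt"))

# A date is a mismatch if it is missing on one side or the totals differ
mismatches = {
    dt: (src.get(dt), tgt.get(dt))
    for dt in src.keys() | tgt.keys()
    if src.get(dt) is None or tgt.get(dt) is None
    or not math.isclose(src[dt], tgt[dt], abs_tol=0.005)
}
print(mismatches)  # {'2024-01-02': (75.0, 70.0)}
```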

Window Function (De-dup)

-- Keep only the latest staged row per order_id
SELECT *
FROM (
  SELECT s.*,
         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY load_ts DESC) AS rn
  FROM stage_orders s
) t
WHERE rn = 1;
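The de-dup query can be exercised the same way. This sketch assumes a hypothetical stage_orders layout (order_id, amount, load_ts) with load_ts stored as sortable ISO text. It requires SQLite 3.25 or later for window functions, which recent Python releases bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stage_orders (order_id INTEGER, amount REAL, load_ts TEXT);
INSERT INTO stage_orders VALUES
  (1, 100.0, '2024-01-01 01:00'),
  (1, 120.0, '2024-01-01 03:00'),   -- later correction of order 1
  (2,  50.0, '2024-01-01 02:00');
""")

# Same ROW_NUMBER() pattern: rank rows per order_id, newest first, keep rank 1
deduped = conn.execute("""
    SELECT order_id, amount, load_ts
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY load_ts DESC) AS rn
        FROM stage_orders s
    ) t
    WHERE rn = 1
    ORDER BY order_id
""").fetchall()

# Post-dedup check: the business key must now be unique
assert len({row[0] for row in deduped}) == len(deduped)
print(deduped)  # [(1, 120.0, '2024-01-01 03:00'), (2, 50.0, '2024-01-01 02:00')]
```

If two staged rows can share the same load_ts, the kept row is nondeterministic. Adding a tiebreaker column to the ORDER BY makes the result repeatable.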

Performance Tuning Check

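-- PostgreSQL syntax: EXPLAIN ANALYZE actually runs the query and reports
-- the plan with real row counts and timings
EXPLAIN ANALYZE
SELECT cust_sk, SUM(sales_amt) AS total_sales
FROM fact_sales
GROUP BY cust_sk;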

6) Scenario-Based ETL Testing Questions

Mismatch counts: Validate filters, joins, rejects
Nulls in mandatory fields: Defaults vs rejects
Duplicate records: Window functions, business keys
Late-arriving data: Backdated updates and effective dates
Slow job: Partitioning, indexes, parallelism
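For the mismatch-count scenario, a useful habit is to require every source row to be accounted for as loaded, rejected, or filtered. The sketch below assumes per-batch counts are available, for example from a control/audit table. The `BatchCounts` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BatchCounts:
    """Row counts for one batch (hypothetical structure, e.g. read from a control table)."""
    batch_id: int
    source: int    # rows extracted from the source
    filtered: int  # rows intentionally dropped by business filters
    rejected: int  # rows sent to the reject file/table
    loaded: int    # rows written to the target

    def unexplained(self) -> int:
        # Every source row must end up loaded, rejected, or filtered
        return self.source - (self.loaded + self.rejected + self.filtered)


batches = [
    BatchCounts(7, source=1000, filtered=20, rejected=5, loaded=975),
    BatchCounts(8, source=1200, filtered=30, rejected=0, loaded=1150),
]
for b in batches:
    gap = b.unexplained()
    print(b.batch_id, "OK" if gap == 0 else f"MISMATCH: {gap} rows unaccounted for")
```

A positive gap points to silently dropped rows, for example an inner join where a left join was intended. A negative gap usually means duplicates from a one-to-many join.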

7) Tools Commonly Used by ETL QA

Informatica
Microsoft SSIS
Ab Initio
Pentaho
Talend

(The validation approach is the same regardless of the tool.)

8) ETL Defect Examples + Test Cases

Defect: SCD2 not expiring old row

Expected: current_flag=N, end_date populated
Actual: old row still current
Severity: High

Test Case:

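Change a tracked attribute (e.g., customer city) of an existing dimension row in the source and run the load
Validate that two rows exist for the business key: the old row expired (current_flag=N, end_date populated) and the new row current (current_flag=Y)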
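The test case above can be automated with a small validator. This is a sketch under stated assumptions. Each row has hypothetical `sk`, `current_flag` (Y/N), `start_date` and `end_date` fields. Open-ended rows use NULL or 9999-12-31. End dates are inclusive, so each version must end before the next one starts.

```python
from datetime import date

OPEN_END = {None, date(9999, 12, 31)}  # assumed conventions for "no end date"


def validate_scd2(versions):
    """Check all dimension rows for ONE business key; return a list of defects."""
    defects = []
    current = [v for v in versions if v["current_flag"] == "Y"]
    if len(current) != 1:
        defects.append(f"expected 1 current row, found {len(current)}")
    for v in versions:
        if v["current_flag"] == "N" and v["end_date"] in OPEN_END:
            defects.append(f"sk {v['sk']}: expired row has no end_date")
        if v["current_flag"] == "Y" and v["end_date"] not in OPEN_END:
            defects.append(f"sk {v['sk']}: current row has an end_date")
    # Assumes inclusive end dates: each version must end before the next one starts
    ordered = sorted(versions, key=lambda v: v["start_date"])
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev["end_date"] in OPEN_END or prev["end_date"] >= nxt["start_date"]:
            defects.append(f"sk {prev['sk']} overlaps sk {nxt['sk']}")
    return defects


# The defect from the example: after the change, the old row is still current
buggy = [
    {"sk": 101, "current_flag": "Y", "start_date": date(2023, 1, 1), "end_date": None},
    {"sk": 102, "current_flag": "Y", "start_date": date(2024, 3, 1), "end_date": None},
]
fixed = [
    {"sk": 101, "current_flag": "N", "start_date": date(2023, 1, 1), "end_date": date(2024, 2, 29)},
    {"sk": 102, "current_flag": "Y", "start_date": date(2024, 3, 1), "end_date": None},
]
print(validate_scd2(buggy))  # ['expected 1 current row, found 2', 'sk 101 overlaps sk 102']
print(validate_scd2(fixed))  # []
```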

9) Quick Revision Sheet (Cheat Notes)

S2T, SCD1/SCD2, CDC
Counts, aggregates, RI
Window functions for dedup
Hashing for change detection
Performance: partition, parallelism

10) FAQs (Snippet-Ready)

Q: Is SQL mandatory for Ab Initio ETL testing?
Yes. SQL is essential for validation and reconciliation.

Q: How many rounds focus on ETL QA?
Usually 1–2 deep technical rounds.

Q: Can Informatica experience help?
Absolutely. The concepts are transferable across ETL tools.
