1. Introduction
SQL query questions are a core part of interviews for ETL QA, Data Warehouse Testing, BI Testing, and Data Validation roles. Unlike UI or API testing, ETL testing is data-centric, and SQL is the primary tool for validating correctness, completeness, and performance.
Interviewers expect candidates to:
- Understand ETL architecture
- Validate source-to-target (S2T) mappings
- Write complex SQL queries
- Handle real-world data mismatches
- Identify ETL defects before they hit reports
This article is an interview-oriented, SQL-focused guide for freshers, mid-level testers, and experienced data QA professionals.
2. What is ETL Testing? (Definition + Example)
ETL Testing validates the process of Extracting data from source systems, Transforming it using business rules, and Loading it into a target data warehouse or data mart.
Simple Example
- Source: Sales table from OLTP system
- Transform: Remove duplicates, convert currency, calculate total sales
- Load: Fact_Sales table in data warehouse
ETL testing ensures:
- No missing or duplicate records
- Transformations are correct
- Reports show accurate data
3. Typical ETL Architecture
- Source Systems – OLTP databases, flat files, APIs
- Staging Area – Raw extracted data
- Transformation Layer – Business rules applied
- Target (DW / Data Mart) – Fact & Dimension tables
- Reporting Layer – BI tools, dashboards
4. ETL Testing Interview Questions on SQL Queries (Basic → Advanced)
Basic ETL & SQL Interview Questions
Q1. What is ETL testing?
ETL testing verifies that data is correctly extracted, transformed, and loaded from source systems to target systems.
Q2. Why is SQL important in ETL testing?
SQL is used to validate record counts, data accuracy, transformations, aggregations, and performance.
Q3. What is a data warehouse?
A centralized repository that stores historical, integrated data for analysis and reporting.
Q4. What is a staging table?
A temporary table used to store raw extracted data before transformation.
Source-to-Target (S2T) Mapping Questions
Q5. What is S2T mapping?
A document that defines how source columns map to target columns along with transformation rules.
Q6. How do you validate S2T mapping using SQL?
By writing SQL queries that compare source values vs transformed target values.
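One way to turn a mapping rule into a check is to re-apply the rule to the source in SQL and compare the result with what was loaded. The sketch below runs the SQL through Python's built-in sqlite3 module against an in-memory database so it can be tried anywhere. The tables, columns, and the rule "full_name = UPPER(first_name || ' ' || last_name)" are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (customer_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE tgt_dim_customer (customer_id INTEGER, full_name TEXT);

    INSERT INTO src_customer VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    -- Row 2 was loaded without the upper-casing rule applied (a seeded defect).
    INSERT INTO tgt_dim_customer VALUES (1, 'ADA LOVELACE'), (2, 'Alan Turing');
""")

# Re-apply the (hypothetical) mapping rule to the source and
# report every row where the target value disagrees with it.
s2t_mismatches = conn.execute("""
    SELECT s.customer_id,
           UPPER(s.first_name || ' ' || s.last_name) AS expected_full_name,
           t.full_name                               AS actual_full_name
    FROM src_customer s
    JOIN tgt_dim_customer t ON s.customer_id = t.customer_id
    WHERE UPPER(s.first_name || ' ' || s.last_name) <> t.full_name
""").fetchall()

print(s2t_mismatches)
```

An empty result would mean the target agrees with the mapping document for every matched row; here the seeded defect on customer 2 is reported.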
SQL JOIN-Based Interview Questions
Q7. Why are JOINs important in ETL testing?
JOINs line up source and target rows on a shared key so that their values can be compared and missing or orphaned rows can be found.
Example – Data Validation Using JOIN
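-- Rows whose loaded amount differs from the source amount.
-- Note: <> is never true when either side is NULL, so NULL mismatches need a separate check.
SELECT s.order_id, s.amount AS src_amt, t.amount AS tgt_amt
FROM src_orders s
JOIN tgt_fact_orders t
ON s.order_id = t.order_id
WHERE s.amount <> t.amount;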
Q8. Which JOIN is most commonly used in ETL validation?
INNER JOIN (to compare values on matched rows) and LEFT JOIN (to find source rows missing from the target).
GROUP BY & Aggregation Questions
Q9. Why is GROUP BY important in ETL testing?
It validates aggregated metrics such as total sales, revenue, and counts.
SELECT region, SUM(sales_amount) AS total_sales
FROM tgt_fact_sales
GROUP BY region;
Q10. How do you validate aggregated data?
By comparing aggregated results between source and target.
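A minimal sketch of such a comparison, again using Python's sqlite3 with an in-memory database. The source table name src_sales and the sample rows are assumptions made for the example; tgt_fact_sales, region, and sales_amount follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_sales (region TEXT, sales_amount INTEGER);
    CREATE TABLE tgt_fact_sales (region TEXT, sales_amount INTEGER);

    INSERT INTO src_sales VALUES ('EAST', 100), ('EAST', 50), ('WEST', 200);
    -- The WEST row was loaded with the wrong amount (a seeded defect).
    INSERT INTO tgt_fact_sales VALUES ('EAST', 100), ('EAST', 50), ('WEST', 180);
""")

# Aggregate each side separately, then keep only regions whose totals differ
# or that are missing from the target altogether.
agg_diff = conn.execute("""
    WITH src AS (SELECT region, SUM(sales_amount) AS total FROM src_sales GROUP BY region),
         tgt AS (SELECT region, SUM(sales_amount) AS total FROM tgt_fact_sales GROUP BY region)
    SELECT src.region, src.total AS src_total, tgt.total AS tgt_total
    FROM src
    LEFT JOIN tgt ON src.region = tgt.region
    WHERE tgt.total IS NULL OR src.total <> tgt.total
    ORDER BY src.region
""").fetchall()

print(agg_diff)
```

Because the join starts from the source side, this version does not report regions that exist only in the target; a second query with the sides swapped would cover that case.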
Window Function Interview Questions
Q11. What are window functions used for in ETL testing?
To calculate running totals, rankings, and per-partition aggregates without collapsing rows.
SELECT customer_id,
SUM(amount) OVER (PARTITION BY customer_id) AS total_spend
FROM tgt_fact_sales;
Q12. Difference between GROUP BY and window functions?
| GROUP BY | Window Function |
| Aggregates rows | Retains row details |
| Reduces output rows | Same row count |
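The row-count difference in the table can be seen by running both styles over the same data. This is a small sketch with made-up rows, assuming the SQLite library bundled with Python is version 3.25 or later (the first release with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt_fact_sales (customer_id INTEGER, amount INTEGER);
    INSERT INTO tgt_fact_sales VALUES (1, 10), (1, 20), (2, 5);
""")

# GROUP BY collapses the three input rows into one row per customer.
grouped = conn.execute("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM tgt_fact_sales
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

# The window function keeps all three rows and repeats the customer total on each.
windowed = conn.execute("""
    SELECT customer_id, amount,
           SUM(amount) OVER (PARTITION BY customer_id) AS total_spend
    FROM tgt_fact_sales
    ORDER BY customer_id, amount
""").fetchall()

print(len(grouped), len(windowed))
```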
5. Slowly Changing Dimension (SCD) Questions with SQL
Q13. What is SCD Type 1?
Overwrites old data without maintaining history.
Q14. What is SCD Type 2?
Maintains historical records using effective dates and active flags.
SCD Type 2 Validation Query
SELECT customer_id, start_date, end_date, is_active
FROM dim_customer
WHERE customer_id = 101;
Q15. How do you test SCD2 logic?
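Verify that:
- The old record is expired (end date set, active flag turned off)
- A new record is inserted with the changed values
- Only one active record exists per business key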
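These checks can be written as queries that return rows only when the rule is broken. The sketch below is one possible version using Python's sqlite3; it assumes is_active is stored as 1/0 and that open-ended rows carry a NULL end_date (some warehouses use a far-future date such as 9999-12-31 instead). The city column and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_id INTEGER, city TEXT,
        start_date TEXT, end_date TEXT, is_active INTEGER
    );
    -- Customer 101: history handled correctly (old row expired, new row active).
    INSERT INTO dim_customer VALUES (101, 'Pune',   '2024-01-01', '2024-06-30', 0);
    INSERT INTO dim_customer VALUES (101, 'Mumbai', '2024-07-01', NULL,         1);
    -- Customer 102: the old row was never expired (a seeded defect).
    INSERT INTO dim_customer VALUES (102, 'Delhi',   '2024-01-01', NULL, 1);
    INSERT INTO dim_customer VALUES (102, 'Chennai', '2024-05-01', NULL, 1);
""")

# Check 1: every customer must have exactly one active row.
bad_active_counts = conn.execute("""
    SELECT customer_id, SUM(is_active) AS active_rows
    FROM dim_customer
    GROUP BY customer_id
    HAVING SUM(is_active) <> 1
""").fetchall()

# Check 2: expired rows must carry an end_date; active rows must not.
bad_date_flags = conn.execute("""
    SELECT customer_id, start_date
    FROM dim_customer
    WHERE (is_active = 0 AND end_date IS NULL)
       OR (is_active = 1 AND end_date IS NOT NULL)
""").fetchall()

print(bad_active_counts, bad_date_flags)
```

Customer 102 fails the first check because two rows are flagged active; the second check passes for the sample data.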
6. Record Count & Data Completeness SQL Examples
Record Count Validation
SELECT COUNT(*) FROM src_customer;
SELECT COUNT(*) FROM tgt_dim_customer;
Missing Records Validation
SELECT s.customer_id
FROM src_customer s
LEFT JOIN tgt_dim_customer t
ON s.customer_id = t.customer_id
WHERE t.customer_id IS NULL;
7. Null Handling & Default Value Scenarios
Q16. How do you test null handling in ETL?
By checking whether nulls are replaced with defaults or rejected.
SELECT * FROM tgt_dim_customer
WHERE email IS NULL;
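When the rule is "replace nulls with a default", the check can be tied back to the source rows that were actually null. The sketch below assumes a hypothetical default of 'UNKNOWN' for email and uses made-up rows; it runs on Python's built-in sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (customer_id INTEGER, email TEXT);
    CREATE TABLE tgt_dim_customer (customer_id INTEGER, email TEXT);
    INSERT INTO src_customer VALUES (1, 'a@example.com'), (2, NULL), (3, NULL);
    -- Customer 3 kept its NULL instead of receiving the default (a seeded defect).
    INSERT INTO tgt_dim_customer VALUES (1, 'a@example.com'), (2, 'UNKNOWN'), (3, NULL);
""")

# For every source row with a NULL email, the target must hold the default value.
default_violations = conn.execute("""
    SELECT s.customer_id, t.email
    FROM src_customer s
    JOIN tgt_dim_customer t ON s.customer_id = t.customer_id
    WHERE s.email IS NULL
      AND (t.email IS NULL OR t.email <> 'UNKNOWN')
""").fetchall()

print(default_violations)
```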
8. Performance Tuning Interview Questions (SQL Focus)
Q17. How do you identify slow ETL queries?
Using execution plans and query statistics. The exact command varies by database; the example below uses PostgreSQL-style EXPLAIN ANALYZE.
EXPLAIN ANALYZE
SELECT * FROM tgt_fact_sales
WHERE order_date >= '2025-01-01';
Q18. How do indexes help ETL performance?
Indexes on join and filter columns reduce scan time, at the cost of extra write overhead during loads.
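The effect of an index on a query plan can be observed directly. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in for a warehouse's own plan command; the index name idx_sales_order_date and the table layout are assumptions, and the plan wording differs between SQLite versions, so no exact output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tgt_fact_sales (order_id INTEGER, order_date TEXT, amount INTEGER)"
)

query = "SELECT * FROM tgt_fact_sales WHERE order_date >= '2025-01-01'"

def plan_text(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Without an index the only option is a scan of the whole table.
plan_before = plan_text(query)

# Hypothetical index on the filter column.
conn.execute("CREATE INDEX idx_sales_order_date ON tgt_fact_sales (order_date)")
plan_after = plan_text(query)

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two printed plans shows whether the planner switched from a table scan to an index search for the date filter.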
9. Scenario-Based ETL Testing Interview Questions
Scenario 1: Record Count Mismatch
Possible Causes:
- Filter condition issue
- Join mismatch
- Duplicate source records
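A first diagnostic step for a count mismatch is often to check whether duplicate keys in the source account for the gap. This is a rough sketch with made-up rows, assuming the load is expected to keep one row per customer_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (customer_id INTEGER, name TEXT);
    CREATE TABLE tgt_dim_customer (customer_id INTEGER, name TEXT);
    -- customer_id 2 appears twice in the source; the load kept one copy.
    INSERT INTO src_customer VALUES (1, 'Ann'), (2, 'Bob'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO tgt_dim_customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
""")

src_count = conn.execute("SELECT COUNT(*) FROM src_customer").fetchone()[0]
tgt_count = conn.execute("SELECT COUNT(*) FROM tgt_dim_customer").fetchone()[0]

# Keys that occur more than once in the source.
duplicate_keys = conn.execute("""
    SELECT customer_id, COUNT(*) AS copies
    FROM src_customer
    GROUP BY customer_id
    HAVING COUNT(*) > 1
""").fetchall()

# If the number of distinct source keys equals the target count,
# duplicates alone explain the difference.
distinct_src_keys = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM src_customer"
).fetchone()[0]

print(src_count, tgt_count, duplicate_keys, distinct_src_keys)
```

If the distinct count still differs from the target, the filter and join conditions are the next places to look.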
Scenario 2: Incorrect Aggregation in Reports
Testing Approach:
- Validate GROUP BY logic
- Recalculate metrics manually
- Compare source vs target totals
Scenario 3: ETL Job Takes Too Long
Solutions:
- Partition data
- Optimize SQL
- Use parallel processing
10. ETL Tools Asked in Interviews
Common ETL tools you should be aware of:
- Informatica
- Microsoft SSIS
- Ab Initio
- Talend
- Pentaho
Interview Tip: SQL knowledge is more important than tool syntax.
11. ETL Defect Examples + Test Case Samples
Common ETL Defects
| Defect Type | Example |
| Data loss | Missing rows |
| Transformation error | Wrong calculation |
| Duplicate data | Multiple records |
| Performance issue | SLA breach |
Sample ETL Test Case
| Field | Value |
| Test Case ID | ETL_TC_01 |
| Scenario | Validate SCD2 |
| Source | src_customer |
| Target | dim_customer |
| Expected | History preserved |
12. ETL Testing Interview Questions – Advanced SQL
Q19. What is hashing in ETL testing?
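Hashing computes a checksum over the columns of each row so that source and target rows can be compared by a single value instead of column by column, which makes large dataset comparisons efficient.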
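The idea can be sketched in a few lines. The example below hashes each row in Python with hashlib because SQLite has no built-in hash function; in a real warehouse the hash would usually be computed by the database itself. The tables, columns, the "|" delimiter, and the "<NULL>" sentinel are all assumptions made for illustration.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (customer_id INTEGER, name TEXT, city TEXT);
    CREATE TABLE tgt_dim_customer (customer_id INTEGER, name TEXT, city TEXT);
    INSERT INTO src_customer VALUES (1, 'Ann', 'Pune'), (2, 'Bob', 'Delhi');
    -- Customer 2's city was altered during the load (a seeded defect).
    INSERT INTO tgt_dim_customer VALUES (1, 'Ann', 'Pune'), (2, 'Bob', 'Dehli');
""")

def row_hashes(table):
    """Map customer_id -> MD5 of the remaining columns joined with a delimiter."""
    hashes = {}
    for customer_id, *rest in conn.execute(
        f"SELECT customer_id, name, city FROM {table}"
    ):
        # NULLs become a sentinel so that NULL and '' hash differently.
        payload = "|".join("<NULL>" if v is None else str(v) for v in rest)
        hashes[customer_id] = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return hashes

src_hashes = row_hashes("src_customer")
tgt_hashes = row_hashes("tgt_dim_customer")

# Keys whose row hash differs, or that are missing from the target.
changed_keys = sorted(k for k in src_hashes if tgt_hashes.get(k) != src_hashes[k])

print(changed_keys)
```

Only the keys flagged here need a column-by-column comparison, which is where the efficiency gain comes from.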
Q20. What are audit fields?
Fields such as created_date, updated_date, and batch_id that record when, and by which load, a row was written. They are used for traceability.
Q21. How do you test incremental loads?
By validating delta records using last_updated_date or watermark columns.
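A delta check can be sketched as "every source row changed after the watermark must appear in the target with the same timestamp". The watermark value, the table layouts, and the sample rows below are hypothetical; last_updated_date follows the column named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount INTEGER, last_updated_date TEXT);
    CREATE TABLE tgt_fact_orders (order_id INTEGER, amount INTEGER, last_updated_date TEXT);
    INSERT INTO src_orders VALUES
        (1, 100, '2025-01-01'),
        (2, 250, '2025-01-03'),   -- changed after the watermark
        (3,  75, '2025-01-04');   -- new after the watermark
    -- The incremental run picked up order 2 but missed order 3 (a seeded defect).
    INSERT INTO tgt_fact_orders VALUES
        (1, 100, '2025-01-01'),
        (2, 250, '2025-01-03');
""")

watermark = "2025-01-02"  # hypothetical high-water mark left by the previous run

# Source rows newer than the watermark that have no up-to-date copy in the target.
missing_delta = conn.execute("""
    SELECT s.order_id
    FROM src_orders s
    LEFT JOIN tgt_fact_orders t
           ON s.order_id = t.order_id
          AND s.last_updated_date = t.last_updated_date
    WHERE s.last_updated_date > ?
      AND t.order_id IS NULL
    ORDER BY s.order_id
""", (watermark,)).fetchall()

print(missing_delta)
```

Dates are stored as ISO-8601 text here, so plain string comparison orders them correctly in SQLite.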
13. Quick Revision Sheet (SQL-Focused)
- ETL = Extract + Transform + Load
- Always validate count + data + transformation
- JOIN and GROUP BY are must-know constructs
- SCD2 = history maintenance
- Performance testing matters
14. FAQs – ETL Testing Interview Questions on SQL Queries
Q1. Is ETL testing hard for beginners?
No. Strong SQL and data warehouse (DW) basics are enough.
Q2. Is ETL testing fully automated?
No. It is mostly manual, SQL-based validation with partial automation.
Q3. What is the most important ETL interview skill?
Writing and explaining SQL queries confidently.
Q4. Do companies expect tool expertise?
Conceptual understanding is more important than tool-specific syntax.
