1. What is ETL Testing? (Definition + Example)
ETL Testing is the process of validating data as it moves through Extract → Transform → Load pipelines to ensure accuracy, completeness, consistency, and performance in a data warehouse (DW).
Simple Example
- Source: OLTP system (Orders table)
- Transformation: Currency conversion, deduplication, business rules
- Target: Fact_Sales table in DW
ETL Testing ensures:
- Correct records are extracted
- Transformations follow business logic
- Loaded data matches Source-to-Target (S2T) mapping
- No data loss, duplication, or performance issues
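The example above can be sketched end to end with Python's built-in sqlite3 module standing in for both the OLTP source and the DW. The table layout, the exchange_rates table, and the sample rows are assumptions made for illustration only; a real project would point the same checks at its own source and target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount_local REAL, currency TEXT);
CREATE TABLE exchange_rates (currency TEXT PRIMARY KEY, rate_to_usd REAL);
CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount_usd REAL);

-- Hypothetical source rows; order 2 is extracted twice on purpose
INSERT INTO orders VALUES (1, 101, 100.0, 'EUR'), (2, 102, 50.0, 'USD'), (2, 102, 50.0, 'USD');
INSERT INTO exchange_rates VALUES ('EUR', 1.10), ('USD', 1.00);

-- Transform + load: drop exact duplicates and convert to USD
INSERT INTO fact_sales
SELECT DISTINCT o.order_id, o.customer_id, ROUND(o.amount_local * r.rate_to_usd, 2)
FROM orders o JOIN exchange_rates r ON r.currency = o.currency;
""")

# Check 1: every distinct source order reached the target exactly once
source_distinct = conn.execute("SELECT COUNT(DISTINCT order_id) FROM orders").fetchone()[0]
target_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

# Check 2: recompute the conversion from source and count rows that disagree
mismatches = conn.execute("""
SELECT COUNT(*) FROM fact_sales f
JOIN orders o ON o.order_id = f.order_id
JOIN exchange_rates r ON r.currency = o.currency
WHERE ABS(f.amount_usd - o.amount_local * r.rate_to_usd) > 0.005
""").fetchone()[0]

print(source_distinct, target_count, mismatches)
```

Equal counts and zero mismatches indicate that extraction, deduplication, and the conversion rule all behaved as mapped for this sample.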
2. Data Warehouse Flow: Source → Staging → Transform → Load → Reporting
DW Layer Responsibilities
| Layer | Purpose |
| Source | Transactional systems (ERP, CRM, flat files, APIs) |
| Staging | Raw data landing zone, minimal transformations |
| Transformation | Business rules, joins, aggregations, SCD |
| Target (DW) | Facts & dimensions optimized for analytics |
| Reporting | BI tools, dashboards, KPIs |
3. ETL Architecture & Mapping Validation
ETL Architecture Components
- Source systems
- Staging tables
- ETL tool (Informatica / SSIS / Ab Initio)
- Data warehouse (Star/Snowflake schema)
- Reporting layer
Mapping Validation
- Column-to-column checks
- Data type & length validation
- Transformation logic validation
- Default values & audit fields
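Column and data type checks from the list above can be automated by comparing the target table's metadata with the S2T document. The sketch below assumes SQLite (whose `PRAGMA table_info` returns the declared column types) and a hypothetical mapping dictionary; other databases expose the same information through their own catalog views.

```python
import sqlite3

# Hypothetical S2T mapping: target column -> expected declared type
S2T_MAPPING = {
    "order_id": "INTEGER",
    "customer_id": "INTEGER",
    "amount_usd": "DECIMAL(10,2)",
    "load_date": "TEXT",
}

def mapping_issues(conn, table, mapping):
    """Compare a table's columns and declared types with the mapping."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    actual = {row[1]: row[2].upper() for row in conn.execute(f"PRAGMA table_info({table})")}
    issues = []
    for column, expected in mapping.items():
        if column not in actual:
            issues.append(f"missing column: {column}")
        elif actual[column] != expected.upper():
            issues.append(f"type mismatch on {column}: expected {expected}, found {actual[column]}")
    for column in sorted(actual.keys() - mapping.keys()):
        issues.append(f"unmapped column: {column}")
    return issues

conn = sqlite3.connect(":memory:")
# The target was built with the wrong scale on amount_usd
conn.execute(
    "CREATE TABLE fact_sales (order_id INTEGER, customer_id INTEGER, "
    "amount_usd DECIMAL(10,4), load_date TEXT)"
)
issues = mapping_issues(conn, "fact_sales", S2T_MAPPING)
print(issues)
```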
4. ETL Testing Interview Questions with SQL (Basic → Advanced)
Below are 70 ETL testing interview questions, with SQL where it applies and crisp answers.
Basic ETL & SQL Questions (1–20)
- What is ETL testing?
Validation of extracted, transformed, and loaded data against business rules.
- What is the difference between ETL and ELT?
ETL transforms data before loading it into the target; ELT loads raw data first and transforms it inside the target.
- What is a staging table?
Temporary storage for raw extracted data.
- What is S2T mapping?
A document that maps source columns to target columns, including the transformation rules.
- What SQL query checks record count?
SELECT COUNT(*) FROM source_orders;
SELECT COUNT(*) FROM target_fact_orders;
- What is data reconciliation?
Comparing source and target data to detect mismatches.
- What are audit columns?
Columns that record load metadata, such as load_date, batch_id, created_by, updated_ts.
- What is a surrogate key?
A system-generated key used in the DW instead of the natural key.
- What is the difference between fact and dimension tables?
Facts store measures; dimensions store descriptive attributes.
- What is data profiling?
Analyzing source data quality before ETL.
- What is truncation testing?
Ensuring no data is lost because a target column is shorter than the source value.
- What is referential integrity in a DW?
Every foreign key in a fact table must exist in the corresponding dimension table.
- What SQL validates null values?
SELECT COUNT(*) FROM dim_customer WHERE email IS NULL;
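- What is incremental load?
Loading only new or changed records.
- What is full load?
Reloading the entire dataset.
- What is checksum or hashing in ETL?
A hash computed over a row's columns; comparing hashes between runs, or between source and target, detects data changes.
- What is CDC (Change Data Capture)?
A technique that identifies inserts, updates, and deletes in source data.
- What is the difference between inner join and left join?
Inner join returns only matching rows; left join returns all rows from the left table, with NULLs where the right table has no match.
- What is data lineage?
Tracking data from source to report.
- What is data validation vs data verification?
Validation checks business rules; verification checks that data moved correctly from source to target.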
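Reconciliation and hashing from the questions above combine naturally: hash each row on both sides and compare by key. This is a minimal sketch using Python's sqlite3 and hashlib; the row_hashes helper and the sample rows are assumptions, and the `|` separator with NULL mapped to an empty string is a simplification (a NULL and an empty string hash the same here).

```python
import hashlib
import sqlite3

def row_hashes(conn, table, key, columns):
    """Return {key value: MD5 of the other column values} for one table."""
    cols = ", ".join([key] + columns)
    hashes = {}
    for row in conn.execute(f"SELECT {cols} FROM {table}"):
        payload = "|".join("" if v is None else str(v) for v in row[1:])
        hashes[row[0]] = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return hashes

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE target_fact_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
-- Hypothetical data: order 3 never loaded, order 2 loaded with a wrong amount
INSERT INTO source_orders VALUES (1, 101, 500), (2, 102, 300), (3, 103, 250);
INSERT INTO target_fact_orders VALUES (1, 101, 500), (2, 102, 350);
""")

src = row_hashes(conn, "source_orders", "order_id", ["customer_id", "amount"])
tgt = row_hashes(conn, "target_fact_orders", "order_id", ["customer_id", "amount"])

missing = sorted(src.keys() - tgt.keys())  # keys in source but not in target
changed = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])  # same key, different content

print("missing:", missing, "changed:", changed)
```

A count check alone would only reveal the missing row; the hash comparison also catches the row whose amount changed.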
Intermediate SQL & ETL Questions (21–45)
- How do you validate JOIN logic in ETL?
Re-run the join on the source tables and compare the result with the loaded target rows:
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
- How to detect duplicate records?
SELECT order_id, COUNT(*)
FROM staging_orders
GROUP BY order_id
HAVING COUNT(*) > 1;
- What is Slowly Changing Dimension (SCD)?
A technique to manage changes to dimension attributes over time.
- What is the difference between SCD Type 1 and Type 2?
Type 1 overwrites the old value; Type 2 maintains history by adding a new row.
- SCD Type 2 SQL validation example
-- Any row returned means a customer has more than one current record
SELECT customer_id, COUNT(*)
FROM dim_customer
WHERE current_flag = 'Y'
GROUP BY customer_id
HAVING COUNT(*) > 1;
- How do you validate aggregations?
SELECT customer_id, SUM(order_amount)
FROM fact_orders
GROUP BY customer_id;
- What is late arriving dimension?
A fact record arrives before its dimension record exists.
- How do you handle null values in ETL?
Defaults, rejection, or transformation rules.
- What is lookup transformation?
Fetches related data from reference tables.
- How do you validate date transformations?
SELECT order_date
FROM fact_orders
WHERE order_date > CURRENT_DATE;
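- Difference between truncate and delete?
TRUNCATE removes all rows at once, takes no WHERE clause, and is usually faster; in some databases (for example Oracle and MySQL) it commits implicitly and cannot be rolled back. DELETE removes rows individually, can be filtered, and can be rolled back inside a transaction.
- What is data drift?
Unexpected changes in the structure or content of source data.
- How do you validate currency conversion?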
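-- Rows returned are conversion mismatches (compared at 2 decimal places)
SELECT *
FROM staging_sales
WHERE ROUND(amount_usd, 2) <> ROUND(amount_local * exchange_rate, 2);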
- What is a reject table?
A table that stores records that failed validation or load.
- What is metadata testing?
Validating schema, data types, constraints.
- What is a control table?
Tracks load status and batch info.
- How to validate record count after transformation?
SELECT COUNT(*) FROM staging_orders WHERE status = 'ACTIVE';
- What is data skew?
Uneven data distribution affecting performance.
- What is partitioning in DW?
Dividing large tables into smaller segments (for example by date) to improve query and load performance.
- What is indexing strategy in ETL?
Drop or disable indexes before a bulk load and rebuild them afterwards so inserts run faster.
- What is surrogate key generation method?
Sequence, identity, or UUID.
- What is data archival testing?
Ensuring old data is archived correctly.
- Difference between OLTP and OLAP?
OLTP = transactions; OLAP = analytics.
- What is transformation logic testing?
Validating that transformed values follow the documented business rules.
- What is idempotent ETL?
Running the same load multiple times gives the same result.
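Idempotency is straightforward to test: run the load twice and compare the target before and after the second run. The sketch below assumes a delete-then-insert load keyed on order_id, which is only one of several ways to make a load rerunnable; load_orders and snapshot are hypothetical helpers, and SQLite stands in for the warehouse.

```python
import sqlite3

def load_orders(conn):
    """Delete-then-insert load keyed on order_id, so reruns do not duplicate rows."""
    with conn:  # one transaction: both statements commit or roll back together
        conn.execute(
            "DELETE FROM fact_orders WHERE order_id IN (SELECT order_id FROM staging_orders)"
        )
        conn.execute(
            "INSERT INTO fact_orders "
            "SELECT order_id, customer_id, order_amount FROM staging_orders"
        )

def snapshot(conn):
    """Return the full target content in a stable order for comparison."""
    return conn.execute(
        "SELECT order_id, customer_id, order_amount FROM fact_orders ORDER BY order_id"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_orders (order_id INTEGER, customer_id INTEGER, order_amount REAL);
CREATE TABLE fact_orders (order_id INTEGER, customer_id INTEGER, order_amount REAL);
INSERT INTO staging_orders VALUES (1, 101, 500), (2, 102, 300);
""")

load_orders(conn)
first_run = snapshot(conn)
load_orders(conn)  # rerun the same batch
second_run = snapshot(conn)

print(first_run == second_run, len(second_run))
```

A plain INSERT without the preceding DELETE would leave four rows after the second run, which is exactly the defect this test is meant to catch.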
Advanced SQL & Performance Questions (46–70)
- Window function example in ETL validation
SELECT customer_id,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_ts DESC) rn
FROM dim_customer;
- How do you validate latest record in SCD2?
Filter where current_flag = 'Y'.
- How to identify missing records?
SELECT s.id
FROM source s
LEFT JOIN target t ON s.id = t.id
WHERE t.id IS NULL;
- What is ETL performance testing?
Measuring load time, throughput, resource usage.
- How do you tune slow ETL jobs?
Indexing, partitioning, pushdown optimization.
- What is bulk load?
High-volume data loading.
- What is pushdown optimization?
Executing transformations in the database instead of in the ETL engine.
- How do you validate historical data accuracy?
Compare effective_date ranges and check that they neither overlap nor leave gaps.
- What is factless fact table?
Tracks events without measures.
- How do you test error handling?
Validate reject counts and logs.
- What is late arriving fact?
A fact record that arrives after its reporting cycle has closed.
- How do you validate data freshness?
SELECT MAX(load_date) FROM fact_sales;
- What is data balancing?
Ensuring totals match across systems.
- What is ETL regression testing?
Ensuring changes don’t break existing logic.
- How do you validate negative scenarios?
Feed invalid data, nulls, and boundary values and confirm they are rejected or handled per the rules.
- What is parallel processing?
Running ETL jobs concurrently.
- What is watermark column?
A column, typically a timestamp or sequence, that records how far the last load got; used for incremental load tracking.
- How do you validate decimal precision?
-- Rows returned would lose precision when stored as DECIMAL(10,2)
SELECT amount FROM staging WHERE amount <> CAST(amount AS DECIMAL(10,2));
- What is data anonymization testing?
Ensuring PII masking rules work.
- What is reconciliation report?
Summary of counts, sums, rejects.
- How do you test ETL restartability?
Fail a job mid-run, restart it from the failure point, and confirm no data is lost or duplicated.
- What is schema evolution testing?
Validating changes in source schema.
- What is data latency?
Delay between source update and DW availability.
- How do you validate derived columns?
Recompute the formula from the source columns and compare it with the loaded value.
- What is end-to-end ETL testing?
Source → report validation.
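The window function and the current_flag check from this section can be combined into one consistency test: the most recent row per customer, and only that row, should carry the flag. This sketch assumes SQLite 3.25 or later (needed for window functions) and a hypothetical dim_customer layout; customer_sk, city, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY, customer_id INTEGER,
    city TEXT, updated_ts TEXT, current_flag TEXT
);
-- Customer 101 is flagged correctly; customer 102 has the flag on the older row
INSERT INTO dim_customer VALUES
    (1, 101, 'Pune',   '2024-01-01', 'N'),
    (2, 101, 'Mumbai', '2024-03-01', 'Y'),
    (3, 102, 'Delhi',  '2024-02-01', 'Y'),
    (4, 102, 'Noida',  '2024-04-01', 'N');
""")

# rn = 1 is the latest version per customer; flag and rank must agree
flag_errors = conn.execute("""
SELECT customer_sk, customer_id
FROM (
    SELECT customer_sk, customer_id, current_flag,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_ts DESC) AS rn
    FROM dim_customer
)
WHERE (rn = 1 AND current_flag <> 'Y') OR (rn > 1 AND current_flag = 'Y')
ORDER BY customer_sk
""").fetchall()

print(flag_errors)
```

Both rows of customer 102 are reported, because the flag sits on the older version while the latest one is marked as history.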
5. Real SQL Validation Examples (Sample Dataset)
Source Orders
| order_id | customer_id | amount |
| 1 | 101 | 500 |
| 2 | 102 | 300 |
Target Fact (validation query)
SELECT customer_id, SUM(amount) total_amount
FROM fact_orders
GROUP BY customer_id;
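To turn the sample above into a runnable check, the two source rows can be loaded into an in-memory SQLite database and the per-customer totals compared on both sides. The source_orders table name and the straight copy used as the "load" are assumptions; only the two sample rows come from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE fact_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO source_orders VALUES (1, 101, 500), (2, 102, 300);
-- Stand-in for the real ETL job: copy source rows to the fact table
INSERT INTO fact_orders SELECT * FROM source_orders;
""")

def totals(table):
    """Return {customer_id: total amount} for the given table."""
    rows = conn.execute(f"SELECT customer_id, SUM(amount) FROM {table} GROUP BY customer_id")
    return dict(rows.fetchall())

source_totals = totals("source_orders")
target_totals = totals("fact_orders")

# Customers whose totals differ, or who exist on one side only
diff = {
    c: (source_totals.get(c), target_totals.get(c))
    for c in source_totals.keys() | target_totals.keys()
    if source_totals.get(c) != target_totals.get(c)
}

print(target_totals, diff)
```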
6. Scenario-Based ETL Testing Questions
- Record count mismatch → check filters & reject tables
- Null values appearing → validate default rules
- Performance degradation → analyze indexes & partitions
- Duplicate facts → validate surrogate key logic
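For the record count mismatch scenario, a useful first step is to account for every staged row: each one should be either filtered out by a documented rule, loaded, or rejected. The sketch below assumes a hypothetical reject_orders table and a rule that only ACTIVE orders with a customer are loaded; SQLite stands in for the real databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_orders (order_id INTEGER, customer_id INTEGER, status TEXT);
CREATE TABLE fact_orders (order_id INTEGER, customer_id INTEGER);
CREATE TABLE reject_orders (order_id INTEGER, reason TEXT);

INSERT INTO staging_orders VALUES
    (1, 101, 'ACTIVE'), (2, NULL, 'ACTIVE'), (3, 103, 'CANCELLED'), (4, 104, 'ACTIVE');

-- Simulated load: ACTIVE rows with a customer are loaded, ACTIVE rows without one are rejected
INSERT INTO fact_orders
    SELECT order_id, customer_id FROM staging_orders
    WHERE status = 'ACTIVE' AND customer_id IS NOT NULL;
INSERT INTO reject_orders
    SELECT order_id, 'missing customer_id' FROM staging_orders
    WHERE status = 'ACTIVE' AND customer_id IS NULL;
""")

def count(sql):
    """Run a single-value COUNT query and return the number."""
    return conn.execute(sql).fetchone()[0]

staged = count("SELECT COUNT(*) FROM staging_orders")
filtered_out = count("SELECT COUNT(*) FROM staging_orders WHERE status <> 'ACTIVE'")
loaded = count("SELECT COUNT(*) FROM fact_orders")
rejected = count("SELECT COUNT(*) FROM reject_orders")

# Anything left over was dropped silently and needs investigation
unexplained = staged - filtered_out - loaded - rejected
print(staged, filtered_out, loaded, rejected, unexplained)
```

A non-zero `unexplained` value means rows disappeared without being filtered by a known rule or written to the reject table.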
7. ETL Tools Used in Projects
- Informatica
- Microsoft SSIS
- Ab Initio
- Pentaho
- Talend
8. ETL Defect Examples + Test Case
Defect: Duplicate customer records in SCD2
Root Cause: Hash key not implemented
Fix: Implement MD5 hash on business keys
Sample Test Case
| Field | Value |
| Scenario | SCD2 Change |
| SQL | COUNT(*) per customer_id WHERE current_flag = 'Y' |
| Expected | Only one active record |
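The test case above could be automated roughly as follows. This is a sketch against an in-memory SQLite table; the function name and sample rows are hypothetical, and the check also reports customers with no active row at all, which the duplicate check alone would miss.

```python
import sqlite3

def customers_without_single_active_row(conn):
    """Return (customer_id, active_rows) where the active count is not exactly 1."""
    return conn.execute("""
        SELECT customer_id,
               SUM(CASE WHEN current_flag = 'Y' THEN 1 ELSE 0 END) AS active_rows
        FROM dim_customer
        GROUP BY customer_id
        HAVING SUM(CASE WHEN current_flag = 'Y' THEN 1 ELSE 0 END) <> 1
        ORDER BY customer_id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER, city TEXT, current_flag TEXT);
INSERT INTO dim_customer VALUES
    (101, 'Pune', 'N'), (101, 'Mumbai', 'Y'),   -- correct: one active row
    (102, 'Delhi', 'Y'), (102, 'Noida', 'Y'),   -- defect: two active rows
    (103, 'Chennai', 'N');                      -- defect: no active row
""")

failures = customers_without_single_active_row(conn)
print(failures)
```

The test passes when the list is empty; here it reports customer 102 with two active rows and customer 103 with none.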
9. Quick Revision Sheet
- Validate counts, sums, duplicates
- Check SCD1 vs SCD2
- Use JOIN, GROUP BY, window functions
- Always reconcile source vs target
10. FAQs
Q1. What SQL is most used in ETL testing?
JOIN, GROUP BY, COUNT, SUM, window functions.
Q2. Is SQL enough for ETL testing?
SQL + ETL tool + business understanding is required.
Q3. What is the most important ETL testing skill?
Data analysis and SQL expertise.
