What Is ETL Testing? (Definition + Real-World Example)
ETL Testing validates data during Extract → Transform → Load processes to ensure accuracy, completeness, consistency, auditability, and performance in a Data Warehouse (DW).
Real-world ETL + UNIX example
In a production ETL project, daily source files arrive on a UNIX server. An ETL tester uses UNIX commands to:
- Check file arrival and size
- Validate record counts before loading
- Monitor logs and job status
Then SQL is used to validate transformations and target data. That’s why UNIX interview questions for ETL testing are extremely common in real interviews.
Data Warehouse Flow: Source → Staging → Transform → Load → Reporting
- Source – Databases, flat files, APIs
- Staging – Raw data landing (often on UNIX servers)
- Transformation – Business rules, joins, SCD logic
- Load – Fact & Dimension tables
- Reporting – BI dashboards, MIS, analytics
UNIX Interview Questions for ETL Testing (With Answers)
(Basic → Advanced | Real-Time Focus)
🔹 Basic UNIX Questions for ETL Testers (1–15)
- Why is UNIX important for ETL testing?
ETL jobs, files, and logs usually run on UNIX servers. - Which UNIX command checks file existence?
ls, ls -ltr - How do you check file size in UNIX?
ls -lh, du -h filename
How do you count records in a flat file?
wc -l filename
How do you view first 10 records of a file?
head filename
How do you view last 10 records?
tail filename
How do you search a value in a file?
grep “ERROR” filename
How do you check file permissions?
ls -l filename
- What is a delimiter in ETL files?
Character separating columns (comma, pipe |).
How do you count columns in a file?
head -1 file | awk -F”|” ‘{print NF}’
- How do you check job logs in UNIX?
Using cat, less, grep. - What is staging data?
Raw data stored before transformation.
How do you rename a file?
mv oldfile newfile
How do you copy files in UNIX?
cp source target
How do you remove files?
rm filename
🔹 Scenario-Based UNIX ETL Questions (16–35)
- Scenario: Source file arrived late. How do you verify?
Check timestamp using ls -ltr.
Scenario: File is empty. How do you test?
wc -l filename
- Scenario: File record count mismatches target table. What do you do?
Compare wc -l with SQL COUNT(*). - How do you validate delimiter issues?
Use awk -F.
Scenario: Header/trailer present. How do you exclude?
tail -n +2 file | head -n -1
How do you validate duplicate records in a file?
sort file | uniq -d
- Scenario: Unexpected NULLs after load. UNIX check?
Validate empty fields using awk.
How do you check job failure reason?
grep -i error job.log
- Scenario: Multiple files arrive for same day. What test?
Validate file naming convention.
How do you check file encoding?
file filename
Scenario: Incorrect column count in file. How to detect?
awk -F”|” ‘{print NF}’ file | sort | uniq -c
- How do you test incremental file loads?
Validate file timestamps and naming patterns. - Scenario: Control file mismatch. What to check?
Record counts and checksum.
How do you move processed files?
mv file /archive/
- Scenario: Large file performance issue. UNIX help?
Split file using split.
How do you compress files?
gzip filename
How do you check disk space issues?
df -h
- Scenario: Permission denied error. What do you test?
File ownership and permissions. - How do you automate file validation?
Shell scripts with wc, awk, grep. - Why is shell scripting useful for ETL testers?
Automates repetitive validation.
🔹 Advanced UNIX + ETL Scenarios (36–55)
- Scenario: Job rerun creates duplicate data. UNIX role?
Validate archive and control files.
How do you validate checksum/hash in UNIX?
md5sum filename
- Scenario: File arrives partially. How to test?
Compare expected vs actual record count.
How do you monitor running jobs?
ps -ef | grep jobname
- Scenario: Parallel file processing causes issues. Test?
Validate file locking strategy. - How do you schedule ETL jobs?
Using crontab.
How do you view cron jobs?
crontab -l
- Scenario: Job stuck in running state. UNIX check?
Process ID and logs.
How do you redirect output to a file?
command > output.txt
How do you append logs?
command >> log.txt
- Scenario: Disk full issue during ETL. UNIX validation?
df -h - How do you test archive cleanup?
Validate retention policy using find. - Scenario: Incorrect date in file name. Impact?
Incremental load failure. - How do you test end-to-end ETL using UNIX?
File → staging → DB validation. - Scenario: File delimiter changes suddenly. UNIX check?
Validate column count using awk. - How do you test restartability?
Ensure processed files are archived. - How do you identify long-running processes?
top, ps -ef. - Scenario: Log size too large. How to analyze?
tail -1000 log. - How do you test data quality thresholds?
Combine UNIX + SQL validations. - How do you explain UNIX usage in ETL interviews?
File validation, job monitoring, automation.
Real SQL Query Examples Used Along With UNIX
Sample Tables
- src_orders(order_id, cust_id, amount, order_dt)
- fact_sales(order_id, cust_sk, amount, load_dt)
1️⃣ Record Count Validation
SELECT COUNT(*) FROM fact_sales;
Compare with:
wc -l orders_file.dat
2️⃣ GROUP BY Validation
SELECT cust_id, SUM(amount)
FROM fact_sales
GROUP BY cust_id;
3️⃣ JOIN Validation
SELECT COUNT(*)
FROM fact_sales f
LEFT JOIN dim_customer d
ON f.cust_sk = d.cust_sk
WHERE d.cust_sk IS NULL;
4️⃣ Window Function (Dedup)
SELECT *
FROM (
SELECT order_id,
ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) rn
FROM src_orders
) t
WHERE rn = 1;
ETL Tools Commonly Used with UNIX
- Informatica
- Microsoft SQL Server Integration Services
- Ab Initio
- Pentaho
- Talend
ETL Defect Examples + UNIX Test Case
Defect Example
Issue: Record count mismatch
- Root Cause: Partial file load
- UNIX Validation: wc -l, file size check
- SQL Validation: COUNT(*)
Sample Test Case
- Test: Validate daily source file load
- Expected:
- File exists
- Record count matches target
- Job log shows SUCCESS
- File exists
UNIX for ETL Testing – Quick Revision Sheet
- ls -ltr → file arrival
- wc -l → record count
- grep error → log validation
- awk -F → delimiter & column checks
- Combine UNIX + SQL for full validation
FAQs (Snippet-Friendly)
Q1. Why is UNIX important for ETL testing?
Because ETL jobs, files, and logs are managed on UNIX servers.
Q2. Which UNIX commands are mandatory for ETL testers?
ls, wc, grep, awk, tail, head.
Q3. How do UNIX and SQL work together in ETL testing?
UNIX validates files; SQL validates loaded data.
