UNIX Interview Questions for ETL Testing

What Is ETL Testing? (Definition + Real-World Example)

ETL Testing validates data during Extract → Transform → Load processes to ensure accuracy, completeness, consistency, auditability, and performance in a Data Warehouse (DW).

Real-world ETL + UNIX example

In a production ETL project, daily source files arrive on a UNIX server. An ETL tester uses UNIX commands to:

  • Check file arrival and size
  • Validate record counts before loading
  • Monitor logs and job status

Then SQL is used to validate transformations and target data. That’s why UNIX interview questions for ETL testing are extremely common in real interviews.


Data Warehouse Flow: Source → Staging → Transform → Load → Reporting

  1. Source – Databases, flat files, APIs
  2. Staging – Raw data landing (often on UNIX servers)
  3. Transformation – Business rules, joins, SCD logic
  4. Load – Fact & Dimension tables
  5. Reporting – BI dashboards, MIS, analytics

UNIX Interview Questions for ETL Testing (With Answers)

(Basic → Advanced | Real-Time Focus)


🔹 Basic UNIX Questions for ETL Testers (1–15)

  1. Why is UNIX important for ETL testing?
    ETL jobs, files, and logs usually run on UNIX servers.
  2. Which UNIX command checks file existence?
    ls, ls -ltr
  3. How do you check file size in UNIX?
    ls -lh, du -h filename

How do you count records in a flat file?

wc -l filename

How do you view first 10 records of a file?

head filename

How do you view last 10 records?

tail filename

How do you search a value in a file?

grep “ERROR” filename

How do you check file permissions?

ls -l filename

  1. What is a delimiter in ETL files?
    Character separating columns (comma, pipe |).

How do you count columns in a file?

head -1 file | awk -F”|” ‘{print NF}’

  1. How do you check job logs in UNIX?
    Using cat, less, grep.
  2. What is staging data?
    Raw data stored before transformation.

How do you rename a file?

mv oldfile newfile

How do you copy files in UNIX?

cp source target

How do you remove files?

rm filename


🔹 Scenario-Based UNIX ETL Questions (16–35)

  1. Scenario: Source file arrived late. How do you verify?
    Check timestamp using ls -ltr.

Scenario: File is empty. How do you test?

wc -l filename

  1. Scenario: File record count mismatches target table. What do you do?
    Compare wc -l with SQL COUNT(*).
  2. How do you validate delimiter issues?
    Use awk -F.

Scenario: Header/trailer present. How do you exclude?

tail -n +2 file | head -n -1

How do you validate duplicate records in a file?

sort file | uniq -d

  1. Scenario: Unexpected NULLs after load. UNIX check?
    Validate empty fields using awk.

How do you check job failure reason?

grep -i error job.log

  1. Scenario: Multiple files arrive for same day. What test?
    Validate file naming convention.

How do you check file encoding?

file filename

Scenario: Incorrect column count in file. How to detect?

awk -F”|” ‘{print NF}’ file | sort | uniq -c

  1. How do you test incremental file loads?
    Validate file timestamps and naming patterns.
  2. Scenario: Control file mismatch. What to check?
    Record counts and checksum.

How do you move processed files?

mv file /archive/

  1. Scenario: Large file performance issue. UNIX help?
    Split file using split.

How do you compress files?

gzip filename

How do you check disk space issues?

df -h

  1. Scenario: Permission denied error. What do you test?
    File ownership and permissions.
  2. How do you automate file validation?
    Shell scripts with wc, awk, grep.
  3. Why is shell scripting useful for ETL testers?
    Automates repetitive validation.

🔹 Advanced UNIX + ETL Scenarios (36–55)

  1. Scenario: Job rerun creates duplicate data. UNIX role?
    Validate archive and control files.

How do you validate checksum/hash in UNIX?

md5sum filename

  1. Scenario: File arrives partially. How to test?
    Compare expected vs actual record count.

How do you monitor running jobs?

ps -ef | grep jobname

  1. Scenario: Parallel file processing causes issues. Test?
    Validate file locking strategy.
  2. How do you schedule ETL jobs?
    Using crontab.

How do you view cron jobs?

crontab -l

  1. Scenario: Job stuck in running state. UNIX check?
    Process ID and logs.

How do you redirect output to a file?

command > output.txt

How do you append logs?

command >> log.txt

  1. Scenario: Disk full issue during ETL. UNIX validation?
    df -h
  2. How do you test archive cleanup?
    Validate retention policy using find.
  3. Scenario: Incorrect date in file name. Impact?
    Incremental load failure.
  4. How do you test end-to-end ETL using UNIX?
    File → staging → DB validation.
  5. Scenario: File delimiter changes suddenly. UNIX check?
    Validate column count using awk.
  6. How do you test restartability?
    Ensure processed files are archived.
  7. How do you identify long-running processes?
    top, ps -ef.
  8. Scenario: Log size too large. How to analyze?
    tail -1000 log.
  9. How do you test data quality thresholds?
    Combine UNIX + SQL validations.
  10. How do you explain UNIX usage in ETL interviews?
    File validation, job monitoring, automation.

Real SQL Query Examples Used Along With UNIX

Sample Tables

  • src_orders(order_id, cust_id, amount, order_dt)
  • fact_sales(order_id, cust_sk, amount, load_dt)

1️⃣ Record Count Validation

SELECT COUNT(*) FROM fact_sales;

Compare with:

wc -l orders_file.dat

2️⃣ GROUP BY Validation

SELECT cust_id, SUM(amount)

FROM fact_sales

GROUP BY cust_id;

3️⃣ JOIN Validation

SELECT COUNT(*)

FROM fact_sales f

LEFT JOIN dim_customer d

ON f.cust_sk = d.cust_sk

WHERE d.cust_sk IS NULL;

4️⃣ Window Function (Dedup)

SELECT *

FROM (

  SELECT order_id,

         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_dt DESC) rn

  FROM src_orders

) t

WHERE rn = 1;


ETL Tools Commonly Used with UNIX

  • Informatica
  • Microsoft SQL Server Integration Services
  • Ab Initio
  • Pentaho
  • Talend

ETL Defect Examples + UNIX Test Case

Defect Example

Issue: Record count mismatch

  • Root Cause: Partial file load
  • UNIX Validation: wc -l, file size check
  • SQL Validation: COUNT(*)

Sample Test Case

  • Test: Validate daily source file load
  • Expected:
    • File exists
    • Record count matches target
    • Job log shows SUCCESS

UNIX for ETL Testing – Quick Revision Sheet

  • ls -ltr → file arrival
  • wc -l → record count
  • grep error → log validation
  • awk -F → delimiter & column checks
  • Combine UNIX + SQL for full validation

FAQs (Snippet-Friendly)

Q1. Why is UNIX important for ETL testing?
Because ETL jobs, files, and logs are managed on UNIX servers.

Q2. Which UNIX commands are mandatory for ETL testers?
ls, wc, grep, awk, tail, head.

Q3. How do UNIX and SQL work together in ETL testing?
UNIX validates files; SQL validates loaded data.

Leave a Comment

Your email address will not be published. Required fields are marked *