
Text Processing: grep, sed, and awk — The Engineer's Power Tools

grep: Searching Inside Text Files

When a factory server generates thousands of log lines per hour, you need tools to find specific information fast. grep is the most important text search tool in Linux.

grep "ERROR" /var/log/scada.log              # Find lines containing ERROR
grep -i "warning" /var/log/scada.log         # Case-insensitive
grep -n "ALARM" /var/log/scada.log           # Show line numbers
grep -c "CRITICAL" /var/log/scada.log        # Count matches
grep -r "modbus" /opt/scada/config/          # Search recursively
grep -v "DEBUG" /var/log/app.log             # Lines NOT matching (invert)
grep -A 3 "FAULT" /var/log/plc.log           # Show 3 lines after each match
grep -B 2 "FAULT" /var/log/plc.log           # 2 lines before

The context flags (-A, -B, -C) are essential for troubleshooting. When you find an error, you almost always need surrounding lines.
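The combined flag -C prints context on both sides of each match. A quick self-contained sketch (the log contents and the /tmp path are invented for illustration):

```shell
# Create a small hypothetical log to demonstrate context
printf '%s\n' "boot ok" "pump start" "FAULT valve 3" "retry" "pump stop" > /tmp/demo_plc.log

# -C 1 prints one line before AND one line after each match
grep -C 1 "FAULT" /tmp/demo_plc.log
# Output:
#   pump start
#   FAULT valve 3
#   retry
```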

Regular Expressions: Advanced Search Patterns

grep "^2026-04-15" sensor.log                # Lines starting with a date
grep "\.csv$" filelist.txt                   # Lines ending with .csv
grep "sensor[AB]" data.log                   # sensorA or sensorB
grep -E "(ALARM|FAULT|CRITICAL)" system.log  # Multiple keywords
grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" access.log  # IP addresses
Pattern   Matches
.         Any single character
*         Zero or more of the previous character
^         Start of line
$         End of line
[abc]     Any character in the set
[^abc]    Any character NOT in the set
+         One or more of the previous (requires -E)
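These pieces compose. As a quick sanity check that needs no files, pipe sample text straight into grep -E (the sensor names below are made up):

```shell
# ^sensor_[0-9]+$ — the whole line must be "sensor_" followed by one or more digits
printf 'sensor_01\nsensor_x\npump_02\nsensor_12\n' | grep -E '^sensor_[0-9]+$'
# Output:
#   sensor_01
#   sensor_12
```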

sed: Search and Replace in Files

sed 's/old/new/' file.txt                    # Replace first per line
sed 's/old/new/g' file.txt                   # Replace ALL per line
sed -i 's/old/new/g' file.txt                # Edit in place
sed -i.bak 's/old/new/g' file.txt            # Edit in place, keep backup

Practical uses:

sed -i 's/192\.168\.1\.50/192.168.1.100/g' /opt/scada/config/*.yaml  # Update IPs (dots escaped to match literally)
sed '/^#/d' config.yaml                      # Delete comment lines
sed '/^$/d' report.txt                       # Delete blank lines
sed -n '10,20p' large_file.csv               # Print only lines 10-20

Always use -i.bak on production servers to create automatic backups before changes.
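A minimal sketch of that workflow, using a throwaway file under /tmp (the file name and contents are hypothetical):

```shell
# Hypothetical config file for demonstration
printf 'host: 10.0.0.5\nport: 502\n' > /tmp/demo.yaml

# Edit in place; the original is preserved as demo.yaml.bak
sed -i.bak 's/502/1502/' /tmp/demo.yaml

# Review exactly what changed (diff exits 1 when files differ)
diff /tmp/demo.yaml.bak /tmp/demo.yaml || true
```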

awk: Processing Column-Based Data

awk treats each line as fields separated by whitespace or a custom delimiter. It excels at structured data like CSV and log files.

awk '{print $1}' access.log                  # Print first column
awk -F',' '{print $2}' data.csv              # Comma delimiter, column 2
awk '$3 > 100' sensor_readings.csv           # Lines where column 3 > 100
awk -F',' '{sum += $2} END {print sum}' data.csv  # Sum column 2
awk -F',' '$4 == "ALARM"' events.csv         # Filter by status field

Formatted output:

awk -F',' '{printf "Sensor: %-10s Temp: %6.1f\n", $1, $2}' readings.csv
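To see the alignment that %-10s and %6.1f produce, you can pipe a couple of invented readings straight in instead of reading a file:

```shell
# %-10s left-pads the name to 10 chars; %6.1f right-aligns the number in 6 chars
printf 'sensor_01,72.3\nsensor_02,98.7\n' | \
  awk -F',' '{printf "Sensor: %-10s Temp: %6.1f\n", $1, $2}'
# Output:
#   Sensor: sensor_01  Temp:   72.3
#   Sensor: sensor_02  Temp:   98.7
```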

cut, sort, and uniq: Complementary Tools

cut -d',' -f1,3 data.csv                     # Extract fields 1 and 3
sort -n numbers.txt                          # Numeric sort
sort -t',' -k3 -n data.csv                   # Sort by column 3
sort error_codes.txt | uniq -c               # Count occurrences (must sort first)
sort error_codes.txt | uniq -d               # Show only duplicated lines
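One common composition: extract a field with cut, then count duplicates. The data below is invented, and the trailing awk merely reshapes uniq -c's space-padded counts into a tidier form:

```shell
# Count how often each value appears in the first CSV field
printf 'a,1\nb,2\na,3\n' | cut -d',' -f1 | sort | uniq -c | awk '{print $2":", $1}'
# Output:
#   a: 2
#   b: 1
```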

Practical Example: Extracting Temperature Alarms From a Sensor Log

Given /var/log/sensors/temp_2026-04-15.csv:

2026-04-15T08:00:01,sensor_01,72.3,NORMAL
2026-04-15T08:00:01,sensor_02,98.7,ALARM
2026-04-15T08:00:02,sensor_02,101.4,CRITICAL

# Find all alarm events
grep -E "(ALARM|CRITICAL)" /var/log/sensors/temp_2026-04-15.csv

# Count alarms per sensor
grep -E "(ALARM|CRITICAL)" /var/log/sensors/temp_2026-04-15.csv | \
  awk -F',' '{print $2}' | sort | uniq -c | sort -rn

# Highest temperature reading
awk -F',' '{print $3}' /var/log/sensors/temp_2026-04-15.csv | sort -n | tail -1

# Save CRITICAL events to a report
grep "CRITICAL" /var/log/sensors/temp_2026-04-15.csv > /tmp/critical_report.csv

Pipelines built from grep, awk, sort, and uniq like these can process millions of log lines in seconds on ordinary hardware.
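The per-sensor alarm count can also be done in a single awk pass, which avoids sorting entirely. A sketch, piping in the three sample lines from above directly:

```shell
# One-pass equivalent of grep | awk | sort | uniq -c for counting alarms per sensor
printf '%s\n' \
  '2026-04-15T08:00:01,sensor_01,72.3,NORMAL' \
  '2026-04-15T08:00:01,sensor_02,98.7,ALARM' \
  '2026-04-15T08:00:02,sensor_02,101.4,CRITICAL' | \
  awk -F',' '$4 == "ALARM" || $4 == "CRITICAL" {n[$2]++} END {for (s in n) print s, n[s]}'
# Output: sensor_02 2
```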

Summary

In this lesson you learned the core text processing tools:

  • grep searches for patterns; use -r for directories, -E for extended regex.
  • Regular expressions match complex patterns like IPs and date ranges.
  • sed performs search-and-replace; always use -i.bak for safety.
  • awk processes column-based data with filtering and calculations.
  • cut, sort, and uniq complement the main tools for extraction and deduplication.
  • Combining these tools lets you analyze sensor logs and extract alarms in seconds.

In the next lesson, you will learn the Linux permission model for securing SCADA configuration files and controlling access to sensitive data.

grep sed awk regex text-processing log-analysis text-search regular-expressions log-filtering patterns