Python for Engineers: From Zero to Data Analysis
Why Python for Industrial Engineers?
Imagine a factory floor with dozens of sensors -- temperature, pressure, vibration, flow rate -- each logging a reading every second. By the end of the day you have millions of numbers. How do you analyze them? How do you detect that a motor is overheating?
Python is one of the most widely used languages for data analysis and machine learning. Its syntax reads almost like plain English, and its ecosystem of scientific libraries makes it a strong fit for engineers who need to process real-world data without years of programming training.
Why Python specifically?
| Feature | Details |
|---|---|
| Easy syntax | Code reads like English sentences |
| Engineering libraries | NumPy, Pandas, Matplotlib, SciPy |
| Machine learning | TensorFlow, scikit-learn, PyTorch |
| Automation | File processing, report generation, device control |
| Huge community | Answers exist for virtually every question |
Variables and Data Types
A variable in Python is a named container for a value. You do not need to declare the type -- Python infers it automatically:
# Sensor data
temperature = 78.5 # float
machine_id = "CNC-042" # string
is_running = True # boolean
sensor_count = 12 # integer
# List of sensor readings
readings = [78.5, 79.1, 80.3, 77.8, 81.2]
# Dictionary for machine information
machine = {
    "id": "CNC-042",
    "type": "Lathe",
    "location": "Hall 3",
    "max_temp": 95.0,
}
Core data types:
- int -- whole numbers: 42, -7, 1000
- float -- decimal numbers: 3.14, 78.5
- str -- text: "hello", "CNC-042"
- bool -- logical values: True or False
- list -- ordered sequences: [1, 2, 3]
- dict -- key-value pairs: {"temp": 78.5}
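As a small illustration of these types in use, the sketch below inspects a few values and converts text into a number. The variable names and values are made up for illustration; they are not taken from a real sensor log.

```python
# Hypothetical values, chosen only to illustrate the core types
reading = 78.5
machine = {"id": "CNC-042", "max_temp": 95.0}
readings = [78.5, 79.1, 80.3]

print(type(reading).__name__)   # float
print(type(machine).__name__)   # dict
print(readings[0])              # first list element (indexing starts at 0)
print(machine["max_temp"])      # look up a dictionary value by its key

# Values read from a text file arrive as strings and must be converted
raw_value = "81.2"
converted = float(raw_value)
headroom = machine["max_temp"] - converted
print(f"Headroom below limit: {headroom:.1f}°C")
```

Converting explicitly with float() or int() matters in practice because arithmetic on an unconverted string raises a TypeError.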
Conditions: Making Decisions
Suppose you are monitoring a motor's temperature and want an alert when it exceeds a threshold:
temperature = 88.5
max_allowed = 85.0
critical = 95.0
if temperature > critical:
    print("CRITICAL! Shut down motor immediately!")
    # send_emergency_stop()
elif temperature > max_allowed:
    print(f"Warning: temperature {temperature}°C exceeds limit")
    # send_alert_to_operator()
else:
    print("Temperature within normal range")
Notice how the code reads almost like a plain English sentence.
Loops: Repeating Tasks
In industrial settings you often need to scan through a batch of readings or process an entire day of data:
# for loop: check each reading
readings = [78.5, 85.2, 92.1, 77.3, 88.9, 96.5]
max_allowed = 90.0
alerts = []
for i, temp in enumerate(readings):
    if temp > max_allowed:
        alerts.append(f"Reading {i+1}: {temp}°C - exceeded!")
print(f"Total violations: {len(alerts)}")
for alert in alerts:
    print(f" - {alert}")
# while loop: continuous monitoring
import random
import time

def read_sensor():
    """Simulate a sensor reading."""
    return round(random.uniform(70, 100), 1)

monitoring = True
while monitoring:
    temp = read_sensor()
    print(f"Current temperature: {temp}°C")
    if temp > 95:
        print("ALERT! Stopping monitor")
        monitoring = False
    time.sleep(1)  # wait one second
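The while loop above keeps running until a simulated reading happens to exceed the limit. When experimenting, it can help to cap the number of polling cycles. The following is a minimal sketch of that idea, not a production monitoring loop: the monitor function, its max_cycles parameter, and the seed argument are hypothetical names introduced here, and the one-second sleep is left out so the example finishes instantly.

```python
import random

def read_sensor(rng):
    """Simulate a sensor reading using an explicit random generator."""
    return round(rng.uniform(70, 100), 1)

def monitor(max_cycles, limit=95.0, seed=None):
    """Poll the simulated sensor at most max_cycles times; stop early on an alert."""
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    history = []
    for _ in range(max_cycles):
        temp = read_sensor(rng)
        history.append(temp)
        if temp > limit:
            break  # leave the loop as soon as the limit is exceeded
    return history

history = monitor(max_cycles=10, seed=1)
print(f"Stopped after {len(history)} readings, last = {history[-1]}°C")
```

Returning the list of readings instead of printing inside the loop makes the function easier to test and reuse.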
Functions: Organizing Code
A function is a reusable block of code that performs a specific task. Instead of repeating yourself, write it once and call it whenever needed:
def check_temperature(temp, machine_name, max_limit=85.0):
    """
    Check machine temperature and return status.
    """
    if temp > max_limit * 1.1:  # more than 110% of limit
        return "critical", f"{machine_name}: critical temp ({temp}°C)"
    elif temp > max_limit:
        return "warning", f"{machine_name}: above limit ({temp}°C)"
    else:
        return "normal", f"{machine_name}: normal ({temp}°C)"
# Using the function
machines = [
    ("CNC-01", 82.3),
    ("CNC-02", 91.7),
    ("Pump-05", 96.2),
    ("Compressor-03", 78.1),
]
for name, temp in machines:
    status, message = check_temperature(temp, name)
    if status != "normal":
        print(f"[{status.upper()}] {message}")
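The max_limit parameter has a default value of 85.0, so callers can omit it or override it per machine. The sketch below shows both calls. It repeats a compact copy of check_temperature so that it runs on its own, and the Furnace-01 limit is a hypothetical value chosen for illustration.

```python
def check_temperature(temp, machine_name, max_limit=85.0):
    """Compact copy of the function above so this sketch is self-contained."""
    if temp > max_limit * 1.1:
        return "critical", f"{machine_name}: critical temp ({temp}°C)"
    if temp > max_limit:
        return "warning", f"{machine_name}: above limit ({temp}°C)"
    return "normal", f"{machine_name}: normal ({temp}°C)"

# Hypothetical per-machine limit, for illustration only
custom_limits = {"Furnace-01": 400.0}

# Same temperature, two different limits
default_status, _ = check_temperature(91.7, "CNC-02")
custom_status, _ = check_temperature(
    91.7, "Furnace-01", max_limit=custom_limits["Furnace-01"]
)
print(default_status, custom_status)  # warning normal
```

Passing the limit by keyword keeps the call readable when a function has several numeric arguments.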
NumPy: Fast Scientific Computing
NumPy is the foundational library for scientific computing in Python. It handles arrays and mathematical operations at near-C speed:
import numpy as np
# Simulated vibration sensor readings for one hour (3600 samples, one per second)
vibration_data = np.random.normal(loc=2.5, scale=0.8, size=3600)
# Basic statistics
print(f"Mean: {np.mean(vibration_data):.2f} mm/s")
print(f"Std Dev: {np.std(vibration_data):.2f} mm/s")
print(f"Max: {np.max(vibration_data):.2f} mm/s")
print(f"Min: {np.min(vibration_data):.2f} mm/s")
# Detect anomalies (beyond 3 standard deviations)
mean = np.mean(vibration_data)
std = np.std(vibration_data)
anomalies = vibration_data[np.abs(vibration_data - mean) > 3 * std]
print(f"Anomalies detected: {len(anomalies)}")
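NumPy vs plain Python performance (rough orders of magnitude; actual timings depend on the hardware and on how the plain Python version is written):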
| Operation | Plain Python | NumPy |
|---|---|---|
| Sum 1M elements | ~150 ms | ~1 ms |
| 1000x1000 matrix multiply | Minutes | Fraction of a second |
| Memory usage | Inefficient | Optimized |
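Figures like these are easy to check on your own machine. The sketch below times the "sum 1M elements" case with the standard timeit module. The helper loop_sum and the choice of three repetitions are assumptions made for this example, and the printed timings will differ from system to system.

```python
import timeit
import numpy as np

n = 1_000_000
data_list = [float(i) for i in range(n)]      # plain Python list
data_array = np.arange(n, dtype=np.float64)   # same values as a NumPy array

def loop_sum(values):
    """Sum the values with an explicit Python loop."""
    total = 0.0
    for v in values:
        total += v
    return total

# Average over three runs to smooth out noise
loop_time = timeit.timeit(lambda: loop_sum(data_list), number=3) / 3
numpy_time = timeit.timeit(lambda: data_array.sum(), number=3) / 3

print(f"Python loop: {loop_time * 1000:.1f} ms")
print(f"NumPy sum:   {numpy_time * 1000:.1f} ms")
```

The built-in sum() is faster than the explicit loop, so the gap you measure depends on which plain Python version you compare against.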
Pandas: Sensor Data Analysis
Pandas is the go-to tool for reading and analyzing tabular data -- perfect for sensor logs stored in CSV files:
import pandas as pd
# Read sensor CSV
df = pd.read_csv("sensor_log.csv")
# View the first 5 rows
print(df.head())
Assume the CSV contains columns: timestamp, machine_id, temperature, vibration, pressure
# Quick summary statistics
print(df.describe())
# Filter: only readings above 85°C
high_temp = df[df["temperature"] > 85]
print(f"High temperature readings: {len(high_temp)}")
# Group by: average temperature per machine
avg_per_machine = df.groupby("machine_id")["temperature"].mean()
print(avg_per_machine.sort_values(ascending=False))
# Add a new column: temperature alert flag
df["temp_alert"] = df["temperature"] > 85
# Save results
df.to_csv("analyzed_data.csv", index=False)
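If you do not have a sensor_log.csv at hand, the same operations can be tried on a small DataFrame built in memory. The four rows below are invented sample data using the column names assumed above; a real log would be far larger.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 08:00:00", "2024-01-01 08:00:01",
        "2024-01-01 08:00:00", "2024-01-01 08:00:01",
    ]),
    "machine_id": ["CNC-01", "CNC-01", "Pump-05", "Pump-05"],
    "temperature": [82.3, 86.1, 90.4, 84.9],
    "vibration": [2.1, 2.4, 4.8, 5.2],
    "pressure": [120.0, 121.5, 148.0, 151.0],
})

# Boolean filter: keep only rows above 85°C
high_temp = df[df["temperature"] > 85]

# Group by machine and average the temperature column
avg_per_machine = df.groupby("machine_id")["temperature"].mean()

# New boolean column flagging each row
df["temp_alert"] = df["temperature"] > 85

print(f"High temperature readings: {len(high_temp)}")
print(avg_per_machine.sort_values(ascending=False))
```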
Practical Example: Analyzing a Full Day of Factory Data
Suppose you receive a CSV file with 24 hours of sensor data from a factory:
import pandas as pd
import numpy as np
# Load data
df = pd.read_csv("factory_24h.csv", parse_dates=["timestamp"])
# 1. General summary
print("=== Daily Summary ===")
print(f"Total readings: {len(df):,}")
print(f"Period: {df['timestamp'].min()} to {df['timestamp'].max()}")
print(f"Machines monitored: {df['machine_id'].nunique()}")
# 2. Detect violations
limits = {"temperature": 85, "vibration": 5.0, "pressure": 150}
for param, limit in limits.items():
    violations = df[df[param] > limit]
    pct = len(violations) / len(df) * 100
    print(f"\n{param}: {len(violations)} violations ({pct:.1f}%)")
    if len(violations) > 0:
        worst = violations.loc[violations[param].idxmax()]
        print(f" Worst reading: {worst[param]} at {worst['timestamp']}")
        print(f" Machine: {worst['machine_id']}")
# 3. Hourly analysis: when do problems peak?
df["hour"] = df["timestamp"].dt.hour
hourly_avg = df.groupby("hour")["temperature"].mean()
peak_hour = hourly_avg.idxmax()
print(f"\nPeak average temperature at hour: {peak_hour}:00")
# 4. Per-machine report
print("\n=== Machine Report ===")
for machine in df["machine_id"].unique():
    m_data = df[df["machine_id"] == machine]
    avg_t = m_data["temperature"].mean()
    max_t = m_data["temperature"].max()
    alerts = len(m_data[m_data["temperature"] > 85])
    print(f"{machine}: avg={avg_t:.1f}°C, max={max_t:.1f}°C, alerts={alerts}")
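The per-machine loop in step 4 can also be written as a single grouped aggregation, which tends to scale better as the number of machines grows. This is one possible alternative sketched on invented sample rows, not a replacement the original analysis requires. The column names avg_temp, max_temp, and alerts are choices made for this example.

```python
import pandas as pd

# Hypothetical sample rows standing in for factory_24h.csv
df = pd.DataFrame({
    "machine_id": ["CNC-01", "CNC-01", "Pump-05", "Pump-05"],
    "temperature": [82.3, 86.1, 90.4, 84.9],
})

# One pass over the data: mean, max, and count of readings above 85°C
report = df.groupby("machine_id")["temperature"].agg(
    avg_temp="mean",
    max_temp="max",
    alerts=lambda s: int((s > 85).sum()),
)
print(report)
```

The result is a DataFrame indexed by machine_id, so it can be sorted or written to CSV directly.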
Next Steps
Once you have mastered these basics, you can expand into:
- Matplotlib for plotting sensor data charts
- SciPy for advanced statistics and Fourier analysis
- scikit-learn for predictive maintenance and anomaly detection
- openpyxl for automated Excel report generation
- PLC integration via pymodbus (Modbus) or opcua (OPC UA)
Python is not a replacement for industrial control languages like Structured Text or Ladder Logic -- but it is the ideal tool for data analysis and intelligent decision-making on the factory floor.