How to Build a Simple Anomaly Detector for System Logs Using Python

Learn how to simulate login logs and detect unusual activity using machine learning and Python. No prior cybersecurity experience needed!

🔍 What You’ll Learn

In this step-by-step tutorial, you’ll:

Simulate login logs using Python.
Build a dataset with fake user activity.
Train a simple anomaly detection model using Isolation Forest.
Detect suspicious logins like brute-force attacks or abnormal behaviors.

🧰 Tools & Skills You Need

Basic Python knowledge
Familiarity with pandas and scikit-learn (sklearn), plus matplotlib and seaborn for the plot in Step 5
Jupyter Notebook or any Python IDE
No real logs? No problem: we’ll simulate our own!

📦 Step 1: Simulate System Log Data

Let’s create a dummy dataset to mimic login events with timestamps, user IDs, login success/failure, and IP addresses.

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import random

# Seed both generators for reproducibility: the loop below uses `random` and NumPy
random.seed(42)
np.random.seed(42)

logins = []
start_time = datetime.now()

for i in range(1000):
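    # The offset grows with i, but the random multiplier means timestamps are not strictly ordered
    timestamp = start_time + timedelta(seconds=i * random.randint(1, 5))
    user_id = random.choice(['user1', 'user2', 'user3', 'admin', 'guest'])
    success = np.random.choice([1, 0], p=[0.95, 0.05])  # 1 = success, 0 = failure (about 5% failures)
    # Host part 1-254; .255 is the broadcast address on a /24 network
    ip_address = f"192.168.0.{random.randint(1, 254)}"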
    logins.append([timestamp, user_id, success, ip_address])

logs_df = pd.DataFrame(logins, columns=['timestamp', 'user', 'success', 'ip'])
logs_df.to_csv("system_logs.csv", index=False)

📁 This saves a file called system_logs.csv, which we’ll use for our anomaly detector.
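
The simulated logs above are uniformly random, so they contain no real attack pattern. If you want something concrete for the detector to find, one option is to append a burst of failed logins from a single IP. The snippet below is a minimal sketch of that idea on a small stand-in dataset; the names `base`, `burst`, and `logs_with_attack`, the IP 203.0.113.7, and the burst size are all made up for illustration and are not part of the original tutorial.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
start = pd.Timestamp("2024-01-01")

# Small stand-in for the simulated logs (same columns as system_logs.csv)
base = pd.DataFrame({
    "timestamp": start + pd.to_timedelta(np.arange(200) * 30, unit="s"),
    "user": rng.choice(["user1", "user2", "user3", "admin", "guest"], size=200),
    "success": rng.choice([1, 0], size=200, p=[0.95, 0.05]),
    "ip": [f"192.168.0.{n}" for n in rng.integers(1, 255, size=200)],
})

# Hypothetical brute-force burst: 30 failed admin logins, one per second, from one IP
burst = pd.DataFrame({
    "timestamp": start + pd.to_timedelta(3000 + np.arange(30), unit="s"),
    "user": "admin",
    "success": 0,
    "ip": "203.0.113.7",
})

# Merge the burst into the normal traffic and restore chronological order
logs_with_attack = (
    pd.concat([base, burst], ignore_index=True)
    .sort_values("timestamp")
    .reset_index(drop=True)
)
print(logs_with_attack[logs_with_attack["ip"] == "203.0.113.7"].head())
```

You could save `logs_with_attack` to CSV in place of the purely random logs and run the remaining steps unchanged.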

🧠 Step 2: Load and Explore the Logs

logs = pd.read_csv("system_logs.csv")
print(logs.head())
print(logs['user'].value_counts())

Take a look at your data. Failures are assigned at random (about 5% of events), so any one user, perhaps even admin, may show a few more failed logins than the others. 👀

⚙️ Step 3: Feature Engineering

Let’s convert text-based data into numerical features that a machine learning model can understand.

from sklearn.preprocessing import LabelEncoder

df = logs.copy()

# Convert timestamps to seconds since start
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['time_diff'] = (df['timestamp'] - df['timestamp'].min()).dt.total_seconds()

# Encode users and IPs as integer codes (the codes are arbitrary IDs, not meaningful magnitudes)
le_user = LabelEncoder()
le_ip = LabelEncoder()
df['user_encoded'] = le_user.fit_transform(df['user'])
df['ip_encoded'] = le_ip.fit_transform(df['ip'])

# Final feature set
features = df[['time_diff', 'user_encoded', 'ip_encoded', 'success']]
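
Label-encoded IDs tell the model which user or IP an event belongs to, but not how that user or IP behaves. As a possible extension (not part of the original feature set), you could add per-IP aggregates such as the number of attempts and the failure rate. Here is a small sketch on a hand-made sample; the column names `failed`, `ip_attempts`, and `ip_fail_rate` are my own choices.

```python
import pandas as pd

# Tiny hand-made sample with the same columns as the tutorial's logs
sample = pd.DataFrame({
    "user": ["user1", "admin", "admin", "admin", "guest", "admin"],
    "success": [1, 0, 0, 0, 1, 0],
    "ip": ["192.168.0.5", "192.168.0.9", "192.168.0.9",
           "192.168.0.9", "192.168.0.5", "192.168.0.9"],
})

# 1 where the login failed, 0 where it succeeded
sample["failed"] = 1 - sample["success"]

# Broadcast per-IP aggregates back onto every row of that IP
sample["ip_attempts"] = sample.groupby("ip")["success"].transform("count")
sample["ip_fail_rate"] = sample.groupby("ip")["failed"].transform("mean")

print(sample[["ip", "ip_attempts", "ip_fail_rate"]].drop_duplicates())
```

Columns like these could be appended to `features` so that an IP with many failed attempts stands out numerically.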

🧪 Step 4: Apply Isolation Forest for Anomaly Detection

from sklearn.ensemble import IsolationForest

model = IsolationForest(contamination=0.05, random_state=42)
model.fit(features)

# Predict anomalies
df['anomaly'] = model.predict(features)

-1 → Anomaly detected ❗
1 → Normal login ✅
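
Besides the -1/1 labels, Isolation Forest can also give you a continuous score, which is handy for ranking events by how suspicious they look. The sketch below uses scikit-learn's `decision_function` on synthetic points rather than the tutorial's `features`; the data and the names `X`, `iso`, and `most_suspicious` are illustrative only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 200 ordinary points plus one obvious outlier appended as the last row
X = rng.normal(0, 1, size=(200, 2))
X = np.vstack([X, [[8.0, 8.0]]])

iso = IsolationForest(contamination=0.05, random_state=42).fit(X)

# Lower score = more anomalous; negative scores are the ones predict() labels -1
scores = iso.decision_function(X)
labels = iso.predict(X)

# Indices of the five most suspicious rows, most anomalous first
most_suspicious = np.argsort(scores)[:5]
print(most_suspicious, scores[most_suspicious])
```

In the tutorial's setting you could store `model.decision_function(features)` in a new column and sort by it to review the strangest logins first.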

📊 Step 5: Visualize the Results

Let’s see how many anomalies were detected and who triggered them.

import matplotlib.pyplot as plt
import seaborn as sns

# Count anomalies
print(df['anomaly'].value_counts())

# Visualize
sns.scatterplot(data=df, x='time_diff', y='user_encoded', hue='anomaly')
plt.title("Anomaly Detection in Login Events")
plt.xlabel("Time (seconds)")
plt.ylabel("User")
plt.show()

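👁 With contamination=0.05, about 5% of logins are flagged. In this purely random dataset those are simply the statistically most isolated events (failed logins tend to stand out because they are rare); with real logs, they would be the suspicious behaviors worth investigating.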
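
A scatter plot shows where anomalies fall, but a quick table is often easier to act on. The following sketch counts flagged events per user on a small hand-made frame; `flagged_demo` and its values are invented for illustration, and in the tutorial you would use `df` instead.

```python
import pandas as pd

# Hand-made stand-in for the tutorial's df after model.predict()
flagged_demo = pd.DataFrame({
    "user": ["admin", "admin", "guest", "user1", "admin", "guest"],
    "ip": ["192.168.0.9", "192.168.0.9", "192.168.0.5",
           "192.168.0.7", "192.168.0.3", "192.168.0.5"],
    "anomaly": [-1, -1, 1, 1, 1, -1],
})

# Keep only rows the model marked as anomalies (-1)
flagged = flagged_demo[flagged_demo["anomaly"] == -1]

# Number of flagged events per user, largest first
summary = flagged.groupby("user").size().sort_values(ascending=False)
print(summary)
```

Grouping by `ip` instead of `user` works the same way.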

🛡️ Real-Life Use Case

This mini-project simulates a real-world use case:

An organization wants to detect unauthorized access attempts in real time using machine learning.

With a real log dataset (from firewalls, servers, etc.), the same approach can serve as the starting point for an intrusion detection system.

🚀 What’s Next?

Integrate real server logs.
Add geolocation data (country/IP lookup).
Send alerts via email when anomalies are detected.
Try other models such as One-Class SVM, autoencoders, or LSTM networks.
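
To give a feel for the last item, here is a rough sketch of swapping in scikit-learn's `OneClassSVM`. It runs on synthetic data rather than the tutorial's features, and the scaling step, `nu=0.05`, and the names `X_demo` and `ocsvm` are my own assumptions, not settings from the original project.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)

# Synthetic stand-in for a numeric feature matrix (300 events, 3 features)
X_demo = rng.normal(0, 1, size=(300, 3))

# SVMs are sensitive to feature scale, so standardize first.
# nu is an upper bound on the fraction of training points treated as outliers.
ocsvm = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
ocsvm.fit(X_demo)

# Same convention as Isolation Forest: -1 = anomaly, 1 = normal
pred = ocsvm.predict(X_demo)
print("Share flagged:", (pred == -1).mean())
```

Because the output uses the same -1/1 convention, the visualization step should work without changes.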

📁 Get the Full Code

👉 Visit the GitHub repository for this project

🔐 Final Thoughts

Machine learning isn’t just for big tech — you can use it right now to improve your cybersecurity defenses.

This project shows how easy it is to start with small, practical ideas and grow your skills step by step.

💬 Have Questions?

Drop your thoughts in the comments at @hackingwit. I’d love to hear what you’re building next!