Learn how to simulate login logs and detect unusual activity using machine learning and Python. No prior cybersecurity experience needed!
🔍 What You’ll Learn
In this step-by-step tutorial, you’ll:
- Simulate login logs using Python.
- Build a dataset with fake user activity.
- Train a simple anomaly detection model using Isolation Forest.
- Detect suspicious logins like brute-force attacks or abnormal behaviors.
🧰 Tools & Skills You Need
- Basic Python knowledge
- Familiarity with pandas and scikit-learn (sklearn); matplotlib and seaborn are used for the plot in Step 5
- Jupyter Notebook or any Python IDE
- No real logs? No problem: we’ll simulate our own!
📦 Step 1: Simulate System Log Data
Let’s create a dummy dataset to mimic login events with timestamps, user IDs, login success/failure, and IP addresses.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import random
# Seed both generators so the simulation is reproducible
random.seed(42)
np.random.seed(42)
logins = []
start_time = datetime.now()
for i in range(1000):
    # Offsets are drawn independently, so timestamps are not strictly in order
    timestamp = start_time + timedelta(seconds=i * random.randint(1, 5))
    user_id = random.choice(['user1', 'user2', 'user3', 'admin', 'guest'])
    success = np.random.choice([1, 0], p=[0.95, 0.05])  # 1 = success, 0 = failure (about 5%)
    ip_address = f"192.168.0.{random.randint(1, 254)}"  # usable host addresses in a /24
    logins.append([timestamp, user_id, success, ip_address])
logs_df = pd.DataFrame(logins, columns=['timestamp', 'user', 'success', 'ip'])
logs_df.to_csv("system_logs.csv", index=False)
📁 This saves a file called system_logs.csv, which we’ll use for our anomaly detector.
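The simulated logins are drawn at random, so nothing in them resembles an actual attack. The sketch below is one way to give the detector something to find: it appends a burst of rapid failed logins from a single address. The helper name `make_brute_force_burst`, the attacker IP `10.0.0.99`, and the burst size of 30 are all illustrative assumptions, not part of the original project.

```python
from datetime import datetime, timedelta

import pandas as pd


def make_brute_force_burst(start, n_attempts=30, user="admin", ip="10.0.0.99"):
    """Return a DataFrame of rapid failed logins from one IP (hypothetical values)."""
    rows = [
        [start + timedelta(seconds=i), user, 0, ip]  # success=0 marks a failed login
        for i in range(n_attempts)
    ]
    return pd.DataFrame(rows, columns=["timestamp", "user", "success", "ip"])


# A tiny stand-in for logs_df so the sketch runs on its own
base_time = datetime(2024, 1, 1, 9, 0, 0)
normal_logs = pd.DataFrame(
    [
        [base_time, "user1", 1, "192.168.0.10"],
        [base_time + timedelta(seconds=40), "user2", 1, "192.168.0.11"],
        [base_time + timedelta(seconds=90), "guest", 0, "192.168.0.12"],
    ],
    columns=["timestamp", "user", "success", "ip"],
)

# Append the burst and put all events back in time order
burst = make_brute_force_burst(base_time + timedelta(seconds=120))
combined = (
    pd.concat([normal_logs, burst], ignore_index=True)
    .sort_values("timestamp")
    .reset_index(drop=True)
)
print(combined.tail())
```

To try this in the tutorial, you could pass your own logs_df in place of normal_logs before writing the CSV.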
🧠 Step 2: Load and Explore the Logs
logs = pd.read_csv("system_logs.csv")
print(logs.head())
print(logs['user'].value_counts())
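Take a look at your data. You may spot patterns: maybe the admin failed more logins than expected? 👀 Keep in mind that every user here shares the same 5% failure probability, so in this simulated data such differences are down to chance.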
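A quick way to check who fails most often is to compute the failure rate per user. This is a minimal sketch on a small hand-made sample (the rows are made up for illustration); the same groupby should work on the logs DataFrame, since it has the same user and success columns.

```python
import pandas as pd

# Illustrative sample with the same column names as system_logs.csv
sample = pd.DataFrame(
    {
        "user": ["admin", "admin", "guest", "guest", "guest", "user1"],
        "success": [1, 0, 1, 1, 0, 1],
    }
)

# success is 1/0, so 1 - mean(success) is the share of failed logins per user
failure_rate = (1 - sample.groupby("user")["success"].mean()).sort_values(ascending=False)
print(failure_rate)
```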
⚙️ Step 3: Feature Engineering
Let’s convert text-based data into numerical features that a machine learning model can understand.
from sklearn.preprocessing import LabelEncoder
df = logs.copy()
# Convert timestamps to seconds since start
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['time_diff'] = (df['timestamp'] - df['timestamp'].min()).dt.total_seconds()
# Encode users and IPs
le_user = LabelEncoder()
le_ip = LabelEncoder()
df['user_encoded'] = le_user.fit_transform(df['user'])
df['ip_encoded'] = le_ip.fit_transform(df['ip'])
# Final feature set
features = df[['time_diff', 'user_encoded', 'ip_encoded', 'success']]
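One caveat: LabelEncoder assigns integers in sorted order, so the numeric ordering of user_encoded and ip_encoded carries no real meaning. A possible alternative, sketched here as an assumption rather than as part of the original pipeline, is frequency encoding (replace each IP with how often it appears) plus an hour-of-day feature. The column names `ip_freq` and `hour` and the sample rows are hypothetical.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "ip": ["192.168.0.5", "192.168.0.5", "192.168.0.5", "192.168.0.7", "10.0.0.99"],
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 09:00:00",
                "2024-01-01 09:05:00",
                "2024-01-01 13:30:00",
                "2024-01-01 23:45:00",
                "2024-01-02 03:10:00",
            ]
        ),
    }
)

# Frequency encoding: rarely seen IPs get small values
ip_counts = sample["ip"].value_counts()
sample["ip_freq"] = sample["ip"].map(ip_counts)

# Hour of day (0-23) lets the model notice logins at unusual times
sample["hour"] = sample["timestamp"].dt.hour

print(sample[["ip", "ip_freq", "hour"]])
```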
🧪 Step 4: Apply Isolation Forest for Anomaly Detection
from sklearn.ensemble import IsolationForest
model = IsolationForest(contamination=0.05, random_state=42)
model.fit(features)
# Predict anomalies
df['anomaly'] = model.predict(features)
-1 → Anomaly detected ❗
1 → Normal login ✅
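Besides the -1/1 label, Isolation Forest also exposes a continuous score through decision_function, which is handy for ranking events by how suspicious they look. The sketch below uses synthetic two-column data (an assumption, not the tutorial's features) to show the relationship: negative scores correspond to the rows predict marks as -1.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic data: 200 points near the origin plus 5 points placed far away
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = rng.uniform(low=8.0, high=10.0, size=(5, 2))
X = np.vstack([normal, outliers])

model = IsolationForest(contamination=0.05, random_state=42)
model.fit(X)

labels = model.predict(X)            # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower (more negative) = more anomalous

# Indices of the ten lowest-scoring rows, most suspicious first
most_suspicious = np.argsort(scores)[:10]
print(most_suspicious)
```

In the tutorial, the same call on features would let you sort df by score and review the top few rows by hand instead of treating every -1 as equally alarming.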
📊 Step 5: Visualize the Results
Let’s see how many anomalies were detected and who triggered them.
import matplotlib.pyplot as plt
import seaborn as sns
# Count anomalies
print(df['anomaly'].value_counts())
# Visualize
sns.scatterplot(data=df, x='time_diff', y='user_encoded', hue='anomaly')
plt.title("Anomaly Detection in Login Events")
plt.xlabel("Time (seconds)")
plt.ylabel("User")
plt.show()
👁 The points labeled -1 are the logins the model found easiest to isolate, for example failed attempts or rarely seen user/IP combinations. Those are our suspicious behaviors!
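To see who triggered the anomalies in numbers rather than on a plot, you can filter the flagged rows and tabulate them. This is a minimal sketch on a hand-made stand-in for df (the values are illustrative, not real model output).

```python
import pandas as pd

# Stand-in for the tutorial's df after model.predict (illustrative values)
results = pd.DataFrame(
    {
        "user": ["admin", "admin", "guest", "user1", "user1", "guest"],
        "success": [0, 1, 0, 1, 1, 1],
        "anomaly": [-1, 1, -1, 1, 1, -1],
    }
)

flagged = results[results["anomaly"] == -1]

# How many flagged logins each user accounts for
print(flagged["user"].value_counts())

# Cross-tabulate the model's label against the login outcome
outcome_table = pd.crosstab(results["anomaly"], results["success"])
print(outcome_table)
```

If most flagged rows turn out to be failed logins, that suggests the model is mainly picking up on the success column.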
🛡️ Real-Life Use Case
This mini-project simulates a real-world use case:
An organization wants to detect unauthorized access attempts in real time using machine learning.
With a real log dataset (from firewalls, servers, etc.), this approach can serve as the starting point for an intrusion detection system.
🚀 What’s Next?
- Integrate real server logs.
- Add geolocation data (country/IP lookup).
- Send alerts via email when anomalies are detected.
- Use more advanced models like One-Class SVM, Autoencoders, or LSTM.
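As a taste of the last item, here is a rough sketch of swapping in scikit-learn's One-Class SVM. It runs on random synthetic data standing in for the four-column feature matrix, and the settings (RBF kernel, nu=0.05) are assumptions chosen to mirror the 5% contamination used earlier, not tuned values. The features are standardized first because SVMs are sensitive to feature scale.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))  # synthetic stand-in for the tutorial's features

# nu is an upper bound on the fraction of training points treated as outliers
ocsvm = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
ocsvm.fit(X)

labels = ocsvm.predict(X)  # same convention as Isolation Forest: -1 = anomaly, 1 = normal
print((labels == -1).sum(), "of", len(labels), "rows flagged")
```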
📁 Get the Full Code
👉 Visit the GitHub repository for this project
🔐 Final Thoughts
Machine learning isn’t just for big tech — you can use it right now to improve your cybersecurity defenses.
This project shows how easy it is to start with small, practical ideas and grow your skills step-by-step.
💬 Have Questions?
Drop your thoughts in the comments at @hackingwit. I’d love to hear what you’re building next!