🛡️ AI Agent for Malware Detection (Static Analysis) 💻

✨ Project Overview

This project builds an AI agent using a Random Forest model to detect malware by analyzing executable files (without running them!). We focus on static analysis – inspecting file characteristics and structure.

Because large external datasets were difficult to access, this project is a proof-of-concept built on a small, custom-collected dataset of benign and dummy suspicious files. It demonstrates the core methodology of AI-driven static malware detection rather than production-grade detection performance.

🚀 How It Works & Key Features

Our AI learns to spot malware by looking at a file's unique "fingerprint" (features) instead of its behavior.

Static Detection: Safe and fast analysis, no execution needed.
AI-Powered: Random Forest model for smart pattern recognition.
Manageable Data Handling: Demonstrates feature extraction and model training using a controlled, custom dataset.

📊 Dataset: Custom Samples

For this project, we utilize a custom dataset consisting of:

Benign Samples: Common Windows executable files (e.g., notepad.exe, calc.exe).
Suspicious Sample: The harmless eicar.com test file (used to simulate a suspicious executable).

Features are extracted from these files using the pefile library. This approach allows for a self-contained demonstration of the static analysis pipeline without requiring large, external malware datasets.
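
To make the idea of header-based features concrete, here is a minimal sketch that reads a few COFF header fields with Python's standard `struct` module. It is a stand-in for illustration, not the project's extractor: pefile exposes the same fields (for example `pe.FILE_HEADER.NumberOfSections`), while the names `read_pe_header_features` and `make_fake_pe` and the returned dictionary keys are assumptions made here. The sketch also shows why eicar.com needs care: it is a short COM file with no MZ/PE headers, so a PE parser has nothing to read from it and an extractor presumably needs a fallback for such files.

```python
import struct


def read_pe_header_features(data: bytes):
    """Return a few COFF header fields, or None if data is not a PE file."""
    # A PE file starts with a 64-byte DOS header whose first two bytes are "MZ".
    if len(data) < 64 or data[:2] != b"MZ":
        return None
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if pe_offset + 24 > len(data) or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, num_sections, timestamp, _symtab, _nsyms,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": num_sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": opt_header_size,
        "characteristics": characteristics,
    }


def make_fake_pe(num_sections: int = 3) -> bytes:
    """Build a header-only synthetic PE image (hypothetical test helper)."""
    dos_header = bytearray(64)
    dos_header[:2] = b"MZ"
    struct.pack_into("<I", dos_header, 0x3C, 64)  # PE signature right after DOS header
    coff = struct.pack("<HHIIIHH", 0x8664, num_sections, 0, 0, 0, 240, 0x0022)
    return bytes(dos_header) + b"PE\x00\x00" + coff


# A synthetic PE header parses; a text-like stub (as eicar.com would be) does not.
print(read_pe_header_features(make_fake_pe()))
print(read_pe_header_features(b"X5O!P%" + b"\x00" * 62))
```

Each returned value is already numeric, so it can be used directly as one column of a feature row.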

🛠️ Quick Setup Guide

Get this project running in a few steps!

1. Prepare Your System 💻

Python (3.10 or 3.11 recommended): download from python.org and add it to PATH. 🐍
7-Zip (Windows): 7-zip.org (for general use; not strictly needed for this small dataset).
Visual C++ Redistributable (x64): download from Microsoft Learn, install it, and restart.
Disk Space: A few GBs of free space are sufficient for this custom dataset. 💾

2. Get Code & Setup Environment 📂

# Clone this repo
git clone https://github.com/U210709718/AI-Agents-for-malware-analysis-and-detection.git
cd AI-Agents-for-malware-analysis-and-detection

# Setup virtual environment & install libraries
py -m venv .venv
.\.venv\Scripts\activate # Windows
pip install -r requirements.txt # (You'll create this file)

3. Prepare Custom Data & Extract Features 🔬

Create Data Folders:

mkdir -p data/my_test_data/benign_samples
mkdir -p data/my_test_data/suspicious_samples

Collect Samples:
- Benign: Copy notepad.exe, calc.exe, mspaint.exe from C:\Windows\System32\ into data/my_test_data/benign_samples/.
- Suspicious: Manually create eicar.com in data/my_test_data/suspicious_samples/.
  - Open Notepad, paste X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H* exactly.
  - Save as eicar.com, select "All Files (*.*)" for type. (Your antivirus might quarantine it; restore it if needed).
Extract Features: This step uses custom_feature_extractor.py to get numerical features from your samples.
- Ensure custom_feature_extractor.py is present in src/.
- With your virtual environment active, run from the project root (AI-Agents-for-malware-analysis-and-detection/):
```
py src/custom_feature_extractor.py
```
This creates extracted_features.csv in data/my_test_data/.
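
As a rough illustration of what this step produces, the sketch below walks the two sample folders, computes two simple static features per file, and writes a labeled CSV. It is not the repository's custom_feature_extractor.py: the features (file size and byte entropy), the column names, and the function names `byte_entropy` and `build_feature_csv` are assumptions chosen so the example runs with the standard library only.

```python
import csv
import math
import tempfile
from collections import Counter
from pathlib import Path


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def build_feature_csv(data_root: Path, out_csv: Path) -> int:
    """Write one feature row per sample file; return the number of rows."""
    # Folder names follow the setup steps above; label 1 means suspicious.
    folders = {"benign_samples": 0, "suspicious_samples": 1}
    rows = []
    for folder, label in folders.items():
        sample_dir = data_root / folder
        if not sample_dir.is_dir():
            continue
        for path in sorted(sample_dir.iterdir()):
            if path.is_file():
                data = path.read_bytes()
                rows.append({
                    "file_name": path.name,
                    "file_size": len(data),
                    "entropy": round(byte_entropy(data), 4),
                    "label": label,
                })
    with out_csv.open("w", newline="") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=["file_name", "file_size", "entropy", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Demo on throwaway files so no real samples are needed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "benign_samples").mkdir()
    (root / "suspicious_samples").mkdir()
    (root / "benign_samples" / "a.exe").write_bytes(bytes(range(256)))
    (root / "suspicious_samples" / "b.com").write_bytes(b"ab" * 10)
    count = build_feature_csv(root, root / "extracted_features.csv")
    print(count, "rows written")
    print((root / "extracted_features.csv").read_text())
```

Keeping the label in the same file as the features means the training script only needs to read one CSV.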

4. Run the Detector! ▶️

Once features are extracted:

cd src
py malware_detector.py

This script loads your extracted_features.csv, scales the features, trains a Random Forest model, evaluates it, and saves your trained model (.pkl files) into the models/ directory.
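
The flow described above (scale, train, evaluate, save) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the contents of malware_detector.py: the feature values are made up, the file names `random_forest_model.pkl` and `scaler.pkl` and the function `train_and_save` are hypothetical, and the standard `pickle` module is used for saving.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up rows of [file_size_bytes, byte_entropy]; label 1 means suspicious.
X = [[200_000, 6.1], [30_000, 5.8], [6_000_000, 6.4], [150_000, 6.0],
     [68, 4.9], [70, 5.0], [90, 4.7], [120, 5.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def train_and_save(X, y, models_dir: Path):
    # Stratify so that both labels appear in the (tiny) test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42)

    # Fit the scaler on training data only, then reuse it for the test data.
    scaler = StandardScaler().fit(X_train)
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(scaler.transform(X_train), y_train)

    y_pred = model.predict(scaler.transform(X_test))
    print("accuracy:", accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred, labels=[0, 1]))

    # Save both objects: predictions on new files need the same scaling.
    models_dir.mkdir(parents=True, exist_ok=True)
    for name, obj in (("random_forest_model.pkl", model), ("scaler.pkl", scaler)):
        with (models_dir / name).open("wb") as fh:
            pickle.dump(obj, fh)
    return model, scaler, list(y_test)


with tempfile.TemporaryDirectory() as tmp:
    train_and_save(X, y, Path(tmp) / "models")
```

Stratifying the split is one way to avoid a test set that contains a single label, which is a common source of metric warnings on very small datasets.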

(Training on this small dataset is very fast!)

📈 Results

(This section is taken from the output of malware_detector.py. Because the dataset is very small, some metrics might be 0.0 or 1.0, and a UserWarning about a single label in y_test is expected.)

The Random Forest model achieved the following performance on the custom test set:

Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1-Score: 1.0000

Confusion Matrix:

[[2 0]
 [0 1]]
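
The reported metrics follow directly from this matrix. The short check below recomputes them, assuming scikit-learn's layout (rows are true labels, columns are predicted labels) and that class 0 is benign and class 1 is suspicious; the function name `metrics_from_confusion` is introduced here for illustration.

```python
def metrics_from_confusion(cm):
    """Compute binary metrics from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"Accuracy": accuracy, "Precision": precision,
            "Recall": recall, "F1-Score": f1}


cm = [[2, 0],
      [0, 1]]
for name, value in metrics_from_confusion(cm).items():
    print(f"{name}: {value:.4f}")
```

With only three test samples, a single misclassification would move accuracy from 1.0 to about 0.67, so these scores say little about performance on real malware.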

🎥 Video Demo: (https://youtu.be/bZv6dZoknXk)

User Interface

The user interface is built with Streamlit in Python. These are screenshots of the analysis results:

💡 Future Enhancements

Large Dataset Integration: Adapt the project to process and train on larger, real-world datasets (like EMBER 2018) once accessible, utilizing memory-efficient loading (e.g., np.memmap).
Live PE File Analysis: Implement a module to extract features from any new PE file and use the trained model for real-time prediction.
More ML Models: Explore other machine learning algorithms (e.g., Gradient Boosting, SVM, simple Neural Networks).
Future Risk Prediction: Extend the model to predict future risks.
