DGA Anomaly Detection

This project provides machine learning and NLP pipelines for detecting Domain Generation Algorithm (DGA) anomalies in DNS traffic. Malicious actors use DGAs to periodically generate rendezvous domain names, which makes botnets resilient to takedowns. The system classifies each domain as benign or malicious based on the structure of its name.

Overview

Modular Features: Engineers token-based features (ngram lengths) and raw byte-based features from domain strings.
Ensemble Model: Uses a CatBoostClassifier, a gradient-boosted decision tree ensemble.
Automated Pipeline: Scripts cover the full end-to-end workflow (data loading, processing, model training, evaluation).

The Best Model

The current best-performing model is catboost.0.977.26_ensemble.model (trained for 1000 iterations). It uses two sets of engineered features:

Ngram Lengths: The tokenizer extracts the lengths of up to 14 tokens per domain.
Raw Bytes: 26 features holding the domain's ASCII byte values, with the string padded or cropped symmetrically to a fixed length of 26.
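
The two feature sets above can be sketched roughly as follows. This is a minimal illustration, not the code in src/: the function names, the regex tokenizer (runs of letters or digits), the zero padding value, and the reading of "symmetrically" as "centered, trimming or padding both ends equally" are all assumptions made for the example. Only the sizes 14 and 26 come from the description above.

```python
import re

MAX_TOKENS = 14  # maximum number of token lengths (from the README)
RAW_WIDTH = 26   # number of raw-byte features (from the README)


def token_length_features(domain: str, max_tokens: int = MAX_TOKENS) -> list[int]:
    """Lengths of the first `max_tokens` tokens, zero-padded to a fixed size.

    Hypothetical tokenizer: a token is a run of letters or a run of digits.
    """
    tokens = re.findall(r"[a-z]+|[0-9]+", domain.lower())
    lengths = [len(token) for token in tokens[:max_tokens]]
    return lengths + [0] * (max_tokens - len(lengths))


def raw_byte_features(domain: str, width: int = RAW_WIDTH) -> list[int]:
    """Exactly `width` byte values, with the domain centered in the window.

    Assumed meaning of "symmetrically": crop or pad both ends equally,
    with any odd leftover byte handled on the right.
    """
    data = domain.lower().encode("ascii", errors="replace")
    if len(data) >= width:
        # Crop: keep the middle `width` bytes.
        start = (len(data) - width) // 2
        return list(data[start:start + width])
    # Pad: surround the bytes with zeros (assumed padding value).
    left = (width - len(data)) // 2
    right = width - len(data) - left
    return [0] * left + list(data) + [0] * right


def featurize(domain: str) -> list[int]:
    """Concatenate both feature sets into one 40-element vector."""
    return token_length_features(domain) + raw_byte_features(domain)


if __name__ == "__main__":
    print(featurize("example.com"))
```

Fixed-size vectors like these can be passed directly to a tree-based classifier, which needs no feature scaling. The project's actual tokenizer and padding scheme may differ from this sketch.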

Usage

To run the training and evaluation workflow, execute the main Python script from the repository root:

python src/train.py

Note: The original Jupyter notebook containing earlier research is accessible in experiments/DGA_detection.ipynb.

Documentation

For an in-depth explanation of the logic behind this project, see the Markdown files in the docs/ folder:

Experiments: Background context and evolution of the models.
Tools: The specific tech stack and libraries used.
Workflows: The core architecture and execution logic.
Evaluations: Key metrics and rationale behind modeling strategies.
