CLAG is a memory framework for small language model (SLM) agents that organizes long-horizon memory through agent-driven clustering.
CLAG employs an SLM-driven router to assign incoming memories to semantically coherent clusters and autonomously generates cluster-specific profiles—including topic summaries and descriptive tags—to establish each cluster as a self-contained functional unit.
CLAG organizes long-horizon memory through three key components:
- Agentic Routing: routes each new memory into the most appropriate semantic cluster
- Localized Evolution: refines and consolidates memories only within the routed cluster
- Cluster-Aware Two-Stage Retrieval: first selects relevant clusters, then retrieves fine-grained evidence only inside them
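The three components above can be illustrated with a minimal, self-contained sketch. This is not the CLAG implementation: it replaces the SLM-driven router and profiles with toy bag-of-words vectors and cosine similarity, and all names (`ClusterMemory`, `route_threshold`, etc.) are illustrative.

```python
from collections import Counter
import math

def vec(text):
    # Toy bag-of-words vector; CLAG itself uses an SLM router, not this.
    return Counter(text.lower().split())

def cos(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ClusterMemory:
    def __init__(self, route_threshold=0.2):
        self.clusters = []  # each cluster: {"profile": Counter, "memories": [str]}
        self.route_threshold = route_threshold

    def add(self, memory):
        # Agentic routing (simplified): send the memory to the cluster whose
        # profile it matches best, or open a new cluster if none is close enough.
        v = vec(memory)
        best, best_sim = None, 0.0
        for c in self.clusters:
            s = cos(v, c["profile"])
            if s > best_sim:
                best, best_sim = c, s
        if best is None or best_sim < self.route_threshold:
            best = {"profile": Counter(), "memories": []}
            self.clusters.append(best)
        best["memories"].append(memory)
        # Localized evolution (simplified): only the routed cluster's profile
        # is updated; all other clusters are untouched.
        best["profile"] += v

    def retrieve(self, query, top_clusters=1, top_k=2):
        # Stage 1: select the most relevant clusters by profile similarity.
        qv = vec(query)
        ranked = sorted(self.clusters,
                        key=lambda c: cos(qv, c["profile"]), reverse=True)
        # Stage 2: score fine-grained memories only inside the selected clusters.
        candidates = [m for c in ranked[:top_clusters] for m in c["memories"]]
        return sorted(candidates, key=lambda m: cos(qv, vec(m)), reverse=True)[:top_k]
```

For example, adding four memories about two people yields two clusters, and a query about one person is answered from that person's cluster only.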
```
.
├── CLAG_memory.py                  # CLAG memory implementation
├── test_CLAG.py                    # Main evaluation / experiment entry point
├── prepare_bioasq.py               # BioASQ preprocessing
├── prepare_bioasq_gold_context.py  # BioASQ preprocessing utilities
├── run_prepare_bioasq_all.sh       # BioASQ preprocessing pipeline
├── data/                           # Datasets and processed files
├── logs_CLAG/                      # Experiment logs
├── results_CLAG/                   # Output results
└── figure/                         # README figures
```
- Python 3.9+ recommended
```bash
git clone https://github.com/<your-org-or-user>/CLAG.git
cd CLAG
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Note: some NLTK resources may be downloaded at runtime if they are not already installed.
The following datasets are already included under `data/`:
- HotpotQA
- LoCoMo
BioASQ is not distributed with this repository. Please download the required files from the official BioASQ participant area:
- training10b
- test dataset
After downloading, place the files under `data/` and run:

```bash
bash run_prepare_bioasq_all.sh
```

This script generates processed files under `data/processed/`, including the chunked versions used in the experiments.
Run CLAG evaluation with:
```bash
python3 test_CLAG.py \
    --dataset data/locomo10.json \
    --backend sglang \
    --model gpt-4o-mini
```

Arguments:
- `--dataset`: path to the evaluation dataset JSON
- `--backend`: backend to use (`openai`, `ollama`, or `sglang`)
- `--model`: model name for the selected backend
- `--output`: path to save the output JSON
- `--ratio`: evaluate only a subset of the dataset (`0.0` to `1.0`)
- `--retrieve_k`: retrieval top-k (default: `10`)
For the OpenAI backend, set your API key:

```bash
export OPENAI_API_KEY="YOUR_KEY"
```

By default, CLAG expects an SGLang server at:

```
http://localhost:30000
```

You can override this with:
- `--sglang_host`
- `--sglang_port`
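For example, to point CLAG at an SGLang server on a different port (host and port values here are illustrative):

```shell
python3 test_CLAG.py \
    --dataset data/locomo10.json \
    --backend sglang \
    --model gpt-4o-mini \
    --sglang_host localhost \
    --sglang_port 30001
```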
Make sure your Ollama server is running locally and the specified model is available.
To reproduce the main experiments, run `test_CLAG.py` with the target dataset and backend configuration.
If you find this repository useful, please cite:
```bibtex
@article{roh2026clag,
  title={CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents},
  author={Roh, Taeyun and Jang, Wonjune and Jung, Junha and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2603.15421},
  year={2026}
}
```
