A lightweight Django Q&A chatbot powered by FastText embeddings and cosine-similarity search. It matches a user's message against a database of question/answer pairs and returns the closest answer, or a fallback message when nothing is similar enough. It supports English and Persian through separate pretrained models, selectable at runtime.
- Each QAPair (question + answer) stores a precomputed FastText embedding of its question, generated automatically on save and held in the database as raw float32 bytes.
- On startup the app lazily builds a single pre-normalized in-memory matrix of all question embeddings. A chat request then costs one embedding (the incoming message) plus one matrix-vector product, not one embedding per stored question per request.
- The in-memory matrix is rebuilt automatically (via post_save/post_delete signals) whenever a QAPair is added, edited, or deleted, so it stays consistent without restarting the server.
- If the best cosine similarity is below SIMILARITY_THRESHOLD, a fallback reply is returned.
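The signal-driven rebuild in the list above can be pictured as a lazily rebuilt cache with an invalidation hook. The sketch below is only an illustration: `LazyCache`, `load_rows`, and `rebuild_count` are hypothetical names, and a plain list stands in for the database so the example runs without Django.

```python
class LazyCache:
    """Hypothetical stand-in for the project's EmbeddingCache."""

    def __init__(self, load_rows):
        self._load_rows = load_rows  # callable returning the current rows
        self._rows = None            # None means "needs rebuilding"
        self.rebuild_count = 0

    def invalidate(self):
        """Mark the cache stale; a post_save/post_delete handler would call this."""
        self._rows = None

    def rows(self):
        """Rebuild on first use after an invalidation, then reuse the result."""
        if self._rows is None:
            self._rows = list(self._load_rows())
            self.rebuild_count += 1
        return self._rows


database = ["How do I reset my password?"]
cache = LazyCache(lambda: database)

first = cache.rows()                               # first use triggers a build
database.append("What are your opening hours?")    # row added, no invalidation yet
stale = cache.rows()                               # still the old snapshot
cache.invalidate()                                 # what a signal handler would do
fresh = cache.rows()                               # rebuilt, sees the new row
```

Rebuilding only on the next read keeps a burst of edits in the admin from triggering one rebuild per edit.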
Why this matters: each request computes exactly one embedding, no matter how many FAQs are stored. Adding more FAQs only adds rows to the matrix-vector product; it does not multiply the per-request embedding work.
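The per-request cost described above can be sketched as follows. This is a minimal illustration, not the project's code: the project uses a NumPy matrix, while this sketch uses plain lists to stay dependency-free, and the function names and sample vectors are assumptions.

```python
import math


def normalize(vector):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def build_matrix(question_vectors):
    """Normalize every stored question vector once, ahead of any request."""
    return [normalize(v) for v in question_vectors]


def best_match(matrix, answers, query_vector, threshold, fallback):
    """Return the answer whose question is most similar to the query."""
    if not matrix:
        return fallback
    query = normalize(query_vector)
    # With unit-length rows, each dot product is a cosine similarity.
    scores = [sum(a * b for a, b in zip(row, query)) for row in matrix]
    best = max(range(len(scores)), key=scores.__getitem__)
    return answers[best] if scores[best] >= threshold else fallback


matrix = build_matrix([[1.0, 0.0], [0.0, 2.0]])
answers = ["Answer A", "Answer B"]
hit = best_match(matrix, answers, [0.1, 3.0], 0.85, "Sorry, I don't know.")
miss = best_match(matrix, answers, [1.0, 1.0], 0.85, "Sorry, I don't know.")
```

Here `hit` resolves to the second answer because the query points almost along the second stored vector, while `miss` falls back because a similarity of about 0.71 is below the 0.85 threshold.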
| Component | Version / role |
|---|---|
| Python | 3.11+ |
| Django | 5.2.7 — web framework, admin, ORM |
| FastText | fasttext-wheel 0.9.2 — word embeddings (prebuilt wheel) |
| NumPy | 1.24.3 — vectorized similarity search |
| hazm | 0.10.0 — Persian text normalization/tokenization (optional) |
| Database | SQLite (default); any Django-supported DB for production |
| Frontend | HTML + Bulma (CDN) + vanilla JS (fetch) |
git clone https://github.com/ITheEqualizer/Chatbot.git
cd Chatbot
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Place the FastText .bin model file in the project root (model files are gitignored because they are large).
Copy .env.example and export the variables (or set them in your shell / process manager):
cp .env.example .env
Sensible local defaults are built in, so you can skip this for development.
python manage.py migrate
python manage.py createsuperuser
python manage.py runserver
- Chat UI: http://127.0.0.1:8000/
- Admin (add FAQs): http://127.0.0.1:8000/admin/
After adding FAQ entries in the admin, their embeddings are computed automatically on save. If you
imported rows in bulk or changed the model, run python manage.py rebuild_embeddings once.
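One reason bulk imports need the manual rebuild: Django's bulk operations such as bulk_create do not call save() or send post_save, so a save() override never runs for those rows. The sketch below shows what "computed on save" might look like, under assumptions: `FakeQAPair` and the `embed` callable are hypothetical stand-ins (no Django or FastText involved), and the float32 packing uses the standard library in native byte order.

```python
from array import array


def vector_to_bytes(vector):
    """Pack a sequence of floats into raw float32 bytes (native byte order)."""
    return array("f", vector).tobytes()


def bytes_to_vector(blob):
    """Unpack raw float32 bytes back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


class FakeQAPair:
    """Hypothetical stand-in for the QAPair model, without Django."""

    def __init__(self, question, answer, embed):
        self.question = question
        self.answer = answer
        self.embedding_vector = None  # filled in by save()
        self._embed = embed           # callable: text -> list of floats

    def save(self):
        # Recompute the embedding from the question on every save.
        self.embedding_vector = vector_to_bytes(self._embed(self.question))


# Toy embedding function, used only so the example is runnable.
pair = FakeQAPair("Hi?", "Hello!", embed=lambda text: [float(len(text)), 0.5])
pair.save()
restored = bytes_to_vector(pair.embedding_vector)
```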
All settings are read from environment variables in chatbot/settings.py:
| Variable | Default | Purpose |
|---|---|---|
| DJANGO_SECRET_KEY | insecure dev key | Set a real secret in production. |
| DJANGO_DEBUG | True | Set False in production. |
| DJANGO_ALLOWED_HOSTS | localhost,127.0.0.1 | Comma-separated allowed hostnames. |
| CHATBOT_LANGUAGE | en | en or fa; picks the default model and preprocessor. |
| MODEL_PATH | follows CHATBOT_LANGUAGE | Override the FastText model file path. |
| SIMILARITY_THRESHOLD | 0.85 | Minimum cosine similarity (0–1) to return an answer. |
| BOT_LOG_LEVEL | INFO | Log level for the bot logger. |
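As an illustration of how such settings could be parsed, here is a minimal sketch. It is not the contents of chatbot/settings.py: `load_settings` and `DEFAULT_MODELS` are hypothetical, and the English model file name is a placeholder because this README does not state it.

```python
import os

# Model file per language. "ChatBot_Persian.bin" comes from this README;
# the English file name is a placeholder, not the project's actual name.
DEFAULT_MODELS = {"en": "english-model.bin", "fa": "ChatBot_Persian.bin"}


def load_settings(env=os.environ):
    """Read chatbot settings from a mapping, applying the documented defaults."""
    language = env.get("CHATBOT_LANGUAGE", "en")
    hosts = env.get("DJANGO_ALLOWED_HOSTS", "localhost,127.0.0.1")
    debug = env.get("DJANGO_DEBUG", "True").strip().lower()
    return {
        "DEBUG": debug in ("1", "true", "yes"),
        "ALLOWED_HOSTS": [h.strip() for h in hosts.split(",") if h.strip()],
        "CHATBOT_LANGUAGE": language,
        # MODEL_PATH wins if set; otherwise follow the selected language.
        "MODEL_PATH": env.get("MODEL_PATH") or DEFAULT_MODELS[language],
        "SIMILARITY_THRESHOLD": float(env.get("SIMILARITY_THRESHOLD", "0.85")),
        "BOT_LOG_LEVEL": env.get("BOT_LOG_LEVEL", "INFO"),
    }


defaults = load_settings({})
persian = load_settings({"CHATBOT_LANGUAGE": "fa", "DJANGO_DEBUG": "False"})
```

Passing the environment in as a mapping keeps the parsing testable without touching the real process environment.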
Set CHATBOT_LANGUAGE=fa (with ChatBot_Persian.bin present). This selects both the Persian model
and the hazm-based preprocessor in bot/persian_process.py. The questions and answers in your
QAPair entries should be written in the selected language. Embeddings are
tied to the active model, so after switching languages run:
python manage.py rebuild_embeddings
Run the test suite with:
python manage.py test bot
The suite mocks FastText, so it runs without the model file or the fasttext package installed.
It covers preprocessing, cosine math, embedding storage, cache invalidation, the chat endpoint
(matching, threshold fallback, 400/405 handling), and CSRF enforcement.
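The mocking approach can be sketched like this. The `embed` helper is hypothetical and only illustrates the idea; `get_sentence_vector` is the method a loaded FastText model exposes, and here a `Mock` object answers it in place of a real model.

```python
from unittest import mock


def embed(model, text):
    """Hypothetical helper: clean the text and ask the model for a vector."""
    return list(model.get_sentence_vector(text.strip().lower()))


# A Mock stands in for a loaded FastText model, so no .bin file is needed.
fake_model = mock.Mock()
fake_model.get_sentence_vector.return_value = [0.1, 0.2, 0.3]

vector = embed(fake_model, "  Hello THERE ")
```

Because the mock records its calls, a test can also check that the text was preprocessed before it reached the model.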
chatbot/                  Django project (settings, urls, wsgi/asgi)
bot/
    models.py             QAPair model (question, answer, embedding_vector) + save() override
    embedding.py          Lazy model loading, language-aware preprocessing, (de)serialization
    cache.py              EmbeddingCache: normalized in-memory matrix + vectorized search
    views.py              index page + chat_api endpoint
    apps.py               Connects cache-invalidation signals on startup
    persian_process.py    Persian preprocessing (hazm)
    management/commands/rebuild_embeddings.py    Recompute/store all embeddings
    migrations/           Committed schema + embedding backfill
    templates/bot/index.html    Chat UI
    static/bot/chat.js    Frontend logic (sends CSRF token)
    tests.py              Test suite
manage.py
requirements.txt
.env.example              Documented environment variables
- Set DJANGO_DEBUG=False, a real DJANGO_SECRET_KEY, and DJANGO_ALLOWED_HOSTS.
- Run python manage.py collectstatic (output goes to staticfiles/, which is gitignored).
- Ship the FastText .bin model with your deployment (bake into the image or fetch on boot).
- Serve with gunicorn/uWSGI behind HTTPS, e.g. gunicorn chatbot.wsgi.
- Upgrading an older clone that already had a db.sqlite3 (created before migrations were committed): run python manage.py migrate --fake-initial, then python manage.py rebuild_embeddings.
MIT — see LICENSE.
- Django, for the framework and admin.
- Facebook Research, for FastText.
- The hazm project, for Persian NLP tooling.
A lightweight Django-based chatbot that works with FastText embeddings and cosine similarity. It matches the user's message against a set of question/answer pairs and returns the closest answer; if the similarity is not high enough, a default reply is given. It supports English and Persian.
For each QAPair, the question embedding is computed once on save and kept in the database. At
runtime, all of these embeddings are loaded into a normalized in-memory matrix, so each request costs
only one embedding (for the incoming message) and one matrix product, not one embedding per question
per request. The matrix is updated automatically when a QAPair is added, edited, or deleted.
git clone https://github.com/ITheEqualizer/Chatbot.git
cd Chatbot
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Download the Persian model and place it in the project root as ChatBot_Persian.bin
python manage.py migrate
python manage.py createsuperuser
python manage.py runserver
Set the CHATBOT_LANGUAGE=fa environment variable (and put the ChatBot_Persian.bin file in place). This
selects both the Persian model and the hazm-based preprocessor. After changing the language or model,
run the following command to rebuild the embeddings:
python manage.py rebuild_embeddings
Settings are read from environment variables (see .env.example): DJANGO_SECRET_KEY,
DJANGO_DEBUG, DJANGO_ALLOWED_HOSTS, CHATBOT_LANGUAGE, MODEL_PATH, SIMILARITY_THRESHOLD.
python manage.py test bot
The tests mock the FastText model and run without the model file.
Released under the MIT license; see the LICENSE file.