Compare different embedding models side-by-side with real benchmarks and metrics
This is a complete benchmarking framework for comparing embedding models for semantic search. Instead of guessing which model works best, get real data:
- ⚡ Speed Metrics: Document indexing time and query latency
- 🎯 Relevance Scores: Semantic distance measurements
- 📈 Comparative Analysis: Side-by-side performance tables
- 🧪 Real-World Testing: 20 diverse documents across 4 categories
- 🔄 Reproducible Results: Same test suite every run
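The speed metrics come from simple wall-clock timing around the indexing and query calls. A minimal sketch of the pattern (the `timed` helper is illustrative, not part of this repo):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example: time an arbitrary call the same way the benchmark times
# document indexing and query execution.
total, elapsed = timed(sum, range(1_000_000))
print(f"sum={total}, took {elapsed:.4f}s")
```

The same helper wraps both the bulk `add` (indexing time) and each individual query (latency), so all models are measured identically.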
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| 🔵 Default | Lightweight | ⚡ | ⭐⭐ | Quick prototypes |
| 🟢 MiniLM-L6-v2 | 38M | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Production (balanced) |
| 🟣 MPNet-base-v2 | 109M | ⚡⚡ | ⭐⭐⭐⭐⭐ | High-accuracy search |
```text
Testing: default
Documents added in: 3.573s
Avg query time: 0.4861s
Avg relevance distance: 0.8008 (lower is better)

Query                    Time (ms)   Distance
Sport-related query         640.50     0.9197
Tech-related query          455.72     0.8826
Finance-related query       434.48     0.6427
Health-related query        478.39     0.6865
Programming query           421.42     0.8725
─────────────────────────────────────────────
Testing: all-MiniLM-L6-v2
Documents added in: 0.649s
Avg query time: 0.0313s
Avg relevance distance: 0.4004 (lower is better)

Query                    Time (ms)   Distance
Sport-related query          61.40     0.4599
Tech-related query           23.72     0.4413
Finance-related query        26.98     0.3214
Health-related query         26.36     0.3433
Programming query            18.19     0.4363
─────────────────────────────────────────────
Testing: all-mpnet-base-v2
Documents added in: 1.396s
Avg query time: 0.1359s
Avg relevance distance: 0.4164 (lower is better)

Query                    Time (ms)   Distance
Sport-related query         156.71     0.5530
Tech-related query          140.52     0.4529
Finance-related query       160.53     0.3032
Health-related query        104.48     0.3088
Programming query           117.06     0.4640
```
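The distance column reports vector distances where lower means more semantically similar. Assuming a cosine-distance space (the actual metric depends on how the collection is configured, so treat this as a sketch):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

related = cosine_distance([1.0, 0.2], [0.9, 0.3])    # similar directions
unrelated = cosine_distance([1.0, 0.0], [0.0, 1.0])  # orthogonal vectors
print(related < unrelated)  # True: related vectors score lower
```

This is why a drop from 0.80 to 0.40 average distance is a meaningful relevance improvement, not just noise.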
| Model | Add Time | Query Time | Relevance | Speed Rank | Accuracy Rank |
|---|---|---|---|---|---|
| Default | 3.57s | 486ms | 0.8008 | 🥉 | 🥉 |
| MiniLM-L6-v2 | 0.65s | 31ms | 0.4004 | 🥇 | 🥇 |
| MPNet-base-v2 | 1.40s | 136ms | 0.4164 | 🥈 | 🥈 |
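Put differently, the query-time column works out to roughly a 15.7× speedup for MiniLM over the default and 3.6× for MPNet, quick arithmetic from the table above:

```python
# Average query times from the summary table (milliseconds).
default_ms, minilm_ms, mpnet_ms = 486, 31, 136

print(round(default_ms / minilm_ms, 1))  # 15.7 — MiniLM vs. default
print(round(default_ms / mpnet_ms, 1))   # 3.6  — MPNet vs. default
```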
| Category | Winner | Metric |
|---|---|---|
| ⚡ Fastest Document Indexing | all-MiniLM-L6-v2 | 0.649s |
| 🚀 Fastest Query Time | all-MiniLM-L6-v2 | 31ms avg |
| 🎯 Best Relevance Score | all-MiniLM-L6-v2 | 0.4004 distance |
```bash
pip install -r requirements.txt
python benchmark.py
```

Expected runtime: ~2-3 minutes (the first run downloads the embedding models).
```text
semantic-search-benchmark/
├── benchmark.py        # Main benchmark runner
├── benchmark_data.py   # Dataset & test queries
├── test.py             # Basic in-memory example
├── test_v2.py          # Persistent storage example
├── test_v3.py          # Semantic search demo
├── requirements.txt    # Dependencies
├── .gitignore          # Git ignore rules
└── README.md           # This file
```
⚽ Sports (5 docs)
- Ronaldo scored an incredible goal last night
- Messi won the World Cup with Argentina
- Real Madrid won the Champions League final
- Liverpool defeated Manchester City
- Nadal won his 14th tennis grand slam
💻 Technology (5 docs)
- Python is a great programming language for beginners
- Machine learning is used in stock price prediction
- TensorFlow and PyTorch are popular deep learning frameworks
- Artificial intelligence is revolutionizing software development
- Cloud computing provides scalable infrastructure
💰 Finance (5 docs)
- The stock market crashed badly this week
- Interest rates are rising due to inflation
- Bitcoin reached a new all-time high
- The Federal Reserve raised interest rates
- Real estate prices continue to rise in major cities
🏥 Health (5 docs)
- Regular exercise improves cardiovascular health
- COVID-19 vaccines have saved millions of lives
- Mental health awareness is becoming increasingly important
- Healthy diet and sleep patterns prevent chronic diseases
- Meditation reduces stress and anxiety
- ⚽ "football player scored a goal"
- 🧠 "machine learning and artificial intelligence"
- 📊 "money and market crash"
- 💪 "exercise and health"
- 🖥️ "programming languages and software"
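Each query is answered by finding the stored documents whose embeddings sit closest to the query embedding. A toy nearest-neighbour sketch with hand-picked 3-d vectors standing in for real model output (illustrative only):

```python
import numpy as np

# Made-up 3-d vectors in place of real embedding-model output.
docs = {
    "Messi won the World Cup with Argentina": np.array([0.9, 0.1, 0.0]),
    "Python is a great programming language": np.array([0.1, 0.9, 0.0]),
    "The stock market crashed badly":         np.array([0.0, 0.1, 0.9]),
}

def search(query_vec, k=1):
    """Return the k documents with the smallest cosine distance to the query."""
    def dist(v):
        return 1.0 - np.dot(query_vec, v) / (np.linalg.norm(query_vec) * np.linalg.norm(v))
    return sorted(docs, key=lambda text: dist(docs[text]))[:k]

print(search(np.array([0.8, 0.2, 0.0])))  # the sports document ranks first
```

The benchmark does the same thing at full embedding dimensionality: a sports-flavoured query vector lands nearest the sports documents, and the reported distance is how near it landed.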
| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| Default | 🐢 Slowest (in this test) | Low | Quick tests, low-stakes |
| MiniLM | ⚡ Fastest | ⭐⭐⭐⭐ High | RECOMMENDED - Most scenarios |
| MPNet | Slower | 🎯 Highest | Accuracy critical tasks |
- 🏃 Need speed? → Use MiniLM (balanced winner)
- 🎯 Need accuracy? → Use MPNet (but slower)
- 🚀 Production ready? → Use MiniLM (best all-around)
- 🧪 Just testing? → Use Default (fastest to set up)
To add a model, edit `benchmark.py`:

```python
models_to_test = [
    ("default", "./benchmark_db_default"),
    ("all-MiniLM-L6-v2", "./benchmark_db_minilm"),
    ("all-mpnet-base-v2", "./benchmark_db_mpnet"),
    ("your-model-name", "./benchmark_db_custom"),  # Add here
]
```

To add query tests, edit `benchmark_data.py`:

```python
QUERY_TESTS = [
    {
        "query": "your query here",
        "expected_category": "category",
        "description": "Your description"
    },
]
```

To test different domains, edit the `DOCUMENTS` list in `benchmark_data.py`.
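When customizing `QUERY_TESTS`, a small sanity check catches missing fields before a long benchmark run. A hedged sketch (the `validate_query_tests` helper is not part of this repo):

```python
REQUIRED_KEYS = {"query", "expected_category", "description"}

def validate_query_tests(tests):
    """Raise ValueError if a customized QUERY_TESTS entry is missing a field."""
    for i, entry in enumerate(tests):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"QUERY_TESTS[{i}] is missing {sorted(missing)}")
    return len(tests)

print(validate_query_tests([
    {"query": "football player scored a goal",
     "expected_category": "sports",
     "description": "Sport-related query"},
]))  # 1
```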
| Resource | Link | Description |
|---|---|---|
| 📖 ChromaDB Docs | trychroma.com | Official documentation |
| 🤗 Sentence Transformers | sbert.net | Embedding models & guide |
| 🏆 Model Leaderboard | huggingface.co/spaces/mteb | Compare all models |
| 🔬 MTEB Benchmark | github.com/embeddings-benchmark | Industry benchmarks |
- Python 3.8+
- chromadb >= 0.4.0
- sentence-transformers >= 2.2.0
- numpy >= 1.21.0
| File | Purpose |
|---|---|
| `benchmark.py` | 🎯 Core benchmarking logic & runner |
| `benchmark_data.py` | 📚 Test dataset & queries |
| `test.py` | 📖 Basic embedding example |
| `test_v2.py` | 💾 Persistent storage demonstration |
| `test_v3.py` | 🔍 Semantic search demo |
| `requirements.txt` | 📦 Python dependencies |
| `.gitignore` | 🚫 Git exclusion rules |
```bash
pip install -r requirements.txt
python benchmark.py
```

- Check the terminal output for detailed metrics
- Compare models in the summary table
- Identify the winner for your use case
- Add your own documents to `benchmark_data.py`
- Test additional embedding models
- Modify the query tests for your domain
- Default: Instant setup, slowest query speed in this test
- MiniLM: Best performance (fastest overall)
- MPNet: Higher latency, best accuracy
- Default: Basic semantic understanding
- MiniLM: Good understanding of document relationships
- MPNet: Excellent semantic comprehension
- Default: Minimal footprint
- MiniLM: ~384MB loaded (38M params)
- MPNet: ~1.2GB loaded (109M params)
Have improvements? Found a bug? Want to add:
- More embedding models for comparison?
- Different benchmark datasets?
- Additional metrics?
Feel free to fork and submit pull requests!
MIT License - Feel free to use this in your projects!
