Retrieval-Augmented Generation (RAG) evaluation can be categorized into several types:
- Relevance: Measures whether the retrieved documents are actually useful for answering the query.
- Faithfulness (Precision): Ensures that the generated output is grounded in the retrieved documents rather than hallucinated.
- Coherence: Evaluates if the text makes logical sense.
- Semantic Similarity: Measures similarity between generated text and reference data using embeddings.
For this project, we use Semantic Similarity. This approach measures how close the retrieved document is to the original query in meaning, ensuring high-quality retrieval.
- Semantic similarity is efficient and widely used in RAG applications.
- It helps quantify how much retrieved information aligns with the original query.
- It allows for automated and scalable evaluation using embeddings.
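The embedding-based comparison described above typically boils down to cosine similarity between two vectors. A minimal sketch follows; the toy vectors stand in for real sentence embeddings, which in practice would come from an embedding model such as one loaded via sentence-transformers:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings"; a real pipeline would obtain these
# from an embedding model rather than hard-coding them.
query_emb = np.array([0.1, 0.3, 0.5, 0.1])
doc_emb = np.array([0.1, 0.2, 0.6, 0.1])

score = cosine_similarity(query_emb, doc_emb)
print(f"similarity: {score:.3f}")  # value in [-1, 1]; closer to 1 means more similar
```

A score near 1 indicates the retrieved document closely matches the query's meaning, which is exactly the signal used to judge retrieval quality.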
Install the dependencies:

```shell
pip install faiss-cpu sentence-transformers numpy scikit-learn
```

Then run the evaluation script:

```shell
python rag_evaluation.py
```

The program logs:
- The query
- The retrieved document
- The computed similarity score
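The evaluation loop producing those three log lines can be sketched as below. This is a minimal illustration with hand-made toy embeddings: the document texts, embedding values, and the `evaluate` helper are all hypothetical, while the actual `rag_evaluation.py` would presumably embed text with a sentence-transformers model and index it with FAISS, as the dependency list suggests:

```python
import logging
import numpy as np

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger(__name__)

# Toy corpus with hand-made 3-d embeddings; a real script would compute
# these with an embedding model and store them in a FAISS index.
documents = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "Python is a programming language.",
]
doc_embs = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
])

def evaluate(query: str, query_emb: np.ndarray):
    """Retrieve the nearest document by cosine similarity and log the result."""
    norms = np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb)
    scores = doc_embs @ query_emb / norms
    best = int(np.argmax(scores))
    log.info("Query: %s", query)
    log.info("Retrieved document: %s", documents[best])
    log.info("Similarity score: %.3f", scores[best])
    return documents[best], float(scores[best])

doc, score = evaluate("What is the capital of France?", np.array([0.8, 0.2, 0.1]))
```

Each call logs the query, the retrieved document, and the computed similarity score, matching the output described above.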